Formula 1 Race Outcome Predictor

AI & ML · Personal project · Ongoing

As a lifelong Formula 1 fan, I built this project to explore whether race outcomes can be predicted using structured historical data alone. The goal is not just to improve prediction accuracy, but to understand which features best capture driver form, team strength, and circuit-specific performance.

Focus: tabular machine learning, feature engineering, and race prediction
Data: historical race results, constructor form, and track-level trends
Models: ensemble baselines and neural networks for ranking and position prediction
Goal: predict finishing order while understanding which features drive performance

Problem and motivation

Formula 1 is a useful prediction problem because performance depends on a mix of driver skill, constructor strength, track characteristics, and recent form. I wanted to see whether these factors could be modelled using publicly available structured data, without relying on betting markets or hidden telemetry.

The project predicts expected race outcomes for each driver and compares those predictions against simple baselines such as previous race results or championship order. That makes it a good setting for testing whether more advanced models actually add value beyond common heuristics.
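The two heuristics named above can be written as very small functions. This is a minimal sketch, not the repository's code: the function names, the three-letter driver codes, and the rule of placing drivers with no previous result at the back are all assumptions made for illustration.

```python
from typing import Dict, List


def predict_from_previous_race(previous_result: List[str], entrants: List[str]) -> List[str]:
    """Predict that drivers finish in the same order as in the previous race.

    Entrants missing from the previous result (for example a substitute
    driver) are placed at the back, in entry-list order. That tie-breaking
    rule is an assumption of this sketch.
    """
    entrant_set = set(entrants)
    carried_over = [d for d in previous_result if d in entrant_set]
    seen = set(carried_over)
    newcomers = [d for d in entrants if d not in seen]
    return carried_over + newcomers


def predict_from_standings(points: Dict[str, float], entrants: List[str]) -> List[str]:
    """Predict that drivers finish in descending championship-points order."""
    # sorted() is stable, so drivers tied on points keep their entry-list order.
    return sorted(entrants, key=lambda d: -points.get(d, 0.0))


# Hypothetical driver codes and points, for illustration only.
entrants = ["AAA", "BBB", "CCC", "DDD"]
previous_race = ["CCC", "AAA", "EEE", "BBB"]
standings = {"AAA": 50, "BBB": 75, "CCC": 50}

baseline_prev = predict_from_previous_race(previous_race, entrants)
baseline_points = predict_from_standings(standings, entrants)
```

Any learned model then has to beat both orderings on the same races before its extra complexity is justified.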

Data and feature engineering

I built the pipeline around one row per driver per race, pulled from public F1 data sources, then engineered features to capture both long-term ability and short-term momentum.

Driver form: rolling performance trends with recency weighting so recent races matter more than older ones.
Constructor strength: team-level indicators that capture overall pace and consistency.
Track history: driver and team performance on the same circuit across past seasons.
Race context: round number, circuit type, and other simple contextual signals.
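As an illustration of the recency-weighted driver form feature, here is a minimal sketch in plain Python. The exponential decay scheme, the `decay` and `window` parameters, and the function names are assumptions of this sketch rather than the project's actual settings. The important detail is that the feature for a race is built only from strictly earlier races, so the target never leaks into its own input.

```python
from typing import List, Optional, Sequence


def recency_weighted_form(
    past_positions: Sequence[int], decay: float = 0.7, window: int = 5
) -> Optional[float]:
    """Weighted average of a driver's most recent finishing positions.

    past_positions is ordered oldest to newest and must contain only races
    before the one being predicted. The newest race gets weight 1, the one
    before it gets decay, then decay**2, and so on. Returns None when the
    driver has no history yet.
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    recent = list(past_positions)[-window:]
    if not recent:
        return None
    # age 0 is the most recent race, so weights run from small (old) to 1 (new).
    weights = [decay ** age for age in range(len(recent) - 1, -1, -1)]
    total = sum(w * p for w, p in zip(weights, recent))
    return total / sum(weights)


def form_feature_per_race(
    positions: Sequence[int], decay: float = 0.7, window: int = 5
) -> List[Optional[float]]:
    """Form feature for each race, using only strictly earlier races."""
    return [
        recency_weighted_form(positions[:i], decay, window)
        for i in range(len(positions))
    ]


# Hypothetical finishing positions for one driver across three races.
form = form_feature_per_race([3, 1, 5], decay=0.5)
```

A lower value means better recent form. The same pattern (aggregate over past rows only) applies to the constructor and track-history features, grouped by team or by circuit instead of by driver.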

These features are cleaned, normalized, and assembled into a tabular dataset for downstream modelling and evaluation.

Modelling approach

I experimented with several approaches to predicting finishing position, starting with simpler tabular baselines and then moving toward neural networks for more flexible modelling.

Tree-based and ensemble models as strong tabular baselines.
A Keras MLP with batch normalization, dropout, and regularization for position prediction.
Keras Tuner for searching over architecture size, learning rate, and regularization settings.
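A minimal sketch of what a Keras MLP with batch normalization, dropout, and L2 regularization can look like for position regression. The layer sizes, dropout rate, L2 strength, learning rate, MAE loss, and the function name `build_position_mlp` are all illustrative assumptions, not the tuned configuration from this project. The arguments of this function are the kind of values a Keras Tuner search would vary.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers


def build_position_mlp(
    n_features: int,
    hidden_units=(64, 32),
    dropout_rate: float = 0.3,
    l2_strength: float = 1e-4,
    learning_rate: float = 1e-3,
) -> keras.Model:
    """Build a small MLP that regresses finishing position from tabular features."""
    inputs = keras.Input(shape=(n_features,))
    x = inputs
    for units in hidden_units:
        # Dense -> BatchNorm -> ReLU -> Dropout is one common ordering;
        # other orderings are also used in practice.
        x = layers.Dense(units, kernel_regularizer=regularizers.l2(l2_strength))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Dropout(dropout_rate)(x)
    # Single linear output: the predicted finishing position as a real number.
    outputs = layers.Dense(1)(x)
    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mae",
    )
    return model


model = build_position_mlp(n_features=8)
```

Because the output is a continuous score, a predicted finishing order for a race is obtained by sorting that race's drivers by their predicted value.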

I evaluate the models with both position-based error metrics and ranking-style metrics, to check whether a model orders drivers better than the simpler baselines do.
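As one possible reading of "position-based" and "ranking-style" metrics, the sketch below computes mean absolute position error and Spearman's rank correlation for a single race. Both metric choices, the function names, and the driver codes are assumptions for illustration and not necessarily the metrics used in this project. The Spearman formula used here is the tie-free form, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which fits a finishing order because no two drivers share a position.

```python
from typing import Dict, List


def positions_from_order(order: List[str]) -> Dict[str, int]:
    """Map each driver to a 1-based position in the given order."""
    return {driver: pos for pos, driver in enumerate(order, start=1)}


def _check_same_drivers(predicted: List[str], actual: List[str]) -> None:
    if set(predicted) != set(actual) or len(predicted) != len(actual):
        raise ValueError("predicted and actual must contain the same drivers")


def mean_absolute_position_error(predicted: List[str], actual: List[str]) -> float:
    """Average number of places by which each driver's prediction is off."""
    _check_same_drivers(predicted, actual)
    pred = positions_from_order(predicted)
    act = positions_from_order(actual)
    return sum(abs(pred[d] - act[d]) for d in act) / len(act)


def spearman_rank_correlation(predicted: List[str], actual: List[str]) -> float:
    """Spearman's rho for two tie-free orderings of the same drivers."""
    _check_same_drivers(predicted, actual)
    n = len(actual)
    if n < 2:
        raise ValueError("need at least two drivers")
    pred = positions_from_order(predicted)
    act = positions_from_order(actual)
    d_squared = sum((pred[d] - act[d]) ** 2 for d in act)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical race with four drivers: the prediction swaps the top two.
actual_order = ["AAA", "BBB", "CCC", "DDD"]
predicted_order = ["BBB", "AAA", "CCC", "DDD"]

mae = mean_absolute_position_error(predicted_order, actual_order)
rho = spearman_rank_correlation(predicted_order, actual_order)
```

Running the same two functions on a baseline ordering and on a model's ordering for the same race gives a direct, like-for-like comparison.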

What I am learning

This project has been a good way to practice the full machine learning workflow on a problem I genuinely care about. It involves sourcing and structuring data, designing useful features, tuning models, and deciding whether performance improvements are actually meaningful.

More than anything, it has helped me think carefully about the gap between building a model and building a useful predictor. In a noisy domain like racing, feature quality and evaluation design matter just as much as the choice of model.

Tech stack

Python for data processing and experiments
pandas and NumPy for feature engineering
TensorFlow and Keras for neural networks
Keras Tuner for hyperparameter search
Matplotlib for evaluation plots and analysis

Repo and status

Code, notebooks, and experiment logs live in this repository:

GitHub - Formula1Project

Current work is focused on improving feature quality, refining ranking-based evaluation, and comparing model performance against simple race prediction baselines.