Tabular binary classification: predict lap completion outcomes from telemetry and race conditions.
| Model | 3-Fold CV | 5-Fold OOF | Hardware |
|---|---|---|---|
| LightGBM (Optuna) | 0.950 | 0.950 | GPU (CUDA) |
| XGBoost (Optuna) | 0.950 | 0.949 | GPU (CUDA) |
| CatBoost (Optuna) | 0.947 | — | GPU |
| Ensemble (avg) | 0.951 | 0.950 | — |
Target encoding on categoricals (Race, Driver, Team), lag features on LapNumber, rolling statistics, and interaction terms. About 800 features in total, validated with StratifiedKFold.
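
A minimal sketch of two of the feature families above, not the project's actual pipeline. It assumes target encoding is computed out-of-fold with smoothing, and it reads "lag features on LapNumber" as lags ordered by `LapNumber` within each driver. The `LapTime` and `target` columns, the smoothing value, and both helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def oof_target_encode(df, col, target, n_splits=5, smoothing=10.0, seed=42):
    """Out-of-fold smoothed mean encoding of `col` against a binary `target`."""
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in skf.split(df, df[target]):
        fit = df.iloc[fit_idx]
        prior = fit[target].mean()
        stats = fit.groupby(col)[target].agg(["sum", "count"])
        # Shrink each category mean toward the prior of the fitting folds.
        smoothed = (stats["sum"] + prior * smoothing) / (stats["count"] + smoothing)
        # Categories unseen in the fitting folds fall back to the prior.
        values = df.iloc[enc_idx][col].map(smoothed).fillna(prior)
        encoded.iloc[enc_idx] = values.to_numpy()
    return encoded


def add_lag_features(df, group_col, order_col, value_col, lags=(1, 2), window=3):
    """Add per-group lag and rolling-mean columns, ordered by `order_col`."""
    out = df.sort_values([group_col, order_col]).copy()
    for lag in lags:
        out[f"{value_col}_lag{lag}"] = out.groupby(group_col)[value_col].shift(lag)
    # Shift by one before rolling so a row never sees its own value.
    out[f"{value_col}_roll{window}_mean"] = out.groupby(group_col)[value_col].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out
```

Encoding each row only from the other folds keeps the row's own label out of its feature, which matters when the same folds are later used for validation.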
LightGBM, XGBoost, and CatBoost with default parameters on 5-fold CV. Best single model: XGBoost at 0.950 CV. Averaged ensemble: 0.951 CV. Public leaderboard: 0.94735.
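
The averaged ensemble can be sketched as an equal-weight mean of each model's positive-class probabilities. Equal weighting is an assumption here, and `average_ensemble` is a hypothetical helper name.

```python
import numpy as np


def average_ensemble(prob_by_model):
    """Equal-weight mean of per-model positive-class probabilities.

    prob_by_model: dict mapping a model name to a 1-D array of probabilities,
    all aligned to the same rows.
    """
    stacked = np.column_stack(list(prob_by_model.values()))
    return stacked.mean(axis=1)
```

Usage would look like `average_ensemble({"lgb": p_lgb, "xgb": p_xgb, "cat": p_cat})`, where each array holds that model's predictions for the same rows.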
200-trial TPE search (Optuna) per model. Search space: n_estimators 200-1200, learning rate 0.005-0.15, max_depth 3-12, subsample and colsample 0.5-1.0. CatBoost was dropped for underperforming. The search is currently running.
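
A sketch of how this search might be set up with Optuna, using LightGBM as the example model. The ranges come from the list above; everything else is an assumption: the log-scale sampling of the learning rate, mapping "colsample" to `colsample_bytree`, the `roc_auc` scoring (the metric is not stated here), the 3-fold default, and the helper names `suggest_params` and `run_search`.

```python
def suggest_params(trial):
    """Sample one configuration from the ranges listed above."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 200, 1200),
        # Log scale is an assumption; it spreads trials across small learning rates.
        "learning_rate": trial.suggest_float("learning_rate", 0.005, 0.15, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
    }


def run_search(X, y, n_trials=200, n_splits=3, scoring="roc_auc", seed=42):
    """Run a TPE search and return (best_params, best_value)."""
    # Imported lazily so the search space can be inspected without these packages.
    import optuna
    from lightgbm import LGBMClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)

    def objective(trial):
        # LightGBM only applies `subsample` when subsample_freq is positive.
        model = LGBMClassifier(
            **suggest_params(trial), subsample_freq=1, random_state=seed
        )
        return cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()

    study = optuna.create_study(
        direction="maximize", sampler=optuna.samplers.TPESampler(seed=seed)
    )
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value
```

Keeping the search space in its own function makes it easy to reuse the same ranges for XGBoost by swapping the estimator inside `objective`.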
Blend with public high-score submissions, pseudo-labeling, seed averaging, and a final 5-fold OOF ensemble with BayesianRidge stacking.
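
The stacking step could look roughly like the sketch below: a BayesianRidge meta-learner is fit on the base models' out-of-fold predictions and is itself evaluated out-of-fold. Treating the regression output as a score clipped to [0, 1] is an assumption, and `stack_oof` is a hypothetical helper name.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import StratifiedKFold


def stack_oof(oof_preds, y, n_splits=5, seed=42):
    """Out-of-fold BayesianRidge stacking over base-model OOF predictions.

    oof_preds: array of shape (n_samples, n_models), one column per base model.
    Returns out-of-fold meta predictions clipped to [0, 1].
    """
    oof_preds = np.asarray(oof_preds, dtype=float)
    y = np.asarray(y)
    meta_oof = np.zeros(len(y))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, val_idx in skf.split(oof_preds, y):
        meta = BayesianRidge()
        meta.fit(oof_preds[fit_idx], y[fit_idx])
        # Predict only rows the meta-learner did not see during fitting.
        meta_oof[val_idx] = meta.predict(oof_preds[val_idx])
    return np.clip(meta_oof, 0.0, 1.0)
```

Scoring `meta_oof` against `y` gives a CV estimate for the stacked ensemble that is comparable to the per-model OOF scores.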