EdgeFinder
Objective
The goal of this project is to build custom team and player metrics, combine them with environmental data such as weather, and use machine learning to identify inefficiencies in betting markets.
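Identifying a market inefficiency comes down to comparing a model's win probability with the probability implied by the sportsbook's odds. The sketch below illustrates that comparison for American moneyline odds. The function names and the 3-percentage-point `min_edge` threshold are hypothetical and are not taken from the project. Note that the implied probability still includes the bookmaker's margin (vig).

```python
def implied_probability(american_odds: int) -> float:
    """Bookmaker's implied win probability for American odds (vig included)."""
    if -100 < american_odds < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def profit_per_unit(american_odds: int) -> float:
    """Profit on a 1-unit stake if the bet wins."""
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100


def expected_value(model_prob: float, american_odds: int) -> float:
    """Expected profit per 1 unit staked, using the model's win probability."""
    return model_prob * profit_per_unit(american_odds) - (1 - model_prob)


def has_edge(model_prob: float, american_odds: int, min_edge: float = 0.03) -> bool:
    """Flag a bet when the model's probability beats the market's by min_edge (hypothetical threshold)."""
    return model_prob - implied_probability(american_odds) >= min_edge
```

For example, odds of +130 imply roughly a 43.5% win probability. If the model estimates 50%, `expected_value(0.5, 130)` is about +0.15 units per unit staked, and `has_edge` returns `True`.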
Tools & Technologies
Python, FastAPI, Meteostat, nfl_data_py, scikit-learn, Tableau
Challenge
This project has been one of the most complex and rewarding undertakings of my journey so far. From the beginning, it has pushed me to grow technically, analytically, and strategically. I had to go far beyond simply training a model. The process required me to understand and implement every stage, from raw data wrangling to robust feature engineering and model evaluation, while accounting for the unique challenges of predictive sports analytics.
One of the most eye-opening lessons has been learning how to identify and eliminate data leakage. In sports analytics, this is particularly tricky: many seemingly harmless features carry information that would not be available before kickoff. For example, a team's season average computed over the full season includes games played after the one being predicted. I had to rigorously audit and redesign parts of my pipeline to ensure that no future-dependent data or post-game statistics influenced my predictions. As a result, I've developed a keen eye for subtle sources of leakage and an appreciation for how even small leaks can dramatically inflate performance metrics.
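A common way for leakage to creep in is through rolling averages that include the game being predicted. The sketch below shows a leakage-safe pattern in plain Python. Each game's feature is computed only from that team's earlier games, and the current result is added to the history after the feature is recorded. It assumes the games are already sorted by kickoff time. The function name, the `team` key, and the three-game window are illustrative, not the project's actual code.

```python
from collections import defaultdict, deque


def pregame_rolling_means(games, stat, window=3):
    """Attach each team's mean `stat` over its previous `window` games.

    games: list of dicts sorted by kickoff time, each with a 'team' key and `stat`.
    The feature uses only earlier games, so the game's own result never leaks in.
    """
    history = defaultdict(lambda: deque(maxlen=window))  # last `window` values per team
    rows = []
    for game in games:
        past = history[game["team"]]
        # None when the team has no prior games yet (no pre-game information exists)
        feature = sum(past) / len(past) if past else None
        rows.append({**game, f"{stat}_prev{window}_mean": feature})
        past.append(game[stat])  # update history only AFTER the feature is recorded
    return rows
```

In pandas, the equivalent idiom is to shift within each team before rolling, for example `df.groupby("team")["points"].transform(lambda s: s.shift(1).rolling(3, min_periods=1).mean())`.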
In parallel, I built a custom comparator program that evaluates five machine learning models side by side: Random Forest, Logistic Regression, XGBoost, Gradient Boosting, and AdaBoost. The tool tracks each model's accuracy, profitability, and a custom betting confidence score on both a weekly and a seasonal basis. It has been instrumental in identifying which models remain stable and generalize over time rather than overfitting to recent trends.
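A weekly comparison like this is naturally done with walk-forward validation: train on every week before week w, then score week w, so no model ever sees the games it is evaluated on. The sketch below is a minimal illustration using scikit-learn. It assumes a NumPy feature matrix `X`, binary labels `y` (hypothetically, 1 if the home team covers the spread), and a parallel array `weeks`. XGBoost is left out to avoid an extra dependency; `xgboost.XGBClassifier` could be registered in the same dictionary if that package is installed. The function name, hyperparameters, and flat 1-unit stake at -110 are assumptions, and the project's own profitability metric and confidence score are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Factories so each week gets a freshly initialised model (hyperparameters are illustrative).
MODELS = {
    "Random Forest": lambda: RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
    "Gradient Boosting": lambda: GradientBoostingClassifier(random_state=0),
    "AdaBoost": lambda: AdaBoostClassifier(random_state=0),
}


def walk_forward_compare(X, y, weeks, payout=100 / 110):
    """Return {model name: [(week, accuracy, profit), ...]} using only past weeks for training.

    X, y, weeks: NumPy arrays with one row per game. Profit assumes a flat
    1-unit bet on every prediction at -110 (a win pays `payout`, a loss costs 1).
    """
    results = {name: [] for name in MODELS}
    for week in np.unique(weeks)[1:]:  # the first week has no history to train on
        train, test = weeks < week, weeks == week
        for name, make_model in MODELS.items():
            model = make_model().fit(X[train], y[train])
            preds = model.predict(X[test])
            correct = preds == y[test]
            profit = correct.sum() * payout - (~correct).sum()
            results[name].append((int(week), float(accuracy_score(y[test], preds)), float(profit)))
    return results
```

Averaging each model's weekly rows by season, or looking at their spread across weeks, is one way to surface which models stay stable rather than peaking on a few recent weeks.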
Despite the progress, I'm still exploring the best long-term direction for the project. There's a lot of potential for expanding it into real-time predictions, integrating with APIs for automation, or even turning it into a public-facing app. But for now, my primary focus is on refining the model to be consistently accurate, profitable, and robust. It's a work in progress, and one that continues to teach me with every iteration.