Hi there, and sorry for being late :)
It took me until the last moment to actually start thinking about what I want to present.
My first goal with this course was to find a topic that is new to me, so that I don't rely on any past knowledge or experience, hence the football "moneyball" idea (but also because I managed to quickly find data on Kaggle).
To be fair, I'm sure this is far from any meaningful insights, but it was still a fun experiment. It was also frustrating at times, as I didn't make much progress initially.
What I've learned:
- I didn't know about JAX or Numba, and generally I never fully understood the power of optimization at that level and scale. I always thought of it as something that comes in at a later stage, when scale increases. I now know that optimization is a consideration from the very beginning. That said, I didn't manage to make much use of JAX or Numba, but I replicated some experiments to do the benchmarks (seeing is believing).
- This was the first time I experimented with code-based (Python) data viz, using libraries like Matplotlib, Seaborn, scikit-learn etc., instead of the tools I'm familiar with (RapidMiner, KNIME, Dataiku etc.).
- Not sure how I missed ICA in my previous learning. I've always stuck with PCA. Good to know the difference.
- From previous learning I knew most of the concepts related to statistics and ML, but it was a good refresher.
- I learned a few things about managing and synchronizing Python environments between tools (local and cloud). Locally I shifted between a local Jupyter server and Antigravity, which I decided to try out while doing the course. I like the interface, but didn't use the AI agents much. It's still very buggy, and I quickly ran into limits as I used larger datasets for experiments.
- Python is not something I've used before, although it was always on the to-do list. I'm not much of a programmer anymore, as I haven't been active for many years now. But reading code is still something that comes in handy. Memorizing syntax, on the other hand, isn't something I even tried, so I had to rely on AI to help connect my thoughts to the correct tabs and commas so that Python would obey.
- The obvious thing is having good data. What I got to experience more and learn from is feature engineering. I would enjoy having more focused time to look at and analyze different charts, finding interesting patterns, correlations and causation. But in reality I had to just go fast and see what happens.
- Analyzing data you have some knowledge about is a) more fun and b) more productive. Comparing analyzing football with analyzing my personal wellbeing and performance metrics, the latter made way more sense intuitively. That doesn't mean that I shouldn't analyze data I'm not familiar with, but the extra step of becoming a bit knowledgeable about the topic is needed.
- I also learned that November/December is not the best time for me to enroll in a course, especially one with a high tempo. I'm also a bit sad not to have had the time to focus on more.
- Another bonus item was learning about Nadieh Bremer! I got a few of her books and enjoy "Chart" already.
My data & Experiments
Initially, as mentioned above, I decided to work with football transfer data. I was interested to see how I could find overlaps between the datasets from different files.
Football Data from Transfermarkt
- 60,000+ games from many seasons on all major competitions
- 400+ clubs from those competitions
- 30,000+ players from those clubs
- 400,000+ player market valuations historical records
- 1,200,000+ player appearance records from all games
At some point I ran into doubts, as my predictions were too far off (best R² score of 0.268).
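For context on that score: R² compares a model's squared error with that of a baseline that always predicts the mean, so 0.268 means roughly 27% of the variance in valuations was explained. A minimal sketch of the calculation in plain Python follows; the numbers are made up for illustration and are not taken from the Transfermarkt data.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up valuations (in millions), for illustration only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 2.5, 3.0, 3.5, 4.0]

score = r2_score(actual, predicted)              # 0.75 for this toy data
baseline = r2_score(actual, [3.0] * len(actual))  # 0.0: no better than the mean
print(score, baseline)
```

A score near 0 means the model is barely better than guessing the average valuation, which is why 0.268 felt too far off.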
What I learned from the data
To be honest, I'm not sure. I do understand now that it's a very multi-dimensional area where depending on the case many different factors can play a role.
- My model treats all positions together, which I don't think is accurate. Defenders and attackers have different properties, and the model generalizes across them.
- Age is a factor that is not necessarily related to performance, but it is definitely related to valuation trends.
- Purely analyzing on-field performance metrics isn't going to cut it. There's much going on off the field: charisma, media, good PR and great agents. Maybe with a lot of private data we'll get to a place where market valuation can be predicted better.
- If a model like this were to be used, I would filter out all players who have reached a level of prominence that starts to impact the valuation through other factors, i.e. keep only junior players, where pure performance has a higher impact.
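The last two ideas in this list (separate positions, junior players only) could be sketched roughly like this. The field names, the age cutoff and the value cutoff are all assumptions for illustration; they are not the Kaggle schema and not thresholds derived from the data.

```python
from collections import defaultdict

# Hypothetical records; field names are assumptions, not the real dataset columns.
players = [
    {"name": "A", "position": "Attack", "age": 19, "market_value": 2_000_000},
    {"name": "B", "position": "Defender", "age": 21, "market_value": 1_500_000},
    {"name": "C", "position": "Attack", "age": 29, "market_value": 80_000_000},
    {"name": "D", "position": "Defender", "age": 22, "market_value": 60_000_000},
    {"name": "E", "position": "Attack", "age": 20, "market_value": 4_000_000},
]

MAX_AGE = 23            # assumed cutoff for "junior"
MAX_VALUE = 10_000_000  # assumed cutoff for "not yet prominent"


def split_juniors_by_position(records, max_age=MAX_AGE, max_value=MAX_VALUE):
    """Keep young, not-yet-prominent players and group them by position."""
    groups = defaultdict(list)
    for rec in records:
        if rec["age"] <= max_age and rec["market_value"] <= max_value:
            groups[rec["position"]].append(rec)
    return dict(groups)


groups = split_juniors_by_position(players)
for position, members in groups.items():
    print(position, [p["name"] for p in members])
```

Each group could then get its own model, so that defenders are only compared with defenders and attackers with attackers.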
Apple Health Data
I had to find something else to experiment with and decided to download my entire Apple Health data set. That's 5 years of data, to be accurate more than 6.5 million records. The goal was to understand my Vo2Max fluctuations. Even better, to understand how my body responds to different things (not just training). I do know the basics, but looking into the data itself was a different experience.
What I learned from the data
That I shouldn't be discouraged from dips when the curve is trending upwards!
- Illness detection is a bit arbitrary. I'm still not convinced that I got it to a level where it does make sense. It does overlap with Vo2Max drops, but I'm not 100% sure if I made it overlap or it's a real causation factor. I mean, I know it should be... but
- My resting heart rate seems to be more impacted by sleep than illness. So does my HRV (heart rate variability). Although I need to say that I didn't dig into much detail. I should run tests and overlay data/visuals in smaller increments (e.g. 1 week/1 month) to be able to see better.
- The obvious was visible: longer workout duration and distance, and therefore more active calories burned, lead to more Vo2Max gains. No surprise here.
- There are no evident seasonal trends, although I believe there should be. The main factor is training. The Vo2Max chart is more a representation of my work-related stress and periods of not being able to train than of seasonal shifts. The sudden drops are still not 100% confirmed to be caused by illness.
- Apple Health tracks sleep in a weird way, and I still can't figure out why I'm getting a mean of 13 hours of sleep when I know I've never slept that long.
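One possible explanation for the 13-hour figure, offered as an untested assumption: if several sources (e.g. phone and watch) or several sleep categories each write their own samples for the same night, summing the sample durations counts the overlapping time more than once. A minimal sketch that merges overlapping intervals before summing; the timestamps are invented and the `(start, end)` tuple layout is hypothetical, not the Apple Health export format.

```python
from datetime import datetime, timedelta


def merged_duration(intervals):
    """Total time covered by (start, end) intervals, counting overlaps once."""
    total = timedelta()
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap found: close the previous block and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current block if this sample ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Invented samples for one night, as two sources might report it.
night = [
    (datetime(2024, 1, 1, 23, 0), datetime(2024, 1, 2, 7, 0)),    # 8 h
    (datetime(2024, 1, 1, 23, 30), datetime(2024, 1, 2, 6, 30)),  # 7 h, fully inside
]

naive = sum((end - start for start, end in night), timedelta())
merged = merged_duration(night)
print(naive, merged)
```

If the naive sum is much larger than the merged total on the real export, double counting would be the likely cause of the inflated average.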
I decided not to share my personal data, but I will share some of the graphs here.
Weekly Assignments & other experiments
Presentation slide
