The Winner Takes Them All (Project Map with Power BI Dashboard)

A penalty-won-adjusted goals metric, a demo Power BI dashboard, and a roadmap toward a refreshable data pipeline.
Football
Analytics
Python
R
Penalties
Published

December 21, 2025

Football credits the taker of a penalty, not the player who won it. That’s why we end up with nicknames like “Penaldo”. A BBC caption, “Misstiano Penaldo”, even went viral after CR7 missed against Slovenia in 2024.

This project asks a simple counterfactual:

What would happen if penalties were allocated back to the players who won them? How would that change the way we look at player statlines across seasons, across careers, or during Ballon d’Or discussions?

And why not just credit them with a full goal for each penalty won? Because this is a counterfactual allocation problem, we have to account for the possibility that the penalty would have been missed. So my made-up metric is Expected Goals from Penalties Won (xGpw), which is currently fixed at 0.757 per penalty won for every player, though I do plan to tweak this. An explanation of all that will follow.
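To make the arithmetic concrete, here is a minimal sketch of how the adjustment could be computed. It assumes that 0.757 is applied per penalty won, and that the adjusted total is non-penalty goals plus xGpw. Both are my reading of the idea rather than a settled definition, and the `SeasonLine` fields and the two example players are invented for illustration.

```python
from dataclasses import dataclass

# Flat value quoted above; assumed here to apply per penalty won.
XG_PER_PENALTY_WON = 0.757


@dataclass
class SeasonLine:
    """One player's line for a season (hypothetical field names)."""
    player: str
    goals: int      # all goals, penalties included
    pk_scored: int  # penalties converted as the taker
    pk_won: int     # penalties won (PkWon)


def xgpw(pk_won: int, rate: float = XG_PER_PENALTY_WON) -> float:
    """Expected goals from penalties won at a flat conversion rate."""
    return pk_won * rate


def adjusted_goals(line: SeasonLine, rate: float = XG_PER_PENALTY_WON) -> float:
    """Assumed adjustment: drop penalties scored as taker, add xGpw."""
    return line.goals - line.pk_scored + xgpw(line.pk_won, rate)


# Invented numbers, not real player data.
taker = SeasonLine("Taker", goals=20, pk_scored=6, pk_won=1)
winner = SeasonLine("Winner", goals=12, pk_scored=0, pk_won=5)

for line in (taker, winner):
    print(f"{line.player}: {line.goals} goals -> {adjusted_goals(line):.3f} adjusted")
```

In this made-up case the designated taker leads the raw goal count, but the player who keeps winning penalties comes out ahead once the penalties are reallocated.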


So far, the workflow is:

  1. Bulk-scrape FBref player data by season and competition, for the years where Penalty Kicks Won (PkWon) is either tracked or deducible. That covers the Big 5 leagues plus the Champions League from 2016/2017.
  2. Clean + transform in pandas (consistency checks, joins, derived metrics)
  3. Load into MySQL using SQLAlchemy (from pandas → MySQL)
  4. Schema work (dimensions/facts + keys so the model is scalable)
  5. Publish a demo layer in Power BI (prototype dashboard)
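As a rough illustration of step 4, the sketch below builds a tiny star schema and runs a career-level query against it. It uses the standard-library `sqlite3` module with an in-memory database as a stand-in for MySQL so it runs without a server. The table and column names (`dim_player`, `fact_player_season`, and so on) and the sample rows are hypothetical, not the project's actual schema or data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the DDL is deliberately generic.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

conn.executescript("""
CREATE TABLE dim_player (
    player_id   INTEGER PRIMARY KEY,
    player_name TEXT NOT NULL
);
CREATE TABLE dim_competition (
    competition_id   INTEGER PRIMARY KEY,
    competition_name TEXT NOT NULL
);
CREATE TABLE dim_season (
    season_id    INTEGER PRIMARY KEY,
    season_label TEXT NOT NULL
);
-- One row per player, competition and season (the fact grain).
CREATE TABLE fact_player_season (
    player_id      INTEGER NOT NULL REFERENCES dim_player(player_id),
    competition_id INTEGER NOT NULL REFERENCES dim_competition(competition_id),
    season_id      INTEGER NOT NULL REFERENCES dim_season(season_id),
    goals          INTEGER NOT NULL,
    pk_won         INTEGER NOT NULL,
    PRIMARY KEY (player_id, competition_id, season_id)
);
""")

# Invented sample rows.
conn.execute("INSERT INTO dim_player VALUES (1, 'Winner')")
conn.execute("INSERT INTO dim_competition VALUES (1, 'Example League')")
conn.executemany(
    "INSERT INTO dim_season VALUES (?, ?)",
    [(1, "2016/2017"), (2, "2017/2018")],
)
conn.executemany(
    "INSERT INTO fact_player_season VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 12, 5), (1, 1, 2, 20, 4)],
)


def career_totals(connection: sqlite3.Connection) -> list:
    """Sum goals and penalties won per player across all seasons."""
    query = """
        SELECT p.player_name, SUM(f.goals), SUM(f.pk_won)
        FROM fact_player_season AS f
        JOIN dim_player AS p ON p.player_id = f.player_id
        GROUP BY p.player_id, p.player_name
        ORDER BY p.player_name
    """
    return connection.execute(query).fetchall()


print(career_totals(conn))
```

Keeping the facts at one row per player, competition and season means that season, career and competition views are all just different `GROUP BY` clauses over the same table.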

This isn’t intended as a one-off analysis; it’s a pipeline project. Long-term, I plan to move the pipeline to a paid football API to ensure scalability and commercial compliance. The data I scraped is only for experimenting with the workflow above. In future, I intend to build a Shiny dashboard that refreshes its data automatically and is fully explorable by site visitors. I will also periodically publish small vignettes based on my own exploration.


Dashboard (prototype)

Here is a small prototype I made in Power BI.

Known limitations (current dashboard)

  • The dashboard can currently only be filtered to a single-season, single-competition grain, because I have run into issues with my DAX measures.
  • Cross-filter interactions can be annoying (table clicks affect cards in ways I don’t want).