Sports Analytics

SPO. PTPL · JUN 2026

Predicting the Premier League

A statistical journey from naive combinatorics to Bayesian updating in the Egyptian Premier League.

The Egyptian Premier League was witnessing a historic final round—a scenario that hadn't occurred in 34 years. Three teams were competing for the title until the very end: Zamalek (53 points), Pyramids (51 points), and Al Ahly (50 points).

One ordinary night, as midnight approached, a question popped into my head without warning: "What are the chances of Al Ahly actually winning the league?"

It's a strange question because anyone following the league knew the required scenario to win the championship was "miraculous". But we are statisticians; let's try to answer it.

Only one match remained for each team before the curtain closed on the season:

Zamalek vs Ceramica Cleopatra (4th place)
Pyramids vs Smouha (5th place)
Al Ahly vs Al Masry (6th place)

These were all incredibly tough matchups. None of the three contenders could be counted on to win their encounter.

Phase 1: Naive Combinatorics

For each of these three matches, there are 3 possible outcomes: Win, Draw, or Lose. Assuming the results are completely independent from each other (e.g., Al Ahly winning doesn't force Zamalek to lose), we have $3 \times 3 \times 3 = 27$ different possible endings.

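For Al Ahly to win the league, they must win, Zamalek must lose, and Pyramids must not win (draw or lose). Al Ahly would then finish level with Zamalek on 53 points, so this path also relies on the tiebreaker going Al Ahly's way. Writing each scenario as [Al Ahly, Zamalek, Pyramids], the qualifying paths are [Win, Lose, Draw] and [Win, Lose, Lose].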

That leaves only 2 scenarios out of the 27, so we can divide directly: $2 / 27 \approx 7.4\%$.

This is a simple application of combinatorial probability. The probability of an event equals the number of desired outcomes divided by the total number of possible outcomes.
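The count can be checked by brute force. The following is a minimal sketch that enumerates all 27 endings and keeps the ones matching the condition stated earlier (Al Ahly win, Zamalek lose, Pyramids do not win); the function and variable names are illustrative, not taken from the original analysis.

```python
from itertools import product

OUTCOMES = ("Win", "Draw", "Lose")

def al_ahly_takes_title(al_ahly, zamalek, pyramids):
    """Condition from the text: Al Ahly win, Zamalek lose, Pyramids do not win."""
    return al_ahly == "Win" and zamalek == "Lose" and pyramids != "Win"

# Every (Al Ahly, Zamalek, Pyramids) combination: 3 * 3 * 3 = 27 endings.
scenarios = list(product(OUTCOMES, repeat=3))
favourable = [s for s in scenarios if al_ahly_takes_title(*s)]

# Naive probability: favourable endings divided by all endings.
naive_probability = len(favourable) / len(scenarios)
print(len(favourable), len(scenarios), round(naive_probability, 3))  # 2 27 0.074
```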

But what is the problem with this method? We implicitly assumed that a win, a draw, and a loss are equally likely for all 3 teams ($1/3 \approx 33\%$ each). Is Al Ahly winning really exactly as likely as Al Ahly losing? Almost certainly not. That's why we need to look at the probability of each event individually.

Interactive: Universe Explorer. Locking in specific match outcomes filters the 27 scenarios; with nothing locked, Al Ahly take the title in 2 / 27 of them (7.4%).

Phase 2: Adding Weights (Expected Goals)

Once we accept that probabilities aren't equal, we stop just counting scenarios and start weighting them. We move from combinatorial math to building a probabilistic model of the real world.

If we assume, for example, that Al Ahly's win probability is $70\%$ , a draw is $20\%$ , and a loss is $10\%$ , the overall probability of winning the league is no longer $2 \div 27$ . Each scenario now has a vastly different weight. We need to multiply the actual probabilities of the specific required outcomes.
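Written out, and still assuming the three matches are independent, the weighted version of the calculation is the product below. The $70\%$ figure only supplies the first factor; the Zamalek and Pyramids terms still have to be estimated.

$$P(\text{Al Ahly title}) = P(\text{Al Ahly win}) \times P(\text{Zamalek lose}) \times \bigl[P(\text{Pyramids draw}) + P(\text{Pyramids lose})\bigr]$$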

But where do we get these probabilities? From guessing? No, we use available facts and data: the team's form over the last 5 matches, win ratios, historical direct matchups, and average goals scored and conceded.

Using data from SofaScore, the first step is to calculate two numbers for each team based on the entire season: Attacking Strength and Defensive Strength. From these two numbers, we calculate the Expected Goals (xG) for each match.
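The exact formulas behind these two numbers aren't spelled out here, so the following is only a sketch of one common convention: strength is a team's per-match goal rate divided by the league average, and xG is attacking strength times the opponent's defensive strength times the league average. All names and input figures are hypothetical, not SofaScore data.

```python
def strength(team_goals_per_match, league_goals_per_match):
    """Ratio of a team's per-match goal rate to the league average (assumed convention)."""
    return team_goals_per_match / league_goals_per_match

def expected_goals(attack_strength, opponent_defence_strength, league_goals_per_match):
    """xG for one side of a fixture under the assumed multiplicative model."""
    return attack_strength * opponent_defence_strength * league_goals_per_match

# Hypothetical inputs, for illustration only.
league_avg = 1.2                        # goals per team per match across the league
attack = strength(1.5, league_avg)      # team scores 1.5 per match     -> 1.25
defence = strength(0.96, league_avg)    # opponent concedes 0.96 a match -> 0.8
xg = expected_goals(attack, defence, league_avg)
print(round(xg, 2))
```

Under this convention, a defensive strength below 1 means the opponent concedes less than the league average, which pulls the attacking side's xG down.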

Using the Poisson distribution, a famous statistical distribution in football analytics where xG represents $\lambda$ (the expected rate of goals), we convert these expected goal rates into the probabilities of scoring specific numbers of goals. This ultimately gives us the probability of winning, drawing, or losing. Now, the events are weighted by reality, and by multiplying our three required probabilities together, we arrive at $6.1\%$.

Expected goals and the resulting probability of the required outcome in each match:

Al Ahly (xG 1.28) vs Al Masry (xG 0.95): Al Ahly win, 44.0%
Zamalek (xG 0.86) vs Ceramica (xG 0.65): Zamalek lose, 25.7%
Pyramids (xG 0.97) vs Smouha (xG 0.50): Pyramids do not win, 54.1%

$0.440 \times 0.257 \times 0.541 \approx 6.1\%$
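To make the xG-to-probability step concrete, here is a minimal sketch that treats each side's goals as an independent Poisson variable with $\lambda$ equal to its xG and sums over scorelines. The independence of the two sides' goals and the cut-off at 10 goals are assumptions of this sketch; with the xG values quoted for the three matches it should land close to the 44.0%, 25.7%, and 54.1% figures.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k goals) when goals follow a Poisson distribution with rate lam."""
    return exp(-lam) * lam**k / factorial(k)

def match_probabilities(xg_team, xg_opponent, max_goals=10):
    """Return (win, draw, loss) for the first team by summing over scorelines.

    Assumes the two sides score independently; scorelines above max_goals
    are ignored because their probability is negligible at these rates.
    """
    win = draw = loss = 0.0
    for g_team in range(max_goals + 1):
        for g_opp in range(max_goals + 1):
            p = poisson_pmf(g_team, xg_team) * poisson_pmf(g_opp, xg_opponent)
            if g_team > g_opp:
                win += p
            elif g_team == g_opp:
                draw += p
            else:
                loss += p
    return win, draw, loss

ahly_win, _, _ = match_probabilities(1.28, 0.95)             # Al Ahly vs Al Masry
_, _, zamalek_loss = match_probabilities(0.86, 0.65)         # Zamalek vs Ceramica
_, pyr_draw, pyr_loss = match_probabilities(0.97, 0.50)      # Pyramids vs Smouha
pyramids_no_win = pyr_draw + pyr_loss

# Compound probability of the one path that hands Al Ahly the title.
title_probability = ahly_win * zamalek_loss * pyramids_no_win
print(f"{ahly_win:.3f} {zamalek_loss:.3f} {pyramids_no_win:.3f} {title_probability:.3f}")
```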

Phase 3: Monte Carlo Simulations

We've extracted the variables, calculated the probabilities, and arrived at a final result of $6.1\%$ . Beautiful. But what if these numbers changed, whether slightly or significantly? What if we tried all possible combinations thousands of times to see what happens?

What if the match wasn't played once... but thousands of times?

Enter one of the most famous simulation methods in the world: the Monte Carlo Simulation. Instead of manually calculating probabilities, we let the computer "play" the final round thousands of times. In each simulation, the computer draws a random result for each match based on our Poisson probabilities, checks who won the league, and repeats. Once, twice, a thousand, ten thousand times—as much as your machine can handle.

Initially, the model threw a curveball: Al Ahly had a $0\%$ chance, but there was a massive $23\%$ chance of a "Tie". The model didn't understand the league's head-to-head tiebreaker rules! It treated tied points as an "unknown result". Once we fed the tiebreaker rules into the engine, the unresolved ties collapsed into the final distribution, perfectly matching our mathematical calculation.
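The behaviour described here can be reproduced with a small simulation. This is a sketch rather than the original engine: it draws goals from Poisson distributions using the xG values from the Phase 2 match cards, adds the points to the standings, and counts who finishes top. The flag `ahly_beats_zamalek_on_tiebreak` is a hypothetical switch encoding the assumption, implied by the text, that Al Ahly come out ahead of Zamalek when level on points. Any other tie stays "Unknown Tie" because the text does not say how it is resolved.

```python
import math
import random

POINTS = {"Zamalek": 53, "Pyramids": 51, "Al Ahly": 50}
# (team xG, opponent xG) for each contender's final match.
FIXTURES = {"Zamalek": (0.86, 0.65), "Pyramids": (0.97, 0.50), "Al Ahly": (1.28, 0.95)}

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate(n_iter=20_000, seed=0, ahly_beats_zamalek_on_tiebreak=False):
    """Play the final round n_iter times and return each outcome's share."""
    rng = random.Random(seed)
    tally = {"Zamalek": 0, "Pyramids": 0, "Al Ahly": 0, "Unknown Tie": 0}
    for _ in range(n_iter):
        final = {}
        for team, (xg_for, xg_against) in FIXTURES.items():
            scored = sample_poisson(xg_for, rng)
            conceded = sample_poisson(xg_against, rng)
            gained = 3 if scored > conceded else 1 if scored == conceded else 0
            final[team] = POINTS[team] + gained
        top = max(final.values())
        leaders = {team for team, pts in final.items() if pts == top}
        if len(leaders) == 1:
            tally[leaders.pop()] += 1
        elif ahly_beats_zamalek_on_tiebreak and leaders == {"Al Ahly", "Zamalek"}:
            tally["Al Ahly"] += 1   # assumed tiebreaker outcome
        else:
            tally["Unknown Tie"] += 1
    return {name: count / n_iter for name, count in tally.items()}

raw = simulate()                                          # no tiebreaker rules
resolved = simulate(ahly_beats_zamalek_on_tiebreak=True)  # with the assumed rule
print(raw)
print(resolved)
```

Without the flag, Al Ahly can never finish alone at the top (their best case is 53 points, Zamalek's worst case), which is why the untreated model reports 0% for them and a large share of ties.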

Interactive: Monte Carlo Engine. Runs 10,000 simulated final rounds and reports the title share for Zamalek, Pyramids, Al Ahly, and "Unknown Tie".

Phase 4: Updating Beliefs (Bayes' Theorem)

Up until now, we've treated the probabilities as static. As if the world doesn't change. But what if new information appears before the match? A sudden injury? A tactical change? Do we stick to the old numbers, or do we update our conviction entirely?

In the previous phases, we dove into the worlds of scenarios that could happen, but we didn't think about what did happen. Suppose Al Ahly actually wins their match against Al Masry. We know the initial belief (the Prior) of Al Ahly winning the league is $6.2\%$, and the probability of them winning their match is $44.1\%$.

This brings us to the most beautiful philosophical part of our analysis: What is the probability of Al Ahly winning this match, GIVEN that they won the league? Because winning this match is a strict requirement for winning the league, this probability (the Likelihood) is exactly $100\%$ ( $1.0$ ).

The equation becomes brilliantly simple: $1 \times 0.062 / 0.441 \approx 14\%$. The probability more than doubled. Here lies the secret beauty of Bayes' Theorem: a single piece of evidence can shift your entire paradigm.

1. Prior belief $P(A)$: initial chance of winning the league, $6.2\%$
2. Evidence $P(B)$: probability of winning the match, $44.1\%$
3. Likelihood $P(B \mid A)$: fixed at $100\%$ (must-win match)

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{100\% \times 6.2\%}{44.1\%} \approx 14.1\%$$
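The same update as a small function, so the prior or the evidence can be varied by hand. This is a minimal sketch; `posterior` is an illustrative name, and the inputs are the $6.2\%$ and $44.1\%$ figures used in this phase, which differ slightly from the $6.1\%$ and $44.0\%$ of Phase 2, presumably through rounding or simulation noise.

```python
def posterior(prior, evidence, likelihood=1.0):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < evidence <= 1:
        raise ValueError("evidence probability must be in (0, 1]")
    return likelihood * prior / evidence

# A = Al Ahly win the league, B = Al Ahly win their final match.
updated = posterior(prior=0.062, evidence=0.441, likelihood=1.0)
print(f"{updated:.1%}")  # 14.1%
```

Because the likelihood is fixed at 1, the update reduces to dividing the prior by the match-win probability: the less likely the win looked beforehand, the larger the jump once it happens.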


Final Thoughts

From Guesswork to Logical Simulation

The Beauty of Statistics

Perhaps the real question wasn't "Who will win the league?", but rather, "How do we attempt to measure something so chaotic?" Statistics turns random chaos into a logical journey of building probabilities and updating convictions.

Behind the Numbers

A model doesn't "understand" football. It only sees patterns. Modeling is an attempt to match the past with the present to give a glimpse of the future.