[Wangd Lhamo] - Fab Futures - Data Science

Probability¶

In this notebook, probabilities are calculated and simulated with Python, typically using libraries such as random, numpy, or scipy. Jupyter is just the interactive notebook in which these calculations run.

In [1]:
1. Probability Concept

Probability measures how likely an event is to occur.

Formula:

$P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}$

Example: The probability of rolling a 3 on a standard die is:

$P(3) = \frac{1}{6}$

2. Calculating Probability in Python (Jupyter)
You can use Python code in Jupyter to calculate probability. For example:

# Import libraries
import random

# Simulate rolling a die 1000 times
results = [random.randint(1, 6) for _ in range(1000)]

# Probability of rolling a 3
prob_3 = results.count(3) / 1000
print("Estimated probability of rolling a 3:", prob_3)
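An estimate from 1000 rolls will rarely equal 1/6 exactly. The sketch below repeats the simulation with increasing numbers of rolls to show how the estimate tends to settle near the exact value. The function name `estimate_prob_of_three` and the fixed seed are assumptions made here for illustration; they are not part of the original notebook.

```python
import random

def estimate_prob_of_three(num_rolls, seed=0):
    """Estimate P(rolling a 3) from num_rolls simulated rolls of a fair die."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    hits = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == 3)
    return hits / num_rolls

exact = 1 / 6
for num_rolls in (100, 1_000, 10_000, 100_000):
    estimate = estimate_prob_of_three(num_rolls)
    error = abs(estimate - exact)
    print(f"{num_rolls:>7} rolls: estimate = {estimate:.4f}, error = {error:.4f}")
```

The error does not have to shrink at every step, but larger samples usually give estimates closer to 1/6.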

3. Using numpy or scipy

For more advanced probability calculations:

import numpy as np
from scipy.stats import binom

# Probability of getting exactly 3 heads in 5 coin flips
prob = binom.pmf(3, n=5, p=0.5)
print("Probability of 3 heads:", prob)

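binom.pmf(k, n, p) → probability mass function of the binomial distribution: the probability of exactly k successes in n independent trials, each with success probability p.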
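The binom.pmf call above can be cross-checked by hand with the standard library. This is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the helper name `binomial_pmf` is hypothetical and not a scipy function.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed directly from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 fair coin flips: 10 * 0.125 * 0.25
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```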

Sample¶

Coin Toss Simulation¶

In [2]:
import random

# Simulate 1000 coin tosses
tosses = [random.choice(['Heads', 'Tails']) for _ in range(1000)]

# Calculate probabilities
prob_heads = tosses.count('Heads') / 1000
prob_tails = tosses.count('Tails') / 1000

print("Estimated Probability of Heads:", prob_heads)
print("Estimated Probability of Tails:", prob_tails)
Estimated Probability of Heads: 0.524
Estimated Probability of Tails: 0.476
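The estimate 0.524 differs from 0.5 even though the coin is fair. One way to judge whether that gap is ordinary is to compare it with the standard error of a proportion, sqrt(p(1-p)/n). The sketch below uses the values from the run above; the variable names are chosen here for illustration.

```python
from math import sqrt

n = 1000          # number of tosses, as in the cell above
p = 0.5           # true probability of heads for a fair coin
observed = 0.524  # estimate printed by the run above

standard_error = sqrt(p * (1 - p) / n)
z = (observed - p) / standard_error

print(f"standard error = {standard_error:.4f}")
print(f"observed estimate is {z:.2f} standard errors from 0.5")
```

A gap of roughly one or two standard errors is typical of random variation, so this result is not a sign of a biased simulation.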

Dice Roll Simulation¶

In [3]:
# Simulate rolling a 6-sided die 1000 times
dice_rolls = [random.randint(1, 6) for _ in range(1000)]

# Probability of rolling a 3
prob_3 = dice_rolls.count(3) / 1000
print("Estimated Probability of rolling a 3:", prob_3)
Estimated Probability of rolling a 3: 0.172
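The same simulation can estimate the probability of every face at once. The sketch below uses `collections.Counter`; the seed value is arbitrary and only there to make the run repeatable, so its numbers will differ from the output above.

```python
import random
from collections import Counter

rng = random.Random(42)  # arbitrary seed (an assumption) for repeatability
num_rolls = 1000
rolls = [rng.randint(1, 6) for _ in range(num_rolls)]

# Count each face once, then convert counts to relative frequencies
face_counts = Counter(rolls)
estimates = {face: face_counts[face] / num_rolls for face in range(1, 7)}

for face, estimate in estimates.items():
    print(f"P({face}) is about {estimate:.3f} (exact: {1 / 6:.3f})")
```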

Visualizing Dice Probabilities¶

In [4]:
import matplotlib.pyplot as plt

# Count occurrences of each number
counts = [dice_rolls.count(i) for i in range(1,7)]

# Plot bar chart
plt.bar(range(1,7), counts)
plt.xlabel('Dice Number')
plt.ylabel('Frequency')
plt.title('Dice Roll Simulation')
plt.show()
[Figure: bar chart titled "Dice Roll Simulation" (x-axis: Dice Number, y-axis: Frequency)]

Binomial Distribution¶

The binomial distribution models the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes:

  • Success (with probability p)
  • Failure (with probability 1−p)

It is defined by two parameters:

  • n = number of trials
  • p = probability of success in a single trial
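The definition above can be turned into a small simulation: run n independent success/failure trials, count the successes, and repeat many times. This is a sketch under stated assumptions; the function name `count_successes`, the seed, and the number of repetitions are all chosen here for illustration.

```python
import random

def count_successes(n, p, rng):
    """Run n independent trials and return how many succeed."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(1)  # arbitrary seed (an assumption) for repeatability
n, p = 10, 0.5
experiments = 10_000
outcomes = [count_successes(n, p, rng) for _ in range(experiments)]

# The long-run average number of successes should be close to n * p
average = sum(outcomes) / experiments
print(f"average number of successes: {average:.2f} (n * p = {n * p})")
```

Each entry in `outcomes` is one draw from a binomial distribution with parameters n and p.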

Probability Mass Function (PMF)

The probability of getting exactly k successes in n trials is:

$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$

Probability¶

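Explanation of the binomial probability formula in detail. In $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$, each factor has its own role:

  • $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ counts the different orders in which k successes can occur among n trials.
  • $p^k$ is the probability that the k chosen trials all succeed.
  • $(1-p)^{n-k}$ is the probability that the remaining n−k trials all fail.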
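As a worked example of the binomial formula, take n = 3, k = 2 and p = 0.5, the same values used in the enumeration cell that follows. Evaluating each factor separately gives:

```latex
\begin{aligned}
\binom{3}{2} &= \frac{3!}{2!\,1!} = 3 \\
p^{k} &= 0.5^{2} = 0.25 \\
(1-p)^{n-k} &= 0.5^{1} = 0.5 \\
P(X=2) &= 3 \times 0.25 \times 0.5 = 0.375
\end{aligned}
```

The result agrees with the value P(X=2) = 0.375 printed by the enumeration.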

In [8]:
import itertools
import matplotlib.pyplot as plt

# Parameters
n = 3          # number of trials
p = 0.5        # probability of success
k = 2          # number of successes

# Step 1: Generate all sequences of successes (H) and failures (T)
trials = ['H', 'T']
sequences = list(itertools.product(trials, repeat=n))

# Step 2: Count sequences with exactly k successes
valid_sequences = [seq for seq in sequences if seq.count('H') == k]

# Step 3: Calculate probability
prob = len(valid_sequences) * (p**k) * ((1-p)**(n-k))

# Step 4: Display results
print(f"All possible sequences ({len(sequences)} total):")
for seq in sequences:
    print(seq)

print(f"\nSequences with exactly {k} successes:")
for seq in valid_sequences:
    print(seq)

print(f"\nP(X={k}) = {prob}")

# Step 5: Visualize sequences
colors = ['green' if seq.count('H')==k else 'lightgray' for seq in sequences]

plt.figure(figsize=(8,4))
plt.bar(range(len(sequences)), [1]*len(sequences), color=colors)
plt.xticks(range(len(sequences)), [''.join(seq) for seq in sequences])
plt.ylabel("Probability weight (conceptual)")
plt.title(f"Binomial Distribution Visualization (n={n}, k={k})\ngreen = sequences with exactly {k} successes")
plt.show()
All possible sequences (8 total):
('H', 'H', 'H')
('H', 'H', 'T')
('H', 'T', 'H')
('H', 'T', 'T')
('T', 'H', 'H')
('T', 'H', 'T')
('T', 'T', 'H')
('T', 'T', 'T')

Sequences with exactly 2 successes:
('H', 'H', 'T')
('H', 'T', 'H')
('T', 'H', 'H')

P(X=2) = 0.375
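[Figure: bar chart titled "Binomial Distribution Visualization (n=3, k=2)", one bar per sequence; green bars mark sequences with exactly 2 successes]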

Explanation of Visualization

  • Each bar represents one possible sequence of trials.
  • Green bars = sequences with exactly k successes.

Probability formula: $P(X=k) = \underbrace{\text{\# of green sequences}}_{\binom{n}{k}} \cdot p^k \cdot (1-p)^{n-k}$

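The formula therefore multiplies three factors: the number of arrangements, the probability of the k successes, and the probability of the n−k failures. With p = 0.5 every sequence is equally likely, which is why all bars have the same height.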
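When p is not 0.5 the individual sequences are no longer equally likely, but the counting argument still works because every sequence with exactly k successes has the same probability. The sketch below checks this for a hypothetical biased coin with p = 0.7, a value chosen here for illustration and not taken from the notebook.

```python
import itertools
from math import comb, isclose

n, k, p = 3, 2, 0.7  # p = 0.7 is a hypothetical biased coin

def sequence_probability(seq, p):
    """Probability of one specific sequence of 'H' (success) and 'T' (failure)."""
    heads = seq.count('H')
    return p ** heads * (1 - p) ** (len(seq) - heads)

sequences = list(itertools.product('HT', repeat=n))

# Add up the probabilities of the sequences with exactly k successes
by_enumeration = sum(
    sequence_probability(seq, p) for seq in sequences if seq.count('H') == k
)
# Same quantity from the binomial formula
by_formula = comb(n, k) * p ** k * (1 - p) ** (n - k)

print(f"enumeration: {by_enumeration:.3f}, formula: {by_formula:.3f}")
```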

In [10]:
from IPython.display import display, Math

display(Math(r'P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes} }'))
display(Math(r'P(E^c) = 1 - P(E)'))
display(Math(r'P(A \cup B) = P(A) + P(B) - P(A \cap B)'))
display(Math(r'P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}'))
$\displaystyle P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes} }$
$\displaystyle P(E^c) = 1 - P(E)$
$\displaystyle P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$\displaystyle P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$
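The first three rules can be checked on a single fair die using exact fractions. This is a minimal sketch; the events `even` and `greater_than_four` and the helper `prob` are examples introduced here, not part of the notebook.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # sample space of one fair die

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

even = {2, 4, 6}
greater_than_four = {5, 6}

# Complement rule: P(E^c) = 1 - P(E)
p_complement = 1 - prob(even)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(even) + prob(greater_than_four) - prob(even & greater_than_four)

print(p_complement)  # 1/2
print(p_union)       # 2/3
```

Counting the union directly, {2, 4, 5, 6}, gives 4/6 = 2/3 as well.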
In [11]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, norm

# Binomial Distribution
n, p = 10, 0.5
x = np.arange(0, n+1)
prob = binom.pmf(x, n, p)
plt.bar(x, prob, color='skyblue')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.show()

# Normal Distribution
mu, sigma = 0, 1
x = np.linspace(-4, 4, 1000)
plt.plot(x, norm.pdf(x, mu, sigma), color='red')
plt.title("Normal Distribution (μ=0, σ=1)")
plt.show()
[Figure: bar chart titled "Binomial Distribution (n=10, p=0.5)"]
[Figure: line plot titled "Normal Distribution (μ=0, σ=1)"]

Students' Marks¶

In [12]:
import numpy as np

# Example marks of 20 students
marks = [55, 70, 80, 90, 65, 75, 85, 60, 50, 95, 88, 72, 78, 82, 68, 90, 77, 66, 84, 91]

Frequency Distribution¶

In [13]:
# Define bins (intervals) for marks
bins = [0, 50, 60, 70, 80, 90, 100]

# Count number of students in each bin
freq, bin_edges = np.histogram(marks, bins=bins)

print("Frequency:", freq)
print("Bins:", bin_edges)
Frequency: [0 2 4 5 5 4]
Bins: [  0  50  60  70  80  90 100]
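To see which marks fall into which bin, the counts can be reproduced without numpy. The sketch below assumes the same edge rule that np.histogram uses: each bin includes its lower edge but not its upper edge, except the last bin, which includes both. It also assumes every mark lies between 0 and 100.

```python
from bisect import bisect_right

marks = [55, 70, 80, 90, 65, 75, 85, 60, 50, 95,
         88, 72, 78, 82, 68, 90, 77, 66, 84, 91]
bins = [0, 50, 60, 70, 80, 90, 100]

freq = [0] * (len(bins) - 1)
for mark in marks:
    # Each bin is [low, high): the lower edge is included, the upper edge is not
    index = bisect_right(bins, mark) - 1
    # The last bin also includes its upper edge, so a mark of 100 lands there
    index = min(index, len(freq) - 1)
    freq[index] += 1

for low, high, count in zip(bins[:-1], bins[1:], freq):
    print(f"{low:>3} to {high:<3}: {count}")
```

A mark of exactly 50, for instance, is counted in the 50 to 60 bin rather than the 0 to 50 bin.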

Calculate probabilities¶

In [15]:
probabilities = freq / len(marks)
print("Probabilities:", probabilities)
Probabilities: [0.   0.1  0.2  0.25 0.25 0.2 ]
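Bin probabilities can be added together to answer questions about ranges of marks. The sketch below, using the frequencies printed earlier, checks that the probabilities sum to 1 and estimates the probability that a student scores 80 or more. The variable name `p_at_least_80` is chosen here for illustration.

```python
freq = [0, 2, 4, 5, 5, 4]            # counts per bin, from the frequency output
bins = [0, 50, 60, 70, 80, 90, 100]  # bin edges used in the notebook
total = sum(freq)

probabilities = [count / total for count in freq]

# P(mark >= 80): add the probabilities of every bin whose lower edge is 80 or more
p_at_least_80 = sum(p for low, p in zip(bins[:-1], probabilities) if low >= 80)

print(f"sum of probabilities = {sum(probabilities):.2f}")
print(f"P(mark >= 80) = {p_at_least_80:.2f}")
```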

Visualize probability distribution¶

In [17]:
import matplotlib.pyplot as plt

# Midpoint of each bin for plotting
mid_points = (bin_edges[:-1] + bin_edges[1:]) / 2

plt.bar(mid_points, probabilities, width=8, color='skyblue', edgecolor='black')
plt.xlabel("Marks Range")
plt.ylabel("Probability")
plt.title("Probability Distribution of Students' Marks")
plt.show()
[Figure: bar chart titled "Probability Distribution of Students' Marks" (x-axis: Marks Range, y-axis: Probability)]

Cumulative Probability¶

In [18]:
cumulative_prob = np.cumsum(probabilities)
print("Cumulative Probabilities:", cumulative_prob)

plt.plot(mid_points, cumulative_prob, marker='o', color='red')
plt.xlabel("Marks Range")
plt.ylabel("Cumulative Probability")
plt.title("Cumulative Probability of Students' Marks")
plt.grid(True)
plt.show()
Cumulative Probabilities: [0.   0.1  0.3  0.55 0.8  1.  ]
[Figure: line chart titled "Cumulative Probability of Students' Marks" (x-axis: Marks Range, y-axis: Cumulative Probability)]
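Each cumulative value is easiest to read against the upper edge of its bin: it is the share of students who scored below that edge. The sketch below labels the values that way, assuming the same bins and frequencies as above; it accumulates counts before dividing to avoid floating-point rounding in the running sum.

```python
from itertools import accumulate

freq = [0, 2, 4, 5, 5, 4]
bins = [0, 50, 60, 70, 80, 90, 100]
total = sum(freq)

cumulative_counts = list(accumulate(freq))  # running total of students
cumulative_prob = [count / total for count in cumulative_counts]

last = len(cumulative_prob) - 1
for i, value in enumerate(cumulative_prob):
    # The last bin includes its upper edge, so it is labelled with <=
    relation = "<=" if i == last else "<"
    print(f"P(mark {relation} {bins[i + 1]}) = {value:.2f}")
```

Read this way, 0.55 means that 55% of the students scored below 80.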
In [ ]:
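✅ Explanation

  • Histogram bins → divide the marks into intervals (50–60, 60–70, …). Each interval includes its lower edge but not its upper edge, except the last, which includes both.
  • Frequency → number of students in each bin.
  • Probability → frequency divided by the total number of students.
  • Cumulative probability → running sum of the bin probabilities; it reaches 1 at the last bin.
  • Visualization → bar chart for probability, line chart for cumulative probability.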