Module 1.1: Why Causal Inference Matters

15 min | Prerequisites: Basic Python

What You'll Learn

  1. What time series data is (if you're new to it)
  2. The critical difference between correlation and causation
  3. Why standard machine learning can't answer "why" questions
  4. What makes time series causal inference special
  5. When to use Tigramite

First: What is Time Series Data?

Time series = data measured over time, in order.

Examples you encounter daily:

  • Stock prices (every minute/hour/day)
  • Temperature readings (every hour)
  • Your heart rate (every second)
  • Website traffic (every day)
  • Factory sensor readings (every minute)

What makes it special:

  • The ORDER matters (yesterday comes before today)
  • Values often depend on their past (today's temperature relates to yesterday's)
  • This ordering helps us figure out what CAUSES what!

  Regular data: [Customer A, Customer B, Customer C]   ← order doesn't matter
  Time series:  [Monday, Tuesday, Wednesday, ...]      ← order matters!
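
To see why order matters, here is a minimal sketch. The simulated readings and the helper name lag1_corr are made up for illustration and are not part of Tigramite. It builds a series where each value leans on the previous one, then shuffles it: the values stay the same, but the link to the past disappears.

```python
import numpy as np

np.random.seed(0)

# Hypothetical sensor readings: each value leans on the previous one
T = 500
readings = np.zeros(T)
for t in range(1, T):
    readings[t] = 0.9 * readings[t - 1] + np.random.randn()


def lag1_corr(x):
    """Correlation between each value and the value one step before it."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]


# Same values, but the time order is destroyed
shuffled = np.random.permutation(readings)

ordered_corr = lag1_corr(readings)
shuffled_corr = lag1_corr(shuffled)

print(f"Original order: lag-1 correlation = {ordered_corr:.2f}")
print(f"Shuffled order: lag-1 correlation = {shuffled_corr:.2f}")
```

For customer-style data a shuffle like this would change nothing important. For a time series it throws away exactly the information causal analysis relies on.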

In this tutorial, we'll learn how to discover causal relationships in time series data.

The Ice Cream Murder Mystery

Here's a true statistical fact:

Ice cream sales and drowning deaths are highly correlated.

Does this mean ice cream causes drowning? Should we ban ice cream to save lives?

Of course not! Both are caused by a hidden third variable: summer heat.

        Summer Heat
         /       \
        v         v
  Ice Cream     Drowning
    Sales        Deaths

This is called a spurious correlation - two things that move together but don't cause each other.

Let's See This in Action

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# Simulate 365 days
days = np.arange(365)

# Summer heat (the REAL cause) - peaks in summer
temperature = 20 + 15 * np.sin(2 * np.pi * days / 365 - np.pi/2) + np.random.randn(365) * 3

# Ice cream sales - CAUSED by temperature
ice_cream = 100 + 2 * temperature + np.random.randn(365) * 10

# Drowning deaths - ALSO caused by temperature (people swim more)
drowning = 5 + 0.1 * temperature + np.random.randn(365) * 1

# Plot the spurious correlation
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

axes[0].scatter(temperature, ice_cream, alpha=0.5, c='coral')
axes[0].set_xlabel('Temperature')
axes[0].set_ylabel('Ice Cream Sales')
axes[0].set_title('Temperature → Ice Cream\n(TRUE CAUSE)')

axes[1].scatter(temperature, drowning, alpha=0.5, c='steelblue')
axes[1].set_xlabel('Temperature')
axes[1].set_ylabel('Drowning Deaths')
axes[1].set_title('Temperature → Drowning\n(TRUE CAUSE)')

axes[2].scatter(ice_cream, drowning, alpha=0.5, c='purple')
axes[2].set_xlabel('Ice Cream Sales')
axes[2].set_ylabel('Drowning Deaths')
axes[2].set_title('Ice Cream vs Drowning\n(SPURIOUS CORRELATION!)')

plt.tight_layout()
plt.show()

# Calculate correlation
corr = np.corrcoef(ice_cream, drowning)[0, 1]
print(f"Correlation between ice cream and drowning: {corr:.3f}")
print("Looks strong! But it's NOT causation.")
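
One way to check that temperature really explains the link is to remove its influence from both variables and correlate what is left over (a partial correlation). The sketch below regenerates the same simulated data. The helper name remove_linear_effect is made up for illustration, and the straight-line fit is an assumption that happens to match how this data was generated.

```python
import numpy as np

# Same simulated year as in the example above
np.random.seed(42)
days = np.arange(365)
temperature = 20 + 15 * np.sin(2 * np.pi * days / 365 - np.pi/2) + np.random.randn(365) * 3
ice_cream = 100 + 2 * temperature + np.random.randn(365) * 10
drowning = 5 + 0.1 * temperature + np.random.randn(365) * 1


def remove_linear_effect(y, x):
    """Return the part of y that a straight-line fit on x cannot explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Raw correlation: temperature is still mixed into both variables
raw_corr = np.corrcoef(ice_cream, drowning)[0, 1]

# Partial correlation: compare only what is left after removing temperature
ice_cream_left = remove_linear_effect(ice_cream, temperature)
drowning_left = remove_linear_effect(drowning, temperature)
partial_corr = np.corrcoef(ice_cream_left, drowning_left)[0, 1]

print(f"Raw correlation:                     {raw_corr:.3f}")
print(f"After controlling for temperature:   {partial_corr:.3f}")
```

If the second number sits near zero, ice cream and drowning have nothing to say about each other once temperature is known. This only works because we knew to measure temperature; a confounder you never recorded cannot be controlled for this way.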

The Three Types of Relationships

Relationship                 Diagram       Example
Direct Cause                 A → B         Rain → Wet Ground
Common Cause (Confounder)    A ← C → B     Ice Cream ← Heat → Drowning
Mediation                    A → M → B     Smoking → Tar → Cancer

Causal inference helps us figure out which one we're dealing with!
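
A quick simulation shows why a correlation on its own cannot tell these three apart. This is an illustrative sketch with made-up coefficients: each structure is generated by a different mechanism, yet A and B end up similarly correlated in all three.

```python
import numpy as np

np.random.seed(0)
n = 2000  # number of samples (arbitrary choice)

# Direct cause: A -> B
a1 = np.random.randn(n)
b1 = a1 + np.random.randn(n)

# Common cause: A <- C -> B (A and B never influence each other)
c = np.random.randn(n)
a2 = c + 0.6 * np.random.randn(n)
b2 = c + 0.6 * np.random.randn(n)

# Mediation: A -> M -> B (A reaches B only through M)
a3 = np.random.randn(n)
m = a3 + 0.7 * np.random.randn(n)
b3 = m + 0.7 * np.random.randn(n)

corr_direct = np.corrcoef(a1, b1)[0, 1]
corr_confounded = np.corrcoef(a2, b2)[0, 1]
corr_mediated = np.corrcoef(a3, b3)[0, 1]

print(f"Direct cause:  corr(A, B) = {corr_direct:.2f}")
print(f"Common cause:  corr(A, B) = {corr_confounded:.2f}")
print(f"Mediation:     corr(A, B) = {corr_mediated:.2f}")
```

The correlations look alike, but changing A would only move B in the first and third cases. That difference is invisible in a scatter plot of A against B.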

Why Standard ML Can't Help

Machine learning is great at prediction:

  • "Given today's data, what will sales be tomorrow?"

But a model trained only to predict CANNOT, on its own, answer causal questions:

  • "If we CHANGE the price, what will happen to sales?"
  • "What is CAUSING our quality issues?"
  • "Which variable should we INTERVENE on?"

The Key Difference

Question Type               ML     Causal Inference
"What will happen?"         Yes    Yes
"Why did it happen?"        No     Yes
"What if we change X?"      No     Yes

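The gap between "what will happen?" and "what if we change X?" can be made concrete with the ice cream data. The sketch below is a simplified illustration: the function drowning_mechanism and the "+50 sales" giveaway are hypothetical, and a straight-line fit stands in for a predictive ML model. The fit predicts drowning from ice cream sales, then we force sales up and compare the model's expectation with what the simulated world actually does.

```python
import numpy as np

np.random.seed(42)
days = np.arange(365)
temperature = 20 + 15 * np.sin(2 * np.pi * days / 365 - np.pi/2) + np.random.randn(365) * 3
ice_cream = 100 + 2 * temperature + np.random.randn(365) * 10
drowning_noise = np.random.randn(365) * 1


def drowning_mechanism(temperature, ice_cream, noise):
    """How drowning is generated in this simulated world.

    ice_cream is accepted but deliberately unused: here it has no effect.
    """
    return 5 + 0.1 * temperature + noise


drowning = drowning_mechanism(temperature, ice_cream, drowning_noise)

# Predictive model: straight-line fit of drowning on ice cream sales
slope, intercept = np.polyfit(ice_cream, drowning, 1)

# Intervention: force sales up by 50 per day (say, a giveaway); weather unchanged
boost = 50
ice_cream_forced = ice_cream + boost
drowning_after = drowning_mechanism(temperature, ice_cream_forced, drowning_noise)

predicted_change = slope * boost
actual_change = drowning_after.mean() - drowning.mean()

print(f"Model expects drowning to change by: {predicted_change:+.2f}")
print(f"Actual change after intervention:    {actual_change:+.2f}")
```

The fitted line is a perfectly good predictor as long as nobody interferes. Once we intervene, it extrapolates a pattern that was produced by temperature, not by ice cream.
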
Why Time Series is Special

Time series data has a superpower for causal inference:

Causes must come BEFORE effects.

If yesterday's rain is correlated with today's flooding, we KNOW the direction:

  • Rain yesterday → Flood today (possible cause)
  • Flood today → Rain yesterday (impossible!)

This is called temporal precedence, and it is a key piece of information Tigramite uses to decide which way a lagged link points.

Example: Factory with Lagged Effects

# Example: Factory with lagged effects
np.random.seed(42)
T = 200  # 200 time steps

# Machine temperature (autocorrelated - depends on its own past)
machine_temp = np.zeros(T)
machine_temp[0] = 50
for t in range(1, T):
    machine_temp[t] = 0.8 * machine_temp[t-1] + np.random.randn() * 5

# Product quality - affected by temperature 2 steps ago!
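# The first two steps have no temperature value from 2 steps back, so start
# them at the baseline of 80 rather than 0 (zeros would distort the plot)
quality = np.full(T, 80.0)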
for t in range(2, T):
    quality[t] = 80 - 0.5 * machine_temp[t-2] + np.random.randn() * 3

# Plot
fig, axes = plt.subplots(2, 1, figsize=(12, 6), sharex=True)

axes[0].plot(machine_temp, 'coral', linewidth=1)
axes[0].set_ylabel('Machine Temperature')
axes[0].set_title('Factory Time Series Data')

axes[1].plot(quality, 'steelblue', linewidth=1)
axes[1].set_ylabel('Product Quality')
axes[1].set_xlabel('Time Step')

plt.tight_layout()
plt.show()

print("Notice: Temperature spike causes quality drop with a 2-step lag")
print("This is what Tigramite can discover automatically!")
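
Before reaching for Tigramite, a plain lag scan already hints at the delay. The sketch below re-creates the factory simulation and correlates quality with temperature shifted back by 0 to 5 steps. The helper lagged_corr and the choice of max_lag are assumptions made for illustration; this is a simple diagnostic, not Tigramite's method.

```python
import numpy as np

# Re-create the factory simulation from the example above
np.random.seed(42)
T = 200

machine_temp = np.zeros(T)
machine_temp[0] = 50
for t in range(1, T):
    machine_temp[t] = 0.8 * machine_temp[t-1] + np.random.randn() * 5

quality = np.zeros(T)  # first two entries are placeholders, skipped below
for t in range(2, T):
    quality[t] = 80 - 0.5 * machine_temp[t-2] + np.random.randn() * 3

max_lag = 5
start = max_lag + 2  # skip warm-up steps so every lag uses the same valid window


def lagged_corr(cause, effect, lag):
    """Correlation between cause[t - lag] and effect[t] for t >= start."""
    return np.corrcoef(cause[start - lag:T - lag], effect[start:])[0, 1]


corrs = {lag: lagged_corr(machine_temp, quality, lag) for lag in range(max_lag + 1)}
for lag, r in corrs.items():
    print(f"lag {lag}: correlation = {r:+.2f}")

# The lag with the strongest (here: most negative) correlation
best_lag = max(corrs, key=lambda lag: abs(corrs[lag]))
print(f"Strongest link at lag {best_lag}")
```

Neighboring lags also show sizable correlations, because temperature depends on its own past. A plain lag scan cannot separate the true lag from these echoes, which is the kind of problem conditional-independence methods such as PCMCI are designed to handle.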

When to Use Tigramite

Use Tigramite when you have time series data and want to answer:

Question                                           Tigramite Method
"What variables cause what?"                       Causal Discovery (PCMCI)
"How strong is the causal effect?"                 Causal Effect Estimation
"What if we intervene on X?"                       Intervention Analysis
"Which variables should I use for prediction?"     Causal Feature Selection

Real-World Applications

  • Climate science: What drives temperature changes?
  • Manufacturing: What causes quality defects?
  • Finance: What factors drive stock prices?
  • Healthcare: What affects patient outcomes?
  • Energy: What causes consumption spikes?

Quick Quiz

Q1: Your ML model shows "number of firefighters" predicts "fire damage." Does this mean firefighters CAUSE damage?

No! This is another spurious correlation. The hidden confounder is fire size:

  • Bigger fires → More firefighters sent
  • Bigger fires → More damage

Firefighters don't cause damage; they're both effects of the same cause.

Q2: You observe that sales drop 2 days after you raise prices. Is this evidence of causation?

Possibly! The time ordering (price change BEFORE sales drop) is consistent with causation. But you'd want to rule out other explanations (weekend effects, competitor actions, etc.). Tigramite can help with this!

Key Takeaways

  1. Correlation ≠ Causation - Two things moving together doesn't mean one causes the other
  2. Hidden confounders are everywhere - Always ask "What could be causing BOTH?"
  3. ML predicts, causal inference explains - Different tools for different questions
  4. Time series has an advantage - Time ordering helps identify causal direction
  5. Tigramite is your tool for discovering causal relationships in time series