Module 1.2: Core Concepts in Plain English
What You'll Learn
- The vocabulary of causal inference (DAG, node, edge, etc.)
- How to read a causal graph
- Special concepts for time series (lags, autocorrelation)
- Tigramite's notation system
Causal Graphs: The Family Tree of Variables
A causal graph is like a family tree, but for variables:
| Term | Meaning | Family Analogy |
|---|---|---|
| Node | A variable (circle) | A person |
| Edge | A causal link (arrow) | Parent-child relationship |
| Parent | Direct cause | Your mom or dad |
| Child | Direct effect | Your son or daughter |
| Ancestor | Any cause upstream | Grandparents, great-grandparents |
| Descendant | Any effect downstream | Grandchildren |
DAG: Directed Acyclic Graph
A DAG is a causal graph with two rules:
- Directed: Arrows point one way (cause → effect)
- Acyclic: No loops (you can't be your own grandparent!)
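Valid DAG, for example: A → B → C (following the arrows never brings you back to where you started)
Invalid (has a cycle), for example: A → B → C → A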
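The two rules can also be checked mechanically. The sketch below is one possible way to test the "acyclic" rule on a list of (cause, effect) pairs; the function name `is_acyclic` and the toy graphs on nodes A, B, C are made up for illustration and are not part of Tigramite.

```python
from collections import defaultdict

def is_acyclic(edges):
    """Return True if the directed graph given by (cause, effect) pairs has no cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))

    # Kahn's algorithm: repeatedly remove nodes that have no remaining parents
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Nodes on a cycle never lose all their parents, so they are never removed
    return removed == len(nodes)

# Hypothetical toy graphs
valid_edges = [('A', 'B'), ('B', 'C'), ('A', 'C')]
cyclic_edges = [('A', 'B'), ('B', 'C'), ('C', 'A')]

print(is_acyclic(valid_edges))   # True
print(is_acyclic(cyclic_edges))  # False
```

The first graph has several paths but none that returns to its start; the second one loops from C back to A, so it is not a DAG.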
Visualizing a DAG
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
# Node positions
nodes = {
    'Temperature': (0.2, 0.7),
    'Pressure': (0.5, 0.7),
    'Humidity': (0.8, 0.7),
    'Quality': (0.5, 0.3)
}
radius = 0.08
# Draw nodes
for name, (x, y) in nodes.items():
    circle = plt.Circle((x, y), radius, color='lightblue', ec='steelblue', linewidth=2)
    ax.add_patch(circle)
    ax.text(x, y, name, ha='center', va='center', fontsize=10, fontweight='bold')
# Draw edges (arrows)
edges = [
    ('Temperature', 'Quality'),
    ('Pressure', 'Quality'),
    ('Humidity', 'Pressure'),
]
for start, end in edges:
    x1, y1 = nodes[start]
    x2, y2 = nodes[end]
    # Shift both endpoints by the node radius along the arrow direction,
    # so each arrow runs from the edge of one circle to the edge of the other
    dx, dy = x2 - x1, y2 - y1
    dist = (dx ** 2 + dy ** 2) ** 0.5
    ox, oy = radius * dx / dist, radius * dy / dist
    ax.annotate('', xy=(x2 - ox, y2 - oy), xytext=(x1 + ox, y1 + oy),
                arrowprops=dict(arrowstyle='->', color='coral', lw=2))
# Fix the view so all nodes are visible, and hide the axes
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axis('off')
ax.set_title('Example DAG: Factory Process', fontsize=14)
plt.show()
# Reading this graph:
# - Temperature DIRECTLY causes Quality
# - Pressure DIRECTLY causes Quality
# - Humidity causes Quality INDIRECTLY (through Pressure)
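The family vocabulary from the table (parent, ancestor) can be read off the same edge list without drawing anything. This is a minimal sketch using plain Python sets; the helper names `parents` and `ancestors` are chosen for this example only.

```python
edges = [
    ('Temperature', 'Quality'),
    ('Pressure', 'Quality'),
    ('Humidity', 'Pressure'),
]

def parents(node, edges):
    """Direct causes: every node with an arrow pointing into `node`."""
    return {cause for cause, effect in edges if effect == node}

def ancestors(node, edges):
    """All upstream causes, found by following parent links repeatedly."""
    found = set()
    to_visit = list(parents(node, edges))
    while to_visit:
        current = to_visit.pop()
        if current not in found:
            found.add(current)
            to_visit.extend(parents(current, edges))
    return found

print(sorted(parents('Quality', edges)))     # ['Pressure', 'Temperature']
print(sorted(ancestors('Quality', edges)))   # ['Humidity', 'Pressure', 'Temperature']
print(sorted(ancestors('Pressure', edges)))  # ['Humidity']
```

Humidity shows up as an ancestor of Quality but not as a parent, which is exactly the "indirect cause" reading of the graph.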
The Big Three: Confounder, Mediator, Collider
These are the three fundamental patterns in causal graphs:
1. Confounder (Common Cause) - MOST IMPORTANT!
- C causes BOTH A and B
- Creates spurious correlation between A and B
- Example: Summer (C) causes both ice cream sales (A) and drowning (B)
- Key insight: A and B look related, but neither causes the other!
2. Mediator (Chain)
- A causes M, which causes B
- M is the "middle step" in the causal path
- Example: Smoking (A) → Tar in lungs (M) → Cancer (B)
- Key insight: A affects B, but indirectly through M
3. Collider (Common Effect)
- Both A and B cause C
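- A and B are NOT related to each other (unless you condition on C)
- Example: Both Talent (A) and Luck (B) cause Success (C)
- Key insight: A and B are independent, so knowing one tells you nothing about the other. But if you look only at cases with a known C (for example, only successful people), A and B start to look related even though neither causes the other. Never "control for" a collider!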
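The mediator and collider patterns can be checked on simulated data. The sketch below assumes simple linear relationships with Gaussian noise; the coefficients and the `partial_corr` helper are illustrative choices, not anything prescribed by Tigramite.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing `given` out of both."""
    a_resid = a - np.polyval(np.polyfit(given, a, 1), given)
    b_resid = b - np.polyval(np.polyfit(given, b, 1), given)
    return np.corrcoef(a_resid, b_resid)[0, 1]

# Mediator (chain): A -> M -> B
A = rng.standard_normal(n)
M = 0.8 * A + 0.5 * rng.standard_normal(n)
B = 0.8 * M + 0.5 * rng.standard_normal(n)
chain_raw = np.corrcoef(A, B)[0, 1]
chain_given_m = partial_corr(A, B, M)

# Collider: talent -> success <- luck
talent = rng.standard_normal(n)
luck = rng.standard_normal(n)
success = talent + luck + 0.5 * rng.standard_normal(n)
collider_raw = np.corrcoef(talent, luck)[0, 1]
collider_given_c = partial_corr(talent, luck, success)

print(f"Chain:    corr(A, B)          = {chain_raw:.3f}")
print(f"Chain:    corr(A, B | M)      = {chain_given_m:.3f}")
print(f"Collider: corr(talent, luck)  = {collider_raw:.3f}")
print(f"Collider: corr(talent, luck | success) = {collider_given_c:.3f}")
```

In the chain, the clear correlation between A and B should shrink to roughly zero once M is accounted for. In the collider, it goes the other way: talent and luck start out uncorrelated and become strongly negatively correlated once success is accounted for.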
Time Series Graphs: Adding Time
In time series, the SAME variable at DIFFERENT times is treated as DIFFERENT nodes! For example, X(t-1) and X(t) are two separate nodes in the graph.
Key Time Series Terms
| Term | Symbol | Meaning |
|---|---|---|
| Lag | τ (tau) | How many time steps back |
| Autocorrelation | X(t-1) → X(t) | Variable depends on its own past |
| Lagged effect | X(t-τ) → Y(t) | X at time t-τ causes Y at time t |
| Contemporaneous | X(t) ↔ Y(t) | Same-time relationship |
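To make lags and autocorrelation concrete, the sketch below simulates two series in which X depends on its own previous value and Y depends on X two steps earlier, then measures plain lagged correlations. The coefficients, the seed, and the helper `lagged_corr` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
true_lag = 2

# X depends on its own previous value (autocorrelation)
X = np.zeros(T)
for t in range(1, T):
    X[t] = 0.7 * X[t - 1] + rng.standard_normal()

# Y depends on X two steps earlier (lagged effect)
Y = np.zeros(T)
for t in range(true_lag, T):
    Y[t] = 0.8 * X[t - true_lag] + 0.5 * rng.standard_normal()

def lagged_corr(cause, effect, tau):
    """Correlation between cause(t - tau) and effect(t)."""
    if tau == 0:
        return np.corrcoef(cause, effect)[0, 1]
    return np.corrcoef(cause[:-tau], effect[tau:])[0, 1]

autocorr_lag1 = lagged_corr(X, X, 1)
cross = {tau: lagged_corr(X, Y, tau) for tau in range(5)}
best_lag = max(cross, key=lambda tau: abs(cross[tau]))

print(f"Autocorrelation of X at lag 1: {autocorr_lag1:.3f}")
for tau, value in cross.items():
    print(f"corr(X(t-{tau}), Y(t)) = {value:.3f}")
print(f"Strongest lag: {best_lag}")
```

The correlation peaks at the true lag, but neighboring lags are also clearly correlated because X is autocorrelated. Plain lagged correlation therefore cannot separate real links from echoes of them, which is why causal discovery relies on conditional independence tests instead.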
Tigramite's Notation: (variable, -lag)
Tigramite uses a simple tuple notation: (variable_index, -lag)
| Notation | Meaning |
|---|---|
| (0, 0) | Variable 0 at current time |
| (0, -1) | Variable 0, one step in the past |
| (1, -2) | Variable 1, two steps in the past |
| (2, -5) | Variable 2, five steps in the past |
Example
If we have variables: ['Temperature', 'Pressure', 'Quality']
- (0, -1) = Temperature yesterday
- (1, -2) = Pressure 2 days ago
- (2, 0) = Quality today
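A small helper can do this translation automatically. The function `describe` below is a hypothetical convenience written for this tutorial, not a Tigramite function.

```python
var_names = ['Temperature', 'Pressure', 'Quality']

def describe(node, names):
    """Turn a (variable_index, -lag) tuple into a readable label."""
    var_index, neg_lag = node
    if neg_lag > 0:
        raise ValueError("the second entry should be zero or negative")
    if neg_lag == 0:
        return f"{names[var_index]}(t)"
    return f"{names[var_index]}(t-{-neg_lag})"

for node in [(0, -1), (1, -2), (2, 0)]:
    print(node, "=", describe(node, var_names))
```

The check on the sign catches a common slip: writing (0, 1) when (0, -1) was meant.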
Code Example: Understanding Tigramite Links
var_names = ['Temperature', 'Pressure', 'Quality']
# In this example, a link is written as: ((cause_var, -lag), target_var)
# Example links using Tigramite-style (variable, -lag) tuples:
example_links = [
    ((0, -1), 2),  # Temperature at lag 1 → Quality
    ((1, -2), 2),  # Pressure at lag 2 → Quality
    ((0, -1), 0),  # Temperature autocorrelation
]
print("Example causal links:")
print("=" * 50)
for (cause_var, neg_lag), target in example_links:
    cause_name = var_names[cause_var]
    target_name = var_names[target]
    print(f"({cause_var}, {neg_lag}) → {target}")
    # neg_lag is stored as a negative number, so flip the sign for display
    print(f"  Meaning: {cause_name} at lag {-neg_lag} causes {target_name}")
    print()
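Lists of links like this are often regrouped by target variable, so that each variable maps to the list of its lagged parents. Tigramite appears to use this dictionary form for parent sets, but treat that as an assumption and check the documentation for your version. The sketch below does the regrouping in plain Python.

```python
from collections import defaultdict

var_names = ['Temperature', 'Pressure', 'Quality']
example_links = [
    ((0, -1), 2),  # Temperature at lag 1 -> Quality
    ((1, -2), 2),  # Pressure at lag 2 -> Quality
    ((0, -1), 0),  # Temperature autocorrelation
]

# Collect the (variable, -lag) causes of each target
grouped = defaultdict(list)
for cause, target in example_links:
    grouped[target].append(cause)

# Give every variable an entry, even if it has no parents
parents = {j: grouped.get(j, []) for j in range(len(var_names))}

for j, causes in parents.items():
    print(f"{var_names[j]}: {causes}")
```

Pressure ends up with an empty list: in this toy example nothing causes it.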
Graph Output Types in Tigramite
When you run causal discovery, Tigramite returns a graph with edge symbols:
| Symbol | Meaning | When it occurs |
|---|---|---|
| --> | Definite causal direction | Lagged links (time settles direction); also contemporaneous links the algorithm was able to orient |
| <-- | Reverse direction | Typically the mirror entry of a contemporaneous -->, seen from the other variable |
| o-o | Undetermined direction | Contemporaneous (same time) |
| x-x | Conflicting evidence | Rare, indicates issues |
| <-> | Bidirected (hidden confounder) | Only with LPCMCI method |
| (empty) | No link | No direct dependence was detected at that lag |
Reading a Tigramite Graph Output
# Example: Understanding a Tigramite graph output
import numpy as np
# Simulated graph array (3 variables, max lag 2)
# Shape: (N, N, tau_max+1) where graph[i, j, tau] = link from i at lag tau to j
example_graph = np.array([
    # Links FROM variable 0 (Temperature)
    [['', '-->', ''],    # TO var 0 at lags [0, 1, 2]: Temperature autocorrelation at lag 1
     ['', '', ''],       # TO var 1
     ['', '-->', '']],   # TO var 2: Temperature at lag 1 → Quality
    # Links FROM variable 1 (Pressure)
    [['', '', ''],
     ['', '-->', ''],    # Pressure autocorrelation at lag 1
     ['', '', '-->']],   # Pressure at lag 2 → Quality
    # Links FROM variable 2 (Quality)
    [['', '', ''],
     ['', '', ''],
     ['', '', '']]
])
var_names = ['Temperature', 'Pressure', 'Quality']
print("Reading the graph array:")
print("=" * 50)
for j in range(3):            # target variable
    for i in range(3):        # source variable
        for tau in range(3):  # lag
            symbol = example_graph[i, j, tau]
            if symbol != '':
                print(f"{var_names[i]}(t-{tau}) {symbol} {var_names[j]}(t)")
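A graph array and the (variable, -lag) tuples describe the same links, so one can be converted into the other. The sketch below assumes the index order graph[source, target, lag] and only collects lagged --> entries; the helper name `graph_to_parents` is made up for this example. If your array is laid out differently, swap the first two indices.

```python
import numpy as np

var_names = ['Temperature', 'Pressure', 'Quality']
N, tau_max = 3, 2

# Assumed index order: graph[source, target, lag]
graph = np.full((N, N, tau_max + 1), '', dtype='<U3')
graph[0, 0, 1] = '-->'  # Temperature(t-1) --> Temperature(t)
graph[1, 1, 1] = '-->'  # Pressure(t-1)    --> Pressure(t)
graph[0, 2, 1] = '-->'  # Temperature(t-1) --> Quality(t)
graph[1, 2, 2] = '-->'  # Pressure(t-2)    --> Quality(t)

def graph_to_parents(graph):
    """Map each target index to its list of lagged (source, -lag) parents."""
    parents = {j: [] for j in range(graph.shape[1])}
    for i, j, tau in zip(*np.where(graph == '-->')):
        if tau > 0:  # keep lagged links only
            parents[int(j)].append((int(i), -int(tau)))
    return parents

parents = graph_to_parents(graph)
for j, causes in parents.items():
    print(f"{var_names[j]}: {causes}")
```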
Conditional Independence: The Key Insight
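Causal discovery works by testing conditional independence: are X and Y still related once we account for a third variable Z? If not, we write X ⊥ Y | Z (X is independent of Y given Z).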
Why This Matters
- If X → Y (direct cause), then X and Y will be dependent even after conditioning on other variables.
- If X ← Z → Y (confounder), then X and Y become independent once we condition on Z.
Tigramite tests many conditional independencies to figure out the causal structure!
Demonstration: Conditional Independence
import numpy as np
np.random.seed(42)
n = 1000
# Confounder structure: Z → X and Z → Y
Z = np.random.randn(n) # Common cause
X = 0.8 * Z + np.random.randn(n) * 0.5 # Caused by Z
Y = 0.8 * Z + np.random.randn(n) * 0.5 # Also caused by Z
# Unconditional correlation (X and Y)
corr_xy = np.corrcoef(X, Y)[0, 1]
print(f"Correlation X-Y (unconditional): {corr_xy:.3f}")
print("They look correlated!\n")
# Partial correlation (X and Y given Z)
# Regress X on Z, get residuals
slope_xz = np.polyfit(Z, X, 1)[0]
X_resid = X - slope_xz * Z
# Regress Y on Z, get residuals
slope_yz = np.polyfit(Z, Y, 1)[0]
Y_resid = Y - slope_yz * Z
# Correlation of residuals = partial correlation
partial_corr = np.corrcoef(X_resid, Y_resid)[0, 1]
print(f"Partial correlation X-Y | Z: {partial_corr:.3f}")
print("Almost zero! X and Y are conditionally independent given Z.")
print("\nThis tells us: X doesn't directly cause Y (and vice versa).")
print("The correlation was spurious - caused by the confounder Z.")
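For contrast, here is the other case from "Why This Matters": when X also causes Y directly, conditioning on Z should not remove the dependence. This is a sketch under the same linear-Gaussian assumptions as the demonstration; the direct-effect coefficient of 0.6 and the `residual` helper are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000

# Z is still a common cause, but now X also affects Y directly
Z = rng.standard_normal(n)
X = 0.8 * Z + 0.5 * rng.standard_normal(n)
Y = 0.8 * Z + 0.6 * X + 0.5 * rng.standard_normal(n)

def residual(values, given):
    """Part of `values` that a straight-line fit on `given` cannot explain."""
    slope, intercept = np.polyfit(given, values, 1)
    return values - (slope * given + intercept)

raw_corr = np.corrcoef(X, Y)[0, 1]
partial_corr = np.corrcoef(residual(X, Z), residual(Y, Z))[0, 1]

print(f"Correlation X-Y (unconditional): {raw_corr:.3f}")
print(f"Partial correlation X-Y | Z:     {partial_corr:.3f}")
```

Here the partial correlation stays clearly away from zero, so the X-Y link survives conditioning on Z. That surviving dependence is the kind of evidence a conditional independence test uses to keep a link in the graph.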
Summary: Your Causal Vocabulary Cheat Sheet
| Term | Simple Definition |
|---|---|
| DAG | A causal diagram with arrows, no loops |
| Node | A variable in the graph (a circle) |
| Edge | A causal link (an arrow) |
| Parent | Direct cause of a variable |
| Confounder | Common cause of two variables, creating spurious correlation (may be observed or hidden) |
| Mediator | Variable in the middle of a causal chain |
| Collider | Variable caused by two others |
| Lag (τ) | Time steps between cause and effect |
| Autocorrelation | Variable depending on its own past |
| (i, -τ) | Tigramite notation: variable i at lag τ |
| --> | Definite causal direction in output |
| Conditional independence | X and Y unrelated given Z |
Quick Quiz
Q1: In notation (2, -3), what does the -3 mean?
The variable is 3 time steps in the PAST (lag of 3).
Q2: You see --> between Temperature and Quality. What does this mean?
Temperature CAUSES Quality, and we're confident about the direction.
Q3: X and Y are correlated. After conditioning on Z, they're not. What pattern is this?
Confounder pattern! Z is a common cause of both X and Y.