Module 1.2: Core Concepts in Plain English

20 min | Prerequisites: Module 1.1

What You'll Learn

The vocabulary of causal inference (DAG, node, edge, etc.)
How to read a causal graph
Special concepts for time series (lags, autocorrelation)
Tigramite's notation system

Causal Graphs: The Family Tree of Variables

A causal graph is like a family tree, but for variables:

Term	Meaning	Family Analogy
Node	A variable (circle)	A person
Edge	A causal link (arrow)	Parent-child relationship
Parent	Direct cause	Your mom or dad
Child	Direct effect	Your son or daughter
Ancestor	Any cause upstream	Grandparents, great-grandparents
Descendant	Any effect downstream	Grandchildren

DAG: Directed Acyclic Graph

A DAG is a causal graph with two rules:

Directed: Arrows point one way (cause → effect)
Acyclic: No loops (you can't be your own grandparent!)

Valid DAG:

A → B → C
↓   ↓
D → E

Invalid (has a cycle):

A → B → C
↑       |
└───────┘   ← Not allowed!
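The "no loops" rule can be checked mechanically. The following is a minimal sketch, assuming a graph is given as a list of (cause, effect) pairs; the function name `is_dag` and the two small edge lists are illustrative and not part of any library.

```python
from collections import defaultdict, deque

def is_dag(edges):
    """Return True if the directed graph given by (cause, effect) pairs has no cycle."""
    children = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for cause, effect in edges:
        children[cause].append(effect)
        in_degree[effect] += 1
        nodes.update((cause, effect))

    # Kahn's algorithm: repeatedly remove nodes that have no remaining parents
    queue = deque(n for n in nodes if in_degree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Nodes on a cycle never reach in-degree 0, so they are never removed
    return removed == len(nodes)

chain = [('A', 'B'), ('B', 'C')]
loop = chain + [('C', 'A')]  # adds an arrow from C back to A

print(is_dag(chain))  # True
print(is_dag(loop))   # False
```

If every node can be removed this way, the order of removal is also a valid "causes before effects" ordering of the variables.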

Visualizing a DAG

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
radius = 0.08  # node radius in axis units

# Node positions
nodes = {
    'Temperature': (0.2, 0.7),
    'Pressure': (0.5, 0.7),
    'Humidity': (0.8, 0.7),
    'Quality': (0.5, 0.3)
}

# Draw nodes
for name, (x, y) in nodes.items():
    circle = plt.Circle((x, y), radius, facecolor='lightblue',
                        edgecolor='steelblue', linewidth=2)
    ax.add_patch(circle)
    ax.text(x, y, name, ha='center', va='center', fontsize=10, fontweight='bold')

# Draw edges (arrows) as (cause, effect) pairs
edges = [
    ('Temperature', 'Quality'),
    ('Pressure', 'Quality'),
    ('Humidity', 'Pressure'),
]

for start, end in edges:
    x1, y1 = nodes[start]
    x2, y2 = nodes[end]
    # Unit vector from cause to effect, so the arrow starts and ends on the
    # circle borders whatever the direction (including horizontal links)
    dx, dy = x2 - x1, y2 - y1
    dist = (dx**2 + dy**2) ** 0.5
    ux, uy = dx / dist, dy / dist
    ax.annotate('', xy=(x2 - radius * ux, y2 - radius * uy),
                xytext=(x1 + radius * ux, y1 + radius * uy),
                arrowprops=dict(arrowstyle='->', color='coral', lw=2))

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect('equal')  # keep the circles round
ax.axis('off')
ax.set_title('Example DAG: Factory Process', fontsize=14)
plt.show()

# Reading this graph:
# - Temperature DIRECTLY causes Quality
# - Pressure DIRECTLY causes Quality
# - Humidity causes Quality INDIRECTLY (through Pressure)
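The family vocabulary (parent, ancestor) can also be computed directly from an edge list. This is a small sketch using the factory edges from the example; the helper names `parents` and `ancestors` are made up for illustration.

```python
edges = [
    ('Temperature', 'Quality'),
    ('Pressure', 'Quality'),
    ('Humidity', 'Pressure'),
]

def parents(node, edges):
    """Direct causes of `node`."""
    return {cause for cause, effect in edges if effect == node}

def ancestors(node, edges):
    """All upstream causes of `node`, direct or indirect."""
    found = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for p in parents(current, edges):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

print(sorted(parents('Quality', edges)))    # ['Pressure', 'Temperature']
print(sorted(ancestors('Quality', edges)))  # ['Humidity', 'Pressure', 'Temperature']
```

Humidity shows up as an ancestor of Quality but not as a parent, which is exactly the "indirect cause" reading of the graph.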

The Big Three: Confounder, Mediator, Collider

These are the three fundamental patterns in causal graphs:

1. Confounder (Common Cause) - MOST IMPORTANT!

   C
  / \
 v   v
 A   B

C causes BOTH A and B
Creates spurious correlation between A and B
Example: Summer (C) causes both ice cream sales (A) and drowning (B)
Key insight: A and B look related, but neither causes the other!

2. Mediator (Chain)

A → M → B

A causes M, which causes B
M is the "middle step" in the causal path
Example: Smoking (A) → Tar in lungs (M) → Cancer (B)
Key insight: A affects B, but indirectly through M
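A quick simulation can make the chain pattern concrete. The sketch below assumes a linear chain with arbitrary coefficients (0.8 and noise scale 0.5 are made-up values) and uses a hypothetical helper `residual` to "hold M fixed" by removing its linear influence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Chain: A → M → B (A affects B only through M)
A = rng.standard_normal(n)
M = 0.8 * A + 0.5 * rng.standard_normal(n)
B = 0.8 * M + 0.5 * rng.standard_normal(n)

def residual(y, x):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

corr_ab = np.corrcoef(A, B)[0, 1]
partial_ab = np.corrcoef(residual(A, M), residual(B, M))[0, 1]

print(f"Correlation A-B:             {corr_ab:.3f}")
print(f"Partial correlation A-B | M: {partial_ab:.3f}")
```

In this setup A and B are clearly correlated, but the association should largely vanish once M is accounted for, because all of A's influence on B flows through M.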

3. Collider (Common Effect)

A   B
 \ /
  v
  C

Both A and B cause C
A and B do NOT cause each other
Example: Both Talent (A) and Luck (B) cause Success (C)
Key insight: On their own, A and B are independent. But if you condition on C (for example, by looking only at successful people), A and B become spuriously associated!
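The collider pattern behaves differently from the other two once you select on the common effect. Here is a minimal simulation sketch, assuming Talent and Luck are independent standard normal variables and Success is their noisy sum (all made-up modeling choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Collider: Talent → Success ← Luck
talent = rng.standard_normal(n)
luck = rng.standard_normal(n)
success = talent + luck + 0.5 * rng.standard_normal(n)

# Over the whole population, talent and luck are unrelated
corr_all = np.corrcoef(talent, luck)[0, 1]

# "Conditioning" on the collider: keep only the top 10% most successful cases
top = success > np.quantile(success, 0.9)
corr_top = np.corrcoef(talent[top], luck[top])[0, 1]

print(f"Correlation talent-luck (everyone):       {corr_all:.3f}")
print(f"Correlation talent-luck (top 10% only):   {corr_top:.3f}")
```

Among the most successful cases the correlation turns clearly negative: a successful person with little talent must have had a lot of luck, and vice versa. Nothing causal links the two; the association comes purely from selecting on their common effect.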

Good news: You don't need to spot these patterns by hand! Tigramite's causal discovery methods use statistical tests to infer which patterns are consistent with your data.

Time Series Graphs: Adding Time

In time series, the SAME variable at DIFFERENT time steps is treated as DIFFERENT nodes!

Time:    t-2       t-1        t
        ┌───┐     ┌───┐     ┌───┐
X:      │X₋₂│────▶│X₋₁│────▶│X₀ │
        └───┘     └───┘     └───┘
                    │
                    ▼
        ┌───┐     ┌───┐     ┌───┐
Y:      │Y₋₂│────▶│Y₋₁│────▶│Y₀ │
        └───┘     └───┘     └───┘

Key Time Series Terms

Term	Symbol	Meaning
Lag	τ (tau)	How many time steps back
Autocorrelation	X(t-1) → X(t)	Variable depends on its own past
Lagged effect	X(t-τ) → Y(t)	X at time t-τ causes Y at time t
Contemporaneous	X(t) ↔ Y(t)	Same-time relationship
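To see lags and autocorrelation in actual numbers, the sketch below simulates two series with made-up coefficients: X depends on its own previous value, and Y depends on X two steps earlier. The helper `lagged_corr` is a hypothetical name used only here.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
X = np.zeros(T)
Y = np.zeros(T)

for t in range(2, T):
    X[t] = 0.7 * X[t - 1] + rng.standard_normal()        # autocorrelation at lag 1
    Y[t] = 0.6 * X[t - 2] + 0.5 * rng.standard_normal()  # lagged effect of X at lag 2

def lagged_corr(cause, effect, tau):
    """Correlation between cause(t - tau) and effect(t)."""
    if tau == 0:
        return np.corrcoef(cause, effect)[0, 1]
    return np.corrcoef(cause[:-tau], effect[tau:])[0, 1]

for tau in range(4):
    print(f"tau={tau}: corr(X(t-{tau}), Y(t)) = {lagged_corr(X, Y, tau):.3f}")

print(f"Autocorrelation of X at lag 1: {lagged_corr(X, X, 1):.3f}")
```

The correlation peaks at the true lag (τ = 2), but it is far from zero at the other lags as well, because X's autocorrelation smears the dependence across neighboring time steps. Plain lagged correlation is therefore not enough to pin down which lag carries the causal link.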

Tigramite's Notation: (variable, -lag)

Tigramite uses a simple tuple notation: (variable_index, -lag)

Notation	Meaning
`(0, 0)`	Variable 0 at current time
`(0, -1)`	Variable 0, one step in the past
`(1, -2)`	Variable 1, two steps in the past
`(2, -5)`	Variable 2, five steps in the past

Example

If we have daily measurements of the variables ['Temperature', 'Pressure', 'Quality']:

(0, -1) = Temperature yesterday
(1, -2) = Pressure 2 days ago
(2, 0) = Quality today

Code Example: Understanding Tigramite Links

var_names = ['Temperature', 'Pressure', 'Quality']

# For illustration, each link is written here as: ((cause_var, -lag), target_var)
# The (cause_var, -lag) part follows Tigramite's tuple notation.

example_links = [
    ((0, -1), 2),  # Temperature at lag 1 → Quality
    ((1, -2), 2),  # Pressure at lag 2 → Quality
    ((0, -1), 0),  # Temperature autocorrelation
]

print("Example causal links:")
print("="*50)
for (cause_var, lag), target in example_links:
    cause_name = var_names[cause_var]
    target_name = var_names[target]
    print(f"({cause_var}, {lag}) → {target}")
    print(f"  Meaning: {cause_name} at lag {-lag} causes {target_name}")
    print()
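It is often handier to group links by their target, as a dictionary that maps each target variable to its list of `(variable, -lag)` parents. As far as I know, Tigramite uses this kind of dictionary for the parents it reports, but treat the exact shape as an assumption and check it against your installed version. The sketch below only uses plain Python; `parents_by_target` and `describe` are illustrative names.

```python
from collections import defaultdict

var_names = ['Temperature', 'Pressure', 'Quality']
example_links = [
    ((0, -1), 2),  # Temperature at lag 1 → Quality
    ((1, -2), 2),  # Pressure at lag 2 → Quality
    ((0, -1), 0),  # Temperature autocorrelation
]

# Group the (cause_var, -lag) tuples by the variable they point to
parents_by_target = defaultdict(list)
for cause, target in example_links:
    parents_by_target[target].append(cause)
parents_by_target = dict(parents_by_target)

def describe(node):
    """Turn (variable_index, -lag) into a readable label such as 'Pressure(t-2)'."""
    var, neg_lag = node
    return var_names[var] + ("(t)" if neg_lag == 0 else f"(t{neg_lag})")

for target, parent_list in sorted(parents_by_target.items()):
    readable = ", ".join(describe(p) for p in parent_list)
    print(f"{var_names[target]}(t) <- {readable}")
```

Reading the result is then a matter of asking "what are the parents of Quality?" and looking up key 2.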

Graph Output Types in Tigramite

When you run causal discovery, Tigramite returns a graph with edge symbols:

Symbol	Meaning	When it occurs
`-->`	Directed link (cause → effect)	Lagged links (time order settles direction), or same-time links that could be oriented
`<--`	The same link, read from the effect's side	Mirror entry of a same-time `-->`
`o-o`	Undetermined direction	Contemporaneous (same time) links that could not be oriented
`x-x`	Conflicting evidence	Rare, indicates issues
`<->`	Bidirected (hidden confounder)	Only with LPCMCI method
(empty)	No link found	No direct dependence detected at that lag

Reading a Tigramite Graph Output

# Example: Understanding a Tigramite graph output
import numpy as np

# Simulated graph array (3 variables, max lag 2)
# Shape: (N, N, tau_max+1) where graph[i, j, tau] = link from i at lag tau to j
# (first index = cause, second index = effect, as in Tigramite's output)

example_graph = np.array([
    # Links FROM variable 0 (Temperature)
    [['', '-->', ''],      # TO var 0 at lags [0, 1, 2]: Temperature autocorrelation at lag 1
     ['', '', ''],         # TO var 1
     ['', '-->', '']],     # TO var 2: Temperature at lag 1 → Quality

    # Links FROM variable 1 (Pressure)
    [['', '', ''],
     ['', '-->', ''],      # Pressure autocorrelation at lag 1
     ['', '', '-->']],     # Pressure at lag 2 → Quality

    # Links FROM variable 2 (Quality)
    [['', '', ''],
     ['', '', ''],
     ['', '', '']]         # Quality causes nothing in this example
])

var_names = ['Temperature', 'Pressure', 'Quality']

print("Reading the graph array:")
print("="*50)
for j in range(3):              # j = effect (target) variable
    for i in range(3):          # i = cause (source) variable
        for tau in range(3):    # tau = lag
            if example_graph[i, j, tau] != '':
                print(f"{var_names[i]}(t-{tau}) {example_graph[i, j, tau]} {var_names[j]}(t)")
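For larger graphs, the same reading can be done without nested loops. The sketch below assumes the index order used in Tigramite's results, `graph[i, j, tau]` for a link from variable i at time t-τ to variable j at time t, and builds a small array by hand rather than running any discovery method.

```python
import numpy as np

var_names = ['Temperature', 'Pressure', 'Quality']
N, tau_max = 3, 2

# Start from an empty graph, then fill in links as graph[cause, effect, lag]
graph = np.full((N, N, tau_max + 1), '', dtype='<U3')
graph[0, 0, 1] = '-->'   # Temperature(t-1) --> Temperature(t)
graph[0, 2, 1] = '-->'   # Temperature(t-1) --> Quality(t)
graph[1, 2, 2] = '-->'   # Pressure(t-2)    --> Quality(t)

# np.argwhere returns one (i, j, tau) row per matching entry
links = [
    (var_names[i], int(tau), var_names[j])
    for i, j, tau in np.argwhere(graph == '-->')
]

for cause, tau, effect in links:
    print(f"{cause}(t-{tau}) --> {effect}(t)")
```

The same pattern works for the other edge symbols by changing the string in the comparison.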

Conditional Independence: The Key Insight

Causal discovery works by testing conditional independence:

"Are X and Y independent GIVEN Z?" Written as: X ⊥ Y | Z

Why This Matters

If X → Y (direct cause), then X and Y will be dependent even after conditioning on other variables.
If X ← Z → Y (confounder) and there is no direct link between X and Y, then X and Y become independent once we condition on Z.

Tigramite tests many conditional independencies to figure out the causal structure!

Demonstration: Conditional Independence

import numpy as np

np.random.seed(42)
n = 1000

# Confounder structure: Z → X and Z → Y
Z = np.random.randn(n)  # Common cause
X = 0.8 * Z + np.random.randn(n) * 0.5  # Caused by Z
Y = 0.8 * Z + np.random.randn(n) * 0.5  # Also caused by Z

# Unconditional correlation (X and Y)
corr_xy = np.corrcoef(X, Y)[0, 1]
print(f"Correlation X-Y (unconditional): {corr_xy:.3f}")
print("They look correlated!\n")

# Partial correlation (X and Y given Z)
# Regress X on Z, get residuals
slope_xz = np.polyfit(Z, X, 1)[0]
X_resid = X - slope_xz * Z

# Regress Y on Z, get residuals
slope_yz = np.polyfit(Z, Y, 1)[0]
Y_resid = Y - slope_yz * Z

# Correlation of residuals = partial correlation
partial_corr = np.corrcoef(X_resid, Y_resid)[0, 1]
print(f"Partial correlation X-Y | Z: {partial_corr:.3f}")
print("Almost zero! X and Y are conditionally independent given Z.")
print("\nThis tells us: X doesn't directly cause Y (and vice versa).")
print("The correlation was spurious - caused by the confounder Z.")
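For contrast, here is the other case from "Why This Matters": a direct link that survives conditioning. This sketch assumes Z still confounds X and Y, but X now also causes Y directly (the 0.6 coefficient is made up). It uses the standard formula for a first-order partial correlation, which gives the same value as correlating the regression residuals; `partial_corr` is an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Z confounds X and Y, and X ALSO directly causes Y
Z = rng.standard_normal(n)
X = 0.8 * Z + 0.5 * rng.standard_normal(n)
Y = 0.8 * Z + 0.6 * X + 0.5 * rng.standard_normal(n)

def partial_corr(x, y, z):
    """Partial correlation of x and y given z, from the pairwise correlations."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

pc = partial_corr(X, Y, Z)
print(f"Correlation X-Y (unconditional): {np.corrcoef(X, Y)[0, 1]:.3f}")
print(f"Partial correlation X-Y | Z:     {pc:.3f}")
```

Here the partial correlation stays well away from zero: conditioning on Z removes the confounded part of the association, and what remains reflects the direct X → Y link.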

Summary: Your Causal Vocabulary Cheat Sheet

Term	Simple Definition
DAG	A causal diagram with arrows, no loops
Node	A variable in the graph (a circle)
Edge	A causal link (an arrow)
Parent	Direct cause of a variable
Confounder	Common cause of two variables (often hidden), creating spurious correlation
Mediator	Variable in the middle of a causal chain
Collider	Variable caused by two others; conditioning on it creates spurious association
Lag (τ)	Time steps between cause and effect
Autocorrelation	Variable depending on its own past
`(i, -τ)`	Tigramite notation: variable i at lag τ
`-->`	Definite causal direction in output
Conditional independence	X and Y unrelated given Z

Quick Quiz

Q1: In notation (2, -3), what does the -3 mean?

The variable is 3 time steps in the PAST (lag of 3).

Q2: You see --> between Temperature and Quality. What does this mean?

Temperature CAUSES Quality, and we're confident about the direction.

Q3: X and Y are correlated. After conditioning on Z, they're not. What pattern is this?

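Most likely the confounder pattern: Z is a common cause of both X and Y. A mediator chain (X → Z → Y) gives the same result, so use time order or domain knowledge to tell the two apart.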
