Case Study: Smart Building Energy Analysis
Scenario
You're a data scientist at a smart building company. The building manager complains about unexpected energy spikes. Your job: find what's causing them.
Available Data
- Temperature: Indoor temperature sensor
- Occupancy: Number of people in building
- HVAC_Status: Heating/cooling system state
- Energy: Total energy consumption (our target!)
- Outdoor_Temp: External temperature
Questions to Answer
- Which variables CAUSE energy consumption?
- How strong are these effects?
- What interventions could reduce energy use?
Setup
import numpy as np
import matplotlib.pyplot as plt
from tigramite import data_processing as pp
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr
from tigramite.causal_effects import CausalEffects
from sklearn.linear_model import LinearRegression
np.random.seed(42)
Step 1: Generate Realistic Smart Building Data
We'll simulate data with a known causal structure to validate our analysis.
# Simulate smart building data (hourly measurements for 30 days)
T = 24 * 30 # 720 hours
# Hour of day (for realistic patterns)
hour = np.tile(np.arange(24), 30)
# Outdoor temperature (follows daily pattern)
outdoor_temp = 20 + 10 * np.sin(2 * np.pi * hour / 24 - np.pi/2) + np.random.randn(T) * 2
# Occupancy (high during work hours)
occupancy = np.zeros(T)
for t in range(T):
    h = hour[t]
    if 9 <= h <= 18:  # Work hours
        occupancy[t] = 50 + np.random.randn() * 10
    else:
        occupancy[t] = 5 + np.random.randn() * 3
occupancy = np.clip(occupancy, 0, None)
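# NOTE: the source omits the generating equations for the remaining variables.
# The lags, coefficients, and noise levels below are illustrative assumptions
# that follow the structure described in the comments.
indoor_temp = np.zeros(T)
hvac = np.zeros(T)
energy = np.zeros(T)
indoor_temp[0] = 22.0
for t in range(1, T):
    # HVAC turns on based on occupancy and temperature (outdoor temperature assumed)
    hvac[t] = 0.3 * hvac[t-1] + 0.02 * occupancy[t-1] + 0.05 * outdoor_temp[t-1] + np.random.randn() * 0.2
    # Indoor temperature (affected by outdoor temp and HVAC)
    indoor_temp[t] = 0.6 * indoor_temp[t-1] + 0.4 * outdoor_temp[t-1] - 0.5 * hvac[t-1] + np.random.randn() * 0.3
    # Energy consumption is driven by HVAC, Occupancy, and Outdoor_Temp
    energy[t] = 0.2 * energy[t-1] + 2.0 * hvac[t-1] + 0.05 * occupancy[t-1] + 0.1 * outdoor_temp[t-1] + np.random.randn() * 0.5
var_names = ['Indoor_Temp', 'Occupancy', 'HVAC', 'Energy', 'Outdoor_Temp']
# Stack the columns in the same order as var_names
data = np.column_stack([indoor_temp, occupancy, hvac, energy, outdoor_temp])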
Step 2: Explore the Data (ALWAYS FIRST!)
# Create DataFrame
dataframe = pp.DataFrame(data, var_names=var_names)
# Plot time series
tp.plot_timeseries(dataframe, figsize=(14, 10))
plt.suptitle('Smart Building Sensor Data', fontsize=14)
plt.show()
# Check for linear relationships
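# plot_scatterplots takes figure options through setup_args, not as a direct keyword
tp.plot_scatterplots(dataframe=dataframe, setup_args={'figsize': (12, 12)})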
plt.suptitle('Checking Linearity of Relationships', fontsize=14)
plt.show()
# Relationships look reasonably linear → ParCorr is appropriate
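ParCorr tests conditional independence through partial correlation: the correlation between two variables after the linear influence of the conditioning variables has been removed. The sketch below uses plain Python, hypothetical variable names, and made-up coefficients to show the single-conditioning-variable case, where a shared driver makes two series correlate strongly although neither affects the other. It illustrates the idea only and is not Tigramite's implementation.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data (assumed coefficients): hvac drives both other series
rng = random.Random(0)
hvac = [rng.gauss(0, 1) for _ in range(2000)]
indoor = [-0.8 * h + rng.gauss(0, 0.5) for h in hvac]
energy = [2.0 * h + rng.gauss(0, 0.5) for h in hvac]

raw = pearson(indoor, energy)
conditioned = partial_corr(indoor, energy, hvac)
print(f"corr(indoor, energy)        = {raw:.2f}")
print(f"corr(indoor, energy | hvac) = {conditioned:.2f}")
```

The raw correlation is strongly negative, while the value conditioned on `hvac` sits near zero. That is the pattern a conditional independence test relies on to drop a spurious link.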
# Find optimal tau_max using lag function
parcorr = ParCorr(significance='analytic')
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=parcorr, verbosity=0)
correlations = pcmci.get_lagged_dependencies(tau_max=24, val_only=True)['val_matrix']
# plot_lagfuncs takes figure options (including figsize) through setup_args
tp.plot_lagfuncs(
    val_matrix=correlations,
    setup_args={'var_names': var_names, 'x_base': 6, 'figsize': (14, 10)}
)
plt.suptitle('Lag Function - Choose tau_max where effects decay', fontsize=14)
plt.show()
# Effects seem to decay by lag 6-8 → use tau_max=8
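Reading the decay point off the lag-function plot can be cross-checked numerically. The sketch below is one possible heuristic, not a Tigramite feature: the helper name `suggest_tau_max` and the 0.1 threshold are assumptions, and the toy matrix is made up. It returns the largest lag at which any variable pair still shows a dependency at or above the threshold.

```python
def suggest_tau_max(val_matrix, threshold=0.1):
    """Largest lag at which any pair still has |dependency| >= threshold.

    val_matrix[i][j][tau] is the dependency of variable j at time t on
    variable i at time t - tau (nested lists or a NumPy array both work).
    """
    n_vars = len(val_matrix)
    n_lags = len(val_matrix[0][0])
    last = 0
    for tau in range(1, n_lags):
        peak = max(
            abs(val_matrix[i][j][tau])
            for i in range(n_vars)
            for j in range(n_vars)
        )
        if peak >= threshold:
            last = tau
    return last

# Made-up lag functions for two variables, lags 0..5
toy = [
    [[1.0, 0.5, 0.2, 0.05, 0.01, 0.0], [0.0, 0.6, 0.3, 0.12, 0.04, 0.02]],
    [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.4, 0.1, 0.03, 0.0, 0.0]],
]
suggested = suggest_tau_max(toy)
print(f"Suggested tau_max: {suggested}")
```

A stricter threshold gives a shorter window. Adding a lag or two beyond the suggested value is a cheap safeguard, since a tau_max that is too small can miss real links.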
Step 3: Discover Causal Structure
# Run PCMCI+ (handles both lagged and contemporaneous links)
results = pcmci.run_pcmciplus(tau_max=8, pc_alpha=0.05)
# Apply FDR correction (Benjamini-Hochberg)
q_matrix = pcmci.get_corrected_pvalues(
    p_matrix=results['p_matrix'],
    tau_max=8,
    fdr_method='fdr_bh'
)
print("Discovered Causal Links (FDR corrected):")
pcmci.print_significant_links(
    p_matrix=q_matrix,
    val_matrix=results['val_matrix'],
    alpha_level=0.05
)
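The 'fdr_bh' method refers to the Benjamini-Hochberg procedure, which rescales p-values so that the expected share of false discoveries among the reported links stays below the chosen level. A minimal plain-Python sketch of the adjustment follows, using made-up p-values. It shows the textbook procedure on a flat list and does not reproduce how Tigramite selects which matrix entries to correct.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q[idx] = running_min
    return q

p = [0.001, 0.008, 0.039, 0.041, 0.6]   # made-up p-values
q = benjamini_hochberg(p)
n_raw = sum(value < 0.05 for value in p)
n_fdr = sum(value < 0.05 for value in q)
print(f"Significant before correction: {n_raw}, after: {n_fdr}")
```

In this toy case four links pass the raw 0.05 threshold but only two survive the correction, which is why the corrected q_matrix, not the raw p_matrix, is used for the final graph.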
# Visualize the causal graph
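# Note: thresholding p-values leaves contemporaneous links unoriented ('o-o'),
# so the orientations found by PCMCI+ at lag 0 are not carried over
corrected_graph = pcmci.get_graph_from_pmatrix(
    p_matrix=q_matrix,
    alpha_level=0.05,
    tau_min=0,
    tau_max=8
)
results['graph'] = corrected_graph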
tp.plot_graph(
    graph=results['graph'],
    val_matrix=results['val_matrix'],
    var_names=var_names,
    figsize=(10, 8),
    link_colorbar_label='MCI Strength',
    node_colorbar_label='Auto-MCI',
    show_autodependency_lags=True
)
plt.title('Causal Graph: What Drives Energy Consumption?')
plt.show()
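Because the data are simulated with a known structure, the discovered graph can be scored against the ground truth. The sketch below is one way to do that in plain Python. The helper names, the three-variable toy graph, and the "true" links are all hypothetical. The only convention borrowed from Tigramite is that graph[i][j][tau] == '-->' means variable i at time t - tau drives variable j at time t.

```python
def directed_links(graph, var_names):
    """Collect (cause, effect, lag) triples for every '-->' entry."""
    links = set()
    for i, cause in enumerate(var_names):
        for j, effect in enumerate(var_names):
            for tau, mark in enumerate(graph[i][j]):
                if mark == '-->':
                    links.add((cause, effect, tau))
    return links

def precision_recall(found, truth):
    """Share of found links that are true, and share of true links found."""
    hits = len(found & truth)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall

# Toy graph with lags 0..1 (made up for illustration)
names = ['HVAC', 'Energy', 'Occupancy']
toy_graph = [[['', ''] for _ in names] for _ in names]
toy_graph[0][1][1] = '-->'   # HVAC(t-1) --> Energy(t): correct
toy_graph[2][1][1] = '-->'   # Occupancy(t-1) --> Energy(t): correct
toy_graph[1][0][1] = '-->'   # Energy(t-1) --> HVAC(t): false positive

truth = {('HVAC', 'Energy', 1), ('Occupancy', 'Energy', 1), ('Occupancy', 'HVAC', 1)}
found = directed_links(toy_graph, names)
precision, recall = precision_recall(found, truth)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Low precision points to spurious links (consider a stricter alpha), while low recall suggests missed links (consider more data or a larger tau_max).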
Step 4: Quantify Causal Effects on Energy
# What are the causal parents of Energy?
# Tigramite convention: graph[i, j, tau] is the link from variable i at t-tau to variable j at t
energy_idx = 3
print("Causal drivers of Energy consumption:")
for i in range(len(var_names)):
    for tau in range(9):  # lags 0..tau_max
        if results['graph'][i, energy_idx, tau] == '-->':
            val = results['val_matrix'][i, energy_idx, tau]
            pval = q_matrix[i, energy_idx, tau]
            print(f"  {var_names[i]}(t-{tau}) → Energy: strength={val:.3f}, q={pval:.4f}")
# Estimate the effect of HVAC on Energy
X = [(2, -1)] # HVAC at lag 1
Y = [(3, 0)] # Energy at current time
causal_effects = CausalEffects(
    graph=results['graph'],
    graph_type='stationary_dag',
    X=X, Y=Y,
    tau_max=8,
    verbosity=0
)
causal_effects.fit_total_effect(
    dataframe=dataframe,
    estimator=LinearRegression()
)
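# What if we reduce HVAC activity by 0.5 units?
# predict_total_effect returns the expected Energy under do(HVAC = value),
# so the change is the difference between two interventions.
hvac_baseline = data[:, 2].mean()
intervention_data = np.array([[hvac_baseline], [hvac_baseline - 0.5]])
predicted = causal_effects.predict_total_effect(intervention_data=intervention_data)
energy_change = predicted[1, 0] - predicted[0, 0]
print("\nIntervention Analysis:")
print("If we reduce HVAC activity by 0.5 units...")
print(f"Predicted energy reduction: {-energy_change:.2f} units")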
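An intervention effect is a contrast between two interventional predictions, not a single prediction. With a fitted linear model that includes an intercept, predicting at HVAC = -0.5 gives an energy level, whereas the effect of a 0.5-unit reduction is the difference between the predictions at a baseline and at baseline minus 0.5. The sketch below shows this with a one-regressor least-squares fit on made-up numbers. The helper `fit_line` and the data are hypothetical and stand in for the fitted CausalEffects model.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up readings that follow energy = 10 + 2 * hvac exactly
hvac = [0.0, 1.0, 2.0, 3.0, 4.0]
energy = [10.0, 12.0, 14.0, 16.0, 18.0]
intercept, slope = fit_line(hvac, energy)

def predict(x):
    """Predicted energy when HVAC is set to x."""
    return intercept + slope * x

baseline = sum(hvac) / len(hvac)   # typical HVAC level in the toy data
level_at_minus_half = predict(-0.5)                    # an energy level, not a change
change = predict(baseline - 0.5) - predict(baseline)   # the intervention effect
print(f"Predicted level at HVAC=-0.5: {level_at_minus_half:.2f}")
print(f"Energy change for a 0.5-unit HVAC reduction: {change:.2f}")
```

In a linear model the intercept cancels in the contrast, so the change equals the slope times the size of the intervention regardless of the baseline chosen.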
Step 5: Actionable Insights
KEY FINDINGS
- HVAC system is the PRIMARY driver of energy consumption
- Occupancy has a moderate direct effect
- Outdoor temperature indirectly affects energy (via HVAC)
- Indoor temperature is an EFFECT, not a cause (don't optimize for it!)
RECOMMENDATIONS
- Optimize HVAC scheduling - biggest impact potential
- Pre-cool/heat building before peak occupancy
- Implement smart occupancy detection for HVAC control
- Weather-based predictive HVAC management
WARNING - DO NOT
- Use indoor temperature as a control variable (it's an effect!)
- Assume correlation = causation without this analysis
Conclusion
In this case study, we:
- Explored the data thoroughly before analysis
- Discovered the true causal drivers of energy consumption
- Quantified the effect of potential interventions
- Generated actionable recommendations
Key insight: Without causal analysis, we might have wrongly targeted indoor temperature (which is an EFFECT of HVAC, not a cause of energy use).
Congratulations!
You've completed the Tigramite beginner tutorial. You now have the skills to:
- Prepare data for causal analysis
- Choose appropriate tests and methods
- Discover causal relationships
- Quantify causal effects
- Make data-driven intervention decisions
Happy causal discovery!