Tutorial: Random Intercepts for Clustered Data

Goal

You have clustered data (e.g., students in schools) and need to account for the clustering in your power analysis.


Table of Contents


Full Working Example

from mcpower import MCPower

# 1. Define the model with a random intercept for school
model = MCPower("satisfaction ~ treatment + motivation + (1|school)")
model.set_simulations(400)

# 2. Configure the clustering structure
model.set_cluster("school", ICC=0.2, n_clusters=20)

# 3. Set the expected effect sizes (standardized coefficients)
model.set_effects("treatment=0.5, motivation=0.3")

# 4. Allow up to 10% convergence failures
model.set_max_failed_simulations(0.10)

# 5. Run the power analysis
model.find_power(sample_size=1000)
Cluster variables configured: school
Effects: treatment=0.5, motivation=0.3
Model settings applied successfully
================================================================================
MONTE CARLO POWER ANALYSIS RESULTS
================================================================================

Power Analysis Results (N=1000):
Test                                     Power    Target   Status  
-------------------------------------------------------------------
overall                                  100.0    80       ✓       
treatment                                100.0    80       ✓       
motivation                               100.0    80       ✓       

Result: 3/3 tests achieved target power

Copy this script into a Python file and run it. With 1000 observations split across 20 schools (50 students per school), you should see power well above 80% for both effects.


Step-by-Step Walkthrough

Step 1: Define the model formula

model = MCPower("satisfaction ~ treatment + motivation + (1|school)")
model.set_simulations(400)

The formula uses R-style syntax. The key part is (1|school), which tells MCPower that observations are clustered within schools and each school has its own random intercept. The 1 means “intercept” – every school gets a baseline shift up or down from the grand mean.

Both ~ and = are valid as the formula separator.

Step 2: Configure the clustering

model.set_cluster("school", ICC=0.2, n_clusters=20)
  • "school" – matches the grouping variable name in the formula.

  • ICC=0.2 – the intraclass correlation coefficient. An ICC of 0.2 means that 20% of the total variance in satisfaction is due to between-school differences. This is a typical value for educational research.

  • n_clusters=20 – the number of schools in the study. With sample_size=1000 and 20 clusters, each school will have 50 students.

You must specify either n_clusters or cluster_size, but not both:

# These two are equivalent when sample_size=1000:
model.set_cluster("school", ICC=0.2, n_clusters=20)    # 1000/20 = 50 per school
model.set_cluster("school", ICC=0.2, cluster_size=50)   # 1000/50 = 20 schools

Step 3: Set effect sizes

model.set_effects("treatment=0.5, motivation=0.3")

These are standardized regression coefficients (beta weights):

  • treatment=0.5 – a medium-large effect. Receiving treatment increases satisfaction by 0.5 standard deviations.

  • motivation=0.3 – a small-to-medium effect. Each 1 SD increase in motivation predicts a 0.3 SD increase in satisfaction.

Step 4: Set the convergence failure threshold

model.set_max_failed_simulations(0.10)

Mixed-effects models occasionally fail to converge during individual simulations. The default threshold is 3% (0.03). For random-intercept models, 10% is a safe tolerance – most simulations will converge. If the failure rate exceeds this threshold, MCPower raises an error, signaling that your design may have too few clusters or too few observations per cluster.

Step 5: Run the analysis

model.find_power(sample_size=1000)
Cluster variables configured: school
Effects: treatment=0.5, motivation=0.3
Model settings applied successfully
================================================================================
MONTE CARLO POWER ANALYSIS RESULTS
================================================================================

Power Analysis Results (N=1000):
Test                                     Power    Target   Status  
-------------------------------------------------------------------
overall                                  100.0    80       ✓       
treatment                                100.0    80       ✓       
motivation                               100.0    80       ✓       

Result: 3/3 tests achieved target power

MCPower runs 800 Monte Carlo simulations (the default for mixed models). In each simulation, it:

  1. Generates synthetic clustered data with the specified ICC and effects

  2. Fits a linear mixed-effects model using the C++ native solver

  3. Tests each fixed effect for significance at alpha = 0.05

Power is the proportion of simulations where each effect is significant.


Statistical Model

The random intercept model is:

\[y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta} + b_j + e_{ij}\]
\[b_j \sim \mathcal{N}(0, \tau^2), \quad e_{ij} \sim \mathcal{N}(0, \sigma^2)\]

Where:

  • \(y_{ij}\) is the outcome for observation \(i\) in cluster \(j\)

  • \(\mathbf{x}_{ij}'\boldsymbol{\beta}\) is the fixed-effects component (treatment, motivation)

  • \(b_j\) is the random intercept for cluster \(j\) – the school-level deviation from the grand mean

  • \(e_{ij}\) is the residual error for observation \(i\) in cluster \(j\)

The between-cluster variance \(\tau^2\) is derived from the ICC:

\[\tau^2 = \frac{\text{ICC}}{1 - \text{ICC}} \cdot \sigma^2_{\text{within}}\]

where \(\sigma^2_{\text{within}}\) includes both the residual variance and the variance explained by fixed effects. Since MCPower standardizes continuous predictors to variance 1, this simplifies to:

\[\sigma^2_{\text{within}} = 1 + \sum_k \beta_k^2 \cdot \text{Var}(X_k)\]

MCPower automatically adjusts \(\tau^2\) upward so that the actual ICC in the generated data matches the value you specify, even after accounting for the variance explained by fixed effects.


Design Recommendations

MCPower assigns treatment at the individual level within each cluster. Because the treatment contrast is estimated within each cluster, the cluster random intercept largely cancels out. As a result, ICC and cluster allocation have minimal impact on fixed-effect power – power depends primarily on total N.

Note: This applies because MCPower generates treatment at the individual level within each cluster. In cluster-randomized trials where entire clusters receive the same treatment, the design effect (1 + (m-1) × ICC) would substantially reduce power.

The number of clusters matters for convergence and estimation stability, not for power:

  • Minimum: 5 observations per cluster (enforced by MCPower; warning below 10).

  • Recommended: 10–20 clusters for stable random effect estimation and convergence.


Common Variations

Using cluster_size instead of n_clusters

If you know the cluster size rather than the number of clusters:

model = MCPower("satisfaction ~ treatment + motivation + (1|school)")
model.set_simulations(400)
model.set_cluster("school", ICC=0.2, cluster_size=50)
model.set_effects("treatment=0.5, motivation=0.3")
model.set_max_failed_simulations(0.10)
model.find_power(sample_size=1000)  # 1000/50 = 20 schools
Cluster variables configured: school
Effects: treatment=0.5, motivation=0.3
Model settings applied successfully
================================================================================
MONTE CARLO POWER ANALYSIS RESULTS
================================================================================

Power Analysis Results (N=1000):
Test                                     Power    Target   Status  
-------------------------------------------------------------------
overall                                  100.0    80       ✓       
treatment                                100.0    80       ✓       
motivation                               100.0    80       ✓       

Result: 3/3 tests achieved target power

Finding the required sample size

Search for the minimum sample size that achieves 80% power:

model = MCPower("satisfaction ~ treatment + motivation + (1|school)")
model.set_simulations(400)
model.set_cluster("school", ICC=0.2, n_clusters=20)
model.set_effects("treatment=0.5, motivation=0.3")
model.set_max_failed_simulations(0.10)
model.find_sample_size(
    from_size=200,
    to_size=2000,
    by=200,
    target_test="treatment",
)
Cluster variables configured: school
Effects: treatment=0.5, motivation=0.3
Model settings applied successfully
================================================================================
SAMPLE SIZE ANALYSIS RESULTS
================================================================================

Sample Size Requirements:
Test                                     Required N  
-----------------------------------------------------
treatment                                200         

Note: as the total sample size changes, the cluster size changes too (sample_size / n_clusters). MCPower handles this automatically.

Binary treatment variable

If treatment is a binary variable (0/1):

model = MCPower("satisfaction ~ treatment + motivation + (1|school)")
model.set_simulations(400)
model.set_cluster("school", ICC=0.2, n_clusters=20)
model.set_variable_type("treatment=binary")
model.set_effects("treatment=0.5, motivation=0.3")
model.set_max_failed_simulations(0.10)
model.find_power(sample_size=1000)
Variable types: treatment=(binary,0.5)
Cluster variables configured: school
Effects: treatment=0.5, motivation=0.3
Model settings applied successfully
================================================================================
MONTE CARLO POWER ANALYSIS RESULTS
================================================================================

Power Analysis Results (N=1000):
Test                                     Power    Target   Status  
-------------------------------------------------------------------
overall                                  100.0    80       ✓       
treatment                                100.0    80       ✓       
motivation                               100.0    80       ✓       

Result: 3/3 tests achieved target power

Multiple fixed effects

Add as many fixed effects as your design requires:

model = MCPower("satisfaction ~ treatment + motivation + age + gender + (1|school)")
model.set_simulations(400)
model.set_cluster("school", ICC=0.2, n_clusters=30)
model.set_variable_type("treatment=binary, gender=binary")
model.set_effects("treatment=0.5, motivation=0.3, age=0.1, gender=0.2")
model.set_max_failed_simulations(0.10)
model.find_power(sample_size=1500)
Variable types: treatment=(binary,0.5), gender=(binary,0.5)
Cluster variables configured: school
Effects: treatment=0.5, motivation=0.3, age=0.1, gender=0.2
Model settings applied successfully
================================================================================
MONTE CARLO POWER ANALYSIS RESULTS
================================================================================

Power Analysis Results (N=1500):
Test                                     Power    Target   Status  
-------------------------------------------------------------------
overall                                  100.0    80       ✓       
treatment                                100.0    80       ✓       
motivation                               100.0    80       ✓       
age                                      96.8     80       ✓       
gender                                   96.8     80       ✓       

Result: 5/5 tests achieved target power

Next Steps