LME Validation (vs R’s lme4)

What It Is

MCPower’s mixed-effects solver is a custom C++ implementation: profiled-deviance optimization using Brent’s method for random-intercept models and L-BFGS-B for random-slope and nested models. To check that this solver produces correct results, MCPower includes a validation framework that compares it against R’s lme4, the gold-standard mixed-effects package used across statistics, psychology, ecology, and medicine.

The validation covers 95 scenarios across three model types (random intercepts, random slopes, nested effects) using four independent strategies. In the latest run (February 22, 2026), all 230 of 230 scenario-strategy combinations passed.


How It Works

The validation framework (LME4-validation/) uses four complementary strategies. Each tests a different aspect of correctness, so a scenario must pass all applicable strategies to be considered valid.

Strategy 1: External Data Agreement

Generates datasets using pure NumPy (independent of MCPower), then fits the same data with both MCPower’s C++ solver and R’s lme4. Compares significance decisions (reject/fail-to-reject) across 100 datasets per scenario.

What it proves: MCPower’s solver reaches the same conclusions as lme4 on identical data.
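As a rough illustration of this strategy, the sketch below generates one random-intercept dataset with NumPy and computes an agreement rate between two lists of reject/fail-to-reject decisions. The function names, the unit-total-variance parameterization (tau² = ICC, sigma² = 1 − ICC), and the example values are assumptions made for this example; they are not taken from the validation suite.

```python
import numpy as np


def simulate_random_intercept(n_clusters, cluster_size, beta, icc, rng):
    """Simulate y = beta * x + u[cluster] + e for a random-intercept model.

    Assumption: random-intercept variance tau2 = icc and residual variance
    sigma2 = 1 - icc, so that icc = tau2 / (tau2 + sigma2).
    """
    tau2 = icc
    sigma2 = 1.0 - icc
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    x = rng.standard_normal(cluster.size)
    u = rng.normal(0.0, np.sqrt(tau2), size=n_clusters)       # cluster intercepts
    e = rng.normal(0.0, np.sqrt(sigma2), size=cluster.size)   # residual noise
    y = beta * x + u[cluster] + e
    return x, y, cluster


def agreement_rate(decisions_a, decisions_b):
    """Fraction of datasets on which two solvers made the same decision."""
    a = np.asarray(decisions_a, dtype=bool)
    b = np.asarray(decisions_b, dtype=bool)
    return float(np.mean(a == b))


rng = np.random.default_rng(42)
x, y, cluster = simulate_random_intercept(
    n_clusters=20, cluster_size=10, beta=0.3, icc=0.2, rng=rng
)
```

In an actual run, each simulated dataset would be fitted by both solvers, and the two resulting decision lists (one entry per dataset) would be passed to `agreement_rate`.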

Strategy 2: MCPower DGP Validation

Uses MCPower’s internal data generation pipeline, exports the raw data, and fits it with both solvers. Compares parameter estimates (betas, standard errors, variance components) and significance decisions.

What it proves: MCPower generates realistic clustered data and fits it correctly.
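The parameter comparison can be pictured as a Pearson correlation between the two solvers' estimates across datasets. The sketch below is a minimal standard-library version; the function name and the estimate values are invented for illustration and are not output from either solver.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences of estimates."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up beta estimates for four datasets (illustration only).
mcpower_betas = [0.10, 0.25, 0.31, 0.48]
lme4_betas = [0.11, 0.24, 0.30, 0.50]

r = pearson_r(mcpower_betas, lme4_betas)
print(f"beta correlation: {r:.4f}")
```

The same function would apply to standard errors and variance components, each compared against its own threshold.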

Strategy 3: Parallel Power Simulation

Both MCPower and R independently generate data and run power simulations using the same design parameters but independent RNG streams. Compares the resulting power estimates.

What it proves: MCPower’s full pipeline (data generation + model fitting + power calculation) produces equivalent power estimates to an independent R implementation.
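Because the two pipelines use independent RNG streams, their power estimates differ by Monte Carlo noise even when both are correct. The sketch below estimates power as the proportion of rejections, reports the usual binomial standard error sqrt(p(1 − p)/n), and compares two estimates against a tolerance. The helper names are hypothetical, and the 0.05 default mirrors the power-difference threshold listed under Guidelines.

```python
import math


def power_estimate(rejections):
    """Return (power, Monte Carlo standard error) from per-simulation decisions."""
    n = len(rejections)
    power = sum(bool(r) for r in rejections) / n
    mc_se = math.sqrt(power * (1.0 - power) / n)
    return power, mc_se


def powers_agree(power_a, power_b, max_diff=0.05):
    """True when two power estimates differ by at most max_diff."""
    return abs(power_a - power_b) <= max_diff


# 80 rejections out of 100 simulations (illustrative input).
power, mc_se = power_estimate([True] * 80 + [False] * 20)
```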

Strategy 4: Statistical Z-test with FDR Correction

Runs a two-proportion z-test comparing MCPower and R power estimates, with Benjamini-Hochberg FDR correction across all scenarios. This is the most stringent test — it detects statistically significant differences even when absolute differences are small.

What it proves: Power estimates are statistically indistinguishable after controlling for multiple comparisons.
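The two ingredients of this strategy can be sketched with the standard library alone: a pooled two-proportion z-test for a single scenario, followed by the Benjamini-Hochberg step-up adjustment across scenarios. This is a textbook formulation written for illustration; the function names are hypothetical, and the validation suite's own implementation may differ in detail.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for equality of two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        # Both proportions are 0 or both are 1: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under this sketch, a scenario would pass when its adjusted p-value stays above 0.05, meaning the test found no detectable difference between the two power estimates.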


Guidelines

Pass/Fail Thresholds

| Metric | Threshold | Strategy |
|---|---|---|
| Significance agreement rate | ≥ 95% | 1, 2 |
| Beta correlation (Pearson r) | ≥ 0.98 | 1, 2 |
| SE correlation (Pearson r) | ≥ 0.95 | 1, 2 |
| Tau² correlation (Pearson r) | ≥ 0.95 | 1, 2 |
| Power difference | ≤ 0.05 | 3 |
| Z-test (FDR-corrected) | p > 0.05 | 4 |
| Type I error rate | 0.03–0.07 | 1, 2 |
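One way to read these thresholds is as a small lookup that each computed metric is checked against. The sketch below encodes the thresholds as a dictionary; the metric keys and the `check_metric` helper are hypothetical names chosen for this example, and treating the Type I error range as inclusive at both ends is an assumption.

```python
# Threshold kinds: "min" (value >= bound), "max" (value <= bound),
# "above" (value > bound), "range" (low <= value <= high, assumed inclusive).
THRESHOLDS = {
    "significance_agreement": ("min", 0.95),
    "beta_correlation": ("min", 0.98),
    "se_correlation": ("min", 0.95),
    "tau2_correlation": ("min", 0.95),
    "power_difference": ("max", 0.05),
    "fdr_p_value": ("above", 0.05),
    "type_i_error": ("range", (0.03, 0.07)),
}


def check_metric(name, value):
    """Return True when a metric value satisfies its threshold."""
    kind, bound = THRESHOLDS[name]
    if kind == "min":
        return value >= bound
    if kind == "max":
        return value <= bound
    if kind == "above":
        return value > bound
    low, high = bound
    return low <= value <= high


# Made-up metric values for one scenario (illustration only).
example = {"beta_correlation": 0.991, "type_i_error": 0.08}
results = {name: check_metric(name, value) for name, value in example.items()}
```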

Scenario Coverage

| Model Type | Core Scenarios | Sensitivity Sweep | Total |
|---|---|---|---|
| Random intercepts (1 predictor) | 36 | 24 | 60 |
| Random intercepts (2 predictors) | 2 | 8 | 10 |
| Random slopes | 4 | 10 | 14 |
| Nested effects | 3 | 8 | 11 |
| Total | 45 | 50 | 95 unique scenarios |

Core scenarios run all four strategies and sensitivity scenarios run Strategy 4 only, which gives (45 × 4) + (50 × 1) = 230 scenario-strategy combinations.

What the Scenarios Vary

  • ICC: 0.1 to 0.5

  • Number of clusters: 5 to 50

  • Sample size: 50 to 2,400

  • Effect sizes: 0.05 to 0.50

  • Model complexity: 1–2 predictors, intercept-only through nested


Common Patterns

Running Validation Yourself

The validation suite lives in a separate repository: MCPower-LME4-validation. See its README for setup instructions, usage, and available options.

Reading the Report

The HTML report contains:

  1. Summary matrix — pass/fail grid across all scenarios and strategies

  2. Strategy detail tables — agreement rates, correlations, power differences

  3. Parameter recovery — scatter plots of MCPower vs R estimates

  4. FDR-corrected p-values — Strategy 4 statistical test results


Learn More