How It Is Validated¶

MCPower’s statistical engine is validated through two complementary systems: an internal test suite covering OLS and mixed-effects accuracy, and an external cross-validation framework comparing MCPower’s LME solver against R’s lme4 package.

Internal Test Suite¶

MCPower includes ~11,000 lines of tests organized into specs (accuracy/validation), integration, unit, and mixed-model tests. The specs tests are the core statistical validation.

Power Accuracy Tests¶

Monte Carlo power estimates are compared against exact analytical power from non-central t and F distributions.

OLS models (test_power_accuracy.py):

Single predictor: 5 parametrized cases varying β and N
Two uncorrelated predictors: 3 cases with Σ = I
Two correlated predictors: 4 cases with VIF correction (ρ = 0.3, 0.5, 0.7)

Acceptance criterion: the MC estimate must fall within 3.5 × √[p(1−p)/5000] × 100 + 1 pp of the analytical value. With 5,000 simulations this Bonferroni-safe margin is roughly 2–3.5 percentage points, depending on the power level (about 3.0 pp at 80% power).
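
The acceptance criterion can be sketched in a few lines of Python. This is an illustration, not MCPower's test code: the function names are hypothetical, and it assumes that p in the formula is the analytical power expressed as a proportion.

```python
import math

def mc_margin_pp(analytical_power, n_sims=5000, z=3.5, bias_pp=1.0):
    """Acceptance margin in percentage points.

    analytical_power is a proportion in [0, 1] (assumed to be the p
    in the formula; the source does not say which p is used).
    """
    se = math.sqrt(analytical_power * (1.0 - analytical_power) / n_sims)
    return z * se * 100.0 + bias_pp

def within_margin(mc_power_pct, analytical_power_pct, n_sims=5000):
    """True if the Monte Carlo estimate is close enough to theory."""
    margin = mc_margin_pp(analytical_power_pct / 100.0, n_sims=n_sims)
    return abs(mc_power_pct - analytical_power_pct) <= margin

# Margin at 80% analytical power with 5,000 simulations
print(round(mc_margin_pp(0.80), 2))
```

The margin is widest at 50% power, where the binomial variance peaks, and narrows toward 0% and 100%.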

LME models (test_power_accuracy_lme.py):

Single predictor z-test: 7 parametrized cases
Single predictor likelihood-ratio test: 3 cases
Two uncorrelated predictors: 2 cases
Two correlated predictors: 2 cases (ρ = 0.3, 0.5)

All LME accuracy tests use m = 50 observations per cluster, where the within-cluster design effect is small (~1.02–1.06), allowing comparison against analytical formulas.

Type I Error Control¶

Under the null hypothesis (all effects = 0), the rejection rate must match the nominal α to within the Monte Carlo margin.

OLS (test_type1_error.py):

Single predictor null (F-test and t-test)
Two predictors null (each rejects at ~α)
Large-sample null (catches bugs where power inflates with N)
Alpha calibration at α ∈ {0.01, 0.05, 0.10}

LME (test_type1_error_lme.py):

Same structure with K = 20–50 clusters, ICC = 0.2
Alpha calibration across standard levels

Criterion: |observed rejection rate (%) − α × 100| < MC margin (in percentage points)

Monotonicity Tests¶

Power must strictly increase with:

Effect size (larger β → more power)
Sample size (larger N → more power)
Significance level (larger α → more power)

Tested for both OLS (test_monotonicity.py) and LME (test_monotonicity_lme.py) models. These tests catch subtle implementation bugs that wouldn’t violate accuracy bounds but would produce nonsensical results.
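
A monotonicity check of this kind can be very small. The helper below is a hypothetical sketch, not the code in test_monotonicity.py; it takes power estimates ordered by increasing effect size, sample size, or α.

```python
def is_strictly_increasing(values, min_step=0.0):
    """True if each value exceeds the previous one by more than min_step."""
    return all(later - earlier > min_step
               for earlier, later in zip(values, values[1:]))

# Power (%) at three increasing sample sizes; illustrative numbers only
print(is_strictly_increasing([35.2, 61.8, 84.1]))
print(is_strictly_increasing([35.2, 61.8, 60.9]))
```

A real test would also need to decide how to treat estimates that saturate near 100%, where Monte Carlo noise can produce ties or small reversals.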

Multiple Comparison Corrections¶

Correction conservativeness (test_corrections.py):

Corrected power ≤ uncorrected power under H₀
Bonferroni more conservative than FDR
FWER ≤ α for Bonferroni and Holm
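
The ordering of the three corrections can be seen with textbook implementations. The sketch below is not MCPower's code and the p-values are made up; it only shows that Bonferroni rejects no more than Holm, which rejects no more than Benjamini-Hochberg FDR.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject p_i if p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Step-down: compare the k-th smallest p-value to alpha / (m - k)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up: reject the k smallest, k = max rank with p_(k) <= k/m * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

pvals = [0.001, 0.012, 0.028, 0.045, 0.20]  # illustrative values
print(sum(bonferroni(pvals)), sum(holm(pvals)), sum(benjamini_hochberg(pvals)))
```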

Extended alpha validation (test_alpha_levels.py):

9 tests validating Bonferroni/Holm/FDR at non-default α ∈ {0.01, 0.10}
Multi-predictor null calibration with corrections

LME Accuracy Tests¶

The analytical formulas used as benchmarks for LME tests:

Design effect (within-cluster): $$D_{\text{eff}} = \frac{1 + (m-1) \times \text{ICC}}{1 + (m-2) \times \text{ICC}}$$

This is much milder than the between-cluster design effect for iid predictors — typically 1.02–1.06 for m = 50.

z-test non-centrality parameter: $$\text{NCP} = \frac{\beta \sqrt{n_{\text{eff}}}}{\sigma \sqrt{\text{VIF} \times D_{\text{eff}}}}$$

Likelihood-ratio test NCP: $$\text{NCP} = \frac{n \cdot \boldsymbol{\beta}' \Sigma \boldsymbol{\beta}}{\sigma^2 \times D_{\text{eff}}}$$
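
As a rough illustration, the design effect and z-test NCP formulas above translate to analytical power as follows. The function names are hypothetical, n_eff is taken as an input because its definition is not given here, and a two-sided test is assumed.

```python
import math
from statistics import NormalDist

def design_effect(m, icc):
    """Within-cluster design effect for cluster size m."""
    return (1 + (m - 1) * icc) / (1 + (m - 2) * icc)

def z_test_ncp(beta, n_eff, sigma=1.0, vif=1.0, d_eff=1.0):
    """Non-centrality parameter of the z-test."""
    return beta * math.sqrt(n_eff) / (sigma * math.sqrt(vif * d_eff))

def z_test_power(ncp, alpha=0.05):
    """Power of a two-sided z-test (two-sidedness is an assumption)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)

# Illustrative inputs only
d_eff = design_effect(m=50, icc=0.2)
ncp = z_test_ncp(beta=0.1, n_eff=1000, d_eff=d_eff)
print(z_test_power(ncp))
```

With NCP = 0 the function returns α, which is a quick sanity check that the null case is handled correctly.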

External Cross-Validation (LME4)¶

MCPower’s C++ LME solver is cross-validated against R’s lme4 package using the MCPower-LME4-validation framework. This is a separate repository with its own test harness.

Four Validation Strategies¶

Strategy	What It Tests	How
1. External Data Agreement	Do MCPower and lme4 reach the same significance decision on identical data?	Generate data with numpy, fit both solvers, compare significance decisions. Target: ≥95% agreement rate.
2. MCPower Pipeline Validation	Does MCPower’s full pipeline (data generation → fitting) produce results consistent with lme4?	Extract raw data from MCPower’s simulations, re-fit with lme4, compare significance decisions.
3. Parallel Power Simulation	Do independent power simulations produce the same power estimate?	Both MCPower and R independently generate data and estimate power. Target: absolute power difference ≤ 5 pp.
4. Statistical z-Test	Is the power difference statistically significant?	Two-proportion z-test on the power estimates from Strategy 3, with Benjamini-Hochberg FDR correction across all scenarios.

Strategy 1 validates the solver in isolation. Strategy 2 validates the full pipeline (including data generation). Strategy 3 validates end-to-end power estimates. Strategy 4 provides statistical rigor for the power comparison.
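
Strategy 4 can be sketched with a pooled two-proportion z-test followed by Benjamini-Hochberg adjusted p-values. This is a minimal illustration using standard formulas, not the validation framework's code; the function names and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for equality of two proportions."""
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both proportions are 0 or 1: no evidence of a difference
    z = (successes_a / n_a - successes_b / n_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Significant simulations out of 1,000 per scenario (made-up counts)
raw = [two_proportion_z_test(a, 1000, b, 1000)[1]
       for a, b in [(812, 801), (455, 470), (930, 925)]]
print(all(p > 0.05 for p in bh_adjust(raw)))
```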

Scenario Coverage¶

95 unique scenarios across three model types (random intercepts, random slopes, and nested effects):

Model Type	Core	Sensitivity	Total
Random intercepts (1 predictor)	36	24	60
Random intercepts (2 predictors)	2	8	10
Random slopes	4	10	14
Nested effects	3	8	11
Total	45	50	95

Core scenarios run all 4 strategies (45 × 4 = 180 tests). Sensitivity scenarios run Strategy 4 only (50 × 1 = 50 tests). Total: 230 scenario-strategy combinations.

Core scenarios vary: ICC ∈ {0.1, 0.2, 0.3}, clusters ∈ {10, 20, 50}, N ∈ {500, 1000}, effects ∈ {small, medium}.
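
If these four factors are fully crossed (an assumption; the source lists the levels but not how they are combined), the grid can be enumerated as below. The dictionary keys are hypothetical names.

```python
from itertools import product

icc_levels = [0.1, 0.2, 0.3]
cluster_counts = [10, 20, 50]
sample_sizes = [500, 1000]
effect_sizes = ["small", "medium"]

# One dict per scenario in the assumed full-factorial grid
scenarios = [
    {"icc": icc, "n_clusters": k, "n": n, "effect": effect}
    for icc, k, n, effect in product(
        icc_levels, cluster_counts, sample_sizes, effect_sizes
    )
]
print(len(scenarios))  # 3 * 3 * 2 * 2 = 36
```

A full crossing yields 36 combinations, the same number as the core random-intercept scenarios with one predictor in the table above.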

Sensitivity scenarios systematically sweep one parameter while holding others fixed, producing power curves for visual and statistical comparison.

Pass/Fail Thresholds¶

Metric	Threshold	Strategy
Significance agreement rate	≥ 95%	1, 2
Beta estimate correlation	≥ 0.98	1, 2
SE estimate correlation	≥ 0.95	1, 2
τ² estimate correlation	≥ 0.95	1, 2
Power difference (absolute)	≤ 5 pp	3
Type I error rate	3%–7% (at α = 0.05)	3
z-test (FDR-corrected)	p > 0.05	4
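
The thresholds in the table can be written as simple predicates. The sketch below is hypothetical (function and argument names are not from the validation repository); the numeric limits are the ones listed above.

```python
def passes_solver_agreement(agreement_rate, beta_corr, se_corr, tau2_corr):
    """Strategies 1 and 2: agreement rate and estimate correlations."""
    return (
        agreement_rate >= 0.95
        and beta_corr >= 0.98
        and se_corr >= 0.95
        and tau2_corr >= 0.95
    )

def passes_power_comparison(power_a_pct, power_b_pct, max_diff_pp=5.0):
    """Strategy 3: absolute power difference in percentage points."""
    return abs(power_a_pct - power_b_pct) <= max_diff_pp

def passes_type1(rate_pct, low=3.0, high=7.0):
    """Strategy 3: Type I error rate band at alpha = 0.05."""
    return low <= rate_pct <= high

def passes_z_test(fdr_adjusted_p, alpha=0.05):
    """Strategy 4: the corrected power difference must not be significant."""
    return fdr_adjusted_p > alpha
```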

Latest Results¶

Result: 230/230 PASS

The validation report is published at: https://freestylerscientist.pl/reports/lme4-validation-report.html

How to Run¶

Internal test suite¶

# OLS tests only (fast, ~30s)
python -m pytest MCPower/tests/ -v -m "not lme"

# All tests including LME (~6 min)
python -m pytest MCPower/tests/ -v

# Accuracy tests only
python -m pytest MCPower/tests/specs/ -v

External LME4 validation¶

The LME4 cross-validation lives in a separate repository, MCPower-LME4-validation. Clone it and follow the setup and usage instructions in its README.

Validation Methodology¶

Why Monte Carlo margins?¶

Monte Carlo power estimates are inherently noisy: each estimate is a binomial proportion (the fraction of simulations in which the p-value falls below α), with standard error √[p(1−p)/n_sims], where p is the true power. MCPower uses 5,000 simulations for accuracy tests, so the SE is at most about 0.71 percentage points (at 50% power) and about 0.57 percentage points at 80% power.

The acceptance margin 3.5 × SE + 1pp uses z = 3.5 (Bonferroni correction for ~100 simultaneous tests) plus 1 percentage point for finite-sample approximation bias.
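
The link between z = 3.5 and roughly 100 simultaneous tests can be checked with a short calculation. The sketch assumes a family-wise α of 0.05 and two-sided limits; neither is stated explicitly in this section.

```python
from statistics import NormalDist

def bonferroni_z(n_tests, family_alpha=0.05):
    """Two-sided critical z after dividing alpha across n_tests."""
    per_test_alpha = family_alpha / n_tests
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)

print(round(bonferroni_z(100), 2))  # about 3.48, which rounds up to 3.5
```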

Why cross-validate against lme4?¶

For OLS models, exact analytical power formulas exist (non-central t and F distributions), so MCPower can be validated against theory. For mixed-effects models, no closed-form power formulas exist in general. The gold standard is R’s lme4 package (Bates et al., 2015), which MCPower’s C++ solver reimplements using the same profiled-deviance algorithm.

Cross-validation against lme4 verifies that:

MCPower’s C++ solver produces the same parameter estimates
MCPower’s data generation produces valid clustered data
MCPower’s power estimates match R’s independent estimates

Reproducibility¶

All tests use fixed random seeds (default: 2137 for MCPower tests, 42 for LME4 validation). Results are deterministic given the same seed and platform.

Learn More¶

LME Validation Details — detailed strategy descriptions and scenario configuration
Performance & Backends — C++ backend details and simulation precision
MCPower-LME4-validation — external validation repository