# How It Is Validated

MCPower's statistical engine is validated through two complementary systems: an **internal test suite** covering OLS and mixed-effects accuracy, and an **external cross-validation framework** comparing MCPower's LME solver against R's lme4 package.

## Internal Test Suite

MCPower includes ~11,000 lines of tests organized into specs (accuracy/validation), integration, unit, and mixed-model tests. The specs tests are the core statistical validation.

### Power Accuracy Tests

Monte Carlo power estimates are compared against **exact analytical power** from non-central t and F distributions.

**OLS models** (`test_power_accuracy.py`):
- Single predictor: 5 parametrized cases varying β and N
- Two uncorrelated predictors: 3 cases with Σ = I
- Two correlated predictors: 4 cases with VIF correction (ρ = 0.3, 0.5, 0.7)

**Acceptance criterion:** MC estimate within `3.5 × √[p(1−p)/5000] × 100 + 1pp` of the analytical value. This is a Bonferroni-safe margin (~2-3 percentage points at typical power levels) using 5,000 simulations.

**LME models** (`test_power_accuracy_lme.py`):
- Single predictor z-test: 7 parametrized cases
- Single predictor likelihood-ratio test: 3 cases
- Two uncorrelated predictors: 2 cases
- Two correlated predictors: 2 cases (ρ = 0.3, 0.5)

All LME accuracy tests use m = 50 observations per cluster, where the within-cluster design effect is small (~1.02–1.06), allowing comparison against analytical formulas.

### Type I Error Control

Under the null hypothesis (all effects = 0), the rejection rate must equal the nominal α.

**OLS** (`test_type1_error.py`):
- Single predictor null (F-test and t-test)
- Two predictors null (each rejects at ~α)
- Large-sample null (catches bugs where power inflates with N)
- Alpha calibration at α ∈ {0.01, 0.05, 0.10}

**LME** (`test_type1_error_lme.py`):
- Same structure with K = 20–50 clusters, ICC = 0.2
- Alpha calibration across standard levels

**Criterion:** |observed rejection rate − α × 100| < MC margin

### Monotonicity Tests

Power must **strictly increase** with:
- Effect size (larger β → more power)
- Sample size (larger N → more power)
- Significance level (larger α → more power)

Tested for both OLS (`test_monotonicity.py`) and LME (`test_monotonicity_lme.py`) models. These tests catch subtle implementation bugs that wouldn't violate accuracy bounds but would produce nonsensical results.

### Multiple Comparison Corrections

**Correction conservativeness** (`test_corrections.py`):
- Corrected power ≤ uncorrected power under H₀
- Bonferroni more conservative than FDR
- FWER ≤ α for Bonferroni and Holm

**Extended alpha validation** (`test_alpha_levels.py`):
- 9 tests validating Bonferroni/Holm/FDR at non-default α ∈ {0.01, 0.10}
- Multi-predictor null calibration with corrections

### LME Accuracy Tests

The analytical formulas used as benchmarks for LME tests:

**Design effect (within-cluster):**
$$D_{\text{eff}} = \frac{1 + (m-1) \times \text{ICC}}{1 + (m-2) \times \text{ICC}}$$

This is much milder than the between-cluster design effect for iid predictors — typically 1.02–1.06 for m = 50.

**z-test non-centrality parameter:**
$$\text{NCP} = \frac{\beta \sqrt{n_{\text{eff}}}}{\sigma \sqrt{\text{VIF} \times D_{\text{eff}}}}$$

**Likelihood-ratio test NCP:**
$$\text{NCP} = \frac{n \cdot \boldsymbol{\beta}' \Sigma \boldsymbol{\beta}}{\sigma^2 \times D_{\text{eff}}}$$

---

## External Cross-Validation (LME4)

MCPower's C++ LME solver is cross-validated against R's lme4 package using the **[MCPower-LME4-validation](https://github.com/pawlenartowicz/MCPower-LME4-validation)** framework. This is a separate repository with its own test harness.

### Four Validation Strategies

| Strategy | What It Tests | How |
|----------|--------------|-----|
| **1. External Data Agreement** | Do MCPower and lme4 reach the same significance decision on identical data? | Generate data with numpy, fit both solvers, compare significance decisions. Target: ≥95% agreement rate. |
| **2. MCPower Pipeline Validation** | Does MCPower's full pipeline (data generation → fitting) produce results consistent with lme4? | Extract raw data from MCPower's simulations, re-fit with lme4, compare significance decisions. |
| **3. Parallel Power Simulation** | Do independent power simulations produce the same power estimate? | Both MCPower and R independently generate data and estimate power. Target: |difference| ≤ 5 percentage points. |
| **4. Statistical z-Test** | Is the power difference statistically significant? | Two-proportion z-test on the power estimates from Strategy 3, with Benjamini-Hochberg FDR correction across all scenarios. |

Strategy 1 validates the **solver** in isolation. Strategy 2 validates the **full pipeline** (including data generation). Strategy 3 validates **end-to-end power estimates**. Strategy 4 provides **statistical rigor** for the power comparison.

### Scenario Coverage

**95 unique scenarios** across three model types:

| Model Type | Core | Sensitivity | Total |
|------------|------|-------------|-------|
| Random intercepts (1 predictor) | 36 | 24 | 60 |
| Random intercepts (2 predictors) | 2 | 8 | 10 |
| Random slopes | 4 | 10 | 14 |
| Nested effects | 3 | 8 | 11 |
| **Total** | **45** | **50** | **95** |

Core scenarios run all 4 strategies (45 × 4 = 180 tests). Sensitivity scenarios run Strategy 4 only (50 × 1 = 50 tests). **Total: 230 scenario-strategy combinations.**

Core scenarios vary: ICC ∈ {0.1, 0.2, 0.3}, clusters ∈ {10, 20, 50}, N ∈ {500, 1000}, effects ∈ {small, medium}.

Sensitivity scenarios systematically sweep one parameter while holding others fixed, producing power curves for visual and statistical comparison.

### Pass/Fail Thresholds

| Metric | Threshold | Strategy |
|--------|-----------|----------|
| Significance agreement rate | ≥ 95% | 1, 2 |
| Beta estimate correlation | ≥ 0.98 | 1, 2 |
| SE estimate correlation | ≥ 0.95 | 1, 2 |
| τ² estimate correlation | ≥ 0.95 | 1, 2 |
| Power difference (absolute) | ≤ 5 pp | 3 |
| Type I error rate | 3%–7% (at α = 0.05) | 3 |
| z-test (FDR-corrected) | p > 0.05 | 4 |

### Latest Results

**Result:** **230/230 PASS**

The validation report is published at: https://freestylerscientist.pl/reports/lme4-validation-report.html

---

## How to Run

### Internal test suite

```bash
# OLS tests only (fast, ~30s)
python -m pytest MCPower/tests/ -v -m "not lme"

# All tests including LME (~6 min)
python -m pytest MCPower/tests/ -v

# Accuracy tests only
python -m pytest MCPower/tests/specs/ -v
```

### External LME4 validation

The LME4 cross-validation lives in a **separate repository**: [MCPower-LME4-validation](https://github.com/pawlenartowicz/MCPower-LME4-validation). Clone it and follow the instructions in its README.

See the [MCPower-LME4-validation repository](https://github.com/pawlenartowicz/MCPower-LME4-validation) README for setup instructions and usage.

---

## Validation Methodology

### Why Monte Carlo margins?

Monte Carlo power estimates are inherently noisy — each estimate is a binomial proportion (fraction of simulations where p < α). The standard error is `√[p(1−p)/n_sims]`. MCPower uses 5,000 simulations for accuracy tests, giving SE ≈ 1% at typical power levels.

The acceptance margin `3.5 × SE + 1pp` uses z = 3.5 (Bonferroni correction for ~100 simultaneous tests) plus 1 percentage point for finite-sample approximation bias.

### Why cross-validate against lme4?

For OLS models, exact analytical power formulas exist (non-central t and F distributions), so MCPower can be validated against theory. For mixed-effects models, no closed-form power formulas exist in general. The gold standard is R's lme4 package (Bates et al., 2015), which MCPower's C++ solver reimplements using the same profiled-deviance algorithm.

Cross-validation against lme4 verifies that:
1. MCPower's C++ solver produces the same parameter estimates
2. MCPower's data generation produces valid clustered data
3. MCPower's power estimates match R's independent estimates

### Reproducibility

All tests use fixed random seeds (default: 2137 for MCPower tests, 42 for LME4 validation). Results are deterministic given the same seed and platform.

---

## Learn More

- **[LME Validation Details](lme-validation.md)** — detailed strategy descriptions and scenario configuration
- **[Performance & Backends](concepts/performance.md)** — C++ backend details and simulation precision
- **[MCPower-LME4-validation](https://github.com/pawlenartowicz/MCPower-LME4-validation)** — external validation repository