Tutorial: Finding the Right Sample Size¶
Goal: You want to find the minimum sample size needed for adequate power.
Full Working Example¶
You are designing an educational intervention study. A new teaching method is tested against the standard approach, while controlling for students’ prior achievement. You need to determine the minimum number of students to recruit.
from mcpower import MCPower
# 1. Define the model
model = MCPower("test_score = teaching_method + prior_achievement")
model.set_simulations(400)
# 2. Set variable types
model.set_variable_type("teaching_method=binary")
# 3. Set expected effect sizes
model.set_effects("teaching_method=0.40, prior_achievement=0.25")
# 4. Search for the minimum sample size
model.find_sample_size(
target_test="teaching_method",
from_size=50,
to_size=300,
by=28,
)
Variable types: teaching_method=(binary,0.5)
Effects: teaching_method=0.4, prior_achievement=0.25
Model settings applied successfully
================================================================================
SAMPLE SIZE ANALYSIS RESULTS
================================================================================
Sample Size Requirements:
Test Required N
-----------------------------------------------------
teaching_method 190
Step-by-Step Walkthrough¶
Steps 1-3: Model setup¶
model = MCPower("test_score = teaching_method + prior_achievement")
model.set_simulations(400)
model.set_variable_type("teaching_method=binary")
model.set_effects("teaching_method=0.40, prior_achievement=0.25")
This is the same setup as a basic power analysis (see Tutorial: Your First Power Analysis). We define a model with one binary predictor (teaching_method) and one continuous covariate (prior_achievement), then set their expected effect sizes.
teaching_method=0.40 – a small-to-medium effect for a binary predictor (between 0.20 small and 0.50 medium)
prior_achievement=0.25 – a medium effect for a continuous predictor
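These settings imply a simple data-generating model: a binary predictor and a standardized continuous covariate with effects of 0.40 and 0.25. A minimal sketch in plain NumPy (not MCPower code) that generates such data and recovers the effects:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # large n so the recovered coefficients are stable

teaching_method = rng.binomial(1, 0.5, n)    # binary predictor, p=0.5
prior_achievement = rng.standard_normal(n)   # standardized continuous covariate
noise = rng.standard_normal(n)

test_score = 0.40 * teaching_method + 0.25 * prior_achievement + noise

# Recover the effects with ordinary least squares
X = np.column_stack([np.ones(n), teaching_method, prior_achievement])
coefs, *_ = np.linalg.lstsq(X, test_score, rcond=None)
print(coefs[1], coefs[2])  # close to 0.40 and 0.25
```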
Step 4: Search for the minimum sample size¶
model.find_sample_size(
target_test="teaching_method",
from_size=50,
to_size=300,
by=28,
)
Instead of checking power at a single sample size, find_sample_size evaluates power across a search grid:
| Parameter | Value | Meaning |
|---|---|---|
| `target_test` | `"teaching_method"` | Find the sample size needed to detect this specific effect |
| `from_size` | `50` | Start searching at N=50 |
| `to_size` | `300` | Stop searching at N=300 |
| `by` | `28` | Evaluate power in steps of 28: N=50, 78, 106, …, 274 |
MCPower runs 400 simulations (as set with set_simulations) at each sample size in the grid and reports the smallest N where power first reaches the target (default: 80%).
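Conceptually, the loop can be sketched in plain NumPy (a simplified illustration, not MCPower's implementation: it drops the covariate and uses a two-sided z-test on the group difference):

```python
import numpy as np

def estimated_power(n, effect=0.40, sims=400, seed=0):
    """Fraction of simulated datasets in which the group difference is
    significant at alpha=0.05: a Monte Carlo power estimate at size n."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        group = rng.binomial(1, 0.5, n)
        y = effect * group + rng.standard_normal(n)
        n1 = group.sum()
        n0 = n - n1
        if n0 < 2 or n1 < 2:
            continue  # degenerate draw, count as a miss
        diff = y[group == 1].mean() - y[group == 0].mean()
        se = np.sqrt(y[group == 1].var(ddof=1) / n1 + y[group == 0].var(ddof=1) / n0)
        if abs(diff / se) > 1.96:  # two-sided z-test at alpha=0.05
            hits += 1
    return hits / sims

# Power at each point of the search grid
for n in range(50, 301, 28):
    print(n, estimated_power(n))
```

MCPower's actual engine accounts for the full model (including the covariate), but the structure is the same: simulate, test, count rejections, repeat for each N.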
The search grid concept¶
The search grid is the set of sample sizes that MCPower evaluates:
N=50 → N=78 → N=106 → ... → N=190 (first ≥ 80%) → ... → N=274
MCPower evaluates every point in the grid. Smaller by values give a more precise answer but take longer. Larger by values run faster but might overshoot the true minimum.
Practical advice:
- Start with `by=10` or `by=25` for a rough estimate
- Narrow the range and reduce `by` for a precise answer
- The default grid (`from_size=30`, `to_size=200`, `by=5`) works well for many common designs
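The grid is just an evenly spaced sequence of candidate sample sizes, so the precision/speed tradeoff is easy to see directly (a plain-Python illustration):

```python
# Candidate sample sizes for from_size=50, to_size=300
grid_coarse = list(range(50, 301, 28))  # by=28: only 9 points to simulate
grid_fine = list(range(50, 301, 5))     # by=5: 51 points, far slower but precise
print(len(grid_coarse), len(grid_fine))
```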
Output Interpretation¶
Sample Size Requirements:
Test Required N
-----------------------------------------------------
teaching_method 190
| Column | Meaning |
|---|---|
| Test | The coefficient being tested |
| Required N | The smallest sample size in the grid where power meets the target |
Verdict: You need at least 190 students (about 95 per group) to detect the teaching method effect at alpha = 0.05.
If the search range is exhausted without reaching the target, increase to_size or reconsider your effect size assumptions.
Adding Scenario Analysis¶
Use scenarios=True to see how the required sample size changes under realistic conditions:
model.find_sample_size(
target_test="teaching_method",
from_size=50,
to_size=300,
by=28,
scenarios=True,
)
================================================================================
SCENARIO-BASED MONTE CARLO POWER ANALYSIS RESULTS
================================================================================
================================================================================
SCENARIO SUMMARY
================================================================================
Uncorrected Sample Sizes:
Test Optimistic Realistic Doomer
-------------------------------------------------------------------------------
teaching_method 190 218 218
================================================================================
| Scenario | Min N | Interpretation |
|---|---|---|
| Optimistic | 190 | If everything goes perfectly |
| Realistic | 218 | Under mild violations of assumptions |
| Doomer | 218 | Under strong violations of assumptions |
Planning advice: Use the Realistic estimate (N=218) for your primary justification. Here the Doomer estimate is the same (N=218), so that budget also keeps the study robust against substantial assumption violations.
Raising the Target to 90% Power¶
The default target is 80%. To use 90% power instead:
model.set_power(90)
model.find_sample_size(
target_test="teaching_method",
from_size=50,
to_size=400,
by=40,
)
================================================================================
SAMPLE SIZE ANALYSIS RESULTS
================================================================================
Sample Size Requirements:
Test Required N
-----------------------------------------------------
teaching_method 250
Moving from 80% to 90% power typically requires roughly a third more participants (here, from 190 to 250).
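This jump matches the standard normal-approximation rule, where required N scales with (z₁₋α/₂ + z_power)². A quick back-of-the-envelope check (an approximation, not how MCPower computes it):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

# Ratio of required N at 90% power vs 80% power, two-sided alpha=0.05
ratio = ((z(0.975) + z(0.90)) / (z(0.975) + z(0.80))) ** 2
print(round(ratio, 2))  # about 1.34: roughly a third more participants
```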
Common Variations¶
Detailed output with power curve¶
model.find_sample_size(
target_test="teaching_method",
from_size=50,
to_size=300,
by=28,
summary="long",
)
The summary="long" option prints a full table showing power at every sample size in the grid, and displays a power curve plot (when running in an environment that supports plotting). This is useful for seeing how power grows across the range, rather than just the single threshold crossing.
Test all effects¶
model.find_sample_size(
target_test="all",
from_size=50,
to_size=300,
by=28,
)
This reports the minimum N for each predictor and the overall F-test. Different effects may require different sample sizes.
Combine scenarios with corrections¶
model.find_sample_size(
target_test="teaching_method",
from_size=50,
to_size=400,
by=40,
scenarios=True,
correction="bonferroni",
)
When testing multiple effects with a correction for multiple comparisons, the required sample size increases. Scenario analysis then shows how much additional buffer you need for assumption violations.
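Under the same normal approximation, you can gauge how much a Bonferroni-corrected alpha inflates the required N before running any simulations (an illustration assuming the two tested coefficients in this model; MCPower estimates the exact figure by simulation):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

alpha, m = 0.05, 2          # two tested coefficients
alpha_corr = alpha / m      # Bonferroni: 0.025 per test

# Inflation in required N at 80% power from the stricter alpha
inflation = ((z(1 - alpha_corr / 2) + z(0.80)) / (z(1 - alpha / 2) + z(0.80))) ** 2
print(round(inflation, 2))  # about 1.21: roughly 20% more participants
```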
Coarse-to-fine search¶
For efficient searching, start coarse and then zoom in:
# Step 1: Rough scan
model.find_sample_size(
target_test="teaching_method",
from_size=50,
to_size=500,
by=50,
)
# Result: Min N ≈ 200
# Step 2: Precise scan around the rough estimate
model.find_sample_size(
target_test="teaching_method",
from_size=150,
to_size=250,
by=12,
)
# Result: Min N = 200
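The same two-pass logic can be written as a generic helper, shown here around a made-up power curve (a sketch, not MCPower's API):

```python
def first_crossing(power_at, from_size, to_size, by, target=0.80):
    """Return the first grid point where power_at(n) reaches target."""
    for n in range(from_size, to_size + 1, by):
        if power_at(n) >= target:
            return n
    return None  # search range exhausted: widen to_size

# A hypothetical, smoothly increasing power curve for demonstration
power_at = lambda n: 1 - 0.99 ** n

rough = first_crossing(power_at, 50, 500, by=50)          # Step 1: coarse scan
fine = first_crossing(power_at, rough - 50, rough, by=5)  # Step 2: zoom in
print(rough, fine)
```

The coarse pass overshoots the true minimum by up to one coarse step; the fine pass then recovers the precision at a fraction of the cost of scanning the whole range finely.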
Get results programmatically¶
result = model.find_sample_size(
target_test="teaching_method",
from_size=50,
to_size=300,
by=28,
return_results=True,
print_results=False,
)
min_n = result["results"]["first_achieved"]["teaching_method"]
print(f"Minimum sample size: {min_n}")
Next Steps¶
Tutorial: Your First Power Analysis – Check power at a specific sample size
Tutorial: Testing Interactions – Interactions typically require larger samples
Tutorial: Correlated Predictors – Predictor correlations affect sample size requirements
Effect Sizes – Guidelines for choosing appropriate effect sizes
Scenario Analysis – Understanding the three scenarios in depth
API Reference – Full parameter documentation for find_sample_size