--- jupytext: text_representation: format_name: myst kernelspec: name: python3 --- # Tutorial: Using Custom Scenarios **Goal:** You want to customize the scenario analysis to match the specific assumption violations you expect in your field. ```{code-cell} ipython3 :tags: [remove-input, remove-output] import numpy as np np.random.seed(42) import warnings warnings.filterwarnings("ignore", message="Low simulation") ``` --- ## Table of Contents - [Full Working Example](#full-working-example) - [Step-by-Step Walkthrough](#step-by-step-walkthrough) - [Output Interpretation](#output-interpretation) - [Adding Entirely New Scenarios](#adding-entirely-new-scenarios) - [Custom Scenarios for Mixed Models](#custom-scenarios-for-mixed-models) - [Common Patterns](#common-patterns) - [Next Steps](#next-steps) --- ## Full Working Example You are planning a study in educational psychology where heterogeneity in effect sizes is a major concern but distribution violations are unlikely (you have good measurement instruments). The default scenarios apply too much distribution noise and too little effect heterogeneity for your situation. ```{code-cell} ipython3 :tags: [remove-stderr] from mcpower import MCPower # 1. Define the model model = MCPower("achievement = teaching_method + motivation + ses") model.set_simulations(400) model.set_variable_type("teaching_method=binary") model.set_effects("teaching_method=0.40, motivation=0.30, ses=0.20") # 2. Customize scenarios for your field model.set_scenario_configs({ "realistic": { "heterogeneity": 0.35, # More effect heterogeneity "distribution_change_prob": 0.10, # Less distribution noise }, "doomer": { "heterogeneity": 0.60, # Much more effect heterogeneity "distribution_change_prob": 0.20, # Still mild distribution noise "correlation_noise_sd": 0.3, # Moderate correlation instability }, }) # 3. Find required sample sizes under custom scenarios model.find_sample_size( target_test="teaching_method", from_size=50, to_size=400, by=40, scenarios=True, ) ``` --- ## Step-by-Step Walkthrough ### Lines 1-3: Model setup ```{code-cell} ipython3 :tags: [remove-output] model = MCPower("achievement = teaching_method + motivation + ses") model.set_simulations(400) model.set_variable_type("teaching_method=binary") model.set_effects("teaching_method=0.40, motivation=0.30, ses=0.20") ``` A standard three-predictor model. Nothing changes about model setup when using custom scenarios. ### Line 4: Customize scenario parameters ```{code-cell} ipython3 :tags: [remove-output] model.set_scenario_configs({ "realistic": { "heterogeneity": 0.35, "distribution_change_prob": 0.10, }, "doomer": { "heterogeneity": 0.60, "distribution_change_prob": 0.20, "correlation_noise_sd": 0.3, }, }) ``` This is the key step. `set_scenario_configs()` takes a dictionary mapping scenario names to configuration overrides. The merge rules are: 1. **You only specify what you want to change.** Any key you omit keeps its default value. 2. **Existing scenarios are updated.** Here, `"realistic"` and `"doomer"` already exist, so only the specified keys are overridden. 3. **The optimistic scenario is never modified.** It always uses your exact settings with zero perturbations. For example, the realistic scenario above changes `heterogeneity` from 0.2 (default) to 0.35 and `distribution_change_prob` from 0.3 to 0.10, but keeps `heteroskedasticity` at 0.1, `correlation_noise_sd` at 0.2, and all other keys at their default values. ### Line 5: Run with scenarios ```{code-cell} ipython3 :tags: [remove-stderr] model.find_sample_size( target_test="teaching_method", from_size=50, to_size=400, by=40, scenarios=True, ) ``` The `scenarios=True` flag runs the analysis under all configured scenarios (optimistic, realistic, doomer, plus any custom scenarios you've added). This works the same way with `find_power()`. You can also pass a list of scenario names to run only specific ones — see [Running Selected Scenarios](#running-selected-scenarios). --- ## Output Interpretation ``` Test Optimistic Realistic Doomer ------------------------------------------------------------------------------- teaching_method 80 100 160 ``` - **Optimistic (N=80):** Under ideal conditions, 80 participants suffice. - **Realistic (N=100):** With your field-specific heterogeneity, you need 100 participants. - **Doomer (N=160):** Under worst-case conditions, you need 160 participants. **Recommendation:** Plan for N=100 (the realistic estimate). If the study is high-stakes, plan for N=160 to be safe under worst-case conditions. --- ## Adding Entirely New Scenarios You are not limited to the three built-in scenarios. Add custom scenarios with any name: ```{code-cell} ipython3 :tags: [remove-input, remove-output] # Setup model for new scenario examples model = MCPower("outcome = treatment + covariate") model.set_simulations(400) model.set_variable_type("treatment=binary") model.set_effects("treatment=0.40, covariate=0.25") ``` ```{code-cell} ipython3 :tags: [remove-stderr] model.set_scenario_configs({ "field_specific": { "heterogeneity": 0.30, "heteroskedasticity": 0.05, "correlation_noise_sd": 0.15, "distribution_change_prob": 0.10, }, }) model.find_power(sample_size=150, target_test="treatment", scenarios=True) # Runs: optimistic, realistic, doomer, AND field_specific ``` New scenarios inherit all keys from the optimistic baseline (all zeros), so you only specify the perturbations you want. The analysis now runs four scenarios instead of three. ### Multiple custom scenarios ```{code-cell} ipython3 :tags: [remove-stderr] model.set_scenario_configs({ "mild": { "heterogeneity": 0.10, "heteroskedasticity": 0.05, }, "moderate": { "heterogeneity": 0.25, "heteroskedasticity": 0.15, "correlation_noise_sd": 0.20, }, "severe": { "heterogeneity": 0.50, "heteroskedasticity": 0.30, "correlation_noise_sd": 0.40, "distribution_change_prob": 0.50, }, }) model.find_power(sample_size=200, scenarios=True) # Runs: optimistic, realistic, doomer, mild, moderate, AND severe ``` Note that the built-in realistic and doomer scenarios still run alongside your custom ones. ### Running Selected Scenarios You don't have to run all scenarios every time. Pass a list of scenario names (case-insensitive) to run only the ones you need: ```{code-cell} ipython3 :tags: [remove-stderr] # Run only optimistic and your custom "severe" scenario model.find_power(sample_size=200, scenarios=["optimistic", "severe"]) ``` This saves computation when you only want to compare specific scenarios. Unknown names raise a `ValueError` listing all available scenarios. --- ## Custom Scenarios for Mixed Models Mixed models have additional perturbation keys that control random effect and ICC violations. These keys are ignored for standard OLS models. ```{code-cell} ipython3 :tags: [remove-output] model = MCPower("score ~ treatment + (1|school)") model.set_simulations(400) model.set_cluster("school", ICC=0.15, n_clusters=25) model.set_effects("treatment=0.40") model.set_scenario_configs({ "realistic": { "heterogeneity": 0.25, "icc_noise_sd": 0.20, # More ICC instability "random_effect_dist": "normal", # Keep random effects normal }, "doomer": { "heterogeneity": 0.45, "icc_noise_sd": 0.40, # Heavy ICC instability "random_effect_dist": "heavy_tailed", "random_effect_df": 3, # Very heavy tails "residual_change_prob": 0.6, "residual_df": 4, }, }) ``` ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] model.find_power(sample_size=750, scenarios=True) ``` ### Mixed model configuration keys | Key | What It Does | Default (Realistic) | Default (Doomer) | |---|---|---|---| | `icc_noise_sd` | Jitters the ICC per simulation (multiplicative noise on tau-squared) | 0.15 | 0.30 | | `random_effect_dist` | Distribution of random effects: `"normal"`, `"heavy_tailed"`, or `"skewed"` | `"heavy_tailed"` | `"heavy_tailed"` | | `random_effect_df` | Degrees of freedom when random effects are non-normal | 5 | 3 | Lower `random_effect_df` means heavier tails (more extreme cluster deviations). These keys are silently ignored for models without cluster specifications. --- ## Common Patterns ### When effect sizes are well-established If your field has strong meta-analytic estimates and the main concern is non-normality: ```{code-cell} ipython3 :tags: [remove-output] model.set_scenario_configs({ "realistic": { "heterogeneity": 0.05, # Effects are well-known "residual_change_prob": 0.5, # But distributions may be non-normal "residual_df": 8, }, "doomer": { "heterogeneity": 0.10, "residual_change_prob": 0.8, "residual_df": 4, }, }) ``` ### When correlations are uncertain If predictor correlations come from a small pilot study: ```{code-cell} ipython3 :tags: [remove-output] model.set_scenario_configs({ "realistic": { "correlation_noise_sd": 0.35, # More noise than default (0.2) }, "doomer": { "correlation_noise_sd": 0.50, # Much more noise than default (0.4) }, }) ``` ### Disabling a specific perturbation Set a key to its optimistic value (usually 0.0 or `"normal"`) to disable it: ```{code-cell} ipython3 :tags: [remove-output] model.set_scenario_configs({ "realistic": { "distribution_change_prob": 0.0, # No distribution swaps "heteroskedasticity": 0.0, # No heteroskedasticity }, }) ``` ### Combine custom scenarios with uploaded data Custom scenarios work alongside `upload_data()`. The scenario perturbations (correlation noise, distribution swaps) are applied on top of the data-driven generation: ```python import pandas as pd data = pd.read_csv("pilot_data.csv") model = MCPower("outcome = treatment + age + severity") model.upload_data(data[["age", "severity"]]) model.set_variable_type("treatment=binary") model.set_effects("treatment=0.50, age=0.20, severity=0.30") model.set_scenario_configs({ "realistic": {"heterogeneity": 0.30}, }) model.find_sample_size( target_test="treatment", from_size=50, to_size=300, scenarios=True, ) ``` --- ## Configuration Reference All available configuration keys with their defaults: | Key | Type | Optimistic | Realistic | Doomer | Description | |---|---|---|---|---|---| | `heterogeneity` | float | 0.0 | 0.2 | 0.4 | SD of per-simulation effect size multiplier | | `heteroskedasticity` | float | 0.0 | 0.15 | 0.35 | Correlation between predicted values and error SD | | `correlation_noise_sd` | float | 0.0 | 0.15 | 0.30 | SD of noise added to predictor correlations | | `distribution_change_prob` | float | 0.0 | 0.5 | 0.8 | Probability a variable's distribution is swapped | | `new_distributions` | list | `["right_skewed", "left_skewed", "uniform"]` | same | same | Pool of replacement distributions | | `residual_change_prob` | float | 0.0 | 0.5 | 0.8 | Probability residuals use non-normal distribution | | `residual_df` | int | 10 | 8 | 5 | Degrees of freedom for non-normal residuals | | `residual_dists` | list | `["heavy_tailed", "skewed"]` | same | same | Pool of non-normal residual distributions | | `icc_noise_sd` | float | 0.0 | 0.15 | 0.30 | SD of multiplicative noise on ICC (mixed models only) | | `random_effect_dist` | str | `"normal"` | `"heavy_tailed"` | `"heavy_tailed"` | Random effect distribution (mixed models only) | | `random_effect_df` | int | 5 | 10 | 3 | DF for non-normal random effects (mixed models only) | --- ## Next Steps - **[Scenario Analysis](../concepts/scenario-analysis.md)** -- Conceptual guide to how scenarios work and when to use them - **[Tutorial: Your First Power Analysis](first-analysis.md)** -- Basics of `find_power` with a simple scenario example - **[Tutorial: Finding the Right Sample Size](sample-size.md)** -- Systematic sample size search - **[Tutorial: Mixed Models - Random Intercepts](mixed-intercepts.md)** -- Cluster-based models where mixed model scenario keys apply - **[API: set_scenario_configs](../api/scenarios.md)** -- Full parameter reference - **[API Reference](../api/index.md)** -- All configuration methods