--- jupytext: text_representation: format_name: myst kernelspec: name: python3 --- # Configuration ```{code-cell} ipython3 :tags: [remove-input, remove-output] import warnings warnings.filterwarnings("ignore", message="Low simulation") import numpy as np np.random.seed(42) ``` Methods for configuring effect sizes, variable types, correlations, and simulation parameters. --- (set-effects)= ## set_effects() ```{eval-rst} .. automethod:: mcpower.MCPower.set_effects ``` ### String Format Comma-separated `name=value` pairs: ```python model.set_effects("x1=0.5, x2=0.3") ``` **Interaction effects** -- use `:` notation: ```python model.set_effects("x1=0.5, x2=0.3, x1:x2=0.2") ``` **Factor variables** -- assign different effects per level using bracket notation: ```python # Integer-indexed levels (no uploaded data or named levels) model.set_effects("group[2]=0.4, group[3]=0.6") # Named levels (after set_factor_levels or upload_data) model.set_effects("group[drug_a]=0.4, group[drug_b]=0.6") ``` ### Updating Effects After running an analysis, a new `set_effects()` updates (merges with) the previously applied effects: ```python model.set_effects("x1=0.5, x2=0.3") model.find_power(sample_size=100) model.set_effects("x2=0.4") # x1 remains 0.5, x2 is now 0.4 model.find_power(sample_size=100) # uses x1=0.5, x2=0.4 ``` ### Examples ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] from mcpower import MCPower model = MCPower("y = treatment + motivation + treatment:motivation") model.set_simulations(400) model.set_variable_type("treatment=binary") model.set_effects("treatment=0.5, motivation=0.3, treatment:motivation=0.2") model.find_power(sample_size=100) ``` ### Notes - Effect sizes are **standardized** -- they represent the change in outcome (in SDs) per 1 SD change in the predictor. - For binary predictors, the effect size represents the difference between the two groups in standard deviation units (equivalent to Cohen's d). - For factor variables, each dummy's effect size represents the difference between that level and the reference level. - A common guideline: 0.2 = small, 0.5 = medium, 0.8 = large (Cohen's conventions). - The method raises `TypeError` if the argument is not a string and `ValueError` if it is empty. ### See Also - [Effect Sizes](../concepts/effect-sizes.md) -- Understanding and choosing effect sizes - {ref}`set_factor_levels() ` -- Define named factor levels for bracket notation - {ref}`set_variable_type() ` -- Declare binary/factor variables before setting effects - [Tutorial: Your First Power Analysis](../tutorials/first-analysis.md) -- Complete walkthrough including effect sizes --- (set-variable-type)= ## set_variable_type() ```{eval-rst} .. automethod:: mcpower.MCPower.set_variable_type ``` ### Supported Types | Type String | Description | Generated Distribution | |---|---|---| | `normal` | Standard normal (default) | N(0, 1) | | `binary` | Binary variable with 50/50 split | Bernoulli(0.5) | | `(binary, p)` | Binary with custom proportion | Bernoulli(p), where 0 < p < 1 | | `(factor, k)` | Factor with k levels, equal proportions | k-1 dummy variables | | `(factor, p1, p2, ..., pk)` | Factor with custom level proportions | k-1 dummies; proportions are normalized to sum to 1 | | `right_skewed` | Right-skewed (heavy right tail) | Chi-squared-like transform | | `left_skewed` | Left-skewed (heavy left tail) | Mirrored right-skew | | `high_kurtosis` | Heavy-tailed (leptokurtic) | t-distribution (df=3) | | `uniform` | Uniform distribution | U(0, 1) transformed | ### Examples ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] from mcpower import MCPower # Basic type declarations model = MCPower("y = treatment + condition + income") model.set_simulations(400) model.set_variable_type("treatment=binary, condition=(factor,3), income=right_skewed") model.set_effects("treatment=0.5, condition[2]=0.3, condition[3]=0.4, income=0.2") model.find_power(sample_size=150) ``` **Binary with custom proportion:** ```python model.set_variable_type("treatment=(binary,0.3)") # 30% in treatment group ``` **Factor with equal proportions:** ```python model.set_variable_type("condition=(factor,3)") # 3 levels, ~33% each ``` **Factor with custom proportions:** ```python # 3 levels with proportions 20%, 50%, 30% -- must sum to 1.0 model.set_variable_type("group=(factor,0.2,0.5,0.3)") ``` **Updating types** -- calling again updates existing entries without clearing others: ```python model.set_variable_type("x1=binary, x2=right_skewed") model.set_variable_type("x2=normal") # x1 remains binary, x2 is now normal ``` ### Notes - Factor variables create k-1 dummy variables (level 1 is the reference by default). After declaring a factor, use bracket notation in `set_effects()` to assign effects to each dummy. - When `upload_data()` is used, variable types are auto-detected and typically do not need to be set manually. Use `set_variable_type()` to override auto-detection. - Validation of types and proportions happens when `find_power()` or `find_sample_size()` is called. ### See Also - [Variable Types](../concepts/variable-types.md) -- Full guide to variable types and distributions - {ref}`set_effects() ` -- Setting effect sizes for typed variables - {ref}`set_factor_levels() ` -- Define named factor levels - {ref}`upload_data() ` -- Automatic type detection from empirical data --- (set-correlations)= ## set_correlations() ```{eval-rst} .. automethod:: mcpower.MCPower.set_correlations ``` ### Input Formats **String format -- full syntax:** ```python model.set_correlations("corr(x1, x2)=0.3, corr(x1, x3)=-0.2") ``` **String format -- shorthand** (the `corr()` wrapper is optional): ```python model.set_correlations("(x1, x2)=0.3, (x1, x3)=-0.2") ``` **NumPy matrix** -- dimensions must match the number of non-factor predictors, in formula order: ```python import numpy as np # For a model with predictors x1, x2, x3 (all continuous) model.set_correlations(np.array([ [1.0, 0.3, -0.2], [0.3, 1.0, 0.1], [-0.2, 0.1, 1.0], ])) ``` ### Correlation Values - Valid range: -1 to 1 (exclusive of exact -1 and 1 for off-diagonal entries) - Diagonal entries must be 1.0 (for matrix input) - The matrix must be symmetric - The matrix must be **positive semi-definite** (PSD) -- MCPower validates this and raises an error if the matrix is not PSD ### Examples ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] from mcpower import MCPower model = MCPower("y = x1 + x2 + x3") model.set_simulations(400) model.set_effects("x1=0.5, x2=0.3, x3=0.2") model.set_correlations("(x1, x2)=0.4, (x2, x3)=0.2") model.find_power(sample_size=100) ``` ### Notes - **Factor variables cannot be correlated.** Correlations are defined only between continuous and binary predictors. - Unspecified pairs default to zero correlation (independence). - When using `upload_data()` with `preserve_correlation="partial"`, correlations are computed from the data and merged with any user-specified values. With `preserve_correlation="strict"` (the default), the full row-bootstrap approach preserves the empirical correlation structure automatically. ### See Also - [Correlations](../concepts/correlations.md) -- Full guide to predictor correlations - {ref}`upload_data() ` -- Automatic correlation preservation from empirical data - [Tutorial: Your First Power Analysis](../tutorials/first-analysis.md) -- Adding correlations to a model --- (set-alpha)= ## set_alpha() ```{eval-rst} .. automethod:: mcpower.MCPower.set_alpha ``` ### Common Alpha Levels | Alpha | Use Case | |---|---| | 0.05 | Standard threshold (default) | | 0.01 | Stricter threshold, common in some fields | | 0.005 | Proposed "redefine statistical significance" threshold | | 0.10 | Exploratory research, pilot studies | ### Examples ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] from mcpower import MCPower # Use stricter significance threshold model = MCPower("y = x1 + x2") model.set_simulations(400) model.set_effects("x1=0.5, x2=0.3") model.set_alpha(0.01) model.find_power(sample_size=100) ``` ```{code-cell} ipython3 :tags: [remove-output] # Chained model = ( MCPower("y = x1 + x2") .set_effects("x1=0.5, x2=0.3") .set_alpha(0.01) ) ``` --- (set-power)= ## set_power() ```{eval-rst} .. automethod:: mcpower.MCPower.set_power ``` ### Common Power Targets | Power | Use Case | |---|---| | 80% | Standard target (default). Accepted in most fields. | | 90% | Higher confidence. Common for clinical trials and well-funded studies. | | 95% | Very conservative. Requires substantially larger samples. | ### Examples ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] from mcpower import MCPower # Require 90% power instead of the default 80% model = MCPower("y = x1 + x2") model.set_simulations(400) model.set_effects("x1=0.5, x2=0.3") model.set_power(90) model.find_sample_size(from_size=50, to_size=300, by=30) ``` --- (set-seed)= ## set_seed() ```{eval-rst} .. automethod:: mcpower.MCPower.set_seed ``` ### Examples ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] from mcpower import MCPower model = MCPower("y = x1 + x2") model.set_simulations(400) model.set_effects("x1=0.5, x2=0.3") # Reproducible results model.set_seed(42) model.find_power(sample_size=100) # Always produces the same output ``` ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] # Random seeding (different results each run) model.set_seed(None) model.find_power(sample_size=100) ``` ### Notes - The C++ backend uses `std::mt19937`, while Python uses `numpy.random`. The same seed produces **different** random sequences across backends, but statistical properties (power estimates) are comparable. - The default seed is `2137`. Change it if you want a different reproducible sequence. --- (set-simulations)= ## set_simulations() ```{eval-rst} .. automethod:: mcpower.MCPower.set_simulations ``` ### Default Simulation Counts | Model Type | Default Count | |---|---| | OLS (linear regression) | 1,600 | | Mixed-effects models | 800 | Mixed-effects models use fewer simulations by default because each simulation is more computationally expensive (LME fitting vs. OLS). ### Precision vs. Runtime | Simulations | Approximate SE of Power Estimate | Use Case | |---|---|---| | 400 | ~2.5% | Quick exploration | | 800 | ~1.8% | Mixed-model default | | 1,600 | ~1.2% | OLS default; good for most analyses | | 5,000 | ~0.7% | High-precision estimates | | 10,000 | ~0.5% | Publication-quality precision | The standard error of a power estimate at 80% power is approximately `sqrt(0.8 * 0.2 / n_sims)`. ### Examples ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] from mcpower import MCPower # Set both OLS and mixed to the same count model = MCPower("y = x1 + x2") model.set_simulations(3200) model.set_effects("x1=0.5, x2=0.3") model.find_power(sample_size=100) ``` ```{code-cell} ipython3 :tags: [remove-output] # Set OLS and mixed independently model = MCPower("satisfaction ~ treatment + (1|school)") model.set_simulations(2000, model_type="linear") model.set_simulations(1000, model_type="mixed") # Method chaining model = ( MCPower("y = x1 + x2") .set_simulations(3200) .set_effects("x1=0.5, x2=0.3") ) ``` --- (set-parallel)= ## set_parallel() ```{eval-rst} .. automethod:: mcpower.MCPower.set_parallel ``` ### Parallelization Modes | Value | Behavior | |---|---| | `True` | Parallel processing for all analyses (OLS and mixed). | | `False` | Sequential processing only. | | `"mixedmodels"` | Parallel only for mixed-model analyses; OLS stays sequential. **(Default)** | ### When to Use Each Mode | Scenario | Recommended Mode | Reasoning | |---|---|---| | OLS with default 1,600 sims | `"mixedmodels"` (default) | C++ backend is fast enough; parallel overhead not worthwhile. | | OLS with 5,000+ sims | `True` | High simulation count justifies parallelization overhead. | | Mixed models | `"mixedmodels"` (default) | LME fitting is expensive; parallel processing helps substantially. | | Debugging / profiling | `False` | Sequential execution is easier to reason about and profile. | | Resource-constrained environment | `False` or low `n_cores` | Avoid saturating shared machines. | ### Examples ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] from mcpower import MCPower # Default: parallel for mixed models only model = MCPower("y ~ treatment + (1|school)") model.set_simulations(400) model.set_cluster("school", ICC=0.2, n_clusters=20) model.set_effects("treatment=0.5") model.find_power(sample_size=1000) # Runs in parallel automatically ``` ```{code-cell} ipython3 :tags: [remove-output, remove-stderr] # Force parallel for OLS (useful with very high simulation counts) model = MCPower("y = x1 + x2 + x3 + x4 + x5") model.set_effects("x1=0.5, x2=0.3, x3=0.2, x4=0.1, x5=0.4") model.set_simulations(10000) model.set_parallel(True, n_cores=4) model.find_power(sample_size=200) ``` ```{code-cell} ipython3 :tags: [remove-output] # Disable parallelization entirely model.set_parallel(False) ``` ### Notes on n_cores - When `n_cores` is `None`, MCPower uses half the available CPU cores (`cpu_count // 2`). - Setting `n_cores=1` is equivalent to `enabled=False`. - Using more cores than physically available provides no benefit and may hurt performance due to context-switching overhead. - Requires `joblib` to be installed. If `joblib` is not available, MCPower falls back to sequential processing with a warning. ### See Also - [Performance & Backends](../concepts/performance.md) -- Runtime considerations - [Mixed-Effects Models](../concepts/mixed-effects.md) -- Complete guide to mixed-effects support