Configuration

Methods for configuring effect sizes, variable types, correlations, and simulation parameters.


set_effects()

MCPower.set_effects(effects_string)[source]

Set standardised effect sizes for predictors.

Effect sizes are expressed as standardised regression coefficients (beta weights). Each assignment maps an effect name to its size. Interaction effects use : notation. For factor variables, specify effects for each dummy level with bracket notation.

This setting is deferred until apply() is called.

Parameters:

effects_string (str) – Comma-separated name=value pairs. Examples: "x1=0.5, x2=0.3, x1:x2=0.2", "treatment=0.4, cyl[2]=0.2, cyl[3]=0.5".

Returns:

For method chaining.

Return type:

self

Raises:
  • TypeError – If effects_string is not a string.

  • ValueError – If effects_string is empty or contains invalid assignments (checked at apply time).

String Format

Comma-separated name=value pairs:

model.set_effects("x1=0.5, x2=0.3")

Interaction effects – use : notation:

model.set_effects("x1=0.5, x2=0.3, x1:x2=0.2")

Factor variables – assign different effects per level using bracket notation:

# Integer-indexed levels (no uploaded data or named levels)
model.set_effects("group[2]=0.4, group[3]=0.6")

# Named levels (after set_factor_levels or upload_data)
model.set_effects("group[drug_a]=0.4, group[drug_b]=0.6")

Updating Effects

After running an analysis, a new set_effects() updates (merges with) the previously applied effects:

model.set_effects("x1=0.5, x2=0.3")
model.find_power(sample_size=100)
model.set_effects("x2=0.4")  # x1 remains 0.5, x2 is now 0.4
model.find_power(sample_size=100)  # uses x1=0.5, x2=0.4

Examples

from mcpower import MCPower

model = MCPower("y = treatment + motivation + treatment:motivation")
model.set_simulations(400)
model.set_variable_type("treatment=binary")
model.set_effects("treatment=0.5, motivation=0.3, treatment:motivation=0.2")
model.find_power(sample_size=100)

Notes

  • Effect sizes are standardized – they represent the change in outcome (in SDs) per 1 SD change in the predictor.

  • For binary predictors, the effect size represents the difference between the two groups in standard deviation units (equivalent to Cohen’s d).

  • For factor variables, each dummy’s effect size represents the difference between that level and the reference level.

  • A common guideline: 0.2 = small, 0.5 = medium, 0.8 = large (Cohen’s conventions).

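The binary-predictor interpretation above can be verified with a small numpy-only simulation (a sketch of the statistical claim, not MCPower itself): generating an outcome with a standardized effect of 0.5 on a 0/1 predictor yields a group difference of about 0.5 SD.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.binomial(1, 0.5, n)          # 0/1 treatment indicator
d = 0.5                              # standardized effect (Cohen's d)
y = d * x + rng.normal(0.0, 1.0, n)  # within-group SD is 1

# Group difference in standard deviation units
diff = y[x == 1].mean() - y[x == 0].mean()
print(round(diff, 1))                # close to 0.5
```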



set_variable_type()

MCPower.set_variable_type(variable_types_string)[source]

Set distribution types for predictor variables.

Variables default to "normal" (standard Gaussian). Use this method to specify alternative distributions.

This setting is deferred until apply() is called.

Parameters:

variable_types_string (str) –

Comma-separated name=type assignments. Supported types:

  • "normal" — standard normal (default).

  • "binary" or "(binary, p)" — Bernoulli with proportion p (default 0.5).

  • "right_skewed" — positively skewed distribution.

  • "left_skewed" — negatively skewed distribution.

  • "high_kurtosis" — heavy-tailed (t-distribution, df=3).

  • "uniform" — uniform distribution.

  • "(factor, k)" — categorical with k levels (creates k-1 dummy variables).

  • "(factor, p1, p2, ..., pk)" — factor with custom level proportions.

Example: "x1=binary, x2=right_skewed, x3=(factor,3)".

Returns:

For method chaining.

Return type:

self

Raises:
  • TypeError – If variable_types_string is not a string.

  • ValueError – If types are unrecognised or proportions invalid (checked at apply time).

Supported Types

Type String                  Description                                Generated Distribution
normal                       Standard normal (default)                  N(0, 1)
binary                       Binary variable with 50/50 split           Bernoulli(0.5)
(binary, p)                  Binary with custom proportion              Bernoulli(p), where 0 < p < 1
(factor, k)                  Factor with k levels, equal proportions    k-1 dummy variables
(factor, p1, p2, ..., pk)    Factor with custom level proportions       k-1 dummies; proportions are normalized to sum to 1
right_skewed                 Right-skewed (heavy right tail)            Chi-squared-like transform
left_skewed                  Left-skewed (heavy left tail)              Mirrored right-skew
high_kurtosis                Heavy-tailed (leptokurtic)                 t-distribution (df=3)
uniform                      Uniform distribution                       U(0, 1) transformed

Examples

from mcpower import MCPower

# Basic type declarations
model = MCPower("y = treatment + condition + income")
model.set_simulations(400)
model.set_variable_type("treatment=binary, condition=(factor,3), income=right_skewed")
model.set_effects("treatment=0.5, condition[2]=0.3, condition[3]=0.4, income=0.2")
model.find_power(sample_size=150)

Binary with custom proportion:

model.set_variable_type("treatment=(binary,0.3)")  # 30% in treatment group

Factor with equal proportions:

model.set_variable_type("condition=(factor,3)")  # 3 levels, ~33% each

Factor with custom proportions:

# 3 levels with proportions 20%, 50%, 30% (proportions are normalized to sum to 1)
model.set_variable_type("group=(factor,0.2,0.5,0.3)")

Updating types – calling again updates existing entries without clearing others:

model.set_variable_type("x1=binary, x2=right_skewed")
model.set_variable_type("x2=normal")  # x1 remains binary, x2 is now normal

Notes

  • Factor variables create k-1 dummy variables (level 1 is the reference by default). After declaring a factor, use bracket notation in set_effects() to assign effects to each dummy.

  • When upload_data() is used, variable types are auto-detected and typically do not need to be set manually. Use set_variable_type() to override auto-detection.

  • Validation of types and proportions happens when find_power() or find_sample_size() is called.
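The k-1 dummy expansion described in the first note can be sketched with plain numpy (an illustration of the general coding scheme, not MCPower internals):

```python
import numpy as np

rng = np.random.default_rng(1)
levels = rng.choice([1, 2, 3], size=12)        # a factor(3) variable

# Level 1 is the reference: one 0/1 indicator per non-reference level
dummies = np.column_stack([(levels == k).astype(int) for k in (2, 3)])

print(dummies.shape)       # (12, 2): k - 1 = 2 dummy columns
```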



set_correlations()

MCPower.set_correlations(correlations_input)[source]

Set correlations between predictor variables.

Correlations are only defined for non-factor (continuous/binary) predictors. Factor dummies are generated independently.

This setting is deferred until apply() is called.

Parameters:

correlations_input – Either a comma-separated string of pair-wise assignments (e.g. "x1:x2=0.3, x1:x3=-0.1") or a full NumPy correlation matrix whose dimensions match the number of non-factor predictors.

Returns:

For method chaining.

Return type:

self

Raises:
  • TypeError – If correlations_input is not a string or ndarray.

  • ValueError – If the matrix is not positive semi-definite or has wrong dimensions (checked at apply time).

Input Formats

String format – full syntax:

model.set_correlations("corr(x1, x2)=0.3, corr(x1, x3)=-0.2")

String format – shorthand (the corr() wrapper is optional):

model.set_correlations("(x1, x2)=0.3, (x1, x3)=-0.2")

NumPy matrix – dimensions must match the number of non-factor predictors, in formula order:

import numpy as np

# For a model with predictors x1, x2, x3 (all continuous)
model.set_correlations(np.array([
    [1.0, 0.3, -0.2],
    [0.3, 1.0,  0.1],
    [-0.2, 0.1, 1.0],
]))

Correlation Values

  • Valid range: -1 to 1 (exclusive of exact -1 and 1 for off-diagonal entries)

  • Diagonal entries must be 1.0 (for matrix input)

  • The matrix must be symmetric

  • The matrix must be positive semi-definite (PSD) – MCPower validates this and raises an error if the matrix is not PSD
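A correlation matrix can be pre-checked with numpy before handing it to MCPower (a standalone sketch, not the library's internal validator): symmetric, unit diagonal, and no negative eigenvalues.

```python
import numpy as np

def is_valid_corr(m, tol=1e-10):
    """Symmetric, unit diagonal, and positive semi-definite."""
    return bool(
        np.allclose(m, m.T)
        and np.allclose(np.diag(m), 1.0)
        and np.linalg.eigvalsh(m).min() >= -tol
    )

ok = np.array([[1.0, 0.3], [0.3, 1.0]])

# Pairwise values can be individually valid but jointly impossible:
bad = np.array([[ 1.0, 0.9, -0.9],
                [ 0.9, 1.0,  0.9],
                [-0.9, 0.9,  1.0]])

print(is_valid_corr(ok), is_valid_corr(bad))   # True False
```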

Examples

from mcpower import MCPower

model = MCPower("y = x1 + x2 + x3")
model.set_simulations(400)
model.set_effects("x1=0.5, x2=0.3, x3=0.2")
model.set_correlations("(x1, x2)=0.4, (x2, x3)=0.2")
model.find_power(sample_size=100)

Notes

  • Factor variables cannot be correlated. Correlations are defined only between continuous and binary predictors.

  • Unspecified pairs default to zero correlation (independence).

  • When using upload_data() with preserve_correlation="partial", correlations are computed from the data and merged with any user-specified values. With preserve_correlation="strict" (the default), the full row-bootstrap approach preserves the empirical correlation structure automatically.
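For intuition, here is how correlated predictors are commonly generated from a correlation matrix (the standard Cholesky approach; MCPower's internals may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
corr = np.array([[1.0, 0.4, 0.0],
                 [0.4, 1.0, 0.2],
                 [0.0, 0.2, 1.0]])

z = rng.standard_normal((100_000, 3))    # independent N(0, 1) predictors
x = z @ np.linalg.cholesky(corr).T       # impose the correlation structure

observed = np.corrcoef(x, rowvar=False)
print(round(observed[0, 1], 2))          # close to 0.4
```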



set_alpha()

MCPower.set_alpha(alpha)[source]

Set the significance level for hypothesis testing.

Parameters:

alpha (float) – Type-I error rate (0–0.25). Default is 0.05.

Returns:

For method chaining.

Return type:

self

Raises:

ValueError – If alpha is outside the valid range.

Common Alpha Levels

Alpha    Use Case
0.05     Standard threshold (default)
0.01     Stricter threshold, common in some fields
0.005    Proposed “redefine statistical significance” threshold
0.10     Exploratory research, pilot studies

Examples

from mcpower import MCPower

# Use stricter significance threshold
model = MCPower("y = x1 + x2")
model.set_simulations(400)
model.set_effects("x1=0.5, x2=0.3")
model.set_alpha(0.01)
model.find_power(sample_size=100)
# Chained
model = (
    MCPower("y = x1 + x2")
    .set_effects("x1=0.5, x2=0.3")
    .set_alpha(0.01)
)

set_power()

MCPower.set_power(power)[source]

Set the target statistical power level.

Used by find_sample_size to determine when power is sufficient.

Parameters:

power (float) – Target power as a percentage (0–100). Default is 80.

Returns:

For method chaining.

Return type:

self

Raises:

ValueError – If power is outside the valid range.

Common Power Targets

Power    Use Case
80%      Standard target (default). Accepted in most fields.
90%      Higher confidence. Common for clinical trials and well-funded studies.
95%      Very conservative. Requires substantially larger samples.

Examples

from mcpower import MCPower

# Require 90% power instead of the default 80%
model = MCPower("y = x1 + x2")
model.set_simulations(400)
model.set_effects("x1=0.5, x2=0.3")
model.set_power(90)
model.find_sample_size(from_size=50, to_size=300, by=30)

set_seed()

MCPower.set_seed(seed=None)[source]

Set random seed for reproducibility.

Parameters:

seed (int | None) – Non-negative integer up to 3,000,000,000. Pass None to enable fully random seeding.

Returns:

For method chaining.

Return type:

self

Raises:
  • TypeError – If seed is not an integer or None.

  • ValueError – If seed is negative or exceeds the maximum.

Examples

from mcpower import MCPower

model = MCPower("y = x1 + x2")
model.set_simulations(400)
model.set_effects("x1=0.5, x2=0.3")

# Reproducible results
model.set_seed(42)
model.find_power(sample_size=100)  # Always produces the same output
# Random seeding (different results each run)
model.set_seed(None)
model.find_power(sample_size=100)

Notes

  • The C++ backend uses std::mt19937, while Python uses numpy.random. The same seed produces different random sequences across backends, but statistical properties (power estimates) are comparable.

  • The default seed is 2137. Change it if you want a different reproducible sequence.
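The seeding behaviour follows the usual RNG contract; a numpy-only illustration of what "same seed, same results" means (not MCPower's backend code):

```python
import numpy as np

a = np.random.default_rng(42).standard_normal(5)
b = np.random.default_rng(42).standard_normal(5)   # same seed -> same draws
c = np.random.default_rng(7).standard_normal(5)    # different seed

print(np.array_equal(a, b), np.array_equal(a, c))  # True False
```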


set_simulations()

MCPower.set_simulations(n_simulations, model_type=None)[source]

Set the number of Monte Carlo simulations.

More simulations yield more precise power estimates at the cost of longer runtime. The default is 1600 for OLS and 800 for mixed models.

Parameters:
  • n_simulations (int) – Number of simulations (positive integer).

  • model_type (str | None) – Which simulation count to update: None (default) sets both the OLS and mixed-model counts; "linear" sets only the OLS count; "mixed" sets only the mixed-model count.

Returns:

For method chaining.

Return type:

self

Raises:

ValueError – If n_simulations is not a positive integer or model_type is unrecognised.

Default Simulation Counts

Model Type                 Default Count
OLS (linear regression)    1,600
Mixed-effects models       800

Mixed-effects models use fewer simulations by default because each simulation is more computationally expensive (LME fitting vs. OLS).

Precision vs. Runtime

Simulations    Approx. SE of Power Estimate    Use Case
400            ~2.5%                           Quick exploration
800            ~1.8%                           Mixed-model default
1,600          ~1.2%                           OLS default; good for most analyses
5,000          ~0.7%                           High-precision estimates
10,000         ~0.5%                           Publication-quality precision

The standard error of a simulated power estimate is sqrt(p * (1 - p) / n_sims), where p is the true power. The table uses the worst case, p = 0.5, giving SE = sqrt(0.25 / n_sims); at 80% power the SE is sqrt(0.8 * 0.2 / n_sims), about 20% smaller.
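These SEs come from the binomial formula; a quick check reproduces the table:

```python
import math

def power_se(n_sims, p=0.5):
    """SE of a simulated power estimate: sqrt(p * (1 - p) / n_sims)."""
    return math.sqrt(p * (1 - p) / n_sims)

for n in (400, 800, 1600, 5000, 10000):
    print(n, f"{100 * power_se(n):.1f}%")     # matches the table rows

print(f"{100 * power_se(1600, p=0.8):.1f}%")  # at 80% power: 1.0%
```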

Examples

from mcpower import MCPower

# Set both OLS and mixed to the same count
model = MCPower("y = x1 + x2")
model.set_simulations(3200)
model.set_effects("x1=0.5, x2=0.3")
model.find_power(sample_size=100)
# Set OLS and mixed independently
model = MCPower("satisfaction ~ treatment + (1|school)")
model.set_simulations(2000, model_type="linear")
model.set_simulations(1000, model_type="mixed")

# Method chaining
model = (
    MCPower("y = x1 + x2")
    .set_simulations(3200)
    .set_effects("x1=0.5, x2=0.3")
)

set_parallel()

MCPower.set_parallel(enable=True, n_cores=None)[source]

Enable or disable parallel processing.

Requires joblib to be installed. Falls back to sequential processing with a warning if joblib is unavailable.

Parameters:
  • enable (bool | str) – Parallel mode: True for all analyses, False for sequential processing, or "mixedmodels" for mixed-model analyses only (the model's initial setting; the parameter itself defaults to True).

  • n_cores (int | None) – Number of CPU cores to use. Defaults to cpu_count // 2.

Returns:

For method chaining.

Return type:

self

Parallelization Modes

Value            Behavior
True             Parallel processing for all analyses (OLS and mixed).
False            Sequential processing only.
"mixedmodels"    Parallel only for mixed-model analyses; OLS stays sequential. (Default)

When to Use Each Mode

Scenario                            Recommended Mode           Reasoning
OLS with default 1,600 sims         "mixedmodels" (default)    C++ backend is fast enough; parallel overhead not worthwhile.
OLS with 5,000+ sims                True                       High simulation count justifies parallelization overhead.
Mixed models                        "mixedmodels" (default)    LME fitting is expensive; parallel processing helps substantially.
Debugging / profiling               False                      Sequential execution is easier to reason about and profile.
Resource-constrained environment    False or low n_cores       Avoid saturating shared machines.

Examples

from mcpower import MCPower

# Default: parallel for mixed models only
model = MCPower("y ~ treatment + (1|school)")
model.set_simulations(400)
model.set_cluster("school", ICC=0.2, n_clusters=20)
model.set_effects("treatment=0.5")
model.find_power(sample_size=1000)  # Runs in parallel automatically
# Force parallel for OLS (useful with very high simulation counts)
model = MCPower("y = x1 + x2 + x3 + x4 + x5")
model.set_effects("x1=0.5, x2=0.3, x3=0.2, x4=0.1, x5=0.4")
model.set_simulations(10000)
model.set_parallel(True, n_cores=4)
model.find_power(sample_size=200)
# Disable parallelization entirely
model.set_parallel(False)

Notes on n_cores

  • When n_cores is None, MCPower uses half the available CPU cores (cpu_count // 2).

  • Setting n_cores=1 is equivalent to enable=False.

  • Using more cores than physically available provides no benefit and may hurt performance due to context-switching overhead.

  • Requires joblib to be installed. If joblib is not available, MCPower falls back to sequential processing with a warning.
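The chunked-parallel pattern is easy to sketch with joblib (a hypothetical illustration of the general approach, not MCPower's implementation; run_chunk and its per-chunk logic are invented here):

```python
import numpy as np

def run_chunk(seed, n_sims):
    """Stand-in for a batch of simulations: count 'significant' results."""
    rng = np.random.default_rng(seed)
    return int((rng.random(n_sims) < 0.8).sum())  # pretend true power is 80%

try:
    from joblib import Parallel, delayed
    hits = Parallel(n_jobs=2)(delayed(run_chunk)(s, 400) for s in range(4))
except ImportError:
    hits = [run_chunk(s, 400) for s in range(4)]  # sequential fallback

power = 100 * sum(hits) / 1600
print(f"estimated power: {power:.1f}%")           # near 80%
```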
