Variable Types¶

What It Is¶

MCPower needs to know the statistical type of each predictor to generate realistic simulated data. The three main types are continuous (numeric values on a scale), binary (two groups, coded 0/1), and factor (categorical with three or more levels, dummy-coded).

Beyond these core types, continuous variables can follow different distributions: normal (default), right-skewed (e.g., income), left-skewed (e.g., ceiling effects), high-kurtosis (heavy tails with outliers), or uniform (evenly spread). These distribution shapes affect how robust your power estimates are to real-world data characteristics.

When you upload empirical data, MCPower auto-detects variable types based on the number of unique values. You can override any detection with explicit type declarations. String columns (e.g., “Europe”, “Japan”, “USA”) are automatically recognized as factors.

How It Works in MCPower¶

Set types manually with set_variable_type():

model.set_variable_type("treatment=binary, group=(factor,3), income=right_skewed")

Custom proportions are supported for binary and factor variables:

model.set_variable_type("drug=(binary,0.3), dose=(factor,0.2,0.5,0.3)")

Name factor levels with set_factor_levels():

model.set_factor_levels("group=control,drug_a,drug_b")  # first = reference

Guidelines¶

Distribution Types¶

Type	Syntax	When to Use
Normal	`var=normal`	Default. Symmetric bell-shaped data.
Binary	`var=binary`	Two groups (treatment/control, yes/no).
Binary (custom)	`var=(binary,0.3)`	Unequal split (30% in group 1).
Factor	`var=(factor,3)`	3+ categorical levels, equal proportions.
Factor (custom)	`var=(factor,0.2,0.5,0.3)`	Custom proportions (must sum to 1).
Right skewed	`var=right_skewed`	Income, reaction times, counts.
Left skewed	`var=left_skewed`	Ceiling effects, negatively skewed scores.
High kurtosis	`var=high_kurtosis`	Heavy-tailed data with outliers.
Uniform	`var=uniform`	Evenly spread, no clustering around mean.

Auto-Detection from Uploaded Data¶

Unique Values	Detected Type
1	Dropped (constant column)
2	Binary
3–6	Factor
7+	Continuous
String column, 2–20 unique	Factor
String column, >20 unique	Error (too many levels)

Override auto-detection with the data_types parameter in upload_data().

Factor Variables¶

Level 1 (or first sorted value from data) is the reference level.
Each non-reference level becomes a dummy variable needing its own effect size.
Use data_types={"cyl": ("factor", 8)} to pick a specific reference level.
With uploaded data, dummies use original values: cyl[6], cyl[8] (default preserve_factor_level_names=True).

Common Patterns¶

Scenario	Type String	Notes
Treatment vs. control	`treatment=binary`	Default 50/50 split
Unbalanced treatment	`treatment=(binary,0.3)`	30% treatment, 70% control
3-group comparison	`group=(factor,3)`	Equal thirds
Weighted groups	`group=(factor,0.2,0.5,0.3)`	Custom proportions
Income variable	`income=right_skewed`	Positive skew
Rating scale (1–7)	`rating=uniform`	Evenly distributed
Named conditions	`set_factor_levels("cond=placebo,low_dose,high_dose")`	Explicit level names

Variables left unset default to standard normal (continuous, mean=0, SD=1).¶

Learn More¶

Uploading Data – auto-detection, correlation preservation, named levels
ANOVA & Post-Hoc Tests – factor variables in ANOVA designs
API Reference: set_variable_type – full parameter documentation
API Reference: set_factor_levels – named levels without data