Variable Types
What It Is
MCPower needs to know the statistical type of each predictor to generate realistic simulated data. The three main types are continuous (numeric values on a scale), binary (two groups, coded 0/1), and factor (categorical with three or more levels, dummy-coded).
Beyond these core types, continuous variables can follow different distributions: normal (default), right-skewed (e.g., income), left-skewed (e.g., ceiling effects), high-kurtosis (heavy tails with outliers), or uniform (evenly spread). Choosing a shape that resembles your real data shows how well your power estimate holds up when predictors are not normally distributed.
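To make these shapes concrete, here is a minimal standard-library sketch that draws samples with each shape and reports skewness and excess kurtosis. The generators are stand-ins chosen for illustration (exponential for right skew, its mirror image for left skew, Laplace for heavy tails) and are an assumption of this sketch, not MCPower's internal generators. The shape labels are plain dictionary keys, not MCPower syntax.

```python
import random
import statistics


def standardize(xs):
    """Rescale to mean 0 and SD 1 so that only the shape differs."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


def skewness(xs):
    """Third standardized moment: > 0 means a long right tail."""
    return statistics.fmean(z ** 3 for z in standardize(xs))


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3: > 0 means heavier tails than normal."""
    return statistics.fmean(z ** 4 for z in standardize(xs)) - 3.0


def draw(shape, n, rng):
    """Draw n values from an illustrative stand-in distribution (assumption)."""
    if shape == "normal":
        return [rng.gauss(0, 1) for _ in range(n)]
    if shape == "right-skewed":
        return [rng.expovariate(1.0) for _ in range(n)]
    if shape == "left-skewed":
        return [-rng.expovariate(1.0) for _ in range(n)]
    if shape == "high-kurtosis":
        # The difference of two exponentials follows a Laplace distribution.
        return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
    if shape == "uniform":
        return [rng.uniform(-1, 1) for _ in range(n)]
    raise ValueError(f"unknown shape: {shape}")


rng = random.Random(2024)
shape_stats = {}
for shape in ("normal", "right-skewed", "left-skewed", "high-kurtosis", "uniform"):
    sample = draw(shape, 20000, rng)
    shape_stats[shape] = (skewness(sample), excess_kurtosis(sample))
    print(f"{shape:14s} skew={shape_stats[shape][0]:+.2f} "
          f"excess kurtosis={shape_stats[shape][1]:+.2f}")
```

All samples are standardized before the moments are computed, so the comparison isolates shape from location and scale.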
When you upload empirical data, MCPower auto-detects variable types based on the number of unique values. You can override any detection with explicit type declarations. String columns (e.g., “Europe”, “Japan”, “USA”) are automatically recognized as factors.
How It Works in MCPower
Set types manually with set_variable_type():
model.set_variable_type("treatment=binary, group=(factor,3), income=right_skewed")
Custom proportions are supported for binary and factor variables:
model.set_variable_type("drug=(binary,0.3), dose=(factor,0.2,0.5,0.3)")
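To show how such a type string breaks down, the sketch below parses it into a type and a list of group proportions. `parse_type_string`, `split_top_level`, and `describe` are hypothetical helpers written for this illustration, not part of MCPower. The interpretation rules (a single binary argument is the share of group 1, a single integer factor argument is the number of equal-sized levels) are assumptions taken from the tables on this page.

```python
def split_top_level(spec):
    """Split on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def describe(kind, args):
    """Turn a type name plus arguments into explicit group proportions."""
    if kind == "binary":
        p1 = float(args[0]) if args else 0.5  # share of group 1
        return {"type": "binary", "proportions": [1 - p1, p1]}
    if kind == "factor":
        if len(args) == 1:
            k = int(args[0])  # number of equal-sized levels
            props = [1 / k] * k
        else:
            props = [float(a) for a in args]
            if abs(sum(props) - 1.0) > 1e-9:
                raise ValueError("factor proportions must sum to 1")
        return {"type": "factor", "proportions": props}
    return {"type": kind}  # continuous shapes carry no proportions


def parse_type_string(spec):
    """Hypothetical parser for strings like 'drug=(binary,0.3)'."""
    result = {}
    for item in split_top_level(spec):
        name, _, value = item.partition("=")
        value = value.strip()
        if value.startswith("(") and value.endswith(")"):
            kind, *args = [a.strip() for a in value[1:-1].split(",")]
        else:
            kind, args = value, []
        result[name.strip()] = describe(kind, args)
    return result


parsed = parse_type_string("drug=(binary,0.3), dose=(factor,0.2,0.5,0.3)")
for name, info in parsed.items():
    print(name, info)
```

In this sketch, proportions that do not sum to 1 raise a `ValueError`, which mirrors the "must sum to 1" requirement for custom factor proportions.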
Name factor levels with set_factor_levels():
model.set_factor_levels("group=control,drug_a,drug_b") # first = reference
Guidelines
Distribution Types
| Type | Syntax | When to Use |
|---|---|---|
| Normal | | Default. Symmetric bell-shaped data. |
| Binary | binary | Two groups (treatment/control, yes/no). |
| Binary (custom) | (binary,0.3) | Unequal split (30% in group 1). |
| Factor | (factor,3) | 3+ categorical levels, equal proportions. |
| Factor (custom) | (factor,0.2,0.5,0.3) | Custom proportions (must sum to 1). |
| Right skewed | right_skewed | Income, reaction times, counts. |
| Left skewed | | Ceiling effects, negatively skewed scores. |
| High kurtosis | | Heavy-tailed data with outliers. |
| Uniform | | Evenly spread, no clustering around mean. |
Auto-Detection from Uploaded Data
| Unique Values | Detected Type |
|---|---|
| 1 | Dropped (constant column) |
| 2 | Binary |
| 3–6 | Factor |
| 7+ | Continuous |
| String column, 2–20 unique | Factor |
| String column, >20 unique | Error (too many levels) |
Override auto-detection with the data_types parameter in upload_data().
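The detection rules in the table can be written as one small function. This is a sketch of the documented thresholds only: `detect_type` is a hypothetical name, and MCPower's actual implementation may differ, for instance in how it treats missing values.

```python
def detect_type(values):
    """Classify a column by its number of unique values (per the table above)."""
    unique = set(values)
    n_unique = len(unique)
    is_string = any(isinstance(v, str) for v in unique)

    if n_unique <= 1:
        return "dropped"  # constant column
    if is_string:
        if n_unique > 20:
            raise ValueError("too many levels for a string column")
        return "factor"  # string columns with 2-20 unique values
    if n_unique == 2:
        return "binary"
    if n_unique <= 6:
        return "factor"
    return "continuous"  # 7 or more unique numeric values


# Hypothetical example columns, made up for this illustration.
columns = {
    "origin": ["Europe", "Japan", "USA", "USA"],
    "am": [0, 1, 1, 0],
    "cyl": [4, 6, 8, 4, 6],
    "mpg": [21.0, 22.8, 18.7, 14.3, 24.4, 19.2, 30.4],
    "constant": [5, 5, 5],
}
detected = {name: detect_type(vals) for name, vals in columns.items()}
print(detected)
```

Because a numeric column with few distinct values is classified as a factor, a short rating scale is one case where an explicit override through `data_types` may be needed.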
Factor Variables
Level 1 (or first sorted value from data) is the reference level.
Each non-reference level becomes a dummy variable needing its own effect size.
Use data_types={"cyl": ("factor", 8)} to pick a specific reference level.
With uploaded data, dummies use original values: cyl[6], cyl[8] (default preserve_factor_level_names=True).
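The dummy-coding rule above can be illustrated with a short sketch. `dummy_code` is a hypothetical helper, not an MCPower function; it only mirrors the naming pattern `cyl[6]`, `cyl[8]` and the "first sorted value is the reference" convention described here.

```python
def dummy_code(name, values, reference=None):
    """Return one 0/1 column per non-reference level, keyed as name[level]."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # first sorted value is the default reference
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not a level of {name}")

    columns = {}
    for level in levels:
        if level == reference:
            continue  # the reference level gets no dummy column
        columns[f"{name}[{level}]"] = [1 if v == level else 0 for v in values]
    return columns


cyl = [4, 6, 8, 6, 4, 8]  # made-up example values

default_dummies = dummy_code("cyl", cyl)
print(list(default_dummies))

custom_dummies = dummy_code("cyl", cyl, reference=8)
print(list(custom_dummies))
```

A factor with k levels yields k - 1 dummy columns, which is why each non-reference level needs its own effect size.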
Common Patterns
| Scenario | Type String | Notes |
|---|---|---|
| Treatment vs. control | binary | Default 50/50 split |
| Unbalanced treatment | (binary,0.3) | 30% treatment, 70% control |
| 3-group comparison | (factor,3) | Equal thirds |
| Weighted groups | (factor,0.2,0.5,0.3) | Custom proportions |
| Income variable | right_skewed | Positive skew |
| Rating scale (1–7) | | Evenly distributed |
| Named conditions | | Explicit level names |
Variables left unset default to standard normal (continuous, mean=0, SD=1).
Learn More
Uploading Data – auto-detection, correlation preservation, named levels
ANOVA & Post-Hoc Tests – factor variables in ANOVA designs
API Reference: set_variable_type – full parameter documentation
API Reference: set_factor_levels – named levels without data