Variable Types

What It Is

MCPower needs to know the statistical type of each predictor to generate realistic simulated data. The three main types are continuous (numeric values on a scale), binary (two groups, coded 0/1), and factor (categorical with three or more levels, dummy-coded).

Beyond these core types, continuous variables can follow different distributions: normal (default), right-skewed (e.g., income), left-skewed (e.g., ceiling effects), high-kurtosis (heavy tails with outliers), or uniform (evenly spread). These distribution shapes affect how robust your power estimates are to real-world data characteristics.

When you upload empirical data, MCPower auto-detects variable types based on the number of unique values. You can override any detection with explicit type declarations. String columns (e.g., “Europe”, “Japan”, “USA”) are automatically recognized as factors.


How It Works in MCPower

Set types manually with set_variable_type():

model.set_variable_type("treatment=binary, group=(factor,3), income=right_skewed")

Custom proportions are supported for binary and factor variables:

model.set_variable_type("drug=(binary,0.3), dose=(factor,0.2,0.5,0.3)")

Name factor levels with set_factor_levels():

model.set_factor_levels("group=control,drug_a,drug_b")  # first = reference

Guidelines

Distribution Types

Type

Syntax

When to Use

Normal

var=normal

Default. Symmetric bell-shaped data.

Binary

var=binary

Two groups (treatment/control, yes/no).

Binary (custom)

var=(binary,0.3)

Unequal split (30% in group 1).

Factor

var=(factor,3)

3+ categorical levels, equal proportions.

Factor (custom)

var=(factor,0.2,0.5,0.3)

Custom proportions (must sum to 1).

Right skewed

var=right_skewed

Income, reaction times, counts.

Left skewed

var=left_skewed

Ceiling effects, negatively skewed scores.

High kurtosis

var=high_kurtosis

Heavy-tailed data with outliers.

Uniform

var=uniform

Evenly spread, no clustering around mean.

Auto-Detection from Uploaded Data

Unique Values

Detected Type

1

Dropped (constant column)

2

Binary

3–6

Factor

7+

Continuous

String column, 2–20 unique

Factor

String column, >20 unique

Error (too many levels)

Override auto-detection with the data_types parameter in upload_data().

Factor Variables

  • Level 1 (or first sorted value from data) is the reference level.

  • Each non-reference level becomes a dummy variable needing its own effect size.

  • Use data_types={"cyl": ("factor", 8)} to pick a specific reference level.

  • With uploaded data, dummies use original values: cyl[6], cyl[8] (default preserve_factor_level_names=True).


Common Patterns

Scenario

Type String

Notes

Treatment vs. control

treatment=binary

Default 50/50 split

Unbalanced treatment

treatment=(binary,0.3)

30% treatment, 70% control

3-group comparison

group=(factor,3)

Equal thirds

Weighted groups

group=(factor,0.2,0.5,0.3)

Custom proportions

Income variable

income=right_skewed

Positive skew

Rating scale (1–7)

rating=uniform

Evenly distributed

Named conditions

set_factor_levels("cond=placebo,low_dose,high_dose")

Explicit level names

Variables left unset default to standard normal (continuous, mean=0, SD=1).

Learn More