# Variable Types

## What It Is

MCPower needs to know the statistical type of each predictor to generate realistic simulated data. The three main types are **continuous** (numeric values on a scale), **binary** (two groups, coded 0/1), and **factor** (categorical with three or more levels, dummy-coded).

Beyond these core types, continuous variables can follow different distributions: normal (default), right-skewed (e.g., income), left-skewed (e.g., ceiling effects), high-kurtosis (heavy tails with outliers), or uniform (evenly spread). These distribution shapes affect how robust your power estimates are to real-world data characteristics.

When you upload empirical data, MCPower auto-detects variable types based on the number of unique values. You can override any detection with explicit type declarations. String columns (e.g., "Europe", "Japan", "USA") are automatically recognized as factors.

---

## How It Works in MCPower

Set types manually with [`set_variable_type()`](../api/configuration.md):

```python
model.set_variable_type("treatment=binary, group=(factor,3), income=right_skewed")
```

Custom proportions are supported for binary and factor variables:

```python
model.set_variable_type("drug=(binary,0.3), dose=(factor,0.2,0.5,0.3)")
```

Name factor levels with [`set_factor_levels()`](../api/data.md):

```python
model.set_factor_levels("group=control,drug_a,drug_b")  # first = reference
```

---

## Guidelines

### Distribution Types

| Type | Syntax | When to Use |
|---|---|---|
| Normal | `var=normal` | Default. Symmetric bell-shaped data. |
| Binary | `var=binary` | Two groups (treatment/control, yes/no). |
| Binary (custom) | `var=(binary,0.3)` | Unequal split (30% in group 1). |
| Factor | `var=(factor,3)` | 3+ categorical levels, equal proportions. |
| Factor (custom) | `var=(factor,0.2,0.5,0.3)` | Custom proportions (must sum to 1). |
| Right skewed | `var=right_skewed` | Income, reaction times, counts. |
| Left skewed | `var=left_skewed` | Ceiling effects, negatively skewed scores. |
| High kurtosis | `var=high_kurtosis` | Heavy-tailed data with outliers. |
| Uniform | `var=uniform` | Evenly spread, no clustering around mean. |

### Auto-Detection from Uploaded Data

| Unique Values | Detected Type |
|---|---|
| 1 | Dropped (constant column) |
| 2 | Binary |
| 3--6 | Factor |
| 7+ | Continuous |
| String column, 2--20 unique | Factor |
| String column, >20 unique | Error (too many levels) |

Override auto-detection with the `data_types` parameter in `upload_data()`.

### Factor Variables

- Level 1 (or first sorted value from data) is the reference level.
- Each non-reference level becomes a dummy variable needing its own effect size.
- Use `data_types={"cyl": ("factor", 8)}` to pick a specific reference level.
- With uploaded data, dummies use original values: `cyl[6]`, `cyl[8]` (default `preserve_factor_level_names=True`).

---

## Common Patterns

| Scenario | Type String | Notes |
|---|---|---|
| Treatment vs. control | `treatment=binary` | Default 50/50 split |
| Unbalanced treatment | `treatment=(binary,0.3)` | 30% treatment, 70% control |
| 3-group comparison | `group=(factor,3)` | Equal thirds |
| Weighted groups | `group=(factor,0.2,0.5,0.3)` | Custom proportions |
| Income variable | `income=right_skewed` | Positive skew |
| Rating scale (1--7) | `rating=uniform` | Evenly distributed |
| Named conditions | `set_factor_levels("cond=placebo,low_dose,high_dose")` | Explicit level names |

### Variables left unset default to standard normal (continuous, mean=0, SD=1).

## Learn More

- **[Uploading Data](../tutorials/own-data.md)** -- auto-detection, correlation preservation, named levels
- **[ANOVA & Post-Hoc Tests](../tutorials/anova-posthoc.md)** -- factor variables in ANOVA designs
- **[API Reference: set_variable_type](../api/configuration.md)** -- full parameter documentation
- **[API Reference: set_factor_levels](../api/data.md)** -- named levels without data