Tutorial: Preparing Your CSV for MCPower¶

Goal¶

You have data in Excel, Google Sheets, SPSS, or another tool, and you want to export it as a CSV file that MCPower can read without errors.

This page covers:

How to export a CSV from common tools
What format rules your CSV must follow
How MCPower auto-detects variable types from your data

Creating Your CSV File¶

From Excel¶

Open your spreadsheet in Excel
Make sure your data starts in row 1 with column headers (no blank rows above)
Remove any merged cells, formulas, or special formatting
Go to File > Save As
Choose CSV UTF-8 (Comma delimited) (*.csv) as the file type
Click Save

Tip: If you only see “CSV (Comma delimited)” without “UTF-8”, that older format usually works too, but UTF-8 is safer for special characters (accents, non-English text).

From Google Sheets¶

Open your spreadsheet in Google Sheets
Make sure your data starts in row 1 with column headers
Go to File > Download > Comma Separated Values (.csv)
The file is saved to your Downloads folder

Google Sheets exports as UTF-8 with comma separators by default — no extra settings needed.

From SPSS¶

Open your dataset in SPSS
Go to File > Save As
Set the file type to Comma Separated Values (*.csv)
Check “Write variable names to spreadsheet” so the header row is included
Click Save

Alternatively, use SPSS syntax:

SAVE TRANSLATE
  /OUTFILE="pilot_data.csv"
  /TYPE=CSV
  /ENCODING="UTF8"
  /FIELDNAMES
  /REPLACE.

Note: SPSS may encode missing values as . (dot). MCPower does not accept missing values — remove or impute them before exporting (see Step 4 below).

Full Working Example¶

A correctly formatted CSV file (pilot_data.csv):

treatment,score,age,education
1,72.5,34,high_school
0,68.1,29,college
1,81.3,41,graduate
0,65.9,26,college
1,77.2,38,high_school
0,70.4,31,graduate
1,84.6,45,college
0,62.3,24,high_school

Loading and using it in MCPower:

import pandas as pd
from mcpower import MCPower

data = pd.read_csv("pilot_data.csv")

model = MCPower("outcome = treatment + score + age + education")
model.upload_data(data[["treatment", "score", "age", "education"]])
# treatment: 2 unique values → binary
# score: 8 unique values → continuous
# age: 8 unique values → continuous
# education: 3 unique values ["college", "graduate", "high_school"] → factor
#   Reference: "college" (first alphabetically)
#   Dummies: education[graduate], education[high_school]

model.set_effects(
    "treatment=0.50, score=0.25, age=0.10, "
    "education[graduate]=0.40, education[high_school]=0.20"
)
model.find_power(sample_size=200)

Step-by-Step Walkthrough¶

1. Use comma separators¶

MCPower expects standard comma-separated values. Do not use semicolons, tabs, or pipes.

Correct:

group,score,age
A,85.2,34
B,72.1,28

Wrong:

group;score;age
A;85.2;34
B;72.1;28

If your file uses a different delimiter, convert it first:

# Read with semicolon delimiter, then save as comma-separated
data = pd.read_csv("data.csv", sep=";")
data.to_csv("data_fixed.csv", index=False)

2. Include a header row¶

The first row must contain column names. These names become the variable names in your formula.

Correct:

treatment,score,age
1,72.5,34
0,68.1,29

Wrong (no header):

1,72.5,34
0,68.1,29

3. Keep types consistent within each column¶

Every value in a column should be the same type – all numbers or all strings. Do not mix.

Correct:

group,score
control,72.5
treatment,81.3
control,68.1

Wrong (mixed types in the group column):

group,score
control,72.5
1,81.3
treatment,68.1

4. No missing values or empty cells¶

MCPower does not handle missing data. Remove or impute incomplete rows before uploading.

Correct:

treatment,score,age
1,72.5,34
0,68.1,29
1,81.3,41

Wrong (missing values):

treatment,score,age
1,72.5,34
0,,29
1,81.3,

Wrong (NA or NULL markers):

treatment,score,age
1,72.5,34
0,NA,29
1,81.3,NULL

Clean your data in pandas before uploading:

data = pd.read_csv("data_with_missing.csv")
data = data.dropna(subset=["treatment", "score", "age"])  # drop rows with NAs

5. Save with UTF-8 encoding¶

Save your CSV as UTF-8. pandas tries utf-8-sig first (MCPower does not read CSV files directly), but UTF-8 is the safest choice.

In Excel: File > Save As > CSV UTF-8 (Comma delimited)

In R:

write.csv(data, "pilot_data.csv", row.names = FALSE, fileEncoding = "UTF-8")

6. No trailing commas¶

Ensure rows do not end with extra commas. This is a common issue when exporting from spreadsheet software.

Correct:

group,score,age
A,85.2,34
B,72.1,28

Wrong (trailing commas create a phantom empty column):

group,score,age,
A,85.2,34,
B,72.1,28,

7. Use simple column names¶

Stick to letters, numbers, and underscores. Avoid spaces, special characters, and names that start with numbers.

Good names: treatment, score, baseline_pain, group_A, age

Problematic names: treatment group, score (%), 1st_measure, age+1

How Auto-Detection Works¶

MCPower classifies each column based on the count of unique values:

Unique Values	Detected Type	Example
1	Dropped with warning	A column that is all `1`
2	Binary	`treatment` with values `0, 1`
3–6	Factor	`education` with values `"high_school", "college", "graduate"`
7+	Continuous	`age` with values `24, 26, 29, 31, 34, 38, 41, 45`

String columns with 2–20 unique values are auto-detected as factors. String columns with more than 20 unique values raise an error.

If auto-detection gets it wrong, override with data_types:

# "rating" has 5 unique values, auto-detected as factor
# Override to continuous
model.upload_data(data, data_types={"rating": "continuous"})

Quick Checklist¶

Before uploading your CSV, verify:

[ ] Comma-separated (not semicolons or tabs)
[ ] Header row present with column names
[ ] Consistent types per column (all numbers or all strings)
[ ] No missing values, empty cells, NA, or NULL
[ ] UTF-8 encoding
[ ] No trailing commas
[ ] Column names use only letters, numbers, and underscores

Common Pitfalls¶

Problem	Symptom	Fix
Semicolons instead of commas	All data lands in one column	Re-export with comma separator
Missing header	First data row used as names	Add a header row
Mixed types in a column	Unexpected type detection	Clean column to one type
Empty cells or NA values	Error during upload	Drop or impute missing rows
Trailing commas	Phantom empty column detected	Remove trailing commas
BOM encoding marker	Column name has invisible prefix	Save as UTF-8 (not UTF-8 with BOM), or use `utf-8-sig` encoding
Too many unique strings (>20)	ValueError on upload	Override with `data_types={"col": "continuous"}` or reduce categories

Next Steps¶

Tutorial: Power Analysis with Your Own Data — using your uploaded data in a full analysis
upload_data() API Reference — complete parameter documentation