Tutorial: Preparing Your CSV for MCPower¶
Goal¶
You have data in Excel, Google Sheets, SPSS, or another tool, and you want to export it as a CSV file that MCPower can read without errors.
This page covers:
How to export a CSV from common tools
What format rules your CSV must follow
How MCPower auto-detects variable types from your data
Creating Your CSV File¶
From Excel¶
Open your spreadsheet in Excel
Make sure your data starts in row 1 with column headers (no blank rows above)
Remove any merged cells, formulas, or special formatting
Go to File > Save As
Choose CSV UTF-8 (Comma delimited) (*.csv) as the file type
Click Save
Tip: If you only see “CSV (Comma delimited)” without “UTF-8”, that older format usually works too, but UTF-8 is safer for special characters (accents, non-English text).
From Google Sheets¶
Open your spreadsheet in Google Sheets
Make sure your data starts in row 1 with column headers
Go to File > Download > Comma Separated Values (.csv)
The file is saved to your Downloads folder
Google Sheets exports as UTF-8 with comma separators by default — no extra settings needed.
From SPSS¶
Open your dataset in SPSS
Go to File > Save As
Set the file type to Comma Separated Values (*.csv)
Check “Write variable names to spreadsheet” so the header row is included
Click Save
Alternatively, use SPSS syntax:
SAVE TRANSLATE
/OUTFILE="pilot_data.csv"
/TYPE=CSV
/ENCODING="UTF8"
/FIELDNAMES
/REPLACE.
Note: SPSS may encode missing values as
.(dot). MCPower does not accept missing values — remove or impute them before exporting (see Step 4 below).
Full Working Example¶
A correctly formatted CSV file (pilot_data.csv):
treatment,score,age,education
1,72.5,34,high_school
0,68.1,29,college
1,81.3,41,graduate
0,65.9,26,college
1,77.2,38,high_school
0,70.4,31,graduate
1,84.6,45,college
0,62.3,24,high_school
Loading and using it in MCPower:
import pandas as pd
from mcpower import MCPower
data = pd.read_csv("pilot_data.csv")
model = MCPower("outcome = treatment + score + age + education")
model.upload_data(data[["treatment", "score", "age", "education"]])
# treatment: 2 unique values → binary
# score: 8 unique values → continuous
# age: 8 unique values → continuous
# education: 3 unique values ["college", "graduate", "high_school"] → factor
# Reference: "college" (first alphabetically)
# Dummies: education[graduate], education[high_school]
model.set_effects(
"treatment=0.50, score=0.25, age=0.10, "
"education[graduate]=0.40, education[high_school]=0.20"
)
model.find_power(sample_size=200)
Step-by-Step Walkthrough¶
1. Use comma separators¶
MCPower expects standard comma-separated values. Do not use semicolons, tabs, or pipes.
Correct:
group,score,age
A,85.2,34
B,72.1,28
Wrong:
group;score;age
A;85.2;34
B;72.1;28
If your file uses a different delimiter, convert it first:
# Read with semicolon delimiter, then save as comma-separated
data = pd.read_csv("data.csv", sep=";")
data.to_csv("data_fixed.csv", index=False)
2. Include a header row¶
The first row must contain column names. These names become the variable names in your formula.
Correct:
treatment,score,age
1,72.5,34
0,68.1,29
Wrong (no header):
1,72.5,34
0,68.1,29
3. Keep types consistent within each column¶
Every value in a column should be the same type – all numbers or all strings. Do not mix.
Correct:
group,score
control,72.5
treatment,81.3
control,68.1
Wrong (mixed types in the group column):
group,score
control,72.5
1,81.3
treatment,68.1
4. No missing values or empty cells¶
MCPower does not handle missing data. Remove or impute incomplete rows before uploading.
Correct:
treatment,score,age
1,72.5,34
0,68.1,29
1,81.3,41
Wrong (missing values):
treatment,score,age
1,72.5,34
0,,29
1,81.3,
Wrong (NA or NULL markers):
treatment,score,age
1,72.5,34
0,NA,29
1,81.3,NULL
Clean your data in pandas before uploading:
data = pd.read_csv("data_with_missing.csv")
data = data.dropna(subset=["treatment", "score", "age"]) # drop rows with NAs
5. Save with UTF-8 encoding¶
Save your CSV as UTF-8. pandas tries utf-8-sig first (MCPower does not read CSV files directly), but UTF-8 is the safest choice.
In Excel: File > Save As > CSV UTF-8 (Comma delimited)
In R:
write.csv(data, "pilot_data.csv", row.names = FALSE, fileEncoding = "UTF-8")
6. No trailing commas¶
Ensure rows do not end with extra commas. This is a common issue when exporting from spreadsheet software.
Correct:
group,score,age
A,85.2,34
B,72.1,28
Wrong (trailing commas create a phantom empty column):
group,score,age,
A,85.2,34,
B,72.1,28,
7. Use simple column names¶
Stick to letters, numbers, and underscores. Avoid spaces, special characters, and names that start with numbers.
Good names: treatment, score, baseline_pain, group_A, age
Problematic names: treatment group, score (%), 1st_measure, age+1
How Auto-Detection Works¶
MCPower classifies each column based on the count of unique values:
Unique Values |
Detected Type |
Example |
|---|---|---|
1 |
Dropped with warning |
A column that is all |
2 |
Binary |
|
3–6 |
Factor |
|
7+ |
Continuous |
|
String columns with 2–20 unique values are auto-detected as factors. String columns with more than 20 unique values raise an error.
If auto-detection gets it wrong, override with data_types:
# "rating" has 5 unique values, auto-detected as factor
# Override to continuous
model.upload_data(data, data_types={"rating": "continuous"})
Quick Checklist¶
Before uploading your CSV, verify:
[ ] Comma-separated (not semicolons or tabs)
[ ] Header row present with column names
[ ] Consistent types per column (all numbers or all strings)
[ ] No missing values, empty cells,
NA, orNULL[ ] UTF-8 encoding
[ ] No trailing commas
[ ] Column names use only letters, numbers, and underscores
Common Pitfalls¶
Problem |
Symptom |
Fix |
|---|---|---|
Semicolons instead of commas |
All data lands in one column |
Re-export with comma separator |
Missing header |
First data row used as names |
Add a header row |
Mixed types in a column |
Unexpected type detection |
Clean column to one type |
Empty cells or NA values |
Error during upload |
Drop or impute missing rows |
Trailing commas |
Phantom empty column detected |
Remove trailing commas |
BOM encoding marker |
Column name has invisible prefix |
Save as UTF-8 (not UTF-8 with BOM), or use |
Too many unique strings (>20) |
ValueError on upload |
Override with |
Next Steps¶
Tutorial: Power Analysis with Your Own Data — using your uploaded data in a full analysis
upload_data()API Reference — complete parameter documentation