Tutorial: Preparing Your CSV for MCPower

Goal

You have data in Excel, Google Sheets, SPSS, or another tool, and you want to export it as a CSV file that MCPower can read without errors.

This page covers:

  1. How to export a CSV from common tools

  2. What format rules your CSV must follow

  3. How MCPower auto-detects variable types from your data


Creating Your CSV File

From Excel

  1. Open your spreadsheet in Excel

  2. Make sure your data starts in row 1 with column headers (no blank rows above)

  3. Remove any merged cells, formulas, or special formatting

  4. Go to File > Save As

  5. Choose CSV UTF-8 (Comma delimited) (*.csv) as the file type

  6. Click Save

Tip: If you only see “CSV (Comma delimited)” without “UTF-8”, that older format usually works too, but UTF-8 is safer for special characters (accents, non-English text).

From Google Sheets

  1. Open your spreadsheet in Google Sheets

  2. Make sure your data starts in row 1 with column headers

  3. Go to File > Download > Comma Separated Values (.csv)

  4. The file is saved to your Downloads folder

Google Sheets exports as UTF-8 with comma separators by default — no extra settings needed.

From SPSS

  1. Open your dataset in SPSS

  2. Go to File > Save As

  3. Set the file type to Comma Separated Values (*.csv)

  4. Check “Write variable names to spreadsheet” so the header row is included

  5. Click Save

Alternatively, use SPSS syntax:

SAVE TRANSLATE
  /OUTFILE="pilot_data.csv"
  /TYPE=CSV
  /ENCODING="UTF8"
  /FIELDNAMES
  /REPLACE.

Note: SPSS may encode missing values as . (dot). MCPower does not accept missing values — remove or impute them before exporting (see Step 4 below).


Full Working Example

A correctly formatted CSV file (pilot_data.csv):

treatment,score,age,education
1,72.5,34,high_school
0,68.1,29,college
1,81.3,41,graduate
0,65.9,26,college
1,77.2,38,high_school
0,70.4,31,graduate
1,84.6,45,college
0,62.3,24,high_school

Loading and using it in MCPower:

import pandas as pd
from mcpower import MCPower

data = pd.read_csv("pilot_data.csv")

model = MCPower("outcome = treatment + score + age + education")
model.upload_data(data[["treatment", "score", "age", "education"]])
# treatment: 2 unique values → binary
# score: 8 unique values → continuous
# age: 8 unique values → continuous
# education: 3 unique values ["college", "graduate", "high_school"] → factor
#   Reference: "college" (first alphabetically)
#   Dummies: education[graduate], education[high_school]

model.set_effects(
    "treatment=0.50, score=0.25, age=0.10, "
    "education[graduate]=0.40, education[high_school]=0.20"
)
model.find_power(sample_size=200)

Step-by-Step Walkthrough

1. Use comma separators

MCPower expects standard comma-separated values. Do not use semicolons, tabs, or pipes.

Correct:

group,score,age
A,85.2,34
B,72.1,28

Wrong:

group;score;age
A;85.2;34
B;72.1;28

If your file uses a different delimiter, convert it first:

# Read with semicolon delimiter, then save as comma-separated
data = pd.read_csv("data.csv", sep=";")
data.to_csv("data_fixed.csv", index=False)

2. Include a header row

The first row must contain column names. These names become the variable names in your formula.

Correct:

treatment,score,age
1,72.5,34
0,68.1,29

Wrong (no header):

1,72.5,34
0,68.1,29

3. Keep types consistent within each column

Every value in a column should be the same type – all numbers or all strings. Do not mix.

Correct:

group,score
control,72.5
treatment,81.3
control,68.1

Wrong (mixed types in the group column):

group,score
control,72.5
1,81.3
treatment,68.1

4. No missing values or empty cells

MCPower does not handle missing data. Remove or impute incomplete rows before uploading.

Correct:

treatment,score,age
1,72.5,34
0,68.1,29
1,81.3,41

Wrong (missing values):

treatment,score,age
1,72.5,34
0,,29
1,81.3,

Wrong (NA or NULL markers):

treatment,score,age
1,72.5,34
0,NA,29
1,81.3,NULL

Clean your data in pandas before uploading:

data = pd.read_csv("data_with_missing.csv")
data = data.dropna(subset=["treatment", "score", "age"])  # drop rows with NAs

5. Save with UTF-8 encoding

Save your CSV as UTF-8. pandas tries utf-8-sig first (MCPower does not read CSV files directly), but UTF-8 is the safest choice.

In Excel: File > Save As > CSV UTF-8 (Comma delimited)

In R:

write.csv(data, "pilot_data.csv", row.names = FALSE, fileEncoding = "UTF-8")

6. No trailing commas

Ensure rows do not end with extra commas. This is a common issue when exporting from spreadsheet software.

Correct:

group,score,age
A,85.2,34
B,72.1,28

Wrong (trailing commas create a phantom empty column):

group,score,age,
A,85.2,34,
B,72.1,28,

7. Use simple column names

Stick to letters, numbers, and underscores. Avoid spaces, special characters, and names that start with numbers.

Good names: treatment, score, baseline_pain, group_A, age

Problematic names: treatment group, score (%), 1st_measure, age+1


How Auto-Detection Works

MCPower classifies each column based on the count of unique values:

Unique Values

Detected Type

Example

1

Dropped with warning

A column that is all 1

2

Binary

treatment with values 0, 1

3–6

Factor

education with values "high_school", "college", "graduate"

7+

Continuous

age with values 24, 26, 29, 31, 34, 38, 41, 45

String columns with 2–20 unique values are auto-detected as factors. String columns with more than 20 unique values raise an error.

If auto-detection gets it wrong, override with data_types:

# "rating" has 5 unique values, auto-detected as factor
# Override to continuous
model.upload_data(data, data_types={"rating": "continuous"})

Quick Checklist

Before uploading your CSV, verify:

  • [ ] Comma-separated (not semicolons or tabs)

  • [ ] Header row present with column names

  • [ ] Consistent types per column (all numbers or all strings)

  • [ ] No missing values, empty cells, NA, or NULL

  • [ ] UTF-8 encoding

  • [ ] No trailing commas

  • [ ] Column names use only letters, numbers, and underscores


Common Pitfalls

Problem

Symptom

Fix

Semicolons instead of commas

All data lands in one column

Re-export with comma separator

Missing header

First data row used as names

Add a header row

Mixed types in a column

Unexpected type detection

Clean column to one type

Empty cells or NA values

Error during upload

Drop or impute missing rows

Trailing commas

Phantom empty column detected

Remove trailing commas

BOM encoding marker

Column name has invisible prefix

Save as UTF-8 (not UTF-8 with BOM), or use utf-8-sig encoding

Too many unique strings (>20)

ValueError on upload

Override with data_types={"col": "continuous"} or reduce categories


Next Steps