Average ERPs with fitgrid
For designed EEG experiments with categorical variables, a useful range of Hunt-Dawson average ERPs and ERP effects falls out of ordinary least squares regression modeling by selecting the appropriate categorical predictor coding with the patsy formula language. The results are identical to adding and subtracting average ERP waveforms, without programming ad hoc algebraic manipulations.
Prepare and load epochs
import pandas as pd
import fitgrid as fg
from fitgrid import DATA_DIR, sample_data
sample_data.get_file("sub000p3.ms1500.epochs.feather")
p3_epochs_df = pd.read_feather(DATA_DIR / "sub000p3.ms1500.epochs.feather")
# select 3 types of stimulus event: standards, targets, and bioamp calibration triggers
p3_epochs_df = p3_epochs_df.query("stim in ['standard', 'target', 'cal']")
# look up the data QC flags and select the good epochs
good_epochs = p3_epochs_df.query("match_time == 0 and log_flags == 0")[
    "epoch_id"
]
p3_epochs_df = p3_epochs_df.query("epoch_id in @good_epochs")
# rename the time stamp column
p3_epochs_df.rename(columns={"match_time": "time_ms"}, inplace=True)
# select columns of interest for modeling
indices = ["epoch_id", "time_ms"]
predictors = ["stim"] # categorical with 2 levels: standard, target
channels = ["MiPf", "MiCe", "MiPa", "MiOc"] # midline electrodes
p3_epochs_df = p3_epochs_df[indices + predictors + channels]
# set the epoch and time column index for fg.Epochs
p3_epochs_df.set_index(["epoch_id", "time_ms"], inplace=True)
# "baseline", i.e., center each epoch on the 200 ms pre-stimulus interval
centered = []
for epoch_id, vals in p3_epochs_df.groupby("epoch_id"):
    centered.append(
        vals[channels]
        - vals[channels].query("time_ms >= -200 and time_ms < 0").mean()
    )
p3_epochs_df[channels] = pd.concat(centered)
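The explicit loop-and-concat above can also be expressed as a single pandas groupby-apply. A minimal sketch on a toy frame, using the tutorial's epoch_id/time_ms/channel names but made-up voltage values:

```python
import pandas as pd

# toy epochs frame: 2 epochs x 4 time points, one channel (values invented)
df = pd.DataFrame(
    {
        "epoch_id": [0, 0, 0, 0, 1, 1, 1, 1],
        "time_ms": [-200, -100, 0, 100] * 2,
        "MiPa": [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0],
    }
).set_index(["epoch_id", "time_ms"])


def center_on_prestim(epoch):
    # subtract this epoch's mean over the 200 ms prestimulus interval
    prestim = epoch.query("time_ms >= -200 and time_ms < 0")
    return epoch - prestim.mean()


# group_keys=False keeps the original (epoch_id, time_ms) index
centered = df.groupby("epoch_id", group_keys=False).apply(center_on_prestim)
```

Either form yields channel values re-referenced so each epoch's prestimulus interval averages to zero.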
# load data into fitgrid.Epochs
p3_epochs_fg = fg.epochs_from_dataframe(
    p3_epochs_df, epoch_id="epoch_id", time="time_ms", channels=channels
)
Average ERPs by condition: ~ 0 + stim
Suppressing the intercept term in the patsy model formula triggers full-rank dummy (indicator) coding of the two-level categorical variable. The estimated coefficients are identical to the average ERPs in each condition. The minimal design matrix illustrates dummy coding for one categorical variable with two levels.
lmg_0_stim = fg.lm(p3_epochs_fg, RHS="0 + stim")
Out:
[progress bar: 375 OLS fits, ~53 it/s, about 7 s total]
Parameter estimates = Smith & Kutas (2015) regression ERPs
beta_hats = lmg_0_stim.params
beta_hats
Parameter estimate standard errors
bses = lmg_0_stim.bse
bses
Visualize parameter estimates +/- standard error
from matplotlib import pyplot as plt
# label index columns for pandas groupby
for attr_df in [beta_hats, bses]:
    attr_df.index.set_names(["time_ms", "beta_hats"], inplace=True)

for beta_hat, vals in beta_hats.groupby("beta_hats"):
    vals.reset_index("beta_hats", inplace=True, drop=True)
    times = vals.index.to_numpy()
    bse = bses.query("beta_hats == @beta_hat")
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.set_title(beta_hat)
    ax.set(
        xlabel="Time (ms)",
        xlim=(times[0], times[-1]),
        ylabel=r"$\mu$V",
        ylim=(-15, 15),
    )
    ax.axhline(0, color="lightgray", lw=1)
    ax.axvline(0, color="gray", lw=1)
    for jdx, chan in enumerate(vals.columns):
        ax.plot(times, vals[chan], label=chan)
        ax.fill_between(
            times, vals[chan] - bse[chan], vals[chan] + bse[chan], alpha=0.2
        )
    ax.legend(loc=(1.05, 0.5))
Why this works
Here is a small (“right hand side”) design matrix for 9 observations of a categorical variable with 3 levels. There is no intercept (constant) and when one of the 3 regressors is 1, the others are 0. The \(\hat{\beta}\) weights that minimize overall error are the means of the data at each level of the categorical variable.
from patsy import demo_data, dmatrix
cat_2 = demo_data("a", nlevels=3, min_rows=8)
dmatrix("0 + a", data=cat_2, return_type="dataframe")
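To check numerically that no-intercept indicator coding recovers the cell means, here is a small self-contained sketch with NumPy; the 3-level factor and toy response values are invented for illustration:

```python
import numpy as np

# 9 observations of a 3-level factor, 3 observations per level
levels = np.repeat([0, 1, 2], 3)
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0])

# full-rank indicator (dummy) design matrix: one column per level, no intercept
X = np.eye(3)[levels]

# ordinary least squares fit
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# the fitted coefficients equal the per-level means of y
means = np.array([y[levels == k].mean() for k in range(3)])
```

Because the indicator columns are mutually orthogonal, each coefficient is estimated independently and minimizing squared error within a level is solved by that level's mean.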
For EEG data, the “means of the data at each level of the categorical variable” are the time-domain average ERPs. In the sample data, the categorical stimulus variable has three levels: standard, target, and cal for the 10 \(\mu\mathsf{V}\) calibration square wave.
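In pandas terms, those time-domain average ERPs are a groupby mean over epochs within each condition at each time point. A toy single-channel sketch with made-up voltages, reusing the tutorial's column names:

```python
import pandas as pd

# toy epochs: two epochs per condition, two time points, one channel
df = pd.DataFrame(
    {
        "epoch_id": [0, 0, 1, 1, 2, 2, 3, 3],
        "time_ms": [0, 100] * 4,
        "stim": ["standard"] * 4 + ["target"] * 4,
        "MiPa": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    }
)

# average over epochs within condition at each time point:
# one row per condition, one column per time point
avg_erp = df.groupby(["stim", "time_ms"])["MiPa"].mean().unstack("time_ms")
```

These condition-by-time means are what the ~ 0 + stim coefficients reproduce, one average waveform per level of stim.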
We can reach into one cell of the FitGrid at time = 0 and channel = MiPa and pull out the design matrix. The three-column indicator coding is the same as in the demo_data example, except for the column labels and hundreds of observations instead of 9.
lmg_0_stim[0, "MiPa"].model.exog_names.unstack(-2)
lmg_0_stim[0, "MiPa"].model.exog.unstack(-1)
Total running time of the script: (0 minutes 10.492 seconds)