Average ERPs with fitgrid

For designed EEG experiments with categorical variables, a useful range of Hunt-Dawson average ERPs and ERP effects falls out of ordinary least squares regression modeling by choosing the appropriate categorical predictor coding in the patsy formula language. The results are identical to adding and subtracting average ERP waveforms, without programming ad hoc algebraic manipulations.

Prepare and load epochs

import pandas as pd
import fitgrid as fg
from fitgrid import DATA_DIR, sample_data

sample_data.get_file("sub000p3.ms1500.epochs.feather")
p3_epochs_df = pd.read_feather(DATA_DIR / "sub000p3.ms1500.epochs.feather")

# select 3 types of stimulus event: standards, targets, and bioamp calibration triggers
p3_epochs_df = p3_epochs_df.query("stim in ['standard', 'target', 'cal']")

# look up the data QC flags and select the good epochs
good_epochs = p3_epochs_df.query("match_time == 0 and log_flags == 0")[
    "epoch_id"
]
p3_epochs_df = p3_epochs_df.query("epoch_id in @good_epochs")

# rename the time stamp column
p3_epochs_df.rename(columns={"match_time": "time_ms"}, inplace=True)

# select columns of interest for modeling
indices = ["epoch_id", "time_ms"]
predictors = ["stim"]  # categorical with 2 levels: standard, target
channels = ["MiPf", "MiCe", "MiPa", "MiOc"]  # midline electrodes
p3_epochs_df = p3_epochs_df[indices + predictors + channels]

# set the epoch and time column index for fg.Epochs
p3_epochs_df.set_index(["epoch_id", "time_ms"], inplace=True)

# "baseline", i.e., center each epoch on the 200 ms pre-stimulus interval
centered = []
for epoch_id, vals in p3_epochs_df.groupby("epoch_id"):
    centered.append(
        vals[channels]
        - vals[channels].query("time_ms >= -200 and time_ms < 0").mean()
    )
p3_epochs_df[channels] = pd.concat(centered)
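The per-epoch loop above can also be written in a single pass with a pandas groupby-transform. A minimal sketch on toy data (column names match the tutorial; the values are made up):

```python
import pandas as pd

# toy epochs: 2 epochs x 4 samples of one channel; the values are made up
df = pd.DataFrame(
    {
        "epoch_id": [0, 0, 0, 0, 1, 1, 1, 1],
        "time_ms": [-200, -100, 0, 100] * 2,
        "MiPa": [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0],
    }
).set_index(["epoch_id", "time_ms"])

def center_on_prestim(col):
    """Subtract one epoch's mean over the 200 ms prestimulus interval."""
    times = col.index.get_level_values("time_ms")
    return col - col[(times >= -200) & (times < 0)].mean()

centered = df.groupby("epoch_id").transform(center_on_prestim)
print(centered["MiPa"].tolist())  # [-1.0, 1.0, 3.0, 5.0, -1.0, 1.0, 3.0, 5.0]
```

This yields the same centered values as the explicit loop, without building and concatenating per-epoch frames.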

# load data into fitgrid.Epochs
p3_epochs_fg = fg.epochs_from_dataframe(
    p3_epochs_df, epoch_id="epoch_id", time="time_ms", channels=channels
)

Average ERPs by condition: \(\sim \mathsf{0 + stim}\)

Suppressing the intercept term in the patsy model formula triggers full-rank dummy (indicator) coding of the categorical variable: one indicator column per level. The estimated coefficients are identical to the average ERPs in each condition. The minimal design matrix in Why this works below illustrates dummy coding for one categorical variable with three levels.
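To see the effect of the intercept on the coding, here is a quick patsy sketch on a toy two-level factor (the data are made up; only the column names matter):

```python
from patsy import dmatrix

# toy data: a two-level categorical variable, values made up
data = {"stim": ["standard", "target", "standard", "target"]}

# intercept retained: reduced-rank treatment coding, one level is dropped
with_intercept = dmatrix("1 + stim", data=data, return_type="dataframe")
print(list(with_intercept.columns))  # ['Intercept', 'stim[T.target]']

# intercept suppressed: full-rank dummy coding, one indicator per level
no_intercept = dmatrix("0 + stim", data=data, return_type="dataframe")
print(list(no_intercept.columns))  # ['stim[standard]', 'stim[target]']
```

With the intercept suppressed, each row of the design matrix has exactly one indicator set to 1, which is what makes the fitted coefficients the per-condition means.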

lmg_0_stim = fg.lm(p3_epochs_fg, RHS="0 + stim")

Out:

100%|##########| 375/375 [00:07<00:00, 52.67it/s]

Parameter estimates = Smith & Kutas (2015) regression ERPs

beta_hats = lmg_0_stim.params
beta_hats
                            MiPf      MiCe      MiPa      MiOc
time_ms
-748    stim[cal]       0.025192  0.023050  0.021799 -0.041392
        stim[standard]  2.235214  0.628217  2.202564  2.075095
        stim[target]    0.890926  0.909482  3.704045  2.489043
-744    stim[cal]       0.013173  0.029130 -0.018333 -0.015801
        stim[standard]  2.304857  0.758107  2.443701  2.423396
...                          ...       ...       ...       ...
744     stim[standard] -2.059429  0.405358 -0.481777 -1.198918
        stim[target]    1.159444 -3.025486  2.555300  5.525375
748     stim[cal]       0.037212  0.052357  0.018305  0.012487
        stim[standard] -1.866571  0.507221 -0.323304 -1.001864
        stim[target]    1.733518 -2.055181  3.420878  5.769298

1125 rows × 4 columns



Parameter estimate standard errors

bses = lmg_0_stim.bse
bses
                            MiPf      MiCe      MiPa      MiOc
time_ms
-748    stim[cal]       1.717181  0.850727  0.886257  0.724336
        stim[standard]  1.480024  0.733234  0.763858  0.624299
        stim[target]    3.370163  1.669648  1.739380  1.421591
-744    stim[cal]       1.750222  0.856846  0.901908  0.729748
        stim[standard]  1.508502  0.738508  0.777347  0.628964
...                          ...       ...       ...       ...
744     stim[standard]  0.616983  0.704265  0.760638  0.676497
        stim[target]    1.404932  1.603682  1.732049  1.540452
748     stim[cal]       0.718129  0.818604  0.881669  0.780283
        stim[standard]  0.618949  0.705548  0.759903  0.672520
        stim[target]    1.409409  1.606603  1.730375  1.531395

1125 rows × 4 columns



Visualize parameter estimates ± standard error

from matplotlib import pyplot as plt

# label index columns for pandas groupby
for attr_df in [beta_hats, bses]:
    attr_df.index.set_names(["time_ms", "beta_hats"], inplace=True)

for beta_hat, vals in beta_hats.groupby("beta_hats"):
    vals.reset_index('beta_hats', inplace=True, drop=True)
    times = vals.index.to_numpy()
    bse = bses.query("beta_hats==@beta_hat")

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.set_title(beta_hat)
    ax.set(
        xlabel="Time (ms)",
        xlim=(times[0], times[-1]),
        ylabel=r"$\mu$V",
        ylim=(-15, 15),
    )
    ax.axhline(0, color="lightgray", lw=1)
    ax.axvline(0, color="gray", lw=1)

    for chan in vals.columns:
        ax.plot(times, vals[chan], label=chan)
        ax.fill_between(
            times, vals[chan] - bse[chan], vals[chan] + bse[chan], alpha=0.2
        )
    ax.legend(loc=(1.05, 0.5))
(Figures: regression ERP waveforms ± standard error, one panel each for stim[cal], stim[standard], and stim[target])

Why this works

Here is a small (“right hand side”) design matrix for 9 observations of a categorical variable with 3 levels. There is no intercept (constant) and when one of the 3 regressors is 1, the others are 0. The \(\hat{\beta}\) weights that minimize overall error are the means of the data at each level of the categorical variable.

from patsy import demo_data, dmatrix

cat_2 = demo_data("a", nlevels=3, min_rows=8)
dmatrix("0 + a", data=cat_2, return_type="dataframe")
   a[a1]  a[a2]  a[a3]
0    1.0    0.0    0.0
1    0.0    1.0    0.0
2    0.0    0.0    1.0
3    1.0    0.0    0.0
4    0.0    1.0    0.0
5    0.0    0.0    1.0
6    1.0    0.0    0.0
7    0.0    1.0    0.0
8    0.0    0.0    1.0
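A quick numpy check of that claim: ordinary least squares on a no-intercept indicator matrix recovers the per-level means (the numbers are made up):

```python
import numpy as np

# 9 observations of a 3-level factor, interleaved a1, a2, a3 as above
y = np.array([1.0, 10.0, 100.0,
              3.0, 12.0, 102.0,
              5.0, 14.0, 104.0])

# full-rank dummy coding: one indicator column per level, no intercept
X = np.tile(np.eye(3), (3, 1))

# least squares solution: beta_hat minimizes ||y - X @ beta||
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # [  3.  12. 102.] == the mean of y at each level
```

Because the indicator columns are mutually exclusive, each coefficient is fit independently and lands on the mean of the observations in its level.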


For EEG data, the “means of the data at each level of the categorical variable” are the time-domain average ERPs. In the sample data, the categorical stimulus variable has three levels: standard, target, and cal for the 10 \(\mu\mathsf{V}\) calibration square wave.

We can reach into one cell of the FitGrid at time = 0 and channel = MiPa and pull out the design matrix. The three-column indicator coding is the same as in the demo_data example, except for the column labels and hundreds of observations instead of 9.

lmg_0_stim[0, "MiPa"].model.exog_names.unstack(-2)
                         MiPa
                0               1             2
time_ms
0        0  stim[cal]  stim[standard]  stim[target]


lmg_0_stim[0, "MiPa"].model.exog.unstack(-1)

Out:

/home/runner/work/fitgrid-dev/fitgrid-dev/fitgrid/fitgrid.py:302: FutureWarning: inplace is deprecated and will be removed in a future version.
  temp.index.set_levels(self.epoch_index, level=i, inplace=True)
                  MiPa
                     0    1    2
time_ms epoch_id
0       0          0.0  0.0  1.0
        1          0.0  0.0  1.0
        2          0.0  0.0  1.0
        3          0.0  0.0  1.0
        4          0.0  0.0  1.0
...                ...  ...  ...
        596        1.0  0.0  0.0
        597        1.0  0.0  0.0
        598        1.0  0.0  0.0
        599        1.0  0.0  0.0
        600        1.0  0.0  0.0

542 rows × 3 columns



Total running time of the script: ( 0 minutes 10.492 seconds)
