Average ERPs with fitgrid

For designed EEG experiments with categorical variables, a useful range of Hunt-Dawson average ERPs and ERP effects falls out of ordinary least squares regression modeling by choosing the appropriate categorical predictor coding in the patsy formula language. The results are identical to adding and subtracting average ERP waveforms, without programming ad hoc algebraic manipulations.

Prepare and load epochs

import pandas as pd
import fitgrid as fg
from fitgrid import DATA_DIR, sample_data

sample_data.get_file("sub000p3.ms1500.epochs.feather")
p3_epochs_df = pd.read_feather(DATA_DIR / "sub000p3.ms1500.epochs.feather")

# select 3 types of stimulus event: standards, targets, and bioamp calibration triggers
p3_epochs_df = p3_epochs_df.query("stim in ['standard', 'target', 'cal']")

# look up the data QC flags and select the good epochs
good_epochs = p3_epochs_df.query("match_time == 0 and log_flags == 0")[
    "epoch_id"
]
p3_epochs_df = p3_epochs_df.query("epoch_id in @good_epochs")

# rename the time stamp column
p3_epochs_df.rename(columns={"match_time": "time_ms"}, inplace=True)

# select columns of interest for modeling
indices = ["epoch_id", "time_ms"]
predictors = ["stim"]  # categorical with 2 levels: standard, target
channels = ["MiPf", "MiCe", "MiPa", "MiOc"]  # midline electrodes
p3_epochs_df = p3_epochs_df[indices + predictors + channels]

# set the epoch and time column index for fg.Epochs
p3_epochs_df.set_index(["epoch_id", "time_ms"], inplace=True)

# "baseline", i.e., center each epoch on the 200 ms pre-stimulus interval
centered = []
for epoch_id, vals in p3_epochs_df.groupby("epoch_id"):
    centered.append(
        vals[channels]
        - vals[channels].query("time_ms >= -200 and time_ms < 0").mean()
    )
p3_epochs_df[channels] = pd.concat(centered)
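The per-epoch loop above can also be written in a single pass with a pandas groupby-transform. A minimal sketch on toy data (column names match the tutorial; the values are made up):

```python
import pandas as pd

# toy epochs: 2 epochs x 4 samples of one channel; the values are made up
df = pd.DataFrame(
    {
        "epoch_id": [0, 0, 0, 0, 1, 1, 1, 1],
        "time_ms": [-200, -100, 0, 100] * 2,
        "MiPa": [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0],
    }
).set_index(["epoch_id", "time_ms"])

def center_on_prestim(col):
    """Subtract one epoch's mean over the 200 ms prestimulus interval."""
    times = col.index.get_level_values("time_ms")
    return col - col[(times >= -200) & (times < 0)].mean()

centered = df.groupby("epoch_id").transform(center_on_prestim)
print(centered["MiPa"].tolist())  # [-1.0, 1.0, 3.0, 5.0, -1.0, 1.0, 3.0, 5.0]
```

This yields the same centered values as the explicit loop, without building and concatenating per-epoch frames.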

# load data into fitgrid.Epochs
p3_epochs_fg = fg.epochs_from_dataframe(
    p3_epochs_df, epoch_id="epoch_id", time="time_ms", channels=channels
)

Average ERPs by condition: \(\sim \mathsf{0 + stim}\)

Suppressing the intercept term in the patsy model formula triggers full-rank dummy (indicator) coding of the categorical variable: one indicator column per level. The estimated coefficients are identical to the average ERPs in each condition. The minimal design matrix in Why this works below illustrates dummy coding for one categorical variable with three levels.
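To see the effect of the intercept on the coding, here is a quick patsy sketch on a toy two-level factor (the data are made up; only the column names matter):

```python
from patsy import dmatrix

# toy data: a two-level categorical variable, values made up
data = {"stim": ["standard", "target", "standard", "target"]}

# intercept retained: reduced-rank treatment coding, one level is dropped
with_intercept = dmatrix("1 + stim", data=data, return_type="dataframe")
print(list(with_intercept.columns))  # ['Intercept', 'stim[T.target]']

# intercept suppressed: full-rank dummy coding, one indicator per level
no_intercept = dmatrix("0 + stim", data=data, return_type="dataframe")
print(list(no_intercept.columns))  # ['stim[standard]', 'stim[target]']
```

With the intercept suppressed, each row of the design matrix has exactly one indicator set to 1, which is what makes the fitted coefficients the per-condition means.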

lmg_0_stim = fg.lm(p3_epochs_fg, RHS="0 + stim")

Out:

100%|##########| 375/375 [00:07<00:00, 52.67it/s]

Parameter estimates = Smith & Kutas (2015) regression ERPs

beta_hats = lmg_0_stim.params
beta_hats
                            MiPf      MiCe      MiPa      MiOc
time_ms
-748    stim[cal]       0.025192  0.023050  0.021799 -0.041392
        stim[standard]  2.235214  0.628217  2.202564  2.075095
        stim[target]    0.890926  0.909482  3.704045  2.489043
-744    stim[cal]       0.013173  0.029130 -0.018333 -0.015801
        stim[standard]  2.304857  0.758107  2.443701  2.423396
...                          ...       ...       ...       ...
744     stim[standard] -2.059429  0.405358 -0.481777 -1.198918
        stim[target]    1.159444 -3.025486  2.555300  5.525375
748     stim[cal]       0.037212  0.052357  0.018305  0.012487
        stim[standard] -1.866571  0.507221 -0.323304 -1.001864
        stim[target]    1.733518 -2.055181  3.420878  5.769298

1125 rows × 4 columns



Parameter estimate standard errors

bses = lmg_0_stim.bse
bses
                            MiPf      MiCe      MiPa      MiOc
time_ms
-748    stim[cal]       1.717181  0.850727  0.886257  0.724336
        stim[standard]  1.480024  0.733234  0.763858  0.624299
        stim[target]    3.370163  1.669648  1.739380  1.421591
-744    stim[cal]       1.750222  0.856846  0.901908  0.729748
        stim[standard]  1.508502  0.738508  0.777347  0.628964
...                          ...       ...       ...       ...
744     stim[standard]  0.616983  0.704265  0.760638  0.676497
        stim[target]    1.404932  1.603682  1.732049  1.540452
748     stim[cal]       0.718129  0.818604  0.881669  0.780283
        stim[standard]  0.618949  0.705548  0.759903  0.672520
        stim[target]    1.409409  1.606603  1.730375  1.531395

1125 rows × 4 columns



Visualize parameter estimates ± standard error

from matplotlib import pyplot as plt

# label index columns for pandas groupby
for attr_df in [beta_hats, bses]:
    attr_df.index.set_names(["time_ms", "beta_hats"], inplace=True)

for beta_hat, vals in beta_hats.groupby("beta_hats"):
    vals.reset_index('beta_hats', inplace=True, drop=True)
    times = vals.index.to_numpy()
    bse = bses.query("beta_hats==@beta_hat")

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.set_title(beta_hat)
    ax.set(
        xlabel="Time (ms)",
        xlim=(times[0], times[-1]),
        ylabel=r"$\mu$V",
        ylim=(-15, 15),
    )
    ax.axhline(0, color="lightgray", lw=1)
    ax.axvline(0, color="gray", lw=1)

    for chan in vals.columns:
        ax.plot(times, vals[chan], label=chan)
        ax.fill_between(
            times, vals[chan] - bse[chan], vals[chan] + bse[chan], alpha=0.2
        )
    ax.legend(loc=(1.05, 0.5))
(Figures: regression ERP waveforms ± standard error, one panel each for stim[cal], stim[standard], and stim[target])

Why this works

Here is a small (“right hand side”) design matrix for 9 observations of a categorical variable with 3 levels. There is no intercept (constant) and when one of the 3 regressors is 1, the others are 0. The \(\hat{\beta}\) weights that minimize overall error are the means of the data at each level of the categorical variable.

from patsy import demo_data, dmatrix

cat_2 = demo_data("a", nlevels=3, min_rows=8)
dmatrix("0 + a", data=cat_2, return_type="dataframe")
   a[a1]  a[a2]  a[a3]
0    1.0    0.0    0.0
1    0.0    1.0    0.0
2    0.0    0.0    1.0
3    1.0    0.0    0.0
4    0.0    1.0    0.0
5    0.0    0.0    1.0
6    1.0    0.0    0.0
7    0.0    1.0    0.0
8    0.0    0.0    1.0
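A quick numpy check of that claim: ordinary least squares on a no-intercept indicator matrix recovers the per-level means (the numbers are made up):

```python
import numpy as np

# 9 observations of a 3-level factor, interleaved a1, a2, a3 as above
y = np.array([1.0, 10.0, 100.0,
              3.0, 12.0, 102.0,
              5.0, 14.0, 104.0])

# full-rank dummy coding: one indicator column per level, no intercept
X = np.tile(np.eye(3), (3, 1))

# least squares solution: beta_hat minimizes ||y - X @ beta||
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # [  3.  12. 102.] == the mean of y at each level
```

Because the indicator columns are mutually exclusive, each coefficient is fit independently and lands on the mean of the observations in its level.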


For EEG data, the “means of the data at each level of the categorical variable” are the time-domain average ERPs. In the sample data, the categorical stimulus variable has three levels: standard, target, and cal for the 10 \(\mu\mathsf{V}\) calibration square wave.

We can reach into one cell of the FitGrid at time = 0 and channel = MiPa and pull out the design matrix. The three-column indicator coding is the same as in the demo_data example, except for the column labels and hundreds of observations instead of 9.

lmg_0_stim[0, "MiPa"].model.exog_names.unstack(-2)
                         MiPa
                0               1             2
time_ms
0        0  stim[cal]  stim[standard]  stim[target]


lmg_0_stim[0, "MiPa"].model.exog.unstack(-1)

Out:

/home/runner/work/fitgrid-dev/fitgrid-dev/fitgrid/fitgrid.py:302: FutureWarning: inplace is deprecated and will be removed in a future version.
  temp.index.set_levels(self.epoch_index, level=i, inplace=True)
                  MiPa
                     0    1    2
time_ms epoch_id
0       0          0.0  0.0  1.0
        1          0.0  0.0  1.0
        2          0.0  0.0  1.0
        3          0.0  0.0  1.0
        4          0.0  0.0  1.0
...                ...  ...  ...
        596        1.0  0.0  0.0
        597        1.0  0.0  0.0
        598        1.0  0.0  0.0
        599        1.0  0.0  0.0
        600        1.0  0.0  0.0

542 rows × 3 columns



Total running time of the script: ( 0 minutes 10.492 seconds)
