This is a list of tools available in fitgrid.

Data Ingestion

Functions that read epochs tables to create Epochs objects, and that load saved FitGrid objects.

fitgrid.epochs_from_dataframe(dataframe, time, epoch_id, channels)[source]

Construct Epochs object from a Pandas DataFrame epochs table.

The DataFrame should contain columns with names defined by epoch_id and time as index columns.

Parameters
  • dataframe (pandas DataFrame) – a pandas DataFrame object

  • time (str) – time column name

  • epoch_id (str) – epoch identifier column name

  • channels (list of str) – list of string channel names

Returns

epochs – an Epochs object with the data

Return type

Epochs

fitgrid.epochs_from_hdf(filename, key, time, epoch_id, channels)[source]

Construct Epochs object from an HDF5 file containing an epochs table.

The HDF5 file should contain columns with names defined by epoch_id and time, either as index columns or as regular columns. Accepting regular columns is a convenience; in general, input epochs tables should contain these columns in the index.

Parameters
  • filename (str) – HDF5 file name

  • key (str) – group identifier for the dataset when HDF5 file contains more than one

  • time (str) – time column name

  • epoch_id (str) – epoch identifier column name

  • channels (list of str) – list of string channel names

Returns

epochs – an Epochs object with the data

Return type

Epochs

fitgrid.load_grid(filename)[source]

Load a FitGrid object from file (created by running grid.save).

Parameters

filename (str) – indicates file to load from

Returns

grid – loaded FitGrid object

Return type

FitGrid

Data Simulation

fitgrid has a built-in function that generates data and creates Epochs:

fitgrid.generate(n_epochs=10, n_samples=100, n_categories=2, n_channels=32, time='time', epoch_id='epoch_id', seed=None, return_type='epochs')[source]

Return Epochs object or pandas.DataFrame with fake EEG data.

Parameters
  • n_epochs (int) – number of epochs per category to be generated

  • n_samples (int) – number of samples in a single epoch

  • n_categories (int) – number of levels of the categorical variable

  • n_channels (int) – number of time series representing EEG channels

  • time (str, defaults to defaults.TIME) – time column name

  • epoch_id (str, defaults to defaults.EPOCH_ID) – epoch identifier column name

  • seed=None ({None, int, array_like}, optional) – Random number generation seed. Default=None lets data vary from run to run. Set seed to a 32-bit unsigned integer to generate the same fake data run to run. See numpy.random.RandomState for details.

  • return_type (str {epochs, dataframe}) – return fitgrid.Epochs or the fitgrid.Epochs.table dataframe

Returns

epochs – Epochs object or just the data

Return type

fitgrid.Epochs or pandas.DataFrame

Notes

n_epochs and n_categories interact in the sense that n_epochs epochs are generated for each level of the categorical variable. In other words, the true number of epochs in the generated data is equal to n_epochs * n_categories.

For example, the default n_epochs = 10 and n_categories = 2 produces 20 epochs, 10 per category.
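The arithmetic in these notes can be checked with a few lines of plain Python. This is only an illustrative sketch: expected_epoch_count and expected_row_count are hypothetical helpers, not part of fitgrid, and the row count assumes the generated table holds one row per epoch per sample.

```python
def expected_epoch_count(n_epochs=10, n_categories=2):
    """Total epochs: n_epochs are generated for each category level."""
    return n_epochs * n_categories


def expected_row_count(n_epochs=10, n_samples=100, n_categories=2):
    """Rows in a long-format table, ASSUMING one row per epoch per sample."""
    return expected_epoch_count(n_epochs, n_categories) * n_samples


# Defaults mirror the fitgrid.generate signature shown above.
total_epochs = expected_epoch_count()
total_rows = expected_row_count()
print(total_epochs, total_rows)
```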

Epochs methods

Models and plotting.

fitgrid.epochs.Epochs.plot_averages(self, channels=None, negative_up=True)

Plot grand mean averages for each channel, negative up by default.

Parameters
  • channels (list of str, optional, defaults to all channels) – list of channel names to plot the averages

  • negative_up (bool, optional, default True) – by convention, ERPs are plotted negative voltage up

Returns

  • fig (matplotlib.figure.Figure) – figure containing plots

  • axes (numpy.ndarray of matplotlib.axes.Axes) – axes objects

Model running

fitgrid.lm(epochs, LHS=None, RHS=None, parallel=False, n_cores=4, quiet=False, eval_env=4)[source]

Run ordinary least squares linear regression on the epochs.

Parameters
  • epochs (Epochs) – epochs object on which regression is to be run

  • LHS (list of str, optional, defaults to all channels) – list of channels for the left hand side of the regression formula

  • RHS (str) – right hand side of the regression formula

  • parallel (bool, defaults to False) – change to True to run in parallel

  • n_cores (int, defaults to 4) – number of processes to use for computation

  • quiet (bool, defaults to False) – set to True to disable fitting progress bar

  • eval_env (int or patsy.EvalEnvironment, defaults to 4) – environment to use for evaluating patsy formulas, see patsy docs

Returns

grid – LMFitGrid object containing the results of the regression

Return type

LMFitGrid

fitgrid.lmer(epochs, LHS=None, RHS=None, family='gaussian', conf_int='Wald', factors=None, permute=None, ordered=False, REML=True, parallel=False, n_cores=4, quiet=False)[source]

Fit lme4 linear mixed model by interfacing with R.

Parameters
  • epochs (Epochs) – epochs object on which lmer is to be run

  • LHS (list of str, optional, defaults to all channels) – list of channels for the left hand side of the lmer formula

  • RHS (str) – right hand side of the lmer formula

  • family (str, defaults to ‘gaussian’) – distribution link function to use

  • conf_int (str, defaults to ‘Wald’)

  • factors (dict, optional) – Keys should be column names in data to treat as factors. Values should be a list of the unique variable levels if dummy coding or polynomial coding is desired. Otherwise, values should themselves be dictionaries with unique variable levels as keys and the desired contrast values (as specified in R!) as values.

  • permute (int, defaults to None) – if non-zero, computes parameter significance tests by permuting test statistics rather than parametrically. Permutation is done by shuffling observations within clusters to respect the random effects structure of the data.

  • ordered (bool, defaults to False) – whether factors should be treated as ordered polynomial contrasts; this will parameterize a model with K-1 orthogonal polynomial regressors beginning with a linear contrast based on the factor order provided

  • REML (bool, defaults to True) – change to False to use ML estimation

  • parallel (bool, defaults to False) – change to True to run in parallel

  • n_cores (int, defaults to 4) – number of processes to use for computation

  • quiet (bool, defaults to False) – set to True to disable fitting progress bar

Returns

grid – LMERFitGrid object containing the results of lmer fitting

Return type

LMERFitGrid
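The two shapes the factors argument can take may be easier to see written out. The sketch below is illustrative only: the column name, level names, and contrast values are made up, and check_factors is a hypothetical helper rather than a fitgrid function.

```python
# Hypothetical column and level names, for illustration only.

# Form 1: a list of the unique levels, for dummy or polynomial coding.
factors_by_levels = {"categorical": ["cat0", "cat1"]}

# Form 2: a dict mapping each level to a contrast value (R conventions).
factors_by_contrasts = {"categorical": {"cat0": -0.5, "cat1": 0.5}}


def check_factors(factors):
    """Verify each value is a list of levels or a level -> contrast dict."""
    for column, spec in factors.items():
        if not isinstance(column, str):
            raise TypeError("factor keys must be column names (str)")
        if not isinstance(spec, (list, dict)):
            raise TypeError(
                f"{column}: expected list or dict, got {type(spec).__name__}"
            )
    return True


print(check_factors(factors_by_levels), check_factors(factors_by_contrasts))
```

Either dictionary would then be passed as factors=... alongside the RHS formula.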

fitgrid.run_model(epochs, function, channels=None, parallel=False, n_cores=4, quiet=False)[source]

Run an arbitrary model on the epochs.

Parameters
  • epochs (Epochs) – the epochs object on which the model is to be run

  • function (Python function) – function that runs a model, see Notes below for details

  • channels (list of str) – list of channels to serve as dependent variables

  • parallel (bool, defaults to False) – set to True in order to run in parallel

  • n_cores (int, defaults to 4) – number of processes to run in parallel

  • quiet (bool, defaults to False) – set to True to disable progress bar display

Returns

grid – a FitGrid object containing the results

Return type

FitGrid

Notes

The function should take two parameters, data and channel, run some model on the data, and return an object containing the results. data will be a snapshot across epochs at a single timepoint, containing all channels of interest. channel is the name of the target variable that the function runs the model against (uses it as the dependent variable).

Examples

Here’s an example of a function that can be passed to run_model:

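from statsmodels.formula.api import ols

def regression(data, channel):
    # build a formula with the channel as the dependent variable
    formula = channel + ' ~ continuous + categorical'
    return ols(formula, data).fit()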
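To make the data and channel contract concrete, here is a toy, pure-Python sketch of the kind of loop the Notes describe: group rows by timepoint, then call the function once per channel on each snapshot. The names run_model_sketch and channel_mean are hypothetical, the rows-as-dicts layout is an assumption made for illustration, and fitgrid's actual implementation may differ.

```python
from collections import defaultdict


def channel_mean(data, channel):
    """Toy 'model': the mean of one channel across epochs at one timepoint."""
    values = [row[channel] for row in data]
    return sum(values) / len(values)


def run_model_sketch(rows, function, channels, time="time"):
    """Call function(data, channel) for every (timepoint, channel) cell."""
    snapshots = defaultdict(list)
    for row in rows:
        snapshots[row[time]].append(row)  # one snapshot per timepoint
    return {
        (t, channel): function(data, channel)
        for t, data in sorted(snapshots.items())
        for channel in channels
    }


# Two epochs, two timepoints, two channels (made-up numbers).
rows = [
    {"epoch_id": 0, "time": 0, "ch0": 1.0, "ch1": 10.0},
    {"epoch_id": 0, "time": 1, "ch0": 2.0, "ch1": 20.0},
    {"epoch_id": 1, "time": 0, "ch0": 3.0, "ch1": 30.0},
    {"epoch_id": 1, "time": 1, "ch0": 4.0, "ch1": 40.0},
]
grid = run_model_sketch(rows, channel_mean, channels=["ch0", "ch1"])
print(grid[(0, "ch0")], grid[(1, "ch1")])
```

The result holds one entry per timepoint and channel, which is the shape a grid of fitted results would take.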

FitGrid methods

fitgrid.fitgrid.FitGrid.save(self, filename)

Save FitGrid object to file (reload with fitgrid.load_grid).

Parameters

filename (str) – file name to use

LMFitGrid methods

Plotting and statistics.

fitgrid.fitgrid.LMFitGrid.influential_epochs(self, top=None)

Return dataframe with top influential epochs ranked by Cook’s-D.

Parameters

top (int, optional, default None) – how many top epochs to return, all epochs by default

Returns

top_epochs – dataframe with epoch_id as index and aggregated Cook’s-D as values

Return type

pandas DataFrame

Notes

Cook’s distance is aggregated by simple averaging across time and channels.
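The aggregation described in this note can be sketched in plain Python. The values and the (time, epoch_id, channel) keying below are made up for illustration; this is not fitgrid's implementation, only the same averaging idea.

```python
from collections import defaultdict
from statistics import mean

# Made-up Cook's D values keyed by (time, epoch_id, channel).
cooks_d = {
    (0, "e0", "ch0"): 0.10, (0, "e0", "ch1"): 0.30,
    (1, "e0", "ch0"): 0.20, (1, "e0", "ch1"): 0.40,
    (0, "e1", "ch0"): 0.01, (0, "e1", "ch1"): 0.03,
    (1, "e1", "ch0"): 0.02, (1, "e1", "ch1"): 0.04,
}

# Collect every value belonging to an epoch, ignoring time and channel.
per_epoch = defaultdict(list)
for (_time, epoch_id, _channel), value in cooks_d.items():
    per_epoch[epoch_id].append(value)

# Simple average across time and channels, ranked from most influential.
ranked = sorted(
    ((epoch_id, mean(values)) for epoch_id, values in per_epoch.items()),
    key=lambda item: item[1],
    reverse=True,
)
top = 1  # plays the role of the `top` argument
print(ranked[:top])
```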

fitgrid.fitgrid.LMFitGrid.plot_betas(self, legend_on_bottom=False)

Plot betas of the model, one plot per channel, overplotting betas.

Parameters

legend_on_bottom (bool, defaults to False) – set to True to plot single legend below all channel plots

Returns

  • fig (matplotlib.figure.Figure) – figure containing plots

  • axes (numpy.ndarray of matplotlib.axes.Axes) – axes objects

fitgrid.fitgrid.LMFitGrid.plot_adj_rsquared(self)

Plot adjusted \(R^2\) as a heatmap with marginal bar and line.

Returns

  • fig (matplotlib.figure.Figure) – figure containing plots

  • gs (matplotlib.gridspec.GridSpec) – grid specification that determines locations and sizes of subplots

  • bar, heatmap, colorbar, line (matplotlib.axes._subplots.AxesSubplot) – subplot objects

Utilities

model fit summaries

fitgrid.utils.summary.summarize(epochs_fg, modeler, LHS, RHS, parallel=True, n_cores=4, **kwargs)[source]

Fit the data with one or more model formulas and return summary information.

Convenience wrapper, useful for keeping memory use manageable when gathering betas and fit measures for a stack of models.

Parameters
  • epochs_fg (fitgrid.epochs.Epochs) – as returned by fitgrid.epochs_from_dataframe() or fitgrid.epochs_from_hdf(), NOT a pandas.DataFrame.

  • modeler ({‘lm’, ‘lmer’}) – class of model to fit, lm for OLS, lmer for linear mixed-effects. Note: the RHS formula language must match the modeler.

  • LHS (list of str) – the data columns to model

  • RHS (model formula or list of model formulas to fit) – see the Python package patsy docs for the lm formula language and the R library lme4 docs for the lmer formula language.

  • parallel (bool)

  • n_cores (int) – number of cores to use. Experiment to see what works, but observe the golden rule when running on a shared machine.

  • **kwargs (key=value arguments passed to the modeler, optional)

Returns

summary_df – indexed by timestamp, model_formula, beta, and key, where the keys are ll.l_ci, uu.u_ci, AIC, DF, Estimate, P-val, SE, T-stat, has_warning, logLike.

Return type

pandas.DataFrame

Examples

>>> lm_formulas = [
    '1 + fixed_a + fixed_b + fixed_a:fixed_b',
    '1 + fixed_a + fixed_b',
    '1 + fixed_a',
    '1 + fixed_b',
    '1',
]
>>> lm_summary_df = fitgrid.utils.summarize(
    epochs_fg,
    'lm',
    LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
    RHS=lm_formulas,
    parallel=True,
    n_cores=4
)
>>> lmer_formulas = [
    '1 + fixed_a + (1 + fixed_a | random_a) + (1 | random_b)',
    '1 + fixed_a + (1 | random_a) + (1 | random_b)',
    '1 + fixed_a + (1 | random_a)',
]
>>> lmer_summary_df = fitgrid.utils.summarize(
    epochs_fg,
    'lmer',
    LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
    RHS=lmer_formulas,
    parallel=True,
    n_cores=12,
    REML=False
)

fitgrid.utils.summary.plot_betas(summary_df, LHS, alpha=0.05, fdr=None, figsize=None, s=None, df_func=None, **kwargs)[source]

Plot model parameter estimates for each data column in LHS

Parameters
  • summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize

  • LHS (list of str) – column names of the data; see the fitgrid.fitgrid docs

  • alpha (float) – alpha level for false discovery rate correction

  • fdr (str {None, ‘BY’, ‘BH’}) – Add markers for FDR adjusted significant \(p\)-values. BY is Benjamini and Yekutieli, BH is Benjamini and Hochberg, None suppresses the markers.

  • df_func ({None, function}) – plot function(degrees of freedom), e.g., np.log10, lambda x: x

  • s (float) – scatterplot marker size for BH and lmer decorations

  • kwargs (dict) – keyword args passed to pyplot.subplots()

Returns

figs

Return type

list

fitgrid.utils.summary.plot_AICmin_deltas(summary_df, figsize=None, gridspec_kw=None, **kwargs)[source]

Plot FitGrid min delta AICs and fitter warnings.

Thresholds of AIC_min delta at 2, 4, 7, 10 are from Burnham & Anderson 2004, see Notes.

Parameters
  • summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize

  • figsize (2-tuple) – pyplot.figure figure size parameter

  • gridspec_kw (dict) – matplotlib.gridspec key : value parameters

  • kwargs (dict) – keyword args passed to plt.subplots(…)

Returns

f, axs

Return type

matplotlib.pyplot.Figure

Notes

[BurAnd2004] p. 270-271. Where \(AIC_{min}\) is the lowest AIC value for “a set of a priori candidate models well-supported by the underlying science (\(g_{i}, i = 1, 2, ..., R\))”,

\[\Delta_{i} = AIC_{i} - AIC_{min}\]

“is the information loss experienced if we are using fitted model \(g_{i}\) rather than the best model, \(g_{min}\) for inference.” …

“Some simple rules of thumb are often useful in assessing the relative merits of models in the set: Models having \(\Delta_{i} \leq 2\) have substantial support (evidence), those in which \(4 \leq \Delta_{i} \leq 7\) have considerably less support, and models having \(\Delta_{i} > 10\) have essentially no support.”
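As a worked illustration of the delta and the quoted rules of thumb, the sketch below computes \(\Delta_{i}\) for a few made-up AIC values and labels each one. The helper names and the numbers are hypothetical; ranges the quotation does not mention (between 2 and 4, and between 7 and 10) are deliberately left unlabeled.

```python
def aic_min_deltas(aics):
    """Return Delta_i = AIC_i - AIC_min for each model in a dict of AICs."""
    aic_min = min(aics.values())
    return {model: aic - aic_min for model, aic in aics.items()}


def support(delta):
    """Label a delta using only the ranges in the quoted rule of thumb."""
    if delta <= 2:
        return "substantial"
    if 4 <= delta <= 7:
        return "considerably less"
    if delta > 10:
        return "essentially none"
    return "not covered by the quoted rule"


# Made-up AIC values for three hypothetical candidate models.
aics = {"1 + a + b": 100.0, "1 + a": 105.5, "1": 131.0}
deltas = aic_min_deltas(aics)
for model, delta in deltas.items():
    print(model, delta, support(delta))
```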

lm diagnostics

fitgrid.utils.lm.get_vifs(epochs, RHS, quiet=False)[source]

fitgrid.utils.lm.list_diagnostics()[source]

Display statsmodels diagnostics implemented in fitgrid.utils.lm

fitgrid.utils.lm.get_diagnostic(lm_grid, diagnostic, do_nobs_loop=False)[source]

Fetch statsmodels diagnostic as a Time x Channel dataframe

statsmodels implements a variety of data and model diagnostic measures. For some, it also computes a version of a recommended critical value or \(p\)-value. Use these at your own risk after careful study of the statsmodels source code. For details visit statsmodels.stats.outliers_influence.OLSInfluence.html

For a catalog of the measures available for fitgrid.lm(), run this in Python:

>>> fitgrid.utils.lm.list_diagnostics()

Warning

Data diagnostics can be very large and very slow, see Notes for details.

  • By default all values of the diagnostics are computed; this dataframe can be pruned with the fitgrid.utils.lm.filter_diagnostic() function.

  • By default slow diagnostics are not computed; this can be forced by setting do_nobs_loop=True.

Parameters
  • lm_grid (fitgrid.LMFitGrid) – As returned by fitgrid.lm().

  • diagnostic (string) – As implemented in statsmodels, e.g., “cooks_distance”, “dffits_internal”, “est_std”, “dfbetas”.

  • do_nobs_loop (bool) – True forces slow leave-one-observation-out model refitting.

Returns

  • diagnostic_df (pandas.DataFrame) – Channels are in columns. Model measures are row indexed by time; data measures add an epoch row index; parameter measures add a parameter row index.

  • sm_1_df (pandas.DataFrame) – The supplementary values statsmodels returns, or None, same shape as diagnostic_df.

Notes

  • Size: diagnostic_df values for data measures like cooks_distance and hat_matrix_diagonal are the size of the original data plus a row index. For some data measures, like dfbetas, they are the size of the data multiplied by the number of regressors in the model.

  • Speed: Leave-one-observation-out (LOOO) model refitting takes as long as it takes to fit one model multiplied by the number of observations. This can be intractable for large datasets. Diagnostic measures calculated from the original fit like cooks_distance and dffits_internal are tractable even for large data sets.
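A back-of-the-envelope estimate can help decide whether a data diagnostic is worth requesting. The sketch below only restates the two notes as arithmetic; the helper names are hypothetical, the shape (100 timepoints, 20 epochs, 32 channels, 3 regressors) is an assumed example, and the timing figure is invented.

```python
def diagnostic_sizes(n_times, n_epochs, n_channels, n_regressors):
    """Rough value counts for data-level diagnostics (illustrative only)."""
    data_values = n_times * n_epochs * n_channels  # size of the original data
    return {
        "cooks_distance": data_values,  # one value per datapoint
        "dfbetas": data_values * n_regressors,  # one per datapoint per regressor
    }


def looo_seconds(seconds_per_fit, n_observations):
    """LOOO refitting time for one model: one refit per observation."""
    return seconds_per_fit * n_observations


# Assumed shape: 100 timepoints, 20 epochs, 32 channels, 3 regressors.
sizes = diagnostic_sizes(n_times=100, n_epochs=20, n_channels=32, n_regressors=3)
print(sizes)

# Invented timing: 0.01 s per fit, 20 observations in one model.
estimate = looo_seconds(seconds_per_fit=0.01, n_observations=20)
print(estimate)
```

Note that the LOOO figure is per fitted model; a full grid repeats it for every timepoint and channel.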

Examples

# fake data
epochs_fg = fitgrid.generate()
lm_grid = fitgrid.lm(
    epochs_fg,
    LHS=epochs_fg.channels,
    RHS='continuous + categorical',
    parallel=True,
    n_cores=4,
)

# data diagnostic, one dataframe with the values
ess_press, _ = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'ess_press'
)

# Cook's D dataframe AND the p-values statsmodels computes
cooks_Ds, sm_pvals = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'cooks_distance'
)

# this fails because it requires LOOO loop
dfbetas_df, _  = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'dfbetas'
)

# this succeeds by forcing LOOO loop calculation
dfbetas_df, _  = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'dfbetas',
    do_nobs_loop=True
)

fitgrid.utils.lm.filter_diagnostic(diagnostic_df, how, bound_0, bound_1=None, format='long')[source]

Select a subset of a fitgrid statsmodels diagnostic dataframe by value.

Use this to identify time points, epochs, parameters, and channels with outlying or potentially influential data.

Parameters
  • diagnostic_df (pandas.DataFrame) – As returned by fitgrid.utils.lm.get_diagnostic()

  • how ({‘above’, ‘below’, ‘inside’, ‘outside’}) – slice diagnostic_df above or below bound_0 or inside or outside the closed interval [bound_0, bound_1].

  • bound_0 (scalar or array-like) – bound_0 is the mandatory boundary for all how. See pandas.DataFrame.gt and pandas.DataFrame.lt documents for binary comparisons with dataframes.

  • bound_1 (scalar or array-like) – bound_1 is the mandatory upper bound for how=”inside” and how=”outside”.

  • format ({‘long’, ‘wide’}) – The long format pivots the channel columns into a row index and returns just those times, (epochs, parameters), channels that pass the filter. The wide format returns filtered_df with the same shape as diagnostic_df, with those datapoints that pass the filter in their original row, column location and NaNs elsewhere.

Returns

selected_df

Return type

pandas.DataFrame
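The four how options can be pictured as a per-value test. The sketch below is a scalar, pure-Python illustration with a hypothetical passes helper; how fitgrid treats values exactly on a boundary for 'above', 'below', and 'outside' is an assumption here (strict comparisons), while 'inside' follows the closed interval stated above.

```python
def passes(value, how, bound_0, bound_1=None):
    """Return True if value passes the filter (boundary handling ASSUMED)."""
    if how == "above":
        return value > bound_0
    if how == "below":
        return value < bound_0
    if how in ("inside", "outside"):
        if bound_1 is None:
            raise ValueError(f"bound_1 is required for how={how!r}")
        inside = bound_0 <= value <= bound_1  # closed interval
        return inside if how == "inside" else not inside
    raise ValueError(f"unknown how={how!r}")


values = [-3.0, -1.0, 0.0, 1.0, 3.0]
above = [v for v in values if passes(v, "above", 1.0)]
outside = [v for v in values if passes(v, "outside", -1.0, 1.0)]
print(above, outside)
```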

lmer diagnostics

fitgrid.utils.lmer.get_lmer_dfbetas(epochs, factor, **kwargs)[source]

Fit lmers leaving out factor levels one by one, compute DFBETAS.

Parameters
  • epochs (Epochs) – Epochs object

  • factor (str) – column name of the factor of interest

  • **kwargs – keyword arguments to pass on to fitgrid.lmer, like RHS

Returns

dfbetas – dataframe containing DFBETAS values

Return type

pandas.DataFrame

Examples

Example calculation showing how to pass in model fitting parameters:

dfbetas = fitgrid.utils.lmer.get_lmer_dfbetas(
    epochs=epochs,
    factor='subject_id',
    RHS='x + (x|a)'
)

Notes

DFBETAS is computed according to the following formula [NieGroPel2012]:

\[DFBETAS_{ij} = \frac{\hat{\gamma}_i - \hat{\gamma}_{i(-j)}}{se\left(\hat{\gamma}_{i(-j)}\right)}\]

for parameter \(i\) and level \(j\) of factor.
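The formula reduces to a single subtraction and division per parameter and factor level. A minimal sketch with made-up numbers follows; the dfbetas function here is a hypothetical stand-alone helper, not the fitgrid routine, which also handles the refitting.

```python
def dfbetas(estimate_full, estimate_without_j, se_without_j):
    """DFBETAS for one parameter i and one left-out factor level j."""
    return (estimate_full - estimate_without_j) / se_without_j


# Made-up numbers: the full-data estimate is 2.0; with level j left out
# the estimate is 1.5 and its standard error is 0.25.
value = dfbetas(estimate_full=2.0, estimate_without_j=1.5, se_without_j=0.25)
print(value)
```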