This is a list of tools available in fitgrid.

Data Ingestion¶

Functions that read epochs tables to create Epochs objects, and that load saved FitGrid objects.

fitgrid.epochs_from_dataframe(dataframe, time, epoch_id, channels)[source]

Construct Epochs object from a Pandas DataFrame epochs table.

The DataFrame should contain columns with names defined by epoch_id and time as index columns.

Parameters

dataframe (pandas DataFrame) – a pandas DataFrame object
time (str) – time column name
epoch_id (str) – epoch identifier column name
channels (list of str) – list of string channel names

Returns

epochs – an Epochs object with the data

Return type

Epochs

fitgrid.epochs_from_hdf(filename, key, time, epoch_id, channels)[source]

Construct Epochs object from an HDF5 file containing an epochs table.

The HDF5 file should contain columns with names defined by epoch_id and time, either as index columns or as regular columns. Accepting regular columns is a convenience; in general, input epochs tables should contain these columns in the index.

Parameters

filename (str) – HDF5 file name
key (str) – group identifier for the dataset when HDF5 file contains more than one
time (str) – time column name
epoch_id (str) – epoch identifier column name
channels (list of str) – list of string channel names

Returns

epochs – an Epochs object with the data

Return type

Epochs

fitgrid.load_grid(filename)[source]

Load a FitGrid object from file (created by running grid.save).

Parameters: filename (str) – indicates file to load from
Returns: grid – loaded FitGrid object
Return type: FitGrid

Data Simulation¶

fitgrid has a built-in function that generates data and creates Epochs:

fitgrid.generate(n_epochs=10, n_samples=100, n_categories=2, n_channels=32, time='time', epoch_id='epoch_id', seed=None, return_type='epochs')[source]

Return Epochs object or pandas.DataFrame with fake EEG data.

Parameters

n_epochs (int) – number of epochs per category to be generated
n_samples (int) – number of samples in a single epoch
n_categories (int) – number of levels of the categorical variable
n_channels (int) – number of time series representing EEG channels
time (str, defaults to defaults.TIME) – time column name
epoch_id (str, defaults to defaults.EPOCH_ID) – epoch identifier column name
seed=None ({None, int, array_like}, optional) – Random number generation seed. Default=None lets data vary from run to run. Set seed to a 32-bit unsigned integer to generate the same fake data run to run. See numpy.random.RandomState for details.
return_type (str {epochs, dataframe}) – return fitgrid.Epochs or the fitgrid.Epochs.table dataframe

Returns

epochs – Epochs object or just the data

Return type

fitgrid.Epochs or pandas.DataFrame

Notes

n_epochs and n_categories interact in the sense that n_epochs epochs are generated for each level of the categorical variable. In other words, the true number of epochs in the generated data is equal to n_epochs * n_categories.

For example, the default n_epochs = 10 and n_categories = 2 produces 20 epochs, 10 per category.
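
The arithmetic in this note can be checked with a short sketch. The helper name `expected_epochs_shape` is hypothetical and not part of fitgrid, and the row count assumes the epochs table holds one row per epoch per sample.

```python
def expected_epochs_shape(n_epochs=10, n_samples=100, n_categories=2):
    """Return (total epochs, total rows) implied by the generate() arguments.

    Hypothetical helper for illustration only; not part of fitgrid.
    Assumes one table row per (epoch, sample) pair.
    """
    total_epochs = n_epochs * n_categories  # n_epochs per category level
    total_rows = total_epochs * n_samples
    return total_epochs, total_rows


print(expected_epochs_shape())  # (20, 2000) with the default arguments
```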

`Epochs` methods¶

Models and plotting.

fitgrid.epochs.Epochs.plot_averages(self, channels=None, negative_up=True)

Plot grand mean averages for each channel, negative up by default.

Parameters

channels (list of str, optional, defaults to all channels) – list of channel names to plot the averages
negative_up (bool, optional, default True) – by convention, ERPs are plotted negative voltage up

Returns

fig (matplotlib.figure.Figure) – figure containing plots
axes (numpy.ndarray of matplotlib.axes.Axes) – axes objects

Model running¶

fitgrid.lm(epochs, LHS=None, RHS=None, parallel=False, n_cores=4, quiet=False, eval_env=4)[source]

Run ordinary least squares linear regression on the epochs.

Parameters

epochs (Epochs) – epochs object on which regression is to be run
LHS (list of str, optional, defaults to all channels) – list of channels for the left hand side of the regression formula
RHS (str) – right hand side of the regression formula
parallel (bool, defaults to False) – change to True to run in parallel
n_cores (int, defaults to 4) – number of processes to use for computation
quiet (bool, defaults to False) – set to True to disable fitting progress bar
eval_env (int or patsy.EvalEnvironment, defaults to 4) – environment to use for evaluating patsy formulas, see patsy docs

Returns

grid – LMFitGrid object containing the results of the regression

Return type

LMFitGrid

fitgrid.lmer(epochs, LHS=None, RHS=None, family='gaussian', conf_int='Wald', factors=None, permute=None, ordered=False, REML=True, parallel=False, n_cores=4, quiet=False)[source]

Fit lme4 linear mixed model by interfacing with R.

Parameters

epochs (Epochs) – epochs object on which lmer is to be run
LHS (list of str, optional, defaults to all channels) – list of channels for the left hand side of the lmer formula
RHS (str) – right hand side of the lmer formula
family (str, defaults to ‘gaussian’) – distribution link function to use
conf_int (str, defaults to ‘Wald’)
factors (dict, optional) – Keys should be column names in data to treat as factors. Values should be either a list containing unique variable levels, if dummy coding or polynomial coding is desired, or a dictionary with unique variable levels as keys and desired contrast values (as specified in R!) as values.
permute (int, defaults to None) – if non-zero, computes parameter significance tests by permuting test statistics rather than parametrically. Permutation is done by shuffling observations within clusters to respect random effects structure of data.
ordered (bool, defaults to False) – whether factors should be treated as ordered polynomial contrasts; this will parameterize a model with K-1 orthogonal polynomial regressors beginning with a linear contrast based on the factor order provided
REML (bool, defaults to True) – change to False to use ML estimation
parallel (bool, defaults to False) – change to True to run in parallel
n_cores (int, defaults to 4) – number of processes to use for computation
quiet (bool, defaults to False) – set to True to disable fitting progress bar

Returns

grid – LMERFitGrid object containing the results of lmer fitting

Return type

LMERFitGrid
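
The two forms of the factors argument described in the parameter list can be sketched as plain dictionaries. The column name `condition` and its levels are hypothetical, and the contrast values are arbitrary illustrations rather than recommended codings.

```python
# Form 1: a list of the unique levels, for dummy or polynomial coding.
factors_by_levels = {"condition": ["low", "mid", "high"]}

# Form 2: a dict mapping each unique level to a custom contrast value.
factors_by_contrasts = {"condition": {"low": -1, "mid": 0, "high": 1}}

# Either dict is the kind of value the factors parameter describes.
# Both forms name the same levels for the same column.
for factors in (factors_by_levels, factors_by_contrasts):
    for column, spec in factors.items():
        print(column, list(spec))
```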

fitgrid.run_model(epochs, function, channels=None, parallel=False, n_cores=4, quiet=False)[source]

Run an arbitrary model on the epochs.

Parameters

epochs (Epochs) – the epochs object on which the model is to be run
function (Python function) – function that runs a model, see Notes below for details
channels (list of str) – list of channels to serve as dependent variables
parallel (bool, defaults to False) – set to True in order to run in parallel
n_cores (int, defaults to 4) – number of processes to run in parallel
quiet (bool, defaults to False) – set to True to disable progress bar display

Returns

grid – a FitGrid object containing the results

Return type

FitGrid

Notes

The function should take two parameters, data and channel, run some model on the data, and return an object containing the results. data will be a snapshot across epochs at a single timepoint, containing all channels of interest. channel is the name of the target variable that the function runs the model against (uses it as the dependent variable).

Examples

Here’s an example of a function that can be passed to run_model:

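from statsmodels.formula.api import ols

def regression(data, channel):
    # build a formula with the current channel as the dependent variable
    formula = channel + ' ~ continuous + categorical'
    return ols(formula, data).fit()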
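
To make the calling convention concrete, the following plain-Python sketch mimics how a function with the (data, channel) signature could be applied at every time point and channel. It is not fitgrid's implementation: `toy_grid`, `mean_model`, and the dict-based snapshot format are assumptions made for illustration.

```python
from statistics import mean


def mean_model(data, channel):
    """Toy 'model': the mean of one channel across epochs at one time point."""
    return mean(data[channel])


def toy_grid(snapshots, function, channels):
    """Apply function(data, channel) to every (time, channel) cell.

    snapshots maps each time point to a dict of channel -> values across
    epochs (a hypothetical stand-in for the per-timepoint snapshot).
    """
    return {
        time: {channel: function(data, channel) for channel in channels}
        for time, data in snapshots.items()
    }


snapshots = {
    0: {"MiPf": [1.0, 3.0], "MiCe": [2.0, 2.0]},
    4: {"MiPf": [0.0, 4.0], "MiCe": [5.0, 7.0]},
}
grid = toy_grid(snapshots, mean_model, ["MiPf", "MiCe"])
print(grid[4]["MiCe"])  # 6.0
```

The result has one entry per time point and channel, which is the grid-of-results idea behind the returned FitGrid.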

`FitGrid` methods¶

fitgrid.fitgrid.FitGrid.save(self, filename)

Save FitGrid object to file (reload with fitgrid.load_grid).

Parameters: filename (str) – file name to use

`LMFitGrid` methods¶

Plotting and statistics.

fitgrid.fitgrid.LMFitGrid.influential_epochs(self, top=None)

Return dataframe with top influential epochs ranked by Cook’s-D.

Parameters: top (int, optional, default None) – how many top epochs to return, all epochs by default
Returns: top_epochs – dataframe with epoch_id as index and aggregated Cook’s-D as values
Return type: pandas DataFrame

Notes

Cook’s distance is aggregated by simple averaging across time and channels.
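
As a rough illustration of this aggregation, the sketch below averages Cook's distance over every time point and channel for each epoch and then ranks the epochs. The nested-dict input layout and the function name `rank_epochs_by_mean_cooks_d` are assumptions for illustration; fitgrid itself returns a dataframe.

```python
from collections import defaultdict


def rank_epochs_by_mean_cooks_d(cooks_d, top=None):
    """Average Cook's D per epoch over all (time, channel) cells, then rank.

    cooks_d maps (time, channel) -> {epoch_id: Cook's D}; hypothetical layout.
    Returns a list of (epoch_id, mean Cook's D), largest first.
    """
    values = defaultdict(list)
    for per_epoch in cooks_d.values():
        for epoch_id, d in per_epoch.items():
            values[epoch_id].append(d)
    means = {epoch_id: sum(ds) / len(ds) for epoch_id, ds in values.items()}
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    return ranked if top is None else ranked[:top]


# Made-up values, chosen so the averages are exact.
cooks_d = {
    (0, "MiPf"): {1: 0.25, 2: 0.5},
    (0, "MiCe"): {1: 0.75, 2: 0.5},
    (4, "MiPf"): {1: 1.0, 2: 0.25},
    (4, "MiCe"): {1: 0.0, 2: 0.25},
}
print(rank_epochs_by_mean_cooks_d(cooks_d, top=1))  # [(1, 0.5)]
```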

fitgrid.fitgrid.LMFitGrid.plot_betas(self, legend_on_bottom=False)

Plot betas of the model, one plot per channel, overplotting betas.

Parameters

legend_on_bottom (bool, defaults to False) – set to True to plot single legend below all channel plots

Returns

fig (matplotlib.figure.Figure) – figure containing plots
axes (numpy.ndarray of matplotlib.axes.Axes) – axes objects

fitgrid.fitgrid.LMFitGrid.plot_adj_rsquared(self)

Plot adjusted \(R^2\) as a heatmap with marginal bar and line.

Returns

fig (matplotlib.figure.Figure) – figure containing plots
gs (matplotlib.gridspec.GridSpec) – grid specification that determines locations and sizes of subplots
bar, heatmap, colorbar, line (matplotlib.axes._subplots.AxesSubplot) – subplot objects

Utilities¶

model fit summaries¶

fitgrid.utils.summary.summarize(epochs_fg, modeler, LHS, RHS, parallel=True, n_cores=4, **kwargs)[source]

Fit the data with one or more model formulas and return summary information.

Convenience wrapper, useful for keeping memory use manageable when gathering betas and fit measures for a stack of models.

Parameters

epochs_fg (fitgrid.epochs.Epochs) – as returned by fitgrid.epochs_from_dataframe() or fitgrid.epochs_from_hdf(), NOT a pandas.DataFrame.
modeler ({‘lm’, ‘lmer’}) – class of model to fit, lm for OLS, lmer for linear mixed-effects. Note: the RHS formula language must match the modeler.
LHS (list of str) – the data columns to model
RHS (model formula or list of model formulas to fit) – see the Python package patsy docs for the lm formula language and the R library lme4 docs for the lmer formula language.
parallel (bool)
n_cores (int) – number of cores to use. See what works, but follow the golden rule when running on a shared machine.
**kwargs (key=value arguments passed to the modeler, optional)

Returns

summary_df – indexed by timestamp, model_formula, beta, and key, where the keys are ll.l_ci, uu.u_ci, AIC, DF, Estimate, P-val, SE, T-stat, has_warning, logLike.

Return type

pandas.DataFrame

Examples

>>> lm_formulas = [
    '1 + fixed_a + fixed_b + fixed_a:fixed_b',
    '1 + fixed_a + fixed_b',
    '1 + fixed_a',
    '1 + fixed_b',
    '1',
]
>>> lm_summary_df = fitgrid.utils.summary.summarize(
    epochs_fg,
    'lm',
    LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
    RHS=lm_formulas,
    parallel=True,
    n_cores=4
)

>>> lmer_formulas = [
    '1 + fixed_a + (1 + fixed_a | random_a) + (1 | random_b)',
    '1 + fixed_a + (1 | random_a) + (1 | random_b)',
    '1 + fixed_a + (1 | random_a)',
]
>>> lmer_summary_df = fitgrid.utils.summary.summarize(
    epochs_fg,
    'lmer',
    LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
    RHS=lmer_formulas,
    parallel=True,
    n_cores=12,
    REML=False
)

fitgrid.utils.summary.plot_betas(summary_df, LHS, alpha=0.05, fdr=None, figsize=None, s=None, df_func=None, **kwargs)[source]

Plot model parameter estimates for each data column in LHS.

Parameters

summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize
LHS (list of str) – column names of the data to plot
alpha (float) – alpha level for false discovery rate correction
fdr (str {None, ‘BY’, ‘BH’}) – Add markers for FDR adjusted significant \(p\)-values. BY is Benjamini and Yekutieli, BH is Benjamini and Hochberg, None suppresses the markers.
df_func ({None, function}) – plot function(degrees of freedom), e.g., np.log10, lambda x: x
s (float) – scatterplot marker size for BH and lmer decorations
kwargs (dict) – keyword args passed to pyplot.subplots()

Returns

figs

Return type

list

fitgrid.utils.summary.plot_AICmin_deltas(summary_df, figsize=None, gridspec_kw=None, **kwargs)[source]

Plot FitGrid min delta AICs and fitter warnings.

Thresholds of AIC_min delta at 2, 4, 7, 10 are from Burnham & Anderson 2004, see Notes.

Parameters

summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize
figsize (2-tuple) – pyplot.figure figure size parameter
gridspec_kw (dict) – matplotlib.gridspec key : value parameters
kwargs (dict) – keyword args passed to plt.subplots(…)

Returns

f, axs

Return type

matplotlib.pyplot.Figure

Notes

[BurAnd2004] p. 270-271. Where \(AIC_{min}\) is the lowest AIC value for “a set of a priori candidate models well-supported by the underlying science (\(g_{i}, i = 1, 2, ..., R\))”,

\[\Delta_{i} = AIC_{i} - AIC_{min}\]

“is the information loss experienced if we are using fitted model \(g_{i}\) rather than the best model, \(g_{min}\) for inference.” …

“Some simple rules of thumb are often useful in assessing the relative merits of models in the set: Models having \(\Delta_{i} \leq 2\) have substantial support (evidence), those in which \(4 \leq \Delta_{i} \leq 7\) have considerably less support, and models having \(\Delta_{i} > 10\) have essentially no support.”
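
A minimal sketch of the delta computation and the quoted rules of thumb follows. The function names, model formulas, and AIC values are made up for illustration and are not part of fitgrid. The quoted rule does not label deltas between 2 and 4 or between 7 and 10, so the sketch reports those as "intermediate".

```python
def aic_min_deltas(aics):
    """Return {model: AIC_i - AIC_min} for a dict of model -> AIC."""
    aic_min = min(aics.values())
    return {model: aic - aic_min for model, aic in aics.items()}


def support(delta):
    """Label a delta using the rule-of-thumb bands from the quotation.

    Ranges the quotation leaves unlabeled are reported as 'intermediate'.
    """
    if delta <= 2:
        return "substantial"
    if 4 <= delta <= 7:
        return "considerably less"
    if delta > 10:
        return "essentially none"
    return "intermediate"


# Hypothetical AIC values for four candidate models.
aics = {"1 + a + b": 100.0, "1 + a": 101.5, "1 + b": 105.0, "1": 112.0}
deltas = aic_min_deltas(aics)
for model, delta in deltas.items():
    print(model, delta, support(delta))
```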

lm diagnostics¶

fitgrid.utils.lm.get_vifs(epochs, RHS, quiet=False)[source]

fitgrid.utils.lm.list_diagnostics()[source]

Display statsmodels diagnostics implemented in fitgrid.utils.lm.

fitgrid.utils.lm.get_diagnostic(lm_grid, diagnostic, do_nobs_loop=False)[source]

Fetch statsmodels diagnostic as a Time x Channel dataframe.

statsmodels implements a variety of data and model diagnostic measures. For some, it also computes a version of a recommended critical value or \(p\)-value. Use these at your own risk after careful study of the statsmodels source code. For details visit statsmodels.stats.outliers_influence.OLSInfluence.html

For a catalog of the measures available for fitgrid.lm(), run this in Python:

>>> fitgrid.utils.lm.list_diagnostics()

Warning

Data diagnostics can be very large and very slow, see Notes for details.

By default, all values of the diagnostics are computed; the resulting dataframe can be pruned with the fitgrid.utils.lm.filter_diagnostic() function.
By default, slow diagnostics are not computed; they can be forced by setting do_nobs_loop=True.

Parameters

lm_grid (fitgrid.LMFitGrid) – As returned by fitgrid.lm().
diagnostic (string) – As implemented in statsmodels, e.g., “cooks_distance”, “dffits_internal”, “est_std”, “dfbetas”.
do_nobs_loop (bool) – True forces slow leave-one-observation-out model refitting.

Returns

diagnostic_df (pandas.DataFrame) – Channels are in columns. Model measures are row indexed by time; data measures add an epoch row index; parameter measures add a parameter row index.
sm_1_df (pandas.DataFrame) – The supplementary values statsmodels returns, or None, same shape as diagnostic_df.

Notes

Size: for data measures like cooks_distance and hat_matrix_diagonal, diagnostic_df is the size of the original data plus a row index; for some data measures like dfbetas, it is the size of the data multiplied by the number of regressors in the model.
Speed: Leave-one-observation-out (LOOO) model refitting takes as long as it takes to fit one model multiplied by the number of observations. This can be intractable for large datasets. Diagnostic measures calculated from the original fit like cooks_distance and dffits_internal are tractable even for large data sets.
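
The size and speed notes above can be turned into a back-of-envelope estimate. The helper `diagnostic_cost` is hypothetical and not part of fitgrid. It assumes one model per time point and channel, one observation per epoch in each model, and a constant fit time per model.

```python
def diagnostic_cost(n_times, n_channels, n_epochs, n_regressors, fit_seconds):
    """Rough size and time estimates for data diagnostics (illustration only)."""
    n_data = n_times * n_channels * n_epochs  # values in the original data
    return {
        # data measures such as cooks_distance: same size as the data
        "data_measure_values": n_data,
        # measures such as dfbetas: data size times number of regressors
        "dfbetas_values": n_data * n_regressors,
        # each (time, channel) model is refit once per observation (epoch)
        "looo_seconds": n_times * n_channels * n_epochs * fit_seconds,
    }


# Made-up sizes: 100 samples, 32 channels, 20 epochs, 3 regressors.
cost = diagnostic_cost(100, 32, 20, n_regressors=3, fit_seconds=0.25)
print(cost)
```

Under these assumed numbers the refitting loop alone takes 16000 seconds, which is why the slow diagnostics are opt-in.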

Examples

import fitgrid

# fake data
epochs_fg = fitgrid.generate()
lm_grid = fitgrid.lm(
    epochs_fg,
    LHS=epochs_fg.channels,
    RHS='continuous + categorical',
    parallel=True,
    n_cores=4,
)

# data diagnostic, one dataframe with the values
ess_press, _ = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'ess_press'
)

# Cook's D dataframe AND the p-values statsmodels computes
cooks_Ds, sm_pvals = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'cooks_distance'
)

# this fails because it requires LOOO loop
dfbetas_df, _  = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'dfbetas'
)

# this succeeds by forcing LOOO loop calculation
dfbetas_df, _  = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'dfbetas',
    do_nobs_loop=True
)

fitgrid.utils.lm.filter_diagnostic(diagnostic_df, how, bound_0, bound_1=None, format='long')[source]

Select a subset of a fitgrid statsmodels diagnostic dataframe by value.

Use this to identify time points, epochs, parameters, and channels with outlying or potentially influential data.

Parameters

diagnostic_df (pandas.DataFrame) – As returned by fitgrid.utils.lm.get_diagnostic()
how ({‘above’, ‘below’, ‘inside’, ‘outside’}) – slice diagnostic_df above or below bound_0 or inside or outside the closed interval [bound_0, bound_1].
bound_0 (scalar or array-like) – bound_0 is the mandatory boundary for all how. See pandas.DataFrame.gt and pandas.DataFrame.lt documents for binary comparisons with dataframes.
bound_1 (scalar or array-like) – bound_1 is the mandatory upper bound for how="inside" and how="outside".
format ({‘long’, ‘wide’}) – The long format pivots the channel columns into a row index and returns just those times, (epochs, parameters), channels that pass the filter. The wide format returns filtered_df with the same shape as diagnostic_df, those datapoints that pass the filter in their original row, column location, nans elsewhere.

Returns

selected_df

Return type

pandas.DataFrame
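
The four `how` options can be sketched for a single scalar value as follows. This is not fitgrid's code: `passes_filter` is a hypothetical helper, and the boundary handling is an assumption (strict comparisons for 'above' and 'below', endpoints counted as inside the closed interval).

```python
def passes_filter(value, how, bound_0, bound_1=None):
    """Return True if value is selected under the given `how`.

    Assumed semantics; the actual boundary handling in fitgrid may differ.
    """
    if how == "above":
        return value > bound_0
    if how == "below":
        return value < bound_0
    if how in ("inside", "outside"):
        if bound_1 is None:
            raise ValueError("bound_1 is required for 'inside' and 'outside'")
        inside = bound_0 <= value <= bound_1
        return inside if how == "inside" else not inside
    raise ValueError(f"unknown how: {how!r}")


values = [0.1, 0.5, 0.9, 1.5]
print([v for v in values if passes_filter(v, "outside", 0.2, 1.0)])  # [0.1, 1.5]
```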

lmer diagnostics¶

fitgrid.utils.lmer.get_lmer_dfbetas(epochs, factor, **kwargs)[source]

Fit lmers leaving out factor levels one by one, compute DFBETAS.

Parameters

epochs (Epochs) – Epochs object
factor (str) – column name of the factor of interest
**kwargs – keyword arguments to pass on to fitgrid.lmer, like RHS

Returns

dfbetas – dataframe containing DFBETAS values

Return type

pandas.DataFrame

Examples

Example calculation showing how to pass in model fitting parameters:

dfbetas = fitgrid.utils.lmer.get_lmer_dfbetas(
    epochs=epochs,
    factor='subject_id',
    RHS='x + (x|a)'
)

Notes

DFBETAS is computed according to the following formula [NieGroPel2012]:

\[DFBETAS_{ij} = \frac{\hat{\gamma}_i - \hat{\gamma}_{i(-j)}}{se\left(\hat{\gamma}_{i(-j)}\right)}\]

for parameter \(i\) and level \(j\) of factor.
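
The formula can be evaluated directly once the full-data estimate and the leave-one-level-out estimates and standard errors are available. The sketch below does this for a single parameter; the function name, the level labels, and all numbers are made up for illustration and do not come from a real model fit.

```python
def dfbetas(full_estimate, loo_estimates, loo_std_errors):
    """DFBETAS for one parameter i across the levels j of a factor.

    full_estimate: estimate of parameter i from the fit on all the data.
    loo_estimates / loo_std_errors: dicts mapping level j to the estimate
    and standard error from the fit with level j left out.
    """
    return {
        level: (full_estimate - loo_estimates[level]) / loo_std_errors[level]
        for level in loo_estimates
    }


# Made-up numbers chosen so the arithmetic is exact.
result = dfbetas(
    full_estimate=2.0,
    loo_estimates={"s01": 1.5, "s02": 2.25},
    loo_std_errors={"s01": 0.25, "s02": 0.5},
)
print(result)  # {'s01': 2.0, 's02': -0.5}
```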
