fitgrid.utils.summary module

User functions to streamline working with grids of OLS and LME model summaries and sets of models.

fitgrid.utils.summary.plot_AICmin_deltas(summary_df, show_warnings='no_labels', figsize=None, gridspec_kw=None, subplot_kw=None)[source]

Plot FitGrid min delta AICs and fitter warnings

Thresholds of AIC_min delta at 2, 4, 7, 10 are from Burnham & Anderson 2004, see Notes.

Parameters

summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize
show_warnings ({“no_labels”, “labels”, str, list of str}) – “no_labels” (default) highlights in red every location where there is any warning, the default behavior in fitgrid < v0.5.0. “labels” displays all warning strings in the axes titles. A str or list of str selects and displays only warnings that (partially) match a model warning string.
figsize (2-tuple) – pyplot.figure figure size parameter
gridspec_kw (dict) – matplotlib.gridspec keyword args passed to pyplot.subplots(..., gridspec_kw=gridspec_kw)
subplot_kw (dict) – keyword args passed to pyplot.subplots(..., subplot_kw=subplot_kw)

Returns

f, axs

Return type

matplotlib.pyplot.Figure

Notes

[BurAnd2004] p. 270-271. Where $AIC_{min}$ is the lowest AIC value for “a set of a priori candidate models well-supported by the underlying science ($g_{i}, i = 1, 2, ..., R$)”,

\[\Delta_{i} = AIC_{i} - AIC_{min}\]

“is the information loss experienced if we are using fitted model $g_{i}$ rather than the best model, $g_{min}$ for inference.” …

“Some simple rules of thumb are often useful in assessing the relative merits of models in the set: Models having $\Delta_{i} \leq 2$ have substantial support (evidence), those in which $4 \leq \Delta_{i} \leq 7$ have considerably less support, and models having $\Delta_{i} > 10$ have essentially no support.”
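
As an illustration of the definition above, here is a minimal sketch in plain Python of computing $\Delta_{i}$ for a set of models and binning it at the plotted thresholds 2, 4, 7, 10. The AIC values, the helper names `aic_min_deltas` and `support_band`, and the integer band coding are assumptions made for this example; they are not part of fitgrid and need not match how plot_AICmin_deltas works internally.

```python
# Hypothetical AIC values for three candidate models at a single grid cell
# (one time point, one channel). The numbers are made up for illustration.
aics = {
    "1 + fixed_a + fixed_b": 1012.4,
    "1 + fixed_a": 1010.1,
    "1": 1023.9,
}


def aic_min_deltas(aics):
    """Return Delta_i = AIC_i - AIC_min for each model in the set."""
    aic_min = min(aics.values())
    return {model: aic - aic_min for model, aic in aics.items()}


def support_band(delta, thresholds=(2, 4, 7, 10)):
    """Count how many thresholds delta exceeds: 0 (best supported) to 4."""
    return sum(delta > t for t in thresholds)


deltas = aic_min_deltas(aics)
bands = {model: support_band(delta) for model, delta in deltas.items()}

for model, delta in deltas.items():
    print(f"{model!r}: delta = {delta:.1f}, band = {bands[model]}")
```

The model with the lowest AIC always has $\Delta_{i} = 0$, so at least one model falls in band 0 at every grid cell.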

fitgrid.utils.summary.plot_betas(summary_df, LHS=[], models=[], betas=[], interval=[], beta_plot_kw={}, show_se=True, show_warnings=True, fdr_kw={}, fig_kw={'figsize': (8, 3)}, df_func=None, scatter_size=75, **kwargs)[source]

Plot model parameter estimates for model, beta, and channel LHS

The time course of estimated betas and standard errors is plotted by channel for the models, betas, and channels in the data frame. Channels, models, betas and time intervals may be selected from the summary dataframe. Plots are marked with model fit warnings by default and may be tagged to indicate differences from 0 controlled for false discovery rate (FDR).

Parameters

summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize
LHS (list of str or []) – column names of the data, [] default = all channels
models (list of str or []) – select model or model betas to display, [] default = all models
betas (list of str or []) – select beta or betas to plot, [] default = all betas
interval ([start, stop] list of two ints) – time interval to plot
beta_plot_kw (dict) – keyword arguments passed to matplotlib.axes.plot()
show_se (bool) – toggle display of standard error shading (default = True)
show_warnings (bool) – toggle display of model warnings (default = True)
fdr_kw (dict (default empty)) – One or more keyword arguments passed to summaries_fdr_control() to tag plots for FDR-controlled differences from 0.
fig_kw (dict) – keyword args passed to pyplot.subplots()
df_func ({None, function}) – toggle degrees of freedom line plot via function, e.g., np.log10, lambda x: x
scatter_size (float) – scatterplot marker size for FDR (default = 75) and warnings (= 1.5 * scatter_size)

Returns

figs

Return type

list of matplotlib.Figure

Note

The FDR family of tests is given by all channels, models, betas, and times in the summary data frame regardless of which of these are selected for plotting. To specify a different family of tests, construct a summary dataframe with all and only the tests for that family before plotting the betas.
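
One way to build such a family-specific dataframe is ordinary pandas selection on the row index and channel columns. The sketch below uses a toy stand-in for a summary dataframe. The index level names follow the description of the summarize() return value (timestamp, model_formula, beta, key), but the names, values, and channels here are assumptions for illustration and should be checked against an actual summary dataframe.

```python
import pandas as pd

# Toy stand-in for a summary dataframe: rows indexed by timestamp,
# model_formula, beta, key; one column per channel. Values are placeholders.
model_betas = {"1 + fixed_a": ["Intercept", "fixed_a"], "1": ["Intercept"]}
rows = [
    (time, model, beta, key)
    for time in (0, 100, 200)
    for model, betas in model_betas.items()
    for beta in betas
    for key in ("Estimate", "P-val")
]
index = pd.MultiIndex.from_tuples(
    rows, names=["timestamp", "model_formula", "beta", "key"]
)
summary_df = pd.DataFrame(0.5, index=index, columns=["MiPf", "MiCe", "MiPa"])

# Keep all and only the tests in the family of interest: the fixed_a beta
# of one model, from 100 onward, at two channels.
family_df = summary_df.query(
    "model_formula == '1 + fixed_a' and beta == 'fixed_a' and timestamp >= 100"
)[["MiPf", "MiCe"]]

# Number of p-values in the family: remaining time points x channels.
n_tests = family_df.xs("P-val", level="key").size
print(family_df.shape, n_tests)
```

The reduced dataframe, rather than the full summary, would then be the one passed on for plotting, so that the FDR family matches the tests actually of interest.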

fitgrid.utils.summary.summaries_fdr_control(model_summary_df, method='BY', rate=0.05, plot_pvalues=True)[source]

False discovery rate control for non-zero betas in model summary dataframes

The family of tests for FDR control is assumed to be all and only the channels, models, and $\hat{\beta}_i$ in the summary dataframe. The user must select the appropriate family of tests by slicing or stacking summary dataframes before running the FDR calculator.

Parameters

model_summary_df (pandas.DataFrame) – As returned by fitgrid.utils.summary.summarize.
method (str {“BY”, “BH”}) – BY (default) is from Benjamini and Yekutieli [1], BH is from Benjamini and Hochberg [2].
rate (float {0.05}) – The target rate for controlling false discoveries.
plot_pvalues (bool {True, False}) – Display a plot of the family of $p$-values and critical value for FDR control.

References

1: Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29, 1165-1188.
2: Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57, 289-300.
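
For orientation, the two methods differ only in a correction factor applied to the step-up thresholds. The following is a minimal sketch of the textbook BH and BY procedures in plain Python. The function name `fdr_critical_value`, its return values, and the example p-values are assumptions for illustration; this is not fitgrid's code, and summaries_fdr_control may compute or report its critical value differently.

```python
def fdr_critical_value(pvals, rate=0.05, method="BY"):
    """Step-up FDR control: return (critical p-value, number of discoveries).

    Tests with p <= critical p-value are declared discoveries. If no test
    passes, the critical p-value is 0.0 and the count is 0.
    """
    m = len(pvals)
    if method == "BY":
        # Benjamini-Yekutieli: harmonic-sum correction, valid under dependency
        c_m = sum(1.0 / i for i in range(1, m + 1))
    elif method == "BH":
        # Benjamini-Hochberg: no correction factor
        c_m = 1.0
    else:
        raise ValueError("method must be 'BY' or 'BH'")

    crit_p, n_discoveries = 0.0, 0
    for k, p in enumerate(sorted(pvals), start=1):
        # keep the largest rank k with p_(k) <= (k / (m * c_m)) * rate
        if p <= (k / (m * c_m)) * rate:
            crit_p, n_discoveries = p, k
    return crit_p, n_discoveries


# Made-up family of ten p-values
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bh_crit, bh_n = fdr_critical_value(pvals, rate=0.05, method="BH")
by_crit, by_n = fdr_critical_value(pvals, rate=0.05, method="BY")
print("BH:", bh_crit, bh_n)  # BH: 0.008 2
print("BY:", by_crit, by_n)  # BY: 0.001 1
```

On the same family BY is never less conservative than BH, which is why it declares fewer discoveries here.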

fitgrid.utils.summary.summarize(epochs_fg, modeler, LHS, RHS, parallel=False, n_cores=2, quiet=False, **kwargs)[source]

Fit the data with one or more model formulas and return summary information.

Convenience wrapper, useful for keeping memory use manageable when gathering betas and fit measures for a stack of models.

Parameters

epochs_fg (fitgrid.epochs.Epochs) – as returned by fitgrid.epochs_from_dataframe() or fitgrid.from_hdf(), NOT a pandas.DataFrame.
modeler ({‘lm’, ‘lmer’}) – class of model to fit, lm for OLS, lmer for linear mixed-effects. Note: the RHS formula language must match the modeler.
LHS (list of str) – the data columns to model
RHS (model formula or list of model formulas to fit) – see the Python package patsy docs for lm formula language and the R library lme4 docs for the lmer formula language.
parallel (bool) – If True, model fitting is distributed to multiple cores
n_cores (int) – number of cores to use. Experiment to see what works, but follow the golden rule when running on a shared machine.
quiet (bool) – If True, suppress the progress bar (default = False, progress bar shown)
**kwargs (key=value arguments passed to the modeler, optional)

Returns

summary_df – indexed by timestamp, model_formula, beta, and key, where the keys are ll.l_ci, uu.u_ci, AIC, DF, Estimate, P-val, SE, T-stat, has_warning, logLike.

Return type

pandas.DataFrame

Examples

>>> import fitgrid
>>> lm_formulas = [
    '1 + fixed_a + fixed_b + fixed_a:fixed_b',
    '1 + fixed_a + fixed_b',
    '1 + fixed_a',
    '1 + fixed_b',
    '1',
]
>>> lm_summary_df = fitgrid.utils.summary.summarize(
    epochs_fg,
    'lm',
    LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
    RHS=lm_formulas,
    parallel=True,
    n_cores=4
)

>>> lmer_formulas = [
    '1 + fixed_a + (1 + fixed_a | random_a) + (1 | random_b)',
    '1 + fixed_a + (1 | random_a) + (1 | random_b)',
    '1 + fixed_a + (1 | random_a)',
]
>>> lmer_summary_df = fitgrid.utils.summary.summarize(
    epochs_fg,
    'lmer',
    LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
    RHS=lmer_formulas,
    parallel=True,
    n_cores=12,
    REML=False
)