fitgrid.utils.summary module

fitgrid.utils.summary.plot_AICmin_deltas(summary_df, figsize=None, gridspec_kw=None, **kwargs)

Plot FitGrid minimum delta AICs and fitter warnings.

The thresholds for the AIC_min deltas at 2, 4, 7, and 10 are from Burnham & Anderson 2004; see Notes.

Parameters
  • summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize

  • figsize (2-tuple) – pyplot.figure figure size parameter

  • gridspec_kw (dict) – matplotlib.gridspec key : value parameters

  • kwargs (dict) – keyword args passed to plt.subplots(…)

Returns

f, axs

Return type

matplotlib.pyplot.Figure

Notes

[BurAnd2004] pp. 270-271. Where \(AIC_{min}\) is the lowest AIC value for “a set of a priori candidate models well-supported by the underlying science (\(g_{i}, i = 1, 2, \ldots, R\))”,

\[\Delta_{i} = AIC_{i} - AIC_{min}\]

“is the information loss experienced if we are using fitted model \(g_{i}\) rather than the best model, \(g_{min}\) for inference.” …

“Some simple rules of thumb are often useful in assessing the relative merits of models in the set: Models having \(\Delta_{i} \le 2\) have substantial support (evidence), those in which \(4 \le \Delta_{i} \le 7\) have considerably less support, and models having \(\Delta_{i} > 10\) have essentially no support.”
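As a worked illustration of the rule of thumb quoted above, the sketch below computes \(\Delta_{i}\) for a few made-up AIC values and labels each with its support band. The model names, the AIC values, and the helper names aic_deltas and support are hypothetical. The quoted passage leaves the ranges \(2 < \Delta_{i} < 4\) and \(7 < \Delta_{i} \le 10\) unlabeled, so this sketch reports them as "unclassified"; that label is an assumption, not part of the source.

```python
def aic_deltas(aics):
    """Return Delta_i = AIC_i - AIC_min for each model (dict: name -> AIC)."""
    aic_min = min(aics.values())
    return {name: aic - aic_min for name, aic in aics.items()}


def support(delta):
    """Burnham & Anderson (2004) rule-of-thumb bands for a single Delta_i."""
    if delta <= 2:
        return "substantial"
    if 4 <= delta <= 7:
        return "considerably less"
    if delta > 10:
        return "essentially none"
    # 2 < delta < 4 or 7 < delta <= 10: not labeled in the quoted passage
    return "unclassified"


# Hypothetical AIC values for five candidate models g1..g5
aics = {"g1": 100.0, "g2": 101.5, "g3": 106.0, "g4": 112.0, "g5": 103.0}
deltas = aic_deltas(aics)
for name, d in deltas.items():
    print(name, d, support(d))
```

Note that \(\Delta_{i}\) is relative to the best model in the candidate set, so the model with \(AIC_{min}\) always has \(\Delta_{i} = 0\).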

fitgrid.utils.summary.plot_betas(summary_df, LHS, alpha=0.05, fdr=None, figsize=None, s=None, df_func=None, **kwargs)

Plot model parameter estimates for each data column in LHS

Parameters
  • summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize

  • LHS (list of str) – column names of the data; see the fitgrid.fitgrid docs

  • alpha (float) – alpha level for false discovery rate correction

  • fdr (str {None, ‘BY’, ‘BH’}) – Add markers for FDR-adjusted significant \(p\)-values. BY is Benjamini and Yekutieli, BH is Benjamini and Hochberg, None suppresses the markers.

  • df_func ({None, function}) – function applied to the degrees of freedom before they are plotted, e.g., np.log10 or lambda x: x

  • s (float) – scatterplot marker size for BH and lmer decorations

  • kwargs (dict) – keyword args passed to pyplot.subplots()
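The fdr option names two step-up false discovery rate procedures. The sketch below shows what they compute on a list of \(p\)-values; it is a self-contained illustration, not plot_betas' own code, and the helper name fdr_reject and the example \(p\)-values are hypothetical. Benjamini-Hochberg compares the \(k\)-th smallest \(p\)-value with \(k\alpha/m\); Benjamini-Yekutieli divides that threshold further by \(c(m) = \sum_{i=1}^{m} 1/i\), which makes it more conservative but valid under arbitrary dependence. A library implementation is available as statsmodels.stats.multitest.multipletests with method='fdr_bh' or method='fdr_by'; whether fitgrid uses it is not stated here.

```python
import numpy as np


def fdr_reject(pvals, alpha=0.05, method="BH"):
    """Step-up FDR procedure; True where the null hypothesis is rejected.

    method="BH": Benjamini-Hochberg.
    method="BY": Benjamini-Yekutieli, which also divides each threshold
    by c(m) = sum_{i=1}^{m} 1/i to allow arbitrary dependence.
    """
    if method not in ("BH", "BY"):
        raise ValueError("method must be 'BH' or 'BY'")
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)            # indices of the p-values, smallest first
    ranks = np.arange(1, m + 1)      # k = 1, ..., m
    c_m = 1.0 if method == "BH" else np.sum(1.0 / ranks)
    # Compare the k-th smallest p-value with k * alpha / (m * c(m))
    below = p[order] <= ranks * alpha / (m * c_m)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.nonzero(below)[0].max()   # largest k meeting its threshold
        reject[order[: k_max + 1]] = True    # reject it and all smaller p-values
    return reject


# Hypothetical p-values, e.g., one per timestamp for a single beta
pvals = [0.041, 0.001, 0.205, 0.008, 0.039, 0.060, 0.042, 0.074]
print(fdr_reject(pvals, method="BH"))  # rejects the 0.001 and 0.008 tests
print(fdr_reject(pvals, method="BY"))  # rejects only the 0.001 test
```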

Returns

figs

Return type

list

fitgrid.utils.summary.summarize(epochs_fg, modeler, LHS, RHS, parallel=True, n_cores=4, **kwargs)

Fit the data with one or more model formulas and return summary information.

Convenience wrapper, useful for keeping memory use manageable when gathering betas and fit measures for a stack of models.
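The memory benefit comes from fitting the models one at a time and keeping only the small summary of each fit. The sketch below illustrates that general pattern with plain NumPy least squares. It is not fitgrid's implementation. The helper names ols_summary and summarize_stack, the toy data, and the use of the Gaussian log-likelihood for AIC are all assumptions for illustration.

```python
import numpy as np


def ols_summary(X, y):
    """Fit y ~ X by OLS; return only the betas and AIC, not the full fit."""
    n, k = X.shape
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ betas) ** 2))
    # Gaussian log-likelihood at the MLE of the error variance, rss / n
    loglike = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    return {"betas": betas, "AIC": -2.0 * loglike + 2.0 * k}


def summarize_stack(designs, y):
    """Fit one model per design matrix, one at a time.

    Only the small per-model summary is retained; the residuals and other
    temporary arrays of each fit go out of scope when ols_summary returns.
    """
    return {name: ols_summary(X, y) for name, X in designs.items()}


# Hypothetical data: y depends linearly on x plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)
designs = {
    "1 + x": np.column_stack([np.ones_like(x), x]),
    "1": np.ones((x.size, 1)),
}
summaries = summarize_stack(designs, y)
for name, s in summaries.items():
    print(name, np.round(s["betas"], 2), round(s["AIC"], 1))
```

The same idea scales to a stack of formulas: memory grows with the number of summaries kept, not with the size of the fitted model objects.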

Parameters
  • epochs_fg (fitgrid.epochs.Epochs) – as returned by fitgrid.epochs_from_dataframe() or fitgrid.from_hdf(), NOT a pandas.DataFrame.

  • modeler ({‘lm’, ‘lmer’}) – class of model to fit, lm for OLS, lmer for linear mixed-effects. Note: the RHS formula language must match the modeler.

  • LHS (list of str) – the data columns to model

  • RHS (model formula or list of model formulas to fit) – see the Python package patsy docs for the lm formula language and the R library lme4 docs for the lmer formula language.

  • parallel (bool) – if True, fit in parallel using n_cores cores

  • n_cores (int) – number of cores to use. Experiment to see what works, but on a shared machine follow the golden rule and leave cores for other users.

  • **kwargs (key=value arguments passed to the modeler, optional)

Returns

summary_df – indexed by timestamp, model_formula, beta, and key, where the keys are ll.l_ci, uu.u_ci, AIC, DF, Estimate, P-val, SE, T-stat, has_warning, logLike.

Return type

pandas.DataFrame
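Because summary_df stacks the fit measures under a key index level, quantities such as the \(\Delta_{i} = AIC_{i} - AIC_{min}\) values plotted by plot_AICmin_deltas can be derived with ordinary pandas operations. The sketch below uses a small hand-made stand-in for a summarize() result. The index level names ("Time", "model", "beta", "key"), the formulas, and the numbers are assumptions for illustration; check the level names of a real summary_df before reusing this.

```python
import pandas as pd

# Hypothetical toy stand-in for a summarize() result; the index level names
# ("Time", "model", "beta", "key") are assumptions for this sketch.
index = pd.MultiIndex.from_product(
    [[0, 4], ["1 + a", "1"], ["Intercept"], ["AIC"]],
    names=["Time", "model", "beta", "key"],
)
summary_df = pd.DataFrame(
    {"MiPf": [10.0, 12.5, 20.0, 19.0], "MiCe": [8.0, 8.5, 30.0, 36.0]},
    index=index,
)

# Keep the AIC rows, one per (Time, model). If AIC is repeated for each
# beta of a model, .first() keeps a single copy.
aic = summary_df.xs("AIC", level="key").groupby(level=["Time", "model"]).first()

# Delta_i = AIC_i - AIC_min, with the minimum taken over models at each Time
deltas = aic - aic.groupby(level="Time").transform("min")
print(deltas)
```

The minimum is taken separately for each timestamp and each data column, so the best model can differ across time and channels.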

Examples

>>> import fitgrid
>>> # epochs_fg: a fitgrid.epochs.Epochs object, e.g., from fitgrid.from_hdf()
>>> lm_formulas = [
...     '1 + fixed_a + fixed_b + fixed_a:fixed_b',
...     '1 + fixed_a + fixed_b',
...     '1 + fixed_a',
...     '1 + fixed_b',
...     '1',
... ]
>>> lm_summary_df = fitgrid.utils.summary.summarize(
...     epochs_fg,
...     'lm',
...     LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
...     RHS=lm_formulas,
...     parallel=True,
...     n_cores=4,
... )
>>> lmer_formulas = [
...     '1 + fixed_a + (1 + fixed_a | random_a) + (1 | random_b)',
...     '1 + fixed_a + (1 | random_a) + (1 | random_b)',
...     '1 + fixed_a + (1 | random_a)',
... ]
>>> lmer_summary_df = fitgrid.utils.summary.summarize(
...     epochs_fg,
...     'lmer',
...     LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
...     RHS=lmer_formulas,
...     parallel=True,
...     n_cores=12,
...     REML=False,  # passed through **kwargs to the lmer modeler
... )