fitgrid.utils.summary module

fitgrid.utils.summary.plot_AICmin_deltas(summary_df, figsize=None, gridspec_kw=None, **kwargs)

Plot FitGrid AIC min deltas and fitter warnings

Thresholds of AIC_min delta at 2, 4, 7, 10 are from Burnham & Anderson 2004, see Notes.

Parameters

summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize
figsize (2-tuple) – pyplot.figure figure size parameter
gridspec_kw (dict) – matplotlib.gridspec key : value parameters
kwargs (dict) – keyword args passed to plt.subplots(…)

Returns

f, axs

Return type

matplotlib.pyplot.Figure

Notes

[BurAnd2004] p. 270-271. Where \(AIC_{min}\) is the lowest AIC value for “a set of a priori candidate models well-supported by the underlying science (\(g_{i}, i = 1, 2, \ldots, R\))”,

\[\Delta_{i} = AIC_{i} - AIC_{min}\]

“is the information loss experienced if we are using fitted model \(g_{i}\) rather than the best model, \(g_{min}\) for inference.” …

“Some simple rules of thumb are often useful in assessing the relative merits of models in the set: Models having \(\Delta_{i} \le 2\) have substantial support (evidence), those in which \(4 \le \Delta_{i} \le 7\) have considerably less support, and models having \(\Delta_{i} > 10\) have essentially no support.”
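The delta and the rule of thumb quoted above can be sketched in a few lines of plain Python. This is an illustration only, not fitgrid's implementation: the function names `aic_min_deltas` and `support_band`, the example AIC values, and the band labels are all hypothetical, and the bands simply cut at the 2, 4, 7, 10 thresholds mentioned above.

```python
# Illustrative sketch, not fitgrid code: names and AIC values are hypothetical.

def aic_min_deltas(aics):
    """Return Delta_i = AIC_i - AIC_min for each model in a candidate set."""
    aic_min = min(aics.values())
    return {model: aic - aic_min for model, aic in aics.items()}


def support_band(delta):
    """Bin a delta at the 2, 4, 7, 10 thresholds (band labels are assumptions)."""
    for upper in (2, 4, 7, 10):
        if delta <= upper:
            return f"<= {upper}"
    return "> 10"


# Hypothetical AICs for three candidate models fit at a single time point
aics = {
    "1 + fixed_a + fixed_b": 1012.5,
    "1 + fixed_a": 1010.0,
    "1": 1023.0,
}

deltas = aic_min_deltas(aics)
bands = {model: support_band(delta) for model, delta in deltas.items()}

for model, delta in deltas.items():
    print(f"{model!r}: delta = {delta:.1f}, band {bands[model]}")
```

The model with the lowest AIC always has a delta of 0, so every candidate set has at least one model in the lowest band.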

fitgrid.utils.summary.plot_betas(summary_df, LHS, alpha=0.05, fdr=None, figsize=None, s=None, df_func=None, **kwargs)

Plot model parameter estimates for each data column in LHS

Parameters

summary_df (pd.DataFrame) – as returned by fitgrid.utils.summary.summarize
LHS (list of str) – column names of the data; see the fitgrid.fitgrid docs
alpha (float) – alpha level for false discovery rate correction
fdr (str {None, ‘BY’, ‘BH’}) – Add markers for FDR adjusted significant \(p\)-values. BY is Benjamini and Yekutieli, BH is Benjamini and Hochberg, None suppresses the markers.
df_func ({None, function}) – plot df_func(degrees of freedom), e.g., np.log10, lambda x: x
s (float) – scatterplot marker size for BH and lmer decorations
kwargs (dict) – keyword args passed to pyplot.subplots()

Returns

figs

Return type

list
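To make the `alpha` and `fdr` options more concrete, here is a minimal pure-Python sketch of the Benjamini and Hochberg step-up procedure, with the Benjamini and Yekutieli variant as an option. It is not the code that `plot_betas` runs; the function name `fdr_significant` and the example p-values are hypothetical.

```python
# Illustrative sketch, not fitgrid code: function name and p-values are hypothetical.

def fdr_significant(p_values, alpha=0.05, method="BH"):
    """Flag p-values that survive BH or BY false discovery rate control."""
    m = len(p_values)
    # BY divides each threshold by the harmonic sum 1 + 1/2 + ... + 1/m
    c_m = sum(1.0 / i for i in range(1, m + 1)) if method == "BY" else 1.0

    # indices of the p-values in ascending order of p
    order = sorted(range(m), key=lambda i: p_values[i])

    # step-up rule: find the largest rank k with p_(k) <= k / (m * c_m) * alpha
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c_m) * alpha:
            k_max = rank

    # every p-value ranked at or below k_max is flagged
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant


p_values = [0.001, 0.009, 0.039, 0.041, 0.6]
bh_flags = fdr_significant(p_values, alpha=0.05, method="BH")
by_flags = fdr_significant(p_values, alpha=0.05, method="BY")
print(bh_flags)
print(by_flags)
```

BY is the more conservative of the two, so it never flags a p-value that BH does not.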

fitgrid.utils.summary.summarize(epochs_fg, modeler, LHS, RHS, parallel=True, n_cores=4, **kwargs)

Fit the data with one or more model formulas and return summary information.

Convenience wrapper, useful for keeping memory use manageable when gathering betas and fit measures for a stack of models.

Parameters

epochs_fg (fitgrid.epochs.Epochs) – as returned by fitgrid.epochs_from_dataframe() or fitgrid.from_hdf(), NOT a pandas.DataFrame.
modeler ({‘lm’, ‘lmer’}) – class of model to fit, lm for OLS, lmer for linear mixed-effects. Note: the RHS formula language must match the modeler.
LHS (list of str) – the data columns to model
RHS (model formula or list of model formulas to fit) – see the Python package patsy docs for lm formula language and the R library lme4 docs for the lmer formula language.
parallel (bool)
n_cores (int) – number of cores to use. Experiment to see what works, but follow the golden rule when running on a shared machine.
**kwargs (key=value arguments passed to the modeler, optional)

Returns

summary_df – indexed by timestamp, model_formula, beta, and key, where the keys are ll.l_ci, uu.u_ci, AIC, DF, Estimate, P-val, SE, T-stat, has_warning, logLike.

Return type

pandas.DataFrame
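As a rough illustration of working with a frame indexed this way, the sketch below builds a small stand-in with pandas and slices it by index level. The level names, the placeholder values, and the layout with one column per LHS data column are assumptions made for the example, not a description of what `summarize` actually returns.

```python
import pandas as pd

# Hypothetical stand-in for a summary_df: the level names, the values, and
# the one-column-per-LHS-channel layout are assumptions for illustration.
index = pd.MultiIndex.from_product(
    [[0, 4], ["1 + fixed_a"], ["Intercept", "fixed_a"], ["Estimate", "SE", "AIC"]],
    names=["timestamp", "model_formula", "beta", "key"],
)
n_rows = len(index)
toy_summary_df = pd.DataFrame(
    {
        "MiPf": [float(i) for i in range(n_rows)],
        "MiCe": [float(100 + i) for i in range(n_rows)],
    },
    index=index,
)

# All Estimate rows: one row per timestamp, model formula, and beta
estimates = toy_summary_df.xs("Estimate", level="key")

# Estimates for a single beta of a single model, as a timestamp x channel table
fixed_a = toy_summary_df.xs(
    ("1 + fixed_a", "fixed_a", "Estimate"),
    level=["model_formula", "beta", "key"],
)
print(fixed_a)
```

Selecting by level name with `xs` keeps the example independent of the order of the index levels.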

Examples

>>> import fitgrid
>>> # one patsy formula per lm model in the stack
>>> lm_formulas = [
    '1 + fixed_a + fixed_b + fixed_a:fixed_b',
    '1 + fixed_a + fixed_b',
    '1 + fixed_a',
    '1 + fixed_b',
    '1',
]
>>> lm_summary_df = fitgrid.utils.summarize(
    epochs_fg,
    'lm',
    LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
    RHS=lm_formulas,
    parallel=True,
    n_cores=4
)

>>> lmer_formulas = [
    '1 + fixed_a + (1 + fixed_a | random_a) + (1 | random_b)',
    '1 + fixed_a + (1 | random_a) + (1 | random_b)',
    '1 + fixed_a + (1 | random_a)',
]
>>> lmer_summary_df = fitgrid.utils.summarize(
    epochs_fg,
    'lmer',
    LHS=['MiPf', 'MiCe', 'MiPa', 'MiOc'],
    RHS=lmer_formulas,
    parallel=True,
    n_cores=12,
    REML=False
)