fitgrid.utils.lm module

User functions to streamline working with selected statsmodels OLS fit attributes for fitgrid.lm grids.

fitgrid.utils.lm.filter_diagnostic(diagnostic_df, how, bound_0, bound_1=None, format='long')

Select a subset of a fitgrid statsmodels diagnostic dataframe by value.

Use this to identify time points, epochs, parameters, and channels with outlying or potentially influential data.

Parameters

diagnostic_df (pandas.DataFrame) – As returned by fitgrid.utils.lm.get_diagnostic()
how ({'above', 'below', 'inside', 'outside'}) – Select the values of diagnostic_df above or below bound_0, or inside or outside the closed interval [bound_0, bound_1].
bound_0 (scalar or array-like) – Mandatory boundary for every value of how. For binary comparisons between dataframes and array-like bounds, see the pandas.DataFrame.gt and pandas.DataFrame.lt documentation.
bound_1 (scalar or array-like) – Mandatory upper bound for how="inside" and how="outside".
format ({'long', 'wide'}) – The long format pivots the channel columns into a row index and returns only those times, (epochs, parameters), and channels that pass the filter. The wide format returns selected_df with the same shape as diagnostic_df, with the data points that pass the filter in their original row and column location and NaNs elsewhere.
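The how and format options can be pictured with plain pandas operations. The following is a minimal sketch and not fitgrid's implementation: the toy dataframe, the index name Time, the channel names, the output column names, and the helper sketch_filter are all hypothetical, and including the interval endpoints for 'inside' is an assumption based on the "closed interval" wording.

```python
import pandas as pd

# Hypothetical Time x Channel diagnostic values (not real fitgrid output)
diagnostic_df = pd.DataFrame(
    {"MiPf": [0.1, 0.9, 0.3], "MiCe": [0.5, 0.2, 1.2]},
    index=pd.Index([0, 4, 8], name="Time"),
)


def sketch_filter(df, how, bound_0, bound_1=None, format="long"):
    """Illustrative stand-in for the documented behavior, not fitgrid code."""
    if how == "above":
        mask = df.gt(bound_0)
    elif how == "below":
        mask = df.lt(bound_0)
    elif how in ("inside", "outside"):
        if bound_1 is None:
            raise ValueError("bound_1 is required for 'inside' and 'outside'")
        # endpoints are included here; that is an assumption
        inside = df.ge(bound_0) & df.le(bound_1)
        mask = inside if how == "inside" else ~inside
    else:
        raise ValueError(f"unknown how: {how!r}")

    # wide: same shape as the input, NaN wherever the filter fails
    wide = df.where(mask)
    if format == "wide":
        return wide

    # long: channel columns become a row index level, failing cells are dropped
    id_vars = list(df.index.names)
    long = (
        wide.reset_index()
        .melt(id_vars=id_vars, var_name="channel", value_name="value")
        .dropna(subset=["value"])
    )
    return long.set_index(id_vars + ["channel"]).sort_index()


# scalar bound, long format: only the cells above 0.8 survive
print(sketch_filter(diagnostic_df, "above", 0.8))

# array-like bound: one threshold per channel column, as in DataFrame.gt
print(sketch_filter(diagnostic_df, "above", [0.2, 1.0], format="wide"))
```

With a list-like bound, pandas aligns the values with the columns by default, so each channel gets its own threshold.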

Returns

selected_df

Return type

pandas.DataFrame

fitgrid.utils.lm.get_diagnostic(lm_grid, diagnostic, do_nobs_loop=False)

Fetch a statsmodels diagnostic as a Time x Channel dataframe.

statsmodels implements a variety of data and model diagnostic measures. For some, it also computes a version of a recommended critical value or p-value. Use these at your own risk after careful study of the statsmodels source code. For details, see the statsmodels documentation for statsmodels.stats.outliers_influence.OLSInfluence.

For a catalog of the measures available for fitgrid.lm(), run this in Python:

>>> fitgrid.utils.lm.list_diagnostics()

Warning

Data diagnostics can be very large and very slow to compute; see Notes for details.

By default, all values of a diagnostic are returned. The dataframe can be pruned with fitgrid.utils.lm.filter_diagnostic().
By default, slow diagnostics are not computed. Set do_nobs_loop=True to force the computation.

Parameters

lm_grid (fitgrid.LMFitGrid) – As returned by fitgrid.lm().
diagnostic (string) – As implemented in statsmodels, e.g., "cooks_distance", "dffits_internal", "est_std", "dfbetas".
do_nobs_loop (bool) – True forces slow leave-one-observation-out model refitting.

Returns

diagnostic_df (pandas.DataFrame) – Channels are in columns. Model measures are row indexed by time; data measures add an epoch row index; parameter measures add a parameter row index.
sm_1_df (pandas.DataFrame) – The supplementary values statsmodels returns, with the same shape as diagnostic_df, or None.
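To make the three row-index layouts concrete, here is a small pandas sketch with placeholder values. The index level names (Time, Epoch_idx, param), the channel names, and the sizes are illustrative assumptions, not names guaranteed by fitgrid.

```python
import numpy as np
import pandas as pd

times = [0, 4, 8]            # hypothetical time points
epochs = [0, 1]              # hypothetical epoch identifiers
params = ["Intercept", "x"]  # hypothetical model parameters
channels = ["MiPf", "MiCe"]  # hypothetical channel names


def make_frame(index):
    """Build a zero-filled placeholder frame with channels in columns."""
    values = np.zeros((len(index), len(channels)))
    return pd.DataFrame(values, index=index, columns=channels)


# model measure: one row per time point
model_df = make_frame(pd.Index(times, name="Time"))

# data measure: one row per time point and epoch
data_df = make_frame(
    pd.MultiIndex.from_product([times, epochs], names=["Time", "Epoch_idx"])
)

# parameter measure: one row per time point and parameter
param_df = make_frame(
    pd.MultiIndex.from_product([times, params], names=["Time", "param"])
)

for name, df in [("model", model_df), ("data", data_df), ("param", param_df)]:
    print(name, list(df.index.names), df.shape)
```

In each layout the channels stay in the columns; only the row index grows.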

Notes

Size: diagnostic_df values for data measures like cooks_distance and hat_matrix_diagonal are the size of the original data plus a row index. For some data measures like dfbetas, they are the size of the data multiplied by the number of regressors in the model.
Speed: Leave-one-observation-out (LOOO) model refitting takes as long as fitting one model multiplied by the number of observations. This can be intractable for large data sets. Diagnostic measures calculated from the original fit, like cooks_distance and dffits_internal, are tractable even for large data sets.
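A back-of-the-envelope calculation shows how these size and speed costs scale. Every number below (epochs, time points, channels, regressors, seconds per fit) is a made-up assumption for illustration, the helper diagnostic_costs is hypothetical, and counting one model per time point and channel is a simplifying assumption; none of this is a measurement of fitgrid.

```python
def diagnostic_costs(n_epochs, n_times, n_channels, n_regressors, sec_per_fit):
    """Rough size and time estimates; all inputs are hypothetical."""
    # assumes one model per time point and channel
    n_models = n_times * n_channels
    # data measures such as cooks_distance: one value per observation per model
    data_values = n_models * n_epochs
    # measures such as dfbetas: one value per observation per regressor
    per_regressor_values = data_values * n_regressors
    # time for the original grid fit
    fit_sec = n_models * sec_per_fit
    # LOOO refits every model once per observation
    looo_sec = fit_sec * n_epochs
    return {
        "data_values": data_values,
        "per_regressor_values": per_regressor_values,
        "fit_sec": fit_sec,
        "looo_sec": looo_sec,
    }


costs = diagnostic_costs(
    n_epochs=500, n_times=250, n_channels=32, n_regressors=3, sec_per_fit=0.002
)
for name, value in costs.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumed numbers the LOOO refit takes 500 times as long as the original fit, which is why it is off by default.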

Examples

import fitgrid

# generate fake epochs data
epochs_fg = fitgrid.generate()
lm_grid = fitgrid.lm(
    epochs_fg,
    LHS=epochs_fg.channels,
    RHS='continuous + categorical',
    parallel=True,
    n_cores=4,
)

# data diagnostic, one dataframe with the values
ess_press, _ = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'ess_press'
)

# Cook's D dataframe AND the p-values statsmodels computes
cooks_Ds, sm_pvals = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'cooks_distance'
)

# this fails because dfbetas requires the slow LOOO loop, which is off by default
dfbetas_df, _ = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'dfbetas'
)

# this succeeds by forcing the LOOO loop calculation
dfbetas_df, _ = fitgrid.utils.lm.get_diagnostic(
    lm_grid,
    'dfbetas',
    do_nobs_loop=True
)

fitgrid.utils.lm.get_vifs(epochs, RHS, quiet=False)

fitgrid.utils.lm.list_diagnostics()

Display statsmodels diagnostics implemented in fitgrid.utils.lm.