fitgrid.utils.lm module
User functions to streamline working with selected statsmodels OLS
fit attributes for fitgrid.lm grids.
- fitgrid.utils.lm.filter_diagnostic(diagnostic_df, how, bound_0, bound_1=None, format='long')
- Select a subset of a fitgrid statsmodels diagnostic dataframe by value.
- Use this to identify time points, epochs, parameters, and channels with outlying or potentially influential data.
- Parameters
- diagnostic_df (pandas.DataFrame) – As returned by fitgrid.utils.lm.get_diagnostic().
- how ({‘above’, ‘below’, ‘inside’, ‘outside’}) – slice diagnostic_df above or below bound_0 or inside or outside the closed interval (bound_0, bound_1). 
- bound_0 (scalar or array-like) – The boundary used for every value of how, and the lower bound for how="inside" and how="outside". See the pandas.DataFrame.gt and pandas.DataFrame.lt documentation for binary comparisons with dataframes.
- bound_1 (scalar or array-like) – The upper bound, required for how="inside" and how="outside".
- format ({‘long’, ‘wide’}) – The long format pivots the channel columns into a row index and returns only the rows for the times (and epochs or parameters, where present) and channels that pass the filter. The wide format returns a dataframe with the same shape as diagnostic_df, holding the values that pass the filter in their original row and column positions and NaN elsewhere.
 
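- Returns
- selected_df (pandas.DataFrame) – The values of diagnostic_df that pass the filter, in the requested format.
- Return type
- pandas.DataFrame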
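To make the how/bound and long/wide options concrete, here is a minimal pandas sketch of value filtering on a toy Time x Channel dataframe. It is not fitgrid's implementation: the function name filter_sketch, the "channel" axis name, and the "value" column are hypothetical, and it assumes the closed interval described above (so it uses ge/le for "inside" and "outside").

```python
import pandas as pd


def filter_sketch(diagnostic_df, how, bound_0, bound_1=None, format="long"):
    """Hypothetical sketch of value filtering; not fitgrid's implementation."""
    if how == "above":
        mask = diagnostic_df.gt(bound_0)
    elif how == "below":
        mask = diagnostic_df.lt(bound_0)
    elif how in ("inside", "outside"):
        if bound_1 is None:
            raise ValueError("bound_1 is required for 'inside' and 'outside'")
        # closed interval [bound_0, bound_1], as the parameter docs describe
        inside = diagnostic_df.ge(bound_0) & diagnostic_df.le(bound_1)
        mask = inside if how == "inside" else ~inside
    else:
        raise ValueError(f"unknown how: {how!r}")

    # wide: same shape as the input, NaN wherever the filter fails
    wide = diagnostic_df.where(mask).rename_axis(columns="channel")
    if format == "wide":
        return wide
    # long: move the channel columns into the row index, keep passing values only
    return wide.stack().dropna().to_frame("value")


# Toy Time x Channel diagnostic dataframe (made-up values)
cooks_df = pd.DataFrame(
    {"ch0": [0.1, 1.5, 0.3], "ch1": [2.0, 0.2, 0.9]},
    index=pd.Index([0, 4, 8], name="Time"),
)

print(filter_sketch(cooks_df, "above", 1.0))
print(filter_sketch(cooks_df, "inside", 0.2, 1.0, format="wide"))
```

The long format suits "which time and channel combinations are suspicious?" questions, while the wide format keeps the grid layout for plotting or masking.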
 
- fitgrid.utils.lm.get_diagnostic(lm_grid, diagnostic, do_nobs_loop=False)
- Fetch a statsmodels diagnostic as a Time x Channel dataframe.
- statsmodels implements a variety of data and model diagnostic measures. For some, it also computes a version of a recommended critical value or p-value. Use these at your own risk, after careful study of the statsmodels source code. For details, see the statsmodels documentation for statsmodels.stats.outliers_influence.OLSInfluence.
- For a catalog of the measures available for fitgrid.lm(), run this in Python:
    >>> fitgrid.utils.lm.list_diagnostics()
- Warning
- Data diagnostics can be very large and very slow; see Notes for details.
- By default all values of the diagnostic are computed; the resulting dataframe can be pruned with the fitgrid.utils.lm.filter_diagnostic() function.
- By default, slow diagnostics are not computed; set do_nobs_loop=True to force them.
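Cook's distance is one of the data measures that can be computed from the original fit alone, giving one value per observation. As a rough illustration of what such a measure is, here is a minimal NumPy sketch for a single OLS model using the textbook closed form. It is not fitgrid or statsmodels code, cooks_distance_sketch is a hypothetical name, the data are made up, and it omits the p-values statsmodels also reports.

```python
import numpy as np


def cooks_distance_sketch(X, y):
    """Cook's distance for every observation, from a single OLS fit."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # leverage: diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    s2 = resid @ resid / (n - p)  # residual variance estimate
    # D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    return resid**2 / (p * s2) * h / (1 - h) ** 2


# Made-up data: intercept + one predictor, with one shifted observation
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=20)
y[3] += 10.0
X = np.column_stack([np.ones_like(x), x])

d = cooks_distance_sketch(X, y)
print(d.shape, d.argmax())
```

No model is refit here: the leverages and residuals of the one fit are enough, which is why measures like this stay tractable for large data.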
- Parameters
- lm_grid (fitgrid.LMFitGrid) – As returned by fitgrid.lm().
- diagnostic (string) – As implemented in statsmodels, e.g., “cooks_distance”, “dffits_internal”, “est_std”, “dfbetas”. 
- do_nobs_loop (bool) – True forces slow leave-one-observation-out model refitting. 
 
- Returns
- diagnostic_df (pandas.DataFrame) – Channels are in columns. Model measures are row indexed by time; data measures add an epoch row index; parameter measures add a parameter row index. 
- sm_1_df (pandas.DataFrame) – The supplementary values statsmodels returns (for example, the p-values for cooks_distance), or None; same shape as diagnostic_df.
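The row-index layouts described above can be pictured with empty pandas frames. This sketch uses placeholder sizes and hypothetical level names (Time, Epoch_idx, parameter) that may not match fitgrid's actual index names or level order; the per-observation, per-parameter frame stands in for a measure such as dfbetas.

```python
import numpy as np
import pandas as pd

# Placeholder sizes and labels (hypothetical, not taken from fitgrid)
times = [0, 4, 8]
epochs = [0, 1]
params = ["Intercept", "continuous"]
channels = ["ch0", "ch1", "ch2"]

# Model measure: one row per time point
model_index = pd.Index(times, name="Time")

# Data measure: adds an epoch level to the row index
data_index = pd.MultiIndex.from_product(
    [times, epochs], names=["Time", "Epoch_idx"]
)

# Per-observation, per-parameter measure: adds a parameter level as well
param_index = pd.MultiIndex.from_product(
    [times, epochs, params], names=["Time", "Epoch_idx", "parameter"]
)

layouts = {
    name: pd.DataFrame(np.nan, index=idx, columns=channels)
    for name, idx in [
        ("model", model_index),
        ("data", data_index),
        ("parameter", param_index),
    ]
}

for name, df in layouts.items():
    print(name, df.shape)
```

The row counts grow multiplicatively: times for model measures, times x epochs for data measures, and times x epochs x parameters for measures like dfbetas, with one column per channel throughout.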
 
- Notes
- Size: For data measures like cooks_distance and hat_matrix_diagonal, diagnostic_df is the size of the original data plus a row index. For some data measures, like dfbetas, it is the size of the data multiplied by the number of regressors in the model.
- Speed: Leave-one-observation-out (LOOO) model refitting takes roughly as long as fitting one model multiplied by the number of observations, which can be intractable for large datasets. Diagnostic measures calculated from the original fit, like cooks_distance and dffits_internal, are tractable even for large datasets.
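The Size and Speed notes can be illustrated with a per-observation, per-parameter measure. The minimal NumPy sketch below, for one OLS model with made-up data, computes the unscaled DFBETA (the change in each coefficient when one observation is dropped) by brute-force LOOO refitting, which costs one extra fit per observation and yields an observations x regressors array. For plain OLS the same quantity also has a closed form from the original fit, used here only as a check. The function names are hypothetical, the dfbetas diagnostic is a scaled version of this quantity, and this is not how fitgrid computes it.

```python
import numpy as np


def dfbeta_looo(X, y):
    """Unscaled DFBETA by brute force: refit the model once per observation."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    out = np.empty((n, p))
    for i in range(n):  # n extra model fits: the cost the Speed note describes
        keep = np.arange(n) != i
        beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        out[i] = beta - beta_i  # change in every coefficient when i is dropped
    return out


def dfbeta_closed_form(X, y):
    """Same quantity from the original fit only, via the OLS leave-one-out identity."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages
    # beta - beta_(i) = (X'X)^-1 x_i e_i / (1 - h_i), stacked as rows
    return (X @ XtX_inv) * (resid / (1 - h))[:, None]


rng = np.random.default_rng(1)
n_obs, n_regressors = 50, 3
X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, n_regressors - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n_obs)

slow = dfbeta_looo(X, y)
fast = dfbeta_closed_form(X, y)
print(slow.shape)  # (50, 3): one row per observation, one column per regressor
```

The output has n_obs x n_regressors values per model, which is the "data size times number of regressors" growth from the Size note, and the loop runs n_obs refits, which is the cost from the Speed note.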
- Examples

    import fitgrid

    # fake data
    epochs_fg = fitgrid.generate()
    lm_grid = fitgrid.lm(
        epochs_fg,
        LHS=epochs_fg.channels,
        RHS='continuous + categorical',
        parallel=True,
        n_cores=4,
    )

    # diagnostic with values only; ignore the second return value
    ess_press, _ = fitgrid.utils.lm.get_diagnostic(lm_grid, 'ess_press')

    # Cook's D dataframe AND the p-values statsmodels computes
    cooks_Ds, sm_pvals = fitgrid.utils.lm.get_diagnostic(lm_grid, 'cooks_distance')

    # this fails because it requires the LOOO loop
    # dfbetas_df, _ = fitgrid.utils.lm.get_diagnostic(lm_grid, 'dfbetas')

    # this succeeds by forcing the LOOO loop calculation
    dfbetas_df, _ = fitgrid.utils.lm.get_diagnostic(
        lm_grid, 'dfbetas', do_nobs_loop=True
    )