mkpy.mkh5 module
- class mkpy.mkh5.LocDat(type, label, coord, pos, distance_units=None, angle_units=None)[source]
Bases:
object
map Kutas lab spherical coordinates and Brainsight .elp data files to 3-D Cartesian XYZ
Coordinates
LocDat native positions are in Cartesian 3-space
Origin is center of head
Orientation is RAS: X+ = Right, Y+ = Anterior, Z+ = Superior
Cartesian coordinates come in as triples: x, y, z
Polar coordinates come in as triples: radius, theta, z
Kutaslab
Kutaslab topo coordinates are spherical and come in as radius, theta, phi triples (see topofiles for theta, phi) and get mapped to x, y, z
origin is between the ears (co-planar with 10-20 temporal line)
vectors from the origin at angles (degrees)
theta = 0 points toward right ear, along interaural line, 90 points to forehead along midline
phi = 0 points to the vertex, 90 points to the temporal line
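As a concreteness check on the angle conventions above, here is a minimal sketch of the spherical-to-Cartesian mapping, assuming radius in arbitrary distance units and theta, phi in degrees as described. This is an illustration, not LocDat's actual implementation:

    import numpy as np

    def kutas_sph2cart(radius, theta, phi):
        # RAS orientation: X+ = Right, Y+ = Anterior, Z+ = Superior
        theta, phi = np.radians(theta), np.radians(phi)
        x = radius * np.sin(phi) * np.cos(theta)  # theta=0, phi=90 -> right ear
        y = radius * np.sin(phi) * np.sin(theta)  # theta=90, phi=90 -> forehead
        z = radius * np.cos(phi)                  # phi=0 -> vertex
        return x, y, z

    kutas_sph2cart(1.0, 0.0, 90.0)   # ~(1, 0, 0), right ear
    kutas_sph2cart(1.0, 90.0, 0.0)   # ~(0, 0, 1), vertex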
- class mkpy.mkh5.mkh5(h5name)[source]
Bases:
object
Import and prepare ERPSS single-trial data for cross-platform analysis.
This class provides the user API for converting compressed binary EEG data files into readily accessible HDF5 files.
- Parameters
h5_fname (str) – Path to a new or existing HDF5 file used as the database.
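A minimal instantiation sketch (the file name is hypothetical):

    from mkpy import mkh5

    h5 = mkh5.mkh5("sub01.h5")  # new or existing mkh5 format HDF5 file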
- EPOCH_TABLES_PATH = '_epoch_tables'
- exception EpochsTableDataError(pd_data_type, series)[source]
Bases:
Exception
raised for pd.Series data we can’t or won’t directly convert for HDF5
These include mixed num-like and str-like values, and booleans with missing data.
- class HeaderIO[source]
Bases:
object
private-ish helper class for managing mkh5 datablock header information
mkh5 header structures are python dictionaries, serialized for hdf5 storage as JSON strings, and tucked into the hdf5 attribute so they travel with the datablock.
The dblock header holds information collected/generated from various sources. Some is read from dig .crw/.log file headers, some is generated at runtime as the dig data is converted to mkh5 format, and some is generated/merged in at runtime when the YAML header info file is processed:
native .crw header from the info dict returned by mkio._read_header()
mkh5/hdf5 info added by mkh5_read_raw_log()
miscellaneous data stream specs, 1-1 with the dblock data columns
supplementary information specified in a YAML format text file and loaded along with the .crw and .log files when they are converted to the dblock, HDF5 format
The .crw/.dig header can be extended by loading it from a YAML file. See _load_yhdr() docstring for specs.
- exception YAMLClobberError(hio, keyword, yhdr_f=None)[source]
Bases:
Exception
raised when a YAML header file tries to overwrite an mkh5 header reserved word
- get(dblock)[source]
load header info from dblock into self._header
- Parameters
dblock (h5py.Dataset) – The HDF5 dataset whose attribute ‘json_header’ holds the header JSON string.
- get_slices()[source]
slice out data values from dblock header for use in event table columns
- Parameters
slicer (dict) – dictionary of col_name: slash_pattern where
col_name (string) is the dict key that will appear as a table column heading
search_path (list of strings) as [‘key1’, ‘key2’, … ‘key_n’] to probe the header
- Returns
slicer (list of 2-ples, possibly empty) – each tuple is (col_name, datum) where
datum (object) – leaf returned by dpath.util.get(self._header, search_path)
- Raises
RuntimeError – if HeaderIO instance doesn't have self._header or self._slicer dicts
RuntimeError – if dpath.util.get finds multiple values
- property header
expose header data like a read-only attribute
- new(hdr_dict, yhdr_f)[source]
merge a dictionary and dict from the YAML file into a well-formed mkh5 datablock header or die
- set(dblock)[source]
jsonify the current self._header as value of dblock.attrs[self._json_key]
- Parameters
dblock (h5py.Dataset) – writeable mkh5 datablock reference
- set_slicer(slicer_f)[source]
load YAML header slicer for selecting subsets of mkh5 header values
- Parameters
slicer_f (str) – YAML file in mkh5 header slicer format
- Returns
side effect: sets self._slicer
- Return type
None
The mkh5 header is a tree structure (dict) with branches that terminate in data.
The mkh5 header slicer is an mkh5 header subtree “template” that contains
terminating branches only
string labels as terminals, e.g., col_0, col_1
Ex., [‘key_0’, ‘key_1’, … ‘key_i’, col_0]
Walking through header slicer with dpath.util.get(path) fetches the data value at the end of the path and we label it with the slicer column name like so
[ (col_0, val_0), … (col_n, val_n)]
This converts neatly to wide tabular format
col_0    …    col_n
val_0    …    val_n
Examples
# here is some YAML header info
---
runsheet:
  age: 22
  SAT_math: 720
  SAT_verbal: 680
  handedness: L/L
  mood_VAS: 4.5
The YAML header slicer follows matching paths into that header to pluck out the terminal data values (leafs) and (re-)label them
# here is an extractor for the header
---
runsheet:
  mood_VAS: mood
  handedness: fam_hand
  age: age
Note
key:value order does not matter. This next slicer specifies the same paths into the header tree and extracts exactly the same values:
---
runsheet:
  age: age
  handedness: fam_hand
  mood_VAS: mood
The slicer paths are the same for both:
runsheet/mood_VAS/mood
runsheet/handedness/fam_hand
runsheet/age/age
Algorithm
HeaderIO.get_slices() extracts the header values at the end of the path, i.e., 22, L/L, 4.5 and pairs each datum with its path-matching slicer label like so
[ (age, 22), (fam_hand, ‘L/L’), (mood, 4.5) ]
mkh5.get_event_table() converts these to wide-format and merges them with the rest of the single trial event code column information it gets from the code tag mapper.
age    fam_hand    mood
22     ‘L/L’       4.5
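The following sketch reproduces the slicing mechanics with the runsheet example above; the header dict is hypothetical, and dpath.util.get is the same call named in the Algorithm description:

    import dpath.util

    header = {"runsheet": {"age": 22, "handedness": "L/L", "mood_VAS": 4.5}}
    slicer = [
        ("age", "runsheet/age"),
        ("fam_hand", "runsheet/handedness"),
        ("mood", "runsheet/mood_VAS"),
    ]
    slices = [(col, dpath.util.get(header, path)) for col, path in slicer]
    # [('age', 22), ('fam_hand', 'L/L'), ('mood', 4.5)]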
- exception YamlHeaderFormatError(value)[source]
Bases:
Exception
informative errors for bad yhdr YAML files
- append_mkdata(h5_path, eeg_f, log_f, yhdr_f, *args, with_log_events='aligned', **kwargs)[source]
Append .crw, .log, .yhdr to an existing h5_path
Extend an existing sequence of datablocks h5_path/dblock_0, … h5_path/dblock_N, with the continuation, h5_path/dblock_N+1, …
The intended application is to combine .crw, .log files that could or should be grouped, e.g., to add separately recorded cals, to recover from dig crashes, or to pool an individual's data recorded in different sessions.
- Parameters
h5_path (str) – The full slashpath location in the .h5 file where the new data blocks will be stored in the hdf5 file. Must be the full slashpath from the root without the leading slash.
eeg_f (str) – file path to the .crw files.
log_f (str or None) – file path to corresponding .log file.
with_log_events (str) – how to handle the log event codes, see mkh5.create_mkdata() for details
yhdr_f (string) – path to the YAML header file.
- Raises
Warning – If the new crw headers do not match the existing group attributes
See also
mkh5.create_mkdata()
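A typical use, sketched with hypothetical file names, is appending separately recorded cals to an existing group:

    h5 = mkh5.mkh5("sub01.h5")
    h5.create_mkdata("S01", "s01.crw", "s01.log", "s01.yhdr")
    # append the cals recorded after data acquisition to the same group
    h5.append_mkdata("S01", "s01_cals.crw", "s01_cals.log", "s01_cals.yhdr")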
- calibrate_mkdata(id_name, cal_size=None, polarity=None, lo_cursor=None, hi_cursor=30, n_points=None, cal_ccode=None, use_cals=None, use_file=None)[source]
fetch and apply normerp style calibration to raw/crw dataset.
This locates two cursors, one on either side of an event-triggered calibration square wave step, and measures average values in the interval +/- n_points around the cursors separately at each EEG data stream. The magnitude of the difference is the measure of the step. The calibration scaling factor for that stream is the average of the (trimmed) calibration pulses.
- Parameters
id_name (str) – h5 group name that is parent to mkpy format dblocks, id_name/dblocks
cal_size (float) – magnitude of calibration square wave step in microvolts, e.g., 10
polarity ((1,0)) – ignored, and should be deprecated. In ERPSS this inverts all waveforms … has nothing to do with calibration really.
lo_cursor (float (positive value)) – magnitude of the low cursor offset from the calibration event in milliseconds
hi_cursor (float (positive value)) – magnitude of the high cursor offset from the calibration event in milliseconds
n_points (uint) – number of points on either side of each cursor to measure, interval = 2*n_points + 1
cal_ccode (uint (default = 0)) – search for cal pulses only in dblocks where the ccode column == cal_ccode. The standing Kutas Lab convention is cal ccode == 0
use_cals (str (None defaults to id_name)) – slashpath to an alternate h5 group containing dblocks with the cal pulses
use_file (str (None defaults to self.f_name)) – slashpath to an alternate mkpy format h5 data file.
Calibration pulses are often recorded into the same .crw file or a separate file following data acquisition and then mkh5.append()-ed to a group. In both cases, the cal pulses appear in dblocks sister to the EEG data they are used to calibrate.
Consequently the default calibration behavior is to search the dblocks daughter to self.h5_fname/id_name for the cal pulses.
In rare cases, cal pulses are missing entirely and must be poached from another data group in the same hdf5 file or a data group in a different hdf5 file.
Setting use_cals overrides the default group name.
Setting use_file overrides the default self.f_name.
The normerp way is to use the ABSOLUTE VALUE of the cal step regardless of polarity to adjust the amplitude of the +/- A/D recordings … leaving the sign unchanged.
The polarity flag -1 is used ONLY to switch the sign of the EEG and has nothing to do with the A/D scaling factor
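A sketch with illustrative settings; the group name and parameter values are examples, not prescriptions:

    h5 = mkh5.mkh5("sub01.h5")
    h5.calibrate_mkdata(
        "S01",          # group containing the dblocks with cal pulses
        cal_size=10,    # 10 microvolt square wave step
        lo_cursor=50,   # cursor offset magnitudes in ms
        hi_cursor=50,
        n_points=3,     # 2*3 + 1 = 7 point measurement interval per cursor
        cal_ccode=0,    # Kutas Lab convention: cals have ccode 0
    )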
- create_mkdata(h5_path, eeg_f, log_f, yhdr_f, *args, with_log_events='aligned', **kwargs)[source]
Convert Kutas lab ERPSS .crw and .log to the mkh5 hdf5 format.
This merges dig .crw, .log, and user-specified .yml data into a tidy HDF5 dataset of continuous EEG recording + jsonic header.
Note
Log events are automatically truncated if log event codes occur after the end of the EEG data. This is rare but can happen when dig crashes or drops the last event code.
- Parameters
h5_path (str) – The full slashpath location in the .h5 file where the new data blocks will be stored in the hdf5 file. Must be the full slashpath from the root without the leading slash.
eeg_f (str) – file path to the .crw file.
log_f (str or None) – file path to corresponding .log file, if any.
yhdr_f (str) – file path to the YAML header file.
with_log_events ({“aligned”, “from_eeg”, “none”, “as_is”}, optional) – how to handle log file event codes (log_evcodes) relative to the eeg event codes (raw_evcodes) from the eeg recording.
- aligned (default)
ensures eeg and log event code timestamps are 1-1 but allows discrepant, e.g., logpoked, event codes with a warning. Requires a log file. This default is the mkpy.mkh5 <= 0.2.2 behavior.
- from_eeg
propagates the eeg event codes (dig mark track) to the log_evcodes column. Requires log_f is None.
- none
sets log_evcodes, log_ccode, log_flags all to 0. Requires log_f is None.
- as_is
loads whatever codes are in the log file without checking against the eeg data. Requires a log file. Silently allows eeg and log event code misalignment. Exceedingly dangerous but useful for disaster recovery.
*args (strings, optional) – passed in to h5py.create_dataset()
**kwargs (key=values, optional) – passed in to h5py.create_dataset(), e.g., compression="gzip".
Notes
The EEG and event code data streams are snipped apart into uninterrupted “datablocks” at pause marks. Each data block has its own header containing information from the .crw file merged with the additional information from the YAML header file yhdr_f.
Uncompressed ERPSS .raw files are also legal but there is no good reason to have them around. If the raw won't compress because it is defective it won't convert to mkh5 either. There are no known useful **kwargs. HDF5 chunking fails when the size of a datablock is smaller than the chunk, and compression makes files a little smaller and a lot slower to read/write.
Nathaniel Smith did all the hard work of low level ERPSS file IO.
Examples
Todo
Give examples or link to snippets or examples
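Until that Todo is filled, here is a minimal conversion sketch with hypothetical file names:

    from mkpy import mkh5

    h5 = mkh5.mkh5("sub01.h5")
    h5.create_mkdata("S01", "s01.crw", "s01.log", "s01.yhdr")
    # datablocks land at S01/dblock_0, S01/dblock_1, ... one per pause-free stretch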
- property data_blocks
deprecated, use mkh5.dblock_paths for a list of HDF5 paths to all the data blocks
- property data_groups
- property dblock_paths
an iterable list of HDF5 paths to all the data blocks in the mkh5 file
- delete_mkdata(sub_id)[source]
delete a top-level group from the mkh5 h5 file without warning, see Notes about wasted space.
- Parameters
sub_id (str) – path to h5 group in the instance's h5 file
Notes
Use sparingly or not at all. hdf5 has no garbage collection; deleting groups leaves holes in the file unless the entire file tree is copied to a fresh file.
FIXME: the hdf5 workaround for the missing garbage collection is to rewrite the gappy file to a new file … this could be built in here.
- property epochs_names
- export_epochs(epochs_name, epochs_f, file_format='h5', columns=None)[source]
write previously set epochs data to the specified file in one of the formats recommended for cross-platform data interchange
- Parameters
epochs_name (string) – must name one of the datasets in this h5[‘epochs’]
epochs_f (string) – file path and name of the data file
file_format (string, {‘h5’, ‘pdh5’, ‘feather’, ‘txt’})
Warning
File formats other than h5 overwrite any file with the same name without warning.
Note
h5 format:
the epochs are saved in the HDF5 file root as a dataset named epochs_name. Fails if such a dataset already exists.
2-D rows x columns epochs data are stored as a single 1-D column vector (rows) of an HDF5 compound data type (columns). This HDF5 dataset is easily read and unpacked with any HDF5 reader that supports HDF5 compound data types.
pdh5 format: 2-D rows x columns epochs data are written to disk with pandas.to_hdf writer (via pytables). These epochs data are easily read into a pandas.DataFrame with pandas.read_hdf(epochs_f, key=epochs_name) and are also readable, less easily, by other HDF5 readers.
feather, txt formats: 2-D rows x columns epochs data are written to disk with pandas.to_feather (via pyarrow) and as tab-separated text with pandas.to_csv(..., sep='\t').
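A sketch, assuming epochs named "p3" were previously set with set_epochs():

    h5.export_epochs("p3", "p3.h5")                              # HDF5 compound type
    h5.export_epochs("p3", "p3.pdh5", file_format="pdh5")        # pandas.read_hdf-friendly
    h5.export_epochs("p3", "p3.feather", file_format="feather")  # pyarrow feather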
- export_event_table(event_table, event_table_f, format='feather')[source]
fetch the specified event table and save it in the specified format
- garv_data_group(h5_data_group_path, skip_ccodes=[0])[source]
Run pygarv on all the dblocks under the h5_data_group_path.
- Parameters
h5_data_group_path (str) – name of the h5 datagroup containing the dblocks to screen
skip_ccodes (list of uint, None ([0])) – dblocks with log_ccodes in the list are not scanned. Default is [0] to skip calibration blocks. Setting to None disables the skipper and scans all dblocks in the data group
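For example (the data group path is hypothetical):

    h5.garv_data_group("Expt1/S001")                     # default: skip ccode 0 cal blocks
    h5.garv_data_group("Expt1/S001", skip_ccodes=None)   # scan all dblocks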
- get_dblock(h5_path, header=True, dblock=True)[source]
return a copy of header dict and numpy ndarray from the mkh5 datablock at h5_path
- Parameters
h5_path (string) – full HDF5 slashpath to a datablock in this mkh5 instance
header (bool {True}, optional) – return the header
dblock (bool {True}, optional) – return the dblock dataset
- Return type
hdr, data
- Raises
ValueError – if header and dblock are both False
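For example (the slashpath is hypothetical):

    hdr, data = h5.get_dblock("S01/dblock_0")  # header dict, numpy ndarray copy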
- get_epochs(epochs_name, format='numpy', columns=None)[source]
fetch single trial epochs in tabular form
- Parameters
epochs_name (str) – name of previously set epochs table
format (str {‘numpy’, ‘pandas’})
columns (list of str or None {‘None’}) – the subset of column names to extract
- Returns
epochs (numpy.ndarray or pandas.DataFrame) – epochs.shape == (i x m, n + 2) where
i = the number of epochs, indexed uniquely by epoch_table[‘epoch_id’]
m = epoch length in samples
n = the number of columns in the epochs_name epochs table
See _h5_get_epochs() for details.
attrs (dict) – stub
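A sketch, assuming epochs named "p3" were previously set and the two return values listed above:

    epochs, attrs = h5.get_epochs("p3", format="pandas")
    # i epochs of m samples stacked long: epochs.shape == (i * m, n + 2)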
- get_epochs_table(epochs_name, format='pandas')[source]
look up a previously set epochs table by name
- Parameters
epochs_name (str) – name of a previously defined epochs table as set with an mkh5.set_epochs(event_table)
format (str {‘pandas’, ‘numpy’}) – pandas.Dataframe or numpy.ndarray
- Returns
epochs_table
- Return type
pandas.Dataframe or numpy.ndarray
Bytestrings from the hdf5 are converted to unicode in the returned epochs_table.
- get_event_table(code_map_f, header_map_f=None)[source]
Reads the code tag and header extractor and returns an event lookup table
- Parameters
code_map_f (str) – Excel, YAML, or tab-separated text, see mkh5 docs for format details.
header_map_f (str) – YAML header extractor file, keys match header keys, values specify name of the event table column to put the header data
- Returns
event_table – See Note.
- Return type
pandas.DataFrame
Note
This sweeps the code tag map across the data to generate a lookup table for specific events (sequence patterns) where the rows specify:
slashpath to the mkh5 dblock data set and sample index for pattern-matching events.
all the additional information for that pattern given in the code tag map
The event table generated from mkh5 data and the code_map specification is in lieu of .blf (for EEG epoching and time-locking), .rts (for event-to-event timing), and .hdr (for experimental design specification).
ccode
Special column. If the code tag map has a column named ccode, the code finder finds events that match the code sequence given by the regex pattern AND the log_ccode == ccode. This emulates Kutas Lab cdbl event lookup and supports, e.g., the condition code == 0 for cals convention and blocked designs where the ccode varies block to block. If the code map does not specify a ccode column, the log_ccode column is ignored for pattern matching.
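A sketch with hypothetical codemap and header extractor file names:

    event_table = h5.get_event_table("design_codemap.xlsx", header_map_f="design_slicer.yml")
    event_table.head()  # pandas.DataFrame, one row per code pattern match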
- gethead(pattern)[source]
get header values as a list of (slashpath, value) 2-ples suitable for passing to edhead
- headinfo(pattern='.+')[source]
print header information matching the pattern regular expression to STDOUT
- Parameters
pattern (regexp) – regular expression to look for in the slashpaths to datablocks and header information in this mkh5 format file. Default ‘.+’ matches all header info.
Assume we have previously constructed an mkh5 file expts.h5 containing data for two experiments and multiple subjects in each and decorated with yaml header information about electrode locations.
We may want to query and display more or less header information. Usually less since there is lots.
Select the relevant information with regular expression pattern matching:
> expts = mkh5.mkh5('expts.h5')  # initialize the mkh5 object
> expts.headinfo('Expt1/S001')  # fetch all header info for Expt1/S001, all datablocks
> expts.headinfo('Expt1/.*MiPa')  # returns everything in any Expt1 header involving MiPa
> expts.headinfo('Expt1/S001/apparatus/space/origin')  # origin of sensor space for Expt1/S001
> expts.headinfo('Expt1/S001/apparatus/sensors/MiPa/x')  # x-coordinate of electrode MiPa
- info()[source]
return h5dump-ish overview of h5_path's groups/datasets/attributes and data
- Parameters
h5_path (string) – h5 path to a group or dataset
- Returns
info (string)
- plotcals(*args, **kwargs)[source]
visualize cal pulses and scaling factors used to convert to microvolts
- set_epochs(epochs_table_name, event_table, tmin_ms, tmax_ms)[source]
construct and store a named EEG epochs lookup-table in self[‘epochs’]
- For storing in hdf5 the columns must be one of these:
string-like (unicode, bytes)
int-like (int, np.int, np.uint32, np.uint64)
float-like (float, np.float32, np.float64)
- Parameters
epochs_table_name (string) – name of the epochs table to store
event_table (pandas.DataFrame) – as returned by mkh5.get_event_table()
tmin_ms (float) – epoch start in milliseconds relative to the event, e.g., -500
tmax_ms (float) – epoch end in milliseconds relative to the event, e.g., 1500, strictly greater than tmin_ms
- Returns
updates h5_f/EPOCH_TABLES_PATH/ with the named epoch table h5py.Dataset
- Return type
None
Notes
The epochs table is a lightweight lookup table specific to each mkh5 instance’s hdf5 file,
h5[‘epochs’][epochs_table_name] = epochs_table
The epochs table is row-for-row the same as the time-locking event table and just adds the time interval information for use when extracting the time series EEG data segments.
For reproducibility, by design the epochs tables in an mkh5 file are write-protected. New tables may be added, but existing tables cannot be overwritten or deleted. If you need to revise the epochs, rebuild the mkh5 file from the crws/logs with the epochs you want.
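End to end, the epochs workflow sketched with hypothetical names:

    event_table = h5.get_event_table("design_codemap.xlsx")
    h5.set_epochs("p3", event_table, tmin_ms=-500, tmax_ms=1000)  # -500 to 1000 ms epochs
    p3_epochs = h5.get_epochs_table("p3")  # look the table up by name later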
- sethead(slash_vals, **kwargs)[source]
update mkh5 dblock header information via dpath.util style slashpath, value notation.
The recommended method for adding information to mkh5 headers is via the YAML header file loaded when converting .crw/.log to mkh5:
myh5file.create_mkdata('S01', 'my.crw', 'my.log', 'my.yhdr')
Use sethead() at your own risk. Mucking with headers by hand is dangerous without a clear understanding of the mkh5 dataset and header attribute format and dpath.util behavior.
- Parameters
slash_vals ((str, value) 2-ple or list of them) – str is a slashpath to an mkh5 dblock and on into the header; value is a JSON-ifiable scalar, dict, or sequence
mydat = mkh5.mkh5('myfile.h5')
# probably a bad idea to set this only for the first datablock
mydat.sethead(('S01/dblock_0/long_subid', 'S0001_A'))
mydat.sethead(('S01/dblock_0/npsych/mood_score', 4))
# use a list to set for all dblocks
spvs = [
    ('S01/dblock_0/npsych/mood_score', 4),
    ('S01/dblock_1/npsych/mood_score', 4),
    ('S01/dblock_2/npsych/mood_score', 4),
    ('S01/dblock_3/npsych/mood_score', 4),
    ('S01/dblock_4/npsych/mood_score', 4),
    ('S01/dblock_5/npsych/mood_score', 4),
]
mydat.sethead(spvs)