mkpy.mkh5 module
- class mkpy.mkh5.LocDat(type, label, coord, pos, distance_units=None, angle_units=None)[source]
Bases:
object
map Kutas lab spherical coordinates and Brainsight .elp data files to 3-D Cartesian XYZ
Coordinates
LocDat native positions are in Cartesian 3-space
Origin is center of head
Orientation is RAS: X+ = Right, Y+ = Anterior, Z+ = Superior
Cartesian coordinates come in as triples: x, y, z
Polar coordinates come in as triples: radius, theta, z
Kutaslab
Kutaslab topo coordinates are spherical and come in as radius, theta, phi triples (see topofiles for theta, phi) and get mapped to x, y, z
origin is between the ears (co-planar with 10-20 temporal line)
vectors from the origin at angles (degrees)
theta = 0 points toward right ear, along interaural line, 90 points to forehead along midline
phi = 0 points to the vertex, 90 points to the temporal line
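As a concreteness check on the angle conventions above, here is a minimal sketch of the spherical-to-Cartesian mapping, assuming radius in arbitrary distance units and theta, phi in degrees as described. This is an illustration, not LocDat's actual implementation:

    import numpy as np

    def kutas_sph2cart(radius, theta, phi):
        # RAS orientation: X+ = Right, Y+ = Anterior, Z+ = Superior
        theta, phi = np.radians(theta), np.radians(phi)
        x = radius * np.sin(phi) * np.cos(theta)  # theta=0, phi=90 -> right ear
        y = radius * np.sin(phi) * np.sin(theta)  # theta=90, phi=90 -> forehead
        z = radius * np.cos(phi)                  # phi=0 -> vertex
        return x, y, z

    kutas_sph2cart(1.0, 0.0, 90.0)   # ~(1, 0, 0), right ear
    kutas_sph2cart(1.0, 90.0, 0.0)   # ~(0, 0, 1), vertex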
- class mkpy.mkh5.mkh5(h5name)[source]
Bases:
object
Import and prepare ERPSS single-trial data for cross-platform analysis.
This class provides the user API for converting compressed binary EEG data files into readily accessible HDF5 files.
- Parameters
h5_fname (str) – Path to a new or existing HDF5 file used as the database.
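A minimal instantiation sketch (the file name is hypothetical):

    from mkpy import mkh5

    h5 = mkh5.mkh5("sub01.h5")  # new or existing mkh5 format HDF5 file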
- EPOCH_TABLES_PATH = '_epoch_tables'
- exception EpochsTableDataError(pd_data_type, series)[source]
Bases:
Exception
raised for pd.Series data we can’t or won’t directly convert for HDF5
These include mixed num-like and str-like values, and booleans with missing data.
- class HeaderIO[source]
Bases:
object
private-ish helper class for managing mkh5 datablock header information
mkh5 header structures are python dictionaries, serialized for hdf5 storage as JSON strings, and tucked into the hdf5 attribute so they travel with the datablock.
The dblock header holds information collected/generated from various sources. Some is read from dig .crw/.log file headers, some is generated at runtime as the dig data is converted to mkh5 format, and some is generated/merged in at runtime when the YAML header info file is processed:
native .crw header from the info dict returned by mkio._read_header()
mkh5/hdf5 info added by mkh5_read_raw_log()
miscellaneous data stream specs, 1-1 with the dblock data columns
supplementary information specified in a YAML format text file and loaded along with the .crw and .log files when they are converted to the dblock, HDF5 format
The .crw/.dig header can be extended by loading it from a YAML file. See _load_yhdr() docstring for specs.
- exception YAMLClobberError(hio, keyword, yhdr_f=None)[source]
Bases:
Exception
raised when a YAML header file tries to overwrite an mkh5 header reserved word
- get(dblock)[source]
load header info from dblock into self._header
- Parameters
dblock (h5py.Dataset) – The HDF5 dataset whose attribute ‘json_header’ holds the header JSON string.
- get_slices()[source]
slice out data values from dblock header for use in event table columns
- Parameters
slicer (dict) – dictionary of col_name: slash_pattern where
col_name (string) is the dict key that will appear as a table column heading
search_path (list of strings) as [‘key1’, ‘key2’, … ‘key_n’] to probe the header
- Returns
slicer (list of 2-ples, possibly empty) – each tuple is (col_name, datum) where
datum (object) – leaf returned by dpath.util.get(self._header, search_path)
- Raises
RuntimeError – if HeaderIO instance doesn't have self._header or self._slicer dicts
RuntimeError – if dpath.util.get finds multiple values
- property header
expose header data like a read-only attribute
- new(hdr_dict, yhdr_f)[source]
merge a dictionary and dict from the YAML file into a well-formed mkh5 datablock header or die
- set(dblock)[source]
jsonify the current self._header as value of dblock.attrs[self._json_key]
- Parameters
dblock (h5py.Dataset) – writeable mkh5 datablock reference
- set_slicer(slicer_f)[source]
load YAML header slicer for selecting subsets of mkh5 header values
- Parameters
slicer_f (str) – YAML file in mkh5 header slicer format
- Returns
side effect: sets self._slicer
- Return type
None
The mkh5 header is a tree structure (dict) with branches that terminate in data.
The mkh5 header slicer is an mkh5 header subtree “template” that contains
terminating branches only
string labels as terminals, e.g., col_0, col_1
Ex., [‘key_0’, ‘key_1’, … ‘key_i’, col_0]
Walking through header slicer with dpath.util.get(path) fetches the data value at the end of the path and we label it with the slicer column name like so
[ (col_0, val_0), … (col_n, val_n)]
This converts neatly to wide tabular format
col_0    …    col_n
val_0    …    val_n
Examples
# here is some YAML header info
---
runsheet:
  age: 22
  SAT_math: 720
  SAT_verbal: 680
  handedness: L/L
  mood_VAS: 4.5
The YAML header slicer follows matching paths into that header to pluck out the terminal data values (leafs) and (re-)label them
# here is an extractor for the header
---
runsheet:
  mood_VAS: mood
  handedness: fam_hand
  age: age
Note
key:value order does not matter. This next slicer specifies the same paths into the header tree and extracts exactly the same values:
---
runsheet:
  age: age
  handedness: fam_hand
  mood_VAS: mood
The slicer paths are the same for both:
runsheet/mood_VAS/mood
runsheet/handedness/fam_hand
runsheet/age/age
Algorithm
HeaderIO.get_slices() extracts the header values at the end of the path, i.e., 22, L/L, 4.5 and pairs each datum with its path-matching slicer label like so
[ (age, 22), (fam_hand, ‘L/L’), (mood, 4.5) ]
mkh5.get_event_table() converts these to wide-format and merges them with the rest of the single trial event code column information it gets from the code tag mapper.
age    fam_hand    mood
22     ‘L/L’       4.5
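The following sketch reproduces the slicing mechanics with the runsheet example above; the header dict is hypothetical, and dpath.util.get is the same call named in the Algorithm description:

    import dpath.util

    header = {"runsheet": {"age": 22, "handedness": "L/L", "mood_VAS": 4.5}}
    slicer = [
        ("age", "runsheet/age"),
        ("fam_hand", "runsheet/handedness"),
        ("mood", "runsheet/mood_VAS"),
    ]
    slices = [(col, dpath.util.get(header, path)) for col, path in slicer]
    # [('age', 22), ('fam_hand', 'L/L'), ('mood', 4.5)]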
- exception YamlHeaderFormatError(value)[source]
Bases:
Exception
informative errors for bad yhdr YAML files
- append_mkdata(h5_path, eeg_f, log_f, yhdr_f, *args, with_log_events='aligned', **kwargs)[source]
Append .crw, .log, .yhdr to an existing h5_path
Extend an existing sequence of datablocks h5_path/dblock_0, … h5_path/dblock_N, with the continuation, h5_path/dblock_N+1, …
The intended application is to combine .crw, .log files that could or should be grouped, e.g., to add separately recorded cals, to recover from dig crashes, or to pool an individual's data recorded in different sessions.
- Parameters
h5_path (str) – The full slashpath location in the .h5 file where the new data blocks will be stored in the hdf5 file. Must be the full slashpath from the root without the leading slash.
eeg_f (str) – file path to the .crw files.
log_f (str or None) – file path to corresponding .log file.
with_log_events (str) – how to handle the log event codes, see mkh5.create_mkdata() for details
yhdr_f (string) – path to the YAML header file.
- Raises
Warning – If the new crw headers do not match the existing group attributes
See also
mkh5.create_mkdata()
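A typical use, sketched with hypothetical file names, is appending separately recorded cals to an existing group:

    h5 = mkh5.mkh5("sub01.h5")
    h5.create_mkdata("S01", "s01.crw", "s01.log", "s01.yhdr")
    # append the cals recorded after data acquisition to the same group
    h5.append_mkdata("S01", "s01_cals.crw", "s01_cals.log", "s01_cals.yhdr")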
- calibrate_mkdata(id_name, cal_size=None, polarity=None, lo_cursor=None, hi_cursor=30, n_points=None, cal_ccode=None, use_cals=None, use_file=None)[source]
fetch and apply normerp style calibration to raw/crw dataset.
This locates two cursors, one on either side of an event-triggered calibration square wave step, and measures average values in the interval +/- n_points around the cursors separately at each EEG data stream. The magnitude of the difference is the measure of the step. The calibration scaling factor for that stream is the average of the (trimmed) calibration pulses.
- Parameters
id_name (str) – h5 group name that is parent to mkpy format dblocks, id_name/dblocks
cal_size (float) – magnitude of calibration square wave step in microvolts, e.g., 10
polarity ((1,0)) – ignored, and should be deprecated. In ERPSS this inverts all waveforms … has nothing to do with calibration really.
lo_cursor (float (positive value)) – magnitude of the low cursor offset from the calibration event in milliseconds
hi_cursor (float (positive value)) – magnitude of the high cursor offset from the calibration event in milliseconds
n_points (uint) – number of points on either side of each cursor to measure, interval = 2*n_points + 1
cal_ccode (uint (default = 0)) – search for cal pulses only in dblocks where the ccode column == cal_ccode. The standing Kutas Lab convention is cal ccode == 0
use_cals (str (None defaults to id_name)) – slashpath to an alternate h5 group containing dblocks with the cal pulses
use_file (str (None defaults to self.f_name)) – slashpath to an alternate mkpy format h5 data file.
Calibration pulses are often recorded into the same .crw file or a separate file following data acquisition and then mkh5.append()-ed to a group. In both cases, the cal pulses appear in dblocks sister to the EEG data they are used to calibrate.
Consequently the default calibration behavior is to search the dblocks daughter to self.h5_fname/id_name for the cal pulses.
In rare cases, cal pulses are missing entirely and must be poached from another data group in the same hdf5 file or a data group in a different hdf5 file.
Setting use_cals overrides the default group name.
Setting use_file overrides the default self.f_name.
The normerp way is to use the ABSOLUTE VALUE of the cal step regardless of polarity to adjust the amplitude of the +/- A/D recordings … leaving the sign unchanged.
The polarity flag -1 is used ONLY to switch the sign of the EEG and has nothing to do with the A/D scaling factor
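A sketch with illustrative settings; the group name and parameter values are examples, not prescriptions:

    h5 = mkh5.mkh5("sub01.h5")
    h5.calibrate_mkdata(
        "S01",          # group containing the dblocks with cal pulses
        cal_size=10,    # 10 microvolt square wave step
        lo_cursor=50,   # cursor offset magnitudes in ms
        hi_cursor=50,
        n_points=3,     # 2*3 + 1 = 7 point measurement interval per cursor
        cal_ccode=0,    # Kutas Lab convention: cals have ccode 0
    )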
- create_mkdata(h5_path, eeg_f, log_f, yhdr_f, *args, with_log_events='aligned', **kwargs)[source]
Convert Kutas lab ERPSS .crw and .log to the mkh5 hdf5 format.
This merges dig .crw, .log, and user-specified .yml data into a tidy HDF5 dataset of continuous EEG recording + jsonic header.
Note
Log events are automatically truncated if log event codes occur after the end of the EEG data. This is rare but can happen when dig crashes or drops the last event code.
- Parameters
h5_path (str) – The full slashpath location in the .h5 file where the new data blocks will be stored in the hdf5 file. Must be the full slashpath from the root without the leading slash.
eeg_f (str) – file path to the .crw file.
log_f (str or None) – file path to corresponding .log file, if any.
yhdr_f (str) – file path to the YAML header file.
with_log_events ({“aligned”, “from_eeg”, “none”, “as_is”}, optional) – how to handle log file event codes (log_evcodes) relative to the eeg event codes (raw_evcodes) from the eeg recording.
- aligned (default)
ensures eeg and log event code timestamps are 1-1 but allows discrepant, e.g., logpoked, event codes with a warning. Requires a log file. This default is the mkpy.mkh5 <= 0.2.2 behavior.
- from_eeg
propagates the eeg event codes (dig mark track) to the log_evcodes column. Requires log_f is None.
- none
sets log_evcodes, log_ccode, log_flags all to 0. Requires log_f is None.
- as_is
loads whatever codes are in the log file without checking against the eeg data. Requires a log file. Silently allows eeg and log event code misalignment. Exceedingly dangerous but useful for disaster recovery.
*args (strings, optional) – passed in to h5py.create_dataset()
**kwargs (key=values, optional) – passed in to h5py.create_dataset(), e.g., compression="gzip".
Notes
The EEG and event code data streams are snipped apart into uninterrupted “datablocks” at pause marks. Each data block has its own header containing information from the .crw file merged with the additional information from the YAML header file yhdr_f.
Uncompressed ERPSS .raw files are also legal but there is no good reason to have them around. If the raw won't compress because it is defective it won't convert to mkh5 either. There are no known useful **kwargs. HDF5 chunking fails when the size of a datablock is smaller than the chunk, and compression makes files a little smaller and a lot slower to read/write.
Nathaniel Smith did all the hard work of low level ERPSS file IO.
Examples
Todo
Give examples or link to snippets or examples
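Until that Todo is filled, here is a minimal conversion sketch with hypothetical file names:

    from mkpy import mkh5

    h5 = mkh5.mkh5("sub01.h5")
    h5.create_mkdata("S01", "s01.crw", "s01.log", "s01.yhdr")
    # datablocks land at S01/dblock_0, S01/dblock_1, ... one per pause-free stretch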
- property data_blocks
deprecated, use mkh5.dblock_paths for a list of HDF5 paths to all the data blocks
- property data_groups
- property dblock_paths
an iterable list of HDF5 paths to all the data blocks in the mkh5 file
- delete_mkdata(sub_id)[source]
delete a top-level group from the mkh5 h5 file without warning, see Notes about wasted space.
- Parameters
sub_id (str) – path to h5 group in the instance's h5 file
Notes
Use sparingly or not at all. hdf5 has no garbage collection; deleting groups leaves holes in the file unless the entire file tree is copied to a fresh file.
FIXME: the hdf5 workaround for the missing garbage collection is to rewrite the gappy file to a new file … this could be built in here.
- property epochs_names
- export_epochs(epochs_name, epochs_f, file_format='h5', columns=None)[source]
write previously set epochs data to the specified file in one of the formats recommended for cross-platform data interchange
- Parameters
epochs_name (string) – must name one of the datasets in this h5[‘epochs’]
epochs_f (string) – file path and name of the data file
file_format (string, {‘h5’, ‘pdh5’, ‘feather’, ‘txt’})
Warning
File formats other than h5 overwrite any file with the same name without warning.
Note
h5 format:
the epochs are saved in the HDF5 file root as a dataset named epochs_name. Fails if such a dataset already exists.
2-D rows x columns epochs data are stored as a single 1-D column vector (rows) of an HDF5 compound data type (columns). This HDF5 dataset is easily read and unpacked with any HDF5 reader that supports HDF5 compound data types.
pdh5 format: 2-D rows x columns epochs data are written to disk with pandas.to_hdf writer (via pytables). These epochs data are easily read into a pandas.DataFrame with pandas.read_hdf(epochs_f, key=epochs_name) and are also readable, less easily, by other HDF5 readers.
feather, txt formats: 2-D rows x columns epochs data are written to disk with pandas.to_feather (via pyarrow) and as tab-separated text with pandas.to_csv(..., sep='\t').
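A sketch, assuming epochs named "p3" were previously set with set_epochs():

    h5.export_epochs("p3", "p3.h5")                              # HDF5 compound type
    h5.export_epochs("p3", "p3.pdh5", file_format="pdh5")        # pandas.read_hdf-friendly
    h5.export_epochs("p3", "p3.feather", file_format="feather")  # pyarrow feather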
- export_event_table(event_table, event_table_f, format='feather')[source]
fetch the specified event table and save it in the specified format
- garv_data_group(h5_data_group_path, skip_ccodes=[0])[source]
Run pygarv on all the dblocks under the h5_data_group_path.
- Parameters
h5_data_group_path (str) – name of the h5 datagroup containing the dblocks to screen
skip_ccodes (list of uint, None ([0])) – dblocks with log_ccodes in the list are not scanned. Default is [0] to skip calibration blocks. Setting to None disables the skipper and scans all dblocks in the data group
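For example (the data group path is hypothetical):

    h5.garv_data_group("Expt1/S001")                     # default: skip ccode 0 cal blocks
    h5.garv_data_group("Expt1/S001", skip_ccodes=None)   # scan all dblocks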
- get_dblock(h5_path, header=True, dblock=True)[source]
return a copy of header dict and numpy ndarray from the mkh5 datablock at h5_path
- Parameters
h5_path (string) – full HDF5 slashpath to a datablock in this mkh5 instance
header (bool {True}, optional) – return the header
dblock (bool {True}, optional) – return the dblock dataset
- Return type
hdr, data
- Raises
ValueError – if header and dblock are both False
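For example (the slashpath is hypothetical):

    hdr, data = h5.get_dblock("S01/dblock_0")  # header dict, numpy ndarray copy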
- get_epochs(epochs_name, format='numpy', columns=None)[source]
fetch single trial epochs in tabular form
- Parameters
epochs_name (str) – name of previously set epochs table
format (str {‘numpy’, ‘pandas’})
columns (list of str or None {‘None’}) – the subset of column names to extract
- Returns
epochs (numpy.ndarray or pandas.DataFrame) – epochs.shape == (i x m, n + 2) where
i = the number of epochs, indexed uniquely by epoch_table[‘epoch_id’]
m = epoch length in samples
n = the number of columns in the epochs_name epochs table
See _h5_get_epochs() for details.
attrs (dict) – stub
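A sketch, assuming epochs named "p3" were previously set and the two return values listed above:

    epochs, attrs = h5.get_epochs("p3", format="pandas")
    # i epochs of m samples stacked long: epochs.shape == (i * m, n + 2)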
- get_epochs_table(epochs_name, format='pandas')[source]
look up a previously set epochs table by name
- Parameters
epochs_name (str) – name of a previously defined epochs table as set with an mkh5.set_epochs(event_table)
format (str {‘pandas’, ‘numpy’}) – pandas.Dataframe or numpy.ndarray
- Returns
epochs_table
- Return type
pandas.Dataframe or numpy.ndarray
Bytestrings from the hdf5 are converted to unicode in the returned epochs_table.
- get_event_table(code_map_f, header_map_f=None)[source]
Reads the code tag and header extractor and returns an event lookup table
- Parameters
code_map_f (str) – Excel, YAML, or tab-separated text, see mkh5 docs for format details.
header_map_f (str) – YAML header extractor file, keys match header keys, values specify name of the event table column to put the header data
- Returns
event_table – See Note.
- Return type
pandas.DataFrame
Note
This sweeps the code tag map across the data to generate a lookup table for specific events (sequence patterns) where the rows specify:
slashpath to the mkh5 dblock data set and sample index for pattern-matching events.
all the additional information for that pattern given in the code tag map
The event table generated from mkh5 data and the code_map specification is in lieu of .blf (for EEG epoching and time-locking), .rts (for event-to-event timing), and .hdr (for experimental design specification).
ccode
Special column. If the code tag map has a column named ccode, the code finder finds events that match the code sequence given by the regex pattern AND the log_ccode == ccode. This emulates Kutas Lab cdbl event lookup and supports, e.g., the condition code == 0 for cals convention and blocked designs where the ccode varies block to block. If the code map does not specify a ccode column, the log_ccode column is ignored for pattern matching.
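A sketch with hypothetical codemap and header extractor file names:

    event_table = h5.get_event_table("design_codemap.xlsx", header_map_f="design_slicer.yml")
    event_table.head()  # pandas.DataFrame, one row per code pattern match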
- gethead(pattern)[source]
get header values as a list of (slashpath, value) 2-ples suitable for passing to edhead
- headinfo(pattern='.+')[source]
print header information matching the pattern regular expression to STDOUT
- Parameters
pattern (regexp) – regular expression to look for in the slashpaths to datablocks and header information in this mkh5 format file. Default ‘.+’ matches all header info.
Assume we have previously constructed an mkh5 file expts.h5 containing data for two experiments and multiple subjects in each and decorated with yaml header information about electrode locations.
We may want to query and display more or less header information. Usually less since there is lots.
Select the relevant information with regular expression pattern matching:
> expts = mkh5.mkh5('expts.h5')  # initialize the mkh5 object
> expts.headinfo('Expt1/S001')  # fetch all header info for Expt1/S001, all datablocks
> expts.headinfo('Expt1/.*MiPa')  # returns everything in any Expt1 header involving MiPa
> expts.headinfo('Expt1/S001/apparatus/space/origin')  # origin of sensor space for Expt1/S001
> expts.headinfo('Expt1/S001/apparatus/sensors/MiPa/x')  # x-coordinate of electrode MiPa
- info()[source]
return h5dump-ish overview of h5_path's groups/datasets/attributes and data
- Parameters
h5_path (string) – h5 path to a group or dataset
- Returns
info (string)
- plotcals(*args, **kwargs)[source]
visualize cal pulses and scaling factors used to convert to microvolts
- set_epochs(epochs_table_name, event_table, tmin_ms, tmax_ms)[source]
construct and store a named EEG epochs lookup-table in self[‘epochs’]
- For storing in hdf5 the columns must be one of these:
string-like (unicode, bytes)
int-like (int, np.int, np.uint32, np.uint64)
float-like (float, np.float32, np.float64)
- Parameters
epochs_table_name (string) – name of the epochs table to store
event_table (pandas.DataFrame) – as returned by mkh5.get_event_table()
tmin_ms (float) – epoch start in milliseconds relative to the event, e.g., -500
tmax_ms (float) – epoch end in milliseconds relative to the event, e.g., 1500, strictly greater than tmin_ms
- Returns
updates h5_f/EPOCH_TABLES_PATH/ with the named epoch table h5py.Dataset
- Return type
None
Notes
The epochs table is a lightweight lookup table specific to each mkh5 instance’s hdf5 file,
h5[‘epochs’][epochs_table_name] = epochs_table
The epochs table is row-for-row the same as the time-locking event table and just adds the time interval information for use when extracting the time series EEG data segments.
For reproducibility, by design the epochs tables in an mkh5 file are write-protected. New tables may be added, but existing tables cannot be overwritten or deleted. If you need to revise the epochs, rebuild the mkh5 file from the crws/logs with the epochs you want.
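End to end, the epochs workflow sketched with hypothetical names:

    event_table = h5.get_event_table("design_codemap.xlsx")
    h5.set_epochs("p3", event_table, tmin_ms=-500, tmax_ms=1000)  # -500 to 1000 ms epochs
    p3_epochs = h5.get_epochs_table("p3")  # look the table up by name later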
- sethead(slash_vals, **kwargs)[source]
update mkh5 dblock header information via dpath.util style slashpath, value notation.
The recommended method for adding information to mkh5 headers is via the YAML header file loaded when converting .crw/.log to mkh5:
myh5file.create_mkdata('S01', 'my.crw', 'my.log', 'my.yhdr')
Use sethead() at your own risk. Mucking with headers by hand is dangerous without a clear understanding of the mkh5 dataset and header attribute format and dpath.util behavior.
- Parameters
slash_vals ((str, value) 2-ple or list of them) – str is a slashpath to an mkh5 dblock and on into the header; value is a JSON-ifiable scalar, dict, or sequence
mydat = mkh5.mkh5('myfile.h5')
# probably a bad idea to set this only for the first datablock
mydat.sethead(('S01/dblock_0/long_subid', 'S0001_A'))
mydat.sethead(('S01/dblock_0/npsych/mood_score', 4))
# use a list to set for all dblocks
spvs = [
    ('S01/dblock_0/npsych/mood_score', 4),
    ('S01/dblock_1/npsych/mood_score', 4),
    ('S01/dblock_2/npsych/mood_score', 4),
    ('S01/dblock_3/npsych/mood_score', 4),
    ('S01/dblock_4/npsych/mood_score', 4),
    ('S01/dblock_5/npsych/mood_score', 4),
]
mydat.sethead(spvs)