mkpy.pygarv module

pygarv is the backend for marking artifacts in mkh5 data with tests defined in a YAML file.

Successful runs of tests and their results are stored in PyGarv.tr_docs, a list of tr_doc dicts, one dict per h5 datablock.

Parameters: tr_doc['tests'] (list) – each item is a dict

Examples

tr_doc['tests']

[ {'dblock_path_idx': 0,
   'dblock_path': 'calstest/dblock_0',
   'name': 'pygarv',
   'tests': [ [{'test': 'ppa'},
               {'tag': 'amplitude excursions'},
               {'stream': 'MiCe'},
               {'threshold': 0.0},
               {'interval': 0.0} ],
              [{'test': 'ppadif'},
               {'tag': 'amplitude excursions'},
               {'stream': 'MiCe'},
               {'threshold': 0.1},
               {'interval': 0.1},
               {'stream2': 'MiPa'} ] ]},

    {'dblock_path_idx': 1,
     'dblock_path': 'calstest/dblock_1',
     'name': 'pygarv',
     'tests': None},
]

tr_doc['fails'] : list

len(tr_doc['fails']) == len(tr_doc['tests']), where tr_doc['fails'][idx] is a list of (start, stop) intervals in dblock_tick indexes where tr_doc['tests'][idx] failed.

tr_doc['pygarv']

The tests are specified in a YAML file (.yarf).

---
dblock_path: some_path
dblock_path_idx: uint
name: pygarv
tests:
- - test_spec
  - test_spec
  ...
  - test_spec

Each test_spec is a YAML map with mandatory test and tag parameters and optional other parameters as needed for specific tests

test: str
tag: str

where

test names a pygarv test function, e.g., maxflat, ppadif
tag is a user-defined descriptive tag, e.g., blocking, heog, fancy test

class mkpy.pygarv.PyGarv(mkh5_f, yarf_f=None)

Bases: object

Container to hold an inventory of functions for computing sample-wise artifact masks.

When invoked at the command line, pygarv needs an mkh5 file to work with.

There are two cases:

data has not been previously garved with _update_mkh5()
- no pygarv test info in header
- pygarv data streams all zeros
data has been previously garved with _update_mkh5()
- pygarv test info appears in header
- test results are unknown, possibly None
- pygarv data stream state is unknown

On init the mkh5 file is scanned for previous runs. If any are found, the pygarv data buffers (volatile) are synced with the info from the h5 file.

For each data block:

self.tr_docs are set to match the header['pygarv'] dict

self.yarf_fails are set according to dblock['pygarv'], self.tr_docs

the value of pygarv = run_test(db_idx) (what-if run) is checked against the dblock data; discrepancies throw a warning

PyGarv now has persistent and volatile rejection data in alignment, suitable for viewing/editing in mkh5viewer
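The what-if check described above can be pictured as a sample-by-sample comparison of the stored mask with a freshly recomputed one. The following is a minimal sketch only, not mkpy's code: the helper name check_sync is hypothetical, and plain Python lists stand in for the dblock['pygarv'] column and the result of a test rerun.

```python
import warnings


def check_sync(dblock_path, stored_mask, recomputed_mask):
    """Warn if the stored pygarv mask disagrees with a what-if rerun.

    Hypothetical helper: both masks are plain sequences of ints,
    standing in for the dblock['pygarv'] column and a fresh test run.
    Returns the sample indexes where the two masks disagree.
    """
    if len(stored_mask) != len(recomputed_mask):
        raise ValueError("masks must have the same number of samples")
    mismatches = [
        idx
        for idx, (old, new) in enumerate(zip(stored_mask, recomputed_mask))
        if old != new
    ]
    if mismatches:
        warnings.warn(
            f"{dblock_path}: {len(mismatches)} samples differ between "
            "stored and recomputed pygarv masks"
        )
    return mismatches


stored = [0, 0, 1, 1, 0]
rerun = [0, 0, 1, 0, 0]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    diffs = check_sync("calstest/dblock_0", stored, rerun)
```

Returning the mismatched indexes along with the warning lets a caller decide whether to resync or to inspect the discrepancy by hand.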

PyGarvTest

The PyGarvTest decorator handles all the default parameter name and type bookkeeping for specific tests

To add a test to the catalog …

1. implement a function that takes two positional args plus keyword args, (hdr, dblock, **kwargs), and returns a boolean artifact mask with one entry per dblock data sample, where 0 = good, 1 = bad.

The hdr (dict) and dblock (np.ndarray) are, e.g., as returned by hdr, dblock = mkh5.get_dblock(path_to_datablock), but can be any dict and dblock that expose the variables needed to compute the artifact mask.

2. decorate it with @PyGarvTest(test_name, [key=dtype, key=dtype])

where test_name is the test name and the list of key_i=dtype_i optionally gives extra parameters named key_1, … key_n with data type dtype.
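The two steps can be illustrated with a toy decorator that records a test in a catalog and checks the extra parameters against their declared types. This is a sketch of the general pattern under stated assumptions, not the PyGarvTest implementation: register_test, CATALOG, and toy_threshold are hypothetical names, and a dict of lists stands in for the np.ndarray with named columns.

```python
CATALOG = {}  # hypothetical registry: test name -> (function, extra param types)


def register_test(test_name, **param_types):
    """Hypothetical stand-in for a PyGarvTest-style decorator."""

    def decorator(func):
        def wrapper(hdr, dblock, **kwargs):
            # check each declared extra parameter against its type
            for key, dtype in param_types.items():
                if key not in kwargs:
                    raise ValueError(f"{test_name}: missing parameter {key}")
                if not isinstance(kwargs[key], dtype):
                    raise ValueError(
                        f"{test_name}: {key} must be {dtype.__name__}"
                    )
            return func(hdr, dblock, **kwargs)

        CATALOG[test_name] = (wrapper, param_types)
        return wrapper

    return decorator


@register_test("toy_threshold", stream=str, threshold=float)
def toy_threshold(hdr, dblock, stream=None, threshold=None):
    """Flag samples whose absolute value exceeds threshold: 0 = good, 1 = bad."""
    return [int(abs(value) > threshold) for value in dblock[stream]]


# a dict of lists stands in for the np.ndarray with named columns
dblock = {"MiPa": [1.0, -30.0, 5.0, 42.0]}
mask = toy_threshold({}, dblock, stream="MiPa", threshold=20.0)
```

Here mask has one entry per sample, and a call with threshold=20 (an int rather than a float) raises ValueError before the test function runs.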

cppa = {'interval': None, 'stream': None, 'tag': None, 'test': 'cppa', 'threshold': None}

cppadif = {'interval': None, 'stream': None, 'stream2': None, 'tag': None, 'test': 'cppadif', 'threshold': None}

cstdev = {'interval': None, 'stream': None, 'tag': None, 'test': 'cstdev', 'threshold': None}

get_catalog()

get_result(pg_test_result)

convenience wrapper to query a test result, decode the mask, and return with its test in a handy package.

Parameters: pg_test_result (a (tr_doc, pygarv_mask) tuple) – as returned by run_* functions

maxflat = {'nsamp': None, 'poststim': None, 'prestim': None, 'stream': None, 'tag': None, 'test': 'maxflat', 'threshold': None}

param_types = {'interval': <class 'float'>, 'stream': <class 'str'>, 'threshold': <class 'float'>}

ppa = {'poststim': None, 'prestim': None, 'stream': None, 'tag': None, 'test': 'ppa', 'threshold': None}

ppadif = {'poststim': None, 'prestim': None, 'stream': None, 'stream2': None, 'tag': None, 'test': 'ppadif', 'threshold': None}

run_dblock(dbp_idx, tr_doc)

Run the tests in the tr_doc for the datablock at dbp_idx; returns 64-bit pygarv sample mask.

Parameters

dbp_idx (uint) – index of the ith dblock in self.dblock_paths
tr_doc (dict) – PyYarf format dict with tr_doc['tests']

Returns

dict of results like so:

{name: 'results',
 dblock_path: str (== the yarf_dbp),
 pygarv : np.ndarray(shape=(len(dblock),),
                     dtype=dblock['pygarv'].dtype),
 fails : list of uint 2-ples (x0, x1)}

Return type

results

The fails list amounts to a run-length encoding (RLE) of the boolean vector pygarv > 0
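To make the relationship between the pygarv mask and the fails list concrete, here is a small sketch that collapses runs of nonzero samples into (start, stop) pairs and expands them back. It is illustrative only: the helper names are hypothetical, plain lists stand in for the np.ndarray, and the stop index is assumed to be exclusive (as in a Python slice), which the documentation above does not specify.

```python
from itertools import groupby


def mask_to_fails(pygarv):
    """Collapse runs of nonzero samples into (start, stop) index pairs.

    Assumption: stop is exclusive, as in a Python slice.
    """
    fails = []
    idx = 0
    for is_bad, run in groupby(pygarv, key=lambda value: value > 0):
        run_len = len(list(run))
        if is_bad:
            fails.append((idx, idx + run_len))
        idx += run_len
    return fails


def fails_to_bool(fails, n_samples):
    """Expand (start, stop) pairs back into a boolean vector."""
    bad = [False] * n_samples
    for start, stop in fails:
        bad[start:stop] = [True] * (stop - start)
    return bad


pygarv = [0, 0, 3, 1, 0, 0, 8, 0]
fails = mask_to_fails(pygarv)
restored = fails_to_bool(fails, len(pygarv))
```

The round trip recovers pygarv > 0 exactly, but not the original mask values, since the intervals record only where samples failed.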

Raises: ValueError – if tr_doc['dblock_path'] != self.dblock_paths[dbp_idx]

run_tests()

Fetch tests and pygarv mask for all dblocks; does not modify mkh5.

Returns

pg_test_results (list of 2-ples (tr_doc, pygarv_mask)) – one for each datablock in self.mkh5

class mkpy.pygarv.PyGarvTest(test, **kwargs)

Bases: OrderedDict

Decorator class for the PyGarv tests.

This enforces an extensible standard form on PyGarv test specs and execution.

The class derives from OrderedDict so it returns .keys(), .values(), and .items() in fixed original parameter order. This is useful for populating test UI elements and for reading and writing YAML sequences without scrambling the key:value pairs the way a dict() might.

Parameters

param_specs ([(key, type), …]) –

key (str): parameter label
type (Python type): required Python data type for values of the key

('test', str), ('tag', str), ('stream', str),

Default test parameters (in sequence order)

test (str)
corresponds to the self._test() function that runs it

tag (str)
user-specified descriptive tag for the test … anything sensible

stream (str)
name or regex pattern for primary dblock data stream(s) to run the test on

Optional test-specific parameter:type pairs are defined in the decorator arguments.

Raises: ValueError – If the type of a test parameter differs from that in param_specs
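The type checking behind this ValueError can be sketched with an OrderedDict subclass that validates each assignment against a table of parameter types. This is an illustration of the technique, not the PyGarvTest source: the class name TypedSpec and its constructor signature are hypothetical, and only the three default parameters listed above are taken from the documentation.

```python
from collections import OrderedDict


class TypedSpec(OrderedDict):
    """Hypothetical OrderedDict that type checks values on assignment."""

    # mandatory parameters and types, following the defaults listed above
    param_specs = OrderedDict([("test", str), ("tag", str), ("stream", str)])

    def __init__(self, test, **extra_types):
        super().__init__()
        # per-instance copy so optional parameters do not leak across tests
        self.param_specs = OrderedDict(type(self).param_specs)
        self.param_specs.update(extra_types)
        for key in self.param_specs:
            super().__setitem__(key, None)  # unset until assigned
        self["test"] = test

    def __setitem__(self, key, value):
        if key not in self.param_specs:
            raise KeyError(f"unknown parameter: {key}")
        expected = self.param_specs[key]
        if not isinstance(value, expected):
            raise ValueError(
                f"{key} must be {expected.__name__}, got {type(value).__name__}"
            )
        super().__setitem__(key, value)


spec = TypedSpec("ppa", threshold=float, interval=float)
spec["stream"] = "MiCe"
spec["threshold"] = 50.0
try:
    spec["threshold"] = "50"  # a str where a float is required
    type_error_raised = False
except ValueError:
    type_error_raised = True
```

Because the check uses isinstance, an int such as 50 is also rejected for a float parameter in this sketch; whether the real class coerces such values is not stated here.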

PyGarvTest overrides OrderedDict.__setitem__() with additional type checking on the value of test['key'] = value
The class variable param_specs specifies mandatory PyGarvTest parameters and types.
Optional decorator arguments can extend the mandatory parameters and types and will be automatically passed to the decorated test function.
all PyGarvTest instances have _default_params with key, type
optional decorator args extend PyGarvTest instances with additional params
public CRUD API is standardized

To preserve test spec order for display and yamlized round trips, test specs are stored internally as OrderedDicts and the setter/getter API wants and returns lists of dict, i.e.,

[{'test': 'ppa'}, … {'interval': 1500.0}]
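The conversion between the internal OrderedDict and the list-of-dict form can be sketched as follows. The helper names specs_to_list and list_to_specs are hypothetical and are not the get_specs()/set_specs() methods themselves; the sketch only shows why a list of single-key dicts preserves parameter order through a round trip.

```python
from collections import OrderedDict


def specs_to_list(specs):
    """OrderedDict -> list of single-key dicts, in parameter order."""
    return [{key: value} for key, value in specs.items()]


def list_to_specs(spec_list):
    """List of single-key dicts -> OrderedDict, preserving list order."""
    specs = OrderedDict()
    for item in spec_list:
        if len(item) != 1:
            raise ValueError(f"expected a single-key dict, got {item!r}")
        ((key, value),) = item.items()
        specs[key] = value
    return specs


spec_list = [
    {"test": "ppa"},
    {"tag": "blocking"},
    {"stream": "MiCe"},
    {"threshold": 50.0},
    {"interval": 1500.0},
]
specs = list_to_specs(spec_list)
round_trip = specs_to_list(specs)
```

A YAML sequence of one-entry maps loads into exactly this list shape, so the order written in the file is the order the parameters come back in.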

run(hdr, dblock, **kwargs)

Parameters

hdr (dict) – metadata consulted in running the tests, e.g., sampling rate
dblock (np.ndarray (named dtypes)) – columns of data, typically accessed by dtype.name

Returns

results – sample-wise data rejection mask, 1=bad, 0=good

Return type

np.ndarray, dtype=bool, length = len(dblock)


get_specs()

param_type(param): type of param

property param_types

property params: names of the parameters of this test, as a list

reset()

set_specs(test_params): test_params is {key:value, … } for test keys,values

property specs

property specs_as_yaml: returns current specs as yaml string

property types: data types of the values for the parameters as a list

class mkpy.pygarv.PyYarf(yarf_f=None)

Bases: object

YAML test file I/O for PyGarv artifact test parameters

Parameters

yarf_f (str) – file path to well-formed YAML with PyYarf test specification structure

Variables

yarf_docs (list) –

each item is a yarf_doc dict that yamlizes in-out without modification

{'name': 'pygarv' (str),
 'dblock_path_idx': n (uint),
 'dblock_path': path_to_a_mkh5_dblock (str),
 'tests': [test_spec, … test_spec] (list)}

IO methods:

read yarf_docs from yaml
write yarf_docs to yaml
read yarf_docs from mkh5 headers

PyYarf YAML format:

exactly one yaml document per mkh5 dblock_path
each doc is a map with 3 keys: name, dblock_path, tests
the value of name must be pygarv (str)
the value of dblock_path in the ith yaml doc must == mkh5.data_blocks[i] (str)
the value of tests must be a list of test specifications (see PyGarvTest docs)

Examples

# generated by PyYarf
---
dblock_path_idx: 0
dblock_path: calstest/dblock_0
name: pygarv
tests:
  - - test: ppa_event
    - tag: tag1
    - stream: MiPf
    - threshold: 20.0
    - prestim: 500.0
    - poststim: 1500.0
  - - test: ppa_event
    - tag: tag1
    - stream: MiCe
    - threshold: 50.0
    - prestim: 100.0
    - poststim: 1000.0
  - - test: ppa_event
    - tag: tag1
    - stream: MiPa
    - threshold: 10.0
    - prestim: 10.0
    - poststim: 200.0
---
dblock_path_idx: 1
dblock_path: calstest/dblock_1
name: pygarv
tests: []
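The format rules listed above can be checked mechanically once the YAML documents have been loaded into Python dicts. The sketch below is a hypothetical checker written for illustration; it is not PyYarf.check_yarf_doc(), and the dicts are typed in by hand rather than parsed from a file so that the example needs no YAML library.

```python
def check_doc(yarf_doc, expected_path):
    """Return a list of problems found in one yarf_doc dict (empty if none).

    Hypothetical checker, not PyYarf.check_yarf_doc().
    """
    problems = []
    for key in ("name", "dblock_path", "tests"):
        if key not in yarf_doc:
            problems.append(f"missing key: {key}")
    if yarf_doc.get("name") != "pygarv":
        problems.append("name must be 'pygarv'")
    if yarf_doc.get("dblock_path") != expected_path:
        problems.append(f"dblock_path must be {expected_path}")
    if not isinstance(yarf_doc.get("tests"), list):
        problems.append("tests must be a list")
    return problems


# dblock paths in file order, standing in for the mkh5 data block list
dblock_paths = ["calstest/dblock_0", "calstest/dblock_1"]
yarf_docs = [
    {"dblock_path_idx": 0, "dblock_path": "calstest/dblock_0", "name": "pygarv",
     "tests": [[{"test": "ppa_event"}, {"tag": "tag1"}, {"stream": "MiPf"},
                {"threshold": 20.0}, {"prestim": 500.0}, {"poststim": 1500.0}]]},
    {"dblock_path_idx": 1, "dblock_path": "calstest/dblock_1", "name": "pygarv",
     "tests": []},
]
report = [check_doc(doc, path) for doc, path in zip(yarf_docs, dblock_paths)]
```

Collecting problems in a list rather than raising on the first one makes it possible to report every defect in a document at once.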

check_yarf_doc(yarf_doc)

lint_yarf(yarf_stream): run yamllint on yarf_stream; if there are errors, die informatively

read_from_mkh5(mkh5_f)

scan mkh5 dblock headers and the dblock['pygarv'] stream for artifact test info

Returns: yarf_docs – where each dict is a PyYarf format dict; see the PyYarf docstring for details
Return type: list of list of dict

read_from_yaml(yarf_f): return yarf_doc list populated with yarf info, if any, from mkh5 dblock headers

to_yaml(yarf_docs): return yarf_docs YAML-ized as a string suitable for serialization