mkpy.pygarv module

pygarv is the backend for marking artifacts in mkh5 data with tests defined in a YAML file

Successful runs of tests and their results are stored in PyGarv.tr_docs, a list of tr_doc dicts with one dict per h5 datablock.

Parameters

tr_doc[‘tests’] (list) – each item is a dict

Examples

  • tr_doc[‘tests’]

    [ {'dblock_path_idx': 0,
       'dblock_path': 'calstest/dblock_0',
       'name': 'pygarv',
       'tests': [ [{'test': 'ppa'},
                   {'tag': 'amplitude excursions'},
                   {'stream': 'MiCe'},
                   {'threshold': 0.0},
                   {'interval': 0.0} ],
                  [{'test': 'ppadif'},
                   {'tag': 'amplitude excursions'},
                   {'stream': 'MiCe'},
                   {'threshold': 0.1},
                   {'interval': 0.1},
                   {'stream2': 'MiPa'} ] ]},
    
        {'dblock_path_idx': 1,
         'dblock_path': 'calstest/dblock_1',
         'name': 'pygarv',
         'tests': None},
    ]
    

tr_doc[‘fails’] : list

len(tr_doc[‘fails’]) == len(tr_doc[‘tests’]), where tr_doc[‘fails’][idx] is a list of (start, stop) intervals in dblock_tick indexes where tr_doc[‘tests’][idx] failed.

tr_doc[‘pygarv’]

  • The tests are specified in a YAML .yarf file.

    ---
    dblock_path: some_path
    dblock_path_idx: uint
    name: pygarv
    tests:
    - - test_spec
      - test_spec
      ...
      - test_spec
    
  • Each test_spec is a YAML map with mandatory test and tag parameters and optional other parameters as needed for specific tests

    test: str
    tag: str

where

  • test names a pygarv test function, e.g., maxflat, ppadif

  • tag is a user-defined descriptive tag, e.g., blocking, heog, fancy test
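The examples in this module write each test spec as a sequence of one-key maps. As a rough sketch only, the helper below (flatten_test_spec is a hypothetical name, not part of mkpy) merges such a sequence into a single dict and checks that the mandatory test and tag keys are present. It assumes nothing about how pygarv itself parses specs.

```python
def flatten_test_spec(test_spec):
    """Merge a list of one-key dicts into a single dict, keeping key order."""
    merged = {}
    for item in test_spec:
        if len(item) != 1:
            raise ValueError(f"expected a one-key map, got {item!r}")
        ((key, value),) = item.items()
        if key in merged:
            raise ValueError(f"duplicate parameter: {key}")
        merged[key] = value
    # test and tag are the two mandatory parameters
    for required in ("test", "tag"):
        if required not in merged:
            raise ValueError(f"missing mandatory parameter: {required}")
    return merged


# Illustrative spec; the values are made up for this sketch
spec = flatten_test_spec(
    [
        {"test": "ppadif"},
        {"tag": "heog"},
        {"stream": "MiCe"},
        {"threshold": 0.1},
        {"stream2": "MiPa"},
    ]
)
```

Since Python dicts preserve insertion order, the merged spec keeps the parameter sequence of the original list.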

class mkpy.pygarv.PyGarv(mkh5_f, yarf_f=None)[source]

Bases: object

container to hold an inventory of functions for computing sample-wise artifact masks.

When invoked at the command line, pygarv needs an mkh5 file to work with

There are two cases:
  • data has not been previously garved with _update_mkh5()
    • no pygarv test info in header

    • pygarv data streams all zeros

  • data has been previously garved with _update_mkh5()
    • pygarv test info appears in header

    • test results are unknown, possibly None

    • pygarv data stream state is unknown

On init the mkh5 file is scanned for previous runs; if any are found, the pygarv data buffers (volatile) are synced with the info from the h5 file.

For each data block:

  • self.tr_docs are set to match the header[‘pygarv’] dict

  • self.yarf_fails are set according to dblock[‘pygarv’], self.tr_docs

  • the value of pygarv = run_test(db_idx) (what-if run) is checked against the dblock data; discrepancies trigger a warning

PyGarv now has persistent and volatile rejection data in alignment, suitable for viewing/editing in mkh5viewer
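The consistency check between the stored mask and a what-if run might look roughly like the following. This is a sketch under assumptions: check_sync is a hypothetical helper, plain lists stand in for the pygarv data stream, and the warning text is invented for illustration.

```python
import warnings


def check_sync(stored_mask, whatif_mask, dblock_path):
    """Warn if the stored mask differs from a fresh run; return mismatch indexes."""
    if len(stored_mask) != len(whatif_mask):
        raise ValueError("masks must have the same length")
    mismatches = [
        i for i, (a, b) in enumerate(zip(stored_mask, whatif_mask)) if a != b
    ]
    if mismatches:
        warnings.warn(
            f"{dblock_path}: {len(mismatches)} sample(s) differ from the what-if run"
        )
    return mismatches


# Capture the warning so the sketch can inspect it
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    diffs = check_sync([0, 1, 1, 0], [0, 1, 0, 0], "calstest/dblock_0")
```

A warning rather than an exception matches the behavior described above: the discrepancy is reported, but loading continues.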

PyGarvTest

The PyGarvTest decorator handles all the default parameter name and type bookkeeping for specific tests

To add a test to the catalog …

1. implement a function that takes two args (hdr, dblock, **kwargs) and returns a boolean artifact mask of length dblock data samples where 0 = good, 1 = bad.

The hdr (dict) and dblock (np.ndarray) are, e.g., as returned by hdr, dblock = mkh5.get_dblock(path_to_datablock), but can be any dict and dblock that expose the variables needed to compute the artifact mask.

2. decorate it with @PyGarvTest(test_name, [key=dtype, key=dtype])

where test_name is the test name and the list of key_i=dtype_i optionally gives extra parameters named key_1, … key_n with data type dtype.
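To make step 1 concrete, here is a minimal sketch of a mask function with the (hdr, dblock, **kwargs) shape. The test name absmax, the header key samplerate, and the use of a plain dict in place of the np.ndarray dblock are all assumptions made for illustration. The decorator appears only in a comment because this sketch does not import mkpy.

```python
def absmax(hdr, dblock, stream=None, threshold=None, **kwargs):
    """Flag samples whose absolute value exceeds threshold (hypothetical test)."""
    samples = dblock[stream]
    # True = bad sample, False = good sample
    return [abs(x) > threshold for x in samples]


# In mkpy the function would presumably carry the decorator described above,
# e.g. @PyGarvTest("absmax", threshold=float); it is not applied in this sketch.

hdr = {"samplerate": 250.0}  # hypothetical header content
dblock = {"MiCe": [1.0, -60.0, 12.5, 75.0]}  # plain dict standing in for the dblock
mask = absmax(hdr, dblock, stream="MiCe", threshold=50.0)
```

The returned mask has one entry per sample, which is the contract the catalog expects from any new test.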

cppa = {'interval': None, 'stream': None, 'tag': None, 'test': 'cppa', 'threshold': None}[source]
cppadif = {'interval': None, 'stream': None, 'stream2': None, 'tag': None, 'test': 'cppadif', 'threshold': None}[source]
cstdev = {'interval': None, 'stream': None, 'tag': None, 'test': 'cstdev', 'threshold': None}[source]
get_catalog()[source]
get_result(pg_test_result)[source]

convenience wrapper to query a test result, decode the mask, and return with its test in a handy package.

Parameters

pg_test_result (a (tr_doc, pygarv_mask) tuple) – as returned by run_* functions
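This page does not say how the pygarv mask encodes individual tests. One plausible scheme, assumed here purely for illustration, sets bit i of a sample's value when test i flags that sample. Under that assumption, packing and decoding could be sketched as follows; pack_masks and failed_tests are hypothetical helpers.

```python
def pack_masks(test_masks):
    """Combine per-test boolean masks into one integer per sample (bit i = test i)."""
    n_samples = len(test_masks[0]) if test_masks else 0
    packed = [0] * n_samples
    for bit, mask in enumerate(test_masks):
        if bit >= 64:
            raise ValueError("a 64-bit mask holds at most 64 tests")
        if len(mask) != n_samples:
            raise ValueError("all masks must have the same length")
        for i, bad in enumerate(mask):
            if bad:
                packed[i] |= 1 << bit
    return packed


def failed_tests(packed_value):
    """Return the indexes of the tests whose bit is set in one sample's value."""
    return [bit for bit in range(64) if (packed_value >> bit) & 1]


packed = pack_masks([[0, 1, 1, 0], [0, 0, 1, 1]])
which = failed_tests(packed[2])
```

With this layout, a sample is bad whenever its packed value is greater than zero, regardless of which test flagged it.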

maxflat = {'nsamp': None, 'poststim': None, 'prestim': None, 'stream': None, 'tag': None, 'test': 'maxflat', 'threshold': None}[source]
param_types = {'interval': <class 'float'>, 'stream': <class 'str'>, 'threshold': <class 'float'>}
ppa = {'poststim': None, 'prestim': None, 'stream': None, 'tag': None, 'test': 'ppa', 'threshold': None}[source]
ppadif = {'poststim': None, 'prestim': None, 'stream': None, 'stream2': None, 'tag': None, 'test': 'ppadif', 'threshold': None}[source]
run_dblock(dbp_idx, tr_doc)[source]

Run tests in the tr_doc for datablock at dbp_idx, returns 64-bit pygarv sample mask.

Parameters
  • dbp_idx (uint) – index of the ith dblock in self.dblock_paths

  • tr_doc (dict) – PyYarf format dict with tr_doc[‘tests’]

Returns

dict of results like so:

{name: 'results',
 dblock_path: str (== the yarf_dbp),
 pygarv : np.ndarray(shape=(len(dblock),),
                     dtype=dblock['pygarv'].dtype),
 fails : list of uint 2-ples (x0, x1)}

Return type

results

  • The fails list amounts to a run-length encoding (RLE) of the boolean vector pygarv > 0

Raises

ValueError if tr_doc['dblock_path'] != self.dblock_paths[dbp_idx]
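The relation between the pygarv mask and the fails list returned by run_dblock can be sketched as a run-length pass over mask > 0. The helper mask_to_intervals is hypothetical, and whether mkpy treats the stop index as inclusive or exclusive is not stated here; this sketch assumes exclusive.

```python
from itertools import groupby


def mask_to_intervals(mask):
    """Return (start, stop) runs where mask > 0; stop is exclusive (an assumption)."""
    intervals = []
    idx = 0
    for is_bad, run in groupby(mask, key=lambda v: v > 0):
        n = len(list(run))
        if is_bad:
            intervals.append((idx, idx + n))
        idx += n
    return intervals


example_mask = [0, 0, 1, 1, 1, 0, 4, 4, 0]
example_fails = mask_to_intervals(example_mask)
```

Any nonzero value counts as bad, so adjacent samples flagged by different tests merge into one interval.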

run_tests()[source]

fetch tests and pygarv mask for all dblocks, does not modify mkh5

Returns

pg_test_results – list of 2-tuples (tr_doc, pygarv_mask), one for each datablock in self.mkh5

class mkpy.pygarv.PyGarvTest(test, **kwargs)[source]

Bases: OrderedDict

Decorator class for the PyGarv tests.

This enforces an extensible standard form on PyGarv test specs and execution.

The class derives from OrderedDict so it returns .keys(), .values(), and .items() in fixed original parameter order. This is useful for populating test UI elements and for reading and writing YAML sequences without scrambling the key:value pairs the way a dict() might.

Parameters

param_specs ([(key, type), …]) –

  • key (str) – parameter label

  • type (Python type) – required Python data type for values of the key

(‘test’, str), (‘tag’, str), (‘stream’, str),

  • Default test parameters (in sequence order)

    test (str) – corresponds to the self._test() function that runs it

    tag (str) – user specified descriptive tag for the test … anything sensible

    stream (str) – name or regex pattern for primary dblock data stream(s) to run the test on

  • Optional test specific parameter:type pairs are defined in the decorator arguments

Raises

ValueError – If the type of a test parameter differs from that in param_specs

  • PyGarvTest overrides OrderedDict.__setitem__() with additional type checking on the value of test[‘key’] = value

  • The class variable param_specs specifies mandatory PyGarvTest parameters and types.

  • Optional decorator arguments can extend the mandatory parameters and types and will be automatically passed to the decorated test function.

  • all PyGarvTest instances have _default_params with key, type

  • optional decorator args extend PyGarvTest instances with additional params

  • public CRUD API is standardized

  • To preserve test spec order for display and yamlized round trips, test specs are stored internally as OrderedDicts and the setter/getter API wants and returns lists of dict, i.e.,

    [{'test': 'ppa'}, … {'interval': 1500.0}]
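The type-checked, order-preserving behavior described in these notes can be illustrated with a small stand-in class. TypedSpec below is hypothetical and far simpler than PyGarvTest; it only shows the idea of overriding OrderedDict.__setitem__ to reject values of the wrong type, and treating None as "unset" is an assumption of the sketch.

```python
from collections import OrderedDict


class TypedSpec(OrderedDict):
    """Hypothetical stand-in: an ordered parameter store with type checks."""

    def __init__(self, param_specs):
        super().__init__()
        # required type for each key, in declaration order
        self._types = OrderedDict(param_specs)
        for key in self._types:
            super().__setitem__(key, None)  # None marks an unset parameter

    def __setitem__(self, key, value):
        if key not in self._types:
            raise KeyError(f"unknown parameter: {key}")
        if value is not None and not isinstance(value, self._types[key]):
            raise ValueError(
                f"{key} must be {self._types[key].__name__}, "
                f"got {type(value).__name__}"
            )
        super().__setitem__(key, value)


spec = TypedSpec([("test", str), ("tag", str), ("stream", str), ("threshold", float)])
spec["test"] = "ppa"
spec["tag"] = "blocking"
spec["stream"] = "MiCe"
spec["threshold"] = 50.0

# list-of-dict form, one single-key dict per parameter, in declaration order
as_list = [{key: value} for key, value in spec.items()]
```

Assigning spec["threshold"] = "50" would raise ValueError here, which is the kind of failure the Raises entry above describes.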

run(hdr, dblock, **kwargs)
Parameters
  • hdr (dict) – metadata consulted in running the tests, e.g., sampling rate

  • dblock (np.ndarray (named dtypes)) – columns of data, typically accessed by dtype.name

Returns

results – sample-wise data rejection mask, 1=bad, 0=good

Return type

np.ndarray, dtype=bool, length = len(dblock)

get_specs()[source]
param_type(param)[source]

type of param

property param_types
property params

names of the parameters of this test, as a list

reset()[source]
set_specs(test_params)[source]

test_params is {key:value, … } for test keys,values

property specs
property specs_as_yaml

returns current specs as yaml string

property types

data types of the values for the parameters as a list

class mkpy.pygarv.PyYarf(yarf_f=None)[source]

Bases: object

YAML test file I/O for PyGarv artifact test parameters

Parameters

yarf_f (str) – file path to well-formed YAML with PyYarf test specification structure

Variables

yarf_docs (list) –

each item is a yarf_doc dict that yamlizes in-out without modification

    {'name': 'pygarv' (str),
     'dblock_path_idx': n (uint),
     'dblock_path': path_to_a_mkh5_dblock (str),
     'tests': [test_spec, … test_spec] (list)}

IO methods

  • read yarf_docs from yaml

  • write yarf_docs to yaml

  • read yarf_docs from mkh5 headers

PyYarf YAML format:
  • exactly one yaml document per mkh5 dblock_path

  • each doc is a map with 3 keys: name, dblock_path, tests

  • the value of name must be pygarv (str)

  • the value of dblock_path in the ith yaml doc must == mkh5.data_blocks[i] (str)

  • the value of tests must be a list of test specifications (see PyGarvTest docs)
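The format rules above translate fairly directly into checks on already-parsed docs. The function below is a hedged sketch, not the library's check_yarf_doc: check_docs is a hypothetical name, it takes plain dicts and a list of dblock paths, and it raises ValueError on the first rule that fails.

```python
def check_docs(yarf_docs, dblock_paths):
    """Validate parsed yarf docs against the listed format rules (sketch only)."""
    if len(yarf_docs) != len(dblock_paths):
        raise ValueError("need exactly one doc per dblock_path")
    for i, (doc, path) in enumerate(zip(yarf_docs, dblock_paths)):
        for key in ("name", "dblock_path", "tests"):
            if key not in doc:
                raise ValueError(f"doc {i}: missing key {key}")
        if doc["name"] != "pygarv":
            raise ValueError(f"doc {i}: name must be pygarv")
        if doc["dblock_path"] != path:
            raise ValueError(f"doc {i}: dblock_path does not match {path}")
        if not isinstance(doc["tests"], list):
            raise ValueError(f"doc {i}: tests must be a list")
    return True


# Illustrative docs mirroring the two-datablock layout used on this page
docs = [
    {"dblock_path_idx": 0, "dblock_path": "calstest/dblock_0", "name": "pygarv",
     "tests": [[{"test": "ppa"}, {"tag": "tag1"}, {"stream": "MiPf"}]]},
    {"dblock_path_idx": 1, "dblock_path": "calstest/dblock_1", "name": "pygarv",
     "tests": []},
]
ok = check_docs(docs, ["calstest/dblock_0", "calstest/dblock_1"])
```

An empty tests list passes, matching the case of a datablock with no tests defined.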

Examples

# generated by PyYarf
---
dblock_path_idx: 0
dblock_path: calstest/dblock_0
name: pygarv
tests:
  - - test: ppa_event
    - tag: tag1
    - stream: MiPf
    - threshold: 20.0
    - prestim: 500.0
    - poststim: 1500.0
  - - test: ppa_event
    - tag: tag1
    - stream: MiCe
    - threshold: 50.0
    - prestim: 100.0
    - poststim: 1000.0
  - - test: ppa_event
    - tag: tag1
    - stream: MiPa
    - threshold: 10.0
    - prestim: 10.0
    - poststim: 200.0
---
dblock_path_idx: 1
dblock_path: calstest/dblock_1
name: pygarv
tests: []
check_yarf_doc(yarf_doc)[source]
lint_yarf(yarf_stream)[source]

run yamllint on yarf_stream, if errors die informatively

read_from_mkh5(mkh5_f)[source]

scan mkh5 dblock headers and the dblock[‘pygarv’] stream for artifact test info

Returns

yarf_docs – each dict is a PyYarf format dict; see the PyYarf docstring for details

Return type

list of list of dict

read_from_yaml(yarf_f)[source]

return yarf_doc list populated with yarf info, if any, from the YAML file yarf_f

to_yaml(yarf_docs)[source]

return yarf_docs YAML-ized as string suitable for serialization