Read and write epochs as HDF5, feather, or csv

spudtr operates on pandas.DataFrame objects, so use the pandas IO tools to read and write epochs data.

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

The feather format is a good choice for Python and R. The files are somewhat larger than HDF5 but read more quickly.

The HDF5 format is fairly portable across Python, R, and MATLAB.

Both feather and HDF5 are binary formats that round trip: writing a dataframe and reading it back yields an identical pandas dataframe.

Writing the epochs data as text .csv is possible but the data do not round trip. It is a bad idea and should be used only as a desperate last resort.
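One reason text does not round trip is that a .csv file stores no column types, so pandas has to guess them when reading. The sketch below illustrates this with a small made-up dataframe written to an in-memory buffer. The column names and values are hypothetical and are not the spudtr demo data.

```python
import io

import pandas as pd

# hypothetical toy epochs-like data, for illustration only
df = pd.DataFrame(
    {
        "epoch_id": [0, 0, 1, 1],
        "time_ms": [0, 4, 0, 4],
        "sub_id": ["001", "001", "001", "001"],  # string with leading zeros
        "stim": pd.Categorical(["target", "target", "standard", "standard"]),
        "MiPf": [1.5, -2.25, 0.5, 3.0],
    }
)

# write to an in-memory tab-separated text buffer and read it back
buf = io.StringIO()
df.to_csv(buf, index=False, sep="\t")
buf.seek(0)
df_tsv = pd.read_csv(buf, sep="\t")

# columns whose dtype changed on the way through text
changed = [col for col in df.columns if df[col].dtype != df_tsv[col].dtype]
round_trips = df.equals(df_tsv)

print("dtype changed:", changed)
print("round trips:", round_trips)
```

In this toy case the string column is read back as integers and the categorical column loses its category dtype, so the comparison fails even though the text file looks right.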

Example: load demo EEG epochs data (feather)

[1]:
import pandas as pd
from spudtr import epf
from spudtr import get_demo_df, P3_100_FEATHER  # small sample epochs file

from spudtr import DATA_DIR # replace DATA_DIR with the path to your own data directory

# fetch epochs data for demonstration
epochs_df = get_demo_df(P3_100_FEATHER)

# verify the format
eeg_channels = ['MiPf', 'MiCe', 'MiPa', 'MiOc']
epf.check_epochs(epochs_df, eeg_channels, epoch_id="epoch_id", time="time_ms")
downloading ./spudtr/data/sub000p3.ms100.epochs.feather from https://zenodo.org/record/3968485/files/ ... please wait

write/read epochs as feather

[2]:
epochs_df.to_feather(DATA_DIR / "io_demo.epochs.feather")
epochs_df_fthr = pd.read_feather(DATA_DIR / "io_demo.epochs.feather")

assert epochs_df.equals(epochs_df_fthr)  # verify round trip

write/read epochs as pandas/pytables HDF5

Note: The pandas default HDF5 file mode is append. This means the default behavior for re-running .to_hdf() in a Jupyter notebook cell is to append copy after copy after copy of your epochs to the HDF5 file, which is probably not what you want.

To prevent this, call to_hdf(..., mode="w"). This makes it behave like to_feather() and save one copy of the current epochs data in the HDF5 file. Setting format="fixed" is not necessary for reading and writing in Python/pandas, but it simplifies the internal layout of the HDF5 file and makes it more portable across platforms, if that is of interest.

[3]:
epochs_df.to_hdf(DATA_DIR / "io_demo.epochs.h5", key="io_demo", mode="w", format="fixed")
epochs_df_h5 = pd.read_hdf(DATA_DIR / "io_demo.epochs.h5", key="io_demo")

assert epochs_df.equals(epochs_df_h5)  # verify the round trip
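One effect of the file mode that is easy to check is what happens to data already in the file. The sketch below writes a toy dataframe under one key, writes again under a second key with the mode under test, and reports which keys remain. The helper name keys_after_second_write and the toy data are hypothetical. Like to_hdf() and read_hdf(), it assumes the optional PyTables package is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def keys_after_second_write(second_mode):
    """Write twice to one HDF5 file and return the keys left in the file."""
    df = pd.DataFrame({"epoch_id": [0, 0, 1, 1], "MiPf": [1.5, -2.25, 0.5, 3.0]})
    with tempfile.TemporaryDirectory() as tmp_dir:
        h5_path = str(Path(tmp_dir) / "toy.epochs.h5")
        # the first write creates the file with a single key
        df.to_hdf(h5_path, key="first", mode="w", format="fixed")
        # the second write uses the file mode under test
        df.to_hdf(h5_path, key="second", mode=second_mode, format="fixed")
        # list what is in the file afterwards
        with pd.HDFStore(h5_path, mode="r") as store:
            return sorted(store.keys())
```

Calling keys_after_second_write("a") returns both keys because append mode keeps the existing file contents, while keys_after_second_write("w") returns only the second key because write mode starts the file over.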

verify the feather and HDF5 data are the same

[4]:
assert epochs_df_fthr.equals(epochs_df_h5)

do not write/read epochs data as ascii text unless absolutely necessary … it does not round trip

[5]:
# write the binary dataframe as a tab-separated text file
epochs_df.to_csv(DATA_DIR / "io_demo.epochs.tsv", index=False, sep="\t")
epochs_df_tsv = pd.read_csv(DATA_DIR / "io_demo.epochs.tsv", sep="\t")

# write the dataframe read from text back to (another) text file
epochs_df_tsv.to_csv(DATA_DIR / "io_demo_2.epochs.tsv", sep="\t")
epochs_df_tsv_2 = pd.read_csv(DATA_DIR / "io_demo_2.epochs.tsv", sep="\t")

try:
    assert epochs_df.equals(epochs_df_tsv), "to_csv(), read_csv, does not round trip"
except AssertionError as fail:
    print(fail)

try:
    assert epochs_df_tsv.equals(epochs_df_tsv_2), "even after an initial conversion to text"
except AssertionError as fail:
    print(fail)


to_csv(), read_csv, does not round trip
even after an initial conversion to text
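If text really is the last resort, some of the lost type information can be put back by telling read_csv() the column types explicitly. This is a minimal sketch with made-up data, not a guarantee of an identical dataframe. The column names and the dtype mapping are assumptions for illustration, and the mapping has to be maintained by hand alongside the text file.

```python
import io

import pandas as pd

# hypothetical toy epochs-like data, for illustration only
df = pd.DataFrame(
    {
        "epoch_id": [0, 0, 1, 1],
        "sub_id": ["001", "001", "001", "001"],  # string with leading zeros
        "stim": pd.Categorical(["target", "target", "standard", "standard"]),
        "MiPf": [1.5, -2.25, 0.5, 3.0],
    }
)

# to_csv() returns the text as a string when no path is given
tsv_text = df.to_csv(index=False, sep="\t")

# default read: pandas infers the types from the text
df_default = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# read again with the column types spelled out
col_types = {"sub_id": str, "stim": "category"}
df_typed = pd.read_csv(io.StringIO(tsv_text), sep="\t", dtype=col_types)

print(df_default["sub_id"].iloc[0], df_typed["sub_id"].iloc[0])
```

The binary formats carry this information in the file itself, which is why they remain the safer choice.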