Read and write epochs as HDF5, feather, or csv
spudtr runs on pandas.DataFrame, so use pandas IO tools to read and write epochs data.
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
The feather format is a good choice for Python and R. The files are slightly larger than HDF5 but read more quickly.
The HDF5 format is fairly portable across Python, R, and MATLAB.
Both feather and HDF5 are binary formats that read-write-read round trip with identical pandas dataframes.
Writing the epochs data as text .csv is possible but the data do not round trip. It is a bad idea and should be used only as a desperate last resort.
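One likely reason text files fail to round trip is that they carry no column type information, so pandas has to guess the types again on reading. The sketch below illustrates this with a small made-up table; `toy_df` and its column values are hypothetical stand-ins, not the spudtr demo data, and the cause of the mismatch in real epochs data may differ.

```python
import io

import numpy as np
import pandas as pd

# hypothetical toy epochs table with compact dtypes (not the spudtr demo data)
toy_df = pd.DataFrame(
    {
        "epoch_id": np.array([0, 0, 1, 1], dtype="uint16"),
        "time_ms": np.array([-4, 0, -4, 0], dtype="int16"),
        "stim": pd.Categorical(["standard", "standard", "target", "target"]),
        "MiPf": np.array([0.5, 1.25, -2.0, 3.75], dtype="float32"),
    }
)

# write to an in-memory tab-separated text buffer and read it back
buffer = io.StringIO()
toy_df.to_csv(buffer, index=False, sep="\t")
buffer.seek(0)
toy_df_txt = pd.read_csv(buffer, sep="\t")

# compare the column dtypes before and after the trip through text
dtypes_before = toy_df.dtypes.astype(str).to_dict()
dtypes_after = toy_df_txt.dtypes.astype(str).to_dict()
print(dtypes_before)
print(dtypes_after)

# DataFrame.equals() requires matching dtypes as well as matching values
round_trips = toy_df.equals(toy_df_txt)
print(round_trips)
```

The values in this toy table survive the trip unchanged. Only the dtypes differ, which is enough for `equals()` to report a mismatch.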
Example: fetch demo EEG epochs data
[1]:
import pandas as pd
from spudtr import epf
from spudtr import get_demo_df, P3_100_FEATHER # small sample epochs file
from spudtr import DATA_DIR # replace DATA_DIR with the path to your own data directory
# fetch epochs data for demonstration
epochs_df = get_demo_df(P3_100_FEATHER)
# verify the format
eeg_channels = ['MiPf', 'MiCe', 'MiPa', 'MiOc']
epf.check_epochs(epochs_df, eeg_channels, epoch_id="epoch_id", time="time_ms")
downloading ./spudtr/data/sub000p3.ms100.epochs.feather from https://zenodo.org/record/3968485/files/ ... please wait
write/read epochs as feather
[2]:
epochs_df.to_feather(DATA_DIR / "io_demo.epochs.feather")
epochs_df_fthr = pd.read_feather(DATA_DIR / "io_demo.epochs.feather")
assert epochs_df.equals(epochs_df_fthr) # verify round trip
write/read epochs as pandas/pytables HDF5
Note: The pandas default HDF5 file mode is append. This means the default behavior for re-running .to_hdf() in a jupyter notebook cell is to append copy after copy after copy of your epochs to the HDF5 file, which is probably not what you want.
To prevent this, call to_hdf(..., mode="w"). This makes it behave like to_feather() and save one copy of the current epochs data in the HDF5 file. Setting format="fixed" is not necessary for reading and writing in Python/pandas, but it simplifies the internal structure of the HDF5 file and makes it more portable across platforms, if that is of interest.
[3]:
epochs_df.to_hdf(DATA_DIR / "io_demo.epochs.h5", key="io_demo", mode="w", format="fixed")
epochs_df_h5 = pd.read_hdf(DATA_DIR / "io_demo.epochs.h5", key="io_demo")
assert epochs_df.equals(epochs_df_h5) # verify the round trip
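The difference between the two file modes described in the note can be seen by listing what an HDF5 file holds after each kind of write. This is a minimal sketch that assumes pytables is installed; the toy dataframe, the temporary file, and the keys `run_1`, `run_2`, `run_3` are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# hypothetical toy data, not the spudtr demo epochs
toy_df = pd.DataFrame(
    {
        "epoch_id": [0, 0, 1, 1],
        "time_ms": [-4, 0, -4, 0],
        "MiPf": [0.5, 1.25, -2.0, 3.75],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    h5_path = Path(tmp_dir) / "toy.epochs.h5"

    # mode="w" creates the file, or truncates it if it already exists
    toy_df.to_hdf(h5_path, key="run_1", mode="w")

    # the default mode="a" keeps whatever is already stored in the file
    toy_df.to_hdf(h5_path, key="run_2")
    with pd.HDFStore(h5_path, mode="r") as store:
        keys_after_append = sorted(store.keys())

    # writing again with mode="w" discards the earlier contents
    toy_df.to_hdf(h5_path, key="run_3", mode="w")
    with pd.HDFStore(h5_path, mode="r") as store:
        keys_after_write = sorted(store.keys())

print(keys_after_append)
print(keys_after_write)
```

Listing the keys this way is also a quick check on an existing file before deciding whether to overwrite it.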
verify the feather and HDF5 data are the same
[4]:
assert epochs_df_fthr.equals(epochs_df_h5)
do not write/read epochs data as ascii text unless absolutely necessary … it does not round trip
[5]:
# write the binary dataframe as a tab-separated text file
epochs_df.to_csv(DATA_DIR / "io_demo.epochs.tsv", index=False, sep="\t")
epochs_df_tsv = pd.read_csv(DATA_DIR / "io_demo.epochs.tsv", sep="\t")
# write the dataframe read from text back to (another) text file
epochs_df_tsv.to_csv(DATA_DIR / "io_demo_2.epochs.tsv", sep="\t")
epochs_df_tsv_2 = pd.read_csv(DATA_DIR / "io_demo_2.epochs.tsv", sep="\t")
try:
    assert epochs_df.equals(epochs_df_tsv), "to_csv(), read_csv, does not round trip"
except AssertionError as fail:
    print(fail)
try:
    assert epochs_df_tsv.equals(epochs_df_tsv_2), "even after an initial conversion to text"
except AssertionError as fail:
    print(fail)
to_csv(), read_csv, does not round trip
even after an initial conversion to text
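If text really is the last resort, one possible workaround is to keep the original column dtypes on hand and pass them back to `read_csv()` through its `dtype` parameter. The sketch below uses a hypothetical toy table with values chosen to be exactly representable as floats. It is not a general fix: it assumes the dtypes are known at read time, and it has not been checked against the spudtr demo epochs.

```python
import io

import numpy as np
import pandas as pd

# hypothetical toy epochs table (not the spudtr demo data)
toy_df = pd.DataFrame(
    {
        "epoch_id": np.array([0, 0, 1, 1], dtype="uint16"),
        "time_ms": np.array([-4, 0, -4, 0], dtype="int16"),
        "stim": pd.Categorical(["standard", "standard", "target", "target"]),
        "MiPf": np.array([0.5, 1.25, -2.0, 3.75], dtype="float32"),
    }
)

# to_csv() returns the text as a string when no path is given
text = toy_df.to_csv(index=False, sep="\t")

# naive read: pandas infers the column dtypes from the text
naive = pd.read_csv(io.StringIO(text), sep="\t")

# informed read: reuse the original dtypes, which must be known in advance
restored = pd.read_csv(
    io.StringIO(text), sep="\t", dtype=toy_df.dtypes.to_dict()
)

print(naive.equals(toy_df), restored.equals(toy_df))
```

The dtypes have to travel separately from the text file, which is part of why the binary formats are the simpler choice.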