{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Read and write epochs as HDF5, feather, or csv\n", "\n", "`spudtr` operates on `pandas.DataFrame` objects, so use the `pandas` IO tools to read and write epochs data.\n", "\n", "https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html\n", "\n", "\n", "The `feather` format is a good choice for Python and R; the files are slightly larger than HDF5 but read more quickly.\n", "\n", "The `HDF5` format is fairly portable across Python, R, and MATLAB.\n", "\n", "Both `feather` and `HDF5` are binary formats that round trip: writing a pandas dataframe and reading it back yields an identical dataframe.\n", "\n", "Writing the epochs data as text `.csv` is possible but the data do not round trip. It is a bad idea and should be used only as a desperate last resort." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example: load demo EEG epochs data**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/turbach/miniconda3/envs/mckonda_spudtr_dev/lib/python3.6/site-packages/pyarrow/pandas_compat.py:752: FutureWarning: .labels was deprecated in version 0.24.0. 
Use .codes instead.\n", " labels, = index.labels\n" ] } ], "source": [ "import pandas as pd\n", "from spudtr import epf\n", "from spudtr import get_demo_df, P3_100_FEATHER # small sample epochs file\n", "\n", "from spudtr import DATA_DIR # replace DATA_DIR with the path to your own data directory\n", "\n", "# fetch epochs data for demonstration\n", "epochs_df = get_demo_df(P3_100_FEATHER)\n", "\n", "# verify the format\n", "eeg_channels = ['MiPf', 'MiCe', 'MiPa', 'MiOc']\n", "epf.check_epochs(epochs_df, eeg_channels, epoch_id=\"epoch_id\", time=\"time_ms\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**write/read epochs as feather**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "epochs_df.to_feather(DATA_DIR / \"io_demo.epochs.feather\")\n", "epochs_df_fthr = pd.read_feather(DATA_DIR / \"io_demo.epochs.feather\")\n", "\n", "assert epochs_df.equals(epochs_df_fthr) # verify round trip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**write/read epochs as pandas/pytables HDF5**\n", "\n", "Note: The pandas default HDF5 file mode is `append`. This means that by default, re-running `.to_hdf()` in a Jupyter notebook cell appends copy after copy after copy of your epochs to the HDF5 file, which is probably not what you want.\n", "\n", "To prevent this, call `to_hdf(..., mode=\"w\")`. This makes it behave like `to_feather()` and save a single copy of the current epochs data in the HDF5 file. Setting `format=\"fixed\"` is not necessary for reading and writing in Python/pandas, but it simplifies the guts of the HDF5 file and makes it more portable across platforms, if that is of interest." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "epochs_df.to_hdf(DATA_DIR / \"io_demo.epochs.h5\", key=\"io_demo\", mode=\"w\", format=\"fixed\")\n", "epochs_df_h5 = pd.read_hdf(DATA_DIR / \"io_demo.epochs.h5\", key=\"io_demo\")\n", "\n", "assert epochs_df.equals(epochs_df_h5) # verify the round trip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**verify the feather and HDF5 data are the same**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "assert epochs_df_fthr.equals(epochs_df_h5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**do not write/read epochs data as ASCII text unless absolutely necessary ... it does not round trip**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "to_csv(), read_csv, does not round trip\n", "even after an initial conversion to text\n" ] } ], "source": [ "# write the binary dataframe as a tab-separated text file\n", "epochs_df.to_csv(DATA_DIR / \"io_demo.epochs.tsv\", index=False, sep=\"\\t\")\n", "epochs_df_tsv = pd.read_csv(DATA_DIR / \"io_demo.epochs.tsv\", sep=\"\\t\")\n", "\n", "# write the dataframe read from text back to (another) text file\n", "epochs_df_tsv.to_csv(DATA_DIR / \"io_demo_2.epochs.tsv\", sep=\"\\t\")\n", "epochs_df_tsv_2 = pd.read_csv(DATA_DIR / \"io_demo_2.epochs.tsv\", sep=\"\\t\")\n", "\n", "try:\n", "    assert epochs_df.equals(epochs_df_tsv), \"to_csv(), read_csv, does not round trip\"\n", "except AssertionError as fail:\n", "    print(fail)\n", "\n", "try:\n", "    assert epochs_df_tsv.equals(epochs_df_tsv_2), \"even after an initial conversion to text\"\n", "except AssertionError as fail:\n", "    print(fail)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": 
"python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }