Installation

TL;DR Use mamba or conda to install the fitgrid conda package along with your other packages into a fresh conda environment on a fast multicore x64_86 computer with gobs of RAM.

About conda virtual environments

fitgrid is packaged on anaconda.org/kutaslab/fitgrid for installation into conda “virtual” environments using the conda or mamba package manager. A virtual environment isolates the fitgrid installation to prevent clashes with what is already installed elsewhere in your system and other virtual environments. When the package manager installs fitgrid in a virtual environment it also automatically installs compatible versions of the hundreds of Python and R packages fitgrid requires to run including numpy, pandas, matplotlib, statsmodels, and pymer4, rpy2, R, lme4, and lmerTest to name a few. You can also install other conda packages in addition to fitgrid as needed for the task at hand.

The steps for creating conda environments and installing fitgrid are straightforward but it is prudent to have a general understanding of conda virtual environments and at least these commands: conda create ..., conda install -c ..., conda activate ..., and conda deactivate. See the Conda Cheat Sheet for a summary. For fine-tuning conda environments and working around incompatible package versions refer to the core conda tasks especially managing conda channels and channel priority and installing packages. The mamba package installer is an alternative to conda. At present, the mamba create ... and mamba install ... commands tend to resolve the complex fitgrid package dependencies substantially faster than conda.

For working with fitgrid and mixed Python and R conda environments generally, it is important to attend to the difference between the conda default channels anaconda.org/main and anaconda.org/r where packages are maintained by the Anaconda, Inc. team and the not-always-compatible parallel universe of the conda-forge channel where many of the same-named conda packages are maintained by the open-source community. Choosing suitable channels for installing conda packages can be tricky because the specific versions of packages required for compatibility and best performance depend on the computing hardware, operating systems, compilers, and the version requirements of packages that are already in the environment or will be. The conda-forge maintainers recommend setting the .condarc configuration file to strict conda-forge channel priority. However, revising the default channel priority may not be appropriate for all users, in which case the command-line options --channel conda-forge --strict-channel-priority may be used. The examples below illustrate command line options for a few common installation scenarios encountered in practice.

Our current recommended best practice for working with conda environments is to install the lightweight miniconda3 and then avoid polluting the “base” conda environment with data analysis and application packages like fitgrid. Instead, create separate new working environments, each populated with the packages needed for a given project. The mamba package is an exception to this rule. If you elect to use mamba follow the installation instructions carefully.

How to install fitgrid

These examples show how to install fitgrid into a new conda working environment from the conda base environment with a shell command in a linux or Mac terminal window. They assume the conda and mamba executables are already installed in the base environment and the users’s channel configuration is the minconda3 default shown here:

(base) $ which conda mamba
/home/your_userid/miniconda3/bin/conda
/home/your_userid/miniconda3/bin/mamba
(base) $ conda config --show channels default_channels channel_priority
channels:
  - defaults
default_channels:
  - https://repo.anaconda.com/pkgs/main
  - https://repo.anaconda.com/pkgs/r
channel_priority: flexible

Note

The example installation commands are broken into separate lines for readability. If you do this, make sure the \ is the last character on each line. Alternatively you can enter the command as a single line without any \.

with mamba

fitgrid stable release

This is a typical installation of the latest stable release of fitgrid into a fresh conda environment named fg_012021. This pattern is likely to be compatible with recent versions of other conda packages for x86_64 linux platforms and recent Intel Mac OSX.

(base) $ mamba create --name fg_012021 \
    -c conda-forge -c ejolly -c kutaslab \
    fitgrid

Note

This installation currently defaults to OpenBLAS builds of matrix math and linear algebra libraries so execution time on some Intel CPUs may be substantially longer than for the Intel Math Kernel (MKL) builds of the libraries. For a workaround see Selecting MKL or OpenBLAS below.

fitgrid development version

At times, the development version of fitgrid runs ahead of the latest stable release and includes bug fixes and new features. The latest development version may be installed by overriding the default kutaslab conda channel with kutaslab/label/pre-release like so:

(base) $ mamba create --name fg_012021 \
    -c conda-forge -c ejolly -c kutaslab/label/pre-release \
    fitgrid

Selecting a Python version

Specific versions of Python and other packages can be selected for installation with the conda package specification syntax. This example installs fitgrid with the most recent version of Python 3.8.

(base) $ mamba create --name fg_012021 \
    -c conda-forge -c ejolly -c kutaslab \
    fitgrid python=3.8

Selecting MKL or OpenBLAS

On Intel CPUs, the Intel Math Kernel Library (MKL) builds of optimized math libraries like the Basic Linear Algebra Subprograms (BLAS) may offer a substantial performance advantage over OpenBLAS. For AMD CPUs OpenBLAS may outperform MKL. This example shows how to enforce installation of the MKL build and use conda list to inspect the installed packages. To select OpenBLAS builds, replace mkl with openblas in the first command.

(base) $ mamba create --name fg_012021 \
    -c conda-forge -c ejolly -c kutaslab \
    fitgrid "blas=*=mkl*"
(base) $ activate fg_012021
(fg_012021) $ conda list | egrep "(mkl|blas|liblapack)"
# packages in environment at /home/userid/miniconda3/envs/fg_012021:
blas                      2.109                       mkl    conda-forge
blas-devel                3.9.0                     9_mkl    conda-forge
libblas                   3.9.0                     9_mkl    conda-forge
libcblas                  3.9.0                     9_mkl    conda-forge
liblapack                 3.9.0                     9_mkl    conda-forge
liblapacke                3.9.0                     9_mkl    conda-forge
mkl                       2021.2.0           h06a4308_296
mkl-devel                 2021.2.0           h66538d2_296
mkl-include               2021.2.0           h06a4308_296

Prioritize anaconda.org default channels over conda-forge

This example shows how to install fitgrid into an environment populated primarily with the stale-but-stable packages from the Anaconda default channels. The explicit -c conda-forge channel is necessary here because not all dependencies are available on the default conda channels.

(base) $ mamba create --name fg_012021 \
    -c defaults -c conda-forge -c ejolly -c kutaslab \
    fitgrid

with conda

The conda installer may be used in place of mamba as shown in the next example, although dependency resolution may be substantially slower.

(base) $ conda create --name fg_012021 \
    -c conda-forge -c ejolly -c kutaslab \
    fitgrid

Note

The conda and mamba dependency resolution algorithms are not identical and may arrive at different solutions.

pip is not supported

Since fitgrid requires numerous R packages, installing with the Python package installer, pip is no longer supported and is not recommended for general use.

System requirements

The platform of choice is linux. Minimum system requirements are not known but obviously large scale regression modeling with millions of data points is computationally demanding. Current versions of fitgrid are developed and used in Ubuntu 20.04 running on a high-performance multicore server with Intel CPUs (72 cores/144 threads, 1TB RAM); continuous integrations tests run on ubuntu-latest and macos-10.15 on GitHub Actions hosted runners. Previous versions of fitgrid were developed and used in CentOS 7 with Intel CPUs (24 cores/48 threads, 256-512 GB RAM). We are unable to test the Windows 64-bit conda package, field reports are welcome, see Contributing for more information.

Tips

  • Use conda list to inspect package versions and the channels they come from when constructing conda enviroments.

  • To help avoid package version conflicts and speed up the dependency solver it can be useful to specify the Python version and install fitgrid along with the other conda packages you want into a fresh environment in one fell swoop. The package installers cannot see into the future. If packages are installed one by one, the next package version you want may not be compatible with what is already in the environment.

  • mamba create and mamba install are not exact drop in replacements for conda create and conda install because conda has an affinity for packages on default conda channels and mamba has an affinity for packages on conda-forge and they may resolve dependencies differently.

  • What works and what doesn’t when creating conda environments and installing packages depends greatly on the combinations of packages you wish to install. Not all combinations of platforms, Python versions, installers, channel priority, and packages are compatible.

  • Depending on your computer hardware, you may see a significant performance difference between the Intel MKL and OpenBLAS builds of the Basic Linear Algebra Support (BLAS) and Linear Algebra Package (LAPACK) libraries, particularly for fitting mixed-effects models.