Setup Environment

The project uses conda to manage its environment and packages. Conda is an open-source package and environment management system that runs on Windows, macOS, and Linux, so the project should run on any of those platforms. Conda handles library dependencies outside the Python packages as well as the Python packages themselves.

✏️ We test the project on Linux-based systems (ML workstations) and on MacBook M1 laptops (ARM-based CPU), where getting Python tooling to run is generally harder. We do not test the implementation on Windows.

This guide is not intended as an extensive introduction to conda. For more insight, we recommend reading [Vat22] or [Sar20].

Setup

Before installing the environment, follow the official instructions for installing conda on your target platform. We recommend Miniconda, a minimal version of Anaconda that includes only conda and its dependencies.

✏️ To speed up the installation of packages, install Mamba to the base conda environment with the following:

conda install mamba -n base -c conda-forge

Once conda is installed, replicate the environment that is used for the lectures and exercises. If you have installed Mamba, run:

mamba env create -f environment.yml

Otherwise, run:

conda env create -f environment.yml
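
The choice between the two commands can also be made automatically. The following is a minimal shell sketch and not part of the project: the function names `conda_frontend` and `create_env` are hypothetical, and the sketch assumes that `environment.yml` is in the current directory unless another path is passed.

```shell
# Hypothetical helper: print "mamba" when it is available, otherwise "conda".
conda_frontend() {
  if command -v mamba >/dev/null 2>&1; then
    echo mamba
  else
    echo conda
  fi
}

# Hypothetical helper: create the environment with whichever tool was found.
# Usage: create_env [path/to/environment.yml]
create_env() {
  "$(conda_frontend)" env create -f "${1:-environment.yml}"
}
```

After defining both functions in your shell, running `create_env` from the project root would be equivalent to one of the two commands above.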

To start using the environment, run:

conda activate ds-academy
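
To confirm that the activation worked before starting any tools, you can check the `CONDA_DEFAULT_ENV` variable, which conda sets to the name of the active environment. The helper below is a small sketch; the name `require_env` is hypothetical and not provided by conda or by the project.

```shell
# Hypothetical helper: succeed only when the named conda environment is active.
# conda sets CONDA_DEFAULT_ENV to the active environment's name on activation.
require_env() {
  expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
    return 0
  fi
  echo "Expected environment '$expected', but active is '${CONDA_DEFAULT_ENV:-none}'" >&2
  return 1
}
```

For example, `require_env ds-academy && jupyter notebook` would start the notebook server only when the right environment is active.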

Once the environment is active, you can start the Jupyter Notebook server with:

jupyter notebook

Alternatively, you can execute notebooks in an IDE. To get started with notebooks, check out the guide on notebooks.

New Environment

It is convenient to create a separate environment for experimenting (and for the end-to-end ML project). The environment.yml located in the project can serve as a starting point. Check out the recommended resources [Vat22] or [Sar20] to get a better idea.

In general, we recommend installing as many packages as possible with conda, because it handles library dependencies outside of the Python packages as well as the Python packages themselves. Most packages are hosted on conda-forge. If you cannot find a package on conda-forge, install it using pip.

name: my-awesome-project
channels:
  - conda-forge  # prioritize packages from conda-forge repositories
  - defaults
dependencies:
  - python>=3.7,<=3.9  # supported Python range; conda picks a version within it
  - pip  # a package taken from conda repository
  - matplotlib
  - numpy
  - notebook
  - pip:
      - see  # an example of package installed using pip as it is not available on conda
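
The "conda first, pip as a fallback" rule can be sketched as a shell helper. This is an illustration only: the function name `install_hint` is hypothetical, and the sketch assumes that `conda search` exits with a non-zero status when no package matches.

```shell
# Hypothetical helper: suggest how to install a package.
# Prints a conda command when the package is found on conda-forge,
# otherwise a pip command.
install_hint() {
  pkg="$1"
  # --override-channels restricts the search to conda-forge only
  if conda search --channel conda-forge --override-channels "$pkg" >/dev/null 2>&1; then
    echo "conda install -c conda-forge $pkg"
  else
    echo "pip install $pkg"
  fi
}
```

Calling `install_hint numpy` prints a suggested command rather than running it, so you can decide whether the package belongs in the `dependencies` list or in its `pip` subsection.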

✏️ For production environments, it is good practice to pin package versions so that the environment can be reproduced exactly. For example, pin Python with an exact specification such as python=3.9.7 rather than a range such as python>=3.7,<=3.9.
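
One way to obtain pinned versions is to export them from an environment that already works. The sketch below wraps `conda env export`; the function name `export_pinned_env` and the output file name in the usage note are hypothetical.

```shell
# Hypothetical helper: write the exact package versions of an environment to a file.
# Usage: export_pinned_env ENV_NAME OUTPUT_FILE
export_pinned_env() {
  env_name="$1"
  out_file="$2"
  # --no-builds omits build strings, which usually makes the file less platform-specific
  conda env export --name "$env_name" --no-builds > "$out_file"
}
```

For example, `export_pinned_env ds-academy environment.lock.yml` would write every installed package with its version, and the resulting file can be passed to `conda env create -f`.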

Tips

You can delete the ds-academy environment and start over at any time with the command:

conda remove --name ds-academy --all
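
Deleting and recreating the environment can be combined into one step. The following is a sketch under assumptions: the function name `reset_env` is hypothetical, the environment name passed in must match the `name` field of the specification file, and the environment being removed must not be the active one (run it from `base`, for example).

```shell
# Hypothetical helper: remove an environment and recreate it from a spec file.
# Usage: reset_env ENV_NAME [path/to/environment.yml]
reset_env() {
  env_name="$1"
  spec_file="${2:-environment.yml}"
  # --yes skips the interactive confirmation prompt
  conda remove --name "$env_name" --all --yes &&
    conda env create -f "$spec_file"
}
```

For example, `reset_env ds-academy` would remove the course environment and rebuild it from `environment.yml` in the current directory.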

Resources

[Sar20] Matthew Sarmiento. The definitive guide to conda environments. Feb 2020. URL: https://towardsdatascience.com/a-guide-to-conda-environments-bc6180fc533.

[Vat22] Vatsal. Comprehensive guide to python virtual environments using conda for data scientists. May 2022. URL: https://towardsdatascience.com/comprehensive-guide-to-python-virtual-environments-using-conda-for-data-scientists-6ebea645c5b.