Environment management


Why should we care?


How we typically think of our work

../_images/Paper.ML_in_Prod_-_feb_2020.1.png

Data and source code go in; artefacts come out


Even the simplest code relies on many other things…

../_images/Paper.ML_in_Prod_-_feb_2020.6.png

And needs to run on different environments

../_images/Paper.ML_in_Prod_-_feb_2020.3.png

How do we make sure we can port and reproduce the same results?


How we should think of our work

../_images/Paper.ML_in_Prod_-_feb_2020.2.png

Is this overkill? Are we overthinking it?


Python dependency management


Many snakes…

../_images/Paper.ML_in_Prod_-_feb_2020.8.png

Many different Pythons can coexist on one host… $PATH determines which one we run
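
To check this on your own machine, here is a minimal sketch that walks $PATH in order and lists every `python` or `python3` executable it finds. The helper name `pythons_on_path` is made up for this illustration; only the standard library is assumed.

```python
import os
import shutil


def pythons_on_path(names=("python", "python3"), path=None):
    """Return executables called `names` found on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for name in names:
            # Restricting shutil.which to a single directory tells us
            # whether that directory provides an executable with this name
            hit = shutil.which(name, path=directory)
            if hit and hit not in found:
                found.append(hit)
    return found


if __name__ == "__main__":
    for exe in pythons_on_path():
        print(exe)
    # The first match on PATH is what typing `python` would start
    print("shell would run:", shutil.which("python"))
```

Running it before and after activating an environment should show the order of the entries changing.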


How does Python import libraries?

import numpy
  1. Built-in modules (import sys; sys.builtin_module_names)
  2. The directory of the script being run (or the current directory in an interactive session)
  3. The rest of sys.path (which includes {python_root}/site-packages)

Tip: Use help('modules') to list all installed modules

Ref: Python 3 doc: the module search path and Chris Yeh’s guide to import statements
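
The lookup order above can be inspected from Python itself. The following is a small sketch, not an exact model of the import machinery: the helper `where_is` is a hypothetical name, and it only reports where a top-level module would be loaded from.

```python
import importlib.util
import sys


def where_is(module_name):
    """Describe where `import module_name` would load the module from."""
    if module_name in sys.builtin_module_names:
        return "built-in (compiled into the interpreter)"
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return "not found on sys.path"
    # `origin` is the file the module would be loaded from
    return spec.origin or "namespace package"


if __name__ == "__main__":
    print("search path, in order:")
    for entry in sys.path:
        print("  ", repr(entry))
    for name in ("sys", "json", "numpy"):
        print(name, "->", where_is(name))
```

If numpy is installed, its origin should point inside a site-packages folder.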


What does pip install do?

  1. Looks for a source or binary distribution in the PyPI index
  2. Discovers its dependencies
  3. Installs all packages (target + dependencies)

Warning

All third-party packages are installed in (“copied into”) the same folder {python_root}/site-packages!

Ref: Alex Becker’s “How pip install works”
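
To see the result of those steps, the sketch below prints the shared site-packages folder and, for every installed distribution, how many dependencies it declares. It assumes Python 3.8 or newer (for `importlib.metadata`); the function name `installed_distributions` is invented for this example.

```python
import sysconfig
from importlib import metadata  # available from Python 3.8


def installed_distributions():
    """Map each installed distribution name to (version, declared requirements)."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name is None:
            # Skip entries with broken or missing metadata
            continue
        result[name] = (dist.version, dist.requires or [])
    return result


if __name__ == "__main__":
    # The single folder where pure-Python third-party packages are copied
    print("site-packages:", sysconfig.get_paths()["purelib"])
    for name, (version, requires) in sorted(installed_distributions().items()):
        print(f"{name}=={version} declares {len(requires)} requirement(s)")
```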


Example: numpy

../_images/Paper.ML_in_Prod_-_feb_2020.9.png

https://pypi.org/simple/numpy
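
The index page is essentially a list of file names, and the name alone tells you whether a file is a source or a binary distribution. As a rough sketch (the function `describe_distribution` and the two file names in the usage note are illustrative, not taken from the slide), wheel names can be split according to the PEP 427 naming convention:

```python
def describe_distribution(filename):
    """Classify a file name from a package index as wheel or sdist.

    Wheel names follow PEP 427:
    {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if filename.endswith(".whl"):
        parts = filename[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):
            raise ValueError(f"not a valid wheel name: {filename}")
        python_tag, abi_tag, platform_tag = parts[-3:]
        return {
            "kind": "binary (wheel)",
            "name": parts[0],
            "version": parts[1],
            "python": python_tag,
            "abi": abi_tag,
            "platform": platform_tag,
        }
    for suffix in (".tar.gz", ".zip"):
        if filename.endswith(suffix):
            # Source distributions are named {name}-{version}.{ext}
            name, _, version = filename[: -len(suffix)].rpartition("-")
            return {"kind": "source (sdist)", "name": name, "version": version}
    raise ValueError(f"unrecognised distribution file: {filename}")
```

For example, `describe_distribution("numpy-1.18.1-cp37-cp37m-macosx_10_9_x86_64.whl")` reports a wheel built for CPython 3.7 on macOS, while `describe_distribution("numpy-1.18.1.zip")` reports a source distribution that has to be compiled locally.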


Pip hell

Does anything like this sound familiar?

> pip install google-cloud-storage==1.13.2

ERROR: google-cloud-storage 1.13.2 has requirement XYZ
but you'll have google-cloud-core 0.28.1 which is incompatible.

Or this:

> pip install coolest-package

... bla bla ...
clang: error: unsupported option '-fopenmp'
error: command 'gcc' failed with exit status 1

Pip hell (2)


We need a tool to:


Available tools

../_images/Paper.ML_in_Prod_-_feb_2020.7.png

Env Management via Conda


What is conda?


Quick warm-up

conda create -n vintage_env python=2.7

conda activate vintage_env
# now we are in a vintage 2.7 env

# install your favorite package, for example:
conda install pandas
# inspect the output

# run the python interpreter
python # which version is it?

conda deactivate
# back to the base

You can port conda environments

conda env export --no-builds > env.yml
# I recommend manually editing `env.yml`:
# 1) remove any native libraries (gulp!)
# 2) remove the `prefix` entry

# now let's create an environment from those specs (will fail!)
conda env create -f env.yml

# clean-up:
conda env remove -n env_name

Warning

Envs ported like this are not fully reproducible: different builds and different native libraries can lead to different output
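
The manual edit of the `prefix` entry can be scripted. This is a minimal sketch that treats the exported file as plain text (so no YAML library is needed); the helper `strip_prefix` and the sample content in the usage note are assumptions for illustration. Removing native libraries still needs a human decision and is not attempted here.

```python
def strip_prefix(env_yaml_text):
    """Drop the top-level `prefix:` entry from an exported environment file."""
    kept = [
        line
        for line in env_yaml_text.splitlines()
        # `prefix:` is a top-level key, so it starts at column 0
        if not line.startswith("prefix:")
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Rewrite env.yml in place without the machine-specific prefix
    with open("env.yml") as fh:
        cleaned = strip_prefix(fh.read())
    with open("env.yml", "w") as fh:
        fh.write(cleaned)
```

Given a file whose last line is `prefix: /home/user/miniconda3/envs/vintage_env` (a hypothetical path), the function returns the same content without that line.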


You can manage non-python dependencies

conda install git

…but you may need to use esoteric channels:

conda config --add channels new_channel

which you can find by searching on Anaconda.org.


You can continue using pip… with care

More info: Using Pip in a Conda Environment

conda doesn’t know how to manage pip dependencies, so there is no proper dependency resolution for pip-installed packages


conda envs limitations


Quiz

  1. Where is conda installing your packages?
  2. What do you think which python returns from within a conda env?
  3. What do you think echo $PATH returns from within a conda env?

Quiz (solution)

  1. Where is conda installing your packages?
> which conda
/Users/arnau.tibau/miniconda3/condabin/conda
# my conda root is at `/Users/arnau.tibau/miniconda3`.
# the environments are in the `envs` folder
> ls /Users/arnau.tibau/miniconda3/envs/
# for a given environment, the Python packages are in
# their corresponding site-packages:
> ls /path/to/envs/ml_in_prod/lib/python3.7/site-packages/

Quiz (solution) (2)

  2. What do you think which python returns from within a conda env?
(ml_in_prod) > which python
/Users/arnau.tibau/miniconda3/envs/ml_in_prod/bin/python

Quiz (solution) (3)

  3. What do you think echo $PATH returns from within a conda env?
# $PATH's first element is the conda env bin folder
> echo $PATH
/Users/arnau.tibau/miniconda3/envs/ml_in_prod/bin: ....
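
The same three answers can be collected from inside Python. The sketch below relies on the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables that `conda activate` sets; the function name `describe_environment` is made up for this example.

```python
import os
import sys


def describe_environment(environ=None):
    """Collect what identifies the active (conda) environment."""
    if environ is None:
        environ = os.environ
    path_entries = environ.get("PATH", "").split(os.pathsep)
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # Set by `conda activate`; None outside a conda environment
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
        "first_path_entry": path_entries[0],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Inside an activated environment, `first_path_entry` should be that environment's bin folder and `interpreter` should live inside it.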

Practice time!

  1. Create a new conda env for Python 3.7.3 named practice_env
  2. Install Flask 1.1.1, TensorFlow 2.1, pandas 1.0, scikit-learn 0.22.1
  3. Write a test_env.py that verifies import correctness
  4. Export the environment and send it to the person on your right
  5. Create a new environment from the environment sent to you by the person to your left

BONUS: Create a Jupyter notebook kernel linked to this new environment


Practice time (solution)

  1. Create a new conda env for Python 3.7.3 named practice_env
  2. Install Flask 1.1.1, TensorFlow 2.1, pandas 1.0, scikit-learn 0.22.1
conda create -n practice_env python=3.7.3
conda activate practice_env
# TensorFlow 2.1 is not available for macOS in any conda channel,
# so we skip it for now
conda install flask=1.1.1 pandas=1.0 scikit-learn=0.22.1
# We install TensorFlow 2.1 from pip
pip install tensorflow==2.1

Practice time (solution) (2)

  3. Write a test_env.py that verifies import correctness
# Example solution for test_env.py
# (only testing `pandas`)
def test_pandas():
    import pandas as pd
    pd.DataFrame({'a': [1]})

tests = {'pandas': test_pandas}
for name, test in tests.items():
    try:
        test()
    except Exception as e:
        print(f"Error! Tests failed for {name}: {e}")
    else:
        print(f"OK: Tests passed for {name}")
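
Beyond checking that imports succeed, a test could also confirm the pinned versions. The following is one possible extension, not part of the original solution: `check_versions` is a hypothetical helper that assumes each package exposes a `__version__` attribute, and it keeps to syntax that works on Python 3.7.

```python
import importlib


def check_versions(expected):
    """Return {module: problem} for modules that are missing or mismatched.

    `expected` maps an importable module name to a version such as "1.0";
    only the components given are compared, so "1.0" accepts "1.0.3".
    """
    problems = {}
    for module_name, wanted in expected.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            problems[module_name] = f"import failed: {exc}"
            continue
        found = getattr(module, "__version__", None)
        wanted_parts = wanted.split(".")
        if found is None or found.split(".")[: len(wanted_parts)] != wanted_parts:
            problems[module_name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # Module names, not PyPI names: scikit-learn is imported as `sklearn`
    expected = {
        "flask": "1.1.1",
        "tensorflow": "2.1",
        "pandas": "1.0",
        "sklearn": "0.22.1",
    }
    for name, problem in check_versions(expected).items():
        print(f"Error! {name}: {problem}")
```

An empty result means every listed module imported and matched its expected version.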

Practice time (solution) (3)

  4. Export the environment
conda env export > practice_env.yml
  5. Create a new environment from the environment sent to you
# note we give it a different name so as not to overwrite our own
conda env create -f practice_env.yml -n other_practice_env

Practice time (solution) (4)

BONUS: Create a Jupyter kernel linked to practice_env

conda activate practice_env
conda install ipykernel
ipython kernel install --user --name=practice_env

Additional resources
