Data and source go in; artefacts come out
How do we make sure we can port and reproduce the same results?
Is this overkill? Are we overthinking it?
Many different pythons co-exist in a host… $PATH determines which one we run
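To see which interpreter $PATH actually selects, a minimal sketch using only the standard library is shown below. The helper names `resolve_on_path` and `path_entries` are illustrative, not part of any tool mentioned here.

```python
import os
import shutil
import sys


def resolve_on_path(command="python"):
    """Return the first executable named `command` found on $PATH, or None."""
    return shutil.which(command)


def path_entries():
    """Split $PATH into its directories, in lookup order."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


if __name__ == "__main__":
    # The interpreter running this script and the one a shell would pick
    # for `python` can differ; comparing the two makes that visible.
    print("Interpreter running this script:", sys.executable)
    print("`python` resolved through $PATH:", resolve_on_path("python"))
    print("First $PATH entries, in lookup order:")
    for directory in path_entries()[:5]:
        print("  ", directory)
```

Running it inside and outside an activated environment should show the resolved path changing with the order of the $PATH entries.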
import numpy
Tip: Use help('modules') to list all importable modules, installed packages included
Ref: Python 3 doc: the module search path and Chris Yeh’s guide to import statements
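To check where an `import` would be resolved from without actually importing the module, something along these lines can help. `locate_module` is a hypothetical helper name; it relies on `importlib.util.find_spec`, which walks the module search path.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the origin a top-level module would be loaded from, or None.

    For regular modules the origin is a file path; built-in modules
    report the string "built-in" instead.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("Module search path (sys.path), in lookup order:")
    for entry in sys.path:
        print("  ", entry)
    # numpy may or may not be installed in the current interpreter.
    print("numpy would be loaded from:", locate_module("numpy"))
```

A result of `None` for `numpy` means no entry of `sys.path` contains it for the interpreter that is currently running.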
Warning
All third-party packages are installed in (“copied into”) the same folder {python_root}/site-packages!
Ref: Alex Becker’s “How pip install works”
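The shared site-packages folder can be inspected from Python itself. This is a minimal sketch with illustrative helper names; `importlib.metadata` requires Python 3.8 or later.

```python
import sysconfig
from importlib import metadata


def site_packages_dir():
    """Return the folder where pure-Python third-party packages are installed."""
    return sysconfig.get_paths()["purelib"]


def installed_distributions():
    """Return the sorted names of all distributions visible to this interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with missing or broken metadata
            names.add(name)
    return sorted(names, key=str.lower)


if __name__ == "__main__":
    print("site-packages:", site_packages_dir())
    for name in installed_distributions():
        print("  ", name, metadata.version(name))
```

Because every package lands in that one folder, only one version of each package can be installed per interpreter.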
Does anything like this sound familiar?
> pip install google-cloud-storage==1.13.2
ERROR: google-cloud-storage 1.13.2 has requirement XYZ
but you'll have google-cloud-core 0.28.1 which is incompatible.
Or this:
> pip install coolest-package
... bla bla ...
clang: error: unsupported option '-fopenmp'
error: command 'gcc' failed with exit status 1
conda create -n vintage_env python=2.7
conda activate vintage_env
# now we are in a vintage 2.7 env
# install your favorite package, for example:
conda install pandas
# inspect the output
# run the python interpreter
python # which version is it?
conda deactivate
# back to the base
conda env export --no-builds > env.yml
# I recommend manually editing `env.yml`:
# 1) remove any possible native libraries (gloups!)
# 2) and the `prefix` entry
# now let's create an environment from those specs (will fail!)
conda env create -f env.yml
# clean-up:
conda env remove -n env_name
Warning
Envs ported like this are not fully reproducible: different builds + different native libraries –> different output
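The manual edit of `env.yml` can be partly scripted. The sketch below only drops the machine-specific `prefix` entry and assumes it sits on a single top-level line; the sample file content and its path are made up for illustration. Removing native libraries still needs a human decision.

```python
def strip_prefix(env_yaml_text):
    """Drop the top-level `prefix:` line from an exported env.yml."""
    kept = [
        line
        for line in env_yaml_text.splitlines()
        if not line.startswith("prefix:")
    ]
    return "\n".join(kept) + "\n"


# Hypothetical export, shortened for illustration only.
example = """\
name: vintage_env
channels:
  - defaults
dependencies:
  - python=2.7
  - pandas
prefix: /hypothetical/path/to/envs/vintage_env
"""

if __name__ == "__main__":
    print(strip_prefix(example))
```

Plain string handling is used on purpose so the sketch has no YAML dependency; a real clean-up script would probably parse the file properly.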
conda install git
…but you may need to use esoteric channels:
conda config --add channels new_channel
which you can find by searching in Anaconda.org.
More info: Using Pip in a Conda Environment
conda doesn’t know how to manage pip dependencies ==> no proper dependency resolution for pip-installed packages
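One way to see how mixed an environment has become is to count which installer each package records. This sketch reads the optional `INSTALLER` file from each package's metadata; whether conda-built packages record `conda` there depends on how they were built, so treat the counts as a hint rather than a definitive inventory.

```python
from collections import Counter
from importlib import metadata


def installers():
    """Count installed distributions by the installer named in their metadata."""
    counts = Counter()
    for dist in metadata.distributions():
        recorded = dist.read_text("INSTALLER")  # None when the file is absent
        label = (recorded or "").strip() or "unknown"
        counts[label] += 1
    return counts


if __name__ == "__main__":
    for installer, count in installers().most_common():
        print(f"{installer}: {count} package(s)")
```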
> which conda
/Users/arnau.tibau/miniconda3/condabin/conda
# my conda root is at `/Users/arnau.tibau/miniconda3`.
# the environments are in the `envs` folder
> ls /Users/arnau.tibau/miniconda3/envs/
# for a given environment, the Python packages are in
# their corresponding site-packages:
> ls /path/to/envs/ml_in_prod/lib/python3.7/site-packages/
(ml_in_prod) > which python
/Users/arnau.tibau/miniconda3/envs/ml_in_prod/bin/python
# $PATH's first element is the conda env bin folder
> echo $PATH
/Users/arnau.tibau/miniconda3/envs/ml_in_prod/bin: ....
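The same checks can be made from inside Python. The sketch below reports the environment variables that `conda activate` sets (`CONDA_DEFAULT_ENV`, `CONDA_PREFIX`) next to the first $PATH entry; `describe_active_env` is an illustrative name.

```python
import os
import sys


def describe_active_env(environ=os.environ):
    """Summarise which environment the current shell and interpreter point at."""
    path_head = environ.get("PATH", "").split(os.pathsep)[0]
    return {
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # None if no env is active
        "conda_prefix": environ.get("CONDA_PREFIX"),
        "first_path_entry": path_head,
        "python_prefix": sys.prefix,  # root of the running interpreter
    }


if __name__ == "__main__":
    for key, value in describe_active_env().items():
        print(f"{key}: {value}")
```

If `python_prefix` and `conda_prefix` disagree, the interpreter running the script is probably not the one belonging to the activated environment.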
BONUS: Create a jupyter notebook kernel linked to this new environment
conda create -n practice_env python=3.7.3
conda activate practice_env
# TensorFlow 2.1 is not available for macOS in any conda channel,
# so we skip it for now
conda install flask=1.1.1 pandas=1.0 scikit-learn=0.22.1
# We install TensorFlow 2.1 from pip
pip install tensorflow==2.1
# Example solution for env_test.py
# (only testing `pandas`)
def test_pandas():
    import pandas as pd
    pd.DataFrame({'a': [1]})

tests = {'pandas': test_pandas}
for name, test in tests.items():
    try:
        test()
    except Exception as e:
        print(f"Error! Tests failed for {name}: {e}")
    else:
        print(f"OK: Tests passed for {name}")
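A smoke test like the one above could be complemented by a version check. The sketch below compares installed versions against pins, treating each pin as a prefix (so `1.0` accepts `1.0.3`), which is looser than pip's `==`. `check_versions` and the `get_version` parameter are illustrative names, and the pins are the ones used when creating practice_env.

```python
from importlib import metadata


def check_versions(expected, get_version=metadata.version):
    """Return {package: problem} for every package that misses its pin."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        wanted_parts = wanted.split(".")
        # Compare component-wise so that a pin of 1.1 does not accept 1.10.0.
        if found.split(".")[: len(wanted_parts)] != wanted_parts:
            problems[name] = f"expected {wanted}, found {found}"
    return problems


if __name__ == "__main__":
    pins = {
        "flask": "1.1.1",
        "pandas": "1.0",
        "scikit-learn": "0.22.1",
        "tensorflow": "2.1",
    }
    issues = check_versions(pins)
    for name in pins:
        print(f"{name}: {issues.get(name, 'OK')}")
```

The lookup function is injectable so the check can be exercised without the packages being installed.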
conda env export > practice_env.yml
# note we give it a different name so as not to overwrite our own
conda env create -f practice_env.yml -n other_practice_env
BONUS: Create a jupyter kernel linked to practice_env
conda activate practice_env
conda install ipykernel
ipython kernel install --user --name=practice_env