# Create a file called `Makefile` and write these two lines
# (the second line must be indented with a tab, not spaces)
hello.txt:
	echo "Hello world!" > hello.txt
> make hello.txt
echo "Hello world!" > hello.txt
> cat hello.txt
Hello world!
# recipe for `target_1`
target_1: dependencies_1
	task_1
...
# recipe for `target_N`
target_N: target_1 target_5
	task_N
(assuming both parents exist)
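To make the rule concrete, here is a simplified Python model (not make itself) of how make decides whether to run a recipe: it first brings every prerequisite up to date, then rebuilds the target only if the target is missing or older than one of its prerequisites. The `rules` table, `make` and `mtime` are hypothetical names for illustration; real make also handles phony targets, pattern rules, parallelism and much more.

```python
import os
from typing import Callable, Dict, List, Tuple

# Hypothetical rule table: target -> (prerequisites, recipe)
Rule = Tuple[List[str], Callable[[], None]]


def mtime(path: str) -> float:
    """Last modification time of `path`, or -inf if the file does not exist."""
    return os.path.getmtime(path) if os.path.exists(path) else float("-inf")


def make(target: str, rules: Dict[str, Rule], ran: List[str]) -> None:
    """Bring `target` up to date, appending every target whose recipe ran to `ran`."""
    if target not in rules:
        # A file without a rule is a source file: it just has to exist
        if not os.path.exists(target):
            raise FileNotFoundError(f"No rule to make target '{target}'")
        return
    prerequisites, recipe = rules[target]
    # First bring every parent (prerequisite) up to date, depth-first
    for prerequisite in prerequisites:
        make(prerequisite, rules, ran)
    # Then rebuild the target if it is missing or older than any parent
    if not os.path.exists(target) or any(mtime(p) > mtime(target) for p in prerequisites):
        recipe()
        ran.append(target)
```

Calling `make` twice in a row on the same target runs the recipe only the first time, which mirrors what happens when running `make hello.txt` twice.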
.setup_done: environment.yml
	conda env create --force -f environment.yml
	mkdir -pv data
	touch .setup_done
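We may be tempted to add targets such as this one:
train: data/model.tar.gz
However, `train` is the name of an action, not of a file that a recipe produces: if a file called `train` ever exists, make will check that file's timestamp as if it were the target. We can tell make that `train` is not a file by declaring it phony:
.PHONY: train
train: data/model.tar.gz
And now make treats `train` as always out-of-date, whether or not a file with that name exists.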
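A minimal Python sketch of why `.PHONY` matters, assuming a simplified out-of-date check (the function `is_out_of_date` and its arguments are hypothetical, not part of make): a phony target skips the file lookup entirely, so a stray file named `train` can no longer make it look up-to-date.

```python
import os
from typing import Iterable, Set


def is_out_of_date(target: str, prerequisites: Iterable[str], phony: Set[str]) -> bool:
    """Simplified model of make's out-of-date check with .PHONY support.

    Assumes every prerequisite already exists (real make would build it first).
    """
    if target in phony:
        # A phony target is never looked up on disk: it is always out-of-date
        return True
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)
```

With a file named `train` newer than `data/model.tar.gz`, the check returns `False` unless `train` is in the phony set.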
We can define variables within a Makefile:
# no trailing slash here: `data/` would produce `data//model.tar.gz`, a different target name
DATA_FOLDER = data
ARTEFACT_PATH = $(DATA_FOLDER)/model.tar.gz
.PHONY: train
train: $(ARTEFACT_PATH)
We can even define them in a separate file and include them:
include Makefile.conf # Filename is irrelevant
.PHONY: train
train: $(ARTEFACT_PATH)
where `Makefile.conf` would contain the variable definitions (`DATA_FOLDER` and `ARTEFACT_PATH` above).
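A very useful feature is that variables can be overridden from outside the Makefile, either on the command line or through environment variables:
# `?=` only assigns a value if DATA_FOLDER is not already defined (e.g. in the environment).
# A command-line assignment such as DATA_FOLDER=/tmp/data overrides the Makefile value either way.
DATA_FOLDER ?= data
ARTEFACT_PATH = $(DATA_FOLDER)/model.tar.gz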
.PHONY: train
train: $(ARTEFACT_PATH)
And now we can run, for example:
> make train DATA_FOLDER=/tmp/data
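The precedence rules can be summarized in a small Python model. This is a sketch under simplifying assumptions: plain string values only, and no `override` directive, `-e` flag or recursive expansion; `resolve` and the tuple format are hypothetical.

```python
from typing import Dict, List, Tuple

# Each Makefile assignment is (name, operator, value), with operator "=" or "?="
Assignment = Tuple[str, str, str]


def resolve(
    assignments: List[Assignment],
    environment: Dict[str, str],
    command_line: Dict[str, str],
) -> Dict[str, str]:
    """Simplified model of GNU make's variable precedence."""
    values = dict(environment)  # environment variables start out defined
    for name, op, value in assignments:
        if name in command_line:
            continue  # command-line assignments win over ordinary Makefile ones
        if op == "?=" and name in values:
            continue  # ?= only assigns when the variable is still undefined
        values[name] = value  # a plain = overrides the environment
    values.update(command_line)
    return values
```

In this model `make train DATA_FOLDER=/tmp/data` wins with either operator, while an environment variable such as `DATA_FOLDER=/tmp/data make train` only wins because of `?=`.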
# Simple example Makefile
.setup_done:
	touch .setup_done
data.gz: .setup_done
# `$@` below is an automatic variable that expands to the name of the target
	touch $@
model.gz: data.gz
	touch $@
# This will not run the targets but will show you the execution plan
> make model.gz --dry-run
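The plan that make follows is a topological order of the dependency graph. A minimal sketch using Python's standard `graphlib` module (Python 3.9+), assuming nothing has been built yet so every target needed by the goal must run; unlike `--dry-run`, it ignores timestamps. The `dag` dictionary and `plan` function are illustrative names.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# The DAG of the simple example above: target -> its prerequisites
dag: Dict[str, List[str]] = {
    ".setup_done": [],
    "data.gz": [".setup_done"],
    "model.gz": ["data.gz"],
}


def plan(goal: str, dag: Dict[str, List[str]]) -> List[str]:
    """Targets needed for `goal`, ordered so prerequisites come first."""
    needed, stack = set(), [goal]
    while stack:  # collect the goal and all of its ancestors
        target = stack.pop()
        if target not in needed:
            needed.add(target)
            stack.extend(dag[target])
    sub_dag = {t: dag[t] for t in needed}
    return list(TopologicalSorter(sub_dag).static_order())


print(plan("model.gz", dag))  # ['.setup_done', 'data.gz', 'model.gz']
```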
.setup_done:
	touch .setup_done
data_1.gz: .setup_done
	touch $@
data_2.gz: .setup_done
	touch $@
model.gz: data_1.gz data_2.gz
	touch $@
# The order of execution might vary but it always respects the DAG
> make model.gz --jobs 2
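Which targets can run at the same time follows from the same DAG. The sketch below groups the example's targets into "waves" with `graphlib.TopologicalSorter`; this is a simplified view, since make itself starts any ready job as soon as one of the `--jobs` slots frees up rather than waiting for a whole wave. `parallel_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# The DAG of the parallel example above: target -> its prerequisites
dag: Dict[str, List[str]] = {
    ".setup_done": [],
    "data_1.gz": [".setup_done"],
    "data_2.gz": [".setup_done"],
    "model.gz": ["data_1.gz", "data_2.gz"],
}


def parallel_batches(dag: Dict[str, List[str]]) -> List[List[str]]:
    """Group targets into waves whose members only depend on earlier waves."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every prerequisite is already done
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(parallel_batches(dag))
```

Here `data_1.gz` and `data_2.gz` land in the same wave, which is why `--jobs 2` can build them concurrently.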
Sometimes make’s behavior can be a bit puzzling…
Since make runs each recipe line in a new shell, activating the conda env in one line would not carry over to the next one, so we need to explicitly call the pip and python from our conda env:
# replace with your conda root folder:
CONDA_ROOT = /Users/arnau.tibau/miniconda3/
ENV_NAME = ml_in_prod
PIP = $(CONDA_ROOT)/envs/$(ENV_NAME)/bin/pip
PYTHON = $(CONDA_ROOT)/envs/$(ENV_NAME)/bin/python
.setup_done:
# let's simulate having our own package
	echo "from setuptools import setup; setup()" > setup.py
# Installs our package in the desired environment
	$(PIP) install -e .
	touch .setup_done
model.gz: data.gz
# Runs the script under the right environment
	$(PYTHON) my_training_script.py data.gz
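The "new shell per line" behavior can be reproduced with Python's `subprocess` module: a variable exported in one `sh -c` call is gone in the next, just as the changes `conda activate` makes to the shell environment would be. The helper names and the `DEMO_ACTIVE_ENV` variable are hypothetical; the one-shell variant approximates what `.ONESHELL` does.

```python
import subprocess
from typing import List


def run_lines_separately(lines: List[str]) -> List[str]:
    """Run each line in its own `sh -c`, like make does by default."""
    return [
        subprocess.run(["sh", "-c", line], capture_output=True, text=True).stdout
        for line in lines
    ]


def run_in_one_shell(lines: List[str]) -> str:
    """Run all lines in a single shell, roughly like make with .ONESHELL."""
    script = "\n".join(lines)
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True).stdout


lines = ["export DEMO_ACTIVE_ENV=ml_in_prod", 'echo "active: $DEMO_ACTIVE_ENV"']
print(run_lines_separately(lines))  # the variable is empty in the second shell
print(run_in_one_shell(lines))      # the variable survives within one shell
```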
What would this recipe do?
A B: C D
	touch A
	echo "B" > B
b) It would generate both files (A and B) whenever C *or* D has changed since the last execution
Do these two Makefiles have the same behavior when you run make all? No!
# Makefile #1: regenerates both A and B whenever either of them
# is out-of-date (e.g. when C changes)
A B: C
	touch A
	touch B
all: A B
# Makefile #2: regenerates A and/or B when C changes,
# each one only if it is out-of-date with respect to C
A: C
	touch A
B: C
	touch B
all: A B
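A small Python simulation makes the difference visible, assuming integer timestamps and that `touch` sets a file's time to `now`. `A B: C` is modeled as two rules that share one recipe touching both files; `make_all` and the rule tables are hypothetical names. When only B is missing, Makefile #1 also re-touches A, while Makefile #2 touches B alone.

```python
from typing import Dict, List, Set, Tuple

# target -> (prerequisites, files the recipe touches)
Rules = Dict[str, Tuple[List[str], List[str]]]

# Makefile #1: `A B: C` is shorthand for two rules sharing one recipe
makefile_1: Rules = {"A": (["C"], ["A", "B"]), "B": (["C"], ["A", "B"])}
# Makefile #2: one rule per target, each touching only its own file
makefile_2: Rules = {"A": (["C"], ["A"]), "B": (["C"], ["B"])}


def make_all(rules: Rules, mtimes: Dict[str, int], now: int) -> Set[str]:
    """Simulate `make all` (with `all: A B`) and return the files touched."""
    touched: Set[str] = set()
    for target in ["A", "B"]:  # `all: A B` visits A first, then B
        prerequisites, outputs = rules[target]
        missing = target not in mtimes
        if missing or any(mtimes[p] > mtimes[target] for p in prerequisites):
            for output in outputs:
                mtimes[output] = now  # `touch` sets the timestamp to "now"
                touched.add(output)
    return touched


# B was deleted, A is newer than C
print(make_all(makefile_1, {"A": 2, "C": 1}, now=3))  # touches A and B
print(make_all(makefile_2, {"A": 2, "C": 1}, now=3))  # touches only B
```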
Write a Makefile with at least:
1. A setup target that creates a conda environment from the YAML file we created in the last chapter’s Practice time and runs the corresponding test_env.py
You can find inspiration in the capstone project Makefile
SHELL := bash
.ONESHELL: # instructs make to use the same shell within a recipe
.SHELLFLAGS := -eu -o pipefail -c # a few recommended shell flags
ENV_NAME ?= practice_env
CONDA_ROOT ?= /Users/arnau.tibau/miniconda3/
PYTHON = $(CONDA_ROOT)/envs/$(ENV_NAME)/bin/python
.setup_done: environment.yml
	echo "Creating and testing environment"
	conda env create --force -f environment.yml -n $(ENV_NAME)
	$(PYTHON) test_env.py
	touch .setup_done
.PHONY: setup
setup: .setup_done
.PHONY: clean
clean: .setup_done
	echo "Cleaning up environment $(ENV_NAME)..."
	conda env remove -n $(ENV_NAME)
	rm .setup_done
# let's download a couple of files
download_data:
	wget https://www.w3.org/TR/PNG/iso_8859-1.txt http://humanstxt.org/humans.txt
# this will generate a .words file for each .txt file
%.words : %.txt
# the `$<` variable refers to the first prerequisite
# the `$@` variable refers to the target
	cat $< | wc -w > $@
CODE = "import sys;print(sum(int(open(v).read()) for v in sys.argv[1:]))"
summary : iso_8859-1.words humans.words
	python -c $(CODE) $(wildcard *.words) > $@
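Two pieces of this last Makefile can be spelled out in Python. `match_pattern` is a simplified model of how a single-`%` rule such as `%.words : %.txt` maps a requested target to its prerequisite (it ignores make's special handling of directories), and `summarize` is a readable version of the `$(CODE)` one-liner. Both names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def match_pattern(target: str, target_pattern: str, prerequisite_pattern: str) -> Optional[str]:
    """Return the prerequisite a `%` rule would require for `target`, or None."""
    prefix, suffix = target_pattern.split("%")
    if len(target) <= len(prefix) + len(suffix):
        return None  # `%` must match a non-empty stem
    if not (target.startswith(prefix) and target.endswith(suffix)):
        return None  # the rule does not apply to this target
    stem = target[len(prefix): len(target) - len(suffix)]
    return prerequisite_pattern.replace("%", stem)


def summarize(paths: Iterable[str]) -> int:
    """Add up the word counts stored in the given .words files."""
    # int() tolerates the padding and trailing newline that `wc -w` may emit
    return sum(int(Path(p).read_text()) for p in paths)
```

For instance, `match_pattern("humans.words", "%.words", "%.txt")` yields `"humans.txt"`, and `summarize(sys.argv[1:])` would reproduce the one-liner's behavior.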