Programming in Python for DS

OOP in Python

Classes and objects

Here’s the simplest possible class:

class Foo: pass

When we “call” a class, we instantiate it, returning an object of type Foo (also known as a class instance):

> foo = Foo()
> type(foo)
<class '__main__.Foo'>

Classes and objects (2)

In Python, everything is an object. Try for instance:
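
A minimal sketch of things to try (the names `double` and `Foo` are only illustrative): integers, strings, functions, classes, modules and even `None` are all instances of `object`.

```python
import math


def double(x):
    return 2 * x


class Foo:
    pass


# Values of very different kinds, all of them objects
examples = [1, "text", double, Foo, math, None]
for value in examples:
    print(type(value).__name__, isinstance(value, object))

# Even an integer literal has methods, and functions have attributes
print((5).bit_length())
print(double.__name__)
```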

Classes have members

Class members are either attributes (data) or methods (functions). We can list them with dir:

> foo = Foo()
> dir(foo) # lists the object's members
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
 '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
  ...]

We haven’t defined any attributes or methods and yet our class already has many…

Classes have members (2)

All Python (3+) classes implicitly inherit from object:

> o = object()
> dir(o)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__',
 '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
 '__init_subclass__', '__le__', '__lt__', '__ne__',
 '__new__', ..., '__str__', ...]
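
A quick way to check the implicit inheritance, as a sketch (`Foo` and `Bar` are illustrative names): the base class of a class declared without parents is `object`, and everything `object` defines shows up in the subclass.

```python
class Foo:
    pass


class Bar(object):  # explicit form, equivalent in Python 3
    pass


print(Foo.__bases__)
print(issubclass(Foo, object))

# Every member of `object` is also a member of `Foo`
inherited = set(dir(object)) <= set(dir(Foo))
print(inherited)

# Members that `Foo` has in addition to those of `object`
print(sorted(set(dir(Foo)) - set(dir(object))))
```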

Defining class and instance attributes

class MLModel:
    # this is a class attribute
    name = "MLModel"
    # this is the class constructor
    def __init__(self, parameters):
        # example instance attribute
        self.parameters = parameters

Defining class and instance attributes (2)

# When we call the class, it internally executes
# the __init__ method (constructor)
> ml_model = MLModel([1, 2, 3])
# we access the instance attribute
> ml_model.parameters
[1, 2, 3]
# class attributes are also accessible from the instance:
> ml_model.name
'MLModel'
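
One subtlety worth sketching: reading a class attribute through an instance works, but assigning through an instance creates a new instance attribute that shadows the class attribute for that instance only. The values used here are illustrative.

```python
class MLModel:
    name = "MLModel"  # class attribute, shared by all instances

    def __init__(self, parameters):
        self.parameters = parameters  # instance attribute


a = MLModel([1, 2, 3])
b = MLModel([4, 5, 6])

# Assignment through the instance shadows the class attribute on `a` only
a.name = "my model"
print(a.name, b.name, MLModel.name)

# Rebinding on the class is visible to instances that do not shadow it
MLModel.name = "BaseModel"
print(a.name, b.name)
```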

Defining instance methods

Instance methods are functions defined in a class that operate on its instances. They take the instance (conventionally named self) as their first argument, followed by any other arguments.

import pickle

class MLModel:
    def __init__(self, parameters):
        self.parameters = parameters
    def save_parameters(self, path):
        """
        Saves the model parameters into a file named `path`
        """
        with open(path, 'wb') as f:
            # pickle.dump takes the object first, then the file
            pickle.dump(self.parameters, f)

Defining instance methods (2)

> ml_model = MLModel([1, 2, 3])
# When we call an instance method, the `self`
# parameter is passed implicitly
> ml_model.save_parameters('model.pkl')

Defining instance methods (3)

Warning

Class members are mutable!

> ml_model = MLModel([1, 2, 3])
# we can freely override parameters
> ml_model.parameters = "whatever"
# or even define new functions and attach them to the instance
> ml_model.new_method = lambda x: 2*x

The @property decorator is a way to make an attribute read-only, or to validate assignments so that they don’t break things.
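
A minimal sketch of both uses, assuming we store the value in a conventionally private `_parameters` attribute (`ValidatedMLModel` and its validation rule are hypothetical, for illustration only):

```python
class MLModel:
    def __init__(self, parameters):
        self._parameters = list(parameters)

    @property
    def parameters(self):
        # Getter only: assigning to `model.parameters` raises AttributeError
        return self._parameters


class ValidatedMLModel:
    def __init__(self, parameters):
        self.parameters = parameters  # goes through the setter below

    @property
    def parameters(self):
        return self._parameters

    @parameters.setter
    def parameters(self, value):
        # Reject anything that is not a sequence of numbers
        if not all(isinstance(v, (int, float)) for v in value):
            raise TypeError("parameters must be numbers")
        self._parameters = list(value)


model = MLModel([1, 2, 3])
try:
    model.parameters = "whatever"
except AttributeError as error:
    print("read-only:", error)

validated = ValidatedMLModel([1, 2, 3])
validated.parameters = [4.0, 5]  # accepted
try:
    validated.parameters = "whatever"
except TypeError as error:
    print("rejected:", error)
```

Note that `_parameters` is still reachable from outside; the leading underscore is only a convention.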

Defining class methods

Class methods apply to the class, not the object: they receive the class (cls) as their first argument. They are useful for things like factory methods.

class MLModel:

    ....

    @classmethod
    def load(cls, saved_parameters_path):
        """
        Returns an instance of `MLModel` from the parameters saved via
        `MLModel.save_parameters`
        """
        with open(saved_parameters_path, 'rb') as f:
            parameters = pickle.load(f)
            return cls(parameters)
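
A self-contained sketch of the save/load round trip, assuming a temporary directory as the storage location (the file name `model.pkl` is illustrative):

```python
import os
import pickle
import tempfile


class MLModel:
    def __init__(self, parameters):
        self.parameters = parameters

    def save_parameters(self, path):
        with open(path, "wb") as f:
            pickle.dump(self.parameters, f)  # object first, file second

    @classmethod
    def load(cls, saved_parameters_path):
        with open(saved_parameters_path, "rb") as f:
            return cls(pickle.load(f))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    MLModel([1, 2, 3]).save_parameters(path)
    # `load` is called on the class itself: no instance is needed beforehand
    restored = MLModel.load(path)

print(type(restored).__name__, restored.parameters)
```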

Inheritance

Classes can inherit from classes other than object:

class LogisticRegressionModel(MLModel):
    def __init__(self, parameters):
        # This calls LogisticRegressionModel's superclass constructor.
        # It is equivalent to MLModel.__init__(self, parameters) but preferred,
        # because it avoids hard-coding the parent class name
        super().__init__(parameters)

    def predict(self, x):
        ...

Now every instance of LogisticRegressionModel has a save_parameters method inherited from MLModel, and the LogisticRegressionModel class has a load method.

Multiple Inheritance

import numpy as np
class LinearModel:
    def __init__(self, coefficients):
        self.coefficients = coefficients
    def predict(self, x):
        assert x.shape[1] == self.coefficients.shape[0]
        return np.dot(x, self.coefficients)

class LogisticRegressionModel(MLModel, LinearModel):
    def __init__(self, parameters):
        super().__init__(parameters)
        # We also need to call the constructor for `LinearModel`!
        # super(MLModel, self) starts the lookup after MLModel in the MRO
        super(MLModel, self).__init__(parameters)
    def predict(self, x):
        x_times_coef = super().predict(x)
        # `sigmoid` is assumed to be defined elsewhere,
        # e.g. as 1 / (1 + np.exp(-z))
        return sigmoid(x_times_coef)

Multiple Inheritance (2)

When we run super().__init__(parameters), which of the 2 parent classes’ __init__ do we run?

Multiple Inheritance (3)

Multiple inheritance is tricky, particularly when we have diamond inheritance or methods with the same name:

> LogisticRegressionModel.mro()
[<class '__main__.LogisticRegressionModel'>, <class '__main__.MLModel'>, <class '__main__.LinearModel'>, <class 'object'>]

More on super: Super considered super

Multiple Inheritance (4)

When we run super().__init__(parameters), we will run MLModel.__init__, because MLModel comes right after LogisticRegressionModel in the MRO.
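
To see how super() follows the MRO rather than the direct parent, here is a small sketch with a diamond hierarchy. The classes `Base`, `A`, `B` and `C` are hypothetical and only record the order in which their constructors run.

```python
calls = []  # records the order in which constructors run


class Base:
    def __init__(self):
        calls.append("Base")


class A(Base):
    def __init__(self):
        calls.append("A")
        super().__init__()  # next class in the MRO of the *instance*


class B(Base):
    def __init__(self):
        calls.append("B")
        super().__init__()


class C(A, B):
    def __init__(self):
        calls.append("C")
        super().__init__()


C()
print([klass.__name__ for klass in C.mro()])
print(calls)
```

When the instance is a `C`, the `super().__init__()` inside `A` runs `B.__init__`, not `Base.__init__`: each class appears exactly once, in MRO order.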

Mixin classes

Mix-in classes are designed to be used with multiple inheritance:

Mixin classes (2)

class ModelSavingMixin:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    def save_parameters(self, path):
        with open(path, 'wb') as f:
            # this will fail unless `parameters` exists!
            pickle.dump(self.parameters, f)
    @classmethod
    def load(cls, saved_parameters_path):
        with open(saved_parameters_path, 'rb') as f:
            parameters = pickle.load(f)
            return cls(parameters)

Mixin classes (3)

class LogisticRegressionModel(ModelSavingMixin, LinearModel):
    def __init__(self, parameters):
        super().__init__(parameters)
        # No longer need to call the constructor for `LinearModel`!
    def predict(self, x):
        x_times_coef = super().predict(x)
        return sigmoid(x_times_coef)

The LogisticRegressionModel now has saving and loading functionality inherited from ModelSavingMixin

Mixin classes (4)

We still have a problem…

class ModelSavingMixin:
    ...
    def save_parameters(self, path):
        with open(path, 'wb') as f:
            # this will fail unless `parameters` exists!
            pickle.dump(self.parameters, f)

Some models may have a vector of parameters, while others are parameterized differently. How can we make this generic enough to be reused?

Abstract classes and interfaces

Models storing their parameters in different ways is a good reason to define the mix-in as an abstract class:

class AbstractModelSavingMixin:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    def get_parameters(self):
        raise NotImplementedError
    def save_parameters(self, path):
        with open(path, 'wb') as f:
            # this will fail unless we override self.get_parameters()
            pickle.dump(self.get_parameters(), f)
    @classmethod
    def load(cls, saved_parameters_path):
        with open(saved_parameters_path, 'rb') as f:
            parameters = pickle.load(f)
            return cls(parameters)

Abstract classes and interfaces (2)

For now, the only things that make the class abstract are its name and the unimplemented get_parameters().

The abc (abstract base classes) module lets us enforce this: a class with unimplemented abstract methods cannot be instantiated.

Abstract classes and interfaces (3)

import abc
# inherits from abc.ABC, which internally sets a `metaclass`
# cf: https://realpython.com/python-metaclasses/
class AbstractModelSavingMixin(abc.ABC):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    @abc.abstractmethod
    def get_parameters(self):
        """Retrieves the parameters of the model as a picklable object"""
    def save_parameters(self, path):
        with open(path, 'wb') as f:
            # this will fail unless we override self.get_parameters()
            pickle.dump(self.get_parameters(), f)
    ...

Abstract classes and interfaces (4)

And now we just need to implement the get_parameters method in its subclasses:

class LogisticRegressionModel(AbstractModelSavingMixin, LinearModel):
    def __init__(self, parameters):
        super().__init__(parameters)
    def get_parameters(self):
        # the implementation could vary depending on the model.
        # Here `LinearModel.__init__` stores the parameters as `coefficients`
        return self.coefficients
    def predict(self, x):
        x_times_coef = super().predict(x)
        return sigmoid(x_times_coef)
> model = LogisticRegressionModel(np.ones(3))
> model.save_parameters('model.tmp')
> loaded_model = LogisticRegressionModel.load('model.tmp')

Abstract classes and interfaces (5)

Any class subclassing AbstractModelSavingMixin will:

> model = LogisticRegressionModel(np.ones(3))
> issubclass(LogisticRegressionModel, AbstractModelSavingMixin)
> isinstance(model, AbstractModelSavingMixin)

More info: Abstract Base Classes from PyMOTW
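
A stripped-down sketch of what abc enforces (`IncompleteModel` and `CompleteModel` are hypothetical subclasses, and the saving logic is left out for brevity): a subclass that does not implement the abstract method cannot be instantiated, while a complete one passes the isinstance and issubclass checks.

```python
import abc


class AbstractModelSavingMixin(abc.ABC):
    @abc.abstractmethod
    def get_parameters(self):
        """Retrieves the parameters of the model as a picklable object"""


class IncompleteModel(AbstractModelSavingMixin):
    pass  # get_parameters is not implemented


class CompleteModel(AbstractModelSavingMixin):
    def __init__(self, parameters):
        self.parameters = parameters

    def get_parameters(self):
        return self.parameters


try:
    IncompleteModel()
except TypeError as error:
    print("cannot instantiate:", error)

model = CompleteModel([1, 2, 3])
print(isinstance(model, AbstractModelSavingMixin))
print(issubclass(CompleteModel, AbstractModelSavingMixin))
```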

Why bother about all of this!?

Warning

Beware of building abstractions for problems we don’t yet understand! It’s often better to start with concrete implementations.

Python’s philosophy: (Dynamic) Duck typing

“If it walks like a duck and it quacks like a duck, then it must be a duck”
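
As a sketch of duck typing in practice (the two model classes and `predict_all` are hypothetical): the function below never checks types, it only relies on the object having a `predict` method.

```python
class DoublingModel:
    def predict(self, x):
        return 2 * x


class ConstantModel:
    def predict(self, x):
        return 0


def predict_all(model, inputs):
    # No isinstance check: any object with a predict(x) method works
    return [model.predict(x) for x in inputs]


print(predict_all(DoublingModel(), [1, 2, 3]))
print(predict_all(ConstantModel(), [1, 2, 3]))

# An object without `predict` only fails when the method is looked up
try:
    predict_all(object(), [1])
except AttributeError as error:
    print("not a duck:", error)
```

The two classes share no base class; the failure for an unsuitable object happens at call time, not at definition time.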


Practice time

Let’s practice by improving the way we implemented text pre-processing in the capstone project.

1. Take a look at how we apply text pre-processing in the LocalTextCategorizationDataset class in the preprocessing package and how we instantiate this class in the train module.

2. What are the problems of this approach?

3. Let’s think about and implement a design that: (1) ensures the function we pass to the LocalTextCategorizationDataset constructor abides by an expected interface, and (2) enables other developers to build other pre-processing implementations that will work.

Packaging in Python

Modules

A Python “module” is a single namespace, with a collection of values:

# hello_world.py
hello_str = "hello {}"
def hello(name): return hello_str.format(name)
> from hello_world import hello
> hello("arnau")
'hello arnau'

Packages

A package is a directory with a file called __init__.py and any number of modules or other package directories:

greetings
   __init__.py
   hello_world.py
   spanish
     __init__.py
     hola_mundo.py

The __init__.py file can be empty or contain code; that code runs when the package is imported (import greetings).
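
A sketch that builds a throwaway greetings package in a temporary directory and imports it, to show the code in __init__.py running at import time. Using __init__.py to re-export `hello` is an assumption made for illustration, not something the package above is stated to do.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    package_dir = pathlib.Path(tmp_dir) / "greetings"
    package_dir.mkdir()
    (package_dir / "hello_world.py").write_text(
        'hello_str = "hello {}"\n'
        "def hello(name):\n"
        "    return hello_str.format(name)\n"
    )
    # This code runs on `import greetings`; here it re-exports `hello`
    (package_dir / "__init__.py").write_text("from .hello_world import hello\n")

    # Make the temporary directory importable just for this demonstration
    sys.path.insert(0, tmp_dir)
    try:
        greetings = importlib.import_module("greetings")
        message = greetings.hello("arnau")
    finally:
        sys.path.remove(tmp_dir)

print(message)
```

Thanks to the re-export, callers can write `greetings.hello(...)` instead of `greetings.hello_world.hello(...)`.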

Packages (2)

In addition, a Python package is usually bundled with:

setuptools

setuptools is an extension to Python’s original packaging tool (distutils) that provides a number of functionalities:

setuptools (2)

setup.py # this is the installer/build script
greetings
   __init__.py
   hello_world.py
   spanish
     __init__.py
     hola_mundo.py
# setup.py
import setuptools
setuptools.setup(
    name='greetings',
    version='0.0.1',
    packages=setuptools.find_packages(),
    install_requires=['pandas >0.1,<1.0'], # dependencies
    python_requires='>=3.6' # python version
)

setuptools (3)

This was just a simple example; setup.py enables a lot more configuration:

Although it is out of the scope of this training, the build can also be configured via a setup.cfg configuration file instead of passing arguments to the setuptools.setup function.

setuptools (4)

With the setup.py ready, we can:

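# note: recent setuptools versions deprecate invoking setup.py directly;
# the pip commands below are the preferred way to install

# build a (source or wheel) distribution
> python setup.py sdist bdist_wheel

# install locally
> python setup.py install
# or
> pip install .

# or install in develop/editable mode
> python setup.py develop
# or
> pip install -e .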

Practice time

1. Create a greetings package with two sub-packages, english and spanish, each containing a simple function to greet in the corresponding language, using the termcolor package to print the greeting in color.

  1. Write the setup.py script for the package, including the requirement for the termcolor dependency
  2. Install the package in develop mode (don’t forget to do it in the ml_in_prod environment!)
  3. Change some of the code and notice how the changes are reflected on the next import, without having to re-install

Other useful Python patterns

decorators

import time
# here we define the decorator, which is a higher-order function
def timeit(f):
    def timed_f(*args, **kwargs):
        tic = time.time()
        val = f(*args, **kwargs)
        print(f"Call to {f.__name__} took: {time.time() - tic}")
        return val
    return timed_f

@timeit
def sum(x, y): return x + y
>>> sum(3,4)
Call to sum took: 3.814697265625e-06
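
A slightly more defensive variant of the same decorator, as a sketch: functools.wraps copies the name and docstring of the wrapped function onto the wrapper, and time.perf_counter is a clock meant for measuring short durations. The function is named `add` here (an illustrative choice) to avoid shadowing the built-in sum.

```python
import functools
import time


def timeit(f):
    @functools.wraps(f)  # keeps f.__name__ and f.__doc__ on the wrapper
    def timed_f(*args, **kwargs):
        tic = time.perf_counter()
        val = f(*args, **kwargs)
        print(f"Call to {f.__name__} took: {time.perf_counter() - tic:.6f}s")
        return val
    return timed_f


@timeit
def add(x, y):
    """Returns x + y."""
    return x + y


result = add(3, 4)
# Without functools.wraps, the name would be reported as 'timed_f'
print(result, add.__name__, add.__doc__)
```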

decorators (2)

There are a few built-in decorators:

And several typical use cases:
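
The lists on this slide did not survive, so the following is only a sketch of decorators that certainly exist in Python and its standard library (property, classmethod, staticmethod, functools.lru_cache), with caching as one typical use case. The `Circle` class and `fibonacci` function are illustrative.

```python
import functools


class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):  # read like an attribute: circle.diameter
        return 2 * self.radius

    @classmethod
    def unit(cls):  # alternative constructor, receives the class
        return cls(1)

    @staticmethod
    def is_valid_radius(value):  # receives neither the class nor the instance
        return value > 0


@functools.lru_cache(maxsize=None)  # memoizes results per argument
def fibonacci(n):
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


print(Circle.unit().diameter)
print(Circle.is_valid_radius(-1))
print(fibonacci(30))
print(fibonacci.cache_info())
```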

iterators

>>> hello_worlds = HelloWorldIterator(10)
>>> for hello_world in hello_worlds: print(hello_world)

Let’s build a HelloWorldIterator!

iterators (2)

class HelloWorldIterator:
    def __init__(self, n):
        self.n = n
        self.current = 0
    def __iter__(self):
        # here we could return any object implementing __next__
        # for simplicity, we return self which implements __next__ :)
        return self
    def __next__(self):
        if self.current < self.n:
            self.current += 1
            return "Hello!"
        else:
            raise StopIteration
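
One consequence of returning self from __iter__ is that the object can only be traversed once. The sketch below shows this, together with a hypothetical `HelloWorlds` iterable that hands out a fresh iterator on every loop.

```python
class HelloWorldIterator:
    def __init__(self, n):
        self.n = n
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current < self.n:
            self.current += 1
            return "Hello!"
        raise StopIteration


class HelloWorlds:
    """Iterable (not an iterator): each loop gets a new iterator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return HelloWorldIterator(self.n)


iterator = HelloWorldIterator(3)
first_pass = list(iterator)
second_pass = list(iterator)  # the iterator is already exhausted
print(len(first_pass), len(second_pass))

iterable = HelloWorlds(3)
print(len(list(iterable)), len(list(iterable)))
```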

iterators (3)

As you can imagine, there are several objects in Python with built-in iterators:

You can check for yourself:

>>> dir(dict()) # or dir({'a': 1})
[.... '__iter__', ....]
>>> dict_iterator = dict().__iter__()
>>> dir(dict_iterator)
[.... '__next__', ....]
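
The built-in functions iter and next call these two methods for us. As a sketch, this is roughly what a for loop does behind the scenes (the `scores` dictionary is illustrative):

```python
scores = {"a": 1, "b": 2}

iterator = iter(scores)  # calls scores.__iter__()
keys = []
while True:
    try:
        key = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break  # the for loop stops silently at this point
    keys.append(key)

print(keys)
# An iterator is itself iterable: its __iter__ returns the iterator
print(iter(iterator) is iterator)
```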

generators

The iterator pattern is a bit cumbersome… In many cases we can accomplish the same using the generator pattern:

def hello_world_generator(n):
    for _ in range(n):
        yield "Hello!"
>>> for hello_world in hello_world_generator(10): print(hello_world)

Or, even more succinctly, using a generator expression:

>>> hello_world_generator_2 = ("hello" for _ in range(10))
>>> for hello_world in hello_world_generator_2: print(hello_world)

generators (2)

The most important thing about generators (and some iterators) is that they are lazily evaluated: their elements are not computed or stored in memory until it is their turn, unlike the elements of lists, dicts, etc.

Example applications:
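
A sketch that makes the laziness visible: the hypothetical `squares` generator is infinite and logs every element it computes, yet only the elements actually requested are ever produced.

```python
import itertools

log = []  # records which elements have actually been computed


def squares():
    n = 0
    while True:  # infinite: only feasible because evaluation is lazy
        log.append(n)
        yield n * n
        n += 1


generator = squares()
print(log)  # nothing has run yet

# islice pulls exactly three elements and then stops
first_three = list(itertools.islice(generator, 3))
print(first_three)
print(log)
```

The same idea applies to streams that are too large to hold in memory, since elements are produced one at a time.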
