Here’s the simplest possible class:
class Foo: pass
When we “call” a class, we instantiate it, returning an object of type Foo (also known as a class instance):
> foo = Foo()
> type(foo)
<class '__main__.Foo'>
In Python, everything is an object, and class members are either attributes (data) or methods (functions). Try for instance:
> foo = Foo()
> dir(foo) # inspects an object's members
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
...]
We haven’t defined any attributes or methods and yet our class already has many…
All Python (3+) classes implicitly inherit from object:
> o = object()
> dir(o)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__le__', '__lt__', '__ne__',
'__new__', '__str__', ...]
class MLModel:
# this is a class attribute
name = "MLModel"
# this is the class constructor
def __init__(self, parameters):
# example instance attribute
self.parameters = parameters
# When we call the class, it internally executes
# the __init__ method (constructor)
> ml_model = MLModel([1, 2, 3])
# we access the instance attribute
> ml_model.parameters
[1, 2, 3]
# class attributes are also accessible from the instance:
> ml_model.name
'MLModel'
Instance methods are functions defined in a class and called on its instances. They take the instance (self) as their first argument, together with any other arguments:
import pickle
class MLModel:
def __init__(self, parameters):
self.parameters = parameters
def save_parameters(self, path):
"""
        Saves the model parameters into a file named `path`
"""
with open(path, 'wb') as f:
            pickle.dump(self.parameters, f)
> ml_model = MLModel([1, 2, 3])
# When we call an instance method, the `self`
# parameter is passed implicitly
> ml_model.save_parameters('model.pkl')
Warning
Class members are mutable!
> ml_model = MLModel([1, 2, 3])
# we can freely override parameters
> ml_model.parameters = "whatever"
# or even define new methods and attach them to the instance
> ml_model.new_method = lambda x: 2*x
The @property decorator is a way to control attribute access: it can make an attribute read-only, or validate assignments so that setting an attribute doesn’t break things.
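A minimal sketch (the list-only validation rule is just for illustration):
class MLModel:
    def __init__(self, parameters):
        self.parameters = parameters  # goes through the setter below
    @property
    def parameters(self):
        return self._parameters
    @parameters.setter
    def parameters(self, value):
        # validate before assigning, so bogus values fail loudly
        if not isinstance(value, list):
            raise TypeError("parameters must be a list")
        self._parameters = value
> MLModel([1, 2, 3]).parameters = "whatever"  # raises TypeError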
Class methods apply to the class, not the instance. They are useful for things like factory methods:
class MLModel:
    ...
@classmethod
def load(cls, saved_parameters_path):
"""
Returns an instance of `MLModel` from the parameters saved via
`MLModel.save_parameters`
"""
with open(saved_parameters_path, 'rb') as f:
parameters = pickle.load(f)
return cls(parameters)
Classes can inherit from classes other than object:
class LogisticRegressionModel(MLModel):
def __init__(self, parameters):
# This calls LogisticRegressionModel's superclass constructor.
# It is equivalent to MLModel.__init__(self, parameters) but preferred,
# because it avoids hard-coding the parent class name
super().__init__(parameters)
def predict(self, x):
...
Now every instance of LogisticRegressionModel has a save_parameters method inherited from MLModel, and the LogisticRegressionModel class has a load method.
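For instance, assuming MLModel has both the save_parameters and load methods defined above:
> lr_model = LogisticRegressionModel([1, 2, 3])
> lr_model.save_parameters('lr_model.pkl')  # inherited from MLModel
> loaded = LogisticRegressionModel.load('lr_model.pkl')
> type(loaded)  # cls inside load is LogisticRegressionModel here
<class '__main__.LogisticRegressionModel'>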
import numpy as np
class LinearModel:
def __init__(self, coefficients):
self.coefficients = coefficients
def predict(self, x):
assert x.shape[1] == self.coefficients.shape[0]
return np.dot(x, self.coefficients)
class LogisticRegressionModel(MLModel, LinearModel):
def __init__(self, parameters):
super().__init__(parameters)
# We also need to call the constructor for `LinearModel`!
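        # super(MLModel, self) starts the method lookup in the MRO
        # *after* MLModel, i.e. at LinearModel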
super(MLModel, self).__init__(parameters)
def predict(self, x):
x_times_coef = super().predict(x)
return sigmoid(x_times_coef)
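The sigmoid helper is not defined in the snippet; a minimal version could be:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))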
When we run super().__init__(parameters), which of the 2 parent classes’ __init__ do we run?
Multiple inheritance is tricky, particularly when we have diamond inheritance or methods with the same name:
> LogisticRegressionModel.mro()
[<class '__main__.LogisticRegressionModel'>, <class '__main__.MLModel'>, <class '__main__.LinearModel'>, <class 'object'>]
When we run super().__init__(parameters), we will run MLModel.__init__: super() follows the MRO, and MLModel comes right after LogisticRegressionModel.
More on super: Super considered super
Mix-in classes are designed to be used with multiple inheritance:
class ModelSavingMixin:
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def save_parameters(self, path):
with open(path, 'wb') as f:
# this will fail unless `parameters` exist!
            pickle.dump(self.parameters, f)
@classmethod
def load(cls, saved_parameters_path):
with open(saved_parameters_path, 'rb') as f:
parameters = pickle.load(f)
return cls(parameters)
class LogisticRegressionModel(ModelSavingMixin, LinearModel):
def __init__(self, parameters):
super().__init__(parameters)
        # No longer need to call the constructor for `LinearModel` explicitly:
        # ModelSavingMixin.__init__ forwards the call via super()
def predict(self, x):
x_times_coef = super().predict(x)
return sigmoid(x_times_coef)
The LogisticRegressionModel now has saving and loading functionality inherited from ModelSavingMixin
We still have a problem…
class ModelSavingMixin:
...
def save_parameters(self, path):
with open(path, 'wb') as f:
# this will fail unless `parameters` exist!
            pickle.dump(self.parameters, f)
Some models may have a vector of parameters, others are parameterized differently… How can we make this more generic, so that it can be reused?
This is a good reason to define the mix-in as an abstract class:
class AbstractModelSavingMixin:
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def get_parameters(self):
raise NotImplementedError
def save_parameters(self, path):
with open(path, 'wb') as f:
# this will fail unless we override self.get_parameters()
            pickle.dump(self.get_parameters(), f)
@classmethod
def load(cls, saved_parameters_path):
with open(saved_parameters_path, 'rb') as f:
parameters = pickle.load(f)
return cls(parameters)
For now the only thing that makes the class abstract is its name and the unimplemented get_parameters()…
The abc (abstract base classes) module is useful to make it behave like a regular abstract class:
import abc
# inherits from abc.ABC, which internally sets the metaclass to abc.ABCMeta
# cf: https://realpython.com/python-metaclasses/
class AbstractModelSavingMixin(abc.ABC):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
@abc.abstractmethod
def get_parameters(self):
"""Retrieves the parameters of the model as a pickable object"""
def save_parameters(self, path):
with open(path, 'wb') as f:
            # get_parameters must be overridden in a concrete subclass;
            # abc prevents instantiating the class until it is
            pickle.dump(self.get_parameters(), f)
...
And now we just need to implement the get_parameters method in its subclasses:
class LogisticRegressionModel(AbstractModelSavingMixin, LinearModel):
def __init__(self, parameters):
super().__init__(parameters)
    def get_parameters(self):
        # the implementation varies depending on the model;
        # LinearModel stores its parameters in `coefficients`
        return self.coefficients
def predict(self, x):
x_times_coef = super().predict(x)
return sigmoid(x_times_coef)
> model = LogisticRegressionModel(np.ones(3))
> model.save_parameters('model.tmp')
> loaded_model = LogisticRegressionModel.load('model.tmp')
Any class subclassing AbstractModelSavingMixin must implement get_parameters before it can be instantiated, and it passes the usual issubclass / isinstance checks:
> model = LogisticRegressionModel(np.ones(3))
> issubclass(LogisticRegressionModel, AbstractModelSavingMixin)
True
> isinstance(model, AbstractModelSavingMixin)
True
More info: Abstract Base Classes from PyMOTW
Warning
Beware of building abstractions for problems we don’t yet understand! It’s often better to start with concrete implementations.
“If it walks like a duck and it quacks like a duck, then it must be a duck”
Let’s practice by improving the way we implemented text pre-processing in the capstone project.
1. Take a look at how we apply text pre-processing in the LocalTextCategorizationDataset class in the preprocessing package, and at how we instantiate this class in the train module.
2. Let’s think about and implement a design that: (1) ensures the function we pass to the LocalTextCategorizationDataset constructor abides by an expected interface, and (2) enables other developers to build other pre-processing implementations that will work (one possible direction is sketched below).
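A sketch with illustrative names (not the capstone’s actual API): define an abstract base class that fixes the pre-processing interface, and have LocalTextCategorizationDataset accept any instance of it.
import abc

class TextPreprocessor(abc.ABC):
    @abc.abstractmethod
    def __call__(self, text: str) -> str:
        """Transforms a raw text into a pre-processed text"""

class LowercasePreprocessor(TextPreprocessor):
    def __call__(self, text):
        return text.lower()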
A Python “module” is a single namespace, with a collection of values:
# hello_world.py
hello_str = "hello {}"
def hello(name): return hello_str.format(name)
> from hello_world import hello
> hello("arnau")
A package is a directory with a file called __init__.py and any number of modules or other package directories:
greetings
__init__.py
hello_world.py
spanish
__init__.py
hola_mundo.py
The __init__.py file can be empty or contain code; it runs when the package is imported (import greetings).
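For instance, a common pattern (sketched here) is to re-export the package’s public API in __init__.py, so users can write from greetings import hello:
# greetings/__init__.py
from .hello_world import hello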
In addition, a Python package is usually bundled with an installer/build script. setuptools is an extension of Python’s original packaging tool (distutils) that provides a number of functionalities:
setup.py # this is the installer/build script
greetings
__init__.py
hello_world.py
spanish
__init__.py
hola_mundo.py
# setup.py
import setuptools
setuptools.setup(
name='greetings',
version='0.0.1',
packages=setuptools.find_packages(),
    install_requires=['pandas >0.1,<1.0'],  # dependencies
    python_requires='>=3.6',  # python version
)
This was just a simple example; setup.py enables a lot more configuration (a couple of common options are sketched below).
It’s out of the scope of this training, but instead of passing arguments to the setuptools.setup function, one can also configure the build via a setup.cfg configuration file.
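For instance (a sketch; the greet script assumes a hypothetical zero-argument main function in hello_world.py):
# setup.py (extending the example above)
import setuptools
setuptools.setup(
    name='greetings',
    version='0.0.1',
    packages=setuptools.find_packages(),
    # optional dependency groups: `pip install "greetings[dev]"`
    extras_require={'dev': ['pytest']},
    # command-line scripts generated at install time
    entry_points={
        'console_scripts': ['greet=greetings.hello_world:main'],
    },
)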
With the setup.py ready, we can:
# build a (source or wheel) distribution
> python setup.py sdist bdist_wheel
# install locally
> python setup.py install
# or
> pip install .
# or install in develop/editable mode
> python setup.py develop
> pip install -e .
1. Create a greetings package with two sub-packages, english and spanish, each containing a simple function to greet in the corresponding language, using the termcolor package to print the greeting in color.
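A sketch of what one of the modules could look like (termcolor’s colored(text, color) helper is real; the layout follows the exercise):
# greetings/english/hello_world.py
from termcolor import colored

def hello(name):
    print(colored(f"hello {name}", "green"))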
import time
# here we define the decorator, which is a higher-order function
def timeit(f):
def timed_f(*args, **kwargs):
tic = time.time()
val = f(*args, **kwargs)
print(f"Call to {f.__name__} took: {time.time() - tic}")
return val
return timed_f
@timeit
def sum(x, y): return x + y
>>> sum(3, 4)
Call to sum took: 3.814697265625e-06
7
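One caveat: timed_f hides the decorated function’s name and docstring (sum.__name__ becomes 'timed_f'). The standard fix is functools.wraps:
import functools

def timeit(f):
    @functools.wraps(f)  # preserves f.__name__, f.__doc__, etc.
    def timed_f(*args, **kwargs):
        tic = time.time()
        val = f(*args, **kwargs)
        print(f"Call to {f.__name__} took: {time.time() - tic}")
        return val
    return timed_f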
There are a few built-in decorators: @classmethod, @staticmethod, and @property.
And several typical use cases: timing (as above), caching/memoization (e.g. functools.lru_cache), logging, and access control.
>>> hello_worlds = HelloWorldIterator(10)
>>> for hello_world in hello_worlds: print(hello_world)
Let’s build a HelloWorldIterator!
class HelloWorldIterator:
def __init__(self, n):
self.n = n
self.current = 0
def __iter__(self):
# here we could return any object implementing __next__
# for simplicity, we return self which implements __next__ :)
return self
def __next__(self):
if self.current < self.n:
self.current += 1
return "Hello!"
else:
raise StopIteration
As you can imagine, there are several objects in Python with built-in iterators:
You can check by yourself:
>>> dir(dict()) # or dir({'a': 1})
[.... '__iter__', ....]
>>> dict_iterator = dict().__iter__()
>>> dir(dict_iterator)
[.... '__next__', ....]
The iterator pattern is a bit cumbersome… In many cases we can accomplish the same using the generator pattern:
def hello_world_generator(n):
for _ in range(n):
yield "Hello!"
>>> for hello_world in hello_world_generator(10): print(hello_world)
Or, even more succinct, using a generator expression:
>>> hello_world_generator_2 = ("hello" for _ in range(10))
>>> for hello_world in hello_world_generator_2: print(hello_world)
The most important thing about generators (and some iterators) is that they are lazily evaluated: their elements are not computed or stored in memory until they are needed (unlike lists, dicts, etc.).
Example applications: streaming over files or datasets too large to fit in memory, representing infinite sequences, and chaining data-processing pipelines.
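For instance, a generator can represent an infinite sequence, of which only the consumed elements are ever computed:
import itertools

def squares():
    # an infinite sequence: impossible as a list, fine as a generator
    n = 0
    while True:
        yield n * n
        n += 1

>>> list(itertools.islice(squares(), 5))  # only 5 elements are ever computed
[0, 1, 4, 9, 16]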