Applying mypy to real world projects

Some hints and tips for getting started with Mypy and introducing it to existing projects

I think static typing can be very oversold.

All the same, mypy offers quite a lot of benefits for how minimally invasive it is. Here are some ideas, in rough order of importance, for how to add typing to an existing Python project.

First ensure that mypy really is being run

Two very common initial problems I've seen are

  1. mypy is not running as part of the build
  2. mypy is running such that it doesn't actually find any of your source files or only finds some.

Mypy's permissive-by-default nature makes both surprisingly easy.

Both of these situations are a real pain because you end up with people applying types that then aren't checked and which slowly become wrong and then very confusing.
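One way to guard against both problems is to tell mypy explicitly what to check, for example via the files option in mypy.ini (the paths here are illustrative), and then make the build fail on any mypy error:

```ini
[mypy]
files = src, tests
```

With files pinned down, a misconfigured run that silently checks nothing becomes much harder to achieve.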

Where to supply types manually

Mypy does type inference - that is, it can divine the types of values by examining the code around them. However, because of mypy's "gradual typing" (it falls back to the magic Any type when it is unsure), supplying types manually is more important in Python than in fully inferred languages like Haskell.

My current thinking is that you should aim to supply types for all function arguments and return types [1] (and anywhere else that mypy asks for help).

You do not generally need to apply types to variables - though that may help make your code clearer or help you work through a type error that you don't understand.
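As a small illustration of that rule of thumb (the function itself is made up):

```python
def normalise_email(address: str, lowercase: bool = True) -> str:
    # Arguments and return type are annotated; the local variable
    # `trimmed` needs no annotation - mypy infers str on its own.
    trimmed = address.strip()
    return trimmed.lower() if lowercase else trimmed

print(normalise_email("  Bob@Example.COM "))  # bob@example.com
```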

Optional comes up a lot

One of the most important types in practice is Optional. Optional is used for nullable values. For example:

from typing import Optional

def get_config_value(key: str) -> Optional[str]:
    # Either return the config value, or None if it's not present
    ...

A great deal of code will make use of Optional parameterised with another type. With Optional, mypy can check that None is handled wherever that value is used.

Optional is a simple type but it catches a disproportionate number of defects. It might be the single best part of the whole type-checking edifice.
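A sketch of the kind of defect it surfaces (the config lookup here is invented):

```python
from typing import Optional

CONFIG = {"host": "localhost"}

def get_config_value(key: str) -> Optional[str]:
    # Returns None when the key is absent
    return CONFIG.get(key)

value = get_config_value("port")
# mypy rejects using value.upper() directly here: value may be None.
# Narrowing with an explicit check satisfies it:
if value is not None:
    print(value.upper())
else:
    print("port not configured")
```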

Consider whether to include your tests

I don't know that there is any consensus on whether to include tests in your type checking. Some projects run mypy over the tests/ dir and some do not.

The main advantage of including tests in the type-checking run [2] is that you can find out more quickly that your applied types don't match expected usage, or that the types mypy is inferring don't match expected usage. This is particularly useful in code-bases which are new to typing and don't have a high proportion of applied types. They can also be used to improve IDE tab-completion.

That has to be balanced against the fact that, usually for the purposes of mocking and faking, tests will sometimes make odd use of your code and it can be a bit of a battle sometimes to get certain test patterns past the type-checker. This isn't too hard but doesn't feel like a productive use of time and can dilute the benefits of typing.

Be selective about the usage of third-party stubs

Some libraries include a great deal of runtime meta-programming hi-jinks. As a result, these Big Hairy Libraries don't generally provide much type information.

It can be annoying to use a Big Hairy Library in the centre of your program and not have access to type information for it. Some libraries have third-party stub files, for example for sqlalchemy there is sqlalchemy-stubs which provides some useful, but incomplete, types.

Not all Big Hairy Libraries have useful third-party stubs. As of writing I'm not convinced by any of the third-party stubs for boto. It seems that the best approach is to just learn to live with Any when calling AWS/Openstack APIs (but test thoroughly using moto).
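Where a library has no usable stubs, you can confine the Any-typing to it with a per-module section in mypy.ini, rather than ignoring missing imports globally (the module pattern here is illustrative):

```ini
[mypy-boto3.*]
ignore_missing_imports = True
```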

Escape hatches are occasionally necessary

Occasionally you will run into a situation where code is correct but mypy can't tell. There are a few ways to deal with this.

First (and probably best) is to use typing.cast, which tells mypy that you know better than it does. This maintains type checking throughout the code; mypy is now simply informed of one specific correction.

The second option is to explicitly type the value as Any. This disables checking for that particular value. You might use this if the value in question is a complicated object for which there is no easy type.

Third is to use the # type: ignore pragma. This is handy if the problem is not in determining a particular type but in some invariant of typing which mypy thinks is being broken but is not.

Consider adding a comment to explain the reasoning.
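As a sketch, here is how the first two escape hatches look in practice (the JSON shape and function are made up); the third, # type: ignore, simply goes as a comment on the offending line:

```python
import json
from typing import Any, cast

def payload_size(raw: str) -> int:
    parsed = json.loads(raw)  # mypy sees Any here
    # cast: assert that we know the real shape; checking resumes downstream
    record = cast(dict, parsed)
    # Any: opt one complicated value out of checking entirely
    extras: Any = record.get("extras")
    return len(record)

print(payload_size('{"id": 1, "extras": null}'))  # 2
```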

Prioritise a few choice strictness options

Mypy's mypy.ini file allows for a wide range of configuration. I have yet to see any real world project avoid having a mypy configuration. Here are my top picks, as a worked example:

[mypy]
check_untyped_defs = True
no_implicit_optional = True
ignore_missing_imports = True
Once you have your project firmly in the grip of mypy, I think it's a good idea to look at all of the extra strictness options that mypy offers. Starting with a low strictness level and working towards a high strictness level is a good strategy.

How to debug type issues

There are a couple of non-obvious strategies for debugging type issues that you don't immediately understand.

First: you can start applying types to things around the type error - variables, function arguments, loop iteration variables, anything. This can help if it moves the error from the line of code where you don't understand it to somewhere else where the problem may be more apparent.

The second strategy is to use magic mypy builtins. reveal_type(expr) will make mypy print out its opinion of the type of the given expression. reveal_locals() will make mypy print out its opinion of the types of all the variables in scope. Of the two, I use reveal_locals() much more.
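For example (the parse function is invented). Note that reveal_type is a mypy builtin that normally exists only while mypy is checking the file; Python 3.11+ also ships a runtime version in typing, and the getattr fallback below lets the sketch run directly on older versions too:

```python
import typing

# Fall back to a no-op so the example also runs outside of mypy
reveal_type = getattr(typing, "reveal_type", lambda x: x)

def parse(values: list[str]) -> dict[str, int]:
    result = {v: len(v) for v in values}
    reveal_type(result)  # mypy: Revealed type is "builtins.dict[builtins.str, builtins.int]"
    return result

print(parse(["a", "bb"]))  # {'a': 1, 'bb': 2}
```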

Abstract, concrete, mutable and immutable

Mypy has concrete types, like List and Dict. It also has abstract types, like Sequence and Mapping.

Some abstract types also come in immutable and mutable versions, such as AbstractSet and MutableSet (and Mapping and MutableMapping).

This allows for choices about which type to apply, and when.

My friend Oli Russell suggests the following policy:

  1. Make argument types as abstract as possible
    • to give the greatest latitude to callers to pass what they want
  2. Make return types (more) concrete
    • again, to give the greatest latitude to callers to do what they want with your return value

This matches my experience. When you try to be too draconian, for example returning Sequence instead of List, someone usually ends up having to edit your return type later to do what they need.

Equally, over-specific argument types are also a pain in the bum: you find you can't pass your custom dict-like object into a function because it was annotated with Dict even though it only uses __getitem__.
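A small sketch of the policy (the function is made up):

```python
from typing import List, Mapping

def keys_with_prefix(config: Mapping[str, str], prefix: str) -> List[str]:
    # Abstract argument: callers can pass a dict, an immutable proxy, or
    # any custom mapping. Concrete return: callers get a real list back.
    return [key for key in config if key.startswith(prefix)]

print(keys_with_prefix({"db_host": "x", "db_port": "5432", "debug": "1"}, "db_"))
```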

You can consult the standard library documentation to find which methods are included in each abstract type.

There are other considerations. You might want to prevent your caller from modifying the thing you are returning them (perhaps because you're using it internally [3]). In that case it's better to return Mapping rather than Dict.

Typed dataclasses

Python 3.7 introduced dataclasses. These are an alternative class-definition syntax that is more succinct when your class primarily contains data (as opposed to behaviour).

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
    label: str

These work well with the type system and allow easy definition of value types.
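A sketch of what mypy then checks for you (frozen=True is my addition, to get an immutable value type):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int
    label: str

p = Point(3, 4, "home")
# mypy rejects Point("3", 4, "home") (wrong argument type), and with
# frozen=True both mypy and the runtime reject p.x = 5.
print(p)  # Point(x=3, y=4, label='home')
```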

The drawback is that code that changes the representation of its data a lot tends not to be fast code. If an immutable Mapping will work through a large section of the program, that will usually be faster than creating a number of intermediate dataclasses.


There is also a typed dictionary available in mypy's library of extensions (mypy_extensions). From Python 3.8 it is in the standard library as typing.TypedDict.

Point = TypedDict('Point', {
    'x': int,
    'y': int,
    'label': str,
})
TypedDict allows you to control, via the type system, which keys are present and what types their values have, beyond the usual Mapping[str, str].

It can be a useful (and often faster) alternative to using numerous typed dataclasses.
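A sketch of the kind of mistake TypedDict lets mypy catch, reusing the Point definition:

```python
from typing import TypedDict  # Python 3.8+; previously in mypy_extensions

Point = TypedDict("Point", {"x": int, "y": int, "label": str})

p: Point = {"x": 1, "y": 2, "label": "origin"}
# mypy rejects a missing key ({"x": 1, "y": 2}) and a wrong value type
# (p["x"] = "one"), neither of which Mapping[str, object] could catch.
print(p["label"])  # origin
```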

Generics and type variables

mypy includes support for type variables and generic types. It seems most people find these easy and natural to use (eg List<Integer>) but there is a tendency to recoil in moral terror when new instances are defined as they appear to be type-level metanonsense.

Not every Generic is a matter for the International Criminal Court however - there really are occasional cases where it's worthwhile to use them in garden-variety Python code.

Type variables are used to be flexible about what type is allowed but to maintain checking. For example:

C = TypeVar("C")

def take_first_n(collection: Iterable[C], n: int) -> Sequence[C]:

Another example:

R = TypeVar("R")

def retry_with_backoff(fn: Callable[[], R]) -> R:

Generics allow you to define classes that are specialised on a type variable. In the below case we are also binding D to a specific abstract base class.

D = TypeVar("D", bound=DomainObject)  # DomainObject stands in for your abstract base class

class MyVeryOODomainObjectRepository(Generic[D]):
    def save(self, d: D) -> None:
        ...

    def get(self, key: str) -> D:
        ...

I'm sure this will be abused mightily in the years to come.

ABCs vs Protocols

Sometimes there will be a need to apply an abstract type to something where there is a range of concrete options (there's some overlap in practice here with the use of generic types). There are two ways to do this.

The first is the method by which you name a parent base class from which subclasses will inherit.

class Animal:
    def eat(self) -> None:

class Dog(Animal):
    def eat(self) -> None:

class Cat(Animal):
    def eat(self) -> None:

def feed_animal(animal: Animal) -> None:

feed_animal here is marked as taking an Animal - an abstract base class.

This first method is called nominal subtyping. It should be familiar to most as the traditional way to use abstract types in object-oriented languages.

There is a second, newer (to Python), method where instead of naming a parent class you name a "protocol" which has the methods which you need.

class Carnivore(Protocol):
    def eat_meat(self) -> None:

def feed_animal2(animal: Carnivore) -> None:

feed_animal2 here is marked as taking a Carnivore - a protocol. Notice I didn't have to mark any animal classes as members: if they have an eat_meat method with the same type then they are automatically treated as part of Carnivore.

Protocols are "open" so any class with a matching eat_meat method will be counted as a member. This method is called structural subtyping. There are a number of built in protocols like Sized, SupportsBytes, Container, etc.

Where you have control over enough of the class hierarchy you have the option of nominal subtyping but where that isn't the case you will have to use structural subtyping.
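A minimal runnable sketch of structural subtyping (the class names are mine, and eat_meat returns str rather than None here so the effect is visible):

```python
from typing import Protocol  # Python 3.8+; previously in typing_extensions

class Carnivore(Protocol):
    def eat_meat(self) -> str: ...

class Tiger:
    # Note: no inheritance from Carnivore anywhere.
    def eat_meat(self) -> str:
        return "tiger eats meat"

def feed(animal: Carnivore) -> str:
    return animal.eat_meat()

print(feed(Tiger()))  # accepted structurally: Tiger has a matching eat_meat
```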

As with generic types, this is best used judiciously. Hopefully concrete types are sufficient for the vast majority of cases and you won't have to fill your codebase with lots of protocols and abstract base classes.

The final tip

Some people are far too interested in types: it's a bit sad to see that a large proportion of the Haskell community has effectively been nerd-sniped by their own creation.

Beware of type mania! Try not to lose sight of the fact that type checking is supposed to be an aid to correctness and not an intellectually satisfying end in itself.


Footnotes

  1. The relevant configuration option to enforce this is disallow_untyped_defs but enabling it straight away for an existing project is typically biting off more than can be chewed. 

  2. Note that you have to include the tests in the same mypy run as the main code base. Splitting the tests into their own run won't provide any advantage. 

  3. As I say elsewhere in this article, one good reason to avoid creating new collections everywhere is speed. Marking return types with immutable abstract types facilitates static analysis enforcement of this.