Python’s collections module provides the namedtuple function. I wanted to understand how it works to create a namedtuple + dataclass ~abomination~ combo for work. It’s scary stuff. Skip to “The clever bit” for the interesting part.

What does it do?

The namedtuple function returns a subtype of tuple with named constructor arguments and attributes. Here’s an example:

from collections import namedtuple

FileWithHash = namedtuple("FileWithHash", ["file_contents", "sha256digest"])

my_value = FileWithHash(
    file_contents="1337",
    sha256digest="5db1...",
)

print(f"Value: {my_value.file_contents}")
print(f"Hash: {my_value.sha256digest}")
print(f"Is a named tuple a tuple? {isinstance(my_value, tuple)}")  # prints 'True'

file_contents, digest = my_value

Why would I want this over a `dataclass`?

Named tuples are a way of introducing sanity into a context where something must be a tuple. For example, suppose you have a method that expects a tuple, or you provide a callback that returns a tuple. Keeping track of which element corresponds to which value is error-prone, and a named tuple obviates that: fields are set and read by name, so the order you write them in stops mattering. For example, this

(
    FileWithHash(file_contents="contents", sha256digest="abba...")
    == FileWithHash(sha256digest="abba...", file_contents="contents")
)

evaluates to True.
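Here is a minimal sketch of the “must be a tuple” situation. The hash_file callback is hypothetical and only there for illustration; the point is that callers who expect a plain tuple and callers who know the field names can both use the same return value.

```python
import hashlib
from collections import namedtuple

FileWithHash = namedtuple("FileWithHash", ["file_contents", "sha256digest"])


def hash_file(contents: str) -> FileWithHash:
    # Hypothetical callback: its caller only requires "a tuple of two items".
    digest = hashlib.sha256(contents.encode()).hexdigest()
    return FileWithHash(file_contents=contents, sha256digest=digest)


result = hash_file("1337")

# Code that treats the result as a plain tuple keeps working...
contents, digest = result
equal_to_plain_tuple = result == (contents, digest)

# ...while code that knows the names can use them instead of indexes.
name_matches_index = result.sha256digest == result[1]
```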

Ok, but how does it actually work?

Let’s look at the code¹. CPython, the reference implementation, is open source on GitHub. Standard library modules that are written in Python live in Lib, so we want to open Lib/collections/__init__.py and search for namedtuple. Bingo!

def namedtuple(typename, field_names, *, rename=False, defaults=None, module=None):
    # Validate the field names.  At the user's option, either generate an error
    # message or automatically replace the field name with a valid name.
    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()
    field_names = list(map(str, field_names))
    typename = _sys.intern(str(typename))

This is fun. field_names should really be a list[str], but it can be an Iterable[whatever] as long as mapping the elements to str produces valid identifiers (see below). Or you could pass a single string of identifiers separated by commas or whitespace. I don’t know why you would want this, but hey.
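To make the accepted input shapes concrete, here is a small sketch. The Column class is made up for this example; it stands in for “whatever” whose str() happens to be a valid identifier.

```python
from collections import namedtuple


class Column:
    """Hypothetical non-str object whose str() is a valid identifier."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


variants = [
    namedtuple("Point", ["x", "y"]),                    # list of str
    namedtuple("Point", "x, y"),                        # comma-separated string
    namedtuple("Point", "x y"),                         # whitespace-separated string
    namedtuple("Point", (name for name in "xy")),       # any iterable
    namedtuple("Point", [Column("x"), Column("y")]),    # elements mapped through str
]

all_fields = [cls._fields for cls in variants]
```

Every entry in all_fields ends up as the same tuple of two strings.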

Why use _sys.intern? Quoting the docs:

Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare.
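A quick way to see what interning buys you, as a sketch: two equal strings built at runtime are normally separate objects, but sys.intern hands back one canonical object for both, so an identity check is enough.

```python
import sys

# Build the same text twice at runtime so we start with two separate objects.
first = "".join(["file_", "contents"])
second = "".join(["file", "_contents"])

equal_by_value = first == second

# sys.intern returns the canonical copy, so both calls yield the same object.
same_object_after_intern = sys.intern(first) is sys.intern(second)
```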

After we “sanitize” our field names, we validate them:

    seen = set()
    for name in field_names:
        if name.startswith('_') and not rename:
            raise ValueError('Field names cannot start with an underscore: '
                             f'{name!r}')
        if name in seen:
            raise ValueError(f'Encountered duplicate field name: {name!r}')
        seen.add(name)

Why shouldn’t it start with an underscore, you might wonder? You’ll see.
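You can poke at these checks from the outside. The try_fields helper below is my own, not part of collections; it just reports either the resulting _fields or the error message.

```python
from collections import namedtuple


def try_fields(field_names, **kwargs):
    """Return the resulting _fields, or the ValueError text if rejected."""
    try:
        return namedtuple("Example", field_names, **kwargs)._fields
    except ValueError as error:
        return f"ValueError: {error}"


underscore = try_fields(["_secret", "value"])
duplicate = try_fields(["value", "value"])

# With rename=True, offending names are replaced by "_<position>" instead.
renamed = try_fields(["_secret", "value", "value"], rename=True)
```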

The clever bit

    # Variables used in the methods and docstrings
    field_names = tuple(map(_sys.intern, field_names))
    num_fields = len(field_names)
    arg_list = ', '.join(field_names)
    if num_fields == 1:
        arg_list += ','

We create an arg_list. In our example from earlier, it would be the string "file_contents, sha256digest". We make sure that if there’s only one element, there’s a trailing comma. Uh oh.

Next we create the __new__ method, which we will use to create new instances of the type we’re creating (so instances of FileWithHash).

    tuple_new = tuple.__new__
    namespace = {
        '_tuple_new': tuple_new,
        '__builtins__': {},
        '__name__': f'namedtuple_{typename}',
    }
    code = f'lambda _cls, {arg_list}: _tuple_new(_cls, ({arg_list}))'
    __new__ = eval(code, namespace)

The __new__ method is the result of eval on a string we stitched together. In our example, it’s equivalent to the following declaration:

def __new__(_cls, file_contents, sha256digest):
    return _tuple_new(_cls, (file_contents, sha256digest))

Calling this __new__ returns an object almost identical to tuple((file_contents, sha256digest)), except that its type is _cls (here FileWithHash) rather than plain tuple. The trailing comma in arg_list is needed so that if the namedtuple only has one field, the call to the base __new__ method is _tuple_new(_cls, (arg,)) and not _tuple_new(_cls, (arg)). The latter is equivalent to _tuple_new(_cls, arg), which is wrong because the tuple constructor expects an iterable of elements, not the single element itself. And if we didn’t ensure that fields can’t start with an underscore, we could have a field called _tuple_new or _cls (or __builtins__ or __name__), which would clash with the names used inside the eval.
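The trick is easy to reproduce in isolation. This is a stripped-down sketch, not the CPython code: make_new and the Pair class are names I made up, and all validation is left out, so only feed it trusted field names.

```python
def make_new(field_names):
    """Build a __new__-style function for the given names via eval."""
    arg_list = ", ".join(field_names)
    if len(field_names) == 1:
        arg_list += ","  # "(x,)" is a 1-tuple, whereas "(x)" is just x
    namespace = {"_tuple_new": tuple.__new__, "__builtins__": {}}
    code = f"lambda _cls, {arg_list}: _tuple_new(_cls, ({arg_list}))"
    return eval(code, namespace), code


class Pair(tuple):
    """Hypothetical tuple subclass to instantiate."""


new_pair, pair_code = make_new(["left", "right"])
pair = new_pair(Pair, right=2, left=1)  # keyword arguments in any order

new_single, single_code = make_new(["only"])
single = new_single(Pair, "abc")

# What would happen without the trailing comma: the string itself is iterated.
without_comma = tuple.__new__(Pair, ("abc"))
```

Because the generated lambda has real parameter names, Python’s ordinary argument binding handles keywords for free.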

Importantly, we can now call it with kwargs, which is part of what this was all about.

After this, we just do a whole bunch of bookkeeping and create some utility methods like _asdict and _replace.

    _dict, _tuple, _len, _map, _zip = dict, tuple, len, map, zip

    @classmethod
    def _make(cls, iterable):
        result = tuple_new(cls, iterable)
        if _len(result) != num_fields:
            raise TypeError(f'Expected {num_fields} arguments, got {len(result)}')
        return result

    def _replace(self, /, **kwds):
        result = self._make(_map(kwds.pop, field_names, self))  # <- This is very clever.
        if kwds:
            raise TypeError(f'Got unexpected field names: {list(kwds)!r}')
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + repr_fmt % self

    def _asdict(self):
        'Return a new dict which maps field names to their values.'
        return _dict(_zip(self._fields, self))

    def __getnewargs__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return _tuple(self)

    class_namespace = {
        '__doc__': f'{typename}({arg_list})',
        '__slots__': (),
        '_fields': field_names,
        '_field_defaults': field_defaults,
        '__new__': __new__,
        '_make': _make,
        '__replace__': _replace,
        '_replace': _replace,
        '__repr__': __repr__,
        '_asdict': _asdict,
        '__getnewargs__': __getnewargs__,
        '__match_args__': field_names,
    }
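The line marked as very clever in _replace is worth unpacking. Given two iterables, map calls kwds.pop(name, old_value) once per field, so each field takes its replacement if one was passed and keeps its old value otherwise. Whatever is left in kwds afterwards must be a misspelled field. A sketch with plain data (the values are made up):

```python
field_names = ("file_contents", "sha256digest")
current = ("1337", "5db1...")
kwds = {"sha256digest": "abba...", "typo_field": "oops"}

# Equivalent to: tuple(kwds.pop(name, old) for name, old in zip(field_names, current))
replaced = tuple(map(kwds.pop, field_names, current))

# Valid replacements were popped; anything remaining is an unknown field name.
leftover = dict(kwds)
```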

One mildly interesting thing is how we create the accessors that allow us to write my_value.file_contents:

    _tuplegetter = lambda index, doc: property(_itemgetter(index), doc=doc)

    for index, name in enumerate(field_names):
        doc = _sys.intern(f'Alias for field number {index}')
        class_namespace[name] = _tuplegetter(index, doc)

Here, _itemgetter(idx) is equivalent to lambda els: els[idx].
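Written out by hand for a made-up Pair class, the same shape looks like this. Since the property only has a getter, the attribute is read-only, which matches a tuple being immutable.

```python
from operator import itemgetter


class Pair(tuple):
    """Hypothetical hand-written equivalent of two generated accessors."""

    __slots__ = ()
    left = property(itemgetter(0), doc="Alias for field number 0")
    right = property(itemgetter(1), doc="Alias for field number 1")


pair = Pair((10, 20))
values = (pair.left, pair.right)

try:
    pair.left = 99  # no setter was defined on the property
except AttributeError:
    assignment_rejected = True
else:
    assignment_rejected = False
```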

Finally, we invoke the 3-argument form of type, which creates a new type with the given name, base classes, and class attributes. And that’s it!

    return type(typename, (tuple,), class_namespace)
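If you haven’t used the 3-argument form before, here is a tiny sketch of it on its own. Triple and describe are names I invented; the call does what a class statement with the same body would do.

```python
def describe(self):
    return f"{type(self).__name__} with {len(self)} items"


# name, bases, and a dict that becomes the class body
namespace = {"__slots__": (), "describe": describe}
Triple = type("Triple", (tuple,), namespace)

triple = Triple((1, 2, 3))
description = triple.describe()
```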

Summing up

namedtuple is mostly just bookkeeping, combined with one very clever, kind of scary, lambda + eval trick. I don’t understand why nested functions with locals wouldn’t do the trick:

def __new__(_cls, *args, **kwargs):
    # Do something tedious or clever to order args using `field_names`
    return _tuple_new(_cls, (file_contents, sha256digest))

Maybe it’s about namespace isolation, but I don’t get why we would care and what exactly would be achieved. Or it’s just the neatest way to write things, if you’re not afraid of string manipulation and evals.
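To see what the eval-free route would involve, here is a sketch of a closure-based version. It is my own guess at an alternative, not something from CPython, and make_closure_new is a hypothetical name. It works, but it has to redo by hand the argument binding that Python does automatically for the eval’d lambda, and introspection only sees *args and **kwargs. Whether that is the actual motivation is an assumption on my part.

```python
import inspect
from collections import namedtuple


def make_closure_new(field_names):
    """Hypothetical eval-free alternative: bind the arguments by hand."""

    def __new__(_cls, *args, **kwargs):
        if len(args) > len(field_names):
            raise TypeError("too many positional arguments")
        values = dict(zip(field_names, args))
        for name, value in kwargs.items():
            if name not in field_names or name in values:
                raise TypeError(f"unexpected or repeated argument: {name!r}")
            values[name] = value
        missing = [name for name in field_names if name not in values]
        if missing:
            raise TypeError(f"missing arguments: {missing}")
        return tuple.__new__(_cls, tuple(values[name] for name in field_names))

    return __new__


fields = ("file_contents", "sha256digest")
closure_new = make_closure_new(fields)
Closure = type("Closure", (tuple,), {"__slots__": (), "__new__": closure_new})
Evaled = namedtuple("Evaled", fields)

closure_value = Closure("1337", sha256digest="5db1...")

# The eval'd version exposes the real field names in its signature.
closure_params = list(inspect.signature(closure_new).parameters)
evaled_params = list(inspect.signature(Evaled.__new__).parameters)
```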

I’m not sure what the lesson is here. eval isn’t scary as long as you sanitize your inputs properly? Every language library has skeletons in the closet?

Anyways, reading this showed me two tricks I hadn’t seen before: assembling the namespace of a dynamically created type, and “writing the code you want and evaling it”. I’ll probably not use the second one, but I might be tempted by the first.

I hope you enjoyed!

¹ Reading the code of the tools you’re using directly is one of the best pieces of coding advice I’ve ever received. It’s a great way to learn about your tools and language, and is genuinely often quicker than looking on forums, scouring documentation, or asking AI (so far). I always thought this was a flex, but it’s absolutely real – try it!
