How is Python's namedtuple implemented?
Python’s collections module provides the namedtuple function. I wanted to understand how it works to create a namedtuple + dataclass ~abomination~ combo for work. It’s scary stuff. Skip to “The clever bit” for the interesting part.
What does it do?
The namedtuple function returns a subclass of tuple with named constructor arguments and attributes. The running example in this post is a FileWithHash type with the fields file_contents and sha256digest.
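As a minimal sketch (the file contents are made-up sample data), creating and using that type might look like this. Fields can be passed positionally or by keyword, and each attribute is just an alias for a tuple position.

```python
import hashlib
from collections import namedtuple

# FileWithHash and its fields are the names used throughout this post;
# the file contents are made-up sample data.
FileWithHash = namedtuple("FileWithHash", ["file_contents", "sha256digest"])

contents = b"hello world\n"
f = FileWithHash(
    file_contents=contents,
    sha256digest=hashlib.sha256(contents).hexdigest(),
)

print(f.file_contents)          # b'hello world\n'
print(f[1] == f.sha256digest)   # True: attributes alias tuple positions
print(len(f))                   # 2
```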
Why would I want this over a dataclass?
Named tuples are a way of introducing sanity into a context where something must be a tuple. For example, suppose you have a method that expects a tuple, or you provide a callback that returns a tuple. Keeping track of which element corresponds to which value is error-prone, and a named tuple obviates that while still being a plain tuple underneath: comparing a FileWithHash instance to a plain tuple with the same elements evaluates to True.
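A small sketch of that tuple compatibility, assuming a hypothetical load_file callback whose caller expects a plain (contents, digest) pair:

```python
from collections import namedtuple

FileWithHash = namedtuple("FileWithHash", ["file_contents", "sha256digest"])

# Hypothetical callback whose caller expects a plain (contents, digest) tuple.
def load_file() -> tuple:
    return FileWithHash(file_contents=b"data", sha256digest="abc123")

result = load_file()
print(result == (b"data", "abc123"))  # True: compares like a plain tuple
print(isinstance(result, tuple))      # True: it *is* a tuple
contents, digest = result             # positional unpacking still works
print(result.sha256digest == digest)  # True: but the fields also have names
```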
Ok, but how does it actually work?
Let’s look at the code[1]. We can look at the CPython implementation, which is open source on GitHub. Standard library modules that are written in Python can be found in Lib. So we want to take a look at Lib/collections/__init__.py and search for namedtuple. Bingo!
This is fun. Field names should really be a list[str], but they can be any Iterable[whatever], as long as mapping the elements to str produces valid identifiers (see below). Or you could pass a single string of identifiers separated by commas and/or whitespace. I don’t know why you would want this, but hey.
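A simplified sketch of those normalization steps; normalize_field_names is a hypothetical helper, not something the collections module exposes. Note that str() happily turns non-strings into names that aren’t valid identifiers, which is why validation comes next.

```python
def normalize_field_names(field_names):
    """Hypothetical helper mirroring namedtuple's first normalization steps."""
    # A single string may separate names with commas and/or whitespace.
    if isinstance(field_names, str):
        field_names = field_names.replace(",", " ").split()
    # Any iterable is accepted; every element is converted with str().
    return list(map(str, field_names))

print(normalize_field_names("file_contents, sha256digest"))
# ['file_contents', 'sha256digest']
print(normalize_field_names("file_contents sha256digest"))
# ['file_contents', 'sha256digest']
print(normalize_field_names(n for n in ("file_contents", "sha256digest")))
# ['file_contents', 'sha256digest']
print(normalize_field_names([1, 2]))
# ['1', '2']  (not valid identifiers, so validation would reject them)
```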
Why use _sys.intern? Quoting the docs:
Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare.
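A quick sketch of what interning buys: two equal strings built at runtime may or may not be the same object, but sys.intern maps both to one canonical object, so later comparisons can short-circuit on identity.

```python
import sys

# Build two equal strings at runtime so neither is a compile-time constant.
a = "".join(["file_", "contents"])
b = "".join(["file", "_contents"])
print(a == b)  # True: same characters
# Whether `a is b` holds here is a CPython implementation detail; don't rely on it.

ia, ib = sys.intern(a), sys.intern(b)
print(ia is ib)  # True: interning yields one canonical object per value
```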
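After we “sanitize” our field names, we validate them: the type name and every field name must be a valid identifier and not a keyword, and field names must be unique and must not start with an underscore.
Why shouldn’t a field name start with an underscore, you might wonder? You’ll see.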
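A sketch of those checks, modeled on (not copied from) namedtuple’s validation with the default rename=False; validate_names is a hypothetical name and the error messages are my own.

```python
import keyword

def validate_names(typename, field_names):
    """Hypothetical checker modeled on namedtuple's validation with rename=False."""
    for name in [typename, *field_names]:
        if type(name) is not str:
            raise TypeError("Type names and field names must be strings")
        if not name.isidentifier():
            raise ValueError(f"Not a valid identifier: {name!r}")
        if keyword.iskeyword(name):
            raise ValueError(f"Names cannot be a keyword: {name!r}")
    seen = set()
    for name in field_names:
        # namedtuple reserves leading underscores for its own names.
        if name.startswith("_"):
            raise ValueError(f"Field names cannot start with an underscore: {name!r}")
        if name in seen:
            raise ValueError(f"Duplicate field name: {name!r}")
        seen.add(name)

validate_names("FileWithHash", ["file_contents", "sha256digest"])  # passes silently

for bad in (["class"], ["1st"], ["_tuple_new"], ["a", "a"]):
    try:
        validate_names("FileWithHash", bad)
    except ValueError as exc:
        print(exc)
```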
The clever bit
We create an arg_list. In our example from earlier, it would be the string "file_contents, sha256digest". We make sure that if there’s only one element, there’s a trailing comma. Uh oh.
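A sketch of the arg_list step; make_arg_list is a hypothetical helper name.

```python
def make_arg_list(field_names):
    """Hypothetical helper: the comma-separated parameter string."""
    arg_list = ", ".join(field_names)
    # A one-element tuple display needs a trailing comma: (x,) rather than (x).
    if len(field_names) == 1:
        arg_list += ","
    return arg_list

print(make_arg_list(["file_contents", "sha256digest"]))  # file_contents, sha256digest
print(make_arg_list(["file_contents"]))                  # file_contents,
```

The same string is reused as a lambda parameter list and inside a tuple display. Python accepts a trailing comma in a lambda’s parameter list, so with one field "lambda _cls, file_contents,: (file_contents,)" is still valid code.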
Next we create the __new__ method (implicitly a static method that receives the class as its first argument), which we will use to create new instances of the type we’re creating (so instances of FileWithHash).
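The __new__ method is the result of eval on a lambda expression we stitched together as a string; the _tuple_new it calls is tuple.__new__. In our example, it’s equivalent to the one-line declaration def __new__(_cls, file_contents, sha256digest): return _tuple_new(_cls, (file_contents, sha256digest)).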
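A sketch of that step: make_new is a hypothetical helper that builds the same kind of lambda string and evaluates it in a restricted namespace. The namespace contents (_tuple_new, an empty __builtins__, a __name__) are my reading of what CPython does, so treat them as an assumption.

```python
def make_new(typename, field_names):
    """Hypothetical helper modeled on how namedtuple builds __new__ with eval."""
    arg_list = ", ".join(field_names)
    if len(field_names) == 1:
        arg_list += ","
    # The eval'd lambda can only see the names in this dict; an empty
    # __builtins__ means it has no access to builtins at all.
    namespace = {
        "_tuple_new": tuple.__new__,
        "__builtins__": {},
        "__name__": f"namedtuple_{typename}",
    }
    code = f"lambda _cls, {arg_list}: _tuple_new(_cls, ({arg_list}))"
    new = eval(code, namespace)
    new.__name__ = "__new__"
    return code, new

code, new = make_new("FileWithHash", ["file_contents", "sha256digest"])
print(code)
# lambda _cls, file_contents, sha256digest: _tuple_new(_cls, (file_contents, sha256digest))

class FileWithHash(tuple):
    __slots__ = ()
    __new__ = new  # a plain function named __new__ is treated as a static method

f = FileWithHash(b"data", sha256digest="abc")  # positional and keyword args both work
print(f, type(f).__name__)  # (b'data', 'abc') FileWithHash
```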
Calling this __new__ will return an object almost identical to tuple((file_contents, sha256digest)), except that its type is FileWithHash rather than plain tuple. The trailing comma in arg_list was needed so that, if the namedtuple only has one element, the call to the base __new__ method is _tuple_new(_cls, (arg,)) and not _tuple_new(_cls, (arg)). The latter is equivalent to _tuple_new(_cls, arg), which is wrong because the tuple constructor would iterate over arg itself (or raise TypeError if arg isn’t iterable) instead of wrapping it in a one-element tuple. And if we didn’t ensure that field names can’t start with an underscore, we could have a field called _tuple_new (or __builtins__ or __name__), which would clash with the namespace of the eval.
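Both pitfalls are easy to reproduce directly, as a sketch:

```python
arg = "abc"

# With the trailing comma, arg is wrapped in a one-element tuple.
print(tuple.__new__(tuple, (arg,)))  # ('abc',)
# Without it, (arg) is just arg, so tuple.__new__ iterates over the string.
print(tuple.__new__(tuple, (arg)))   # ('a', 'b', 'c')

# A non-iterable value fails outright.
try:
    tuple.__new__(tuple, (42))
except TypeError as exc:
    print(exc)

# A parameter named _tuple_new would shadow the helper inside the lambda body.
namespace = {"_tuple_new": tuple.__new__, "__builtins__": {}}
clash = eval("lambda _cls, _tuple_new: _tuple_new(_cls, (_tuple_new,))", namespace)
try:
    clash(tuple, "oops")
except TypeError as exc:
    print(exc)  # the str "oops" gets called instead of tuple.__new__
```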
Importantly, we can now call it with kwargs, which is part of what this was all about.
After this, we just do a whole bunch of bookkeeping and create some utility methods like _asdict and _replace.
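For reference, here is what those helpers (plus _fields and _make) look like from the outside, using the real collections.namedtuple on Python 3.8 or later, where _asdict returns a plain dict:

```python
from collections import namedtuple

FileWithHash = namedtuple("FileWithHash", ["file_contents", "sha256digest"])
f = FileWithHash(b"data", "abc")

print(f._asdict())   # {'file_contents': b'data', 'sha256digest': 'abc'}
g = f._replace(sha256digest="def")  # returns a new instance; f is unchanged
print(g)             # FileWithHash(file_contents=b'data', sha256digest='def')
print(FileWithHash._fields)             # ('file_contents', 'sha256digest')
print(FileWithHash._make([b"x", "y"]))  # FileWithHash(file_contents=b'x', sha256digest='y')
```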
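One mildly interesting thing is how we create the accessors that allow us to write my_value.file_contents: in the class namespace, each field name is mapped to a read-only accessor built from _itemgetter(idx), where idx is the field’s position. _itemgetter(idx) is equivalent to lambda els: els[idx].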
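A sketch of the same accessor idea on a hand-written tuple subclass. Pair is hypothetical; if I’m reading the source right, recent CPython prefers a C-level _tuplegetter and only falls back to property(_itemgetter(index), doc=doc), so this mirrors the fallback.

```python
from operator import itemgetter

# Hypothetical two-field tuple subclass with read-only, index-backed accessors.
class Pair(tuple):
    __slots__ = ()  # no per-instance __dict__, just like namedtuple classes
    first = property(itemgetter(0), doc="Alias for field number 0")
    second = property(itemgetter(1), doc="Alias for field number 1")

p = Pair((1, 2))
print(p.first, p.second)  # 1 2
print(itemgetter(1)(p) == (lambda els: els[1])(p))  # True

try:
    p.first = 10  # the property has no setter
except AttributeError:
    print("read-only")
```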
Finally, we invoke the 3-argument form of type, type(name, bases, namespace), which creates a new type with the given name, base classes, and class attributes. And that’s it!
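To see the steps together, here is a stripped-down sketch, mini_namedtuple (hypothetical), that assembles a class namespace and hands it to type(). It skips defaults, rename, module, _make, _replace and _asdict, so it is an illustration rather than a drop-in replacement.

```python
import keyword
import sys
from operator import itemgetter

def mini_namedtuple(typename, field_names):
    """Hypothetical, stripped-down namedtuple: no defaults, rename, or _replace."""
    if isinstance(field_names, str):
        field_names = field_names.replace(",", " ").split()
    field_names = tuple(sys.intern(str(name)) for name in field_names)
    for name in (typename, *field_names):
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"Invalid name: {name!r}")
    if len(set(field_names)) != len(field_names):
        raise ValueError("Duplicate field names")
    if any(name.startswith("_") for name in field_names):
        raise ValueError("Field names cannot start with an underscore")

    # Build __new__ by eval-ing a lambda over the field names.
    arg_list = ", ".join(field_names)
    if len(field_names) == 1:
        arg_list += ","
    namespace = {"_tuple_new": tuple.__new__, "__builtins__": {}}
    new = eval(f"lambda _cls, {arg_list}: _tuple_new(_cls, ({arg_list}))", namespace)
    new.__name__ = "__new__"

    def __repr__(self):
        pairs = ", ".join(f"{n}={v!r}" for n, v in zip(field_names, self))
        return f"{typename}({pairs})"

    # Assemble the class namespace, then create the type in one call.
    class_namespace = {
        "__slots__": (),
        "_fields": field_names,
        "__new__": new,
        "__repr__": __repr__,
    }
    for index, name in enumerate(field_names):
        class_namespace[name] = property(itemgetter(index), doc=f"Alias for field number {index}")
    return type(typename, (tuple,), class_namespace)

FileWithHash = mini_namedtuple("FileWithHash", "file_contents, sha256digest")
f = FileWithHash(sha256digest="abc", file_contents=b"data")
print(f)                      # FileWithHash(file_contents=b'data', sha256digest='abc')
print(f == (b"data", "abc"))  # True

Single = mini_namedtuple("Single", ["value"])
print(Single("abc"))          # Single(value='abc'), not ('a', 'b', 'c')
```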
Summing up
namedtuple is mostly just bookkeeping, combined with one very clever, kind of scary, lambda + eval trick. I don’t understand why plain nested functions closing over local variables wouldn’t do the trick.
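For comparison, a sketch of what an eval-free version might look like, assuming the idea is a closure that binds arguments by hand; make_new_with_closure is hypothetical. It works, but the argument handling the eval’d lambda gets from the interpreter for free has to be reimplemented, and introspection only sees a generic signature.

```python
import inspect
from collections import namedtuple

def make_new_with_closure(field_names):
    """Hypothetical eval-free __new__: the closure binds arguments by hand."""
    field_names = tuple(field_names)

    def __new__(cls, *args, **kwargs):
        if len(args) > len(field_names):
            raise TypeError(f"expected at most {len(field_names)} positional arguments")
        values = dict(zip(field_names, args))
        for name, value in kwargs.items():
            if name not in field_names:
                raise TypeError(f"unexpected keyword argument {name!r}")
            if name in values:
                raise TypeError(f"multiple values for argument {name!r}")
            values[name] = value
        missing = [name for name in field_names if name not in values]
        if missing:
            raise TypeError(f"missing arguments: {missing}")
        return tuple.__new__(cls, tuple(values[name] for name in field_names))

    return __new__

class FileWithHash(tuple):
    __slots__ = ()
    __new__ = make_new_with_closure(["file_contents", "sha256digest"])

f = FileWithHash(b"data", sha256digest="abc")
print(f)  # (b'data', 'abc')

# Introspection only sees the generic closure signature...
print(inspect.signature(FileWithHash.__new__))  # (cls, *args, **kwargs)
# ...whereas the eval'd lambda exposes the real field names.
Real = namedtuple("FileWithHash", ["file_contents", "sha256digest"])
print(inspect.signature(Real.__new__))  # (_cls, file_contents, sha256digest)
```

That signature difference is at least one concrete thing the eval buys (help() and inspect see the real field names), though whether it is the deciding reason isn’t clear from the code alone.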
Maybe it’s about namespace isolation, but I don’t get why we would care and what exactly would be achieved. Or it’s just the neatest way to write things, if you’re not afraid of string manipulation and evals.
I’m not sure what the lesson is here. eval isn’t scary as long as you sanitize your inputs properly? Every language library has skeletons in the closet?
Anyways, reading this showed me two tricks I hadn’t seen before: assembling the namespace of a dynamically created type, and “writing the code you want and evaling it”. I’ll probably not use the second one, but I might be tempted by the first.
I hope you enjoyed it!
[1] Reading the code of tools you’re using directly is one of the best pieces of coding advice I’ve ever received. It’s a great way to learn about your tools and language, and is genuinely often quicker than looking on forums, scouring documentation, or asking AI (so far). I always thought this was a flex, but it’s absolutely real – try it!