On Very Tight Iteration Cycles
Tight iteration loops being a massive productivity boost is a truism that I’m only just starting to internalize. I want to talk about a specific kind, with iteration cycles of less than one hour. What these have in common is the feeling of working inside one: breezing through work and making fast progress on a specific problem. This workflow is a pretty reliable way of getting into flow, but situations where it can be applied are not very common.
Examples
In standard software/feature development
Here, unit testing can serve this function. The cycle alternates between running a test and modifying the code, repeated until the test passes. Then, on to the next test.
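As a minimal sketch of one such cycle (the function and test are hypothetical, just stand-ins for whatever you’re building):

```python
# Hypothetical function under development; the failing test drives each change.
def normalize(name: str) -> str:
    """Trim whitespace and lowercase a user-supplied name."""
    return name.strip().lower()

def test_normalize():
    assert normalize("  Ada ") == "ada"

# The cycle: run the test, watch it fail, adjust normalize(), run again.
# With pytest this is just re-running `pytest -x` after every edit.
test_normalize()
```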
Data / Model evals
When analyzing a dataset or evaluating a model, we take a test set and a quantitative measure of performance. The cycle consists of looking at the few cases with the lowest performance, fixing those cases, and running on the test set again.
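One iteration of that loop might look like this; `score` and the examples are placeholders for your real per-example metric and test set:

```python
# Minimal sketch of one eval iteration, assuming a per-example scoring
# function `score(example) -> float` where higher is better.
def worst_cases(test_set, score, k=5):
    """Return the k examples the model handles worst."""
    return sorted(test_set, key=score)[:k]

# Toy usage: pretend the "model" scores strings by their length,
# so the shortest inputs are the failures worth inspecting.
examples = ["a", "abcd", "ab", "abcdef", "abc"]
print(worst_cases(examples, score=len, k=2))  # → ['a', 'ab']
```

After staring at those worst cases and fixing whatever they expose, you rerun the whole evaluation and look at the new bottom of the list.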
Performance optimization
In this case, we create a fixture that allows us to run a small but representative part of the workload, in a profiler.¹ We look at the bit that takes up the most total runtime, make it faster, and run the fixture again. Rinse and repeat until it’s fast enough.
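A bare-bones version of such a fixture, using Python’s built-in profiler; `run_workload` here is a dummy stand-in for the small, representative slice of the real application:

```python
# Minimal profiling fixture: run a representative slice of the workload
# under cProfile and show where the time goes.
import cProfile
import io
import pstats

def run_workload():
    # Stand-in for a representative piece of the real work.
    total = 0
    for i in range(100_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

# Print the five entries with the highest cumulative runtime first;
# the top of this list is what you attack next.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Wrapped in a no-argument script, this is exactly the kind of zero-thought fixture the cycle needs.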
I was once working on a big web API, trying to track down a performance problem. I asked for help and we teamed up. The first thing he did was comment out the entry point and replace it with just the thing I cared about. I was blown away; I hadn’t known that you were allowed to do that. Progress was very quick from there.
Anatomy of a cycle
The examples have two common components. First, there is an objective measure of progress (binary for unit tests, quantitative for performance optimization and model development). Second, there is a fixture that can evaluate the current state against that measure. In an ideal setup, running the fixture is very low-friction and requires no thought. Running a shell script with no arguments is good. Having the shell script run automatically whenever you modify a source file (using entr) is better.
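For what a tool like entr does under the hood, here is a bare-bones stand-in sketched in Python: poll the source files’ modification times and rerun a command when any of them change (the watched paths and command are illustrative):

```python
# Bare-bones file watcher: rerun a command whenever watched files change.
import os
import subprocess
import time

def changed_files(paths, mtimes):
    """Return paths whose mtime differs from the recorded one; update the record."""
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        if mtimes.get(path) != mtime:
            mtimes[path] = mtime
            changed.append(path)
    return changed

def watch(paths, command, interval=0.5):
    mtimes = {}
    changed_files(paths, mtimes)  # prime the record without triggering a run
    while True:
        if changed_files(paths, mtimes):
            subprocess.run(command)
        time.sleep(interval)

# Example (hypothetical paths): watch(["app.py", "test_app.py"], ["pytest", "-x"])
```

In practice entr (or your editor’s equivalent) does this better, but the point stands: the fixture should fire itself.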
How, mechanistically, is this good
A workflow like this has two useful properties. It serves as an external driving function for work, and it keeps you focused on what’s important.
Working inside a cycle like this is a constant back-and-forth between you and the fixture. It is always clear what to do: fix the unit test, reduce time spent in __getitem__, make your model robust to this weird kind of outlier. Your entire focus can be on fixing a well-defined problem, and you spend no mental energy on figuring out what to do next.
Creating the fixture for an iteration cycle is valuable work
It can take weeks or even months to create the fixture. Often, a lot of engineering effort has already been put into creating tools that make this easy (e.g. test frameworks or profilers).
Another story: I used to work with a senior developer who was asked to make a Java behemoth faster. He spent the better part of a month on automating the build process, integrating it with CI/CD, and creating a suite of performance tests before he touched the code itself. He then achieved a 5x speedup in a matter of days. I absolutely believe this was the best way to solve this problem (especially since modernizing the build process was obviously positive in lots of other ways, too).
Variations on the theme
Deliberate practice is similar to workflows like this, although the criterion for success is often fuzzier and you end up being your own driving function. That double focus, improving your performance while also analyzing it, is one of the reasons why deliberate practice is really hard.
A question I would like to ask myself more is: How can I embed what I’m doing in a tight iteration cycle?
1. Picking a representative part of the application can be subtle, especially when caching or IO is involved. Performance behaviour with cold caches is often irrelevant. And workloads in a local dev environment will often be much more IO heavy than workloads running in a cluster with very high bandwidth connections to databases, file storage, and other services. And sometimes, the performance problem is mysterious enough that the whole application needs to be analyzed. ↩