In this post I’m going to talk about domain-driven design, and how I’ve been using that approach when developing VocalPy. In the last post on VocalPy version 0.10.0, I made a point of saying that I would only describe new features in that post, and that I wouldn’t say much about how I designed those features. Instead, I said, I’d save that for another post. Well, here’s that post.
I’m writing this post for two audiences. The first is people who are interested in using and/or contributing to VocalPy, and who want to better understand its design. The second is research software engineers in general, who are interested in how to design software for scientists, other software engineers, and everyone who sits somewhere along the spectrum between researcher and engineer.
It might seem silly to write about software design, when what I should be doing is adding a bunch of features, recruiting new users and other maintainers, and just generally demonstrating that VocalPy actually meets its goals of supporting a community of researchers studying acoustic communication. Sure. You’re right.
But I do think it’s worth taking just a little bit of time to justify my approach in words, and claim that there’s a method to my madness. That’s what I said I’d do in the first post on this blog after all. And reading books like “The Architecture of Open Source Applications” has made me think it’s worth talking about stuff like this.
What is domain-driven design?
If you read the Forum Acusticum proceedings paper, “Introducing VocalPy”, you’ll see that I name-drop domain-driven design there. But I don’t talk about it a lot, since you only get so much space in a proceedings paper. So please consider this a longer, and much more informal, version of what I wrote there.
For this post to make sense, I have to introduce domain-driven design. I also need to repeat myself a little – I wrote about this on my personal blog, where I was talking about it in reference to another book, Structure and Interpretation of Computer Programs. So if you already read that, you can skip down to the next section.
What is domain-driven design, and why should you care about it? Sometime in 2022-2023, I read Domain-Driven Design by Eric Evans, and I got really excited about it. (You can get it from bookshop.org here, and if you’re feeling dangerous you can probably find a PDF of it on a random GitHub repository.) I ended up reading it because I had been reading Architecture Patterns with Python, and they mentioned it in the introduction. Full disclosure: I have not finished either of these books. In fact, I haven’t finished a lot of books, but that’s maybe the undiagnosed ADHD talking. I did spend a lot of time with the first few chapters of both, though.
If you do nothing else, read the first chapter of Evans’ book, where he relates the story of how he worked with some electrical engineers to design software they would use to design printed circuit boards (AKA PCBs). If you have ever gone through the process of designing software for some real-world domain, I bet it will really resonate with you. Or, you know what? I’ll dare to say that, even if you have only ever written nerdy software tools for the domain of other software nerds, you still might find that the story resonates with you.
It’s an interesting story for a couple of reasons. First of all, you have a feeling that he is almost an anthropologist, going into this unfamiliar tribe of electrical engineers so he can learn their culture. I think this is a familiar feeling for anyone who has tried to translate some real-world domain into software, even if it’s part of a culture they feel like they belong to. Second, you really get a feel for his process. At the beginning, he makes mistakes. He tries to understand their jargon word-for-word. Then he asks them to specify in detail what they think the software should do. Neither of those approaches was ever going to work well. Finally he hits upon the idea of asking them to draw out diagrams of their process and how the software should interact with it. These are simple, rough box-and-arrow sketches, as he shows.

From Domain-Driven Design to VocalPy
I happened to read Evans’ book at the same time that I had been sketching out some initial ideas for the VocalPy library that I develop. You can see some of these sketches here: https://github.com/vocalpy/vocalpy/issues/19

If you were to click through to the library’s docs, you might notice that these bear little resemblance to VocalPy now. I think this is actually a good thing – more on that below. (You might also notice that at the time I was thinking of calling it vocles :facepalm: – this is a very tortured pun, so everyone please clap for me showing enough restraint for once in my life to not deploy a tortured pun.)
I don’t actually remember which came first: these sketches, or me reading the book. I think that I actually drew the sketches first, and had them sitting around on a desk forever, until finally it hit me that I should add them to the repo to document my design process. And then reading this part of Evans’ book really made me think that drawings like this should be integral to the design process. Part of what I want to say here is that you should be doing this, if you’re not already, and what’s more, you should be including it in the docs for your software. And this goes for all software, unless you are literally writing such a boring cookiecutter CRUD app that a so-called Large Language Model can regurgitate it perfectly for you after being “trained” on the actual work of human beings. These drawings should be part of the theory of your scientific software.
How I’ve used domain-driven design when developing VocalPy
Ok, now you have an idea of what domain-driven design is. You might ask yourself, have I done anything with this? Or do I just like pontificating into the void about ideas from computer science and tech books? Even if I haven’t gone back to finish the book and immerse myself in every detail, the core idea has really stuck with me. Going back to that proceedings paper where I first introduced VocalPy, you can see where I included similar schematics.

As I mentioned above, even by the time I got to this first proceedings paper, the design of the library had evolved. But this is a good thing: I did exactly what Evans prescribed, and continued to iterate on the design of the package. Doing so made me realize which parts were actually useful, the ones I wanted to retain in the core. I think sketching things out has also helped me understand why the things I ended up taking out are still useful, just not in the way I had thought at first. The library at first was very focused on the idea of capturing a dataset of specific file types, and then being able to save this dataset in the form of a SQLite file. You can see where I was really focused on treating the dataset as if it were part of an app, like in the architecture book. I do think this is still important, but it is not the core of what the library does – I realized later that the core data types needed to be things like sounds, spectrograms, annotations, the things that a researcher studying animal communication and using bioacoustics would be talking about. So, basically, I did the anthropological exercise, as in Chapter 1 of Evans’ book, but instead of doing it with other people, I started by doing it with the part of my brain that claims to know things about acoustic communication. (I have since engaged with other people who actually know these things and can give me good feedback.)
You can see how I’ve continued this way by looking at the diagram I show at the top of that last post on VocalPy version 0.10.0.
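To make the idea of "core data types named after domain concepts" a little more concrete, here is a minimal sketch in plain Python. To be clear, this is not VocalPy's actual API: the class names, fields, and validation rules below are made up for illustration, and a real library would almost certainly use arrays instead of tuples. The point is only that the types read like the nouns a bioacoustics researcher would use.

```python
from dataclasses import dataclass


# NOTE: hypothetical classes for illustration only, not VocalPy's real API.
@dataclass(frozen=True)
class Sound:
    """A recorded sound: samples plus the rate they were sampled at."""

    samples: tuple[float, ...]
    samplerate: int  # in Hz

    @property
    def duration(self) -> float:
        """Duration of the sound in seconds."""
        return len(self.samples) / self.samplerate


@dataclass(frozen=True)
class Segment:
    """One labeled stretch of a sound, e.g. a single syllable of birdsong."""

    onset: float  # in seconds
    offset: float  # in seconds
    label: str

    def __post_init__(self) -> None:
        # A rule from the domain, stated in the domain's own terms.
        if self.offset <= self.onset:
            raise ValueError("offset must come after onset")


@dataclass(frozen=True)
class Annotation:
    """All the segments that a researcher labeled for one sound."""

    segments: tuple[Segment, ...] = ()

    @property
    def labels(self) -> list[str]:
        """Labels of the segments, in the order they were given."""
        return [segment.label for segment in self.segments]


# One second of silence sampled at 32 kHz, with two labeled segments.
sound = Sound(samples=(0.0,) * 32000, samplerate=32000)
annot = Annotation(segments=(Segment(0.1, 0.25, "a"), Segment(0.3, 0.5, "b")))
print(sound.duration)
print(annot.labels)
```

Notice that nothing here mentions files or databases. In a design like this, saving a dataset becomes something you do with these objects, not the thing the whole library is organized around.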
If domain-driven design is just doodles, then you should put doodles in your docs
Ok, so now let me circle back around and talk about why, even though domain-driven design is not a new idea (as Evans himself acknowledges right at the start of his book), you should be doing it, or doing even more of it. This is where I come back to the “no” part of “Do we already do this? Yes and no.”
Here I want to point to a talk I gave, fittingly titled “VocalPy as a case study of domain-driven design in scientific Python”. (Huge thank you to the DoePy exchange and Don’t Use this Code for inviting me and giving me space to talk through these things with a sympathetic community.)
Part of the reason I want to include this is that in the talk I explain how my attempt to apply domain-driven design has led me to do things that maybe clash with some recommendations for programming in scientific Python.
Compare for example with Gaël Varoquaux’s recommendations in this SciPy 2017 talk:
This might seem like I’m going off topic, but I want to show here that I’m not claiming that domain-driven design is some magic method you can follow to produce correct research code. Please also note that I definitely do not claim to be a smarter or more experienced developer than Gaël Varoquaux. This is just me trying to wrap my head around designing software for different domains, and how to reconcile that with different programming paradigms.
(Yes, this is foreshadowing for another blog post.)
In the discussion at the end of that talk, I said just what I’ve said here: a lot of people react with, “so, yeah, we already do that”. If that’s so, then show me the doodles! Show me your mental model of your domain, and put it in your docs! Let me read it, let me actually see these schematics. Even if they are just doodles, it helps me to know how your thought process evolved. All I can see right now is this insurmountable mountain of code, and I don’t even know where the path starts so I can scale it! I know that there are examples of people doing this, e.g., in the scientific Python community where I spend most of my time, but I think it’s fair to say that this is not the norm. I don’t know that I have ever seen diagrams showing how the design evolved, as part of an iterative development process. I can’t help but feel like that’s exactly the sort of thing that could help people get up to speed on how the code works. I hope I’ve shown what that might look like here.