Over 10,000 green dots, each representing a passed test, and one red F, indicating a failed test

At my job I've been doing a lot of ETL (Extract, Transform, Load) work. In the past year, this source of business has grown from just a small experiment to a core component of what we do. Now I'm not going to go into detail on what exactly this source of business is — company secrets and what not, you know — but I thought the software practices I use to keep things well oiled and running would be an interesting topic to write about.

So we'll assume that I'm moving around real estate data. I have a couple of coworkers who gather real estate listings from dozens of sources and insert them into a database that all the developers and our applications have access to. It has a standard schema for the things we care about with regard to real estate: location, asking price, number of bedrooms, number of bathrooms, special features like having a pool, etc etc. They are more on the Extract and Transform side of things, while I am more on the Transform and Load side of our data pipeline. I take that data from our internal database and get it uploaded to various APIs. So while they're gathering data from the various sources, I work on making sure the data is uploaded to its final destinations.
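
To make the examples later in this post concrete, here's roughly the shape of a listing record as I'll use it below. The field names and values are invented for this article, not our real schema:

# A hypothetical listing record; fields are illustrative only.
example_listing = {
    "id": 1,
    "title": "Sunny two-bedroom near the park",
    "property_type": "apartment",
    "location": "Springfield, IL",
    "asking_price": 250_000,
    "bedrooms": 2,
    "bathrooms": 1,
    "features": ["pool", "garage"],
}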

The various APIs have different requirements. Some might have hard requirements like only accepting apartments. Others might have soft limitations that are specific to our business needs, such as not uploading house listings that are under a minimum price. It started off simple — as most things do — from one API and one set of rules, then it grew to multiple APIs and several sets of rules. Early on, making code changes wasn't too big of a deal. But as this set of APIs and business rules grew larger and larger, every single push of code changes became more and more nerve-wracking.
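
To give a rough sense of what that looks like in code, here's a small sketch of per-API rules. The rule names and thresholds here are invented for illustration, not our actual requirements:

# Hypothetical per-API rules; names and thresholds are made up.
def accepts_apartments_only(listing: dict) -> bool:
    # Hard requirement from the API: it only takes apartments.
    return listing["property_type"] == "apartment"


def meets_minimum_price(listing: dict, minimum: int = 100_000) -> bool:
    # Soft business rule: skip listings priced under a minimum.
    return listing["asking_price"] >= minimum


API_RULES = {
    "api_a": [accepts_apartments_only],
    "api_b": [accepts_apartments_only, meets_minimum_price],
}

Each new API adds another entry like this, and each new business rule adds another function, which is roughly how the rule count crept up over time.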

It all came to a head when I got pulled into a frantic meeting last October on a Wednesday, a little before I was about to log off for the day. Business metrics were down, a big source of revenue wasn't producing the results we were expecting, and nobody had any idea what was going on. I had pushed a moderate code change two days prior, so I readily offered up that information in hopes that it might explain what was happening, and so we could figure out what was wrong quickly and find a fix. We spent the next 4 hours carefully scrutinizing every little detail of every line change I had made in the past two weeks. I rolled back a couple of deployments to different previous versions to see if that would make any observable difference, but it had no effect.

It turns out, there was nothing wrong with my code at all. A service on the other end of the API that we don't control wasn't working as it normally should, and it was affecting our business. Even though I wasn't at fault this time around, I still wasn't happy that I couldn't be more sure that what I had pushed up to production was entirely correct. That's when I figured I needed to take a better approach while developing this software. I wanted more confidence in the code I push to production, and that meant automated testing. So I started working test-driven development (TDD) into my workflow, and also writing tests for existing code.

I slowly started refactoring the code to make testing easier. To do that, I introduced dependency injection in the places that didn't already have it. And instead of directly using database connectors and API clients, I put them in wrapper classes so writing fakes would be easier. I wrote the wrappers with a few public methods with type hints, and with that I could also write fakes that return data in the same format. Here's a small bit of example code that shows what the project looks like in a very simple form.

from typing import Protocol, Any


class DatabaseProtocol(Protocol):
    def get_listings(self) -> list[dict[str, Any]]: ...


class ApiProtocol(Protocol):
    def upload_listing_batch(self, listings: list[dict[str, Any]]) -> int: ...


class App:
    def __init__(self, database: DatabaseProtocol, api: ApiProtocol):
        self.database = database
        self.api = api

    def run(self):
        # The real service loops over batches; a single pass keeps the example simple.
        listings = self.database.get_listings()
        processed_listings = self.process_listings(listings)
        num_listings_uploaded = self.api.upload_listing_batch(
            processed_listings)

        if num_listings_uploaded == len(processed_listings):
            print(f"Uploaded {num_listings_uploaded}.")
        else:
            print(f"Uploaded {num_listings_uploaded}. Failed to upload "
                  f"{len(processed_listings) - num_listings_uploaded}.")

    def process_listings(self, listings: list[dict[str, Any]]) -> list[dict[str, Any]]:
        # Imagine the real business rules running here; rejecting listings
        # with an empty title is the only one shown.
        return [listing for listing in listings if listing["title"]]

This is a really simple example, but you get the idea, and you can hopefully imagine how much more complicated the processing could be. An example test would then look something like:

def test_rejection_of_listings():
    class MockDatabase:
        def get_listings(self) -> list[dict[str, Any]]:
            return [{"id": 1, "title": "Listing 1"}, {"id": 2, "title": ""}]

    class MockApi:
        def upload_listing_batch(self, listings: list[dict[str, Any]]) -> int:
            self.uploaded_listings = listings
            return len(listings)

    app = App(MockDatabase(), MockApi())
    app.run()

    # Reject the second listing because it has an empty title.
    assert len(app.api.uploaded_listings) == 1
    assert app.api.uploaded_listings[0]["id"] == 1

Since refactoring the dependencies out, injecting them back in, and writing tests with fake dependencies, it's been going pretty well. At the beginning of the new year, the business side came to me with another API to start uploading listings to, along with a set of business rules. And for each API requirement and business rule, I wrote a test to make sure that the data being supplied to the upload object was correct. After sending a couple of test upload samples to the business side, they came back and told me something like "Hey we're only supposed to have a max of 5 special features." So that became another test, with the implementation coming shortly thereafter.
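
As a rough illustration, that "max of 5 special features" rule turned into a test shaped something like the one below. The data is invented, and since the test came first, it fails against the placeholder process_listings above until the rule is actually implemented — which is the point of writing it first:

def test_listings_limited_to_five_special_features():
    class MockDatabase:
        def get_listings(self) -> list[dict[str, Any]]:
            # Seven features on purpose, to exercise the rule.
            return [{"id": 1, "title": "Listing 1",
                     "features": ["pool", "garage", "garden", "balcony",
                                  "fireplace", "sauna", "gym"]}]

    class MockApi:
        def upload_listing_batch(self, listings: list[dict[str, Any]]) -> int:
            self.uploaded_listings = listings
            return len(listings)

    app = App(MockDatabase(), MockApi())
    app.run()

    # Whatever the implementation decides to do (trim or reject),
    # nothing uploaded should carry more than 5 features.
    for listing in app.api.uploaded_listings:
        assert len(listing["features"]) <= 5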

Now obviously it would still be possible to push disastrous changes without tripping tests, depending on how things are set up, but the goal is to at least reduce that possibility as much as I reasonably can. A second point for setting up testing is having something to point to when I get asked "How can we prevent this from happening in the future?" Up until this point it has just been "Uh, well, I'll try and be more careful." And as much as I try to write code that's as close to perfect as I can, nobody is 100% perfect, and mistakes inevitably happen.

I don't have any testing on the wrapper objects. It doesn't really seem beneficial. They're very simple: they mostly do some authentication with wherever the data is coming from or going to, and then they just move the data. If, for example, I write to the wrong field for an API, like writing the zip code into the address field, then I'm not sure how testing could really help besides being an automatic redundant check that doubles the amount of code I write. This works well enough, and I can lighten my cognitive load a bit by not worrying as much about the core processing and just having to scrutinize the code that does the data fetching and uploading. Also, the business side would probably tell me pretty damn fast if I were uploading obviously wrong data like a zip code in the address field. The things that worry me the most are the potential subtle mistakes, mistakes that are hard to notice right away but still have a large negative effect days later, after thousands of listings have already been processed.

For testing, I don't really like mocking classes or functions from libraries I don't control; it feels too hacky and brittle. If whatever library I'm using doesn't already have an easy way of substituting a fake in its place, then I want to wrap it in a class. And I want to keep these wrapper classes pretty lean and to the point. I don't like having too much logic in them, because that makes them harder to fake. For example, if I want to pull data from a database in a certain way, I'll do that in the method and maybe do a little bit of processing if needed, but that's it. I want to be able to take a dump of a few select rows from the tables I want to test, and easily set up my fake wrappers to hand that data to the code under test. Then I can have the fake API upload wrapper save whatever is passed to it, and check that in the tests.
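
As an example of how lean these wrappers stay, here's a sketch of a database wrapper. I'm using sqlite3 and a made-up listings table purely for illustration, since the real connector and schema are beside the point:

import sqlite3
from typing import Any


class Database:
    """Thin wrapper: connect, fetch rows as dicts, nothing else."""

    def __init__(self, path: str):
        self.connection = sqlite3.connect(path)
        self.connection.row_factory = sqlite3.Row

    def get_listings(self) -> list[dict[str, Any]]:
        rows = self.connection.execute(
            "SELECT id, title, asking_price FROM listings").fetchall()
        return [dict(row) for row in rows]

The matching fake is just a class with a get_listings method that returns a canned list of dicts, exactly like MockDatabase in the test above.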

After writing this article I did a little bit of searching on the internet regarding TDD, and I was surprised that it's kind of a partisan issue in the software development community, with some people vehemently against TDD and others vehemently for it. It doesn't really make much sense to me to be so hardcore pro- or anti-TDD. It's like being pro- or anti-screwdriver. TDD is a tool in your toolbox; use it if it makes sense for your project, otherwise don't, obviously. In my case, it's making my development experience much more enjoyable because I can be a little more at ease knowing I'm not breaking API requirements or business rules that I've written tests for. And if I get another message telling me that something incorrect is being uploaded to the APIs, then that's just another test to write.