Molly White
Software engineer, editor and arbitrator on Wikipedia, feminist, Twitter bot commander, unabashed cat lady.


About a month ago, my friend Mark informed me that he was writing lochner, a scraper and DOM parser to get transcripts of U.S. Supreme Court cases from an online collection. Being one of those Wiki(m|p)edians who loves case law, he just wanted to download the text files so he could read them at his leisure. Being one of those Wiki(m|p)edians who loves adding any free content I can get my hands on to the projects (and note these two groups of people are not mutually exclusive), I told him that if he didn’t write a parser to convert them to wikitext and a Wikisource upload tool, I would. He took me up on this, and so brandeis was born.[1]

Brandeis has already proved to be a really fun project. One of my main gripes with Northeastern’s computer engineering program (and consequently, one of my main reasons for changing to a computer science major) was the lack of instruction on how to write good code. At least for the first two years, the code that we write is typically slow, buggy, and ugly as hell. I didn’t like that, in two years of college, I’d yet to write a single test case. Partially because of this complaint, and partially because of my tendency to fix one thing and break another in the Wikisource-to-LaTeX project, I decided that maybe I’d give test-driven development a spin. My coding style tends to follow the “fix one problem here, fix half a problem here, write half a function, fix the other half of that problem from before…” methodology, which quickly becomes problematic and inefficient. TDD forces me instead to think about what function I’m about to add, think about how it might break, and try to break it, all before actually writing the new function. It was somewhat difficult to wrap my head around the concept, but I started out by writing a simple test and quickly got into the flow of it. That first test looks like this (excerpted from tests/, if you’d like to see it in its natural habitat):

def testGoodTitlePlacement(self):
    # Write a well-formed title line to a scratch file.
    with open('buffer.txt', 'w', encoding='utf-8') as self.buffer:
        self.buffer.write(' = Foo v. Bar - Some other stuff = \n')
    v = Validator('buffer.txt')
    try:
        v.validateTitlePlacement()
    except BadTitle:
        self.fail('Validator did not pass a good title.')

It opens a buffer text file, to which it writes ' = Foo v. Bar - Some other stuff = \n'. It then runs the validateTitlePlacement() function on this text file: a simple function that checks whether the first line of the file is some string with a leading and a trailing equals sign. If it is, the test passes. If it’s not (indicating that something has gone horribly wrong with lochner and that brandeis really shouldn’t even try to parse the file), a BadTitle exception is raised and the test fails.
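To make the description above concrete, here is a minimal sketch of what the validator side could look like, along with a second test that tries to break it. This is not the actual brandeis code: the body of Validator, the definition of BadTitle, the regular expression, and the testBadTitlePlacement test are all assumptions made for illustration, based only on the behavior described in this post.

```python
import os
import re
import tempfile
import unittest


class BadTitle(Exception):
    """Assumed exception: the first line is not a '= Title =' line."""


class Validator:
    """Hypothetical stand-in for the brandeis validator; the real class may differ."""

    def __init__(self, filename):
        self.filename = filename

    def validateTitlePlacement(self):
        # Only the first line matters for this check.
        with open(self.filename, 'r', encoding='utf-8') as f:
            first_line = f.readline()
        # Assumed rule: '=', at least one character, '=', with optional
        # whitespace on either side.
        if not re.match(r'^\s*=.+=\s*$', first_line):
            raise BadTitle('First line is not a title: {!r}'.format(first_line))
        return True


class TestTitlePlacement(unittest.TestCase):
    def setUp(self):
        # Use a throwaway file so the tests do not leave buffer.txt behind.
        handle, self.path = tempfile.mkstemp(suffix='.txt')
        os.close(handle)
        self.addCleanup(os.remove, self.path)

    def writeBuffer(self, text):
        with open(self.path, 'w', encoding='utf-8') as f:
            f.write(text)

    def testGoodTitlePlacement(self):
        self.writeBuffer(' = Foo v. Bar - Some other stuff = \n')
        self.assertTrue(Validator(self.path).validateTitlePlacement())

    def testBadTitlePlacement(self):
        # No equals signs at all, so the validator should refuse the file.
        self.writeBuffer('Foo v. Bar - Some other stuff\n')
        with self.assertRaises(BadTitle):
            Validator(self.path).validateTitlePlacement()
```

Saved as a file, both tests can be run with python -m unittest. Pairing the "good" test with a deliberately bad input is the "try to break it" half of the TDD loop: the failing case is written down before the function exists, so the function has to earn both results.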

It was a great feeling to write this test case and the accompanying function, and have the test pass. It’s an even better feeling to be able to run all these tests every so often to help assuage my fears that a stray line of code somewhere has broken all my previous code horribly. And possibly the best feeling so far was when, due to these test cases, I caught a bug that would have potentially remained in the project for a good while.


1. A note on the names: apparently they refer to various parties involved with Muller v. Oregon. Apparently this is hilarious and/or clever, but I am not a member of the case-law-loving-Wikipedian party, and thus don’t actually know why this is.
