Thursday 1 March 2012

DRY is about Ideas, not Text

Dear Junior

Ever since the Pragmatic Programmer, the principle of DRY (Don't Repeat Yourself) has been one of the central pieces of advice for good programming. From a private perspective, it has many a time helped me to write code that is better than had I not had that meme in the back of my head. However, I have also seen it misused and (in my understanding) misunderstood several times - sometimes with disastrous result.

What I think is central about DRY is that it refers to ideas. Slightly rephrased DRY can be expressed that "each idea about your software solution should be expressed once and only once in your code". With "idea" I mean stuff like "an order number is a six-digit string not beginning with 0 or 1", or "an order that is cancelled after shipping will still be debited the costs that we cannot recover".

Unfortunately, a common misuse is to let DRY to talk about code as text. Let me make it a little bit more clear though an example. 

Say that we have some kind of order system where order numbers are five-digit numbers. Somewhere we might find this code that checks whether a string is a valid order number or not.

       public static boolean isValidOrderNumber(String orderNumber) {
            return orderNumber.matches("[1-9][0-9]{4}");
        }

The code might be in the Order class, or even inside an OrderNumber value object, and possibly being called by the constructor.

As it is an order system, there will also be a part that talks about shipping, to get the goods to some specified address. The address consists of a lot of things, among those a zip code that in this country is a digit string of length five (initial digit must be non-zero). So, somewhere we can expect to find the code doing this validation.

       public static boolean isValidZipCode(String zip) {
            return zip.matches("[1-9][0-9]{4}");
        }

This code might be in the Address class, or inside the ZipCode value object - possibly called by the constructor.

Now, this is where some DRY-zealots shout "duplication, duplication" because they have pattern-matched the string matching in isValidOrderNumber with the string matching in isValidZipCode and noticed that they consist of exactly the same text (barring a rename of a variable).

I think this is a mistake, and those zealots focus on the wrong thing. 

In the eyes of the zealots, the textual duplication is bad. Instead one method should call the other. This obviously becomes bizarre code if you try it. For example it would create a completely irrelevant coupling between the zip code and the order number - two concepts that are not related at all.

Alternatively, they claim, the common code should be broken out to a separate method called from both places. Now, that might make more sense, but you still introduce a coupling between the un-related concepts order number and zip code: they will now covariant - change one and the other will change.

For example, should this company decide to use letters in their order numbers, then they need to regression test all their use of addresses as well.  

Do this kind of coupling on a massive scale, and you will end up with a system where every change is full of surprises - conceptually totally unrelated concepts start behaving different just because they had some code-snippet in common.

What to do instead?

We should not primarily look at the code as a mass of text. Instead we should look at it as representation of a set of ideas. This is of course the perspective of Domain Driven Design, where we see the code as the encoding of a distilled domain model - a model that captures our selected way of looking at the problem domain.

Now, seen from that perspective the "repeated code text" in the two validation methods is completely unproblematic. It is just two separate concepts that happens to be represented using the same technical mechanism. They have nothing to do with each other - the duplication is a pure coincidence and the two concepts should be able to change totally uncoupled from each other.

The principle of Don't Repeat Yourself is about ideas, it does not apply to text.

Yours

   Dan