Sayed's Blog

Thoughts On Code Duplication

Posted October 10th, 2022

One of the first things programmers are taught is that code duplication is bad.

There's good reason for this advice, but at some point it's not enough, and can even lead to more problems.

In this article, I'll explain why code duplication is bad, as well as the problems caused by avoiding it in bad ways, and discuss good ways to deal with it.

Why Is Code Duplication So Common?

When people first learn to program, they aren't very confident using things like functions, classes, and inheritance.

So they often do things the hard way, copying and pasting their own code rather than using more advanced code organisation techniques.

Even when programmers become more experienced, dealing with code duplication often involves committing to decisions that have large effects on a code base. If more than one file is using the same code, then you have to decide where to move it to and what to name it. That's an extra decision that is seen as a distraction from the immediate goal of getting the code to work.

What's So Bad About Code Duplication?

Code duplication can be a fast way to get something working. Copying and pasting limits the need for extra writing and thinking. But it comes back to bite later on.

The trouble shows up when a programmer has to change something. Maybe they made a mistake that they copied multiple times. Or the requirements have changed. Chances are they will miss at least one of the copies, and this is a common source of bugs.
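To make the missed-copy problem concrete, here is a minimal sketch. The function names and the discount rule are invented for illustration, and amounts are in pence so the arithmetic stays in integers. The rule was pasted into two places, and only one copy was updated when the requirement changed.

```python
# Hypothetical example: the same discount rule pasted into two functions.
def cart_total(prices):
    total = sum(prices)
    if total > 10000:
        total = total * 90 // 100  # 10% off, updated after a requirement change
    return total


def invoice_total(prices):
    total = sum(prices)
    if total > 10000:
        total = total * 95 // 100  # stale copy: still applies the old 5% discount
    return total


# Deduplicated: the rule lives in one place, so a change cannot miss a copy.
def apply_discount(total):
    return total * 90 // 100 if total > 10000 else total


def cart_total_fixed(prices):
    return apply_discount(sum(prices))


def invoice_total_fixed(prices):
    return apply_discount(sum(prices))
```

With the pasted version, the cart and the invoice disagree about the same order. With the shared helper, they cannot.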

Another consequence of copying and pasting instead of extracting is that code spans multiple levels of abstraction, and one function can be hundreds of lines long and next to impossible for someone else to understand.

An analogy I like to make for this is writing - specifically fiction.

Imagine a novel where every word is replaced with its dictionary definition. And where every word in that dictionary definition is also expanded into its definition, and so on.

Where every character and place is described over and over again, rather than once and then referred to by its name.

A book like that would be unreadable. It would be very clunky, and the plot would be impossible to decipher.

Duplicated effort also goes against the spirit of programming - letting the computer do most of the work.

Because of this, more experienced developers heavily emphasise avoiding code duplication.

When Avoiding Code Duplication Can Go Too Far

Sometimes two pieces of code are only incidentally similar, at one point in time. If someone eliminates the duplication anyway, the requirements can later change so that the two features are no longer supposed to be so similar. When that happens, the shared code is modified to handle each separate use case, and its callers have to pass in more and more arguments to describe which use case they want. If you extrapolate this to the limit, you end up with a programming language: able to do anything a computer can do, provided you supply a lot of configuration.
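A small sketch of how this tends to look in practice. Everything here is hypothetical: a name-formatting helper that started out shared between a badge printer and a citation generator, and grew a flag each time the two drifted apart.

```python
# Hypothetical shared helper after several rounds of diverging requirements.
def format_name(first, last, *, last_first=False, upper=False, initial_only=False):
    if initial_only:
        first = first[0] + "."
    name = f"{last}, {first}" if last_first else f"{first} {last}"
    return name.upper() if upper else name


# Each caller now has to describe its use case through flags.
badge = format_name("Ada", "Lovelace", upper=True)
citation = format_name("Ada", "Lovelace", last_first=True, initial_only=True)


# The alternative: two small functions that are free to diverge.
def badge_name(first, last):
    return f"{first} {last}".upper()


def citation_name(first, last):
    return f"{last}, {first[0]}."
```

The two small functions repeat a little string formatting, but each one can be read and changed without thinking about the other.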

Changing one thing now requires following the chains of abstraction, and it becomes difficult to change things so that only the intended thing is affected.

Understanding how the code works often requires reading how many different pieces of code work, pieces that are becoming less and less relevant.

The effort required to maintain and customise an abstraction begins to exceed the effort required to implement it from scratch.

To carry on the writing analogy, imagine if the writer tried to avoid using the same word more than once in the book. If two characters shared certain physical traits, then when the second character is introduced they're described as being "like the first character but they have blue eyes instead of brown eyes, and they're tall, and they're wearing green".

And if the author changes their mind and makes the second character differ even more from the first, they stick to the same point of reference but pile on more differences.

A book like this would also be unreadable.

The similarities between the different characters and places in the story are superficial; they aren't actually the same. It's not repeating yourself if two people have the same hair colour. It's repeating yourself if you describe the same person more than once.

Ways Of Dealing With Code Duplication

Develop A Smell For Duplicated Code

Before you can decide how to deal with code duplication, you have to notice it, and duplicating code should make you uneasy. It should feel like your job isn't complete until you remove it. You should still be able to decide to keep it, at least temporarily, but it shouldn't be a normal event. This is why the DRY (Don't Repeat Yourself) principle is instilled so heavily: that unease goes a long way towards reducing instances of code duplication.

Don't Be A One Trick Pony

Constructs like functions, classes, closures, higher-order functions, and inheritance are useful ways of organising code and dealing with code duplication.

But a lot of issues around organisation and code duplication result from people automatically reaching for inheritance, for example, because it's the only thing they know.

Instead, be familiar with many different constructs, and understand the situations where they best apply.

These different constructs make different assumptions about how concepts relate to each other.

For example, inheritance assumes "is-a" relationships, whilst fields and instances assume "has-a" relationships.

Different constructs and patterns assume different relationships between concepts, and the programmer has to match these concepts to the domain they're working with.

Are these two pieces of functionality really the same concept? Or is it just a temporary coincidence, something that could change when the requirements change?

A chair and a human both have legs, but it's probably not a good idea to make them extend "LeggedObject".
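Here is a rough sketch of the "has-a" alternative, with made-up class names and methods. Both classes hold a Legs value, but neither is forced into a shared parent just because they happen to have legs.

```python
from dataclasses import dataclass


@dataclass
class Legs:
    count: int


@dataclass
class Chair:
    legs: Legs  # a chair has legs

    def is_stable(self):
        return self.legs.count >= 3


@dataclass
class Human:
    legs: Legs  # a human has legs too, but shares no parent with Chair

    def can_walk(self):
        return self.legs.count >= 2
```

If the requirements for chairs change, say to support pedestal bases, nothing about Human has to be reconsidered.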

Deferred Commitment

Before deciding to automate or deduplicate, do something manually a few times to make sure you understand what the commonalities are.

Once the code is working and the patterns are clear, you can deduplicate.

Before committing the code, refactor it to remove the duplication.

Extract First

I think that extraction is more important than deduplication. By extraction, I mean moving a piece of code elsewhere so that a unit of code only deals with one level of abstraction.

I think that if you take a piece of code that appears in four places and extract each copy into its own similar function, the duplication becomes easier to identify, and the code becomes easier to change even if you keep the duplication, because you only have one thing to focus on at a time.
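As an illustration of what I mean by extraction, here is a minimal sketch with hypothetical function names and a made-up "name,amount" line format. The first function mixes filtering, parsing, and reporting. The second version moves the low-level details into helpers so the top-level function reads at a single level of abstraction.

```python
# Before: filtering, parsing, and reporting all in one place.
def summarise_before(raw_lines):
    total = 0
    for line in raw_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _name, amount = line.split(",")
        total += int(amount)
    return f"Total: {total}"


# After: each helper deals with one detail.
def is_data_line(line):
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def parse_amount(line):
    _name, amount = line.strip().split(",")
    return int(amount)


def summarise(raw_lines):
    amounts = [parse_amount(line) for line in raw_lines if is_data_line(line)]
    return f"Total: {sum(amounts)}"
```

Nothing has been deduplicated here, but if another report needs the same parsing, the overlap is now easy to spot.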

Use Linting Tools To Identify Duplication

Linting tools often include copy-and-paste detectors that measure the amount of code duplication and tell you where to find it. This is a good way to learn how serious a problem code duplication is within your codebase, and where to focus your effort. They will also highlight places that you missed. You can configure such a tool to prevent you from committing if copy-and-pasting goes beyond a certain level, or you could make it warn you as you're editing.
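To give a feel for the idea, here is a toy sketch that flags repeated runs of identical lines. It is only an illustration under simplifying assumptions: real detectors typically compare tokens or syntax trees rather than raw text, and the function name and window size here are my own invention.

```python
from collections import defaultdict


def find_duplicate_blocks(source, window=3):
    """Return {block_text: [start_line, ...]} for blocks that appear more than once."""
    lines = [line.strip() for line in source.splitlines()]
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        block = lines[i:i + window]
        if all(block):  # skip windows that contain a blank line
            seen["\n".join(block)].append(i + 1)  # 1-based line numbers
    return {text: starts for text, starts in seen.items() if len(starts) > 1}
```

Calling it on a file's contents returns each repeated block along with the line numbers where it starts, which is the kind of report a real tool would give you in a more robust form.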

Conclusion

Code duplication is generally a bad thing, but it must be handled with care. By understanding the tools and constructs available to you, as well as the problem at hand, you can minimise code duplication without making your codebase difficult to work with.




© 2023