Ask HN: How to best document large, legacy systems?

8 points by becquerel 3 years ago

I'm a relatively new developer coming into a 10+-year-old business-critical app with the expected share of arcane domain knowledge and unintuitive behaviour.

We have some very high-level overviews of the UI but very little documentation on what the system 'should' do for a given interaction. This means that I and the other devs are constantly second-guessing what the expected behaviour is. Authoritative answers from the business come slowly and piecemeal, usually via word-of-mouth, and aren't recorded in any systematic way.

I've been trying to do my part by documenting what I learn on the development team's personal wiki, but it doesn't feel like enough. Does anyone have experience writing after-the-fact specifications for legacy software? How did you sell it to management? Were there any pitfalls you fell into? Most importantly of all, I guess: was it worth it?

obviouslynotme 3 years ago

You should never trust anyone's answer on what the system does. The system is the only source of truth. Hopefully there is another developer there who can give you answers to broader questions. Answer specific questions yourself using the system and its code.

The first task is to divide and conquer the code base. Every code base has an inherent architecture to it. Those without one are usually small, since it rapidly becomes impossible to grow them past a certain point. In poorly modularized code bases, the first division is usually INPUT/LOAD/PROCESS/STORE/OUTPUT. The main objective is to get a rough map of how data moves through the system by recursively tracing and naming places. Make lots of notes and save them in a greppable text file or folder.

The next task is to tame the beast. You do this by creating or setting up tests. Take typical customer inputs and save outputs. The more of these, the better. You don't have to run the tests constantly. They will take forever most likely. You do however have to run them regularly enough that breaks don't live too long. The reason for this work is so that you have the confidence to make changes without breaking everything.
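A minimal sketch of this kind of record-and-compare test (often called a golden master or characterization test), assuming the legacy behaviour can be called as a function. `legacy_price`, its discount rule, and the snapshot layout are hypothetical stand-ins for illustration, not anything from a real system:

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def legacy_price(order: dict) -> dict:
    """Hypothetical stand-in for a call into the legacy system."""
    total = order["qty"] * order["unit_price"]
    if order["qty"] >= 10:  # made-up rule, the kind you only find by running the code
        total *= 0.9
    return {"total": round(total, 2)}


def check_against_golden(
    system: Callable[[dict], dict], name: str, case: dict, golden_dir: Path
) -> bool:
    """Record the output on the first run; compare against the record afterwards.

    Returns True if the snapshot was just created or the output still matches it.
    """
    golden_dir.mkdir(parents=True, exist_ok=True)
    snapshot = golden_dir / f"{name}.json"
    actual = system(case)
    if not snapshot.exists():
        # First run: whatever the system does today becomes the expectation.
        snapshot.write_text(json.dumps(actual, sort_keys=True, indent=2))
        return True
    expected = json.loads(snapshot.read_text())
    return actual == expected


if __name__ == "__main__":
    golden = Path(tempfile.mkdtemp())
    case = {"qty": 12, "unit_price": 2.5}
    print(check_against_golden(legacy_price, "bulk_order", case, golden))  # records
    print(check_against_golden(legacy_price, "bulk_order", case, golden))  # compares
```

The snapshots only say what the system does today, not what it should do, so a failing comparison means "behaviour changed" and someone still has to decide whether that change was intended.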

While all of this is happening, management will expect you to be doing things. This is where you take out tech debt and make ugly bug fixes on top of the many other ugly bug fixes. As you grow in understanding of the system, you will have a better idea how it could be architected. You will know what code stays the same, what code changes, and what kinds of features are added. Begin spending at least half your time teasing apart the code base into modules that reflect the system and its needs. Management will never approve this. You just have to do it.

tra3 3 years ago

Good answers here already.

I'll throw the C4 model [0] in here. It's a way to visualize how system components connect at various levels.

Unit tests and code will help you understand the system bottom-up. Documenting with C4 will give you an idea of how the system is arranged top-down: the context, the component connections, etc.

I'm partial to C4-PlantUML [1] because you can diagram your system using code; you can version it. You don't have to worry about the layout. It gets out of your way.

[0]: https://c4model.com/

[1]: https://github.com/plantuml-stdlib/C4-PlantUML
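To give a feel for "diagram as code", here is a rough Python sketch that writes out C4-PlantUML source for a system context diagram. It assumes the `Person`, `System` and `Rel` macros from C4-PlantUML and the PlantUML standard-library include `<C4/C4_Context>`. The helper `c4_context_diagram` and all the actors and systems in the example are made up for illustration:

```python
from typing import List, Tuple


def c4_context_diagram(
    title: str,
    people: List[Tuple[str, str]],          # (alias, label)
    systems: List[Tuple[str, str]],         # (alias, label)
    relations: List[Tuple[str, str, str]],  # (from_alias, to_alias, label)
) -> str:
    """Return C4-PlantUML source text for a simple system context diagram."""
    lines = ["@startuml", "!include <C4/C4_Context>", f"title {title}", ""]
    for alias, label in people:
        lines.append(f'Person({alias}, "{label}")')
    for alias, label in systems:
        lines.append(f'System({alias}, "{label}")')
    lines.append("")
    for src, dst, label in relations:
        lines.append(f'Rel({src}, {dst}, "{label}")')
    lines.append("@enduml")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: one user, the legacy app, and one downstream system.
    print(
        c4_context_diagram(
            "Order handling (context)",
            people=[("clerk", "Order clerk")],
            systems=[("legacy", "Legacy order app"), ("billing", "Billing system")],
            relations=[
                ("clerk", "legacy", "Enters orders"),
                ("legacy", "billing", "Sends invoices"),
            ],
        )
    )
```

The output is plain text, so it can live in the repo next to the code and be reviewed and diffed like any other file. Rendering it to an image is left to PlantUML.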

sorokod 3 years ago

Start by familiarizing yourself with the test suite if there is one. If there isn't, write one.

mikewarot 3 years ago

It sounds like you need to build a big list of all the things you're not certain about AND all the assumptions you've already made. Shop that list around to the people who DO have the domain knowledge and ask them to "make sure all this makes sense". Be up-front and humble about your lack of domain knowledge, and you should do just fine.