The separation of design and construction into phases is a hangover from civil engineering. It carries the baked-in assumption that the design phase is relatively cheap, short, and somewhat unpredictable, while the construction phase is expensive, long, and predictable. The root problem is the assumption that specifications can be validated for correctness the way a blueprint for a bridge can. Nothing could be further from the truth. This is a persistent myth in software development.
Yes, it is often (mostly?) true, but not always. In the book I've tried not to give the impression that there is only one way to do things. Even so, a few readers have commented that they got that impression.
Also, from a practical standpoint, both the design and construction chapters are huge, so combining them doesn't seem to be a good idea. Perhaps I could add your warning though, if you don't mind me quoting you.
I upvoted your comment even though (as a generalization) I disagree that these chapters should even exist as such. If we're truly honest about the civil engineering metaphor as it applies to software, the design of a system is indeed the source code. The construction phase is done by the compiler. So your construction chapter should be about compilers and interpreters, and your design chapter should be about the explication of the stakeholder's intent in the form of human-readable code: the source code itself. What do you think?
Hmm, you've blown my mind. I've used traditional metaphors to describe and organize, such as the SDLC and topics like construction as defined by books like Code Complete. In other words, about/around these topics, not strictly limited to their concrete form or process.
I suppose it isn't the only way to look at things, just common. Though we are approaching the philosophical realm at this point… reminds me of the section of Philosophy class where you learn to question if you can even trust your own senses. I'm not sure, however this is a book for beginners, and I sit on the shoulders of those that came before. Not sure I'm qualified to reimagine software engineering from the ground up as you describe. If you write that book, I'd read it!
Try Jack Reeves from 93: http://www.mikadosoftware.com/articles/thecodeisthedesign
I am also writing a "this much I know" style book, and would be very interested to hear your trials and tribulations
I am very impressed you got from pile of notes to finished product - bravo.
Thanks, but don't do it! Unless writing comes to you easily? The most depressing thing is "finishing" the book and then having to spend a month hacking CSS and build scripts to work around Kindle bugs and the fascists at the iBooks store.
> the design phase is relatively cheap
In my world (enterprise software), the design phase is usually the most expensive phase, as it tends to be staffed with expensive architect/designer/technical lead-level folks.
> The separation of design and construction into phases is a hangover from civil engineering.
Yes and it is a pretty good principle. The last thing you want in a software project is a design which changes all the time.
Except a design that doesn't change after a certain arbitrary phase of a project simply guarantees your best outcome will be to deliver what the customer said they wanted once upon a time.
> The root problem is the assumption that specifications can be validated for correctness, like a blueprint for a bridge can.
We can validate specifications for correctness with automated theorem proving! The least we can do is model-checking. At least for the hard algorithms and core system interactions. We've had this ability for decades.
The problem with software development is that we get all hand-wavy about "architecture," and "design." I try not to physically groan when someone starts sketching out a box diagram. They're fancy pictures, sure, and useful at a conceptual level... but the reality is that the software will look nothing like that diagram. And there's no way to check that the diagram is even correct! Useless!
It doesn't have to be painfully slow, burdensome, and expensive to employ these tools either. Contrary to popular belief it doesn't take a team of highly trained PhD physicists to build a formal mathematics for modelling computations these days. There are plenty of open-source tools for using languages like TLA+ that work great besides! Ask Amazon -- they published a paper about it.
Watch Leslie Lamport - Who Builds a Skyscraper without Drawing Blueprints?: https://www.youtube.com/watch?v=iCRqE59VXT0
I draw pictures because I think in pictures.
I just took a short look at TLA+ and read a tutorial.
Don't think I could use it without drawing a picture.
Drawing a picture is a perfectly valid and legitimate start. However I think there are many problems where a picture is not sufficient and it's necessary to have good, reliable mathematics to sketch out our designs and prove properties of them.
A sketch is still useful in the absence of anything else. But we don't build bridges and skyscrapers that way, so why is it good enough for our data centres and applications?
I don't think lives have to be on the line in order for software to be considered harmful if not built correctly. There are plenty of problems where having a thoroughly checked design will save you plenty of headaches and afford you many more opportunities.
> However I think there are many problems where a picture is not sufficient and it's necessary to have good, reliable mathematics to sketch out our designs and prove properties of them.
It's necessary to have a formal language to describe this design. Such a formal language needs to have one and only one interpretation. We should be able to take a document written with that formal language and verify that it meets our assumptions. We should also be able to verify that the final implementation is isomorphic to the design.
Luckily, such tools are ubiquitous. You can use whatever programming language you like to specify your design. In the case where the programming language is poorly defined, you must also decide ahead of time which build systems and runtime environments you will support.
You can document your assumptions using the same programming language. This is made easier if you use a "test framework", but if needed you can also roll your own. You can validate your design by running the tests against your design. Normally it is best to break up your design into "units" that make it more clear how to validate your assumptions. You can then add some extra validation that the composition of "units" transitively validates your assumptions.
You can verify that your final product adheres to that design because there will be a 1:1 mapping from the design to the final product (i.e. the final product will be implemented using the same code as your design). There are wonderful tools that will tell you the "coverage" of code in the final product that has been validated against the assumptions.
Finally, you can even specify your requirements using the same formal language and verify that the design meets the requirements. Normally you do this by validating that the "integration" of "units" meets your assumptions. This should not be done without individually validating the assumptions on the "units", though, because the "integration" can lead to exponentially growing alternatives. Normally it is infeasible to validate each alternative.
Yes, this reply is tongue in cheek, but I am not ignorant of formal specification methods. They have their place. That place is currently not in a professional software development shop. We have better methods at the moment. Possibly formal specifications methods will improve to the point where we can reasonably use them, but we aren't there yet.
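Tongue in cheek or not, the workflow described in that comment can be made concrete. Below is a minimal Python sketch where a hypothetical `BoundedQueue` stands in for the "design," and plain assertions stand in for a test framework; every name here is invented for illustration, not taken from any real codebase.

```python
from collections import deque

class BoundedQueue:
    """The 'design,' expressed directly as source code."""
    def __init__(self, capacity):
        assert capacity > 0, "precondition: capacity must be positive"
        self.capacity = capacity
        self._items = deque()

    def put(self, item):
        assert len(self._items) < self.capacity, "precondition: queue not full"
        self._items.append(item)

    def get(self):
        assert self._items, "precondition: queue not empty"
        return self._items.popleft()

# "Unit"-level validation of the design's assumptions.
def test_fifo_order():
    q = BoundedQueue(2)
    q.put("a"); q.put("b")
    assert q.get() == "a" and q.get() == "b"  # FIFO ordering holds

def test_capacity_enforced():
    q = BoundedQueue(1)
    q.put("a")
    try:
        q.put("b")
        raise RuntimeError("expected the capacity precondition to fire")
    except AssertionError as e:
        assert "not full" in str(e)  # the documented assumption was checked

test_fifo_order()
test_capacity_enforced()
print("all design assumptions validated")
```

The "coverage" step from the comment would then be a matter of running a coverage tool over these tests against the same code that ships.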
> Possibly formal specifications methods will improve to the point where we can reasonably use them, but we aren't there yet.
I disagree. Amazon has had great success employing TLA+ in finding bugs, testing design changes, and chasing aggressive optimizations [0].
Perhaps it is because there are myths still floating around regarding formal methods that make developers cringe when they hear mention of them [1].
Nonetheless, I couldn't find a reference to it in the book... did I miss it?
And besides... unit tests, I'm sure you are aware, aren't good enough alone. They can only increase your confidence in an implementation but they prove nothing.
If we want to start calling ourselves engineers I think we better start looking to how the other engineering disciplines operate. I don't think it will be long before an insurance company comes along and finds actuaries capable of monitoring risk in software development projects and liability becomes a necessary concern.
[0] http://research.microsoft.com/en-us/um/people/lamport/tla/fo...
[1] http://www.fm4industry.org/index.php/G-HM-2
> I disagree. Amazon has had great success employing TLA+ in finding bugs, testing design changes, and chasing aggressive optimizations [0].
I googled TLA+ examples and tutorials, and I'm wondering: how would I apply this to an already existing application?
I look for the critical bottlenecks in the application where the cost of failure or mistakes is fairly high. This is where I feel I will get the most "bang for the buck" so to speak.
You don't have to specify the entire application to get the benefit of high level specifications. Even specifying the protocol between two communicating channels can bring benefits.
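That protocol-level benefit can be demonstrated even without TLA+ itself. The sketch below hand-rolls a tiny explicit-state checker (in the spirit of TLC, TLA+'s model checker) for an alternating-bit-style stop-and-wait protocol between a sender and a receiver. The protocol encoding, the channel bounds, and all names are illustrative assumptions, not TLA+ syntax or any real system.

```python
from collections import deque

MSGS = ["m0", "m1"]  # messages the sender must deliver, in order

def initial_state():
    # (next index to send, sender's seq bit, msg channel, ack channel,
    #  delivered messages, receiver's expected bit)
    return (0, 0, (), (), (), 0)

def successors(s):
    to_send, bit, ch, ack, delivered, rbit = s
    out = []
    # sender (re)transmits the current message; channel bounded at 2 entries
    if to_send < len(MSGS) and len(ch) < 2:
        out.append((to_send, bit, ch + ((MSGS[to_send], bit),), ack, delivered, rbit))
    # sender reads an ack: matching -> advance and flip bit, stale -> discard
    if ack:
        if ack[0] == bit:
            out.append((to_send + 1, 1 - bit, ch, ack[1:], delivered, rbit))
        else:
            out.append((to_send, bit, ch, ack[1:], delivered, rbit))
    # receiver reads a message and acknowledges its bit
    if ch and len(ack) < 2:
        msg, mbit = ch[0]
        if mbit == rbit:  # new message: deliver it and flip the expected bit
            out.append((to_send, bit, ch[1:], ack + (mbit,), delivered + (msg,), 1 - rbit))
        else:             # duplicate: just re-acknowledge, never redeliver
            out.append((to_send, bit, ch[1:], ack + (mbit,), delivered, rbit))
    return out

def invariant(s):
    delivered = s[4]  # safety: deliveries are an in-order, duplicate-free prefix
    return list(delivered) == MSGS[:len(delivered)]

# breadth-first exploration of every reachable state
seen = {initial_state()}
frontier = deque([initial_state()])
while frontier:
    s = frontier.popleft()
    assert invariant(s), f"invariant violated in state {s}"
    for nxt in successors(s):
        if nxt not in seen:
            seen.add(nxt)
            frontier.append(nxt)
print(f"invariant holds in all {len(seen)} reachable states")
```

Real checkers add fairness, liveness, and symmetry reduction on top of this, but the core idea is exactly this exhaustive walk.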
I had to think about it for a bit, but you are right. My statement was waaay too general. Especially for communication protocols, formal specifications are useful now. I still think we have a long way to go before we will be using tools like this every day. The main issue is whether it is easier to reason about the correctness of something by inspecting its design or its implementation. We can never prove that the design is correct, only that it is isomorphic to the requirements and the implementation. If there is only one document, the implementation, then the proof of isomorphism is trivial. That was my point. I personally believe that user-facing behaviour is easier and clearer to express with tests (either integration or unit), and I really don't expect that to change in the near future.
Anyway, I regret the tone of my previous message, which mostly made me look foolish, and thank you for your kind response.
> there are many problems where a picture is not sufficient
like there are many problems where text is not sufficient
like there are many problems where code is not sufficient
like there are many problems where documentation is not sufficient
like there are many problems where using the right tool for the job is not sufficient
Statecharts. Processing. UML model-driven development.
I was under the impression UML gave up the ghost(?)
There are literally millions of dollars in commercial and academic activity around it. Maybe tens of millions. It's not dead yet, despite me wishing it never had life to begin with, haha.
None of which can be validated for correctness.
Really? First results on Google:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.377....
http://researcher.watson.ibm.com/researcher/files/il-RONENL/...
I've seen plenty more work on UML as academics & commercial vendors are still all over it. I couldn't find an example for "Processing" because they picked the stupidest name possible: a word so overused I'm getting results from the food industry, compilers, IRS, and computers all at once.
When you can build a system from UML, let's talk about that "formal provability" again.
UML specs them at an abstract level. You build them in code or HDLs. Few formal methods can do both. Generally, you prove properties in the abstract in one and correspondence with the other. Code-level proofs only came online in the past 5-10 years or so, in a subfield 40 years old.
So, UML would let you specify data and behavior, then confirm properties about them, catch inconsistencies in requirements, or aid integrations. SysML is used for this in industry, with verification results in academia even for UML, as I showed. So it's reality rather than theory, even if you or I think better methods exist. I'll take a combo of Z, B, CSP, and/or Statecharts over UML any day. Coq and HOL if I were specialist enough.
Not saying you're wrong: at the end of the day, the degree to which any of these methods (UML, TLA+, whatever) formally proves correctness is the same degree to which it gains the features of a Turing-complete programming language. Which comes back to the notion that the design is the source code itself and represents the majority of the work. The compiler or interpreter is the "construction" phase of the project.
Formal validation is perfect for when you know exactly what problem you're solving. Unless a business is offering a very specific solution like that, it won't help them. It makes sense if you're writing security software or low-level system controls or something. But if your customers aren't actually engineers themselves, formal validation is unnecessary and inefficient.
>There are plenty of open-source tools for using languages like TLA+ that work great besides!
Does your team know TLA+? Is it an efficient use of their time to learn it? Are a bunch of TLA+ beginners going to crank out properly written software?
> Formal validation is perfect for when you know exactly what problem you're solving.
Or you're at the point where formal validation is a requirement.
There's nothing stopping one from using any of these tools as design tools. I'm working on a problem: checking whether, with a single producer delivering items onto a FIFO queue and N workers consuming them, every work item is guaranteed to eventually be attended to. I could just write the code for that, but I'll never prove the system works that way. The best one can do is gain confidence that it works for the prescribed scenarios. You can get good coverage and use all of the tools we know to release something others may decide to use... but then you'll be fixing your errors after the fact, when they are discovered and reported by your users. Or, in the case of a system as large as AWS, you may find that obscurity is no longer a comforting buffer... 1 in 100000 becomes a frequent occurrence.
update: There's nothing preventing you from only using formal methods on the critical parts of your system where a high degree of reliability is useful. One does not need to formally verify everything to gain the benefits.
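For small bounds, the producer/worker question above can at least be explored exhaustively in plain code. The sketch below enumerates every interleaving of one producer and a pool of interchangeable workers on a FIFO queue, for an illustrative bound of three items; this is bounded checking of "every item is eventually attended to," not the unbounded proof a tool like TLA+ aims at.

```python
M = 3  # number of items the producer enqueues (small illustrative bound)

def explore(produced, queue, done):
    # produced: count enqueued so far; queue: pending items (FIFO tuple);
    # done: set of item ids already handled by some worker
    moves = []
    if produced < M:                 # producer step: enqueue the next item
        moves.append((produced + 1, queue + (produced,), done))
    if queue:                        # worker step: take the head of the queue
        # workers are interchangeable here, so one "some worker dequeues"
        # move stands in for all N of them
        moves.append((produced, queue[1:], done | {queue[0]}))
    if not moves:                    # maximal run: nothing left to do
        assert done == set(range(M)), f"left unattended: {set(range(M)) - done}"
        return 1
    return sum(explore(*m) for m in moves)

runs = explore(0, (), set())
print(f"{runs} maximal interleavings checked; every item was attended to")
```

The assertion at each maximal run is the property in question; blowing the bounds up is exactly where exhaustive enumeration stops scaling and model checkers earn their keep.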
> But if your customers aren't actually engineers themselves, formal validation is unnecessary and inefficient.
I think it depends on the difficulty of the problems you address with your software. If you're just sorting some lists or making system calls you might not want to bother. However if you want to guarantee consensus in the face of delays and partitions you'll need more than code to make any strong claims of efficacy. And if the public interest relies on your system I think it's necessary to serve them in the best capacity using the state of the art tools.
> Does your team know TLA+? Is it an efficient use of their time to learn it?
No. They could pick it up in a couple of weeks if necessary. It is an efficient use of their time: fewer bugs at scale means less downtime and less time spent chasing errors that slipped through.
> Are a bunch of TLA+ beginners going to crank out properly written software?
Are a bunch of beginner programmers going to crank out properly written software? There are different levels of experience and skill on any team. With training and diligence, even beginners can learn to recognize when and where to use high-level specifications, and how to abstract systems into mathematical models they can test and prove.
Until then, I suppose we have to live in a world where security breaches are common and the recall rates on cars continue to increase as unreliable software causes failures, costs lives, etc.
> I could just write the code for that but I'll never prove the system works that way.
Or you could just prove from first principles, like we all learn in formal algorithm design theory.
The question is not whether proofs are valuable, the question is whether translating a system into TLA and using that proof checker is more reliable and saves you time over just attempting a proof directly.
Even if your TLA proof checks out, it may be a false positive because it doesn't accurately reflect your production code.
By all means write your proof whichever way you see fit.
What TLA+ and other such languages give you is an automated theorem prover or model checker, usually built on some form of temporal logic and predicate calculus. This is especially good at modelling multiple communicating processes. Taking a logical approach to discrete maths has a few benefits here.
> Even if your TLA proof checks out, it may be a false positive because it doesn't accurately reflect your production code.
This is something one needs to be concerned with when writing high-level, abstract specifications. There's no way yet that I know of where we can synthesize the program from the specification although I am aware of research in that regard. However we can still gain the benefit of well-defined invariants and pre/post-conditions that we can use in testing our implementation. For now your implementation will be separate from your spec but you gain insights from the spec you would not otherwise if you had only written code.
update: as to whether it is an efficient use of time to use a model checker or automated theorem prover... well, I think the reason for their invention was specifically to handle the tedious task of proofs in formal maths. Some operations are tedious and mechanical, and computers are better suited to them than humans.
One area I find interesting is languages with dependent type systems like Agda and Idris. It seems like we're not too far away from being able to model and prove our specifications directly in the type system alongside the program that implements it.
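To make the earlier point about invariants and pre/post-conditions concrete: below is a hedged sketch in which a hypothetical transfer operation carries its spec's precondition ("no overdraft") and invariant ("total balance is conserved") as executable assertions. The function and all names are invented for illustration; the pattern is what matters, not the domain.

```python
def transfer(accounts, src, dst, amount):
    """Move `amount` from account `src` to account `dst` in place."""
    # precondition, lifted straight from the (hypothetical) spec
    assert 0 < amount <= accounts[src], "spec precondition: no overdraft"
    total_before = sum(accounts.values())
    accounts[src] -= amount
    accounts[dst] += amount
    # postcondition / invariant, also lifted from the spec
    assert sum(accounts.values()) == total_before, "spec invariant: conservation"

accounts = {"a": 100, "b": 50}
transfer(accounts, "a", "b", 30)
print(accounts)  # → {'a': 70, 'b': 80}
```

Run with assertions enabled, the implementation continuously checks itself against the spec's conditions, which is a cheap way to keep spec and code from drifting apart while they remain separate documents.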
They're good even for exploration if you're using formal specs. The reason is that good ones knock out errors due to English ambiguity and inconsistencies. Z, the Navy's SCR, the B method, and Statecharts were all used for these purposes with success. Executable specifications did even better on the exploration front.
Current research is taking it further with code generation from specs like AADL, UML, and especially SCADE/Esterel.