In this article I'm going to discuss a problem few people think of. Computer simulation of various processes becomes more and more widespread. This technology is wonderful because it allows us to save time and materials which would be otherwise spent on senseless chemical, biological, physical and other kinds of experiments. A computer simulation model of a wing section flow may help significantly reduce the number of prototypes to be tested in a real wind tunnel. Numerical experiments are given more and more trust nowadays. However, dazzled by the triumph of computer simulation, nobody notices the problem of software complexity growth behind it. People treat computer and computer programs just as a means to obtain necessary results. I'm worried that very few know and care about the fact that software size growth leads to a non-linear growth of the number of software bugs. It's dangerous to exploit a computer treating it just as a big calculator. So, that's what I think - I need to share this idea with other people.
At first I intended to entitle this article something like "If programmers can't create medicines, why can medics create programs?" Take some imaginary programmer - he is not allowed to develop and prepare medicines. The reason is obvious: he doesn't have the necessary education for that. However, it's not that simple with programming. It may seem that an imaginary medic who has learned how to program will by default be a successful and useful programmer - especially given that a more or less acceptable programming skill is much easier to master than organic chemistry and principles of medicine preparation.
Here lies a trap. A computer experiment requires as much care as a real one. Lab workers are taught to wash test tubes after experiments and make sure they are sterile. But few really care about the problem of some array accidentally remaining uninitialized.
Programmers are well aware that the more complex software is, the more complicated and subtle bugs occur in it. In other words, I'm speaking of non-linear growth of the number of bugs accompanying the code size growth. Programs performing chemical or any other scientific computations are far from being simple, aren't they? Here's where the danger is. It's OK that a medic-programmer makes mistakes. Any programmer, however skillful, makes them from time to time. What's not OK, people tend to trust these results more and more. You calculate something and go on with your business.
Those engaged in programming as their professional activity know how dangerous this approach is. They know what an undefined behavior is and how a program may only pretend to work well. There are numbers of articles and books explaining how to correctly develop unit tests and ensure the correctness of computations.
Such is the world of programmers. But the world of chemists/physicists/medics is not that way, I'm afraid. They never write a complex program - they simply don't think that way. They use the computer as if it were just a Big Calculator. This comparison was suggested by one of our readers. Let me quote his comment in full here, so that English-speaking readers could learn about it too, once the article is translated.
I can tell you something on this subject from my own experience. Though being a professional programmer, I actually come of a family of physicists and have physics education. At the moment when I had to choose which university to enter, the call of blood was stronger than my faith in the bright future of IT. So, I entered a physics university, rather prestigious on the local scale, which in fact is a "kindergarten" supervised by a large research institute in my native city Nizhny Novgorod. People who know the subject will at once guess which research institute and which university I mean.
While studying there, I quite naturally proved to be one of the best at programming (and mathematical methods of physical modeling in particular). It was at the same time I also figured out the following things:
1. Physicists tend to view the computer as a large multi-functional calculator allowing you to draw a graph of Eta versus Theta with Gamma going to infinity. As one can naturally expect, they are mainly interested in the graph itself, not the program.
2. As a consequence of the first fact, a programmer is not viewed as a profession. A programmer is just the guy who knows how to use the Big Calculator to draw the needed graph. They don't care which way it will be done - at all. Sorry, what did you say? Static analysis? Version control? Oh, come on, guys! C++ is the language of programmers; physicists write in FORTRAN!
3. As a consequence of the previous fact, anyone who's going to devote his life to writing programs to do physical modeling, even all-purpose ones, even tough as hell ones, is but an appendix to the Big Calculator. He's not even a person - just a kind of... By the way, it was not only me treated in such a way by physicists (I was just an ordinary student, after all) - but even the best computer modeling specialist in the research institute who taught a computational methods course at our university and who, when I turned to him as my thesis adviser while writing my term paper, said to me almost straightforward, "They will despise you, so be prepared to tolerate that".
I didn't want to tolerate that and after graduation left the computer modeling area for the field where programmers are not thought to be untermenschen. I hope this example will help you understand why initiatives like introducing static analysis even into relatively large (about 20 or 30 developers) projects on computer modeling are a hopeless job. There simply might not be a person who knows what it is. And if such a person does happen to be in the team, they'll most likely trample him because they don't need no trendy programmer frills of yours. "We've been doing without them for a hundred years - and will do for more."
Here's another story for those not bored yet. My father, though being a pensioner, still works at a very large defense engineering enterprise here, in Nyzhny Novgorod (it's the largest in our city and one of the largest ones across the country; again, those who know the subject will guess it ;) ). He's been programming in FORTRAN for his entire life. He started at the time when punched cards were in use. I don't blame him for not studying C++. It was already too late for him 10 years ago - and he's still goes on pretty well. However, there are certain security precautions at this enterprise 2/3 of the staff of which are engaged in programming one way or another:
1. No Internet. At all. You need literature - you go to the library. Stack Overflow? What's that? If you need to send an e-mail, you have to submit a written request to the boss explaining whom and what for you want to send it. Only a few chosen ones can use the Internet "against a receipt". Thank God, they have an internal network at least.
2. No administration rights on your computer. Perhaps this restriction makes sense for the white-collar mass, but I can't imagine a programmer feeling satisfied with it.
3. (Doesn't relate to the subject; just an illustration.) You can't even bring a cell phone with an integrated camera (have you seen ones without it nowadays?).
As a result, even young employees write code in FORTRAN, while really skilled programmers are very few. I know that for sure because I trained some guy of 25 whom my father had recommended as a promising programmer.
Here's my verdict: they are stuck in the 80-s there. Even given that they have pretty good salaries, I wouldn't go there for the world.
These are just two examples from the intellectual elite's life. I don't mean to discredit anyone - they do their job well enough, but my heart is bleeding as I watch which windmills my father has to fight sometimes. (Thank God, I've managed to persuade him to start using git recently.) No OOP in a million-line project, no static analysis - nothing.
It's just has to do with the human's trait to be very conservative about areas which are not one's strong points.
Ilja Mayzus. The original comment.
The core of this story is the ideology of treating the computer as a Big Calculator. In that case, you don't need to know any more about it than its younger brother, the pocket calculator, deserves. And it's the way it is actually used in many areas. Let's digress for a while and have a look inside the world of physics. Let's see how another theory finds a confirmation. To do this, I'll again have to quote a large extract from Bryan Greene's book "The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory" [1]:
We all huddled around Morrison's computer in the office he and I shared. Aspinwall told Morrison how to bring his program up on the screen and showed us the precise form for the required input. Morrison appropriately formatted the results we had generated the previous night, and we were set to go.
The particular calculation we were performing amounts, roughly speaking, to determining the mass of a certain particle species—a specific vibrational pattern of a string—when moving through a universe whose Calabi-Yau component we had spent all fall identifying. We hoped, in line with the strategy discussed earlier, that this mass would agree identically with a similar calculation done on the Calabi-Yau shape emerging from the space-tearing flop transition. The latter was the relatively easy calculation, and we had completed it weeks before; the answer turned out to be 3, in the particular units we were using. Since we were now doing the purported mirror calculation numerically on a computer, we expected to get something extremely close to but not exactly 3, something like 3.000001 or 2.999999, with the tiny difference arising from rounding errors.
Morrison sat at the computer with his finger hovering over the enter button. With the tension mounting he said, "Here goes," and set the calculation in motion. In a couple of seconds the computer returned its answer: 8.999999. My heart sank. Could it be that space-tearing flop transitions shatter the mirror relation, likely indicating that they cannot actually occur? Almost immediately, though, we all realized that something funny must be going on. If there was a real mismatch in the physics following from the two shapes, it was extremely unlikely that the computer calculation should yield an answer so close to a whole number. If our ideas were wrong, there was no reason in the world to expect anything but a random collection of digits. We had gotten a wrong answer, but one that suggested, perhaps, that we had just made some simple arithmetic error. Aspinwall and I went to the blackboard, and in a moment we found our mistake: we had dropped a factor of 3 in the "simpler" calculation we had done weeks before; the true result was 9. The computer answer was therefore just what we wanted.
Of course, the after-the-fact agreement was only marginally convincing. When you know the answer you want, it is often all too easy to figure out a way of getting it. We needed to do another example. Having already written all of the necessary computer code, this was not hard to do. We calculated another particle mass on the upper Calabi-Yau shape, being careful this time to make no errors. We found the answer: 12. Once again, we huddled around the computer and set it on its way. Seconds later it returned 11.999999. Agreement. We had shown that the supposed mirror is the mirror, and hence space-tearing flop transitions are part of the physics of string theory.
At this I jumped out of my chair and ran an unrestrained victory lap around the office. Morrison beamed from behind the computer. Aspinwall's reaction, though, was rather different. "That's great, but I knew it would work," he calmly said. "And where's my beer?"
I truly believe they are geniuses. But let's imagine for a moment that it were some ordinary students who used this approach to calculate an integral. I don't think programmers would take it seriously then. And what if the program generated 3 right away? The bug would be taken as the final evidence? I think it would clear up later, during a re-check by themselves or their scientist colleagues. Still, the "ideal spherical programmer in vacuum" is scared to death by this fact.
This is how things are in reality. It's not only personal computers used in such a way - it's also cluster systems exploited for scientific computations. And what's most scary, people trust the results produced by programs. In the future we are going to deal with even more computations of this kind, and the price of having software bugs will become steeper too.
Isn't it time to change something?
Yes, nobody can forbid me to stick a plaster on a cut myself; I guess I can recommend some medicine to take when you've caught a cold. But not more than that. I cannot drill a tooth or write out a prescription.
Don't you find it reasonable that developers creating a software system whose responsibility extends beyond certain scope should also confirm their skill?
I know there exist various certifications. But I'm talking of a different thing now. Certification is intended to ensure that program code conforms to certain standards. It partially prevents slopwork, in an indirect way. But the range of areas where certification is a strict requirement is quite narrow. It obviously does not cover the whole set of areas and situations where careless use of the Big Calculator may do much harm.
I guess many of you find my worries too abstract. That's why I suggest examining a few real-life examples. There is the open-source package Trans-Proteomic Pipeline (TPP) designed to solve various tasks in biology. No doubt, it is used - by its developers and perhaps some third-party organizations. I believe that any bug in it is already a potential issue. And does it have bugs? Yes, it does; and still more are appearing. We checked this project one year ago and reported it in the blog-post "Analysis of the Trans-Proteomic Pipeline (TPP) project".
What has changed since then? Nothing. The project is going on to develop and accumulate new bugs. The Big Calculator ideology has won. The developers are not writing a high-quality project with the minimum number of bugs possible. They simply solve their tasks; otherwise they would have reacted in some way to the last year article and considered introducing some static analysis tools. I don't mean they must necessarily choose PVS-Studio; there are numbers of other static code analyzers. The point is that their responsible application goes on collecting most trivial bugs. Let's see what fresh ones they've got.
In the previous article I mentioned incorrect loop conditions. The new package version has them too.
|
|
PVS-Studio's diagnostic message: V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. spectrastpeaklist.cpp 504
In the check "i != this->m_bins->end(), j != other->m_bins->end()", the expression before the comma does not check anything. The ',' operator is used to execute expressions both to the right and to the left of it in the left-to-right order and return the value of the right expression. This is what the correct check should look like:
i != this->m_bins->end() && j != other->m_bins->end()
The same defect can be also found in the following fragments:
This bug won't lead to outputting incorrect computation results - it will cause a crash instead, which is much better. However, it would be strange not to mention these bugs.
|
|
PVS-Studio's diagnostic message: V522 Dereferencing of the null pointer 'pepIndx' might take place. asapcgidisplay2main.cxx 534
The same defect can be also found in the following fragments:
|
|
In this code the analyzer has caught two uncleared arrays at once:
V530 The return value of function 'empty' is required to be utilized. tag.cxx 72
V530 The return value of function 'empty' is required to be utilized. tag.cxx 73
You should call the clear() function instead of empty().
|
|
PVS-Studio's diagnostic message: V603 The object was created but it is not being used. If you wish to call constructor, 'this->ExperimentCycleRecord::ExperimentCycleRecord(....)' should be used. mascotconverter.cxx 101
The ExperimentCycleRecord() constructor does not do what it is intended to; it doesn't initialize anything. The developer might be an excellent chemist, but if he doesn't know how to use the C++ language properly, his computations using uninitialized memory aren't worth a damn. It's like using a dirty test tube.
Instead of calling another constructor, the line "ExperimentCycleRecord(0,0,0,True,False);" creates a temporary object which will be destroyed after that. This error pattern is discussed in detail in the article "Wade not in unknown waters. Part one".
The same defect can be also found in the following fragments:
|
|
PVS-Studio's diagnostic message: V628 It's possible that the line was commented out improperly, thus altering the program's operation logics. interprophetmain.cxx 175
After the 'if' operator, a few lines executing some operations were commented out. As a result, the program logic changed quite differently than expected. The programmer didn't want any actions to be done after executing the condition. Instead, the 'if' operator affects the code below. As a consequence, the tests' output now depends not only on the "testType!=NO_TEST" condition, but on the "getIsInteractiveMode()" condition as well. That is, the test may not test anything. That's why I strongly recommend not to rely fully on one testing methodology alone (for example, TDD).
Misprints are to be found everywhere and all the time. It's not that bad if you get fewer hit points after an explosion in a game than you ought to, because of such a bug. But what do incorrect data mean when computing chemical reactions?
|
|
PVS-Studio's diagnostic message: V519 The 'data->ratio[0]' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 130, 131. asapcgidisplay2main.cxx 131
One and the same variable is by mistake assigned two different values. The correct code is this one:
|
|
This fragment was then copied-and-pasted into other parts of the program:
Comparing signed and unsigned values properly requires some skill. Ordinary calculators don't deal with unsigned values, but the C++ language does.
|
|
PVS-Studio's diagnostic message: V555 The expression 'ppw_ref.size() - have_cluster > 0' will work as 'ppw_ref.size() != have_cluster'. proteinprophet.cpp 6767
The programmer wanted the check "ppw_ref.size() > have_cluster" to be executed. But he got something very different instead.
To make it clearer, let's assume we have the 'size_t' type which is 32-bit. Suppose the function "ppw_ref.size()" returns 10 while the variable have_cluster equals 15. The function ppw_ref.size() returns the unsigned type 'size_t'. According to the C++ rules, the right operand in the subtraction operation must also have the type 'size_t' before subtraction is executed. It's alright for now: we have 10u on the left and 15u on the right.
Here goes the subtraction:
10u - 15u
And here's where we get a trouble. Those very C++ rules tell us that the result of subtraction between two unsigned variables must also be unsigned.
It means that 10u - 15u = FFFFFFFBu. As you know, 4294967291 is greater than 0.
The Big Calculator Riot is successful. Writing a correct theoretical algorithm is only half of the job. You also need to write a correct code.
A similar bug can be found in the following fragment:
|
|
PVS-Studio's diagnostic message: V547 Expression 'b + tau >= 0' is always true. Unsigned type value is always >= 0. spectrastpeaklist.cpp 2058
As you can see, the variable 'tau' takes values within the range [-75, 75]. To avoid array overrun, the check b + tau >= 0 is used. I guess you've already got the point that this check won't work. The variable 'b' has the 'unsigned' modifier. It means that the "b + tau" expression's result is unsigned too. And an unsigned value is always greater than or equal to 0.
|
|
PVS-Studio's diagnostic message: V612 An unconditional 'return' within a loop. residuemass.cxx 1442
There is the 'return' operator inside the loop and it's called in any case. The loop can execute only once, after which the function terminates. It's either a misprint here or some condition is missing before the 'return' operator.
|
|
PVS-Studio's diagnostic message: V636 The 'used_count_ / rts_.size()' expression was implicitly casted from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. rtcalculator.cxx 6406
Since the function returns values of the double type, I find it reasonable to suppose the following.
When the variable 'used_count_' is assigned the value 5 and the function rts_.size() returns 7, the approximate result is 0,714. However, the function getUsedForGradientRate() will return 0 in this case.
The variable 'used_count_' has the 'int' type. The rts_.size() function also returns an 'int' value. An integer division occurs, and the result is obvious: it's zero. Then zero is implicitly cast to double, but it doesn't matter at this point.
To fix the defect, the code should be rewritten in the following way:
return static_cast<double>(used_count_) / rts_.size();
Other defects of this kind:
The function setPepMaxProb() contains a few large similarly looking blocks. In this fragment one can feel that specific smell of the Copy-Paste technique. Using it naturally results in an error. I had to SIGNIFICANTLY abridge the sample text. The bug is very noticeable in the abridged code, but it's almost impossible to see it in the original code. Yeah, it's an advertisement of static analysis tools in general and PVS-Studio in particular.
|
|
V525 The code containing the collection of similar blocks. Check items 'max3', 'max4', 'max5', 'max6', 'max6' in lines 4664, 4690, 4716, 4743, 4770. proteinprophet.cpp 4664
PVS-Studio's diagnostic message: V525 The code containing the collection of similar blocks. Check items 'max3', 'max4', 'max5', 'max6', 'max6' in lines 4664, 4690, 4716, 4743, 4770. proteinprophet.cpp 4664
Unfortunately, the V525 diagnostic produces many false positives and therefore referred to the third-level warnings. But if one overcomes one's laziness and study this class of warnings, one may find numbers of such nice bugs.
|
|
PVS-Studio's diagnostic message: V614 Potentially uninitialized pointer 'pScanIndex' used. sqt2xml.cxx 476
This program may crash at the end if the function rampOpenFile() returns NULL. It's not critical yet unpleasant.
Here's another variable that may remain uninitialized:
|
|
PVS-Studio's diagnostic message: V599 The virtual destructor is not present, although the 'DiscriminantFunction' class contains virtual functions. discrimvalmixturedistr.cxx 206
A number of classes are inherited from the DiscriminantFunction class. For example, such is the class DiscrimValMixtureDistr. Its destructor frees memory; therefore, it's very desirable that you call it. Unfortunately, the DiscriminantFunction class's destructor is not declared as a virtual one - with all the ensuing consequences.
There are numbers of small defects which won't have serious consequences but are still not very pleasant to have in your code. There are also strange fragments, but I can't say for sure if they are incorrect. Here's one of them:
|
|
PVS-Studio's diagnostic message: V607 Ownerless expression 'done_[charge]'. mixturemodel.cxx 1558
What is it? Incomplete code? Or maybe the programmer just wanted to point it out that nothing should be done if the "done_[charge] < 0" condition is true?
And here you are an incorrect way of freeing memory. Any critical consequences are unlikely, but still the code smells.
|
|
PVS-Studio's diagnostic message: V611 The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] pepString;'. pepxfield.cxx 1023
The correct way of doing this is to write "delete [] pepString". There are many other defects of this kind:
And here's an incorrect implementation of the "--" operator. It doesn't seem to be used anywhere, otherwise the bug would quickly reveal itself.
|
|
PVS-Studio's diagnostic message: V524 It is odd that the body of '--' function is fully equivalent to the body of '++' function. charindexedvector.hpp 81
The operators "--" and "++" are implemented in the same way. They must have been copied-and-pasted then:
Let's stop here. It all is not very interesting, and the article is big enough. As usual, I'm urging the developers not to limit themselves to fixing only the mentioned defects. Download and check the project with PVS-Studio yourself. I could have missed many errors. We can even grant you a free registration key for some time.
Unfortunately, the article has appeared a bit tangled. What did the author want to say, after all? I'll try to repeat in a very brief form my ideas I want to share with you.
So, what shall we do?
First of all, I'd like you to realize this problem and tell your colleagues from related areas. It's been clear to programmers for a long time that the software complexity growth and silly mistakes in large projects may easily turn into a source of great harm. On the other hand, those people who treat programming and computers just as a tool don't know that and don't bother to think about it. So, we need to draw their attention to this problem.
Here you are an analogy. Imagine a man who has got him a cudgel and starts hunting some animals. The cudgel in his hands gradually turns into a stone axe, then a sword, and finally a gun. But he still uses it just to stun hares by hitting them on the head. It's not only that this way of using the weapon is absolutely inefficient; it has also become much more dangerous now (he can accidentally shoot himself or his fellow men). Hunters from the "programmers" tribe quickly adapt themselves to these changes. The rest don't have time for that - they are busy hunting hares. After all, it's all about the hares. We need to tell these people that they have to learn, whether they like it or not. It'll improve everyone's life. And waving your gun around is no good.