An illustration of a fly swatter.
Debugging is inevitable because defects are inevitable
Chapter 14

Debugging

by Amy J. Ko

Despite all of your hard work at design, implementation, and verification, your software has failed. Somewhere in its implementation there’s a line of code, or multiple lines of code, that, given a particular set of inputs, causes the program to fail.  Debugging  is the activity of finding those causes and identifying changes to code that will prevent those failures. Of course, because defects are inevitable in code that human developers write, debugging is no niche process in software engineering: it is a central, critical, and challenging activity that is part of nearly all aspects of creating software.

Before we can talk about debugging, it’s first important to consider what counts as a “bug”. This term, which according to computer science mythology, first emerged from actual insects finding their way into the vacuum tubes of mainframes, is actually quite vague. Is the “bug” the incorrect code? Is it the faulty behavior that occurs at runtime when that incorrect code executes? Is it the problem that occurs in the world when the software misbehaves, such as a program crashing, or an operating system hanging? “ Bug ” actually refers to all of things things, which makes it a colloquial, but imprecise term.

To clarify things, consider four definitions 6 6

Amy J. Ko, Brad A. Myers (2005). A framework and methodology for studying the causes of software errors in programming systems. Journal of Visual Languages & Computing.

.

To begin, let’s consider  program behavior , which we will define as any program output, at either a point in time, or over time, that is perceived or processed by a person or other software. Behavior, in this sense, is what we see programs do: they crash, hang, retrieve incorrect information, show error codes, compute something incorrectly, exhibit incomprehensible behavior, and so on. Program behavior is what  requirements  attempt to constraint (e.g., “the program should always finish in less than one second” is a statement about the program’s behavior over time.)

Given this definition of behavior, we can then define a  defect  is some set of program fragments that may cause program behavior that is inconsistent a program’s requirements. Note that this definition actually has some non-obvious implications. First, defects do not necessarily cause problems; many defects may actually never be executed, or never executed with inputs that cause a program to misbehave. Second, defects can only be defined as such to the extent that requirements are clear. If you haven’t written those requirements down in an unambiguous way, there will be debate about whether something is defect. Take, for example, a web application that has SQL injection security vulnerabilities, but the for the purpose of learning how to identify such vulnerabilities. Those aren’t defects because they are there intentionally.

fault  is a program state caused by a defect that may result in a program behavior inconsistent with a program’s requirements. For example, imagine a program that is supposed to count from 1 to 10 using a variable to track and increment the current number, but with a defect that causes it to start at 0. The fault is the value of that variable when it is set to 0. When it is set to 1 through 10, there’s nothing faulty about program behavior. Faults, like defects, do not necessarily cause problems. For example, imagine that the same program prints out the current value, but has another defect that unintentionally skips printing the first value. There would be two defects, a fault on the first number, but no undesirable program behavior, because it would still print 1 to 10.

Finally, a  failure  is a program behavior that is inconsistent with a program’s requirements. Failures are what we report in bug reports, what we often mean when we say “bug”, and ultimately what matters in the world, as program behavior is what programs do to the world. To use our terminology then, we would say that “ defects may cause faults, faults may cause failures,  and failures may cause consequences in the world ” 

What then, is debugging, using this terminology?  Debugging  is any activity that, given a report of a failure, seeks to identify the one or more defects that caused one or more faults, which caused the failure, and then making changes to a program to eliminate the associated defects. How to do this, of course, is the hard part. Therefore, debugging is inherently a process of searching—for faults that cause failures, and defects that cause faults. What’s being searched when debugging is the thousands, millions, or perhaps even billions of instructions that are executed when a program executes (causing faults), and the thousands, or even millions of lines of code that might have have caused those faults.

Research and practice broadly agree: finding defects quickly and successfully requires systematic, and sometimes scientific investigations of causality 13 13

Andreas Zeller (2009). Why programs fail: a guide to systematic debugging. Elsevier.

. And yet, despite decades of research and practice, most developers never learn the skills for debugging systematically and don’t know how to properly use debugging tools to support systematic debugging. In fact, most still rely in basic print statements, partly because they are the most available and flexible tool 2 2

Beller, M., Spruit, N., Spinellis, D., & Zaidman, A (2018). On the dichotomy of debugging behavior among programmers. ACM/IEEE International Conference on Software Engineering.

. Despite this, debugging systematically has a few essential phases.

The first phase is  reproducing  the failure, so that the program may be inspected for faults, which can be traced back to defects. Failure reproduction is a matter of identifying inputs to the program (whether data it receives upon being executed, user inputs, network traffic, or any other form of input) that causes the failure to occur. If you found this failure while  you  were executing the program, then you’re lucky: you should be able to repeat whatever you just did and identify the inputs or series of inputs that caused the problem, giving you a way of testing that the program no longer fails once you’ve fixed the defect. If someone else was the one executing the program (for example, a user, or someone on your team), you better hope that they reported clear steps for reproducing the problem. When bug reports lack clear reproduction steps, bugs often can’t be fixed 3 3

Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann (2008). What makes a good bug report?. ACM SIGSOFT Foundations of Software Engineering (FSE).

.

Once you can reproduce a failure, the next phase is to minimize the failure-inducing input 12 12

Andreas Zeller, Ralf Hildebrandt (2002). Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering.

. Imagine, for example, a program that, given a string  "abcdefg" , is supposed to print all of the vowels in the program in the sequence they appear ( "ae" ), but instead produces just  "a" . The intuition behind minimizing failure-inducing inputs is that they reduce the complexity of the search space in debugging. For example, in our example, we might find that entering the string  "abcde"  causes the same failed output of  "a" , or even shorter, that just the string  "ae"  causes the failure. To minimize that failure-inducing input, one could just randomly remove parts of the input to find the smallest input that still causes the problem. More effective is to search the input space systematically, perhaps by using knowledge of the program behavior (e.g., vowels are what matter, so get rid of the consonants, as we did above), or even more systematic, doing something like a binary search of the input (e.g., trying successively smaller halves of the string until finding the smallest string that still causes the problem). Note that minimizing failure-inducing input applies to  any  kind of input, not just data: you can also minimize a program, excluding lines you believe are irrelevant to the failure, finding the smallest possible program that still causes the failure.

Once you have your minimized program and input, the next phase is to  localize  the defect, trying to identify the cause of the failure in code. There are many different strategies for localizing defects. One of the simplest strategies is to work forward:

  1. Set a breakpoint to the beginning of the program.
  2. Reproduce the failure. (If the program doesn’t pause, then either the line with the breakpoint doesn’t execute, the debugger is broken, or you didn’t reproduce the failure).
  3. Step forward one instruction at a time until the program deviates from intended behavior, monitoring program state and control flow after each step.
  4. This step that deviates or one of the previous steps caused the failure.

This process, while straightforward, is the slowest, requiring a long, vigilant search. A more efficient scientific strategy can leverage your knowledge of the program by guiding the search with hypotheses you generate 5 5

David Gilmore (1991). Models of debugging. Acta Psychologica.

:

  1. Observe the failure
  2. Form a hypothesis about what caused the failure
  3. Identify ways of observing program behavior to test your hypothesis.
  4. Analyzing the data from your observations
  5. If you’ve identified the defect, move on to the repair phase; if not, return to step 2.

The problems with the strategy above are numerous. First, what if you can’t generate a hypothesis? What if you can, but testing the hypothesis is slow or impossible? You could spend  hours  generating hypotheses that are completely off-base, effectively analyzing all of your code and its executions before finding the defect.

Another strategy is working backwards 7 7

Amy J. Ko and Brad A. Myers (2008). Debugging reinvented: asking and answering why and why not questions about program behavior. ACM/IEEE International Conference on Software Engineering.

.

  1. Observe the failure
  2. Identify the line of code that caused the failing output
  3. Identify the lines of code that caused the line of code in step 2 and any data used on the line in step 2
  4. Repeat three recursively, analyzing all lines of code for defects along the upstream chain of causality until finding the defect.

This strategy  guarantees  that you will find the defect if you systematically check all of the upstream causes of the failure. It still requires you to analyze each line of code and potentially execute to it in order to inspect what might be wrong, but it requires potentially less work than guessing. As we discussed in the  Comprehension  chapter, tools can automate some of this process 7 7

Amy J. Ko and Brad A. Myers (2008). Debugging reinvented: asking and answering why and why not questions about program behavior. ACM/IEEE International Conference on Software Engineering.

.

Yet another strategy called  delta debugging  is to compare successful and failing executions of the program 11 11

Andreas Zeller (2002). Isolating cause-effect chains from computer programs. FSE.

.

  1. Identify a successful set of inputs and minimize them
  2. Identify a failing set of inputs and minimize them
  3. Compare the differences in program state from the successful and failing executions during execution
  4. Identify a change to input that minimizes the differences in states between the two executions
  5. Variables their values that are different in these two executions contain the defect

This is a powerful strategy, but only when you have successful inputs and when you can automate comparing runs and identifying changes to inputs.

For particularly complex software, it can sometimes be necessary to debug with the help of teammates, helping to generate hypotheses, identify more effective search strategies, or rule out the influence of particular components in a bug 1 1

Jorge Aranda and Gina Venolia (2009). The secret life of bugs: Going past the errors and omissions in software repositories. ACM/IEEE International Conference on Software Engineering.

. In fact, some work shows that following systematic debugging strategies can make novice developers just as effective as experienced ones 8 8

Thomas D. LaToza, Maryam Arab, Dastyni Loksa, Amy J. Ko (2020). Explicit programming strategies. Empirical Software Engineering.

, but that many novices struggle to use them, either because they are overconfident, they struggle to self-regulate their work, or because they lack sufficient prior knowledge to execute the strategies, and have to revert to simpler ones that require less knowledge.

Ultimately, all of these strategies are essentially search algorithms, seeking the events that occurred while a program executed with a particular set of inputs that caused its output to be incorrect. Because programs execution millions and potentially billions of instructions, these strategies are necessary to reduce the scope of your search. This is where debugging  tools  come in: if you can find a tool that supports an effective strategy, then your work to search through those millions and billions of instructions will be greatly accelerated. This might be a print statement, a breakpoint debugger, a performance profiler, or one of the many advanced debugging tools beginning to emerge from research.

Once you’ve found the defect, what do you do? It turns out that there are usually many ways to repair a defect. How developers fix defects depends a lot on the circumstances: if they’re near a release, they may not even fix it if it’s too risky; if there’s no pressure, and the fix requires major changes, they may refactor or even redesign the program to prevent the failure 9 9

Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird, and Nachiappan Nagappan (2013). The design of bug fixes. ACM/IEEE International Conference on Software Engineering.

. This can be a delicate, risky process: in one study of open source operating systems bug fixes, 27% of the incorrect fixes were made by developers who had never read the source code files they changed, suggesting that key to correct fixes is a deep comprehension of exactly how the defective code is intended to behave 10 10

Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram (2011). How do fixes become bugs?. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

.


Because debugging can be so challenging, and because it is so pervasive and inescapable in programming, it is often a major source of frustration and unpredictability in software engineering. However, finding a defect after a long search can also be a great triumph 4 4

Marc Eisenstadt (1997). My hairiest bug war stories. Communications of the ACM.

, bringing together the most powerful aspects of developer tools, the collective knowledge of a team, and the careful, systematic work of a programmer, trying to make sense of code. As with all things in software engineering, persistence and patience is rewarded.

References

  1. Jorge Aranda and Gina Venolia (2009). The secret life of bugs: Going past the errors and omissions in software repositories. ACM/IEEE International Conference on Software Engineering.

  2. Beller, M., Spruit, N., Spinellis, D., & Zaidman, A (2018). On the dichotomy of debugging behavior among programmers. ACM/IEEE International Conference on Software Engineering.

  3. Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann (2008). What makes a good bug report?. ACM SIGSOFT Foundations of Software Engineering (FSE).

  4. Marc Eisenstadt (1997). My hairiest bug war stories. Communications of the ACM.

  5. David Gilmore (1991). Models of debugging. Acta Psychologica.

  6. Amy J. Ko, Brad A. Myers (2005). A framework and methodology for studying the causes of software errors in programming systems. Journal of Visual Languages & Computing.

  7. Amy J. Ko and Brad A. Myers (2008). Debugging reinvented: asking and answering why and why not questions about program behavior. ACM/IEEE International Conference on Software Engineering.

  8. Thomas D. LaToza, Maryam Arab, Dastyni Loksa, Amy J. Ko (2020). Explicit programming strategies. Empirical Software Engineering.

  9. Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird, and Nachiappan Nagappan (2013). The design of bug fixes. ACM/IEEE International Conference on Software Engineering.

  10. Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram (2011). How do fixes become bugs?. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

  11. Andreas Zeller (2002). Isolating cause-effect chains from computer programs. FSE.

  12. Andreas Zeller, Ralf Hildebrandt (2002). Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering.

  13. Andreas Zeller (2009). Why programs fail: a guide to systematic debugging. Elsevier.