Comprehension
Despite all of the activities that we’ve talked about so far—communicating, coordinating, planning, designing, architecting—really, most of a software engineers time is spent reading code 16 16 Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke (2014). On the comprehension of program comprehension. ACM Transactions on Software Engineering and Methodology (TOSEM).
Being good at program comprehension is a critical skill. You need to be able to read a function and know what it will do with its inputs; you need to be able to read a class and understand its state and functionality; you also need to be able to comprehend a whole implementation, understanding its architecture. Without these skills, you can’t test well, you can’t debug well, and you can’t fix or enhance the systems you’re building or maintaining. In fact, studies of software engineers’ first year at their first job show that a significant majority of their time is spent trying to simply comprehend the architecture of the system they are building or maintaining and understanding the processes that are being followed to modify and enhance them 6 6 Barthélémy Dagenais, Harold Ossher, Rachel K. E. Bellamy, Martin P. Robillard, and Jacqueline P. de Vries (2010). Moving into a new software project landscape. ACM/IEEE International Conference on Software Engineering.
What’s going on when developers comprehend code? Usually, developers are trying to answer questions about code that help them build larger models of how a program works. Because program comprehension is hard, they avoid it when they can, relying on explanations from other developers rather than trying to build precise models of how a program works on their own 19 19 Tobias Roehm, Rebecca Tiarks, Rainer Koschke, and Walid Maalej (2012). How do professional developers comprehend software?. ACM/IEEE International Conference on Software Engineering.
Thomas D. LaToza, Brad A. Myers (2010). Developers ask reachability questions. ACM/IEEE International Conference on Software Engineering.
Jonathan Sillito, Gail C. Murphy, and Kris De Volder (2006). Questions programmers ask during software evolution tasks. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Which type represents this domain concept or this UI element or action?
- Where in the code is the text in this error message or UI element?
- Where is there any code involved in the implementation of this behavior?
- Is there an entity named something like this in that unit (for example in a project, package or class)?
- What are the parts of this type?
- Which types is this type a part of?
- Where does this type fit in the type hierarchy?
- Does this type have any siblings in the type hierarchy?
- Where is this field declared in the type hierarchy?
- Who implements this interface or these abstract methods?
- Where is this method called or type referenced?
- When during the execution is this method called?
- Where are instances of this class created?
- Where is this variable or data structure being accessed?
- What data can we access from this object?
- What does the declaration or definition of this look like?
- What are the arguments to this function?
- What are the values of these arguments at runtime?
- What data is being modified in this code?
- How are instances of these types created and assembled?
- How are these types or objects related?
- How is this feature or concern (object ownership, UI control, etc) implemented?
- What in this structure distinguishes these cases?
- What is the “correct” way to use or access this data structure?
- How does this data structure look at runtime?
- How can data be passed to (or accessed at) this point in the code?
- How is control getting (from here to) here?
- Why isn’t control reaching this point in the code?
- Which execution path is being taken in this case?
- Under what circumstances is this method called or exception thrown?
- What parts of this data structure are accessed in this code?
- How does the system behavior vary over these types or cases?
- What are the differences between these files or types?
- What is the difference between these similar parts of the code (e.g., between sets of methods)?
- What is the mapping between these UI types and these model types?
- How can we know this object has been created and initialized correctly?
If you think about the diversity of questions in this list, you can see why program comprehension requires expertise. You not only need to understand programming languages quite well, but you also need to have strategies for answering all of the questions above (and more) quickly, effectively, and accurately.
So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question 13,23 13 Thomas D. LaToza, David Garlan, James D. Herbsleb, and Brad A. Myers (2007). Program comprehension as fact finding. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Anneliese von Mayrhauser A. Marie Vans (1994). Comprehension processes during large scale maintenance. ACM/IEEE International Conference on Software Engineering.
Dave Binkley, Marcia Davis, Dawn Lawrie, Jonathan I. Maletic, Christopher Morrell, Bonita Sharif (2013). The impact of identifier style on effort and comprehension. Empirical Software Engineering.
Mark Weiser (1981). Program slicing. ACM/IEEE International Conference on Software Engineering.
Scott D. Fleming, Christopher Scaffidi, David Piorkowski, Margaret M. Burnett, Rachel K. E. Bellamy (2013). An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks. ACM Transactions on Software Engineering and Methodology (TOSEM).
Lawrie, D., Morrell, C., Feild, H., & Binkley, D (2006). What's in a name? A study of identifiers. IEEE International Conference on Program Comprehension (ICPC).
Of course, program comprehension is not an inherently individual process either. Expert developers are resourceful, and frequently ask others for explanations of program behavior. Some of this might happen between coworkers, where someone seeking insight asks other engineers for summaries of program behavior, to accelerate their learning 11 11 Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.
Shrestha, N., Botta, C., Barik, T., & Parnin, C (2020). Here we go again: why is it difficult for developers to learn another programming language?. ACM/IEEE International Conference on Software Engineering.
While much of program comprehension is individual and social skill, some aspects of program comprehension are determined by the design of programming languages. For example, some programming languages result in programs that are more comprehensible. One framework called the Cognitive Dimensions of Notations 9 9 Thomas R. G. Green (1989). Cognitive dimensions of notations. People and computers.
==
, which behave differently depending on what the type of the left and right operands are. Knowing the behavior for Booleans doesn’t tell you the behavior for a Boolean being compared to an integer. In contrast, Java is a high consistency language: ==
is only ever valid when both operands are of the same type.
These differences in notation can have some impact. Encapsulation through data structures leads to better comprehension that monolithic or purely functional languages 3,25 3 Pamela Bhattacharya and Iulian Neamtiu (2011). Assessing programming language impact on development and maintenance: a study on C and C++. ACM/IEEE International Conference on Software Engineering.
Scott N. Woodfield, Hubert E. Dunsmore, and Vincent Y. Shen (1981). The effect of modularization and comments on program comprehension. ACM/IEEE International Conference on Software Engineering.
Guido Salvaneschi, Sven Amann, Sebastian Proksch, and Mira Mezini (2014). An empirical study on program comprehension with reactive programming. ACM SIGSOFT Foundations of Software Engineering (FSE).
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu (2014). A large scale study of programming languages and code quality in GitHub. ACM SIGSOFT Foundations of Software Engineering (FSE).
Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik (2014). How do API documentation and static typing affect API usability?. ACM/IEEE International Conference on Software Engineering.
Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, Andreas Stefik (2013). An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering.
Oscar Callaú, Romain Robbes, Éric Tanter, David Röthlisberger (2013). How (and why) developers use the dynamic features of programming languages: the case of Smalltalk. Empirical Software Engineering.
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu (2014). A large scale study of programming languages and code quality in GitHub. ACM SIGSOFT Foundations of Software Engineering (FSE).
Code editors, development environments, and program comprehension tools can also be helpful. Early evidence showed that simple features like syntax highlighting and careful typographic choices can improve the speed of program comprehension 1 1 Ron Baecker (1988). Enhancing program readability and comprehensibility with tools for program visualization. ACM/IEEE International Conference on Software Engineering.
Amy J. Ko and Brad A. Myers (2009). Finding causes of program output with the Java Whyline. ACM SIGCHI Conference on Human Factors in Computing (CHI).
The path from novice to expert in program comprehension is one that involves understanding programming language semantics exceedingly well and reading a lot of code, design patterns, and architectures. Anticipate that as you develop these skills, it will take you time to build robust understandings of what a program is doing, slowing down your writing, testing, and debugging.
References
-
Ron Baecker (1988). Enhancing program readability and comprehensibility with tools for program visualization. ACM/IEEE International Conference on Software Engineering.
-
Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.
-
Pamela Bhattacharya and Iulian Neamtiu (2011). Assessing programming language impact on development and maintenance: a study on C and C++. ACM/IEEE International Conference on Software Engineering.
-
Dave Binkley, Marcia Davis, Dawn Lawrie, Jonathan I. Maletic, Christopher Morrell, Bonita Sharif (2013). The impact of identifier style on effort and comprehension. Empirical Software Engineering.
-
Oscar Callaú, Romain Robbes, Éric Tanter, David Röthlisberger (2013). How (and why) developers use the dynamic features of programming languages: the case of Smalltalk. Empirical Software Engineering.
-
Barthélémy Dagenais, Harold Ossher, Rachel K. E. Bellamy, Martin P. Robillard, and Jacqueline P. de Vries (2010). Moving into a new software project landscape. ACM/IEEE International Conference on Software Engineering.
-
Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik (2014). How do API documentation and static typing affect API usability?. ACM/IEEE International Conference on Software Engineering.
-
Scott D. Fleming, Christopher Scaffidi, David Piorkowski, Margaret M. Burnett, Rachel K. E. Bellamy (2013). An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks. ACM Transactions on Software Engineering and Methodology (TOSEM).
-
Thomas R. G. Green (1989). Cognitive dimensions of notations. People and computers.
-
Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, Andreas Stefik (2013). An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering.
-
Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
-
Amy J. Ko and Brad A. Myers (2009). Finding causes of program output with the Java Whyline. ACM SIGCHI Conference on Human Factors in Computing (CHI).
-
Thomas D. LaToza, David Garlan, James D. Herbsleb, and Brad A. Myers (2007). Program comprehension as fact finding. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
-
Thomas D. LaToza, Brad A. Myers (2010). Developers ask reachability questions. ACM/IEEE International Conference on Software Engineering.
-
Lawrie, D., Morrell, C., Feild, H., & Binkley, D (2006). What's in a name? A study of identifiers. IEEE International Conference on Program Comprehension (ICPC).
-
Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke (2014). On the comprehension of program comprehension. ACM Transactions on Software Engineering and Methodology (TOSEM).
-
Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).
-
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu (2014). A large scale study of programming languages and code quality in GitHub. ACM SIGSOFT Foundations of Software Engineering (FSE).
-
Tobias Roehm, Rebecca Tiarks, Rainer Koschke, and Walid Maalej (2012). How do professional developers comprehend software?. ACM/IEEE International Conference on Software Engineering.
-
Guido Salvaneschi, Sven Amann, Sebastian Proksch, and Mira Mezini (2014). An empirical study on program comprehension with reactive programming. ACM SIGSOFT Foundations of Software Engineering (FSE).
-
Shrestha, N., Botta, C., Barik, T., & Parnin, C (2020). Here we go again: why is it difficult for developers to learn another programming language?. ACM/IEEE International Conference on Software Engineering.
-
Jonathan Sillito, Gail C. Murphy, and Kris De Volder (2006). Questions programmers ask during software evolution tasks. ACM SIGSOFT Foundations of Software Engineering (FSE).
-
Anneliese von Mayrhauser A. Marie Vans (1994). Comprehension processes during large scale maintenance. ACM/IEEE International Conference on Software Engineering.
-
Mark Weiser (1981). Program slicing. ACM/IEEE International Conference on Software Engineering.
-
Scott N. Woodfield, Hubert E. Dunsmore, and Vincent Y. Shen (1981). The effect of modularization and comments on program comprehension. ACM/IEEE International Conference on Software Engineering.