History
Computers haven’t been around for long. If you read one of the many histories of computing and information, such as James Gleick’s The Information [4]
Because programming required such painstaking planning in machine code and computers were slow, most programs were not that complex. Their value was in calculating things faster than a person could do by hand, which meant thousands of calculations in a minute rather than one calculation in a minute. Computer programmers were not solving problems that had no solutions yet; they were translating existing solutions (for example, the quadratic formula) into machine instructions. Their power wasn’t in creating new realities or facilitating new tasks; it was in accelerating old tasks.
The birth of software engineering, therefore, did not come until programmers started solving problems that didn’t have existing solutions, or were new ideas entirely. Most of this work was done in academic contexts to develop things like basic operating systems and methods of input and output. These were complex projects, but as research, they didn’t need to scale; they just needed to work. It wasn’t until the late 1960s that the first truly large software projects were attempted commercially, and software had to actually perform.
The IBM 360 operating system was one of the first big projects of this kind. Suddenly, there were multiple people working on multiple components, all of which interacted with one another. Each part of the program needed to coordinate with the others, which usually meant that each part’s authors needed to coordinate, and the term software engineering was born. Programmers and academics from around the world, especially those who were working on big projects, created conferences so they could meet and discuss their challenges. At the first software engineering conference in 1968, attendees speculated about why projects were shipping late, why they were over budget, and what they could do about it. There was now a name for the work, and many questions, but few answers.
At the time, one of the key people pursuing these answers was Margaret Hamilton, a computer scientist who was Director of the Software Engineering Division of the MIT Instrumentation Laboratory. One of the lab’s key projects in the late 1960s was developing the on-board flight software for the Apollo space program. Hamilton led the development of error detection and recovery, the information displays, the lunar lander, and many other critical components, while managing a team of other computer scientists who helped. It was as part of this project that many of the central problems in software engineering began to emerge, including verification of code, coordination of teams, and managing versions. This led to one of her passions, which was giving software legitimacy as a form of engineering; at the time, it was viewed as routine, uninteresting, and simple work. Her leadership helped establish software engineering as a core part of systems engineering.
The first conference, the IBM 360 project, and Hamilton’s experiences on the Apollo mission identified many problems that had no clear solutions:
- When you’re solving a problem that doesn’t yet have a solution, what is a good process for building a solution?
- When software does so many different things, how can you know software “works”?
- How can you make progress when no one on the team understands every part of the program?
- When people leave a project, how do you ensure their replacement has all of the knowledge they had?
- When no one understands every part of the program, how do you diagnose defects?
- When people are working in parallel, how do you prevent them from clobbering each other’s work?
- If software engineering is about more than coding, what skills does a good coder need to have?
- What kinds of tools and languages can accelerate a programmer’s work and help them prevent mistakes?
- How can projects not lose sight of the immense complexity of human needs, values, ethics, and policy that interact with engineering decisions?
As it became clear that software was not an incremental change in technology, but a profoundly disruptive one, countless communities began to explore these questions in research and practice. Black American entrepreneurs began to explore how to use software to connect and build community well before the internet was ubiquitous, creating some of the first web-scale online communities and forging careers at IBM, ultimately to be suppressed by racism in the workplace and society [9].
While technical progress has been swift, progress on the human aspects of software engineering has been more difficult to understand and improve. One of the seminal books on these issues was Fred P. Brooks, Jr.’s The Mythical Man Month [3].
If we step even further beyond software engineering as an activity and think more broadly about the role that software is playing in society today, there are also other, newer questions that we’ve only begun to answer. If every part of society now runs on code, what responsibility do software engineers have to ensure that code is right? What responsibility do software engineers have to avoid algorithmic bias? If our cars are to soon drive us around, who’s responsible for the first death: the car, the driver, the software engineers who built it, or the company that sold it? These ethical questions are in some ways the future of software engineering, likely to shape its regulatory context, its processes, and its responsibilities.
There are also economic roles that software plays in society that it didn’t before. Around the world, software is a major source of job growth, but also a major source of automation, eliminating jobs that people used to do. The larger forces that software exerts on the world demand that software engineers have a stronger understanding of the roles that software plays in society, as the decisions that engineers make can have profound unintended consequences.
We’re nowhere close to having deep answers to these questions, neither the old ones nor the new ones. We know a lot about programming languages and a lot about testing. These are areas amenable to automation, and so computer science has rapidly improved and accelerated these parts of software engineering. The rest of it, as we shall see, has not made much progress. In this book, we’ll discuss what we know and the much larger space of what we don’t.
References
1. Janet Abbate (2012). Recoding gender: women's changing participation in computing. MIT Press.
2. Ruha Benjamin (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Books.
3. Fred P. Brooks (1995). The mythical man month. Pearson Education.
4. James Gleick (2011). The Information: A History, A Theory, A Flood. Pantheon Books.
5. Jonathan Grudin (2017). From Tool to Partner: The Evolution of Human-Computer Interaction. Source.
6. Alan C. Kay (1996). The early history of Smalltalk. History of programming languages II.
7. M. Kenney (2000). Understanding Silicon Valley: The anatomy of an entrepreneurial region. Stanford University Press.
8. John McCarthy (1978). History of LISP. History of Programming Languages I.
9. Charlton D. McIlwain (2019). Black software: the internet and racial justice, from the AfroNet to Black Lives Matter. Oxford University Press.
10. Michael Metcalf (2002). History of Fortran. ACM SIGPLAN Fortran Forum.
11. Margot Lee Shetterly (2017). Hidden figures: the American dream and the untold story of the Black women mathematicians who helped win the space race. HarperCollins Nordic.
12. Bjarne Stroustrup (1996). A history of C++: 1979--1991. History of programming languages II.
Organizations
The photo above is a candid shot of some of the software engineers of AnswerDash, a company I co-founded in 2012 that was acquired in 2020. There are a few things to notice in the photograph. First, you see one of the employees explaining something, while others are diligently working off to the side. It’s not a huge team; just a few engineers, plus several employees in other parts of the organization in another room. This, as simple as it looks, is pretty much what all software engineering work looks like. Some organizations have one of these teams; others have thousands.
What you can’t see is just how much complexity underlies this work. You can’t see the organizational structures that exist to manage this complexity. Inside this room and the rooms around it were processes, standards, reviews, workflows, managers, values, culture, decision making, analytics, marketing, sales. And at the center of it were people executing all of these things as well as they could to achieve the organization’s goal.
Organizations are a much bigger topic than I could possibly address here. To deeply understand them, you’d need to learn about organizational studies, organizational behavior, information systems, and business in general.
The subset of this knowledge that’s critical to understand about software engineering is limited to a few important concepts. The first and most important concept is that even in software organizations, the point of the company is rarely to make software; it’s to provide value [8].
The individuals in a software organization take on different roles to achieve that value. These roles are sometimes spread across different people and sometimes bundled up into one person, depending on how the organization is structured, but the roles are always there. Let’s go through each one in detail so you understand how software engineers relate to each role.
- Marketers look for opportunities to provide value. In for-profit businesses, this might mean conducting market research, estimating the size of opportunities, identifying audiences, and getting those audiences’ attention. Non-profits need to do this work as well in order to get their solutions to people, but may be driven more by solving problems than making money.
- Product managers decide what value the product will provide, monitoring the marketplace and prioritizing work.
- Designers decide how software will provide value. This isn’t about code or really even about software; it’s about envisioning solutions to problems that people have.
- Software engineers write code with other engineers to implement requirements envisioned by designers. If they fail to meet requirements, the design won’t be implemented correctly, which will prevent the software from providing value.
- Sales takes the product that’s been built and tries to sell it to the audiences that marketers have identified. It also tries to refine an organization’s understanding of what the customer wants and needs, providing feedback to marketing, product, and design, which engineers then address.
- Support helps the people using the product to use it successfully and, like sales, provides feedback to product, design, and engineering about the product’s value (or lack thereof) and its defects.
As I noted above, sometimes the roles above get merged into individuals. When I was CTO at AnswerDash, I had software engineering roles, design roles, product roles, sales roles, and support roles. This was partly because it was a small company when I was there. As organizations grow, these roles tend to be divided into smaller pieces. This division often means that different parts of the organization don’t share knowledge, even when it would be advantageous [3].
Note that in the division of responsibilities above, software engineers really aren’t the designers by default. They don’t decide what product is made or what problems that product solves. They may have opinions—and a great deal of power to enforce their opinions, as the people building the product—but it’s not ultimately their decision.
There are other roles you might be thinking of that I haven’t mentioned:
- Engineering managers exist in all roles when teams get to a certain size, helping to move information between higher and lower parts of an organization. Even engineering managers are primarily focused on organizing and prioritizing work, and not doing engineering [5]. Much of their time is also spent ensuring every engineer has what they need to be productive, while also managing coordination and interpersonal conflict between engineers.
- Data scientists, although a new role, typically facilitate decision making on the part of any of the roles above [1]. They might help engineers find bugs, marketers analyze data, track sales targets, mine support data, or inform design decisions. They’re experts at using data to accelerate and improve the decisions made by the roles above.
- Researchers, also called user researchers, also help people in a software organization make decisions, but usually product decisions, helping marketers, sales, and product managers decide what products to make and who would want them. In many cases, they can complement the work of data scientists, providing qualitative work to triangulate quantitative data.
- Ethics and policy specialists, who might come with backgrounds in law, policy, or social science, might shape terms of service, software licenses, algorithmic bias audits, privacy policy compliance, and processes for engaging with stakeholders affected by the software being engineered. Any company that works with data, especially those that work with data at large scales or in contexts with great potential for harm, hate, and abuse, needs significant expertise to anticipate and prevent harm from engineering and design decisions.
Every decision made in a software team is under uncertainty, and so another important concept in organizations is risk [2].
Open source communities are organizations too. The core activities of design, engineering, and support still exist in these, but how much a community is engaged in marketing and sales depends entirely on the purpose of the community. Big, established open source projects like Mozilla have revenue, buildings, and a CEO, and while they don’t sell anything, they do market. Others like Linux [6]
All of the above has some important implications for what it means to be a software engineer:
- Engineers are not the only important role in a software organization. In fact, they may be less important to an organization’s success than other roles because the decisions they make (how to implement requirements) have smaller impact on the organization’s goals than other decisions (what to make, who to sell it to, etc.).
- Engineers have to work with a lot of people working in different roles. Learning what those roles are and what shapes their success is important to being a good collaborator [7].
- While engineers might have many great ideas for product, if they really want to shape what they’re building, they should be in a product role, not an engineering role.
All that said, without engineers, products wouldn’t exist. They ensure that every detail about a product reflects the best knowledge of the people in their organization, and so attention to detail is paramount. In future chapters, we’ll discuss all of the ways that software engineers manage this detail, mitigating the burden on their memories with tools and processes.
References
1. Andy Begel, Thomas Zimmermann (2014). Analyze this! 145 questions for data scientists in software engineering. ACM/IEEE International Conference on Software Engineering.
2. B. W. Boehm (1991). Software risk management: principles and practices. IEEE Software.
3. Parmit K. Chilana, Amy J. Ko, Jacob O. Wobbrock, Tovi Grossman, and George Fitzmaurice (2011). Post-deployment usability: a survey of current practices. ACM SIGCHI Conference on Human Factors in Computing (CHI).
4. Jailton Coelho and Marco Tulio Valente (2017). Why modern open source projects fail. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
5. Eirini Kalliamvakou, Christian Bird, Thomas Zimmermann, Andrew Begel, Robert DeLine, Daniel M. German (2017). What makes a great manager of software engineers?. IEEE Transactions on Software Engineering.
6. Gwendolyn K. Lee, Robert E. Cole (2003). From a firm-based to a community-based model of knowledge creation: The case of the Linux kernel development. Organization science.
7. Paul Luo Li, Amy J. Ko, and Andrew Begel (2017). Cross-disciplinary perspectives on collaborations with software engineers. International Workshop on Cooperative and Human Aspects of Software Engineering.
8. Alexander Osterwalder, Yves Pigneur, Gregory Bernarda, Alan Smith (2015). Value proposition design: how to create products and services customers want. John Wiley & Sons.
9. Cassandra Overney, Jens Meinicke, Christian Kästner, Bogdan Vasilescu (2020). How to not get rich: an empirical study of donations in open source. ACM/IEEE International Conference on Software Engineering.
10. James Somers (2017). The coming software apocalypse. The Atlantic Monthly.
11. Yunwen Ye and Kouichi Kishida (2003). Toward an understanding of the motivation Open Source Software developers. ACM/IEEE International Conference on Software Engineering.
Communication
Because software engineering often distributes work across multiple people, a fundamental challenge in software engineering is ensuring that everyone on a team has the same understanding of what is being built and why. In the seminal book The Mythical Man Month, Fred Brooks argued that good software needs to have conceptual integrity, both in how it is designed and in how it is implemented [5].
The solution is effective communication. As some events in industry have shown, communication requires empathy and teamwork. When communication is poor, teams become disconnected and produce software defects [4].
It turns out, however, that communication plays such a powerful role in software projects that it even shapes how projects unfold. Perhaps the most notable theory about the effect of communication is Conway’s Law [6].
Because communication is so central, software engineers are constantly seeking information to further their work, going to their coworkers’ desks, emailing them, chatting via messaging platforms, and even using social media [8].
Communication is not always effective. In fact, there are many kinds of communication that are highly problematic in software engineering teams. For example, Perlow [13]
Communication isn’t just about transmitting information; it’s also about relationships and identity. For example, the dominant culture of many software engineering work environments (and even the perceived culture) is one that can deter many people from even pursuing careers in computer science. Modern work environments are still dominated by men, who speak loudly, out of turn, and disrespectfully, and who sometimes even engage in sexual harassment [16].
When communication is effective, it still takes time. One of the key strategies for reducing the amount of communication necessary is knowledge sharing tools, which broadly refer to any information system that stores facts that developers would normally have to retrieve from a person. By storing them in a database and making them easy to search, teams can avoid interruptions. The most common knowledge sharing tools in software teams are issue trackers, which are often at the center of communication not only between developers, but also with every other part of a software organization [3].
Because all of this knowledge is so critical to progress, when developers leave an organization and haven’t archived their knowledge somewhere, it can be quite disruptive to progress. Organizations often have single points of failure, in which a single developer may be critical to a team’s ability to maintain and enhance a software product [14].
What does all of this mean for you as an individual developer? To put it simply, don’t underestimate the importance of talking. Know who you need to talk to, talk to them frequently, and to the extent that you can, write down what you know, both to lessen the demand for talking and mitigate the risk of you not being available, and to make your knowledge more precise and accessible in the future. It often takes decades for engineers to excel at communication. The very fact that you know why communication is important gives you a critical head start.
References
1. Jeff Atwood (2016). The state of programming with Stack Overflow co-founder Jeff Atwood. Software Engineering Daily Podcast.
2. Andrew Begel, Yit Phang Khoo, and Thomas Zimmermann (2010). Codebook: discovering and exploiting relationships in software repositories. ACM/IEEE International Conference on Software Engineering.
3. Dane Bertram, Amy Voida, Saul Greenberg, and Robert Walker (2010). Communication, collaboration, and bugs: the social nature of issue tracking in small, collocated teams. ACM Conference on Computer Supported Cooperative Work (CSCW).
4. Nicolas Bettenburg, Ahmed E. Hassan (2013). Studying the impact of social interactions on software quality. Empirical Software Engineering.
5. Fred P. Brooks (1995). The mythical man month. Pearson Education.
6. Melvin E. Conway (1968). How do committees invent. Datamation.
7. Matthieu Foucault, Marc Palyart, Xavier Blanc, Gail C. Murphy, and Jean-Rémy Falleri (2015). Impact of developer turnover on quality in open-source software. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
8. Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
9. Paul Luo Li, Amy J. Ko, and Andrew Begel (2017). Cross-disciplinary perspectives on collaborations with software engineers. International Workshop on Cooperative and Human Aspects of Software Engineering.
10. Gloria Mark, Daniela Gudith, and Ulrich Klocke (2008). The cost of interrupted work: more speed and stress. ACM SIGCHI Conference on Human Factors in Computing (CHI).
11. Anna May, Johannes Wachs, Anikó Hannák (2019). Gender differences in participation and reward on Stack Overflow. Empirical Software Engineering.
12. Audris Mockus and James D. Herbsleb (2002). Expertise browser: a quantitative approach to identifying expertise. ACM/IEEE International Conference on Software Engineering.
13. Leslie A. Perlow (1999). The time famine: Toward a sociology of work time. Administrative science quarterly.
14. Peter C. Rigby, Yue Cai Zhu, Samuel M. Donadelli, and Audris Mockus (2016). Quantifying and mitigating turnover-induced knowledge loss: case studies of chrome and a project at Avaya. ACM/IEEE International Conference on Software Engineering.
15. Christoph Treude and Margaret-Anne Storey (2011). Effective communication of software development knowledge through community portals. ACM SIGSOFT Foundations of Software Engineering (FSE).
16. Jennifer Wang (2016). Female pursuit of Computer Science with Jennifer Wang. Software Engineering Daily Podcast.
17. Alicia Nicki Washington (2020). When twice as good isn't enough: the case for cultural competence in computing. ACM Technical Symposium on Computer Science Education.
18. Minghui Zhou and Audris Mockus (2011). Does the initial environment impact the future of developers?. ACM/IEEE International Conference on Software Engineering.
Productivity
When we think of productivity, we usually have a vague concept of a rate of work per unit time. Where it gets tricky is in defining “work”. On an individual level, work can be easier to define, because developers often have specific concrete tasks that they’re assigned. But until those tasks are done, it’s not really easy to define progress (well, it’s not that easy to define “done” sometimes either, but that’s a topic for a later chapter). When you start considering work at the scale of a team or an organization, productivity gets even harder to define, since an individual’s productivity might be increased by ignoring every critical request from a teammate, harming the team’s overall productivity.
Despite the challenge in defining productivity, there are numerous factors that affect productivity. For example, at the individual level, having the right tools can result in an order of magnitude difference in speed at accomplishing a task. One study I ran found that developers using the Eclipse IDE spent a third of their time just physically navigating between source files [8].
Of course, individual productivity is about more than just tools. Studies of workplace productivity show that developers have highly fragmented days, interrupted by meetings, emails, coding, and non-work distractions [13].
That said, productivity is not just about individual developers. Because communication is a key part of team productivity, an individual’s productivity is determined as much by their ability to collaborate and communicate with other developers as by their own work. In a study spanning dozens of interviews with senior software engineers, Li et al. found that the majority of critical attributes for software engineering skill (productivity included) concerned their interpersonal skills, their communication skills, and their ability to be resourceful within their organization [10].
Thomas D. LaToza, Gina Venolia, and Robert DeLine (2006). Maintaining mental models: a study of developer work habits. ACM/IEEE International Conference on Software Engineering.
Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Anton Barua, Stephen W. Thomas & Ahmed E. Hassan (2014). What are developers talking about? an analysis of topics and trends in Stack Overflow. Empirical Software Engineering.
Andrew Meneely, Pete Rotella, and Laurie Williams (2011). Does adding manpower also affect quality? An empirical, longitudinal analysis. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Another dimension of productivity is learning. Great engineers are resourceful, quick learners 10 10 Paul Luo Li, Amy J. Ko, and Jiamin Zhu (2015). What makes a great software engineer?. ACM/IEEE International Conference on Software Engineering.
Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.
Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey (2014). Software engineering at the speed of light: how developers stay current using Twitter. ACM/IEEE International Conference on Software Engineering.
Xin Xia, Lingfeng Bao, David Lo, Pavneet Singh Kochhar, Ahmed E. Hassan, Zhenchang Xing (2017). What do developers search for on the web?. Empirical Software Engineering.
Unfortunately, learning is no easy task. One of my earliest studies as a researcher investigated the barriers to learning new programming languages and systems, finding six distinct types of content that are challenging 7 7 Amy J. Ko, Brad A. Myers, Htet Htet Aung (2004). Six learning barriers in end-user programming systems. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).
Aside from individual and team factors, productivity is also influenced by the particular features of a project’s code, how the project is managed, or the environment and organizational culture in which developers work 5,17 5 Tom DeMarco and Tim Lister (1985). Programmer performance and the effects of the workplace. ACM/IEEE International Conference on Software Engineering.
J. Vosburgh, B. Curtis, R. Wolverton, B. Albert, H. Malec, S. Hoben, and Y. Liu (1984). Productivity factors and programming environments. ACM/IEEE International Conference on Software Engineering.
A different way to think about productivity is to consider it from a “waste” perspective, in which waste is defined as any activity that does not contribute to a product’s value to users or customers. Sedano et al. investigated this view across two years and eight software development projects in a software development consultancy 15 15 Todd Sedano, Paul Ralph, Cécile Péraire (2017). Software development waste. ACM/IEEE International Conference on Software Engineering.
- Building the wrong feature or product . The cost of building a feature or product that does not address user or business needs.
- Mismanaging the backlog . The cost of duplicating work, expediting lower value user features, or delaying necessary bug fixes.
- Rework . The cost of altering delivered work that should have been done correctly but was not.
- Unnecessarily complex solutions . The cost of creating a more complicated solution than necessary, a missed opportunity to simplify features, user interface, or code.
- Extraneous cognitive load . The costs of unneeded expenditure of mental energy, such as poorly written code, context switching, confusing APIs, or technical debt.
- Psychological distress . The costs of burdening the team with unhelpful stress arising from low morale, pace, or interpersonal conflict.
- Waiting/multitasking . The cost of idle time, often hidden by multi-tasking, due to slow tests, missing information, or context switching.
- Knowledge loss . The cost of re-acquiring information that the team once knew.
- Ineffective communication . The cost of incomplete, incorrect, misleading, inefficient, or absent communication.
One could imagine using these concepts to refine processes and practices in a team, helping both developers and managers be more aware of sources of waste that harm productivity.
Of course, productivity is not only shaped by professional and organizational factors, but personal ones as well. Consider, for example, an engineer who has friends, wealth, health care, health, stable housing, sufficient pay, and safety: they likely have everything they need to bring their full attention to their work. In contrast, imagine an engineer who is isolated, has immense debt, has no health care, has a chronic disease like diabetes, is being displaced from an apartment by gentrification, has lower pay than their peers, or does not feel safe in public. Any one of these factors might limit an engineer’s ability to be productive at work; some people might experience multiple, or even all of these factors, especially if they are a person of color in the United States, who has faced a lifetime of racist inequities in school, health care, and housing. Because of the potential for such inequities to influence someone’s ability to work, managers and organizations need to make space for surfacing these inequities at work, so that teams can acknowledge them, plan around them, and ideally address them through targeted supports. Anything less tends to make engineers feel unsupported, which will only decrease their motivation to contribute to a team.
These widely varying conceptions of productivity reveal that programming in a software engineering context is about far more than just writing a lot of code. It’s about coordinating productively with a team, synchronizing your work with an organization’s goals, and most importantly, reflecting on ways to change work to achieve those goals more effectively.
References
- Sebastian Baltes, Stephan Diehl (2018). Towards a theory of software development expertise. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Ammon Bartram (2016). Hiring engineers with Ammon Bartram. Software Engineering Daily Podcast.
- Anton Barua, Stephen W. Thomas & Ahmed E. Hassan (2014). What are developers talking about? an analysis of topics and trends in Stack Overflow. Empirical Software Engineering.
- Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.
- Tom DeMarco and Tim Lister (1985). Programmer performance and the effects of the workplace. ACM/IEEE International Conference on Software Engineering.
- Mik Kersten and Gail C. Murphy (2006). Using task context to improve programmer productivity. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Amy J. Ko, Brad A. Myers, Htet Htet Aung (2004). Six learning barriers in end-user programming systems. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).
- Amy J. Ko, Htet Htet Aung, Brad A. Myers (2005). Eliciting design requirements for maintenance-oriented IDEs: a detailed study of corrective and perfective maintenance tasks. ACM/IEEE International Conference on Software Engineering.
- Thomas D. LaToza, Gina Venolia, and Robert DeLine (2006). Maintaining mental models: a study of developer work habits. ACM/IEEE International Conference on Software Engineering.
- Paul Luo Li, Amy J. Ko, and Jiamin Zhu (2015). What makes a great software engineer?. ACM/IEEE International Conference on Software Engineering.
- Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).
- Andrew Meneely, Pete Rotella, and Laurie Williams (2011). Does adding manpower also affect quality? An empirical, longitudinal analysis. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- André N. Meyer, Laura E. Barton, Gail C. Murphy, Thomas Zimmermann, Thomas Fritz (2017). The work life of developers: Activities, switches and perceived productivity. IEEE Transactions on Software Engineering.
- Ben Northup (2016). Reflections of an old programmer. Software Engineering Daily Podcast.
- Todd Sedano, Paul Ralph, Cécile Péraire (2017). Software development waste. ACM/IEEE International Conference on Software Engineering.
- Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey (2014). Software engineering at the speed of light: how developers stay current using Twitter. ACM/IEEE International Conference on Software Engineering.
- J. Vosburgh, B. Curtis, R. Wolverton, B. Albert, H. Malec, S. Hoben, and Y. Liu (1984). Productivity factors and programming environments. ACM/IEEE International Conference on Software Engineering.
- Xin Xia, Lingfeng Bao, David Lo, Pavneet Singh Kochhar, Ahmed E. Hassan, Zhenchang Xing (2017). What do developers search for on the web?. Empirical Software Engineering.
Quality
There are numerous ways a software project can fail: projects can be over budget, they can ship late, they can fail to be useful, or they can simply not be useful enough. Evidence clearly shows that success is highly contextual and stakeholder-dependent: success might be financial, social, physical and even emotional, suggesting that software engineering success is a multifaceted variable that cannot be explained simply by user satisfaction, profitability or meeting requirements, budgets and schedules 6 6 Paul Ralph and Paul Kelly (2014). The dimensions of software engineering success. ACM/IEEE International Conference on Software Engineering.
One of the central reasons for this is that there are many distinct software qualities that software can have and depending on the stakeholders, each of these qualities might have more or less importance. For example, a safety critical system such as flight automation software should be reliable and defect-free, but it’s okay if it’s not particularly learnable—that’s what training is for. A video game, however, should probably be fun and learnable, but it’s fine if it ships with a few defects, as long as they don’t interfere with fun 4 4 Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan (2014). Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development?. ACM/IEEE International Conference on Software Engineering.
There are a surprisingly large number of software qualities 1 1 Barry W. Boehm (1976). Software engineering. IEEE Transactions on Computers.
- Correctness is the extent to which a program behaves according to its specification. If specifications are ambiguous, correctness is ambiguous. However, even if a specification is perfectly unambiguous, it might still fail to meet other qualities (e.g., a web site may be built as intended, but still be slow, unusable, and useless.)
- Reliability is the extent to which a program behaves the same way over time in the same operating environment. For example, if your online banking app works most of the time, but crashes sometimes, it’s not particularly reliable.
- Robustness is the extent to which a program can recover from errors or unexpected input. For example, a login form that crashes if an email is formatted improperly isn’t very robust. A login form that handles any text input is optimally robust. One can make a system more robust by increasing the breadth of errors and inputs it can handle in a reasonable way.
- Performance is the extent to which a program uses computing resources economically. Synonymous with “fast” and “zippy”. Performance is directly determined by how many instructions a program has to execute to accomplish its operations, but it is difficult to measure because operations, inputs, and the operating environment can vary widely.
- Portability is the extent to which an implementation can run on different platforms without being modified. For example, “universal” applications in the Apple ecosystem that can run on iPhones, iPads, and Mac OS without being modified or recompiled are highly portable.
- Interoperability is the extent to which a system can seamlessly interact with other systems, typically through the use of standards. For example, some software systems use entirely proprietary and secret data formats and communication protocols. These are less interoperable than systems that use industry-wide standards.
- Security is the extent to which only authorized individuals can access a software system’s data and computation.
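To make the robustness example above concrete, here is a minimal sketch in Python. The function names and the deliberately simple email pattern are assumptions made for illustration; they are not taken from any real login system, and production email validation is considerably more involved.

```python
import re

# Deliberately simple pattern for illustration only (an assumption, not a standard).
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def fragile_domain(email):
    """Assumes the input always contains '@'; raises IndexError when it does not."""
    return email.split("@")[1]


def robust_domain(email):
    """Return the domain of a plausible email address, or None for any other input."""
    if not isinstance(email, str):
        return None
    email = email.strip()
    if not EMAIL_PATTERN.fullmatch(email):
        return None
    return email.split("@")[1]
```

The fragile version works only for the inputs its author imagined, while the robust version widens the set of inputs it handles in a reasonable way, at the cost of a little more code.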
Whereas the above qualities are concerned with how software behaves technically according to specifications, some qualities concern properties of how developers interact with code:
- Verifiability is the effort required to verify that software does what it is intended to do. For example, it is hard to verify a safety critical system without either proving it correct or testing it in a safety-critical context (which isn’t safe). Take driverless cars, for example: for Google to test their software, they’ve had to set up thousands of paid drivers to monitor and report problems on the road. In contrast, verifying that a simple static HTML web page works correctly is as simple as opening it in a browser.
- Maintainability is the effort required to correct, adapt, or perfect software. This depends mostly on how comprehensible and modular an implementation is.
- Reusability is the effort required to use a program’s components for purposes other than those for which it was originally designed. APIs are reusable by definition, whereas black box embedded software (like the software built into a car’s traction systems) is not.
Other qualities are concerned with the use of the software in the world by people:
- Learnability is the ease with which a person can learn to operate software. Learnability is multi-dimensional and can be difficult to measure, including aspects of usability, expectations of prior knowledge, reliance on conventions, error proneness, and task alignment 2 2 Tovi Grossman, George Fitzmaurice, Ramtin Attar (2009). A survey of software learnability: metrics, methodologies and guidelines. ACM SIGCHI Conference on Human Factors in Computing (CHI).
- User efficiency is the speed with which a person can perform tasks with a program. For example, think about the speed with which you can navigate back to the table of contents of this book. Obviously, because most software supports many tasks, user efficiency isn’t a single property of software, but one that varies depending on the task.
- Accessibility is the extent to which people with varying cognitive and motor abilities can operate the software as intended. For example, software that can only be used with a mouse is less accessible than something that can be used with a mouse, keyboard, or speech recognition. Software can be designed for all abilities, and even automatically adapted for individual abilities 7 7 Jacob O. Wobbrock, Shaun K. Kane, Krzysztof Z. Gajos, Susumu Harada, and Jon Froehlich (2011). Ability-based design: Concept, principles and examples. ACM Transactions on Accessible Computing (TACCESS).
- Privacy is the extent to which a system prevents access to information that is intended for a particular audience or use. To achieve privacy, a system must be secure; for example, if anyone could log into your Facebook account, it would be insecure, and thus have poor privacy preservation. However, a secure system is not necessarily private: Facebook works hard on security, but shares immense amounts of private data with third parties, often without informed consent.
- Consistency is the extent to which related functionality in a system leverages the same skills, rather than requiring new skills to learn how to use. For example, in Mac OS, quitting any application requires the same action: command-Q or the Quit menu item in the application menu; this is highly consistent. Other platforms that are less consistent allow applications to have many different ways of quitting applications.
- Usability is an aggregate quality that encompasses all of the qualities above. It is used holistically to refer to all of those factors. Because it is not very precise, it is mostly useful in casual conversation about software, but not as useful in technical conversations about software quality.
- Bias is the extent to which software discriminates or excludes on the basis of some aspect of its user, either directly, or by amplifying or reinforcing discriminatory or exclusionary structures in society. For example, data used to train a classifier might be racially biased, algorithms might use sexist assumptions about gender, web forms might systematically exclude non-Western names and language, and applications might be only accessible to people who can see or use a mouse. Inaccessibility is a form of bias.
- Usefulness is the extent to which software is of value to its various stakeholders. Usefulness is often the most important quality because it subsumes all of the other lower-level qualities software can have (e.g., part of what makes a messaging app useful is that it’s performant, user efficient, and reliable). That also makes it less useful as a concept, because it encompasses so many things. That said, usefulness is not always the most important quality. For example, if you can sell a product to a customer and get a one-time payment, it might not matter, at least to a for-profit venture, that the product has low usefulness.
Although the lists above are not complete, you might have already noticed some tradeoffs between different qualities. A secure system is necessarily going to be less learnable, because there will be more to learn to operate it. A robust system will likely be less maintainable because it will likely have more code to account for its diverse operating environments. Because one cannot achieve all software qualities, and achieving each quality takes significant time, it is necessary to prioritize qualities for each project.
These external notions of quality are not the only qualities that matter. For example, developers often view projects as successful if they offer intrinsically rewarding work 5 5 J. Drew Procaccino, June M. Verner, Katherine M. Shelfer, David Gefen (2005). What do software practitioners really think about project success: an exploratory study. Journal of Systems and Software.
Mathieu Lavallee and Pierre N. Robillard (2015). Why good developers write bad code: an observational case study of the impacts of organizational factors on software quality. ACM/IEEE International Conference on Software Engineering.
As I’ve noted before, the person most responsible for isolating developers from these organizational problems, and most responsible for prioritizing software qualities is a product manager.
References
- Barry W. Boehm (1976). Software engineering. IEEE Transactions on Computers.
- Tovi Grossman, George Fitzmaurice, Ramtin Attar (2009). A survey of software learnability: metrics, methodologies and guidelines. ACM SIGCHI Conference on Human Factors in Computing (CHI).
- Mathieu Lavallee and Pierre N. Robillard (2015). Why good developers write bad code: an observational case study of the impacts of organizational factors on software quality. ACM/IEEE International Conference on Software Engineering.
- Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan (2014). Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development?. ACM/IEEE International Conference on Software Engineering.
- J. Drew Procaccino, June M. Verner, Katherine M. Shelfer, David Gefen (2005). What do software practitioners really think about project success: an exploratory study. Journal of Systems and Software.
- Paul Ralph and Paul Kelly (2014). The dimensions of software engineering success. ACM/IEEE International Conference on Software Engineering.
- Jacob O. Wobbrock, Shaun K. Kane, Krzysztof Z. Gajos, Susumu Harada, and Jon Froehlich (2011). Ability-based design: Concept, principles and examples. ACM Transactions on Accessible Computing (TACCESS).
Requirements
Once you have a problem, a solution, and a design specification, it’s entirely reasonable to start thinking about code. What libraries should we use? What platform is best? Who will build what? After all, there’s no better way to test the feasibility of an idea than to build it, deploy it, and find out if it works. Right?
It depends. This mentality towards product design works fine if building and deploying something is cheap and getting feedback has no consequences. Simple consumer applications often benefit from this simplicity, especially early stage ones, because there’s little to lose. For example, if you are starting a company, and do not even know if there is a market opportunity yet, it may be worth quickly prototyping an idea, seeing if there’s interest, and then later thinking about how to carefully architect a product that meets that opportunity. This is how products such as Facebook started, with a poorly implemented prototype that revealed an opportunity, which was only later translated into a functional, reliable software service.
However, what if a beta isn’t cheap to build? What if your product only has one shot at adoption? What if you’re building something for a client and they want to define success? Worse yet, what if your product could kill people if it’s not built properly? Consider the U.S. HealthCare.gov launch, for example, which was lambasted for its countless defects and poor scalability at launch, only working for 1,100 simultaneous users, when 50,000 were expected and 250,000 actually arrived. To prevent disastrous launches like this, software teams have to be more careful about translating a design specification into a specific, explicit set of goals that must be satisfied in order for the implementation to be complete. We call these goals requirements and we call this process requirements engineering 7 7 Ian Sommerville, Pete Sawyer (1997). Requirements engineering: a good practice guide. John Wiley & Sons, Inc.
In principle, requirements are a relatively simple concept. They are simply statements of what must be true about a system to make the system acceptable. For example, suppose you were designing an interactive mobile game. You might want to write the requirement “The frame rate must never drop below 60 frames per second.” This could be important for any number of reasons: the game may rely on interactive speeds, a high frame rate may be key to creating a sense of realism, or your company may have a reputation for high performance, high fidelity graphics, and achieving any less would erode its brand. Whatever the reasons, expressing it as a requirement makes it explicit that any version of the software that doesn’t meet that requirement is unacceptable, and sets a clear goal for engineering to meet.
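One benefit of stating the frame rate requirement explicitly is that it can be checked mechanically. The sketch below assumes per-frame durations have already been recorded in seconds; the function names and the sample measurements are hypothetical and only illustrate the idea.

```python
# 60 frames per second leaves roughly 16.7 milliseconds for each frame.
FRAME_BUDGET_SECONDS = 1 / 60


def violating_frames(frame_times):
    """Return (index, duration) for every frame that took longer than the budget."""
    return [(i, t) for i, t in enumerate(frame_times) if t > FRAME_BUDGET_SECONDS]


def meets_frame_rate_requirement(frame_times):
    """The requirement says "never", so a single slow frame is a failure."""
    return not violating_frames(frame_times)


# Hypothetical measurements; a real game would record these while running.
smooth = [0.015, 0.016, 0.014]
janky = [0.015, 0.030, 0.016]
```

Here `meets_frame_rate_requirement(smooth)` holds while `meets_frame_rate_requirement(janky)` does not, and `violating_frames(janky)` points at the second frame as the one to investigate.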
The general idea of writing down requirements is actually a controversial one. Why not just discover what a system needs to do incrementally, through testing, user feedback, and other methods? Some of the original arguments for writing down requirements actually acknowledged that software is necessarily built incrementally, but that it is nevertheless useful to write down requirements from the outset 6 6 David L Parnas, Paul C. Clements (1986). A rational design process: How and why to fake it. IEEE Transactions on Software Engineering.
Do you really have to plan by writing down requirements? For example, why not do what designers do, expressing requirements in the form of prototypes and mockups? These implicitly state requirements, because they suggest what the software is supposed to do without saying it directly. But for some types of requirements, they actually imply nothing. For example, how responsive should a web page be? A prototype doesn’t really say; a requirement of an average page load time of less than 1 second is quite explicit. Requirements can therefore be thought of more like an architect’s blueprint: they provide explicit definitions and scaffolding of project success.
And yet, like design, requirements come from the world and the people in it and not from software 2 2 Michael Jackson (2001). Problem frames. Addison-Wesley.
Axel van Lamsweerde (2008). Requirements engineering: from craft to discipline. ACM SIGSOFT Foundations of Software Engineering (FSE).
There are some approaches to specifying requirements formally. These techniques allow requirements engineers to automatically identify conflicting requirements, so they don’t end up proposing a design that can’t possibly exist. Some even use systems to make requirements traceable, meaning the high level requirement can be linked directly to the code that meets that requirement 4 4 Patrick Mäder, Alexander Egyed (2015). Do developers benefit from requirements traceability when evolving and maintaining a software system?. Empirical Software Engineering.
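As a rough illustration of how formal requirements make conflicts detectable, the sketch below represents each requirement as an allowed range for a named quantity and reports pairs whose ranges cannot both hold. This representation, the `Requirement` class, and the sample requirements are all assumptions made for illustration; real formal methods use far richer logics than intervals.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Requirement:
    name: str         # short identifier, e.g. "R1"
    description: str  # the natural language statement
    quantity: str     # the measurable property being constrained
    minimum: float
    maximum: float


def conflicts(requirements):
    """Return name pairs of requirements on the same quantity with no shared value."""
    found = []
    for a, b in combinations(requirements, 2):
        if a.quantity != b.quantity:
            continue
        # The two ranges overlap only if the larger minimum is within the smaller maximum.
        if max(a.minimum, b.minimum) > min(a.maximum, b.maximum):
            found.append((a.name, b.name))
    return found


# Hypothetical requirements for a web page.
requirements = [
    Requirement("R1", "Pages load in at most 1 second", "page_load_seconds", 0.0, 1.0),
    Requirement("R2", "The intro animation delays loading by at least 2 seconds",
                "page_load_seconds", 2.0, float("inf")),
    Requirement("R3", "Pages make at most 3 network requests", "requests_per_page", 0, 3),
]
```

Calling `conflicts(requirements)` flags R1 and R2, since no page load time satisfies both, which is the kind of impossible design that formal specification aims to rule out before any code is written.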
Rahul Mohanani, Paul Ralph, and Ben Shreeve (2014). Requirements fixation. ACM/IEEE International Conference on Software Engineering.
Expressing requirements in natural language can mitigate these effects, at the expense of precision. They just have to be complete, precise, non-conflicting, and verifiable. For example, consider a design for a simple to-do list application. Its requirements might be something like the following:
- Users must be able to add to-do list items with a single action.
- To-do list items must contain text and a binary completed state.
- Users must be able to edit the text of to-do list items.
- Users must be able to toggle the completed state of to-do list items.
- Users must be able to delete to-do list items.
- All changes made to the state of to-do list items must be saved automatically without user intervention.
Let’s review these requirements against the criteria for good requirements that I listed above:
- Is the list complete? I can think of a few more requirements: is the list ordered? How long does state persist? Are there user accounts? Where is data stored? What does it look like? What kinds of user actions must be supported? Is delete undoable? Even just on this completeness dimension, you can see how even a very simple application can become quite complex. When you’re generating requirements, your job is to make sure you haven’t forgotten important requirements.
- Is the list precise? Not really. When you add a to-do list item, is it added at the beginning? The end? Wherever a user requests it be added? How long can the to-do list item text be? Clearly the requirement above is imprecise. And imprecise requirements lead to imprecise goals, which means that engineers might not meet them. Is this to-do list team okay with not meeting its goals?
- Are the requirements non-conflicting? I think they are since they all seem to be satisfiable together. But some of the missing requirements might conflict. For example, suppose we clarified the imprecise requirement about where a to-do list item is added. If the requirement was that it was added to the end, is there also a requirement that the window scroll to make the newly added to-do item visible? If not, would the first requirement of making it possible for users to add an item with a single action be achievable? They could add it, but they wouldn’t know they had added it because of this usability problem, so is this requirement met? This example shows that reasoning through requirements is ultimately about interpreting words, finding sources of ambiguity, and trying to eliminate them with more words.
- Finally, are they verifiable? Some more than others. For example, is there a way to guarantee that the state saves successfully all the time? That may be difficult to prove given the vast number of ways the operating environment might prevent saving, such as a failing hard drive or an interrupted internet connection. This requirement might need to be revised to allow for failures to save, which itself might have implications for other requirements in the list.
Now, the flaws above don’t make the requirements “wrong”. They just make them “less good.” The more complete, precise, non-conflicting, and verifiable your requirements are, the easier it is to anticipate risk, estimate work, and evaluate progress, since requirements essentially give you a to-do list for implementation and testing.
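To see how the to-do list requirements turn into implementation and testing goals, consider the minimal sketch below. The `TodoList` class and its injected `save` function are hypothetical, and the sketch quietly resolves some of the ambiguities discussed above: new items go at the end, and a failed save is not handled at all.

```python
class TodoList:
    """In-memory to-do list that calls `save` after every change (autosave)."""

    def __init__(self, save):
        self._items = []   # each item is {"text": str, "completed": bool}
        self._save = save  # receives a copy of all items after each change

    def _changed(self):
        self._save([dict(item) for item in self._items])

    def add(self, text):
        """Add an item with a single call; returns the new item's index."""
        self._items.append({"text": text, "completed": False})
        self._changed()
        return len(self._items) - 1

    def edit(self, index, text):
        self._items[index]["text"] = text
        self._changed()

    def toggle(self, index):
        item = self._items[index]
        item["completed"] = not item["completed"]
        self._changed()

    def delete(self, index):
        del self._items[index]
        self._changed()

    def items(self):
        return [dict(item) for item in self._items]


# Usage: record every saved snapshot in a list instead of writing to disk.
saved = []
todo = TodoList(save=saved.append)
first = todo.add("buy milk")
todo.edit(first, "buy oat milk")
todo.toggle(first)
```

Each method corresponds to one requirement, and passing `save` in as a parameter makes the autosave requirement checkable in a test: after the three changes above, `saved` holds three snapshots. What the sketch cannot show is that saving always succeeds, which is exactly the verifiability concern raised earlier.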
Lastly, remember that requirements are translated from a design, and designs have many more qualities than just completeness, preciseness, feasibility, and verifiability. Designs must also be legal, ethical, and just. Consider, for example, the anti-Black redlining practices pervasive throughout the United States. Even through the 1980’s, it was standard practice for banks to lend to lower-income white residents, but not Black residents, even middle-income or upper-income ones. Banks in the 1980’s wrote software to automate many lending decisions; would a software requirement such as this have been legal, ethical, or just?
That requirement is both precise and verifiable. In the 1980’s, it was legal. But was it ethical or just? Absolutely not. Therefore, requirements, no matter how formally extracted from a design specification, no matter how consistent with law, and no matter how aligned with an organization’s priorities, should be free of racist ideas. Requirements are just one of many ways that such ideas are manifested, and ultimately hidden in code 1 1 Ruha Benjamin (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Books.
References
- Ruha Benjamin (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Books.
- Michael Jackson (2001). Problem frames. Addison-Wesley.
- Axel van Lamsweerde (2008). Requirements engineering: from craft to discipline. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Patrick Mäder, Alexander Egyed (2015). Do developers benefit from requirements traceability when evolving and maintaining a software system?. Empirical Software Engineering.
- Rahul Mohanani, Paul Ralph, and Ben Shreeve (2014). Requirements fixation. ACM/IEEE International Conference on Software Engineering.
- David L Parnas, Paul C. Clements (1986). A rational design process: How and why to fake it. IEEE Transactions on Software Engineering.
- Ian Sommerville, Pete Sawyer (1997). Requirements engineering: a good practice guide. John Wiley & Sons, Inc.
Architecture
Once you have a sense of what your design must do (in the form of requirements or other less formal specifications), the next big problem is one of organization. How will you organize all of the different data, algorithms, and control implied by your requirements? With a small program of a few hundred lines, you can get away without much organization, but as programs scale, they quickly become impossible to manage alone, let alone with multiple developers. Much of this challenge occurs because requirements change, and every time they do, code has to change to accommodate. The more code there is and the more entangled it is, the harder it is to change and the more likely you are to break things.
This is where architecture comes in. Architecture is a way of organizing code, just like building architecture is a way of organizing space. The idea of software architecture has at its foundation a principle of information hiding: the less a part of a program knows about other parts of a program, the easier it is to change. The most popular information hiding strategy is encapsulation: this is the idea of designing self-contained abstractions with well-defined interfaces that separate different concerns in a program. Programming languages offer encapsulation support through things like functions and classes, which encapsulate data and functionality together. Another programming language encapsulation method is scoping, which hides variables and other names from other parts of a program outside a scope. All of these strategies attempt to encourage developers to maximize information hiding and separation of concerns. If you get your encapsulation right, you should be able to easily make changes to a program’s behavior without having to change everything about its implementation.
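As a small illustration of encapsulation, the sketch below hides how stock counts are stored behind a narrow interface. The `Inventory` class is a hypothetical example, and Python's leading underscore is only a naming convention for "private", not an enforced boundary.

```python
class Inventory:
    """Tracks stock counts; callers use the methods and never touch _counts."""

    def __init__(self):
        # Hidden representation: this could later become a database table
        # without any caller needing to change.
        self._counts = {}

    def add(self, item, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._counts[item] = self._counts.get(item, 0) + quantity

    def remove(self, item, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.count(item) < quantity:
            raise ValueError(f"not enough {item} in stock")
        self._counts[item] -= quantity

    def count(self, item):
        return self._counts.get(item, 0)


# Usage: the rest of the program depends only on add, remove, and count.
inventory = Inventory()
inventory.add("widget", 3)
inventory.remove("widget")
```

Because other code knows only the three methods, the rule that stock can never go negative lives in one place, and the storage strategy can change without rippling through the program.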
When encapsulation strategies fail, one can end up with what some affectionately call a “ball of mud” architecture or “spaghetti code”. Ball of mud architectures have no apparent organization, which makes it difficult to comprehend how the parts of their implementation interact. A more precise concept that can help explain this disorder is cross-cutting concerns, which are things like features and functionality that span multiple different components of a system, or even an entire system. There is some evidence that cross-cutting concerns can lead to difficulties in program comprehension and long-term design degradation 17 17 Robert J. Walker, Shreya Rawal, and Jonathan Sillito (2012). Do crosscutting concerns cause modularity problems?. ACM SIGSOFT Foundations of Software Engineering (FSE).
Neil A. Ernst, Stephany Bellomo, Ipek Ozkaya, Robert L. Nord, and Ian Gorton (2015). Measure it? Manage it? Ignore it? Software practitioners and technical debt. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Ravi Khadka, Belfrit V. Batlajery, Amir M. Saeidi, Slinger Jansen, and Jurriaan Hage (2014). How do professionals perceive legacy systems and software modernization?. ACM/IEEE International Conference on Software Engineering.
The preventative solution to these problems is to try to design architecture up front, mitigating the various risks that come from cross-cutting concerns (defects, low modifiability, etc.) 7 7 George Fairbanks (2010). Just enough software architecture: a risk-driven approach. Marshall & Brainerd.
Marian Petre (2013). UML in practice. ACM/IEEE International Conference on Software Engineering.
Emad Aghajani, Csaba Nagy, Mario Linares-Vásquez, Laura Moreno, Gabriele Bavota, Michele Lanza, David C. Shepherd (2020). Software documentation: the practitioners' perspective. ACM/IEEE International Conference on Software Engineering.
More recently, developers have investigated the idea of architectural styles, which are patterns of interactions and information exchange between encapsulated components. Some common architectural styles include:
- Client/server, in which data is transacted in response to requests. This is the basis of the Internet and cloud computing 5 5 Jürgen Cito, Philipp Leitner, Thomas Fritz, and Harald C. Gall (2015). The making of cloud applications: an empirical study on software development for the cloud. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Pipe and filter, in which data is passed from component to component, and transformed and filtered along the way. Command lines, compilers, and machine-learned programs are examples of pipe and filter architectures.
- Model-view-controller (MVC), in which data is separated from views of the data and from manipulations of data. Nearly all user interface toolkits use MVC, including popular modern frameworks such as React.
- Peer to peer (P2P), in which components transact data through a distributed standard interface. Examples include Bitcoin, Spotify, and Gnutella.
- Event-driven, in which some components “broadcast” events and others “subscribe” to notifications of these events. Examples include most model-view-controller-based user interface frameworks, which have models broadcast change events to subscribers. For example, views may subscribe to models so they may update themselves to render new model state each time it changes.
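The event-driven style in the last item can be sketched in a few lines. This is an illustration under assumptions, not how any particular framework works: the TodoModel class, its subscribe() and add() methods, and the renderView function are all hypothetical names.

```javascript
// All names here (TodoModel, renderView) are hypothetical.
class TodoModel {
  #items = [];
  #listeners = [];

  // Views, or anything else, register interest in changes.
  subscribe(listener) {
    this.#listeners.push(listener);
  }

  // Changing the model broadcasts a copy of the new state to every subscriber.
  add(item) {
    this.#items.push(item);
    for (const listener of this.#listeners) {
      listener([...this.#items]);
    }
  }
}

// A "view" that re-renders itself as text each time the model changes.
let rendered = "";
function renderView(items) {
  rendered = items.map((item) => `- ${item}`).join("\n");
}

const model = new TodoModel();
model.subscribe(renderView);
model.add("write spec");
model.add("write tests");
console.log(rendered);
```

The model never refers to the view directly; it only knows that it has subscribers, which is what keeps the two components separable.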
Architectural styles come in all shapes and sizes. Some are smaller design patterns of information sharing 4 4 Kent Beck, Ron Crocker, Gerard Meszaros, John Vlissides, James O. Coplien, Lutz Dominick, and Frances Paulisch (1996). Industrial experience with design patterns. ACM/IEEE International Conference on Software Engineering.
Len Bass, Bonnie E. John (2003). Linking usability to software architecture patterns through general scenarios. Journal of Systems and Software.
One fundamental unit of which an architecture is composed is a component. This is basically a word that refers to any abstraction (any code, really) that attempts to encapsulate some well-defined functionality or behavior separate from other functionality and behavior. For example, consider the Java class Math: it encapsulates a wide range of related mathematical functions. This class has an interface that decides how it can communicate with other components (sending arguments to a math function and getting a return value). Components can be more than classes though: they might be a data structure, a set of functions, a library, an API, or even something like a web service. All of these are abstractions that encapsulate interrelated computation and state for some well-defined purpose.
The second fundamental unit of architecture is connectors. Connectors are code that transmits information between components. They’re brokers that connect components, but do not necessarily have meaningful behaviors or states of their own. Connectors can be things like function calls, web service API calls, events, requests, and so on. None of these mechanisms store state or functionality themselves; instead, they are the things that tie components’ functionality and state together.
Even with carefully selected architectures, systems can still be difficult to put together, leading to architectural mismatch 8 8 Garlan, D., Allen, R., & Ockerbloom, J (1995). Architectural mismatch or why it's hard to build systems out of existing parts. ACM/IEEE International Conference on Software Engineering.
Dong Qiu, Bixin Li, and Zhendong Su (2013). An empirical analysis of the co-evolution of schema and code in database applications. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
The most common approach to dealing with both architectural mismatch and the changing of requirements over time is refactoring, which means changing the architecture of an implementation without changing its behavior. Refactoring is something most developers do as part of changing a system 11,16 11 Emerson Murphy-Hill, Chris Parnin, and Andrew P. Black (2009). How we refactor, and how we know it. ACM/IEEE International Conference on Software Engineering.
Danilo Silva, Nikolaos Tsantalis, and Marco Tulio Valente (2016). Why we refactor? Confessions of GitHub contributors. ACM SIGSOFT Foundations of Software Engineering (FSE).
T. H. Ng, S. C. Cheung, W. K. Chan, and Y. T. Yu (2006). Work experience versus refactoring to design patterns: a controlled experiment. ACM SIGSOFT Foundations of Software Engineering (FSE).
Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).
Research on the activity of software architecture itself is somewhat sparse. One of the more recent syntheses of this work is Petre and van der Hoek’s book, Software Design Decoded 14 14 Marian Petre, André van der Hoek (2016). Software design decoded: 66 ways experts think. MIT Press.
Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (2017). Why do developers use trivial packages? An empirical case study on npm. ACM SIGSOFT Foundations of Software Engineering (FSE).
References
- Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (2017). Why do developers use trivial packages? An empirical case study on npm. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Emad Aghajani, Csaba Nagy, Mario Linares-Vásquez, Laura Moreno, Gabriele Bavota, Michele Lanza, David C. Shepherd (2020). Software documentation: the practitioners' perspective. ACM/IEEE International Conference on Software Engineering.
- Len Bass, Bonnie E. John (2003). Linking usability to software architecture patterns through general scenarios. Journal of Systems and Software.
- Kent Beck, Ron Crocker, Gerard Meszaros, John Vlissides, James O. Coplien, Lutz Dominick, and Frances Paulisch (1996). Industrial experience with design patterns. ACM/IEEE International Conference on Software Engineering.
- Jürgen Cito, Philipp Leitner, Thomas Fritz, and Harald C. Gall (2015). The making of cloud applications: an empirical study on software development for the cloud. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Neil A. Ernst, Stephany Bellomo, Ipek Ozkaya, Robert L. Nord, and Ian Gorton (2015). Measure it? Manage it? Ignore it? Software practitioners and technical debt. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- George Fairbanks (2010). Just enough software architecture: a risk-driven approach. Marshall & Brainerd.
- Garlan, D., Allen, R., & Ockerbloom, J. (1995). Architectural mismatch or why it's hard to build systems out of existing parts. ACM/IEEE International Conference on Software Engineering.
- Ravi Khadka, Belfrit V. Batlajery, Amir M. Saeidi, Slinger Jansen, and Jurriaan Hage (2014). How do professionals perceive legacy systems and software modernization? ACM/IEEE International Conference on Software Engineering.
- Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Emerson Murphy-Hill, Chris Parnin, and Andrew P. Black (2009). How we refactor, and how we know it. ACM/IEEE International Conference on Software Engineering.
- T. H. Ng, S. C. Cheung, W. K. Chan, and Y. T. Yu (2006). Work experience versus refactoring to design patterns: a controlled experiment. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Marian Petre (2013). UML in practice. ACM/IEEE International Conference on Software Engineering.
- Marian Petre, André van der Hoek (2016). Software design decoded: 66 ways experts think. MIT Press.
- Dong Qiu, Bixin Li, and Zhendong Su (2013). An empirical analysis of the co-evolution of schema and code in database applications. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Danilo Silva, Nikolaos Tsantalis, and Marco Tulio Valente (2016). Why we refactor? Confessions of GitHub contributors. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Robert J. Walker, Shreya Rawal, and Jonathan Sillito (2012). Do crosscutting concerns cause modularity problems? ACM SIGSOFT Foundations of Software Engineering (FSE).
Specifications
When you make something with code, you’re probably used to figuring out a design as you go. You write a function, you choose some arguments, and if you don’t like what you see, perhaps you add a new argument to that function and test again. This cowboy coding, as some people like to call it, can be great fun! It allows systems to emerge more organically: as you iteratively see your front-end design emerge, the design of your implementation emerges too, co-evolving with how you’re feeling about the final product.
As you’ve probably noticed by now, this type of process doesn’t really scale, even when you’re working with just a few other people. That argument you added? You just broke a bunch of functions one of your teammates was planning and when she commits her code, now she gets merge conflicts, which cost her an hour to fix because she has to catch up to whatever design change you made. This lack of planning quickly turns into an uncoordinated mess of individual decision making. Suddenly you’re spending all of your time cleaning up coordination messes instead of writing code.
The techniques we’ve discussed so far for avoiding this boil down to specifying what code should do, so everyone can write code according to a plan. We’ve talked about requirements specifications, which are declarations of what software must do from a user’s perspective. We’ve also talked about architectural specifications, which are high-level declarations of how code will be organized, encapsulated, and coordinated. At the lowest level are functional specifications, which are declarations about the properties of input and output of functions in a program.
In its simplest form, a functional specification can be just some natural language that says what an individual function is supposed to do:
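A minimal sketch of such a specification, assuming a hypothetical two-argument JavaScript function named min whose behavior is described only by the comment above it:

```javascript
// Return the smaller of the two numbers, or if they are equal, the second number.
function min(a, b) {
  return a < b ? a : b;
}

console.log(min(3, 7)); // prints 3
```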
This comment achieves the core purpose of a specification: to help other developers understand what the requirements and intended behavior of a function are. As long as everyone sticks to this “plan” (everyone calls the function with only numbers and the function always returns the smaller of them), then there shouldn’t be any problems.
The comment above is okay, but it’s not very precise. It says what is returned and what properties it has, but it only implies that numbers are allowed without saying anything about what kind of numbers. Are decimals allowed or just integers? What about not-a-number (the result of dividing 0 by 0)? Or infinity?
To make these clearer, many languages use static typing to allow developers to specify types explicitly:
Because an int is well-defined in most languages, the two inputs to the function are well-defined.
Of course, if the above were JavaScript code (which doesn’t support static typing), JavaScript would do nothing to verify that the data given to min() are actually integers. It’s entirely fine with someone sending a string and an object. This probably won’t do what you intended, leading to defects.
This brings us to a second purpose of writing functional specifications: to help verify that functions, their input, and their output are correct. Tests of functions and other low-level procedures are called unit tests. There are many ways to use specifications to verify correctness. By far, one of the simplest and most widely used kinds of unit tests are assertions 3 3 Clarke, L. A., & Rosenblum, D. S (2006). A historical perspective on runtime assertion checking in software development. ACM SIGSOFT Software Engineering Notes.
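As a rough sketch, assertions on the inputs to a min() function could look like the following in JavaScript. The use of console.assert, which logs a message when its condition is false rather than stopping the program, is an assumption made here for illustration; other notification mechanisms would work as well.

```javascript
// Return the smaller of the two integers, or if they are equal, the second.
function min(a, b) {
  // Each assertion checks one input and logs a warning if the check fails.
  console.assert(Number.isInteger(a), "First input to min() isn't an integer!");
  console.assert(Number.isInteger(b), "Second input to min() isn't an integer!");
  return a < b ? a : b;
}

min(3, 7);   // both checks pass silently
min(1.5, 7); // logs an assertion failure for the first input
```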
These two new lines of code are essentially functional specifications that declare “If either of those inputs is not an integer, the caller of this function is doing something wrong”. This is useful to declare, but assertions have a bunch of problems: if your program can send a non-integer value to min, but you never test it in a way that does, you’ll never see those alerts. This form of dynamic verification is therefore very limited, since it provides weaker guarantees about correctness. That said, a study of the use of assertions in a large database of GitHub projects shows that use of assertions is related to fewer defects 1 1 Casey Casalnuovo, Prem Devanbu, Abilio Oliveira, Vladimir Filkov, and Baishakhi Ray (2015). Assert use in GitHub projects. ACM/IEEE International Conference on Software Engineering.
Assertions are related to the broader category of error handling language features. Error handling includes assertions, but also programming language features like exceptions and exception handlers. Error handling is a form of specification in that checking for errors usually entails explicitly specifying the conditions that determine an error. For example, in the code above, the condition Number.isInteger(a) specifies that the parameter a must be an integer. Other exception handling code such as the Java throws clause indicates the cases in which errors can occur and the corresponding catch block indicates what is to be done about errors. It is difficult to implement good exception handling that provides granular, clear ways of recovering from errors 2 2 Chien-Tsun Chen, Yu Chin Cheng, Chin-Yun Hsieh, and I-Lang Wu (2008). Exception handling refactorings: Directed by goals and driven by bug fixing. Journal of Systems and Software.
Felipe Ebert, Fernando Castor, Alexander Serebrenik (2015). An exploratory study on exception handling bugs in Java programs. Journal of Systems and Software.
Maxion, Roy A., and Robert T. Olszewski (2000). Eliminating exception handling errors with dependability cases: a comparative, empirical study. IEEE Transactions on Software Engineering.
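To make the exception handling idea concrete, here is a small JavaScript sketch. JavaScript has throw, try, and catch but nothing like Java's throws declaration, so the condition that determines an error lives only in the function body. The safeMin helper and its fallback parameter are hypothetical, introduced only to show one way a caller might recover.

```javascript
// Hypothetical variant of min() that rejects bad input by throwing.
function min(a, b) {
  if (!Number.isInteger(a) || !Number.isInteger(b)) {
    throw new TypeError("min() requires two integers");
  }
  return a < b ? a : b;
}

// The caller decides how to recover; here it falls back to a default value.
function safeMin(a, b, fallback) {
  try {
    return min(a, b);
  } catch (error) {
    if (error instanceof TypeError) {
      return fallback;
    }
    throw error; // anything unexpected is passed along to the caller
  }
}

console.log(safeMin(3, 7, 0));   // prints 3
console.log(safeMin("3", 7, 0)); // prints 0
```

Even in this tiny case, the recovery policy (return a fallback) is a design decision that the thrown error alone does not dictate, which hints at why granular recovery is hard to get right.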
Researchers have invented many forms of specification that require more work and more thought to write, but can be used to make stronger guarantees about correctness 9 9 Jim Woodcock, Peter Gorm Larsen, Juan Bicarregui, and John Fitzgerald (2009). Formal methods: Practice and experience. ACM Computing Surveys.
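As an illustration, a contract for a min() function could be written as annotations like the ones below. The requires/ensures notation is a hypothetical one expressed in ordinary comments; JavaScript does not check it, and real verification tools each define their own syntax.

```javascript
// Hypothetical contract notation written as comments; JavaScript itself ignores it.
// requires: Number.isInteger(a) && Number.isInteger(b)
// ensures:  result <= a && result <= b
function min(a, b) {
  return a < b ? a : b;
}
```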
The annotations above require that, no matter what, the inputs have to be integers and the output has to be less than or equal to both values. The automatic theorem prover can then start with the claim that result is always less than or equal to both and begin searching for a counterexample. Can you find a counterexample? Really try. Think about what you’re doing while you try: you’re probably experimenting with different inputs to identify arguments that violate the contract. That’s similar to what automatic theorem provers do, but they use many tricks to explore large possible spaces of inputs all at once, and they do it very quickly.
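A toy version of that search can be written by hand. The sketch below is not a theorem prover: it assumes a hypothetical helper, findCounterexample, that simply tries every pair of integers in a small range and reports the first pair that violates the postcondition.

```javascript
// The function under test, repeated here so the example is self-contained.
function min(a, b) {
  return a < b ? a : b;
}

// Brute-force search over integers in [low, high] for inputs where the result
// is NOT less than or equal to both arguments. Returns the first violating
// pair as [a, b], or null if no violation exists in the range.
function findCounterexample(fn, low, high) {
  for (let a = low; a <= high; a++) {
    for (let b = low; b <= high; b++) {
      const result = fn(a, b);
      if (!(result <= a && result <= b)) {
        return [a, b];
      }
    }
  }
  return null;
}

// null means no violation was found in this range.
console.log(findCounterexample(min, -50, 50));
```

A search like this only covers the range it is given, whereas a prover reasons about all possible inputs at once; that difference is what makes the stronger guarantees possible.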
There are definite tradeoffs with writing detailed, formal specifications. The benefits are clear: many companies have written formal functional specifications in order to make completely unambiguous the required behavior of their code, particularly systems that are capable of killing people or losing money, such as flight automation software, banking systems, and even compilers that create executables from code 9 9 Jim Woodcock, Peter Gorm Larsen, Juan Bicarregui, and John Fitzgerald (2009). Formal methods: Practice and experience. ACM Computing Surveys.
Davide Fucci, Hakan Erdogmus, Burak Turhan, Markku Oivo, Natalia Juristo (2016). A dissection of test-driven development: Does it really matter to test-first or to test-last?. IEEE Transactions on Software Engineering.
Writing formal specifications can also have downsides. When the consequences of software failure aren’t so high, the difficulty and time required to write and maintain functional specifications may not be worth the effort 7 7 Marian Petre (2013). UML in practice. ACM/IEEE International Conference on Software Engineering.
Todd W. Schiller, Kellen Donohue, Forrest Coward, and Michael D. Ernst (2014). Case studies and tools for contract specifications. ACM/IEEE International Conference on Software Engineering.
References
- Casey Casalnuovo, Prem Devanbu, Abilio Oliveira, Vladimir Filkov, and Baishakhi Ray (2015). Assert use in GitHub projects. ACM/IEEE International Conference on Software Engineering.
- Chien-Tsun Chen, Yu Chin Cheng, Chin-Yun Hsieh, and I-Lang Wu (2008). Exception handling refactorings: Directed by goals and driven by bug fixing. Journal of Systems and Software.
- Clarke, L. A., & Rosenblum, D. S. (2006). A historical perspective on runtime assertion checking in software development. ACM SIGSOFT Software Engineering Notes.
- Felipe Ebert, Fernando Castor, Alexander Serebrenik (2015). An exploratory study on exception handling bugs in Java programs. Journal of Systems and Software.
- Davide Fucci, Hakan Erdogmus, Burak Turhan, Markku Oivo, Natalia Juristo (2016). A dissection of test-driven development: Does it really matter to test-first or to test-last? IEEE Transactions on Software Engineering.
- Maxion, Roy A., and Robert T. Olszewski (2000). Eliminating exception handling errors with dependability cases: a comparative, empirical study. IEEE Transactions on Software Engineering.
- Marian Petre (2013). UML in practice. ACM/IEEE International Conference on Software Engineering.
- Todd W. Schiller, Kellen Donohue, Forrest Coward, and Michael D. Ernst (2014). Case studies and tools for contract specifications. ACM/IEEE International Conference on Software Engineering.
- Jim Woodcock, Peter Gorm Larsen, Juan Bicarregui, and John Fitzgerald (2009). Formal methods: Practice and experience. ACM Computing Surveys.
Process
So you know what you’re going to build and how you’re going to build it. What process should you follow to build it? Who’s going to build what? What order should you build it in? How do you make sure everyone is in sync while you’re building it? 22 22 Tim Pettersen (2016). Git Workflows with Tim Pettersen. Software Engineering Daily Podcast.
At the foundation of all of these questions are basic matters of project management: plan, execute, and monitor. But developers in the 1970s and on found that traditional project management ideas didn’t seem to work. The earliest process ideas followed a “waterfall” model, in which a project begins by identifying requirements, writing specifications, implementing, testing, and releasing, all under the assumption that every stage could be fully tested and verified. (Recognize this? It’s the order of topics we’re discussing in this class!) Many managers seemed to like the waterfall model because it seemed structured and predictable, and because most managers were originally software developers, they preferred a structured approach to project management 31 31 Gerald M. Weinberg (1982). Over-structured management of software engineering. ACM/IEEE International Conference on Software Engineering.
In 1988, Barry Boehm proposed an alternative to waterfall called the Spiral model 4 4 Barry W. Boehm (1988). A spiral model of software development and enhancement. IEEE Computer.
Around the same time, two influential books were published. Fred Brooks wrote The Mythical Man Month 6 6 Fred P. Brooks (1995). The mythical man month. Pearson Education.
Tom DeMarco, Tim Lister (1987). Peopleware: Productive projects and teams. Addison-Wesley.
These early ideas in software project management led to a wide variety of other discoveries about process. For example, organizations of all sizes can improve their process if they are keenly aware of what the people in the organization know and what the organization is capable of learning, and if they build robust processes for continually improving process 9,10 9 Tore Dybå (2002). Enabling software process improvement: an investigation of the importance of organizational issues. Empirical Software Engineering.
Tore Dybå (2003). Factors of software process improvement success in small and large organizations: an empirical study in the scandinavian context. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Beyond process improvement, other factors emerged. For example, researchers discovered that critical to team productivity was awareness of teammates’ work 16 16 Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
Christoph Treude and Margaret-Anne Storey (2010). Awareness 2.0: staying aware of projects, developers and tasks using dashboards and feeds. ACM/IEEE International Conference on Software Engineering.
Allen E. Milewski (2007). Global and task effects in information-seeking among software engineers. Empirical Software Engineering.
In addition to awareness, ownership is a critical idea in process. This is the idea that for every line of code, someone is responsible for its quality. The owner might be the person who originally wrote the code, but it could also shift to new team members. Studies of code ownership on Windows Vista and Windows 7 found that the less a component had a clear owner, the more pre-release defects it had and the more post-release failures were reported by users 3 3 Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu (2011). Don't touch my code! Examining the effects of ownership on software quality. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Pace is another factor that affects quality. Clearly, there’s a tradeoff between how fast a team works and the quality of the product it can build. In fact, interview studies of engineers at Google, Facebook, Microsoft, Intel, and other large companies found that the pressure to reduce “time to market” harmed nearly every aspect of teamwork: the availability and discoverability of information, clear communication, planning, integration with others’ work, and code ownership 24 24 Julia Rubin and Martin Rinard (2016). The challenges of staying together while moving fast: an exploratory study. ACM/IEEE International Conference on Software Engineering.
Amy J. Ko (2017). A Three-Year Participant Observation of Software Startup Software Evolution. ACM/IEEE International Conference on Software Engineering, Software Engineering in Practice.
Because of the importance of awareness and communication, the distance between teammates is also a critical factor. This is most visible in companies that hire remote developers, building distributed teams, or when teams are fully distributed (such as when there is a pandemic requiring social distancing). One motivation for doing this is to reduce costs or gain access to engineering talent that is distant from a team’s geographical center, but over time, companies have found that doing so necessitates significant investments in socialization to ensure quality, minimizing geographical, temporal and cultural separation 26 26 Darja Šmite, Claes Wohlin, Tony Gorschek, Robert Feldt (2010). Empirical evidence in global software engineering: a systematic review. Empirical Software Engineering.
Narayan Ramasubbu, Marcelo Cataldo, Rajesh Krishna Balan, and James D. Herbsleb (2011). Configuring global software teams: a multi-company analysis of project productivity, quality, and profits. ACM/IEEE International Conference on Software Engineering.
Patrick Wagstrom and Subhajit Datta (2014). Does latitude hurt while longitude kills? Geographical and temporal separation in a large scale software development project. ACM/IEEE International Conference on Software Engineering.
Ekrem Kocaguneli, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan, and Tim Menzies (2013). Distributed development considered harmful?. ACM/IEEE International Conference on Software Engineering.
Klaas-Jan Stol and Brian Fitzgerald (2014). Two's company, three's a crowd: a case study of crowdsourcing software development. ACM/IEEE International Conference on Software Engineering.
A critical part of ensuring that a team is successful is having someone responsible for managing these factors of distance, pace, ownership, awareness, and overall process. The most obvious person to oversee this is, of course, a project manager 5,21 5 Mike Borozdin (2017). Engineering management with Mike Borozdin. Software Engineering Daily Podcast.
Jeff Norris (2016). Tech leadership with Jeff Norris. Software Engineering Daily Podcast.
Eirini Kalliamvakou, Christian Bird, Thomas Zimmermann, Andrew Begel, Robert DeLine, Daniel M. German (2017). What makes a great manager of software engineers?. IEEE Transactions on Software Engineering.
While all of this research has strong implications for practice, industry has largely explored its own ideas about process, devising frameworks that addressed issues of distance, pace, ownership, awareness, and process improvement. Extreme Programming 2 2 Kent Beck (1999). Embracing change with extreme programming. IEEE Computer.
- Be iterative
- Do small releases
- Keep design simple
- Write unit tests
- Refactor to iterate
- Use pair programming
- Integrate continuously
- Everyone owns everything
- Use an open workspace
- Work sane hours
Note that none of these had any empirical evidence to back them. Moreover, Beck described in his original proposal that these ideas were best for “outsourced or in-house development of small- to medium-sized systems where requirements are vague and likely to change”, but as industry often does, it began hyping XP as a universal solution to software project management woes and adopted all kinds of combinations of these ideas, adapting them to its existing processes. In reality, the value of XP appears to depend on highly project-specific factors 20 20 Matthias M. Müller and Frank Padberg (2003). On the economic evaluation of XP projects. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Helen Sharp, Hugh Robinson (2004). An ethnographic study of XP practice. Empirical Software Engineering.
di Bella, E., Fronza, I., Phaphoom, N., Sillitti, A., Succi, G., & Vlasenko, J (2013). Pair programming and software defects—A large, industrial case study. IEEE Transactions on Software Engineering.
At the same time, Beck began also espousing the idea of “Agile” methods, which celebrated many of the values underlying Extreme Programming, such as focusing on individuals, keeping things simple, collaborating with customers, and being iterative. This idea of being agile was even more popular and spread widely in industry and research, even though many of the same ideas appeared much earlier in Boehm’s work on the Spiral model. Researchers found that Agile methods can increase developer enthusiasm 28 28 Sharifah Syed-Abdullah, Mike Holcombe & Marian Gheorge (2006). The impact of an agile methodology on the well being of development teams. Empirical Software Engineering.
Rashina Hoda, James Noble, and Stuart Marshall (2010). Organizing self-organizing teams. ACM/IEEE International Conference on Software Engineering.
Osama Al-Baik & James Miller (2015). The kanban approach, between agility and leanness: a systematic review. Empirical Software Engineering.
Rashina Hoda, James Noble (2017). Becoming agile: a grounded theory of agile transitions in practice. ACM/IEEE International Conference on Software Engineering.
Ultimately, all of this energy around process ideas in industry is exciting, but disorganized. None of these efforts really get to the core of what makes software projects difficult to manage. One effort in research tries to get to this core by contributing new theories that explain these difficulties. The first is Herbsleb’s Socio-Technical Theory of Coordination (STTC). The idea of the theory is quite simple: technical dependencies in engineering decisions (e.g., this function calls this other function, this data type stores this other data type) define the social constraints that the organization must solve in a variety of ways to build and maintain software 11,12 11 James D. Herbsleb and Audris Mockus (2003). Formulation and preliminary test of an empirical theory of coordination in software engineering. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
James Herbsleb (2016). Building a socio-technical theory of coordination: why and how. ACM SIGSOFT Foundations of Software Engineering (FSE).
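One way to make the link between technical dependencies and social constraints concrete is to compute, from a table of dependencies and a table of owners, which pairs of people would need to coordinate. The sketch below is an illustration only, not part of the theory's formal statement: the function names, the owners, and the simplified rule (a dependency between functions with different owners creates one coordination need) are all assumptions.

```javascript
// Hypothetical data: which function calls which, and who owns each function.
const dependencies = [
  ["renderCart", "computeTotal"],
  ["computeTotal", "lookupPrice"],
  ["renderCart", "formatCurrency"],
];
const owners = {
  renderCart: "Ana",
  computeTotal: "Bo",
  lookupPrice: "Bo",
  formatCurrency: "Ana",
};

// Every dependency that crosses an ownership boundary implies that the two
// owners must coordinate. Returns sorted, de-duplicated "A <-> B" strings.
function coordinationNeeds(deps, ownerOf) {
  const pairs = new Set();
  for (const [caller, callee] of deps) {
    const a = ownerOf[caller];
    const b = ownerOf[callee];
    if (a !== b) {
      pairs.add([a, b].sort().join(" <-> "));
    }
  }
  return [...pairs].sort();
}

// Only one dependency crosses owners here, so one pair is reported: Ana and Bo.
console.log(coordinationNeeds(dependencies, owners));
```

Changing the architecture (for example, moving computeTotal to Ana) changes the output, which is the point: decisions about code structure are also decisions about who has to talk to whom.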
I extended this idea to congruence with beliefs about product value 17 17 Amy J. Ko (2017). A Three-Year Participant Observation of Software Startup Software Evolution. ACM/IEEE International Conference on Software Engineering, Software Engineering in Practice.
References
- Osama Al-Baik & James Miller (2015). The kanban approach, between agility and leanness: a systematic review. Empirical Software Engineering.
- Kent Beck (1999). Embracing change with extreme programming. IEEE Computer.
- Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu (2011). Don't touch my code! Examining the effects of ownership on software quality. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Barry W. Boehm (1988). A spiral model of software development and enhancement. IEEE Computer.
- Mike Borozdin (2017). Engineering management with Mike Borozdin. Software Engineering Daily Podcast.
- Fred P. Brooks (1995). The mythical man month. Pearson Education.
- Tom DeMarco, Tim Lister (1987). Peopleware: Productive projects and teams. Addison-Wesley.
- di Bella, E., Fronza, I., Phaphoom, N., Sillitti, A., Succi, G., & Vlasenko, J (2013). Pair programming and software defects--A large, industrial case study. IEEE Transactions on Software Engineering.
- Tore Dybå (2002). Enabling software process improvement: an investigation of the importance of organizational issues. Empirical Software Engineering.
- Tore Dybå (2003). Factors of software process improvement success in small and large organizations: an empirical study in the scandinavian context. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- James D. Herbsleb and Audris Mockus (2003). Formulation and preliminary test of an empirical theory of coordination in software engineering. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- James Herbsleb (2016). Building a socio-technical theory of coordination: why and how. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Rashina Hoda, James Noble, and Stuart Marshall (2010). Organizing self-organizing teams. ACM/IEEE International Conference on Software Engineering.
- Rashina Hoda, James Noble (2017). Becoming agile: a grounded theory of agile transitions in practice. ACM/IEEE International Conference on Software Engineering.
- Eirini Kalliamvakou, Christian Bird, Thomas Zimmermann, Andrew Begel, Robert DeLine, Daniel M. German (2017). What makes a great manager of software engineers? IEEE Transactions on Software Engineering.
- Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
- Amy J. Ko (2017). A Three-Year Participant Observation of Software Startup Software Evolution. ACM/IEEE International Conference on Software Engineering, Software Engineering in Practice.
- Ekrem Kocaguneli, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan, and Tim Menzies (2013). Distributed development considered harmful? ACM/IEEE International Conference on Software Engineering.
- Allen E. Milewski (2007). Global and task effects in information-seeking among software engineers. Empirical Software Engineering.
- Matthias M. Müller and Frank Padberg (2003). On the economic evaluation of XP projects. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Jeff Norris (2016). Tech leadership with Jeff Norris. Software Engineering Daily Podcast.
- Tim Pettersen (2016). Git Workflows with Tim Pettersen. Software Engineering Daily Podcast.
- Narayan Ramasubbu, Marcelo Cataldo, Rajesh Krishna Balan, and James D. Herbsleb (2011). Configuring global software teams: a multi-company analysis of project productivity, quality, and profits. ACM/IEEE International Conference on Software Engineering.
- Julia Rubin and Martin Rinard (2016). The challenges of staying together while moving fast: an exploratory study. ACM/IEEE International Conference on Software Engineering.
- Helen Sharp, Hugh Robinson (2004). An ethnographic study of XP practice. Empirical Software Engineering.
- Darja Šmite, Claes Wohlin, Tony Gorschek, Robert Feldt (2010). Empirical evidence in global software engineering: a systematic review. Empirical Software Engineering.
- Klaas-Jan Stol and Brian Fitzgerald (2014). Two's company, three's a crowd: a case study of crowdsourcing software development. ACM/IEEE International Conference on Software Engineering.
- Sharifah Syed-Abdullah, Mike Holcombe & Marian Gheorge (2006). The impact of an agile methodology on the well being of development teams. Empirical Software Engineering.
- Christoph Treude and Margaret-Anne Storey (2010). Awareness 2.0: staying aware of projects, developers and tasks using dashboards and feeds. ACM/IEEE International Conference on Software Engineering.
- Patrick Wagstrom and Subhajit Datta (2014). Does latitude hurt while longitude kills? Geographical and temporal separation in a large scale software development project. ACM/IEEE International Conference on Software Engineering.
- Gerald M. Weinberg (1982). Over-structured management of software engineering. ACM/IEEE International Conference on Software Engineering.
Comprehension
Despite all of the activities that we’ve talked about so far (communicating, coordinating, planning, designing, architecting), most of a software engineer’s time is spent reading code 16 16 Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke (2014). On the comprehension of program comprehension. ACM Transactions on Software Engineering and Methodology (TOSEM).
Being good at program comprehension is a critical skill. You need to be able to read a function and know what it will do with its inputs; you need to be able to read a class and understand its state and functionality; you also need to be able to comprehend a whole implementation, understanding its architecture. Without these skills, you can’t test well, you can’t debug well, and you can’t fix or enhance the systems you’re building or maintaining. In fact, studies of software engineers’ first year at their first job show that a significant majority of their time is spent trying to simply comprehend the architecture of the system they are building or maintaining and understanding the processes that are being followed to modify and enhance them 6 6 Barthélémy Dagenais, Harold Ossher, Rachel K. E. Bellamy, Martin P. Robillard, and Jacqueline P. de Vries (2010). Moving into a new software project landscape. ACM/IEEE International Conference on Software Engineering.
What’s going on when developers comprehend code? Usually, developers are trying to answer questions about code that help them build larger models of how a program works. Because program comprehension is hard, they avoid it when they can, relying on explanations from other developers rather than trying to build precise models of how a program works on their own 19 19 Tobias Roehm, Rebecca Tiarks, Rainer Koschke, and Walid Maalej (2012). How do professional developers comprehend software?. ACM/IEEE International Conference on Software Engineering.
Thomas D. LaToza, Brad A. Myers (2010). Developers ask reachability questions. ACM/IEEE International Conference on Software Engineering.
Jonathan Sillito, Gail C. Murphy, and Kris De Volder (2006). Questions programmers ask during software evolution tasks. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Which type represents this domain concept or this UI element or action?
- Where in the code is the text in this error message or UI element?
- Where is there any code involved in the implementation of this behavior?
- Is there an entity named something like this in that unit (for example in a project, package or class)?
- What are the parts of this type?
- Which types is this type a part of?
- Where does this type fit in the type hierarchy?
- Does this type have any siblings in the type hierarchy?
- Where is this field declared in the type hierarchy?
- Who implements this interface or these abstract methods?
- Where is this method called or type referenced?
- When during the execution is this method called?
- Where are instances of this class created?
- Where is this variable or data structure being accessed?
- What data can we access from this object?
- What does the declaration or definition of this look like?
- What are the arguments to this function?
- What are the values of these arguments at runtime?
- What data is being modified in this code?
- How are instances of these types created and assembled?
- How are these types or objects related?
- How is this feature or concern (object ownership, UI control, etc) implemented?
- What in this structure distinguishes these cases?
- What is the “correct” way to use or access this data structure?
- How does this data structure look at runtime?
- How can data be passed to (or accessed at) this point in the code?
- How is control getting (from here to) here?
- Why isn’t control reaching this point in the code?
- Which execution path is being taken in this case?
- Under what circumstances is this method called or exception thrown?
- What parts of this data structure are accessed in this code?
- How does the system behavior vary over these types or cases?
- What are the differences between these files or types?
- What is the difference between these similar parts of the code (e.g., between sets of methods)?
- What is the mapping between these UI types and these model types?
- How can we know this object has been created and initialized correctly?
If you think about the diversity of questions in this list, you can see why program comprehension requires expertise. You not only need to understand programming languages quite well, but you also need to have strategies for answering all of the questions above (and more) quickly, effectively, and accurately.
So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question 13,23 13 Thomas D. LaToza, David Garlan, James D. Herbsleb, and Brad A. Myers (2007). Program comprehension as fact finding. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Anneliese von Mayrhauser A. Marie Vans (1994). Comprehension processes during large scale maintenance. ACM/IEEE International Conference on Software Engineering.
Dave Binkley, Marcia Davis, Dawn Lawrie, Jonathan I. Maletic, Christopher Morrell, Bonita Sharif (2013). The impact of identifier style on effort and comprehension. Empirical Software Engineering.
Mark Weiser (1981). Program slicing. ACM/IEEE International Conference on Software Engineering.
Scott D. Fleming, Christopher Scaffidi, David Piorkowski, Margaret M. Burnett, Rachel K. E. Bellamy (2013). An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks. ACM Transactions on Software Engineering and Methodology (TOSEM).
Lawrie, D., Morrell, C., Feild, H., & Binkley, D (2006). What's in a name? A study of identifiers. IEEE International Conference on Program Comprehension (ICPC).
Of course, program comprehension is not an inherently individual process either. Expert developers are resourceful, and frequently ask others for explanations of program behavior. Some of this might happen between coworkers, where someone seeking insight asks other engineers for summaries of program behavior, to accelerate their learning 11 11 Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.
Shrestha, N., Botta, C., Barik, T., & Parnin, C (2020). Here we go again: why is it difficult for developers to learn another programming language?. ACM/IEEE International Conference on Software Engineering.
While much of program comprehension is individual and social skill, some aspects of program comprehension are determined by the design of programming languages. For example, some programming languages result in programs that are more comprehensible. One framework called the Cognitive Dimensions of Notations 9 9 Thomas R. G. Green (1989). Cognitive dimensions of notations. People and computers.
Consider a low-consistency language with operators like ==, which behave differently depending on the types of the left and right operands. Knowing the behavior for Booleans doesn’t tell you the behavior for a Boolean being compared to an integer. In contrast, Java is a high consistency language: == is only valid when both operands have compatible types, so comparing a Boolean to an integer is rejected at compile time.
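JavaScript's loose equality operator is one well-known operator of this kind. The short sketch below shows how its result depends on the types of both operands; the particular comparisons are chosen only for illustration.

```javascript
// Loose equality (==) converts operands before comparing, so the result
// depends on the types of both sides.
const examples = [
  ["true == true", true == true],
  ["true == 1", true == 1],           // the boolean is converted to the number 1
  ["true == 2", true == 2],           // 1 is not equal to 2
  ['true == "1"', true == "1"],       // both sides are converted to the number 1
  ['true == "true"', true == "true"], // "true" converts to NaN, which equals nothing
];

for (const [expression, result] of examples) {
  console.log(expression, "->", result);
}

// Strict equality (===) never converts, so values of different types are never equal.
const strictResult = true === 1;
console.log("true === 1 ->", strictResult);
```

A reader who only knows how == behaves for two Booleans cannot predict the last three results without learning the conversion rules, which is the kind of guessing the consistency dimension describes.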
These differences in notation can have some impact. Encapsulation through data structures leads to better comprehension than monolithic or purely functional languages 3,25 3 Pamela Bhattacharya and Iulian Neamtiu (2011). Assessing programming language impact on development and maintenance: a study on C and C++. ACM/IEEE International Conference on Software Engineering.
Scott N. Woodfield, Hubert E. Dunsmore, and Vincent Y. Shen (1981). The effect of modularization and comments on program comprehension. ACM/IEEE International Conference on Software Engineering.
Guido Salvaneschi, Sven Amann, Sebastian Proksch, and Mira Mezini (2014). An empirical study on program comprehension with reactive programming. ACM SIGSOFT Foundations of Software Engineering (FSE).
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu (2014). A large scale study of programming languages and code quality in GitHub. ACM SIGSOFT Foundations of Software Engineering (FSE).
Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik (2014). How do API documentation and static typing affect API usability?. ACM/IEEE International Conference on Software Engineering.
Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, Andreas Stefik (2013). An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering.
Oscar Callaú, Romain Robbes, Éric Tanter, David Röthlisberger (2013). How (and why) developers use the dynamic features of programming languages: the case of Smalltalk. Empirical Software Engineering.
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu (2014). A large scale study of programming languages and code quality in GitHub. ACM SIGSOFT Foundations of Software Engineering (FSE).
Code editors, development environments, and program comprehension tools can also be helpful. Early evidence showed that simple features like syntax highlighting and careful typographic choices can improve the speed of program comprehension 1 1 Ron Baecker (1988). Enhancing program readability and comprehensibility with tools for program visualization. ACM/IEEE International Conference on Software Engineering.
Amy J. Ko and Brad A. Myers (2009). Finding causes of program output with the Java Whyline. ACM SIGCHI Conference on Human Factors in Computing (CHI).
The path from novice to expert in program comprehension is one that involves understanding programming language semantics exceedingly well and reading a lot of code, design patterns, and architectures. Anticipate that as you develop these skills, it will take you time to build robust understandings of what a program is doing, slowing down your writing, testing, and debugging.
References
- Ron Baecker (1988). Enhancing program readability and comprehensibility with tools for program visualization. ACM/IEEE International Conference on Software Engineering.
- Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.
- Pamela Bhattacharya and Iulian Neamtiu (2011). Assessing programming language impact on development and maintenance: a study on C and C++. ACM/IEEE International Conference on Software Engineering.
- Dave Binkley, Marcia Davis, Dawn Lawrie, Jonathan I. Maletic, Christopher Morrell, Bonita Sharif (2013). The impact of identifier style on effort and comprehension. Empirical Software Engineering.
- Oscar Callaú, Romain Robbes, Éric Tanter, David Röthlisberger (2013). How (and why) developers use the dynamic features of programming languages: the case of Smalltalk. Empirical Software Engineering.
- Barthélémy Dagenais, Harold Ossher, Rachel K. E. Bellamy, Martin P. Robillard, and Jacqueline P. de Vries (2010). Moving into a new software project landscape. ACM/IEEE International Conference on Software Engineering.
- Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik (2014). How do API documentation and static typing affect API usability? ACM/IEEE International Conference on Software Engineering.
- Scott D. Fleming, Christopher Scaffidi, David Piorkowski, Margaret M. Burnett, Rachel K. E. Bellamy (2013). An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks. ACM Transactions on Software Engineering and Methodology (TOSEM).
- Thomas R. G. Green (1989). Cognitive dimensions of notations. People and computers.
- Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, Andreas Stefik (2013). An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering.
- Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
- Amy J. Ko and Brad A. Myers (2009). Finding causes of program output with the Java Whyline. ACM SIGCHI Conference on Human Factors in Computing (CHI).
- Thomas D. LaToza, David Garlan, James D. Herbsleb, and Brad A. Myers (2007). Program comprehension as fact finding. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Thomas D. LaToza, Brad A. Myers (2010). Developers ask reachability questions. ACM/IEEE International Conference on Software Engineering.
- Lawrie, D., Morrell, C., Feild, H., & Binkley, D (2006). What's in a name? A study of identifiers. IEEE International Conference on Program Comprehension (ICPC).
- Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke (2014). On the comprehension of program comprehension. ACM Transactions on Software Engineering and Methodology (TOSEM).
- Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).
- Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu (2014). A large scale study of programming languages and code quality in GitHub. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Tobias Roehm, Rebecca Tiarks, Rainer Koschke, and Walid Maalej (2012). How do professional developers comprehend software? ACM/IEEE International Conference on Software Engineering.
- Guido Salvaneschi, Sven Amann, Sebastian Proksch, and Mira Mezini (2014). An empirical study on program comprehension with reactive programming. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Shrestha, N., Botta, C., Barik, T., & Parnin, C (2020). Here we go again: why is it difficult for developers to learn another programming language? ACM/IEEE International Conference on Software Engineering.
- Jonathan Sillito, Gail C. Murphy, and Kris De Volder (2006). Questions programmers ask during software evolution tasks. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Anneliese von Mayrhauser and A. Marie Vans (1994). Comprehension processes during large scale maintenance. ACM/IEEE International Conference on Software Engineering.
- Mark Weiser (1981). Program slicing. ACM/IEEE International Conference on Software Engineering.
- Scott N. Woodfield, Hubert E. Dunsmore, and Vincent Y. Shen (1981). The effect of modularization and comments on program comprehension. ACM/IEEE International Conference on Software Engineering.
Verification
How do you know a program does what you intended?
Part of this is being clear about what you intended (by writing specifications, for example), but your intents, however clear, are not enough: you need evidence that your intents were correctly expressed computationally. To get this evidence, we do verification.
There are many ways to verify code. A reasonable first instinct is to simply run your program. After all, what better way to check whether you expressed your intents than to see with your own eyes what your program does? This empirical approach is called testing. Some testing is manual, in that a human executes a program and verifies that it does what was intended. Some testing is automated, in that the test is run automatically by a computer. Another way to verify code is to analyze it, using logic to verify its correct operation. As with testing, some analysis is manual, since humans do it. We call this manual analysis inspection. Other analysis is automated, since computers do it. We call this program analysis. This leads to a nice complementary set of verification techniques along two axes: degree of automation and type of verification:
- Manual techniques include manual testing (which is empirical) and inspections (which are analytical)
- Automated techniques include automated testing (which is empirical) and program analysis (which is analytical)
To discuss each of these and their tradeoffs, first we have to cover some theory about verification. The first and simplest ideas are some terminology:
- A defect is some subset of a program’s code that exhibits behavior that violates a program’s specifications. For example, if a program was supposed to sort a list of numbers in increasing order and print it to a console, but a flipped inequality in the sorting algorithm made it sort them in decreasing order, the flipped inequality is the defect.
- A failure is the program behavior that results from a defect executing. In our sorting example, the failure is the incorrectly sorted list printed on the console.
- A bug vaguely refers to either the defect, the failure, or both. When we say “bug”, we’re not being very precise, but it is a popular shorthand for a defect and everything it causes.
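The sorting example from the definitions above can be sketched in a few lines. This is an illustration only; the function names are hypothetical, and the second function shows the same code with the defect repaired.

```javascript
// Specification: return the numbers sorted in increasing order.
function sortIncreasing(numbers) {
  // Defect: the comparison is flipped, so larger numbers come first.
  return [...numbers].sort((a, b) => b - a);
}

// The same function with the defect repaired.
function sortIncreasingFixed(numbers) {
  return [...numbers].sort((a, b) => a - b);
}

// Failure: the observable behavior that violates the specification.
console.log(sortIncreasing([3, 1, 2]));      // prints [ 3, 2, 1 ]
console.log(sortIncreasingFixed([3, 1, 2])); // prints [ 1, 2, 3 ]
```

The defect is the single expression b - a in the code; the failure is the wrongly ordered list that appears on the console when that expression executes.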
Note that because defects are defined relative to intent, whether a behavior is a failure depends entirely on the definition of intent. If that intent is vague, whether something is a defect is vague. Moreover, you can define intents that result in behaviors that seem like failures: for example, I can write a program that intentionally crashes. A crash isn’t a failure if it was intended! This might be pedantic, but you’d be surprised how many times I’ve seen professional developers in bug triage meetings say:
“Well, it’s worked this way for a long time, and people have built up a lot of workarounds for this bug. It’s also really hard to fix. Let’s just call this by design. Closing this bug as won’t fix.”
Testing
So how do you find defects in a program? Let’s start with testing. Testing is generally the easiest kind of verification to do, but as a practice, it has questionable efficacy. Empirical studies of testing find that it is related to fewer defects in the future, but not strongly related, and it’s entirely possible that it’s not the testing itself that results in fewer defects, but that other activities (such as more careful implementation) result in both fewer defects and more testing effort 1 1 Iftekhar Ahmed, Rahul Gopinath, Caius Brindescu, Alex Groce, and Carlos Jensen (2016). Can testedness be effectively measured?. ACM SIGSOFT Foundations of Software Engineering (FSE).
Moritz Beller, Georgios Gousios, Annibale Panichella, and Andy Zaidman (2015). When, how, and why developers (do not) test in their IDEs. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Raphael Pham, Stephan Kiesling, Olga Liskin, Leif Singer, and Kurt Schneider (2014). Enablers, inhibitors, and perceptions of testing in novice software teams. ACM SIGSOFT Foundations of Software Engineering (FSE).
Why is this? One possibility is that no amount of testing can prove a program correct with respect to its specifications. Why? It boils down to the same limitations that exist in science: with empiricism, we can provide evidence that a program does have defects, but we can’t provide complete evidence that a program doesn’t have defects. This is because even simple programs can execute in an infinite number of different ways.
Consider this JavaScript program:
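A minimal sketch of such a program, reconstructed from the analysis later in this chapter (a loop guarded by input > 0 that decrements its argument and then returns it). The function name countDown and the exact formatting are assumptions:

```javascript
// Intended behavior (the specification): always return 0.
function countDown(input) {
  // Keep decrementing while the value is above zero.
  while (input > 0) {
    input--;
  }
  return input;
}

console.log(countDown(3)); // prints 0
```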
The function should always return 0, right? How many possible values of input do we have to try manually to verify that it always does? Well, if input is an integer, then there are roughly 2 to the power 54 values to try, because JavaScript stores every number as a 64-bit floating point value, which represents integers reliably only up to a magnitude of about 2 to the power 53. That’s not infinite, but that’s a lot. But what if input is a string? There are an infinite number of possible strings because they can have any sequence of characters of any length. Now we have to manually test an infinite number of possible inputs. So if we restrict ourselves to testing, we will never know that the program is correct for all possible inputs. In this case, automated testing doesn’t even help, since there are an infinite number of tests to run.
There are some ideas in testing that can improve how well we can find defects. For example, rather than just testing the inputs you can think of, focus on all of the lines of code in your program. If you find a set of tests that can cause all of the lines of code to execute, you have one notion of test coverage. Of course, lines of code aren’t enough, because an individual line can contain multiple different paths in it (e.g., value ? getResult1() : getResult2()). So another notion of coverage is executing all of the possible control flow paths through the various conditionals in your program. Executing all of the possible paths is hard, of course, because every conditional in your program can double the number of possible paths. You have 200 if statements in your program? That’s up to 2 to the power 200 possible paths, roughly 10 to the power 60, which is far more than any test suite could ever execute.
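The gap between line coverage and path coverage can be seen in a small sketch. The function shippingCost and its prices are hypothetical, made up only to give the two conditionals something to do.

```javascript
// Hypothetical function with two independent conditionals.
function shippingCost(isMember, isExpress) {
  let cost = isMember ? 0 : 5; // one line, two branches
  if (isExpress) {
    cost += 10;
  }
  return cost;
}

// This single test executes every line, but only one of the four paths.
console.log(shippingCost(true, true));

// Covering every path needs every combination of the two conditions.
const allPaths = [
  [true, true],
  [true, false],
  [false, true],
  [false, false],
];

for (const [isMember, isExpress] of allPaths) {
  console.log(isMember, isExpress, "->", shippingCost(isMember, isExpress));
}

// With n independent conditionals there can be up to 2^n paths.
function maxPaths(conditionals) {
  return 2n ** BigInt(conditionals);
}

console.log(maxPaths(2).toString());
console.log(maxPaths(200).toString());
```

A suite containing only the first test would report full line coverage while never exercising the non-member price, which is why line coverage alone can be misleading.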
There are many types of testing that are common in software engineering:
- Unit tests verify that functions return the correct output. For example, a program that implemented a function for finding the day of the week for a given date might also include unit tests that verify for a large number of dates that the correct day of the week is returned. They’re good for ensuring widely used low-level functionality is correct.
- Integration tests verify that when all of the functionality of a program is put together into the final product, it behaves according to specifications. Integration tests often operate at the level of user interfaces, clicking buttons, entering text, submitting forms, and verifying that the expected feedback always occurs. Integration tests are good for ensuring that important tasks that users will perform are correct.
- Regression tests verify that behavior that previously worked doesn’t stop working. For example, imagine you find a defect that causes logins to fail; you might write a test that verifies that this cause of login failure does not occur, in case someone breaks the same functionality again, even for a different reason. Regression tests are good for ensuring that you don’t break things when you make changes to your application.
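As an illustration of the first and last kinds of test above, here is a minimal sketch for the day-of-the-week example. The function dayOfWeek, the check helper, and the idea that the leap-day case once failed are all assumptions made for the example; a real project would normally use a test framework instead of a hand-written helper.

```javascript
const DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Function under test: day of the week for a calendar date (month is 1-12).
function dayOfWeek(year, month, day) {
  // Date.UTC avoids results that depend on the machine's time zone.
  const date = new Date(Date.UTC(year, month - 1, day));
  return DAY_NAMES[date.getUTCDay()];
}

// A tiny test helper that throws when a check fails.
function check(description, actual, expected) {
  if (actual !== expected) {
    throw new Error(`${description}: expected ${expected}, got ${actual}`);
  }
  console.log(`ok - ${description}`);
}

// Unit tests: known dates with known answers.
check("1 January 1970", dayOfWeek(1970, 1, 1), "Thursday");
check("1 January 2000", dayOfWeek(2000, 1, 1), "Saturday");

// Regression test: pins down a case that (hypothetically) failed once,
// so the same failure is caught if it ever returns.
check("leap day 2024", dayOfWeek(2024, 2, 29), "Thursday");
```

An integration test would look different: it would drive the finished product, for example by entering a date into the user interface and checking what is displayed.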
Which tests you should write depends on what risks you want to take. Don’t care about failures? Don’t write any tests. If failures of a particular kind are highly consequential to your team, you should probably write tests that check for those failures. As we noted above, you can’t write enough tests to catch all bugs, so deciding which tests to write and maintain is a key challenge.
Analysis
Now, you might be thinking that it’s obvious that the program above is defective for some integers and strings. How did you know? You analyzed the program rather than executing it with specific inputs. For example, when I read (analyzed) the program, I thought:
“if we assume input is an integer, then there are only three types of values to meaningfully consider with respect to the > in the loop condition: positive, zero, and negative. Positive numbers will always decrement to 0 and return 0. Zero will return zero. And negative numbers just get returned as is, since they’re less than zero, which is wrong with respect to the specification. And in JavaScript, a string that doesn’t convert to a positive number is never greater than 0 (let’s not worry about whether it even makes sense to be able to compare strings and numbers), so the string is returned, which is wrong.”
The above is basically an informal proof. I used logic to divide the possible states of input and their effect on the program’s behavior. I used symbolic execution to verify all possible paths through the function, finding the paths that result in correct and incorrect values. The strategy was an inspection because we did it manually. If we had written a program that read the program to perform this proof automatically, we would have called it program analysis.
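The case analysis above can be written down as a small table with one representative input per case. This sketch assumes a reconstruction of the function under discussion (the name countDown is hypothetical), and running representatives is only a sanity check on the reasoning, not a proof in itself.

```javascript
// Assumed reconstruction of the function discussed above; the name is hypothetical.
function countDown(input) {
  while (input > 0) {
    input--;
  }
  return input;
}

// One representative per case identified by the analysis.
const cases = [
  { label: "positive integer", input: 5 },
  { label: "zero", input: 0 },
  { label: "negative integer", input: -5 },
  { label: "non-numeric string", input: "abc" },
];

// The specification: the function should always return 0.
const report = cases.map(({ label, input }) => {
  const output = countDown(input);
  return { label, output, meetsSpec: output === 0 };
});

for (const { label, output, meetsSpec } of report) {
  console.log(label, "->", output, meetsSpec ? "ok" : "violates specification");
}
```

The analysis is what justifies treating four inputs as stand-ins for infinitely many: every input in a case follows the same path through the loop condition as its representative.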
The benefit of analysis is that it can demonstrate that a program is correct in all cases. This is because it can handle infinite spaces of possible inputs by mapping those infinite inputs onto a finite space of possible executions. It’s not always possible to do this in practice, since many kinds of programs can execute in infinite ways, but it gets us closer to proving correctness.
One popular type of automated program analysis tool is the static analysis tool. These tools read programs and identify potential defects using formal reasoning like the informal proof above. They typically result in a set of warnings, each one requiring inspection by a developer to verify, since some of the warnings may be false positives (something the tool thought was a defect, but wasn’t). Although static analysis tools can find many kinds of defects, they aren’t yet viewed by developers to be that useful because the false positives are often large in number and the way they are presented makes them difficult to understand 4 4 Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge (2013). Why don't software developers use static analysis tools to find bugs?. ACM/IEEE International Conference on Software Engineering.
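To illustrate how a tool can flag code without running it, and how a false positive arises, here is a deliberately naive sketch. The function findLooseEquality and the sample program are hypothetical; real static analysis tools parse the program rather than matching text, so this shows the idea only.

```javascript
// A toy static check: flag uses of loose equality (==) without running the code.
function findLooseEquality(source) {
  const warnings = [];
  source.split("\n").forEach((line, index) => {
    // Match "==" that is not part of "===" or "!==".
    if (/(^|[^=!])==([^=]|$)/.test(line)) {
      warnings.push({ line: index + 1, text: line.trim() });
    }
  });
  return warnings;
}

const program = [
  'if (count == "0") reset();', // real use of ==: a true positive
  "if (count === 0) reset();",  // strict equality: not flagged
  'const label = "a == b";',    // == inside a string: a false positive
].join("\n");

const warnings = findLooseEquality(program);
console.log(warnings);
```

The third warning is the kind a developer has to inspect and dismiss by hand, which is the cost described above.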
Not all analytical techniques rely entirely on logic. In fact, one of the most popular methods of verification in industry is code review, also known as inspection. The basic idea of an inspection is to read the program analytically, following the control and data flow inside the code to look for defects. This can be done alone, in groups, and even included as part of the process of integrating changes, to verify them before they are committed to a branch. Modern code reviews, while informal, help find defects, stimulate knowledge transfer between developers, increase team awareness, and help identify alternative implementations that can improve quality 2 2 Alberto Bacchelli and Christian Bird (2013). Expectations, outcomes, and challenges of modern code review. ACM/IEEE International Conference on Software Engineering.
Peter C. Rigby and Christian Bird (2013). Convergent contemporary software peer review practices. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Oleksii Kononenko, Olga Baysal, and Michael W. Godfrey (2016). Code review quality: how developers see it. ACM/IEEE International Conference on Software Engineering.
Carolyn B. Seaman and Victor R. Basili (1997). An empirical study of communication in code inspections. ACM/IEEE International Conference on Software Engineering.
Peter C. Rigby and Margaret-Anne Storey (2011). Understanding broadcast based peer review on open source software projects. ACM/IEEE International Conference on Software Engineering.
Thongtanunam, P., McIntosh, S., Hassan, A. E., & Iida, H (2016). Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects. Empirical Software Engineering.
Beyond these more technical considerations around verifying a program’s correctness are organizational issues around different software qualities. For example, different organizations have different sensitivities to defects. If a $0.99 game on the app store has a defect, that might not hurt its sales much, unless that defect prevents a player from completing the game. If Boeing’s flight automation software has a defect, hundreds of people might die. The game developer might do a little manual play testing, release, and see if anyone reports a defect. Boeing will spend years proving mathematically with automatic program analysis that every line of code does what is intended, and repeating this verification every time a line of code changes. Moreover, requirements may change differently in different domains. For example, a game company might finally recognize the sexist stereotypes amplified in its game mechanics and have to change requirements, resulting in changed definitions of correctness, and the incorporation of new software qualities such as bias into testing plans. Similarly, Boeing might have to respond to pandemic fears by having to shift resources away from verifying flight crash safety to verifying public health safety. What type of verification is right for your team depends entirely on what a team is building, who’s using it, and how they’re depending on it.
References
- Iftekhar Ahmed, Rahul Gopinath, Caius Brindescu, Alex Groce, and Carlos Jensen (2016). Can testedness be effectively measured? ACM SIGSOFT Foundations of Software Engineering (FSE).
- Alberto Bacchelli and Christian Bird (2013). Expectations, outcomes, and challenges of modern code review. ACM/IEEE International Conference on Software Engineering.
- Moritz Beller, Georgios Gousios, Annibale Panichella, and Andy Zaidman (2015). When, how, and why developers (do not) test in their IDEs. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge (2013). Why don't software developers use static analysis tools to find bugs? ACM/IEEE International Conference on Software Engineering.
- Oleksii Kononenko, Olga Baysal, and Michael W. Godfrey (2016). Code review quality: how developers see it. ACM/IEEE International Conference on Software Engineering.
- Raphael Pham, Stephan Kiesling, Olga Liskin, Leif Singer, and Kurt Schneider (2014). Enablers, inhibitors, and perceptions of testing in novice software teams. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Peter C. Rigby and Margaret-Anne Storey (2011). Understanding broadcast based peer review on open source software projects. ACM/IEEE International Conference on Software Engineering.
- Peter C. Rigby and Christian Bird (2013). Convergent contemporary software peer review practices. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Carolyn B. Seaman and Victor R. Basili (1997). An empirical study of communication in code inspections. ACM/IEEE International Conference on Software Engineering.
- Thongtanunam, P., McIntosh, S., Hassan, A. E., & Iida, H. (2016). Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects. Empirical Software Engineering.
Monitoring
The first application I ever wrote was a complete and utter failure.
I was an eager eighth grader, full of wonder and excitement about the infinite possibilities in code, with an insatiable desire to build, build, build. I’d made plenty of little games and widgets for myself, but now was my chance to create something for someone else: my friend and I were making a game and he needed a tool to create pixel art for it. We had no money for fancy Adobe licenses, and so I decided to make a tool.
In designing the app, I made every imaginable software engineering mistake. I didn’t talk to him about requirements. I didn’t test on his computer before sending the finished app. I certainly didn’t conduct any usability tests, performance tests, or acceptance tests. The app I ended up shipping was a pure expression of what I wanted to build, not what he needed to be creative or productive. As a result, it was buggy, slow, confusing, and useless, and blinded by my joy of coding, I had no clue.
Now, ideally my “customer” would have reported any of these problems to me right away, and I would have learned some tough lessons about software engineering. But this customer was my best friend, and also a very nice guy. He wasn’t about to trash all of my hard work. Instead, he suffered in silence. He struggled to install, struggled to use, and worst of all struggled to create. He produced some amazing art a few weeks after I gave him the app, but it was only after a few months of progress on our game that I learned he hadn’t used my app for a single asset, preferring instead to suffer through Microsoft Paint. My app was too buggy, too slow, and too confusing to be useful. I was devastated.
Why didn’t I know it was such a complete failure? Because I wasn’t looking. I’d ignored the ultimate test suite: my customer. I learned that the only way to really know whether software requirements are right is by watching how the software executes in the world through monitoring (Turnbull, 2016).
Discovering Failures
Of course, this is easier said than done. That’s because the (ideally) massive number of people executing your software is not easily observable (Menzies and Zimmermann, 2013).
Crashes, kernel panics, and hangs are some of the easiest failures to detect because they are overt and unambiguous. Microsoft was one of the first organizations to monitor them comprehensively, building what eventually became known as Windows Error Reporting (Glerum et al., 2009).
Performance, like crashes, kernel panics, and hangs, is easily observable in software, but a bit trickier to characterize as good or bad. How slow is too slow? How bad is it if something is slow occasionally? You’ll have to define acceptable thresholds for different use cases to be able to identify problems automatically. Some experts in industry 8 8 Andi Grabner (2016). Performance monitoring with Andi Grabner. Software Engineering Daily Podcast.
It’s also hard to monitor performance without actually harming performance. Many tools and web services (e.g., New Relic) are getting better at reducing this overhead and offering real-time data about performance problems through sampling.
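As a rough illustration of the threshold and sampling ideas above, the sketch below times only a random fraction of calls and reports how many of the sampled calls exceeded a threshold. The class name `SampledTimer`, the sample rate, and the threshold are hypothetical choices made for this example; it is not a description of how New Relic or any other tool works.

```python
import random
import time
from typing import Callable, List, Optional


class SampledTimer:
    """Times only a fraction of calls to keep monitoring overhead low."""

    def __init__(self, sample_rate: float, threshold_seconds: float,
                 rng: Optional[random.Random] = None):
        self.sample_rate = sample_rate            # e.g., 0.01 times about 1% of calls
        self.threshold_seconds = threshold_seconds  # assumed "too slow" cutoff
        self.samples: List[float] = []
        self._rng = rng or random.Random()

    def call(self, fn: Callable, *args, **kwargs):
        # Unsampled calls skip timing entirely, so they pay almost no overhead.
        if self._rng.random() >= self.sample_rate:
            return fn(*args, **kwargs)
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            self.samples.append(time.perf_counter() - start)

    def slow_fraction(self) -> float:
        """Fraction of sampled calls that exceeded the threshold."""
        if not self.samples:
            return 0.0
        slow = sum(1 for s in self.samples if s > self.threshold_seconds)
        return slow / len(self.samples)
```

A team might alert when `slow_fraction()` rises above some agreed level for a given use case; what counts as acceptable is a requirements decision, not something the code can decide.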
Monitoring for data breaches, identity theft, and other security and privacy concerns is an incredibly important part of running a service, but also very challenging. This is partly because the tools for doing this monitoring are not yet well integrated, requiring each team to develop its own practices and monitoring infrastructure. But it’s also because protecting data and identity is more than just detecting and blocking malicious payloads. It’s also about recovering from ones that get through, developing reliable data streams about application network activity, monitoring for anomalies and trends in those streams, and developing practices for tracking and responding to warnings that your monitoring system might generate. Researchers are still actively inventing more scalable, usable, and deployable techniques for all of these activities.
The biggest limitation of the monitoring above is that it only reveals what people are doing with your software, not why they are doing it, or why it has failed. Monitoring can help you know that a problem exists, but it can’t tell you why a program failed or why a person failed to use your software successfully.
Discovering Missing Requirements
Usability problems and missing features, unlike some of the preceding problems, are even harder to detect or observe, because the only true indicator that something is hard to use is in a user’s mind. That said, there are a couple of approaches to detecting the possibility of usability problems.
One is by monitoring application usage. Assuming your users will tolerate being watched, there are many techniques: 1) automatically instrumenting applications for user interaction events, 2) mining events for problematic patterns, and 3) browsing and analyzing patterns for more subjective issues (Ivory and Hearst, 2001).
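One pattern that is cheap to mine from interaction events is how often users undo an action, which the Akers et al. entry in the references treats as an indicator of usability problems. The sketch below is a minimal, hypothetical version of that idea: the event format (a feature name paired with `"use"` or `"undo"`) and the function name `undo_rates` are assumptions made for illustration, not part of any real instrumentation library.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str]  # (feature name, action), where action is "use" or "undo"


def undo_rates(events: Iterable[Event]) -> Dict[str, float]:
    """Returns, for each used feature, undos divided by uses."""
    uses: Counter = Counter()
    undos: Counter = Counter()
    for feature, action in events:
        if action == "undo":
            undos[feature] += 1
        else:
            uses[feature] += 1
    # Features with a high ratio are candidates for a closer usability look.
    return {feature: undos[feature] / uses[feature] for feature in uses}


log = [("crop", "use"), ("crop", "undo"), ("crop", "use"), ("fill", "use")]
rates = undo_rates(log)
```

A high undo rate only flags where to look; as the text notes, it cannot say why users backed out.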
All of the usage data above can tell you what your users are doing, but not why. For this, you’ll need to get explicit feedback from support tickets, support forums, product reviews, and other critiques of user experience. Some of these types of reports go directly to engineering teams, becoming part of bug reporting systems, while others end up in customer service or marketing departments. While all of this data is valuable for monitoring user experience, most companies still do a bad job of using anything but bug reports to improve user experience, overlooking the rich insights in customer service interactions (Chilana et al., 2011).
Although bug reports are widely used, they have significant problems as a way to monitor: for developers to fix a problem, they need detailed steps to reproduce the problem, or stack traces or other state to help them track down the cause of a problem (Bettenburg et al., 2008).
Larger software organizations now employ data scientists to help mitigate these challenges of analyzing and maintaining monitoring data and bug reports. Most of them try to answer questions such as the following (Begel and Zimmermann, 2014):
- “How do users typically use my application?”
- “What parts of a software product are most used and/or loved by customers?”
- “What are best key performance indicators (KPIs) for monitoring services?”
- “What are the common patterns of execution in my application?”
- “How well does test coverage correspond to actual code usage by our customers?”
The most mature software engineering teams even have multiple distinct data science roles, including Insight Providers, who gather and analyze data to inform decisions, Modeling Specialists, who use their machine learning expertise to build predictive models, and Platform Builders, who create the infrastructure necessary for gathering data (Kim et al., 2016).
User feedback that is captured and maintained with all of this effort can still be messy to analyze because it usually comes in the form of natural language text. Services like AnswerDash (a company I co-founded) structure this data by organizing requests around frequently asked questions. AnswerDash adds a little widget to every page in a web application, making it easy for users to submit questions and find answers to previously asked questions. This generates data about the features and use cases that are leading to the most confusion, which types of users are having this confusion, and where in an application the confusion is happening most frequently. This product was based on several years of research in my lab (Chilana et al., 2013).
References
- David Akers, Matthew Simpson, Robin Jeffries, and Terry Winograd (2009). Undo and erase events as indicators of usability problems. ACM SIGCHI Conference on Human Factors in Computing (CHI).
- Jorge Aranda and Gina Venolia (2009). The secret life of bugs: Going past the errors and omissions in software repositories. ACM/IEEE International Conference on Software Engineering.
- Andy Begel and Thomas Zimmermann (2014). Analyze this! 145 questions for data scientists in software engineering. ACM/IEEE International Conference on Software Engineering.
- Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann (2008). What makes a good bug report? ACM SIGSOFT Foundations of Software Engineering (FSE).
- Parmit K. Chilana, Amy J. Ko, Jacob O. Wobbrock, Tovi Grossman, and George Fitzmaurice (2011). Post-deployment usability: a survey of current practices. ACM SIGCHI Conference on Human Factors in Computing (CHI).
- Parmit K. Chilana, Amy J. Ko, Jacob O. Wobbrock, and Tovi Grossman (2013). A multi-site field study of crowdsourced contextual help: usage and perspectives of end users and software teams. ACM SIGCHI Conference on Human Factors in Computing (CHI).
- Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt (2009). Debugging in the (very) large: ten years of implementation and experience. ACM SIGOPS Symposium on Operating Systems Principles (SOSP).
- Andi Grabner (2016). Performance monitoring with Andi Grabner. Software Engineering Daily Podcast.
- Melody Y. Ivory and Marti A. Hearst (2001). The state of the art in automating usability evaluation of user interfaces. ACM Computing Surveys.
- Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel (2016). The emerging role of data scientists on software development teams. ACM/IEEE International Conference on Software Engineering.
- Tim Menzies and Tom Zimmermann (2013). Software analytics: so what? IEEE Software.
- Haseeb Qureshi (2016). Debugging stories with Haseeb Qureshi. Software Engineering Daily Podcast.
- James Turnbull (2016). The art of monitoring with James Turnbull. Software Engineering Daily Podcast.
Evolution
Programs change. You find bugs, you fix them. You discover a new requirement, you add a feature. A requirement changes because users demand it, you revise a feature. The simple fact about programs is that they’re rarely stable, but rather constantly changing, living artifacts that shift as much as our social worlds shift.
Nowhere is this constant evolution more apparent than in our daily encounters with software updates. The apps on our phones are constantly being updated to improve our experiences, while the web sites we visit potentially change every time we visit them, without us noticing. These different models have different notions of who controls changes to user experience: should software companies control when your experience changes or should you? And for systems with significant backend dependencies, is it even possible to give users control over when things change?
To manage change, developers use many kinds of tools and practices.
One of the most common ways of managing change is to refactor code. Refactoring helps developers modify the architecture of a program while keeping its behavior the same, enabling them to implement or modify functionality more easily. For example, one of the most common and simple refactorings is to rename a variable (renaming its definition and all of its uses). This doesn’t change the architecture of a program at all, but does improve its readability. Other refactors can be more complex. For example, consider adding a new parameter to a function: all calls to that function need to pass that new parameter, which means you need to go through each call and decide on a value to send from that call site. Studies of refactoring in practice have found that refactorings can be big and small, that they don’t always preserve the behavior of a program, and that developers perceive them as involving substantial costs and risks (Kim et al., 2012).
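To make the add-a-parameter refactoring concrete, here is a small hypothetical sketch. The function names and the flat shipping fee are invented for illustration. The point is that the refactored function behaves identically only if every call site is updated to pass the value that used to be hard-coded.

```python
# Before: the shipping fee is hard-coded inside the function.
def order_total_before(prices):
    return sum(prices) + 5


# After: the fee is a parameter, so every call site must now choose a value.
def order_total_after(prices, shipping_fee):
    return sum(prices) + shipping_fee


# Call site before the refactoring.
old_result = order_total_before([10, 20])

# The same call site after the refactoring. Passing 5 preserves behavior;
# passing anything else would silently change it.
new_result = order_total_after([10, 20], shipping_fee=5)
```

Missing one call site, or choosing a different value at one of them, is one way a refactoring can fail to preserve behavior.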
Another fundamental way that developers manage change is version control systems. As you know, they help developers track changes to code, allowing them to revert, merge, fork, and clone projects in a way that is traceable and reliable. Version control systems also help developers identify merge conflicts, so that they don’t accidentally overwrite each other’s work (Nelson et al., 2019).
Research comparing centralized and distributed revision control systems mostly reveals tradeoffs rather than a clear winner. Distributed version control, for example, appears to lead to commits that are smaller and more scoped to single changes, since developers can manage their own history of commits to their local repository (Brindescu et al., 2014).
When code changes, you need to test it, which often means you need to build it, compiling source, data, and other resources into an executable format suitable for testing (and possibly release). Build systems can be as simple as nothing (e.g., loading an HTML file in a web browser interprets the HTML and displays it, requiring no special preparation) and as complex as hundreds or thousands of lines of build script code that compile, link, and manage files in a manner that prepares a system for testing, such as those used to build operating systems like Windows or Linux. To write these complex build procedures, developers use build automation tools like make, ant, gulp, and dozens of others, each helping to automate builds. In large companies, there are whole teams that maintain build automation scripts to ensure that developers can always quickly build and test. In these teams, most of the challenges are social and not technical: teams need to resolve role ambiguity and manage knowledge sharing, communication, trust, and conflict in order to be productive, just like other software engineering teams (Phillips et al., 2014).
Perhaps the most modern form of build practice is continuous integration (CI). This is the idea of completely automating not only builds, but also the running of a collection of tests, every time a bundle of changes is pushed to a central version control repository. The claimed benefit of CI is that every major change is quickly built, tested, and ready for deployment, shortening the time between a change and the discovery of failures. Research shows this is true: CI helps projects release more often and is widely adopted in open source (Hilton et al., 2016).
For example, some large projects like Windows can take a whole day to build, making continuous integration of the whole operating system infeasible. When builds and tests are fast, continuous integration can accelerate development, especially in projects with large numbers of contributors (Vasilescu et al., 2015).
One last problem with changes in software is managing the releases of software. Good release management should archive new versions of software, automatically post the version online, make the version accessible to users, keep a history of who accesses the new version, and provide clear release notes describing changes from the previous version (van der Hoek et al., 1997).
With so many ways that software can change, and so many tools for managing that change, it also becomes important to manage the risk of change. One approach to managing this risk is impact analysis (Arnold and Bohner, 1996).
Impact analysis, and software evolution in general, is therefore ultimately a process of managing change: change in requirements, change in code, change in data, and change in how software is situated in the world. And like any change management, it must be done cautiously, both to avoid breaking critical functionality and to ensure that whatever new changes are being brought to the world achieve their goals.
References
- Robert Arnold and Shawn Bohner (1996). Software change impact analysis. IEEE Computer Society Press.
- Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig (2014). How do centralized and distributed version control systems impact software changes? ACM/IEEE International Conference on Software Engineering.
- Lianping Chen (2015). Continuous delivery: Huge benefits, but challenges too. IEEE Software.
- Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, and Danny Dig (2016). Usage, costs, and benefits of continuous integration in open-source projects. IEEE/ACM International Conference on Automated Software Engineering.
- Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Nicholas Nelson, Caius Brindescu, Shane McKee, Anita Sarma, and Danny Dig (2019). The life-cycle of merge conflicts: processes, barriers, and strategies. Empirical Software Engineering.
- Shaun Phillips, Thomas Zimmermann, and Christian Bird (2014). Understanding and improving software build teams. ACM/IEEE International Conference on Software Engineering.
- Rachel Potvin and Josh Levenberg (2016). Why Google stores billions of lines of code in a single repository. Communications of the ACM.
- Gregg G. Rothermel and Mary Jean Harrold (1996). Analyzing regression test selection techniques. IEEE Transactions on Software Engineering.
- Per Runeson (2006). A survey of unit testing practices. IEEE Software.
- André van der Hoek, Richard S. Hall, Dennis Heimbigner, and Alexander L. Wolf (1997). Software release management. ACM SIGSOFT Foundations of Software Engineering (FSE).
- Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov (2015). Quality and productivity outcomes relating to continuous integration in GitHub. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Debugging
Despite all of your hard work at design, implementation, and verification, your software has failed. Somewhere in its implementation there’s a line of code, or multiple lines of code, that, given a particular set of inputs, causes the program to fail. Debugging is the activity of finding those causes and identifying changes to code that will prevent those failures. Of course, because defects are inevitable in code that human developers write, debugging is no niche process in software engineering: it is a central, critical, and challenging activity that is part of nearly all aspects of creating software.
What is debugging?
Before we can talk about debugging, it’s first important to consider what counts as a “bug”. This term is thought to have emerged in the late 19th century to describe any problem with a machine. Grace Hopper’s team later found it amusing when they discovered an actual moth in an electromechanical relay in their early computer. And yet, the term is actually quite vague. Is the “bug” the incorrect code? Is it the faulty behavior that occurs at runtime when that incorrect code executes? Is it the problem that occurs in the world when the software misbehaves, such as a program crashing, or an operating system hanging? “Bug” actually refers to all of these things, which makes it a colloquial, but imprecise term.
To clarify things, consider four definitions (Ko and Myers, 2005).
To begin, let’s consider program behavior, which we will define as any program output, at either a point in time, or over time, that is perceived or processed by a person or other software. Behavior, in this sense, is what we see programs do: they crash, hang, retrieve incorrect information, show error codes, compute something incorrectly, exhibit incomprehensible behavior, and so on. Program behavior is what requirements attempt to constrain (e.g., “the program should always finish in less than one second” is a statement about the program’s behavior over time).
Given this definition of behavior, we can then define a defect as some set of program fragments that may cause program behavior that is inconsistent with a program’s requirements. Note that this definition actually has some non-obvious implications. First, defects do not necessarily cause problems; many defects may actually never be executed, or never executed with inputs that cause a program to misbehave. Second, defects can only be defined as such to the extent that requirements are clear. If you haven’t written those requirements down in an unambiguous way, there will be debate about whether something is a defect. Take, for example, a web application that has SQL injection security vulnerabilities, but for the purpose of learning how to identify such vulnerabilities. Those aren’t defects because they are there intentionally.
A fault is a program state caused by a defect that may result in a program behavior inconsistent with a program’s requirements. For example, imagine a program that is supposed to count from 1 to 10 using a variable to track and increment the current number, but with a defect that causes it to start at 0. The fault is the value of that variable when it is set to 0. When it is set to 1 through 10, there’s nothing faulty about program behavior. Faults, like defects, do not necessarily cause problems. For example, imagine that the same program prints out the current value, but has another defect that unintentionally skips printing the first value. There would be two defects, a fault on the first number, but no undesirable program behavior, because it would still print 1 to 10.
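The counting example above can be sketched in code. This is only an illustration: the two defects are modeled as parameters (`start` and `skip_first`, both hypothetical names) so that the intended program, the program with one defect, and the program with both defects can be compared side by side.

```python
def count(start: int, skip_first: bool) -> list:
    """Collects the numbers the program would print while counting up to 10."""
    output = []
    current = start  # starting at 0 instead of 1 puts the program in a faulty state
    first = True
    while current <= 10:
        if first and skip_first:
            pass  # second defect: the first value is silently not printed
        else:
            output.append(current)
        first = False
        current += 1
    return output


intended = count(start=1, skip_first=False)     # no defects
one_defect = count(start=0, skip_first=False)   # fault becomes a failure: 0 is printed
two_defects = count(start=0, skip_first=True)   # fault occurs, but no failure is visible
```

Here `two_defects` equals `intended` even though the variable still takes the faulty value 0 during execution, which is exactly the sense in which faults do not necessarily cause failures.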
Finally, a failure is a program behavior that is inconsistent with a program’s requirements. Failures are what we report in bug reports, what we often mean when we say “bug”, and ultimately what matters in the world, as program behavior is what programs do to the world. To use our terminology then, we would say that “defects may cause faults, faults may cause failures, and failures may cause consequences in the world.”
What then, is debugging, using this terminology? Debugging is any activity that, given a report of a failure, seeks to identify the one or more defects that caused one or more faults, which caused the failure, and then makes changes to a program to eliminate the associated defects. How to do this, of course, is the hard part. Debugging is therefore inherently a process of searching: for faults that cause failures, and for defects that cause faults. What’s being searched when debugging is the thousands, millions, or perhaps even billions of instructions that are executed when a program runs (causing faults), and the thousands, or even millions, of lines of code that might have caused those faults.
Finding defects
Research and practice broadly agree: finding defects quickly and successfully requires systematic, and sometimes scientific, investigations of causality (Zeller, 2009).
Beller, M., Spruit, N., Spinellis, D., & Zaidman, A (2018). On the dichotomy of debugging behavior among programmers. ACM/IEEE International Conference on Software Engineering.
The first phase is reproducing the failure, so that the program may be inspected for faults, which can be traced back to defects. Failure reproduction is a matter of identifying inputs to the program (whether data it receives upon being executed, user inputs, network traffic, or any other form of input) that cause the failure to occur. If you found this failure while you were executing the program, then you’re lucky: you should be able to repeat whatever you just did and identify the inputs or series of inputs that caused the problem, giving you a way of testing that the program no longer fails once you’ve fixed the defect. If someone else was the one executing the program (for example, a user, or someone on your team), you had better hope that they reported clear steps for reproducing the problem. When bug reports lack clear reproduction steps, bugs often can’t be fixed (Bettenburg et al., 2008).
Once you can reproduce a failure, the next phase is to minimize the failure-inducing input (Zeller and Hildebrandt, 2002). Imagine, for example, a program that, given the string "abcdefg", is supposed to print all of the vowels in the string in the sequence they appear ("ae"), but instead produces just "a". The intuition behind minimizing failure-inducing inputs is that doing so reduces the complexity of the search space in debugging. In our example, we might find that entering the string "abcde" causes the same failed output of "a", or even shorter, that just the string "ae" causes the failure. To minimize a failure-inducing input, one could just randomly remove parts of the input to find the smallest input that still causes the problem. It is more effective to search the input space systematically, perhaps by using knowledge of the program behavior (e.g., vowels are what matter, so get rid of the consonants, as we did above), or, even more systematically, by doing something like a binary search of the input (e.g., trying successively smaller halves of the string until finding the smallest string that still causes the problem). Note that minimizing failure-inducing input applies to any kind of input, not just data: you can also minimize a program, excluding lines you believe are irrelevant to the failure, finding the smallest possible program that still causes the failure.
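The systematic search described above can be sketched as follows. This is a simplified chunk-removal loop, not the published delta debugging algorithm, and the buggy vowel printer (`buggy_vowels`) is a hypothetical stand-in for the program in the example: it stops after the first vowel. The helper `expected_vowels` plays the role of the requirement, so `fails` is true whenever the actual and expected outputs differ.

```python
def buggy_vowels(text: str) -> str:
    # Hypothetical defect: returns at the first vowel instead of collecting all of them.
    for ch in text:
        if ch in "aeiou":
            return ch
    return ""


def expected_vowels(text: str) -> str:
    return "".join(ch for ch in text if ch in "aeiou")


def fails(text: str) -> bool:
    return buggy_vowels(text) != expected_vowels(text)


def minimize(text: str, still_fails) -> str:
    """Removes progressively smaller chunks while the failure still occurs."""
    if not still_fails(text):
        raise ValueError("the starting input must reproduce the failure")
    chunk = len(text) // 2
    while chunk >= 1:
        i = 0
        while i < len(text):
            candidate = text[:i] + text[i + chunk:]
            if still_fails(candidate):
                text = candidate  # the chunk was irrelevant; keep the shorter input
            else:
                i += chunk        # the chunk is needed to trigger the failure
        chunk //= 2
    return text


smallest = minimize("abcdefg", fails)  # "ae"
```

Each call to `still_fails` reruns the program, so in practice the cost of minimization depends on how quickly the failure can be reproduced.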
Once you have your minimized program and input, the next phase is to localize the defect, trying to identify the cause of the failure in code. There are many different strategies for localizing defects. One of the simplest strategies is to work forward:
- Set a breakpoint at the beginning of the program.
- Reproduce the failure. (If the program doesn’t pause, then either the line with the breakpoint doesn’t execute, the debugger is broken, or you didn’t reproduce the failure).
- Step forward one instruction at a time until the program deviates from intended behavior, monitoring program state and control flow after each step.
- The step that deviates, or one of the previous steps, caused the failure.
This process, while straightforward, is the slowest, requiring a long, vigilant search. A more efficient scientific strategy can leverage your knowledge of the program by guiding the search with hypotheses you generate (Gilmore, 1991):
- Observe the failure
- Form a hypothesis about what caused the failure
- Identify ways of observing program behavior to test your hypothesis.
- Analyze the data from your observations.
- If you’ve identified the defect, move on to the repair phase; if not, return to step 2.
The problems with the strategy above are numerous. First, what if you can’t generate a hypothesis? What if you can, but testing the hypothesis is slow or impossible? You could spend hours generating hypotheses that are completely off-base, effectively analyzing all of your code and its executions before finding the defect.
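One way to picture the hypothesis-driven loop is as a list of candidate explanations, each of which predicts what the program will output for a few chosen inputs. The sketch below is only illustrative: `extract_vowels` is a hypothetical buggy program, and the two hypotheses and their predictions are invented for the example. A hypothesis is kept only if every prediction matches what the program actually does.

```python
VOWELS = "aeiou"

def extract_vowels(text):
    """Hypothetical buggy program: meant to return every vowel in text, in order."""
    for ch in text:
        if ch in VOWELS:
            return ch  # defect: returns after the first vowel
    return ""

# Each hypothesis pairs a description with predictions: input -> predicted output.
hypotheses = [
    ("only a consonant between vowels triggers the failure",
     {"ae": "ae", "abe": "a"}),
    ("the loop stops after the first vowel",
     {"ae": "a", "ea": "e", "xio": "i"}),
]

def is_supported(predictions):
    """Observe the program on each input and compare against the prediction."""
    return all(extract_vowels(text) == predicted
               for text, predicted in predictions.items())

def first_supported(candidates):
    """Return the first hypothesis whose predictions all hold, else None."""
    for description, predictions in candidates:
        if is_supported(predictions):
            return description
    return None  # no hypothesis survived; generate new ones

print(first_supported(hypotheses))
```

The first hypothesis is refuted as soon as "ae" is observed to produce "a", which mirrors the return to step 2 in the list above. A supported hypothesis is still only a lead; confirming the defect means reading the code it points to.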
Another strategy is working backwards (Ko and Myers, 2008):
- Observe the failure
- Identify the line of code that caused the failing output
- Identify the lines of code that caused the line in step 2 to execute, and the lines that produced any data used on the line in step 2
- Repeat step 3 recursively, analyzing all lines of code for defects along the upstream chain of causality until finding the defect.
This strategy guarantees that you will find the defect if you systematically check all of the upstream causes of the failure. It still requires you to analyze each line of code, and potentially to run the program up to that line in order to inspect what might be wrong, but it potentially requires less work than guessing. As we discussed in the Comprehension chapter, tools can automate some of this process (Ko and Myers, 2008).
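The backward search can be sketched as a traversal of a dependency graph, starting at the line that produced the failing output. In the sketch below, the six-line program and its `DEPENDS_ON` table are hypothetical and written by hand; a real tool would derive these dependencies from the code or from an execution trace.

```python
from collections import deque

# Hypothetical program, keyed by line number.
PROGRAM = {
    1: "values = read_input()",
    2: "total = 0",
    3: "for v in values: total = v",  # defect: should be total += v
    4: "count = len(values)",
    5: "mean = total / count",
    6: "print(mean)",
}

# Hand-written assumption: which earlier lines each line's data or control depends on.
DEPENDS_ON = {6: [5], 5: [3, 4], 4: [1], 3: [1, 2], 2: [], 1: []}

def upstream_lines(failing_line, depends_on):
    """List lines to inspect: the failing line first, then its causes, breadth-first."""
    order = []
    seen = {failing_line}
    queue = deque([failing_line])
    while queue:
        line = queue.popleft()
        order.append(line)
        for cause in depends_on.get(line, []):
            if cause not in seen:  # visit each upstream line once
                seen.add(cause)
                queue.append(cause)
    return order

for line in upstream_lines(6, DEPENDS_ON):
    print(line, PROGRAM[line])
```

Starting from the failing output on line 6, the traversal visits lines 6, 5, 3, 4, 1, 2, so the defective line 3 is reached on the third inspection and lines unrelated to the output would never be visited at all.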
Yet another strategy, called delta debugging, is to compare successful and failing executions of the program (Zeller, 2002):
- Identify a successful set of inputs and minimize them
- Identify a failing set of inputs and minimize them
- Compare the program states of the successful and failing executions as they run
- Identify a change to input that minimizes the differences in states between the two executions
- The variables and values that differ between these two executions point to the defect
This is a powerful strategy, but only when you have successful inputs and when you can automate comparing runs and identifying changes to inputs.
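The comparison step can be sketched as follows. This is a simplified illustration, not the algorithm from the cited work: `count_vowels` is a hypothetical buggy program, instrumented by hand to record its state after each loop iteration, and the two inputs were chosen (as an assumption) to differ in a single character so that the state differences stay small.

```python
def count_vowels(text, snapshots):
    """Hypothetical buggy program, instrumented to record state after each iteration."""
    count = 0
    seen = set()
    for ch in text:
        if ch in "aeiou" and ch not in seen:  # defect: a repeated vowel is not counted
            count += 1
        seen.add(ch)
        snapshots.append({"ch": ch, "count": count, "seen": sorted(seen)})
    return count

def first_state_difference(passing, failing):
    """Return (step, {variable: (passing value, failing value)}) at the first divergence."""
    for step, (good, bad) in enumerate(zip(passing, failing)):
        differing = {name: (good[name], bad[name])
                     for name in good if good[name] != bad[name]}
        if differing:
            return step, differing
    return None

passing_states, failing_states = [], []
count_vowels("abe", passing_states)  # succeeds: returns 2, as intended
count_vowels("aba", failing_states)  # fails: returns 1, but 2 was intended

step, differing = first_state_difference(passing_states, failing_states)
print(step, sorted(differing))
```

The two runs agree until the third iteration, where `ch`, `count`, and `seen` diverge; of these, `seen` is the variable the defective condition reads. Narrowing the differing variables down further is the part that benefits most from automation.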
For particularly complex software, it can sometimes be necessary to debug with the help of teammates, who can help generate hypotheses, identify more effective search strategies, or rule out the influence of particular components in a bug (Aranda and Venolia, 2009; LaToza et al., 2020).
Ultimately, all of these strategies are essentially search algorithms, seeking the events that occurred while a program executed with a particular set of inputs that caused its output to be incorrect. Because programs execute millions and potentially billions of instructions, these strategies are necessary to reduce the scope of your search. This is where debugging tools come in: if you can find a tool that supports an effective strategy, then your work to search through those millions and billions of instructions will be greatly accelerated. This might be a print statement, a breakpoint debugger, a performance profiler, or one of the many advanced debugging tools beginning to emerge from research.
Fixing defects
Once you’ve found the defect, what do you do? It turns out that there are usually many ways to repair a defect. How developers fix defects depends a lot on the circumstances: if they’re near a release, they may not even fix it if it’s too risky; if there’s no pressure, and the fix requires major changes, they may refactor or even redesign the program to prevent the failure (Murphy-Hill et al., 2013; Yin et al., 2011).
Because debugging can be so challenging, and because it is so pervasive and inescapable in programming, it is often a major source of frustration and unpredictability in software engineering. However, finding a defect after a long search can also be a great triumph (Eisenstadt, 1997).
References
- Jorge Aranda and Gina Venolia (2009). The secret life of bugs: Going past the errors and omissions in software repositories. ACM/IEEE International Conference on Software Engineering.
- Moritz Beller, Niels Spruit, Diomidis Spinellis, and Andy Zaidman (2018). On the dichotomy of debugging behavior among programmers. ACM/IEEE International Conference on Software Engineering.
- Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann (2008). What makes a good bug report? ACM SIGSOFT Foundations of Software Engineering (FSE).
- Marc Eisenstadt (1997). My hairiest bug war stories. Communications of the ACM.
- David Gilmore (1991). Models of debugging. Acta Psychologica.
- Amy J. Ko and Brad A. Myers (2005). A framework and methodology for studying the causes of software errors in programming systems. Journal of Visual Languages & Computing.
- Amy J. Ko and Brad A. Myers (2008). Debugging reinvented: asking and answering why and why not questions about program behavior. ACM/IEEE International Conference on Software Engineering.
- Thomas D. LaToza, Maryam Arab, Dastyni Loksa, and Amy J. Ko (2020). Explicit programming strategies. Empirical Software Engineering.
- Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird, and Nachiappan Nagappan (2013). The design of bug fixes. ACM/IEEE International Conference on Software Engineering.
- Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram (2011). How do fixes become bugs? ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
- Andreas Zeller (2002). Isolating cause-effect chains from computer programs. FSE.
- Andreas Zeller and Ralf Hildebrandt (2002). Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering.
- Andreas Zeller (2009). Why programs fail: a guide to systematic debugging. Elsevier.