Margaret Hamilton working on the Apollo flight software

Chapter 1

History

Computers haven’t been around for long. If you read one of the many histories of computing and information, such as James Gleick’s The Information ⁴ ⁴

James Gleick (2011). The Information: A History, A Theory, A Flood. Pantheon Books.

, Jonathan Grudin’s From Tool to Partner: The Evolution of Human-Computer Interaction ⁵ ⁵

Grudin, Jonathan (2017). From Tool to Partner: The Evolution of Human-Computer Interaction. Source.

, or Margo Shetterly’s Hidden Figures ¹¹ ¹¹

Margot Lee Shetterly (2017). Hidden figures: the American dream and the untold story of the Black women mathematicians who helped win the space race. HarperCollins Nordic.

, you’ll learn that before digital computers, computers were people, calculating things manually. And that after digital computers, programming wasn’t something that many people did. It was reserved for whoever had access to the mainframe and they wrote their programs on punchcards. Computing was in no way a ubiquitous, democratized activity—it was reserved for the few that could afford and maintain a room-sized machine.

Because programming required such painstaking planning in machine code and computers were slow, most programs were not that complex. Their value was in calculating things faster than a person could do by hand, which meant thousands of calculations in a minute rather than one calculation in a minute. Computer programmers were not solving problems that had no solutions yet; they were translating existing solutions (for example, a quadratic formula) into machine instructions. Their power wasn’t in creating new realities or facilitating new tasks, it was accelerating old tasks.

The birth of software engineering, therefore, did not come until programmers started solving problems that didn’t have existing solutions, or were new ideas entirely. Most of these were done in academic contexts to develop things like basic operating systems and methods of input and output. These were complex projects, but as research, they didn’t need to scale; they just needed to work. It wasn’t until the late 1960s when the first truly large software projects were attempted commercially, and software had to actually perform.

The IBM 360 operating system was one of the first big projects of this kind. Suddenly, there were multiple people working on multiple components, all which interacted with one another. Each part of the program needed to coordinate with the others, which usually meant that each part’s authors needed to coordinate, and the term software engineering was born. Programmers and academics from around the world, especially those who were working on big projects, created conferences so they could meet and discuss their challenges. In the first software engineering conference in 1968, attendees speculated about why projects were shipping late, why they were over budget, and what they could do about it. There was a word for the phrase, and many questions, but few answers.

At the time, one of the key people pursuing these answers was Margaret Hamilton , a computer scientist who was Director of the Software Engineering Division of the MIT Instrumentation Laboratory. One of the lab’s key projects in the late 1960’s was developing the on-board flight software for the Apollo space program. Hamilton led the development of error detection and recovery, the information displays, the lunar lander, and many other critical components, while managing a team of other computer scientists who helped. It was as part of this project that many of the central problems in software engineering began to emerge, including verification of code, coordination of teams, and managing versions. This led to one of her passions, which was giving software legitimacy as a form of engineering—at the time, it was viewed as routine, uninteresting, and simple work. Her leadership in the field established the field as a core part of systems engineering.

The first conference, the IBM 360 project, and Hamilton’s experiences on the Apollo mission identified many problems that had no clear solutions:

When you’re solving a problem that doesn’t yet have a solution, what is a good process for building a solution?
When software does so many different things, how can you know software “works”?
How can you make progress when no one on the team understands every part of the program?
When people leave a project, how do you ensure their replacement has all of the knowledge they had?
When no one understands every part of the program, how do you diagnose defects?
When people are working in parallel, how do you prevent them from clobbering each other’s work?
If software engineering is about more than coding, what skills does a good coder need to have?
What kinds of tools and languages can accelerate a programmer’s work and help them prevent mistakes?
How can projects not lose sight of the immense complexity of human needs, values, ethics, and policy that interact with engineering decisions?

As it became clear that software was not an incremental change in technology, but a profoundly disruptive one, countless communities began to explore these questions in research and practice. Black American entrepreneurs began to explore how to use software to connect and build community well before the internet was ubiquitous, creating some of the first web-scale online communities and forging careers at IBM, ultimately to be suppressed by racism in the workplace and society ⁹ ⁹

Charlton D. McIlwain (2019). Black software: the internet and racial justice, from the AfroNet to Black Lives Matter. Oxford University Press.

. White entrepreneurs in Silicon Valley began to explore ways to bring computing to the masses, bolstered by the immense capital investments of venture capitalists, who saw opportunities for profit through disruption ⁷ ⁷

Kenney, M (2000). Understanding Silicon Valley: The anatomy of an entrepreneurial region. Stanford University Press.

. And academia, which had helped demonstrate the feasibility of computing and established its foundations, began to invent the foundational tools of software engineering including, version control systems, software testing, and a wide array of high-level programming languages such as Fortran ¹⁰ ¹⁰

Michael Metcalf (2002). History of Fortran. ACM SIGPLAN Fortran Forum.

, LISP ⁸ ⁸

John McCarthy (1978). History of LISP. History of Programming Languages I.

, C++ ¹² ¹²

Bjarn Stroustrup, B (1996). A history of C++: 1979--1991. History of programming languages II.

and Smalltalk ⁶ ⁶

Alan C. Kay (1996). The early history of Smalltalk. History of programming languages II.

, all of which inspired the design of today’s most popular languages, including Java, Python, and JavaScript. And throughout, despite the central role of women in programming the first digital computers, managing the first major software engineering projects, and imagining how software could change the world, women were systematically excluded from all of these efforts, their histories forgotten, erased, and overshadowed by pervasive sexism in commerce and government ¹ ¹

Janet Abbate (2012). Recoding gender: women's changing participation in computing. MIT Press.

While technical progress has been swift, progress on the human aspects of software engineering, have been more difficult to understand and improve. One of the seminal books on these issues was Fred P. Brooks, Jr.’s The Mythical Man Month ³ ³

Fred P. Brooks (1995). The mythical man month. Pearson Education.

. In it, he presented hundreds of claims about software engineering. For example, he hypothesized that adding more programmers to a project would actually make productivity worse at some level, not better, because knowledge sharing would be an immense but necessary burden. He also claimed that the first implementation of a solution is usually terrible and should be treated like a prototype: used for learning and then discarded. These and other claims have been the foundation of decades of years of research, all in search of some deeper answer to the questions above. And only recently have scholars begun to reveal how software and software engineering tends to encode, amplify, and reinforce existing structures and norms of discrimination by encoding it into data, algorithms, and software architectures ² ²

Ruha Benjamin (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Books.

. These histories show that, just like any other human activity, there are strong cultural forces that shape how people engineer software together, what they engineer, and what affect that has on society.

If we step even further beyond software engineering as an activity and think more broadly about the role that software is playing in society today, there are also other, newer questions that we’ve only begun to answer. If every part of society now runs on code, what responsibility do software engineers have to ensure that code is right? What responsibility do software engineers have to avoid algorithmic bias? If our cars are to soon drive us around, who’s responsible for the first death: the car, the driver, or the software engineers who built it, or the company that sold it? These ethical questions are in some ways the future of software engineering, likely to shape its regulatory context, its processes, and its responsibilities.

There are also economic roles that software plays in society that it didn’t before. Around the world, software is a major source of job growth, but also a major source of automation, eliminating jobs that people used to do. These larger forces that software is playing on the world demand that software engineers have a stronger understanding of the roles that software plays in society, as the decisions that engineers make can have profoundly impactful unintended consequences.

We’re nowhere close to having deep answers about these questions, neither the old ones or the new ones. We know a lot about programming languages and a lot about testing. These are areas amenable to automation and so computer science has rapidly improved and accelerated these parts of software engineering. The rest of it, as we shall see, has not made much progress. In this book, we’ll discuss what we know and the much larger space of what we don’t.

References

Janet Abbate (2012). Recoding gender: women's changing participation in computing. MIT Press.
Ruha Benjamin (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Books.
Fred P. Brooks (1995). The mythical man month. Pearson Education.
James Gleick (2011). The Information: A History, A Theory, A Flood. Pantheon Books.
Grudin, Jonathan (2017). From Tool to Partner: The Evolution of Human-Computer Interaction. Source.
Alan C. Kay (1996). The early history of Smalltalk. History of programming languages II.
Kenney, M (2000). Understanding Silicon Valley: The anatomy of an entrepreneurial region. Stanford University Press.
John McCarthy (1978). History of LISP. History of Programming Languages I.
Charlton D. McIlwain (2019). Black software: the internet and racial justice, from the AfroNet to Black Lives Matter. Oxford University Press.
Michael Metcalf (2002). History of Fortran. ACM SIGPLAN Fortran Forum.
Margot Lee Shetterly (2017). Hidden figures: the American dream and the untold story of the Black women mathematicians who helped win the space race. HarperCollins Nordic.
Bjarn Stroustrup, B (1996). A history of C++: 1979--1991. History of programming languages II.

Early days at the author’s software startup in 2012.

Chapter 2

Organizations

by Amy J. Ko

The photo above is a candid shot of some of the software engineers of AnswerDash , a company I co-founded in 2012, that was later acquired in 2020. There are a few things to notice in the photograph. First, you see one of the employees explaining something, while others are diligently working off to the side. It’s not a huge team; just a few engineers, plus several employees in other parts of the organization in another room. This, as simple as it looks, is pretty much what all software engineering work looks like. Some organizations have one of these teams; others have thousands.

What you can’t see is just how much complexity underlies this work. You can’t see the organizational structures that exist to manage this complexity. Inside this room and the rooms around it were processes, standards, reviews, workflows, managers, values, culture, decision making, analytics, marketing, sales. And at the center of it were people executing all of these things as well as they could to achieve the organization’s goal.

Organizations are a much bigger topic than I could possibly address here. To deeply understand them, you’d need to learn about organizational studies , organizational behavior , information systems , and business in general.

The subset of this knowledge that’s critical to understand about software engineering is limited to a few important concepts. The first and most important concept is that even in software organizations, the point of the company is rarely to make software; it’s to provide value ⁸ ⁸

Alexander Osterwalder, Yves Pigneur, Gregory Bernarda, Alan Smith (2015). Value proposition design: how to create products and services customers want. John Wiley & Sons.

. Software is sometimes the central means to providing that value, but more often than not, it’s the information flowing through that software that’s the truly valuable piece. Requirements , which we will discuss in a later chapter, help engineers organize how software will provide value.

The individuals in a software organization take on different roles to achieve that value. These roles are sometimes spread across different people and sometimes bundled up into one person, depending on how the organization is structured, but the roles are always there. Let’s go through each one in detail so you understand how software engineers relate to each role.

Marketers look for opportunities to provide value. In for-profit businesses, this might mean conducting market research, estimating the size of opportunities, identifying audiences, and getting those audiences’ attention. Non-profits need to do this work as well in order to get their solutions to people, but may be driven more by solving problems than making money.
Product managers decide what value the product will provide, monitoring the marketplace and prioritizing work.
Designers decide how software will provide value. This isn’t about code or really even about software; it’s about envisioning solutions to problems that people have.
Software engineers write code with other engineers to implement requirements envisioned by designers. If they fail to meet requirements, the design won’t be implemented correctly, which will prevent the software from providing value.
Sales takes the product that’s been built and tries to sell it to the audiences that marketers have identified. They also try to refine the organization’s understanding of what the customer wants and needs, providing feedback to marketing, product, and design, which engineers then address.
Support helps the people using the product to use it successfully and, like sales, provides feedback to product, design, and engineering about the product’s value (or lack thereof) and its defects.

As I noted above, sometimes the roles above get merged into individuals. When I was CTO at AnswerDash, I had software engineering roles, design roles, product roles, sales roles, and support roles. This was partly because it was a small company when I was there. As organizations grow, these roles tend to be divided into smaller pieces. This division often means that different parts of the organization don’t share knowledge, even when it would be advantageous ³ ³

Parmit K. Chilana, Amy J. Ko, Jacob O. Wobbrock, Tovi Grossman, and George Fitzmaurice (2011). Post-deployment usability: a survey of current practices. ACM SIGCHI Conference on Human Factors in Computing (CHI).

Note that in the division of responsibilities above, software engineers really aren’t the designers by default. They don’t decide what product is made or what problems that product solves. They may have opinions—and a great deal of power to enforce their opinions, as the people building the product—but it’s not ultimately their decision.

There are other roles you might be thinking of that I haven’t mentioned:

Engineering managers exist in all roles when teams get to a certain size, helping to move information from between higher and lower parts of an organization. Even engineering managers are primarily focused on organizing and prioritizing work, and not doing engineering ⁵ ⁵
Eirini Kalliamvakou, Christian Bird, Thomas Zimmermann, Andrew Begel, Robert DeLine, Daniel M. German (2017). What makes a great manager of software engineers?. IEEE Transactions on Software Engineering.
. Much of their time is spent ensuring every engineer has what they need to be productive, while also managing coordination and interpersonal conflict between engineers.
Data scientists , although a new role, typically facilitate decision making on the part of any of the roles above ¹ ¹
Andy Begel, Thomas Zimmermann (2014). Analyze this! 145 questions for data scientists in software engineering. ACM/IEEE International Conference on Software Engineering.
. They might help engineers find bugs, marketers analyze data, track sales targets, mine support data, or inform design decisions. They’re experts at using data to accelerate and improve the decisions made by the roles above.
Researchers , also called user researchers, also help people in a software organization make decisions, but usually product decisions, helping marketers, sales, and product managers decide what products to make and who would want them. In many cases, they can complement the work of data scientists, providing qualitative work to triangulate quantitative data .
Ethics and policy specialists , who might come with backgrounds in law, policy, or social science, might shape terms of service, software licenses, algorithmic bias audits, privacy policy compliance, and processes for engaging with stakeholders affected by the software being engineered. Any company that works with data, especially those that work with data at large scales or in contexts with great potential for harm, hate, and abuse, needs significant expertise to anticipate and prevent harm from engineering and design decisions.

Every decision made in a software team is under uncertainty, and so another important concept in organizations is risk ² ²

Boehm, B. W (1991). Software risk management: principles and practices. IEEE Software.

. It’s rarely possible to predict the future, and so organizations must take risks. Much of an organization’s function is to mitigate the consequences of risks. Data scientists and researchers mitigate risk by increasing confidence in an organization’s understanding of the market and its consumers. Engineers manage risk by trying to avoid defects. Of course, as many popular outlets on software engineering have begun to discover, when software fails, it usually “did exactly what it was told to do.” The reason it failed is that it was told to do the wrong thing. ¹⁰ ¹⁰

James Somers (2017). The coming software apocalypse. The Atlantic Monthly.

Open source communities are organizations too. The core activities of design, engineering, and support still exist in these, but how much a community is engaged in marketing and sales depends entirely on the purpose of the community. Big, established open source projects like Mozilla have revenue, buildings, and a CEO, and while they don’t sell anything, they do market. Others like Linux ⁶ ⁶

Gwendolyn K. Lee, Robert E. Cole (2003). From a firm-based to a community-based model of knowledge creation: The case of the Linux kernel development. Organization science.

rely heavily on contributions both from volunteers ¹¹ ¹¹

Yunwen Ye and Kouichi Kishida (2003). Toward an understanding of the motivation Open Source Software developers. ACM/IEEE International Conference on Software Engineering.

, but also paid employees from companies that depend on Linux, like IBM, Google, and others. In these settings, there are still all of the challenges that come with software engineering, but fewer of the constraints that come from a for-profit or non-profit motive. In fact, recent work empirically uncovered 9 reasons why modern open source projects fail: 1) lost to competition, 2) made obsolete by technology advances, 3) lack of time to volunteer, 4) lack of interest by contributors, 5) outdated technologies, 6) poor maintainability, 7) interpersonal conflicts amongst developers, 8) legal challenges, and 9) acquisition ⁴ ⁴

Jailton Coelho and Marco Tulio Valente (2017). Why modern open source projects fail. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. Another study showed that funding open source projects often requires substantial donations from large corporations; most projects don’t ask for donations, and those that do receive very little unless they’re well-established, and most of those funds go to paying for basic expenses such as engineering salaries ⁹ ⁹

Cassandra Overney, Jens Meinicke, Christian Kästner, Bogdan Vasilescu (2020). How to not get rich: an empirical study of donations in open source. ACM/IEEE International Conference on Software Engineering.

. Those aren’t too different from traditional software organizations, aside from the added challenges of sustaining a volunteer workforce.

All of the above has some important implications for what it means to be a software engineer:

Engineers are not the only important role in a software organization. In fact, they may be less important to an organization’s success than other roles because the decisions they make (how to implement requirements) have a smaller impact on the organization’s goals than other decisions (what to make, who to sell it to, etc.).
Engineers have to work with a lot of people working with different roles. Learning what those roles are and what shapes their success is important to being a good collaborator ⁷ ⁷
Paul Luo Li, Amy J. Ko, and Andrew Begel (2017). Cross-disciplinary perspectives on collaborations with software engineers. International Workshop on Cooperative and Human Aspects of Software Engineering.
.
While engineers might have many great product ideas, if they really want to shape what they’re building, they should be in a product role, not an engineering role.

All that said, products wouldn’t exist without engineers. They ensure that every detail about a product reflects the best knowledge of the people in their organization, and so attention to detail is paramount. In future chapters, we’ll discuss all of the ways that software engineers manage this detail, mitigating the burden on their memories with tools and processes.

References

Andy Begel, Thomas Zimmermann (2014). Analyze this! 145 questions for data scientists in software engineering. ACM/IEEE International Conference on Software Engineering.
Boehm, B. W (1991). Software risk management: principles and practices. IEEE Software.
Parmit K. Chilana, Amy J. Ko, Jacob O. Wobbrock, Tovi Grossman, and George Fitzmaurice (2011). Post-deployment usability: a survey of current practices. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Jailton Coelho and Marco Tulio Valente (2017). Why modern open source projects fail. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Eirini Kalliamvakou, Christian Bird, Thomas Zimmermann, Andrew Begel, Robert DeLine, Daniel M. German (2017). What makes a great manager of software engineers?. IEEE Transactions on Software Engineering.
Gwendolyn K. Lee, Robert E. Cole (2003). From a firm-based to a community-based model of knowledge creation: The case of the Linux kernel development. Organization science.
Paul Luo Li, Amy J. Ko, and Andrew Begel (2017). Cross-disciplinary perspectives on collaborations with software engineers. International Workshop on Cooperative and Human Aspects of Software Engineering.
Alexander Osterwalder, Yves Pigneur, Gregory Bernarda, Alan Smith (2015). Value proposition design: how to create products and services customers want. John Wiley & Sons.
Cassandra Overney, Jens Meinicke, Christian Kästner, Bogdan Vasilescu (2020). How to not get rich: an empirical study of donations in open source. ACM/IEEE International Conference on Software Engineering.
James Somers (2017). The coming software apocalypse. The Atlantic Monthly.
Yunwen Ye and Kouichi Kishida (2003). Toward an understanding of the motivation Open Source Software developers. ACM/IEEE International Conference on Software Engineering.

Clear and timely communication is at the heart of effective software engineering.

Chapter 3

Communication

by Amy J. Ko

Because software engineering often times distributes work across multiple people, a fundamental challenge in software engineering is ensuring that everyone on a team has the same understanding of what is being built and why. In the seminal book The Mythical Man Month , Fred Brooks argued that good software needs to have conceptual integrity , both in how it is designed, but also how it is implemented ⁵ ⁵

Fred P. Brooks (1995). The mythical man month. Pearson Education.

. This is the idea that whatever vision of what is being built must stay intact, even as the building of it gets distributed to multiple people. When multiple people are responsible for implementing a single coherent idea, how can they ensure they all build the same idea?

The solution is effective communication. As some events in industry have shown, communication requires empathy and teamwork. When communication is poor, teams become disconnected and produce software defects ⁴ ⁴

Nicolas Bettenburg, Ahmed E. Hassan (2013). Studying the impact of social interactions on software quality. Empirical Software Engineering.

. Therefore, achieving effective communication practices is paramount.

It turns out, however, that communication plays such a powerful role in software projects that it even shapes how projects unfold. Perhaps the most notable theory about the effect of communication is Conway’s Law ⁶ ⁶

Melvin E. Conway (1968). How do committees invent. Datamation.

. This theory argues that any designed system—software included—will reflect the communication structures involved in producing it. For example, think back to any course project where you divided the work into chunks and tried to combine them together into a final report at the end. The report and its structure probably mirrored the fact that several distinct people worked on each section of the report, rather than sounding like a single coherent voice. The same things happen in software: if the team writing error messages for a website isn’t talking to the team presenting them, you’re probably going to get a lot of error messages that aren’t so clear, may not fit on screen, and may not be phrased using the terminology of the rest of the site. On the other hand, if those two teams meet regularly to design the error messages together, communicating their shared knowledge, they might produce a seamless, coherent experience. Not only does software follow this law when a project is created, they also follow this law as projects evolve over time ¹⁸ ¹⁸

Minghui Zhou and Audris Mockus (2011). Does the initial environment impact the future of developers?. ACM/IEEE International Conference on Software Engineering.

Because communication is so central, software engineers are constantly seeking information to further their work, going to their coworkers’ desks, emailing them, chatting via messaging platforms, and even using social media ⁸ ⁸

Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.

. Some of the information that developers are seeking is easier to find than others. For example, in the study I just cited, it was pretty trivial to find information about who wrote a line of code or whether a build was done, but when the information they needed resided in someone else’s head (e.g., why a particular line of code was written), it was slow or often impossible to retrieve it. Sometimes it’s not even possible to find out who has the information. Researchers have investigated tools for trying to quantify expertise by automatically analyzing the code that developers have written, building platforms to help developers search for other developers who might know what they need to know ²^,¹² ²

Andrew Begel, Yit Phang Khoo, and Thomas Zimmermann (2010). Codebook: discovering and exploiting relationships in software repositories. ACM/IEEE International Conference on Software Engineering.

¹²

Audris Mockus and James D. Herbsleb (2002). Expertise browser: a quantitative approach to identifying expertise. ACM/IEEE International Conference on Software Engineering.

Communication is not always effective. In fact, there are many kinds of communication that are highly problematic in software engineering teams. For example, Perlow ¹³ ¹³

Leslie A. Perlow (1999). The time famine: Toward a sociology of work time. Administrative science quarterly.

conducted an ethnography of one team and found a highly dysfunctional use of interruptions in which the most expert members of a team were constantly interrupted to “fight fires” (immediately address critical problems) in other parts of the organization, and then the organization rewarded them for their heroics. This not only made the most expert engineers less productive, but it also disincentivized the rest of the organization to find effective ways of preventing the disasters from occurring in the first place. Not all interruptions are bad, and they can increase productivity, but they do increase stress ¹⁰ ¹⁰

Gloria Mark, Daniela Gudith, and Ulrich Klocke (2008). The cost of interrupted work: more speed and stress. ACM SIGCHI Conference on Human Factors in Computing (CHI).

Communication isn’t just about transmitting information; it’s also about relationships and identity. For example, the dominant culture of many software engineering work environments—and even the perceived culture—is one that can deter many people from even pursuing careers in computer science. Modern work environments are still dominated by men, who speak loudly, out of turn, and disrespectfully, with sometimes even sexual harassment ¹⁶ ¹⁶

Jennifer Wang' (2016). Female pursuit of Computer Science with Jennifer Wang. Software Engineering Daily Podcast.

. Computer science as a discipline, and the software industry that it shapes, has only just begun to consider the urgent need for cultural competence (the ability for individuals and organizations to work effectively when their employee’s thoughts, communications, actions, customs, beliefs, values, religions, and social groups vary) ¹⁷ ¹⁷

Alicia Nicki Washington (2020). When twice as good isn't enough: the case for cultural competence in computing. ACM Technical Symposium on Computer Science Education.

. Similarly, software developers often have to work with people in other domains such as artists, content developers, data scientists, design researchers, designers, electrical engineers, mechanical engineers, product planners, program managers, and service engineers. One study found that developers’ cross-disciplinary collaborations with people in these other domains required open-mindedness about the input of others, proactively informing everyone about code-related constraints, and ultimately seeing the broader picture of how pieces from different disciplines fit together; when developers didn’t do these things, collaborations failed, and therefore projects failed ⁹ ⁹

Paul Luo Li, Amy J. Ko, and Andrew Begel (2017). Cross-disciplinary perspectives on collaborations with software engineers. International Workshop on Cooperative and Human Aspects of Software Engineering.

. These are not the conditions for trusting, effective communication.

When communication is effective, it still takes time. One of the key strategies for reducing the amount of communication necessary is knowledge sharing tools, which broadly refers to any information system that stores facts that developers would normally have to retrieve from a person. By storing them in a database and making them easy to search, teams can avoid interruptions. The most common knowledge sharing tools in software teams are issue trackers, which are often at the center of communication not only between developers, but also with every other part of a software organization ³ ³

Dane Bertram, Amy Voida, Saul Greenberg, and Robert Walker (2010). Communication, collaboration, and bugs: the social nature of issue tracking in small, collocated teams. ACM Conference on Computer Supported Cooperative Work (CSCW).

. Community portals, such as GitHub pages or Slack teams, can also be effective ways of sharing documents and archiving decisions ¹⁵ ¹⁵

Christoph Treude and Margaret-Anne Storey (2011). Effective communication of software development knowledge through community portals. ACM SIGSOFT Foundations of Software Engineering (FSE).

. Perhaps the most popular knowledge sharing tool in software engineering today is Stack Overflow ¹ ¹

Jeff Atwood (2016). The state of programming with Stack Overflow co-founder Jeff Atwood. Software Engineering Daily Podcast.

, which archives facts about programming language and API usage. Such sites, while they can be great resources, have the same problems as many media, such as gender bias that prevent contributions from women from being rewarded as highly as contributions from men ¹¹ ¹¹

Anna May, Johannes Wachs, Anikó Hannák (2019). Gender differences in participation and reward on Stack Overflow. Empirical Software Engineering.

Because all of this knowledge is so critical to progress, when developers leave an organization and haven’t archived their knowledge somewhere, it can be quite disruptive to progress. Organizations often have single points of failure, in which a single developer may be critical to a team’s ability to maintain and enhance a software product ¹⁴ ¹⁴

Peter C. Rigby, Yue Cai Zhu, Samuel M. Donadelli, and Audris Mockus (2016). Quantifying and mitigating turnover-induced knowledge loss: case studies of chrome and a project at Avaya. ACM/IEEE International Conference on Software Engineering.

. When newcomers join a team and lack the right knowledge, they introduce defects ⁷ ⁷

Matthieu Foucault, Marc Palyart, Xavier Blanc, Gail C. Murphy, and Jean-Rémy Falleri (2015). Impact of developer turnover on quality in open-source software. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. Some companies try to mitigate this by rotating developers between projects, “cross-training” them to ensure that the necessary knowledge to maintain a project is distributed across multiple engineers.

What does all of this mean for you as an individual developer? To put it simply, don’t underestimate the importance of talking. Know who you need to talk to, talk to them frequently, and to the extent that you can, write down what you know both to lessen the demand for talking and mitigate the risk of you not being available, but also to make your knowledge more precise and accessible in the future. It often takes decades for engineers to excel at communication. The very fact that you know why communication is important gives you a critical head start.

References

Jeff Atwood (2016). The state of programming with Stack Overflow co-founder Jeff Atwood. Software Engineering Daily Podcast.
Andrew Begel, Yit Phang Khoo, and Thomas Zimmermann (2010). Codebook: discovering and exploiting relationships in software repositories. ACM/IEEE International Conference on Software Engineering.
Dane Bertram, Amy Voida, Saul Greenberg, and Robert Walker (2010). Communication, collaboration, and bugs: the social nature of issue tracking in small, collocated teams. ACM Conference on Computer Supported Cooperative Work (CSCW).
Nicolas Bettenburg, Ahmed E. Hassan (2013). Studying the impact of social interactions on software quality. Empirical Software Engineering.
Fred P. Brooks (1995). The mythical man month. Pearson Education.
Melvin E. Conway (1968). How do committees invent. Datamation.
Matthieu Foucault, Marc Palyart, Xavier Blanc, Gail C. Murphy, and Jean-Rémy Falleri (2015). Impact of developer turnover on quality in open-source software. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
Paul Luo Li, Amy J. Ko, and Andrew Begel (2017). Cross-disciplinary perspectives on collaborations with software engineers. International Workshop on Cooperative and Human Aspects of Software Engineering.
Gloria Mark, Daniela Gudith, and Ulrich Klocke (2008). The cost of interrupted work: more speed and stress. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Anna May, Johannes Wachs, Anikó Hannák (2019). Gender differences in participation and reward on Stack Overflow. Empirical Software Engineering.
Audris Mockus and James D. Herbsleb (2002). Expertise browser: a quantitative approach to identifying expertise. ACM/IEEE International Conference on Software Engineering.
Leslie A. Perlow (1999). The time famine: Toward a sociology of work time. Administrative science quarterly.
Peter C. Rigby, Yue Cai Zhu, Samuel M. Donadelli, and Audris Mockus (2016). Quantifying and mitigating turnover-induced knowledge loss: case studies of chrome and a project at Avaya. ACM/IEEE International Conference on Software Engineering.
Christoph Treude and Margaret-Anne Storey (2011). Effective communication of software development knowledge through community portals. ACM SIGSOFT Foundations of Software Engineering (FSE).
Jennifer Wang' (2016). Female pursuit of Computer Science with Jennifer Wang. Software Engineering Daily Podcast.
Alicia Nicki Washington (2020). When twice as good isn't enough: the case for cultural competence in computing. ACM Technical Symposium on Computer Science Education.
Minghui Zhou and Audris Mockus (2011). Does the initial environment impact the future of developers?. ACM/IEEE International Conference on Software Engineering.

Productivity isn’t just about working fast

Chapter 4

Productivity

by Amy J. Ko

When we think of productivity, we usually have a vague concept of a rate of work per unit time. Where it gets tricky is in defining “work”. On an individual level, work can be easier to define, because developers often have specific concrete tasks that they’re assigned. But until they’re not, it’s not really easy to define progress (well, it’s not that easy to define “done” sometimes either, but that’s a topic for a later chapter). When you start considering work at the scale of a team or an organization, productivity gets even harder to define, since an individual’s productivity might be increased by ignoring every critical request from a teammate, harming the team’s overall productivity.

Despite the challenge in defining productivity, there are numerous factors that affect productivity. For example, at the individual level, having the right tools can result in an order of magnitude difference in speed at accomplishing a task. One study I ran found that developers using the Eclipse IDE spent a third of their time just physically navigating between source files ⁸ ⁸

Amy J. Ko, Htet Htet Aung, Brad A. Myers (2005). Eliciting design requirements for maintenance-oriented IDEs: a detailed study of corrective and perfective maintenance tasks. ACM/IEEE International Conference on Software Engineering.

. With the right navigation aids, developers could be writing code and fixing bugs 30% faster. In fact, some tools like Mylyn automatically bring relevant code to the developer rather than making them navigate to it, greatly increasing the speed which with developers can accomplish a task ⁶ ⁶

Mik Kersten and Gail C. Murphy (2006). Using task context to improve programmer productivity. ACM SIGSOFT Foundations of Software Engineering (FSE).

. Long gone are the days when developers should be using bare command lines and text editors to write code: IDEs can and do greatly increase productivity when used and configured with speed in mind.

Of course, individual productivity is about more than just tools. Studies of workplace productivity show that developers have highly fragmented days, interrupted by meetings, emails, coding, and non-work distractions ¹³ ¹³

André N. Meyer, Laura E. Barton, Gail C. Murphy, Thomas Zimmermann, Thomas Fritz (2017). The work life of developers: Activities, switches and perceived productivity. IEEE Transactions on Software Engineering.

. These interruptions are often viewed negatively from an individual perspective ¹⁴ ¹⁴

Ben Northup (2016). Reflections of an old programmer. Software Engineering Daily Podcast.

, but may be highly valuable from a team and organizational perspective. And then, productivity is not just about skills to manage time, but also many other skills that shape developer expertise, including skills in designing architectures, debugging, testing, programming languages, etc. ¹ ¹

Sebastian Baltes, Stephan Diehl (2018). Towards a theory of software development expertise. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

Hiring is therefore about far more than just how quickly and effectively someone can code ² ²

Ammon Bartram (2016). Hiring engineers with Ammon Bartram. Software Engineering Daily Podcast.

That said, productivity is not just about individual developers. Because communication is a key part of team productivity, an individual’s productivity is as much determined by their ability to collaborate and communicate with other developers. In a study spanning dozens of interviews with senior software engineers, Li et al. found that the majority of critical attributes for software engineering skill (productivity included) concerned their interpersonal skills, their communication skills, and their ability to be resourceful within their organization ¹⁰ ¹⁰

Paul Luo Li, Amy J. Ko, and Jiamin Zhu (2015). What makes a great software engineer?. ACM/IEEE International Conference on Software Engineering.

. Similarly, LaToza et al. found that the primary bottleneck in productivity was communication with teammates, primarily because waiting for replies was slower than just looking something up ⁹ ⁹

Thomas D. LaToza, Gina Venolia, and Robert DeLine (2006). Maintaining mental models: a study of developer work habits. ACM/IEEE International Conference on Software Engineering.

. Of course, looking something up has its own problems. While StackOverflow is an incredible resource for missing documentation ¹¹ ¹¹

Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).

, it also is full of all kinds of misleading and incorrect information contributed by developers without sufficient expertise to answer questions ³ ³

Anton Barua, Stephen W. Thomas & Ahmed E. Hassan (2014). What are developers talking about? An analysis of topics and trends in Stack Overflow. Empirical Software Engineering.

. Finally, because communication is such a critical part of retrieving information, adding more developers to a team has surprising effects. One study found that adding people to a team slowly enough to allow them to onboard effectively could reduce defects, but adding them too fast led to increases in defects ¹² ¹²

Andrew Meneely, Pete Rotella, and Laurie Williams (2011). Does adding manpower also affect quality? An empirical, longitudinal analysis. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

Another dimension of productivity is learning. Great engineers are resourceful, quick learners ¹⁰ ¹⁰

Paul Luo Li, Amy J. Ko, and Jiamin Zhu (2015). What makes a great software engineer?. ACM/IEEE International Conference on Software Engineering.

. New engineers must be even more resourceful, even though their instincts are often to hide their lack of expertise from exactly the people they need help from ⁴ ⁴

Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.

. Experienced developers know that learning is important and now rely heavily on social media such as Twitter to follow industry changes, build learning relationships, and discover new concepts and platforms to learn ¹⁶ ¹⁶

Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey (2014). Software engineering at the speed of light: how developers stay current using Twitter. ACM/IEEE International Conference on Software Engineering.

. And, of course, developers now rely heavily on web search to fill in inevitable gaps in their knowledge about APIs, error messages, and myriad other details about languages and platforms ¹⁸ ¹⁸

Xin Xia, Lingfeng Bao, David Lo, Pavneet Singh Kochhar, Ahmed E. Hassan, Zhenchang Xing (2017). What do developers search for on the web?. Empirical Software Engineering.

Unfortunately, learning is no easy task. One of my earliest studies as a researcher investigated the barriers to learning new programming languages and systems, finding six distinct types of content that are challenging ⁷ ⁷

Amy J. Ko, Brad A. Myers, Htet Htet Aung (2004). Six learning barriers in end-user programming systems. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

. To use a programming platform successfully, people need to overcome design barriers, which are the abstract computational problems that must be solved, independent of the languages and APIs. People need to overcome selection barriers, which involve finding the right abstractions or APIs to achieve the design they have identified. People need to overcome use and coordination barriers, which involve operating and coordinating different parts of a language or API together to achieve novel functionality. People need to overcome comprehension barriers, which involve knowing what can go wrong when using part of a language or API. And finally, people need to overcome information barriers, which are posed by the limited ability of tools to inspect a program’s behavior at runtime during debugging. Every single one of these barriers has its own challenges, and developers encounter them every time they are learning a new platform, regardless of how much expertise they have.

Aside from individual and team factors, productivity is also influenced by the particular features of a project’s code, how the project is managed, or the environment and organizational culture in which developers work ⁵^,¹⁷ ⁵

Tom DeMarco and Tim Lister (1985). Programmer performance and the effects of the workplace. ACM/IEEE International Conference on Software Engineering.

¹⁷

J. Vosburgh, B. Curtis, R. Wolverton, B. Albert, H. Malec, S. Hoben, and Y. Liu (1984). Productivity factors and programming environments. ACM/IEEE International Conference on Software Engineering.

. In fact, these might actually be the biggest factors in determining developer productivity. This means that even a developer that is highly productive individually cannot rescue a team that is poorly structured working on poorly architected code. This might be why highly productive developers are so difficult to recruit to poorly managed teams.

A different way to think about productivity is to consider it from a “waste” perspective, in which waste is defined as any activity that does not contribute to a product’s value to users or customers. Sedano et al. investigated this view across two years and eight software development projects in a software development consultancy ¹⁵ ¹⁵

Todd Sedano, Paul Ralph, Cécile Péraire (2017). Software development waste. ACM/IEEE International Conference on Software Engineering.

, contributing a taxonomy of waste:

Building the wrong feature or product . The cost of building a feature or product that does not address user or business needs.
Mismanaging the backlog . The cost of duplicating work, expediting lower value user features, or delaying necessary bug fixes.
Rework . The cost of altering delivered work that should have been done correctly but was not.
Unnecessarily complex solutions . The cost of creating a more complicated solution than necessary, a missed opportunity to simplify features, user interface, or code.
Extraneous cognitive load . The costs of unneeded expenditure of mental energy, such as poorly written code, context switching, confusing APIs, or technical debt.
Psychological distress . The costs of burdening the team with unhelpful stress arising from low morale, pace, or interpersonal conflict.
Waiting/multitasking . The cost of idle time, often hidden by multi-tasking, due to slow tests, missing information, or context switching.
Knowledge loss . The cost of re-acquiring information that the team once knew.
Ineffective communication . The cost of incomplete, incorrect, misleading, inefficient, or absent communication.

One could imagine using these concepts to refine processes and practices in a team, helping both developers and managers be more aware of sources of waste that harm productivity.

Of course, productivity is not only shaped by professional and organizational factors, but personal ones as well. Consider, for example, an engineer that has friends, wealth, health care, health, stable housing, sufficient pay, and safety: they likely have everything they need to bring their full attention to their work. In contrast, imagine an engineer that is isolated, has immense debt, has no health care, has a chronic disease like diabetes, is being displaced from an apartment by gentrification, has lower pay than their peers, or does not feel safe in public. Any one of these factors might limit an engineer’s ability to be productive at work; some people might experience multiple, or even all of these factors, especially if they are a person of color in the United States, who has faced a lifetime of racist inequities in school, health care, and housing. Because of the potential for such inequities to influence someone’s ability to work, managers and organizations need to make space for surfacing these inequities at work, so that teams can acknowledge them, plan around them, and ideally address them through targeted supports. Anything less tends to make engineers feel unsupported, which will only decrease their motivation to contribute to a team. These widely varying conceptions of productivity reveal that programming in a software engineering context is about far more than just writing a lot of code. It’s about coordinating productively with a team, synchronizing your work with an organization’s goals, and most importantly, reflecting on ways to change work to achieve those goals more effectively.

References

Sebastian Baltes, Stephan Diehl (2018). Towards a theory of software development expertise. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Ammon Bartram (2016). Hiring engineers with Ammon Bartram. Software Engineering Daily Podcast.
Anton Barua, Stephen W. Thomas & Ahmed E. Hassan (2014). What are developers talking about? An analysis of topics and trends in Stack Overflow. Empirical Software Engineering.
Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.
Tom DeMarco and Tim Lister (1985). Programmer performance and the effects of the workplace. ACM/IEEE International Conference on Software Engineering.
Mik Kersten and Gail C. Murphy (2006). Using task context to improve programmer productivity. ACM SIGSOFT Foundations of Software Engineering (FSE).
Amy J. Ko, Brad A. Myers, Htet Htet Aung (2004). Six learning barriers in end-user programming systems. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).
Amy J. Ko, Htet Htet Aung, Brad A. Myers (2005). Eliciting design requirements for maintenance-oriented IDEs: a detailed study of corrective and perfective maintenance tasks. ACM/IEEE International Conference on Software Engineering.
Thomas D. LaToza, Gina Venolia, and Robert DeLine (2006). Maintaining mental models: a study of developer work habits. ACM/IEEE International Conference on Software Engineering.
Paul Luo Li, Amy J. Ko, and Jiamin Zhu (2015). What makes a great software engineer?. ACM/IEEE International Conference on Software Engineering.
Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Andrew Meneely, Pete Rotella, and Laurie Williams (2011). Does adding manpower also affect quality? An empirical, longitudinal analysis. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
André N. Meyer, Laura E. Barton, Gail C. Murphy, Thomas Zimmermann, Thomas Fritz (2017). The work life of developers: Activities, switches and perceived productivity. IEEE Transactions on Software Engineering.
Ben Northup (2016). Reflections of an old programmer. Software Engineering Daily Podcast.
Todd Sedano, Paul Ralph, Cécile Péraire (2017). Software development waste. ACM/IEEE International Conference on Software Engineering.
Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey (2014). Software engineering at the speed of light: how developers stay current using Twitter. ACM/IEEE International Conference on Software Engineering.
J. Vosburgh, B. Curtis, R. Wolverton, B. Albert, H. Malec, S. Hoben, and Y. Liu (1984). Productivity factors and programming environments. ACM/IEEE International Conference on Software Engineering.
Xin Xia, Lingfeng Bao, David Lo, Pavneet Singh Kochhar, Ahmed E. Hassan, Zhenchang Xing (2017). What do developers search for on the web?. Empirical Software Engineering.

Software quality is multidimensional and often coursely measured through issue trackers like this one.

Chapter 5

Quality

by Amy J. Ko

There are numerous ways a software project can fail: projects can be over budget, they can ship late, they can fail to be useful, or they can simply not be useful enough. Evidence clearly shows that success is highly contextual and stakeholder-dependent: success might be financial, social, physical and even emotional, suggesting that software engineering success is a multifaceted variable that cannot be explained simply by user satisfaction, profitability or meeting requirements, budgets and schedules ⁶ ⁶

Paul Ralph and Paul Kelly (2014). The dimensions of software engineering success. ACM/IEEE International Conference on Software Engineering.

One of the central reasons for this is that there are many distinct software qualities that software can have and depending on the stakeholders, each of these qualities might have more or less importance. For example, a safety critical system such as flight automation software should be reliable and defect-free, but it’s okay if it’s not particularly learnable—that’s what training is for. A video game, however, should probably be fun and learnable, but it’s fine if it ships with a few defects, as long as they don’t interfere with fun ⁴ ⁴

Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan (2014). Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development?. ACM/IEEE International Conference on Software Engineering.

There are a surprisingly large number of software qualities ¹ ¹

Barry W. Boehm (1976). Software engineering. IEEE Transactions on Computers.

. Many concern properties that are intrinsic to a software’s implementation:

Correctness is the extent to which a program behaves according to its specification. If specifications are ambiguous, correctness is ambiguous. However, even if a specification is perfectly unambiguous, it might still fail to meet other qualities (e.g., a web site may be built as intended, but still be slow, unusable, and useless.)
Reliability is the extent to which a program behaves the same way over time in the same operating environment. For example, if your online banking app works most of the time, but crashes sometimes, it’s not particularly reliable.
Robustness is the extent to which a program can recover from errors or unexpected input. For example, a login form that crashes if an email is formatted improperly isn’t very robust. A login form that handles any text input is optimally robust. One can make a system more robust by breadth of errors and inputs it can handle in a reasonable way.
Performance is the extent to which a program uses computing resources economically. Synonymous with “fast” and “zippy”. Performance is directly determined by how many instructions a program has to execute to accomplish its operations, but it is difficult to measure because operations, inputs, and the operating environment can vary widely.
Portability is the extent to which an implementation can run on different platforms without being modified. For example, “universal” applications in the Apple ecosystem that can run on iPhones, iPads, and Mac OS without being modified or recompiled are highly portable.
Interoperability is the extent to which a system can seamlessly interact with other systems, typically through the use of standards. For example, some software systems use entirely proprietary and secret data formats and communication protocols. These are less interoperable than systems that use industry-wide standards.
Security is the extent to which only authorized individuals can access a software system’s data and computation.

Whereas the above qualities are concerned with how software behaves technically according to specifications, some qualities concern properties of how developers interact with code:

Verifiability is the effort required to verify that software does what it is intended to do. For example, it is hard to verify a safety critical system without either proving it correct or testing it in a safety-critical context (which isn’t safe). Take driverless cars, for example: for Google to test their software, they’ve had to set up thousands of paid drivers to monitor and report problems on the road. In contrast, verifying that a simple static HTML web page works correctly is as simple as opening it in a browser.
Maintainability is the effort required to correct, adapt, or perfect software. This depends mostly on how comprehensible and modular an implementation is.
Reusability is the effort required to use a program’s components for purposes other than those for which it was originally designed. APIs are reusable by definition, whereas black box embedded software (like the software built into a car’s traction systems) is not.

Other qualities are concerned with the use of the software in the world by people:

Learnability is the ease with which a person can learn to operate software. Learnability is multi-dimensional and can be difficult to measure, including aspects of usability, expectations of prior knowledge, reliance on conventions, error proneness, and task alignment ² ²
Tovi Grossman, George Fitzmaurice, Ramtin Attar (2009). A survey of software learnability: metrics, methodologies and guidelines. ACM SIGCHI Conference on Human Factors in Computing (CHI).
.
User efficiency is the speed with which a person can perform tasks with a program. For example, think about the speed with which you can navigate back to the table of contents of this book. Obviously, because most software supports many tasks, user efficiency isn’t a single property of software, but one that varies depending on the task.
Accessibility is the extent to which people with varying cognitive and motor abilities can operate the software as intended. For example, software that can only be used with a mouse is less accessible than something that can be used with a mouse, keyboard, or speech recognition. Software can be designed for all abilities, and even automatically adapted for individual abilities ⁷ ⁷
Jacob O. Wobbrock, Shaun K. Kane, Krzysztof Z. Gajos, Susumu Harada, and Jon Froehlich (2011). Ability-based design: Concept, principles and examples. ACM Transactions on Accessible Computing (TACCESS).
.
Privacy is the extent to which a system prevents access to information that intended for a particular audience or use. To achieve privacy, a system must be secure; for example, if anyone could log into your Facebook account, it would be insecure, and thus have poor privacy preservation. However, a secure system is not necessarily private: Facebook works hard on security, but shares immense amounts of private data with third parties, often without informed consent.
Consistency is the extent to which related functionality in a system leverages the same skills, rather than requiring new skills to learn how to use. For example, in Mac OS, quitting any application requires the same action: command-Q or the Quit menu item in the application menu; this is highly consistent. Other platforms that are less consistent allow applications to have many different ways of quitting applications.
Usability is an aggregate quality that encompasses all of the qualities above. It is used holistically to refer to all of those factors. Because it is not very precise, it is mostly useful in casual conversation about software, but not as useful in technical conversations about software quality.
Bias is the extent to which software discriminates or excludes on the basis of some aspect of its user, either directly, or by amplifying or reinforcing discriminatory or exclusionary structures in society. For example, data used to train a classifier might used racially biased data, algorithms might use sexist assumptions about gender, web forms might systematically exclude non-Western names and language, and applications might be only accessible to people who can see or use a mouse. Inaccessibility is a form of bias.
Usefulness is the extent to which software is of value to its various stakeholders. Utility is often the most important quality because it subsumes all of the other lower-level qualities software can have (e.g., part of what makes a messaging app useful is that it’s performant, user efficient, and reliable). That also makes it less useful as a concept, because it encompasses so many things. That said, usefulness is not always the most important quality. For example, if you can sell a product to a customer and get a one time payment of their money, it might not matter—at least to a for-profit venture—that the product has low usefulness.

Although the lists above are not complete, you might have already noticed some tradeoffs between different qualities. A secure system is necessarily going to be less learnable, because there will be more to learn to operate it. A robust system will likely be less maintainable because it it will likely have more code to account for its diverse operating environments. Because one cannot achieve all software qualities, and achieving each quality takes significant time, it is necessary to prioritize qualities for each project.

These external notions of quality are not the only qualities that matter. For example, developers often view projects as successful if they offer intrinsically rewarding work ⁵ ⁵

J. Drew Procaccino, June M. Verner, Katherine M. Shelfer, David Gefen (2005). What do software practitioners really think about project success: an exploratory study. Journal of Systems and Software.

. That may sound selfish, but if developers aren’t enjoying their work, they’re probably not going to achieve any of the qualities very well. Moreover, there are many organizational factors that can inhibit developers’ ability to obtain these rewards. Project complexity, internal and external dependencies that are out of a developer’s control, process barriers, budget limitations, deadlines, poor HR planning, and pressure to ship can all interfere with project success ³ ³

Mathieu Lavallee and Pierre N. Robillard (2015). Why good developers write bad code: an observational case study of the impacts of organizational factors on software quality. ACM/IEEE International Conference on Software Engineering.

As I’ve noted before, the person most responsible for isolating developers from these organizational problems, and most responsible for prioritizing software qualities is a product manager.

References

Barry W. Boehm (1976). Software engineering. IEEE Transactions on Computers.
Tovi Grossman, George Fitzmaurice, Ramtin Attar (2009). A survey of software learnability: metrics, methodologies and guidelines. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Mathieu Lavallee and Pierre N. Robillard (2015). Why good developers write bad code: an observational case study of the impacts of organizational factors on software quality. ACM/IEEE International Conference on Software Engineering.
Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan (2014). Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development?. ACM/IEEE International Conference on Software Engineering.
J. Drew Procaccino, June M. Verner, Katherine M. Shelfer, David Gefen (2005). What do software practitioners really think about project success: an exploratory study. Journal of Systems and Software.
Paul Ralph and Paul Kelly (2014). The dimensions of software engineering success. ACM/IEEE International Conference on Software Engineering.
Jacob O. Wobbrock, Shaun K. Kane, Krzysztof Z. Gajos, Susumu Harada, and Jon Froehlich (2011). Ability-based design: Concept, principles and examples. ACM Transactions on Accessible Computing (TACCESS).

Requirements specify what software must do, constraining, focusing, and defining it’s successful functioning.

Chapter 6

Requirements

by Amy J. Ko

Once you have a problem, a solution, and a design specification, it’s entirely reasonable to start thinking about code. What libraries should we use? What platform is best? Who will build what? After all, there’s no better way to test the feasibility of an idea than to build it, deploy it, and find out if it works. Right?

It depends. This mentality towards product design works fine if building and deploying something is cheap and getting feedback has no consequences. Simple consumer applications often benefit from this simplicity, especially early stage ones, because there’s little to lose. For example, if you are starting a company, and do not even know if there is a market opportunity yet, it may be worth quickly prototyping an idea, seeing if there’s interest, and then later thinking about how to carefully architect a product that meets that opportunity. This is how products such as Facebook started , with a poorly implemented prototype that revealed an opportunity, which was only later translated into a functional, reliable software service.

However, what if prototyping a beta isn’t cheap to build? What if your product only has one shot at adoption? What if you’re building something for a client and they want to define success? Worse yet, what if your product could kill people if it’s not built properly? Consider the U.S. HealthCare.gov launch , for example, which was lambasted for its countless defects and poor scalability at launch, only working for 1,100 simultaneous users, when 50,000 were expected and 250,000 actually arrived. To prevent disastrous launches like this, software teams have to be more careful about translating a design specification into a specific explicit set of goals that must be satisfied in order for the implementation to be complete. We call these goals requirements and we call this process requirements engineering ⁷ ⁷

Ian Sommerville, Pete Sawyer (1997). Requirements engineering: a good practice guide. John Wiley & Sons, Inc.

In principle, requirements are a relatively simple concept. They are simply statements of what must be true about a system to make the system acceptable. For example, suppose you were designing an interactive mobile game. You might want to write the requirement The frame rate must never drop below 60 frames per second. This could be important for any number of reasons: the game may rely on interactive speeds, your company’s reputation may be for high fidelity graphics, or perhaps that high frame rate is key to creating a sense of realism. Or, imagine your game company has a reputation for high performance, high fidelity graphics, high frame rate graphics, and achieving any less would erode your company’s brand. Whatever the reasons, expressing it as a requirement makes it explicit that any version of the software that doesn’t meet that requirement is unacceptable, and sets a clear goal for engineering to meet.

The general idea of writing down requirements is actually a controversial one. Why not just discover what a system needs to do incrementally, through testing, user feedback, and other methods? Some of the original arguments for writing down requirements actually acknowledged that software is necessarily built incrementally, but that it is nevertheless useful to write down requirements from the outset ⁶ ⁶

David L Parnas, Paul C. Clements (1986). A rational design process: How and why to fake it. IEEE Transactions on Software Engineering.

. This is because requirements help you plan everything: what you have to build, what you have to test, and how to know when you’re done. The theory is that by defining requirements explicitly, you plan, and by planning, you save time.

Do you really have to plan by writing down requirements? For example, why not do what designers do, expressing requirements in the form of prototypes and mockups? These implicitly state requirements, because they suggest what the software is supposed to do without saying it directly. But for some types of requirements, they actually imply nothing. For example, how responsive should a web page be? A prototype doesn’t really say; an explicit requirement of an average page load time of less than 1 second is quite explicit. Requirements can therefore be thought of more like an architect’s blueprint: they provide explicit definitions and scaffolding of project success.

And yet, like design, requirements come from the world and the people in it and not from software ² ²

Michael Jackson (2001). Problem frames. Addison-Wesley.

. Because they come from the world, requirements are rarely objective or unambiguous. For example, some requirements come from law, such as the European Union’s General Data Protection Regulation GDPR regulation, which specifies a set of data privacy requirements that all software systems used by EU citizens must meet. Other requirements might come from public pressure for change, as in Twitter’s decision to label particular tweets as having false information or hate speech. Therefore, the methods that people use to do requirements engineering are quite diverse. Requirements engineers may work with lawyers to interpret policy. They might work with regulators to negotiate requirements. They might also use design methods, such as user research methods and rapid prototyping to iteratively converge toward requirements ³ ³

Axel van Lamsweerde (2008). Requirements engineering: from craft to discipline. ACM SIGSOFT Foundations of Software Engineering (FSE).

. Therefore, the big difference between design and requirements engineering is that requirements engineers take the process one step further than designers, enumerating in detail every property that the software must satisfy, and engaging with every source of requirements a system might need to meet, not just user needs.

There are some approaches to specifying requirements formally . These techniques allow requirements engineers to automatically identify conflicting requirements, so they don’t end up proposing a design that can’t possibly exist. Some even use systems to make requirements traceable , meaning the high level requirement can be linked directly to the code that meets that requirement ⁴ ⁴

Patrick Mäder, Alexander Egyed (2015). Do developers benefit from requirements traceability when evolving and maintaining a software system?. Empirical Software Engineering.

. All of this formality has tradeoffs: not only does it take more time to be so precise, but it can negatively affect creativity in concept generation as well ⁵ ⁵

Rahul Mohanani, Paul Ralph, and Ben Shreeve (2014). Requirements fixation. ACM/IEEE International Conference on Software Engineering.

Expressing requirements in natural language can mitigate these effects, at the expense of precision. They just have to be complete , precise , non-conflicting , and verifiable . For example, consider a design for a simple to do list application. Its requirements might be something like the following:

Users must be able to add to-do list items with a single action.
To-do list items must contain text and a binary completed state.
Users must be able to edit the text of to-do list items.
Users must be able to toggle the completed state of to-do list items.
Users must be able to delete to-do list items.
All changes made to the state of to-do list items must be saved automatically without user intervention.

Let’s review these requirements against the criteria for good requirements that I listed above:

Is it complete ? I can think of a few more requirements: is the list ordered? How long does state persist? Are there user accounts? Where is data stored? What does it look like? What kinds of user actions must be supported? Is delete undoable? Even just on these completeness dimension, you can see how even a very simple application can become quite complex. When you’re generating requirements, your job is to make sure you haven’t forgotten important requirements.
Is the list precise ? Not really. When you add a to do list item, is it added at the beginning? The end? Wherever a user request it be added? How long can the to do list item text be? Clearly the requirement above is imprecise. And imprecise requirements lead to imprecise goals, which means that engineers might not meet them. Is this to do list team okay with not meeting its goals?
Are the requirements non-conflicting ? I think they are since they all seem to be satisfiable together. But some of the missing requirements might conflict. For example, suppose we clarified the imprecise requirement about where a to do list item is added. If the requirement was that it was added to the end, is there also a requirement that the window scroll to make the newly added to do item visible? If not, would the first requirement of making it possible for users to add an item with a single action be achievable? They could add it, but they wouldn’t know they had added it because of this usability problem, so is this requirement met? This example shows that reasoning through requirements is ultimately about interpreting words, finding source of ambiguity, and trying to eliminate them with more words.
Finally, are they verifiable ? Some more than others. For example, is there a way to guarantee that the state saves successfully all the time? That may be difficult to prove given the vast number of ways the operating environment might prevent saving, such as a failing hard drive or an interrupted internet connection. This requirement might need to be revised to allow for failures to save, which itself might have implications for other requirements in the list.

Now, the flaws above don’t make the requirements “wrong”. They just make them “less good.” The more complete, precise, non-conflicting, and testable your requirements are, the easier it is to anticipate risk, estimate work, and evaluate progress, since requirements essentially give you a to do list for implementation and testing.

Lastly, remember that requirements are translated from a design, and designs have many more qualities than just completeness, preciseness, feasibility, and verifiability. Designs must also be legal, ethical, and just. Consider, for example, the anti-Black redlining practices pervasive throughout the United States. Even through the 1980’s, it was standard practice for banks to lend to lower-income white residents, but not Black residents, even middle-income or upper-income ones. Banks in the 1980’s wrote software to automate many lending decisions; would a software requirement such as this have been legal, ethical, or just?

No loan application with an applicant self-identified as a person of color should be approved.

That requirement is both precise and verifiable. In the 1980’s, it was legal. But was it ethical or just? Absolutely not. Therefore, requirements, no matter how formally extracted from a design specification, no matter how consistent with law, and no matter how aligned with an organization’s priorities, should be free of racist ideas. Requirements are just one of many ways that such ideas are manifested, and ultimately hidden in code ¹ ¹

Ruha Benjamin (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Books.

References

Ruha Benjamin (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Books.
Michael Jackson (2001). Problem frames. Addison-Wesley.
Axel van Lamsweerde (2008). Requirements engineering: from craft to discipline. ACM SIGSOFT Foundations of Software Engineering (FSE).
Patrick Mäder, Alexander Egyed (2015). Do developers benefit from requirements traceability when evolving and maintaining a software system?. Empirical Software Engineering.
Rahul Mohanani, Paul Ralph, and Ben Shreeve (2014). Requirements fixation. ACM/IEEE International Conference on Software Engineering.
David L Parnas, Paul C. Clements (1986). A rational design process: How and why to fake it. IEEE Transactions on Software Engineering.
Ian Sommerville, Pete Sawyer (1997). Requirements engineering: a good practice guide. John Wiley & Sons, Inc.

Architecture is how code is organized

Chapter 7

Architecture

by Amy J. Ko

Once you have a sense of what your design must do (in the form of requirements or other less formal specifications), the next big problem is one of organization. How will you order all of the different data, algorithms, and control implied by your requirements? With a small program of a few hundred lines, you can get away without much organization, but as programs scale, they quickly become impossible to manage alone, let alone with multiple developers. Much of this challenge occurs because requirements change , and every time they do, code has to change to accommodate. The more code there is and the more entangled it is, the harder it is to change and more likely you are to break things.

This is where architecture comes in. Architecture is a way of organizing code, just like building architecture is a way of organizing space. The idea of software architecture has at its foundation a principle of information hiding : the less a part of a program knows about other parts of a program, the easier it is to change. The most popular information hiding strategy is encapsulation : this is the idea of designing self-contained abstractions with well-defined interfaces that separate different concerns in a program. Programming languages offer encapsulation support through things like functions and classes , which encapsulate data and functionality together. Another programming language encapsulation method is scoping , which hides variables and other names from other parts of program outside a scope. All of these strategies attempt to encourage developers to maximize information hiding and separation of concerns. If you get your encapsulation right, you should be able to easily make changes to a program’s behavior without having to change everything about its implementation.

When encapsulation strategies fail, one can end up with what some affectionately call a “ball of mud” architecture or “spaghetti code”. Ball of mud architectures have no apparent organization, which makes it difficult to comprehend how parts of its implementation interact. A more precise concept that can help explain this disorder is cross-cutting concerns , which are things like features and functionality that span multiple different components of a system, or even an entire system. There is some evidence that cross-cutting concerns can lead to difficulties in program comprehension and long-term design degradation ¹⁷ ¹⁷

Robert J. Walker, Shreya Rawal, and Jonathan Sillito (2012). Do crosscutting concerns cause modularity problems?. ACM SIGSOFT Foundations of Software Engineering (FSE).

, all of which reduce productivity and increase the risk of defects. As long-lived systems get harder to change, they can take on technical debt , which is the degree to which an implementation is out of sync with a team’s understanding of what a product is intended to be. Many developers view such debt as emerging from primarily from poor architectural decisions ⁶ ⁶

Neil A. Ernst, Stephany Bellomo, Ipek Ozkaya, Robert L. Nord, and Ian Gorton (2015). Measure it? Manage it? Ignore it? Software practitioners and technical debt. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. Over time, this debt can further result in organizational challenges ⁹ ⁹

Ravi Khadka, Belfrit V. Batlajery, Amir M. Saeidi, Slinger Jansen, and Jurriaan Hage (2014). How do professionals perceive legacy systems and software modernization?. ACM/IEEE International Conference on Software Engineering.

, making change even more difficult.

The preventative solution to this problems is to try to design architecture up front, mitigating the various risks that come from cross-cutting concerns (defects, low modifiability, etc.) ⁷ ⁷

George Fairbanks (2010). Just enough software architecture: a risk-driven approach. Marshall & Brainerd.

. A popular method in the 1990’s was the Unified Modeling Language (UML), which was a series of notations for expressing the architectural design of a system before implementing it. Recent studies show that UML was generally not used and generally not universal ¹³ ¹³

Marian Petre (2013). UML in practice. ACM/IEEE International Conference on Software Engineering.

. While these formal representations have generally not been adopted, informal, natural language architectural specifications are still widely used. For example, Google engineers write design specifications to sort through ambiguities, consider alternatives, and clarify the volume of work required. A study of developers’ perceptions of the value of documentation also reinforced that many forms of documentation, including code comments, style guides, requirements specifications, installation guides, and API references, are viewed as critical, and are only viewed as less valuable because teams do not adequately maintain them ² ²

Emad Aghajani, Csaba Nagy, Mario Linares-Vásquez, Laura Moreno, Gabriele Bavota, Michele Lanza, David C. Shepherd (2020). Software documentation: the practitioners' perspective. ACM/IEEE International Conference on Software Engineering.

More recent developers have investigated ideas of architectural styles , which are patterns of interactions and information exchange between encapsulated components. Some common architectural styles include:

Client/server , in which data is transacted in response to requests. This is the basis of the Internet and cloud computing ⁵ ⁵
Jürgen Cito, Philipp Leitner, Thomas Fritz, and Harald C. Gall (2015). The making of cloud applications: an empirical study on software development for the cloud. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
.
Pipe and filter , in which data is passed from component to component, and transformed and filtered along the way. Command lines, compilers, and machine learned programs are examples of pipe and filter architectures.
Model-view-controller (MVC) , in which data is separated from views of the data and from manipulations of data. Nearly all user interface toolkits use MVC, including popular modern frameworks such as React.
Peer to peer (P2P) , in which components transact data through a distributed standard interface. Examples include Bitcoin, Spotify, and Gnutella.
Event-driven , in which some components “broadcast” events and others “subscribe” to notifications of these events. Examples include most model-view-controller-based user interface frameworks, which have models broadcast change events to subscribers. For example, views may subscribe to models so they may update themselves to render new model state each time it changes.

Architectural styles come in all shapes and sizes. Some are smaller design patterns of information sharing ⁴ ⁴

Kent Beck, Ron Crocker, Gerard Meszaros, John Vlissides, James O. Coplien, Lutz Dominick, and Frances Paulisch (1996). Industrial experience with design patterns. ACM/IEEE International Conference on Software Engineering.

, whereas others are ubiquitous but specialized patterns such as the architectures required to support undo and cancel in user interfaces ³ ³

Len Bass, Bonnie E. John (2003). Linking usability to software architecture patterns through general scenarios. Journal of Systems and Software.

One fundamental unit of which an architecture is composed is a component . This is basically a word that refers to any abstraction—any code, really—that attempts to encapsulate some well defined functionality or behavior separate from other functionality and behavior. For example, consider the Java class Math : it encapsulates a wide range of related mathematical functions. This class has an interface that decide how it can communicate with other components (sending arguments to a math function and getting a return value). Components can be more than classes though: they might be a data structure, a set of functions, a library, an API, or even something like a web service. All of these are abstractions that encapsulate interrelated computation and state for some well-define purpose.

The second fundamental unit of architecture is connectors . Connectors are code that transmit information between components. They’re brokers that connect components, but do not necessarily have meaningful behaviors or states of their own. Connectors can be things like function calls, web service API calls, events, requests, and so on. None of these mechanisms store state or functionality themselves; instead, they are the things that tie components functionality and state together.

Even with carefully selected architectures, systems can still be difficult to put together, leading to architectural mismatch ⁸ ⁸

Garlan, D., Allen, R., & Ockerbloom, J (1995). Architectural mismatch or why it's hard to build systems out of existing parts. ACM/IEEE International Conference on Software Engineering.

. When mismatch occurs, connecting two styles can require dramatic amounts of code to connect, imposing significant risk of defects and cost of maintenance. One common example of mismatches occurs with the ubiquitous use of database schemas with client/server web-applications. A single change in a database schema can often result in dramatic changes in an application, as every line of code that uses that part of the scheme either directly or indirectly must be updated ¹⁵ ¹⁵

Dong Qiu, Bixin Li, and Zhendong Su (2013). An empirical analysis of the co-evolution of schema and code in database applications. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. This kind of mismatch occurs because the component that manages data (the database) and the component that renders data (the user interface) are highly “coupled” with the database schema: the user interface needs to know a lot about the data, its meaning, and its structure in order to render it meaningfully.

The most common approach to dealing with both architectural mismatch and the changing of requirements over time is refactoring , which means changing the architecture of an implementation without changing its behavior. Refactoring is something most developers do as part of changing a system ¹¹^,¹⁶ ¹¹

Emerson Murphy-Hill, Chris Parnin, and Andrew P. Black (2009). How we refactor, and how we know it. ACM/IEEE International Conference on Software Engineering.

¹⁶

Danilo Silva, Nikolaos Tsantalis, and Marco Tulio Valente (2016). Why we refactor? Confessions of GitHub contributors. ACM SIGSOFT Foundations of Software Engineering (FSE).

. Refactoring code to eliminate mismatch and technical debt can simplify change in the future, saving time ¹² ¹²

T. H. Ng, S. C. Cheung, W. K. Chan, and Y. T. Yu (2006). Work experience versus refactoring to design patterns: a controlled experiment. ACM SIGSOFT Foundations of Software Engineering (FSE).

and preventing future defects ¹⁰ ¹⁰

Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).

. However, because refactoring remains challenging, the difficulty of changing an architecture is often used as a rationale for rejecting demands for change from users. For example, Google does not allow one to change their Gmail address, which greatly harms people who have changed their name (such as this author when she came out as a trans woman), forcing them to either live with an address that includes their old name, or abandon their Google account, with no ability to transfer documents or settings. The rationale for this has nothing to do with policy and everything to do with the fact that the original architecture of Gmail treats the email address as a stable, unique identifier for an account. Changing this basic assumption throughout Gmail’s implementation would be an immense refactoring task.

Research on the actual activity of software architecture is actually somewhat sparse. One of the more recent syntheses of this work is Petre et al.’s book, Software Design Decoded ¹⁴ ¹⁴

Marian Petre, André van der Hoek (2016). Software design decoded: 66 ways experts think. MIT Press.

, which distills many of the practices and skills of software design into a set of succinct ideas. For example, the book states, “ Every design problem has multiple, if not infinite, ways of solving it. Experts strongly prefer simpler solutions over complex ones, for they know that such solutions are easier to understand and change in the future. ” And yet, in practice, studies of how projects use APIs often show that developers do the exact opposite, building projects with dependencies on large numbers of sometimes trivial APIs. Some behavior suggests that while software architects like simplicity of implementation, software developers are often choosing whatever is easiest to build, rather than whatever is least risky to maintain over time ¹ ¹

Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (2017). Why do developers use trivial packages? An empirical case study on npm. ACM SIGSOFT Foundations of Software Engineering (FSE).

References

Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, and Emad Shihab (2017). Why do developers use trivial packages? An empirical case study on npm. ACM SIGSOFT Foundations of Software Engineering (FSE).
Emad Aghajani, Csaba Nagy, Mario Linares-Vásquez, Laura Moreno, Gabriele Bavota, Michele Lanza, David C. Shepherd (2020). Software documentation: the practitioners' perspective. ACM/IEEE International Conference on Software Engineering.
Len Bass, Bonnie E. John (2003). Linking usability to software architecture patterns through general scenarios. Journal of Systems and Software.
Kent Beck, Ron Crocker, Gerard Meszaros, John Vlissides, James O. Coplien, Lutz Dominick, and Frances Paulisch (1996). Industrial experience with design patterns. ACM/IEEE International Conference on Software Engineering.
Jürgen Cito, Philipp Leitner, Thomas Fritz, and Harald C. Gall (2015). The making of cloud applications: an empirical study on software development for the cloud. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Neil A. Ernst, Stephany Bellomo, Ipek Ozkaya, Robert L. Nord, and Ian Gorton (2015). Measure it? Manage it? Ignore it? Software practitioners and technical debt. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
George Fairbanks (2010). Just enough software architecture: a risk-driven approach. Marshall & Brainerd.
Garlan, D., Allen, R., & Ockerbloom, J (1995). Architectural mismatch or why it's hard to build systems out of existing parts. ACM/IEEE International Conference on Software Engineering.
Ravi Khadka, Belfrit V. Batlajery, Amir M. Saeidi, Slinger Jansen, and Jurriaan Hage (2014). How do professionals perceive legacy systems and software modernization?. ACM/IEEE International Conference on Software Engineering.
Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).
Emerson Murphy-Hill, Chris Parnin, and Andrew P. Black (2009). How we refactor, and how we know it. ACM/IEEE International Conference on Software Engineering.
T. H. Ng, S. C. Cheung, W. K. Chan, and Y. T. Yu (2006). Work experience versus refactoring to design patterns: a controlled experiment. ACM SIGSOFT Foundations of Software Engineering (FSE).
Marian Petre (2013). UML in practice. ACM/IEEE International Conference on Software Engineering.
Marian Petre, André van der Hoek (2016). Software design decoded: 66 ways experts think. MIT Press.
Dong Qiu, Bixin Li, and Zhendong Su (2013). An empirical analysis of the co-evolution of schema and code in database applications. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Danilo Silva, Nikolaos Tsantalis, and Marco Tulio Valente (2016). Why we refactor? Confessions of GitHub contributors. ACM SIGSOFT Foundations of Software Engineering (FSE).
Robert J. Walker, Shreya Rawal, and Jonathan Sillito (2012). Do crosscutting concerns cause modularity problems?. ACM SIGSOFT Foundations of Software Engineering (FSE).

Specifications add a layer of detail onto architectural plans.

Chapter 8

Specifications

by Amy J. Ko

When you make something with code, you’re probably used to figuring out a design as you go. You write a function, you choose some arguments, and if you don’t like what you see, perhaps you add a new argument to that function and test again. This cowboy coding as some people like to call it can be great fun! It allows systems to emerge more organically, as you iteratively see your front-end design emerge, the design of your implementation emerges too, co-evolving with how you’re feeling about the final product.

As you’ve probably noticed by now, this type of process doesn’t really scale, even when you’re working with just a few other people. That argument you added? You just broke a bunch of functions one of your teammates was planning and when she commits her code, now she gets merge conflicts, which cost her an hour to fix because she has to catch up to whatever design change you made. This lack of planning quickly turns into an uncoordinated mess of individual decision making. Suddenly you’re spending all of your time cleaning up coordination messes instead of writing code.

The techniques we’ve discussed so far for avoiding this boil down to specifying what code should do, so everyone can write code according to a plan. We’ve talked about requirements specifications , which are declarations of what software must do from a users’ perspective. We’ve also talked about architectural specifications , which are high-level declarations of how code will be organized, encapsulated, and coordinated. At the lowest level are functional specifications , which are declarations about the properties of input and output of functions in a program .

In their simplest form, a functional specification can be just some natural language that says what an individual function is supposed to do:

// Return the smaller of the two numbers, 
// or if they're equal, the second number.
function min(a, b) {
  return a < b ? a : b;
}

This comment achieves the core purpose of a specification: to help other developers understand what the requirements and intended behavior of a function are. As long as everyone sticks to this “plan” (everyone calls the function with only numbers and the function always returns the smaller of them), then there shouldn’t be any problems.

The comment above is okay, but it’s not very precise. It says what is returned and what properties it has, but it only implies that numbers are allowed without saying anything about what kind of numbers. Are decimals allowed or just integers? What about not-a-number (the result of dividing 1 by 0). Or infinity?

To make these clearer, many languages use static typing to allow developers to specify types explicitly:

// Return the smaller of the two integers, or if they're equal, the second number.
function min(int a, int b) {
  return a < b ? a : b;
}

Because an int is well-defined in most languages, the two inputs to the function are well-defined.

Of course, if the above was JavaScript code (which doesn’t support static typing), JavaScript does nothing to actually verify that the data given to min() are actually integers. It’s entirely fine with someone sending a string and an object. This probably won’t do what you intended, leading to defects.

This brings us to a second purpose of writing functional specifications: to help verify that functions, their input, and their output are correct. Tests of functions and other low-level procedures are called unit tests . There are many ways to use specifications to verify correctness. By far, one of the simplest and most widely used kinds of unit tests are assertions ³ ³

Clarke, L. A., & Rosenblum, D. S (2006). A historical perspective on runtime assertion checking in software development. ACM SIGSOFT Software Engineering Notes.

. Assertions consist of two things: 1) a check on some property of a function’s input or output and 2) some action to notify about violations of these properties. For example, if we wanted to verify that the JavaScript function above had integer values as inputs, we would do this:

// Return the smaller of the two numbers, or if they're equal, the second number.
function min(a, b) {
  if(!Number.isInteger(a)) 
    alert("First input to min() isn't an integer!");
  if(!Number.isInteger(b)) 
    alert("Second input to min() isn't an integer!");
  return a < b ? a : b;
}

These two new lines of code are essentially functional specifications that declare “ If either of those inputs is not an integer, the caller of this function is doing something wrong ”. This is useful to declare, but assertions have a bunch of problems: if your program can send a non-integer value to min, but you never test it in a way that does, you’ll never see those alerts. This form of dynamic verification is therefore very limited, since it provides weaker guarantees about correctness. That said, a study of the use of assertions in a large database of GitHub projects shows that use of assertions is related to fewer defects ¹ ¹

Casey Casalnuovo, Prem Devanbu, Abilio Oliveira, Vladimir Filkov, and Baishakhi Ray (2015). Assert use in GitHub projects. ACM/IEEE International Conference on Software Engineering.

(though note that I said “related”: we have no evidence that assertions actually prevent defects. It may be possible that developers who use assertions are just better at avoiding defects.)

Assertions are related to the broader category of error handling language features. Error handling includes assertions, but also programming language features like exceptions and exception handlers. Error handling is a form of specification in that checking for errors usually entails explicitly specifying the conditions that determine an error. For example, in the code above, the condition Number.isInteger(a) specifies that the parameter a must be an integer. Other exception handling code such as the Java throws statement indicates the cases in which errors can occur and the corresponding catch statement indicates what is to done about errors. It is difficult to implement good exception handling that provides granular, clear ways of recovering from errors ² ²

Chien-Tsun Chen, Yu Chin Cheng, Chin-Yun Hsieh, and I-Lang Wu (2008). Exception handling refactorings: Directed by goals and driven by bug fixing. Journal of Systems and Software.

. Evidence shows that modern developers are still exceptionally bad at designing for errors; one study found that errors are not designed for, few errors are tested for, and exception handling is often overly general, providing little ability for users to understand errors or for developers to debug them ⁴ ⁴

Felipe Eberta, Fernando Castora, Alexander Serebrenik (2015). An exploratory study on exception handling bugs in Java programs. Journal of Systems and Software.

. These difficulties appear to be because it is difficult to imagine the vast range of errors that can occur ⁶ ⁶

Maxion, Roy A., and Robert T. Olszewski (2000). Eliminating exception handling errors with dependability cases: a comparative, empirical study. IEEE Transactions on Software Engineering.

Researchers have invented many forms of specification that require more work and more thought to write, but can be used to make stronger guarantees about correctness ⁹ ⁹

Jim Woodcock, Peter Gorm Larsen, Juan Bicarregui, and John Fitzgerald (2009). Formal methods: Practice and experience. ACM Computing Surveys.

. For example, many languages support the expression of formal pre-conditions and post-conditions that represent contracts that must be kept for the program to be correct. ( Formal means mathematical, facilitating mathematical proofs that these conditions are met). Because these contracts are essentially mathematical promises, we can build tools that automatically read a function’s code and verify that what it computes exhibits those mathematical properties using automated theorem proving systems. For example, suppose we wrote some formal specifications for our example above to replace our assertions (using a fictional notation for illustration purposes):

// pre-conditions: a in Integers, b in Integers
// post-conditions: result <= a and result <= b
function min(a, b) {
  return a < b ? a : b;
}

The annotations above require that, no matter what, the inputs have to be integers and the output has to be less than or equal to both values. The automatic theorem prover can then start with the claim that result is always less than or equal to both and begin searching for a counterexample. Can you find a counterexample? Really try. Think about what you’re doing while you try: you’re probably experimenting with different inputs to identify arguments that violate the contract. That’s similar to what automatic theorem provers do, but they use many tricks to explore large possible spaces of inputs all at once, and they do it very quickly.

There are definite tradeoffs with writing detailed, formal specifications. The benefits are clear: many companies have written formal functional specifications in order to make completely unambiguous the required behavior of their code, particularly systems that are capable of killing people or losing money, such as flight automation software, banking systems, and even compilers that create executables from code ⁹ ⁹

Jim Woodcock, Peter Gorm Larsen, Juan Bicarregui, and John Fitzgerald (2009). Formal methods: Practice and experience. ACM Computing Surveys.

. In these settings, it’s worth the effort of being 100% certain that the program is correct because if it’s not, people can die. Specifications can have other benefits. The very act of writing down what you expect a function to do in the form of test cases can slow developers down, causing to reflect more carefully and systematically about exactly what they expect a function to do ⁵ ⁵

Davide Fucci, Hakan Erdogmus, Burak Turhan, Markku Oivo, Natalia Juristo (2016). A dissection of test-driven development: Does it really matter to test-first or to test-last?. IEEE Transactions on Software Engineering.

. Perhaps if this is true in general, there’s value in simply stepping back before you write a function, mapping out pre-conditions and post-conditions in the form of simple natural language comments, and then writing the function to match your intentions.

Writing formal specifications can also have downsides. When the consequences of software failure aren’t so high, the difficulty and time required to write and maintain functional specifications may not be worth the effort ⁷ ⁷

Marian Petre (2013). UML in practice. ACM/IEEE International Conference on Software Engineering.

. These barriers deter many developers from writing them ⁸ ⁸

Todd W. Schiller, Kellen Donohue, Forrest Coward, and Michael D. Ernst (2014). Case studies and tools for contract specifications. ACM/IEEE International Conference on Software Engineering.

. Formal specifications can also warp the types of data that developers work with. For example, it is much easier to write formal specifications about Boolean values and integers than string values. This can lead engineers to be overly reductive in how they model data (e.g., using binary models of gender instead of more inclusive non-binary and multidimensional ones).

References

Casey Casalnuovo, Prem Devanbu, Abilio Oliveira, Vladimir Filkov, and Baishakhi Ray (2015). Assert use in GitHub projects. ACM/IEEE International Conference on Software Engineering.
Chien-Tsun Chen, Yu Chin Cheng, Chin-Yun Hsieh, and I-Lang Wu (2008). Exception handling refactorings: Directed by goals and driven by bug fixing. Journal of Systems and Software.
Clarke, L. A., & Rosenblum, D. S (2006). A historical perspective on runtime assertion checking in software development. ACM SIGSOFT Software Engineering Notes.
Felipe Eberta, Fernando Castora, Alexander Serebrenik (2015). An exploratory study on exception handling bugs in Java programs. Journal of Systems and Software.
Davide Fucci, Hakan Erdogmus, Burak Turhan, Markku Oivo, Natalia Juristo (2016). A dissection of test-driven development: Does it really matter to test-first or to test-last?. IEEE Transactions on Software Engineering.
Maxion, Roy A., and Robert T. Olszewski (2000). Eliminating exception handling errors with dependability cases: a comparative, empirical study. IEEE Transactions on Software Engineering.
Marian Petre (2013). UML in practice. ACM/IEEE International Conference on Software Engineering.
Todd W. Schiller, Kellen Donohue, Forrest Coward, and Michael D. Ernst (2014). Case studies and tools for contract specifications. ACM/IEEE International Conference on Software Engineering.
Jim Woodcock, Peter Gorm Larsen, Juan Bicarregui, and John Fitzgerald (2009). Formal methods: Practice and experience. ACM Computing Surveys.

Good process is like a river, seamlessly flowing around obstacles

Chapter 9

Process

by Amy J. Ko

So you know what you’re going to build and how you’re going to build it. What process should you go about building it? Who’s going to build what? What order should you build it in? How do you make sure everyone is in sync while you’re building it? ²² ²²

Tim Pettersen (2016). Git Workflows with Tim Pettersen. Software Engineering Daily Podcast.

And most importantly, how to do you make sure you build well and on time? These are fundamental questions in software engineering with many potential answers. Unfortunately, we still don’t know which of those answers are right.

At the foundation of all of these questions are basic matters of project management : plan, execute, and monitor. But developers in the 1970’s and on found that traditional project management ideas didn’t seem to work. The earliest process ideas followed a “waterfall” model, in which a project begins by identifying requirements, writing specifications, implementing, testing, and releasing, all under the assumption that every stage could be fully tested and verified. (Recognize this? It’s the order of topics we’re discussing in this class!). Many managers seemed to like the waterfall model because it seemed structured and predictable; however, because most managers were originally software developers, they preferred a structured approach to project management ³¹ ³¹

Gerald M. Weinberg (1982). Over-structured management of software engineering. ACM/IEEE International Conference on Software Engineering.

. The reality, however, was that no matter how much verification one did of each of these steps, there always seemed to be more information in later steps that caused a team to reconsider its earlier decision (e.g., imagine a customer liked a requirement when it was described in the abstract, but when it was actually built, they rejected it, because they finally saw what the requirement really meant).

In 1988, Barry Boehm proposed an alternative to waterfall called the Spiral model ⁴ ⁴

Barry W. Boehm (1988). A spiral model of software development and enhancement. IEEE Computer.

: rather than trying to verify every step before proceeding to the next level of detail, prototype every step along the way, getting partial validation, iteratively converging through a series of prototypes toward both an acceptable set of requirements and an acceptable product. Throughout, risk assessment is key, encouraging a team to reflect and revise process based on what they are learning. What was important about these ideas were not the particulars of Boehm’s proposed process, but the disruptive idea that iteration and process improvement are critical to engineering great software.

A spiral, showing successive rounds of prototping and risk analysis. — Boehm’s spiral model of software development.
Boehm

Around the same time, two influential books were published. Fred Brooks wrote The Mythical Man Month ⁶ ⁶

Fred P. Brooks (1995). The mythical man month. Pearson Education.

, a book about software project management, full of provocative ideas that would be tested over the next three decades, including the idea that adding more people to a project would not necessarily increase productivity. Tom DeMarco and Timothy Lister wrote another famous book, Peopleware: Productive Projects and Teams ⁷ ⁷

Tom DeMarco, Tim Lister (1987). Peopleware: Productive projects and teams. Addison-Wesley.

arguing that the major challenges in software engineering are human, not technical. Both of these works still represent some of the most widely-read statements of the problem of managing software development.

These early ideas in software project management led to a wide variety of other discoveries about process. For example, organizations of all sizes can improve their process if they are very aware of what the people in the organization know, what it’s capable of learning, and if it builds robust processes to actually continually improve process ⁹^,¹⁰ ⁹

Tore Dybå (2002). Enabling software process improvement: an investigation of the importance of organizational issues. Empirical Software Engineering.

¹⁰

Tore Dybå (2003). Factors of software process improvement success in small and large organizations: an empirical study in the scandinavian context. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. This might mean monitoring the pace of work, incentivizing engineers to reflect on inefficiencies in process, and teaching engineers how to be comfortable with process change.

Beyond process improvement, other factors emerged. For example, researchers discovered that critical to team productivity was awareness of teammates’ work ¹⁶ ¹⁶

Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.

. Teams need tools like dashboards to help make them aware of changing priorities and tools like feeds to coordinate short term work ²⁹ ²⁹

Christoph Treude and Margaret-Anne Storey (2010). Awareness 2.0: staying aware of projects, developers and tasks using dashboards and feeds. ACM/IEEE International Conference on Software Engineering.

. Moreover, researchers found that engineers tended to favor non-social sources such as documentation for factual information, but social sources for information to support problem solving ¹⁹ ¹⁹

Allen E. Milewski (2007). Global and task effects in information-seeking among software engineers. Empirical Software Engineering.

. Decades ago, developers used tools like email and IRC for awareness; now they use tools like Slack , Trello , GitHub , and JIRA , which have the same basic functionality, but are much more polished, streamlined, and customizable.

In addition to awareness, ownership is a critical idea in process. This is the idea that for every line of code, someone is responsible for its quality. The owner might be the person who originally wrote the code, but it could also shift to new team members. Studies of code ownership on Windows Vista and Windows 7 found that less a component had a clear owner, the more pre-release defects it had and the more post-release failures were reported by users ³ ³

Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu (2011). Don't touch my code! Examining the effects of ownership on software quality. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. This means that in addition to getting code written, having clear ownership and clear processes for transfer of ownership are key to functional correctness.

Pace is another factor that affects quality. Clearly, there’s a tradeoff between how fast a team works and the quality of the product it can build. In fact, interview studies of engineers at Google, Facebook, Microsoft, Intel, and other large companies found that the pressure to reduce “time to market” harmed nearly every aspect of teamwork: the availability and discoverability of information, clear communication, planning, integration with others’ work, and code ownership ²⁴ ²⁴

Julia Rubin and Martin Rinard (2016). The challenges of staying together while moving fast: an exploratory study. ACM/IEEE International Conference on Software Engineering.

. Not only did a fast pace reduce quality, but it also reduced engineers’ personal satisfaction with their job and their work. I encountered similar issues as CTO of my startup: while racing to market, I was often asked to meet impossible deadlines with zero defects and had to constantly communicate to the other executives in the company why this was not possible ¹⁷ ¹⁷

Amy J. Ko (2017). A Three-Year Participant Observation of Software Startup Software Evolution. ACM/IEEE International Conference on Software Engineering, Software Engineering in Practice.

Because of the importance of awareness and communication, the distance between teammates is also a critical factor. This is most visible in companies that hire remote developers, building distributed teams, or when teams are fully distributed (such as when there is a pandemic requiring social distancing). One motivation for doing this is to reduce costs or gain access to engineering talent that is distant from a team’s geographical center, but over time, companies have found that doing so necessitates significant investments in socialization to ensure quality, minimizing geographical, temporal and cultural separation ²⁶ ²⁶

Darja Šmite, Claes Wohlin, Tony Gorschek, Robert Feldt (2010). Empirical evidence in global software engineering: a systematic review. Empirical Software Engineering.

. Researchers have found that there appear to be fundamental tradeoffs between productivity, quality, and/or profits in these settings ²³ ²³

Narayan Ramasubbu, Marcelo Cataldo, Rajesh Krishna Balan, and James D. Herbsleb (2011). Configuring global software teams: a multi-company analysis of project productivity, quality, and profits. ACM/IEEE International Conference on Software Engineering.

. For example, more distance appears to lead to slower communication ³⁰ ³⁰

Patrick Wagstrom and Subhajit Datta (2014). Does latitude hurt while longitude kills? Geographical and temporal separation in a large scale software development project. ACM/IEEE International Conference on Software Engineering.

. Despite these tradeoffs, most rigorous studies of the cost of distributed development have found that when companies work hard to minimize temporal and cultural separation, the actual impact on defects was small ¹⁸ ¹⁸

Ekrem Kocaguneli, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan, and Tim Menzies (2013). Distributed development considered harmful?. ACM/IEEE International Conference on Software Engineering.

. These efforts to minimize separation include more structured onboarding practices, more structured communication, and more structured processes, as well as systematic efforts to build and maintain trusting social relationships. Some researchers have begun to explore even more extreme models of distributed development, hiring contract developers to complete microtasks over a few days without hiring them as employees; early studies suggest that these models have the worst of outcomes, with greater costs, poor scalability, and more significant quality issues ²⁷ ²⁷

Klaas-Jan Stol and Brian Fitzgerald (2014). Two's company, three's a crowd: a case study of crowdsourcing software development. ACM/IEEE International Conference on Software Engineering.

A critical part of ensuring all that a team is successful is having someone responsible for managing these factors of distance, pace, ownership, awareness, and overall process. The most obvious person to oversee this is, of course, a project manager ⁵^,²¹ ⁵

Mike Borozdin (2017). Engineering management with Mike Borozdin. Software Engineering Daily Podcast.

²¹

Jeff Norris (2016). Tech leadership with Jeff Norris. Software Engineering Daily Podcast.

. Research on what skills software engineering project managers need suggests that while some technical knowledge is necessary, it the soft skills necessary for managing all of these factors in communication and coordination that distinguish great managers ¹⁵ ¹⁵

Eirini Kalliamvakou, Christian Bird, Thomas Zimmermann, Andrew Begel, Robert DeLine, Daniel M. German (2017). What makes a great manager of software engineers?. IEEE Transactions on Software Engineering.

While all of this research has strong implications for practice, industry has largely explored its own ideas about process, devising frameworks that addressed issues of distance, pace, ownership, awareness, and process improvement. Extreme Programming ² ²

Kent Beck (1999). Embracing change with extreme programming. IEEE Computer.

was one of these frameworks and it was full of ideas:

Be iterative
Do small releases
Keep design simple
Write unit tests
Refactor to iterate
Use pair programming
Integrate continuously
Everyone owns everything
Use an open workspace
Work sane hours

Note that none of these had any empirical evidence to back them. Moreover, Beck described in his original proposal that these ideas were best for “ outsourced or in-house development of small- to medium-sized systems where requirements are vague and likely to change ”, but as industry often does, it began hyping it as a universal solution to software project management woes and adopted all kinds of combinations of these ideas, adapting them to their existing processes. In reality, the value of XP appears to depend on highly project-specific factors ²⁰ ²⁰

Matthias M. Müller and Frank Padberg (2003). On the economic evaluation of XP projects. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

, while the core ideas that industry has adopted are valuing feedback, communication, simplicity, and respect for individuals and the team ²⁵ ²⁵

Helen Sharp, Hugh Robinson (2004). An ethnographic study of XP practice. Empirical Software Engineering.

. Researchers continue to investigate the merits of the list above; for example, numerous studies have investigated the effects of pair programming on defects, finding small but measurable benefits ⁸ ⁸

di Bella, E., Fronza, I., Phaphoom, N., Sillitti, A., Succi, G., & Vlasenko, J (2013). Pair programming and software defects—A large, industrial case study. IEEE Transactions on Software Engineering.

At the same time, Beck began also espousing the idea of “Agile” methods , which celebrated many of the values underlying Extreme Programming, such as focusing on individuals, keeping things simple, collaborating with customers, and being iterative. This idea of being agile was even more popular and spread widely in industry and research, even though many of the same ideas appeared much earlier in Boehm’s work on the Spiral model. Researchers found that Agile methods can increase developer enthusiasm ²⁸ ²⁸

Sharifah Syed-Abdullah, Mike Holcombe & Marian Gheorge (2006). The impact of an agile methodology on the well being of development teams. Empirical Software Engineering.

, that agile teams need different roles such as Mentor, Co-ordinator, Translator, Champion, Promoter, and Terminator ¹³ ¹³

Rashina Hoda, James Noble, and Stuart Marshall (2010). Organizing self-organizing teams. ACM/IEEE International Conference on Software Engineering.

, and that teams are combing agile methods with all kinds of process ideas from other project management frameworks such as Scrum (meet daily to plan work, plan two-week sprints, maintain a backlog of work) and Kanban (visualize the workflow, limit work-in-progress, manage flow, make policies explicit, and implement feedback loops) ¹ ¹

Osama Al-Baik & James Miller (2015). The kanban approach, between agility and leanness: a systematic review. Empirical Software Engineering.

. Research has also found that transitioning a team to Agile methods is slow and complex because it requires everyone on a team to change their behavior, beliefs, and practices ¹⁴ ¹⁴

Rashina Hoda, James Noble (2017). Becoming agile: a grounded theory of agile transitions in practice. ACM/IEEE International Conference on Software Engineering.

Ultimately, all of this energy around process ideas in industry is exciting, but disorganized. None of these efforts really get to the core of what makes software projects difficult to manage. One effort in research to get to this core by contributing new theories that explain these difficulties. The first is Herbsleb’s Socio-Technical Theory of Coordination (STTC) . The idea of the theory is quite simple: technical dependencies in engineering decisions (e.g., this function calls this other function, this data type stores this other data type) define the social constraints that the organization must solve in a variety of ways to build and maintain software ¹¹^,¹² ¹¹

James D. Herbsleb and Audris Mockus (2003). Formulation and preliminary test of an empirical theory of coordination in software engineering. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

¹²

James Herbsleb (2016). Building a socio-technical theory of coordination: why and how. ACM SIGSOFT Foundations of Software Engineering (FSE).

. The better the organization builds processes and awareness tools to ensure that the people who own those engineering dependencies are communicating and aware of each others’ work, the fewer defects that will occur. Herbsleb referred this alignment as sociotechnical congruence , and conducted a number of studies demonstrating its predictive and explanatory power.

I extended this idea to congruence with beliefs about product value ¹⁷ ¹⁷

Amy J. Ko (2017). A Three-Year Participant Observation of Software Startup Software Evolution. ACM/IEEE International Conference on Software Engineering, Software Engineering in Practice.

, claiming that successful software products require the constant, collective communication and agreement of a coherent proposition of a product’s value across UX, design, engineering, product, marketing, sales, support, and even customers. A team needs to achieve Herbsleb’s sociotechnical congruence to have a successful product, but that alone is not enough: the rest of the organization has to have a consistent understanding of what is being built and why, even as that understanding evolves over time.

References

Osama Al-Baik & James Miller (2015). The kanban approach, between agility and leanness: a systematic review. Empirical Software Engineering.
Kent Beck (1999). Embracing change with extreme programming. IEEE Computer.
Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu (2011). Don't touch my code! Examining the effects of ownership on software quality. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Barry W. Boehm (1988). A spiral model of software development and enhancement. IEEE Computer.
Mike Borozdin (2017). Engineering management with Mike Borozdin. Software Engineering Daily Podcast.
Fred P. Brooks (1995). The mythical man month. Pearson Education.
Tom DeMarco, Tim Lister (1987). Peopleware: Productive projects and teams. Addison-Wesley.
di Bella, E., Fronza, I., Phaphoom, N., Sillitti, A., Succi, G., & Vlasenko, J (2013). Pair programming and software defects—A large, industrial case study. IEEE Transactions on Software Engineering.
Tore Dybå (2002). Enabling software process improvement: an investigation of the importance of organizational issues. Empirical Software Engineering.
Tore Dybå (2003). Factors of software process improvement success in small and large organizations: an empirical study in the scandinavian context. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
James D. Herbsleb and Audris Mockus (2003). Formulation and preliminary test of an empirical theory of coordination in software engineering. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
James Herbsleb (2016). Building a socio-technical theory of coordination: why and how. ACM SIGSOFT Foundations of Software Engineering (FSE).
Rashina Hoda, James Noble, and Stuart Marshall (2010). Organizing self-organizing teams. ACM/IEEE International Conference on Software Engineering.
Rashina Hoda, James Noble (2017). Becoming agile: a grounded theory of agile transitions in practice. ACM/IEEE International Conference on Software Engineering.
Eirini Kalliamvakou, Christian Bird, Thomas Zimmermann, Andrew Begel, Robert DeLine, Daniel M. German (2017). What makes a great manager of software engineers?. IEEE Transactions on Software Engineering.
Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
Amy J. Ko (2017). A Three-Year Participant Observation of Software Startup Software Evolution. ACM/IEEE International Conference on Software Engineering, Software Engineering in Practice.
Ekrem Kocaguneli, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan, and Tim Menzies (2013). Distributed development considered harmful?. ACM/IEEE International Conference on Software Engineering.
Allen E. Milewski (2007). Global and task effects in information-seeking among software engineers. Empirical Software Engineering.
Matthias M. Müller and Frank Padberg (2003). On the economic evaluation of XP projects. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Jeff Norris (2016). Tech leadership with Jeff Norris. Software Engineering Daily Podcast.
Tim Pettersen (2016). Git Workflows with Tim Pettersen. Software Engineering Daily Podcast.
Narayan Ramasubbu, Marcelo Cataldo, Rajesh Krishna Balan, and James D. Herbsleb (2011). Configuring global software teams: a multi-company analysis of project productivity, quality, and profits. ACM/IEEE International Conference on Software Engineering.
Julia Rubin and Martin Rinard (2016). The challenges of staying together while moving fast: an exploratory study. ACM/IEEE International Conference on Software Engineering.
Helen Sharp, Hugh Robinson (2004). An ethnographic study of XP practice. Empirical Software Engineering.
Darja Šmite, Claes Wohlin, Tony Gorschek, Robert Feldt (2010). Empirical evidence in global software engineering: a systematic review. Empirical Software Engineering.
Klaas-Jan Stol and Brian Fitzgerald (2014). Two's company, three's a crowd: a case study of crowdsourcing software development. ACM/IEEE International Conference on Software Engineering.
Sharifah Syed-Abdullah, Mike Holcombe & Marian Gheorge (2006). The impact of an agile methodology on the well being of development teams. Empirical Software Engineering.
Christoph Treude and Margaret-Anne Storey (2010). Awareness 2.0: staying aware of projects, developers and tasks using dashboards and feeds. ACM/IEEE International Conference on Software Engineering.
Patrick Wagstrom and Subhajit Datta (2014). Does latitude hurt while longitude kills? Geographical and temporal separation in a large scale software development project. ACM/IEEE International Conference on Software Engineering.
Gerald M. Weinberg (1982). Over-structured management of software engineering. ACM/IEEE International Conference on Software Engineering.

Program comprehension is about understanding dependencies

Chapter 10

Comprehension

by Amy J. Ko

Despite all of the activities that we’ve talked about so far—communicating, coordinating, planning, designing, architecting—really, most of a software engineers time is spent reading code ¹⁶ ¹⁶

Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke (2014). On the comprehension of program comprehension. ACM Transactions on Software Engineering and Methodology (TOSEM).

. Sometimes this is their own code, which makes this reading easier. Most of the time, it is someone else’s code, whether it’s a teammate’s, or part of a library or API you’re using. We call this reading program comprehension .

Being good at program comprehension is a critical skill. You need to be able to read a function and know what it will do with its inputs; you need to be able to read a class and understand its state and functionality; you also need to be able to comprehend a whole implementation, understanding its architecture. Without these skills, you can’t test well, you can’t debug well, and you can’t fix or enhance the systems you’re building or maintaining. In fact, studies of software engineers’ first year at their first job show that a significant majority of their time is spent trying to simply comprehend the architecture of the system they are building or maintaining and understanding the processes that are being followed to modify and enhance them ⁶ ⁶

Barthélémy Dagenais, Harold Ossher, Rachel K. E. Bellamy, Martin P. Robillard, and Jacqueline P. de Vries (2010). Moving into a new software project landscape. ACM/IEEE International Conference on Software Engineering.

What’s going on when developers comprehend code? Usually, developers are trying to answer questions about code that help them build larger models of how a program works. Because program comprehension is hard, they avoid it when they can, relying on explanations from other developers rather than trying to build precise models of how a program works on their own ¹⁹ ¹⁹

Tobias Roehm, Rebecca Tiarks, Rainer Koschke, and Walid Maalej (2012). How do professional developers comprehend software?. ACM/IEEE International Conference on Software Engineering.

. When they do try to comprehend code, developers are usually trying to answer questions. Several studies have many general questions that developers must be able to answer in order to understand programs ¹⁴^,²² ¹⁴

Thomas D. LaToza, Brad A. Myers (2010). Developers ask reachability questions. ACM/IEEE International Conference on Software Engineering.

²²

Jonathan Sillito, Gail C. Murphy, and Kris De Volder (2006). Questions programmers ask during software evolution tasks. ACM SIGSOFT Foundations of Software Engineering (FSE).

. Here are nearly forty common questions that developers ask:

Which type represents this domain concept or this UI element or action?
Where in the code is the text in this error message or UI element?
Where is there any code involved in the implementation of this behavior?
Is there an entity named something like this in that unit (for example in a project, package or class)?
What are the parts of this type?
Which types is this type a part of?
Where does this type fit in the type hierarchy?
Does this type have any siblings in the type hierarchy?
Where is this field declared in the type hierarchy?
Who implements this interface or these abstract methods?
Where is this method called or type referenced?
When during the execution is this method called?
Where are instances of this class created?
Where is this variable or data structure being accessed?
What data can we access from this object?
What does the declaration or definition of this look like?
What are the arguments to this function?
What are the values of these arguments at runtime?
What data is being modified in this code?
How are instances of these types created and assembled?
How are these types or objects related?
How is this feature or concern (object ownership, UI control, etc) implemented?
What in this structure distinguishes these cases?
What is the “correct” way to use or access this data structure?
How does this data structure look at runtime?
How can data be passed to (or accessed at) this point in the code?
How is control getting (from here to) here?
Why isn’t control reaching this point in the code?
Which execution path is being taken in this case?
Under what circumstances is this method called or exception thrown?
What parts of this data structure are accessed in this code?
How does the system behavior vary over these types or cases?
What are the differences between these files or types?
What is the difference between these similar parts of the code (e.g., between sets of methods)?
What is the mapping between these UI types and these model types?
How can we know this object has been created and initialized correctly?

If you think about the diversity of questions in this list, you can see why program comprehension requires expertise. You not only need to understand programming languages quite well, but you also need to have strategies for answering all of the questions above (and more) quickly, effectively, and accurately.

So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question ¹³^,²³ ¹³

Thomas D. LaToza, David Garlan, James D. Herbsleb, and Brad A. Myers (2007). Program comprehension as fact finding. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

²³

Anneliese von Mayrhauser A. Marie Vans (1994). Comprehension processes during large scale maintenance. ACM/IEEE International Conference on Software Engineering.

. Reading and comprehending source code is fundamentally different from those of reading and comprehending natural language ⁴ ⁴

Dave Binkley, Marcia Davis, Dawn Lawrie, Jonathan I. Maletic, Christopher Morrell, Bonita Sharif (2013). The impact of identifier style on effort and comprehension. Empirical Software Engineering.

; what experts are doing is ultimately reasoning about dependencies between code ²⁴ ²⁴

Mark Weiser (1981). Program slicing. ACM/IEEE International Conference on Software Engineering.

. Dependencies include things like data dependencies (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and control dependencies (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next ⁸ ⁸

Scott D. Fleming, Christopher Scaffidi, David Piorkowski, Margaret M. Burnett, Rachel K. E. Bellamy (2013). An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks. ACM Transactions on Software Engineering and Methodology (TOSEM).

, suggesting that expert behavior is highly procedural. This work, and work explicitly investigating the role of identifier names ¹⁵ ¹⁵

Lawrie, D., Morrell, C., Feild, H., & Binkley, D (2006). What's in a name? A study of identifiers. IEEE International Conference on Program Comprehension (ICPC).

, finds that names are actually critical to facilitating higher level comprehension of program behavior.

Of course, program comprehension is not an inherently individual process either. Expert developers are resourceful, and frequently ask others for explanations of program behavior. Some of this might happen between coworkers, where someone seeking insight asks other engineers for summaries of program behavior, to accelerate their learning ¹¹ ¹¹

Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.

. Others might rely on public forums, such as Stack Overflow, for explanations of API behavior ¹⁷ ¹⁷

. These social help seeking strategies are strongly mediated by a developers’ willingness to express that they need help to more expert teammates. Some research, for example, has found that junior developers are reluctant to ask for help out of fear of looking incompetent, even when everyone on a team is willing to offer help and their manager prefers that the developer prioritize productivity over fear of stigma ² ²

Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.

. And then, of course, learning is just hard. For example, one study investigated the challenges that developers face in learning new programming languages, finding that unlearning old habits, shifting to new language paradigms, learning new terminology, and adjusting to new tools all required materials that could bridge from their prior knowledge to the new language, but few such materials existed ²¹ ²¹

Shrestha, N., Botta, C., Barik, T., & Parnin, C (2020). Here we go again: why is it difficult for developers to learn another programming language?. ACM/IEEE International Conference on Software Engineering.

. These findings suggest the critical importance of teams ensuring that newcomers view them as psychologically safe places, where vulnerable actions like expressing a need for help will not be punished, ridiculed, or shamed, but rather validated, celebrated, and encouraged.

While much of program comprehension is individual and social skill, some aspects of program comprehension are determined by the design of programming languages. For example, some programming languages result in programs that are more comprehensible. One framework called the Cognitive Dimensions of Notations ⁹ ⁹

Thomas R. G. Green (1989). Cognitive dimensions of notations. People and computers.

lays out some of the tradeoffs in programming language design that result in these differences in comprehensibility. For example, one of the dimensions in the framework is consistency , which refers to how much of a notation can be guessed based on an initial understanding of a language. JavaScript has low consistency because of operators like == , which behave differently depending on what the type of the left and right operands are. Knowing the behavior for Booleans doesn’t tell you the behavior for a Boolean being compared to an integer. In contrast, Java is a high consistency language: == is only ever valid when both operands are of the same type.

These differences in notation can have some impact. Encapsulation through data structures leads to better comprehension that monolithic or purely functional languages ³^,²⁵ ³

Pamela Bhattacharya and Iulian Neamtiu (2011). Assessing programming language impact on development and maintenance: a study on C and C++. ACM/IEEE International Conference on Software Engineering.

²⁵

Scott N. Woodfield, Hubert E. Dunsmore, and Vincent Y. Shen (1981). The effect of modularization and comments on program comprehension. ACM/IEEE International Conference on Software Engineering.

. Declarative programming paradigms (like CSS or HTML) have greater comprehensibility than imperative languages ²⁰ ²⁰

Guido Salvaneschi, Sven Amann, Sebastian Proksch, and Mira Mezini (2014). An empirical study on program comprehension with reactive programming. ACM SIGSOFT Foundations of Software Engineering (FSE).

. Statically typed languages like Java (which require developers to declare the data type of all variables) result in fewer defects ¹⁸ ¹⁸

Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu (2014). A large scale study of programming languages and code quality in GitHub. ACM SIGSOFT Foundations of Software Engineering (FSE).

, better comprehensibility because of the ability to construct better documentation ⁷ ⁷

Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik (2014). How do API documentation and static typing affect API usability?. ACM/IEEE International Conference on Software Engineering.

, and result in easier debugging ¹⁰ ¹⁰

Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, Andreas Stefik (2013). An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering.

. In fact, studies of more dynamic languages like JavaScript and Smalltalk ⁵ ⁵

Oscar Callaú, Romain Robbes, Éric Tanter, David Röthlisberger (2013). How (and why) developers use the dynamic features of programming languages: the case of Smalltalk. Empirical Software Engineering.

show that the dynamic features of these languages aren’t really used all that much anyway. Despite all of these measurable differences, the impact of notation seems to be modest in practice ¹⁸ ¹⁸

. All of this evidence suggests that that the more you tell a compiler about what your code means (by declaring types, writing functional specifications, etc.), the more it helps the other developers know what it means too, but that this doesn’t translate into huge differences in defects.

Code editors, development environments, and program comprehension tools can also be helpful. Early evidence showed that simple features like syntax highlighting and careful typographic choices can improve the speed of program comprehension ¹ ¹

Ron Baecker (1988). Enhancing program readability and comprehensibility with tools for program visualization. ACM/IEEE International Conference on Software Engineering.

. I have also worked on several tools to support program comprehension, including the Whyline, which automates many of the more challenging aspects of navigating dependencies in code, and visualizes them ¹² ¹²

Amy J. Ko and Brad A. Myers (2009). Finding causes of program output with the Java Whyline. ACM SIGCHI Conference on Human Factors in Computing (CHI).

The Whyline for Java, a debugging tool that faciliates dependency navigation

The path from novice to expert in program comprehension is one that involves understanding programming language semantics exceedingly well and reading a lot of code, design patterns, and architectures. Anticipate that as you develop these skills, it will take you time to build robust understandings of what a program is doing, slowing down your writing, testing, and debugging.

References

Ron Baecker (1988). Enhancing program readability and comprehensibility with tools for program visualization. ACM/IEEE International Conference on Software Engineering.
Andy Begel, Beth Simon (2008). Novice software developers, all over again. ICER.
Pamela Bhattacharya and Iulian Neamtiu (2011). Assessing programming language impact on development and maintenance: a study on C and C++. ACM/IEEE International Conference on Software Engineering.
Dave Binkley, Marcia Davis, Dawn Lawrie, Jonathan I. Maletic, Christopher Morrell, Bonita Sharif (2013). The impact of identifier style on effort and comprehension. Empirical Software Engineering.
Oscar Callaú, Romain Robbes, Éric Tanter, David Röthlisberger (2013). How (and why) developers use the dynamic features of programming languages: the case of Smalltalk. Empirical Software Engineering.
Barthélémy Dagenais, Harold Ossher, Rachel K. E. Bellamy, Martin P. Robillard, and Jacqueline P. de Vries (2010). Moving into a new software project landscape. ACM/IEEE International Conference on Software Engineering.
Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik (2014). How do API documentation and static typing affect API usability?. ACM/IEEE International Conference on Software Engineering.
Scott D. Fleming, Christopher Scaffidi, David Piorkowski, Margaret M. Burnett, Rachel K. E. Bellamy (2013). An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks. ACM Transactions on Software Engineering and Methodology (TOSEM).
Thomas R. G. Green (1989). Cognitive dimensions of notations. People and computers.
Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, Andreas Stefik (2013). An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering.
Amy J. Ko, Rob DeLine, and Gina Venolia (2007). Information needs in collocated software development teams. ACM/IEEE International Conference on Software Engineering.
Amy J. Ko and Brad A. Myers (2009). Finding causes of program output with the Java Whyline. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Thomas D. LaToza, David Garlan, James D. Herbsleb, and Brad A. Myers (2007). Program comprehension as fact finding. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Thomas D. LaToza, Brad A. Myers (2010). Developers ask reachability questions. ACM/IEEE International Conference on Software Engineering.
Lawrie, D., Morrell, C., Feild, H., & Binkley, D (2006). What's in a name? A study of identifiers. IEEE International Conference on Program Comprehension (ICPC).
Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke (2014). On the comprehension of program comprehension. ACM Transactions on Software Engineering and Methodology (TOSEM).
Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann (2011). Design lessons from the fastest Q&A site in the west. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu (2014). A large scale study of programming languages and code quality in GitHub. ACM SIGSOFT Foundations of Software Engineering (FSE).
Tobias Roehm, Rebecca Tiarks, Rainer Koschke, and Walid Maalej (2012). How do professional developers comprehend software?. ACM/IEEE International Conference on Software Engineering.
Guido Salvaneschi, Sven Amann, Sebastian Proksch, and Mira Mezini (2014). An empirical study on program comprehension with reactive programming. ACM SIGSOFT Foundations of Software Engineering (FSE).
Shrestha, N., Botta, C., Barik, T., & Parnin, C (2020). Here we go again: why is it difficult for developers to learn another programming language?. ACM/IEEE International Conference on Software Engineering.
Jonathan Sillito, Gail C. Murphy, and Kris De Volder (2006). Questions programmers ask during software evolution tasks. ACM SIGSOFT Foundations of Software Engineering (FSE).
Anneliese von Mayrhauser A. Marie Vans (1994). Comprehension processes during large scale maintenance. ACM/IEEE International Conference on Software Engineering.
Mark Weiser (1981). Program slicing. ACM/IEEE International Conference on Software Engineering.
Scott N. Woodfield, Hubert E. Dunsmore, and Vincent Y. Shen (1981). The effect of modularization and comments on program comprehension. ACM/IEEE International Conference on Software Engineering.

Have you met your requirements? How do you know?

Chapter 11

Verification

by Amy J. Ko

How do you know a program does what you intended?

Part of this is being clear about what you intended (by writing specifications , for example), but your intents, however clear, are not enough: you need evidence that your intents were correctly expressed computationally. To get this evidence, we do verification .

There are many ways to verify code. A reasonable first instinct is to simply run your program. After all, what better way to check whether you expressed your intents then to see with your own eyes what your program does? This is an empirical approach is called testing . Some testing is manual , in that a human executes a program and verifies that it does what was intended. Some testing is automated , in that the test is run automatically by a computer. Another way to verify code is to analyze it, using logic to verify its correct operation. As with testing, some analysis is manual , since humans do it. We call this manual analysis inspection , whereas other analysis is automated , since computers do it. We call this program analysis . This leads to a nice complementary set of verification technique along two axes: degree of automation and type of verification:

Manual techniques include manual testing (which is empirical) and inspections (which is analytical)
Automated techniques include automated testing (which is empirical) and program analysis (which is analytical)

To discuss each of these and their tradeoffs, first we have to cover some theory about verification. The first and simplest ideas are some terminology:

A defect is some subset of a program’s code that exhibits behavior that violates a program’s specifications. For example, if a program was supposed to sort a list of numbers in increasing order and print it to a console, but a flipped inequality in the sorting algorithm made it sort them in decreasing order, the flipped inequality is the defect.
A failure is the program behavior that results from a defect executing. In our sorting example, the failure is the incorrectly sorted list printed on the console.
A bug vaguely refers to either the defect, the failure, or both. When we say “bug”, we’re not being very precise, but it is a popular shorthand for a defect and everything it causes.

Note that because defects are defined relative to intent , whether a behavior is a failure depends entirely the definition of intent. If that intent is vague, whether something is a defect is vague. Moreover, you can define intents that result in behaviors that seem like failures: for example, I can write a program that intentionally crashes. A crash isn’t a failure if it was intended! This might be pedantic, but you’d be surprised how many times I’ve seen professional developers in bug triage meetings say:

“Well, it’s worked this way for a long time, and people have built up a lot of workarounds for this bug. It’s also really hard to fix. Let’s just call this by design. Closing this bug as won’t fix.”

So how do you find defects in a program? Let’s start with testing. Testing is generally the easiest kind of verification to do, but as a practice, it has questionable efficacy. Empirical studies of testing find that it is related to fewer defects in the future, but not strongly related, and it’s entirely possible that it’s not the testing itself that results in fewer defects, but that other activities (such as more careful implementation) result in fewer defects and testing efforts ¹ ¹

Iftekhar Ahmed, Rahul Gopinath, Caius Brindescu, Alex Groce, and Carlos Jensen (2016). Can testedness be effectively measured?. ACM SIGSOFT Foundations of Software Engineering (FSE).

. At the same time, modern developers don’t test as much as they think they do ³ ³

Moritz Beller, Georgios Gousios, Annibale Panichella, and Andy Zaidman (2015). When, how, and why developers (do not) test in their IDEs. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. Moreover, students are often not convinced of the return on investment of automated tests and often opt for laborious manual tests (even though they regret it later) ⁶ ⁶

Raphael Pham, Stephan Kiesling, Olga Liskin, Leif Singer, and Kurt Schneider (2014). Enablers, inhibitors, and perceptions of testing in novice software teams. ACM SIGSOFT Foundations of Software Engineering (FSE).

. Testing is therefore in a strange place: it’s a widespread activity in industry, but it’s often not executed systematically, and there is some evidence that it doesn’t seem to help prevent defects from being released.

Why is this? One possibility is that no amount of testing can prove a program correct with respect to its specifications . Why? It boils down to the same limitations that exist in science: with empiricism, we can provide evidence that a program does have defects, but we can’t provide complete evidence that a program doesn’t have defects. This is because even simple programs can execute in a infinite number of different ways.

Consider this JavaScript program:

function count(input) {
  while(input > 0)
    input--;
  return input;
}

The function should always return 0, right? How many possible values of input do we have to try manually to verify that it always does? Well, if input is an integer, then there are 2 to the power 32 possible integer values, because JavaScript uses 32-bits to represent an integer. That’s not infinite, but that’s a lot. But what if input is a string? There are an infinite number of possible strings because they can have any sequence of characters of any length. Now we have to manually test an infinite number of possible inputs. So if we were restricting ourselves to testing, we will never know that the program is correct for all possible inputs. In this case, automatic testing doesn’t even help, since there are an infinite number of tests to run.

There are some ideas in testing that can improve how well we can find defects. For example, rather than just testing the inputs you can think of, focus on all of the lines of code in your program. If you find a set of tests that can cause all of the lines of code to execute, you have one notion of test coverage . Of course, lines of code aren’t enough, because an individual line can contain multiple different paths in it (e.g., value ? getResult1() : getResult2() ). So another notion of coverage is executing all of the possible control flow paths through the various conditionals in your program. Executing all of the possible paths is hard, of course, because every conditional in your program doubles the number of possible paths (you have 200 if statements in your program? That’s up to 2 to the power 200 possible paths, which is more paths than there are atoms in the universe .

There are many types of testing that are common in software engineering:

Unit tests verify that functions return the correct output. For example, a program that implemented a function for finding the day of the week for a given date might also include unit tests that verify for a large number of dates that the correct day of the week is returned. They’re good for ensuring widely used low-level functionality is correct.
Integration tests verify that when all of the functionality of a program is put together into the final product, it behaves according to specifications. Integration tests often operate at the level of user interfaces, clicking buttons, entering text, submitting forms, and verifying that the expected feedback always occurs. Integration tests are good for ensuring that important tasks that users will perform are correct.
Regression tests verify that behavior that previously worked doesn’t stop working. For example, imagine you find a defect that causes logins to fail; you might write a test that verifies that this cause of login failure does not occur, in case someone breaks the same functionality again, even for a different reason. Regression tests are good for ensuring that you don’t break things when you make changes to your application.

Which tests you should write depends on what risks you want to take. Don’t care about failures? Don’t write any tests. If failures of a particular kind are highly consequential to your team, you should probably write tests that check for those failures. As we noted above, you can’t write enough tests to catch all bugs, so deciding which tests to write and maintain is a key challenge.

Now, you might be thinking that it’s obvious that the program above is defective for some integers and strings. How did you know? You analyzed the program rather than executing it with specific inputs. For example, when I read (analyzed) the program, I thought:

“if we assume input is an integer, then there are only three types of values to meaningfully consider with respect to the > in the loop condition: positive, zero, and negative. Positive numbers will always decrement to 0 and return 0. Zero will return zero. And negative numbers just get returned as is, since they’re less then zero, which is wrong with respect to the specification. And in JavaScript, strings are never greater than 0 (let’s not worry about whether it even makes sense to be able to compare strings and numbers), so the string is returned, which is wrong.”

The above is basically an informal proof. I used logic to divide the possible states of input and their effect on the program’s behavior. I used symbolic execution to verify all possible paths through the function, finding the paths that result in correct and incorrect values. The strategy was an inspection because we did it manually. If we had written a program that read the program to perform this proof automatically, we would have called it program analysis .

The benefits of analysis is that it can demonstrate that a program is correct in all cases. This is because they can handle infinite spaces of possible inputs by mapping those infinite inputs onto a finite space of possible executions. It’s not always possible to do this in practice, since many kinds of programs can execute in infinite ways, but it gets us closer to proving correctness.

One popular type of automatic program analysis tools is a static analysis tool. These tools read programs and identify potential defects using the types of formal proofs like the ones above. They typically result in a set of warnings, each one requiring inspection by a developer to verify, since some of the warnings may be false positives (something the tool thought was a defect, but wasn’t). Although static analysis tools can find many kinds of defects, they aren’t yet viewed by developers to be that useful because the false positives are often large in number and the way they are presented make them difficult to understand ⁴ ⁴

Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge (2013). Why don't software developers use static analysis tools to find bugs?. ACM/IEEE International Conference on Software Engineering.

. There is one exception to this, and it’s a static analysis tool you’ve likely used: a compiler. Compilers verify the correctness of syntax, grammar, and for statically-typed languages, the correctness of types. As I’m sure you’ve discovered, compiler errors aren’t always the easiest to comprehend, but they do find real defects automatically. The research community is just searching for more advanced ways to check more advanced specifications of program behavior.

Not all analytical techniques rely entirely on logic. In fact, one of the most popular methods of verification in industry are code reviews , also known as inspections . The basic idea of an inspection is to read the program analytically, following the control and data flow inside the code to look for defects. This can be done alone, in groups, and even included as part of process of integrating changes, to verify them before they are committed to a branch. Modern code reviews, while informal, help find defects, stimulate knowledge transfer between developers, increase team awareness, and help identify alternative implementations that can improve quality ² ²

Alberto Bacchelli and Christian Bird (2013). Expectations, outcomes, and challenges of modern code review. ACM/IEEE International Conference on Software Engineering.

. One study found that measures of how much a developer knows about an architecture can increase 66% to 150% depending on the project ⁸ ⁸

Peter C. Rigby and Christian Bird (2013). Convergent contemporary software peer review practices. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. That said, not all reviews are created equal: the best ones are thorough and conducted by a reviewer with strong familiarity with the code ⁵ ⁵

Oleksii Kononenko, Olga Baysal, and Michael W. Godfrey (2016). Code review quality: how developers see it. ACM/IEEE International Conference on Software Engineering.

; including reviewers that do not know each other or do not know the code can result in longer reviews, especially when run as meetings ⁹ ⁹

Carolyn B. Seaman and Victor R. Basili (1997). An empirical study of communication in code inspections. ACM/IEEE International Conference on Software Engineering.

. Soliciting reviews asynchronously by allowing developers to request reviewers of their peers is generally much more scalable ⁷ ⁷

Peter C. Rigby and Margaret-Anne Storey (2011). Understanding broadcast based peer review on open source software projects. ACM/IEEE International Conference on Software Engineering.

, but this requires developers to be careful about which reviews they invest in. These choices about where to put reviewing attention can result in great disparities in what is reviewed, especially in open source: the more work a review is perceived to be, the less likely it is to be reviewed at all and the longer the delays in receiving a review ¹⁰ ¹⁰

Thongtanunam, P., McIntosh, S., Hassan, A. E., & Iida, H (2016). Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects. Empirical Software Engineering.

Beyond these more technical considerations around verifying a program’s correctness are organizational issues around different software qualities. For example, different organizations have different sensitivities to defects. If a $0.99 game on the app store has a defect, that might not hurt its sales much, unless that defect prevents a player from completing the game. If Boeing’s flight automation software has a defect, hundreds of people might die. The game developer might do a little manual play testing, release, and see if anyone reports a defect. Boeing will spend years proving mathematically with automatic program analysis that every line of code does what is intended, and repeating this verification every time a line of code changes. Moreover, requirements may change differently in different domains. For example, a game company might finally recognize the sexist stereotypes amplified in its game mechanics and have to change requirements, resulting in changed definitions of correctness, and the incorporation of new software qualities such as bias into testing plans. Similarly, Boeing might have to respond to pandemic fears by having to shift resources away from verifying flight crash safety to verifying public health safety. What type of verification is right for your team depends entirely on what a team is building, who’s using it, and how they’re depending on it.

References

Iftekhar Ahmed, Rahul Gopinath, Caius Brindescu, Alex Groce, and Carlos Jensen (2016). Can testedness be effectively measured?. ACM SIGSOFT Foundations of Software Engineering (FSE).
Alberto Bacchelli and Christian Bird (2013). Expectations, outcomes, and challenges of modern code review. ACM/IEEE International Conference on Software Engineering.
Moritz Beller, Georgios Gousios, Annibale Panichella, and Andy Zaidman (2015). When, how, and why developers (do not) test in their IDEs. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge (2013). Why don't software developers use static analysis tools to find bugs?. ACM/IEEE International Conference on Software Engineering.
Oleksii Kononenko, Olga Baysal, and Michael W. Godfrey (2016). Code review quality: how developers see it. ACM/IEEE International Conference on Software Engineering.
Raphael Pham, Stephan Kiesling, Olga Liskin, Leif Singer, and Kurt Schneider (2014). Enablers, inhibitors, and perceptions of testing in novice software teams. ACM SIGSOFT Foundations of Software Engineering (FSE).
Peter C. Rigby and Margaret-Anne Storey (2011). Understanding broadcast based peer review on open source software projects. ACM/IEEE International Conference on Software Engineering.
Peter C. Rigby and Christian Bird (2013). Convergent contemporary software peer review practices. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Carolyn B. Seaman and Victor R. Basili (1997). An empirical study of communication in code inspections. ACM/IEEE International Conference on Software Engineering.
Thongtanunam, P., McIntosh, S., Hassan, A. E., & Iida, H (2016). Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects. Empirical Software Engineering.

It’s not always easy to see software fail.

Chapter 12

Monitoring

by Amy J. Ko

The first application I ever wrote was a complete and utter failure.

I was an eager eighth grader, full of wonder and excitement about the infinite possibilities in code, with an insatiable desire to build, build, build. I’d made plenty of little games and widgets for myself, but now was my chance to create something for someone else: my friend and I were making a game and he needed a tool to create pixel art for it. We had no money for fancy Adobe licenses, and so I decided to make a tool.

In designing the app, I made every imaginable software engineering mistake. I didn’t talk to him about requirements. I didn’t test on his computer before sending the finished app. I certainly didn’t conduct any usability tests, performance tests, or acceptance tests. The app I ended up shipping was a pure expression of what I wanted to build, not what he needed to be creative or productive. As a result, it was buggy, slow, confusing, and useless, and blinded by my joy of coding, I had no clue.

Now, ideally my “customer” would have reported any of these problems to me right away, and I would have learned some tough lessons about software engineering. But this customer was my best friend, and also a very nice guy. He wasn’t about to trash all of my hard work. Instead, he suffered in silence. He struggled to install, struggled to use, and worst of all struggled to create. He produced some amazing art a few weeks after I gave him the app, but it was only after a few months of progress on our game that I learned he hadn’t used my app for a single asset, preferring instead to suffer through Microsoft Paint. My app was too buggy, too slow, and too confusing to be useful. I was devastated.

Why didn’t I know it was such a complete failure? Because I wasn’t looking . I’d ignored the ultimate test suite: my customer . I’d learned that the only way to really know whether software requirements are right is by watching how it executes in the world through monitoring ¹³ ¹³

James Turnbull (2016). The art of monitoring with James Turnbull. Software Engineering Daily Podcast.

Of course, this is easier said than done. That’s because the (ideally) massive numbers of people executing your software is not easily observable ¹¹ ¹¹

Tim Menzies, Tom Zimmermann (2013). Software analytics: so what?. IEEE Software.

. Moreover, each software quality you might want to monitor (performance, functional correctness, usability) requires entirely different methods of observation and analysis. Let’s talk about some of the most important qualities to monitor and how to monitor them.

These are some of the easiest failures to detect because they are overt and unambiguous. Microsoft was one of the first organizations to do this comprehensively, building what eventually became known as Windows Error Reporting ⁷ ⁷

Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt (2009). Debugging in the (very) large: ten years of implementation and experience. ACM SIGOPS Symposium on Operating Systems Principles (SOSP).

. It turns out that actually capturing these errors at scale and mining them for repeating, reproducible failures is quite complex, requiring classification, progressive data collection, and many statistical techniques to extract signal from noise. In fact, Microsoft has a dedicated team of data scientists and engineers whose sole job is to manage the error reporting infrastructure, monitor and triage incoming errors, and use trends in errors to make decisions about improvements to future releases and release processes. This is now standard practice in most companies and organizations, including other big software companies (Google, Apple, IBM, etc.), as well as open source projects (eg, Mozilla). In fact, many application development platforms now include this as a standard operating system feature.

Performance, like crashes, kernel panics, and hangs, is easily observable in software, but a bit trickier to characterize as good or bad. How slow is too slow? How bad is it if something is slow occasionally? You’ll have to define acceptable thresholds for different use cases to be able to identify problems automatically. Some experts in industry ⁸ ⁸

Andi Grabner (2016). Performance monitoring with Andi Grabner. Software Engineering Daily Podcast.

still view this as an art.

It’s also hard to monitor performance without actually harming performance. Many tools and web services (e.g., New Relic ) are getting better at reducing this overhead and offering real time data about performance problems through sampling.

Monitoring for data breaches, identity theft, and other security and privacy concerns are incredibly important parts of running a service, but also very challenging. This is partly because the tools for doing this monitoring are not yet well integrated, requiring each team to develop its own practices and monitoring infrastructure. But it’s also because protecting data and identity is more than just detecting and blocking malicious payloads. It’s also about recovering from ones that get through, developing reliable data streams about application network activity, monitoring for anomalies and trends in those streams, and developing practices for tracking and responding to warnings that your monitoring system might generate. Researchers are still actively inventing more scalable, usable, and deployable techniques for all of these activities.

The biggest limitation of the monitoring above is that it only reveals what people are doing with your software, not why they are doing it, or why it has failed. Monitoring can help you know that a problem exists, but it can’t tell you why a program failed or why a person failed to use your software successfully.

Usability problems and missing features, unlike some of the preceding problems, are even harder to detect or observe, because the only true indicator that something is hard to use is in a user’s mind. That said, there are a couple of approaches to detecting the possibility of usability problems.

One is by monitoring application usage. Assuming your users will tolerate being watched, there are many techniques: 1) automatically instrumenting applications for user interaction events, 2) mining events for problematic patterns, and 3) browsing and analyzing patterns for more subjective issues ⁹ ⁹

Melody Y. Ivory, Marti A. Hearst (2001). The state of the art in automating usability evaluation of user interfaces. ACM Computing Surveys.

. Modern tools and services like make it easier to capture, store, and analyze this usage data, although they still require you to have some upfront intuition about what to monitor. More advanced, experimental techniques in research automatically analyze undo events as indicators of usability problems ¹ ¹

David Akers, Matthew Simpson, Robin Jeffries, and Terry Winograd (2009). Undo and erase events as indicators of usability problems. ACM SIGCHI Conference on Human Factors in Computing (CHI).

. This work observes that undo is often an indicator of a mistake in creative software, and mistakes are often indicators of usability problems.

All of the usage data above can tell you what your users are doing, but not why . For this, you’ll need to get explicit feedback from support tickets, support forums, product reviews, and other critiques of user experience. Some of these types of reports go directly to engineering teams, becoming part of bug reporting systems, while others end up in customer service or marketing departments. While all of this data is valuable for monitoring user experience, most companies still do a bad job of using anything but bug reports to improve user experience, overlooking the rich insights in customer service interactions ⁵ ⁵

Although bug reports are widely used, they have significant problems as a way to monitor: for developers to fix a problem, they need detailed steps to reproduce the problem, or stack traces or other state to help them track down the cause of a problem ⁴ ⁴

Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann (2008). What makes a good bug report?. ACM SIGSOFT Foundations of Software Engineering (FSE).

; these are precisely the kinds of information that are hard for users to find and submit, given that most people aren’t trained to produce reliable, precise information for failure reproduction. Additionally, once the information is recorded in a bug report, even interpreting the information requires social, organizational, and technical knowledge, meaning that if a problem is not addressed soon, an organization’s ability to even interpret what the failure was and what caused it can decay over time ² ²

Jorge Aranda and Gina Venolia (2009). The secret life of bugs: Going past the errors and omissions in software repositories. ACM/IEEE International Conference on Software Engineering.

. All of these issues can lead to intractable debugging challenges ¹² ¹²

Haseeb Qureshi (2016). Debugging stories with Haseeb Qureshi. Software Engineering Daily Podcast.

Larger software organizations now employ data scientists to help mitigate these challenges of analyzing and maintaining monitoring data and bug reports. Most of them try to answer questions such as ³ ³

Andy Begel, Thomas Zimmermann (2014). Analyze this! 145 questions for data scientists in software engineering. ACM/IEEE International Conference on Software Engineering.

“How do users typically use my application?”
“What parts of a software product are most used and/or loved by customers?”
“What are best key performance indicators (KPIs) for monitoring services?”
“What are the common patterns of execution in my application?”
“How well does test coverage correspond to actual code usage by our customers?”

The most mature data science roles in software engineering teams even have multiple distinct roles, including Insight Providers , who gather and analyze data to inform decisions, Modeling Specialists , who use their machine learning expertise to build predictive models, Platform Builders , who create the infrastructure necessary for gathering data ¹⁰ ¹⁰

Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel (2016). The emerging role of data scientists on software development teams. ACM/IEEE International Conference on Software Engineering.

. Of course, smaller organizations may have individuals who take on all of these roles. Moreover, not all ways of discovering missing requirements are data science roles. Many companies, for example, have customer experience specialists and community managers, who are less interested in data about experiences and more interested in directly communicating with customers about their experiences. These relational forms of monitoring can be much more effective at revealing software quality issues that aren’t as easily observed, such as issues of racial or sexual bias in software or other forms of structural injustices built into the architecture of software.

All of this effort to capture and maintain user feedback can be messy to analyze because it usually comes in the form of natural language text. Services like AnswerDash (a company I co-founded) structure this data by organizing requests around frequently asked questions. AnswerDash imposes a little widget on every page in a web application, making it easy for users to submit questions and find answers to previously asked questions. This generates data about the features and use cases that are leading to the most confusion, which types of users are having this confusion, and where in an application the confusion is happening most frequently. This product was based on several years of research in my lab ⁶ ⁶

Parmit K. Chilana, Amy J. Ko, Jacob O. Wobbrock, Tovi Grossman (2013). A multi-site field study of crowdsourced contextual help: usage and perspectives of end users and software teams. ACM SIGCHI Conference on Human Factors in Computing (CHI).

References

David Akers, Matthew Simpson, Robin Jeffries, and Terry Winograd (2009). Undo and erase events as indicators of usability problems. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Jorge Aranda and Gina Venolia (2009). The secret life of bugs: Going past the errors and omissions in software repositories. ACM/IEEE International Conference on Software Engineering.
Andy Begel, Thomas Zimmermann (2014). Analyze this! 145 questions for data scientists in software engineering. ACM/IEEE International Conference on Software Engineering.
Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann (2008). What makes a good bug report?. ACM SIGSOFT Foundations of Software Engineering (FSE).
Parmit K. Chilana, Amy J. Ko, Jacob O. Wobbrock, Tovi Grossman, and George Fitzmaurice (2011). Post-deployment usability: a survey of current practices. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Parmit K. Chilana, Amy J. Ko, Jacob O. Wobbrock, Tovi Grossman (2013). A multi-site field study of crowdsourced contextual help: usage and perspectives of end users and software teams. ACM SIGCHI Conference on Human Factors in Computing (CHI).
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt (2009). Debugging in the (very) large: ten years of implementation and experience. ACM SIGOPS Symposium on Operating Systems Principles (SOSP).
Andi Grabner (2016). Performance monitoring with Andi Grabner. Software Engineering Daily Podcast.
Melody Y. Ivory, Marti A. Hearst (2001). The state of the art in automating usability evaluation of user interfaces. ACM Computing Surveys.
Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel (2016). The emerging role of data scientists on software development teams. ACM/IEEE International Conference on Software Engineering.
Tim Menzies, Tom Zimmermann (2013). Software analytics: so what?. IEEE Software.
Haseeb Qureshi (2016). Debugging stories with Haseeb Qureshi. Software Engineering Daily Podcast.
James Turnbull (2016). The art of monitoring with James Turnbull. Software Engineering Daily Podcast.

Software changes and that requires planning.

Chapter 13

Evolution

by Amy J. Ko

Programs change. You find bugs, you fix them. You discover a new requirement, you add a feature. A requirement changes because users demand it, you revise a feature. The simple fact about programs are that they’re rarely stable, but rather constantly changing, living artifacts that shift as much as our social worlds shift.

Nowhere is this constant evolution more apparent then in our daily encounters with software updates. The apps on our phones are constantly being updated to improve our experiences, while the web sites we visit potentially change every time we visit them, without us noticing. These different models have different notions of who controls changes to user experience: should software companies control when your experience changes or should you? And with systems with significant backend dependencies, is it even possible to give users control over when things change?

To manage change, developers use many kinds of tools and practices.

One of the most common ways of managing change is to refactor code. Refactoring helps developers modify the architecture of a program while keeping its behavior the same, enabling them to implement or modify functionality more easily. For example, one of the most common and simple refactorings is to rename a variable (renaming its definition and all of its uses). This doesn’t change the architecture of a program at all, but does improve its readability. Other refactors can be more complex. For example, consider adding a new parameter to a function: all calls to that function need to pass that new parameter, which means you need to go through each call and decide on a value to send from that call site. Studies of refactoring in practice have found that refactorings can be big and small, that they don’t always preserve the behavior of a program, and that developers perceive them as involving substantial costs and risks ⁵ ⁵

Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).

Another fundamental way that developers manage change is version control systems. As you know, they help developers track changes to code, allowing them to revert, merge, fork, and clone projects in a way that is traceable and reliable. Version control systems also help developers identify merge conflicts, so that they don’t accidentally override each others’ work ⁶ ⁶

Nicholas Nelson, Caius Brindescu, Shane McKee, Anita Sarma & Danny Dig (2019). The life-cycle of merge conflicts: processes, barriers, and strategies. Empirical Software Engineering.

. While today the most popular version control system is Git, there are actually many types. Some are centralized , representing one single ground truth of a project’s code, usually stored on a server. Commits to centralized repositories become immediately available to everyone else on a project. Other version control systems are distributed , such as Git, allowing one copy of a repository on every local machine. Commits to these local copies don’t automatically go to everyone else; rather, they are pushed to some central copy, from which others can pull updates.

Research comparing centralized and distributed revision control systems mostly reveal tradeoffs rather than a clear winner. Distributed version control, for example, appears to lead to commits that are smaller and more scoped to single changes, since developers can manage their own history of commits to their local repository ² ²

Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig (2014). How do centralized and distributed version control systems impact software changes?. ACM/IEEE International Conference on Software Engineering.

. Google uses one big centralized version control repository for all of its projects, however, because it offers one source of truth, simplified dependency management, large-scale refactoring, and flexible team boundaries ⁸ ⁸

Rachel Potvin, Josh Levenberg (2016). Why Google stores billions of lines of code in a single repository. Communications of the ACM.

When code changes, you need to test it, which often means you need to build it, compiling source, data, and other resources into an executable format suitable for testing (and possibly release). Build systems can be as simple as nothing (e.g., loading an HTML file in a web browser interprets the HTML and displays it, requiring no special preparation) and as complex is hundreds and thousands of lines of build script code, compiling, linking, and managing files in a manner that prepares a system for testing, such as those used to build operating systems like Windows or Linux. To write these complex build procedures, developers use build automation tools like make , ant , gulp and dozens of others, each helping to automate builds. In large companies, there are whole teams that maintain build automation scripts to ensure that developers can always quickly build and test. In these teams, most of the challenges are social and not technical: teams need to clarify role ambiguity, knowledge sharing, communication, trust, and conflict in order to be productive, just like other software engineering teams ⁷ ⁷

Shaun Phillips, Thomas Zimmermann, and Christian Bird (2014). Understanding and improving software build teams. ACM/IEEE International Conference on Software Engineering.

Perhaps the most modern form of build practice is continuous integration (CI). This is the idea of completely automating not only builds, but also the running of a collection of tests, every time a bundle of changes is pushed to a central version control repository. The claimed benefit of CI is that every major change is quickly built, tested, and ready for deployment, shortening the time between a change and the discovery of failures. Research shows this is true: CI helps projects release more often and is widely adopted in open source ⁴ ⁴

Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, Danny Dig (2016). Usage, costs, and benefits of continuous integration in open-source projects. IEEE/ACM International Conference on Automated Software Engineering.

. Of course, these benefits occur only if builds are fast.

For example, some large projects like Windows can take a whole day to build, making continuous integration of the whole operating system infeasible. When builds and tests are fast, continuous integration can accelerate development, especially in projects with large numbers of contributors ¹² ¹²

Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov (2015). Quality and productivity outcomes relating to continuous integration in GitHub. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. Some teams even go further than continuous integration, building continuous delivery systems that ensure that complete builds are readily available for release (potentially multiple times per day for software on the web). Having a repeatable, automated deployment process is key for such processes ³ ³

Lianping Chen (2015). Continuous delivery: Huge benefits, but challenges too. IEEE Software.

One last problem with changes in software is managing the releases of software. Good release management should archive new versions of software, automatically post the version online, make the version accessible to users, keep a history of who accesses the new version, and provide clear release notes describing changes from the previous version ¹¹ ¹¹

André van der Hoek, Richard S. Hall, Dennis Heimbigner, and Alexander L. Wolf (1997). Software release management. ACM SIGSOFT Foundations of Software Engineering (FSE).

. By default, all of this is quite manual, but many of these steps can be automated, streamlining how teams release changes to the world. You’ve probably encountered these most in the form of software updates to applications and operating systems.

With so many ways that software can change, and so many tools for managing that change, it also becomes important to manage the risk of change. One approach to managing this risk is impact analysis ¹ ¹

Robert Arnold, Shawn Bohner (1996). Software change impact analysis. IEEE Computer Society Press.

, an activity of systematically and precisely analyzing the consequences of a change before making the change. This can involve analyzing dependencies that are affected by a bug fix, running unit tests on smaller parts of an implementation ¹⁰ ¹⁰

Per Runeson (2006). A survey of unit testing practices. IEEE Software.

, and running regression tests on previously encountered failures ⁹ ⁹

Gregg G. Rothermel, Mary Jean Harrold (1996). Analyzing regression test selection techniques. IEEE Transactions on Software Engineering.

, and running users tests to ensure that the way in which you fixed a bug does not inadvertently introduce problems with usability, usefulness, or other qualities critical to meeting requirements.

Impact analysis, and software evolution in general, is therefore ultimately a process of managing change. Change in requirements, change in code, change in data, and change in how software is situated in the world. And like any change management, it must be done cautiously, both to avoid breaking critical functionality, but also ensure that whatever new changes are being brought to the world achieve their goals.

References

Robert Arnold, Shawn Bohner (1996). Software change impact analysis. IEEE Computer Society Press.
Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig (2014). How do centralized and distributed version control systems impact software changes?. ACM/IEEE International Conference on Software Engineering.
Lianping Chen (2015). Continuous delivery: Huge benefits, but challenges too. IEEE Software.
Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, Danny Dig (2016). Usage, costs, and benefits of continuous integration in open-source projects. IEEE/ACM International Conference on Automated Software Engineering.
Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).
Nicholas Nelson, Caius Brindescu, Shane McKee, Anita Sarma & Danny Dig (2019). The life-cycle of merge conflicts: processes, barriers, and strategies. Empirical Software Engineering.
Shaun Phillips, Thomas Zimmermann, and Christian Bird (2014). Understanding and improving software build teams. ACM/IEEE International Conference on Software Engineering.
Rachel Potvin, Josh Levenberg (2016). Why Google stores billions of lines of code in a single repository. Communications of the ACM.
Gregg G. Rothermel, Mary Jean Harrold (1996). Analyzing regression test selection techniques. IEEE Transactions on Software Engineering.
Per Runeson (2006). A survey of unit testing practices. IEEE Software.
André van der Hoek, Richard S. Hall, Dennis Heimbigner, and Alexander L. Wolf (1997). Software release management. ACM SIGSOFT Foundations of Software Engineering (FSE).
Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov (2015). Quality and productivity outcomes relating to continuous integration in GitHub. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

Debugging is inevitable because defects are inevitable

Chapter 14

Debugging

by Amy J. Ko

Despite all of your hard work at design, implementation, and verification, your software has failed. Somewhere in its implementation there’s a line of code, or multiple lines of code, that, given a particular set of inputs, causes the program to fail. Debugging is the activity of finding those causes and identifying changes to code that will prevent those failures. Of course, because defects are inevitable in code that human developers write, debugging is no niche process in software engineering: it is a central, critical, and challenging activity that is part of nearly all aspects of creating software.

Before we can talk about debugging, it’s first important to consider what counts as a “bug”. This term is thought to have emerged in the late 19th century to describe any problem with a machine. Grace Hopper’s team reported then found it amusing when they found an actual moth in an electromechanical relay in their early computer. And yet, the term is actually quite vague. Is the “bug” the incorrect code? Is it the faulty behavior that occurs at runtime when that incorrect code executes? Is it the problem that occurs in the world when the software misbehaves, such as a program crashing, or an operating system hanging? “ Bug ” actually refers to all of things things, which makes it a colloquial, but imprecise term.

To clarify things, consider four definitions ⁶ ⁶

Amy J. Ko, Brad A. Myers (2005). A framework and methodology for studying the causes of software errors in programming systems. Journal of Visual Languages & Computing.

To begin, let’s consider program behavior , which we will define as any program output, at either a point in time, or over time, that is perceived or processed by a person or other software. Behavior, in this sense, is what we see programs do: they crash, hang, retrieve incorrect information, show error codes, compute something incorrectly, exhibit incomprehensible behavior, and so on. Program behavior is what requirements attempt to constraint (e.g., “the program should always finish in less than one second” is a statement about the program’s behavior over time.)

Given this definition of behavior, we can then define a defect is some set of program fragments that may cause program behavior that is inconsistent a program’s requirements. Note that this definition actually has some non-obvious implications. First, defects do not necessarily cause problems; many defects may actually never be executed, or never executed with inputs that cause a program to misbehave. Second, defects can only be defined as such to the extent that requirements are clear. If you haven’t written those requirements down in an unambiguous way, there will be debate about whether something is defect. Take, for example, a web application that has SQL injection security vulnerabilities, but the for the purpose of learning how to identify such vulnerabilities. Those aren’t defects because they are there intentionally.

A fault is a program state caused by a defect that may result in a program behavior inconsistent with a program’s requirements. For example, imagine a program that is supposed to count from 1 to 10 using a variable to track and increment the current number, but with a defect that causes it to start at 0. The fault is the value of that variable when it is set to 0. When it is set to 1 through 10, there’s nothing faulty about program behavior. Faults, like defects, do not necessarily cause problems. For example, imagine that the same program prints out the current value, but has another defect that unintentionally skips printing the first value. There would be two defects, a fault on the first number, but no undesirable program behavior, because it would still print 1 to 10.

Finally, a failure is a program behavior that is inconsistent with a program’s requirements. Failures are what we report in bug reports, what we often mean when we say “bug”, and ultimately what matters in the world, as program behavior is what programs do to the world. To use our terminology then, we would say that “ defects may cause faults, faults may cause failures, and failures may cause consequences in the world ”

What then, is debugging, using this terminology? Debugging is any activity that, given a report of a failure, seeks to identify the one or more defects that caused one or more faults, which caused the failure, and then making changes to a program to eliminate the associated defects. How to do this, of course, is the hard part. Therefore, debugging is inherently a process of searching—for faults that cause failures, and defects that cause faults. What’s being searched when debugging is the thousands, millions, or perhaps even billions of instructions that are executed when a program executes (causing faults), and the thousands, or even millions of lines of code that might have have caused those faults.

Research and practice broadly agree: finding defects quickly and successfully requires systematic, and sometimes scientific investigations of causality ¹³ ¹³

Andreas Zeller (2009). Why programs fail: a guide to systematic debugging. Elsevier.

. And yet, despite decades of research and practice, most developers never learn the skills for debugging systematically and don’t know how to properly use debugging tools to support systematic debugging. In fact, most still rely in basic print statements, partly because they are the most available and flexible tool ² ²

Beller, M., Spruit, N., Spinellis, D., & Zaidman, A (2018). On the dichotomy of debugging behavior among programmers. ACM/IEEE International Conference on Software Engineering.

. Despite this, debugging systematically has a few essential phases.

The first phase is reproducing the failure, so that the program may be inspected for faults, which can be traced back to defects. Failure reproduction is a matter of identifying inputs to the program (whether data it receives upon being executed, user inputs, network traffic, or any other form of input) that causes the failure to occur. If you found this failure while you were executing the program, then you’re lucky: you should be able to repeat whatever you just did and identify the inputs or series of inputs that caused the problem, giving you a way of testing that the program no longer fails once you’ve fixed the defect. If someone else was the one executing the program (for example, a user, or someone on your team), you better hope that they reported clear steps for reproducing the problem. When bug reports lack clear reproduction steps, bugs often can’t be fixed ³ ³

Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann (2008). What makes a good bug report?. ACM SIGSOFT Foundations of Software Engineering (FSE).

Once you can reproduce a failure, the next phase is to minimize the failure-inducing input ¹² ¹²

Andreas Zeller, Ralf Hildebrandt (2002). Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering.

. Imagine, for example, a program that, given a string "abcdefg" , is supposed to print all of the vowels in the program in the sequence they appear ( "ae" ), but instead produces just "a" . The intuition behind minimizing failure-inducing inputs is that they reduce the complexity of the search space in debugging. For example, in our example, we might find that entering the string "abcde" causes the same failed output of "a" , or even shorter, that just the string "ae" causes the failure. To minimize that failure-inducing input, one could just randomly remove parts of the input to find the smallest input that still causes the problem. More effective is to search the input space systematically, perhaps by using knowledge of the program behavior (e.g., vowels are what matter, so get rid of the consonants, as we did above), or even more systematic, doing something like a binary search of the input (e.g., trying successively smaller halves of the string until finding the smallest string that still causes the problem). Note that minimizing failure-inducing input applies to any kind of input, not just data: you can also minimize a program, excluding lines you believe are irrelevant to the failure, finding the smallest possible program that still causes the failure.

Once you have your minimized program and input, the next phase is to localize the defect, trying to identify the cause of the failure in code. There are many different strategies for localizing defects. One of the simplest strategies is to work forward:

Set a breakpoint to the beginning of the program.
Reproduce the failure. (If the program doesn’t pause, then either the line with the breakpoint doesn’t execute, the debugger is broken, or you didn’t reproduce the failure).
Step forward one instruction at a time until the program deviates from intended behavior, monitoring program state and control flow after each step.
This step that deviates or one of the previous steps caused the failure.

This process, while straightforward, is the slowest, requiring a long, vigilant search. A more efficient scientific strategy can leverage your knowledge of the program by guiding the search with hypotheses you generate ⁵ ⁵

David Gilmore (1991). Models of debugging. Acta Psychologica.

Observe the failure
Form a hypothesis about what caused the failure
Identify ways of observing program behavior to test your hypothesis.
Analyzing the data from your observations
If you’ve identified the defect, move on to the repair phase; if not, return to step 2.

The problems with the strategy above are numerous. First, what if you can’t generate a hypothesis? What if you can, but testing the hypothesis is slow or impossible? You could spend hours generating hypotheses that are completely off-base, effectively analyzing all of your code and its executions before finding the defect.

Another strategy is working backwards ⁷ ⁷

Amy J. Ko and Brad A. Myers (2008). Debugging reinvented: asking and answering why and why not questions about program behavior. ACM/IEEE International Conference on Software Engineering.

Observe the failure
Identify the line of code that caused the failing output
Identify the lines of code that caused the line of code in step 2 and any data used on the line in step 2
Repeat three recursively, analyzing all lines of code for defects along the upstream chain of causality until finding the defect.

This strategy guarantees that you will find the defect if you systematically check all of the upstream causes of the failure. It still requires you to analyze each line of code and potentially execute to it in order to inspect what might be wrong, but it requires potentially less work than guessing. As we discussed in the Comprehension chapter, tools can automate some of this process ⁷ ⁷

Amy J. Ko and Brad A. Myers (2008). Debugging reinvented: asking and answering why and why not questions about program behavior. ACM/IEEE International Conference on Software Engineering.

Yet another strategy called delta debugging is to compare successful and failing executions of the program ¹¹ ¹¹

Andreas Zeller (2002). Isolating cause-effect chains from computer programs. FSE.

Identify a successful set of inputs and minimize them
Identify a failing set of inputs and minimize them
Compare the differences in program state from the successful and failing executions during execution
Identify a change to input that minimizes the differences in states between the two executions
Variables their values that are different in these two executions contain the defect

This is a powerful strategy, but only when you have successful inputs and when you can automate comparing runs and identifying changes to inputs.

For particularly complex software, it can sometimes be necessary to debug with the help of teammates, helping to generate hypotheses, identify more effective search strategies, or rule out the influence of particular components in a bug ¹ ¹

Jorge Aranda and Gina Venolia (2009). The secret life of bugs: Going past the errors and omissions in software repositories. ACM/IEEE International Conference on Software Engineering.

. In fact, some work shows that following systematic debugging strategies can make novice developers just as effective as experienced ones ⁸ ⁸

Thomas D. LaToza, Maryam Arab, Dastyni Loksa, Amy J. Ko (2020). Explicit programming strategies. Empirical Software Engineering.

, but that many novices struggle to use them, either because they are overconfident, they struggle to self-regulate their work, or because they lack sufficient prior knowledge to execute the strategies, and have to revert to simpler ones that require less knowledge.

Ultimately, all of these strategies are essentially search algorithms, seeking the events that occurred while a program executed with a particular set of inputs that caused its output to be incorrect. Because programs execution millions and potentially billions of instructions, these strategies are necessary to reduce the scope of your search. This is where debugging tools come in: if you can find a tool that supports an effective strategy, then your work to search through those millions and billions of instructions will be greatly accelerated. This might be a print statement, a breakpoint debugger, a performance profiler, or one of the many advanced debugging tools beginning to emerge from research.

Once you’ve found the defect, what do you do? It turns out that there are usually many ways to repair a defect. How developers fix defects depends a lot on the circumstances: if they’re near a release, they may not even fix it if it’s too risky; if there’s no pressure, and the fix requires major changes, they may refactor or even redesign the program to prevent the failure ⁹ ⁹

Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird, and Nachiappan Nagappan (2013). The design of bug fixes. ACM/IEEE International Conference on Software Engineering.

. This can be a delicate, risky process: in one study of open source operating systems bug fixes, 27% of the incorrect fixes were made by developers who had never read the source code files they changed, suggesting that key to correct fixes is a deep comprehension of exactly how the defective code is intended to behave ¹⁰ ¹⁰

Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram (2011). How do fixes become bugs?. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

Because debugging can be so challenging, and because it is so pervasive and inescapable in programming, it is often a major source of frustration and unpredictability in software engineering. However, finding a defect after a long search can also be a great triumph ⁴ ⁴

Marc Eisenstadt (1997). My hairiest bug war stories. Communications of the ACM.

, bringing together the most powerful aspects of developer tools, the collective knowledge of a team, and the careful, systematic work of a programmer, trying to make sense of code. As with all things in software engineering, persistence and patience is rewarded.

References

Jorge Aranda and Gina Venolia (2009). The secret life of bugs: Going past the errors and omissions in software repositories. ACM/IEEE International Conference on Software Engineering.
Beller, M., Spruit, N., Spinellis, D., & Zaidman, A (2018). On the dichotomy of debugging behavior among programmers. ACM/IEEE International Conference on Software Engineering.
Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann (2008). What makes a good bug report?. ACM SIGSOFT Foundations of Software Engineering (FSE).
Marc Eisenstadt (1997). My hairiest bug war stories. Communications of the ACM.
David Gilmore (1991). Models of debugging. Acta Psychologica.
Amy J. Ko, Brad A. Myers (2005). A framework and methodology for studying the causes of software errors in programming systems. Journal of Visual Languages & Computing.
Amy J. Ko and Brad A. Myers (2008). Debugging reinvented: asking and answering why and why not questions about program behavior. ACM/IEEE International Conference on Software Engineering.
Thomas D. LaToza, Maryam Arab, Dastyni Loksa, Amy J. Ko (2020). Explicit programming strategies. Empirical Software Engineering.
Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird, and Nachiappan Nagappan (2013). The design of bug fixes. ACM/IEEE International Conference on Software Engineering.
Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram (2011). How do fixes become bugs?. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Andreas Zeller (2002). Isolating cause-effect chains from computer programs. FSE.
Andreas Zeller, Ralf Hildebrandt (2002). Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering.
Andreas Zeller (2009). Why programs fail: a guide to systematic debugging. Elsevier.

History

References

Organizations

References

Communication

References

Productivity

References

Quality

References

Requirements

References

Architecture

References

Specifications

References

Process

References

Comprehension

References

Verification

Testing

Analysis

References

Monitoring

Discovering Failures

Discovering Missing Requirements

References

Evolution

References

Debugging

What is debugging?

Finding defects

Fixing defects

References