Evolution

Programs change. You find bugs, you fix them. You discover a new requirement, you add a feature. A requirement changes because users demand it, you revise a feature. The simple fact about programs are that they’re rarely stable, but rather constantly changing, living artifacts that shift as much as our social worlds shift.

Nowhere is this constant evolution more apparent then in our daily encounters with software updates. The apps on our phones are constantly being updated to improve our experiences, while the web sites we visit potentially change every time we visit them, without us noticing. These different models have different notions of who controls changes to user experience: should software companies control when your experience changes or should you? And with systems with significant backend dependencies, is it even possible to give users control over when things change?

To manage change, developers use many kinds of tools and practices.

One of the most common ways of managing change is to refactor code. Refactoring helps developers modify the architecture of a program while keeping its behavior the same, enabling them to implement or modify functionality more easily. For example, one of the most common and simple refactorings is to rename a variable (renaming its definition and all of its uses). This doesn’t change the architecture of a program at all, but does improve its readability. Other refactors can be more complex. For example, consider adding a new parameter to a function: all calls to that function need to pass that new parameter, which means you need to go through each call and decide on a value to send from that call site. Studies of refactoring in practice have found that refactorings can be big and small, that they don’t always preserve the behavior of a program, and that developers perceive them as involving substantial costs and risks ⁵

Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).

Another fundamental way that developers manage change is version control systems. As you know, they help developers track changes to code, allowing them to revert, merge, fork, and clone projects in a way that is traceable and reliable. Version control systems also help developers identify merge conflicts, so that they don’t accidentally override each others’ work ⁶

Nicholas Nelson, Caius Brindescu, Shane McKee, Anita Sarma & Danny Dig (2019). The life-cycle of merge conflicts: processes, barriers, and strategies. Empirical Software Engineering.

. While today the most popular version control system is Git, there are actually many types. Some are centralized , representing one single ground truth of a project’s code, usually stored on a server. Commits to centralized repositories become immediately available to everyone else on a project. Other version control systems are distributed , such as Git, allowing one copy of a repository on every local machine. Commits to these local copies don’t automatically go to everyone else; rather, they are pushed to some central copy, from which others can pull updates.

Research comparing centralized and distributed revision control systems mostly reveal tradeoffs rather than a clear winner. Distributed version control, for example, appears to lead to commits that are smaller and more scoped to single changes, since developers can manage their own history of commits to their local repository ²

Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig (2014). How do centralized and distributed version control systems impact software changes?. ACM/IEEE International Conference on Software Engineering.

. Google uses one big centralized version control repository for all of its projects, however, because it offers one source of truth, simplified dependency management, large-scale refactoring, and flexible team boundaries ⁸ .

When code changes, you need to test it, which often means you need to build it, compiling source, data, and other resources into an executable format suitable for testing (and possibly release). Build systems can be as simple as nothing (e.g., loading an HTML file in a web browser interprets the HTML and displays it, requiring no special preparation) and as complex is hundreds and thousands of lines of build script code, compiling, linking, and managing files in a manner that prepares a system for testing, such as those used to build operating systems like Windows or Linux. To write these complex build procedures, developers use build automation tools like make , ant , gulp and dozens of others, each helping to automate builds. In large companies, there are whole teams that maintain build automation scripts to ensure that developers can always quickly build and test. In these teams, most of the challenges are social and not technical: teams need to clarify role ambiguity, knowledge sharing, communication, trust, and conflict in order to be productive, just like other software engineering teams ⁷

Shaun Phillips, Thomas Zimmermann, and Christian Bird (2014). Understanding and improving software build teams. ACM/IEEE International Conference on Software Engineering.

Perhaps the most modern form of build practice is continuous integration (CI). This is the idea of completely automating not only builds, but also the running of a collection of tests, every time a bundle of changes is pushed to a central version control repository. The claimed benefit of CI is that every major change is quickly built, tested, and ready for deployment, shortening the time between a change and the discovery of failures. Research shows this is true: CI helps projects release more often and is widely adopted in open source ⁴

Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, Danny Dig (2016). Usage, costs, and benefits of continuous integration in open-source projects. IEEE/ACM International Conference on Automated Software Engineering.

. Of course, these benefits occur only if builds are fast.

For example, some large projects like Windows can take a whole day to build, making continuous integration of the whole operating system infeasible. When builds and tests are fast, continuous integration can accelerate development, especially in projects with large numbers of contributors ¹²

Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov (2015). Quality and productivity outcomes relating to continuous integration in GitHub. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

. Some teams even go further than continuous integration, building continuous delivery systems that ensure that complete builds are readily available for release (potentially multiple times per day for software on the web). Having a repeatable, automated deployment process is key for such processes ³ .

One last problem with changes in software is managing the releases of software. Good release management should archive new versions of software, automatically post the version online, make the version accessible to users, keep a history of who accesses the new version, and provide clear release notes describing changes from the previous version ¹¹

André van der Hoek, Richard S. Hall, Dennis Heimbigner, and Alexander L. Wolf (1997). Software release management. ACM SIGSOFT Foundations of Software Engineering (FSE).

. By default, all of this is quite manual, but many of these steps can be automated, streamlining how teams release changes to the world. You’ve probably encountered these most in the form of software updates to applications and operating systems.

With so many ways that software can change, and so many tools for managing that change, it also becomes important to manage the risk of change. One approach to managing this risk is impact analysis ¹

Robert Arnold, Shawn Bohner (1996). Software change impact analysis. IEEE Computer Society Press.

, an activity of systematically and precisely analyzing the consequences of a change before making the change. This can involve analyzing dependencies that are affected by a bug fix, running unit tests on smaller parts of an implementation ¹⁰ , and running regression tests on previously encountered failures ⁹ , and running users tests to ensure that the way in which you fixed a bug does not inadvertently introduce problems with usability, usefulness, or other qualities critical to meeting requirements.

Impact analysis, and software evolution in general, is therefore ultimately a process of managing change. Change in requirements, change in code, change in data, and change in how software is situated in the world. And like any change management, it must be done cautiously, both to avoid breaking critical functionality, but also ensure that whatever new changes are being brought to the world achieve their goals.

References

Robert Arnold, Shawn Bohner (1996). Software change impact analysis. IEEE Computer Society Press.
Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig (2014). How do centralized and distributed version control systems impact software changes?. ACM/IEEE International Conference on Software Engineering.
Lianping Chen (2015). Continuous delivery: Huge benefits, but challenges too. IEEE Software.
Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, Danny Dig (2016). Usage, costs, and benefits of continuous integration in open-source projects. IEEE/ACM International Conference on Automated Software Engineering.
Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan (2012). A field study of refactoring challenges and benefits. ACM SIGSOFT Foundations of Software Engineering (FSE).
Nicholas Nelson, Caius Brindescu, Shane McKee, Anita Sarma & Danny Dig (2019). The life-cycle of merge conflicts: processes, barriers, and strategies. Empirical Software Engineering.
Shaun Phillips, Thomas Zimmermann, and Christian Bird (2014). Understanding and improving software build teams. ACM/IEEE International Conference on Software Engineering.
Rachel Potvin, Josh Levenberg (2016). Why Google stores billions of lines of code in a single repository. Communications of the ACM.
Gregg G. Rothermel, Mary Jean Harrold (1996). Analyzing regression test selection techniques. IEEE Transactions on Software Engineering.
Per Runeson (2006). A survey of unit testing practices. IEEE Software.
André van der Hoek, Richard S. Hall, Dennis Heimbigner, and Alexander L. Wolf (1997). Software release management. ACM SIGSOFT Foundations of Software Engineering (FSE).
Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov (2015). Quality and productivity outcomes relating to continuous integration in GitHub. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).