
On the left, a diagram of a person and their thoughts about a shopping cart; on the right, a user interface presenting a shopping cart, mediating mathematical functions that model shopping carts. Credit: Amy J. Ko

A theory of user interfaces

Amy J. Ko

First history and now theory? What a way to start a practical book about user interfaces. But as social psychologist Kurt Lewin said, "There's nothing as practical as a good theory" (Lewin 1943).

Let's start with why theories are practical. Theories, in essence, are explanations for what something is and how something works. These explanatory models of phenomena in the world help us not only comprehend the phenomena, but also predict what will happen in the future with some confidence. Perhaps more importantly, they give us a conceptual vocabulary for exchanging ideas about the phenomena. A good theory about user interfaces would be practical because it would help explain what user interfaces are, and what governs whether they are usable, learnable, efficient, error-prone, and so on.

HCI researchers have written a lot about theory, including theories from other disciplines and new theories specific to HCI (Rogers 2012, Jacko 2012). There are theories of activities, conceptual models, design processes, experiences, information seeking, work, symbols, communication, and countless other aspects of people's interaction with computers. Most of these are theories about people and their behavior in situations where computers are critical, but where the interaction with computers is not the primary focus. The theory I want to share, in contrast, puts user interfaces at the center and the human world at the periphery.

A theory of user interfaces

Let's begin with what user interfaces are. User interfaces are software and/or hardware that bridge the world of human action and computer action. The first world is the natural world of matter, motion, sensing, action, reaction, and cognition, in which people (and other living things) build models of the things around them in order to make predictions about the effects of their actions. It is a world of stimulus, response, and adaptation. For example, when an infant is learning to walk, they're constantly engaged in perceiving the world, taking a step, and experiencing things like gravity and pain, all of which refine a model of locomotion that prevents pain and helps achieve other objectives. Our human ability to model the world and then reshape the world around us using those models is what makes us so adaptive to environmental change. It's the learning we do about the world that allows us to survive and thrive in it. Computers, and the interfaces we use to operate them, are one of the things that humans adapt to.

The other world is the world of computing. This is a world ultimately defined by a small set of arithmetic operations such as adding, multiplying, and dividing, and an even smaller set of operations that control which operations happen next. These instructions, combined with data storage, input devices to acquire data, and output devices to share the results of computation, define a world in which there is only forward motion. The computer always does what the next instruction says to do, whether that's reading input, writing output, computing something, or making a decision. Sitting atop these instructions are functions, which take input and produce output using algorithms. Essentially all computer behavior leverages this idea of a function, and the result is that all computer programs (and all software) are essentially collections of functions that humans can invoke to compute things and have effects on data. All of this functional behavior is fundamentally deterministic; it is data from the world (content, clocks, sensors, network traffic, etc.), and increasingly, data that models the world and the people in it, that gives it its unpredictable, sometimes magical or intelligent qualities.
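The idea that programs are collections of functions can be made concrete with a small sketch. The Python below models a shopping cart as a few pure functions; the names (`Item`, `add_item`, `remove_item`, `total`) and the choice of an immutable tuple for the cart are illustrative assumptions, not how any particular store works. Given the same inputs, each function always returns the same output, which is the determinism described above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Item:
    """A hypothetical product with a name and a price in cents."""
    name: str
    price_cents: int


# A cart is modeled as an immutable tuple of items (an illustrative assumption).
Cart = Tuple[Item, ...]


def add_item(cart: Cart, item: Item) -> Cart:
    """Return a new cart with the item appended; the input cart is unchanged."""
    return cart + (item,)


def remove_item(cart: Cart, name: str) -> Cart:
    """Return a new cart without the first item with the given name."""
    for i, item in enumerate(cart):
        if item.name == name:
            return cart[:i] + cart[i + 1:]
    return cart  # nothing matched, so the cart is returned unchanged


def total(cart: Cart) -> int:
    """Sum the prices of all items in the cart, in cents."""
    return sum(item.price_cents for item in cart)


tissues = Item("tissues", 399)
cart = add_item(add_item((), tissues), Item("soap", 250))
print(total(cart))                        # 649
print(total(remove_item(cart, "soap")))   # 399
```

Any unpredictability a real cart shows, such as changing prices or stock levels, would come from the data fed into functions like these, not from the functions themselves.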

Now, both of the above are basic theories of people and computers. In light of this, what are user interfaces? User interfaces are a mapping from the sensory, cognitive, and social human world to these collections of functions exposed by a computer program. User interfaces offer learnable representations of these functions, their inputs, their algorithms, and their outputs, so that a person can build mental models of these representations that allow them to sense, take action, and achieve their goals, as they do with anything in the natural world. The natural world already offers people many interfaces for acting within it: we use perception to interface with matter and language to interface with other people. Some of these are interfaces we're born to understand, or predisposed to learn, but most of them are learned. User interfaces in computers are not too different from these other types of interfaces: they leverage our senses too. However, because user interfaces are mostly artificial, most aspects of an interface must be learned. Once we learn them, we can act upon a user interface, perceive its reaction, and productively achieve our goals, as we do with the natural world.

Don Norman, in his book, The Design of Everyday Things (Norman 2013), does a nice job giving labels to some of the challenges that arise in this learning. One of his first big ideas is the gulf of execution. Gulfs of execution are gaps between the user’s goal and the input they have to provide to achieve it. To illustrate, think back to the first time you used a voice user interface, such as a personal assistant on your smartphone or an automated voice interface on a phone. In that moment, you experienced a gulf of execution: what did you have to say to the interface to achieve your goal? What concepts did you have to learn to understand that there was a command space and a syntax for each command? You didn't have the concepts to even imagine those actions. Someone or something had to "bridge" that gulf of execution, teaching you some of the actions that were possible and how those actions mapped onto your goals. (They probably also had to teach you the goals, but that's less about user interfaces and more about culture). Learning is therefore at the heart of both using user interfaces and designing them (Grossman et al. 2009).
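The "command space" and per-command "syntax" that a voice interface asks people to learn can be sketched as a table of accepted patterns. In this hypothetical Python example, the command names and phrasings are invented for illustration and do not reflect how Alexa or any real assistant parses speech; the point is only that an utterance outside the accepted syntax maps to no function at all, however clearly it expresses the person's goal.

```python
import re
from typing import Dict, Optional, Tuple

# A hypothetical command space: each command accepts exactly one phrasing.
# Real assistants are far more flexible; these patterns are illustrative only.
COMMANDS = {
    "order": re.compile(r"^alexa, order me more (?P<item>[a-z ]+)$"),
    "timer": re.compile(r"^alexa, set a timer for (?P<minutes>\d+) minutes$"),
}


def interpret(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map an utterance to (command, arguments), or None if it fits no known syntax."""
    text = utterance.strip().lower().rstrip(".!?")
    for name, pattern in COMMANDS.items():
        match = pattern.match(text)
        if match:
            return name, match.groupdict()
    return None  # the utterance falls outside the command space


print(interpret("Alexa, order me more tissues."))  # ('order', {'item': 'tissues'})
print(interpret("Alexa, I need tissues"))          # None
```

A person who says "I need tissues" has a perfectly clear goal, but until someone or something teaches them the accepted phrasing, the interface gives them no way to reach the function that would satisfy it.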

What is it that people need to learn to take action on an interface? Norman (and later Gaver 1991, Hartson 2003, and Norman 2013 again) argued that what we're really learning is affordances. An affordance is a relationship between a person and a property of what can be done to an interface in order to produce some effect. For example, a physical computer mouse can be clicked, which allows information to be communicated to a computer. A digital personal assistant like Amazon's Alexa can be invoked by saying "Alexa". However, these are just properties of a mouse and of Alexa; affordances arise when a person recognizes that opportunity and knows how to act upon it.

How can a person know what affordances an interface has? That's where the concept of a signifier becomes important. Signifiers are any sensory or cognitive indicator of the presence of an affordance. Consider, for example, how you know that a computer mouse can be clicked. Its physical shape might evoke the industrial design of a button. It might have little tangible surfaces that entreat you to push your finger on them. A mouse could even have visual sensory signifiers, like a slowly changing colored surface that attempts to say, "I'm interactive, try touching me." These are mostly sensory indicators of an affordance. Personal digital assistants like "Alexa", in contrast, lack most of these signifiers. What about an Amazon Echo says, "You can say Alexa and speak a command"? In this case, Amazon relies on tutorials, stickers, and even television commercials to signify this affordance.

While both of these examples involve hardware, the same concepts of affordance and signifier apply to software too. Buttons in a graphical user interface have an affordance: if you click within their rectangular boundary, the computer will execute a command. If you have sight, you know a button is clickable because long ago you learned that buttons have particular visual properties such as a rectangular shape and a label. If you are blind, you might know a button is clickable because your screen reader announces that something is a "button" and reads its label. All of this requires you to know that interfaces have affordances, such as buttons, that are signified by particular visual or auditory conventions. Therefore, a central challenge of designing a user interface is deciding what affordances an interface will have and how to signify that they exist.
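The button affordance described above (a click inside a rectangular boundary executes a command) can be sketched in a few lines of Python. The `Button` class, its fields, and the announcement format are hypothetical; real toolkits and screen readers differ in the details. The sketch separates the affordance itself (`click`, which acts only inside the rectangle) from one non-visual signifier (`announce`, roughly what a screen reader might speak).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Button:
    """A hypothetical GUI button: a rectangle, a label, and a command to run."""
    x: int
    y: int
    width: int
    height: int
    label: str
    command: Callable[[], str]
    role: str = "button"

    def contains(self, px: int, py: int) -> bool:
        """True if the point falls within the button's rectangular boundary."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def click(self, px: int, py: int) -> Optional[str]:
        """The affordance: run the command only if the click lands inside the rectangle."""
        return self.command() if self.contains(px, py) else None

    def announce(self) -> str:
        """A non-visual signifier: label and role, roughly as a screen reader might speak them."""
        return f"{self.label}, {self.role}"


print_button = Button(x=10, y=10, width=100, height=30, label="Print",
                      command=lambda: "document sent to printer")
print(print_button.click(50, 20))   # document sent to printer
print(print_button.click(200, 20))  # None
print(print_button.announce())      # Print, button
```

Note that nothing in `click` tells a person the button exists; the visual rectangle and label, or the spoken role and label, are what signify the affordance.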

But execution is only half of the equation. Norman also discussed gulfs of evaluation, which are the gaps between the output of a user interface and a user's goal. Once a person has performed some action on a user interface via some functional affordance, the computer will take that input and do something with it. It's up to the user to then map that feedback onto their goal. If that mapping is simple and direct, the gulf is small. For example, consider an interface for printing a document. If after pressing a print button, the feedback was "Your document was sent to the printer for printing," that would clearly convey progress toward the user's goal, minimizing the gulf between the output and the goal. If after pressing the print button, the feedback was "Job 114 spooled," the gulf is larger, forcing the user to know what a "job" is, what "spooling" is, and what any of that has to do with printing their document.
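The difference between the two printing messages can be sketched as two ways of rendering the same underlying result. In this hypothetical Python example, `PrintJob` and its status values are invented for illustration; the system-oriented message exposes internal concepts, while the goal-oriented one restates the outcome in terms of what the user was trying to do.

```python
from dataclasses import dataclass


@dataclass
class PrintJob:
    """Hypothetical result of sending a document to a print queue."""
    job_id: int
    document_name: str
    status: str  # e.g. "spooled" or "printed" (illustrative values)


# Goal-oriented descriptions of each internal status (an assumed mapping).
STATUS_EFFECTS = {
    "spooled": "was sent to the printer for printing",
    "printed": "has finished printing",
}


def system_feedback(job: PrintJob) -> str:
    """Exposes internal concepts ("job", "spooled"): a wider gulf of evaluation."""
    return f"Job {job.job_id} {job.status}"


def goal_feedback(job: PrintJob) -> str:
    """Describes the effect in terms of the user's document: a narrower gulf."""
    # Unknown statuses fall back to the raw status, which widens the gulf again.
    effect = STATUS_EFFECTS.get(job.status, f"has status '{job.status}'")
    return f'Your document "{job.document_name}" {effect}.'


job = PrintJob(job_id=114, document_name="essay.pdf", status="spooled")
print(system_feedback(job))  # Job 114 spooled
print(goal_feedback(job))    # Your document "essay.pdf" was sent to the printer for printing.
```

Both functions report the same computational outcome; only the second one does the work of mapping it back onto the user's goal.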

In designing user interfaces, there are many ways to bridge gulfs of execution and evaluation. One is to just teach people all of these affordances and help them understand all of the user interface's feedback. A person might take a class to learn the range of tasks that can be accomplished with a user interface, steps to accomplish those tasks, concepts required to understand those steps, and deeper models of the interface that can help them potentially devise their own procedures for accomplishing goals. Alternatively, a person can read tutorials, tooltips, help, and other content, each taking the place of a human teacher, approximating the same kinds of instruction a person might give.

To many user interface designers, the need to explicitly teach a user interface is a sign of design failure. There is a belief that designs should be "self-explanatory" or "intuitive." What these phrases actually mean is that the interface is doing the teaching, rather than a person or content that serves as a proxy for a person. To bridge gulfs of execution, a user interface designer might conceive of physical, cognitive, and sensory affordances that are quickly learnable, for example. One way to make them quickly learnable is to leverage conventions, which are essentially user interface design patterns that people have already learned by learning other user interfaces. Want to make it easy to learn how to tell a computer to add something to your cart on a website? Use a button labeled "Add to cart," which most people will have already learned from using other e-commerce sites. Or, leverage natural language, allowing people to say, "Alexa, order me more tissues." Alternatively, interfaces might even try to anticipate what people want to do, personalizing what's available, and in doing so, minimizing how much a person has to learn. From a design perspective, there's nothing inherently wrong with learning; it's just a cost that a designer may or may not want to impose on a new user. (Perhaps learning a new interface affords new power not possible with old conventions, and so the cost is justified.)

To bridge gulfs of evaluation, a user interface needs to provide feedback that explains what effect the person's action had on the computer. Some feedback is explicit instruction that essentially teaches the person what functional affordances exist, what their effects are, what their limitations are, how to invoke them in other ways, what new functional affordances are now available and where to find them, what to do if the effect of the action wasn't desired, and so on. Clicking the "Add to cart" button, for example, should result in some message to the user that says

"I added this item to your cart. You can look at your cart over there. If you didn't mean to add that to your cart, you can take it out like this. If you're ready to buy everything in your cart, you can go here. Don't know what a cart is? Read this."

Some feedback is implicit instruction, suggesting the effect of an action but not explaining it explicitly. For example, after pressing "Add to cart," there might be an abstract icon of an item being added to a cart, with some kind of animation to capture the person's attention. Whether implicit or explicit, all of this feedback is still contextual instruction on the specific effect of the user's input (and optionally more general instruction about other affordances in the user interface that will help a user accomplish their goal).

The result of all of this learning is a mental model (Carroll and Anderson 1987) in a person's head of what inputs are possible, what outputs those result in, and how all of those inputs and outputs are related to various goals that person might have. However, because human learning is imperfect, and because the people, documentation, and contextual help provided to teach an interface are all imperfect, people's mental models about user interfaces are nearly always imperfect. People end up with brittle, fragile, and partially correct predictive models of what effect their actions in user interfaces will have, and this results in unintended effects and confusion. If there's no one around to correct the person's mental model, or the user interface itself isn't a very good teacher, the person will fail to learn, and get confused and probably frustrated (in the same way you might fumble to open a door with a confusing handle). These episodes of user interface failure, which we can also describe as errors or breakdowns, are signs that a user's mental model is inconsistent with the actual behavior of some software system. In HCI, we blame these breakdowns on designers rather than users, and so we try to maximize the learnability and minimize the error-proneness of user interface designs using usability evaluation methods.
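One way to picture a breakdown is as a disagreement between two functions: the effect a person predicts an action will have, and the effect the system actually produces. The Python sketch below compares the two over a set of actions; the action names and both models are hypothetical, and real mental models are, of course, not lookup tables.

```python
from typing import Callable, Iterable, List, Tuple


def find_breakdowns(
    mental_model: Callable[[str], str],
    actual_behavior: Callable[[str], str],
    actions: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (action, predicted, actual) for each action where prediction and behavior differ."""
    breakdowns = []
    for action in actions:
        predicted, actual = mental_model(action), actual_behavior(action)
        if predicted != actual:
            breakdowns.append((action, predicted, actual))
    return breakdowns


# Hypothetical example: the user believes the "X" next to a cart item
# saves it for later, but the system actually removes it.
def user_model(action: str) -> str:
    return {"click Add to cart": "item in cart",
            "click X": "item saved for later"}.get(action, "unknown")


def system(action: str) -> str:
    return {"click Add to cart": "item in cart",
            "click X": "item removed"}.get(action, "no effect")


print(find_breakdowns(user_model, system, ["click Add to cart", "click X"]))
# [('click X', 'item saved for later', 'item removed')]
```

The model is only partially wrong, which is typical: the person's prediction for "Add to cart" matches the system, so the mismatch only surfaces when they try the other action.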

Note that in this entire discussion, we've said little about tasks or goals. The broader HCI literature theorizes about those broadly (Rogers 2012, Jacko 2012), and the gist is this: rarely does a person have such a well-defined goal in their head that tasks can be perfectly defined to fit them. For example, imagine you have a seamless, well-defined interaction flow for adding tissues to shopping carts, and checking out with carts, but a person's goal was vaguely to "get some tissues that don't hurt my skin, but also don't cost much." The breakdown in your user interface may occur long after a person has used your user interface, after a few days of actually using the tissues, finding them uncomfortable, and therefore not worth the cost. Getting the low-level details of user interface design to be learnable is one challenge—designing experiences that support vague, shifting, unobservable human goals is an entirely different one.

How is this theory useful

You have some new concepts about user interfaces, and an underlying theoretical sense about what user interfaces are and why designing them is hard. Let's recap:

The grand challenge of user interface design is therefore trying to conceive of interaction paradigms that have small gulfs of execution and evaluation, while also offering expressiveness, efficiency, power, and other attributes that augment human ability.

What do you do with these ideas? Well, given that the world is constantly designing new user interfaces, and the HCI research community believes these ideas about user interfaces are fundamental to all user interfaces, not just the 2D graphical user interfaces we've lived with since the 1960s, every single time you design a user interface, you'll have to solve these problems. Starting from this theoretical view of user interfaces allows you to ask the right questions. For example, rather than trying to vaguely identify the most "intuitive" experience, you can systematically ask: "Exactly how will our software behave and how will our user interface effectively teach this behavior?" Or, rather than relying on stereotypes, such as "older adults will struggle to learn to use computers," you can be more precise, saying, "This particular population of adults has not learned the design conventions of iOS, and so will need to learn those before successfully utilizing this application's interface." Approaching user interface design and evaluation from this perspective will help you identify major gaps in your user interface design analytically and systematically, rather than relying only on observations of someone struggling to use an interface, or worse yet, stereotypes or ill-defined concepts such as "intuitive" or "user-friendly." Of course, in practice, not everyone will know these theories, and other constraints will often prevent you from doing what you know might be right. That tension between theory and practice is inevitable, and something we'll return to throughout this book.

In the rest of this book, I'll use the theoretical ideas above to explain how different types of user interface paradigms work, and why they need to work that way.


Further reading

Carroll, J. M., Anderson, N. S., & National Research Council. (1987). Mental models in human-computer interaction: Research issues about what the user of software knows (No. 12). National Academies.

Gaver, W.W. (1991). Technology affordances. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '91). New Orleans, Louisiana (April 27-May 2, 1991). New York: ACM Press, pp. 79-84.

Grossman, T., Fitzmaurice, G., & Attar, R. (2009, April). A survey of software learnability: metrics, methodologies and guidelines. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 649-658). ACM.

Hartson, R. (2003). Cognitive, physical, sensory, and functional affordances in interaction design. Behaviour & Information Technology, 22(5), 315-338.

Jacko, J. A. (Ed.). (2012). Human computer interaction handbook: Fundamentals, evolving technologies, and emerging applications. CRC Press.

Lewin (1943, 118), as cited in Karl E. Weick, "Theory and practice in the real world." in: The Oxford Handbook of Organization Theory, Tsoukas et al. (eds.), Oxford University Press, 2003, p. 460; Also in Lewin, K. (1951). Field theory in social science: Selected theoretical papers (D. Cartwright, Ed.). New York, NY: Harper & Row

Norman, D. (2013). The design of everyday things: Revised and expanded edition. Basic Books.

Rogers, Y. (2012). HCI theory: classical, modern, and contemporary. Synthesis Lectures on Human-Centered Informatics, 5(2), 1-129.