A very close subpixel image of a mouse cursor.
What is an interface?
Chapter 2

A Theory of Interfaces

by Amy J. Ko

First  history  and now theory? What a way to start a practical book about user interfaces. But as social psychologist Kurt Lewin said, “There’s nothing as practical as a good theory” 6 .

Let’s start with  why  theories are practical. Theories, in essence, are  explanations  for what something is and how something works. These explanatory models of phenomena in the world help us not only comprehend the phenomena, but also predict what will happen in the future with some confidence, and perhaps more importantly, they give us a conceptual vocabulary to exchange ideas about the phenomena. A good theory about user interfaces would be practical because it would help explain what user interfaces are, and what governs whether they are usable, learnable, efficient, error-prone, etc.

HCI researchers have written a lot about theory, including theories from other disciplines, and new theories specific to HCI 5,9 . There are theories of activities, conceptual models, design processes, experiences, information seeking, work, symbols, communication, and countless other aspects of people’s interaction with computers. Most of these are theories about people and their behavior in situations where computers are critical, but the interaction with computers is not the primary focus. The theory I want to share, however, is about user interfaces at the center, and the human world at the periphery.

Code showing a function named digital_best_reviews_posts
Interfaces are elaborate facades for functions in code.

Let’s begin with what user interfaces  are : software and/or hardware that map human activities in the physical world (clicks, presses, taps, speech, etc.) to functions defined in a computer program, bridging the world of human action and the world of computer action. The first world is the natural world of matter, motion, sensing, action, reaction, and cognition, in which people (and other living things) build models of the things around them in order to make predictions about the effects of their actions. It is a world of stimulus, response, and adaptation. For example, when an infant is learning to walk, they’re constantly engaged in perceiving the world, taking a step, and experiencing things like gravity and pain, all of which refine a model of locomotion that prevents pain and helps achieve other objectives. Our human ability to model the world and then reshape it using those models is what makes us so adaptive to environmental change. It’s the learning we do about the world that allows us to survive and thrive in it. Computers, and the interfaces we use to operate them, are among the things humans adapt to.

The other world is the world of computing. This is a world ultimately defined by a small set of arithmetic operations such as adding, multiplying, and dividing, and an even smaller set of operations that control which operations happen next. These instructions, combined with data storage, input devices to acquire data, and output devices to share the results of computation, define a world in which there is only forward motion. The computer always does what the next instruction says to do, whether that’s reading input, writing output, computing something, or making a decision. Sitting atop these instructions are  functions , which take some input (e.g., numbers, text, or other data) and produce some output using algorithms. Functions can be as simple as basic arithmetic (e.g.,  multiply , which takes two numbers and computes their product) or as complex as machine vision (e.g.,  object recognition , which takes an image and a machine-learned classifier trained on millions of images and produces a set of text descriptions of objects in the image). Essentially all computer behavior leverages this idea of a function: all computer programs (and all software) are essentially collections of functions that humans can invoke to compute things and have effects on data. All of this functional behavior is fundamentally deterministic; it is data from the world (content, clocks, sensors, network traffic, etc.), and increasingly, data that models the world and the people in it, that gives it its unpredictable, sometimes magical or intelligent qualities.
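
To make the idea of a function concrete, here is a minimal sketch in TypeScript; the function names and behavior are illustrative, not drawn from any particular program:

    // A simple function: maps two numbers to their product.
    function multiply(a: number, b: number): number {
      return a * b;
    }

    // A slightly richer function: maps a string of text to a count of its words.
    // Both are deterministic: the same input always produces the same output.
    function wordCount(text: string): number {
      return text.split(/\s+/).filter(word => word.length > 0).length;
    }

    multiply(6, 7);                        // 42
    wordCount("a theory of interfaces");   // 4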

Now, both of the above are basic theories of people and computers. In light of this, what are user interfaces? User interfaces are mappings from the sensory, cognitive, and social human world to computational functions in computer programs. For example, a save button in a user interface is nothing more than an elaborate way of mapping a person’s physical mouse click or tap on a touch screen to a command to execute a function that will take some data in memory and permanently store it somewhere. Similarly, a browser window is an elaborate way of taking the carefully rendered pixel-by-pixel layout of a web page and displaying it on a computer screen so that sighted people can perceive it with their human perceptual systems. These mappings from physical to digital and from digital to physical are how we talk to computers.
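
As a rough sketch of that mapping, consider the save button example in code. The names here ( saveDocument , the button label) are hypothetical, but the structure is the essence of every graphical interface: an input event bound to a function call.

    // A function that permanently stores data (simulated here with a log).
    function saveDocument(contents: string): void {
      console.log(`Saved ${contents.length} characters`);
    }

    let currentDocumentText = "Chapter 2: A Theory of Interfaces";

    // The user interface: a button whose only job is to map a physical click
    // (or a tap, or a keyboard activation) onto a call to the function above.
    const saveButton = document.createElement("button");
    saveButton.textContent = "Save";
    saveButton.addEventListener("click", () => saveDocument(currentDocumentText));
    document.body.appendChild(saveButton);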

If we didn’t have user interfaces to execute functions in computer programs, it would still be possible to use computers. We’d just have to program them using programming languages (which, as we discuss later in  Programming Interfaces , can be hard to learn). What user interfaces offer are more  learnable  representations of these functions, their inputs, their algorithms, and their outputs, so that a person can build mental models of these representations that allow them to sense, take action, and achieve their goals, as they do with anything in the natural world. However, there is nothing natural about user interfaces: buttons, scrolling, windows, pages, and other interface metaphors are all artificial, invented ideas, designed to help people use computers without having to program them. While this makes using computers easier, interfaces must still be learned.

A screenshot of the Microsoft Excel toolbar
Toolbars pose massive gulfs of execution: which command will help achieve my goal?

Don Norman, in his book  The Design of Everyday Things 8 , does a nice job giving labels to some of the challenges that arise in this learning. One of his first big ideas is the  gulf of execution . Gulfs of execution are gaps between the user’s goal and the input they have to provide to achieve it. To illustrate, think back to the first time you used a voice user interface, such as a personal assistant on your smartphone or an automated voice interface on a phone. In that moment, you experienced a gulf of execution: what did you have to say to the interface to achieve your goal? What concepts did you have to learn to understand that there was a command space and a syntax for each command? You didn’t have the concepts to even imagine those actions. Someone or something had to “bridge” that gulf of execution, teaching you some of the actions that were possible and how those actions mapped onto your goals. (They probably also had to teach you the goals, but that’s less about user interfaces and more about culture.) Learning is therefore at the heart of both using user interfaces and designing them 3 .

What is it that people need to learn to take action on an interface? Norman (and later Gaver 2 , Hartson 4 , and Norman again 8 ) argued that what we’re really learning is  affordances : the potential for action in an interface, ultimately defined by what underlying functionality it has been designed and engineered to support. An affordance is a relationship between a person and a property of an interface: what can be done to it in order to produce some effect. For example, a physical computer mouse can be clicked, which allows information to be communicated to a computer. A digital personal assistant like Amazon’s Alexa can be invoked by saying “Alexa.” However, these are just properties of the mouse and of Alexa; affordances arise when a person recognizes those opportunities and knows how to act upon them.

A photograph of an Alexa smart speaker on a table.
Smart speakers have a myriad of affordances, but very few signifiers.

How can a person know what affordances an interface has? That’s where the concept of a  signifier  becomes important. Signifiers are any sensory or cognitive indicator of the presence of an affordance. Consider, for example, how you know that a computer mouse can be clicked. Its physical shape might evoke the industrial design of a button. It might have little tangible surfaces that entreat you to push your finger on them. A mouse could even have visual sensory signifiers, like a slowly changing colored surface that attempts to say, “ I’m interactive, try touching me. ” These are mostly sensory indicators of an affordance. Personal digital assistants like Alexa, in contrast, lack most of these signifiers. What about an Amazon Echo says, “ You can say Alexa and speak a command ”? In this case, Amazon relies on tutorials, stickers, and even television commercials to signify this affordance.

While both of these examples involve hardware, the same concepts of affordance and signifier apply to software too. Buttons in a graphical user interface have an affordance: if you click within their rectangular boundary, the computer will execute a command. If you have sight, you know a button is clickable because long ago you learned that buttons have particular visual properties such as a rectangular shape and a label. If you are blind, you might know a button is clickable because your screen reader announces that something is a “button” and reads its label. All of this requires knowing that interfaces have affordances such as buttons, and that those affordances are signified by particular visual or auditory motifs. Therefore, a central challenge of designing a user interface is deciding what affordances an interface will have and how to signify that they exist.
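
Here is a small sketch, in TypeScript for a web page, of how a software button carries both kinds of signifiers described above; the “Print” label and behavior are illustrative only:

    // The button's shape and label are visual signifiers of its click affordance.
    const printButton = document.createElement("button");
    printButton.textContent = "Print";

    // Screen readers announce the element's role ("button") and its accessible
    // name, signifying the same affordance non-visually.
    printButton.setAttribute("aria-label", "Print document");

    // The affordance itself: a click within the button's boundary executes a command.
    printButton.addEventListener("click", () => {
      console.log("Sending document to the printer");
    });
    document.body.appendChild(printButton);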

A screenshot of an error message that says “Collection was mutated while being enumerated
Error messages can pose great gulfs of evaluation: what do they mean for a user’s goal and what should a user do next? This particular error message was meant for a software developer, not a user.

But  execution  is only half of the story. Norman also discussed  gulfs of evaluation , which are the gaps between the output of a user interface and a user’s goal. Once a person has performed some action on a user interface via some functional affordance, the computer will take that input and do something with it. It’s up to the user to then map that feedback onto their goal. If that mapping is simple and direct, then the gulf is small. For example, consider an interface for printing a document. If after pressing a print button, the feedback was “ Your document was sent to the printer for printing, ” that would clearly convey progress toward the user’s goal, minimizing the gulf between the output and the goal. If after pressing the print button, the feedback was “ Job 114 spooled, ” the gulf is larger, forcing the user to know what a “ job ” is, what “ spooling ” is, and what any of that has to do with printing their document.
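
The contrast between those two messages can be sketched in code. The type and function names here are hypothetical; the point is that the same underlying event can be reported in the system’s terms or in the user’s terms:

    // The same underlying event, reported two ways.
    type PrintEvent = { jobId: number; documentName: string };

    // System-centered feedback: leaks internal jargon and widens the gulf of evaluation.
    function jargonFeedback(event: PrintEvent): string {
      return `Job ${event.jobId} spooled`;
    }

    // Goal-centered feedback: stated in terms of what the user was trying to do.
    function goalFeedback(event: PrintEvent): string {
      return `“${event.documentName}” was sent to the printer for printing`;
    }

    const printEvent = { jobId: 114, documentName: "Chapter 2 draft" };
    jargonFeedback(printEvent);  // "Job 114 spooled"
    goalFeedback(printEvent);    // "“Chapter 2 draft” was sent to the printer for printing"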

In designing user interfaces, there are many ways to bridge gulfs of execution and evaluation. One is to just teach people all of these affordances and help them understand all of the user interface’s feedback. A person might take a class to learn the range of tasks that can be accomplished with a user interface, steps to accomplish those tasks, concepts required to understand those steps, and deeper models of the interface that can help them potentially devise their own procedures for accomplishing goals. Alternatively, a person can read tutorials, tooltips, help, and other content, each taking the place of a human teacher, approximating the same kinds of instruction a person might give. There are entire disciplines of technical writing and information experience that focus on providing seamless, informative, and encouraging introductions to how to use a user interface 7 .

Two screenshots of a hamburger menu, before and after it was expanded.
The hamburger menu rapidly became a convention for collapsing large menus with the advent of small-screen mobile devices.

To many user interface designers, the need to explicitly teach a user interface is a sign of design failure. There is a belief that designs should be “self-explanatory” or “intuitive.” What these phrases actually mean is that the  interface  is doing the teaching rather than a person or some documentation. To bridge gulfs of  execution , a user interface designer might, for example, conceive of physical, cognitive, and sensory affordances that are quickly learnable. One way to make them quickly learnable is to leverage  conventions : widely used and widely learned user interface design patterns (e.g., a web form, a login screen, a hamburger menu on a mobile website) that people have already learned by learning other user interfaces. Want to make it easy to learn how to add an item to a cart on an e-commerce website? Use a button labeled “ Add to cart, ” a design convention that most people will have already learned from using other e-commerce sites. Alternatively, interfaces might even try to anticipate what people want to do, personalizing what’s available, and in doing so, minimizing how much a person has to learn. From a design perspective, there’s nothing inherently wrong with learning; it’s just a cost that a designer may or may not want to impose on a new user. (Perhaps learning a new interface affords new power not possible with old conventions, and so the cost is justified.)

To bridge gulfs of  evaluation , a user interface needs to provide  feedback  that explains what effect the person’s action had on the computer. Some feedback is  explicit instruction  that essentially teaches the person what functional affordances exist, what their effects are, what their limitations are, how to invoke them in other ways, what new functional affordances are now available and where to find them, what to do if the effect of the action wasn’t desired, and so on. Clicking the “ Add to cart ” button, for example, might result in some instructive feedback like this:

I added this item to your cart. You can look at your cart over there. If you didn’t mean to add that to your cart, you can take it out like this. If you’re ready to buy everything in your cart, you can go here. Don’t know what a cart is? Read this documentation.

Some feedback is implicit, suggesting the effect of an action but not explaining it explicitly. For example, after pressing “ Add to cart ,” there might be an abstract icon of an item being added to a cart, with some kind of animation to capture the person’s attention. Whether implicit or explicit, all of this feedback is still contextual instruction on the specific effect of the user’s input (and optionally more general instruction about other affordances in the user interface that will help a user accomplish their goal).
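
A sketch of both kinds of feedback, again with hypothetical names and a web page assumed, might look like this:

    // Hypothetical sketch: explicit and implicit feedback for the same action.
    const cart: string[] = [];

    function addToCart(itemName: string): void {
      cart.push(itemName);

      // Explicit feedback: instructive text that explains the effect and next steps.
      const message = document.createElement("p");
      message.textContent =
        `Added “${itemName}” to your cart. Open your cart to review it, or remove the item if you didn’t mean to add it.`;
      document.body.appendChild(message);

      // Implicit feedback: a badge count that merely suggests the effect.
      const badge = document.getElementById("cart-badge");
      if (badge) {
        badge.textContent = String(cart.length);
      }
    }

    addToCart("Soft facial tissues");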

The result of all of this learning is a  mental model 1  in a person’s head of what inputs are possible, what outputs those result in, and how all of those inputs and outputs are related to various goals that person might have. However, because human learning is imperfect, and because the people, documentation, and contextual help provided to teach an interface are imperfect, people’s mental models of user interfaces are nearly always imperfect. People end up with brittle, fragile, and partially correct predictive models of what effect their actions in user interfaces will have, and this results in unintended effects and confusion. If there’s no one around to correct the person’s mental model, or the user interface itself isn’t a very good teacher, the person will fail to learn, and get confused and probably frustrated (in the same way you might fumble to open a door with a confusing handle). These episodes of user interface failure, which we can also describe as  breakdowns , are signs that a user’s mental model is inconsistent with the actual behavior of some software system. In HCI, we blame these breakdowns on designers rather than users, and so we try to maximize the learnability and minimize the error-proneness of user interface designs using usability evaluation methods.

Note that in this entire discussion, we’ve said little about tasks or goals. The broader HCI literature theorizes about those extensively 5,9 , and the gist is this: rarely does a person have such a well-defined goal in their head that tasks can be perfectly defined to fit them. For example, imagine you have a seamless, well-defined interaction flow for adding tissues to shopping carts and checking out with those carts, but a person’s goal was vaguely to “ get some tissues that don’t hurt my skin, but also don’t cost much. ” The breakdown may occur long after the person has used your user interface, after a few days of actually using the tissues, finding them uncomfortable, and deciding they were not worth the cost. Getting the low-level details of user interface design to be learnable is one challenge; designing experiences that support vague, shifting, unobservable human goals is an entirely different one.

You have some new concepts about user interfaces, and an underlying theoretical sense about what user interfaces are and why designing them is hard. Let’s recap:

  • User interfaces bridge human goals and cognition to functions defined in computer programs.
  • To use interfaces successfully, people must learn an interface’s affordances and how they can be used to achieve their goals.
  • This learning must come from somewhere, either instruction by people, explanations in documentation, or teaching by the user interface itself.
  • Most user interfaces have large gulfs of execution and evaluation, requiring substantial learning.

The grand challenge of user interface design is therefore trying to conceive of interaction designs that have  small  gulfs of execution and evaluation, while also offering expressiveness, efficiency, power, and other attributes that augment human ability.

These ideas are broadly relevant to all kinds of interfaces, including all those already invented, and new ones invented every day in research. And so these concepts are central to design. Starting from this theoretical view of user interfaces allows you to ask the right questions. For example, rather than trying to vaguely identify the most “intuitive” experience, you can systematically ask: “ Exactly what are our software’s functional affordances, and what signifiers will we design to teach them? ” Or, rather than relying on stereotypes, such as “ older adults will struggle to learn to use computers, ” you can be more precise, saying, “ This particular population of adults has not learned the design conventions of iOS, and so they will need to learn those before successfully utilizing this application’s interface. ”

Approaching user interface design and evaluation from this perspective will help you identify major gaps in your user interface design analytically and systematically, rather than relying only on observations of someone struggling to use an interface, or worse yet, stereotypes or ill-defined concepts such as “intuitive” or “user-friendly.” Of course, in practice, not everyone will know these theories, and other constraints will often prevent you from doing what you know might be right. That tension between theory and practice is inevitable, and something we’ll return to throughout this book.

References

  1. Carroll, J. M., & Anderson, N. S. (1987). Mental models in human-computer interaction: Research issues about what the user of software knows. National Academies Press.

  2. Gaver, W. W. (1991). Technology affordances. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '91), 79-84. ACM Press.

  3. Grossman, T., Fitzmaurice, G., & Attar, R. (2009). A survey of software learnability: Metrics, methodologies and guidelines. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI).

  4. Hartson, R. (2003). Cognitive, physical, sensory, and functional affordances in interaction design. Behaviour & Information Technology.

  5. Jacko, J. A. (Ed.). (2012). Human-computer interaction handbook: Fundamentals, evolving technologies, emerging applications. CRC Press.

  6. Lewin, K. (1943). Theory and practice in the real world. The Oxford Handbook of Organization Theory.

  7. Lior, L. N. (2013). Writing for interaction: Crafting the information experience for web and software apps. Morgan Kaufmann.

  8. Norman, D. (2013). The design of everyday things: Revised and expanded edition. Basic Books.

  9. Rogers, Y. (2012). HCI theory: Classical, modern, contemporary. Synthesis Lectures on Human-Centered Informatics, 5(2), 1-129.