A photograph of a wall of thousands of dusty old books.
Is this a wall of books, data, information, or knowledge?
Chapter 3 Foundations

Data, information, knowledge

by Amy J. Ko

When I was a child in the 1980’s, there was no internet, there were no mobile devices, and computers were only just beginning to reach the wealthiest of homes. My experience with information was therefore decidedly analog. I remember my three primary sources of information fondly. First, a few times a month, my mother would take my brother and I to our local public library, and we would browse, find a pile of books that captured our attention, and then simply sit, in silence, and read together for hours. Eventually, we would get hungry, and we would check out a dozen books, and then devour them at home together for the next few weeks, repeating the cycle again. My second source was the newspaper. Every morning, my father would leave early in the morning to get a donut and coffee, and go to the local newspaper rack on the street to buy a copy of The Oregonian for a nickel, a dime, or a quarter. Sometimes I would join him and get a donut myself, and then we would come home, eating donuts together while he read the news and I read the comics. My third source was magazines. In particular, I subscribed to  3-2-1 Contact , a science, technology, and math education magazine that accompanied the 1980 broadcast television show. The monthly magazine came with fiction, non-fiction, math puzzles, and even reader submissions of computer programs written in the BASIC programming language—type them in and see what happens! I would run out to our mailbox every morning near the beginning of the month to see if the latest issue had come. And when it did, it consumed the next week of my free time.

Of course, this analog world was quickly replaced with digital. Forty years later, the embodied joy of reading books, news, and magazines with my family, and the wonderful anticipation that came with having to wait for information, was replaced with speed. I still read books, but I click a button to get them on my tablet instantaneously. I still read the news, but I scroll through a personalized news feed at breakfast, with little sense of shared experience with family. And I still read magazines, but on a tiny smartphone screen, whenever I want, which is rarely. Instead, I fall mindlessly into the infinite YouTube rabbit hole, with no real sense of continuity, anticipation, or wonder. Computer science imagined a world in which we could get whatever information we want, whenever we want, and then realized it over the past forty years. And while the words I can find in these new media are the same kind as those forty years ago, somehow, the experience of this information just isn’t the same.

This change in media begs an important question: what  is  information? We can certainly name many things that seem to contain it: news, books, and magazines, like above, but also movies, speeches, music, data, talking, writing, and perhaps even non-verbal things like sign language, dancing, facial expressions, and posture. Or are these just containers for information and the information itself is things like words, images, symbols, and bits? And what about things in nature, like DNA? Information seems to be everywhere—in nature, in our brains, in our language, and in our technology—but can something that seems to be in everything be a useful idea?

A picture of a man reading a newspaper
Is a newspaper information?

This is a question that Michael Buckland grappled with in 1991 2 2

Michael K. Buckland (1991). Information as thing. Journal of the American Society for information Science.

. In his article, he notes the long struggle to define what information is, discovering many competing ideas.

  • One idea was as information as a  process , in which a person becomes informed and their knowledge changes. This would suggest that information is not some object in the world, but rather some event that occurs in the world, in the interaction between things. The challenge with this notion of information is that process is situational and contextual: the door in my office, to me, might not be informational at all, it might just play the role of keeping heat inside. But to someone else, the door being closed might be informational, signaling my availability. From a process perspective, the door itself is not information, but particular people in particular situations may glean different information from the door and its relation to other social context about its meaning. If information is process, then  anything  can be information, and that doesn’t really help us define what information is.
  • Another idea that Buckland explored was information as  knowledge . This notion of information makes it intangible, as knowledge, belief, opinion, and ideas are personal, subjective, and stored in the mind. The only way to access them is for that knowledge to be communicated in some way, through speech, writing, or other signal. For example, I know what it feels like to be bored, but communicating that feeling requires some kind of translation of that feeling into some media (e.g., me posting on Twitter, “I’m bored.”).
  • The last idea that Buckland explored was information as  thing . Here, the idea was that information is different from knowledge and process in that it is tangible, observable, and physical. It can be stored and retrieved. The implication of this view is that we can only interact with information through things, and so information might as well just be the things themselves: the books, the magazines, the websites, the spreadsheet, and so on.

Buckland was not the only one to examine what information might be. Another notable perspective came from Gregory Bateson in his work  Form, Substance, and Difference 1 1

Gregory Bateson (1970). Form, substance and difference. Essential Readings in Biosemiotics.

, in which he wrote in reference to physical energy:

What we mean by information - the elementary unit of information - is a difference which makes a difference, and it is able to make a difference because the neural pathways along which it travels and is continuously transformed are themselves provided with energy... But what is a difference? A difference is a very peculiar and obscure concept. It is certainly not a thing or an event. This piece of paper is different from the wood of this lectern. There are many differences between them-of color, texture, shape, etc. But if we start to ask about the localization of those differences, we get into trouble. Obviously the difference between the paper and the wood is not in the paper; it is obviously not in the wood; it is obviously not in the space between them, and it is obviously not in the time between them.

Bateson 1 1

Gregory Bateson (1970). Form, substance and difference. Essential Readings in Biosemiotics.

Bateson’s somewhat obtuse thought experiment is a slightly different idea than Buckland’s  process knowledge , and  thing  notions of information: it imagines the world as full of noticeable  differences , and that those differences are not in the things themselves, but in their relationships to each other. For example, in DNA, it is not the cytosine, guanine, adenine, or thymine themselves that encode proteins, but the ability of cells to distinguish between them. Or, to return to the earlier example of my office door, it is not the door itself that conveys information, but the the fact that the door maybe open, closed, slightly ajar—those differences, and knowledge of them, are what allow for the door to convey information. Whether the “difference” encoded in the sequence of letters is conveyed by in a print magazine or a digital one, the differences are conveyed nonetheless, suggesting that information is less about medium than it is the ability of a “perceiver” to notice differences in that medium.

In 1948, well before Buckland and Bateson were theorizing about information conceptually, Claude Shannon was trying to understand information from an engineering perspective. Working in signal processing at Bell Labs, he was trying to find ways of transmitting telephone calls more reliably. In his seminal work, 8 8

Claude E. Shannon (1948). A mathematical theory of communication. The Bell System Technical Journal.

 he linked information to the concept of entropy from thermodynamics. Entropy is a measurable physical property of a state of disorder, randomness, or uncertainty. For example, if you tossed a coins four times and every toss came up heads this situation would have low entropy, conceptually, as there’s not any disorder in the results. In contrast, if the tosses came up half heads and half tails this would have high entropy, with no apparent pattern and maximum disorder. Shannon’s view of information was thus as an amount of information, measured by the disorder of the results.

Another way to think about Shannon’s entropic idea of information is through probability: in the first example there’s only one way toss four heads so the probability is low. In contrast, in the second sequence, you have a high probability of getting half heads and half tails and could get those flips in many different ways: two heads then two tails, or one head followed by two tails followed by one head, and so on. The implication of these ideas is that the more rare “events” or “observations” in some phenomenon, the more information there is.

A third way to think about Shannon’s idea was that the the amount of information in anything is inversely related to is  compressability . For example, is there a shorter way to say “1111111111”? We might say “ten 1’s”. But is there a shorter way to say “1856296289”? As a prime number, no. Shannon took this notion of compressibility to the extreme, observed that a fundamental unit of difference might be called a  bit : either something is or isn’t, 1 or 0, true or false. He postulated that all information might be encoded as bit sequences and that the more compressible a bit sequence was, the less information content it has. This idea, of course, went on to shape not only telecommunications, but mathematics, statistics, computing, and biology, and enabled the modern digital world we have today.

A photograph of the reading room in the University of Washington’s Suzzallo library, showing tables, stacks, and an arched ceiling
Is a library a place of data, information, or knowledge?

The nuance, variety, and inconsistency of all of these ideas bothered some scholars, who struggled to reconcile these definitions. Charles Meadow and Weijing Yuan tried to create some order on these concepts in their 1997 paper,  Measuring the impact of information: defining the concepts 5 5

Charles Meadow, Weijing Yuan (1997). Measuring the impact of information: defining the concepts. Information Processing & Management.

. Building upon dozens of prior attempts to define information and related concepts, including those described above, they proposed the following:

  • Datadata: Analog or digital symbols that someone might perceive in the world and ascribe meaning.  are a set of “symbols”, broadly construed to include any form of perceptible difference in the world. In this definition, the individual symbols have  potential  for meaning, but that they may or may not be meaningful or parsable to a recipient. For example, the binary sequence 00000001 is indeed a set of symbols, but you as the reader do not know if they encode the decimal number 1, or some message about cats, encoded by youth inventing their own secret code. A hand gesture with five fingers stretched out in a plane might mean someone is stretching their fingers or a non-verbal signal meant to get someone’s attention. In the same way, this entire chapter is data, in that it is a sequence of symbols that likely has much meaning to those fluent in English, but very little meaning to those not. Data, as envisioned by Shannon, is an abstract message, which may or may not have informational content.
  • Informationinformation: The process of receiving, perceiving, and interpreting data into knowledge. , in Meadow and Yuan’s definition, is realization of the informational potential of data: it is the process of receiving, perceiving, and translating data into knowledge. The distinction from data, therefore, is a subtle one. Consider this bullet point, for example. The  data  is the sequence of Roman characters, stored on a web server, delivered to your computer, and rendered by your web browser. Thus far, all of this is data, being transmitted by a computer, and translated into different data representations. The  process  of you reading this English-encoded data, and comprehending the meaning of the words and sentences that it encodes, is information. Someone else might read it and experience different information.
  • Knowledgeknowledge: An interconnected system of information in a mind. , in contrast to information, is what comes after the process of perceiving and interpreting data. It is the accumulation of information received by a particular entity. The authors do not get into whether that entity must be human—can cats have knowledge, or even single-cell organisms, or even non-living artifacts?—but this idea is enough to distinguish information from knowledge.

Some scholars have extended this framework to also include  wisdom 7 7

Jennifer Rowley (2007). The wisdom hierarchy: representations of the DIKW hierarchy. Journal of Information and Communication.

, suggesting that wisdom is somehow constructed out of knowledge, and concerns at underlying questions of “why is” and “why do” 9 9

Milan Zeleny (2005). Human Systems Management: Integrating Knowledge, Management and Systems. World Scientific.

. Others have argued that this extension is unnecessary, as those are just other forms of knowledge. With or without wisdom, most scholars have settled on these three concepts broadly, with continued debate about their nuances.

One challenge with all of these conception of data, information, and knowledge, is the broader field of  epistemologyepistemology: The study of how we know we know things. , which is a branch of philosophy concerned with  how  we know that know things. For example, one epistemological position called  logical positivism  is that we know things through logic, such as formal reasoning or mathematical proofs. Another stance called  positivism , otherwise broadly known as empiricism, and widely used in the sciences, argues that we know things through a combination of observation and logic.  Postpositivism  takes the same position as positivism, but argues that there is inherent subjectivity and bias in sensory experience and reasoning, and only by recognizing our biases can we maintain objectivity.  Interpretivism  largely abandons claims of objectivity, arguing that all knowledge involves human subjectivity, and instead frames knowledge as subjective meaning. These and the many other perspectives on what knowledge is complicate simple classifications of data, information, and knowledge, because they question what it means to even know something.

A photograph of a field with a gazelle in the back and a giraffe in the front, blurred
Context conveys meaning to information.

These many works that attempt to define information largely stem from mathematical, statistical, and organizational traditions, and have sought formal, abstract definitions amenable for science, technology, and engineering. However, other perspectives on information challenge these ideas, or at least complicate simplistic notions of “messages”, “recipients”, and “encoding”. For example, consider the work of behavioral economists 4 4

Jennifer Lerner (2015). Emotion and decision making. Annual Review of Psychology.

, which has found that people do not simply receive and interpret information and translate it into knowledge. Instead,  emotions  interact with and mediate our interpretation of data, influencing the information we receive, strongly shaping our knowledge, and therefore decisions. This suggests that information, far from being an objective process of receiving and interpreting data, is partly a subjective emotional process, shaped by the many emotional and cognitive biases that humans find hard to overcome.

An example of such bias is  confirmation bias 6 6

Raymond Nickerson (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology.

, which is the tendency for people to look for, interpret, and remember information in a way that supports their prior beliefs and their values. This bias is often unconscious, and leads people to ignore or discount information that might change their beliefs, and interpret ambiguous information in ways that supports their beliefs. Especially for matters that are emotionally charged, confirmation bias can be amplified in social media, where it is not simply the person curating what information they attend to and how they interpret it, but also algorithms. If information is indeed a process of interpreting data through an emotional lens, then it is an inherently biased one.

A second aspect of information often overlooked by mathematical, symbolic definitions of information is  contextcontext: Social, situational information that shapes the meaning of information being received. . This idea of context is implied in Bateson’s notion of difference, Buckland’s notion of information being a  thing  in a context, and implicit in Meadow and Yuan’s formulation in the recipient’s perception of information. But in all of these definitions, and in the work on the role of emotions in decisions, context appears to play a powerful role in shaping what information means, perhaps even more powerful than whatever data is encoded, or what emotions are at play in interpreting data. For example, in 1945, novelist and cultural critical Michael Ventura, said:

Without context, a piece of information is just a dot. It floats in your brain with a lot of other dots and doesn’t mean a damn thing. Knowledge is information-in-context … connecting the dots.

Michael Ventura

To illustrate his point, consider, for example, this sequence of statements, which reveals progressively more context about the information presented in the first statement.

  • I have been to war.
  • In that war, I have killed many people.
  • Sometimes, killing in that war brought me joy and laughter.
  • The war was a game called  Call of Duty Black Ops: Cold War .
  • The game was designed by Treyarch and Raven Software.
  • I play it with my friends.

The first statement is still true in a way, but each of the other pieces of information fundamentally changed your perception of the meaning of the prior statements. Even simple examples like this demonstrate that while we may be able to objectively encode messages with symbols, and transmit them reliably, these mathematical notions of information fail to account for the meaning of information, and how it can change in the presence of our emotions and other information that arrives later. Defining information simply as data perceived and understood is therefore overly reductive, hiding the complexity of human perception, cognition, identity, and culture.

It also hides the complexity of context. Consider, for example, the many kinds of context that can shape the meaning of information:

  • How was the information created?  What process was followed? Was it verified by someone credible? Is it true? These questions fundamentally shape the meaning of information, and yet are rarely visible in information itself (with the exception of academic research, which has a practice of thoroughly describing the methods by which information was produced), and journalism, which often follows ethical standards for acquiring information, and sometimes reveals sources.
  • When was the information created?  A news story that was released 5 years ago does not have the same meaning that it does now; our knowledge of the future that occurred after it was published changes how we see its events, and how we interpret their meaning. And yet, we often do not pay attention to when news was reported, when a Wikipedia page was written, when a scientific study was published, or when someone wrote a tweet. Information is created in a temporal context that shapes its meaning.
  • For whom was the information created?  Messages have intended recipients and audiences, with whom an author has shared knowledge. This chapter was written for students learning about information; tweets are written for followers; love letters are meant for lovers; cyberbullying text messages are meant for victims. Without knowing for whom the message was created, it is not possible to know the full meaning of information, because one cannot know the shared knowledge of the two parties.
  • Who created the information?  The identity of the person creating the information shapes its meaning as well. For example, when I write “Being transgender can be hard.”, it matters that I am a transgender person saying it. It conveys a certain credibility through lived experience, while also establishing some authority. It also shapes how the message is interpreted, because it conveys personal experience. But if a cisgender person says it, their position in relation to transgender people shapes its meaning: are they an ally expressing solidarity, a mental health expert stating an objective fact, or an uninformed bystander with no particular knowledge of trans people?
  • Why was the information created?  The intent behind information can shape its meaning as well. Consider, for example, when someone posts some form of hate speech on social media. Did they post it to get attention? To convey an opinion? To connect with like-minded people? To cause harm? As a joke? These different motives shape how others might interpret the message. That this context is often missing from social media posts is why short messages so often lead to confusion, misinterpretation, and outrage.

These many forms of context, and the many others not listed here, show that while some aspects of information may be able to be represented with data, the social, emotional, cultural, and political context of how that data is received can shape information as well. Therefore, as D’Ignazio and Klein argued in  Data Feminism , “the numbers don’t speak for themselves” 3 3

Catherine D'Ignazio, Lauren F. Klein (2020). Data Feminism. MIT Press.

: no information is neutral; all information reflects the social processes that created them, including their values, beliefs, and biases. Context is how we identify those values, beliefs, and biases.


Returning to my experiences as a child, the diverse notions of information above reveal a few things. First, while the data contained in the books, news, and magazines of my youth might not be different kind from that in my adulthood, the information  is  different. The social context in which I experience it changes what I take from it, my motivation to seek it has changed, and my ability to understand how it was created, by whom, for what, and when has been transformed by computing. Thus, while the “data” behind information has changed little over time, information itself has changed considerably as media, and the contexts in which we create and consume it, have changed in form and function. And if we are to believe the formulations above that relate information to knowledge, then the knowledge I gain from books, news, and magazines has almost certainly changed too. What implications this has on our individual and collective worlds is something we have yet to fully understand.

Want to learn more about the importance of context in information? Consider these podcasts:

References

  1. Gregory Bateson (1970). Form, substance and difference. Essential Readings in Biosemiotics.

  2. Michael K. Buckland (1991). Information as thing. Journal of the American Society for information Science.

  3. Catherine D'Ignazio, Lauren F. Klein (2020). Data Feminism. MIT Press.

  4. Jennifer Lerner (2015). Emotion and decision making. Annual Review of Psychology.

  5. Charles Meadow, Weijing Yuan (1997). Measuring the impact of information: defining the concepts. Information Processing & Management.

  6. Raymond Nickerson (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology.

  7. Jennifer Rowley (2007). The wisdom hierarchy: representations of the DIKW hierarchy. Journal of Information and Communication.

  8. Claude E. Shannon (1948). A mathematical theory of communication. The Bell System Technical Journal.

  9. Milan Zeleny (2005). Human Systems Management: Integrating Knowledge, Management and Systems. World Scientific.