## Shannon’s Theory of Information

### Information is Not Meaning. It Symbolizes Meaning.

What does it mean to have a theory of information? How can information be studied, categorized or quantified? How can we measure the information in a Shakespeare play or a sad message you receive about the death of a close friend? The answer starts with realizing that information is not the same as meaning. Where meaning is something experienced by a sender or receiver of information, information itself is something more basic. In the realm of human experience, information is to meaning as the motion of objects is to classical ballet. When humans communicate information they do so symbolically, whether by sound, the written word, gestures, and even art or music. The symbols themselves don’t actually contain meaning but rather select meaning out of a shared context between the sender and the receiver. (I recommend reading my previous article on Information and Meaning if you need more clarification).

It was this realization that led mathematician Claude Shannon to develop an entire theory of information. Shannon was working for Bell Laboratories in the mid-20th century on a project to solve an engineering problem about reliable communications on noisy telephone lines. He not only solved that problem, but in the process founded the modern science of information theory, on which we have based all our modern communications and computing technology. But Shannon’s theory reaches beyond engineering, going deep into quantifying information as something as fundamental to the universe as energy or matter. It would not be wrong to think of Claude Shannon as the “Isaac Newton” of information. Where Newton developed a “System of the World” for motion throughout the universe, Shannon did the same for information.

### Information Reduces Uncertainty By Answering a Question

Shannon’s theory is based on just a few simple concepts. First, the function of information is to inform. Shannon says that we are informed to the degree that our uncertainty about something is reduced. Uncertainty is not knowing the answer to a question. Information reduces uncertainty by answering a question. Think of uncertainty as a hole, and of information as that which reduces the size of that hole.

### Uncertainty is Related to the Number of Possible Answers to a Question.

Complex questions can be broken down further into simpler sub-questions until you reach the point where the simplest questions have only a certain number of possible answers. Let’s call that number N. For example, suppose you wanted to know the exact date that the first shot was fired in the Civil War. That question can be broken down into separate questions about the day of the week, the month, the day of the month, and the year. The number of possible answers for the day-of-week question is seven, so in this case N = 7. Suppose you have only a vague idea of when the war began. Let’s say that for the day-of-week question, you are completely clueless. Although there are only seven possible answers, they are all equally probable as far as you are concerned. This represents the maximum uncertainty for this question, where the number of possible answers is N = 7.

Although you are maximally uncertain about both the day of the week and the month the shot was fired, Shannon would say that the uncertainty about the month is higher than the uncertainty about the day of the week, because any particular month has only a one in twelve chance of being the answer, whereas any particular day of the week has a one in seven chance. Where all answers are equally probable, the question with the highest N has the highest uncertainty.[1]
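To make the comparison concrete, here is a minimal Python sketch that assumes every answer to a question is equally likely. The function name `chance_of_each_answer` and the dictionary of questions are illustrative choices, not part of Shannon's own presentation.

```python
from fractions import Fraction

def chance_of_each_answer(n):
    """Probability of any single answer when all n answers are equally likely."""
    if n < 1:
        raise ValueError("a question needs at least one possible answer")
    return Fraction(1, n)

# Questions from the text, each with its number of possible answers N.
questions = {"day or night": 2, "day of the week": 7, "month": 12}

for name, n in questions.items():
    # A larger N means a smaller chance for each answer, hence more uncertainty.
    print(f"{name}: N = {n}, chance of any one answer = {chance_of_each_answer(n)}")
```

The larger N is, the smaller the chance that any one answer is the right one, which is the sense in which the month question carries more uncertainty than the day-of-week question.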

### The Possible Answers Are Represented by a Set of Symbols That Make Up a Code.

Shannon says that if each question has a particular number of possible answers (N), the answers to a particular question can be represented by a set of N unique symbols. It doesn’t matter what the symbols are as long as there are N symbols that can be distinguished from each other by both the sender and the receiver of information. The sender needs to be able to select and send a symbol that the receiver can associate with one of the possible answers. The set of N symbols that represent the answers to a question is called a Code.[2]

### The Smallest Code With Information has Two Symbols

Consider a question that has only one answer, where N = 1. Your uncertainty about that answer is zero, so a message containing the one symbol in the code will not reduce your uncertainty. A question with two answers, however, has the possibility of uncertainty, such as the daytime-nighttime question above. Any information-carrying code has to have at least two symbols.

### Information Can Be Measured in Bits

You can represent the symbols in a code by numbering them from 0 to N-1. The Code for [Daytime, Nighttime] can be represented by [0, 1], for example. If you encoded that in binary, you would need only one bit, since one bit can be either 0 or 1. The answer to that question can be sent with one binary bit set to either 0 or 1. A question that has more possible answers needs more bits. A Code for the day of the week needs seven symbols that could be numbered 0 through 6. That range of numbers requires 3 bits, since two bits can number only four symbols.[3] The general rule is that where the number of symbols in the code is N, the number of bits is I = log2(N), the log base 2 of N.[4] For N = 7 this gives about 2.81, which rounds up to 3 whole bits. When all symbols are equally probable, this is the average amount of information in a symbol in a Code of N symbols.
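The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration under the assumption that all symbols are equally probable; the helper names (`bits_per_symbol`, `whole_bits_needed`, `encode`) are made up for this example.

```python
import math

DAY_OR_NIGHT = ["daytime", "nighttime"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def bits_per_symbol(n):
    """Information in one symbol of an n-symbol code, assuming equal probabilities."""
    return math.log2(n)

def whole_bits_needed(n):
    """Smallest number of whole binary digits that can number n symbols (0 to n-1)."""
    return (n - 1).bit_length()

def encode(symbol, code):
    """Number the symbols 0..N-1 and return the chosen one as a fixed-width bit string."""
    width = whole_bits_needed(len(code))
    if width == 0:
        return ""  # a one-symbol code carries no information, so no bits are sent
    return format(code.index(symbol), "b").zfill(width)

print(encode("nighttime", DAY_OR_NIGHT))   # 1
print(encode("Wed", DAYS))                 # 010
print(round(bits_per_symbol(len(DAYS)), 2), whole_bits_needed(len(DAYS)))  # 2.81 3
```

Note that a code with a single symbol needs zero bits here, which matches the earlier point that the smallest code able to carry information has two symbols.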

### The Possible Answers Are Not Always Equally Probable

In some cases the receiver might not consider all possible answers as equally probable. Consider a weather application that is telling you the current temperature. Since you are fully aware of the season or can see that it is snowing outside, not all possible temperature messages from the weather service are equally probable to you; in this particular case, temperatures near freezing are far more probable than a temperature of 90 degrees F. A weather application could exploit this by knowing the average daily temperature for each day of the year, averaged over the last five years, and sending only the difference from that average from the service to your phone. The code could use fewer bits for the smaller, more likely differences, coming closer to an “optimum” code.

This is much more than an engineering exercise that saves you money on your phone’s data plan. It actually represents the amount of information carried by each symbol in the Code. Since your expectation increases for values closer to the average daily temperature, receiving one of those values has less influence on your already low uncertainty. This is consistent with what was stated above, that the amount of information carried by a symbol in a Code is directly related to how much uncertainty it reduces in the receiver.
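The article does not spell out a formula for the unequal case, so the following is a sketch using the standard measures from information theory: an answer of probability p carries log2(1/p) bits, and the average over all answers is the entropy. The temperature bands and their probabilities are invented purely for illustration.

```python
import math

def surprisal(p):
    """Information, in bits, gained on receiving an answer of probability p."""
    return math.log2(1 / p)

def entropy(probabilities):
    """Average information per symbol, in bits, for a list of symbol probabilities."""
    return sum(p * surprisal(p) for p in probabilities if p > 0)

# Made-up probabilities for four temperature bands on a snowy day (illustrative only).
snowy_day = {
    "near freezing": 0.70,
    "well below freezing": 0.20,
    "mild": 0.09,
    "90 degrees F": 0.01,
}

for band, p in snowy_day.items():
    print(f"{band}: {surprisal(p):.2f} bits")

print("average, snowy-day expectations:", round(entropy(snowy_day.values()), 2))
print("average, all four equally likely:", entropy([0.25] * 4))
```

The expected answer ("near freezing") carries the least information and the surprising one carries the most, and the average falls below the 2 bits that four equally probable answers would require.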

### Not All Symbols Carry Information

Consider two different messages, one saying “Last night’s lowest temperature was 10 degrees F” and the other “Last night’s lowest temperature was a **frosty** 10 degrees F.” The word “frosty” adds no information for a human receiver. If the person already has no uncertainty about what 10 degrees feels like, the word “frosty” does not reduce it further, so it contains no information. A Code can be very inefficient in this way, using far more bits than the minimum required to reduce the uncertainty.

### Information Always Comes in a Message Containing Symbols Selected From Codes.

Although English words are used for both pieces of information, “daytime, Tuesday” would be a message that answers two questions, with two Codes. The first symbol comes from a Code of two possible symbols, and the second from a Code of seven possible symbols. Shannon says that information is always carried in a message that contains one or more symbols that are part of one or more Codes answering one or more questions.
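As a rough sketch of such a message, the Python below packs one symbol from each of the two Codes into a single bit string and unpacks it again. It assumes sender and receiver agree in advance on the Codes and their order; the function names are hypothetical.

```python
DAY_OR_NIGHT = ["daytime", "nighttime"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def bit_width(code):
    """Whole bits needed to number every symbol in the code."""
    return (len(code) - 1).bit_length()

def encode_message(answers, codes):
    """Concatenate the fixed-width bit string for each answer, one Code per answer."""
    parts = []
    for answer, code in zip(answers, codes):
        width = bit_width(code)
        parts.append(format(code.index(answer), "b").zfill(width) if width else "")
    return "".join(parts)

def decode_message(bits, codes):
    """Split the bit string by each Code's width and look the symbols back up."""
    answers, position = [], 0
    for code in codes:
        width = bit_width(code)
        index = int(bits[position:position + width], 2) if width else 0
        answers.append(code[index])
        position += width
    return answers

codes = [DAY_OR_NIGHT, DAYS]
message = encode_message(["daytime", "Tue"], codes)
print(message)                         # 0001
print(decode_message(message, codes))  # ['daytime', 'Tue']
```

The message is four bits long: one bit for the two-symbol Code followed by three bits for the seven-symbol Code.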

### Information is a Material Thing

This might come as a surprise to the reader, but the consequence of all of this is that no information moves in the world without involving energy or matter. There is no way you can send a symbol to me without using a pulse of energy or moving some matter somewhere. Bits are more than just a mathematical notion. You cannot transmit or store even one bit without representing it as energy or as a piece of matter. If this were not true, you would never need more computer memory or disk space and you would not have to pay your Internet provider a usage fee for each megabyte of data you sent or received.

This became such a powerful fundamental observation that it led to solving such problems as how many extra bits you might need to overcome random noise in a communications channel or imperfect persistence in computer memory. It even led to linking Shannon Information to other fundamental areas of physics such as thermodynamics.
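One way to get a feel for spending extra bits against noise is a three-fold repetition code with majority voting. This is a deliberately simple stand-in chosen for illustration, not the scheme Shannon's theory prescribes, and the function names and the 5% flip probability are assumptions.

```python
import random

def add_redundancy(bits, copies=3):
    """Repeat every bit `copies` times before sending."""
    return [bit for bit in bits for _ in range(copies)]

def noisy_channel(bits, flip_probability, rng):
    """Flip each bit independently with the given probability."""
    return [bit ^ 1 if rng.random() < flip_probability else bit for bit in bits]

def majority_decode(received, copies=3):
    """Recover each original bit by majority vote over its copies."""
    groups = (received[i:i + copies] for i in range(0, len(received), copies))
    return [1 if sum(group) * 2 > len(group) else 0 for group in groups]

message = [1, 0, 1, 1, 0, 0, 1, 0]
rng = random.Random(0)  # fixed seed so repeated runs behave the same
sent = add_redundancy(message)
received = noisy_channel(sent, flip_probability=0.05, rng=rng)
decoded = majority_decode(received)

print("bits sent:", len(sent), "for", len(message), "message bits")
print("recovered without error:", decoded == message)
```

Tripling the number of bits lets the receiver correct any single flipped bit within a group of three, at the cost of sending three times as much. Practical error-correcting codes achieve similar protection with far less overhead.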

### Information Does Not Require an Intelligent Sender or Receiver

Here is the most dramatic outcome from Shannon’s theory. Consider that if Shannon Information is not meaning but symbolizes meaning, and that it informs by reducing the receiver’s uncertainty as to which symbol in a Code was selected by the sender, and if it can be specified, stored, and transmitted in a certain number of bits, then Shannon Information is not something that only pertains to human communications. It turns out that Shannon Information is everywhere in nature. We can find it, isolate it, and quantify it just like we can do for a message between two humans. In Shannon’s Information Theory, there is no requirement for an intelligent sender or an intelligent receiver of information. In fact, naturally produced information is flowing all throughout the universe.

### Notes and References

1. You can get a better feel for this by considering the extreme case where one question is whether the shot was fired by day or by night (N = 2), and the other is the name of the person who fired the shot (N = some very large number). Although you might be completely clueless about the answer to either question, you can understand why the uncertainty about the name of the shooter is far higher than the uncertainty about day or night.
2. For example, one code would be English words. The list of symbols in the Code for the day and night question could simply be [daytime, nighttime]. Similarly, days of the week could be the usual English [Mon, Tue, …, Sun] and so forth. The Code for the month might be [Jan, Feb, Mar, …, Dec].
3. Three binary bits can encode the numbers 0 through 7 with the following sequences: [000, 001, 010, 011, 100, 101, 110, 111].
4. ln2(N), more commonly written log2(N), is the log base 2 of the number N.