Information Theory is a branch of communications theory which originated with the work of Claude Shannon, a researcher at the Bell Telephone Laboratories during the late 1940s and '50s. In 1949, Shannon published a famous paper entitled "The Mathematical Basis of Communication" where he outlined the basic concepts of Information Theory.

Between about 1950 and 1960, Information Theory was
influential in music theory circles as well as in linguistics.
Leonard Meyer's *Emotion and Meaning in Music*
published
in 1956 was influenced by many of the basic concepts
underlying Information Theory.

Information Theory depends upon a precise (but limited)
definition of the word
*information*
which answers the question of how to define the
*quantity*
of information contained in a message being
transmitted, received, or stored.

(As you might imagine, the telphone company would have more than an idle interest in measuring the quantity of information, since this might allow better ways of calculating prices for telephone services. If the telephone company was to expand to provide more than voice communications, one might wish to know, for example, whether a picture is truly worth 1,000 words.)

There are a number of ways one can approach measuring
the quantity of information in a given message.
In the case of audible messages, one might simply
measure their duration;
for printed messages, the number of words or letters
could be tabulated.
Shannon was looking for a more universal (and more precise)
way of measuring and comparing messages --
a way that is independent of the medium through
which the message is communicated.
He came up with the idea of measuring the quantity
of information by considering the message's
*predictability* -- that is, the message's
surprise value.

Shannon posited the notion that messages could be compared according to how probable or improbable they are. Some messages would be highly predictable while others would be highly unpredictable. Most messages would, of course, occupy a middle ground.

One of the foundational ideas of Information Theory is,
then, the notion of a continuum of messages from very
*redundant* or *banal* (that is, highly predictable)
to the very unpredictable (or what Information Theory
dubs *original*).

As an informal illustration, consider the following texts. The initial texts are highly predictable, whereas the later texts are highly unpredictable:

- One two one two one two one two one two one two one two one two one two

- Look both ways.

Sale.

Two scoops of raisins in a package.

Keep right.

- "Hello."

"Hi."

"How's it going?"

"Pretty good. And you?"

"All right. What's up?"

"Not too much ..."

- One of the most important road signs in North America is the common "stop" sign. The STOP sign is an eight-sided (octagonal) flat surface with a red background. Prominently displayed in the center of the sign are four upper-case letters: "S" "T" "O" and "P". A STOP sign is usually bolted to a metal pole, which, in turn is firmly embedded vertically in the ground.

- Mozart was particularly fond of billiards. Instances are recorded of his stopping in the middle of a game to make notes, or of his humming, as he played, a theme which later turned up in one of his compositions. There is no doubt that Mozart spend some of his most fruitful hours alone at the billiard table with his notebook at this side.

- Being glued to the throne, Chuck sucked a swig of the knuckleberry pumpkin jug and wagged a gollywog jiggle to the lascivious Liszt oozing from the Seeburg. "One lum or two?" he mused of the frog squirming in his throat. Chuck oogled the investment councillor poised on the station platform dressed in standard Shriner attire. "Some slicker," he thunked.

- An average danger or shopkeeper resented and likely imitates against footsteps issuing, as she did, to prosperity as well as from exercise. Along from obese looms, they posed a discontent of the dozens.

- Cobbler oblique pickled rainbow dowling rutila hunk empathy chug arcane promiscuously cartel hostel jetty bandit seedling gag circumscribe bomb.

Information Theory defines the quantity of information conveyed by a particular message as inversely proportional to the predictability of that message; when a message is entirely certain (that is, its probability is 1), then the quantity of information conveyed is zero. When a message is totally improbable (that is, its probability is 0), then to receive such a message would be to receive an infinite quantity of information.

Most messages we encounter lie between the extremes
of entirely certain and impossible.
In fact, we never receive impossible messages by
definition -- so we will never encounter messages of
infinite information quantity.
Nevertheless, information
*can* run very high for certain sorts of
messages -- as, for example, when a tossed coin
happens to land on its edge.
Highly improbably things *do* happen.

Probability is the mathematical study of randomness. Probability theory assumes that all probabilities can be represented by numbers between zero and one. A probability of zero indicates that the state simply never occurs. A probability of one indicates that the state is absolutely certain. A probability of .5 indicates that there is a fifty-fifty chance that the state will occur. An example of a probability of .5 is the probability that the toss of a fair coin will land heads. Probabilities can also be represented by fractions between zero and one, rather than by decimal numbers.

Probability has its own language:

p(A) means the probability of the occurrence of state `A' p(X)=0 means that state X will never occur p(Z)=1.0 means that the occurrence of state `Z' is certain p(a)=.5 means that the probability of the occurrence of state `a' is fifty percent p(AB) means the probability of the occurrence of the state `AB' or the probability of the occurrence of the state `A' followed by the state `B' p(B|A) means the probability of the occurrence of state `B' given the prior circumstance state `A' p(C|AB) means the probability of the occurrence of state `C' given the prior circumstance of state `A' followed by state `B' p( ) means the probability of an unspecified state

The Greek letter phi is used to indicate the "null" state, or non-existent state. The symbol is useful in order to identify the situation before any state occurs or after the last state has occurred.

Information Theory ultimately measures the quantity of
information contained in a message in
*bits*.
One bit of information is equivalent to the information
of a single question which may be answered with only Yes
or No.
Such questions may be simple or sophisticated.
For example, a simple question might be:
"Is the message made up of arabic numerals?"
A complex question might be:
"Is the message a key signature with two or more
sharps occuring on an alto clef?"
Information Theory attempts to devise sophisticated
"questions" by which messages of apparently high
information content are reduced to their minimum.

In most common messages, the meaning of the elements or symbols which make up the message is usually dependent to some extent on the context. Thus a quarter note on the third line of a treble staff is different from a quarter note on the third line of a bass staff.

When rolling dice, we know that the number rolled is independent of numbers previously rolled (that is true even for loaded dice). In contrast, other types of probabilities may be dependent upon preceding states -- for example, the probability of the occurrence of the letter "u" in English text is considerably increased when the preceding letter is "q". Likewise, in tonal music, the probability of occurrence of the tonic degree is increased when preceded by the leading tone.

One concern of conditional probability is the
*depth*
of the
*context of dependency* -- that is,
do remote previous states also constrain
the probability of occurrence, or only just the
immediately neighboring states?
The size of the context of probabilitistic
influence is called the
*frame of predictability*.
When the probability of occurrence for elements is
totally independent of preceding elements
(as with fairly thrown dice) the predictability
frame is called a *monogram*;
frames which take into account a single
immediately preceding element are called *digrams*;
*trigrams* denote frames of predictability
which take into account two immediately preceding elements,
and so on.

Probabilisitically linked chains of events
(where the occurrence of an event is dependnent
upon preceding outcomes) are called
*Markov chains*.
Markov chains are ordered sequences of symbols
(elements or states) which are related to neighboring
symbols by consistent conditional probabilities.
Such sequences were first studied by the Russian
mathematician Andrei Markov (Sr.) in 1906-1907.

*Markov chains* are linear sequences of states
which are related to their neighboring states by
consistent conditional probabilities.
A `state' may be anything, from a letter, word, or note,
to a condition such as happy, on vacation, or pregnant.

Conditional probabilities can be represented via a
table or *probability matrix*.
The antecedent states are listed vertical along
the edge of the table and the consequent states are
listed horizontally across the table.

This table specifies all the possible relationships between the sequencing of pairs of wors. The table indicates, for example, that the only word which may follow the word "over" is the word "the" -- no other word is permitted to follow it. Similarly, the word "athletic" can be followed by any one of three words: "bovine," "cow," or "dog" -- with the word "cow" being the most favored.

Given the probabilities specified in the above table, the following "sentences" would be considered to be Markov Chains because each of the state transitions (one word following another) have probabilities greater than zero.

The cow jumped over the moon.

The athletic dog pole-vaulted over the fence.

The full moon.

The spritely bovine ambled over the athletic cow jumped over the dog jumped over the fence.

Conversely, the following "sentences" would *not*
be considered to be Markov Chains based on the information
given in the table:

The moon jumped over the fence.

("jumped" cannot follow "moon")

The dog pole-vaulted over the athletic bovine.

("bovine" cannot end a sentence)

The crazy cow hopped over the moon.

("crazy" and "hopped" are unknown states)

Some markov chains are more probable than others. Thus the sentence:

has a high probability of occurrence. While the sentence:The cow jmped over the moon.

has a lower probability of occurrence. The probability of each individual sentence can be calculated by merely multiply together the probabilities of each state transition:The dog ambled over the dog ambled over the full moon.

Markov chains can be of "low order" -- taking into account only one or a few previous states; or they can be of "high order" -- involving probabilities related to the appearance of several previous states.

The following "hymn tunes" were generated by computer from an analysis of the probabilities of notes occurring in various hymns. A set of hymn melodies were encoded (all in C major). Only hymn melodies in 4/4 meter and containing two four-bar phrases were used. The first "tune" was generated by simply randomly selecting notes from each of the corresponding points in the analyzed melodies. Since the most common note at the end of each phrase was `C' there is a strong likelihood that the randomly selected pitch ending each phrase is C.

From Brooks, Hopkins, Neumann & Wright. "An experiment in musical composition."IRE Transactions on Electronic Computers,Vol. 6, No. 1 (1957).

© Copyright David Huron, 2001.

This document is available at http://csml.som.ohio-state.edu/Music829D/Notes/Infotheory.html