prefix-free encoding

(thing) by ariels (19.3 hr) Thu Mar 30 2006 at 22:32:47

(Coding theory, what did you expect?)

Say one wishes to describe some set X using symbols Σ. A code is a one to one function c:X→Σ* that matches every element of X to a word using the letters of Σ. Call the set C=c(X) of all words used to describe X the set of codewords. c is prefix-free if no codeword is a prefix of another codeword.

Examples and non-examples

  • ASCII, or any fixed-length encoding is prefix-free, since different words of the same length cannot be prefixes of one another.
  • Morse code is not prefix-free! For instance, "." (the code for 'e') is a prefix of "..." (the code for 's'). Indeed, when transmitting Morse code three symbols are used: dot, dash, and a short silence between letters.
  • Any code that has a distinct stop symbol is prefix-free. Morse code considered with the 3 symbols above is one example; null terminated strings are another.
  • Consider X={a,b,c,d,e}. The code
    a ⇒ 01, b ⇒ 00, c ⇒ 100, d ⇒ 1011, e ⇒ 1111
    is prefix-free, but if we reverse the encoding of each letter the code is no longer prefix-free.

Uses

A prefix-free encoding is useful because it is self-delimiting: if we have any string of symbols, there cannot be two ways to decode it into a sequence of elements of X. This has obvious advantages for transmission, but prefix-free encodings are also useful elsewhere:

  • Huffman codes are prefix-free (even though their reverses often aren't, as above); being prefix-free has obvious advantages in compression, and indeed...
  • Kolmogorov-Chaitin complexity is defined using a prefix-free encoding of Turing machines.
  • If we ignore comments, XML documents are prefix-free: There is one top-level tag, and upon closing it the document cannot be extended.


Yay! Another long short for bq2K6!

Y'know, if you log in, you can write something here, or contact authors directly on the site. Create a New User if you don't already have an account.