Everything2
Near Matches
Ignore Exact
Full Text
Everything2

the expressive power of regular expressions

created by rp

(idea) by rp (1.2 d) (print)   ?   (I like it!) 1 C! Fri Jul 14 2000 at 10:14:00

The expressive power of regular expressions is the set of languages they can express. This is relevant when comparing different kinds of regular expressions, or when comparing them to other formalisms that express languages.

Many different variants of regular expressions exist but the kind that is used in theory only has literal characters, choice, and Kleene star. For example,


  a u b*
    # the string "a" or any string consisting of "b"s

  (a u b)*
    # any string consisting of "a"s and "b"s

  (ab*)*
    # the same

Comparing to other formalisms, this type of regular expression can express exactly the class of languages expressed by finite state automata: the regular languages. There is no difference in expressive power between them. There is, however, a significant difference in compactness: some classes of regular languages can only be described by automata that grow exponentially in size, while the required regular expressions only grow linearly. (Although I haven't checked, the reverse is probably true as well.)

We can also study expressive power within the formalism. As the example shows, different regular expressions can express the same language: the formalism is redundant. To what extent can this redundancy be eliminated? Can we find an interesting subset of regular expressions that is still fully expressive? Kleene star and choice operator are obviously required, but perhaps we can restrict their use.

This turns out to be a surprisingly difficult problem. As simple as the regular expressions are, it turns out there is no method to systematically rewrite them to some normal form that minimizes the star depth. They are not finitely axiomatizable. So we have to resort to other methods.

This leads to the star height problem.

Many tools that support regular expressions (grep, Perl, PHP, many editors, etc.) support extensions to the syntax. These extensions allow more compact expressions and sometimes serve to optimize recognition; but they rarely add expressive power of the language. The most notable exception is the backreference:


  (ab+)\1
    # an egrep (or Perl) regexp describing {xx | x in {a,b}*}

Note that the bracketed part can be arbitrarily long, which means that left-to-right recognition of this language requires takes an arbitrary large amount of memory, while it only takes a fixed amount of memory for regular languages, since they can be recognized by finite state automata. This is, in fact, one way to define the regular languages. (If backreferencing is limited to expressions that describe only strings up to a certain maximum length, it remains limited to regular languages.)


printable version
chaos

star height problem Mastering Regular Expressions Kleene star Why Robert Heinlein bugs the hell out of me
"a^n b^n" language regex Leaning Toothpick Syndrome finite state automaton
regular expression steps to UNIX familiarity Differential equation punctuation etymology
Myhill-Nerode Theorem monitor bezel asterisk formal language theory
pumping lemma Myhill / Nerode Theorem Redundant .NET
Language
Y'know, if you log in, you can write something here, or contact authors directly on the site. Create a New User if you don't already have an account.
  Epicenter
Login
Password

password reminder
register

Everything2 Help

Cool Staff Picks
The best nodes of all time:
Do female homosexuals have it easier than male homosexuals?
Why strapping buttered toast to a cat's back will not produce infinite power
Les Fleurs du Mal
Anna Leonowens
central dogma
snapshot
Invader ZIM
How to land a jet plane on an aircraft carrier
human growth hormone
Analysis of Everything
Buy a Gun
There's nothing harder than learning how to receive.
Chihiro Iwasaki
New Writeups
XWiz
Trism(review)
artman2003
Briefcase Full of Souls - Part I(fiction)
Dreamvirus
Alan Ladd(person)
waverider37
Harold Holt(person)
The Debutante
Until death do us part(fiction)
Ysardo
a brother to a sister(personal)
antigravpussy
your warm whispers(personal)
Clarke
Multiculturalism(idea)
aneurin
Earl of Landaff(person)
Heitah
Pseudocide(idea)
XWiz
Google Knol(lede)
Mythi
July 24, 2008(personal)
locke baron
The fall of Earth(fiction)
BookReader
Fear the Cold(dream)
Pavlovna
Kathleen MacInnes(person)
This affordable entertainment brought to you by The Everything Development Company