Regular Expressions:
This
node serves as an introduction to
regular expressions.
It covers concatenation, the union operator
'|' and star
'*'
These are the only
necessary operators.
Other Operators simply make the expression more simple, but do not add to the expressive
power of the expressions themselves.
Consider a
language to be a
set of
elements.
An
expression is used to describe the elements of the set without explicitly
listing them.
Consider a
regular language as a set of
strings
A
regular expression is used to describe all strings in the set without explicitly listing every possible one.
Examples:
{"Do"} is a language, or a set containing one element. That element is
the string, "Do".
A
suitable expression for this language would be:
Do
{"Do","Die"} is another language. It consists of two strings, "Do" and
"Die".
A suitable expression for this language would be:
Do|Die
This says that if a string is made up of either the
sequence of
chars "D
o" or the sequence of chars "Die", then it is in the language .
{"Do","Die","Dig"} similarly can be expressed by:
Do|Die|Dig
Now consider the language, {"Dos","Dies","Digs"}
In the same manner as above, we can describe this language with the expression:
Dos|Dies|Digs
Notice that there is a
common char 's' at the end of each string. By
concatenation, this can be distributed over the expression:
(Dos|Dies|Digs) is equivalent to
(Do|Die|Dig)s
Also, notice that there is a common char 'D' at the beginning of each string. This can be similarly
distributed over the expression:
(Do|Die|Dig)s is equivalent to
D(o|ie|ig)s
This latest expression says that every string in the language begins with a D, which is followed by one of ("o","ie","ig") and ends with an s.
If a string does not
follow this
form, it is not in the language described by the expression.
Notice that the expression:
D(o|ie|ig)s is
also equivalent to
D(o|(i(e|g)))s
All of the languages mentioned so far have a finite number of elements.
Regular expressions can also describe languages that are made up of an infinite number of elements.
Consider the language,
{"Dos","Doso",Dosoo","Dosoo",...}
U
{"Dies","Dieso","Diesoo","Diesooo",...}
U
{"Digs","Digso","Digsoo","Digsooo",...}
This contains all elements described by the regular expression
D(o|ie|g)s as well as all strings that start with one of these members, followed by any number of the letter "o".
A regular expression that describes all strings consisting of zero or more "o"'s would be:
o*
This describes the set {
e,"o","oo","ooo",...}
We can express the concatenation of this set with the last set with the
regular expression:
D(o|ie|g)so*
More regular expressions:
One digit, 0 Through 9:
(0|1|2|3|4|5 ... |9) or (0-9)
At least one digit:
(0-9)(0-9)*
All decimals:
(0-9)(0-9)*("."|e)(0-9)*
A regular expresion over {0,1} where "0" is the third digit from the last:
(0|1)*0(0|1)(0|1)
A regular expresion over {0,1} where "0" is the
fourth digit from the last:
(0|1)*0(0|1)(0|1)(0|1)
A string consisting of an even number of ones and zeros
(e|01|10|00|11)* or ((0|1)(0|1))*