Regular Expressions:

These are expressions that can be built up from individual characters in a language (or epsilon)  with the operations of concatenation, union, and Kleene closure (or star closure).

Programming languages that use regular expressions (or regexps) almost always special extra features (like + that means one or more copies, or a range function 4-7 or something) but these extra features could be emulated with the basic operations if you wanted to.


Regular expressions are not unique.  There is more than one way to express a language with a regular expression.

Classic example:  Strings consisting entirely of a's and b's.

(a|b)*

babbbbab

(a*b*)*
Also describes strings consisting entirely of a's and b's.

Finite automata.  Finite automata is called a theoretical machine (or a "paper machine").  We can describe what it does on paper.  No one actually builds one.  There is a mathematical definition for it that is used in formal proofs, but you don't need the formal mathematical definition to get an idea of what it does and what it's for.


If you have two regular expressions, how might you decide (or program) whether the two regular expressions generate the same language?

Convert each expression to an automaton.  Minimize the automata.  Check if they match.

How would you even represent an automaton as a data structure?