PUMPING THEOREM FOR REGULAR EXPRESSIONS What we've already seen: DFA's, NFA's and regular expressions and regular grammars all represent the exact same class of languages (which we call type 3). And so, if you want to prove that a certain is regular, all you need to is find the DFA, NFA, regexp or regular grammar that accepts that language. (And usually, that's not particularly hard.) But how do you prove that a certain language ISN'T regular? PUMPING THEOREM: For any regular language, there exists an integer k >= 1, such that for all strings in the language whose length is at least k, the string can be expressed as uvw such that the length of uv <=k, the length of v is at least 1, and with all strings, uw, uvw, uvvw, uvvvw, uvvvvw, etc. are all part of the language. For example, the language consisting of any number of a's followed by any number of b's satisfies the pumping lemma: Let k be 1, and take any string whose length is at least 1. Divide the string into uvw where u is the empty string and v is the first character of the given string and w is rest of the string, If my string is aaabbb, set u = e v = a w = aabbb. aabbb is part of the language. aaabbb is part of the langauge. aaaabbb is part of the language. Any string contained in u v* w is part of the language. With the string aaabbb, I CAN'T set u = aa v = ab w = bb. Because pumping v gives me strings not in the language. aaabababababbb <== NOT in the language. The Pumping Theorem only guarantees a uvw split somewhere in a sufficiently long string. The basic idea for the proof of the pumping theorem is that a DFA has a finite number of states. If a valid string is longer than the number of states, the DFA must return to a previously visited state at some point while processing the string. So, there must be loop somewhere in that DFA which a certain sequence of characters must cycle the loop and end up back where it was, so if you repeat that sequence of characters any number of times, the loop will cycle around that many times and end up at the starting point of the loop (and the rest of the string will still take you to an accept state). You use the Pumping Theorem to prove a language is NOT regular by showing that for any length l, there is a string in the language of at least length l that CANNOT be split into appropriate uvw. You don't have to show EVERY strings fails to pump, you just have one string for any arbitrarily large l that fails to pump. For example: prove that the language consisting of balanced parentheses is not regular. ((())) (()()) ()()() If it's regular, there be some k >= 1 so that for strings in the language whose length is at least k, uvw can be found so that |v| >= 1 |uv| <= k <=== CRITICAL RULE uv*w is a subset of the language. Take any l >= k and consider the string of l ('s followed l )'s. This string is clearly in the language of balanced parentheses. Call it s. Well, any division of this string into uvw guarantees that |uv| <= k, and since l >=k, the uv string must consist entirely of ('s. ((((((((((((((((((((((((()))))))))))))))))))))))))) ^ ^ Since |v| >= 1, v has at least one (. If I pump v, I keep increasing the number of ('s without changing the number of )'s. So, no, these pumped strings are NOT part of the language. I can't choose which v fails to pump. I can choose my example string, but then I have to show that no choice of uv will allow it to pump. Since I found a string of length greater than the threshhold of k that can't be pumped, then balanced parentheses cannot be regular. This being said, ( ( ) ( ) ) can be pumped. 1 1 2 1 2 3 1 2 3 4 5 6 <== NOT REGULAR CONTEXT-FREE LANGUAGES Also called type 2 languages. These languages are more easily developed through the use of CONTEXT-FREE GRAMMARS. All modern programming languages are based around context-free grammars. "Based around" does not mean "exactly generated by" and this is important. Most programming languages are technically context sensitive, not context free, but no one wants to work with context sensitive grammars, so what you do is you design your language with a CFG, and use code to tweak it a bit. The big issue is declaring and using variables. A context free grammar is not powerful enough to tell whether a variable is properly declared. (It can't "match" substrings from one part of the string with another.) Balanced Parentheses: S -> e S -> SS S -> (S) The production rules consist of exactly one non-terminal on the left and anything you want on the right. Starting with S, I can generate any balanced parentheses string I want, and only those strings. (()()) S : (S) : (SS) : ((S)S) : (()S) : (()(S)) : (()())