This is what I said last time (proving that prime numbers are not a regular language): Let's say that p is our p-value. Find some prime longer than p digits whose first digit is 1. There are an infinite number of these. <== There are an infinite number of primes beginning with any sequence of digits you can name. There an infinite primes beginning with 6007006007. So, that was true, but not terribly relevant. Then pump the 1. If our prime is 1. Then look at these numbers: 1, 11, 111. One of these three numbers has a digit sum divisible by 3. The one divisible by 3, can't be prime. My error was saying "Pump the 1." Actually, you can't choose which part of the string gets pumped. I have to prove that I can generate a string not in the language, no matter which part gets pumped. So, in short, to prove a language not regular: I don't get to choose the p-value. Given the p-value, I have to choose a string longer than that p-value, but I am allowed to choose a string. (I get to choose it because I only need to find one string that can't be pumped to prove that the pumping lemma isn't working.) Once I choose the string, I have to prove that it cannot be pumped at all. So, I don't get to choose where to pump it. I have to show that any sort of pump will lead to a string not in the language. Prove balanced parentheses are not a regular language. I don't get to choose the p-value. But whatever the p-value is, I get to choose the string. I choose the string with p ('s followed by p )'s. (This is obviously in the language.) my string can be broken into uvw such that length(uv) <= p, length (v) > 0, uv*w is always part of the language. So, I don't get to choose the uvw. That's chosen for me. But given this choice, I do know that length (uv) <= p, length (v) > 0. Can I then show that by pumping it enough times, I get a string that's not in the language. So, in the case of balanced parentheses. I don't get to choose u, v, w, but since length (uv) <= p, AND the first p characters are ('s. I know that u and v consist entirely of ('s. So, when I pump v, I know that v has to contain at least 1 (. So, if I pump v even once, I add more ('s to the string without adding any )'s. Whatever my u and v were, after just one pump, I now have a string not in the language. The pumping lemma therefore failed, so this language is not regular. Now, getting back to prime numbers. I can't choose the p-value; it is chosen for me. Once I have a p-value, I can choose any string I want as long as the length is greater than p. In this case, I don't care which prime I get, as long as the length is greater than p. (There are infinite of primes, I can always find a prime as long as I need it to be). Now, I can't choose what u, v, w are, but I know that length(uv) <= p, and I know that length (v) > 1. So, let's take a look at v. It might be one digit like 7, It might be two digits like 32, it might be longer, like 24352452354325. Look at v as a stand-alone integer. Not counting 2 or 5 (2 and 5 do not work because since we're in base 10, 2 and 5 don't behave like other primes), find any prime that does NOT go into v. Also don't pick the whole number uvw as your prime. This prime does not go into v. It also does not go into uvw, since uvw is prime. For any prime, other than 2 or 5, there is divisibility test for that prime involving a linear combination of its digits. 3: ones + tens + hundreds + thousands + ... 7: (1,3,2,6,4,5,...)*(1,10,100,1000,10000,100000) 11: (1,10,...) * (1,10,...) 497 7 + 10*9 + 4 = 101 1 + 10*0 + 1 = 2, 497%11 = 2. Since our number v is not divisible by the selected prime, our divisibility test will not give us a number divisible by the prime either. uvw also fails the divisibility test. ,= If uvw % P = m and v % P = n then look at uvvw % P = m+n <== This is not quite right. uvw, uvvw, uvvvw, uvvvvw, etc. One of these is going to end up being divisible by P. For example, 113 is prime. Let's say the 11 gets pumped. (11)3 is not divisible by 3. 1+1+3 = 5. 11113 is not divisible by 3. 1+1+1+1+3 = 7. 1111113 is divisible by 3. 1+1+1+1+1+1+3 = 9. Let's say it's the 3 that's supposed to get pumped. 11(3)* is never ever going to be divisible by 3 because 3 113 : 5 1133: 8 11333:11 113333:14 1133%7 = 6 11333%7 = 0 Pumping Lemma for Context-Free Languages. Every CFL has a p-value so that every string whose length is >= p can be split into uvwxy, with length (vx) > 0 (in other words, maybe v or x will be empty, not both). length (vwx) <= p And u v^i w x^i y is also in the language for i >= 0. In other words, it's very similar to the pumping lemma for regular languages but in this case you pump two strings: the v and the x AND this is very important, you can pump v and x as many times as you wish but you must pump v and x the same number of times as each other. In regular grammars, the p-value derives from the number of states in the finite automaton. In CFL's, the p-value derives (not equal to but derives from) from the number of production rules in the Chomsky Normal Form grammar for that language. So, let's take a look at balanced parentheses. I don't know what the p-value for the CFL for balanced parentheses, but let's look at a string that is "reasonably" long. ()()()()() <== This is remarkably easy to pump. In a regular grammar, the pumping part has to come within the first p characters, but that's not true in CFL's. length (vwx) <= p, but vwx can occur pretty much anywhere. So, just find the first ) in the string. The first ) must be preceded immediately by a left. You can pump that left and that right as much as you want and still have balanced parentheses. u = empty v = empty w = empty x = () y = ()()()() length (vwx) < p...and length(vx) > 0 ()()()()()()()()()()()() ((()()())) u = (( v = empty w = empty x = () y = ()())) ((()()()()()()()()()()()()())) Here's another CFL (which is not regular). {a^n b^n} All strings with an equal number of a's and b's: all the a's come first, all the b's come later: ab, aabb, aaabbb, aaaabbbb, these strings are all in the language. Given any (large) string, how would we define uvwxy? u = a^(n-1) v = a w = empty x = b y = b^(n-1) aaaaa aaaaaaa bbbbbbb bbbbb Since v and x are pumped together there will still be the same number of a's as b's. We use the pumping lemma to prove that a language is NOT context-free. Classic example. {a^n b^n c^n} equal number of a's b's and c's with all the a's first then all the b's then all the c's. 1. I don't get to choose the p-value. I'm given the p-value. 2. I get to choose a string longer than p-value. I choose a^p b^p c^p p a's followed by p b's followed by p c's. Now, I don't get to choose how to break up the string. uvwxy. But I do know that however it's broken up length (vwx) <= p. This means that if vwx contains an a, it cannot possibly contain a c. Because if it contains even one a, it can't have more than p-1 b's. Since there p b's, vwx can't contain all of them and the c's come after the b's. Suppose p = 4: aaaabbbbcccc There is no way to choose 4 consecutive characters with both an a and a c. So, my vwx is going to be missing an a or a c or possibly both. So, when I start pumping my v and x...some letter isn't going to get repeated. I might increase my number of b's but either a or c isn't going to be increased. So, if I pump even once, I no longer have an equal number of a's b's and c's. So, this language is not context-free. Now, intuitively, you imagine that PDA is a NFA associated with a stack. And you can see how to parse a^n b^n: When you are reading the a's, put something on the stack, when you are reading the b's pop something off. If the stack ends up empty, there must be an equal number of a's and b's. But when you get to the c's, the PDA is not powerful enough to remember how many a's and b's there were, so it doesn't know how many c's to look for. Context-Sensitive Languages: not studied as much because they're difficult to work with. A Context-Sensitive Language is represented by a Context-Sensitive Grammar. The big difference is this. Context-Senstive Grammars are allowed to have terminals on the left: Valid production rules are such that the strings on the left have at least one nonterminal, and that the string on the right is at least as long as the string on the left. S->(S) (S)->[T] T->[T] [T]->{U} U->{U} {U}->{{}} S (S) ((S)) ([T]) ([[T]]) (({U}]) (({{U}}]) (({{{{}}}}]) The thing about Context-Sensitive languages is that they are "solvable": If you give me a grammar and a string, I can write a program that can tell you definitively yes or no whether that string is generated by that grammar.