Pumping Lemma for CFL's

For any context-free language, there is a value k, such that for all strings in the language whose length is >= k, every string can be split into u v w x y, so that |vx| >= 1  (the reason for this rule is to ensure at least one of v and x is nonempty), |vwx| <= k , and for all
n >= 0  u v^n w x^n y are also in the language.  You can have any number of copies of x, any number of copies of v, but you have to the same number of copies of v and x.

As one might expect, the point of this is to that a language is NOT context-free.  If you can show that a language violates the pumping lemma then you know that it can't be context-free.

Now, without going through all the steps of the proof of this, I'll draw a comparison.

For regular languages, we look at the finite automaton.  We know that if a string is long enough, in order to be accepted the automaton must hit some state more than once while processing it.  Thus there is a natural loop that can be traveled than once, and this is the basis of the regular language pumping lemma.

For context-free languages, we look at the parse tree.  Since we have a finite number of rules, and hence a finite number of non-terminals, if a string is long enough, some non-terminal has to appear more than once on some branch of the tree.  This is the loop in this case.  If a terminal ultimately generates itself, then it generate itself as many times as we would like while also generating other copies of substrings on the left and on the right of that non-terminal.  This is the general idea of why the context-free pumping lemma works.

So, classic example:

a^n b^n c^n:  String of a's followed by string of b's followed by string of c's, exact same number of a's b's and c's.

Informally, this is hard to imagine for a CFL since the the pushdown automaton can use the stack while processing a's, pop from the stack while processing b's (so that the number of a's and b's are the same) but now we don't have a stack anymore when it comes time to process the c's.

Can we prove that it is not context-free using the Pumping Lemma.

So, if this language is context-free, there must be a k as described above.  Look at the string a^k b^k c^k.  And divide it into uvwxy anyway you way you like provided of course that |vx|>=1 and |vwx| <= k.

Now, ain't no way that uvw contains a's b's and c's all three.  Which means that since |vx| >= 1, when I start pumping this string, some letter is going to increase in number.  Maybe a's maybe b's, maybe c's, maybe some combination of two thereof, but not all three.  So, pumping it even once will give us an unequal number of a's b's and c's, so that we have generated a string not in the language.

So, this language cannot be context-free.

The Pumping Lemma cannot be used to prove that something IS context-free (however, there are some generalizations that can be used to prove yes or no is a language context-free).  Mostly, though, the way to that a language is context-free is simply to find a grammar for it.

Context-Sensitive Languages:  (aka Type 1 languages)

A context-sensitive language, informally, is a language that you can give can a give a definitive YES or NO to for whether a specific string is in the language, using the resources of a normal computer (with all the memory you require) or a Turing Machine.  So, the language of all prime numbers in base 10--context-sensitive.

Coming up with context-sensitive grammars that actually accept these fairly sophisticated languages (such as primes) is not always easy.  In general, you can design a grammar to mimic or emulate a Turing Machine (and this, incidentally, is how the proof works that Context-Sensitive Grammars can recognize decidable languages on a Turing Machine.

There are two major definitions of a Context-Sensitive Grammar, and they accept the same classes of languages.

The older definition:

The left hand side at least one non-terminal with a string to the left and a string to the right (and the strings can contain any numbers of terminals and non-terminals and can even be empty)  The right hand side contains the left string and the right string but the non-terminal is replaced by a NON-EMPTY string of terminals and non-terminals.

abCd -> abECfgd  would be an example of a rule.  The ab on the left and the d on the right are presevered but the C was replaced with ECfg.

This is where the term "context-sensitive" comes from.

A CFG rule like S->(S) , this means that an S can ALWAYS be replaced by (S) no matter where it is in the string or level of parsing.  Thus, the rule is always valid, free of context.

However, a rule like abCd -> abECfgd  indicates that C can be replaced by ECfg specifically within the context of an ab to the left and a d to the right.  Thus the replacement is dependent upon or sensitive the context surrounding C.  Hence the names.

The more modern definition (which is equivalent)  is that the left side can contain any string with at least one non-terminal.  The right side as well, but the right must be at least as long as the left side.

ABcD -> defG <= OK
ABcD -> defGH <= OK
ABcD -> def <= not OK

These definitions are equivalent.  They generate the same classes of languages.  However, under this definition, the empty string can never be generated.  All Context-Free languages that do not contain the empty string are context-sensitive as well.  And some definitions of a context-sensitive languages allow you to specify in your definition that, yes, the empty string should be included.

(You could also say the start state is not allowed on the right hand side of a production and the rule S->e is allowed.)

Example:  CSG that generates a^n b^n c^n  which we already know is not context-free.

S -> a B C
S -> a S B C
CB -> BC
aB -> ab
bB -> bb
bC -> bc
cC -> cc

S
a S B C
a a S B C B C
a a a B C B C B C
a a a B B C C B C
a a a B B C B C C
a a a B B B C C C
a a a b B B C C C
a a a b b B C C C
a a a b b b C C C
a a a b b b c C C
a a a b b b c c C
a a a b b b c c c

S
a S B C
a a S B C B C
a a a B C B C B C
a a a b C B C B C
a a a b c B C B C