Grammars

S -> SS
S -> (S)
S -> epsilon

This is the grammar for balanced parentheses.

S

SS
SS S
(S)(S)(S)
()()()

S
SS
S SS
(S)(S)(S)
()()()

If there's more than one way to generate a string from a grammar, the grammar is ambiguous.

Better ambiguous grammar:

E -> n
E -> E+E
E -> E*E
E -> (E)

3+5*4 ==> This should be parsed to give us 23...but will it?

E->S
S->S + S
S->P
P->P*P
P->n
P->(E)

E
S
S+S
P+P
P+P*P
3+5*4

For now, this is a perfectly good way to express balanced parentheses.

S -> SS
S -> (S)
S -> epsilon

Based on this grammar, we can come up with inference rules that define balanced parentheses.

Let's call P the set of strings with balanced parentheses.

phi
________________     (1)
epsilon is in P.

[Don't confuse the empty string with the empty set.]

S1 is in P ; S2 is in P
____________________________________________________  (2)
S1S2 is in P [the concatenation of those two strings

S is in P
___________    (3)
(S) is in P



Derive that (()()) is in P.

_______________       _______________
epsilon is in P (1)   epsilon is in P (1)
_______________       _______________
() is in P (3)        () is in P (3)

_________________________________________
()() is in P (2)
________________
(()()) is in P (3)


S->aS
S->T
T->bT
T->b

s is in S
_________  (1)
as is in S

s is in T
_________  (2)
s is in S

s is in T
_________  (3)
bs is in T

_________  (4)
b is in T


Prove abb is in S.

_________
b is in T (4)
_________
bb is in T (3)
__________
bb is in S (2)
__________
abb is in S (1)

Classic ForTran example of bad language design:

DO 100 I = 1,5

This is a for loop in ForTran.  The DO line begins the loop and the loop ends at line 100.  All the stuff in between the DO line and line 100 is the loop body and will be executed 5 times with I taking on the values of 1 through 5 in the process.

ENTER / ERASE
ENTER would read lines of code from a text file.  ERASE removed lines of code from the program.  And ENTER and ERASE could be run as part of the program.  So you could change the program while it was running.


DO 100 I = 1.5

This defined a variable named DO100I and set it to the floating point number 1.5

In ForTran you were totally allowed to put spaces in the variable name.  And unless a variable began with a letter between I and N, it was by default, without being declared a floating point number.

x = 10//**/2
;
printf ("%d\n",x);

Prove that the grammar for balanced parentheses generates the set of strings that can be verified by the counting algorithm:  Start from zero.  Move from left to right.  Add 1 for (.  Subtract 1 for ).  If, at the end of the string the count is 0 and you never went negative anywhere in the process, it's GOOD.  Otherwise, it's BAD.

If it's in set A it must also be in set B.
If it's in set B is must also be in set A.

Suppose a string is in set A (the grammar), prove that it is in set B (the counting algorithm).

We use induction, but not exactly the numerical induction, but induction on the grammar itself.

Can we prove that rule 3 gives us a GOOD in the counting algorithm?
In other words, is epsilon satisfied by the counting algorithm?  Yes (duh). (Base case).

Now, assume that S gives us a good in the counting algorithm, does (S) also give us a GOOD?

Yes, the first ( makes the count 1.  Then the count as we go through S is going to be one higher at every step than it would have been if we hadn't started with (.  When we finish S, we will end at 1, and will never have fallen below 1 in the process (so it's never negative.)  The the final ) takes the count to 0 (and we never went negative).

Now, assume S1 and S2 both give us a good in the counting algorithm, does S1S2 also give us a GOOD?

Yes, as we process the S1 part of S1S2, we never go negative and we're done with S1 part, we're back at zero.  Then as we process the S2 part, we again end at zero, and again, never go negative.  So, yeah.

==> Proven

Assume the counting algorithm gives us a GOOD.  Can we prove that the string is generated by a grammar?

Let's say the count never makes it to zero at any point in the process, except at the beginning and end.
If it's the empty string we're looking at, well, that is automatically generated by rule 3, so it's generated by the grammar.

If we're looking at a string that is not the empty string, but still the count never goes to zero except at the beginning and end.  From this we derive that the first character must be ( and the last character must be ).  (Because we did start and end at zero, so those have to be the first and last characters.)

Assume its true for strings shorter than the given string.
Look at the substring between the first ( and the last ).  Since the count algorithm never goes below 1, if we omit the first (, the count is one lower than it would be otherwise, and so never goes below 0.  And it ends at 0.  So by the inductive assumption that string IS generated by the grammar, and if we call this string S, then (S) is also in the grammar because that is rule 2.

OK, now assume that the count DOES hit zero at some internal point, and again, we're assuming the inductive assumption that it works for strings shorter than my own.

Find the first internal point at which the count is zero.
The substring from the beginning to that internal point has no internal point of its own where the count hits zero.  By the earlier case, since the count hits zero and there's no internal point where it actually hits zero, that must be generated by the grammar.  Call this S1.  Now look at the rest of the string from that point to the end.  This is a shorter string than the current string (and it may or may not have other internal spots where the count hits zero), but it's still shorter than ours and so by our inductive assumption, this one is also generated by the grammar, call it S2.

Now, from rule 1, since S1 and S2 are generated by the grammar, S1S2 is as well.

QED.