Edit: Here is a more verbose proof. The original proof is retained below.
Introduction
Let $\Sigma$ be some alphabet with $n$ symbols. Let $L$ be the language of words in which each symbol appears an even number of times. Suppose $G$ is a grammar for $L$. Our goal is to give an exponential lower bound on the size of $G$ in terms of $n$.
Transition to Chomsky Normal Form
According to Wikipedia (and not difficult to prove), there is a grammar $G'$ for $L$ in Chomsky Normal Form such that $|G'| = O(|G|^2)$. We will show an exponential lower bound on $|G'|$, and this will imply an exponential lower bound on $|G|$ (if $|G'| = \Omega(c^n)$ for $c > 1$ then $|G| = \Omega(\sqrt{c}^n)$).
For our purposes, a grammar is in Chomsky Normal Form (CNF) if the productions are of the following two types: $A \rightarrow BC$ and $A \rightarrow a$, where $A,B,C$ are non-terminals and $a$ is terminal. Some people also place the restriction that $B,C$ cannot equal the starting symbol $S$, but we do not need this condition.
Word derivation in CNF grammars
Let $w \in L$. Choose some derivation of $w$ in $G'$ (there might be several sharing the same derivation tree). The derivation can be thought of as a binary tree (derivation tree) in which each non-leaf node is labeled by a non-terminal, and each leaf is labeled by a terminal; a non-leaf node either has two non-leaf children or one leaf child.
As an example, consider the derivation $$S \rightarrow AB \rightarrow ACD \rightarrow^* xyz.$$ The corresponding derivation tree can be described as follows (sorry for the missing graphics):
- The root is labeled $S$.
- The root has two children labeled $A,B$.
- The node labeled $B$ has two children labeled $C,D$.
- The node labeled $A$ has one child labeled $x$, which is a leaf.
- The node labeled $C$ has one child labeled $y$, which is a leaf.
- The node labeled $D$ has one child labeled $z$, which is a leaf.
Note that in general, several nodes can share a label (indeed, this is how one proves the pumping lemma).
Each node in the tree generates a subword of the original word, which is formed from its descendant leaf nodes in order. Continuing our example:
- The root generates the entire word $xyz$.
- The node labeled $A$ generates the subword $x$.
- The node labeled $B$ generates the subword $yz$.
- The node labeled $C$ generates the subword $y$.
- The node labeled $D$ generates the subword $z$.
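For concreteness, here is a minimal Python sketch of such a derivation tree (purely illustrative; the `Node` class and its field names are my own, not part of the grammar formalism). It models the example tree above and recovers the generated subwords just listed.

```python
# Minimal model of a CNF derivation tree. Each node carries a non-terminal
# label and has either two internal children (A -> BC) or a single terminal
# leaf (A -> a), stored here as `leaf`.
class Node:
    def __init__(self, label, children=None, leaf=None):
        self.label = label          # non-terminal labeling this node
        self.children = children    # [left, right] for A -> BC, else None
        self.leaf = leaf            # terminal symbol for A -> a, else None

    def subword(self):
        """Subword generated by this node: its leaf descendants, in order."""
        if self.leaf is not None:
            return self.leaf
        return "".join(child.subword() for child in self.children)

# The example tree for S -> AB -> ACD ->* xyz.
tree = Node("S", children=[
    Node("A", leaf="x"),
    Node("B", children=[Node("C", leaf="y"), Node("D", leaf="z")]),
])

assert tree.subword() == "xyz"             # the root generates the whole word
assert tree.children[1].subword() == "yz"  # the node labeled B generates yz
```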
A subword lemma
Lemma. Suppose $w \in L$ is of length $m$, and choose some positive $\ell \leq m$. Then there is a node in the derivation tree of $w$ that generates a subword of length $\alpha$ satisfying $\ell \leq \alpha < 2\ell$.
Proof. Run the following algorithm:
- Set $v$ to the root of the derivation tree.
- While the subword generated by $v$ has size at least $2\ell$, replace $v$ by its child whose generated subword is longest.
- Output the subword generated by $v$.
Note that the node $v$ always has two children while the loop runs: a node with a single (leaf) child generates a subword of length $1 < 2\ell$. If the loop in the second step never runs then certainly $v$ generates a subword of the requisite length: $v$ is the root, it generates all of $w$, and $\ell \leq m < 2\ell$ by our assumption that $\ell \leq m$.
Otherwise, consider the last iteration of the loop. Denote by $w(u)$ the subword generated by a node $u$. In the last iteration of the loop, there is a node $v$ with $|w(v)| \geq 2\ell$, with children $v_1,v_2$ such that $|w(v_1)| \geq |w(v_2)|$; we move to $v_1$. Note that $$2\ell \leq |w(v)| = |w(v_1)| + |w(v_2)| \leq 2|w(v_1)|$$ so that $|w(v_1)| \geq \ell$. Since the loop stopped, $|w(v_1)| < 2\ell$ and $w(v_1)$ is of the requisite size.
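The descent in the proof is easy to implement against the illustrative `Node` sketch from above (again a sketch, not part of the argument): keep moving to the child with the longer generated subword until the length drops below $2\ell$.

```python
def find_subword_node(root, ell):
    """Return a node v whose generated subword satisfies ell <= |w(v)| < 2*ell.

    Assumes ell <= |w(root)|, as in the lemma; uses Node and tree from the
    sketch above.
    """
    v = root
    while len(v.subword()) >= 2 * ell:
        # Here v must have two children: a node with a single leaf child
        # would generate a subword of length 1 < 2*ell.
        left, right = v.children
        v = left if len(left.subword()) >= len(right.subword()) else right
    return v

# On the example tree with ell = 2 the root already generates "xyz"
# (length 3 < 4), so the loop never runs and the root is returned.
assert find_subword_node(tree, 2) is tree
```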
The proof
We consider the set $P$ of words in $L$ of the form $\pi(\Sigma) \pi(\Sigma)$, where $\pi$ runs over all permutations of $n$ symbols. For example, if $\Sigma = \{0,1\}$ then $$P = \{0101,1010\}.$$ Note that indeed $P \subseteq L$.
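As a quick sanity check (illustrative only; the helper names `words_P` and `in_L` are mine), one can enumerate $P$ for a tiny alphabet and verify that every word of $P$ indeed lies in $L$:

```python
from itertools import permutations

def words_P(sigma):
    """All words pi(Sigma) pi(Sigma), one for each permutation pi of sigma."""
    return {"".join(p) * 2 for p in permutations(sigma)}

def in_L(word, sigma):
    """True iff every symbol of sigma occurs an even number of times in word."""
    return all(word.count(a) % 2 == 0 for a in sigma)

sigma = "abc"                         # n = 3
P = words_P(sigma)
assert len(P) == 6                    # n! = 6 words
assert all(in_L(w, sigma) for w in P)
```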
Since all words in $P$ are of length $2n$ (recall $n$ is the size of the alphabet $\Sigma$), using the lemma with $\ell = n/3$ (assume for simplicity that $n$ is divisible by $3$) we get that some node in the derivation tree of any $w \in P$ generates a subword $x(w)$ whose size satisfies $$n/3 \leq |x(w)| < 2n/3.$$ Denote by $s(w)$ the non-terminal generating $x(w)$ (it is the label of the node).
Since $|x(w)| \leq n$, all symbols in $x(w)$ are distinct: indeed, any subword of $w$ of length $n$ is a cyclic rotation of the original permutation, and shorter subwords are contained in such a rotation (think of all subwords of length $3$ in $abcabc$: $abc,bca,cab,abc$).
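The same kind of exhaustive check (again illustrative) confirms that every subword of length at most $n$ of a word $\pi(\Sigma)\pi(\Sigma)$ has distinct symbols, e.g. for $n = 4$:

```python
from itertools import permutations

# Every window of length <= n inside pi(Sigma) pi(Sigma) sits inside some
# cyclic rotation of pi, so its symbols are distinct. Check for n = 4.
n = 4
for p in permutations("abcd"):
    w = "".join(p) * 2
    for length in range(1, n + 1):
        for i in range(len(w) - length + 1):
            window = w[i:i + length]
            assert len(set(window)) == length   # no repeated symbol
```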
Our strategy now will be to show that any single non-terminal can generate only a relatively small number of the $x(w)$; since $P$ is big, it will follow that $G'$ must have lots of non-terminals. More explicitly, let $$S = \{ s(w) : w \in P\}.$$ We can find a list of representative, distinct words $w_1,\ldots,w_{|S|}$ such that $$S = \{ s(w_i) : 1 \leq i \leq |S| \}.$$ We will show that for each $w_i$, the set of words $w \in P$ with $s(w) = s(w_i)$ consists of at most $B$ words ($B$ will be determined below). It will follow that the number of distinct non-terminals is at least $$\frac{|P|}{B} = \frac{n!}{B}.$$
So let $w \in P$, and suppose that $s(w) = s(w_i)$ for some fixed $i$. This means that we can replace $x(w)$ in $w$ with $x(w_i)$ and still get a word in $L$. The word $w$ has a decomposition $w = lx(w)r$. Every symbol appearing in $x(w)$ appears exactly once in $lr$, and every other symbol appears exactly twice in $lr$. Note that this determines $x(w)$ up to permutation: since $lx(w_i)r \in L$, the word $x(w_i)$ must be a permutation of $x(w)$ (more accurately, the sets of symbols occurring in $x(w)$ and $x(w_i)$ must be the same; since both words consist of distinct symbols, they are permutations of each other).
We will now bound the number of words in $P$ that have a subword of length $|x(w_i)|$ which is a permutation of $x(w_i)$. Any such word $w$ can be broken up as $lyzr$ where $yz$ is a permutation of $x(w_i)$ and $|ly|=|zr|=n$; in fact, since $w \in P$, we have $ly = zr$. There are $|x(w_i)|!$ possible choices for $yz$. Together, these fix $|x(w_i)|$ positions of the permutation $\pi(\Sigma) = ly$. For the remaining positions we have $(n-|x(w_i)|)!$ possible choices. Finally, $yz$ can start in at most $n$ "cyclic" locations inside $\pi$, and so $$B \leq n|x(w_i)|!(n-|x(w_i)|)!.$$ When is this expression maximized, as a function of $\alpha = |x(w_i)|$? Taking the logarithm of the bound, $$\log B \leq \log n + \log \alpha! + \log (n-\alpha)!.$$ The factorial function (in fact, the Gamma function) is log-convex, and so subject to the condition $n/3 \leq \alpha \leq 2n/3$, this expression is maximized for $\alpha=n/3$ or $\alpha =2n/3$ (the value is the same). That is, $$B \leq n(n/3)!(2n/3)!.$$
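Here is a small numeric check of the endpoint maximization (illustrative only, with $n = 30$ chosen arbitrarily): over $n/3 \leq \alpha \leq 2n/3$, the product $\alpha!\,(n-\alpha)!$ is indeed largest at the endpoints, and equal there.

```python
from math import factorial

n = 30
values = {a: factorial(a) * factorial(n - a) for a in range(n // 3, 2 * n // 3 + 1)}
best = max(values, key=values.get)
# The maximum over n/3 <= alpha <= 2n/3 is attained at an endpoint,
# and both endpoints give the same value by symmetry.
assert best in (n // 3, 2 * n // 3)
assert values[n // 3] == values[2 * n // 3]
```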
Plugging in the bound on $B$ that we have laboriously derived, we conclude that the number of different non-terminals in $G'$ must be at least $$\frac{|P|}{B} \geq \frac{n!}{n(n/3)!(2n/3)!} = \frac{1}{n} \binom{n}{n/3}.$$ Now we will use Stirling's approximation $$m! \sim \sqrt{2\pi m} (m/e)^m.$$ Plugging in the approximation, we get
$$\frac{n!}{n(n/3)!(2n/3)!} \sim \frac{\sqrt{2\pi n}(n/e)^n}{n\sqrt{2\pi(n/3)}(n/3e)^{n/3} \sqrt{2\pi(2n/3)}(2n/3e)^{2n/3}}.$$ All powers of $n$ and $e$ cancel, and we're left with the following $$\frac{n!}{n(n/3)!(2n/3)!} \sim \frac{1}{n\sqrt{4\pi n/9}} 3^{n/3} (3/2)^{2n/3} = \Omega\left(n^{-3/2} c^n \right),$$ where the constant $c$ is $$c = 3^{1/3} (3/2)^{2/3} = \frac{3}{2^{2/3}} > 1.$$
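As a numeric sanity check of the asymptotics (illustrative only), note that the prefactor $\frac{1}{n\sqrt{4\pi n/9}}$ simplifies to $\frac{3}{2\sqrt{\pi}} n^{-3/2}$, so the exact count $\frac{n!}{n(n/3)!(2n/3)!}$ divided by $\frac{3}{2\sqrt{\pi}} n^{-3/2} c^n$ should approach $1$:

```python
from math import factorial, pi, sqrt

c = 3 / 2 ** (2 / 3)
for n in (30, 90, 300, 900):         # multiples of 3
    exact = factorial(n) / (n * factorial(n // 3) * factorial(2 * n // 3))
    approx = (3 / (2 * sqrt(pi))) * n ** -1.5 * c ** n
    print(n, exact / approx)         # ratio approaches 1 as n grows
```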
In particular, the original grammar $G$ must be of size at least $\Omega(n^{-3/4} \sqrt{c}^n)$.
Exactly 0 or 2 occurrences
Another question concerned the language $L'$ of all words which contain exactly $0$ or $2$ occurrences of each symbol in $\Sigma$. The proof given in the preceding section applies equally well to $L'$, since the set of words $P$ we used belongs to $L'$; the rest of the proof uses the fact that certain words are not in $L$, and since $L \supset L'$, they are a fortiori not in $L'$.
Generalization
We have seen that the lower bound holds for both $L$ and $L'$. When does the proof hold for a language $\Lambda$? We used only two properties of $\Lambda$:
- $\Lambda$ contains $P$.
- If $w_1,w_2 \in P$, $x_1,x_2$ are the subwords of $w_1,w_2$ chosen using the subword lemma, and $w_1(x_1:=x_2),w_2(x_2:=x_1) \in \Lambda$ (i.e. the word resulting by replacing $x_1$ with $x_2$ in $w_1$, and the word resulting by replacing $x_2$ with $x_1$ in $w_2$) then $x_2$ is a permutation of $x_1$.
When will the second condition fail? The words $x_1,x_2$ both have no repeated symbols. If they are not permutations of each other then $w_1(x_1:=x_2)$ will have a symbol repeated $1$ or $3$ times. So words containing such symbols must lie outside $\Lambda$. Considering $w_2(x_2:=x_1)$ as well, we see that it is enough to exclude one of the two possibilities (a symbol occurring exactly once, or a symbol occurring exactly three times). We get the following theorem.
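A toy example (my own, purely illustrative) of how such a replacement breaks the symbol counts: replacing a subword with a non-permutation of it produces a symbol occurring exactly once or exactly three times.

```python
from collections import Counter

w1 = "abcabc"          # pi(Sigma) pi(Sigma) for pi = abc
x1, x2 = "ab", "bc"    # distinct-symbol subwords that are not permutations
swapped = x2 + w1[len(x1):]          # w1 with x1 (a prefix here) replaced by x2
counts = Counter(swapped)            # in "bccabc": 'a' once, 'c' three times
assert 1 in counts.values() and 3 in counts.values()
```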
Theorem. Suppose $\Lambda$ is a language containing $P$ in which no word contains any symbol exactly once, or in which no word contains any symbol exactly $3$ times. Then any grammar for $\Lambda$ must have size $\Omega(n^{-3/4} C^n)$, where $C = \sqrt{3}/\sqrt[3]{2}$.
As an example, the theorem applies to the language in which the number of occurrences of each symbol lies in $S$, as long as $2 \in S$ and $1,3 \notin S$. For example, $L$ corresponds to $S = \{0,2,4,6,\ldots\}$ and $L'$ corresponds to $S = \{0,2\}$.
We leave the reader with the following easy generalization.
Theorem. Suppose $\Lambda$ is a language containing $\{ \pi(\Sigma)^\ell \}$ for some $\ell > 0$, in which no word contains any symbol exactly $\ell + \epsilon$ times, where $\epsilon$ is either $1$ or $-1$. Then any grammar for $\Lambda$ must have size $\Omega(n^{-3/4} C^n)$, where $C = \sqrt{3}/\sqrt[3]{2}$.
Original, condensed proof
Let $L$ be the language of words in which each symbol appears an even number of times. We give an exponential lower bound (in $n = |\Sigma|$) on the size of a Chomsky Normal Form (CNF) grammar for $L$. This implies an exponential lower bound on the size of any context-free grammar for $L$, since (according to Wikipedia) the blowup to CNF is at most quadratic.
Given any derivation of a word $w$ of length $m$ and any $\ell < m/2$ we can always find a non-terminal symbol in the derivation which derives a subword $x$ of $w$ of length $\ell \leq |x| < 2\ell$: start from the root of the derivation, and always pick the larger branch. The first time we hit a branch smaller than $2\ell$, it must be at least $\ell$ in size.
Consider the set of all $S = n!$ words in $L$ of the form $\pi\pi$ (all of size $2n$), where $\pi$ is a permutation of $\Sigma$. For each such word $w$ we can find a symbol $s_w$ generating a subword $x_w$ of size $n/3 \leq |x_w| < 2n/3$. Note that all symbols in $x_w$ are distinct. Therefore if $s_u = s_v$ then $x_u,x_v$ must be permutations of each other.
Given $w$, for how many words $u$ can it be that $s_w = s_u$? To get from $w$ to $u$ we need to permute $x_w$, to permute the rest of the permutation, and to choose the starting location of $x_u$ within the permutation generating $u$. All in all, the number of possibilities is at most $$M = n|x_w|!(n-|x_w|)! \leq n(n/3)!(2n/3)!.$$ Therefore the number of distinct symbols in the grammar is at least $$\frac{S}{M} \geq \frac{n!}{n(n/3)!(2n/3)!} = \Omega\left(\frac{c^n}{n^{1.5}}\right),$$
where $c = 3/2^{2/3} \approx 1.88988$ (thanks to Wolfram Alpha).