4
$\begingroup$

Can a polynomial size CFG over large alphabet describe any of these languages:

  1. Each terminal appears $0$ or $2$ times
  2. Word repetition $\{www^* \mid w \in \Sigma^*\}$ (word repetition of an arbitrary word $w$)

"Polynomial size" Context Free Grammar is defined as Chomsky Normal Form polynomial in $|Σ|$.

References to poly-size CFGs over large alphabets will be appreciated.

EDIT: This is not a homwork. Positive answer can give extension to Restricted read twice BDDs and context free grammars

  • 0
    Crosspost in cstheory: http://cstheory.stackexchange.com/questions/3844/can-a-polynomial-size-cfg-over-large-alphabet-describe-any-of-these-languages2010-12-17

1 Answers 1

1

The language described in (2) is not context free. Take the word $a^nba^nba^nb$ for $n$ large enough, and use the pumping lemma to get a word not in the language.

As for (1), consider a grammar in CNF (Chomsky Normal Form) generating that language. Let $n = |\Sigma|$. Consider all $S=(2n)!/2^n$ saturated words containing each character twice. Looking at the tree generating each of these words $w$, there must be a symbol $A_w$ which generates a subword $s_w$ of length between $1/3$ of $2/3$ of the entire word (of length $2n$). Suppose $A_u = A_v$. Then $|s_u| = |s_v|$ and $s_u = s_v$ up to permutation, since otherwise we can replace one occurrence by the other to get a word not in the language.

How many times can a given $A_u$ appear? Equivalently, in how many saturated words does $s_u$ appear, up to permutation, somewhere in the word? Any such word can be obtained from $u$ by permuting $s_u$, permuting the rest of the word, any rotating the result, to a total of $2n|s_u|!(2n-|s_u|)! \leq 2n(2n/3)!(4n/3)!$. Therefore the number of different symbols in the language is at least $$ \frac{(2n)!/2^n}{2n(2n/3)!(4n/3)!} = O\left(\frac{c^n}{n^{1.5}}\right),$$ where $c = 9/4\sqrt[3]{2} \approx 1.78583$ (asymptotics courtesy of Wolfram Alpha).

According to Wikipedia, one can transform a grammar $G$ into a CNF grammar of size $O(|G|^2)$. Therefore this language requires exponentially many symbols for any grammar.

  • 0
    Thank you. Before accepting I will need some time.2010-12-21
  • 0
    For an alternative proof, see http://math.stackexchange.com/questions/14311/can-a-polynomial-size-cfg-over-large-alphabet-describe-a-language-where-each-ter.2010-12-25