5
$\begingroup$
  1. Can a CFG over large alphabet describe a language, where each terminal appears even number of times?

  2. If yes, would the Chomsky Normal Form be polynomial in |Σ| ?

EDIT: What about a language where each terminal appears 0 or 2 times?

EDIT: Solving this may give interesting extension to "Restricted read twice BDDs and context free grammars"

Find a context-free grammar (CFG) for the language L of all words such that each terminal in a word occurs even number of times over a possibly large alphabet Σ

My long aproach is (the only nonterminal is S):

S ⟶ ε | SS

x ∈ Σ : S ⟶ xSx

x,y ∈ Σ : S ⟶ xxSyy | yySxx | xySxy | xySyx | yxSyx | yxSxy

Second try, incremental approach.

S ⟶ ε | SS
Terminal productions (suboptimal): x,y ∈ Σ : S ⟶ xx | yy |xxyy | yyxx | xyxy | xyyx | yxyx | yxxy
Incremental: x ∈ Σ : S ⟶ SxSxS

By the way, the homework is over now.

Technically it wasn't a homework assigned to me. I don't know the answer and thought the question was easy. Best.

EDIT: I didn't award the bounty :-)

  • 0
    Please specify the desired language precisely. I have a different idea that uses only regular rules. How would you build an automaton that recoginizes the language?2010-12-14
  • 0
    Each terminal in a word must appear even number of times. I wouldn't try regular expressions because palindromes are in the language. I need this for a graph problem.2010-12-14
  • 1
    `abcabc` isn't in your language, I think!2010-12-14
  • 0
    "abcabc isn't in your language, I think!" According to a parser it is...2010-12-14
  • 3
    Whoops! Have I messed up? The proof outline I had in mind went as follows: none of the second and third productions matches up (as `a` != `c` and `ab` uses different letters than `bc`) so the starting production would have to be `S -> SS`. But each `S` will eventually produce each terminal an even number of times. Chopping `abcabc` up as `ab`/`cabc` or `abca`/`bc` won't work either, for obvious reasons. What sequence of productions did your parser find?2010-12-14
  • 0
    @User4750: could you sketch the parse sequence for that? I also don't see it.2010-12-14
  • 0
    @all: Thank you! It may be a parser bug (java cfgto) - if someone is interested I can upload the CNF somewhere in about 14 hours.2010-12-14
  • 0
    Please do, I'm getting curious by now! You may also want to look at the third set of productions, I think alternatives 1 & 2 and 3 & 5 & 6 are the same. What you want for this grammar is `S -> xxSyy | xySxy | xySyx` since you can swap `x` and `y` around. (I suppose you meant to put `yxSxy` in there somewhere :)2010-12-14
  • 1
    Sorry, I can't make much sense out of that... For the alphabet `{a,b,c}` you need the following grammar, if you follow Carl's construction: `S -> eps | aT | bU | cV [], T -> aS | bW | cX [a], U -> aW | bS | cY [b], V -> aX | bY | cS [c], W -> aU | bT | cZ [ab], X -> aV | bZ | cT [ac], Y -> aZ | bV | cU [bc], Z -> aY | bX | cZ [abc]` In brackets behind each non-terminal is the set of terminals that has occurred an odd number of times.2010-12-15
  • 0
    Instead of joking around with a parser, you might as well proof correctness.2010-12-15
  • 0
    I will bet my construction is wrong. What I dislike in Carl's construction is being exponential :-( Thank you all!2010-12-15
  • 1
    Alas! Your second try is also not working, again with `abcabc` as counterexample. The second production can't be used since there are no substrings matching those patterns. For the third, x is either `a` but then S needs to produce `bc` twice *independently!* , `b` but then S needs to produce `a` which can't be done, or `c` which has the same problem as `a`. My gut feeling is it can't be done, but I can't back that up...2010-12-15
  • 1
    Of course, if you are just looking for a parser you can as well use full power (TM/RM) and recognize the language in time $\mathcal{O}(n)$ with $k$ bits in memory.2010-12-16
  • 0
    Can someone recommend reading with examples for dummies what poly-size CFGs can describe over large alphabet?2010-12-16
  • 0
    What about a language where each terminal appears 0 or 2 times?2010-12-16
  • 0
    For 0 or 2 times, probably using a pumping lemma (with bounds depending on the size of the grammar) would show that the grammar needs to be extremely verbose.2010-12-20
  • 0
    @yatima2975 I will appreciate help in quality assurance of positive answers for the bounty.2010-12-21
  • 0
    If someone knows better forum he/she or I may ask there. I am surprised the way I earn reputation points here. I give email on my profile. Help on improving the quality of the question is appreciated.2010-12-22
  • 0
    @jerr18: I will look at Yuval's proof more closely, but it's Christmas as well :) OTOH, Christmas is boring and my brain could use some stimulation...2010-12-26
  • 0
    @yatima2975 Merry Christmas Eyeryone!2010-12-26
  • 0
    _Technically_ it wasn't a homework assigned to me. I don't know the answer and thought the question was easy.2010-12-26

5 Answers 5