In syntactic theory in linguistics, sentences are built up from smaller units embedded in a hierarchy. A central embedding is one that places an element inside another, rather than at one side of it. A prime example of a complex central embedding is the relative clause:
The sentence The house is green can have a qualifier attached to house, as in The house that Jack built is green.
That is no more difficult to understand than a peripheral embedding: This is the house becomes This is the house that Jack built. Native speakers instantly parse both complex sentences correctly. There is, however, a crucial difference. Embedding at the side can be continued indefinitely without problem: This is the malt that lay in the house that Jack built, and so on up to This is the cow with the crumpled horn, that tossed the dog, that worried the cat.... Listeners can process this as easily as they process a long run of short sentences in a complicated text such as a novel.
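As a rough illustration of why peripheral embedding can run on indefinitely, here is a minimal Python sketch that rebuilds the rhyme's sentences by wrapping one more relative clause around the phrase each time. The function name `peripheral` and the selection of rhyme links are illustrative assumptions, not a standard tool. Each new clause ends exactly where its host phrase ends, so nothing earlier in the sentence is left waiting for completion.

```python
def peripheral(depth: int) -> str:
    """Return 'This is ... the house that Jack built.' with `depth` extra clauses.

    Each link is (noun phrase, verb that takes the previous noun phrase as
    its object), innermost first, taken from the rhyme.
    """
    links = [
        ("the malt", "lay in"),
        ("the rat", "ate"),
        ("the cat", "killed"),
        ("the dog", "worried"),
        ("the cow with the crumpled horn", "tossed"),
    ]
    phrase = "the house that Jack built"
    for noun, verb in links[:depth]:
        # The new relative clause ends where the old phrase ends, so the
        # structure only ever branches to the right.
        phrase = f"{noun} that {verb} {phrase}"
    return f"This is {phrase}."


for d in range(3):
    print(peripheral(d))
# This is the house that Jack built.
# This is the malt that lay in the house that Jack built.
# This is the rat that ate the malt that lay in the house that Jack built.
```

Adding more links to the list extends the sentence without ever creating a clause that must be held open.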
But central embedding hits a cognitive bedrock very quickly: once one clause is centrally embedded in another, any further embedding inside it is either incomprehensible or extremely difficult.
(1) The man is a baker.
(2) The boy spoke to the man.
(3) The girl likes the boy.
(4) The man the boy spoke to is a baker. = (2) embedded in (1)
(5) The boy the girl likes spoke to the man. = (3) embedded in (2)
(6) *The man the boy the girl likes spoke to is a baker. = unacceptable embedding of (3) into (2) into (1)
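The embeddings in (4) to (6) can be reproduced mechanically. The Python sketch below assumes a hypothetical `Clause` record (subject, verb, remainder) and a hypothetical `embed` function that turns the inner clause into an object relative on the host's subject: the inner clause's object, which must be the host's subject noun, is gapped, and the complementizer is omitted, as in the examples. Nothing in it limits how often `embed` may be applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    subject: str  # subject noun phrase, possibly already carrying a relative clause
    verb: str     # finite verb, including a stranded preposition such as "spoke to"
    rest: str     # object or predicate that follows the verb

    def sentence(self) -> str:
        text = f"{self.subject} {self.verb} {self.rest}"
        return text[0].upper() + text[1:] + "."


def embed(host: Clause, inner: Clause) -> Clause:
    """Centrally embed `inner` as an object relative on the host's subject.

    The inner clause's object must be the host's subject noun; that object is
    dropped (the gap), and the inner subject + verb go right after the noun.
    """
    if inner.rest != host.subject:
        raise ValueError("inner clause's object must be the host's subject")
    return Clause(f"{host.subject} {inner.subject} {inner.verb}", host.verb, host.rest)


s1 = Clause("the man", "is", "a baker")      # (1)
s2 = Clause("the boy", "spoke to", "the man")  # (2)
s3 = Clause("the girl", "likes", "the boy")    # (3)

print(embed(s1, s2).sentence())             # The man the boy spoke to is a baker.
print(embed(s2, s3).sentence())             # The boy the girl likes spoke to the man.
print(embed(s1, embed(s2, s3)).sentence())  # The man the boy the girl likes spoke to is a baker.
```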
The logic is easy. We can see what (6) is supposed to mean. But linguistically it's grossly unnatural and difficult, far more so than the entire rhyme The House that Jack Built.
Note that this is a strictly linguistic inability. Once the logic or grammar has been explained, it is quite possible to work out what the sentence is supposed to mean; it's just impossible to understand it that way naturally. This is a contentious subject in syntax, and terminology hinges on it. Do we say these sentences are ungrammatical, or do we say that they're grammatical but unacceptable and/or incomprehensible? In Syntactic Structures (1957), Noam Chomsky called these sentences grammatical, because they clearly obey very simple generative rules of grammar. On that view they join the class of sentences that are grammatical but unacceptable, such as the famous Colorless green ideas sleep furiously.
It also seems to show that our language faculty is different from other mental powers such as abstract reasoning: our language can't produce or accept structures that our reasoning can work out. This sort of restriction is a big problem for a generically computational or cognitive approach to language processing, because it's trivially easy for a computational generator to embed to arbitrary depth. A true account of human language has to rule this possibility out.
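To make the point concrete, here is a Python sketch of such a generator. The function `center_embed` is hypothetical: it takes a list of subject nouns and a matching list of verb phrases and produces the mirror-image pattern N1 N2 ... Nk Vk ... V2 V1 for any k. The optional `max_depth` argument is an illustrative stand-in for the kind of restriction a true account would need, not a proposal from the literature.

```python
from typing import List, Optional


def center_embed(nouns: List[str], verbs: List[str],
                 max_depth: Optional[int] = None) -> str:
    """Nest clause i+1 as an object relative on the subject of clause i.

    nouns[i] is the subject of verbs[i] and also the (gapped) object of
    verbs[i+1]. The result has the shape N1 N2 ... Nk Vk ... V2 V1.
    """
    if len(nouns) != len(verbs):
        raise ValueError("need exactly one verb phrase per noun")
    depth = max(len(nouns) - 1, 0)
    # Hypothetical cap: an unrestricted generator has none, which is the problem.
    if max_depth is not None and depth > max_depth:
        raise ValueError(f"depth {depth} exceeds max_depth={max_depth}")
    if not nouns:
        return ""
    inner = center_embed(nouns[1:], verbs[1:])
    return " ".join(part for part in (nouns[0], inner, verbs[0]) if part)


NOUNS = ["the man", "the boy", "the girl", "the dog", "the cat"]
VERBS = ["is a baker", "spoke to", "likes", "bit", "chased"]

for k in range(1, len(NOUNS) + 1):
    print(center_embed(NOUNS[:k], VERBS[:k]))  # depth k - 1, no limit imposed
```

The recursion is a handful of lines and runs just as happily at depth 4 as at depth 1; the human limit has to be stated separately.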
One kind of example is sometimes quoted for the opposite reason: to show that seemingly incomprehensible sentences can be worked out as having a valid logical structure. The example Buffalo buffalo buffalo buffalo buffalo is explained under buffalo, and similar ones occur under Grammatical and syntactic puzzles. But such examples exploit a peculiar property of the word buffalo, which can serve as both noun and verb and so can be repeated. The central embedding problem is clearer if we avoid repetition.
The problem is not with the particular nouns and verbs. Nor is it that we're omitting the relative complementizer that:
(7) *The man that the boy that the girl likes spoke to is a baker. = still unacceptable
Nor is it that we're rushing the words together without allowing for intonation. Adding intonational breaks (marked here by commas) doesn't help much:
(8) *The man, whom the boy, whom the girl likes, spoke to, is a baker. = still unacceptable
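One way to see that (6), (7) and (8) share a single structure is to generate all three from the same clause list and vary only the surface marking. In this Python sketch the function `render` and its `rel` and `commas` parameters are hypothetical; the recursion that nests the clauses is identical in every call, and only the words and punctuation placed around each embedded clause change.

```python
from typing import List


def render(nouns: List[str], verbs: List[str], rel: str = "", commas: bool = False) -> str:
    """Centrally embed clause i+1 on the subject of clause i, as in (6) to (8).

    rel: relative word placed before each embedded clause ("" omits it).
    commas: set each embedded clause off with commas, mimicking intonation breaks.
    """
    if not nouns or len(nouns) != len(verbs):
        raise ValueError("need a non-empty, equal number of nouns and verbs")
    if len(nouns) == 1:
        return f"{nouns[0]} {verbs[0]}"
    inner = render(nouns[1:], verbs[1:], rel, commas)  # same nesting every time
    clause = f"{rel} {inner}" if rel else inner
    if commas:
        return f"{nouns[0]}, {clause}, {verbs[0]}"
    return f"{nouns[0]} {clause} {verbs[0]}"


def sentence(text: str) -> str:
    return text[0].upper() + text[1:] + "."


NOUNS = ["the man", "the boy", "the girl"]
VERBS = ["is a baker", "spoke to", "likes"]
print(sentence(render(NOUNS, VERBS)))                           # (6)
print(sentence(render(NOUNS, VERBS, rel="that")))               # (7)
print(sentence(render(NOUNS, VERBS, rel="whom", commas=True)))  # (8)
```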
Here are two more odd things. A central embedding only one deep is eminently acceptable, and the embedded part can be very complicated:
(4) The man the boy spoke to is a baker.
(9) The man the boy with the dark hair spoke to is a baker.
(10) The man the boy with the dark hair spoke to last night after school is a baker.
(11) The man the boy with the dark hair spoke to last night after school about our perennial problem with the pastry not rising is a baker.
Now (11) takes a bit of concentration: intonation and punctuation would help here, and you might lose your place, but it doesn't seem plainly incomprehensible the way (6) to (8) do.
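The contrast between (11) and (6) is one of depth rather than length. A minimal Python sketch, assuming sentences hand-annotated with square brackets around each relative clause (only relative clauses are bracketed, not prepositional phrases), measures both. The helper names are hypothetical. On this count (11) is more than twice as long as (6) but has a central-embedding depth of 1 against 2.

```python
def embedding_depth(bracketed: str) -> int:
    """Deepest nesting of [...] brackets; each pair marks one relative clause."""
    depth = deepest = 0
    for ch in bracketed:
        if ch == "[":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets")
    if depth != 0:
        raise ValueError("unbalanced brackets")
    return deepest


def word_count(bracketed: str) -> int:
    """Number of words once the annotation brackets are removed."""
    return len(bracketed.replace("[", " ").replace("]", " ").split())


EXAMPLES = {
    "(4)": "The man [the boy spoke to] is a baker.",
    "(6)": "The man [the boy [the girl likes] spoke to] is a baker.",
    "(11)": ("The man [the boy with the dark hair spoke to last night after school "
             "about our perennial problem with the pastry not rising] is a baker."),
}
for label, text in EXAMPLES.items():
    print(f"{label}: depth {embedding_depth(text)}, {word_count(text)} words")
# (4): depth 1, 9 words
# (6): depth 2, 12 words
# (11): depth 1, 26 words
```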
The other odd thing is that the embedding restriction applies only to very similar structures. (11) has lots of embedded pieces qualifying boy, but none of them echoes the top level. If we alter the grammatical roles slightly in the original triplet, we get a much more acceptable structure:
(1) The man is a baker.
(2) The boy spoke to the man.
(3') The boy likes the girl. = (3) with roles swapped
(4) The man the boy spoke to is a baker. = (2) embedded in (1)
(5') The boy who likes the girl spoke to the man. = (3') embedded in (2)
(6') The man the boy who likes the girl spoke to is a baker. = probably acceptable embedding of (3') into (2) into (1)
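The observation that the restriction bites only when the nested clauses are alike can be restated as a toy computation. The Python sketch below is an illustrative heuristic, not an established processing model. It labels each relative clause as an object relative (the head noun is the object inside it, as in the boy the girl likes, or the object of a stranded preposition, as in the man the boy spoke to) or a subject relative (as in the boy who likes the girl), and reports the longest chain of directly nested clauses of the same kind. The class `RelClause` and both functions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RelClause:
    gap: str  # "object" or "subject": the head noun's role inside this clause
    inner: list[RelClause] = field(default_factory=list)  # clauses nested in it


def chain_from(clause: RelClause) -> int:
    """Length of the longest same-gap chain that starts at `clause` and runs inward."""
    return 1 + max((chain_from(c) for c in clause.inner if c.gap == clause.gap), default=0)


def longest_same_kind_chain(clauses: list[RelClause]) -> int:
    """Longest chain of directly nested, same-gap clauses anywhere in the tree."""
    best = 0
    for c in clauses:
        best = max(best, chain_from(c), longest_same_kind_chain(c.inner))
    return best


# Relative clauses attached to the main clause of each example.
EXAMPLES = {
    "(4)": [RelClause("object")],                           # the man [the boy spoke to]
    "(6)": [RelClause("object", [RelClause("object")])],    # [the boy [the girl likes] spoke to]
    "(6')": [RelClause("object", [RelClause("subject")])],  # [the boy [who likes the girl] spoke to]
    "(11)": [RelClause("object")],                          # one long object relative
}
for label, clauses in EXAMPLES.items():
    print(label, longest_same_kind_chain(clauses))
```

Under this toy measure only (6) has a chain of two like clauses, which matches the judgments given for (4), (6), (6') and (11); it says nothing about why such chains are hard.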
Another cognitive limit in language is the triple negative.