How many sentences (actual or possible) are there in the English language? And of these, what’s the longest?
Actually, those are trick questions. Because in fact there’s an infinite number of possible sentences in English—and none of them is the longest! The same is true (arguably) in every human language.
In this post, we’ll do two things. (i) First, we’ll explore some examples that demonstrate how this can be so. (ii) And second, we’ll look at what this tells us, in broad strokes, about human language.
Note: this is the first in a series of five posts by Dr. Dexterous on the theme ‘Capturing Infinity in Natural Language’.
One pattern that demonstrates the infinity of human language is what linguists call ‘sentence embedding’. This is when you embed one sentence inside a larger sentence (optionally preceded by ‘that’), as in:
- I know. → You said (that) I know.
- You said ‘hello’. → I know (that) you said ‘hello’.
Embedding is not restricted to one level. In fact, you can combine multiple embeddings, to make increasingly long sentences. For example:
I know. → You said (that) I know. → I believe (that) you said (that) I know. → Mary thinks (that) I believe (that) you said (that) I know. → John knows (that) Mary thinks (that) I believe (that) you said (that) I know. → …
The interesting thing about this pattern is that there is no linguistic limit to how many times you can embed. Even with a finite vocabulary, you can continue as long as you want. Eventually, non-linguistic factors (boredom, the limits of memory, death, etc.) will stop you, but linguistically it’s unlimited. If you could live forever, you could keep saying a single sentence without end. Weird!
To see for yourself, try extending the sentence further on your own. However long you make it, you’ll find that you never ‘hit a wall’ where the language forces you to stop, though you will eventually have to stop to go to the bathroom!
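For readers who like to tinker, the embedding chain above can be sketched as a short Python function. This is a minimal illustration under our own assumptions: the four frames in `FRAMES` are taken from the example chain, the function name `embedded_sentence` is hypothetical, and ‘that’ is always included even though it is optional in English. Because the frames are reused in a cycle, a finite vocabulary is enough for any depth.

```python
from itertools import cycle, islice

# Frames taken from the example chain; a hypothetical, deliberately finite list.
FRAMES = ["you said", "I believe", "Mary thinks", "John knows"]


def embedded_sentence(depth: int, base: str = "I know") -> str:
    """Embed `base` under `depth` frames, each new frame becoming the outermost."""
    clause = base
    for frame in islice(cycle(FRAMES), depth):  # reuse the frames endlessly
        clause = f"{frame} that {clause}"  # 'that' is optional in English
    return clause[0].upper() + clause[1:] + "."


for d in range(5):
    print(embedded_sentence(d))
# I know.
# You said that I know.
# I believe that you said that I know.
# Mary thinks that I believe that you said that I know.
# John knows that Mary thinks that I believe that you said that I know.
```

Each extra level adds three words, so there is no depth at which the function runs out of longer sentences; only the computer’s memory, like ours, is finite.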
Nested Relative Clauses
Another pattern that demonstrates natural language infinity is what linguists call ‘relative clause nesting’.
A relative clause is a special type of embedding, where you use the embedded sentence to modify a noun, like this (N labels the noun and RC the relative clause):
- The cat_N came back. → The cat_N [(that) you fed yesterday]_RC came back.
- The rat_N ran away. → The rat_N [that saw the cat]_RC ran away.
Relative clauses have a different structure from the previous embedding pattern, but RCs can also be ‘stacked’ or ‘nested’ multiple times. For example:
I saw the cat_N. → I saw the cat_N [that ate the rat]_RC. → I saw the cat_N [that ate the rat_N [that swallowed the fly]_RC]_RC. → …
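The same nesting can be sketched recursively in Python. This is an illustrative toy, not a real grammar: the `(verb, noun)` pairs in `LINKS` and the helper names `noun_phrase` and `sentence` are our own hypothetical choices. Each call of `noun_phrase` builds a noun phrase whose relative clause may contain another noun phrase, and `bracket=True` prints labelled brackets like the ones in the examples above.

```python
from itertools import cycle, islice

# Hypothetical (verb, noun) pairs; a finite list that can be reused forever.
LINKS = [("ate", "rat"), ("swallowed", "fly"), ("chased", "cat")]


def noun_phrase(noun: str, links: list[tuple[str, str]], bracket: bool = False) -> str:
    """Build 'the NOUN [that VERB NP]' recursively: one nested RC per link."""
    if not links:
        return f"the {noun}"  # base case: a bare noun phrase
    verb, next_noun = links[0]
    inner = noun_phrase(next_noun, links[1:], bracket)  # the NP inside the RC
    rc = f"that {verb} {inner}"
    if bracket:
        rc = f"[{rc}]_RC"  # label the relative clause
    return f"the {noun} {rc}"


def sentence(depth: int, bracket: bool = False) -> str:
    """'I saw the cat ...' with `depth` nested relative clauses."""
    links = list(islice(cycle(LINKS), depth))
    return f"I saw {noun_phrase('cat', links, bracket)}."


for d in range(4):
    print(sentence(d))
print(sentence(2, bracket=True))
# I saw the cat [that ate the rat [that swallowed the fly]_RC]_RC.
```

The recursion mirrors the pattern itself: a noun phrase may contain a relative clause, which in turn contains a noun phrase, so each nesting level adds four words and nothing in the rules sets a maximum.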
Again, there are no linguistic limits to how far you can go with this pattern. Try it yourself: start from the sentence ‘This is the cat.’ and keep adding relative clauses to it.
Note: The nursery rhyme ‘This is the House that Jack Built’ is a famous example of nested relative clauses.
What this tells us about human language
This stuff may seem abstract, but in fact these ‘infinite sentences’ reveal some important truths about human language, including the following three points.
(1) A human language is not a list.
When you learned your language, you didn’t just learn a list of all the possible sentences. You couldn’t have: the list would have to be infinite, yet your mind is (sorry, dude) finite. It follows that language must be more than just a list.
(2) Human language is based on generative rules.
If you didn’t learn a list, a corollary is that you must have learned some set of rules that lets you generate your own creations. It’s like Lego: the pieces fit together only in certain ways, but you can still be creative.
(3) This generative system is finite, yet powerful enough to capture infinity.
You can only know a finite number of generative rules; yet this system of rules must somehow produce the unlimited complexity illustrated above. Therefore, if we want to understand how human language works, we must look for a finite set of rules that can generate an infinite set of sentences.
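To make point (3) concrete, here is a minimal Python sketch of a finite generative system. The grammar is a toy we made up for illustration, not the set of rules developed later in this series: `GRAMMAR` has just four categories and six words, but one rule lets an S contain another S. The hypothetical function `expand` lists every sentence that uses at most `depth` embeddings.

```python
from itertools import product

# A toy grammar (an assumption for illustration only).
# Keys are categories; each maps to its alternative expansions.
# Any symbol that is not a key is treated as a word.
GRAMMAR: dict[str, list[list[str]]] = {
    "S": [["NP", "V_intr"], ["NP", "V_clause", "that", "S"]],  # S may contain S
    "NP": [["I"], ["you"], ["Mary"]],
    "V_intr": [["left"], ["smiled"]],
    "V_clause": [["said"], ["thought"]],
}


def expand(symbol: str, depth: int) -> list[list[str]]:
    """Return every word sequence derivable from `symbol`,
    allowing at most `depth` embedded S's."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # a plain word
    results: list[list[str]] = []
    for rhs in GRAMMAR[symbol]:
        embeds = "S" in rhs
        if embeds and depth == 0:
            continue  # no embedding budget left for this expansion
        child_depth = depth - 1 if embeds else depth
        options = [expand(child, child_depth) for child in rhs]
        # Combine one choice for each child, left to right.
        for combo in product(*options):
            results.append([word for part in combo for word in part])
    return results


def render(words: list[str]) -> str:
    """Join words into a capitalized sentence with a final period."""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


for d in range(4):
    print(d, len(expand("S", d)))  # 6, 42, 258, 1554 sentences
print(render(expand("S", 2)[-1]))  # Mary thought that Mary thought that Mary smiled.
```

The rule list never grows, yet each extra level of embedding multiplies the number of sentences, and no depth yields a longest one: a finite system describing an infinite set.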
These observations are the starting point for one of the most influential theories in modern linguistics: a theory of word order and sentence structure that linguists call ‘generative grammar’.
In the next four posts (2, 3, 4, and 5) in this series, we will explore generative grammar with examples, and see how it captures the infinity of natural language:
- Post 1 – World’s Longest Sentence? (Introduction, this post)
- Post 2 – The Power of Grammatical Categories
- Post 3 – The Infinite Power of Generative Rules (Coming soon)
- Post 4 – The Power of NP (Coming in a while)
- Post 5 – VP, S, and Infinite Syntax (Coming last in the series)
Talk like a linguist! – Many linguists use the term ‘recursion’ or ‘recursive structures’ to refer to multiple embeddings and similar ‘looping’ patterns. Some mathematicians (who use the term ‘recursion’ slightly differently) think this is a loose way of talking, but it is common linguistic usage.
All human languages appear to have recursion and recursive structures (in the loose linguistic sense above). There is some debate about one little-studied language, called Pirahã (a language from the Amazon region); but actually it looks like even Pirahã has multiply-embedded structures. See here for a good discussion of the issues.
Certain patterns of strings (such as strings of letters or numbers) are potentially infinite in a fairly simple way. For example, if we have the rule ‘add b at the end’, we can generate a potentially infinite sequence like this:
a → ab → abb → abbb → …
Human language infinity is not so simple: our rules can add multiple items at once, and at non-adjacent positions. That is what makes the puzzle so challenging.
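A small Python sketch can make the contrast concrete. `append_b` is the rule from the text; `wrap_ab` is a hypothetical rule of our own that adds two symbols at once, one at each end, so every a is matched by a b that may end up far away from it, somewhat as the subject and verb in ‘The rat [that saw the cat] ran away’ are separated by the relative clause. The helper name `derive` is also our own.

```python
from collections.abc import Callable


def append_b(s: str) -> str:
    """The simple rule from the text: add b at the end."""
    return s + "b"


def wrap_ab(s: str) -> str:
    """Hypothetical rule: add two items at once, at non-adjacent positions."""
    return "a" + s + "b"


def derive(rule: Callable[[str], str], start: str, steps: int) -> list[str]:
    """Apply `rule` repeatedly, keeping every intermediate string."""
    out = [start]
    for _ in range(steps):
        out.append(rule(out[-1]))
    return out


print(" → ".join(derive(append_b, "a", 3)))  # a → ab → abb → abbb
print(" → ".join(derive(wrap_ab, "ab", 3)))  # ab → aabb → aaabbb → aaaabbbb
```

With `append_b`, each step touches only one end of the string; with `wrap_ab`, the two new symbols land on opposite ends, and the number of a’s always equals the number of b’s, however many steps you take.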
- Chomsky, Noam. Syntactic Structures. The Hague: Mouton, 1957. Print.
- Chomsky, Noam. “On Certain Properties of Formal Grammars.” Information and Control 2 (1959): 137-67. Print.
- Lasnik, Howard, Marcela A. Depiante, and Arthur Stepanov. Syntactic Structures Revisited: Contemporary Lectures on Classic Transformational Theory. Cambridge, MA: MIT, 2000. Print.
Chomsky’s Syntactic Structures was one of the seminal works in generative grammar. If you’re looking for an overview of the ideas, Lasnik, Depiante, and Stepanov’s book provides a fairly accessible technical introduction to the issues and formalism.