Every now and then, people ask about the size of the vocabulary in different languages, or spread opinions on the issue. Quite often the discussion revolves around a claim that English has the largest vocabulary.
The correct answer is that the question is meaningless in its general formulation (as above), in the strict sense that it contains words that are too vague in this context (“language”, “vocabulary”, “largest”, and “have”). Therefore, although the grammatical form is that of a question, no question is really posed.
The question can be converted into meaningful and possibly answerable questions in many ways. None of them is really more obvious or natural than the others, unless you essentially think in terms of one language or one language type, such as isolating languages. Thus the result is a set of questions that do not have much in common and are rather uninteresting, at least to people who wish to rank languages by vocabulary size.
Measuring is one problem, of course. In particular, a language is not the same thing as a dictionary, or the best dictionary (whatever you mean by “best”), or even the collection of all dictionaries. A language is something that is used by people in many ways and forms, and dictionaries record just a small part thereof.
But the more fundamental problem is how you define what you are trying to measure. What does “vocabulary size” mean? We might try to arrive at a consensus on counting only root words, or counting only base forms (ignoring all inflection), or counting the words in current use (in some sense). Restricting ourselves to root words is an obvious way of making isolating or nearly isolating languages appear to have more words than they actually have, as compared with other types of languages. To put it another way, if a language has few or no tools for creating new words from its existing words (using, say, suffixes or productive inflection), it obviously has to “borrow” words from other languages. This makes its glossary of “root words” larger, although the “root words” are roots only relative to that language; “international” is a root word in English, but it is based on a (neo-)Latin word with quite a few morphemes in it.
If you pick up a large general dictionary of English, it will most probably contain the words “fiancé” and “sauna” and “vice versa”. Now are these actually English words, or French, Finnish, and Latin words (perhaps pronounced in a weird way by English-speaking people), which are just often used in writing and speech that is otherwise in English? If you say they’re English, well, I might say that similarly any English word that Germans often use (say, “computer”, perhaps spelled “Computer”) is a German word. You get the idea? Foreign words can be taken into temporary use at least, and there is a continuous spectrum from such usage to fully adapted and adopted loan words.
I will conclude with a proof that Finnish has an infinite number of words. In Finnish, there is a derived word for any numeral, corresponding in meaning to the words in the sequence simple, double, triple, etc. You take the numeral, write it as one word, and append -kertainen, possibly after some changes to the stem. Thus from tuhat viisi ‘1005’ we get tuhatviisikertainen. And generally, there is the sequence of words yksinkertainen, kaksinkertainen, kolminkertainen and so on, literally ad infinitum.
Someone will probably argue that I just proved a ‘potential’ infinity, not an actual one. But this is really irrelevant to answering the question under discussion. What matters is that if you make any quantified claim, saying that language X has N words, I can easily construct a set of Finnish words that surely contains more than N words. And this proves little about Finnish; there are similar examples in any sufficiently synthetic language. What this proves is that the question “Which language has the largest vocabulary?” is pointless.
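For those who like to see such an argument spelled out mechanically, here is a minimal sketch in Python. It only illustrates the logic of the construction and is not a model of Finnish morphology: the function names are invented for this sketch, and it assumes the digit notation (as in 1005-kertainen) as a stand-in for the spelled-out form tuhatviisikertainen, which sidesteps the stem changes mentioned earlier.

```python
def kertainen_words(count):
    """Return `count` distinct words of the -kertainen type, one per integer 1..count."""
    # Assumption: the numeral is written with digits ("1005-kertainen") rather
    # than spelled out, so no Finnish stem changes need to be modelled here.
    return [f"{n}-kertainen" for n in range(1, count + 1)]


def refute_claim(claimed_size):
    """Build a list of words with more members than any claimed vocabulary size N."""
    words = kertainen_words(claimed_size + 1)
    # Distinct integers yield distinct strings, so there are N + 1 different words.
    assert len(set(words)) == claimed_size + 1
    return words


if __name__ == "__main__":
    # Whatever N someone proposes, the construction yields N + 1 words.
    n = 1_000_000
    print(len(refute_claim(n)) > n)
```

The point of the sketch is that the construction works for every N you might plug in, so no finite count can be the answer.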