The 6,318 Most Commonly-Used Words in English? Yes, more or less.

If you like to memorize vocabulary, these are the words you should begin with. 

Adam Kilgarriff, senior research fellow at the University of Brighton, UK, has prepared a list of words he calls the LEMMA. Computers analyzed the British National Corpus (BNC), a collection of 100 million words from books, newspapers, magazines, TV, radio, etc. in the UK. The computers then produced the LEMMA, a list of the 6,318 most commonly used words from that sample of 100 million. The LEMMA is a valuable tool for students who want to improve their English ability. You may have your own ideas about how to use it, or you can look at How To Use the LEMMA.

To read a longer explanation of all this, click here or continue reading from MORE DETAILS below.

The list begins with "the" and you will see this information given:

1 6187267 the det 

1: The first number (1) means that this is the most often used word of the 100 million words that were analyzed.

6187267: This number tells you that "the" was used 6,187,267 times in the 100 million sample.

the: The word (the) is the most-used word in the 100-million-word sample.

det: The last item (det) tells you that "the" is a determiner. The articles "a," "an," and "the" are the best-known determiners, but the class also includes words such as "this" and "some."
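
For readers who want to work with the list on a computer, here is a minimal Python sketch of how one entry could be split into the four fields described above. It assumes the fields are separated by spaces, as in the sample line; the names LemmaEntry and parse_entry are invented for this illustration and are not part of the list itself.

```python
from typing import NamedTuple


class LemmaEntry(NamedTuple):
    """One line of the list (hypothetical helper type for this sketch)."""
    rank: int
    frequency: int
    word: str
    word_class: str


def parse_entry(line: str) -> LemmaEntry:
    # Assumption: the four fields are separated by whitespace,
    # as in the sample line "1 6187267 the det".
    rank, frequency, word, word_class = line.split()
    return LemmaEntry(int(rank), int(frequency), word, word_class)


entry = parse_entry("1 6187267 the det ")

# The corpus has 100 million words, so dividing by 100 gives
# the number of occurrences per million words.
per_million = entry.frequency / 100

print(entry)
print(f"about {per_million:.0f} occurrences per million words")
```

Dividing the count by 100 is only a convenient way to compare words; it relies on the 100 million figure given for the BNC.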

To see and PRINT OUT Professor Kilgarriff's list, click here. (The server may not always respond. If so, try again in a few hours.)

To download a zip file of the list click here. 

MORE DETAILS

Each of the 6,318 words in the LEMMA had more than 800 occurrences in the BNC.

The following set of word classes was used:

conj (conjunction) 34 items were conjunctions
adv (adverb) 427 items were adverbs
v (verb) 1281 items were verbs
det (determiner) 47 items were determiners
pron (pronoun) 46 items were pronouns
interjection 13 items were interjections
a (adjective) 1124 items were adjectives
n (noun) 3262 items were nouns
prep (preposition) 71 items were prepositions
modal 12 items were modals
infinitive-marker 1 item was an infinitive marker: to
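
As a quick check, the counts above can be added up to confirm that they account for all 6,318 entries. The short Python sketch below does this and also shows each class as a share of the list; the variable names are chosen only for this example.

```python
# Word-class counts copied from the list above.
class_counts = {
    "conj": 34,
    "adv": 427,
    "v": 1281,
    "det": 47,
    "pron": 46,
    "interjection": 13,
    "a": 1124,
    "n": 3262,
    "prep": 71,
    "modal": 12,
    "infinitive-marker": 1,
}

total = sum(class_counts.values())

# Share of the whole list taken up by each class, in percent.
shares = {name: 100 * count / total for name, count in class_counts.items()}

# Print the classes from largest to smallest.
for name, count in sorted(class_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18}{count:>5}{shares[name]:>7.1f}%")
print(f"{'total':<18}{total:>5}")
```

Nouns, verbs, and adjectives together make up the large majority of the entries.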

Words may appear more than once in the list. For example, "help" appears both as a noun and as a verb. A word like "right" has four list entries: adjective, adverb, interjection, and noun. Only ten words appear more than three times. Words such as "helps," "helped," and "helping" were all counted as "help."
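
To illustrate the idea of counting different forms under one headword, here is a toy Python sketch. The lookup table LEMMA_OF and the sample sentence are invented for this example. The sketch does not reproduce the tools actually used on the BNC, and it ignores word class, which the real list keeps separate.

```python
from collections import Counter

# Hypothetical lookup table for illustration only: it maps a few
# inflected forms to their headword. A real analysis needs a far larger table
# or a dedicated tool.
LEMMA_OF = {"helps": "help", "helped": "help", "helping": "help"}


def count_lemmas(tokens):
    """Count tokens after folding known inflected forms into their headword."""
    counts = Counter()
    for token in tokens:
        word = token.lower()
        counts[LEMMA_OF.get(word, word)] += 1
    return counts


sample = "She helps him and he helped her so helping is a help".split()
lemma_counts = count_lemmas(sample)

print(lemma_counts["help"])
```

In this sample, "helps," "helped," "helping," and "help" all add to the same total.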

Numbers, names, and items that would usually be capitalized are not included. Only simple (non-hyphenated) words were used.

All spelling is British. The differences between British and US spelling are minor. Only sixty words, fewer than 1% of the LEMMA's 6,318, differ from US spelling, and all are easily recognized by students who have studied only US spelling. Most of the spelling differences involve "z" and "s," for example: British "analyse" versus US "analyze." Another minor difference is that British spelling sometimes uses a "u" where US spelling does not, for example: British "behaviour" versus US "behavior." British "colour" versus US "color" is another. A third minor difference is the "r" ending: British "litre" versus US "liter." These differences are not likely to cause anyone much trouble.

Note: At least two items reflect colloquial speech. For example, item 625, "cos," is a short form of "because," which might also be spelled "’cos." In the US this word would be spelled "cuz," "’cuz," or similarly. Students who have not studied British English may not know that "telly" is a short form of "television" that is widely used in the UK but not yet in the US.

Note: Creating word frequency lists such as the LEMMA is not an exact science. It could be argued that other lists will produce slightly different results. That is almost certainly true, but the key word is "slightly." When you analyze something as large as the BNC, which has 100 million words, you can be quite sure that the words that come in the first 6,000 for frequency of usage are VERY commonly used words indeed.