Have you seen this list of the longest words on dictionary.com? Pretty fun, huh!
Hah, yeah I tried to memorise a similar list when I was a kid.
What’s the deal with that first one, “Methionylthreonylthreonylglutaminylarginyl…”; why do people say scientific words are not real words?
Because if you let us scientists run free we’ll just try to game the system and “win” the biggest word.
Lol, how did you get to be so jaded about your own colleagues?
True story: I once read the following advice in the instructions for a grant application (still available online), which had strictly enforced word limits:
Below is a set of examples on how to maximise word count while referencing outcomes:
• Our lead compound kills breast cancer cells in an in vitro model – 12 words
• Our lead reduces MCF7 cell viability by 98% in vitro – 10 words
• Lead activity-in vitro (MCF7-Growth Inhibition): Emax-98%, EC50-10μM, 8-point titration – 9 words!
There you go - official Dept of Health advice was to fool their own word count software by replacing spaces with hyphens. Microsoft Word counts hyphenated phrases like “activity-in” as single words.
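A minimal sketch of why the trick works, assuming the word counter simply splits on whitespace (an approximation of how a word processor counts, not Microsoft Word's actual algorithm). The function name `count_words` is made up for illustration.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens; hyphenated runs stay as one token."""
    return len(text.split())


# The three example phrasings from the grant instructions quoted above.
examples = [
    "Our lead compound kills breast cancer cells in an in vitro model",
    "Our lead reduces MCF7 cell viability by 98% in vitro",
    "Lead activity-in vitro (MCF7-Growth Inhibition): Emax-98%, EC50-10μM, 8-point titration",
]

for sentence in examples:
    print(count_words(sentence), "words:", sentence)
```

Under that assumption the counts come out as 12, 10 and 9, matching the figures in the quoted advice: every hyphen that replaces a space removes one "word" without removing any information.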
So how do the words on that dictionary.com list game the system?
OK, I’m going to refer to it as The List from now on. First word on The List: Methionylthreonylthreonylglutaminylarginyl…
The chemical name for the protein titin, which spans over 189 thousand letters, is often argued to be the longest word in the world. Its absurd length is due to the fact that proteins are named by combining the names of all of the individual amino acids used to form them. In the case of titin, its scientific name takes over three and a half hours to say out loud. Still, many people refuse to accept it as the world’s longest word because it is a scientific name.
The rules for making up scientific names for chemicals are set by an organisation called IUPAC. That chemical name above follows the rules and is therefore technically correct, but no scientist would actually use it - it’s almost unreadable.
What would they actually use instead?
Lol “titin” of course. Seriously though, two options: for simply storing the name in a database, each of the amino acids is abbreviated to a single letter. “Methionyl” becomes “M”, “threonyl” becomes “T”, so that ungodly 42-letter fragment above shrinks down[1] to just five letters: MTTQR…
Much more readable by machines, and economical on storage space too.
Second option: expanding the amino acid abbreviations to three letters makes it a bit easier for humans to read, like this: Met-Thr-Thr-Gln-Arg-…
Stretched out like that there’s more room for annotations if you’re drawing it out (in part) on a whiteboard or something.
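Here is a small sketch of the two abbreviation schemes applied to the five-residue fragment quoted from The List. The lookup table only covers the residues in that fragment, and the variable names are invented for illustration; the codes themselves are the standard one- and three-letter amino acid symbols (arginyl maps to R and Arg).

```python
# Acyl-style residue name -> (one-letter code, three-letter code).
# Only the residues needed for the quoted fragment are listed.
RESIDUE_CODES = {
    "methionyl": ("M", "Met"),
    "threonyl": ("T", "Thr"),
    "glutaminyl": ("Q", "Gln"),
    "arginyl": ("R", "Arg"),
}

# The start of the long name, as quoted from The List.
fragment = ["methionyl", "threonyl", "threonyl", "glutaminyl", "arginyl"]

full_name = "".join(fragment)
one_letter = "".join(RESIDUE_CODES[r][0] for r in fragment)
three_letter = "-".join(RESIDUE_CODES[r][1] for r in fragment)

print(len(full_name), full_name)  # letter count of the spelled-out fragment
print(one_letter)                 # database-style form
print(three_letter)               # whiteboard-style form
```

The spelled-out fragment is 42 letters long, while the one-letter form needs only five characters for the same information.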
Hmmm…neither of those options looks much like a word anymore.
Exactly. You can take any datafile and express it in an English-like form, but if that makes it more difficult to understand then there’s not much point.
Proteins like titin are real things and have real names, but there are slightly fewer than 20,000 different proteins[2] encoded in the average human genome - if we let the names of all of these count as words then the top 20,000 entries on The List are going to be pretty boring.
Tell me something fun about The List then!
The fun part for me is the rules themselves. I like seeing the ways that people try to game the system to make bigger and bigger words; you can smush any combination of letters together into a string, but only some combinations pass as acceptable words.
I feel like you’re about to talk about that word “agglutination” that you hinted at in the title.
Yes! Few writers have the courage that James Joyce did, when he made up a bunch of 100-letter words from scratch for his novel Finnegans Wake. Most words on The List were formed by “agglutination”, or gluing existing word fragments together into a chain.
Oh, so all that stuff about proteins being chains of amino acids was an extended simile?
Thank you kindly for pointing that out. Yes, the rules for English allow you a big pile of different prefixes and suffixes, and you can just glue them onto other words to make them longer.
The whole English language is like a few different languages glued together, isn’t it?
Yeah, pretty much. We don’t mind using building blocks from other languages if they leave them lying around where we can get our hands on them. Bill Bryson used the word “trusteeship” as an example[3] to illustrate the three major contributors to modern English - “trust” was supposedly from Old Norse,[4] “-ee” from Old French, and “-ship” from Old English.
Languages other than English use agglutination though, right?
Yeah, definitely. A quick mention of German, which has some amazing compound nouns built up from shorter words; there I guess it’s technically “compounding” rather than agglutination, but it has the same effect. Words like “Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft” (80 letters) are acceptable under the rules of German,[5] although again people argue that this example was just invented to win The List.
(From Wikipedia the word translates to “Association for Subordinate Officials of the Main Maintenance Building of the Danube Steam Shipping Electrical Services”, and while the parent organisation Donaudampfschiffahrtsgesellschaft (DDSG) was a real company, there is no evidence that the suborganisation ever existed)
And of course you’re going to mention Turkish now…
Thank you for the segue. Dictionary.com also has a list of the longest words in various other languages, featuring some truly magnificent entries. However, without speaking the languages I feel like I can’t appreciate the true beauty of most of these behemoths.
Turkish, though; I understand a few of the rules there. The Turkish entry on The List also has a page (in Turkish) on Wikipedia that explains how this word was constructed, so I reckon I can talk you through it.
Don’t keep us waiting! What’s the longest word in Turkish?
Well, Dictionary.com gets it wrong for a start, being short by four letters. Cut-and-pasted from Turkish Wikipedia, the word is: “Muvaffakiyetsizleştiricileştiriveremeyebileceklerimizdenmişsinizcesine” (70 letters). To be referred to as “The Word” from now on.
I pasted that into Google Translate and it didn’t even try…
Yeah, The Word doesn’t make a lot of sense out of context. Even though the rules of Turkish don’t forbid agglutinating even further, this one is pushing the boundaries of comprehensibility even for native speakers. The Dictionary.com definition is not too bad though: “As though you are from those whom we may not be able to easily make into a maker of unsuccessful ones.”
What the… how do I use that in a sentence?
In fact, one reason why this is popularly agreed to be the longest word in Turkish, even though lengthier agglutinated abominations are possible, is that the original inventors[6] managed to use The Word in a sentence. Just barely though; it needs a full paragraph of backstory to set the scene (henceforth known as The Backstory):
Kötü amaçların güdüldüğü bir öğretmen okulundayız. Yetiştirilen öğretmenlere öğrencileri nasıl muvaffakiyetsizleştirecekleri öğretiliyor. Yani öğretmenler birer muvaffakiyetsizleştirici olarak yetiştiriliyorlar. Fakat öğretmenlerden biri muvaffakiyetsizleştirici olmayı, yani muvaffakiyetsizleştiricileştirilmeyi reddediyor, bu konuda ileri geri konuşuyor. Bütün öğretmenleri kolayca muvaffakiyetsizleştiricileştiriverebileceğini sanan okul müdürü bu duruma sinirleniyor, ve söz konusu öğretmeni makamına çağırıp ona diyor ki: "Demek muvaffakiyetsizleştiricileştiriveremeyebileceklerimizdenmişsinizcesine sözler söylüyorsunuz, öyle mi?"
OK, Google Translate does a little better with that. Still not great though.
Yes, we can do a better translation. In fact, I reckon I can lead you through the whole thing. The Backstory leads its intended audience of native speakers through four shorter versions of The Word, before dumping the full thing on them like a truckload of surplus syllables. We will have to go a bit slower than that, though.
To understand how The Word was constructed, remember that Turkish words begin with a root, onto which various suffixes are agglutinated.[7] The root in this case is “muvaffakiyet”, an archaic word meaning “success”.
Is that cheating? The modern Turkish word for “success” is “başarı”, and it’s only half as long.
“Muvaffakiyet” is still in use, enough to be in the dictionary, so it counts. Let’s start agglutinating!
The Backstory begins as follows:
Kötü amaçların güdüldüğü bir öğretmen okulundayız.
We are at a teacher’s school with bad intentions.
Like Slytherin from Harry Potter, you mean?
No, it’s a teacher’s school, where you graduate with a degree in Education. Or in this case, since they have bad intentions, you end up with a degree in Making Students Unsuccessful. In Turkish, this is:
Yetiştirilen öğretmenlere öğrencileri nasıl muvaffakiyetsizleştirecekleri öğretiliyor.
Trained teachers are taught how to make students unsuccessful.
Whoa, that’s a big jump from “muvaffakiyet” to “muvaffakiyetsizleştirecekleri”. Talk me through it?
The suffix “-siz” makes the negative, so “muvaffakiyetsiz” is the opposite of success, i.e. “failure”.
Next, if you know the word “birleşik” (“united”) you’ll appreciate what the “-leş” suffix does; “muvaffakiyetsizleş(mek)” means something like “to meet with failure”.
Then “-tir” is causative; the teachers are causing the failure. “-Ecek” makes it future tense, because the failure is not happening just yet, and sticking on the plural noun suffix “-leri” makes the whole thing into a noun: “those who…”.
“Muvaffakiyetsizleştirecekleri” is therefore the word for the graduates of this Evil Teacher’s School; “those who will cause (others) to meet with failure”. Not too bad for a native Turkish speaker to parse, but yeah, a bit of a jump for the rest of us.
Hmm, I’m still struggling with that.
The next sentence rephrases it slightly differently, which should help:
Yani öğretmenler birer muvaffakiyetsizleştirici olarak yetiştiriliyorlar.
In other words, teachers are trained to cause failure.
The suffix “-tir” is causative, and so a “-tirici” is a causative agent. Like in the word “yapıştırıcı” which means “glue; adhesive” (lit.: “make-together-causer”).[8] “Muvaffakiyetsizleştirici” is therefore just another simpler word for a graduate of the Evil School; “one who causes (others) to meet with failure”.
Alright, I get it, I think. Next sentence?
To make this simpler, let’s introduce an abbreviation. That causative agent of others’ failure, the “Muvaffakiyetsizleştirici”, will henceforth be known as a “failmaker”.
The next sentence in The Backstory is:
Fakat öğretmenlerden biri muvaffakiyetsizleştirici olmayı, yani muvaffakiyetsizleştiricileştirilmeyi reddediyor, bu konuda ileri geri konuşuyor.
However, one of the teachers refuses to be a failmaker; that is he refuses the process of being made into a failmaker, and he discusses this back and forth.
This sentence introduces the second layer of “-leştir”. We’ve moved on from students who are being made into failures, to the meta-level above; of teachers being made into failmakers. The duplication of “-leştir” gives us the verb: “Muvaffakiyetsizleştiricileştir(mek)”, which means “to become a failmaker by the hand of another”.
(In The Backstory sentence above, this new verb is further agglutinated with “-ilme”, which turns it from a verb into a noun. Like the difference between “to go to school” and “going to school”. This probably helps the explanation in Turkish, but it’s just a distraction for us working in English. Don’t worry about it too much.)
OK, final sentence. And it’s a big one.
Yes, we will have to break this into parts, or else it may break us. First part:
Bütün öğretmenleri kolayca muvaffakiyetsizleştiricileştiriverebileceğini sanan okul müdürü, […]
The principal of the school, who thinks that she can quickly and easily make all the teachers into failmakers, […]
This introduces a new suffix “-iver” which I had to look up in a textbook.[9] It expresses “The (unexpected) speed with which an action is performed or the brief span of time in which an event takes place”. This meaning is reinforced by the additional use of the word “kolayca” (“easily”).
Then, “-iver” is further extended with “-ebilecek” (“will be able to”), to give the next iteration of The Word:
“Muvaffakiyetsizleştiricileştiriverebilecek”; meaning “will be able to quickly cause (another) to become a failmaker”.
We should also cover one more syllable here, which is to insert “-emey” to negate the previous version:
“Muvaffakiyetsizleştiricileştiriveremeyebilecek”; meaning “will be unable to quickly cause (another) to become a failmaker”.
This softens the impact of the big jump in syllables that The Backstory is about to make.
Wow, you’re right; it jumps from 42 letters all the way to 70 letters!
Indeed, but at least the end is in sight, right? Continuing the sentence from where we left off:
[…] okul müdürü bu duruma sinirleniyor, ve söz konusu öğretmeni makamına çağırıp ona diyor ki: "Demek muvaffakiyetsizleştiricileştiriveremeyebileceklerimizdenmişsinizcesine sözler söylüyorsunuz, öyle mi?"
The Principal […] is growing angry at the situation, and calling the teacher in question to her office, says to him: “So you are saying things as if you are one of our ones that we will be unable to easily turn into a failmaker?”
Ok, deep breath! Firstly, we saw above how adding the plural suffix “-ler” turns it into a noun: “those who …”. Add on “-imiz” to demonstrate the Principal’s ownership (“those of ours who…”), and “-den” to make it ablative[10] (“came from”). String all those together and you get “-lerimizden…” (“coming from those of ours who …”). This brings us to: “Muvaffakiyetsizleştiricileştiriveremeyebileceklerimizden”, or “coming from those of ours that we will be unable to easily turn into a failmaker”. Got that?
I think so. It needs a subject though.
Right. The subject is the teacher, whom the Principal is addressing in the second person, so the correct pronoun is “sen” (“you”); or rather “siniz” because she’s speaking formally.
And it’s two letters longer.
Of course! The remaining few syllables are what’s known as a similative construction,[11] telling you how some action was carried out. The suffix “-cesine” corresponds to “as if” in English, and so “-mişsinizcesine” maps to “as if you were…”.
There are some subtleties with “-miş” though. It’s not just a straightforward past-tense marker, but implies that the speaker didn’t witness the action personally, instead is reporting second-hand information or drawing conclusions. This adds to “-mişsinizcesine” the extra meaning of “it seems as if you were…”
I think that’s every suffix covered now. Put it all together for me?
So the Dictionary.com definition was pretty close, as I said:
muvaffakiyetsizleştiricileştiriveremeyebileceklerimizdenmişsinizcesine
As though you are from those whom we may not be able to easily make into a maker of unsuccessful ones
If I’ve got some of the subtleties correct above, I might instead go with: “seeming as though you were from those of ours whom we may not be able to easily make into a maker of unsuccessful ones”
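To put the whole chain in one place, here is a sketch that glues the suffixes onto the root one at a time and prints the running letter count. The segmentation and the rough glosses follow the informal walkthrough above; they are my assumptions for illustration, not a rigorous morphological analysis, and `agglutinate` is a made-up helper name.

```python
ROOT = "muvaffakiyet"  # "success"

# (suffix, loose gloss) in the order they are glued on.
SUFFIXES = [
    ("siz", "without"),
    ("leş", "become"),
    ("tir", "cause"),
    ("ici", "one who does it"),
    ("leş", "become"),
    ("tir", "cause"),
    ("iver", "quickly, easily"),
    ("emey", "unable"),
    ("ebilecek", "will be able to / may"),
    ("ler", "those who"),
    ("imiz", "our"),
    ("den", "from"),
    ("miş", "apparently"),
    ("siniz", "you (formal)"),
    ("cesine", "as if"),
]


def agglutinate(root, suffixes):
    """Yield (word_so_far, gloss) after each suffix is attached."""
    word = root
    for suffix, gloss in suffixes:
        word += suffix
        yield word, gloss


steps = list(agglutinate(ROOT, SUFFIXES))
for word, gloss in steps:
    print(f"{len(word):2d}  {word}  (+ {gloss})")

the_word = steps[-1][0]
```

The final line printed is the full 70-letter Word, and the row ending in “-ebilecek” is the 46-letter negated form that sits between the 42-letter version and the finished article.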
Thanks, I think. I need to go lie down…
When you get back, check this one for me and see if it makes sense:
Bonus section: here are some suggestions (via Wikipedia) for how you might go even further.
If you set out to build the longest word, it makes sense to start with one of the longest words in Turkish dictionaries as the root.
Apparently, you don’t get many root words longer than about 20 letters, which curiously matches the longest words we regularly encounter in English (“uncharacteristically” being the most common example).
Wikipedia gives the longest words in Turkish dictionaries as:
kuyruksallayangiller (“wagtails” - a type of bird)
ademimerkeziyetçilik (“decentralisation”)
egzistansiyalizm (“existentialism”, obviously)
elektroensefalografi (“electroencephalography”; another easy one)
But for whatever reason, words agglutinated from those roots weren’t catchy enough, and they never appear on The List. For example, just directly swapping out “success” for “wagtails” gets you an 81-letter word that may follow the rules of Turkish, but you’ll struggle to get anyone to use it in conversation:
kuyruksallayangillersizleştiricileştiriveremeyebileceklerimizdenmişsinizcesinedir
seeming as though you were from those of ours whom we may not be able to easily make into a maker of ones without wagtails
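The root swap is easy to check mechanically. This sketch assumes the recipe is exactly "replace the root, keep every suffix, then append the extra “-dir” seen at the end of the word above"; the helper `swap_root` is hypothetical, and it ignores any spelling adjustments a real Turkish speaker might make.

```python
THE_WORD = "muvaffakiyetsizleştiricileştiriveremeyebileceklerimizdenmişsinizcesine"
OLD_ROOT = "muvaffakiyet"          # "success"
NEW_ROOT = "kuyruksallayangiller"  # "wagtails"


def swap_root(word, old_root, new_root, extra_suffix=""):
    """Replace the leading root and optionally glue one more suffix on the end."""
    if not word.startswith(old_root):
        raise ValueError("word does not start with the expected root")
    return new_root + word[len(old_root):] + extra_suffix


wagtail_word = swap_root(THE_WORD, OLD_ROOT, NEW_ROOT, extra_suffix="dir")
print(len(wagtail_word), wagtail_word)
```

The 8 extra letters in the root plus the 3 in “-dir” account for the growth from 70 to 81 letters.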
Play around with them and modify the suffixes, then come up with a convincing sentence to illustrate their use. Then it’s just a matter of getting it to go viral, and you could win The List. Good luck!
[1] And the “word” for titin shrinks down to 34,350 amino acid “letters”.
[2] At that link, the Human Proteome Project predicted in April 2023 that the human genome encoded 19,778 proteins, of which 18,379 (93.01%) were already discovered.
[3] I read it in his book The Mother Tongue: English and How It Got That Way.
[4] Looking this up just now on Wiktionary though, I see the etymology has changed:
“Long considered a borrowing from Old Norse traust (“confidence, help, protection”), itself from Proto-Germanic *traustą, but the root vocalism is incompatible, and now it's considered a reflex of an unattested Old English *trust, from a rare zero-grade proto-Germanic variant of the same root also attested in Middle High German getrüste (“host”).”
[5] Whoever noticed that the German spelling reform of 1996 increased the letter count from 79 to 80 must have been pretty pleased with themselves.
[6] No idea who to credit for inventing The Word, since it’s not mentioned in any of Wikipedia’s sources so far as I can see. The Word does seem to have evolved over time though, suggesting various people may have contributed at different stages.
[7] This differs from English, which has both prefixes and suffixes; Turkish uses suffixes only.
[8] That suggests “yapıştırma” as the Turkish word for “agglutination”, but I think for languages at least they might use “ekleme” (“adding”).
[9] Gerjan van Schaaik’s The Oxford Turkish Grammar, 1st Ed., (2020). Highly recommended.
[10] Apologies, I always confuse genitive and ablative. Edited 05 Aug 2023.
[11] Thanks again to van Schaaik.