The battle over the way we should speak

On the increasing usage of improper English, Joan Acocella of The New Yorker notes:

English is a melding of the languages of the many different peoples who have lived in Britain; it has also changed through commerce and conquest. English has always been a ragbag, and that encouraged further permissiveness. In the past half century or so, however, this situation has produced a serious quarrel, political as well as linguistic, with two combatant parties: the prescriptivists, who were bent on instructing us in how to write and speak; and the descriptivists, who felt that all we could legitimately do in discussing language was to say what the current practice was.

But the most curious flaw in the descriptivists’ reasoning is their failure to notice that it is now they who are doing the prescribing. By the eighties, the goal of objectivity had been replaced, at least in the universities, by the postmodern view that there is no such thing as objectivity: every statement is subjective, partial, full of biases and secret messages. And so the descriptivists, with what they regarded as their trump card—that they were being accurate—came to look naïve, and the prescriptivists, with their admission that they held a specific point of view, became the realists, the wised-up.

I guess that makes me closer to a descriptivist, since I think there’s nothing wrong with Singlish.

Longest word in the English dictionary is…

“Antidisestablishmentarianism”. It is a political position that originated in nineteenth-century Britain, in opposition to proposals to remove the Church of England’s status as the state church of England, Ireland and Wales (Source: Wikipedia).

Well, it’s the longest word that is not a scientific name. The longest word in the English dictionary, as recorded by the Guinness Book of World Records, is “pneumonoultramicroscopicsilicovolcanoconiosis” at 45 letters. It refers to a lung disease caused by inhaling very fine silica dust, which inflames the lungs. Oh, a medical term. And you know why doctors scribble? They can’t spell.

List of stop words

Stop words, sometimes known as stopwords or noise words (in the case of SQL Server), are words which are filtered out prior to, or after, processing of natural language data (text). Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept in his design. The list is controlled by human input and not automated. This is sometimes seen as a negative approach to the natural articles of speech. (Source: Wikipedia)

Here’s a list of stop words, compiled from Mark Sanderson’s Information Retrieval linguistic utilities stop words list. It has been formatted as a PHP array for easy use:

[code lang="php"]$stop_words = array("a", "about", "above", "across", "after", "afterwards", "again", "against", "all", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "amoungst", "amount", "an", "and", "another", "any", "anyhow", "anyone", "anything", "anyway", "anywhere", "are", "around", "as", "at", "back", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", "between", "beyond", "bill", "both", "bottom", "but", "by", "call", "can", "cannot", "cant", "co", "computer", "con", "could", "couldnt", "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven", "else", "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", "fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", "front", "full", "further", "get", "give", "go", "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his", "how", "however", "hundred", "i", "ie", "if", "in", "inc", "indeed", "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must", "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own", "part", "per", "perhaps", "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she", "should", "show", "side",
"since", "sincere", "six", "sixty", "so", "some", "somehow", "someone", "something", "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", "these", "they", "thick", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever", "where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet", "you", "your", "yours", "yourself", "yourselves");[/code]

And here is a list of Google stop words. I can’t recall where I got it from, but there are numerous sites with such information. Once again it is formatted as a PHP array, which you can quite easily convert to a Java array:

[code lang="php"]$google_stop_words = array("I", "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en", "for", "from", "how", "in", "is", "it", "la", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "und", "the", "www");[/code]

This is useful for filtering out common words in an English paragraph that may be deemed insignificant. This is one of the things I used to implement something like a tag discoverer based on word frequency.
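To give a rough idea of how a word-frequency tag discoverer could work, here is a minimal sketch. It is not my original implementation: the function name `discover_tags`, its parameters and the tiny `$sample_stop_words` array are all made up for illustration, and it assumes PHP 7 or later. It lowercases the text, splits it into words, skips anything in the stop list, counts what is left and returns the most frequent words.

```php
<?php
// Hypothetical helper (not from the original post): returns up to $limit of
// the most frequent words in $text that are not in $stop_words.
function discover_tags(string $text, array $stop_words, int $limit = 5): array
{
    // Flip the list into a lookup table so each check is a single isset().
    $stop_lookup = array_flip(array_map('strtolower', $stop_words));

    // Lowercase the text, then split on anything that is not a-z or an apostrophe.
    $words = preg_split("/[^a-z']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $counts = array();
    foreach ($words as $word) {
        if (isset($stop_lookup[$word])) {
            continue; // stop word, ignore it
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    // Sort by count, highest first, keeping the words as keys.
    arsort($counts);

    return array_slice(array_keys($counts), 0, $limit);
}

// Tiny illustrative stop list; in practice you would pass a full list.
$sample_stop_words = array("the", "and", "is", "in", "a", "not", "but");

$text = "The plumber and the salesman: the plumber is in a campaign ad, "
      . "and the salesman is not, but the plumber is famous.";

$tags = discover_tags($text, $sample_stop_words, 2);
print_r($tags);
```

With the sample text, "plumber" appears three times and "salesman" twice, so those two come out on top. Words with equal counts come back in no particular guaranteed order on older PHP versions, so treat ties as arbitrary.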

I want to go MAKДOHAЛД’C

Ever wonder how to pronounce Russian words like компью́тер, студе́нт and па́спорт? Actually they aren’t that hard: Gadling teaches you to read the Cyrillic alphabet in 5 minutes. I took longer.

And that’s “I want to go McDonald’s” in the title.

By the way, why is it “pronounce” but “pronunciation”?

Obama campaign introduces Al the shoe salesman

This is a brilliant ad by the Obama campaign. For those of you who aren’t familiar with what’s going on, American politics is really interesting. McCain-Palin (Republicans) brought phrases such as “hockey moms”, “Joe Six Pack” and “Joe the plumber” into American newspapers. These phrases are used to stereotype the typical American.

The thing that got me interested in politics is not the results the politicians are going to deliver. After all, living thousands of miles away from the USA, it makes little difference to me who’s elected anyway. What made me look at politics is the speeches, or more precisely, the ingenious use of the English language to reach people emotionally.

Introducing characters is just one way of doing so. As stupid as these phrases sound, people actually remember them. You can laugh at them, but as long as you talk about them (even in a negative way), you are indirectly spreading the politicians’ point.

I think of these characters as stock characters (in the theater arts sense), as they’re recycled time and again for every election. Politicians just rebrand them in little ways to make them sound new again.

The McCain-Palin campaign has numerous such characters. I’m sick of them but I still laugh at them (alone, since no one bothers about US politics in Singapore). Anyway, here’s one endorsed by the Obama campaign:

Find out your tax cut under Barack’s plan at whether you’re single or married with children.


Previously, John McCain repeatedly mentioned Joe the Plumber in his speeches, claiming he is a concerned citizen who prefers the McCain tax plans.

Just to digress

For those who know the location of my other blog, it was a tough decision whether to put this post on this blog or on that one, which is rather US-focused. In the end I figured I should put it here, since I want this blog to have more of my opinion. The other blog is visited by McCain supporters, and they blast me whenever I post a video that’s pro-Obama. That’s freedom of expression for me, I guess.

And speaking of “plumber”, Uzyn corrected me on my pronunciation. I had always been pronouncing it as “plumb-ber”. Read it wrong for many years. “Plum-er,” he corrected me.

What English sounds like to the Japanese

This is interesting. It’s a recording of a Japanese kid trying to sound like he is speaking English. He speaks in what he thinks English sounds like.

I’ve heard people trying to speak made-up Tamil before, so here’s how others think English sounds.

This is what English REALLY sounds like

Funny Chinglish – Translate server error

I think it’s great that everyone these days is making an effort to translate things into another language. Globalization’s really happening, and so are mistranslations:

Translation server error in restaurant


Erm, it’s the thought that counts, right?


English is evolving, nothing wrong with Singlish

Wired wrote something that got me tinking a bit. I’ll quote excerpts; the full article is here. I’m more interested in the Singlish portions.

How English Is Evolving Into a Language We May Not Even Understand

An estimated 300 million Chinese — roughly equivalent to the total US population — read and write English but don’t get enough quality spoken practice. The likely consequence of all this? In the future, more and more spoken English will sound increasingly like Chinese.

It’s the “1.3 billion people can’t be wrong” thing. If more Chinese speak their Chinglish, they would be the majority. We can’t say the majority of English speakers are speaking it wrongly, can we?

In Singaporean English (known as Singlish), think is pronounced “tink,” and theories is “tee-oh-rees.”

Dude, it’s Singapore English, not Singaporean English. I never heard of tee-oh-rees in Singapore anyway. Do we say that? I don’t tink so!

One noted feature of Singlish is the use of words like ah, lah, or wah at the end of a sentence to indicate a question or get a listener to agree with you. They’re each pronounced with tone – the linguistic feature that gives spoken Mandarin its musical quality – adding a specific pitch to words to alter their meaning. (If you say “xin” with an even tone, it means “heart”; with a descending tone it means “honest.”) According to linguists, such words may introduce tone into other Asian-English hybrids.

I hadn’t thought of the ah, lah, loh stuff this way leh. To me, they’re added to sound more casual and to fit in. If no one added them, no one would use them. It’s just to fit in. But our government launched a campaign against it; clearly not fitting in well enough.

And it’s possible Chinglish will be more efficient than our version, doing away with word endings and the articles a, an, and the. After all, if you can figure out “Environmental sanitation needs your conserve,” maybe conservation isn’t so necessary.

I tink we’re in some sort of transition. If the Chinese can end up standardizing English by bastardizing the current standard of English, so be it. We would see a bunch of English purists crying, but hey, we went from Old English to Middle English to our Modern English. Yeah, it took ages, but today we are experiencing an acceleration in technological advancement, globalization, etc. Maybe we’ve forgotten that language development can accelerate too.

Welcome to post-modern Asianglish.

We use the f*ckhead pattern

This made me laugh. Sorry, I couldn’t cut it down; trimming any part weakens the effect:

Not to be Confused with the Abstract F*cktory Pattern

Recently I (a Java architect) and one of our IT managers were interviewing a guy for a Java developer position. He was pretty bright, but unfortunately English was his second (or perhaps even third) language.

It was the usual interview story, asking the guy to tell us about the system that he’s currently working on. He went on to describe in some detail their Hibernate persistence layer, how they used Spring, and how their business layer worked. I was very impressed with his knowledge.

Then he said “and this is where we used the f*ckhead pattern.”


Regaining my composure, I asked what was on everyone’s mind. “The what?”

“The f*ckhead pattern” was the response, enunciated perfectly.

…prolonged silence…

After taking a few moments to think of how to phrase the question, I finally asked “…would you mind spelling that for us?”

“F-A-C-A-D-E. F*ckhead.” he replied.

I had to bite my tongue not to laugh. You couldn’t make up stuff like this. Despite the hilarity of the situation, he was actually very sharp technically, so we ended up offering him the job. (Source: DailyWTF)