Tag Archives: english

The battle over the way we should speak

On the increasing usage of improper English, Joan Acocella of The New Yorker notes:

English is a melding of the languages of the many different peoples who have lived in Britain; it has also changed through commerce and conquest. English has always been a ragbag, and that encouraged further permissiveness. In the past half century or so, however, this situation has produced a serious quarrel, political as well as linguistic, with two combatant parties: the prescriptivists, who were bent on instructing us in how to write and speak; and the descriptivists, who felt that all we could legitimately do in discussing language was to say what the current practice was.

But the most curious flaw in the descriptivists’ reasoning is their failure to notice that it is now they who are doing the prescribing. By the eighties, the goal of objectivity had been replaced, at least in the universities, by the postmodern view that there is no such thing as objectivity: every statement is subjective, partial, full of biases and secret messages. And so the descriptivists, with what they regarded as their trump card—that they were being accurate—came to look naïve, and the prescriptivists, with their admission that they held a specific point of view, became the realists, the wised-up.

Source: New Yorker

I guess that makes me closer to a descriptivist, since I think there’s nothing wrong with Singlish.

Longest word in the English dictionary is…

“Antidisestablishmentarianism”. It is a political position that originated in nineteenth-century Britain, in opposition to proposals to remove the Church of England’s status as the state church of England, Ireland and Wales (Source: Wikipedia).

Well, it’s the longest word that is not a scientific name. The longest word in the English dictionary, as recorded by the Guinness Book of World Records, is “pneumonoultramicroscopicsilicovolcanoconiosis” at 45 characters. It refers to a lung disease caused by the inhalation of very fine silica dust, which causes inflammation in the lungs. Oh, a medical term. And you know why doctors scribble? They can’t spell.

List of stop words

Stop words, sometimes known as stopwords or noise words (in the case of SQL Server), are words that are filtered out prior to, or after, processing of natural language data (text). Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept in his design. The list is controlled by human input and not automated. This is sometimes seen as a negative approach to the natural articles of speech. (Source: Wikipedia)

Here’s a list of stop words, compiled from Mark Sanderson’s Information Retrieval linguistic utilities stop words list. It has been formatted as a PHP array for easy use:

$stop_words = array("a", "about", "above", "across", "after", "afterwards", "again", "against", "all", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "amoungst", "amount", "an", "and", "another", "any", "anyhow", "anyone", "anything", "anyway", "anywhere", "are", "around", "as", "at", "back", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", "between", "beyond", "bill", "both", "bottom", "but", "by", "call", "can", "cannot", "cant", "co", "computer", "con", "could", "couldnt", "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven", "else", "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", "fifteen", "fifty", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", "front", "full", "further", "get", "give", "go", "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his", "how", "however", "hundred", "i", "ie", "if", "in", "inc", "indeed", "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must", "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own", "part", "per", "perhaps", "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she", "should", "show", "side", "since", "sincere", 
"six", "sixty", "so", "some", "somehow", "someone", "something", "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", "these", "they", "thick", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever", "where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet", "you", "your", "yours", "yourself", "yourselves");

And here is a list of Google stop words. I can’t recall where I got it from, but there are numerous sites with such information. Once again it is formatted as a PHP array, which you can quite easily convert to a Java array:

$google_stop_words = array("I", "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en", "for", "from", "how", "in", "is", "it", "la", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "und", "www");

This is useful for filtering out common words in an English paragraph that may be deemed insignificant. This is one of the things I used to implement something like a tag discoverer based on word frequency.
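To give a rough idea of how such a list gets used, here is a minimal sketch of a frequency-based tag discoverer. It is not the code I actually wrote; the function name discover_tags, the short $sample_stop_words list and the sample sentence are all made up for illustration, and it assumes plain ASCII English text with straight apostrophes.

```php
<?php
// Hypothetical helper (not from the original post): returns the most
// frequent non-stop words in $text as a word => count array.
function discover_tags(string $text, array $stop_words, int $limit = 5): array
{
    // Lowercase, drop straight apostrophes ("can't" becomes "cant",
    // matching entries such as "cant" in the list), then split on
    // anything that is not a letter.
    $clean = str_replace("'", "", strtolower($text));
    $words = preg_split('/[^a-z]+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    // Flip the stop list into a lookup table keyed by word.
    $stop = array_flip(array_map('strtolower', $stop_words));

    $counts = array();
    foreach ($words as $word) {
        if (isset($stop[$word])) {
            continue; // skip stop words
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    arsort($counts); // highest count first
    return array_slice($counts, 0, $limit, true);
}

// A deliberately short stop list, for illustration only.
$sample_stop_words = array("a", "the", "is", "of", "and", "in", "to");
$text = "The cat and the dog: the cat is in the garden of the cat. The dog is asleep.";
print_r(discover_tags($text, $sample_stop_words, 2));
```

In practice you would pass in one of the full arrays above instead of $sample_stop_words. Flipping the list into keys means each lookup is a simple isset() rather than a scan through the whole array.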

I want to go MAKДOHAЛД’C

Ever wonder how to pronounce Russian words like компью́тер, студе́нт and па́спорт? Actually they aren’t that hard: Gadling teaches you to read the Cyrillic alphabet in 5 minutes. I took longer.

And that’s “I want to go McDonald’s” in the title.

By the way, why is it “pronounce” but “pronunciation”?

Obama campaign introduces Al the shoe salesman

This is a brilliant ad by the Obama campaign. For those of you who aren’t familiar with what’s going on, American politics is really interesting. McCain-Palin (Republicans) brought phrases such as “hockey moms”, “Joe Six Pack” and “Joe the plumber” into American newspapers. These phrases are used to stereotype the typical American.

The thing that got me interested in politics is not the results the politicians are going to deliver. After all, when you live thousands of miles away from the USA, it makes little difference who’s elected anyway. What made me look at politics are the speeches, or more precisely, the ingenious use of the English language to reach people emotionally.

Introducing characters is just one way of doing so. As stupid as these phrases sound, people actually remember them. You can laugh at them, but as long as you talk about them (even in a negative way), you are indirectly spreading the politicians’ message.

I think of these characters as stock characters (in the theater arts sense), as they’re recycled time and again for every election. Politicians just rebrand them in small ways to make them sound new again.

The McCain-Palin campaign has numerous such characters. I’m sick of them but I still laugh at them (alone, since no one in Singapore bothers about US politics). Anyway, here’s one endorsed by the Obama campaign:

Obama campaign introduces Al the shoe salesman

Find out your tax cut under Barack’s plan at http://taxcut.barackobama.com whether you’re single or married with children.

Previously, John McCain repeatedly mentioned Joe the Plumber during his speeches, claiming he is a concerned citizen who prefers the McCain tax plans.

Just to digress

For those who know the location of my other blog, it was a tough decision whether to put this post on this blog or on that one, which is rather US-focused. In the end I figured I should put it here, since I want this blog to have more of my opinion. The other blog is visited by McCain supporters and they blast me even when I just post a video that’s pro-Obama. That’s freedom of expression for me, I guess.

And speaking of “plumber”, Uzyn corrected me on my pronunciation. I had always been pronouncing it as “plumb-ber”. I read it wrong for many years. “Plum-er,” he corrected me.

What English sounds like to the Japanese

This is interesting. It’s a recording of a Japanese kid trying to sound like he is speaking English. He speaks in what he thinks English sounds like.

I’ve heard people trying to speak made-up Tamil before, so here’s how others think English sounds.

This is what English REALLY sounds like