The “honest and reasonable mistake”

Regarding the incident where Howard Shaw pleaded guilty to having paid sex with an underage girl, I like how the lawyer puts it:

His lawyer Mr Harpreet Singh argued that Shaw had made an “honest and reasonable mistake.” He said that there was “no pre-meditation nor intention” as the offence was committed “unwittingly”. Mr Singh called for a non-custodial sentence as Shaw’s case “falls at or very near, the lowest end of the culpability spectrum”.

The former Singapore Environment Council executive director was among 48 men who were earlier charged with paid sex with the 17-year-old.

How words are learned

MIT researcher Deb Roy wanted to understand how his infant son learned language, so he wired up his house with video cameras to catch every moment (with exceptions) of his son’s life, then parsed 90,000 hours of home video to watch “gaaaa” slowly turn into “water.” Astonishing, data-rich research with deep implications for how we learn.

Deb Roy: The birth of a word

Longest word in the English dictionary is…

“Antidisestablishmentarianism”. It is a political position that originated in nineteenth-century Britain, in opposition to proposals to remove the Church of England’s status as the state church of Ireland and Wales (Source: Wikipedia).

Well, it’s the longest word that is not a scientific name. The longest word in the English dictionary, as recorded by the Guinness Book of World Records, is “pneumonoultramicroscopicsilicovolcanoconiosis” at 45 characters. It refers to a lung disease caused by inhaling very fine silica dust, which inflames the lungs. Oh, a medical term. And you know why doctors scribble? They can’t spell.

List of stop words

Stop words, sometimes known as stopwords or noise words (in the case of SQL Server), are words that are filtered out before or after processing natural language data (text). Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept in his design. Stop-word lists are controlled by human input and not automated, which is sometimes seen as a negative approach to the natural articles of speech. (Source: Wikipedia)

Here’s a list of stop words compiled from Mark Sanderson’s Information Retrieval linguistic utilities stop word list. I have formatted it as a PHP array for easy use:

[code lang="php"]$stop_words = array("a", "about", "above", "across", "after", "afterwards", "again", "against", "all", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "amoungst", "amount", "an", "and", "another", "any", "anyhow", "anyone", "anything", "anyway", "anywhere", "are", "around", "as", "at", "back", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", "between", "beyond", "bill", "both", "bottom", "but", "by", "call", "can", "cannot", "cant", "co", "computer", "con", "could", "couldnt", "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven", "else", "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", "fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", "front", "full", "further", "get", "give", "go", "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his", "how", "however", "hundred", "i", "ie", "if", "in", "inc", "indeed", "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must", "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own", "part", "per", "perhaps", "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she", "should", "show", "side",
"since", "sincere", "six", "sixty", "so", "some", "somehow", "someone", "something", "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", "these", "they", "thick", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever", "where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet", "you", "your", "yours", "yourself", "yourselves");[/code]

And here is a list of Google stop words. I can’t recall where I got it from, but numerous sites carry such lists. Once again, it is formatted as a PHP array, which you can easily convert to a Java array:

[code lang="php"]$google_stop_words = array("i", "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en", "for", "from", "how", "in", "is", "it", "la", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "und", "www");[/code]

These lists are useful for filtering out common words in an English paragraph that may be deemed insignificant. I used them to implement something like a tag discoverer based on word frequency.
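For the curious, here is a minimal sketch of how a frequency-based tag discoverer could use such a list. It is not the exact code I used: the function name discoverTags, the regular expression that splits text into words (runs of ASCII letters, optionally with inner apostrophes), and the short stop-word list in the usage example are all assumptions for illustration. It assumes PHP 7 or later (scalar type declarations and the ?? operator).

```php
<?php
// Minimal sketch of a word-frequency tag discoverer.
// discoverTags() is a hypothetical helper, not from any library.
// Assumption: a "word" is a run of ASCII letters, optionally with inner apostrophes.

function discoverTags(string $text, array $stopWords, int $limit = 5): array
{
    // Turn the stop-word list into a lookup set (words become array keys),
    // lowercasing entries so "I" and "i" are treated the same.
    $stopSet = array_flip(array_map('strtolower', $stopWords));

    // Lowercase the text so "The" and "the" count as the same word,
    // then extract the words with a simple regular expression.
    preg_match_all("/[a-z]+(?:'[a-z]+)*/", strtolower($text), $matches);

    // Count every word that is not a stop word.
    $counts = array();
    foreach ($matches[0] as $word) {
        if (isset($stopSet[$word])) {
            continue;
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    // Most frequent words first, then keep the top $limit as tag candidates.
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}

// Usage with a short, hypothetical stand-in for the full $stop_words list.
$stop_words = array("is", "the", "and", "but", "a");
$text = "Vi is hard. Vi is old, but vi is everywhere. The editor is vi, and the editor is modal.";
print_r(discoverTags($text, $stop_words, 2));
// prints an array with [vi] => 4 and [editor] => 2
```

Flipping the list with array_flip() means each lookup is a hash check with isset() instead of a scan of the whole array with in_array(), which helps when the list runs to a few hundred words like Sanderson’s.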

New Year’s almost ending

And the assignment submission date is coming. My assignments are more or less completed. Yes, they look a little like rushed work, but they’re completed. I have this crazy obsession with formatting in Word 2007, and that of course is the reason why I can’t move away from Word. Whoever tells me OpenOffice is cool apparently doesn’t work much with numbered lists and tables of contents. So please save your breath: Word 2007 rocks for me and I will not switch at this point in time.

Looking like Word does not mean it is Word.

The 10 essential vi editor commands

This is just so annoying. I had to use the vi editor and it’s hard for me. I’ve gotten used to the mouse, Backspace, Enter and all the Word shortcuts. Downgrading to vi sucks.

Ten vi editor commands

  1. To insert after the cursor – a (i inserts before the cursor; INS works too, it seems)
  2. To insert on new line below the cursor – o
  3. To replace the one character under your cursor – r
  4. Left, Down, Up, Right – h j k l (arrow keys work too, it seems)
  5. Undo – u
  6. Delete the line – dd
  7. Delete the character under the cursor – x
  8. To get out of editing (insert) mode – ESC
  9. To quit the bloody editor without saving – :q!
  10. To save and quit – :wq

This is by no means the complete list; the complete list has hundreds of commands that only a true geek who has reached nirvana can remember.

This post is more of a personal note. I ran into vi when editing with the visudo command.