How to search text in the terminal

There are times when I need to quickly search for occurrences of a certain phrase in my PHP source files, say ‘giraffes’.

This example finds the word ‘giraffes’ in all PHP source files under the current directory, recursively.

[code lang="bash"]find . -name "*.php" -exec grep -i -H -n "giraffes" {} \;[/code]

This works for Ubuntu, CentOS and Mac OS X.
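If your grep can recurse by itself, the same search can be sketched without `find`. This assumes a grep that supports `-r` and `--include` (GNU grep does, and so does the BSD grep on recent Macs, but POSIX does not promise either flag). The function name `search_php` is made up for illustration.

```shell
#!/usr/bin/env bash
# search_php PATTERN [DIR]
# Hypothetical helper: case-insensitive, recursive search limited to *.php files.
# Assumes a grep that supports -r and --include (not guaranteed by POSIX).
search_php() {
  local pattern=$1
  local dir=${2:-.}
  # -r recurse into DIR, -i ignore case, -H print file name, -n print line number
  grep -r -i -H -n --include='*.php' -e "$pattern" "$dir"
}
```

Calling `search_php giraffes` searches the current directory; pass a second argument to search somewhere else. If your grep lacks these flags, the `find` version above is the safer bet.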

6,081 visits last month, thanks!

Search engines sent 67.95% of the visitors. Aside from search engines, most of the visitors came from Twitter, Ping.sg, Facebook and Stack Overflow.

Firefox is the favorite browser with 60%. Chrome visits constitute 5%, and I presume I am responsible for at least half of those.

The top searches are XAMPP, jQuery, Django and PostgreSQL related. I code using CakePHP but no one reads my CakePHP stuff. My top tags are NetBeans, XAMPP and Sex; the combination upsets me.

The top visiting countries are the US, UK, Singapore, India and Canada. I had a visitor from Zimbabwe; I didn’t know they could afford computers. Zimbabwe, a victim of hyperinflation, announced last February the removal of 12 zeroes from its currency, i.e. 1,000,000,000,000 Zimbabwe dollars are exchanged for 1 new dollar.

Alright, before I go offtrack, kthxbai.

List of stop words

Stop words, sometimes known as stopwords or Noise Words (in the case of SQL Server), is the name given to words which are filtered out prior to, or after, processing of natural language data (text). Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept in his design. It is controlled by human input and not automated. This is sometimes seen as a negative approach to the natural articles of speech as mentioned above. (Source: Wikipedia)

Here’s a list of stop words, compiled from Mark Sanderson’s Information Retrieval linguistic utilities stop words list. It has been formatted as a PHP array for easy use:

[code lang="php"]$stop_words = array("a", "about", "above", "across", "after", "afterwards", "again", "against", "all", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "amoungst", "amount", "an", "and", "another", "any", "anyhow", "anyone", "anything", "anyway", "anywhere", "are", "around", "as", "at", "back", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", "between", "beyond", "bill", "both", "bottom", "but", "by", "call", "can", "cannot", "cant", "co", "computer", "con", "could", "couldnt", "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven", "else", "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", "fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", "front", "full", "further", "get", "give", "go", "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his", "how", "however", "hundred", "i", "ie", "if", "in", "inc", "indeed", "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must", "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own", "part", "per", "perhaps", "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she", "should", "show", "side", "since", "sincere", "six", "sixty", "so", "some", "somehow", "someone", "something", "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", "these", "they", "thick", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever", "where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet", "you", "your", "yours", "yourself", "yourselves");[/code]

And here is a list of Google stop words. I can’t recall where I got this from, but there are numerous sites with such information. Once again it is formatted as a PHP array, which you can quite easily convert to a Java array:

[code lang="php"]$google_stop_words = array("I", "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en", "for", "from", "how", "in", "is", "it", "la", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "und", "the", "www");[/code]

These lists are useful for filtering out common words in an English paragraph that may be deemed insignificant. I used them to implement something like a tag discoverer based on word frequency.
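To give a feel for the word-frequency idea, here is a minimal sketch in shell. It is not the tag discoverer I actually wrote, just the same technique under a few assumptions: the stop words are saved one per line in a plain text file, words are split on anything that is not a letter (so “it’s” becomes “it” and “s”), and the function name `top_words` is invented for this example.

```shell
#!/usr/bin/env bash
# top_words STOP_FILE [N]
# Hypothetical helper: reads text on stdin and prints the N most frequent
# words that are not listed (one per line) in STOP_FILE, as "word count".
top_words() {
  local stop_file=$1
  local n=${2:-5}
  tr '[:upper:]' '[:lower:]' |        # normalise case
    tr -cs '[:alpha:]' '\n' |         # split into one word per line
    grep . |                          # drop empty lines
    grep -v -x -F -f "$stop_file" |   # drop exact stop-word matches
    sort | uniq -c |                  # count occurrences of each word
    awk '{ print $2, $1 }' |          # reorder to "word count"
    sort -k2,2nr -k1,1 |              # most frequent first, ties alphabetical
    head -n "$n"
}
```

Assuming the stop words live in a file such as `stopwords.txt` (a made-up name), `top_words stopwords.txt 10 < post.txt` prints ten tag candidates for `post.txt`. Lowercasing everything first means the capital “I” in the Google list would need to be lowercased in the file as well.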

I compete with Colbie Caillat in Google for “justrealized”

I was googling my website and I just realized I did not come up top. The top result is a song by Colbie Caillat called “Realize” on YouTube. No doubt it’s a beautiful song.

(Competing with Colbie Caillat in Google.)

And here’s the song. I couldn’t watch the one right at the top of the results due to a region restriction on YouTube. Oh come on, there is next to no benefit in restricting a music video to certain regions only.

Colbie Caillat – Realize – Roxy – Hollywood CA

Colbie Marie Caillat (born May 28, 1985 in Newbury Park, California) is an American pop singer-songwriter and guitarist from Malibu, California. (Okay I took that from Wikipedia)

Google now crawls Flash

Flash developers will be pleased with this one: Google has now learnt to crawl the text within Flash files.

Google learns to crawl Flash

Google has been developing a new algorithm for indexing textual content in Flash files of all kinds, from Flash menus, buttons and banners, to self-contained Flash websites. Recently, we’ve improved the performance of this Flash indexing algorithm by integrating Adobe’s Flash Player technology.

In the past, web designers faced challenges if they chose to develop a site in Flash because the content they included was not indexable by search engines. They needed to make extra effort to ensure that their content was also presented in another way that search engines could find.

Now that we’ve launched our Flash indexing algorithm, web designers can expect improved visibility of their published Flash content, and you can expect to see better search results and snippets. There’s more info on the Webmaster Central blog about the Searchable SWF integration. (Source: Google Blog)

This should have been done a long time ago. But it’s great news for ActionScript lovers. Your text is not left out any more.

But! But!! But!!! Honestly, given a choice, I much prefer to see JavaScript if Flash is not really needed. If you just want to create some rollovers or an image gallery, consider JavaScript. For one, it just works out-of-the-box. JavaScript libraries have cool effects that you can achieve with Flash too.

AVG Free 8 installs useless crap, thanks!

I use AVG Free on my Windows XP virtual machine. I installed Windows XP 3 times in the past 2 days, so I am damn familiar with installing AVG by now. I always customize my installations, yet AVG forces me to install AVG Safe Search and there is no option to opt out. It’s unfortunate that AVG seems to be more and more reluctant to offer transparent software.

AVG Safe Search not working in Firefox 3

Yes, I use Firefox 3, but at least I should be able to uninstall a plugin that I don’t want, right? Stopping malicious links should be the browser’s job anyway.

On a side note, it is harder to find AVG Free these days compared to a year ago. Kinda reminds me of Real Player back then, when they kept putting their free player in the most obscure location. I’d rather go exploring new lands on Earth than find that bloody button.

Grisoft appears to follow suit and hide the link to AVG Free. They offer AVG Internet Security for download, but it doesn’t tell you that you have to pay until you reach the checkout page. AVG Free is still nowhere to be found on the homepage, it appears.

So where can you get AVG Free then?

http://free.grisoft.com/

“i just realized” in Google returns 6.5 million results

Well, if you are bored like me, you’ll search for “i just realized” in Google and realize that there are 6.5 million results for that.

That’s a lot of people who just realized their stuff. Sad to say, i.justrealized.com is not within the top 100 results.

Technorati picked up some blog reactions from this blog and visitors are very slowly coming in. Around 2 visitors per day who are not me. It’s a good start alright!