Source code search engines, Part 1

December 27, 2009

Update: This article is copied from my old blog host. Part 2 will be coming real soon now.

Lately, there has been an upsurge in source code search engines. My friend Scott Granneman (see his linked blog[4]) alerted me to a new one from our pals at Google Labs. Called Code Search, it features many of the familiar Google search attributes, as well as some specific to searching for source code, such as search by programming language and license, and full POSIX extended regular expression search. Prior to this new entry, I had been using Krugle.com and Koders.com. I was in the beta test program for both of those, so I am more familiar with them than with Google’s new entry. Does anyone know of any other code search engines? There is also O’Reilly’s Code Search, which I will profile in part 2. It only searches code examples in O’Reilly’s book catalog, however. Nonetheless, it is a high-quality database, I’m sure. And plain Google search is an old standby.

In this article I am just going to review the general features of the three engines. I will also present some preliminary benchmarks. In part 2 I will go into more detail with my actual experience.

Basics
Search by language, license.
View the file.
Download the file.

Koders.com
225 Million LOCs
Connects directly to source code repositories, using Subversion or CVS
30 programming languages, including hot new ones such as Ruby, Lua, Matlab.
Open Source Zeitgeist (like Google’s, it shows trends, pie charts, etc. based on search term frequencies.)
Currently, the hot topic is Ruby. Natch.
Output in RSS feeds if you want.

Krugle.com (out of Beta)
??? LOCs
Obscure languages like YACC/Bison, Flex/Lex and Haskell
Searches source code repositories, using Subversion or CVS
A full AJAX experience. When you click on the code snippet, you get the full source as well as a project tree navigation on the right showing where the file is located.
Add notes (requires registration and log in.)
Search by project. Search only in function definitions, function calls, comments, only source, class definitions, etc.
Views the file in a new tab, allowing you to compare two or more snippets of results.
Does not highlight the search term.
Powered by Lucene (A high-quality Java based search engine)

Google’s Code Search (Labs, not yet Beta)
??? LOCs
Views the file the result appears in with the result highlighted and the browser positioned at the first occurrence.
Can download the entire package the result file is in.
I like the use of Google-style URL expressions. For instance, you can use operators like lang:ruby or license:gpl or package:hibernate. You can use these to integrate the search engine into your projects, as you can with other Google products.
And you can use these operators to search multiple packages and languages.
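As a rough illustration of how those operators could be folded into a link from your own project, here is a minimal Python sketch. The base URL is the one listed in reference [3]; the "q" parameter name and the helper function are my assumptions, not something documented in this post.

```python
from urllib.parse import urlencode

# Base URL taken from reference [3]; the "q" parameter name is an assumption.
BASE_URL = "http://www.google.com/codesearch"


def build_code_search_url(term, **operators):
    """Join a search term and operator:value pairs into one query URL.

    Hypothetical helper for illustration only.
    """
    parts = [term] + [f"{name}:{value}" for name, value in operators.items()]
    return BASE_URL + "?" + urlencode({"q": " ".join(parts)})


# Search for "connection" in Ruby code, restricted to one package.
print(build_code_search_url("connection", lang="ruby", package="rails"))
```

Because the operators are just text inside the query, adding another one (say license="gpl") only means passing another keyword argument.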

Benchmarks
I have attempted 3 types of search. First a search for a method/attribute within a language specific to a project. Then a more global search for a usage of an object attribute within a language. And finally a search for files related to a topic, like spam filtering.

Searching for “connection”, language: Ruby, project: Rails
Krugle: 588 matches
Google : 299 matches
Koders: 459 matches. (Can’t search within projects, so many of these are duplicates if the project is a Rails project, and many are irrelevant to Rails.)
Via a convoluted series of steps, I was able to get to the Rails project and “Search this project.”
Koders: 59 matches.
Update! I pestered Koders and they got right back to me today. Here is their response:
“Ed,
A very good idea. So much so that we are releasing a site update in the next few days that will have Advanced Search and Project Search. Stay tuned! And thanks for using Koders.
Regards,
Mike”
Wonderful! Thanks Mike. That will make Koders even easier to use.

Searched for “window.location”, language Javascript
Google: 4000 matches
Krugle: 1944 matches
Koders: 228 matches.

Searched for “spam” (as in recognize/filter,) language: lisp
Google: 200 matches
Koders: 35 matches
Krugle: 2 matches

As you can see, no one engine is clearly ahead of the others. Google is new, so they might not have crawled everything yet. Expect it to get better. Krugle clearly leads the pack in usability, but at the expense of not being able to integrate it with external tools.* Koders provides this with downloadable search plugins for Eclipse, NetBeans, Visual Studio, the Mac Dashboard and Firefox. You can also add Koders to your site with a JavaScript library.

Overall, I like Krugle for the user interface, Google for the relevant results and Koders for some interesting stats about the project. I also like Google and, to a lesser extent, Koders for the ability to pass in search phrases through the web interface or externally.

* Update on Krugle: They have added this feature! You can now do the URL and search box things that Google uses, as well as deep links. See Chris’s comments below.

Finally, in addition to code searches, programmers should know the mailing lists and Usenet groups related to the projects they are interested in. If you subscribe to a mailing list and use something like GMail to archive it, you can use GMail’s superior search capabilities to find relevant topics and answers to your questions. And if you can’t find them, you can ask the question yourself. But if you haven’t (yet) subscribed, you can use the new Google Groups[5] Beta to search for topics among many groups. Most programming-related mailing lists are (or will be) mirrored at Google Groups.

[1] Koders.com http://www.koders.com
[2] Krugle.com http://krugle.com
[3] Google Code Search http://www.google.com/codesearch
[4] Scott Granneman’s blog. All Scott, All the time: http://www.downloadsquad.com/bloggers/scott-granneman/
[5] Google Groups Beta http://groups-beta.google.com/

I have added Chris Burmester’s comment (from Krugle) inline here, as this post was copied over from Blogspot:

Ed – I appreciate your thoughtful analysis. Looking forward to more. And glad you like our interface!

Regarding external links to searches, etc., Krugle quietly released deep link functionality in our last “dot” release, a few days ago. I’ll be blogging about it in detail on Krugle’s blog sometime in the next few days, but here’s enough to get you going:

For example, using the searches you used for comparison above:

Searching for “connection”, language: Ruby, project: Rails:
http://www.krugle.com/kse/files?query=connection&lang=ruby&project=rails

Searched for “window.location”, language Javascript:

http://www.krugle.com/kse/files?query=window.location&lang=javascript

Searched for “spam” (as in recognize/filter,) language: lisp:
http://www.krugle.com/kse/files?query=spam&lang=lisp

For code search use:

http://www.krugle.com/kse/files?query=keywords&lang=LANG_TOKEN&project=project_keywords&findin=AREA_TOKEN

For valid tokens for the language and findin filters, use “view source” and look at the popup options for the lang and findin select elements. Find in is the filter that lets you search specific logical areas of the code – comments, function definitions, function calls, etc.
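The code search template above can be assembled programmatically. This is a minimal Python sketch: the base URL and the query, lang, project and findin parameter names come from the comment itself, while the helper function is hypothetical. Valid lang and findin tokens still have to be looked up as described.

```python
from urllib.parse import urlencode

# Base URL and parameter names are taken from the deep links quoted above.
KRUGLE_FILES_URL = "http://www.krugle.com/kse/files"


def krugle_code_search_url(query, lang=None, project=None, findin=None):
    """Build a Krugle code search deep link, skipping filters left unset.

    Hypothetical helper for illustration only.
    """
    params = {"query": query}
    for name, value in (("lang", lang), ("project", project), ("findin", findin)):
        if value is not None:
            params[name] = value
    return KRUGLE_FILES_URL + "?" + urlencode(params)


# Reproduces the first comparison search from the benchmarks.
print(krugle_code_search_url("connection", lang="ruby", project="rails"))
```

Leaving a filter as None simply omits it from the URL, which matches the shorter example links that carry only query and lang.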

For project search, make calls like:

http://www.krugle.com/kse/projects?query=JSUnit&lang_pr=javascript

For our technical pages web search, make calls like:

http://www.krugle.com/kse/techpages?query=keywords

All of the search deep links accept the following additional arguments:

* start={0-(maxHits-1)} – display results starting with hit number (start+1). Default 0.
* hpp={1-500} – display the specified number of hits per results page. Default 10.
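To show how start and hpp work together, here is a small Python sketch that generates one deep link per results page. The start and hpp names and the 1-500 range come from the list above; the function, and the idea that you already know the total hit count, are assumptions for illustration.

```python
from urllib.parse import urlencode


def page_urls(base_url, params, total_hits, hpp=10):
    """Return one deep link per results page.

    start is zero-based, so page n begins at start = n * hpp.
    Hypothetical helper; total_hits is assumed to be known already.
    """
    if not 1 <= hpp <= 500:
        raise ValueError("hpp must be between 1 and 500")
    urls = []
    for start in range(0, total_hits, hpp):
        page_params = dict(params, start=start, hpp=hpp)
        urls.append(base_url + "?" + urlencode(page_params))
    return urls


# 25 hits at 10 per page gives three pages, starting at 0, 10 and 20.
for url in page_urls("http://www.krugle.com/kse/files",
                     {"query": "spam", "lang": "lisp"},
                     total_hits=25, hpp=10):
    print(url)
```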

Mouse over any clickable link on our search site, and you’ll see the corresponding deep link URL you can use to link to that object or action externally. Use your standard right-click or ctrl-click contextual menu to copy the URL or open it in a new tab/window.

In doing this, you’ll also see that you can link to individual code or project files you find in your searches and link to them from the outside.

As for a public API and JavaScript mash up objects, keep an eye on our blog…

Cloudy Skies ahead. Ruby, jQuery and VPSs

November 17, 2009

It’s that time of year again. You know, when the pontificators pontificate about the upcoming year. What will Oh Ten bring us? Well, here are my predictions: Much the same as Oh Nine, but with some cool new stuff. You want examples? Ok. Everything will be in the ‘cloud’. All your apps will be mobile. Apple will release a touchscreen computer à la Star Trek. Google will make gobs of money. Microsoft will patent both kinds of bits, 1’s and 0’s, in an attempt to foil Free/Open source once and for all. Ruby will become the premier language and JavaScript will ride its coattails to glory.
Oh, and I forgot: 2010 will finally be the year of the Linux Desktop.

Well, having just scooped Tim O’Reilly and Cringely, what really is going on?
I’ve noticed a trend, and I’m sure you have noticed it too. JavaScript is really taking off. Client side applications are web-based, but increasingly run (much of) the user interface in the browser. AJAX is used as the transport layer to/from the server and HTML is really just the envelope to load the JS code and the initial state of the presentation layer. After that, jQuery uses the DOM and CSS to wow us with its magik. Apps like Google Docs, Wave, GMail and others demonstrate the feasibility of web-based client side applications.

On the server side, Ruby is used to express concise meaning to the Semantic Web. REST is used as a means to aggregate solution domains together, providing superior user experiences.

Beyond the server, the cloud is becoming the place to be. The advent of Virtual Private Servers with VM appliances built in will eliminate the need for initial setup and a lot of maintenance. Scalability of your application will no longer be in the domain of the local sysadmin. He will outsource that to the VPS company. His boss will just have to pay the bill, eventually.

We are seeing this trend develop now. Eventually, I think we will have another paradigm shift. We moved from server side apps (Mainframes), to client side apps (PC Revolution: The ’80s), to client-server apps (’90s) to web apps (’00s) to cloud based client side apps. The source code is no longer static. It moves around. The servers send it around to each other (XML/REST), and then on to the browser (JS/JSON).

We will stop thinking in terms of deploying our code on some physical layer (Floppy Disk/CD-ROM/DVD/USB Flash/Internet download) and expect that it will start in the cloud and migrate to the needed place.

Want to be on this cutting edge? Beef up your Ruby for REST services with database back ends (maybe not relational ones). Polish up your jQuery/CSS skills. Look for plugins that enhance the user experience. And quit worrying about performance. That’s Mr. PointyHair’s domain. :=)

Happy new year. Enjoy the new decade.

GiveCamp – Code wrangling weekend for Non-Profits

November 16, 2009

Well, GRGiveCamp http://grgivecamp.org/ is unfortunately over. I had a wonderful time. Although I am physically challenged, everyone there lent a hand with food and coding expertise. I met a lot of wonderful folks and we enjoyed a bunch of great food and drinks from many corporate sponsors such as Biggby Coffee, Panera, Brick House Pizza and Sandman’s BBQ. GiveCamp was started by a Microsoft employee a couple of years ago and continues to grow. There have been two in Ann Arbor, MI (U of M home, Go Wolverines!) and several in other places. One of the organizers, Chris (Woody) Woodruff, is on the national board and said that there are at least five more scheduled across the country in the next few months. I hope there is another in GR in Fall ’10. (Why do I keep saying “Oh-Ten”?)

What was my experience like? Well, in my case, I was assigned to Neighborhood Ventures, a non-profit that promotes businesses, economic development and community support in GR. Their representative, Sylvia Harris, was a wonderful woman who took the time to really define what they wanted done to their website: http://www.neighboorhoodventures.com. This is a LAMP-based site that uses PHP and jQuery. It was nicely done and hosted by the folks at Community Media Center. Our team, which was lean and mean, consisting mainly of me with a lot of help from floaters (specialists like HTML/Graphics/CSS designers) and some help from GRGiveCamp Cincinnati edition, had to move some content around, fix some bugs in Safari and add some links. It was about the right amount of work for one person with a lot of help. I managed to get it done to Sylvia’s satisfaction by the end of the weekend on Sunday, just in time to demo it to the crowd.

We had over a hundred volunteers show up to work on 23 non-profits. Quite a few were using Drupal, DotNetNuke, Joomla and other CMSs. This seems about the right solution for NPs, as they mainly have content and presentation. There is really no need for the heavy lifting of a framework like Rails or some other app. That said, I counted 2 Rails apps, one of which was developed BDD-style with Cucumber and RSpec by folks from our Ruby group in GR (http://ruby.meetup.com/46/calendar/11787410/) and the developers at Mutually Human Software (one of the corporate sponsors). Whichever tool fits, I think. The organizers did a great job of fitting the wide range of skills of the various developers to the needs of the NPs. Every project got done in time, albeit with a lot of all-nighters.

To quote Woody, we may be highly paid consultants in our day jobs, but we don’t feel that NPs should have to pay for tech work. We as citizens of the community all benefit from their services equally. Can we afford to give back? With 148 people attending in our little big town, I guess so.

We got quite a bit of media coverage, and there is some online coverage you can view:

The Rapidian http://www.therapidian.org/grgivecamp-enlists-100-volunteers-help-23-nonprofits

WZZM Channel 13 http://www.wzzm13.com/news/news_article.aspx?storyid=115722 (Sorry, no video, but it was on TV.)

I’ll update this as more coverage becomes available.