Source code search engines, Part 1

December 27, 2009

Update: This article is copied from my old blog host. Part 2 will be coming real soon now.

Lately, there has been an upsurge in source code search engines. My friend Scott Granneman (see his linked blog[4]) alerted me to a new one from our pals at Google Labs. Called Code Search[3], it features many of the familiar Google search attributes, as well as some specific to searching for source code, such as search by programming language or license and full POSIX extended regular expression search. Prior to this new entry, I had been using Koders and Krugle, and I took part in beta testing, so I am more familiar with them than with Google’s new entry. Does anyone know of any other code search engines? There is also O’Reilly’s Code Search, which I will profile in part 2. It only searches code examples in O’Reilly’s book catalog, however. Nonetheless, a high-quality database, I’m sure. And just regular Google search is an old standby.

In this article I am just going to review the general features of the three engines. I will also present some preliminary benchmarks. In part 2 I will go into more detail with my actual experience.

Koders
Search by language, license.
View the file.
Download the file.
225 million LOCs
Connects directly to source code repositories, using Subversion or CVS
30 programming languages, including hot new ones such as Ruby, Lua, Matlab.
Open Source Zeitgeist (like Google’s, it shows trends, pie charts, etc. based on search term frequencies.)
Currently, the hot topic is Ruby. Natch.
Output in RSS feeds if you want.

Krugle (out of Beta)
??? LOCs
Obscure languages like YACC/Bison, Flex/Lex and Haskell
Searches source code repositories, using Subversion or CVS
A full AJAX experience. When you click on the code snippet, you get the full source as well as a project tree navigation on the right showing where the file is located.
Add notes (requires registration and log in.)
Search by project. Search only in function definitions, function calls, comments, only source, class definitions, etc.
Views the file in a new tab, allowing you to compare two or more snippets of results.
Does not highlight the search term.
Powered by Lucene (a high-quality Java-based search engine)

Google’s Code Search (Labs, not yet Beta)
??? LOCs
Views the file the result appears in with the result highlighted and the browser positioned at the first occurrence.
Can download the entire package the result file is in.
I like the use of Google-style URL expressions. For instance, you can use operators like lang:ruby or license:gpl or package:hibernate. You can use these to integrate the search engine into your projects, as you can with other Google products.
And you can use these operators to search multiple packages and languages.
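To make the operator idea concrete, here is a minimal Python sketch of how a script might assemble such a query. Only the operator names (lang, license, package) come from the text above; the endpoint URL and the "q" parameter name are placeholders I made up for illustration, not the real service's interface.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not a real service URL.
BASE_URL = "https://codesearch.example/search"


def build_query(term, **operators):
    """Join a search term with operator:value filters such as lang:ruby."""
    parts = [term]
    for name, value in operators.items():
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def build_url(term, **operators):
    """Put the whole query into a single 'q' parameter (an assumed name)."""
    return BASE_URL + "?" + urlencode({"q": build_query(term, **operators)})


print(build_query("connection", lang="ruby", package="hibernate"))
print(build_url("connection", lang="ruby", license="gpl"))
```

Because the filters travel inside the query string itself, the same text works whether it is typed into the search box or embedded in a link.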

I have attempted 3 types of search. First a search for a method/attribute within a language specific to a project. Then a more global search for a usage of an object attribute within a language. And finally a search for files related to a topic, like spam filtering.

Searching for “connection”, language: Ruby, project: Rails
Krugle: 588 matches
Google : 299 matches
Koders: 459 matches. (Can’t search within projects, so many of these are duplicates if the project is a Rails project, and many are irrelevant to Rails.)
Via a convoluted series of steps, I was able to get to the Rails project and “Search this project.”
Koders: 59 matches.
Update! I pestered Koders and they got right back to me today. Here is their response:
A very good idea. So much so that we are releasing a site update in the next few days that will have Advanced Search and Project Search. Stay tuned! And thanks for using Koders.
Wonderful! Thanks Mike. That will make Koders even easier to use.

Searched for “window.location”, language: JavaScript
Google: 4000 matches
Krugle: 1944 matches
Koders: 228 matches.

Searched for “spam” (as in recognize/filter), language: Lisp
Google: 200 matches
Koders: 35 matches
Krugle: 2 matches

As you can see, no one engine is clearly ahead of the others. Google is new, so they might not have crawled everything yet. Expect it to get better. Krugle clearly leads the pack in usability but at the expense of not being able to integrate it with external stuff.* Koders provides this with downloadable search plugins for Eclipse, NetBeans, Visual Studio, the Mac Dashboard and Firefox. You can also add Koders to your site with a JavaScript library.

Overall, I like Krugle for the user interface, Google for the relevant results and Koders for some interesting stats about the project. I also like Google and, to a lesser extent, Koders for the ability to pass in search phrases through the web interface or externally.

* Update on Krugle: They have added this feature! You can now do the URL and search box things that Google uses, as well as deep links. See Chris’s comments below.

Finally, in addition to code searches, programmers should know the mailing lists and Usenet groups related to the projects they are interested in. If you subscribe to a mailing list and use something like GMail to archive it, you can use GMail’s superior search capabilities to find relevant topics and answers to your questions. And if you can’t find them, you can ask the question yourself. But if you haven’t (yet) subscribed, you can use the new Google Groups[5] Beta to search for topics among many groups. Most programming-related mailing lists are (or will be) mirrored at Google Groups.

[3] Google Code Search
[4] Scott Granneman’s blog. All Scott, All the time
[5] Google Groups Beta

I have added ChrisBurmester’s comment (from Krugle) inline here as this is copied over from Blogspot:

Ed – I appreciate your thoughtful analysis. Looking forward to more. And glad you like our interface!

Regarding external links to searches, etc., Krugle quietly released deep link functionality in our last “dot” release, a few days ago. I’ll be blogging about it in detail on Krugle’s blog sometime in the next few days, but here’s enough to get you going:

For example, using the searches you used for comparison above:

Searching for “connection”, language: Ruby, project: Rails:

Searched for “window.location”, language Javascript:

Searched for “spam” (as in recognize/filter,) language: lisp:

For code search use:

For valid tokens for the language and findin filters, use “view source” and look at the popup options for the lang and findin select elements. Find in is the filter that lets you search specific logical areas of the code – comments, function definitions, function calls, etc.
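If you would rather not read the page source by eye, the same lookup can be scripted. The Python sketch below collects the option values of the lang and findin select elements using the standard library's HTML parser. The sample markup and its token values are invented for illustration, and matching the selects by their name attribute is an assumption; the real page may be laid out differently.

```python
from html.parser import HTMLParser


class SelectOptions(HTMLParser):
    """Collect <option value="..."> tokens for the named <select> elements."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # name of the select we are currently inside
        self.options = {name: [] for name in wanted}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") in self.wanted:
            self.current = attrs["name"]
        elif tag == "option" and self.current and "value" in attrs:
            self.options[self.current].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self.current = None


# Made-up sample markup; the real token values must come from the live page.
SAMPLE = """
<select name="lang"><option value="ruby">Ruby</option><option value="lisp">Lisp</option></select>
<select name="findin"><option value="comment">Comments</option></select>
"""

parser = SelectOptions(["lang", "findin"])
parser.feed(SAMPLE)
print(parser.options)
```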

For project search, make calls like:

For our technical pages web search, make calls like:

All of the search deep links accept the following additional arguments:

* start={0-(maxHits-1)} – display results starting with hit number (start+1). Default 0.
* hpp={1-500} – display the specified number of hits per results page. Default 10.
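As a rough illustration of how these two arguments combine for paging, here is a small Python sketch. Only start and hpp, with their ranges and defaults, come from the description above; the base URL and the "query" parameter name are placeholders I invented, so substitute whatever the real deep links use.

```python
from urllib.parse import urlencode

# Hypothetical base URL and 'query' parameter name, for illustration only.
BASE_URL = "https://search.example/deeplink"


def deep_link(query, start=0, hpp=10):
    """Build a deep link showing hpp hits, beginning with hit number start+1."""
    if start < 0:
        raise ValueError("start must be 0 or greater")
    if not 1 <= hpp <= 500:
        raise ValueError("hpp must be between 1 and 500")
    return BASE_URL + "?" + urlencode({"query": query, "start": start, "hpp": hpp})


def page_links(query, total_hits, hpp=10):
    """Return one deep link per results page needed to cover total_hits."""
    return [deep_link(query, start=s, hpp=hpp) for s in range(0, total_hits, hpp)]


print(deep_link("window.location", start=20, hpp=50))
for link in page_links("spam", total_hits=25):
    print(link)
```

Since start is zero-based, the second page at the default page size begins at start=10, which displays hit number 11.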

Mouse over any clickable link on our search site, and you’ll see the corresponding deep link URL you can use to link to that object or action externally. Use your standard right-click or Ctrl-click contextual menu to copy the URL or open it in a new tab/window.

In doing this, you’ll also see that you can link from the outside to individual code or project files you find in your searches.

As for a public API and JavaScript mash up objects, keep an eye on our blog…