# Hackage Search: Regex-Based Online Code Search

December 15th, 2020

This post announces Hackage Search. If you’re just looking for the link, it’s here: https://hackage-search.serokell.io/

Have you ever wanted to find something on Hackage? If yes, then I’ve got just the thing for you!

Regular expressions are a convenient way to search plain text files, which is why grep is a popular tool among software developers. So much so that ‘grep’ became a verb: grep this, grep that. You know what I mean; the same thing happened to Google.

One day I needed to grep all of Hackage to see how often `forall` is used as an identifier in Haskell programs. This way, I could estimate the breakage that would arise from making `forall` a proper keyword (it is currently a pseudo-keyword with special meaning in type-level contexts only). But in order to grep, one needs a local copy of the data, and the size of Hackage is over 4.5 GB (even if we only consider the latest package versions), so downloading it could take a while.

Not to mention that there isn’t (or, rather, wasn’t, prior to this work) a tool to download all of Hackage incrementally. If a simple script is used to do it, then it’s hard to keep the local copy up to date without redownloading the same packages all over again.
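To make the incremental idea concrete, here is a minimal sketch of the planning step. It is not how the actual download tool works; the `VersionIndex` shape, the `planSync` function, and the `name-version` naming are all assumptions made for illustration. Given the versions already on disk and the latest versions listed by the index, it works out which packages need fetching and which local copies are stale.

```typescript
// Hypothetical shape: package name -> version string.
type VersionIndex = { [pkg: string]: string };

interface SyncPlan {
  toDownload: string[]; // packages that are new or whose latest version changed
  toRemove: string[];   // local packages that no longer appear in the index
}

// Own-property check, so that names like "constructor" are not confused
// with properties inherited from Object.prototype.
function hasPackage(index: VersionIndex, pkg: string): boolean {
  return Object.prototype.hasOwnProperty.call(index, pkg);
}

// Compare the local copy against the remote index and plan the minimal work.
// Packages whose version is unchanged are left alone entirely.
function planSync(local: VersionIndex, remote: VersionIndex): SyncPlan {
  const toDownload: string[] = [];
  const toRemove: string[] = [];
  for (const pkg of Object.keys(remote).sort()) {
    if (!hasPackage(local, pkg) || local[pkg] !== remote[pkg]) {
      toDownload.push(pkg + "-" + remote[pkg]);
    }
  }
  for (const pkg of Object.keys(local).sort()) {
    if (!hasPackage(remote, pkg)) {
      toRemove.push(pkg + "-" + local[pkg]);
    }
  }
  return { toDownload, toRemove };
}
```

With a plan like this, a second run right after a successful sync yields two empty lists, which is exactly the property a naive "download everything" script lacks.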

Naturally, a better download tool had to be created, but the important bit was to get somebody else’s computer to run it! And thus, Hackage Search was born: an online grep for Hackage. Just type in your query and get the results.

On the backend, it uses ripgrep (rg) – a Rust-based alternative to grep. It’s fast and can produce machine-readable JSON as output. The syntax of its regex engine, according to its documentation, “… is similar to Perl-style regular expressions, but lacks a few features like look around and backreferences. In exchange, all searches execute in linear time with respect to the size of the regular expression and search text.”
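To give a feel for that machine-readable output, here is a small sketch that reads the JSON Lines stream produced by `rg --json` and keeps only the `match` messages. The real backend does this in Haskell; the `RgHit` type and the `parseRgJson` function below are hypothetical names of my own, and the sketch simply skips data that rg reports as raw bytes rather than text.

```typescript
// One search hit extracted from ripgrep's JSON Lines output.
interface RgHit {
  path: string;
  lineNumber: number;
  line: string;
  // Byte offsets of each submatch within the matched line, as reported by rg.
  spans: { start: number; end: number }[];
}

function parseRgJson(output: string): RgHit[] {
  const hits: RgHit[] = [];
  for (const raw of output.split("\n")) {
    if (raw.trim() === "") continue;
    const msg = JSON.parse(raw);
    // Other message types (begin, end, context, summary) are ignored here.
    if (msg.type !== "match") continue;
    const data = msg.data;
    // For non-UTF-8 content rg emits { bytes: ... } instead of { text: ... };
    // this sketch skips such entries.
    if (typeof data.path.text !== "string" || typeof data.lines.text !== "string") {
      continue;
    }
    const subs: { start: number; end: number }[] = data.submatches;
    hits.push({
      path: data.path.text,
      lineNumber: data.line_number,
      line: data.lines.text.replace(/\n$/, ""), // drop the trailing newline
      spans: subs.map(s => ({ start: s.start, end: s.end })),
    });
  }
  return hits;
}
```

Each line of the stream is an independent JSON document, so the results can be processed as they arrive instead of waiting for the whole search to finish.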

After some additional processing on the backend, the results are grouped by package and streamed at the `rg/` endpoint. The frontend loads them one by one and renders them in a human-readable manner, highlighting the matched substring.
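The grouping and highlighting described above can be sketched roughly as follows. This is an illustration rather than the service’s actual code: the `LineHit` and `PackageGroup` types, the assumption that a file path starts with a `name-version` directory, and the choice of a `<mark>` tag are all mine. Note also that the sketch treats match offsets as string indices, which agrees with rg’s byte offsets only for ASCII lines.

```typescript
interface LineHit {
  path: string;       // assumed layout: "<package>-<version>/<file path>"
  lineNumber: number;
  line: string;
  start: number;      // match start, as an index into `line`
  end: number;        // match end (exclusive)
}

interface PackageGroup {
  pkg: string;
  hits: LineHit[];
}

// The package is taken to be the first path segment (an assumption).
function packageOf(path: string): string {
  const slash = path.indexOf("/");
  return slash === -1 ? path : path.slice(0, slash);
}

// Group hits by package, keeping the order in which packages first appear.
function groupByPackage(hits: LineHit[]): PackageGroup[] {
  const groups: PackageGroup[] = [];
  for (const hit of hits) {
    const pkg = packageOf(hit.path);
    let target: PackageGroup | undefined = undefined;
    for (const g of groups) {
      if (g.pkg === pkg) { target = g; break; }
    }
    if (target === undefined) {
      target = { pkg: pkg, hits: [] };
      groups.push(target);
    }
    target.hits.push(hit);
  }
  return groups;
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render one line as HTML with the matched substring wrapped in <mark>.
function highlight(hit: LineHit): string {
  return escapeHtml(hit.line.slice(0, hit.start))
    + "<mark>" + escapeHtml(hit.line.slice(hit.start, hit.end)) + "</mark>"
    + escapeHtml(hit.line.slice(hit.end));
}
```

Escaping each piece separately matters here, since Haskell source is full of `<`, `>` and `&` in operators.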

Go ahead and give it a try! Here’s a query to look for the promoted list constructor: `\s':\s`. The interesting thing about it is that this quotation mark is the DataKinds syntax for namespace disambiguation, and it’s always redundant in the case of `:`. And yet people like to write it anyway, as evidenced by the results of this query!
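If you’d like to see what that query does before running it against all of Hackage, here is a tiny sketch that applies the same pattern to a few made-up lines. It uses a JavaScript regular expression rather than ripgrep’s engine; for a pattern this simple I’d expect the two to agree, but treat that as an assumption. The sample lines and the `matchingLines` helper are hypothetical.

```typescript
// The query from the text: whitespace, a tick, a colon, whitespace.
const promotedCons = /\s':\s/;

// Hypothetical sample lines of Haskell source.
const sampleLines: string[] = [
  "type Xs = Int ': Bool ': '[]", // ticked type-level cons: matches
  "type Ys = Int : Bool : '[]",   // same type without the redundant tick: no match
  "xs = 1 : 2 : []",              // ordinary term-level cons: no match
];

// Keep only the lines that contain the promoted list constructor.
function matchingLines(lines: string[]): string[] {
  return lines.filter(l => promotedCons.test(l));
}
```

Calling `matchingLines(sampleLines)` keeps only the first line, the one where the tick is written out.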

By clicking on a package name, you can expand or collapse the detailed information about the matched lines. This way you can collapse the results to see what packages are affected (for example, if you are estimating breakage, which was my use case), or you can expand the results and see the exact code involved.

Another useful feature of Hackage Search is that for any matched line, you can click on its number and see it in a broader context.

In terms of technology, here’s the resulting stack:

• Backend: Rust (rg) + Haskell
• Frontend: TypeScript
• Packaging: Nix

I also used Haskell to write the build script for the frontend.

Let me know if you find other interesting uses for this service. And, of course, it is open source, so check out the code on GitHub if you’re curious.
