Can URLs Be Too Clean?

I have long been a proponent of Clean URLs (URIs). The URL should be treated as a human-readable representation of the data resources being accessed. Many sites are getting better about this, especially with the growing popularity of applications like WordPress, Django, Ruby on Rails, etc., which make it incredibly easy to sculpt beautiful URLs.

An interesting difference in clean URL design has emerged recently. Many (perhaps most) sites have a search feature that lets a user search some data set provided by the site. The URLs used to access these searches have taken two distinct forms. The first, which I think is the more traditional, uses the query string to pass a key-value pair of search criteria. The second has some more interesting characteristics.

Form A
example.com/search?q=foo+bar

Form B
example.com/search/foo+bar

Of course there are several variations on these forms, like how Eventful’s searches don’t explicitly designate a “search” resource, but rather use the existing information in the URL to determine what is being searched for (e.g. /events?q=music). The same can be found on dictionary.com, only using the second form: /browse/meninx.

In any case, the consequences of choosing one of these forms over the other are the more interesting topic. URLs adhering to Form A are probably more ubiquitous due to their traditional place on the web, but they also arise naturally as a direct consequence of using a form element with method="get". Indeed, this is how the search feature on many sites is implemented. It is a simple and direct way to designate a search, and it is well known to web-savvy people in general.
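To make the two forms concrete, here is a minimal sketch in Python that builds each URL from the same search terms. The function names and the BASE constant are hypothetical, chosen only for illustration. Note that "+" standing for a space is part of form encoding in the query string; in a path segment, as in Form B, it is merely a convention the site has to decode itself.

```python
from urllib.parse import quote_plus, urlencode

# Placeholder host taken from the examples above.
BASE = "http://example.com"


def form_a_url(terms: str) -> str:
    """Form A: the terms travel in the query string, as a GET form would send them."""
    return f"{BASE}/search?{urlencode({'q': terms})}"


def form_b_url(terms: str) -> str:
    """Form B: the terms become a path segment under /search/."""
    return f"{BASE}/search/{quote_plus(terms)}"


print(form_a_url("foo bar"))  # http://example.com/search?q=foo+bar
print(form_b_url("foo bar"))  # http://example.com/search/foo+bar
```

A browser submitting a form with method="get" and a single field named q produces the Form A URL with no server-side help, which is part of why that form is so common.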

URLs adhering to Form B can be found on sites like Technorati, 9rules, Squidoo, and others. Because they don’t use Form A for their search URLs, they can’t rely on a simple GET form for searches. Instead, they are forced to take alternate routes. Taking Technorati as an example, we find the following HTML for their search form:

<form action="/search.php" method="post">
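How the server gets from that POST to a Form B URL is not visible in the HTML. One plausible route is a redirect, sketched below in Python. This is an assumption and not Technorati's actual code: the field name q, the function name, and the choice of a 303 status are all hypothetical.

```python
from urllib.parse import parse_qs, quote_plus


def handle_search_post(body: str) -> tuple:
    """Turn a form-encoded POST body into a redirect to a Form B URL.

    Assumes the search box is a field named "q" (hypothetical).
    Returns (status code, response headers).
    """
    fields = parse_qs(body)            # "q=foo+bar" -> {"q": ["foo bar"]}
    terms = fields.get("q", [""])[0]   # first value, or empty if the field is missing
    return 303, {"Location": f"/search/{quote_plus(terms)}"}


status, headers = handle_search_post("q=foo+bar")
print(status, headers["Location"])  # 303 /search/foo+bar
```

Under this assumption every search costs an extra round trip: the POST itself, then a GET for the clean URL the browser is redirected to.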

While using POST for the search form is a child of necessity, it’s interesting to see what the HTTP/1.1 RFC has to say about GET vs. POST. First for GET:

The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process.

So essentially the HTTP GET method is intended for use when one needs to get information identified by the URI (big surprise). As for POST:

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line.

A search request doesn’t seem much like a request that the origin server accept some enclosed entity to me, but this is pedantry. What are the real consequences of using Form B for search URLs? Well, one possibly surprising consequence is that the searches can be indexed by Google.

Try a search for css architecture structure site:squidoo.com or rock music site:technorati.com. The results are littered with search requests, each of which does an active search on the site and gives back matching results. This is in stark contrast with music san diego site:eventful.com, which gives mostly destinations for content related to music in San Diego, but no actual search requests like eventful.com/events?q=music&l=san+diego.

What conclusions should we draw from this? Maybe it’s better for search indexing of site content to use URLs of Form B for searches, or maybe that’s just gaming the system and not adding any real value to user searches. If, for instance, Google indexes a search request for /search/bunnies on a site, but the site doesn’t currently have any bunnies to offer, the user who clicks on that link will be misled and disappointed to find zero results, hurting the entire search experience and making Google’s own results less relevant. Indeed, it seems silly for Google to simply be a proxy for search queries on other websites when Google’s primary mission is to index directly useful information, not information about searches for possibly useful information.

It is still too early to tell what other effects this difference in URL construction will have, but for now I will defer to the HTTP and URI specifications, which indicate a clear intention toward Form A.

Originally published:
October 11, 2006

Archived at:
http://h3h.net/technology/can-urls-be-too-clean