Better (and prompter) brains than mine have already noted
that Google's handling of date-based queries is lame in the extreme.
I'm just the latest to slam into it.
I've got a project I'm working on which
combines the Google Search Appliance with the Vivisimo Clustering Engine
to provide search services for a high-traffic, lots-of-content internal
website. Problem is, one of the prime features users got used to
with our old search engines (first, Sovereign-Hill's amazing InQuery
and later, the crappy-but-functional Convera RetrievalWare) is the ability
to select documents based on ranges of dates. Thing is, every document
in the system has a date associated with it, and that date is not necessarily
the last date the document was physically edited in the system. Every
document has an effective date, which may or may not be the day it was
actually stuck out on the server, and which is practically never
the date that Google got around to indexing it.
In our earlier search engines (including
Domino's own built-in full-text search) you could pretty easily say, "give
me the documents where the field PublicationDate is between this and this,
or older than this, or newer than that."
It's impossible to do this with Google.
The only dates it knows are its own: the last time it indexed the document.
Needless to say, for users, who could
not give a green shit about when Google indexed the thing but who
care passionately about the official date of the documents, not being able
to select things based on a particular range of publication dates is maddening,
and to the best of my knowledge, there's nothing I can do to help them
except smile and wave.
Yes, I'm aware that the Google Search
Appliance, unlike google.com itself, actually indexes and uses META values
in the document header. But since all META values are interpreted
by Google as strings, there's no way to tell it, "the value 'date'
in the header is a value of type date/time, please treat it as such."
You can say, &requiredfields=date:01/22/2006 and you will
get documents where Google sees that the "date" string in the
header matches that string, but you can't do any date math on it.
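To make the "no date math" point concrete, here is a minimal sketch of what goes wrong when a date is only ever a string. The MM/DD/YYYY format is taken from the example above; the helper name as_date is made up for illustration and has nothing to do with the appliance itself.

```python
from datetime import datetime

def as_date(text):
    """Parse an MM/DD/YYYY string like the META value in the example above."""
    return datetime.strptime(text, "%m/%d/%Y").date()

earlier = "12/31/2005"
later = "01/22/2006"

# Compared as plain strings, the later date sorts first because "0" < "1".
string_says_later_is_smaller = later < earlier

# Parsed into real dates, the comparison is chronological.
dates_say_earlier_is_smaller = as_date(earlier) < as_date(later)

print(string_says_later_is_smaller, dates_say_earlier_is_smaller)
```

Both comparisons come out true, which is exactly the problem: an exact string match works, but any "before" or "after" test on the raw string gives answers that have nothing to do with the calendar.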
Some of you really advanced people will
probably mention the little-documented daterange directive in the
Google API. Forget it. I tried it; it doesn't work. In
fact, when I did a search for myself on the web, I got 1.9 million hits
for "The Turtle" as a phrase, but when I then used the daterange
directive to restrict it to documents indexed only on January 1, 2006,
I got 2.0 million hits!
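For anyone who wants to repeat that test: as far as is commonly reported, the daterange operator on google.com expects Julian day numbers rather than calendar dates, in the form daterange:START-END. Treat that format as an assumption to check, not as documented behavior. A small sketch for building the term, with hypothetical helper names:

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal and the Julian day number.
JDN_OFFSET = 1721425

def julian_day(d):
    """Return the Julian day number for a calendar date."""
    return d.toordinal() + JDN_OFFSET

def daterange_term(start, end):
    """Build a daterange term (assumed format: daterange:START-END)."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"

# A single day, January 1, 2006, as in the experiment above.
print(daterange_term(date(2006, 1, 1), date(2006, 1, 1)))
# prints daterange:2453737-2453737
```

Even with the term built correctly, the date it filters on would still be Google's own indexing date, not the document's publication date.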
Anyway, if anyone has any good suggestions
on how to work around this, I'm here. For now, what I can offer users
is that if they want to do any fancy date-based stuff, we'll redirect their
query to the Domino search engine and then try to pretty the results up
so they sorta look like Google output.
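A rough sketch of that routing decision, under stated assumptions: the function names and the "google"/"domino" labels are hypothetical, PublicationDate is the field named earlier, and the bracketed field comparison is the general shape of a Domino full-text query, which should be checked against the actual view and index before relying on it.

```python
from datetime import date

def domino_date_clause(field, start=None, end=None):
    """Build a date restriction in the assumed Domino full-text field syntax."""
    parts = []
    if start is not None:
        parts.append(f"[{field}] >= {start.strftime('%m/%d/%Y')}")
    if end is not None:
        parts.append(f"[{field}] <= {end.strftime('%m/%d/%Y')}")
    return " AND ".join(parts)

def route_query(terms, start=None, end=None):
    """Return (engine, query): plain queries go to Google, dated ones to Domino."""
    if start is None and end is None:
        return ("google", terms)
    clause = domino_date_clause("PublicationDate", start, end)
    return ("domino", f"({terms}) AND {clause}")

print(route_query("turtle"))
print(route_query("turtle", date(2006, 1, 1), date(2006, 1, 31)))
```

Prettying up the Domino results so they look like Google output is a separate job and is not covered here.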
Beyond that, I'll just remind them of
what my Uncle Charlie used to say:
People in hell want icewater.
1. Tim Latta 02/23/2006 02:49:44 PM
Wow, Sovereign-Hill. I think we looked at that years ago. NEVER heard of anyone else using it!