As usual, I can't get a date
02/15/2006 04:37 PM

It's been noted by better (and prompter) brains than mine that Google's handling of date-based queries is lame in the extreme.  I'm just the latest to slam into it.
I've got a project I'm working on which combines the Google Search Appliance with the Vivisimo Clustering Engine to provide search services for a high-traffic, lots-of-content internal website.  Problem is, one of the prime features users got used to with our old search engines (first, Sovereign-Hill's amazing InQuery and later, the crappy-but-functional Convera RetrievalWare) is the ability to select documents based on ranges of dates.  Thing is, every document in the system has a date associated with it, and that date is not necessarily the last date the document was physically edited in the system.  Every document has an effective date, which may or may not be the day it was actually stuck out on the server, and which is practically never the date that Google got around to indexing it.

In our earlier search engines (including Domino's own inbuilt fulltext search) you can pretty easily say, "give me the documents where the field PublicationDate is between this and this, or older than this, or newer than that."  
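To pin down what users are actually asking for, here is a minimal sketch of that kind of selection in Python. The sample documents and the `select_by_date` helper are made up for illustration; only the field name PublicationDate comes from our system, and this is not how Domino or any of the older engines implement it.

```python
from datetime import date
from typing import Optional

# Hypothetical documents; only the PublicationDate field name is real.
DOCS = [
    {"title": "Policy memo", "PublicationDate": date(2005, 11, 30)},
    {"title": "Budget notice", "PublicationDate": date(2006, 1, 22)},
    {"title": "Staff bulletin", "PublicationDate": date(2006, 2, 10)},
]


def select_by_date(docs, start: Optional[date] = None, end: Optional[date] = None):
    """Return docs whose PublicationDate lies in [start, end].

    Passing None for start or end leaves that side of the range open,
    which covers the "older than" and "newer than" cases.
    """
    return [
        d for d in docs
        if (start is None or d["PublicationDate"] >= start)
        and (end is None or d["PublicationDate"] <= end)
    ]


between = select_by_date(DOCS, date(2006, 1, 1), date(2006, 1, 31))
older = select_by_date(DOCS, end=date(2005, 12, 31))
newer = select_by_date(DOCS, start=date(2006, 2, 1))
```

The whole thing depends on the engine comparing the field as a real date rather than as text.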

It's impossible to do this with Google.   The only dates it knows are its own:  the last time it indexed the document.  

Needless to say, for users, who could not give a green shit about when Google indexed the thing but who care passionately about the official date of the documents, not being able to select things based on a particular range of publication dates is maddening, and to the best of my knowledge, there's nothing I can do to help them except smile and wave.

Yes, I'm aware that the Google Search Appliance, unlike google.com itself, actually indexes and uses META values in the document header.  But since Google interprets all META values as strings, there's no way to tell it, "the value 'date' in the header is a value of type date/time, please treat it as such."  You can say &requiredfields=date:01/22/2006 and you will get documents where the "date" string in the header matches that string exactly, but you can't do any date math on it.
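A rough Python sketch of both halves of that problem: why comparing MM/DD/YYYY values as strings goes wrong, and the brute-force workaround of spelling out one exact-match value per day. The helper names are mine, and joining the values with `|` as an OR separator inside requiredfields is an assumption to check against your appliance's documentation before relying on it.

```python
from datetime import date, datetime, timedelta


def parse_meta_date(value: str) -> date:
    """Parse a META date string in MM/DD/YYYY form."""
    return datetime.strptime(value, "%m/%d/%Y").date()


# As strings, December 2005 sorts AFTER January 2006 because '1' > '0'.
string_says_later = "12/31/2005" > "01/22/2006"
# As dates, the same comparison gives the answer users expect.
date_says_later = parse_meta_date("12/31/2005") > parse_meta_date("01/22/2006")


def exact_match_values(start: date, end: date) -> list:
    """One MM/DD/YYYY string for every day in [start, end]."""
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%m/%d/%Y") for i in range(days + 1)]


def requiredfields_param(start: date, end: date, field: str = "date") -> str:
    # ASSUMPTION: the appliance treats '|' between terms as OR.
    return "|".join(f"{field}:{v}" for v in exact_match_values(start, end))


param = requiredfields_param(date(2006, 1, 21), date(2006, 1, 23))
```

This stays manageable for a few days or weeks; a multi-year range turns into a very long parameter, so it is a stopgap at best.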

Some of you really advanced people will probably mention the little-documented daterange directive in the Google API.  Forget it.  I tried it; it doesn't work.  In fact, when I did a search for myself on the web, I got 1.9 million hits for "The Turtle" as a phrase, but when I then used the daterange directive to restrict it to documents indexed only on January 1, 2006, I got 2.0 million hits!
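If you want to repeat that experiment: as far as I know, the daterange operator on google.com wants Julian day numbers, not calendar dates. Here is a small Python sketch of the conversion. The function names are my own, and none of this changes the fact that the operator restricts on Google's dates rather than on a document's own.

```python
from datetime import date

# Difference between Python's proleptic Gregorian ordinal (date.toordinal)
# and the Julian Day Number for the same calendar day.
JDN_OFFSET = 1721425


def julian_day_number(d: date) -> int:
    """Julian Day Number for a Gregorian calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange_term(start: date, end: date) -> str:
    """Build a daterange:START-END search term from two calendar dates."""
    return f"daterange:{julian_day_number(start)}-{julian_day_number(end)}"


term = daterange_term(date(2006, 1, 1), date(2006, 1, 1))
```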

Anyway, if anyone has any good suggestions on how to work around this, I'm here.  For now, what I can offer users is that if they want to do any fancy date-based stuff, we'll redirect their query to the Domino search engine and then try to pretty the results up so they sorta look like Google output.

Beyond that, I'll just remind them of what my Uncle Charlie used to say:

People in hell want icewater.
Blabber

1. Tim Latta, 02/23/2006 02:49:44 PM


Wow, Sovereign-Hill. I think we looked at that years ago. NEVER heard of anyone else using it!



