Greg and Jeremy discuss the idea of making search a dialogue – asking the engine to listen to you as you attempt to find what you’re looking for. The software might ask, as many do now for spell checking, “Did you mean…”, then it would refine what it presents to you based on your input. Together you and the software can zero in on the perfect answer for you.
I’ve asked folks about this* in the course of my reporting on search, and always gotten the same response: It’s really hard to do. Such an approach to results works particularly well with limited and/or structured data sets (e.g. “I see you’re looking for a movie. Did you want a comedy or a drama?”) but not so hot with horizontal, unstructured data.
However, that doesn’t mean folks aren’t working on it (or that some engines, like Teoma or AllTheWeb, don’t have some solutions already, and Yahoo’s “Also Try…” is close as well). The problem is that it’s hard to make the choices presented relevant enough of the time – so that overall, the service is really, really useful, as opposed to often right, but often also wrong.
(*And I also asked how come it was that “Did you mean…” works so well for spell checking. Turns out it’s relatively easy to write an algorithm that takes note of common misspellings and maps them to properly spelled words. The same is not true, however, for concepts.)
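To make the footnote concrete, here is a minimal sketch of the kind of mapping it describes: a query that is one keystroke away from a much more popular query gets offered the popular spelling. The query log, the `min_ratio` threshold and the function names are all assumptions made for illustration; this is not how any particular engine does it.

```python
from collections import Counter

def edits1(word):
    """All strings one delete, transpose, replace or insert away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def did_you_mean(query, counts, min_ratio=10):
    """Return a far more popular spelling one edit away, or None."""
    own = counts.get(query, 0)
    candidates = [w for w in edits1(query) if w in counts and w != query]
    if not candidates:
        return None
    best = max(candidates, key=lambda w: counts[w])
    # Only suggest when the alternative is much more common than what was typed.
    return best if counts[best] >= min_ratio * max(own, 1) else None

# Hypothetical query log, invented for this example.
QUERY_LOG = ["britney"] * 50 + ["brittney"] * 2 + ["windows"] * 30
COUNTS = Counter(QUERY_LOG)

print(did_you_mean("brittney", COUNTS))  # britney
print(did_you_mean("britney", COUNTS))   # None
```

The trick works because the right answer for a misspelling is nearly always another string that many people have typed. There is no comparable "one edit away" test for a concept, which is the gap the footnote points to.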
7 thoughts on “Did you mean…”
For those interested in the technical details behind “did you mean…”-type features, you can browse the following recent (and still live) thread on one of the Lucene mailing lists:
There is even some Java code there, for those who need it.
I think it was in the interview that you held with Gary Flake that he talked about “data points” in terms of personalised search. Take the first query as the first data point: yes, it’s hard to pin down the concept from that alone (e.g. “windows”). But then give the algo a second data point (“windows” + “microsoft”) and this should be enough to give it the confidence to start the conversation. Confidence levels can also be generated by analysing the above across millions of queries via collaborative filtering.
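A rough sketch of the “second data point” idea, assuming the engine keeps a log of what terms people typed in a session and which topic they ended up on. The session data, the topic labels and the 0.8 threshold are all invented for illustration, and this simple counting stands in for real collaborative filtering.

```python
from collections import Counter

# Hypothetical session log: (terms typed during the session, topic finally chosen).
SESSIONS = [
    ({"windows"}, "software"),
    ({"windows"}, "home improvement"),
    ({"windows", "glazing"}, "home improvement"),
    ({"windows", "microsoft"}, "software"),
    ({"windows", "microsoft"}, "software"),
    ({"windows", "xp"}, "software"),
]

def confident_topic(terms, sessions, threshold=0.8):
    """Return the dominant topic for `terms` if past sessions agree strongly enough."""
    topics = Counter(topic for seen, topic in sessions if terms <= seen)
    total = sum(topics.values())
    if total == 0:
        return None
    topic, hits = topics.most_common(1)[0]
    return topic if hits / total >= threshold else None

print(confident_topic({"windows"}, SESSIONS))               # None
print(confident_topic({"windows", "microsoft"}, SESSIONS))  # software
```

With one data point the past sessions split between two senses, so the engine stays quiet; with two they all agree, and it has something worth saying.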
A great discussion topic! My full post here (http://www.buzzhit.com/2004/09/search-engines-and-active-listening.html); snippet follows:
“But I think Jeremy’s meta point is more interesting; search should use the “active listening” skills that one is taught in an Org Behavior class… my search engine should be in tune with me, continually helping me get closer to my “intent” by looking at my search stream both in real time and historically, leveraging any other knowledge (my blog topics, email, et al) that I care to feed it with, and most importantly, asking me clarifying questions (just like “active listening”)…”
I’ve had some success in extending Lucene to process a user’s initial search results and automatically provide suggestions for query refinement, e.g. “you searched for ‘party’ – did you mean ‘children’s party’ or ‘labour party’ or ‘stag party’?”
This approach is interesting in theory but raises some questions when used in a large scale commercial environment:
1) Is the additional processing power required to do such analysis economically viable?
2) Such interfaces will greatly increase the average number of words/phrases used in searches – this will complicate the issue of matching adverts to searches and upset the current PPC models. “Exact matching” of ads may become an extreme rarity when the average number of search terms becomes much larger than 2 or 3.
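For readers wondering what “processing the initial results” might involve, here is a minimal sketch in plain Python. It is not the Lucene extension described above and uses no Lucene APIs; the result texts, the stopword list and the function name are assumptions. It counts which word most often sits directly in front of the query term across the first batch of results and offers those phrases as refinements.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on"}

def suggest_refinements(query, documents, top_n=3):
    """Rank two-word phrases ending in `query` by how many results contain them."""
    phrase_counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        # Use a set so each result votes at most once per phrase.
        phrases = {
            f"{prev} {word}"
            for prev, word in zip(words, words[1:])
            if word == query and prev not in STOPWORDS
        }
        phrase_counts.update(phrases)
    return [phrase for phrase, _ in phrase_counts.most_common(top_n)]

# Hypothetical first page of results for the query "party".
RESULTS = [
    "Labour party conference opens in Brighton",
    "Planning a stag party on a budget",
    "The Labour party manifesto in full",
    "Stag party ideas and venues",
    "Labour party leadership contest",
    "Ideas for a children's party at home",
]

print(suggest_refinements("party", RESULTS))
# ['labour party', 'stag party', "children's party"]
```

Even this toy version does a pass over every result for every query, which gives a feel for the processing-cost question in point 1.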
Tony, I agree, but what happens if you switch search provider? How long until the intelligence is acquired again? Or what if your online apps are provided by numerous providers? Who is going to collate the intelligence? I could not envisage using the same company for the next 3 years, let alone 30.
The key question, however, is whether you are comfortable with an organisation knowing that much information about you. I’ll leave the definition of “organisation” to your own imagination.
ID:entity – As is typical of your comments here on John’s blog, your reply is especially insightful, as it points out a couple of the bigger problems we’re seeing in several aspects of “digital life”.
Let me use “social networking” as an example. I co-founded a company named WishClick “back in the day” and built a social networking component into it, based on sixdegrees.com; WishClick was a networked gift registry (acting as both a destination site and ASP for online retailers), and as a registry is viral and personal by nature, it seemed like a great fit.
Well, things didn’t exactly go our way, but as you well know, the last few years have seen all sorts of social apps (Friendster, LinkedIn, et al). One of the popular memes (and I think John’s commented on this) is: Is this a feature or a company? Why? Well, at least in part, because IMHO, social networking should be a platform, with a variety of applications (dating, business networking, etc) built on top of it.
So issue one: should the knowledge that allows for search refinement be siloed (which providers would love from a “stickiness” perspective), or should it be a platform that disparate apps feed and pull from? (My vote is for the latter.)
Issue two? You nailed that too. Privacy. Exactly how much should any one organization know about me, or, to the other side of issue one, how can I control which organizations have access to my central profile/repository?
What I do know is that consumers are still too uncomfortable (and rightly so) to buy into this en masse; MSFT Passport/Hailstorm’s utter failure is a great example of the disaster that can occur when you fail consumers on both issues 1 & 2 (and fail to secure the trust of partners for your platform play).
Conceptually though… I do think that search algos must be fed with more context on the searcher if they are to approach John’s “perfect search” world; more processing on the data (concept clustering, etc) is valuable, but ultimately insufficient.
Provocative enough for a response? I hope so; I’m enjoying the thread! 🙂
Yes, it has to be said that the more information is known about a user, the more specific the targeting can be. However, I think the R&D depts are just being really careful to ensure that they actually offer a tool to the mass market that’s generally perceived as more useful than the MSFT Office Assistant.
In terms of Issue 1