A while back I had the opportunity to speak with the folks behind Nexidia, a company that takes a unique approach to the audio (and, by extension, video) search problem. Gary has briefly looked at Nexidia in the past, but this was my first chance to dig in and see what they have to offer. In short, it’s pretty cool, and the implications are significant should the company scale and get access to large datasets (i.e., become a consumer property or power one).
I spoke with Nexidia’s SVP of Media, Drew Lanham. He told me Nexidia is already profitable, due in large part to its call center business. For that segment of the market, Nexidia provides audio mining technology that lets companies, for example, identify patterns in customer contacts and design better customer interactions (are you listening, Dell?).
Most of what I’ve seen in audio and video search relies on either text (i.e., closed captions) or tagging and metadata. So how does Nexidia work? The company’s technology reduces speech to phonemes, the smallest units of sound that distinguish one word from another, and uses those units in much the same way that a text engine uses words. The approach itself is not novel, but Nexidia has apparently found a way to make it not only work but also scale, which is critical to the problem at hand. From Drew’s follow-up notes to me:
“For example, if you assumed daily additions of 10,000 hours, a taxonomy of 10,000 words, and 50 dual processor boxes, it would take about 8.7 hours to index (produce XML for location of word, file name, quality of phonetic score, frequency of word, language, etc. to be combined with other relevant metadata). I find the 10K hours relevant because if you assume CNN broadcasts 16 hours of content per day, then it would be cheap to index all audio and video created across 600+ radio and television stations (a rough guess of all the spoken word content on a daily basis created in North America). As you know, 50 boxes is trivial.”
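Drew’s figures hang together if you run the numbers: 600 stations at 16 hours a day is roughly the 10,000 hours he budgets for. Here is a back-of-the-envelope check in Python. The inputs are the quoted figures; the assumptions that every station airs 16 hours a day and that the work splits evenly across both processors in each box are mine, not Nexidia’s.

```python
# Back-of-the-envelope check of the figures quoted above. Only the inputs come
# from the quote; the derived rates are simple division, not Nexidia specs.
STATIONS = 600            # "600+ radio and television stations"
HOURS_PER_STATION = 16    # CNN-style broadcast day, assumed for every station
DAILY_HOURS = 10_000      # assumed daily additions of audio/video
BOXES = 50                # dual-processor machines
CPUS_PER_BOX = 2
INDEX_WALL_HOURS = 8.7    # quoted time to index one day's additions

spoken_hours = STATIONS * HOURS_PER_STATION   # 9,600 hours/day, close to 10K
cpus = BOXES * CPUS_PER_BOX                   # 100 processors
hours_per_cpu = DAILY_HOURS / cpus            # 100 hours of audio per processor
speedup = hours_per_cpu / INDEX_WALL_HOURS    # roughly 11.5x faster than real time

print(f"{spoken_hours} h/day of broadcast audio vs {DAILY_HOURS} h budgeted")
print(f"each CPU indexes {hours_per_cpu:.0f} h of audio in {INDEX_WALL_HOURS} h "
      f"(~{speedup:.1f}x real time)")
```

In other words, under those assumptions each processor would need to index audio about eleven times faster than it plays, against the full 10,000-word taxonomy.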
Google showed us that when you push scale to a new level, all sorts of previously unimagined applications emerge. Nexidia is already used in call center applications, as I mentioned, and also counts the “homeland security” industry as a client. But what gets me excited is the potential in media search, which is also Drew’s focus. Nexidia turns any search query (a text input) into a phonetic code, which is then matched against a database of audio and video files. The potential here is large: coupled with a smart query UI, this suggests a new approach to finding relevant data inside non-textual corpora. Imagine searching all podcasts for a mention of “Google China,” for example, or all newscasts for coverage of “Iraq War Oil.” Should audio and video search become this easy, new advertising models open up, as do commerce opportunities (show me every movie where “rosebud” is spoken…). And don’t get me started on what might happen if you mix Nexidia with Skype….
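The query side described above (text in, phonetic code out, matched against indexed audio) can be illustrated with a toy. This Python sketch is not Nexidia’s method. It assumes the audio has already been decoded into phoneme strings (the hard acoustic part), uses a hypothetical three-word lexicon and made-up file names, and swaps in plain edit distance over any stretch of the phoneme stream (approximate substring matching) for whatever phonetic scoring Nexidia actually uses.

```python
# Toy phonetic search: a sketch, not Nexidia's algorithm. It assumes the audio
# has already been decoded into phoneme sequences (the hard acoustic step).

# Hypothetical mini pronunciation lexicon (ARPAbet-style symbols, no stress).
# Real systems also need letter-to-sound rules for words missing from it.
LEXICON = {
    "google": ["G", "UW", "G", "AH", "L"],
    "china": ["CH", "AY", "N", "AH"],
    "rosebud": ["R", "OW", "Z", "B", "AH", "D"],
}

# Hypothetical "index": file name -> phonemes recognized in that file.
# podcast_017 contains "google china" with one phoneme misrecognized (AH -> AA).
INDEX = {
    "podcast_017": "W IY D IH S K AH S G UW G AH L CH AY N AA P AA L AH S IY".split(),
    "film_clip_003": "R OW Z B AH D".split(),
    "call_112": "HH EH L OW".split(),
}


def to_phonemes(text):
    """Turn a text query into one phoneme sequence (KeyError if a word is unknown)."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def best_match(query, stream):
    """Smallest edit distance between `query` and any contiguous stretch of `stream`."""
    m = len(query)
    col = list(range(m + 1))  # matching query[:i] against nothing costs i edits
    best = col[m]
    for ph in stream:
        new = [0] * (m + 1)  # new[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == ph else 1
            new[i] = min(col[i - 1] + cost,  # match or substitute
                         col[i] + 1,         # skip an extra phoneme in the stream
                         new[i - 1] + 1)     # query phoneme missing from the stream
        col = new
        best = min(best, col[m])
    return best


def search(text, index, max_distance=1):
    """Return (file, distance) pairs within max_distance, closest first."""
    query = to_phonemes(text)
    hits = [(name, best_match(query, stream)) for name, stream in index.items()]
    return sorted((h for h in hits if h[1] <= max_distance), key=lambda h: h[1])


print(search("google china", INDEX))  # [('podcast_017', 1)]
print(search("rosebud", INDEX))       # [('film_clip_003', 0)]
```

The podcast still comes back for “google china” even though one phoneme was misrecognized, which illustrates why fuzzy matching on sounds can tolerate recognition errors that would sink an exact transcript match. Lowering max_distance trades that tolerance for fewer false alarms.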
For now, Nexidia plans to work as a back-end supplier to consumer sites, but I wouldn’t be surprised if it decided to go it alone and become a consumer-facing engine that also crawls the web. When I asked Drew about that, he said only that the company wasn’t taking that option off the table. What I saw was impressive, though as faithful readers know, I am no technical expert. Regardless, this looks like one to watch in 2006….