Gary gives us a pause from the YT madness. From his coverage:
Nexidia is a multimedia search company out of Georgia that creates a searchable corpus (words and phrases) but does it unlike other products that provide transcript (every word spoken) search.
Nexidia takes words and phrases and breaks them down into phonetic sounds (phonemes) and then indexes them. …
About 40 phonemes exist in every language with about 400 in all spoken languages. … To this point, it’s been difficult to demo Nexidia technology since there haven’t been any public demo sites. Most of their business is with private companies (recording call center chats, for example) and the government.
However, as of today, we have a publicly accessible demo to take a look at with Nexidia. It comes from Channel 11 (WXIA) in Atlanta and allows you to keyword search all of their news programming (no sports) plus some exclusive web footage. Look for the search box in the middle of the page. Of course, don’t forget that this is a beta release.
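To make the phoneme-indexing idea from the quoted coverage a little more concrete, here is a minimal Python sketch. It is not Nexidia's implementation: the tiny pronunciation dictionary, the ARPAbet-style symbols, and the function names are all assumptions made for illustration, and the clip text stands in for what a real system would recognize from audio.

```python
# Toy pronunciation dictionary (hypothetical entries, ARPAbet-style symbols).
PRONUNCIATIONS = {
    "park": ["P", "AA", "R", "K"],
    "the": ["DH", "AH"],
    "car": ["K", "AA", "R"],
    "city": ["S", "IH", "T", "IY"],
    "council": ["K", "AW", "N", "S", "AH", "L"],
    "news": ["N", "UW", "Z"],
}


def to_phonemes(text):
    """Convert text to a flat phoneme list using the toy dictionary."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def build_index(clips):
    """Map each clip id to its phoneme sequence.

    A real system would derive the phonemes from audio; here the clip
    text is a stand-in for that step.
    """
    return {clip_id: to_phonemes(text) for clip_id, text in clips.items()}


def search(index, query):
    """Return (clip_id, phoneme_offset) for the first exact match in each clip."""
    q = to_phonemes(query)
    hits = []
    for clip_id, seq in index.items():
        for start in range(len(seq) - len(q) + 1):
            if seq[start:start + len(q)] == q:
                hits.append((clip_id, start))
                break
    return hits


clips = {"clip1": "city council news", "clip2": "park the car"}
index = build_index(clips)
print(search(index, "the car"))
```

The search runs over phonemes rather than words, so the query is matched as a sound sequence wherever it occurs in the clip, regardless of word boundaries.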
3 thoughts on “Nexidia: Cool Video Search Stuff”
Um, isn’t this actually a step down, rather than a step up? All the sites that index words from audio (e.g. Podzinger, speech-to-text) also go through a phoneme stage. However, these other sites use language models (neighboring statistical frequencies) to disambiguate and clarify the exact words used.
That way, you can distinguish between “I love you” and “olive you”. The language model says “I love you” is much more likely than “olive you”, and most of the time it is going to be right.
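As a rough illustration of that disambiguation step, here is a toy bigram language model in Python. The counts are made up for the example (they are not from any real corpus), and add-one smoothing is just one simple choice; real speech-to-text systems use far larger models.

```python
import math

# Hypothetical counts, invented for illustration only.
UNIGRAM_COUNTS = {"<s>": 100, "i": 60, "love": 25, "olive": 2, "you": 40}
BIGRAM_COUNTS = {
    ("<s>", "i"): 50,
    ("i", "love"): 20,
    ("love", "you"): 15,
    ("<s>", "olive"): 1,
}
VOCAB_SIZE = len(UNIGRAM_COUNTS)


def bigram_log_prob(sentence):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    words = ["<s>"] + sentence.lower().split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        pair_count = BIGRAM_COUNTS.get((prev, word), 0)
        prev_count = UNIGRAM_COUNTS.get(prev, 0)
        total += math.log((pair_count + 1) / (prev_count + VOCAB_SIZE))
    return total


def pick_transcript(candidates):
    """Choose the candidate word sequence the language model finds most likely."""
    return max(candidates, key=bigram_log_prob)


# Two word sequences that sound almost identical.
print(pick_transcript(["olive you", "I love you"]))
```

With these counts the model prefers “I love you”, because “I love” and “love you” are common word pairs while “olive you” was never seen.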
Seems to me that if you’re just going to match phonemes, and not the words, you won’t be able to match different dialects, because someone with a different dialect is going to use a different phoneme to express the same word. (Think about someone from Boston who “pahks the cah”, while someone from Texas drills for “earl wells”.) If you do phoneme matching only, “earl” won’t match “oil”, and “pahk” won’t match “park”, even though they really should.
So how exactly is “phoneme-only” matching better?
I was required to use Nexidia’s product for a video retrieval system. Yes, phoneme matching sucks. It was even worse in my case, since the data I had was highly technical (a lot of acronyms and model numbers, which may be either pronounced as words or spelled out letter by letter) and included non-native English speakers with heavy accents. Some pronunciations are also not obvious to a computer without a specialized dictionary and some idea of context; for example, SQL can be pronounced as “sequel”.
Beta release indeed. I tried their software about 2 years ago. I doubt it will seriously take off.
I’ve done some consulting work with Nexidia recently, and from what I understand, the dialect/accent question is not an issue. Nexidia uses audio samples of thousands of people to support all accents and dialects in North American English. The audio used to create these models is carefully selected based on dialect variations, as well as speaker gender and age. Results are then sorted by confidence and frequency.
For example, if someone is going to search the phrase “park the car”, they type in those words. The phonetic search technology then looks for a match among its dozens to hundreds of variations on how that phrase could be pronounced. This helps the searcher in the case of misspellings as well. If someone were looking for “Brittney Spears” or “Britney Speers” or “Britnay Spears”, they would get all the same results using phonetic search. This is a vastly different product from two years ago, and worth checking out.
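A minimal Python sketch of the variant-matching idea described above, assuming a hand-written table of pronunciation variants. The variants, their weights, the clip names, and the use of the weight as a stand-in for “confidence” are all hypothetical; this is not how Nexidia actually builds or scores its models.

```python
from itertools import product

# Hypothetical pronunciation variants per word, each with a made-up weight.
VARIANTS = {
    "park": [(("P", "AA", "R", "K"), 0.8), (("P", "AA", "K"), 0.2)],
    "the": [(("DH", "AH"), 1.0)],
    "car": [(("K", "AA", "R"), 0.8), (("K", "AA"), 0.2)],
}


def expand(query):
    """Yield (phoneme_tuple, weight) for every combination of word variants."""
    per_word = [VARIANTS[word] for word in query.lower().split()]
    for combo in product(*per_word):
        phonemes = tuple(p for seq, _ in combo for p in seq)
        weight = 1.0
        for _, w in combo:
            weight *= w
        yield phonemes, weight


def contains(seq, sub):
    """True if the phoneme tuple `sub` occurs contiguously in `seq`."""
    n = len(sub)
    return any(tuple(seq[i:i + n]) == sub for i in range(len(seq) - n + 1))


def search(index, query):
    """Return (clip_id, score) pairs, best score first."""
    hits = {}
    for phonemes, weight in expand(query):
        for clip_id, seq in index.items():
            if contains(seq, phonemes):
                hits[clip_id] = max(hits.get(clip_id, 0.0), weight)
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Phoneme sequences standing in for recognized audio from two speakers.
index = {
    "boston_clip": ["P", "AA", "K", "DH", "AH", "K", "AA"],
    "texas_clip": ["P", "AA", "R", "K", "DH", "AH", "K", "AA", "R"],
    "other_clip": ["N", "UW", "Z"],
}
for clip_id, score in search(index, "park the car"):
    print(clip_id, round(score, 2))
```

Both the r-ful and r-less recordings come back for the same typed query, ranked by the weight of the variant that matched; the unrelated clip is not returned.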