Alexa (Make that Amazon) Looks to Change the Game

(Update: Alexa platform is now live)

Every so often an idea comes along that has the potential to change the game. When it does, you find yourself saying – “Sheesh, of course that was going to happen. Why didn’t I predict it?” Well, I didn’t predict this happening, but here it is, happening anyway.

In short, Alexa, an Amazon-owned search company started by Bruce Gilliat and Brewster Kahle (and the spider that fuels the Internet Archive), is going to offer its index up to anyone who wants it. Alexa has about 5 billion documents in its index – about 100 terabytes of data. It’s best known for its toolbar-based traffic and site stats, which are much debated and, regardless, much used across the web.

OK, step back, and think about that. Anyone can use Alexa’s index, to build anything. But wait, there’s more. Much more.

Anyone can also use Alexa’s servers and processing power to mine its index to discover things – perhaps, to outsource the crawl needed to create a vertical search engine, for example. Or maybe to build new kinds of search engines entirely, or …well, whatever creative folks can dream up. And then, anyone can run that new service on Alexa’s (er…Amazon’s) platform, should they wish.

It’s all done via web services. It’s all integrated with Amazon’s fabled web services platform. And there are no licensing fees. Just “consumption fees” which, at my first glance, seem pretty reasonable. (“Consumption” meaning consuming processor cycles, or storage, or bandwidth.)

The fees? One dollar per CPU hour consumed. $1 per gig of storage used. $1 per 50 gigs of data processed. $1 per gig of data uploaded (if you are putting your new service up on their platform).
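
As a rough illustration of how those four rates add up, here is a minimal Python sketch. The function name, the parameter names, and the convention that 100 terabytes equals 100,000 gigabytes are assumptions made for the example, not anything Alexa publishes. Billing details that are not stated above (for instance, whether storage is charged per month) are ignored.

```python
# Rates quoted in the post, in US dollars.
USD_PER_CPU_HOUR = 1.0
USD_PER_GB_STORED = 1.0
GB_PROCESSED_PER_USD = 50.0  # $1 buys 50 GB of data processed
USD_PER_GB_UPLOADED = 1.0


def estimate_cost(cpu_hours=0.0, gb_stored=0.0, gb_processed=0.0, gb_uploaded=0.0):
    """Return the total consumption fee in dollars for a hypothetical job."""
    return (
        cpu_hours * USD_PER_CPU_HOUR
        + gb_stored * USD_PER_GB_STORED
        + gb_processed / GB_PROCESSED_PER_USD
        + gb_uploaded * USD_PER_GB_UPLOADED
    )


# One pass over the whole index: about 100 TB, taken here as 100,000 GB.
full_pass = estimate_cost(gb_processed=100_000)
print(f"${full_pass:,.2f}")  # prints "$2,000.00"
```

Under these assumptions, a single pass over the full index costs $2,000 in data-processing fees alone, before any CPU hours, storage, or uploads are counted.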

In other words, Alexa and Amazon are turning the index inside out, and offering it as a web service that anyone can mash up to their heart’s content. Entrepreneurs can use Alexa’s crawl, Alexa’s processors, Alexa’s server farm… the whole nine yards.

Does this change the game? Because I was embargoed and could not really talk to anyone about this, I have not had a chance to talk to folks who are smarter than me about this. So my analysis is limited to my imagination. And that itself is limited by the pricing structure – I do not know if using this service will be cheaper for developers and entrepreneurs than rolling their own. But I can only imagine that indeed it is, or Amazon would not be doing this.

So what has been a jealously guarded secret – the contents of the entire index – is now available to anyone who wants it (of course, this assumes Alexa’s index is comparable to the big guys – honestly, I have no idea). The costs are modest – a few thousand bucks to process the entire web, Gilliat told me. How might that change the game? You guys are smarter than me – what do you think?

I am quite sure this means that Yahoo and Google will have to stare hard at their own (somewhat limited) search services and APIs, and think about what they might do to compete. And if this starts to gain traction, all of a sudden, Amazon is a major search player, right next to Yahoo, Google, MSN, and IAC. A9+Alexa+web services= hmmmm….

Again, what do you think? Will this be like A9, a groundbreaking development that fails to get traction with a wider audience? Or might this just start something?

Wired News (not up yet) and the WSJ (free link) were also briefed on this news.

71 thoughts on “Alexa (Make that Amazon) Looks to Change the Game”

  1. What an absolutely BRILLIANT and INNOVATIVE idea…It would be so-o fantastic if you could get an interview with the innovators.

    This is a very logical direction, and for sure, this is the direction many others will take after this.

  2. As a developer and tech entrepreneur, I think this is very very exciting and I’ve been waiting for something like this for some time. I don’t know if Amazon/Alexa is going to be the solution when the smoke clears, but I sure hope this will serve as a wakeup call to G and Y! to get their heads out of the sand and stop putting ridiculous restrictions on their search APIs. As it is now, their APIs are restricted enough such that a developer cannot build anything serious with them because of both usage limits (1k requests per day for G and 5k for Y!) and usage terms which disallow commercial use. I am more than willing to pay and I think the pay per consumption model is spot on.

  3. Google’s strength at the end of the day is not in its index or search but its ability to infer things from what you call the “database of intentions”. Someone asked Norvig after one of his talks, given the falling price of hardware and open source software like Nutch, what would prevent anyone from indexing the entire web on their local disks. His answer: nothing. And that is Google’s Achilles’ heel. That’s why Google Print is so crucial to their success.

  4. Well, if there had ever been a walled garden and competitive advantage in the search game, the index was a very significant one. Amazon engineers are really shaking things up, even as huge underdogs in the search battle. Opening up their data is an incredible testament to the power of open networks and their ability to change industry dynamics.

    You know what would be ironic? If people started using googlebase to hold their Alexa data in order to cut costs further.

  5. This is actually just a natural progression of what they’ve been doing with the web services. Last year, Amazon introduced a “simple queuing service” as a beta that lets developers write queue-based, loosely-coupled applications that take advantage of Amazon’s horsepower for storing and delivering the messages. This is just the expansion of making their spare horsepower available to anyone who’s willing to pay for it. In some ways, Alexa may be the true start of the promise of grid computing.

  6. Anyone here thought about the “database of intentions” privacy concern?

    Now we don’t need an in-house psycho engineer to query our private information, but the whole world can do it now (ok, access and info is very limited, but if G/Y adopt the same philosophy…)

  7. Kudos to Amazon for introducing something game-changing. Now what if they coupled this with their own Adsense program which gave 99% of the proceeds to publishers and made that API-enabled? Talk about commoditization of search…

  8. Having such services available is one thing, but in the long run what matters is not whether the power is there for me to use, but how easy it is to use.

    Everyone was able to publish like they do today back then when we had no ‘blogs’. But they did not (at least not at this rate) because it was too complicated to do a webpage.

    Now with blogs it is just click and do. A service like the one you describe, which is only available to developers who know how to program such a system, is only half the game.

    _If_ I am a decent programmer, I could use spider technology myself and would not need to use such a system. If I am a crappy programmer, at least it will be very expensive to build crappy code.

    But if I am no programmer at all and I want to use this system, I would not only have to pay Amazon but also a programmer to fulfill the ideas I might have.

    While this is an interesting step, it is only the beginning. And paying for consumption is a very good idea too – everybody should be forced to pay a cent or whatever for every feed request – that would bring down those stupid “i request your feed every minute” applications. :o)

  9. While I think it’s a good development, it would be even better if they offered tools that would allow technically-oriented people, not just programmers, to be able to utilize these APIs and services.

  10. This really isn’t that original as some might expect. I call this WebFountain 2.0.

    The IBM WebFountain project crawled the whole web and allowed IBM clients to run processes against it to mine data. The catch there was that you had to have IBM write the program — you just provided a question that you wanted answered.

    Alexa’s innovation is that they have created an API around the content that allows anyone to create and run code against the store. Kind of scary when you think about it. You’d better be sure that code is pretty solid.

    The Alexa platform is the next evolutionary step in the WebFountain business model. The other major innovation is that they allow clients to use their private cluster not only to create meta-content, but also to publish it for end user consumption in the form of a custom search engine.

    More on my blog post.

  11. Amazon is paying a lot of attention to the search industry. First, A9…now this! I can see Google going this direction, keeping its limited free version (because they like to try to keep things free for most) and then implementing a paid version with more capabilities, similar to the Alexa pricing model. It will be interesting to see the growing trend.

  12. An attempt to answer Paul Kedrosky’s question as to why this is a big deal – here:

    Here’s a summary of the key concepts:

    * Grid Computing Framework: Any programmer has access to a supercomputer environment similar to what an employee of Yahoo or Google would have.
    * Internet in local storage: Alexa crawls the whole internet incrementally. Your program can now have fast access to every document – and all the metadata (like links, file types, etc.)
    * Web Services: If you build an application, you can publish access to it through a web service.

  13. I did some calculations on the back of a napkin and came up with a total of $150,000 per month to operate vertical search on the Alexa platform, just for the basics.

    I’d love it if someone did the math for real.

  14. I don’t understand the gushing excitement over this. It is not that different from what Google and Yahoo already do with their search APIs. The only new thing I see is that the restrictions on use can be lifted, if you’re willing to pay for it. This is a good thing, but I’m not so sure it will change the landscape all that much.

  15. A big problem with Nutch has been that it takes a good chunk of resources just to acquire a non-trivial index. The code works and is free, but buying 2-10 machines plus bandwidth still takes a piece of cash. Search had entered the realm of friend-funded startups and researchers with big grants, but many people are still excluded (e.g., hobbyists, researchers without lots of funds, pre-funding startups).

    If this project further lowers the bar on playing with indices, it’s all for the good.

    It’s not yet clear whether the pricing would allow a search-driven company to outsource much of its processing.

  16. Wait, is it direct access to the index calls, or just a very open API? APIs are important, because they allow the programmers who manage the underlying program to improve it without breaking all the programs that run on top of it. Google and Yahoo improve their search engines all the time to root out spam, rank better, etc. I would hope that Alexa hasn’t ruled out their own ability to improve it.

    I would also think that this is the “long tail” of Search engines. For non-specialized uses it will probably remain cheaper to roll their own, because economies of scale would kick in. You can bet Google doesn’t pay $1 per CPU hour.

  17. Indeed this could be called WebFountain 2.0. (See my post.) IBM struggled with the business model which would have allowed them to get the WF project out of their labs. Can Amazon get it right? I don’t know, but they seem to have a way to get innovation to make money.

  18. AMZN is definitely not doing this for the money. This is not going to generate much profit. But they don’t have much to lose either.

    They will get some free R&D out of it, as people try out new search ideas. Is there an academic discount?

    Do they let developers access the query logs of their published engines? Will they let developers accumulate any user data? Slap on advertising? Open up the frontends? Because otherwise it’s just a mass of useless web pages with marginal and incremental utility. The real value lies in the ways these pages get linked to each other, and to queries, by users. So you need users and user histories, lots of them, to be really valuable …

  19. Alan,
    Care to sketch what that would look like?

    I’m sure niche fee-based services will spring up to address technical-but-not-coder needs.

    Where do you think people should get started? 🙂

  20. I don’t claim to be a search expert by any means, but this doesn’t seem like a big deal to me.

    Technically, I don’t think this is a big deal because having the index (and programmatic access to it) is nice, but that’s not the difficult problem for search. Expensive, perhaps. But difficult? No.

    The difficult problem is relevancy. That’s why Google killed Inktomi and all the rest. Their results were just better. No one goes to the inferior solution, even if it’s 95% as good. Everyone wants to know what that extra 5% of search relevancy might offer them.

    Not only is Google’s index great, its PageRank and relevancy algorithms are even better. So what Alexa is offering you is an inferior index and leaving it up to you to create the relevancy application.

  21. Chris,
    Relevance is important, no doubt. But general search isn’t the only search.

    You’re right. Nothing I do will out-google google. But the Lawrence Journal-World doesn’t out-CNN CNN, either. LJ World provides ridiculously hyper-local news, and in so doing fills a niche that CNN couldn’t, and provides a valuable service in the doing.

    I think Alexa’s index could allow hyper-specific search, too. Just think about it a bit before dismissing it. 😉

  22. Ian, your numbers are wrong. I commented on your blog, but processing a dataset would be around $2000 each 2 months, not $100,000 per month.

    I have no idea if your CPU hour numbers are anywhere in the ballpark, but they seem high to me.

  23. I’m one of those debating the much-debated Alexa traffic stats. I’ve found them extremely overrated and leading to some poor decisions. (Some of those poor decisions involved my favorite subject, webcomics; others involved my professional stint in online classifieds. So all biases are on the table, here.)

    The Internet Archive is inarguably useful, but that’s data, not sorting, and it seems peripheral to Alexa’s identity these days.

    Alexa here seems to be admitting, “we don’t know how to sort our own data, so we’ll ‘let’ you do it– and charge you a buck per unit for the privilege.”

    Which part of this am I supposed to be excited about, again?

  24. Earlier this month the company posted respectable quarterly results. It generated $28 million in free cash flow as it grew its margins and revenue in what some may consider a difficult economic climate. The company has been able to use its earnings to clean up its balance sheet. It now sports a net positive cash balance. All this will only get better with the Circuit City and Best Buy contracts.

    Matt Richey wrote favorably about TeleTech last year when the stock was trading for roughly half of today’s going price. With the fundamentals improving, one is left wondering if TeleTech isn’t just teaming up with Best Buy. In Wall Street’s fickle terms, maybe it is the best buy.

  25. Surprised at some of the unenthusiastic comments.

    > It is not that different from what Google and Yahoo already do

    Oh yes, it is! Google and Yahoo let you use their search, they do not provide you with their index. You can analyze the crawled pages with Alexa. That’s a very important distinction.

    > Alexa here seems to be admitting, “we don’t know how to sort our own data, so we’ll ‘let’ you do it– and charge you a buck per unit for the privilege.” Which part of this am I supposed to be excited about, again?

    You should be excited with the part where you release your own brilliant specialized search engine raking in tons of AdSense dollars and not having to spend 5 years crawling the web and making your own index. The exciting part is the time saving for a reasonable cost.

    > So what Alexa is offering you is an inferior index and leaving it up to you to create the relevancy application.

    Like Jeremy mentioned, I think the greatest benefit is to niche search engines. If you want a search engine for sheep farmers, you can probably build one in a week while not paying much money. The other option is spending months or years crawling pages and probably spending as much or more money on labor and hardware.

    Also, Google is great, but that doesn’t mean that others shouldn’t try to improve on them. I think Google’s great challenge right now is filtering out all the spam pages out there. They really aren’t that great at it. Spam pages routinely end up in the top 10 results for lots of queries. If someone could take the Alexa index and figure a way to get rid of the spam, they would be doing a huge service.

  26. I agree that this hits directly at the Achilles’ heel for Google, namely that the index (and search API on top of it) is rapidly becoming a commodity.

    Yes, Google can continue to work to acquire unique content (such as working with publishers and libraries to scan copyrighted material, etc.) but that is a laborious, non-technical asset, and does not align with their key core competency: top software and systems talent.

    It is possible that someday search will move off the browser, and the Overture model of unobtrusively placing premium content nearby the search result will face difficult challenges. What is special about Google when that happens?

  27. >> It is not that different from what Google and Yahoo already do

    > Oh yes, it is! Google and Yahoo let you use their search, they do not provide you with their index. You can analyze the crawled pages with Alexa. That’s a very important distinction.

    I think the same as Howie. The future, I think, will give us better answers.

  28. Methinks, in the short term, the guys at Google, Yahoo, and Microsoft may not even respond to this. Only when (if?) some compelling applications start getting built using the Alexa index and servers (which may not really be powerful enough to compete with GYM) will the big 3 respond by opening up their current search APIs further to allow more detailed search attributes and unlimited queries, either for free or at a small price. The current API by Yahoo (for web search, image search, Flickr, maps, music, etc.) is already good enough for developers. The only restriction is that it is limited to 5000 queries a day, and developers can’t pay online to increase this limit.

  29. > If someone could take the Alexa index and figure a way to get rid of the spam, they would be doing a huge service.

    They’d also be hired or bought by the big 3. 😉

    Jeff mentioned WebFountain (thanks, I hadn’t heard of it!), and it hit me this morning: this opens up Fundable custom searches. Someone writes the code, lots of people get together to pay the fees, and when the search is done, everyone paying benefits.

    Good stuff.

  30. Ovi, there was no pricing on that old page. It was “call us and maybe we can work something out”. It was contract, now it’s service. It was vendor negotiation, now it’s a specific cost.

    You really don’t see a difference?

    Grep for “Bad Idea #2” on Joel’s discussion of pricing. It’s quite a long essay, so here’s a choice excerpt:

    The reason I bring this up is because software is priced three ways: free, cheap, and dear.

    Free. Open source, etc. Not relevant to the current discussion. Nothing to see here. Move along.
    Cheap. $10 – $1000, sold to a very large number of people at a low price without a salesforce. Most shrinkwrapped consumer and small business software falls into this category.
    Dear. $75,000 – $1,000,000, sold to a handful of rich big companies using a team of slick salespeople that do six months of intense PowerPoint just to get one goddamn sale. The Oracle model.

    All three methods work fine.

    Notice the gap? There’s no software priced between $1000 and $75,000. I’ll tell you why. The minute you charge more than $1000 you need to get serious corporate signoffs. You need a line item in their budget. You need purchasing managers and CEO approval and competitive bids and paperwork. So you need to send a salesperson out to the customer to do PowerPoint, with his airfare, golf course memberships, and $19.95 porn movies at the Ritz Carlton. And with all this, the cost of making one successful sale is going to average about $50,000. If you’re sending salespeople out to customers and charging less than $75,000, you’re losing money.

    The joke of it is, big companies protect themselves so well against the risk of buying something expensive that they actually drive up the cost of the expensive stuff, from $1000 to $75000, which mostly goes towards the cost of jumping all the hurdles that they set up to insure that no purchase can possibly go wrong.

  31. Recently, Caltech reached a new milestone in how fast data can be sent by transferring 475 terabytes of data in 24 hours and I wondered what it would be like to have your own personal cache of the Web. Well, Alexa has taken a huge step in that general direction, but with far greater implications.

    The move is big because of what now becomes possible. Google has led the world with innovative applications beyond simple search with a vast team of very talented programmers and engineers. Alexa intends to tap the independent innovative talents from the entire connected planet.

    Who knows what new innovations will come from the ether? Or how many info-entrepreneurs’ dreams suddenly have become possible for an investment of a few thousand dollars? But come they will, and with startling speed.

    We’re about to see an ecosystem of applications we can’t imagine living without, burst forth Web 2.0 style. Faster, better, and built by the collective intelligence of the connected masses.

    I wonder when we’ll see the huge databases of Web queries open up. Think of the trend engines and predictive markets that could spring forth.

  32. Of course this only includes the highly visible content of the web. It’s a nice move by Amazon ahead of Google/Yahoo but I see the other players will open themselves up soon. It’s a good move forward for everyone.

  33. Hi Jeremy;

    No need to confuse my optimism with tech stocks 🙂

    Ecommerce is a hard and risky gamble, and innovation is accelerating fastest within the digital world. Tomorrow’s disruptive changes will be faster, more powerful and with even larger scope than today’s. The Alexa move is a case in point.

    Shifts like this can destabilize the entire playing field crushing other players overnight, and are often meant to. Workers who once made an income suddenly find themselves unemployed with rapidly outdated technical skills. So, not much optimism there.

    But I do believe the net is evolving into a global computing grid and eventually the only sane way to compete will be to harness the network to do the work. I AM optimistic about that.
    So, Alexa/Amazon, win or lose, symbolizes an important shift in that direction.

  34. Scottie,
    With all the optimism around Web 2.0 and the VC heat-up and all that, it’s starting to look like we’re in for another bubble. I’ve taken to calling this Bubble 2.0. I don’t remember where I first saw that, but I doubt I made it up. 🙂
    Anyway, the first net bubble is hereby revisionistically renamed Bubble 1.0.

  35. FWIW, the further discussion has raised my own optimism about this. I’ve still got some really negative impressions of Alexa that will take a while to dispel, but as Jeremy explains it I have to admit this is a good idea, and perhaps a step to a newer, more successful identity for the company and even for search.

  36. How good can it be if they won’t use their own index for search? They use Google right now. The data-mining aspect is pretty awesome, though; that will be enormous for a lot of businesses. Though it’s going to suck for anyone that has their email address posted on the web.

  37. I am amazed at Alexa, and even dumber about these sorts of things. Out of idle curiosity I want to be the first on my block … so I wanted to correspond with you first…
    ( double positive?)

    Best regards,
    Bob VL

  39. I think that people don’t give Alexa enough credit.
    I have read so many things that people are giving them the thumbs down.
    I go there all the time:)
