From Boing Boing I learn that the Internet Archive is releasing its crawler for free under an LGPL license. Why is this news? As I’ve argued in the past, it’s not cheap or easy to innovate in the search space, but the search space desperately needs innovation. If key components like crawlers can be snapped into place relatively easily, new ideas heretofore unthinkable become possible. I also like the philosophy behind the crawler, which is named Heritrix: “Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for inheritess. Since our crawler seeks to collect the digital artifacts of our culture (my emphasis/link) for the benefit of future researchers and generations, this name seemed apt.”
Way to go, Brewster!