Getting Samuel Johnson Right

Lisa Gold (via Cory and BB) has a blog about book research. I love this post about an oft-repeated Samuel Johnson quote.

“The next best thing to knowing something is knowing where to find it.” — Samuel Johnson

I thought this famous Samuel Johnson quote would be an appropriate way to begin my blog. The problem is that Johnson never actually said this… The actual Johnson quote is: “Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.” I confirmed this by searching the text of Boswell’s Life of Johnson online.

11 thoughts on “Getting Samuel Johnson Right”

  1. “The purpose of computing is insight, not numbers.” — Richard Hamming (1962), mathematician and founder of the Association for Computing Machinery (ACM)

    Unfortunately, the current state of web search is all about finding little factoids, home pages, known items. Numbers. There is very little provided by the big search engine companies by way of exploration, understanding, insight, summarization, sensemaking, et cetera. The user is left having to manually comb through hundreds of thousands of results in order to make sense of all that data, themselves.

    People don’t want to just search. People also don’t want to just find. People want to understand.

    What I think very few people get is that “finding” doesn’t just mean locating a quote, or a home page. “Finding” means gathering enough of a wide understanding of a topic, so that your information need is filled. And currently this behavior, again, is not very well supported.

  2. I am overwhelmed by the impact that “find” has already had. In some ways it has diminished the need to be “knowledge oriented” (from a getting-information point of view, NOT from an understanding point of view). Instead, people increasingly tend to become “Search Smart”.

    It would indeed be a leap from “find” to “understand”. Perhaps this is the promise of the semantic web. At present we seem to be at the mercy of statistical techniques (combined with AI). Would be interesting to see how (if at all) semantic web would take us from “find” to “understand”.

    Whether it is “find” or “understand”, it is up to human intelligence to find the “find” and understand the “understand”.

  3. The funny thing is that John just made a post that also duplicates the false information, which will eventually be found by other searchers: “The next best thing to knowing something is knowing where to find it.”
    The problem is rooted in the very nature of the Internet itself. Information is duplicated more easily than ever before. We are spreading information without the time or preparation to check the reliability of our information sources.

  4. it would indeed be a leap from “find” to “understand”. Perhaps this is the promise of the semantic web. At present we seem to be at the mercy of statistical techniques (combined with AI).

    I submit that this is a false dichotomy. “Understand” and “statistical techniques” are not mutually exclusive. You can use statistical techniques and still go from “find” to “understand”.

    My point is that when you run a query, and get results 1 to 10 of 789,000 documents, you have now “found” 789,000 documents. But you really have no understanding of what those 789,000 documents contain. You have no summarization of those 789,000 documents. You have no sense of what they’re all about. You have no insight into their content. You have no way of exploring the information in those documents, other than 10 results (1 page) at a time.

    This is what information retrieval systems (“search engines”) are good at: making sense of vast quantities of data. I would disagree with you when you say that it is up to human intelligence to find the find and understand the understand. Search engines can “read” those 789,000 documents much quicker than you or I can, and they can look for statistical patterns, regularities, and even recurring irregularities or contradictory statements! For example, when I’ve searched for information that gets written about on both liberal and conservative blogs (let’s say, “Iraq war”), there will probably be 200,000 of those 789,000 SERP pages that say “Iraq is going well” and 200,000 of those 789,000 SERP pages that say “Iraq is going terribly”. A search engine should be able to detect that information, and “retrieve” it, presenting it to me, the user!

    Discovering large-scale patterns like that, and extracting this sort of information, is what information retrieval (“search”) is all about. And yet no major web search engine company even starts to deliver on this. Everyone punts on it, and some even waste their time developing chat applications and horoscopes (when they once promised not to).

    All they give us is 789,000 results, ten at a time. Let’s assume that it takes me 10 seconds to look at a page with 10 results. That means (assuming my maths are correct) it would take me 9 straight days, with no sleep, to get through all 789,000 results, to be able to “understand” what information was contained out there on the web in relation to the information need expressed by my query.

    We can do better.

  5. Another example: There exists research on “sentiment analysis”, discovering whether the tone of a web page is positive or negative, etc.

    When I run a search, I should be able to click a button (or enter a command-line option) to say that I want to see my search results presented as two lists: the “positive sentiment” list and the “negative sentiment” list. Currently, there is no way of doing this on the search engines. It won’t work to just do two sub-searches, [Iraq war positive] and [Iraq war negative]. The “positive” and “negative” keywording in my search doesn’t actually do sentiment analysis. It just looks for occurrences of those terms, which themselves are not likely to occur. (Very few people explicitly say that what they are writing has a positive or a negative sentiment.)

    That way, instead of seeing some lame “universal” search, with all the positives and all the negatives mixed up, I can instead get a clear sense of the top-ranked positive results and the top-ranked negative results. I can also get a sense of _how many_ results are contained in the positive and negative sub-lists.

    This would be an invaluable tool for searching for things like “Iraq war”. But it would also help in product search. Or issue search, for example when I am trying to decide on some local ballot issue for the upcoming election. It would be nice to discover the best pages both for and against an issue. Automatically. Because the search engine itself has surfaced (“found”) both aspects of that information for me, and thereby helped me understand the topic space.

    10 years of Google, and it seems more time is spent on Calendar apps than on useful stuff like this!

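
The two-list presentation described in the comments above (a “positive sentiment” list and a “negative sentiment” list, each with a count) can be illustrated with a very rough sketch. Everything here is an assumption made for illustration: the word lists, the function names, and the sample results are invented, and counting keywords is only a stand-in for real sentiment analysis, which would normally rely on a trained classifier.

```python
# Toy sketch: split search results into "positive" and "negative" lists.
# The word lists and the sample results are made up for illustration;
# real sentiment analysis would use a trained classifier, not keyword counts.
import re

POSITIVE_WORDS = {"well", "good", "success", "progress", "improving"}
NEGATIVE_WORDS = {"terribly", "bad", "failure", "disaster", "worsening"}


def sentiment_score(text):
    """Return (number of positive words - number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def split_by_sentiment(results):
    """Partition ranked results into positive and negative lists.

    Ranking order is preserved within each list; results with a score of
    zero are treated as neutral and left out of both lists.
    """
    positive, negative = [], []
    for result in results:
        score = sentiment_score(result)
        if score > 0:
            positive.append(result)
        elif score < 0:
            negative.append(result)
    return positive, negative


# Hypothetical ranked results for the query [Iraq war].
results = [
    "Iraq is going well, with steady progress",
    "Iraq is going terribly, a disaster",
    "Timeline of the Iraq war",
    "The surge was a success",
]

positive, negative = split_by_sentiment(results)
print(len(positive), "positive:", positive)
print(len(negative), "negative:", negative)
```

Note that none of the sample results contains the word “positive” or “negative”, which is why the sub-searches [Iraq war positive] and [Iraq war negative] would miss them, while even this crude scoring sorts them into the two lists and reports how many fall into each.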
