Search + State + Metadata = A Search Application
The major search engines have all built incredibly impressive and expensive infrastructures with one main goal in mind: finding the one result, or small set of results, that enables a user to find the specific information they are looking for amidst the vast forest of information that is the Web. Finding the specific tree in this vast forest, and finding it fast, is thus not only the benchmark by which all search engines are measured, but also arguably the dominant axis of investment and innovation within the search industry today.
If you don’t believe me, just look at all the search start-ups that have launched in the last few weeks, including Riya and Powerset. While these start-ups each have a different focus, their main value proposition is the same: we help you find what you are looking for faster and better than the other guys.
As I outlined in my last post though, the huge investments required to even have a prayer of “building a better Google” dictate that most “tree focused” search start-ups will either: A) Be forced to concentrate on a small niche, B) Be acquired by one of the majors for their technology, or C) Be crushed by the major players when they add similar functionality.
Seeing The Forest for The Trees
The real green field in search today is not trying to find the tree, it is in trying to understand the forest. The forest is the vast, interconnected web of sites that make up the Web, together with the flows of information that constantly course through it. Once you can see this forest and observe how it changes over time, you can begin to derive insights and information that simply are not possible to discern with a single query.
To understand the forest you need three main things: persistent search, state, and metadata. Persistent search is simply a search query that is constantly running. State is information about the state (what time is it, what things are connected to what, etc.) of the web each time a query is run. And metadata is information that is derived from the set of search results returned for a particular query (how many results, what type of results, who authored the results, statistical analysis of elements within the results, etc.).
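To make the three ingredients concrete, here is a minimal sketch in Python of a persistent query. Everything here is hypothetical: `search_fn` stands in for whatever search backend you have, and the captured fields are just examples of state (the timestamp) and metadata (result count, median price) derived from each run.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Snapshot:
    """State and metadata captured each time the persistent query runs."""
    timestamp: float            # state: when the query ran
    query: str                  # the persistent query itself
    result_count: int           # metadata: how many results came back
    median_price: Optional[float]  # metadata derived from the result set

def run_persistent_query(query, search_fn, runs=3):
    """Re-run the same query on a schedule, capturing a Snapshot each time."""
    history = []
    for _ in range(runs):
        results = search_fn(query)  # hypothetical search backend
        prices = [r["price"] for r in results if "price" in r]
        history.append(Snapshot(
            timestamp=time.time(),
            query=query,
            result_count=len(results),
            median_price=statistics.median(prices) if prices else None,
        ))
        # time.sleep(3600)  # a real system would wait between runs
    return history
```

The list of snapshots is the raw material for everything that follows: because each run records both the time and the derived metadata, you can compare runs and compute change over time.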
Marrying these three items together adds two key dimensions to search: time and context. Time is an incredibly important dimension because once you can compare things over time, you can determine change and the rate of change. For a primitive example of how to marry search and time, look no further than a little noticed (and very immature) feature on Google Finance. If you look at the news for a particular stock in Google Finance, on the top left hand side you will see a little graph of the number of articles mentioning this stock over time.
Right now this feature is very novel (and it lacks state), but if you think about a more robust version of this that attempts to relate the number of articles to other time series, such as stock prices or trading volume, it starts to get much more interesting.
Context, meanwhile, allows a user to easily understand how one data point fits into a broader picture. For example, at Vast.com, if you search for a specific used car, say an Audi S4, on the left hand side of the page you not only see all of the Audi S4s for sale, but also the median mileage and median price of all those cars. This metadata (the medians) is not found in the search results, but produced by analyzing the aggregate results.
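The key point is that the medians live in no single result. A toy version of the Vast.com example, with made-up listings standing in for a real result set:

```python
import statistics

# Hypothetical result set from a used-car search (an "Audi S4" query);
# each result carries structured fields alongside the listing itself.
listings = [
    {"title": "Audi S4", "mileage": 42000, "price": 31500},
    {"title": "Audi S4", "mileage": 58000, "price": 27900},
    {"title": "Audi S4", "mileage": 36500, "price": 33250},
]

# The metadata is not in any single result -- it is derived from the set.
median_mileage = statistics.median(r["mileage"] for r in listings)
median_price = statistics.median(r["price"] for r in listings)

print(median_mileage, median_price)  # 42000 31500
```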
Now imagine charting this metadata over time (via a persistent query that captures the state each time it runs) and you can easily envision a chart of the average asking price for a specific type of Audi S4 over time. (Kind of like what mpire is doing with its auction search today.) A marketing person at Audi (or its competitors) might be very interested to see how these values change over time or respond to price cuts for new cars or new model introductions. Similarly, a car insurance company might be very interested in this data to assess replacement values. Kelley Blue Book might be interested in this data because it has the potential to seriously disrupt their business.
The Metadata’s the Limit
Things get even more interesting when you add in advanced text analysis techniques, such as natural language processing and entity extraction, to generate additional metadata. All this metadata can then be correlated with not only data from other searches but with traditional structured data. Want to understand if good or bad product reviews on blogs impact sales? Search away. Want to understand if a sharp spike in message board traffic is likely to impact a stock? Search away.
Of course, you won’t be able to simply type in a search to Google (or any other search engine for that matter) and find out the answers to such questions. You will likely get your answers from purpose-built “search applications”. Who will build those applications? Mostly start-ups to begin with (many already are), but eventually established firms will also enter this space. Expect to see a heavy focus on financial services and marketing related applications early on, but the concept has applicability across just about everything. It will be very interesting to see just what is built and what succeeds.
The thoughts and opinions on this blog are mine and mine alone and not affiliated in any way with Inductive Capital LP, San Andreas Capital LLC, or any other company I am involved with. Nothing written in this blog should be considered investment, tax, legal, financial or any other kind of advice. These writings, misinformed as they may be, are just my personal opinions.