In early 2006, I wrote a post titled "The next generation web, scaling and data mining will matter." In it, I highlighted some thoughts on the future:
I truly believe the next battleground will be based on scaling the back end and more importantly mining all of that clickstream data to offer a better service to users. Those that can do it cheaply and effectively will win. The tools are getting more sophisticated, the data sizes are growing exponentially, and companies don't want to break the bank nor wait for Godot to deliver results.
Ok, clickstream data and data mining sound kind of geeky, but with every one of our calls, clicks, and purchases being tracked and logged, it is an important topic. My friend Scott Yara, co-founder and President of Greenplum (full disclosure: my fund is an investor), wrote an interesting and more mainstream post the other day that calls this ongoing revolution the "Your Data" revolution.
The point here is that data can be a wonderful thing if used the right way and if controlled by us. For example, Scott points out:
Your Data can lead you home with turn-by-turn directions on Mapquest. It can find you love by sorting through the profiles of 20 million other lonely hearts on eHarmony. It brings you up-to-the-second stock prices, sports scores, and flight delay alerts. It helps doctors fight diseases and engineers design safer cars. It gives environmentalists the power to track the movements of endangered animals and biologists the tools to map the structure of our genes.
Your Data, in short, is transforming everything.
However, with all of this data comes great responsibility and opportunity. As Scott points out:
We also need to make sure we can use all the information we're collecting. That means better schools that will turn out kids who are able to cope with the age of Your Data. And we need better, cheaper technologies to enable companies of all sizes, as well as organizations and individuals, to get all the information they want and do something useful with it.
Knowledge is power, and we know more than any previous generation could even conceive. We're moving into a world of infinite information. The challenge we face is turning all that information into insights, conclusions, and revelations — in other words, turning that knowledge into wisdom, without letting it be turned against us. We need to make sure Your Data doesn't oppress us, but serves us. And we need to do that fast, because the revolution is well underway.
From a VC and entrepreneurial perspective, what excites me is that we are just scratching the surface of what to do with all of this data and how to turn it into actionable, meaningful insight. To make data and insight more accessible to everyone, we first need the back-end technology that makes data storage and analysis better, faster, and cheaper (enter companies like Greenplum; ok, shameless plug 🙂). We then need great entrepreneurs to continue building new services that seamlessly and implicitly help end users make better decisions, discover new things, and that empower and motivate us to do more.
In addition, we need to consider cultural factors. For example, while privacy still needs to be at the forefront of the Your Data revolution, we also need the ability and power to choose what we want to share with the world and when. Four years ago, little did we know that more people than ever would be willing to share their whereabouts through services like Loopt, Foursquare, or Twitter, their every thought through Facebook, or even their credit card purchasing data through new services like Blippy. It is clear that the once-sacred walls between private and public information are increasingly disintegrating because of these cultural factors. While we clearly have to be careful not to extrapolate too much from early successes like Blippy and Foursquare, we also cannot underestimate the power of these cultural factors, as once-young start-ups like Facebook and Twitter have exploded in growth.
The question is who will create the next great back-end technologies and new web services that drive a whole new conversation and a new way of thinking about what we do with the data that is all around us.
I agree with your point that we have only scratched the surface. We are currently interested in the Semantic Web. As the definitions of web and other types of content blur in cloud and SaaS applications, the term Semantic Web can also reach back into the enterprise: documents, drawings, contracts, etc. The challenge of building up enough statistical data (which requires hundreds of millions of samples) is daunting, and we see, like Google, that freemium services are the way to collect aggregate statistical information, which can eventually be leveraged in AI applications that enable “smart content” that can find its own way to the user.
Exciting days!