I had some interesting meetings yesterday, and as I reflected on them this morning, one common theme emerged: the next generation of the web will be built on data mining and on extracting intelligence from the reams of data web services collect every day. This reminds me of a post I made in March of 2006 titled "The Next Generation Web – scaling and data mining will matter" in which I wrote:
I truly believe the next battleground will be based on scaling the back end and more importantly mining all of that clickstream data to offer a better service to users. Those that can do it cheaply and effectively will win. The tools are getting more sophisticated, the data sizes are growing exponentially, and companies don’t want to break the bank nor wait for Godot to deliver results.
My first meeting was with a well-known research analyst covering Internet stocks. While we discussed the usual topics, such as how the Internet is taking share from traditional advertising budgets and how the top brand advertisers have not yet really embraced the web, our liveliest discussion centered on next-generation advertising technology, all of which depends on increasingly complex forms of data analysis. To that end, I mentioned one of the fund's portfolio companies, Peer39, which is using natural language processing and machine learning to create highly precise matching of commercial offers and user-generated content. As you might guess, the secret sauce is the algorithms that the company has created.
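To make the idea concrete without pretending to know Peer39's secret sauce, here is a toy sketch of what text-based offer matching can look like: score each candidate offer against a page by simple bag-of-words cosine similarity. Everything in it, the tokenizer, the sample page, the offer names, is hypothetical and far simpler than what any production system would do.

```python
# Toy illustration of matching commercial offers to user-generated content
# with bag-of-words cosine similarity. This is not Peer39's algorithm
# (which is proprietary); it only sketches the general idea of scoring
# offer/page relevance from text. All names and data are hypothetical.
import math
from collections import Counter

def vectorize(text):
    """Lowercase, split on whitespace, and count term frequencies."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

page = vectorize("user review of a lightweight carbon road bike for commuting")
offers = {
    "bike_shop": vectorize("carbon road bike sale lightweight frames"),
    "car_insurance": vectorize("cheap car insurance quotes online"),
}

# Rank offers by textual relevance to the page content.
ranked = sorted(offers, key=lambda k: cosine_similarity(page, offers[k]),
                reverse=True)
print(ranked)  # ['bike_shop', 'car_insurance']
```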
Later in the day I had lunch with a friend whose company we had funded years ago. What was interesting to hear was how many of the future product lines we discussed a few years ago are finally starting to emerge as real revenue drivers for the business today. Years ago the company's first data center cost around $20mm, and the latest one, which serves orders of magnitude more customers, cost only $3mm. Clearly, data-driven opportunities were cost-prohibitive a few years ago, and in any case too early for customers to understand. Back then many businesses were simply worried about not getting Amazoned; today they are all on the web, thinking about how to drive better results. That is why our discussion turned to a massive data warehousing project his company is working on: taking all of that data from across his huge customer base and helping those customers better monetize their sites.
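For a rough sense of what such a warehouse would compute, here is a minimal, hypothetical sketch: roll up raw page-view events per site and per page, then compare traffic against the ad revenue those pages actually earn, so that high-traffic, low-yield pages stand out as monetization opportunities. The event schema, site names, and numbers are all made up for illustration.

```python
# Hedged sketch of a cross-site clickstream roll-up: aggregate page views
# and ad revenue per (site, page), then report revenue per thousand views
# (RPM). High traffic with low RPM suggests a monetization opportunity.
# The event schema and figures below are hypothetical.
from collections import defaultdict

events = [
    # (site, page, ad_revenue_in_dollars)
    ("siteA", "/forum/thread-42", 0.00),
    ("siteA", "/forum/thread-42", 0.01),
    ("siteA", "/home", 0.05),
    ("siteB", "/reviews/widget", 0.02),
    ("siteB", "/reviews/widget", 0.00),
]

views = defaultdict(int)
revenue = defaultdict(float)
for site, page, rev in events:
    views[(site, page)] += 1
    revenue[(site, page)] += rev

# Pages sorted by traffic, with revenue per thousand page views.
for key in sorted(views, key=views.get, reverse=True):
    rpm = 1000 * revenue[key] / views[key]
    print(key, "views:", views[key], "RPM: $%.2f" % rpm)
```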
What I love about these kinds of opportunities is that algorithms scale, carry high gross margins, and are proprietary and defensible. The next-generation web is not about what you click and see, but about what happens behind the scenes every time you click on a page and move from site to site.
I couldn’t agree more; however, as a newer publisher, what do you think are the necessary forms of data besides “everything you can get”? How important are attributes beyond age, gender, and location, such as income, hobbies, and interests?
I have to agree with Chris Allison. Great blog, by the way.
Britec – http://www.britec.org.uk