We are all enjoying the benefits that come with the commoditization of existing hardware and software infrastructure. It costs far less to launch a business today than it did five years ago. We are all smarter, broadband penetration is reaching critical mass, and open source and commodity hardware have become reliable alternatives to proprietary architectures and closed systems. As we all move forward with our web-based operations, it is clear that scaling the back-end infrastructure remains a formidable challenge. There have been many instances of popular services going down; Typepad, Salesforce.com, and del.icio.us are a few examples.
With scaling the back end also comes a need to learn more about your users and their interactions. Data mining and analysis are becoming important not only for helping companies create better services but also for generating more revenue per user. In addition, for many web companies, extremely data-driven applications are the core of their services. Think about Zillow, Technorati, and Indeed, which are dynamically driven services built on aggregating, crawling, and filtering millions of pieces of data.
However, the fast growth of many web-based operations, combined with the need to mine the data, leaves a big hole in the revolution of the cheap. Web-based operations need a cheaper, open source option to scale their database needs, move to a data warehousing architecture without breaking the bank, and scale with user growth on commodity infrastructure. Enter Greenplum (full disclosure: Greenplum is a portfolio company and I am on the board), which just released its GA product, Bizgres MPP, for data warehousing, leveraging the best of the open source PostgreSQL database. We have been working on the code for the past 18 months, and I am quite proud of the team for having delivered the release.
Greenplum is taking the best of the open source database PostgreSQL and rebuilding some of the core functions, such as query optimization, execution, and the interconnect. We are allowing anyone to build a shared-nothing architecture à la Google to scale their back end to multi-terabyte systems on cheap hardware. It is free to run on a single machine, but if you want to run the massively parallel option, we charge a fee per CPU.
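For readers who have not worked with a shared-nothing design, here is a minimal sketch of the general idea in Python. It is not Greenplum's code or API. The function names (`distribute`, `local_count`, `parallel_count`), the CRC32 hash, and the sample click data are all assumptions made up for illustration. Each "segment" owns a disjoint slice of the rows, does its own work on that slice, and a coordinator merges the partial answers.

```python
from collections import Counter
from zlib import crc32


def distribute(rows, key, num_segments):
    """Hash-partition rows so each segment owns a disjoint slice of the data."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        # CRC32 is used only because it is deterministic across runs;
        # the hash a real system uses is not specified here.
        segment_id = crc32(str(row[key]).encode("utf-8")) % num_segments
        segments[segment_id].append(row)
    return segments


def local_count(segment_rows, group_by):
    """Work one segment does on its own rows: a partial GROUP BY count."""
    return Counter(row[group_by] for row in segment_rows)


def parallel_count(segments, group_by):
    """Coordinator step: merge the partial counts from every segment.

    The loop runs the segments one after another to keep the sketch simple;
    in a real shared-nothing system each segment would run on its own machine.
    """
    total = Counter()
    for segment_rows in segments:
        total.update(local_count(segment_rows, group_by))
    return total


# Hypothetical clickstream rows, invented for this example.
clicks = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/search"},
    {"user": "carol", "page": "/home"},
    {"user": "alice", "page": "/search"},
    {"user": "dave", "page": "/home"},
]

segments = distribute(clicks, key="user", num_segments=3)
page_counts = parallel_count(segments, group_by="page")
```

Because no segment needs to see another segment's rows to produce its partial count, adding more cheap machines adds capacity without a shared disk or shared memory becoming the bottleneck.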
Dana Blankenhorn from ZDNet gets it:
This is a problem a lot of Web 2.0 start-ups like Technorati, Bloglines and Flickr are facing, and projects like Drupal will face soon. They were built with open source tools, but then find they need to "graduate" to something like a data warehouse. And there’s old Oracle, telling them there’s nothing from an open source supplier that can deliver what they need. Share with us, they say, you don’t have any choice.
Well, now there is a choice. Greenplum CTO Luke Lonergan said that O’Reilly Media, one of Greenplum’s early customers, graduated from mySQL to PostgreSQL with Greenplum and got a 100 times improvement in database access speed across a 500 Gigabyte database. Other Web 2.0 start-ups, and projects, can do the same thing. "The price of conversion is where the pain is," said Yara, "but look at how fast some of these projects grow." While mySQL was smart in building on a lightweight Web base, more and more users and projects will find the need to graduate, and face proprietary FUD from major vendors saying they have to pay the "monopoly tax" in order to grow.
I truly believe the next battleground will be scaling the back end and, more importantly, mining all of that clickstream data to offer a better service to users. Those that can do it cheaply and effectively will win. The tools are getting more sophisticated, the data sizes are growing exponentially, and companies don’t want to break the bank or wait for Godot to deliver results. Given these trends, I suggest downloading Greenplum’s Bizgres MPP and letting me know what you think.