Dr. Dobb's Digest August 2009
From e-science visualizations to Wall Street what-ifs, you know there's a problem when the talk turns to exabytes. But that problem isn't so much about too much data as it is about making sense of the data at hand. In other words, it's a question of data management -- the architectures, policies, practices, and procedures you have in place for managing and enhancing a company's data assets.
What really stands out are the vendors that are providing tools to manage and analyze what's referred to as "big data." There are the usual suspects: Oracle, IBM, Google, Amazon.com, and FairCom. And then there are upstarts, such as Cloudera and Aster Data Systems, that are building new businesses around big data by leveraging open source software such as Hadoop, an implementation of the MapReduce programming model.
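For readers new to the model Hadoop implements, here is a minimal sketch of MapReduce in plain Python: a map step emits key/value pairs, the framework groups pairs by key, and a reduce step aggregates each group. The function names and the in-memory "shuffle" are illustrative assumptions, not Hadoop's actual API -- a real cluster distributes each phase across many machines.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in one input document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate all values for one key -- here, a simple word count.
    return (key, sum(values))

documents = ["big data is big", "data at hand"]
pairs = [p for doc in documents for p in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["big"])  # -> 2
```

The appeal for big data is that map and reduce are independent per document and per key, so the same three-phase shape scales from this toy script to thousands of nodes.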
Many of the technologies available to manage big data aren't new. In one form or another, column-oriented databases, data parallelism, solid-state drives, declarative programming languages, and cloud computing have been around for years. What's new is the emergence of "fringe databases," or database management systems that are appearing where you least expect sophisticated data management. For example, medical and consumer devices that once got by with flat files now require powerful database engines to manage the sheer volume of data being collected.
None of this comes without a price. With big data on the rise, transaction throughput and concurrency requirements escalating, and data becoming more distributed, application complexity is increasing. To keep that data manageable, it may have to be partitioned across multiple files or replicated and synchronized across multiple sites. And, of course, software developers are adopting ever more complex data schemas to accommodate these needs while still maintaining traditional relational access.
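The partitioning mentioned above can be sketched in a few lines: route each record to one of several files or sites by hashing its key. The partition count and the customer-ID keys are assumptions for illustration only; the point is that a stable hash sends the same key to the same partition every time, which is what makes lookups and synchronization tractable.

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative; a real system tunes this to data volume

def partition_for(key: str) -> int:
    # Use a stable hash so a key maps to the same partition across runs
    # and across machines (Python's built-in hash() is salted per process).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Hypothetical record keys, routed to their partitions.
keys = ["cust-001", "cust-002", "cust-003"]
placement = {k: partition_for(k) for k in keys}
for k, p in placement.items():
    print(f"{k} -> partition {p}")
```

The same routing function can drive replication as well: write the record to its home partition, then copy it to the next one or two partitions in sequence for redundancy.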
Hey, no one said it was going to be easy.