Classifying Today’s “Big Data Innovators”

Editor’s note: The piece below first appeared on the Hadapt blog and is republished with permission here. The framework presented provides insight into the very dynamic market around “Big Data Innovators” and should be of use for classifying many other firms in this interesting space. -bg

Recently InformationWeek published a piece, authored by Doug Henschen, that listed 13 innovative Big Data vendors. The complete list is reproduced below:

1.  MongoDB
2.  Amazon (Redshift, EMR, DynamoDB)
3.  Cloudera (CDH, Impala)
4.  Couchbase
5.  Datameer
6.  Datastax
7.  Hadapt
8.  Hortonworks
9.  Karmasphere
10.  MapR
11.  Neo Technology
12.  Platfora
13.  Splunk

These 13 vendors distribute 16 unique data management products (since both Amazon and Cloudera offer multiple distinct data management/processing systems), all of which push the boundary on Big Data management.

In this post I will attempt to subcategorize these 16 products into a competitive grouping, where products placed inside the same group can be considered replacements for each other (and hence are competitive), and each group is complementary to every other group.

Before starting this classification, I will remove three products that, while potentially interesting from a Big Data perspective, are often used outside of what has become known as the “Big Data realm”, which is why their primary competitors did not make it onto the InformationWeek list. These three products are Splunk (which typically competes with companies focused on the security, compliance, and IT operations management verticals), Amazon Redshift (which typically competes with traditional MPP database vendors), and Neo Technology (which is usually classified as a “NoSQL database”, but whose focus on graph data sets it apart, in both technology and use cases, from the other NoSQL databases on this list).

The remaining 13 products can be classified into four distinct groups:
1.  Operational data stores that allow flexible schemas
2.  Hadoop distributions
3.  Real-time Hadoop-based analytical platforms
4.  Hadoop-based BI solutions

Group 1 (operational data stores that allow flexible schemas)
This group is composed of database products that can be used to manage active data for dynamic applications with hard-to-define (or hard-to-predict) schemas. The database must be optimized for inserting, retrieving, updating, or deleting individual data items in real time (latencies on the order of milliseconds), but should also support some sort of interface for analyzing the data stored within. The dynamic nature of the typical use case for databases in this group implies a NoSQL interface, and either a key-value or document-store retrieval model. From the InformationWeek list, MongoDB, DynamoDB, Couchbase, and Datastax all fit in this category. Although there are some significant technical differences between these products, they can nonetheless be roughly described as potential replacements for each other in Group 1 use cases.

Group 2 (Hadoop distributions)
The products in this group are designed for very different situations than those in Group 1. Hadoop is typically used for large-scale data analysis and batch processing. Rather than inserting, retrieving, updating, or deleting individual data items, Hadoop is optimized for scanning through large swaths of data, processing and analyzing the data as it proceeds. Hadoop has become the poster child for “Big Data” due to its proven massive scalability, and its ability to handle the “variety” aspect of Big Data (since Hadoop does not require data to fit neatly into rows and columns in order to be analyzed and processed). From the InformationWeek list, Cloudera, Hortonworks, MapR, and Amazon EMR all fit in this category.
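The difference in access pattern between Group 1 and Group 2 can be illustrated with a minimal Python sketch. Nothing here is any vendor's API: the `store` dict, its records, and the `point_update` and `scan_total_by_country` helpers are all hypothetical stand-ins. A plain dict plays the role of a flexible-schema operational store, and a loop over every record plays the role of a batch scan.

```python
from collections import defaultdict

# Hypothetical records keyed by ID. Note that "user:2" carries an extra
# field: nothing forces every record to share one fixed schema.
store = {
    "user:1": {"name": "Ann", "country": "US", "purchases": 3},
    "user:2": {"name": "Bo", "country": "DE", "purchases": 5, "referrer": "ad"},
    "user:3": {"name": "Cy", "country": "US", "purchases": 2},
}


def point_update(key, field, value):
    """Group 1 style: read and modify a single item, addressed by its key."""
    store[key][field] = value
    return store[key]


def scan_total_by_country(records):
    """Group 2 style: pass over every record and aggregate as the scan proceeds."""
    totals = defaultdict(int)
    for record in records:
        totals[record["country"]] += record["purchases"]
    return dict(totals)


updated = point_update("user:2", "purchases", 6)
totals = scan_total_by_country(store.values())
print(updated["purchases"], totals)
```

The point update touches one record regardless of how large the store is, while the scan's cost grows with the total amount of data, which is why the two workloads tend to be served by different systems.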

Group 3 (real-time Hadoop-based analytical platforms)
Group 3 takes Hadoop to the next level, transforming it from a mere batch processing system into a full-fledged analytical platform that can answer queries in real time. Furthermore, by adding a more robust SQL interface to Hadoop (in addition to industry-standard ODBC connectors), Group 3 products help to hide the complexity of Hadoop and reduce the need for Hadoop specialists, since traditional business intelligence and visualization tools are now able to interface directly with data stored inside Hadoop. From the InformationWeek list, Hadapt clearly fits in this category and, with certain caveats, so does Cloudera Impala. The caveats, as of the time of writing, are that (a) Impala is an extremely young codebase and is still only in beta, and (b) Impala supports only a small subset of SQL and does not support UDFs or other ways to combine structured and unstructured data in the same query, so calling it an “analytical platform” might be a bit of a stretch.

Group 4 (Hadoop-based BI solutions)
Group 4 products are often lumped together with Group 3 products and mistaken for their competitors. However, just as business intelligence tools and analytical database solutions are highly complementary and were often packaged together in the pre-Hadoop world, the same is true in the Hadoop/Big Data world. Therefore, Datameer, Karmasphere, and Platfora, all of which function as a business intelligence layer above Hadoop, are capable of working closely with the Group 3 products (with announcements along these lines already starting to appear).
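The classification described above can be written down as a small lookup table, which also makes the product counts easy to check. This is only a sketch of one reading of the post. The variable names and the `competitors` helper are hypothetical; single-product vendors are labeled by vendor name; and Cloudera's Group 2 entry is assumed to mean CDH.

```python
# Vendors from the InformationWeek list and the products counted for each.
VENDOR_PRODUCTS = {
    "MongoDB": ["MongoDB"],
    "Amazon": ["Redshift", "EMR", "DynamoDB"],
    "Cloudera": ["CDH", "Impala"],
    "Couchbase": ["Couchbase"],
    "Datameer": ["Datameer"],
    "Datastax": ["Datastax"],
    "Hadapt": ["Hadapt"],
    "Hortonworks": ["Hortonworks"],
    "Karmasphere": ["Karmasphere"],
    "MapR": ["MapR"],
    "Neo Technology": ["Neo Technology"],
    "Platfora": ["Platfora"],
    "Splunk": ["Splunk"],
}

# Products set aside because their main competitors are not on the list.
EXCLUDED = {"Splunk", "Redshift", "Neo Technology"}

# Group number -> (description, member products).
GROUPS = {
    1: ("Operational data stores that allow flexible schemas",
        {"MongoDB", "DynamoDB", "Couchbase", "Datastax"}),
    2: ("Hadoop distributions",
        {"CDH", "Hortonworks", "MapR", "EMR"}),  # CDH assumed for "Cloudera"
    3: ("Real-time Hadoop-based analytical platforms",
        {"Hadapt", "Impala"}),
    4: ("Hadoop-based BI solutions",
        {"Datameer", "Karmasphere", "Platfora"}),
}

all_products = {p for products in VENDOR_PRODUCTS.values() for p in products}
remaining = all_products - EXCLUDED
grouped = set().union(*(members for _, members in GROUPS.values()))


def competitors(product):
    """Return the other products in the same group (potential replacements)."""
    for _, members in GROUPS.values():
        if product in members:
            return members - {product}
    return set()  # excluded or unknown product


print(len(VENDOR_PRODUCTS), len(all_products), len(remaining))
print(sorted(competitors("Hadapt")))
```

Under this reading, products in the same set are treated as potential replacements for each other, and products in different sets as complementary.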

In conclusion, although “Big Data” is an enormous and rapidly growing market, no single data management software product is going to rule it. Rather, there are four major groups of data management solutions within the Big Data space, and while there is fierce competition within each group, at the macro level these groups not only can coexist but are highly complementary. In the long run, it is likely that two or three leaders will emerge in each group and share the Big Data pie.


 



  • http://gravatar.com/cutlass2011 cutlass2011

    Missed off MarkLogic which is minimally in group 1, if not in a bunch of other groups …
