Hadoop Hits the Wall of Reality

    

The big news in the big data ecosystem of late is the merger between Cloudera and Hortonworks. As expected, opinions vary widely on what the unification means, both for the merged company and for the future of Hadoop.

I attended the Strata + Hadoop World Conference in New York City in the fall of 2012, when Hadoop was still the hot new technology capturing all of the hype and excitement in the data processing world. Presenters from all of the major Hadoop vendors raved about the bright future of the technology and predicted it would become the centerpiece of data processing for large enterprises. Since then, we have seen a steady shakeout of commercial Hadoop offerings as the technology has faced strong headwinds that have prevented widespread adoption.

So, what does the Cloudera-Hortonworks merger mean for Hadoop? A few things:

First, the impetus for the merger is survival. Neither company is currently profitable, and combining their customer bases while reducing redundant headcount should improve their outlook.

Second, in terms of overall customer adoption, the commercial Hadoop market will be reduced to two firms: one dominant (Cloudera) and one niche (MapR). Cloudera, Hortonworks and MapR have all been trying to position themselves as something more than Hadoop vendors, focusing on data science, the Spark framework, data integration tools and databases. I expect this to continue.

I believe the marriage of Cloudera and Hortonworks will be a rough one in the beginning.  The companies have long had an antagonistic relationship in the marketplace, and both firms have tried to differentiate themselves by developing competing components in the Hadoop ecosystem.  One of the biggest strengths of the Hadoop architecture is that it’s a modular system that allows you to replace almost any part of the ‘standard’ stack with alternative tools.  This has also been one of its biggest weaknesses from a commercial standpoint. 

When it comes to individual parts of the Hadoop ecosystem, these are the main conflicts I see arising as the new firm rationalizes its software stack:

  • Admin tools: Cloudera Manager vs. Apache Ambari
  • SQL Engines: Cloudera Impala vs. Apache Hive LLAP
  • Security: Apache Sentry vs. Apache Ranger
  • Data Integration Tools: StreamSets (with which Cloudera has had a very close relationship) vs. Apache NiFi (which Hortonworks has sold as Hortonworks DataFlow)

Cloudera and Hortonworks have also had different philosophies: Hortonworks has pushed a 100% open source software model, while Cloudera has selectively held some critical components back as proprietary code, most notably its Cloudera Manager admin tool. Cloudera will become the dominant player in the combined firm, and it will be interesting to see which of those philosophies wins and what the fallout could be if ex-Hortonworks customers do not want to buy into Cloudera’s hybrid proprietary/open source approach to software.

The other interesting question for data professionals is: how does this news affect other parts of the big data world?

In short, not much. Companies that have already invested in Hadoop will continue to use it, but I don’t see the market for Hadoop expanding significantly going forward. The software, while powerful, is still not all that easy to configure or use. A shortage of people with Hadoop skills has also played a major role in curtailing expansion of the market.

Hadoop was a paradigm-breaking technology when it was created in the 2000s. The idea of clustering commodity hardware with local storage and turning it into a single, expandable namespace for data and compute resources was revolutionary at the time. And being able to process petabytes of data on a single cluster was unheard of before then.

Since Hadoop was introduced, the rise of public cloud providers like Amazon Web Services and Microsoft Azure has changed the game. Now, not only can you create storage volumes of virtually unlimited size and build compute clusters to access that data, but you can do it quickly and cost-effectively without requiring physical servers or even a data center.

From an innovation standpoint, the big data tools that have made a splash in the marketplace in recent years were not developed at either Cloudera or Hortonworks. The two companies have created or bought interesting technologies like Kudu (Cloudera) or NiFi (Hortonworks), but the projects that have really caught on, like Spark, were developed elsewhere. When I work with our customers to design new data platforms, I incorporate tools like Spark and Kafka alongside cloud-native storage and compute solutions more often than I recommend HDFS or Hive.

In summary, I believe that the merger of Cloudera and Hortonworks will yield benefits for the combined company, but it’s also a signal that the commercial Hadoop software market is not strong enough for two major players to thrive. Hadoop will survive, but it will continue to cede both market share and technical leadership to other players in the crowded data market.

If you’re looking to build a modern data platform, let Antuit’s team of experienced technologists help you navigate today’s complex data landscape. Our architects combine deep experience in data systems design with the latest technologies to create a future-proofed architecture that enables your organization to become data-driven. 

Disclosure: I used to work for a competing Hadoop vendor (Pivotal). As of this writing I hold no financial positions in either Cloudera or Hortonworks and have no intention of acquiring any in the future.
