Global Manufacturer Averts Data Swamp With New Data Lake Architecture


Overview

A multi-billion-dollar global manufacturer of electronic components, connectors, and sensors wanted to derive more value from its extensive data. The company had launched an initiative to use data as a strategic asset to transform the business. It had built a Hadoop data lake from multiple disparate sources of structured and unstructured data, yet it was unable to turn that data into actionable business insights. The Hadoop implementation was also showing performance problems. The organization turned to Antuit’s team of big data architects and engineers to improve the performance of its architecture and create a scalable platform for data consumption.

Antuit designed and implemented a new, stable, and scalable data lake in Hadoop by correcting data types, partitioning and compressing data, and adopting new file formats. The new data lake architecture is fast, reliable, and easily accessible to business decision makers.

Solution

Working collaboratively with the client, the Antuit team audited the existing process, and then re-engineered and implemented a robust, scalable architecture. The audit uncovered a number of challenges. The client’s existing architecture could not handle the 10+ years of sales and marketing data. The existing systems did not scale, and therefore were not prepared to handle the velocity or volume of data expected in the future. Some power users within the organization were executing highly complex queries that exceeded system limitations. Finally, the data systems themselves were housed and managed by disparate business units with minimal integration.

Mindful of the significant investment the client had made in its Hadoop architecture, Antuit recommended and then implemented a number of changes. The Antuit team helped the client restructure the data lake and created a better data process by partitioning and compressing data, using splittable file formats, and identifying and using the right data types. Antuit used multiple test environments to validate approaches and to identify ancillary technologies that, once integrated, would keep the data lake running smoothly and efficiently.
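
Why a splittable format matters can be illustrated with a small sketch. The Python below is a simplified, assumed model of line-oriented splitting, not Antuit's or the client's implementation: each worker is handed a byte range of one file and reads only the records that begin in or on the edge of that range, so several workers can process the same file in parallel without losing or duplicating records. The function names and sample rows are hypothetical.

```python
from typing import List


def read_split(data: bytes, start: int, end: int) -> List[bytes]:
    """Return the newline-delimited records owned by the byte range [start, end)."""
    pos = start
    if start != 0:
        # A non-initial split skips its first (possibly partial) line;
        # the previous split is responsible for reading it.
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1
    records = []
    # Keep reading while a record begins at or before `end`, so the record
    # that crosses (or starts exactly on) the boundary is read exactly once.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            newline = len(data)
        records.append(data[pos:newline])
        pos = newline + 1
    return records


def read_all_splits(data: bytes, split_size: int) -> List[bytes]:
    """Read a file as a sequence of independent, fixed-size splits."""
    records = []
    for start in range(0, len(data), split_size):
        records.extend(read_split(data, start, start + split_size))
    return records


# Hypothetical sales rows: year, region, amount.
sample = b"2015,EMEA,200\n2015,APAC,75\n2014,EMEA,120\n"
```

Whatever split size is chosen, `read_all_splits(sample, size)` returns the same three records. A format that cannot be cut at arbitrary offsets in this way forces a single worker to read the whole file.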

Results

While getting the data lake up and running was priority number one, changing internal behaviors and the way queries were written was an equally important challenge. Antuit established a new set of guidelines for internal users, showing them how to retrieve the data they need from the lake without bringing the entire system to a halt.

With the solid data architecture designed by Antuit in place, the organization now has accessible data at its fingertips. With a scalable data lake architecture and sales and marketing data models in place, the company is now able to utilize data as a strategic asset to help transform the way the business is run through advanced analytics. Antuit continues to work with the organization to build new analytical models that solve business problems and extract value from its growing data sets.

Key steps to improve Hadoop performance:

  • Partitioned large data sets by an effective key
  • Adopted splittable file formats such as SequenceFile, Avro, or RCFile
  • Applied data compression codecs such as Snappy and bzip2
  • Used appropriate data types in Hive tables
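
The first and third steps can be sketched on a local file system. The Python below is a minimal illustration under stated assumptions, not the client's Hadoop configuration: it writes rows into `key=value` directories (the directory naming convention used for Hive partitions) as bzip2-compressed CSV, so a query for one key value opens only that partition's file. The column names, file name `part-0.csv.bz2`, and sample rows are hypothetical.

```python
import bz2
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, root, key):
    """Write rows into key=value directories, one bzip2-compressed CSV each."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    for value, group in groups.items():
        part_dir = Path(root) / f"{key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # The partition key lives in the directory name, not in the file.
        fields = [name for name in group[0] if name != key]
        with bz2.open(part_dir / "part-0.csv.bz2", "wt", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)


def read_partition(root, key, value):
    """Read only the partition for one key value; other files are never opened."""
    path = Path(root) / f"{key}={value}" / "part-0.csv.bz2"
    with bz2.open(path, "rt", newline="") as fh:
        return list(csv.DictReader(fh))


# Hypothetical sales rows partitioned by year.
rows = [
    {"year": 2014, "region": "EMEA", "amount": 120},
    {"year": 2015, "region": "APAC", "amount": 75},
    {"year": 2015, "region": "EMEA", "amount": 200},
]

with tempfile.TemporaryDirectory() as root:
    write_partitioned(rows, root, "year")
    partitions = sorted(p.name for p in Path(root).iterdir())
    sales_2015 = read_partition(root, "year", 2015)
```

Because CSV carries no type information, `amount` comes back as a string here. That is one reason the last step, declaring appropriate column types in the table definition, matters alongside partitioning and compression.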
