Big data is a big deal. We are generating, capturing and managing new data from new sources, in immense volumes and at unprecedented rates. Our virtualized environments, media streaming services, cloud-based infrastructures and distributed workforces demand more from that data, and they demand it now.
Historically, one of the major challenges for data scientists has been feeding CPUs with data fast enough to reduce idle time and fully utilize these expensive resources. CPU idle time is not only inefficient; it also stands in the way of real-time, actionable results. Whether your organization is using open source Hadoop®, privately held Aerospike, Cassandra, or other databases, attaining the benefits of real-time analytics requires faster storage than hard disk drives (HDDs).
The Hadoop Example
Micron’s test environment for Hadoop used one node running KVM to virtualize the servers hosting the NameNode, Secondary NameNode, Resource Manager, ZooKeeper, Hive and the Ambari server. Four additional servers acted as the DataNodes. The network switch was a 48-port 10GbE switch running Cumulus Linux 3.4.2.
The Hadoop cluster software consisted of an HDP 3.0 Hive database on HDFS/YARN, deployed on two separate four-node clusters. The two clusters differed only in storage: one used a group of 15K SAS HDDs, and the other used the same HDD configuration plus a single Micron 9200MAX NVMe SSD in each node, with the YARN cache redirected to the NVMe SSD.
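As a rough illustration of what redirecting the YARN cache could look like, the sketch below builds a Hadoop-style `<property>` entry. It assumes that the "YARN cache" corresponds to the NodeManager local directories controlled by `yarn.nodemanager.local-dirs`; the source does not name the property, and the mount point `/mnt/nvme0/yarn/local` is purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical NVMe mount point; the source does not give an actual path.
NVME_LOCAL_DIR = "/mnt/nvme0/yarn/local"

# Assumption: the "YARN cache" maps to the NodeManager local directories.
YARN_LOCAL_DIRS_KEY = "yarn.nodemanager.local-dirs"


def build_property(name: str, value: str) -> ET.Element:
    """Build a <property> element in the layout used by Hadoop *-site.xml files."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


# Serialize the element to a string that could be pasted into yarn-site.xml.
snippet = ET.tostring(
    build_property(YARN_LOCAL_DIRS_KEY, NVME_LOCAL_DIR), encoding="unicode"
)
print(snippet)
```

On an Ambari-managed cluster such as the one described, a setting like this would more likely be changed through Ambari than by editing the file by hand.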
To ensure that the tests measured storage I/O rather than data served from memory, the database size-to-memory ratio was targeted at about 2-to-1 (2TB of data with an aggregate cluster memory of 822GB available after operating system overhead). The result: a faster and more efficient Hadoop database.
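The sizing figures above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1TB = 1000GB), which the source does not specify, and the function name is illustrative.

```python
DATASET_GB = 2 * 1000       # 2TB of data, assuming decimal units
CLUSTER_MEMORY_GB = 822     # aggregate memory after operating system overhead


def data_to_memory_ratio(dataset_gb: float, memory_gb: float) -> float:
    """Return how many times larger the data set is than available memory."""
    return dataset_gb / memory_gb


ratio = data_to_memory_ratio(DATASET_GB, CLUSTER_MEMORY_GB)
in_memory_share = CLUSTER_MEMORY_GB / DATASET_GB

print(f"data-to-memory ratio: {ratio:.2f}-to-1")
print(f"share of data that could fit in memory: {in_memory_share:.0%}")
```

With these figures the ratio works out somewhat above the 2-to-1 target, so well under half of the data set could be held in memory at once and the remainder has to come from storage.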
Micron Manufacturing IT Hadoop Case Study
The strong results in our testing led Micron IT to deploy Hadoop on our own SSDs in the production storage cluster we use for manufacturing analytics, resulting in much more efficient performance at minimal cost.
- Accelerating the Apache Hadoop 3.1-based Ecosystem with Flash Storage
- Micron and Hortonworks Power Advanced Big Data Solutions
- Micron Apache Hadoop Analytics Spark Reference Architecture
- How to Give Your Big Data the Big Performance It Requires
- Micron Accelerated Solutions for Red Hat Ceph Storage
- AI Matters: Getting to the Heart of Data Intelligence with Memory and Storage