Big Data

Performance Benchmarking

Hadoop evolved as a distributed software platform for managing and transforming large quantities of data, and has grown to be one of the most popular tools for meeting many of the above needs cost-effectively. By abstracting away many of the high-availability (HA) and distributed-programming concerns, Hadoop allows developers to focus on higher-level algorithms. Hadoop is designed to run on a large cluster of commodity servers and to scale to hundreds or thousands of nodes. Each disk, server, network link, and even rack within the cluster is assumed to be unreliable. This assumption allows the use of the least expensive cluster components consistent with delivering sufficient performance, including unprotected local storage (JBOD, "just a bunch of disks"). Hadoop's design and its ability to handle large amounts of data efficiently make it a natural fit as an integration, data transformation, and analytics platform.
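As a concrete illustration of how Hadoop hides the distributed-programming details, the following is a minimal sketch of the classic MapReduce word-count job written against Hadoop's standard org.apache.hadoop.mapreduce API. The class names and the command-line input/output paths are illustrative only; the point is that the developer writes just the map and reduce functions plus the job configuration, while the framework handles input splitting, task scheduling, retries on failed nodes, and the shuffle between phases.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    // Job configuration: the framework takes care of distribution and HA.
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Such a job would typically be packaged into a JAR and submitted with the hadoop jar command, after which the cluster scales the same application code from a handful of nodes to thousands without changes.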

Hadoop use cases include:

  • Customizing content for users: Creating a better user experience through targeted and relevant ads, personalized home pages, and good recommendations.
  • Supply chain management: Examining all available historical data enables better decisions for stocking and managing goods. Among the many sectors in this category are retail, agriculture, hospitals, and energy.
  • Fraud analysis: Analyzing transaction histories in the financial sector (for example, for credit cards and ATMs) to detect fraud.
  • Bioinformatics: Applying genome analytics and DNA sequencing algorithms to large datasets.
  • Beyond analytics: Transforming data from one form to another, including adding structure to unstructured data in log files before combining it with other structured data (see the sketch after this list).
  • Miscellaneous uses: Aggregating large sets of images from various sources and combining them into one (for example, satellite images), and moving large amounts of data from one location to another.
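To illustrate the "beyond analytics" use case, here is a hedged sketch of a map-only Hadoop job that adds structure to raw web-server log lines, emitting tab-separated fields that later jobs could join with other structured data. The assumed log layout (Apache common log format), the regular expression, and the class names are assumptions made for illustration, not details from the text.

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogStructurer {

  // Mapper: parses an access-log line (assumed common log format:
  // host ident user [timestamp] "request" status bytes) and emits
  // tab-separated fields as structured output.
  public static class ParseMapper
      extends Mapper<Object, Text, Text, NullWritable> {
    private static final Pattern LOG_LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");
    private final Text out = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      Matcher m = LOG_LINE.matcher(value.toString());
      if (m.find()) {
        // host \t timestamp \t request \t status \t bytes
        out.set(String.join("\t",
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)));
        context.write(out, NullWritable.get());
      }
      // Unparseable lines are silently dropped in this sketch.
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "log structuring");
    job.setJarByClass(LogStructurer.class);
    job.setMapperClass(ParseMapper.class);
    job.setNumReduceTasks(0); // map-only: no shuffle or reduce phase needed
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the transformation is embarrassingly parallel, setting the number of reduce tasks to zero avoids an unnecessary shuffle and writes the mapper output directly to the distributed file system.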
