Enterprise Data Warehouse Optimization with Hadoop on IBM Power Systems Servers

Scott Vetter, Helen Lu, Maciej Olejniczak, and others

Publisher Description

Data warehouses were developed for many good reasons, such as providing fast queries and reporting on business operations and business performance. However, over the years, the explosion of applications and data volume has made many existing data warehouses difficult to manage. Extract, Transform, and Load (ETL) processes are taking longer and missing their allocated batch windows. In addition, the data types that are required for business analysis have expanded from structured data to unstructured data.

The Apache open source Hadoop platform provides a great alternative for solving these problems.

IBM® has been committed to open source since the early years of Linux. Together, IBM and Hortonworks are more committed to Apache open source software than any other company.

IBM Power Systems™ servers are built with open technologies and are designed for mission-critical data applications. Power Systems servers use technology from the OpenPOWER Foundation, an open technology infrastructure that uses the IBM POWER® architecture to help meet the evolving needs of big data applications. Combining Power Systems with Hortonworks Data Platform (HDP) gives users a highly efficient platform that delivers leadership performance for big data workloads such as Hadoop and Spark.

This IBM Redpaper™ publication provides details about Enterprise Data Warehouse (EDW) optimization with Hadoop on Power Systems. Many people know Power Systems from the IBM AIX® platform but might not be familiar with IBM PowerLinux™, so part of this paper provides a Power Systems overview. A quick introduction to Hadoop is provided for readers who are not familiar with the topic. The paper also includes details of the HDP on Power reference architecture to help both software architects and infrastructure architects understand the design.

In the optimization chapter, we describe various topics: traditional EDW offload, sizing guidelines, performance tuning, IBM Elastic Storage™ Server (ESS) for data-intensive workloads, IBM Big SQL as the common structured query language (SQL) engine for the Hadoop platform, and tools that are available on Power Systems that are related to EDW optimization. We also dedicate some pages to the analytics components for the Hadoop infrastructure: IBM Data Science Experience (IBM DSX) and IBM Spectrum™ Conductor for Spark workloads.

GENRE: Computers & Internet
RELEASED: January 31, 2018
LANGUAGE: English
LENGTH: 82 pages
PUBLISHER: IBM Redbooks
SELLER: International Business Machines Corp
SIZE: 1.3 MB
