Conquering Big Data with High Performance Computing 1st edition by Ritu Arora – Ebook PDF Instant Download/Delivery: 3319337424, 9783319337425
Full download of Conquering Big Data with High Performance Computing 1st edition available after payment
Product details:
ISBN 10: 3319337424
ISBN 13: 9783319337425
Author: Ritu Arora
This book provides an overview of the resources and research projects that are bringing Big Data and High Performance Computing (HPC) on converging tracks. It demystifies Big Data and HPC for the reader by covering the primary resources, middleware, applications, and tools that enable the usage of HPC platforms for Big Data management and processing. Through interesting use-cases from traditional and non-traditional HPC domains, the book highlights the most critical challenges related to Big Data processing and management, and shows ways to mitigate them using HPC resources. Unlike most books on Big Data, it covers a variety of alternatives to Hadoop, and explains the differences between HPC platforms and Hadoop. Written by professionals and researchers in a range of departments and fields, this book is designed for anyone studying Big Data and its future directions. Those studying HPC will also find the content valuable.
Conquering Big Data with High Performance Computing 1st Edition Table of Contents:
1 An Introduction to Big Data, High Performance Computing, High-Throughput Computing, and Hadoop
1.1 Big Data
1.2 High Performance Computing (HPC)
1.2.1 HPC Platform
1.2.2 Serial and Parallel Processing on HPC Platform
1.3 High-Throughput Computing (HTC)
1.4 Hadoop
1.4.1 Hadoop-Related Technologies
1.4.2 Some Limitations of Hadoop and Hadoop-Related Technologies
1.5 Convergence of Big Data, HPC, HTC, and Hadoop
1.6 HPC and Big Data Processing in Cloud and at Open-Science Data Centers
1.7 Conclusion
References
2 Using High Performance Computing for Conquering Big Data
2.1 Introduction
2.2 The Big Data Life Cycle
2.3 Technologies and Hardware Platforms for Managing the Big Data Life Cycle
2.4 Managing Big Data Life Cycle on HPC Platforms at Open-Science Data Centers
2.4.1 TACC Resources and Usage Policies
2.4.2 End-to-End Big Data Life Cycle on TACC Resources
2.5 Use Case: Optimization of Nuclear Fusion Devices
2.5.1 Optimization
2.5.2 Computation on HPC
2.5.3 Visualization Using GPUs
2.5.4 Permanent Storage of Valuable Data
2.6 Conclusions
References
3 Data Movement in Data-Intensive High Performance Computing
3.1 Introduction
3.2 Node-Level Data Movement
3.2.1 Case Study: ADAMANT
3.2.2 Case Study: Energy Cost of Data Movement
3.3 System-Level Data Movement
3.3.1 Case Study: Graphs
3.3.2 Case Study: MapReduce
3.4 Center-Level Data Movement
3.4.1 Case Study: Spider
3.4.2 Case Study: Gordon and Oasis
3.5 About the Authors
References
4 Using Managed High Performance Computing Systems for High-Throughput Computing
4.1 Introduction
4.2 What Are We Trying to Do?
4.2.1 Deductive Computation
4.2.2 Inductive Computation
4.2.2.1 High-Throughput Computing
4.3 Hurdles to Using HPC Systems for HTC
4.3.1 Runtime Limits
4.3.2 Jobs-in-Queue Limits
4.3.3 Dynamic Job Submission Restrictions
4.3.4 Solutions from Resource Managers and Big Data Research
4.3.5 A Better Solution for Managed HPC Systems
4.4 Launcher
4.4.1 How Launcher Works
4.4.2 Guided Example: A Simple Launcher Bundle
4.4.2.1 Step 1: Create a Job File
4.4.2.2 Step 2: Build a SLURM Batch Script
4.4.3 Using Various Scheduling Methods
4.4.3.1 Dynamic Scheduling
4.4.3.2 Static Scheduling
4.4.4 Launcher with Intel® Xeon Phi™ Coprocessors
4.4.4.1 Offload
4.4.4.2 Independent Workloads for Host and Coprocessor
4.4.4.3 Symmetric Execution on Host and Phi
4.4.5 Use Case: Molecular Docking and Virtual Screening
4.5 Conclusion
References
5 Accelerating Big Data Processing on Modern HPC Clusters
5.1 Introduction
5.2 Overview of Apache Hadoop and Spark
5.2.1 Overview of Apache Hadoop Distributed File System
5.2.2 Overview of Apache Hadoop MapReduce
5.2.3 Overview of Apache Spark
5.3 Overview of High-Performance Interconnects and Storage Architecture on Modern HPC Clusters
5.3.1 Overview of High-Performance Interconnects and Protocols
5.3.1.1 Overview of High Speed Ethernet
5.3.1.2 Overview of InfiniBand
5.3.2 Overview of High-Performance Storage
5.4 Challenges in Accelerating Big Data Processing on Modern HPC Clusters
5.5 Case Studies of Accelerating Big Data Processing on Modern HPC Clusters
5.5.1 Accelerating HDFS with RDMA
5.5.2 Accelerating HDFS with Heterogeneous Storage
5.5.3 Accelerating HDFS with Lustre Through Key-Value Store-Based Burst Buffer System
5.5.4 Accelerating Hadoop MapReduce with RDMA
5.5.5 Accelerating MapReduce with Lustre
5.5.6 Accelerating Apache Spark with RDMA
5.6 High-Performance Big Data (HiBD) Project
5.7 Conclusion
References
6 dispel4py: Agility and Scalability for Data-Intensive Methods Using HPC
6.1 Introduction
6.2 Motivation
6.2.1 Supporting Domain Specialists
6.2.2 Supporting Data Scientists
6.2.3 Supporting Data-Intensive Engineers
6.2.4 Communication Between Experts
6.3 Background and Related Work
6.4 Semantics, Examples and Tutorial
6.5 dispel4py Tools
6.5.1 Registry
6.5.2 Provenance Management
6.5.3 Diagnosis Tool
6.6 Engineering Effective Mappings
6.6.1 Apache Storm
6.6.2 MPI
6.6.3 Multiprocessing
6.6.4 Spark
6.6.5 Sequential Mode
6.7 Performance
6.7.1 Experiments
6.7.2 Experimental Results
6.7.2.1 Scalability Experiments
6.7.2.2 Performance Experiments
6.7.3 Analysis of Measurements
6.8 Summary and Future Work
References
7 Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters
7.1 Introduction
7.2 Related Work
7.3 Design and Implementation
7.4 Case Study: PTF Application
7.4.1 PTF Application
7.4.2 Execution Time Analysis
7.4.3 Data Dependency Performance Analysis
7.4.3.1 Analysis of Saved Objects
7.4.3.2 Analysis of Galactic Latitude
7.5 Case Study: Job Log Analysis
7.5.1 Job Logs
7.5.2 Test Setup
7.5.3 Job Log Analysis
7.5.4 Clustering Analysis
7.6 Conclusion
References
8 Big Data Behind Big Data
8.1 Background and Goals of the Project
8.1.1 The Many Faces of Data
8.1.2 Data Variety and Location
8.1.3 The Different Consumers of the Data
8.2 What Big Data Did We Have?
8.2.1 Collected Data
8.2.2 Data-in-Flight
8.2.3 Data-at-Rest
8.2.4 Data-in-Growth
8.2.5 Event Data
8.2.6 Data Types to Collect
8.3 The Old Method Prompts a New Solution
8.3.1 Environmental Data
8.3.2 Host Based Data
8.3.3 Refinement of the Goal
8.4 Out with the Old, in with the New Design
8.4.1 Elastic
8.4.2 Data Collection
8.4.2.1 Collectd
8.4.2.2 Custom Scripts
8.4.2.3 Filebeats
8.4.3 Data Transport Components
8.4.3.1 RabbitMQ®
8.4.3.2 Logstash
8.4.4 Data Storage
8.4.4.1 Elasticsearch
8.4.5 Visualization and Analysis
8.4.5.1 Kibana
8.4.6 Future Growth and Enhancements
8.5 Data Collected
8.5.1 Environmental
8.5.2 Computational
8.5.3 Event
8.6 The Analytics of It All: It Just Works!
8.7 Conclusion
References
9 Empowering R with High Performance Computing Resources for Big Data Analytics
9.1 Introduction
9.1.1 Introduction of R
9.1.2 Motivation of Empowering R with HPC
9.2 Opportunities in High Performance Computing to Empower R
9.2.1 Parallel Computation Within a Single Compute Node
9.2.2 Multi-Node Parallelism Support
9.3 Support for Parallelism in R
9.3.1 Support for Parallel Execution Within a Single Node in R
9.3.2 Support for Parallel Execution Over Multiple Nodes with MPI
9.3.3 Packages Utilizing Other Distributed Systems
9.4 Parallel Performance Comparison of Selected Packages
9.4.1 Performance of Using Intel® Xeon Phi Coprocessor
9.4.1.1 Testing Workloads
9.4.1.2 System Specification
9.4.1.3 Results and Discussion
9.4.2 Comparison of Parallel Packages in R
9.5 Use Case Examples
9.5.1 Enabling JAGS (Just Another Gibbs Sampler) on Multiple Nodes
9.5.2 Exemplar Application Using Coprocessors
9.6 Discussions and Conclusion
References
10 Big Data Techniques as a Solution to Theory Problems
10.1 Introduction
10.2 General Formulation of Big Data Solution Method
10.2.1 General Formulation of Class of Models and Solution Method
10.2.2 Computational Steps to Big Data Solution Method
10.2.3 Virtues of Equidistributed Sequences
10.3 Optimal Tax Application
10.4 Other Applications
10.5 Conclusion
References
11 High-Frequency Financial Statistics Through High-Performance Computing
11.1 Introduction
11.2 Large Portfolio Allocation for High-Frequency Financial Data
11.2.1 Background
11.2.2 Our Methods
11.3 Parallelism Considerations
11.3.1 Parallel R
11.3.2 Intel® MKL
11.3.3 Offloading to Phi Coprocessor
11.3.4 Our Computing Environment
11.4 Numerical Studies
11.4.1 Portfolio Optimization with High-Frequency Data
11.4.1.1 LASSO Approximation for Risk Minimization Problem
11.4.1.2 Parallelization
11.4.2 Bayesian Large-Scale Multiple Testing for Time Series Data
11.4.2.1 Hidden Markov Model and Multiple Hypothesis Testing
11.4.2.2 Parallelization
11.5 Discussion and Conclusions
References
12 Large-Scale Multi-Modal Data Exploration with Human in the Loop
12.1 Background
12.2 Details of Implementation Models
12.2.1 Developing Top-Down Knowledge Hypotheses From Visual Analysis of Multi-Modal Data Streams
12.2.1.1 Color-Based Representation of Temporal Events
12.2.1.2 Generating Logical Conjunctions
12.2.1.3 Developing Hypotheses From Visual Analysis
12.2.2 Complementing Hypotheses with Bottom-Up Quantitative Measures
12.2.2.1 Clustering in 2D Event Space
12.2.2.2 Apriori-Like Pattern Searching
12.2.2.3 Integrating Bottom-Up Machine Computation and Top-Down Domain Knowledge
12.2.3 Large-Scale Multi-Modal Data Analytics with Iterative MapReduce Tasks
12.2.3.1 Parallelization Choices
12.2.3.2 Parallel Temporal Pattern Mining Using Twister MapReduce Tasks
12.3 Preliminary Results
12.4 Conclusion and Future Work
References
13 Using High Performance Computing for Detecting Duplicate, Similar and Related Images in a Large Data Collection
13.1 Introduction
13.2 Challenges in Using Existing Solutions
13.3 New Solution for Large-Scale Image Comparison
13.3.1 Pre-processing Stage
13.3.2 Processing Stage
13.3.3 Post-processing Stage
13.4 Test Collection from the Institute of Classical Archaeology (ICA)
13.5 Testing the Solution on Stampede: Early Results and Current Limitations
13.6 Future Work
13.7 Conclusion
References
14 Big Data Processing in the eDiscovery Domain
14.1 Introduction to eDiscovery
14.2 Big Data Challenges in eDiscovery
14.3 Key Techniques Used to Process Big Data in eDiscovery
14.3.1 Culling to Reduce Dataset Size
14.3.2 Metadata Extraction
14.3.3 Dataset Partitioning and Archival
14.3.4 Sampling and Workload Profiling
14.3.5 Multi-Pass (Iterative) Processing and Interactive Analysis
14.3.6 Search and Review Methods
14.3.7 Visual Analytics
14.3.8 Software Refactoring and Parallelization
14.4 Limitations of Existing eDiscovery Solutions
14.5 Using HPC for eDiscovery
14.5.1 Data Collection and Data Ingestion
14.5.2 Pre-processing
14.5.3 Processing
14.5.4 Review and Analysis
14.5.5 Archival
14.6 Accelerating the Rate of eDiscovery Using HPC: A Case Study
14.7 Conclusions and Future Direction
References
15 Databases and High Performance Computing
15.1 Introduction
15.2 Databases on Supercomputing Resources
15.2.1 Relational Databases
15.2.2 NoSQL or Non-relational and Hadoop Databases
15.2.3 Graph Databases
15.2.4 Scientific and Specialized Databases
15.3 Installing a Database on a Supercomputing Resource
15.4 Accessing a Database on Supercomputing Resources
15.5 Optimizing Database Access on Supercomputing Resources
15.6 Examples of Applications Using Databases on Supercomputing Resources
15.7 Conclusion
References
16 Conquering Big Data Through the Usage of the Wrangler Supercomputer
16.1 Introduction
16.1.1 Wrangler System Overview
16.1.2 A New User Community for Supercomputers
16.2 First Use-Case: Evolution of Monogamy
16.3 Second Use-Case: Save Money, Save Energy with Supercomputers
16.4 Third Use-Case: Human Origins in Fossil Data
16.5 Fourth Use-Case: Dark Energy of a Million Galaxies
16.6 Conclusion
References