
Spark Big Data Cluster Computing in Production 1st edition by Ilya Ganelin, Ema Orhian, Kai Sasaki, Brennon York ISBN 1119254019 9781119254010

Name: Spark Big Data Cluster Computing in Production 1st edition by Ilya Ganelin, Ema Orhian, Kai Sasaki, Brennon York ISBN 1119254019 9781119254010
SKU: EB_108126
Availability: In stock

Original price: $70.00. Current price: $35.00.

Instant download of Spark Big Data Cluster Computing in Production (ISBN 1119254019) after payment

Category: Ebooks
Tags: Brennon, Cluster Computing, Ema, Ilya, Kai, Spark

Description

Format: Ebook (PDF), delivered as a full download after payment

Product details:

ISBN 10: 1119254019
ISBN 13: 9781119254010
Authors: Ilya Ganelin, Ema Orhian, Kai Sasaki, Brennon York

Production-targeted Spark guidance with real-world use cases

Spark: Big Data Cluster Computing in Production goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big-data clustering in production. Written by an expert team well-known in the big data community, this book walks you through the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, database connectors, streaming, security, and much more.

Spark has become the tool of choice for many Big Data problems, with more active contributors than any other Apache Software Foundation project. General introductory books abound, but this book is the first to provide deep insight and real-world advice on using Spark in production. Specific guidance, expert tips, and invaluable foresight make this guide an incredibly useful resource for real production settings.

Review Spark hardware requirements and estimate cluster size
Gain insight from real-world production use cases
Tighten security, schedule resources, and fine-tune performance
Overcome common problems encountered using Spark in production

Spark works with other big data tools including MapReduce and Hadoop, and uses languages you already know like Java, Scala, Python, and R. Lightning speed makes Spark too good to pass up, but understanding limitations and challenges in advance goes a long way toward easing actual production implementation. Spark: Big Data Cluster Computing in Production tells you everything you need to know, with real-world production insight and expert guidance, tips, and tricks.

Table of contents:

Introduction

Chapter 1 Finishing Your Spark Job

Installation of the Necessary Components

Native Installation Using a Spark Standalone Cluster

The History of Distributed Computing That Led to Spark

Enter the Cloud

Understanding Resource Management

Using Various Formats for Storage

Text Files

Sequence Files

Avro Files

Parquet Files

Making Sense of Monitoring and Instrumentation

Spark UI

Spark Standalone UI

Metrics REST API

Metrics System

External Monitoring Tools

Summary

Chapter 2 Cluster Management

Background

Spark Components

Driver

Workers and Executors

Configuration

Spark Standalone

Architecture

Single-Node Setup Scenario

Multi-Node Setup

YARN

Architecture

Dynamic Resource Allocation

Scenario

Mesos

Setup

Architecture

Dynamic Resource Allocation

Basic Setup Scenario

Comparison

Summary

Chapter 3 Performance Tuning

Spark Execution Model

Partitioning

Controlling Parallelism

Partitioners

Shuffling Data

Shuffling and Data Partitioning

Operators and Shuffling

Shuffling Is Not That Bad After All

Serialization

Kryo Registrators

Spark Cache

Spark SQL Cache

Memory Management

Garbage Collection

Shared Variables

Broadcast Variables

Accumulators

Data Locality

Summary

Chapter 4 Security

Architecture

Security Manager

Setup Configurations

ACL

Configuration

Job Submission

Web UI

Network Security

Encryption

Event Logging

Kerberos

Apache Sentry

Summary

Chapter 5 Fault Tolerance or Job Execution

Lifecycle of a Spark Job

Spark Master

Spark Driver

Spark Worker

Job Lifecycle

Job Scheduling

Scheduling within an Application

Scheduling with External Utilities

Fault Tolerance

Internal and External Fault Tolerance

Service Level Agreements (SLAs)

Resilient Distributed Datasets (RDDs)

Batch versus Streaming

Testing Strategies

Recommended Configurations

Summary

Chapter 6 Beyond Spark

Data Warehousing

Spark SQL CLI

Thrift JDBC/ODBC Server

Hive on Spark

Machine Learning

DataFrame

MLlib and ML

Mahout on Spark

Hivemall on Spark

External Frameworks

Spark Package

XGBoost

spark-jobserver

Future Works

Integration with the Parameter Server

Deep Learning

Enterprise Usage

Collecting User Activity Log with Spark and Kafka

Real-Time Recommendation with Spark

Real-Time Categorization of Twitter Bots

Summary

Index

EULA
