Spark: Big Data Cluster Computing in Production, 1st edition, by Ilya Ganelin, Ema Orhian, Kai Sasaki, Brennon York

Product details:
ISBN 10: 1119254019
ISBN 13: 9781119254010
Authors: Ilya Ganelin, Ema Orhian, Kai Sasaki, Brennon York
Production-targeted Spark guidance with real-world use cases
Spark: Big Data Cluster Computing in Production goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big-data clustering in production. Written by an expert team well known in the big data community, this book walks you through the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, database connectors, streaming, security, and much more.
Spark has become the tool of choice for many Big Data problems, with more active contributors than any other Apache Software Foundation project. General introductory books abound, but this book is the first to provide deep insight and real-world advice on using Spark in production. Specific guidance, expert tips, and invaluable foresight make this guide an incredibly useful resource for real production settings.
- Review Spark hardware requirements and estimate cluster size
- Gain insight from real-world production use cases
- Tighten security, schedule resources, and fine-tune performance
- Overcome common problems encountered using Spark in production
Spark works with other big data tools including MapReduce and Hadoop, and uses languages you already know like Java, Scala, Python, and R. Lightning speed makes Spark too good to pass up, but understanding limitations and challenges in advance goes a long way toward easing actual production implementation. Spark: Big Data Cluster Computing in Production tells you everything you need to know, with real-world production insight and expert guidance, tips, and tricks.
Table of contents:
Introduction
Chapter 1 Finishing Your Spark Job
Installation of the Necessary Components
Native Installation Using a Spark Standalone Cluster
The History of Distributed Computing That Led to Spark
Enter the Cloud
Understanding Resource Management
Using Various Formats for Storage
Text Files
Sequence Files
Avro Files
Parquet Files
Making Sense of Monitoring and Instrumentation
Spark UI
Spark Standalone UI
Metrics REST API
Metrics System
External Monitoring Tools
Summary
Chapter 2 Cluster Management
Background
Spark Components
Driver
Workers and Executors
Configuration
Spark Standalone
Architecture
Single-Node Setup Scenario
Multi-Node Setup
YARN
Architecture
Dynamic Resource Allocation
Scenario
Mesos
Setup
Architecture
Dynamic Resource Allocation
Basic Setup Scenario
Comparison
Summary
Chapter 3 Performance Tuning
Spark Execution Model
Partitioning
Controlling Parallelism
Partitioners
Shuffling Data
Shuffling and Data Partitioning
Operators and Shuffling
Shuffling Is Not That Bad After All
Serialization
Kryo Registrators
Spark Cache
Spark SQL Cache
Memory Management
Garbage Collection
Shared Variables
Broadcast Variables
Accumulators
Data Locality
Summary
Chapter 4 Security
Architecture
Security Manager
Setup Configurations
ACL
Configuration
Job Submission
Web UI
Network Security
Encryption
Event logging
Kerberos
Apache Sentry
Summary
Chapter 5 Fault Tolerance or Job Execution
Lifecycle of a Spark Job
Spark Master
Spark Driver
Spark Worker
Job Lifecycle
Job Scheduling
Scheduling within an Application
Scheduling with External Utilities
Fault Tolerance
Internal and External Fault Tolerance
Service Level Agreements (SLAs)
Resilient Distributed Datasets (RDDs)
Batch versus Streaming
Testing Strategies
Recommended Configurations
Summary
Chapter 6 Beyond Spark
Data Warehousing
Spark SQL CLI
Thrift JDBC/ODBC Server
Hive on Spark
Machine Learning
DataFrame
MLlib and ML
Mahout on Spark
Hivemall on Spark
External Frameworks
Spark Package
XGBoost
spark-jobserver
Future Works
Integration with the Parameter Server
Deep Learning
Enterprise Usage
Collecting User Activity Log with Spark and Kafka
Real-Time Recommendation with Spark
Real-Time Categorization of Twitter Bots
Summary
Index
EULA
Tags: Ilya Ganelin, Ema Orhian, Kai Sasaki, Brennon York, Big Data, Cluster Computing