Learning Spark 2nd Edition by Jules Damji, Brooke Wenig, Tathagata Das, Denny Lee – Ebook PDF Instant Download/Delivery: 1492050040, 978-1492050049
Full download of Learning Spark 2nd Edition after payment
Product details:
ISBN 10: 1492050040
ISBN 13: 978-1492050049
Authors: Jules Damji, Brooke Wenig, Tathagata Das, Denny Lee
Data is getting bigger, arriving faster, and coming in varied formats, and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark.
Updated to emphasize new features in Spark 2.4, this second edition shows data engineers and scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine-learning algorithms. Through discourse, code snippets, and notebooks, you’ll be able to:
- Learn Python, SQL, Scala, or Java high-level APIs: DataFrames and Datasets
- Peek under the hood of the Spark SQL engine to understand Spark transformations and performance
- Inspect, tune, and debug your Spark operations with Spark configurations and Spark UI
- Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
- Perform analytics on batch and streaming data using Structured Streaming
- Build reliable data pipelines with open source Delta Lake and Spark
- Develop machine learning pipelines with MLlib and productionize models using MLflow
- Use open source Pandas framework Koalas and Spark for data transformation and feature engineering
Learning Spark 2nd Edition Table of contents:
1. Introduction to Apache Spark: A Unified Analytics Engine
- The Genesis of Spark
- Big Data and Distributed Computing at Google
- Hadoop at Yahoo!
- Spark’s Early Years at AMPLab
- What Is Apache Spark?
- Apache Spark Components as a Unified Stack
- Apache Spark’s Distributed Execution
- The Developer’s Experience
- Who Uses Spark, and for What?
- Community Adoption and Expansion
2. Downloading Apache Spark and Getting Started
- Step 1: Downloading Apache Spark
- Spark’s Directories and Files
- Step 2: Using the Scala or PySpark Shell
- Step 3: Understanding Spark Application Concepts
3. Apache Spark’s Structured APIs
- Spark: What’s Underneath an RDD?
- Structuring Spark
- Key Merits and Benefits
- The DataFrame API
4. Spark SQL and DataFrames: Introduction to Built-in Data Sources
- Using Spark SQL in Spark Applications
- Basic Query Examples
- SQL Tables and Views
5. Spark SQL and DataFrames: Interacting with External Data Sources
- Spark SQL and Apache Hive
- User-Defined Functions
- Querying with the Spark SQL Shell, Beeline, and Tableau
- External Data Sources
- Other External Sources
- Higher-Order Functions in DataFrames and Spark SQL
- Built-in Functions for Complex Data Types
6. Spark SQL and Datasets
- Single API for Java and Scala
- Scala Case Classes and JavaBeans for Datasets
- Working with Datasets
- Memory Management for Datasets and DataFrames
- Serialization and Deserialization (SerDe)
7. Optimizing and Tuning Spark Applications
- Optimizing and Tuning Spark for Efficiency
- Viewing and Setting Apache Spark Configurations
- Scaling Spark for Large Workloads
- Caching and Persistence of Data
8. Structured Streaming
- Evolution of the Apache Spark Stream Processing Engine
- The Advent of Micro-Batch Stream Processing
- Lessons Learned from Spark Streaming (DStreams)
- The Philosophy of Structured Streaming
- The Programming Model of Structured Streaming
9. Building Reliable Data Lakes with Apache Spark
- The Importance of an Optimal Storage Solution
- Building Lakehouses with Apache Spark and Delta Lake
- Configuring Apache Spark with Delta Lake
- Loading Data into a Delta Lake Table
10. Machine Learning with MLlib
- What Is Machine Learning?
- Supervised Learning
- Unsupervised Learning
- Why Spark for Machine Learning?
- Designing Machine Learning Pipelines
11. Managing, Deploying, and Scaling Machine Learning Pipelines with Apache Spark
- Model Management
- Model Deployment Options with MLlib
- Model Export Patterns for Real-Time Inference
- Leveraging Spark for Non-MLlib Models
- Summary
12. Epilogue: Apache Spark 3.0
- Spark Core and Spark SQL
- Dynamic Partition Pruning
- Adaptive Query Execution
- Accelerator-Aware Scheduler
- Structured Streaming