Spark in Action

Book Description:

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0.

about the technology

Spark is a powerful general-purpose analytics engine that can handle massive amounts of data distributed across clusters of thousands of servers. Because it is optimized to run in memory, it can process data up to 100x faster than most Hadoop-based systems. Spark’s support for SQL, along with its ability to run repeated queries rapidly and adapt quickly to modified queries, makes it well suited for machine learning, which is so important in this age of big data. Whether you’re using Java, Scala, or Python, Spark offers straightforward APIs to access its core features.

about the book

Spark in Action, Second Edition is an entirely new book that teaches you everything you need to create end-to-end analytics pipelines in Spark. Rewritten from the ground up with lots of helpful graphics, it shows you the roles of DAGs and dataframes, the advantages of “lazy evaluation”, and how to ingest data from files, databases, and streams.

By working through carefully designed Java-based examples, you’ll delve into Spark SQL, interface with Python, and cache and checkpoint your data. Along the way, you’ll learn to interact with common enterprise data technologies like HDFS and file formats like Parquet, ORC, and Avro.

You’ll also discover interesting Spark use cases, such as interactive reporting, machine learning pipelines, and even monitoring players in online games, and you’ll get a quick look at machine learning techniques you can apply without a Ph.D. in mathematics! All examples are available on GitHub for you to explore and adapt as you learn. Demand for Spark-savvy developers is so high that they’re among the highest paid in the industry today!

what’s inside

Lots of examples based on the Spark Java APIs using real-life datasets and scenarios
Examples based on Spark v2.3
Ingestion through files, databases, and streaming
Building custom ingestion processes
Querying distributed datasets with Spark SQL
Deploying Spark applications
Caching and checkpointing your data
Interfacing with data scientists using Python
Applied machine learning
Spark use cases including Lumeris, CERN, and IBM

about the reader

For beginning to intermediate developers and data engineers comfortable programming in Java. No experience with functional programming, Scala, Spark, Hadoop, or big data is required.

about the author

An experienced consultant and entrepreneur passionate about all things data, Jean-Georges Perrin was the first IBM Champion in France, an honor he’s now held for ten consecutive years. Jean-Georges has managed many teams of software and data engineers.
