Advanced Analytics with Spark: Patterns for Learning from Data at Scale
Product Description
In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.
You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find these patterns useful for working on your own data applications.
Patterns include:
- Recommending music and the Audioscrobbler data set
- Predicting forest cover with decision trees
- Anomaly detection in network traffic with K-means clustering
- Understanding Wikipedia with Latent Semantic Analysis
- Analyzing co-occurrence networks with GraphX
- Geospatial and temporal data analysis on the New York City Taxi Trips data
- Estimating financial risk through Monte Carlo simulation
- Analyzing genomics data and the BDG project
- Analyzing neuroimaging data with PySpark and Thunder









