By Padma Priya Chitturi
- Use Apache Spark for data processing with these hands-on recipes
- Implement end-to-end, large-scale data science better than ever before
- Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas to gain insights from your data
Spark has emerged as the most promising big data analytics engine for data science professionals. The real power and value of Apache Spark lies in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw unstructured data sets with ease.
This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using Spark's data science libraries such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.
What You Will Learn
- Explore the topics of data mining, text mining, Natural Language Processing, information retrieval, and machine learning.
- Solve real-world analytical problems with large data sets.
- Address data science challenges with analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale.
- Get hands-on experience with algorithms like classification, regression, and recommendation on real datasets using the Spark MLlib package.
- Learn about numerical and scientific computing using NumPy and SciPy on Spark.
- Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models.
About the Author
Padma Priya Chitturi is Analytics Lead at Fractal Analytics Pvt Ltd and has over five years of experience in big data processing. Currently, she is part of capability development at Fractal and responsible for solution development for analytical problems across multiple business domains at large scale. Prior to this, she worked for an airlines product on a real-time processing platform serving one million user requests/sec at Amadeus Software Labs. She has worked on realizing large-scale deep networks (Jeffrey Dean's work at Google Brain) for image classification on the big data platform Spark. She works closely with big data technologies such as Spark, Storm, Cassandra, and Hadoop. She was an open source contributor to Apache Storm.
Table of Contents
- Big Data Analytics with Spark
- Tricky Statistics with Spark
- Data Analysis with Spark
- Clustering, Classification, and Regression
- Working with Spark MLlib
- NLP with Spark
- Working with Sparkling Water - H2O
- Data Visualization with Spark
- Deep Learning on Spark
- Working with SparkR
Read or Download Apache Spark for Data Science Cookbook PDF
Similar data modeling & design books
Whether you're building a social media site or an internal-use enterprise application, this hands-on guide shows you the connection between MongoDB and the business problems it's designed to solve. You'll learn how to apply MongoDB design patterns to a range of challenging domains, such as e-commerce, content management, and online gaming.
Go beyond the basics and master the next generation of Hadoop data processing platforms. About This Book: Learn how to optimize Hadoop MapReduce, Pig, and Hive; dive into YARN and learn how it can integrate Storm with Hadoop; understand how Hadoop can be deployed on the cloud and gain insights into analytics with Hadoop. Who This Book Is For: Do you want to broaden your Hadoop skill set and take your knowledge to the next level?
Understand the fundamentals of machine learning with R and build your own dynamic algorithms to tackle complicated real-world problems successfully. About This Book: Get to grips with the concepts of machine learning through exciting real-world examples; visualize and solve complex problems by using power-packed R constructs and its robust packages for machine learning; learn to build your own machine learning system with this example-based practical guide. Who This Book Is For: If you are interested in mining useful information from data using state-of-the-art techniques to make data-driven decisions, this is a go-to guide for you.
Written by leading experts, this self-contained text provides systematic coverage of LDPC codes and their construction techniques, unifying both algebraic- and graph-based approaches into a single theoretical framework (the superposition construction). An algebraic method for constructing protograph LDPC codes is described, and entirely new codes and techniques are presented.
- Java Data Analysis
- MySQL Explained: Your Step-by-Step Guide
- Structural Information and Communication Complexity: 23rd International Colloquium, SIROCCO 2016, Helsinki, Finland, July 19-21, 2016, Revised Selected Papers (Lecture Notes in Computer Science)
- Numerical Methods for Stochastic Computations: A Spectral Method Approach
Extra info for Apache Spark for Data Science Cookbook