Apache Spark Performance Tuning
Master Spark Performance Tuning on Databricks Cloud
This is the most comprehensive course ever created for performance tuning Apache Spark on Databricks Cloud. It teaches a holistic approach to performance tuning and takes you deep into instrumenting, monitoring, and diagnosing performance problems and pinpointing their root cause. It then walks you through approaches for solving, implementing, measuring, and benchmarking performance tuning solutions.
This is a dual-language course. Every scenario and solution is implemented in both Python and Scala. The course applies to Spark SQL, PySpark, and Scala projects.
The course simulates a range of performance problems using gigabyte-scale data volumes, helping you build the knowledge to solve real-life problems.
Performance debugging, benchmarking, analyzing and pinpointing the root cause
Tuning data storage, read operations, partition sizes, small files, caching, indexing, disk partitioning, and pruning
Tuning data spill, data skew, data shuffle, joins, wide transformations, UDFs, and serialization problems
Fine-tuning AQE, data skipping, dynamic pruning, Auto Optimize for small files, and cleaning stale data
Cluster selection, VM selection, the Photon engine, estimating CPU, memory, and worker nodes, and monitoring utilization
What you need to know before you start this course
You must already know Spark programming in Python or Scala
You must already be familiar with Apache Spark architecture and internals
You also need to be familiar with Databricks Cloud platform features and capabilities
Video lectures - Source code - Revision documents
Introduction to performance tuning and challenges
Instrumentation and pinpointing
What is benchmarking?
How to benchmark?
Optimizing schema inference overhead
What is disk caching?
Performance tuning goals - Time vs. Cost
Review Rating
Notes for reference
Source Code
Introduction to data read optimization
Optimize reads using the Spark cache
Optimize read using column elimination
Optimize read using row elimination and predicate pushdown
Crippling of predicate pushdown
Optimize reads by eliminating scan overhead and metadata problems
What is a small file problem, and how to correct it
What is a haystack query, and how to optimize it using Z-ordering
Understanding DataFrame partition calculation
DataFrame memory partition tuning
File open cost tuning
Notes for reference
Source Code
Review Rating
Introduction to spill: causes and implications
Understanding Spark memory management and allocation
Detecting data spill and its severity, and approaches to solving spill
Tuning spill without code change
Small VM vs. Large VM for spill tuning
Tuning spill by reducing partition size
Data explosion and spill tuning
Other common data spill scenarios
Notes for reference
Source Code
Review Rating
Introduction to shuffle and data skew
Detecting data skew and its severity
Skew tuning challenges and approaches
Introduction to data salting
Data salting for handling skew
Skew tuning hints and their limitations
Skew tuning using AQE
Notes for reference
Source Code
Data shuffle and performance bottlenecks
Broadcast internals
Avoid shuffle using broadcast
Handling large-volume broadcasts and OOM errors
Tuning large-table-to-large-table joins
Tuning terabyte-scale joins
Bucketing: what, when, and how
Optimizing terabyte-scale joins with buckets
Reusing the join results
Optimizing joins with intermediate results
Notes for reference
Source Code
You won't get this course anywhere in the world. It's one of the rarest and best courses I have come across. You will rock at tuning if you follow the instructor and complete this course.
This is a great course. I learn something new in each section.
The course is very helpful in deepening our understanding of the inner workings of Spark and various optimization techniques.
This course is awesome
Good
I'm learning a lot from this course. The explanations on performance tuning are exceptionally clear. If you're interested in mastering performance tuning, don't hesitate; just enroll in this course.
An in-depth explanation is provided for every scenario.
Excellent course if you want to learn the nitty-gritty of Spark in detail. This course has articulated the concepts in detail and kudos to the instructor Prashant Pandey ⭐️⭐️⭐️⭐️⭐️ who has made the concepts very easy to observe. Thank you so much!!!
No. This course does not come with a refund policy. Once sold, you cannot get a refund for this self-paced course.
We provide standard 3-year access to the course material from the date of purchase. However, promotional offers may reduce the access duration in exchange for a discounted price. Please check the access validity terms and conditions of each promotional offer.
We have a Q&A forum where you can ask questions, and our team will answer your queries.
Get in touch with your course coordinator to learn more about the course, instructor-led course options, discount offers, course bundles, and additional payment methods.
Would you like to talk to your course coordinator? We are just a WhatsApp message away. Reach out for any query related to the course, payments, and current promotional offers.
Contact us for current promotional offers, course bundles, and additional payment methods such as NEFT, Net Banking, UPI, etc.
Schedule a call with your course coordinator for bundles, discounts, and live sessions