What you'll learn

This is a comprehensive course on performance tuning Apache Spark on Databricks Cloud. It teaches a holistic approach to performance tuning and takes you deep into instrumenting, monitoring, and diagnosing Spark workloads so you can pinpoint the root cause of performance problems. It then covers how to implement, measure, and benchmark performance tuning solutions.

This is a dual-language course: every scenario and solution is provided in both Python and Scala. The course applies to Spark SQL, PySpark, and Scala projects.

The course simulates various performance problems using large data volumes, on the order of gigabytes, to help you gain the knowledge to solve real-life problems.

  • Performance debugging, benchmarking, analyzing and pinpointing the root cause

  • Tuning data storage, read operation, partition size, small files, caching, indexing, disk partitioning and pruning

  • Tuning Data Spill, Data Skew, Data Shuffle, Joins, Wide transformations, UDF and Serialization problems

  • Fine-tuning AQE, data skipping, dynamic pruning, auto-optimizing small files, and cleaning stale data

  • Cluster Selection, VM Selection, Photon Engine, Estimating CPU, Memory, Worker Nodes and Monitoring Utilization

Course Prerequisite

What do you need to know before you start this course?

  • You must already know Spark programming in Python or Scala

  • You must already be familiar with Apache Spark architecture and internals

  • You also need to be familiar with the features and capabilities of the Databricks Cloud platform

Course Content

Video lectures - Source code - Revision documents

    1. Introduction to performance tuning and challenges

      FREE PREVIEW
    2. Instrumentation and pinpointing

      FREE PREVIEW
    3. What is Benchmarking?

    4. How to Benchmark?

      FREE PREVIEW
    5. Optimizing schema inference overhead

      FREE PREVIEW
    6. What is disk caching?

    7. Performance tuning goals - Time vs. Cost

    8. Review Rating

    9. Notes for reference

    10. Source Code

    1. Introduction to data read optimization

    2. Optimize read using spark cache

    3. Optimize read using column elimination

    4. Optimize read using row elimination and predicate pushdown

    5. Crippling of predicate pushdown

    6. Optimize read, eliminating scan overhead and metadata problems

    7. What is a small file problem, and how to correct it

    8. What is a haystack query, and how to optimize it using Z-order

    9. Understanding DataFrame partition calculation

    10. DataFrame memory partition tuning

    11. File open cost tuning

    12. Notes for reference

    13. Source Code

    14. Review Rating

    1. Introduction to spill, cause, and implications

    2. Understanding Spark memory management and allocation

    3. Detecting data spill, its severity and approaches to solve the spill

    4. Tuning spill without code change

    5. Small VM vs. Large VM for spill tuning

    6. Tuning spill by reducing partition size

    7. Data explosion and spill tuning

    8. Other common data spill scenarios

    9. Notes for reference

    10. Source Code

    11. Review Rating

    1. Introduction to shuffle and data skew

    2. Detecting data skew and its severity

    3. Skew tuning challenges and approaches

    4. Introduction to data salting

    5. Data salting for handling skew

    6. Skew tuning hints and its limitations

    7. Skew tuning using AQE

    8. Notes for reference

    9. Source code

    1. Data shuffle and performance bottlenecks

    2. Broadcast Internals

    3. Avoid shuffle using broadcast

    4. Handling large volume broadcast and OOM

    5. Tuning large-table-to-large-table joins

    6. Tuning terabyte join

    7. The what, when, and how of bucketing

    8. Optimizing terabyte join with buckets

    9. Reusing the join results

    10. Optimizing joins with intermediate results

    11. Notes for reference

    12. Source code

Course Features

  • 84 lessons
  • 25.5 hours of video content
  • PDF & Source Code
  • Total Support

Course Reviews

5 star rating

Best course for performance tuning

Ananda Harihara Shivamurthy

You won't get this course anywhere else in the world. It's one of the rare and best courses I have come across. You will rock at tuning if you follow the instructor and complete this course.

5 star rating

Great Instructor, Great Course

Greg Coopman

This is a great course. I learn something new on each section.

5 star rating

Excellent course on spark optimization

Navkanth Jyothi

The course is very helpful in deepening our understanding of the inner workings of spark and various optimization techniques.

5 star rating

Excellent course

Shrikant Pandey

This course is awesome

5 star rating

Good

naresh upputuri

Good

5 star rating

Excellent course

Vijay Kumar

I'm learning a lot from this course. The explanations on performance tuning are exceptionally clear. If you're interested in mastering performance tuning, don't hesitate—just enroll in this course.

5 star rating

Explanation

Anil Sharma

An in-depth explanation is provided with every scenario.

5 star rating

Apache Spark Performance Tuning - review

Swetha Poddutooru

Excellent course if you want to learn the nitty-gritty of Spark in detail. This course has articulated the concepts in detail and kudos to the instructor Prashant Pandey ⭐️⭐️⭐️⭐️⭐️ who has made the concepts very easy to observe. Thank you so much!!!


Features & Support

  • Total Support

    We provide support throughout your learning and answer every question. You can also request one-to-one online technical support calls for blocking issues.

  • Completion Certificate

    Students who complete ScholarNest Academy courses earn free, verifiable course completion certificates to share with their friends, co-workers, and potential employers.

  • Future Updates

    Any future updates, upgrades, revisions, or topics included in the same course during your course access period will be available at no additional cost.

Course FAQ

  • Do you have a refund policy?

    No. This course does not come with a refund policy. Once sold, you cannot get a refund for this self-paced course.

  • How long can I access the course material?

    We provide standard 3-year access to the course material from the date of purchase. However, our promotional offers may reduce the access duration for a discounted price. Please check access validity terms and conditions for the promotional offers.

  • How do you provide support?

    We have a Q&A forum where you can ask questions, and our team will answer your queries.

Enroll in the self-paced course

Discover your potential: start today and learn on your own schedule

Schedule a free call

Get in touch with your course coordinator to learn more about the course, instructor-led course options, discount offers, course bundles, and additional payment methods.

  • WhatsApp

    WhatsApp: +91-93534 65988

    Would you like to talk to your course coordinator? We are just a WhatsApp message away. Reach out for any query related to the course, payments, and current promotional offers.

  • Email

    Email: [email protected]

    Contact us for current promotional offers, course bundles, and additional payment methods such as NEFT, Net Banking, UPI, etc.