When processing large datasets in Databricks using PySpark, performance depends heavily on how well your cluster resources are utilized — specifically, executors, cores, and partitions.
In this blog, we’ll break down exactly how to calculate the number of parallel tasks, understand cluster behavior, and see a real-world example with performance observations.
Concept Overview
Before diving into the calculations, let’s understand the key Spark components:
| Term | Description |
|:---:|:---:|
| Executor | A JVM process launched on a worker node that is responsible for running tasks. |
| Core | Each core represents a single thread of execution — 1 task runs per core. |
| Task | The smallest unit of work in Spark (usually processes one partition). |
| Parallelism | The number of tasks Spark can execute simultaneously. |
| Partition | Logical chunk of data Spark processes in parallel — one task per partition. |
In short:
More cores = more parallel tasks = faster processing (up to a point).
Example Cluster Configuration
| Parameter | Description | Value |
|:---:|:---:|:---:|
| Number of Worker Nodes | Total compute nodes (excluding the driver) | 10 |
| Executors per Node | Executors running on each node | 4 |
| CPU Cores per Executor | Number of CPU cores allocated per executor | 5 |
| Memory per Executor | Memory allocated per executor | 16 GB |
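Databricks normally derives these executor settings from the node type chosen in the cluster UI, but they correspond to standard Spark properties. Here is a minimal sketch of the equivalent explicit configuration, assuming a generic Spark setup (the app name is hypothetical):

```python
from pyspark.sql import SparkSession

# Minimal sketch: the standard Spark properties that correspond to the
# cluster layout above. On Databricks these are usually derived from the
# node type in the cluster UI rather than set by hand.
spark = (
    SparkSession.builder
    .appName("parallelism-example")            # hypothetical app name
    .config("spark.executor.instances", "40")  # 10 nodes x 4 executors
    .config("spark.executor.cores", "5")       # 5 cores per executor
    .config("spark.executor.memory", "16g")    # 16 GB per executor
    .getOrCreate()
)
```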
Step-by-Step Calculation
1. Total Executors
Total Executors = Worker Nodes x Executors per Node = 10 x 4 = 40 executors
Each of the 10 nodes runs 4 executors, giving 40 total executors.
2. Total CPU Cores
Total CPU Cores = Total Executors x Cores per Executor = 40 x 5 = 200 cores
That means your cluster can process 200 tasks in parallel.
3. Number of Parallel Tasks
In Spark, each task uses one CPU core:
Parallel Tasks = Total CPU Cores = 200
So 200 partitions can be processed at the same time.
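As a quick sanity check, the same arithmetic in plain Python:

```python
# Back-of-the-envelope check of the calculation above.
worker_nodes = 10
executors_per_node = 4
cores_per_executor = 5

total_executors = worker_nodes * executors_per_node    # 40
total_cores = total_executors * cores_per_executor     # 200
parallel_tasks = total_cores                           # 1 task per core

print(total_executors, total_cores, parallel_tasks)    # 40 200 200
```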
Cluster Visualization
The diagram below shows how executors, cores, and tasks fit together in a simplified Databricks cluster.
Data & File Example (Real-World Scenario)
Let’s assume we are processing a Parquet file stored in ADLS with the following details:
| Parameter | Description |
|:---:|:---:|
| File Format | Parquet |
| File Size | 100 GB |
| Number of Rows | 250 Million |
| Columns | 60 |
| Cluster Type | Databricks Standard (10-node cluster) |
Partition Calculation and Parallelism
By default, Spark creates partitions automatically. However, for large datasets, it’s better to define a target partition size — typically between 128 MB and 512 MB per partition.
Let’s calculate:
Number of Partitions = 100 GB / 256 MB = 102,400 MB / 256 MB = 400 partitions
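A hedged PySpark sketch of the same idea, assuming the Databricks-provided `spark` session; the ADLS path is a placeholder, and the actual partition count also depends on the Parquet file sizes and row-group layout:

```python
# Placeholder ADLS path; substitute your own container and storage account.
source_path = "abfss://data@<storage-account>.dfs.core.windows.net/events/"

# Target roughly 256 MB of input data per partition when scanning the files.
spark.conf.set("spark.sql.files.maxPartitionBytes", 256 * 1024 * 1024)

df = spark.read.parquet(source_path)

# For a 100 GB input this should land close to the 400 partitions calculated above.
print(df.rdd.getNumPartitions())
```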
With 200 cores, Spark will process:
- 200 tasks in the first wave
- 200 tasks in the second wave
Total = 400 tasks processed in 2 waves
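The wave count is just the partition count divided by the available cores, rounded up:

```python
import math

partitions = 400
total_cores = 200

waves = math.ceil(partitions / total_cores)  # 2 waves of up to 200 tasks each
print(waves)
```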
Performance Observation (Approximate)
| Stage | Description | Approx Time | Remarks |
|:---:|:---:|:---:|:---:|
| Stage 1 (Read & Filter) | Reading Parquet and applying filters | ~3 mins | Data is distributed evenly across executors. |
| Stage 2 (Transformations) | Joins and aggregations | ~5 mins | CPU-heavy but parallelized well due to 200 cores. |
| Stage 3 (Write Stage) | Writing output as Delta format | ~2 mins | Write parallelism depends on output partitions. |
| Total Job Runtime | — | ~10 mins | Efficient partitioning and balanced task distribution. |
| If only 50 partitions | — | ~25 mins | Underutilization — fewer tasks → idle cores. |
| If 2000 partitions | — | ~12–13 mins | Slight slowdown due to scheduling overhead. |
Performance Insights
| Configuration Change | Performance Impact |
|:---:|:---:|
| Increase executors or cores | Improves parallelism, reduces runtime |
| Too few partitions | CPU underutilized, slower |
| Too many small partitions | Task scheduling overhead increases |
| Balanced partitions (~256 MB each) | Best performance |
| Use Delta Format | Faster reads/writes with optimized layout |
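To illustrate the last two rows of the table, here is a hedged sketch of a Delta write in which the output partition count is aligned with the example cluster's 200 cores; the target path is a placeholder and `df` is the DataFrame read earlier:

```python
# Placeholder output location in ADLS.
target_path = "abfss://data@<storage-account>.dfs.core.windows.net/events_delta/"

# Repartitioning before the write controls how many write tasks (and output
# files) Spark produces; 200 matches the core count in this example cluster.
(
    df.repartition(200)
      .write
      .format("delta")        # assumes a Databricks runtime with Delta Lake
      .mode("overwrite")
      .save(target_path)
)
```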
Key Takeaways
- Parallel Tasks = Total CPU Cores
- 1 Task = 1 Partition = 1 Core
- Tune partitions close to the number of available cores for best performance.
- Monitor the Spark UI → Stages tab to analyze task distribution and identify bottlenecks (a quick programmatic check is sketched after this list).
- Don’t blindly increase partitions or cores — find the sweet spot for your workload.
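To complement the Spark UI check above, a minimal programmatic sketch, again assuming the Databricks-provided `spark` session:

```python
# Quick checks that complement the Spark UI when tuning parallelism.
print(spark.sparkContext.defaultParallelism)           # cores available for tasks
print(spark.conf.get("spark.sql.shuffle.partitions"))  # partitions used after shuffles

# Aligning shuffle partitions with the core count (200 here) is a common first step.
spark.conf.set("spark.sql.shuffle.partitions", 200)
```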
Example Summary
| Metric | Value |
|:---:|:---:|
| Worker Nodes | 10 |
| Executors per Node | 4 |
| CPU Cores per Executor | 5 |
| Total Executors | 40 |
| Total CPU Cores / Parallel Tasks | 200 |
| Input File Size | 100 GB |
| Row Count | 250 Million |
| Partitions | 400 |
| Execution Waves | 2 |
| Approx Runtime | ~10 minutes |
Final Thoughts
This real-world breakdown shows how cluster configuration, task parallelism, and partition strategy directly impact Spark job runtime.
Whether you’re optimizing ETL pipelines or tuning Delta writes, understanding these fundamentals can drastically improve Databricks performance and cost efficiency.