TalkDate: 02.06 / Start: 00:00 – Finish: 00:00

Fusing Efficient Parallel For Loops with a Composable Task Scheduler

Concurrency

To utilize the CPUs of large multicore systems more efficiently, we need to extract parallelism at every possible level and from every application component. That requires multithreading systems that are nestable, composable, and dynamic, yet still efficient. Task-based parallelism, as in Intel's oneTBB, solves the oversubscription problem and provides the first three of these properties, but it also has limitations that reduce its efficiency compared to the classical OpenMP approach.

We first ran into these limitations while developing the Intel OpenCL runtime for CPU and MIC architectures, and we were able to close the gap with OpenMP significantly, so much so that even today its performance remains unbeaten by the open-source implementation (PoCL).

Now task-based parallelism is gaining popularity even in HPC, thanks to new OpenMP specifications and technologies like BOLT and OmpSs, while the number of processor cores keeps increasing. So the same questions arise again, and we are ready to solve them.