OpenCL: Dynamic parallelism in OpenCL 2.0

Sunday, September 29, 2013

Dynamic parallelism in OpenCL 2.0

Provisional specifications of OpenCL 2.0 were released a few months ago. One of the most interesting features is support for dynamic parallelism. In the CUDA world it has existed for about a year, but still only on the most expensive devices with compute capability 3.5 (Titan, GTX 780; both with the GK110 chip). On the AMD side the story is a little different. They haven't said anything about dynamic parallelism, but they did introduce GCN 2.0, which might support it. In addition they introduced Mantle, a new GPU API which promises up to 9 times more draw calls than comparable APIs (OpenGL, DirectX). This hints that draw calls might be issued from the GPU itself.

How will dynamic parallelism be used? Quite simply: kernels will enqueue other kernels to a device queue:

int enqueue_kernel (
 queue_t queue,
 kernel_enqueue_flags_t flags,
 const ndrange_t ndrange,
 my_block_A);

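The first argument is the queue. It must be a device-side queue, not the ordinary host command queue: either the default device queue returned by get_default_queue(), or a queue that the host created with the CL_QUEUE_ON_DEVICE property and passed to the kernel as a queue_t argument.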

Take care, as this function is asynchronous. The parent kernel does not wait for its child kernels; if anything, it is the other way round. The second argument of enqueue_kernel defines whether child kernels may start running while the parent kernel is still running (CLK_ENQUEUE_FLAGS_NO_WAIT), must wait for the parent kernel to finish (CLK_ENQUEUE_FLAGS_WAIT_KERNEL), or must wait only for the enqueuing work-group of the parent kernel to finish (CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP).

The third argument defines the number of threads (work-items) to run. For example, ndrange_1D(global_work_size, local_work_size) can be used.

The last argument is a block that defines the actual function to run as a kernel. It is written using the Block syntax:

void (^my_block_A)(void) = ^{
 size_t id = get_global_id(0);
 b[id] += a[id];
};
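Putting the four arguments together, a parent kernel might look like the sketch below. It is only an illustration under some assumptions: the kernel name parent and the parameter n are made up for the example, the code is built with -cl-std=CL2.0, and the host has already created a default on-device queue (otherwise get_default_queue() has nothing to return).

```c
// OpenCL C 2.0 device code (sketch). The name "parent" and the parameter
// "n" are hypothetical; a and b are the buffers used by the block above.
kernel void parent(global int *a, global int *b, uint n)
{
    // Let a single work-item do the enqueue, so the child runs only once.
    if (get_global_id(0) != 0)
        return;

    // Default on-device queue. Assumes the host created one with
    // CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT.
    queue_t queue = get_default_queue();

    // n child work-items; the work-group size is left to the runtime.
    ndrange_t range = ndrange_1D(n);

    // The block captures the pointers a and b from the enclosing kernel.
    void (^my_block_A)(void) = ^{
        size_t id = get_global_id(0);
        b[id] += a[id];
    };

    // The child may start only after the parent kernel has finished.
    enqueue_kernel(queue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, range, my_block_A);
}
```

Guarding the enqueue with get_global_id(0) is a design choice for this sketch: without it, every work-item of the parent would enqueue its own copy of the child kernel.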

Using enqueue_kernel seems quite easy, but there are some possible problems. How many kernels can we launch this way? The device queue has a limited size, so we can run out of memory. If that happens, enqueue_kernel returns CLK_ENQUEUE_FAILURE (or, when the program is built with the -g option, a more specific code such as CLK_DEVICE_QUEUE_FULL). So it seems that OpenCL code will require a lot of error handling, which is not optimal for SIMD machinery.
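A minimal sketch of such error handling is shown here. The kernel name parent_checked, the parameter n and the status buffer are assumptions made for the example, as is the convention of writing 0 for success and 1 for failure so that the host can inspect the result after the kernel finishes.

```c
// OpenCL C 2.0 device code (sketch). "parent_checked", "n" and "status"
// are hypothetical names; status is a one-element buffer read by the host.
kernel void parent_checked(global int *a, global int *b, uint n,
                           global int *status)
{
    // Only one work-item enqueues the child and reports the result.
    if (get_global_id(0) != 0)
        return;

    int err = enqueue_kernel(get_default_queue(),
                             CLK_ENQUEUE_FLAGS_NO_WAIT,
                             ndrange_1D(n),
                             ^{
                                 size_t id = get_global_id(0);
                                 b[id] += a[id];
                             });

    // Hypothetical convention: 0 = child enqueued, 1 = enqueue failed
    // (for example because the device queue was full).
    *status = (err == CLK_SUCCESS) ? 0 : 1;
}
```

On failure the child simply never runs, so the host would have to notice the non-zero status and either relaunch the work itself or retry with a larger device queue.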

Comments:

SecretAardvark, March 26, 2014 at 2:12 AM
I suspect that this will only work on GPUs that are compatible with dynamic parallelism at the hardware level. Is this correct? If so, do you know which AMD GPUs will support this, if any?