This site contains OpenCL notes, tutorials, benchmarks, news.

Sunday, June 2, 2013

Blender 2.67b and OpenCL is working better

I just updated to the new Blender 2.67b and found that OpenCL support has changed for the better. The last time I checked, in the previous version of Blender, it was not possible to select the CPU as the compute device. Now it is. It's even possible to use a combination of CPU and GPU. Take a look at the next picture:


I can use the Intel Core i5 and/or the AMD Radeon graphics card as the compute device. This is nice.



You can see Intel Core i5 listed twice. The reason is that I have two OpenCL implementations installed, one from Intel and one from AMD. Sadly, I don't know which entry comes from AMD and which from Intel, but most users will not have that problem.
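
One way a device list could avoid this ambiguity is to append the OpenCL platform name to any device name that appears more than once. Here is a minimal Python sketch of that idea. The platform and device strings are illustrative assumptions, not output captured from my machine, and `label_devices` is a hypothetical helper, not Blender code. In a real program the names would come from `clGetPlatformInfo` with `CL_PLATFORM_NAME` and `clGetDeviceInfo` with `CL_DEVICE_NAME`.

```python
from collections import Counter


def label_devices(entries):
    """Return display labels for (platform_name, device_name) pairs.

    A device name that occurs under more than one platform gets the
    platform name appended so the entries can be told apart.
    """
    counts = Counter(device for _, device in entries)
    labels = []
    for platform, device in entries:
        if counts[device] > 1:
            labels.append(f"{device} ({platform})")
        else:
            labels.append(device)
    return labels


# Illustrative input only: assumed platform and device strings.
entries = [
    ("Intel(R) OpenCL", "Intel Core i5"),
    ("AMD Accelerated Parallel Processing", "Intel Core i5"),
    ("AMD Accelerated Parallel Processing", "AMD Radeon"),
]
for label in label_devices(entries):
    print(label)
```

With labels like these, the two Intel Core i5 entries would no longer look identical.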

What if we try to run Cycles on OpenCL? Let's start with the Intel Core i5 and Intel's own OpenCL implementation. The console shows the following output:
Compiling OpenCL kernel ...
OpenCL kernel build output:
Compilation started
In file included from <built-in>:132:
<command line>:2:36: warning: ISO C99 requires whitespace after the macro name
Compilation done
Linking started
Linking done
Kernel <kernel_ocl_path_trace> was not vectorized
Kernel <kernel_ocl_tonemap> was successfully vectorized
Done.
Kernel compilation finished in 17.80s.  


You can see that the whole Cycles code base is quite massive: compilation takes 17.8s. The people who wrote Cycles put a lot of work into this code. Rendering the default cube takes 2.9s.

What about AMD's implementation of the CPU backend? In this case the console output is less verbose:
Compiling OpenCL kernel ...
Kernel compilation finished in 5.13s.

The rendering time is 2.2s, which is strange: AMD's implementation takes less time than Intel's implementation on Intel's own CPU! We're using AMD APP 1214.3 and Intel SDK 2013. But I'm not alone here. Phoronix found similar results: http://www.phoronix.com/scan.php?page=article&item=amd_intel_openclsdk&num=1 .

If we select the pure CPU implementation instead of OpenCL, rendering takes 1.8s. It seems that some work could still be done to optimize the whole thing. In my opinion the pure CPU implementation is not needed any more; the OpenCL implementation is enough. For machines that don't have OpenCL preinstalled, a default OpenCL implementation could be bundled with Blender.
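
To put the three render times side by side, here is a small Python sketch that expresses each one relative to the pure CPU implementation. It only uses the numbers measured above. They are single runs of the default cube, so treat the ratios as rough indications, not as a benchmark.

```python
# Render times of the default cube, in seconds, as measured above.
render_times = {
    "Intel OpenCL (CPU)": 2.9,
    "AMD OpenCL (CPU)": 2.2,
    "pure CPU": 1.8,
}


def slowdown(times, baseline):
    """Return each backend's render time relative to the baseline backend."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


ratios = slowdown(render_times, "pure CPU")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

So on this machine the OpenCL CPU backends are roughly 1.6x (Intel) and 1.2x (AMD) slower than the pure CPU implementation.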

I noticed that Blender caches the built OpenCL kernels. Good work! On the next start of Blender, the first render is significantly faster.
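
I have not looked at how Blender implements this cache, so the following Python sketch only shows the general idea behind such caches: hash everything that can change the compiled binary and use the hash as a file name. The functions `cache_key` and `load_or_build` are hypothetical, and the "compiler" is a stand-in that returns placeholder bytes. In real OpenCL code the binary would be fetched with `clGetProgramInfo` (`CL_PROGRAM_BINARIES`) and reloaded with `clCreateProgramWithBinary`.

```python
import hashlib
import os
import tempfile


def cache_key(source, device_name, driver_version, build_options=""):
    """Hash everything that can change the compiled binary."""
    h = hashlib.sha256()
    for part in (source, device_name, driver_version, build_options):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so adjacent fields cannot run together
    return h.hexdigest()


def load_or_build(cache_dir, key, build):
    """Return (binary, was_cached), calling build() only on a cache miss."""
    path = os.path.join(cache_dir, key + ".bin")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read(), True
    binary = build()
    with open(path, "wb") as f:
        f.write(binary)
    return binary, False


# Usage with a stand-in "compiler" that just returns placeholder bytes.
cache_dir = tempfile.mkdtemp()
key = cache_key("__kernel void k() {}", "Example Device", "1.0")
first = load_or_build(cache_dir, key, lambda: b"compiled-binary")
second = load_or_build(cache_dir, key, lambda: b"compiled-binary")
print(first[1], second[1])
```

The first call misses the cache and "compiles"; the second call finds the file and skips the build. Including the device name and driver version in the key means a driver update would trigger a rebuild instead of loading a stale binary.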

What about the GPU? At first we get a lot of trivial warnings, which can be ignored:
Compiling OpenCL kernel ...
OpenCL kernel build output:
"/tmp/OCLawnF6S.cl", line 16307: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
        float phi = M_2PI_F * randv;
                    ^

"/tmp/OCLawnF6S.cl", line 16323: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
        float phi = M_2PI_F * randv;
                    ^

"/tmp/OCLawnF6S.cl", line 16337: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
        float phi = M_2PI_F*u2;
                    ^

"/tmp/OCLawnF6S.cl", line 22875: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
                float phi = M_2PI_F * randu;
                            ^

"/tmp/OCLawnF6S.cl", line 23165: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
                float phiM = M_2PI_F * randv;
                             ^

"/tmp/OCLawnF6S.cl", line 23394: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
                float phiM = M_2PI_F * randv;
                             ^

"/tmp/OCLawnF6S.cl", line 24051: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
                float phi = M_2PI_F * randu;
                            ^

"/tmp/OCLawnF6S.cl", line 24427: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
        const float tolerance = 1e-8;
                                ^

"/tmp/OCLawnF6S.cl", line 24497: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
        return ss->alpha_*(1.0f/M_4PI_F)*(Rdr + Rdv);
                                ^

"/tmp/OCLawnF6S.cl", line 26172: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
                return atan2f(y, x) / M_2PI_F + 0.5f;
                                      ^

Error:E013:Insufficient Private Resources! 

OpenCL build failed: errors in console


But at the end we get:
Error:E013:Insufficient Private Resources! 
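
Because the one important line is buried under pages of warnings, a small filter helps. The following Python sketch counts the warnings by message and pulls out the error lines. `summarize_build_log` is a hypothetical helper, and matching on `warning:` and a leading `Error:` is an assumption based on the log format shown above; other compilers format their logs differently.

```python
from collections import Counter


def summarize_build_log(log):
    """Split a build log into warning counts and a list of error lines."""
    warning_counts = Counter()
    errors = []
    for line in log.splitlines():
        text = line.strip()
        if text.startswith("Error:"):
            errors.append(text)
        elif "warning:" in text:
            # Key on the first line of the message; wrapped continuation
            # lines and source excerpts are ignored.
            warning_counts[text.split("warning:", 1)[1].strip()] += 1
    return warning_counts, errors


# Shortened excerpt of the log above.
sample_log = '''
"/tmp/OCLawnF6S.cl", line 16307: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
        float phi = M_2PI_F * randv;
                    ^

"/tmp/OCLawnF6S.cl", line 24427: warning: double-precision constant is
          represented as single-precision constant because double is not
          enabled
        const float tolerance = 1e-8;
                                ^

Error:E013:Insufficient Private Resources!

OpenCL build failed: errors in console
'''

warning_counts, errors = summarize_build_log(sample_log)
for message, count in warning_counts.items():
    print(f"{count} x warning: {message}")
for error in errors:
    print(error)
```

All the warnings collapse into one repeated message, and the E013 error stands out as the only real problem.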

It looks like our GPU, an AMD Radeon 5470, is too low-end. But at least it compiles all the code. It would be nice to get Cycles working on low-end GPUs, but on further thought it's not worth the effort. Serious Blender users will use better GPUs anyway.

The question is why it doesn't work. Is there too little local memory? Or is the program too complex? As we're talking about private resources, I think the Cycles program is too complex for our GPU: it uses too many registers or the program is too long. Maybe splitting Cycles into several smaller kernels would help. To find the exact problem, one would need to use KernelAnalyzer from AMD APP and try to compile the kernel for all GPUs.

23 comments:

  1. Hi, I downloaded Blender 2.67b. I have an Intel i7-3770 and an ATI Radeon 7950 GPU. I don't have the option to switch from CPU to GPU compute. Can I do something to render with my GPU? I know Cycles doesn't work well with OpenCL.

  2. You need to set environment variable CYCLES_OPENCL_TEST=true

  3. Thanks for the answer! But how? Changing a file in the Cycles/kernel folder?

  4. One way on Windows is here: http://support.microsoft.com/kb/310519 .

    On Linux you can type into the console: CYCLES_OPENCL_TEST=true blender

  5. I'm on Windows, I did that, and the GPU Compute appeared. Although, the render on GPU doesn't work.
    OpenCL build failed: errors in console.
    Thanks for your help

    Replies
    1. No problem.

      But it's interesting. On my side it compiled overall, but the code seems too complex for my GPU, which is really low-end. Maybe you could update the drivers to the newest version or even to a beta. Which errors did you get? It would be interesting to see them.

  6. I have the latest official drivers; I'm downloading the beta now. I don't know where Cycles puts my error files, I'm looking...
    In the official Blender version I had just the option "Tahiti" under OpenCL. I tried the latest two versions that I found on Graphicall, and there is a CPU+GPU option, but it doesn't work anyway.

    Replies
    1. If you run Blender from the command prompt, I think you can see the log messages (at least on Linux):

      run cmd.exe
      cd c:/programfiles..../blender/
      blender.exe

  7. Yes, it works. This is the error: http://postimg.org/image/9r2p8fcw1/

  8. Really strange. I have installed Blender 2.67.57180; 57180 is the revision number.

    If you look at the preview versions at http://builder.blender.org/download/, it looks like the Windows build is a little behind the Linux one. The latest revision is blender-2.67-r57165-win64.zip

  9. I could only render with OpenCL CPU, when using the version "multiview-blender-2.67-r57177-win64.zip" found at this site: http://builder.blender.org/download/
    It is interesting to try.

  10. Please keep exploring Cycles and OpenCL. I'm following the development closely and you bring some interesting insight.
    Thanks!

  11. This comment has been removed by the author.

  12. The problem with AMD hardware is the compiler, as stated by an AMD representative here:

    "I can guarantee you that active work is happening here. Fixing cycles involves work at both OpenCL compiler level and also in layers beneath it.

    The work is pretty involved and will take a fair amount of time.

    I have not got any timelines from AMD engineers. But it looks like this is going to take a while.

    Please bear with us."

    This is from http://devgurus.amd.com/message/1287979#1287979, dated March 2013. AFAIK they are still working on it, so for now there is no news about it.

    Brecht Van Lommel has stated that splitting a complex megakernel like Cycles means quite a lot of work and time. So either way, people wanting to use Cycles on AMD hardware will have to wait quite some time.

  13. Blender on OSX Lion:
    --------------------

    Compiling OpenCL kernel ...
    OpenCL error (ATI Radeon HD 6750M): OpenCL Warning : clBuildProgram failed: could not build program for 0x1021b00 (ATI Radeon HD 6750M) (err:-2)
    OpenCL error (ATI Radeon HD 6750M): [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
    :18459:10: error: initializing '__float4 *' with an expression of type '__attribute__((address_space(1))) __float4 *' changes address space of pointer
    float4 *in = (__global float4*)(buffer + index*kernel_data.film.pass_stride);
    ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    :37588:8: warning: unused variable 'ray_t'
    float ray_t = 0.0f;
    ^


    OpenCL kernel build output:
    :18459:10: error: initializing '__float4 *' with an expression of type '__attribute__((address_space(1))) __float4 *' changes address space of pointer
    float4 *in = (__global float4*)(buffer + index*kernel_data.film.pass_stride);
    ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    :37588:8: warning: unused variable 'ray_t'
    float ray_t = 0.0f;
    ^

    OpenCL build failed: errors in console

  14. This comment has been removed by the author.

    Replies
    1. 2.68a under Windows 8 is working. Kernel compilation took ~138 seconds with about 7 GB used during the compilation process (it used to utilize ~13 GB and take much longer). Running a very simple scene puts the GPU into the full cstate (1150/1550 are my settings at the moment), but it appears to be no faster than the CPU implementation at the moment. GPU atomic utilization (GPU-Z reported) is around 64%.

      Currently running on a 7950, I will test on a barts (6870) now too. I can add in a bunch of cayman GPUs (6970s and 50s) later if you are interested.

      Progress has been made, now for optimization. It will be interesting to see if I can extract better performance from the GPU as it is now vs the CPU with different settings and scenes/materials. Currently, it's ~ the same as an 8350 under windows, so overall slower than an 8350 under Linux, most likely.

    2. I've gotten it to run approximately twice as fast at 1080p (50 samples in this instance) as the CPU implementation.
      ~ 12-14 seconds vs. ~ 25. Most everything seems to work including hdri, minus the subsurface scattering and anisotropic filtering materials which just render black so far. I found the greatest speedup by increasing the tile size to 1024x1024, threads are disregarded by the opencl gpu render at the moment.

      Also, it appears to be 100% stable. Haven't tested barts yet.

    3. Barts and Tahiti/barts/vishera build fails!

      Heh, looks like it's GCN only for the time being. Now that the blender devs (seems to be mostly one guy working on cycles...) have gotten over the 'opencl is hard' nonsense, shouldn't be long.

  15. Under Linux with catalyst driver and AMD Tahiti GPU ( Radeon HD7970) it works !!! This is great news... btw. It is working with Blender 2.69 already...

  16. I can confirm it too. The AMD HD 7950 works fine, but when using GPU and CPU together I get bugs in my renderings.

  17. This comment has been removed by the author.

  18. I'm trying OpenCL on a Yoga 2 Pro laptop which has Intel HD Graphics 4400 GPU but I'm getting a compilation error shown below. Is there any way to turn verbose logging on since there's not much information here? Is it going to be worthwhile bothering to try anyway??

    C:\Program Files\Blender Foundation\Blender>blender.exe
    Read new prefs: C:\Users\Philip\AppData\Roaming\Blender Foundation\Blender\2.73\config\userpref.blend
    found bundled python: C:\Program Files\Blender Foundation\Blender\2.73\python
    Imported multifiles
    Device init succes
    Compiling OpenCL kernel ...
    OpenCL error (Intel(R) HD Graphics 4400): Build program failure.
    OpenCL kernel build output:
    :19472:35: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :19666:14: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :19700:14: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :19717:14: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :19732:14: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :31562:15: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :31815:21: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :31891:21: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :33111:10: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :35879:25: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :36315:58: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :39463:32: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :41640:31: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :41674:35: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :41759:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    :41832:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
    :1846:26: note: expanded from here
    fcl build 1 succeeded.
    fcl build 2 succeeded.
    Error: internal error.

    OpenCL build failed: errors in console
    Error: OpenCL build failed: errors in console
