
Why I’m betting on Vulkan and Julia

By: Simon Danisch

Re-posted from: http://randomfantasies.com/2016/02/why-im-betting-on-vulkan-and-julia/

Vulkan – The successor of OpenGL and OpenCL
Julia – A fast high-level language for Scientific Computing

There are currently several problem areas when dealing with visualizations and high performance computing that make it hard to create a pleasant and fast workflow.

  • There is no seamless way from data to high performance visualization libraries. Even if your data processing runs on the GPU, it is common practice to serialize it to the hard drive and load it into an external visualization program, which again uploads the data to the GPU.
  • You need to know different low-level languages and APIs to get full control over the data processing/simulation and visualization pipeline; if you’re not an expert, you won’t get far.
  • It is often hard to set up a speedy pipeline with different libraries that all need to work together (The worst setup took me one week before I could get started)
  • There are many problems in scientific computing which need custom solutions. If your technology stack is fragmented and uses a lot of different languages, it is really hard to create a custom solution without starting from scratch.
  • Once your research is done, it might only run on a couple of platforms.

Vulkan and Julia can form the next framework for high performance scientific computing with a new level of native support for interactive 2D/3D visualizations across different platforms.

 

Short introduction to Vulkan:
Vulkan is the newest industry standard for general computing and graphics on the GPU (released by the Khronos Group on February 16, 2016). It can be thought of as the successor of Khronos’s OpenGL (graphics) and OpenCL (GPGPU), but it is designed almost from scratch.

The design offers a new kind of flexibility and performance when programming for the GPU.

One of the biggest changes is that graphics and compute kernels now share the same intermediate representation (SPIR-V) and execution model. This intermediate binary representation is very similar to LLVM-IR, and there are bi-directional translators between SPIR-V and LLVM-IR.
SPIR-V is easy to target from different languages, finally opening up the world of GPU-accelerated graphics and computing to more languages.
Another improvement is the elimination of most driver overhead and better support for multithreading, which makes it easier to utilize the CPU while doing work on the GPU.
And lastly, Vulkan is expected to run on very different hardware setups, from multi-node clusters with thousands of GPUs down to a smartphone, all while squeezing out the maximum performance.

[Image: supported platforms]

(Where is Apple?)

All this comes at the cost of a complicated runtime, where you have to manage the memory allocations and schedule programs via command buffers with a fairly complex C API.

 
Short introduction to Julia:

Julia is a young programming language promising to be the greediest of all!
It is easy to use and delivers the kind of performance you would expect from C, combined with the freedom and usability known from dynamic languages like Python.
The main language features are multiple dispatch and a rather functional style of programming. The two together make it a good fit for mathematical code and parallel programming. As a dynamically typed language with little boilerplate, a large standard library and a lot of scientific libraries, it is very easy to work interactively on powerful scripts. These scripts can range from data analysis to generating HTML for a website.
High speed in this situation is made possible by clever runtime type inference and just-in-time compilation of the then (mostly) fully typed multimethods.
The just-in-time compilation is done via LLVM, which means Julia already targets LLVM-IR as an intermediate representation. Since Julia’s memory layout and binary format are similar to C’s, Julia has a mostly overhead-free C-call API. Finally, Julia runs on most platforms supported by LLVM, and it is designed to easily run on large clusters.
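
To make this a bit more concrete, here is a minimal sketch of the two features just mentioned, multiple dispatch and the low-overhead C-call interface; the `area` functions and the libc `clock` call are purely illustrative choices:

```julia
# Multiple dispatch: the method is chosen from the types of all arguments,
# and each combination gets its own JIT-compiled specialization.
area(r::Float64)             = pi * r^2   # circle, from a radius
area(w::Float64, h::Float64) = w * h      # rectangle, from width and height

println(area(2.0))        # dispatches to the one-argument method
println(area(2.0, 3.0))   # dispatches to the two-argument method

# Calling straight into C without any wrapper layer, here a libc function.
t = ccall(:clock, Clong, ())
println("clock() returned ", t)
```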

 

You might already see where this is going.

We have a language that targets number crunching and scientific computing with a huge demand for processing power, and an API that allows us to squeeze out the last bit of performance from heterogeneous hardware.

We have a hard-to-control Vulkan C API and a scripting language which excels at calling C code.
We have a language that compiles to LLVM-IR, and translators from LLVM-IR to SPIR-V, allowing the language to run natively on the GPU.

Everything is there for great support of high performance computing and visualization from within one easy-to-use language.

All these goodies are very close, but there is much work to be done. We still need a wrapper for the Vulkan API, and Julia’s LLVM-IR needs some tweaking to conform to the SPIR-V standard, which is a pretty involved task.

To slowly bring Julia and Vulkan together, a two-step approach is advisable:

  1. Use Julia as a scripting language for Vulkan, providing a nice interface to the memory and command buffers, while still relying on C/C++ and GLSL kernels to produce SPIR-V executables (a minimal sketch of this stage follows below).
  2. Make Julia compile to SPIR-V itself, completely eliminating the need for other languages.
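
As a taste of stage 1, here is a hedged sketch of calling the raw Vulkan C API from Julia via ccall. vkEnumerateInstanceExtensionProperties is a real Vulkan entry point, but the loader library name, the simplified return type and the missing error handling are assumptions; a real wrapper like Vulkan.jl would also define all the structs and enums:

```julia
const libvulkan = "libvulkan"   # assumed name of the Vulkan loader library

# Calling with a NULL properties pointer just asks how many instance
# extensions the driver exposes (the usual two-step Vulkan query pattern).
count  = Ref{UInt32}(0)
result = ccall((:vkEnumerateInstanceExtensionProperties, libvulkan), Cint,
               (Ptr{UInt8}, Ref{UInt32}, Ptr{Cvoid}),
               C_NULL, count, C_NULL)

println("Vulkan reports ", count[], " instance extensions (VkResult = ", result, ")")
```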

 

In the first stage, Julia already offers great advantages.
From my experience with the Julia OpenCL and OpenGL wrappers, decorating a low-level API like Vulkan with a higher-level interface in Julia offers a great improvement in productivity and safety while losing almost no performance.
Then in the final stage, we could even choose freely what to run on the CPU and what to run on the GPU, since the Julia code can run on both.
You could create software that works on the desktop and on tablets and mobile phones without any code duplication.

We could simulate the gravity of large galaxies on a cluster of GPUs while immediately visualizing the result, all while staying in a nice high-level programming language and having absolute control over all libraries involved.
This could be done in an interactive way, refining functions and algorithms while directly getting feedback.

We could finally build the interactive tools that stream big data in parallel, while seamlessly viewing and editing the data.
If you have a powerful GPU, you could even dive into virtual reality to get a good look at the data. If you know a bit about VR, you also know that it is very sensitive to latency, making it a must to rely on tools with the highest possible performance.

I think the motivation is clear, now we only need to implement the missing bits!

Work on Vulkan.jl has already started, and I will do my best to add a Vulkan backend to GLVisualize.

GLVisualize Benchmark

By: Simon Danisch

Re-posted from: http://randomfantasies.com/2015/05/glvisualize-benchmark/

This is a benchmark comparing GLVisualize against some popular scientific visualization libraries, namely Mayavi, Vispy and Matlab.

There is also a chapter on IJulia, which is not really a plotting library, but can incorporate plots from other libraries.

The biggest problem with benchmarking 3D rendering speed is that there is no library which allows one to exactly reproduce similar conditions and measurements.
Additionally, without extensive knowledge of a library, it is difficult to foresee what actually gets benchmarked.
As an example of why it is difficult to measure the frame rate, we can look at Vispy. When you enable frame rate measurement, it shows very low frame rates, as it only creates a new frame on demand.
In contrast, GLVisualize has a fixed render loop, which renders as many frames as possible, leading to a totally different number of rendered frames per second (which is admittedly a total waste of GPU time and will change in the future).
This is why it was decided to use the threshold at which a similar 3D scene is still perceived as enjoyable and interactive. Usually the minimal number of frames per second for perceiving movements as smooth is roughly 25.
So the benchmark was executed by increasing the number regulating the complexity of the 3D scene until one could not move the camera without stutters. The last enjoyable threshold recorded is then the result of the benchmark.
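
In pseudocode, the criterion looks roughly like the following hedged sketch; this is not the actual benchmark code, and `render_frame`, the step size and the frame count are made-up placeholders:

```julia
# Increase the scene complexity n until the measured frame rate drops below
# ~25 fps while the camera is moving; the last smooth n is the reported result.
function enjoyable_threshold(render_frame; min_fps = 25, frames = 100, step = 100)
    n = step
    while true
        elapsed = @elapsed for _ in 1:frames
            render_frame(n)   # draw one frame of a scene with complexity n
        end
        frames / elapsed < min_fps && return n - step  # previous n was still smooth
        n += step
    end
end
```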

[Image: vispy_mayavi_romeo]

The first benchmark is an animated and still 3D surface plot. The libraries offering this functionality were Vispy, Mayavi and Matlab.

Library        Still   Animated
Vispy            300         80
Mayavi           800        150
Matlab           800        450
GLVisualize      900        600

Speedup of GLVisualize:
vs. Vispy         9x        56x
vs. Mayavi     1.26x        16x
vs. Matlab     1.26x       1.7x

Vispy had some issues, as the camera was never really smooth for the surface example. Also the normals were missing and there was no option to colorize the surface depending on the height.
It was decided to use the threshold of going from a little stutter to unpleasant stutters, so that Vispy does not completely fail this benchmark.
For Vispy it turned out that the normals were calculated on the CPU, resulting in a major slowdown. The same can be expected for Mayavi, but Mayavi seems to be faster at calculating the normals.
There is not much information available on how Matlab renders its visualizations, as it is closed source. It has to be noted that Matlab performed some additional calculations to always fit the drawing ideally within its borders.

On the other hand, Matlab uses Gouraud shading, which needs quite a bit less processing power than the Blinn-Phong shading used by GLVisualize.
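
To illustrate why that matters, here is a hedged sketch (not GLVisualize’s actual shader code) of the Blinn-Phong specular term; Gouraud shading evaluates such a lighting term once per vertex and interpolates the colors, while Blinn-Phong evaluates it once per fragment, i.e. typically at far more points:

```julia
using LinearAlgebra

unitvec(v) = v / norm(v)   # normalize a direction vector

# Blinn-Phong specular term for surface normal N, light direction L and
# view direction V: (N · H)^shininess with the half vector H = normalize(L + V).
blinn_phong_spec(N, L, V; shininess = 32) =
    max(dot(unitvec(N), unitvec(L + V)), 0.0)^shininess
```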


 

[Image: romeo_mayavi_particles2]

The next benchmark is only between GLVisualize and Mayavi, as the other libraries did not offer a comparable solution. Matlab does not allow using cubes as particle primitives, and Vispy only had an example where you needed to write your own shader, which cannot be seen as a serious option, since this is a benchmark for easy-to-use, high-level plotting libraries. It is always possible to write an optimal version yourself in some framework, but what is really interesting is how well you can solve a problem with the tools the library has readily available.

Library          Still   Animated
Mayavi           90000       2500
GLVisualize    1000000      40000
Speedup            11x        16x

GLVisualize is an order of magnitude faster in this specific benchmark. This is most likely due to the fact that GLVisualize uses OpenGL’s native instanced rendering.
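
What this boils down to, in a hedged sketch (not the actual GLVisualize code): one draw call repeats a single cube mesh for every particle. Buffer, VAO and shader setup are omitted, an OpenGL context is assumed to be current, and the counts are made up:

```julia
using ModernGL

n_indices   = 36       # index count of one cube mesh
n_particles = 1000000  # number of instances to draw in a single call

# One instanced draw call instead of one draw call per particle.
glDrawElementsInstanced(GL_TRIANGLES, n_indices, GL_UNSIGNED_INT, C_NULL, n_particles)
```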

On a side note, GLVisualize was the only library that allowed using an arbitrary mesh as a particle.


 

IJulia

It was not possible to compare IJulia directly with GLVisualize, as the feature set for plotting is too different.

But there are certain factors which indicate that it is hard to reach optimal performance with IJulia.
First of all, IJulia uses ZMQ to bridge the web interface with the Julia kernel.
ZMQ is a messaging system that uses different transports for communication, like in-process, IPC, TCP, TIPC and multicast.
While it is very fast at its task of sending messages, it cannot compete with the native performance of staying inside one language.
This is not very important as long as there does not have to be much communication between Julia and the IPython kernel.

This changes drastically for animations, where big memory chunks have to be streamed to the rendering engine of the browser. It can be expected that this will always be a weakness of IJulia.
On the other hand, GPU accelerated rendering in a web browser is also limited.
It relies on WebGL, which offers only a subset of OpenGL’s functionality. So while the raw execution speed can be expected to be similar, a lot of newer techniques that can speed up rendering are missing.

To investigate this, another benchmark has been created.
It is between GLVisualize and Compose3D, which was the only library found to be able to display 3D models created with Julia directly from the IJulia notebook.
This benchmark is not entirely fair, as Compose3D is just a rough prototype so far which hasn’t even been published yet.
But there seems to be no other library with which you can easily create and display interactive 3D graphics in the IJulia or IPython notebook.
This benchmark creates a Sierpinski gasket; Compose3D displays it in the IJulia notebook while GLVisualize displays it natively in a window.

[Image: Sierpinski gasket]

Compose3D        15625
GLVisualize    1953125
Speedup           125x

Again, GLVisualize is orders of magnitude faster. This can change in the future when Compose3D matures.
But one needs to note that GLVisualize utilizes OpenGL’s instancing to gain this speed. Native instancing is not yet available in WebGL, which means that this optimization will not be available for the IPython notebook in the near future.

 

All in all, this looks pretty promising for GLVisualize.

It must be said, though, that the numbers need to be treated with care, as it is quite hard to benchmark 3D scenes without full control over the libraries. It might be that I did something wrong in the setup, or that a library actually offers a lot more than GLVisualize, which in turn slows it down.

But it can definitely be said that these are solid results for a pretty fresh prototype competing with pretty mature libraries.

 

The code is in my GitHub repository.

ModernGL vs GLEW vs PyOpenGL

By: Simon Danisch

Re-posted from: http://randomfantasies.com/2015/05/moderngl-vs-glew-vs-pyopengl/

Benchmark of ModernGL (Julia), GLEW (C++) and PyOpenGL (Python).

[Chart: glbench]

Relative slowdown compared to GLEW:

[Table image: glbench_table]

 

Procedure:

Each function gets called 10^7 times in a tight loop, and the execution time of the loop is measured.
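
A hedged reconstruction of this kind of measurement loop on the Julia side might look like the sketch below; it assumes an OpenGL context is already current, and glClearColor simply stands in for whichever wrapped function is being benchmarked:

```julia
using ModernGL

function bench_call(n = 10^7)
    @elapsed for _ in 1:n
        glClearColor(0f0, 0f0, 0f0, 1f0)   # the call under test
    end
end

println("seconds for 10^7 calls: ", bench_call())
```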

This was executed on Windows 8.1 with an Intel i5 CPU and an Intel HD 4400 GPU.

Julia 0.4 was used, the C++ version was compiled with Visual Studio 2013, and for Python the Anaconda distribution with Python 2.7 was used.

The OpenGL function loader in ModernGL has undergone some changes over time.
Starting from a very simple solution, there have been pull requests to include better methods for function loading.
The current approach in ModernGL master was not written by me, but by the GitHub user aaalexandrov.
Before aaalexandrov’s approach, the fastest option used a pretty new Julia feature called staged functions.
It should in principle yield the best performance, as it compiles a specialized version of the function when it gets called for the first time. This is perfect for OpenGL function loading, as the pointer to the function can only be queried after an OpenGL context has been created. When the staged function gets called, the pointer can be queried and inlined into the just-in-time compiled function.

Staged functions only work with the newest Julia build, which is why aaalexandrov’s approach is favorable.
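
Independent of which loading strategy wins, the underlying idea is the same and can be sketched (hedged, not the actual ModernGL code) as: query the function pointer at runtime and ccall through it. glXGetProcAddress is the Linux entry point for this; on Windows, where the benchmark ran, wglGetProcAddress plays the same role. The library name and the choice of glClearColor are illustrative, and a current OpenGL context is assumed:

```julia
const libGL = "libGL"   # assumed OpenGL library name on Linux

# Resolve the function pointer at runtime.
ptr = ccall((:glXGetProcAddress, libGL), Ptr{Cvoid}, (Cstring,), "glClearColor")
ptr == C_NULL && error("could not resolve glClearColor")

# Call glClearColor(r, g, b, a) through the freshly queried pointer.
ccall(ptr, Cvoid, (Float32, Float32, Float32, Float32), 0f0, 0f0, 0f0, 1f0)
```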

ModernGL seems to do pretty well compared to C++, while Python does very badly, being up to 470 times slower in the case of glClearColor.
Julia, in contrast, offers nearly the same speed as calling OpenGL functions from C++, as can be seen in the table.
As all the OpenGL wrappers are pretty mature by now and bind to the same C library (the video driver), this should mainly be a C function call benchmark.
Python performs badly here, but it must be noted that there are a lot of different Python distributions and some promise to have better C interoperability.
As this benchmark’s goal is to show that Julia’s ccall interface is comparable to a C function call from inside C++, the Python options have not been researched that thoroughly.
From this benchmark it can be concluded that Julia offers a solid basis for an OpenGL wrapper library.

The code and results can be found on GitHub.