
GLVisualize Benchmark

By: Simon Danisch

Re-posted from: http://randomfantasies.com/2015/05/glvisualize-benchmark/

This is a benchmark comparing GLVisualize against some popular scientific visualization libraries, namely Mayavi, Vispy and Matlab.

There is also a chapter on IJulia, which is not really a plotting library, but can incorporate plots from other libraries.

The biggest problem with benchmarking 3D rendering speed is that no library allows one to reproduce exactly the same conditions and measurements.
Additionally, without extensive knowledge of each library, it is difficult to foresee what actually gets benchmarked.
As an example of why it is difficult to measure the frame rate, consider Vispy. When you enable frame rate measurement, it shows very low frame rates, as it only creates a new frame on demand.
In contrast, GLVisualize has a fixed render loop, which renders as many frames as possible, leading to a totally different number of rendered frames per second (which is admittedly a waste of GPU time and will change in the future).
This is why it was decided to use the threshold at which a similar 3D scene is still perceived as enjoyable and interactive. The minimal frame rate for perceiving movement as smooth is usually around 25 frames per second.
So the benchmark was executed by increasing the number regulating the complexity of the 3D scene until the camera could no longer be moved without stutters. The last enjoyable threshold recorded this way is the result of the benchmark.
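
To make this concrete, here is a minimal sketch of the measurement idea in Julia, using GLFW.jl and ModernGL.jl (this is an illustration, not the actual benchmark code; the glClear stands in for the draw calls of the scene under test). The loop counts completed frames per second, and the scene complexity is raised until that number drops below roughly 25:

    using GLFW, ModernGL

    window = GLFW.CreateWindow(800, 600, "fps probe")
    GLFW.MakeContextCurrent(window)

    frames, t0 = 0, time()
    while !GLFW.WindowShouldClose(window)
        glClear(GL_COLOR_BUFFER_BIT)   # draw calls of the scene under test go here
        GLFW.SwapBuffers(window)
        GLFW.PollEvents()
        frames += 1
        if time() - t0 >= 1.0          # report once per second
            println("fps: ", frames)
            frames, t0 = 0, time()
        end
    end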

[Figure: 3D surface plots rendered with Vispy, Mayavi, and GLVisualize]

The first benchmark is a 3D surface plot, both still and animated. The libraries offering this functionality were Vispy, Mayavi, and Matlab.

Library       Still   Animated
Vispy           300         80
Mayavi          800        150
Matlab          800        450
GLVisualize     900        600

Speedup of GLVisualize:
vs Vispy      9x      56x
vs Mayavi     1.26x   16x
vs Matlab     1.26x   1.7x
Vispy had some issues: the camera movement was never really smooth for the surface example, the normals were missing, and there was no option to color the surface depending on height.
It was decided to use the threshold of going from a little stutter to unpleasant stutters, so that Vispy did not completely fail this benchmark.
For Vispy, it turned out that the normals were calculated on the CPU, resulting in a major slowdown. The same can be expected for Mayavi, but Mayavi seems to be faster at calculating them.
There is not much information available on how Matlab renders its visualizations, as it is closed source. It has to be noted that Matlab did some additional calculations to always fit the drawing ideally into the borders.

On the other hand, Matlab uses a Gouraud shading model, which needs quite a bit less processing power than the Blinn-Phong shading used by GLVisualize.



[Figure: particle plots rendered with GLVisualize and Mayavi]

The next benchmark is only between GLVisualize and Mayavi, as the other libraries did not offer a comparable solution. Matlab does not allow cubes to be used as particle primitives, and Vispy only had an example where you needed to write your own shader, which cannot be considered a serious option. This is a benchmark for easy-to-use, high-level plotting libraries. It is always possible to write an optimized version yourself in some framework; what is really interesting is how well you can solve a problem with the tools the library makes readily available.

Library        Still     Animated
Mayavi           90000       2500
GLVisualize    1000000      40000
Speedup            11x        16x

GLVisualize is an order of magnitude faster in this specific benchmark. This is most likely because GLVisualize uses OpenGL’s native instanced rendering.

On a side note, GLVisualize was the only library that allowed any arbitrary mesh to be used as a particle.
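
For illustration, this is roughly what instancing looks like at the OpenGL level, sketched with ModernGL.jl. Shader, buffer, and vertex array setup are omitted, and position_location, index_count, and n_particles are hypothetical placeholders; the point is that a single draw call submits all particles:

    using ModernGL

    # Advance the position attribute once per instance instead of
    # once per vertex.
    glVertexAttribDivisor(position_location, 1)

    # Draw the cube mesh n_particles times in one call.
    glDrawElementsInstanced(GL_TRIANGLES, index_count, GL_UNSIGNED_INT,
                            C_NULL, n_particles)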



IJulia

It was not possible to compare IJulia directly with GLVisualize, as the feature sets for plotting are too different.

But there are certain factors which indicate that it is hard to reach optimal performance with IJulia.
First of all, IJulia uses ZMQ to bridge the web interface with the Julia kernel.
ZMQ is a messaging system which supports different transports for communication, such as inproc, IPC, TCP, TIPC and multicast.
While it is very fast at its task of sending messages, it cannot compete with the native performance of staying inside one language.
This is not very important as long as there does not need to be much communication between Julia and the IPython kernel.
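
As a rough illustration of what this bridge entails, here is a minimal sketch with ZMQ.jl (the exact API has varied across versions): every update must be serialized, pushed through a socket, and copied back out on the other side, instead of staying in-process:

    using ZMQ

    ctx = Context()
    sender, receiver = Socket(ctx, PUSH), Socket(ctx, PULL)
    bind(receiver, "inproc://plot")
    connect(sender, "inproc://plot")

    # In reality this would be a serialized plot update or animation frame.
    send(sender, "frame data")
    msg = recv(receiver, String)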

This changes drastically for animations, where big memory chunks have to be streamed to the rendering engine of the browser. It can be expected that this will always be a weakness of IJulia.
On the other hand, GPU-accelerated rendering in a web browser is also limited.
It relies on WebGL, which offers only a subset of OpenGL’s functionality. So while the execution speed of WebGL can be expected to be similar, many newer techniques that can speed up rendering are missing.

To investigate this, another benchmark was created.
It compares GLVisualize with Compose3D, which was the only library found that can display 3D models created with Julia directly from the IJulia notebook.
This benchmark is not entirely fair, as Compose3D is so far just a rough prototype which has not even been published yet.
But there seems to be no other library with which you can easily create and display interactive 3D graphics in the IJulia or IPython notebook.
This benchmark creates a Sierpinski gasket; Compose3D displays it in the IJulia notebook while GLVisualize displays it natively in a window.
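
The element counts below are powers of five, which suggests that each cube is subdivided into five smaller cubes per recursion level. A minimal sketch of generating such particle positions in Julia could look as follows (the actual subdivision scheme of the benchmark code may differ):

    # Replace each cube by five half-sized cubes: four at the bottom
    # corners and one centered on top.
    function gasket!(positions, sizes, center, size, depth)
        if depth == 0
            push!(positions, center)
            push!(sizes, size)
            return
        end
        h = size / 4   # offset of the child centers from the parent center
        for o in ((-h,-h,-h), (h,-h,-h), (-h,h,-h), (h,h,-h), (0.0,0.0,h))
            gasket!(positions, sizes, center .+ o, size / 2, depth - 1)
        end
    end

    positions, sizes = NTuple{3,Float64}[], Float64[]
    gasket!(positions, sizes, (0.0, 0.0, 0.0), 1.0, 6)
    length(positions)   # 5^6 = 15625 cubes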

[Figure: Sierpinski gasket]

Compose3D        15625
GLVisualize    1953125
Speedup           125x

Again, GLVisualize is faster, this time by two orders of magnitude. This can change in the future as Compose3D matures.
But it has to be noted that GLVisualize utilizes OpenGL’s instancing to gain this speed. Native instancing is not yet available in WebGL, which means that this optimization will not be available in the IPython notebook in the near future.


All in all, this looks pretty promising for GLVisualize.

It must be said, though, that the numbers need to be treated with care, as it is quite hard to benchmark 3D scenes without full control over the libraries. It might be that I did something wrong in the setup, or that a library actually offers a lot more than GLVisualize, which in turn slows it down.

But it can definitely be said that these are solid results for a pretty fresh prototype competing with fairly mature libraries.


The code is in my GitHub repository.

Speed Expectations for Julia

By: Simon Danisch

Re-posted from: http://randomfantasies.com/2015/05/speed-expectations-for-julia/

In this blog post I want to analyse the speed of Julia a little.
Writing representative benchmarks for a programming language is a very tedious task.
The only way out is to rely on a multitude of sources and to look for analytical arguments.
Julia’s own benchmark suite will be used, in addition to two other benchmarks that I found relevant.
Finally, the general compiler structure of Julia will be briefly analyzed to find indicators for Julia’s overall performance.
[Figure: Julia benchmark suite results across languages]
(data from julialang)
In this first benchmark we can see that Julia stays well within the range of C speed.
In fact, it comes second only to C, with no other language being that close.

Adding to this, Julia also offers a concise and high-level coding style. This unique combination of conciseness and speed is well illustrated in this graphic:

[Figure: code size vs. execution time for the same benchmarks]

(different view of the data from julialang)

This is a very promising first look at Julia, but it should be noted that these benchmarks are mainly written by the Julia core team.
So it is not guaranteed that there is no (unintentional) bias favoring Julia in these benchmarks.

There is another benchmark comparing C++, Julia and F#, created by Palladium Consulting, which should not have any interest in favoring one of the languages.
They compare the performance of C++, Julia and F# on an IBM/370 floating point to IEEE floating point conversion algorithm, as part of a blog series written by Palladium Consulting.
F# comes out last with 748.275 ms, then Julia with 483.769 ms, and finally C++ with 463.474 ms.
At the time of citation, the author had updated the C++ version to achieve 388.668 ms.
It is not said whether the author put additional time into making the other versions faster as well, so it cannot be concluded that the other versions could not have been improved too.

The last Julia benchmark is more real-world oriented.
It compares finite element solvers; the finite element method is widely used in material research and therefore represents a relevant use case for Julia.

N        Julia   FEniCS (Python + C++)   FreeFem++ (C++)
121       0.99    0.67                    0.01
2601      1.07    0.76                    0.05
10201     1.37    1.00                    0.23
40401     2.63    2.09                    1.05
123201    6.29    5.88                    4.03
251001   12.28   12.16                    9.09

(Taken from CodeProject.)
These are remarkable results, considering that the author states that achieving them did not take a big effort. After all, the other libraries are established FEM solvers written in C++, which should not be easy to compete with.

This list could go on, but it is more constructive to look at Julia’s limits analytically.
Julia’s compilation model can be described as statically compiled at run time. This means that, as long as all types can be inferred, Julia will in most cases have performance identical to C++. (See Julia’s performance tips for an idea of what needs to be done in order to achieve this.)
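
A classic illustration of what “all types can be inferred” means in practice: in the first toy function below, the accumulator changes type from Int to Float64, which the compiler must account for, while the second version is type-stable and compiles down to a tight loop. @code_warntype shows the difference:

    # Type-unstable: x starts as an Int and becomes a Float64.
    function unstable(n)
        x = 0
        for i in 1:n
            x += 0.5
        end
        return x
    end

    # Type-stable: x is a Float64 throughout.
    function stable(n)
        x = 0.0
        for i in 1:n
            x += 0.5
        end
        return x
    end

    # @code_warntype unstable(10) highlights the Union{Int64, Float64} variable.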
The biggest remaining difference in this case will be garbage collection.

Julia 0.3 has a mark-and-sweep garbage collector, while Julia 0.4 has an incremental garbage collector.
As seen in the benchmarks, garbage collection does not necessarily introduce big slowdowns.
But there are cases where it introduces a significant slowdown.
Analyzing this further is beyond the scope of this blog post.
But it can be said that Julia’s garbage collector is very young, and only the future will show how big the actual differences will be.
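
A small example of the kind of code where the garbage collector shows up in measurements (an illustration in current Julia syntax, not taken from the benchmarks above): the first version allocates a temporary array on every iteration and therefore triggers GC work, while the second reuses a preallocated buffer:

    using Random: rand!

    # Allocates a fresh temporary array on every iteration.
    function allocating(n)
        total = 0.0
        for i in 1:n
            tmp = rand(1000)
            total += sum(tmp)
        end
        return total
    end

    # Reuses one preallocated buffer; (almost) no GC pressure.
    function preallocated(n)
        tmp = Vector{Float64}(undef, 1000)
        total = 0.0
        for i in 1:n
            rand!(tmp)
            total += sum(tmp)
        end
        return total
    end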

Another big difference stems from differences between compiler technologies.
One of LLVM’s biggest alternatives is gcc, which offers functionality similar to LLVM’s C/C++ front-end, Clang.

If C++ code compiled with gcc is much faster than the same code compiled with LLVM/Clang, the gcc version will also be faster than a comparable Julia program.
In order to investigate the impact of this, one last benchmark series will be analyzed.
It is a summary of a series of articles posted on Phoronix, which benchmarked gcc 4.9 against LLVM 3.5 and LLVM 3.5 against LLVM 3.6.

Statistic   Speedup gcc 4.9 vs LLVM 3.5   Speedup LLVM 3.6 vs 3.5
mean        0.99                          0.99
median      0.97                          1.00
maximum     1.48                          1.10
minimum     0.39                          0.88

Bigger values are better for LLVM.

Sources: gcc vs LLVM, LLVM 3.5 vs LLVM 3.6

The results suggest that LLVM is well within the range of gcc, even though there can be big differences between the two on individual tests.
These are promising results, especially if you consider that LLVM is much younger than gcc.
With big companies like Apple, Google, AMD, Nvidia and Microsoft invested in LLVM, it is to be expected that LLVM will stay competitive.

So Julia should also stay competitive, as it directly profits from any developments in LLVM.

This makes Julia a relatively safe bet for the future!

ModernGL vs GLEW vs PyOpenGL

By: Simon Danisch

Re-posted from: http://randomfantasies.com/2015/05/moderngl-vs-glew-vs-pyopengl/

A benchmark of ModernGL (Julia), GLEW (C++) and PyOpenGL (Python).

[Figure: OpenGL function call benchmark results]

Relative slowdown compared to GLEW:

[Table: relative slowdown per OpenGL function]


Procedure:

Each function gets called 10^7 times in a tight loop, and the execution time of the loop is measured.

This was executed on Windows 8.1 with an Intel i5 CPU and an Intel HD 4400 GPU.

Julia 0.4 was used, the C++ version was compiled with VS2013, and for Python the Anaconda distribution with Python 2.7 was used.
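
On the Julia side, the measurement loop looks roughly like the following sketch (an illustration, not the actual benchmark code, and it assumes an OpenGL context has already been created so the call is valid):

    using ModernGL

    function bench_glclearcolor(n = 10^7)
        t = time()
        for i in 1:n
            glClearColor(0.0f0, 0.0f0, 0.0f0, 1.0f0)
        end
        return time() - t   # seconds for n calls
    end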

The OpenGL function loader in ModernGL has undergone some changes over time.
Starting from a very simple solution, there have been pull requests introducing better methods for loading the functions.
The current approach in ModernGL master was not written by myself, but by the GitHub user aaalexandrov.
Before aaalexandrov’s approach, the fastest option would have used a pretty new Julia feature named staged functions.
In principle it should yield the best performance, as it compiles a specialized version of the function when it gets called for the first time. This is perfect for OpenGL function loading, as the pointer to an OpenGL function can only be queried after an OpenGL context has been created. When the staged function gets called, the pointer can be queried and gets inlined into the just-in-time compiled function.

Staged functions only work with the newest Julia build, which is why aaalexandrov’s approach is favorable.
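
To sketch the staged-function idea (an illustration in current Julia syntax, not the actual ModernGL code; get_gl_proc_address is a hypothetical loader helper): the body runs once per signature, queries the driver for the pointer, and splices it into the compiled method as a plain ccall:

    @generated function gl_clear_color(r::Float32, g::Float32,
                                       b::Float32, a::Float32)
        ptr = get_gl_proc_address("glClearColor")   # hypothetical helper
        return :(ccall($ptr, Cvoid,
                       (Float32, Float32, Float32, Float32), r, g, b, a))
    end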

ModernGL does pretty well compared to C++, while Python does very badly, being up to 470 times slower in the case of glClearColor.
Julia, in contrast, offers nearly the same speed as calling OpenGL functions from C++, as can be seen in the table.
As all the OpenGL wrappers are pretty mature by now and bind to the same C library (the video driver), this should mainly be a benchmark of C function call overhead.
Python performs badly here, but it must be noted that there are a lot of different Python distributions, and some promise better C interoperability.
As the goal of this benchmark is to show that Julia’s ccall interface is comparable to a C function call from inside C++, the Python options have not been researched that thoroughly.
From this benchmark it can be concluded that Julia offers a solid basis for an OpenGL wrapper library.

The code and results can be found on GitHub.