By: n JuliaLang - The Julia programming language n
Re-posted from: https://julialang.org/blog/2022/02/10years/index.html
10 years ago today, we published 'Why we Created Julia' …
By: n JuliaLang - The Julia programming language n
Re-posted from: https://julialang.org/blog/2022/02/10years/index.html
10 years ago today, we published 'Why we Created Julia' …
Re-posted from: https://blog.esciencecenter.nl/can-python-with-julia-be-faster-than-low-level-code-cd71a72fbcf4?source=rss----ab3660314556--julia
By Abel Soares Siqueira and Faruk Diblen

In our last post, we were able to improve Python code using a few lines of Julia code. We were able to achieve a very interesting result without optimizing prematurely or using low-level code. However, what if we want more? In this blog post, we will investigate that.
It is quite common that a developer prototypes with a high-level language, but when the need for speed arises, they eventually move to a low-level language. This is called the “two-language problem”, and Julia was created with the objective of solving this issue (read more on their blog post from 2012). Unfortunately, achieving the desired speedup is not always easy. It depends highly on the problem, and on how much previous work was done trying to tackle it. Today we find out how much more we can speed up our Julia code, and how much effort it took.
Remember that we have set up a GitHub repository with our whole code, and also, that we have a Docker image for reproducibility.
Our first approach to speeding things up is to simulate what C++ is doing. We believe that the C++ version is faster because it can read the data directly as the desired data type. In Julia, we had to read the data as String and then convert it to Int. We don’t know how to do that with Julia. But we know how to do that with C.
Using Julia’s built-in ccall function, we can directly call the C functions to open and close a file, namely fopen and fclose, and call fscanf to read and parse the file at the same time. Our updated Julia code which uses these C functions is below.
Let’s see if that helped increase the speed of our code. We include in our benchmark the previous strategies as well. This new strategy will be called Julia + C parsing.


Our code is much more C-like now, so understanding it requires more knowledge about how C works. However, the code is way faster than our previous implementation. For files with more than 1 million elements, the Julia + C parsing strategy has a 10.38 speedup over the Pure Python strategy, on average. This is almost double the speedup we got with Prealloc Julia, which is an amazing result. For comparison, on average, C++ has a 16.37 speedup.
Our C approach was very fast, and we would like to replicate it with pure Julia. Unfortunately, we could not find anything in Julia to perform the same type of reading as fscanf. However, after some investigation, we found an alternative.
Using the read function of Julia, we can parse the file as a stream of bytes. This way we can manually walk through the file and parse the integers. This is the code:
We denote this strategy Optimized Julia. This version of the code manually keeps track of the sequence of bytes related to integers, so it is much less readable. However, this version achieves an impressive speedup, surpassing the C++ version:


It was not easy to get to this point, and the code itself is convoluted, but we managed to achieve a large speedup in relation to Python using only Julia, another high-level language. The average speedup for files with over 1 million elements is 40.25, which is over 2 times faster than what we got with the C++ strategy. We remark again that the Pure Python and C++ strategies have not been optimized, and that readers can let us know in the comments if they found a better strategy.
So yes, we can achieve a speedup equivalent to a low-level language using Julia.
One thing to keep in mind is that to achieve high speedups, we had to put more effort into getting to that point. This effort comes in diverse ways:
Here’s our conclusion. To achieve a high speedup, we need specialized knowledge which requires a big effort. However, we can conclude as well that, if you are not familiar with either C++ or Julia, then acquiring some knowledge in Julia allows you to get a smaller improvement. That is, a small effort with Julia already gets you some speedup. You can prototype quickly in Julia and get a reasonable result and keep improving that version to get C-like speedups over time.

We hope you have enjoyed the series and that it helps you with your code in any way. Let us know what you think and what you missed. Follow us for more research software content.
Many thanks to our proofreaders and reviewers, Elena Ranguelova, Jason Maassen, Jurrian Spaaks, Patrick Bos, Rob van Nieuwpoort, and Stefan Verhoeven.
Can Python with Julia be faster than low-level code? was originally published in Netherlands eScience Center on Medium, where people are continuing the conversation by highlighting and responding to this story.
Re-posted from: https://bkamins.github.io/julialang/2022/02/11/mappure.html
The map function is borrowed from functional programming languages in Julia.
In many such languages you can safely assume that the functions you pass to
map are pure, meaning that they have no side effects and always return the
same output for the same passed arguments.
In practice, however, you sometimes pass a non-pure function to map. In this
post I present selected cases when this can lead to surprising results.
The post was written under Julia 1.7.0 and PooledArrays.jl 1.4.0.
mapAssume you have a vector of data and you want to add some noise to values stored
in it. Here is an example how you can do it using map:
julia> using Random
julia> Random.seed!(1234);
julia> x = zeros(5)
5-element Vector{Float64}:
0.0
0.0
0.0
0.0
0.0
julia> map(v -> v + randn(), x)
5-element Vector{Float64}:
-0.3597289068234817
1.0872084924285859
-0.4195896169388487
0.7189099374659392
0.4202471777937789
As expected we have added a different random number to every element of the
passed vector.
Let us now move to a case when things get problematic:
julia> using SparseArrays
julia> y = sparse(x)
5-element SparseVector{Float64, Int64} with 0 stored entries
julia> Random.seed!(1234);
julia> map(v -> v + randn(), y)
5-element SparseVector{Float64, Int64} with 5 stored entries:
[1] = -0.359729
[2] = -0.359729
[3] = -0.359729
[4] = -0.359729
[5] = -0.359729
To our surprise we have added the same noise to every element of y. The reason
is that the map function assumes for SparseVector that the function passed
to it is pure. Therefore it is called only once for 0.0 element of sparse
vector (note that technically this element is not stored in the vector).
Let us try another sparse vector:
julia> z = sparse(ones(5)) .- 1.0
5-element SparseVector{Float64, Int64} with 5 stored entries:
[1] = 0.0
[2] = 0.0
[3] = 0.0
[4] = 0.0
[5] = 0.0
julia> Random.seed!(1234);
julia> map(v -> v + randn(), z)
5-element SparseVector{Float64, Int64} with 5 stored entries:
[1] = 1.08721
[2] = -0.41959
[3] = 0.71891
[4] = 0.420247
[5] = -0.685671
What is going on? There are two things to observe:
0.0 values in the vector so for each entry in themap internally calledrandn() one more time for the default 0.0 value (which is not storedThings look complex. The short lesson is: do not use the default map with
sparse vectors when you want to use a non-pure function.
As a side note, the same problem is present with broadcasting:
julia> Random.seed!(1234);
julia> (v -> v + randn()).(y)
5-element SparseVector{Float64, Int64} with 5 stored entries:
[1] = -0.359729
[2] = -0.359729
[3] = -0.359729
[4] = -0.359729
[5] = -0.359729
map with non-pure function and sparse vectorFortunately there is a relatively simple way to resolve our problems. We need to
use the Base.@invoke macro. In this way we can force Julia to use the default
map method (not the optimized one that the SparseArrays module defines).
Here is how you can do it:
julia> Random.seed!(1234);
julia> Base.@invoke map(v -> v + randn(), y)
5-element Vector{Float64}:
-0.3597289068234817
1.0872084924285859
-0.4195896169388487
0.7189099374659392
0.4202471777937789
julia> Random.seed!(1234);
julia> Base.@invoke map(v -> v + randn(), z)
5-element Vector{Float64}:
-0.3597289068234817
1.0872084924285859
-0.4195896169388487
0.7189099374659392
0.4202471777937789
This time we get the same results as for the x vector. Note that the returned
object has Vector type (previously we were getting SparseVector).
The same problem as for SparseVector is present in PooledArrays.jl. However,
here, by default, the map function assumes that the function you pass to it
is not-pure for safety:
julia> using PooledArrays
julia> p = PooledArray(x)
5-element PooledVector{Float64, UInt32, Vector{UInt32}}:
0.0
0.0
0.0
0.0
0.0
julia> Random.seed!(1234);
julia> map(v -> v + randn(), p)
5-element PooledVector{Float64, UInt32, Vector{UInt32}}:
-0.3597289068234817
1.0872084924285859
-0.4195896169388487
0.7189099374659392
0.4202471777937789
If you are sure that you are passing a pure function to map in this case
you can pass pure=true keyword argument to speed things up:
julia> Random.seed!(1234);
julia> map(v -> v + randn(), p, pure=true)
5-element PooledVector{Float64, UInt32, Vector{UInt32}}:
-0.3597289068234817
-0.3597289068234817
-0.3597289068234817
-0.3597289068234817
-0.3597289068234817
(or to break things, as in this case, since we passed a non-pure function)
In summary let me highlight that the situation we have discussed in this post is
a hard design decision. The reason is that most of the time you indeed pass
pure functions to the map function. Therefore having methods that are
optimized for performance makes sense. However, sometimes, you want to perform
mapping using a non-pure function, in which case you can get surprising results.
Hopefully, after reading this post you now know how to handle such cases.
Happy hacking!