Author Archives: Christian Groll

Element-wise mathematical operators and iterator slides

By: Christian Groll

Re-posted from: http://grollchristian.wordpress.com/2014/08/06/iterators-and-comprehensions-slides/

I recently took part in a rather extensive discussion on the julia-stats mailing list about mathematical operators for DataFrames in Julia. Although I still do not agree with all of the arguments that were made (at least not yet), I once again came away with a very comforting feeling about the lively and engaged Julia community. Even one of the most active and busiest community members, John Myles White, took the time to explain his point of view in detail, and to me this might be the even greater good. Different opinions will always be part of any community. But it is the transparency of the discussions that tells you how strong a community is.

Still, mathematical operators are important to me, as I quite frequently work with strictly real numeric data: no Strings, and no columns of categorical IDs. Given Julia’s expressive language, it would be quite easy to implement any desired mathematical operators for DataFrames on my own. However, I decided to follow what seems to be the consensus of the DataFrame developers, and hence refrain from any individual deviations in this direction. Instead, I decided to simply relate any element-wise operators of multi-column DataFrames to DataArray arithmetic, which allows most mathematical operators for individual columns. Viewed from this perspective, element-wise DataFrame operators are nothing but operators that are successively applied to the individual columns of a DataFrame, which are DataArrays.
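As a rough sketch of this column-by-column view, the snippet below applies an element-wise operator to two collections of named numeric columns. It is written in current Julia 1.x syntax and uses plain vectors as a stand-in for a DataFrame with DataArray columns; the function name `columnwise` and the sample data are hypothetical and not part of any package.

```julia
# Hypothetical stand-in for a DataFrame: an ordered list of name => column pairs.
# Plain Base Julia only; no DataFrames or DataArrays package is involved.
cols_a = [:x => [1.0, 2.0, 3.0], :y => [10.0, 20.0, 30.0]]
cols_b = [:x => [0.5, 0.5, 0.5], :y => [1.0, 2.0, 3.0]]

# Apply the element-wise operator `op` to each pair of matching columns.
# A minimal upfront check makes sure the column names agree.
function columnwise(op, a, b)
    names_a = [p.first for p in a]
    names_b = [p.first for p in b]
    names_a == names_b || error("column names differ")
    return [pa.first => op.(pa.second, pb.second) for (pa, pb) in zip(a, b)]
end

result = columnwise(+, cols_a, cols_b)
```

The operator itself never sees the whole table. It is only ever applied to one pair of columns at a time, which is the perspective described above.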

As a consequence, I had to deepen my understanding of iterators, comprehensions and functions like vcat, map and reduce. For future reference, I summed up my insights in a slide deck, which anybody who is interested can find here, or as part of my IJulia notebook collection here.
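To give a flavor of these building blocks, here is a minimal sketch in current Julia 1.x syntax. It is not taken from the slides; the variable names and sample data are made up for illustration.

```julia
# Three numeric columns, stored as a vector of vectors (illustrative data).
columns = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Comprehension: build a new array with one value per column.
col_sums = [sum(c) for c in columns]

# map: apply a function to every column.
col_means = map(c -> sum(c) / length(c), columns)

# reduce with vcat: concatenate all columns into one long vector.
stacked = reduce(vcat, columns)

# reduce with an element-wise operator: add up all columns entry by entry.
total = reduce((a, b) -> a .+ b, columns)
```

The last line is the pattern that matters for element-wise arithmetic: `reduce` walks over the columns and combines them pairwise with a broadcast operator.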

For those of you who are using the TimeData package, the current road-map regarding mathematical operators is the following: any types that are constrained to numeric values only (including the extension to NA values) will carry on providing mathematical operators. These operators perform some minimal checks upfront, in order to minimize the risk of meaningless applications (for example, only adding up columns with equal names, equal dates, …). Conversely, for any type that allows values other than numeric data, these mathematical operators will not be defined. Hence, anybody in need of element-wise arithmetic for numeric data can easily make use of either the Timematr or the Timenum type (even if you do not need any time index). If you do, however, make sure not to mix up real numeric data and categorical data: applying mathematical operators or statistical functions like mean to something like customer IDs will most likely lead to meaningless results.

Filed under: Julia Tagged: iterators, map, slides

Julia syntax features

By: Christian Groll

Re-posted from: http://grollchristian.wordpress.com/2014/07/20/julia-syntax-features/

In one of my last posts I already tried to point out some advantages of Julia. Two of the main arguments are quite easily made: Julia is comparatively fast, and it is free and open source. In addition, however, Julia also has a very powerful and expressive syntax compared to other programming languages, although this advantage is perhaps less obvious. Hence, I recently gave a short talk where I tried to expand a little on this point, while also showing some of the convenient publishing features of the IJulia backend. I thought I’d share the outcome with you, in case anyone else can use the slides to convince some people of Julia’s powerful syntax. In addition to the slides, you can also access the presentation rendered as an IJulia notebook here.

Filed under: Julia Tagged: ijulia, Julia, slides

Julia language: A letter of recommendation

By: Christian Groll

Re-posted from: http://grollchristian.wordpress.com/2014/04/25/julia-language-recommendation/

After spending quite some time using Julia (a programming language for technical computing) during the last few months, I am by now confident enough to provide a kind of “letter of recommendation”. Hence, I decided to list some of the features that make Julia appealing to me, while also interspersing some resources on Julia that I found helpful and worth sharing.

1 It is free

The Julia language is developed under the MIT open source license and hence can be used free of charge. Open source is a highly desirable feature to me, especially in research, as it promotes cooperation and interchange between researchers. That being said, Julia easily stands up to any comparison with proprietary software (like MATLAB) as well, and I hope this will become clear in the following.

2 It is fast

Although I have not made any formal performance comparisons with other programming languages so far, I can at least assure you that Julia feels quite fast to me in standard day-to-day usage, especially in comparison to R. The homepage, however, lists some formal benchmarks that indicate really good performance compared to other languages. Of course, these are just some made-up test cases. An objective comparison in real applications is quite hard to achieve, since languages like R make substantial use of C code under the hood in almost any computationally intensive package. For the sake of both efficiency and reliability, however, I think that researchers should generally avoid low-level languages like C. As most researchers never received true and deep training in software development, such low-level languages are simply too error-prone, especially if you refrain from thought-out, extensive and well-structured software testing. So, leaving aside the option of factoring out parts of the code into C, I am quite confident that Julia truly is faster than R, and at least as fast as MATLAB. Furthermore, Julia was allegedly designed from the start to enable things like parallel computing and big data handling.

Nothing comes without a price, however, and truly leveraging Julia's performance capabilities forces you to deal with data types more explicitly. This happens in Julia to a far lower degree than in C, but it can still be unintuitive and cumbersome at some points, and it especially complicates the learning process in the beginning. On the other hand, dealing with types more explicitly also brings some additional benefits like multiple dispatch, where the behavior of a function can be defined across many combinations of argument types.
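A minimal sketch of multiple dispatch, in current Julia 1.x syntax, might look as follows. The function name `describe` is purely hypothetical; the point is only that Julia picks the method based on the types of all arguments.

```julia
# One function name, several methods; the method is chosen from the
# types of ALL arguments, not just the first one.
describe(x::Number, y::Number) = "two numbers"
describe(x::AbstractString, y::Number) = "a string and a number"
describe(x, y) = "something else"   # fallback for any other combination

a = describe(1, 2.5)      # Int and Float64 are both Numbers
b = describe("id", 1)     # String first, Number second
c = describe([1], "id")   # no specific method matches, so the fallback is used
```

Adding behavior for a new combination of types then only requires one more method definition, without touching the existing ones.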

3 It is expressive

The next selling point is probably a bit underestimated in general, since its true benefit is harder to grasp than, for example, pure speed. The point is that the syntax of Julia is very rich and expressive. Having my roots in MATLAB, I generally favor the syntax of Julia (which is fundamentally similar to MATLAB) over the syntax of R, as it appears cleaner to me (of course, this is a matter of taste!). More generally, however, the syntax of Julia is much richer, such that it allows a high level of customization. For example, Julia is able to mimic R formula syntax, as is done in the GLM package. Furthermore, you can build your own types that behave exactly the way you want. For example, one of my first projects in Julia was to create a type that is especially suited for time series data (I wrapped it up in the package TimeData). Thereby, I could specify, for example, how objects are displayed, that mathematical functions are not applied to the time index column but to numeric data only, and that entries can be accessed and indexed through date strings. The rich syntax also includes meta-programming capabilities (generating code through code) and macros, making unit testing in Julia as straightforward as it could be (yes, I really think that software testing is indispensable when software is used and disclosed in research!). In order to get a feeling for the richness of the syntax and the possibilities that it provides, you could, as an example, check out a use case of iterators in Julia shown in this post. Or, for an impression of the compactness and intuitive appeal of the syntax, there is a blog post on Econometrics by Simulation that compares it to R.
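To illustrate what such a custom type could look like, here is a deliberately tiny sketch in current Julia 1.x syntax. It is not the TimeData implementation: the type name `TinySeries`, its fields and all sample values are hypothetical, and only the Dates standard library is used.

```julia
using Dates

# Hypothetical minimal time series type (NOT the TimeData implementation).
struct TinySeries
    dates::Vector{Date}
    values::Vector{Float64}
end

# Arithmetic only touches the numeric values; the time index is carried over.
Base.:+(ts::TinySeries, x::Real) = TinySeries(ts.dates, ts.values .+ x)

# Entries can be looked up through a date string such as "2014-01-02".
# (An unknown date makes findfirst return nothing, so indexing throws an error.)
function Base.getindex(ts::TinySeries, d::AbstractString)
    idx = findfirst(==(Date(d)), ts.dates)
    return ts.values[idx]
end

# Custom display: one "date  value" line per observation.
function Base.show(io::IO, ts::TinySeries)
    for (d, v) in zip(ts.dates, ts.values)
        println(io, d, "  ", v)
    end
end

ts = TinySeries([Date(2014, 1, 1), Date(2014, 1, 2)], [1.0, 2.0])
shifted = ts + 10
entry = shifted["2014-01-02"]
```

Each of the three customizations is just a new method of an existing Base function (`+`, `getindex`, `show`) for the new type, which is why this kind of tailoring takes so little code.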

4 It is transparent

What I really like about Julia is that it was based on open source software development practices right from the start, such that it is tightly integrated with version control through git and github. This way, the complete code base is easily accessible and readable, and the github platform provides the best environment for further contributions and cooperation. For example, code improvements from any third party can easily be integrated into the code base, ultimately promoting cooperation. The times when you had to send your code extension to some package author via email are gone. Also, github allows the use of automated testing services like Travis, such that code ultimately becomes more robust, with fewer bugs.

5 It is illustrative

Meanwhile, there already are a number of graphics packages out there, for both on-the-fly visualizations and publication-ready graphics, as well as for graphics suited to html formats.

What is more, there also exists an interface to Python’s interactive graphical notebook (IJulia), which allows you to combine code, formatted text, math, and multimedia in a single document.

6 It is growing

Due to its outstanding features, Julia also seems to get increasing attention worldwide (you can find some numbers and graphics on the community in this post). With JuliaStudio, there even exists an integrated development environment similar to RStudio already.

7 It is unfinished

Despite all these positive features, there are also some deficiencies that still need to be overcome. As a matter of fact, we have not yet reached an official 1.0 release of the language. Hence, code development can sometimes be a little more cumbersome than necessary, for example due to the following problems:

  • sometimes Julia still crashes and needs to be re-started
  • variables cannot be completely removed from the workspace
  • in my opinion, MATLAB still has by far the best debugger – Julia does lag behind here
  • any changes to type definitions require re-starting of Julia

And, of course, as a comparatively new language Julia still lacks some of the extensive libraries that have already been implemented for other languages and still need to be brought over to Julia. (That being said, there already exist quite good interfaces to other languages, which make some of these libraries available in Julia as well.)

However, keep in mind that this list of deficiencies is only a description of the state of Julia at the time of writing, and I am quite confident that it will be outdated rather soon.

8 Resources

Besides the helping material provided by the official homepage, here are some links to additional resources that I found helpful:

Filed under: Julia Tagged: Julia