By: Christian Groll
I recently took part in a fairly elaborate discussion on the julia-stats mailing list about mathematical operators for DataFrames in Julia. Although I still do not agree with all of the arguments that were put forward (at least not yet), the discussion once again left me with a very reassuring impression of how lively and engaged the Julia community is. Even John Myles White, one of the most active and busiest community members, took the time to explain his point of view in detail, and to me that may well be the greater good. Different opinions will always be part of any community, but it is the transparency of its discussions that tells you how strong a community is.
Still, mathematical operators are important to me, as I quite frequently work with strictly real numeric data: no Strings, and no columns of categorical IDs. Given Julia’s expressive language, it would be quite easy to implement any desired mathematical operators for DataFrames on my own. However, I decided to follow what seems to be the consensus of the DataFrame developers and refrain from any individual deviations in this direction. Instead, I decided to simply relate element-wise operators on multi-column DataFrames to DataArray arithmetic, which allows most mathematical operators for individual columns. Viewed from this perspective, element-wise DataFrame operators are nothing more than operators applied successively to the individual columns of a DataFrame, which are DataArrays.
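To make the column-by-column view concrete, here is a minimal sketch. It is not the DataFrames or DataArrays API: a plain NamedTuple of vectors stands in for a DataFrame so that the example has no dependencies, missing plays the role of NA, and colwise_add is a hypothetical helper name chosen for illustration.

```julia
# Assumption: a NamedTuple of vectors stands in for a DataFrame,
# and `missing` stands in for NA.
prices = (a = [1.0, 2.0, 3.0], b = [10.0, missing, 30.0])
shifts = (a = [0.5, 0.5, 0.5], b = [1.0, 1.0, 1.0])

# Hypothetical helper: element-wise addition of two "tables" is nothing more
# than vector addition applied to each pair of matching columns in turn.
function colwise_add(x::NamedTuple, y::NamedTuple)
    cols = [x[name] .+ y[name] for name in keys(x)]  # one result vector per column
    return NamedTuple{keys(x)}(Tuple(cols))          # reassemble under the same names
end

total = colwise_add(prices, shifts)
```

The missing entry in column b simply propagates through the addition, which mirrors how NA values are carried along in column arithmetic.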
As a consequence, I had to deepen my understanding of iterators, comprehensions and functions like reduce. For future reference, I summed up my insights in a slide deck, which anybody interested can find here, or as part of my IJulia notebook collection here.
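As a rough illustration of how these three tools fit together, the following sketch iterates over named columns, collects per-column results with a comprehension, and folds all columns into one vector with reduce. It uses only Julia Base; the NamedTuple of vectors is again an assumed stand-in for a DataFrame, and the variable names are made up for this example.

```julia
# Assumption: a NamedTuple of vectors stands in for a multi-column DataFrame.
tbl = (a = [1.0, 2.0, 3.0], b = [10.0, 20.0, 30.0], c = [100.0, 200.0, 300.0])

# Iterator + comprehension: visit each (name, column) pair and keep one
# summary value per column.
col_max = [name => maximum(col) for (name, col) in pairs(tbl)]

# reduce: fold the columns pairwise with element-wise addition,
# which yields the row sums across all columns.
row_sums = reduce((acc, col) -> acc .+ col, values(tbl))
```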
For those of you who are using the TimeData package, the current roadmap regarding mathematical operators is the following: any types that are constrained to numeric values only (including the extension to NA values) will continue to provide mathematical operators. These operators perform some minimal checks upfront in order to minimize the risk of meaningless applications (for example, only adding up columns with equal names, equal dates, …). For any type that allows values other than numeric data, by contrast, these mathematical operators will not be defined. Hence, anybody in need of element-wise arithmetic for numeric data can easily make use of either
Timenum types (even if you do not need any time index). If you do, however, make sure not to mix up real numeric data and categorical data: applying mathematical operators or statistical functions like mean to something like customer IDs will most likely lead to meaningless results.
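The kind of upfront check described above could look roughly like the sketch below. This is not the TimeData implementation: is_numeric_column and checked_add are hypothetical names, a NamedTuple of vectors stands in for the numeric types, and only column names and element types are checked (dates are left out).

```julia
# Hypothetical helper: a column counts as numeric if its elements are real
# numbers, optionally mixed with `missing` (standing in for NA).
is_numeric_column(col) = nonmissingtype(eltype(col)) <: Real

# Hypothetical helper: refuse to add tables whose column names differ or
# whose columns hold anything other than numeric data.
function checked_add(x::NamedTuple, y::NamedTuple)
    keys(x) == keys(y) ||
        throw(ArgumentError("column names differ"))
    all(is_numeric_column, values(x)) && all(is_numeric_column, values(y)) ||
        throw(ArgumentError("non-numeric column"))
    return map((a, b) -> a .+ b, x, y)  # column-by-column addition
end

num = (a = [1.0, 2.0], b = [3.0, missing])
ids = (a = ["c01", "c02"], b = [3.0, 4.0])  # column a holds customer IDs

doubled = checked_add(num, num)   # fine: both tables are purely numeric
# checked_add(num, ids)           # would throw an ArgumentError
```

Rejecting the ID column outright is a design choice of this sketch; it trades flexibility for the guarantee that arithmetic is only ever applied to data where it is meaningful.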