By: Christian Groll

Re-posted from: http://grollchristian.wordpress.com/2014/01/22/julia-inheriting-behavior/

In object-oriented programming languages, classes can inherit from classes higher up in the class hierarchy. This way, methods of the superclass apply to the subclass as well, unless they are explicitly redefined for the subclass. In many regards, super- and subclasses hence behave similarly, allowing the same methods to be applied and offering similar access behavior. Taken together, they form a coherent user interface.

In Julia, such a coherent interface for multiple types requires a little bit of extra work, since Julia does not allow subtyping of composite types. Nevertheless, Julia’s flexibility generally allows composite types to be constructed such that they emulate the behavior of some already existing type. The extra code that this requires can be generated efficiently through metaprogramming.

## 1 Justification of inheritance

Such an emulation or inheritance of behavior is desired in basically two situations.

### 1.1 Implementing constrained subsets

The first situation where behavior inheritance could be useful is when interest lies only in a subset of all possible instances of an already existing type. For example, let’s assume that we want to create a new composite type `Simplex`, which shall contain all points on the $n$-dimensional simplex $\Delta_n$. Thereby, the simplex is defined as the set of $n$-dimensional points with individual entries being positive and summing up to one:

$$\Delta_n = \left\{ x \in \mathbb{R}^n : x_i > 0 \text{ for } i = 1, \ldots, n, \ \sum_{i=1}^{n} x_i = 1 \right\}$$

Clearly, $\Delta_n$ is a subset of $\mathbb{R}^n$. And, for points of $\mathbb{R}^n$, we already have a quite simple and suitable type in Julia: `Array{Float64, 1}`. However, to represent the simplex in Julia, we would like to have a type that corresponds to only a subset of all instances of `Array{Float64, 1}`. This way, we can achieve more robust code, since users will not be able to break the summation constraint too easily by hand. Also, the creation of a new composite type allows overloading function definitions. We hence could create a method `plot(x::Simplex)` which suits the specific characteristics of the subset $\Delta_n$ of $\mathbb{R}^n$.

On the other hand, however, any instance of `Simplex` still represents an element of $\mathbb{R}^n$, and hence ideally should behave as such in Julia as well. Any methods generally working on `Array{Float64, 1}` should also work for instances of `Simplex`. In contrast to some object-oriented languages, Julia allows subtypes only of abstract types, and not of composite types. Hence, even though any mathematical operation on arbitrary points in $\mathbb{R}^n$ is also valid for points of $\Delta_n$, methods of `Array{Float64, 1}` will not automatically apply to instances of type `Simplex`.

A naive way to fix this would be to define an appropriate conversion method that allows instances of `Simplex` to be reinterpreted as instances of `Array{Float64, 1}`. This way, an already existing method of `Array{Float64, 1}` could be made available for type `Simplex`:

```julia
function f(x::Simplex)
    xAsArray = convert(Array{Float64, 1}, x)
    return f(xAsArray)
end
```

Even more efficiently, one could build a tight relation to `Array{Float64, 1}` into type `Simplex` from scratch, defining `Simplex` such that it contains only one field, which is of type `Array{Float64, 1}`.

```julia
type Simplex
    points::Array{Float64, 1}
end
```

This way, the conversion step becomes unnecessary, as methods only need to be delegated to the respective field of `Simplex`:

```julia
f(x::Simplex) = f(x.points)
```

Hence, we now already have a quite good starting point for inheritance of behavior. Instances of type `Simplex`, when plugged into the function, return the same value as if the $n$-dimensional simplex point was simply represented as `Array{Float64, 1}`. Nevertheless, as we will see further on, a lot of intricacies are yet to come. But, before we get there, we first take a look at a second situation where inheritance of behavior might be desired.
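To make the single-field idea a bit more tangible, here is a minimal sketch of how the constraint check and the delegation could fit together. It is written in the syntax of more recent Julia versions (`struct` instead of `type`), and the choice of delegated functions (`length`, `maximum`) as well as the error messages are assumptions made purely for illustration.

```julia
# Hypothetical sketch in current Julia syntax (`struct` replaces `type`).
struct Simplex
    points::Vector{Float64}

    # The inner constructor enforces the simplex constraints on creation.
    function Simplex(points::Vector{Float64})
        all(p -> p > 0, points) ||
            throw(ArgumentError("entries must be positive"))
        isapprox(sum(points), 1.0) ||
            throw(ArgumentError("entries must sum to one"))
        return new(points)
    end
end

# Delegate existing functions to the wrapped vector.
Base.length(x::Simplex) = length(x.points)
Base.maximum(x::Simplex) = maximum(x.points)

s = Simplex([0.2, 0.3, 0.5])
println(length(s))    # number of coordinates
println(maximum(s))   # largest coordinate
```

Note that the check only runs at construction time. The wrapped vector itself is still mutable, so the constraint is protected against accidents rather than against deliberate modification of `s.points`.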

### 1.2 Extending existing types

In the first case we were implementing a true subset relation, with the simplex being a subset of $\mathbb{R}^n$. Here, we will take a given type and try to extend it with some new attribute. For example, let’s assume that we had to deal with numeric time series data that does not have any missing values. Of course, the data could be suitably stored and processed as an instance of the very general type `DataFrame`. Time information would simply be stored in the first column, with values of the observations in the subsequent columns. However, data might be better stored as instances of a type that is more tailored to the specific characteristics of the data. For example, as the data values themselves are numeric (except for the time column), they permit statistical operations such as deriving mean, minimum and maximum values. By separating numeric observations from dates, code becomes more robust, as we are able to better distinguish between functions applying to the numerical part of the data and functions applying to the date information.

For illustration, let’s look at two distinct ways of translating time series data into Julia data types. In the first case, we simply treat the data as a true subset of type `DataFrame`, although in a newly created type `TimeSeriesDf`. This way, we can still overload functions and create methods that take into account the specific characteristics of the data. For example, a method `plot(ts::TimeSeriesDf)` would show date labels on the x-axis. Hence, we define:

```julia
type TimeSeriesDf
    vals::DataFrame
end
```

The problem is that it becomes more tedious to separate dates and numeric data. For example, a method `mean(ts::TimeSeriesDf)` now might be implemented as

```julia
mean(ts::TimeSeriesDf) = mean(ts.vals[:, 2:end])
```

In contrast, we could also define a type `TimeSeries` that has two fields, where the first field deals with dates, and the second field contains numeric observations only.

```julia
type TimeSeries
    dates::Array{Date{ISOCalendar}, 1}
    vals::DataFrame
end
```

The distinction between dates and numeric data hence becomes hard-coded into the data type. Here, a mean method would require no additional indexing:

```julia
mean(ts::TimeSeries) = mean(ts.vals)
```

In a way, instances of type `TimeSeries` are no longer a true subset of type `DataFrame`, as they are characterized through an additional dates field. The type extends a simple `DataFrame` with an additional array of dates. Ideally, such a type should inherit behavior of both individual field types, making its application a natural extension of already existing types. For example, adding a scalar value should take into account field `vals` only:

```julia
+(ts::TimeSeries, x::Number) = +(ts.vals, x)
```

This way, one can achieve a very nice and intuitive way of handling time series data. For more information, simply take a look at package TimeData, where you can see the benefits of type inheritance and type extension in a more elaborate use case.
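As a rough, self-contained sketch of the two-field design, the following block uses only the standard libraries `Dates` and `Statistics` of more recent Julia versions. A plain `Matrix{Float64}` stands in for the `DataFrame` field, and the dimension check in the constructor is an assumption added for illustration. Unlike the one-line `+` method above, the `+` method in this sketch hands its result back to the constructor so that the dates are kept, which is just one possible design.

```julia
using Dates       # standard library: Date
using Statistics  # standard library: mean

# Hypothetical two-field type; a Matrix{Float64} stands in for the DataFrame.
struct TimeSeries
    dates::Vector{Date}
    vals::Matrix{Float64}

    function TimeSeries(dates::Vector{Date}, vals::Matrix{Float64})
        length(dates) == size(vals, 1) ||
            throw(DimensionMismatch("need one date per row of values"))
        return new(dates, vals)
    end
end

# The mean only concerns the numeric field.
Statistics.mean(ts::TimeSeries) = mean(ts.vals)

# Adding a scalar touches the values only; the dates are carried along.
Base.:+(ts::TimeSeries, x::Number) = TimeSeries(ts.dates, ts.vals .+ x)

ts = TimeSeries([Date(2014, 1, 1), Date(2014, 1, 2)], [1.0 2.0; 3.0 4.0])
shifted = ts + 1
println(mean(ts))
println(shifted.vals)
```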

In my opinion, these are the two most common cases where one would like to have a new composite type behaving like some already existent type. In both cases, the key lies in the delegation of functions to certain fields of the composite type.

## 2 Intricacies

Let’s now take a more detailed look at the intricacies that we have to expect in the course of implementation. Thereby, we want to look at the most general case, where we want to simultaneously create multiple new types. Each type shall borrow from the same existing type, and implement multiple functions with possibly multiple methods per function. To avoid theorizing only, we want to look at a concrete example, where we will create types that inherit from `DataFrame`. The first type is called `NumDf`, and it represents `DataFrames` that consist of either numeric values or `NAs` only. The second type is called `ArrayDf`, and it additionally excludes `NAs`. Hence, for `ArrayDf`, the values themselves could be stored as `Array{Float64, 2}`. However, relating it to `DataFrames` still allows us to make use of column names. Furthermore, for reasons that will become clear later, we also make both types subtypes of an abstract type `AbstractDf`. For both types, the inner constructor will first need to check whether the constraints on the data are fulfilled.

```julia
abstract AbstractDf

type NumDf <: AbstractDf
    vals::DataFrame

    function NumDf(df::DataFrame)
        chkForNumericValues(df)
        return new(df)
    end
end

type ArrayDf <: AbstractDf
    vals::DataFrame

    function ArrayDf(df::DataFrame)
        chkForNumericValues(df)
        chkNoNAs(df)
        return new(df)
    end
end
```

In the example, the newly created types consist of only one field, which already is of the same type as the type that should be emulated. However, this restriction is only for reasons of simplicity, and all subsequent propositions could easily be transferred to more complex structures, too. Again, if you are interested in this, simply take a look at package TimeData. It implements very similar types that also borrow functionality from `DataFrames`, but with an additional field for dates:

```julia
type Timenum <: AbstractTimenum
    vals::DataFrame
    dates::DataArray

    function Timenum(vals::DataFrame, dates::DataArray)
        chkDates(dates)
        chkNumDf(vals)
        if size(vals, 1) != length(dates)
            if (length(dates) == 0) | (size(vals, 1) == 0)
                return new(DataFrame([]), DataArray([]))
            end
            error("number of dates must equal number of rows of data")
        end
        return new(vals, dates)
    end
end
```

### 2.1 Type preservation

We have already seen examples of functions that are easily delegated to the respective field of the new type. For functions like `size()` this makes perfect sense, and the method definition becomes:

```julia
size(nd::NumDf) = size(nd.vals)
```

However, this kind of plain delegation is not always how we want methods to behave. By simply delegating all methods this way, we would end up permanently escaping our own data type. For example, simply delegating method `exp(nd::NumDf)` would indeed evaluate the exponential function on each individual entry. However, it would return these values as an instance of type `DataFrame`, as does method `exp(df::DataFrame)`:

```julia
exp(nd::NumDf) = exp(nd.vals)

nd = NumDf(DataFrame(rand(3, 2)));
nd2 = exp(nd);

typeof(nd)
typeof(nd2)
```

```
NumDf (constructor with 1 method)
DataFrame (constructor with 22 methods)
```

Given that the returned values still meet the required constraints, we would like the method to return the resulting values as type `NumDf` again. Otherwise there would be no persistence in our data representation, as we would end up with type `DataFrame` sooner or later anyway.

Hence, we need to write some methods such that the original type of our data will remain unaffected. Methods get delegated to methods of the respective emulated type, but the returned value needs to be transferred back to the inheriting type. Therefore, we simply need to hand over the result to the constructor at the end:

```julia
function exp(nd::NumDf)
    valsDf = exp(nd.vals)
    return NumDf(valsDf)
end
```

We hence need to state more precisely what “inheriting behavior” should mean for any given method: should it return exactly the same output as the emulated type, or should it wrap up the resulting values at the end, in order to return an instance of the same type as the caller type? Both alternatives require slightly different code. Also, this distinction will become relevant in the course of meta-programming for more general cases comprising multiple functions and multiple new types.
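The two alternatives can be put side by side in a small sketch. The type `NumMat` is a hypothetical stand-in that wraps a plain `Matrix{Float64}` instead of a `DataFrame`, so that the block runs without any packages in more recent Julia versions.

```julia
# Hypothetical wrapper around a numeric matrix (stand-in for the DataFrame field).
struct NumMat
    vals::Matrix{Float64}
end

# Non-preserving delegation: return whatever the wrapped type returns.
Base.size(nm::NumMat) = size(nm.vals)

# Type-preserving delegation: hand the result back to the constructor.
Base.abs(nm::NumMat) = NumMat(abs.(nm.vals))

nm = NumMat([-1.0 2.0; 3.0 -4.0])
println(typeof(size(nm)))   # a plain tuple, no wrapper needed
println(typeof(abs(nm)))    # still a NumMat
```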

### 2.2 Multiple method signatures

The next complication arises for functions with multiple method signatures. Due to multiple dispatch, functions may well have multiple methods defined that need to be delegated. For example, you can examine the size of DataFrames with two different methods `size()`:

```julia
size(df::AbstractDataFrame)
size(df::AbstractDataFrame, i::Integer)
```

In principle, each of the two methods could simply be delegated individually.

```julia
size(tn::NumDf) = size(tn.vals)
size(tn::NumDf, i::Integer) = size(tn.vals, i)
```

This, of course, seems to be redundant, as it is quite cumbersome to delegate each method individually. Hence, one might be tempted to tackle both methods in one assignment, making use of variable function arguments:

```julia
size(tn::NumDf, args...) = size(tn.vals, args...)
```

Although this seems quite reasonable, one still needs to be very careful with this approach, since it could easily lead to method ambiguities. For example, let’s take a look at function `.==` for DataFrames. Amongst others, there exist the following methods:

```julia
.==(a::DataFrame,b::NAtype)
.==(a::DataFrame,b::DataFrame)
.==(a::DataFrame,b::Union(Number,String))
```

Using variable function arguments, the delegation would be implemented as:

```julia
.==(x::NumDf, args...) = .==(x.vals, args...)
```

This, however, will lead to the following warning:

```
Warning: New definition
    .==(NumDf,Any) at none:1
is ambiguous with:
    .==(Any,AbstractArray{T,N}) at bitarray.jl:1450.
To fix, define
    .==(NumDf,AbstractArray{T,N})
before the new definition.
```

Hence, there are two methods that could possibly be called for `(NumDf,AbstractArray{T,N})`, and Julia will more or less arbitrarily pick one of them.

Although in this case both methods `.==` would most likely lead to the same result when called with `.==(NumDf,AbstractArray{T,N})` anyway, it is considered poor style to simply ignore these warnings.

Furthermore, the ambiguity is not a side effect of the variable function argument only, but it also appears with the following method definition:

```julia
.==(x::NumDf, y::Any) = .==(x.vals, y)
```

In my opinion, it is better to refrain from extensive method delegations with `args...` or `Any` whenever possible. Even if your own code worked, excessive use of such extensive delegations could still cause problems for other people who want to build on your code.
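To see the ambiguity in isolation, here is a minimal sketch with hypothetical names (`Wrapper`, `compare`) that are not part of DataFrames. It is written for more recent Julia versions, where an ambiguous pair of methods no longer triggers a warning at definition time. Instead, the ambiguous call itself fails with a `MethodError`.

```julia
struct Wrapper
    vals::Vector{Float64}
end

# Hypothetical generic function with an existing method that is specific in
# its second argument only (mirroring `.==(Any, AbstractArray)` in the warning).
compare(a, b::AbstractArray) = "existing method"

# Catch-all delegation: specific in the first argument only.
compare(a::Wrapper, b) = "delegated method"

w = Wrapper([1.0, 2.0])
println(compare(w, 1.0))       # only the delegated method applies
println(compare(2.0, [1.0]))   # only the existing method applies

# For (Wrapper, AbstractArray) neither method is more specific than the other.
try
    compare(w, [1.0, 2.0])
catch err
    println(typeof(err))       # a MethodError reporting the ambiguity
end

# Defining the intersection explicitly resolves the ambiguity.
compare(a::Wrapper, b::AbstractArray) = "explicit method"
println(compare(w, [1.0, 2.0]))
```

The fix is the same one that the warning suggests: add a method for the exact combination of argument types on which the two definitions overlap.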

Nevertheless, this does not mean that you have to type out each individual method definition by hand! For example, a really large part of the functions of DataFrames comes with the exact same method signatures:

```julia
f(b::NAtype,a::DataFrame)
f(a::DataFrame,b::NAtype)
f(a::DataFrame,b::DataFrame)
f(a::DataFrame,b::Union(String,Number))
f(b::Union(String,Number),a::DataFrame)
```

Using macros and metaprogramming, one can easily define all functions with equal method signatures simultaneously in one go.
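A minimal sketch of this idea for a group of binary operators could look as follows. It uses `@eval` in more recent Julia versions, and the hypothetical type `NumMat` wraps a plain `Matrix{Float64}` instead of a `DataFrame`. The operators are applied elementwise via `broadcast`, which is an assumption of this sketch, and the signatures involving `NAtype` are left out because plain arrays have no such type.

```julia
# Hypothetical wrapper type; a Matrix{Float64} stands in for the DataFrame.
struct NumMat
    vals::Matrix{Float64}
end

# All listed functions share the same three method signatures, so the
# definitions are generated in one loop. `broadcast` applies each operator
# elementwise, since the wrapped field is a plain matrix here.
for f in (:+, :-, :*)
    @eval begin
        Base.$f(a::NumMat, b::NumMat) = NumMat(broadcast($f, a.vals, b.vals))
        Base.$f(a::NumMat, b::Number) = NumMat(broadcast($f, a.vals, b))
        Base.$f(a::Number, b::NumMat) = NumMat(broadcast($f, a, b.vals))
    end
end

a = NumMat([1.0 2.0; 3.0 4.0])
println((a + a).vals)
println((2 * a).vals)
println(typeof(a - 1))
```

Because every signature names `NumMat` explicitly in at least one position and the other position is restricted to `NumMat` or `Number`, the generated methods do not rely on `Any` or `args...`.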

## 3 Implementation

Given these intricacies, let’s look at the actual implementation of inheritance for the most general case of multiple inheriting types, multiple functions and possibly multiple methods per function.

In a first step, all functions need to be classified with respect to two criteria:

- the method signatures of the function
- whether the function is type preserving or not

For example, for the case of type `NumDf`, the following table classifies some exemplary functions with respect to three different method signatures.

| method signatures | type preserving | non-preserving |
|---|---|---|
| `f(nd::NumDf)` | `:abs`, `:exp`, `:log` | `:length`, `:isempty` |
| `f(nd::NumDf)`, `f(nd::NumDf, x::Int)` | `:round`, `:floor` | `:size` |
| `f(b::NAtype,a::NumDf)`, `f(a::NumDf,b::NAtype)`, `f(a::NumDf,b::NumDf)`, `f(a::NumDf,b::Union(String,Number))`, `f(b::Union(String,Number),a::NumDf)` | `:+`, `:-`, `:*`, `:.^` | `:.<`, `:.>`, `:.==` |

Once all desired functions are classified, functions within the same combination of method signatures and preservation kind can be defined simultaneously in a loop. One simply needs to iterate over all functions within a block, and interpolate the function names into a suitable quoted expression, which is then evaluated.

Additionally, when multiple new types need to be defined simultaneously, one could also iterate over all types. However, for non-preserving functions, methods equivalently could be expressed with reference to an abstract supertype, so that the additional loop over all new types becomes unnecessary.

Let’s make this clearer through an example implementation for the case of method signatures given by

```julia
f(nd::NumDf)
f(nd::NumDf, x::Int)
```

First, let’s look at the implementation for non-preserving functions. (In this case, the only function within this block is `size`. Nevertheless, we implement it as a loop over all functions, since this provides more flexibility for later extensions.)

```julia
single_or_two_args_non_preserving = [:size]

for f in single_or_two_args_non_preserving
    eval(quote
        function $(f)(nd::AbstractDf)
            return $(f)(nd.vals)
        end

        function $(f)(nd::AbstractDf, i::Integer)
            return $(f)(nd.vals, i)
        end
    end)
end
```

For the case of type preservation, one additionally needs to iterate over all new types.

```julia
single_or_two_args_type_preserving = [:round, :floor]

for t in (:NumDf, :ArrayDf)
    for f in single_or_two_args_type_preserving
        eval(quote
            function $(f)(nd::$(t))
                valuesDf = $(f)(nd.vals)
                return $(t)(valuesDf)
            end

            function $(f)(nd::$(t), i::Integer)
                valuesDf = $(f)(nd.vals, i)
                return $(t)(valuesDf)
            end
        end)
    end
end
```

### 3.1 Special cases

Using the instructions above, you’re now able to emulate behavior of a different type in almost all cases. Still, however, there are some special cases where you will most likely need to find your own solution.

#### 3.1.1 Outer constructors

Outer constructors are different from general functions in that their name will naturally differ for each individual type. Hence, the function name (which is the name of the constructor) needs to be interpolated for each type as well.

```julia
for t in (:NumDf, :ArrayDf)
    eval(quote
        function $(t)(vals::Array{Float64, 2})
            $(t)(DataFrame(vals))
        end
    end)
end
```

#### 3.1.2 Partially type preserving functions

In some cases, you might either want to have partially type preserving functions, or inherit from such functions. For example, take a look at `getindex` for `DataFrames`, which returns different types depending on the input:

```julia
using DataFrames

df = DataFrame(rand(4, 3));

typeof(df[2:4, 1:2])
typeof(df[:, 1])
typeof(df[1, 1])
```

```
DataFrame (constructor with 22 methods)
DataArray{Float64,1} (constructor with 1 method)
Float64
```

If you want to build on such a function, and you do not want to simply give back the same variation of types, you most likely need to manually adapt every single method. Sadly, these functions can be quite cumbersome to implement.
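As a rough illustration of what such manual adaptation might look like, the following sketch defines three `getindex` methods by hand for a hypothetical wrapper `NumMat` around a `Matrix{Float64}` (again in the syntax of more recent Julia versions). Which index combinations should keep the wrapper is a design decision. Here, it is simply assumed that scalar access returns a plain number, while row and column selections stay wrapped.

```julia
# Hypothetical wrapper; a Matrix{Float64} stands in for the DataFrame.
struct NumMat
    vals::Matrix{Float64}
end

# Scalar indexing leaves the wrapper and returns a plain number.
Base.getindex(nm::NumMat, i::Integer, j::Integer) = nm.vals[i, j]

# Selecting rows and columns by range keeps the wrapper type.
Base.getindex(nm::NumMat, rows::AbstractRange, cols::AbstractRange) =
    NumMat(nm.vals[rows, cols])

# A single column is returned as a one-column NumMat rather than a vector.
Base.getindex(nm::NumMat, ::Colon, j::Integer) = NumMat(nm.vals[:, j:j])

nm = NumMat(rand(4, 3))
println(typeof(nm[1, 1]))
println(typeof(nm[2:4, 1:2]))
println(size(nm[:, 1].vals))
```

Each method has to be written out individually because the return type depends on the kind of index, which is exactly what makes these functions hard to generate in a loop.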

## 4 Conclusions

Although Julia does not come with a straightforward way to inherit behavior out of the box, its extensive meta-programming capabilities still provide the flexibility to achieve this with a minimal amount of effort. Still, however, meta-programming might represent quite a hurdle to overcome for beginners, and without it inheritance is rather cumbersome to implement.
