Author Archives: David Anthoff

Query.jl v0.9.x released

I just released Query.jl
v0.9.0. The new version adds the @take and @drop standalone query
operators and brings pretty printing to uncollected queries.

Pretty printing

In previous versions, displaying a query in the REPL showed a really
awful mess of internal data. In practice one always had to collect a
query into something like a DataFrame to get a readable view of the
result. The new version changes that and provides nicely formatted output
for any query, even an uncollected one. Here is an example:

julia> using FileIO, Query, CSVFiles

julia> filename = "https://gist.githubusercontent.com/davidanthoff/bebfd24c1a3f32f576eb61bee77f5944/raw/dd9233ad860037a2155f3a9ca3c37eb2d5572573/testdata2.csv";

julia> load(filename) |> @map({_.Year, _.Cause_Name})
15028x2 query result
Year │ Cause_Name
─────┼───────────────────────
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
... with 15018 more rows

The pretty printing should work for the values returned from any of the
query operators. The output format is heavily inspired by R’s tibbles.
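To make the tibble-style layout concrete, here is a minimal sketch in plain Julia that prints the dimensions, a header, a separator, the first n rows and a count of the rows that were left out. The function preview_table is hypothetical and is not how Query.jl implements its printing; it assumes a non-empty vector of NamedTuples that all share the same fields.

```julia
# Hypothetical helper, NOT part of Query.jl: renders the first `n` rows of a
# non-empty vector of NamedTuples in a tibble-like layout.
function preview_table(rows::AbstractVector{<:NamedTuple}; n::Int = 10)
    cols = collect(keys(first(rows)))       # column names taken from the first row
    shown = rows[1:min(n, length(rows))]    # the rows that will actually be printed
    # Each column is as wide as its widest entry (header or shown cell).
    widths = [maximum(length.([string(c); [string(r[c]) for r in shown]])) for c in cols]
    io = IOBuffer()
    println(io, length(rows), "x", length(cols), " query result")
    println(io, join([rpad(string(c), w) for (c, w) in zip(cols, widths)], " │ "))
    println(io, join([repeat("─", w) for w in widths], "─┼─"))
    for r in shown
        println(io, join([rpad(string(r[c]), w) for (c, w) in zip(cols, widths)], " │ "))
    end
    remaining = length(rows) - length(shown)
    remaining > 0 && println(io, "... with ", remaining, " more rows")
    return String(take!(io))
end

rows = [(Year = 1999, Cause = "A"), (Year = 2000, Cause = "B"), (Year = 2001, Cause = "C")]
print(preview_table(rows; n = 2))
```

Returning a String instead of printing directly keeps the sketch easy to test; a real implementation would more likely hook into Base.show for the query type.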

I hope this will make interactive work much more pleasant because it
should be easier to build up more complicated queries step by step, while
periodically running a query to check intermediate results.

I also plan to bring this pretty printing to all the tabular file IO
packages of the iterable tables ecosystem at a later date (e.g.
CSVFiles.jl, FeatherFiles.jl, ExcelFiles.jl, StatFiles.jl etc.).

The @take and @drop query commands

Both are fairly straightforward: each removes elements from a sequence.
@take(n) keeps at most the first n elements, and @drop(n) skips the
first n elements and passes on the rest. Here is an example of how one
can use them:

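using FileIO, Query, CSVFiles, FeatherFiles

filename = "https://gist.githubusercontent.com/davidanthoff/bebfd24c1a3f32f576eb61bee77f5944/raw/dd9233ad860037a2155f3a9ca3c37eb2d5572573/testdata2.csv"

load(filename) |>
    # drop the aggregate "All Causes" rows and rows with a missing rate
    @filter(_.Cause_Name!="All Causes" && !isnull(_.Age_adjusted_Death_Rate)) |>
    # one group per cause of death
    @groupby(_.Cause_Name) |>
    # total age adjusted death rate per cause
    @map({cause=_.key, death_rate=sum(_..Age_adjusted_Death_Rate)}) |>
    # highest total first
    @orderby_descending(_.death_rate) |>
    # skip the two highest ranked causes ...
    @drop(2) |>
    # ... and keep the next three
    @take(3) |>
    save("output.feather")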

This example showcases a whole range of features, including the use of
the @drop and @take operations. The official documentation for
these two new operators is in the “Experimental Features” section in the
Query.jl documentation.
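For comparison, here is a rough plain-Julia analogue of the pipeline above, run on a handful of made-up rows instead of the CSV file. It is only a sketch of the semantics, not how Query.jl evaluates queries: missing stands in for the null values that the original tests with isnull, and Base's Iterators.drop and Iterators.take play the roles of @drop and @take.

```julia
# Made-up sample rows; `Rate` stands in for `Age_adjusted_Death_Rate`.
rows = [
    (Cause_Name = "Cancer",     Rate = 150.0),
    (Cause_Name = "Cancer",     Rate = 140.0),
    (Cause_Name = "Stroke",     Rate = 40.0),
    (Cause_Name = "Suicide",    Rate = 12.0),
    (Cause_Name = "Flu",        Rate = missing),
    (Cause_Name = "Heart",      Rate = 170.0),
    (Cause_Name = "Kidney",     Rate = 13.0),
    (Cause_Name = "All Causes", Rate = 800.0),
]

# @filter: drop the aggregate row and rows without a rate
kept = filter(r -> r.Cause_Name != "All Causes" && !ismissing(r.Rate), rows)

# @groupby + @map: sum the rate within each cause
totals = Dict{String,Float64}()
for r in kept
    totals[r.Cause_Name] = get(totals, r.Cause_Name, 0.0) + r.Rate
end
result = [(cause = k, death_rate = v) for (k, v) in totals]

# @orderby_descending: highest total first
sort!(result, by = x -> x.death_rate, rev = true)

# @drop(2) |> @take(3): skip the top two, keep the next three
page = collect(Iterators.take(Iterators.drop(result, 2), 3))
```

Because both operators work lazily on iterators, combining @drop and @take like this is a simple way to page through a sorted result.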

Any feedback on these new features (and old ones) is most welcome, and of
course any help with the overall package would also be fantastic!

This post is being discussed here.

Query.jl v0.8.x released

I just released Query.jl
v0.8.0. The new version has some breaking renames in the experimental
parts of the package (that is why they are experimental!), extends the
set of experimental standalone query commands, adds a slight twist to
the experimental anonymous function syntax and ships with a whole bunch
of package refactoring under the hood.

Renamed experimental standalone query commands

The @select standalone command was renamed to @map, and the
@where command to @filter. These names are more in line with
Julia conventions.
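The new names mirror Julia's own map and filter functions from Base. A minimal illustration in plain Julia; the commented-out Query.jl line is an assumed equivalent that is not executed here:

```julia
xs = [1, 2, 3, 4, 5]

# Base Julia: filter keeps the elements for which the predicate is true,
# map applies a function to every remaining element.
evens_squared = map(x -> x^2, filter(iseven, xs))

# The renamed standalone commands should read the same way (not run here):
# xs |> @filter(iseven(_)) |> @map(_^2) |> collect
```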

This change only applies to the experimental standalone versions of the
query commands that you would use with the pipe operator. Nothing has
changed in the LINQ syntax: that part retains the SQL-like terms from
LINQ. I have no plans to change those terms going forward, so this rename
does not signal any planned breaking changes to the stable LINQ-style
part of Query.jl.

New experimental standalone query commands

This release adds support for the @groupjoin, @join and
@mapmany standalone query commands, i.e. you can now use them with
the pipe syntax. The Query.jl documentation describes the arguments to
those commands.
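These commands follow LINQ's Join, GroupJoin and SelectMany operators. The sketch below spells out their semantics with plain Julia comprehensions on made-up data; join_sketch, groupjoin_sketch and mapmany_sketch are hypothetical helpers, not Query.jl functions, and the argument order of the real commands may differ (see the Query.jl documentation). Note that every result selector takes two arguments, which is why the anonymous function syntax needed the extension described next.

```julia
# join: pair each outer element with every inner element whose key matches.
join_sketch(outer, inner, outer_key, inner_key, result) =
    [result(o, i) for o in outer for i in inner if outer_key(o) == inner_key(i)]

# groupjoin: pair each outer element with the (possibly empty) vector of its matches.
groupjoin_sketch(outer, inner, outer_key, inner_key, result) =
    [result(o, [i for i in inner if outer_key(o) == inner_key(i)]) for o in outer]

# mapmany: map each element to a collection, flatten, and combine child with parent.
mapmany_sketch(source, collection, result) =
    [result(x, y) for x in source for y in collection(x)]

people = [(name = "Ann", dept = 1), (name = "Bob", dept = 2),
          (name = "Cy", dept = 3), (name = "Dee", dept = 1)]
depts = [(id = 1, title = "Sales"), (id = 2, title = "IT")]

joined = join_sketch(people, depts, p -> p.dept, d -> d.id,
                     (p, d) -> (name = p.name, title = d.title))
grouped = groupjoin_sketch(depts, people, d -> d.id, p -> p.dept,
                           (d, ps) -> (title = d.title, n = length(ps)))
flat = mapmany_sketch(["ab", "c"], s -> collect(s), (s, ch) -> (word = s, letter = ch))
```

Cy has no matching department, so the inner join drops him, while the group join keeps every department even if its group were empty.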

Support for two arguments in the experimental anonymous function syntax

Some of the new standalone query commands require an anonymous function
that takes two arguments. The existing experimental shortcut syntax for
creating anonymous functions has been extended to support that scenario.
To create a two-argument anonymous function, simply use both
_ and __ (double underscore) in the expression that should be
turned into an anonymous function. For example, {a=_, b=__} will be
translated into (i1,i2)->{a=i1, b=i2}.
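The rewrite itself is a simple syntax transformation. Here is a minimal sketch of how such a macro could work, walking the expression tree and replacing _ and __ with the two arguments of a new anonymous function. The macro name @two_arg and the helper replace_placeholders are hypothetical and this is not Query.jl's implementation; it also uses Julia's native named tuple syntax (a = ..., b = ...) rather than Query.jl's {} syntax.

```julia
# Leaves of the expression tree: swap the placeholders, keep everything else.
function replace_placeholders(ex, a, b)
    ex === :_ && return a      # `_` becomes the first argument
    ex === :__ && return b     # `__` becomes the second argument
    return ex
end
# Compound expressions: rebuild them with their children rewritten.
replace_placeholders(ex::Expr, a, b) =
    Expr(ex.head, map(x -> replace_placeholders(x, a, b), ex.args)...)

# Hypothetical macro: turns an expression using `_` and `__` into `(i1, i2) -> ...`.
macro two_arg(ex)
    a, b = gensym(:i1), gensym(:i2)   # fresh argument names that cannot clash
    body = replace_placeholders(ex, a, b)
    return esc(:(($a, $b) -> $body))
end

f = @two_arg (a = _, b = __)   # behaves like (i1, i2) -> (a = i1, b = i2)
g = @two_arg _ + 10 * __       # behaves like (i1, i2) -> i1 + 10 * i2
```

The substitution has to happen in a macro because Julia does not allow all-underscore identifiers to be read as ordinary variables; the macro removes them before the code is ever lowered.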

Package refactoring

This release continues the breakup of the very large package
Query.jl into smaller packages
that do specific things. The packages now divide the work as follows:

  • IteratorInterfaceExtensions.jl
    defines a number of small extensions to the base Julia iterator interface
    that are used by both Query.jl
    and the iterable tables universe.
  • QueryOperators.jl
    contains the definition of the query operators and the default iterator
    based backend implementation.
  • Query.jl now contains the
    syntax for the two supported front-ends: the traditional LINQ style
    syntax and the new experimental standalone commands.
  • TableTraits.jl defines
    a very minimal interface for tabular data interop.
  • TableTraitsUtils.jl
    provides some helper functions that make it easier to implement the
    interface defined in TableTraits.jl.
    Some packages use this to implement the table traits interface, but
    others don’t need it.
  • IterableTables.jl
    contains all the integrations for various packages with the iterable
    tables ecosystem that have not yet moved into those packages themselves.

This refactoring significantly simplifies the dependency situation with
these packages. The dependency graph now looks roughly like this:


                                             IterableTables
                                            /
                                           /- TableTraitsUtils
                                          /
                              TableTraits         
                            /
IteratorInterfaceExtensions
                            \
                              QueryOperators
                                            \
                                             Query

Thanks

Thanks as always to all the folks that contributed to this effort with
bug reports, suggestions and PRs. This release also got some new
benchmarks that were contributed by floswald
that will hopefully trigger some performance improvements going forward.

Please do report any bugs and suggestions back! And help with this whole
effort is of course also always most welcome.

This post is being discussed here.