New features in DataFrames.jl 1.3: part 3

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2021/12/24/selection.html

Introduction

This post continues the presentation of new features added in DataFrames.jl 1.3.0. This time I will discuss what is new in indexing syntax.

The post was written under Julia 1.7.0 and DataFrames.jl 1.3.1.
To see the deprecation warnings shown in the examples, start Julia with the
--depwarn=yes option.

Adding columns in views

DataFrames.jl 1.3 adds a long-requested feature: adding columns to views.
Since a view can, in general, reorder or drop columns, this is only allowed
if the view was created with : as the column selector (remember that a view
created with : as the column selector always reflects the list of columns of
its parent DataFrame). Here is an example:

julia> using DataFrames

julia> df = DataFrame(a=1:3)
3×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> dfv = @view df[[1,3], :]
2×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     3

julia> dfv[:, :b] = 4:5
4:5

julia> dfv
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼───────────────
   1 │     1       4
   2 │     3       5

julia> df
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼────────────────
   1 │     1        4
   2 │     2  missing
   3 │     3        5

Note that in column :b of df the rows filtered out by the view were filled
with missing, which is why the element type of the column is Int64?.

As noted above, creating new columns is not allowed if a column selector other
than : was passed when creating the view:

julia> dfv = @view df[[1,3], 1:2]
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼───────────────
   1 │     1       4
   2 │     3       5

julia> dfv[:, :c] = 4:5
ERROR: ArgumentError: creating new columns in a SubDataFrame that subsets columns of its parent data frame is disallowed
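
The difference between the two kinds of views can be sketched side by side. This is a minimal illustration assuming DataFrames.jl 1.3 or later; the names v_all, v_sub and added are chosen here for illustration only.

```julia
using DataFrames

df = DataFrame(a=1:3, b=4:6)
v_all = view(df, [1, 3], :)    # created with `:`, so it follows the parent's columns
v_sub = view(df, [1, 3], 1:2)  # created with a fixed column subset

v_all[:, :c] = [10, 20]        # allowed: :c is added to df, row 2 gets missing

# Adding a column through v_sub should throw the ArgumentError shown above.
added = try
    v_sub[:, :d] = [1, 2]
    true
catch err
    err isa ArgumentError || rethrow()
    false
end

names(v_all), names(v_sub), added
```

The view created with : picks up the new column :c, while the view created with 1:2 keeps showing only :a and :b.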

Additionally, it is allowed to replace existing columns in a view when the !
row selector is used, or equivalently with the dfv.col = ... syntax (this works
for any view, as we are not creating new columns):

julia> dfv.a = ["111", "113"]
2-element Vector{String}:
 "111"
 "113"

julia> df
3×2 DataFrame
 Row │ a    b
     │ Any  Int64?
─────┼──────────────
   1 │ 111        4
   2 │ 2    missing
   3 │ 113        5

As you can see, the values that were present in the filtered-out rows are
retained. If the new values have a type that is not allowed by the current
element type of the column, an appropriate type promotion is performed. Here is
another example:

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> dfv = @view df[[1,3], :]
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     3      6

julia> dfv.a = 11.0:12.0
11.0:1.0:12.0

julia> dfv.b = 'a':'b'
'a':1:'b'

julia> df
3×2 DataFrame
 Row │ a        b
     │ Float64  Any
─────┼──────────────
   1 │    11.0  a
   2 │     2.0  5
   3 │    12.0  b
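
The same replacement can be written with the explicit ! row selector, which makes the contrast with in-place assignment through : easier to see. A small sketch, assuming DataFrames.jl 1.3 or later (the eltype_after_* names are only for illustration):

```julia
using DataFrames

df = DataFrame(a=1:3)
dfv = view(df, [1, 3], :)

dfv[:, :a] = [10, 30]      # in-place: writes into the existing Int64 column
eltype_after_inplace = eltype(df.a)

dfv[!, :a] = [1.5, 2.5]    # replace: a new, promoted column is stored in df
eltype_after_replace = eltype(df.a)

df
```

With : the parent column keeps its element type, so the assigned values must be convertible to it. With ! the parent gets a new column whose element type is promoted as needed, and the filtered-out row keeps its old value.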

Why does adding columns in views matter?

The huge benefit of allowing adding columns to views is that we can make all
standard functions like insertcols!, select!, and transform! work on
SubDataFrames. This is very useful if you want to perform some operation only
on rows that meet some condition. Here is a simple example:

julia> df = DataFrame(x = -1:0.2:1)
11×1 DataFrame
 Row │ x
     │ Float64
─────┼─────────
   1 │    -1.0
   2 │    -0.8
   3 │    -0.6
   4 │    -0.4
   5 │    -0.2
   6 │     0.0
   7 │     0.2
   8 │     0.4
   9 │     0.6
  10 │     0.8
  11 │     1.0

julia> transform!(subset(df, :x => ByRow(>(0)), view=true), :x => ByRow(log))
5×2 SubDataFrame
 Row │ x        x_log
     │ Float64  Float64?
─────┼────────────────────
   1 │     0.2  -1.60944
   2 │     0.4  -0.916291
   3 │     0.6  -0.510826
   4 │     0.8  -0.223144
   5 │     1.0   0.0

julia> df
11×2 DataFrame
 Row │ x        x_log
     │ Float64  Float64?
─────┼─────────────────────────
   1 │    -1.0  missing
   2 │    -0.8  missing
   3 │    -0.6  missing
   4 │    -0.4  missing
   5 │    -0.2  missing
   6 │     0.0  missing
   7 │     0.2       -1.60944
   8 │     0.4       -0.916291
   9 │     0.6       -0.510826
  10 │     0.8       -0.223144
  11 │     1.0        0.0

In DataFrames.jl you have to do this in two steps: subset to a view and then
call transform!. However, I hope that the DataFramesMeta.jl and
DataFrameMacros.jl packages will provide a nicer syntax for this in the coming
releases, allowing you to combine transformation and filtering in one step.
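
Until such syntax exists, the two steps can be wrapped in a small helper. The function transform_where! below is hypothetical (it is not part of DataFrames.jl or of any package mentioned here) and is only a sketch of the pattern, assuming DataFrames.jl 1.3 or later:

```julia
using DataFrames

# Hypothetical helper: apply `transform!` only to the rows of `df` for which
# the `subset` condition `cond` holds; other rows get `missing` in new columns.
function transform_where!(df::DataFrame, cond, args...)
    transform!(subset(df, cond; view=true), args...)
    return df
end

df = DataFrame(x = -1:0.5:1)
transform_where!(df, :x => ByRow(>(0)), :x => ByRow(sqrt) => :x_sqrt)
```

Here sqrt is only evaluated for the two rows with positive x, so no DomainError is raised for the negative values.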

Hard deprecation period for broadcasted assignment

With Julia 1.7 out, a long-missing feature is now available: you can add new
columns to a data frame using broadcasting assignment with the setproperty!
syntax (df.col .= ...):

julia> df = DataFrame(a=1:3)
3×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> df.b .= 1
3-element Vector{Int64}:
 1
 1
 1

julia> df
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      1
   3 │     3      1

Views are also supported, in the way described earlier:

julia> dfv = view(df, [1, 3], :)
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     3      1

julia> dfv.c .= 2
2-element Vector{Int64}:
 2
 2

julia> df
3×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64?
─────┼───────────────────────
   1 │     1      1        2
   2 │     2      1  missing
   3 │     3      1        2

In essence, using setproperty! is now almost the same as using the ! row
selector with assignment. Why do I say almost? The one place where it is
inconsistent is broadcasted assignment to an existing column:

julia> df.a .= 10
┌ Warning: In the 1.4 release of DataFrames.jl this operation will allocate a new column instead of performing an in-place assignment. To perform an in-place assignment use `df[:, col] .= ...` instead.
│   caller = top-level scope at REPL[8]:1
└ @ Core REPL[8]:1
3-element Vector{Int64}:
 10
 10
 10

As the warning message says, this inconsistency (which has been known and
discussed for some time) will be fixed in DataFrames.jl 1.4.
We have been holding this change back for several releases in order to clearly
inform users about it in advance.

This change is not only about consistency; it also ensures that we do not
perform accidental conversions where users most likely do not expect them:

julia> df2 = DataFrame(x = 'a':'c')
3×1 DataFrame
 Row │ x
     │ Char
─────┼──────
   1 │ a
   2 │ b
   3 │ c

julia> df2.x .= 104
┌ Warning: In the 1.4 release of DataFrames.jl this operation will allocate a new column instead of performing an in-place assignment. To perform an in-place assignment use `df[:, col] .= ...` instead.
│   caller = top-level scope at REPL[18]:1
└ @ Core REPL[18]:1
3-element Vector{Char}:
 'h': ASCII/Unicode U+0068 (category Ll: Letter, lowercase)
 'h': ASCII/Unicode U+0068 (category Ll: Letter, lowercase)
 'h': ASCII/Unicode U+0068 (category Ll: Letter, lowercase)

julia> df2
3×1 DataFrame
 Row │ x
     │ Char
─────┼──────
   1 │ h
   2 │ h
   3 │ h

Once DataFrames.jl 1.4 is released, you will instead get a data frame with
column :x having element type Int and storing three 104 values.
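
If you want code that behaves the same before and after that change, you can state the intent explicitly with a row selector. A short sketch, assuming DataFrames.jl 1.3 or later (the inplace_result name is only for illustration):

```julia
using DataFrames

df2 = DataFrame(x = 'a':'c')

# In-place broadcast: 104 is converted to Char, the column stays Vector{Char}.
df2[:, :x] .= 104
inplace_result = copy(df2.x)

# Replacing broadcast: a new column is allocated, so the element type is Int.
df2[!, :x] .= 104

df2
```

Neither form relies on the deprecated df2.x .= 104 behavior: the : form is the one recommended in the warning message, and the ! form allocates a new column.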

Conclusions

The changes I have described today are not something a newcomer will use on
the first day of working with DataFrames.jl. However, once you have learned
the basics, more and more advanced queries come up in practice. Improvements
in functionality and consistency of the design of the core of indexing
mechanisms in DataFrames.jl are hopefully going to make these complex
requirements easier to meet.

The Future of Machine Learning and why it looks a lot like Julia

By: Logan Kilpatrick

Re-posted from: https://towardsdatascience.com/the-future-of-machine-learning-and-why-it-looks-a-lot-like-julia-a0e26b51f6a6?source=rss-2c8aac9051d3------2

Everything you need to know about the deep learning and machine learning ecosystem in Julia.

Advent of Code 2021 – Day 17

By: Julia on Eric Burden

Re-posted from: https://www.ericburden.work/blog/2021/12/18/advent-of-code-2021-day-17/

It’s that time of year again! Just like last year, I’ll be posting my solutions to the Advent of Code puzzles. This year, I’ll be solving the puzzles in Julia. I’ll post my solutions and code to GitHub as well. I had a blast with AoC last year, first in R, then as a set of exercises for learning Rust, so I’m really excited to see what this year’s puzzles will bring. If you haven’t given AoC a try, I encourage you to do so along with me!

Day 17 – Trick Shot

Find the problem description HERE.

The Input – Stay on Target

Today’s puzzle input is a single sentence in a text file in the format “target area: x=<x_min>..<x_max>, y=<y_min>..<y_max>”, where the “<…>” values are integers (possibly negative). For this, we’ll just use a regular expression with capture groups for each number, parse those captured numbers to integers, and stuff them into a TargetRange named tuple. That last part is just to give them a nice type alias and help make the subsequent code more readable.

# Helper Functions -------------------------------------------------------------

const TargetRange = @NamedTuple begin
    x_min::Int64
    x_max::Int64
    y_min::Int64
    y_max::Int64
end

function targetrange(vals::Vector{Int64})::TargetRange
    (x_min, x_max, y_min, y_max) = vals
    return (x_min=x_min, x_max=x_max, y_min=y_min, y_max=y_max)
end


# Ingest the Data -------------------------------------------------------------

function ingest(path)
    re = r"x=(-?\d+)\.\.(-?\d+), y=(-?\d+)\.\.(-?\d+)"
    m = match(re, readchomp(path))
    return targetrange([parse(Int, i) for i in m.captures])
end
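
To see what the regular expression captures, here is a small self-contained sketch that applies the same pattern to the sample line from the puzzle description instead of reading a file. The variable names line and vals are illustrative.

```julia
# Sample input from the puzzle description, used in place of a file.
line = "target area: x=20..30, y=-10..-5"

# Same pattern as in `ingest`: four optionally negative integers.
re = r"x=(-?\d+)\.\.(-?\d+), y=(-?\d+)\.\.(-?\d+)"
m = match(re, line)
m === nothing && error("input does not match the expected format")

# Captures come back as strings, so parse each one to an Int.
vals = parse.(Int, m.captures)
```

The captures arrive in the order x_min, x_max, y_min, y_max, which is the order targetrange() expects.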

If I had a gripe about Julia, it would be that using named tuples has proven to result in more performant code than normal structs in several of these puzzles, but you can’t write constructors with the same name as the type alias. It’s a minor gripe, but since these behave so similarly to structs, it seems intuitive that we should be able to create them in the same way as structs. Like I said, it’s a minor gripe, but I’d just rather have TargetRange() instead of targetrange() above. That’s all.
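
A possible workaround, offered as a sketch rather than as the approach used in this post: a concrete NamedTuple type can already be called with a plain tuple of values, so the alias itself can act as a constructor if the vector is converted to a tuple first. The variable names below are illustrative.

```julia
const TargetRange = @NamedTuple begin
    x_min::Int64
    x_max::Int64
    y_min::Int64
    y_max::Int64
end

vals = [20, 30, -10, -5]

# Calling the alias with a tuple fills the fields in declaration order.
target = TargetRange(Tuple(vals))
```

This does not add any new method to NamedTuple, so it avoids extending a Base type. It accepts a single tuple rather than separate positional arguments like a struct constructor would.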

Part One – Show Off

Wait, wait, wait. Hold on. Yesterday, we decoded a bunch of hex values, one bit at a time, and the elves said “HI”!? And we’re just going to move right past that? No wonder we’re in the mood to launch probes at things today! Our goal is to identify the launch velocity (in both the horizontal and vertical direction) that will land our probe in the target zone, while getting some sweet hang (float?) time in the process. This sounds like math…

# Some Useful Data Structures --------------------------------------------------

# I find it helpful to name my Types according to their use. 
const Velocity = @NamedTuple{x::Int64, y::Int64}
const Position = @NamedTuple{x::Int64, y::Int64}
const VelocityRange = @NamedTuple{x::UnitRange{Int64}, y::StepRange{Int64,Int64}}

# Helper Functions -------------------------------------------------------------

# These little functions do a lot of work. 
# - `triangle()` returns the `nth` triangle number (or Gaussian sum), the sum of
#   the sequence of numbers up to and including `n`.
# - `distanceat()` calculates the distance traveled for a given velocity of `V`
#   at time `T`. Because the puzzle description indicates that velocity degrades
#   at a constant rate over time, the general formula for distance is
#   `position at time T = (initial velocity x T) - loss in velocity`, where the
#   loss in velocity at any given time is the sum of all the prior losses.
# - `peak()` returns the maximum distance traveled for initial velocity `V₀`.
#   This corresponds to the distance when the time matches the initial velocity,
#   after which the accumulated velocity loss is larger than `V₀ × T`.
triangle(n)       = (n^2 + n) ÷ 2
distanceat(V₀, T) = (V₀ * T) - triangle(T - 1)
peak(V₀)          =  distanceat(V₀, V₀)

# Given an initial velocity and a time, calculate the position of the launched
# probe at `time`. In the x-direction, this is either the distance at `time`, or
# the peak distance if `time` is greater than the initial velocity (the probe
# will not travel backwards once it reaches peak distance).
function positionat(initial::Velocity, time::Int64)::Position
    (v_x0, v_y0) = initial
    p_yt = distanceat(v_y0, time)
    p_xt = time >= v_x0 ? peak(v_x0) : distanceat(v_x0, time)
    return (x=p_xt, y=p_yt)
end

# Identify the possible initial x and y velocities for a given `target`.
function velocityrange(target::TargetRange)::VelocityRange
    # The smallest possible x velocity that will reach the target is the
    # velocity where `triangle(v_x)` is at least equal to the minimum
    # range of x in the target. Any lower velocity than this will not reach
    # the distance of `target.x_min`. The maximum x velocity is just the
    # maximum range of x in the target, since the probe could theoretically
    # be shot straight at that point.
    v_xmin = 0
    while triangle(v_xmin) < target.x_min; v_xmin += 1; end
    v_xrange = v_xmin:target.x_max

    # The largest possible y velocity is the one that, when reaching the point
    # of y == 0, will not overshoot the target on the next step. This works out
    # to be the absolute value of `target.y_min`. This range is given backwards,
    # since it is assumed that the maximum height will be found at larger values
    # for y velocity.
    v_ymax = abs(target.y_min)
    v_yrange = v_ymax:-1:target.y_min

    return (x=v_xrange, y = v_yrange)
end

# Given the initial velocity of a probe, determine whether that probe will land
# in the target zone. 
function willhit(initial::Velocity, target::TargetRange)::Bool
    # Starting at the initial position of 0, 0 and time 0
    (x_t, y_t) = (0, 0)
    time = 0

    # Check the position of the probe at subsequent times until it reaches a
    # position either to the right or below the target area. If, during this
    # search, the probe position is found to be within the target area, return
    # true. Otherwise, return false.
    while x_t <= target.x_max && y_t >= target.y_min
        x_t >= target.x_min && y_t <= target.y_max && return true
        (x_t, y_t) = positionat(initial, time)
        time += 1
    end
    return false
end

# Solve Part One ---------------------------------------------------------------

# Starting with the range of possible velocities, check each combination of 
# x and y velocities until an x/y velocity is found that lands in the target
# area. Since we're counting down from the maximum possible y velocity, the
# first probe we find that lands in the target will reach the maximum 
# height, so just return the peak of that y velocity.
function part1(input)
    v_range = velocityrange(input)

    for (v_x, v_y) in Iterators.product(v_range...)
        initial = (x=v_x, y=v_y)
        willhit(initial, input) && return peak(v_y)
    end
    error("Could not find maximum Y for $input")
end

It was math! Well, at least some. I’m pretty sure there’s some math to just solve this part without any iteration or code. I worked on and thought about that for a bit, but I couldn’t crack it. Guess that’s why I write code!
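
For what it is worth, a commonly cited closed form for this part can be sketched under two assumptions: the target lies entirely below y = 0, and some x velocity puts the probe inside the target's x range at the right step. A probe launched upward with velocity v comes back to y = 0 with velocity -(v + 1), so the largest v that does not overshoot on the next step is abs(y_min) - 1, and the peak is the triangle number of that value. The name maxheight is hypothetical and does not appear in the solution above.

```julia
triangle(n) = (n^2 + n) ÷ 2

# Closed-form peak height. Assumes y_min < 0 and that a suitable x velocity
# exists; it does not check either assumption.
maxheight(y_min) = triangle(abs(y_min) - 1)

# Sample target from the puzzle description: y=-10..-5
maxheight(-10)   # 45, the answer the puzzle gives for the sample
```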

Part Two – The Loneliest Probe

Oh, we only have one probe, eh? I’ve seen those trickshot videos on the internet, and I know for sure they (almost) NEVER land those on the first try, even if they did some math first. Guess we need to pick the safest shot, and the only way to do that (apparently) is to identify all the launch velocities that should land in the target area. That’s not too bad, actually.

# Solve Part Two ---------------------------------------------------------------

# This time, just search through the whole range of possible x/y combinations
# and increment the counter for each launch velocity that lands the probe in
# the target area.
function part2(input)
    v_range = velocityrange(input)
    found = 0

    for (v_x, v_y) in Iterators.product(v_range...)
        initial = (x=v_x, y=v_y)
        if willhit(initial, input)
            found += 1
        end
    end

    return found
end

No major changes here: we just loop through the entire possible range of launch vectors and increment our count for every one that should land in the target zone.
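
As an independent cross-check on the closed-form positions, the count can also be reproduced with a plain step-by-step simulation of the puzzle rules. This is a minimal sketch, not the solution used above; the names hits, sample and n are illustrative, and the target is the sample from the puzzle description.

```julia
# Step-by-step simulation: the position changes by the current velocity, the
# x velocity moves one step toward zero (drag), the y velocity drops by one.
function hits(vx, vy, t)
    x, y = 0, 0
    while x <= t.x_max && y >= t.y_min
        x >= t.x_min && y <= t.y_max && return true
        x += vx
        y += vy
        vx -= sign(vx)
        vy -= 1
    end
    return false
end

sample = (x_min=20, x_max=30, y_min=-10, y_max=-5)

# Same search space idea as `velocityrange`, but starting x velocities at 0.
n = count(hits(vx, vy, sample)
          for vx in 0:sample.x_max, vy in sample.y_min:abs(sample.y_min))
```

For the sample target this yields 112, the value given in the puzzle description, so it can serve as a sanity check for part2() on the same input.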

Wrap Up

I got so distracted arranging formulas today! I ended up with a quadratic formula that did not give me anything resembling the answer I was looking for. I’m not sure if I’m annoyed or appreciative of the opportunity to practice some algebra. Probably annoyed because it didn’t work. That said, I’m getting more and more comfortable in Julia, so that’s a win. The process is definitely working.

If you found a different solution (or spotted a mistake in one of mine), please drop me a line!