Category Archives: Julia

Simple is best: do you really need your code to be fast?

Re-posted from: https://bkamins.github.io/julialang/2022/09/02/speed.html

Introduction

In my post from last week I tried to convince novice Julia users
to focus on writing simple code, and to start writing fully generic code
only once they have learned more.

This week I continue my “simple is best” miniseries of posts. This time
I want to argue that you do not need to try to write code that
is maximally fast. Usually it is enough that your code is just fast
(fortunately, in Julia your code is unlikely to be slow unless you make
a serious mistake). Instead, I recommend that you try to keep your code
simple and follow common patterns. In the long run such code will be
much easier to read and update.

For the post I use DataFrames.jl as an example. It was tested under Julia 1.8.0,
DataFrames.jl 1.3.4, ShiftedArrays.jl 1.0.0, and BenchmarkTools.jl 1.3.1.

An example task

Recently a user asked the following question: given a data frame
with a column col, add five columns to it holding lags of col ranging from
1 to 5.

Here is an example of how you can do this task (in the example the column is called a):

julia> using DataFrames

julia> using ShiftedArrays

julia> using BenchmarkTools

julia> df = DataFrame(a=1:10)
10×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
   4 │     4
   5 │     5
   6 │     6
   7 │     7
   8 │     8
   9 │     9
  10 │    10

julia> function add_lags(df)
           out_df = copy(df)
           for i in 1:5
               out_df[:, "a$i"] = lag(df.a, i)
           end
           return out_df
       end
add_lags (generic function with 1 method)

julia> add_lags(df)
10×6 DataFrame
 Row │ a      a1       a2       a3       a4       a5
     │ Int64  Int64?   Int64?   Int64?   Int64?   Int64?
─────┼────────────────────────────────────────────────────
   1 │     1  missing  missing  missing  missing  missing
   2 │     2        1  missing  missing  missing  missing
   3 │     3        2        1  missing  missing  missing
   4 │     4        3        2        1  missing  missing
   5 │     5        4        3        2        1  missing
   6 │     6        5        4        3        2        1
   7 │     7        6        5        4        3        2
   8 │     8        7        6        5        4        3
   9 │     9        8        7        6        5        4
  10 │    10        9        8        7        6        5

julia> @btime add_lags($df);
  2.900 μs (50 allocations: 3.26 KiB)

I have also added the timing of this operation for reference.
The add_lags function is meant to be an example of “fast code”
(it could still be made faster, and I could have chosen a more complex example,
but I wanted to pick something that is easy to understand).

Adding lags using the high-level API function

The functions combine, select[!], transform[!], and subset[!] support
the operation specification syntax
and are called the high-level API in DataFrames.jl.
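
As a quick orientation, here is a minimal sketch of what these functions do on a toy data frame. The column names a2 and total are made up for this illustration and do not appear in the original example.

```julia
using DataFrames

df = DataFrame(a=1:10)

# transform keeps the existing columns and adds the new one
t = transform(df, :a => (x -> 2 .* x) => :a2)

# select keeps only the columns that are requested
s = select(df, :a => (x -> 2 .* x) => :a2)

# combine collapses the rows when the function returns a scalar
c = combine(df, :a => sum => :total)

# subset keeps the rows for which the condition is true
f = subset(df, :a => ByRow(>(5)))

println(names(t))   # column names after transform
println(names(s))   # column names after select
println(c.total)    # aggregated value
println(nrow(f))    # number of rows kept by subset
```

The variants ending with ! (select!, transform!, subset!) take the same arguments but update the data frame in place.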

A natural way to add lags using the transform function is:

julia> transform(df, ["a" => (x -> lag(x, i)) => "a$i" for i in 1:5])
10×6 DataFrame
 Row │ a      a1       a2       a3       a4       a5
     │ Int64  Int64?   Int64?   Int64?   Int64?   Int64?
─────┼────────────────────────────────────────────────────
   1 │     1  missing  missing  missing  missing  missing
   2 │     2        1  missing  missing  missing  missing
   3 │     3        2        1  missing  missing  missing
   4 │     4        3        2        1  missing  missing
   5 │     5        4        3        2        1  missing
   6 │     6        5        4        3        2        1
   7 │     7        6        5        4        3        2
   8 │     8        7        6        5        4        3
   9 │     9        8        7        6        5        4
  10 │    10        9        8        7        6        5

The problem with this solution that users often raise is that it is slower than
the previous one:

julia> @btime transform($df, ["a" => (x -> lag(x, i)) => "a$i" for i in 1:5]);
  61.600 μs (523 allocations: 27.46 KiB)

There are two questions that one naturally asks. The first is what causes
this timing difference, and the second is whether we should care.

So first let me explain the reason for the slowdown. The transform function is
flexible and allows for many different operations. Therefore, apart from
the core transformation, it does a lot of extra pre- and post-processing
that is needed to provide this flexibility.

Now let us try to answer the question of whether we care. First, check a bigger data frame:

julia> df2 = DataFrame(a=1:10^8);

julia> @btime add_lags($df2);
  1.354 s (62 allocations: 4.94 GiB)

julia> @btime transform($df2, ["a" => (x -> lag(x, i)) => "a$i" for i in 1:5]);
  1.318 s (539 allocations: 4.94 GiB)

The timings are roughly the same.
As you can see, the pre- and post-processing time in transform does not grow
with the size of the data. Therefore we reach the following conclusions:

if you have a few small data frames it does not matter what you use; things will
be fast;
if you have a large data frame it does not matter what you use; things will
have a comparable speed;
if you have a lot of small data frames, then you should be careful, as the
pre- and post-processing time of transform might end up being a significant
portion of the total run time.
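
To put rough numbers on these three cases, here is a back-of-the-envelope sketch that uses only the timings reported above. It assumes that the whole gap measured on the 10-row data frame is fixed overhead that is paid once per call; the workload of one million small data frames is hypothetical.

```julia
# Timings reported above, in seconds
t_loop_small      = 2.900e-6    # add_lags, 10 rows
t_transform_small = 61.600e-6   # transform, 10 rows
t_transform_large = 1.318       # transform, 10^8 rows

# Assumption: the whole gap on the tiny data frame is fixed per-call overhead
overhead = t_transform_small - t_loop_small

# Share of the large run that this fixed overhead accounts for
share_large = overhead / t_transform_large

# Extra time if the same overhead is paid for many small data frames
n_frames = 10^6   # hypothetical workload size
extra = n_frames * overhead

println("overhead per call: ", round(overhead * 1e6, digits=1), " μs")
println("share of the large run: ", round(100 * share_large, digits=4), " %")
println("extra time for ", n_frames, " small data frames: ", round(extra, digits=1), " s")
```

Under this simple model the overhead is a negligible fraction of the large run, but it adds up to about a minute when it is paid a million times.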

You might think that the code:

["a" => (x -> lag(x, i)) => "a$i" for i in 1:5]

is not so easy to read. This might be true for a newcomer, as this is something
new you need to learn. However, my experience is that after some practice with
the DataFrames.jl operation specification language it becomes quite easy to write
and read. You just need to embrace the pattern:

[input column] => [transformation function] => [output column name]

and the rest is a natural consequence (one of the benefits of this pattern
is that it is clear visually as => nicely separates its parts).
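
To see that there is no magic in this pattern, the sketch below builds the same kind of vector in plain Julia and takes one element apart. The simple_lag function is a hypothetical stand-in for lag from ShiftedArrays.jl, written only so that the snippet runs without any packages.

```julia
# Hypothetical stand-in for ShiftedArrays.lag; assumes 0 <= k <= length(x)
simple_lag(x, k) = vcat(fill(missing, k), x[1:end-k])

specs = ["a" => (x -> simple_lag(x, i)) => "a$i" for i in 1:5]

# `=>` is right-associative, so each element is input => (function => output)
spec = specs[2]
input = spec.first
fun = spec.second.first
output = spec.second.second

println(input)                   # source column name
println(output)                  # target column name
println(fun([10, 20, 30, 40]))   # the function applied to a plain vector
```

Each element of the vector is an ordinary nested Pair, which is why the specification can be generated with a comprehension.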

What is the benefit of using the high-level API?

The benefit of the high-level API is that most likely the lagging operation we
have considered needs to be done in groups. That is, you have a grouping
variable, call it id, and you want to perform lagging per-id value
(e.g. you have different stocks and their quotations for different days
that you want to lag).

With transform this is a breeze. You just need to add groupby to your code:

julia> df3 = DataFrame(id=repeat(1:2, inner=8), a=1:16)
16×2 DataFrame
 Row │ id     a
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     1      4
   5 │     1      5
   6 │     1      6
   7 │     1      7
   8 │     1      8
   9 │     2      9
  10 │     2     10
  11 │     2     11
  12 │     2     12
  13 │     2     13
  14 │     2     14
  15 │     2     15
  16 │     2     16

julia> transform(groupby(df3, :id),
                 ["a" => (x -> lag(x, i)) => "a$i" for i in 1:5])
16×7 DataFrame
 Row │ id     a      a1       a2       a3       a4       a5
     │ Int64  Int64  Int64?   Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────────────────────
   1 │     1      1  missing  missing  missing  missing  missing
   2 │     1      2        1  missing  missing  missing  missing
   3 │     1      3        2        1  missing  missing  missing
   4 │     1      4        3        2        1  missing  missing
   5 │     1      5        4        3        2        1  missing
   6 │     1      6        5        4        3        2        1
   7 │     1      7        6        5        4        3        2
   8 │     1      8        7        6        5        4        3
   9 │     2      9  missing  missing  missing  missing  missing
  10 │     2     10        9  missing  missing  missing  missing
  11 │     2     11       10        9  missing  missing  missing
  12 │     2     12       11       10        9  missing  missing
  13 │     2     13       12       11       10        9  missing
  14 │     2     14       13       12       11       10        9
  15 │     2     15       14       13       12       11       10
  16 │     2     16       15       14       13       12       11

With the add_lags function it is not obvious how to do what you want.
After some thinking (and if you know DataFrames.jl well enough) you come to the
conclusion that the following will work:

julia> transform(groupby(df3, :id), add_lags)
16×7 DataFrame
 Row │ id     a      a1       a2       a3       a4       a5
     │ Int64  Int64  Int64?   Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────────────────────
   1 │     1      1  missing  missing  missing  missing  missing
   2 │     1      2        1  missing  missing  missing  missing
   3 │     1      3        2        1  missing  missing  missing
   4 │     1      4        3        2        1  missing  missing
   5 │     1      5        4        3        2        1  missing
   6 │     1      6        5        4        3        2        1
   7 │     1      7        6        5        4        3        2
   8 │     1      8        7        6        5        4        3
   9 │     2      9  missing  missing  missing  missing  missing
  10 │     2     10        9  missing  missing  missing  missing
  11 │     2     11       10        9  missing  missing  missing
  12 │     2     12       11       10        9  missing  missing
  13 │     2     13       12       11       10        9  missing
  14 │     2     14       13       12       11       10        9
  15 │     2     15       14       13       12       11       10
  16 │     2     16       15       14       13       12       11

so things are still easy to code (but you still need transform).
Let us compare the performance of both operations on a larger data frame:

julia> df4 = DataFrame(id=repeat(1:10, inner=10^7), a=1:10^8);

julia> @btime transform(groupby($df4, :id),
                        ["a" => (x -> lag(x, i)) => "a$i" for i in 1:5]);
  10.124 s (1731 allocations: 19.35 GiB)

julia> @btime transform(groupby($df4, :id), add_lags);
  11.747 s (1552 allocations: 24.59 GiB)

We see that using add_lags in this case is slightly slower. Of course we could
write a custom add_lags function that would work by groups, even without using
groupby, but this would not be so easy (I recommend you try it if you are not
convinced).
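
To give a feeling for what such a hand-written version involves, here is a minimal sketch in plain Julia. The name lag_by_group and its signature are made up for this illustration; it works on two vectors rather than on a data frame, handles a single lag at a time, and makes no claim to be fast.

```julia
# Hypothetical helper: lag x by k positions within the groups given by ids,
# without requiring that the rows of one group are stored next to each other
function lag_by_group(ids::AbstractVector, x::AbstractVector, k::Integer)
    k >= 1 || throw(ArgumentError("k must be positive"))
    out = Vector{Union{Missing, eltype(x)}}(missing, length(x))
    seen = Dict{eltype(ids), Vector{Int}}()   # positions visited so far, per group
    for i in eachindex(ids, x, out)           # throws if the lengths differ
        rows = get!(() -> Int[], seen, ids[i])
        if length(rows) >= k
            out[i] = x[rows[end - k + 1]]     # value k rows back in the same group
        end
        push!(rows, i)
    end
    return out
end

ids = repeat(1:2, inner=8)   # same layout as the id column of df3
a = 1:16                     # same values as the a column of df3
println(lag_by_group(ids, a, 1))
```

Even this stripped-down version needs its own bookkeeping of group membership, which is exactly the work that groupby and transform otherwise do for you.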

Here we see the benefit of the general design of transform: it handles grouped
data in the same way as it handles a data frame.

Conclusions

Let me summarize my thinking about code performance in Julia in general,
and in DataFrames.jl in particular:

If you have small data and a small problem, it does not matter how fast your
code is, as it will be fast enough anyway, so you do not have to think about
speed.
If you have large data, then make sure that the solution you use is fast in
the core of the processing (in DataFrames.jl these are things like grouping
data, sorting, joining, and reshaping). The pre- and post-processing that some
functions (like transform in our case) do will be negligible anyway. This
logic is something that you most likely already know, as every Julia user
learns that compilation time is negligible when your code runs for several
hours. It is usually not worth optimizing every part of your code; optimize
only the parts that are expensive when you scale your computations.
For DataFrames.jl there is, in my experience, only one case when you need to
think carefully about how to write your code and which functions to use. This case
is when you have a lot (like millions) of small data frames. The reason is
that then the cost of pre- and post-processing in functions from the high-level
API of DataFrames.jl might indeed be an important part of the total runtime of
your code.

Setting Up Julia LSP for Neovim

By: Jacob Zelko

Re-posted from: https://jacobzelko.com/08312022162228-julia-lsp-setup/index.html

Date: August 31 2022

Summary: An explanation of how to set up the Julia LSP for Neovim

Keywords: ##summary #neovim #julia #programming #archive

Bibliography

Not Available

General Guide

This is from Fredrik Ekre at the Julia Discourse with some minimal changes and notes from me:

LanguageServer is somewhat slow to start so it is very useful to use a custom sysimage using PackageCompiler to reduce this time. On my machine I get the first response after 20+ seconds, but with a custom sysimage I can execute LS commands instantaneously.

Here is my setup:

Install Mason.nvim or nvim-lspconfig and install julials (it may also be called something like Julia Language Server Protocol).
Modify init.vim or init.lua to use a custom Julia executable (if it exists):

require'lspconfig'.julials.setup{
    on_new_config = function(new_config, _)
        local julia = vim.fn.expand("~/.julia/environments/nvim-lspconfig/bin/julia")
        if require'lspconfig'.util.path.is_file(julia) then
            vim.notify("Hello!")
            new_config.cmd[1] = julia
        end
    end
}

(OPTIONAL) If you use Packer to manage your vim setup, run PackerCompile.

NOTE: The line vim.notify("Hello!") is there only to test that julials is engaged when you open a Julia file. You can check this by running :messages in vim, where "Hello!" should appear. The line can then safely be removed.

Create the nvim-lspconfig Julia environment by running the following in your shell:

julia --project=~/.julia/environments/nvim-lspconfig -e 'using Pkg; Pkg.add("LanguageServer")'

And then navigate to the directory at ~/.julia/environments/nvim-lspconfig.

Copy the following makefile (courtesy of Fredrik Ekre) into the nvim-lspconfig directory with the name Makefile:

# MIT License. Copyright (c) 2021 Fredrik Ekre
#
# This Makefile can be used to build a custom Julia system image for LanguageServer.jl to
# use with neovim's built-in LSP support. An up-to-date version of this Makefile can be found
# at https://github.com/fredrikekre/.dotfiles/blob/master/.julia/environments/nvim-lspconfig/Makefile
#
# Usage instructions:
#
#   1. Update the neovim configuration to use a custom julia executable. If you use
#      nvim-lspconfig (recommended) you can modify the setup call to the following:
#
#          require("lspconfig").julials.setup({
#              on_new_config = function(new_config, _)
#                  local julia = vim.fn.expand("~/.julia/environments/nvim-lspconfig/bin/julia")
#                  if require("lspconfig").util.path.is_file(julia) then
#                      new_config.cmd[1] = julia
#                  end
#              end,
#              -- ...
#          })
#
#   2. Place this Makefile in ~/.julia/environments/nvim-lspconfig (create the directory if
#      it doesn't already exist).
#
#   3. Change directory to ~/.julia/environments/nvim-lspconfig and run `make`. This will
#      start up neovim in a custom project with a julia process that records compiler
#      statements. Follow the instructions in the opened source file, and then exit neovim.
#
#   4. Upon exiting neovim PackageCompiler.jl will compile a custom system image which will
#      automatically be used whenever you work on Julia projects in neovim.
#
# Update instructions:
#
#  To update the system image (e.g. when upgrading Julia or upgrading LanguageServer.jl or
#  its dependencies) run the following commands from the
#  ~/.julia/environments/nvim-lspconfig directory:
#
#      julia --project=. -e 'using Pkg; Pkg.update()'
#      make

JULIA=$(shell which julia)
JULIA_PROJECT=
SRCDIR:=$(shell dirname $(abspath $(firstword $(MAKEFILE_LIST))))
ifeq ($(shell uname -s),Linux)
	SYSIMAGE=languageserver.so
else
	SYSIMAGE=languageserver.dylib
endif

default: $(SYSIMAGE)

$(SYSIMAGE): Manifest.toml packagecompiler/Manifest.toml packagecompiler/precompile_statements.jl
	JULIA_LOAD_PATH=${PWD}:${PWD}/packagecompiler:@stdlib ${JULIA} -e 'using PackageCompiler; PackageCompiler.create_sysimage(:LanguageServer, sysimage_path="$(SYSIMAGE)", precompile_statements_file="packagecompiler/precompile_statements.jl")'

Manifest.toml: Project.toml
	JULIA_LOAD_PATH=${PWD}/Project.toml:@stdlib ${JULIA} -e 'using Pkg; Pkg.instantiate()'

Project.toml:
	JULIA_LOAD_PATH=${PWD}/Project.toml:@stdlib ${JULIA} -e 'using Pkg; Pkg.add("LanguageServer")'

packagecompiler/Manifest.toml: packagecompiler/Project.toml
	JULIA_LOAD_PATH=${PWD}/packagecompiler/Project.toml:@stdlib ${JULIA} -e 'using Pkg; Pkg.instantiate()'

packagecompiler/Project.toml:
	mkdir -p packagecompiler
	JULIA_LOAD_PATH=${PWD}/packagecompiler/Project.toml:@stdlib ${JULIA} -e 'using Pkg; Pkg.add("PackageCompiler")'

# NOTE: You may need to check that neovim is correctly on your path
packagecompiler/precompile_statements.jl: Manifest.toml bin/julia
	TMPDIR=$(shell mktemp -d) && \
	cd $${TMPDIR} && \
	JULIA_LOAD_PATH=: ${JULIA} -e 'using Pkg; Pkg.generate("Example")' 2> /dev/null && \
	cd Example && \
	JULIA_LOAD_PATH=$${PWD}:@stdlib ${JULIA} -e 'using Pkg; Pkg.add(["JSON", "fzf_jll", "Random", "Zlib_jll"])' 2> /dev/null && \
	JULIA_LOAD_PATH=$${PWD}:@stdlib ${JULIA} -e 'using Pkg; Pkg.precompile()' 2> /dev/null && \
	echo "$$PACKAGE_CONTENT" > src/Example.jl && \
	JULIA_TRACE_COMPILE=1 nvim src/Example.jl && \
	rm -rf $${TMPDIR}

bin/julia:
	mkdir -p bin
	echo "$$JULIA_SHIM" > $@
	chmod +x $@

clean:
	rm -rf $(SYSIMAGE) packagecompiler bin

.PHONY: clean default

export JULIA_SHIM
define JULIA_SHIM
#!/bin/bash
JULIA=${JULIA}
if [[ $${JULIA_TRACE_COMPILE} = "1" ]]; then
    exec $${JULIA} --trace-compile=${PWD}/packagecompiler/precompile_statements.jl "$$@"
elif [[ -f ${PWD}/$(SYSIMAGE) ]]; then
    exec $${JULIA} --sysimage=${PWD}/$(SYSIMAGE) "$$@"
else
    exec $${JULIA} "$$@"
fi
endef

export PACKAGE_CONTENT
define PACKAGE_CONTENT
# This file is opened in neovim with a LanguageServer.jl process that records Julia
# compilation statements for creating a custom sysimage.
#
# This file has a bunch of linter errors which will exercise the linter and record
# statements for that. When the diagnostic messages corresponding to those errors show up in
# the buffer the language server should be ready to accept other commands (note: this may
# take a while -- be patient). Here are some suggestions for various LSP functionality that
# can be exercised (your regular keybindings should work):
#
#  - :lua vim.lsp.buf.hover()
#  - :lua vim.lsp.buf.definition()
#  - :lua vim.lsp.buf.references()
#  - :lua vim.lsp.buf.rename()
#  - :lua vim.lsp.buf.formatting()
#  - :lua vim.lsp.buf.formatting_sync()
#  - :lua vim.lsp.buf.code_action()
#  - Tab completion (if you have set this up using LSP)
#  - ...
#
# When you are finished, simply exit neovim and PackageCompiler.jl will use all the recorded
# statements to create a custom sysimage. This sysimage will be used for the language server
# process in the future, and should result in almost instant response.

module Example

import JSON
import fzf_jll
using Random
using Zlib_jll

function hello(who, notused)
    println("hello", who)
    shuffle([1, 2, 3])
   shoffle([1, 2, 3])
    fzzf = fzf_jll.fzzf()
    fzf = fzf_jll.fzf(1)
    JSON.print(stdout, Dict("hello" => [1, 2, 3]), 2, 123)
    JSON.print(stdout, Dict("hello" => [1, 2, 3]))
    hi(who)
    return Zlib_jll.libz
end

function world(s)
    if s == nothing
      hello(s)
  else
      hello(s)
  end
    x = [1, 2, 3]
    for i in 1:length(x)
        println(x[i])
    end
end

end # module
endef

Run make. This will set up a dummy project and launch nvim with julia recording everything that is compiled. Wait until the LanguageServer responds (there are a bunch of things in this dummy project that will result in warnings) and then run some LanguageServer commands, for example :lua vim.lsp.buf.hover() to fetch documentation.
Quit vim.
PackageCompiler will now build a custom languageserver.so sysimage.
Enjoy the Julia LSP!
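
If you want to confirm that the custom image is actually being used, one option is to ask a running Julia process which system image it was started with. The sketch below relies on Base.JLOptions, which is an internal, undocumented part of Julia and may change between versions; the helper name sysimage_path is made up for this illustration.

```julia
# Assumption: Base.JLOptions() is internal to Julia and not a stable API
sysimage_path() = unsafe_string(Base.JLOptions().image_file)

# Prints the path of the system image that the current process was started with
println(sysimage_path())
```

When this is run through the bin/julia shim created by the Makefile, the printed path should point at the languageserver sysimage once it has been built; when it is run with the plain julia executable it should point at the default image that ships with Julia.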

References

Discussion:

How to Use Linear Models and Decision Trees in Julia

By: Logan Kilpatrick

Re-posted from: https://www.freecodecamp.org/news/linear-models-vs-decision-trees-in-julia/

juliabloggers.com

A Julia Language Blog Aggregator
