Tag Archives: julialang

Updating views in DataFrames.jl

Re-posted from: https://bkamins.github.io/julialang/2021/09/17/views.html

Introduction

Today I want to preview a feature that will be introduced in the 1.3 release
of DataFrames.jl. We will talk about new ways of updating the columns
of a data frame when one is working with views. My objective is to explain
the rationale behind the new functionality and the way it works.

This post was tested under Julia 1.6.1 and DataFrames.jl checked out at
the main branch on Sep 17, 2021 (SHA-1 facb6721e7450c63f2d5684b78e3c3489ed999b0).

What is a `SubDataFrame` and when is it useful?

In DataFrames.jl you can construct views of a data frame object using the
view function or the @view macro, exactly like you can create views of arrays
in Julia Base. Here is a simple example:

julia> using DataFrames

(@v1.6) pkg> st DataFrames
      Status `~/.julia/environments/v1.6/Project.toml`
  [a93c6f00] DataFrames v1.2.2 `https://github.com/JuliaData/DataFrames.jl.git#main`

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> dfv = @view df[2:3, :]
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      5
   2 │     3      6

Now the dfv object is a view of the df data frame. This means that it references
the same data in memory as the parent data frame df, but allows access
to only a slice of it: in our case we have picked rows 2 and 3 and
all columns.

The key features of a view are:

- mutating its contents also mutates the contents of the parent data frame;
- it is cheap to create, as it is enough to store only a reference to the parent
  data frame and the information about which rows and columns got selected;
- it is memory efficient (no copying of data happens);
- using it has a small computational overhead, as when we index a view we need
  to translate these indices to the parent data frame indices.

Let us show the first feature as it is most important from the functionality
perspective:

julia> dfv[1, 1] = 100
100

julia> dfv
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │   100      5
   2 │     3      6

julia> df
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │   100      5
   3 │     3      6

julia> df[3, 1] = 200
200

julia> dfv
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │   100      5
   2 │   200      6

julia> df
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │   100      5
   3 │   200      6

As you can see changing dfv also changes df, and vice versa – changing df
also changes dfv (if the changed cells are selected in the view).
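
The relationship between a view and its parent can also be inspected directly. The following is a minimal sketch, assuming a DataFrames.jl 1.x installation: parent returns the underlying data frame, parentindices returns the selected rows and columns, and === confirms that no copy was made.

```julia
using DataFrames

df = DataFrame(a=1:3, b=4:6)
dfv = @view df[2:3, :]

# the view holds a reference to the very same parent object (no copy)
same_parent = parent(dfv) === df

# rows and columns of the parent that the view selects
rows, cols = parentindices(dfv)

# writing to row 1 of the view lands in row 2 of the parent
dfv[1, :a] = 100
```

Here rows is the range 2:3 used to create the view, which is why index 1 in dfv maps to index 2 in df.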

To understand the performance implications, consider two simple implementations
of a procedure computing a 90% confidence interval for the correlation between
two variables using bootstrapping:

julia> using Statistics

julia> function bootcor1(df, c1, c2, n)
           cors = Float64[]
           for _ in 1:n
               tmp = df[rand(1:nrow(df), nrow(df)), :]
               push!(cors, cor(tmp[!, c1], tmp[!, c2]))
           end
           return quantile(cors, [0.05, 0.95])
       end
bootcor1 (generic function with 1 method)

julia> function bootcor2(df, c1, c2, n)
           cors = Float64[]
           for _ in 1:n
               tmp = @view df[rand(1:nrow(df), nrow(df)), :]
               push!(cors, cor(tmp[!, c1], tmp[!, c2]))
           end
           return quantile(cors, [0.05, 0.95])
       end
bootcor2 (generic function with 1 method)

(the functions could be further optimized for performance but I did not want to
overly complicate the code)

The difference between bootcor1 and bootcor2 is that the former copies a
data frame, while the latter uses a view. Both take four parameters:

- df: a data frame to analyze;
- c1, c2: identifiers of the columns whose correlation we want to compute;
- n: the number of bootstrap samples.

Now create a simple data frame and compare the performance of both functions
(I present timings after compilation):

julia> df = DataFrame(rand(10^5, 10), :auto);

julia> @time bootcor1(df, :x1, :x2, 10_000)
 47.059650 seconds (430.02 k allocations: 81.976 GiB, 1.88% gc time)
2-element Vector{Float64}:
 -0.007373812772086598
  0.0029150608879804406

julia> @time bootcor2(df, :x1, :x2, 10_000)
 11.239822 seconds (80.02 k allocations: 7.453 GiB, 0.92% gc time)
2-element Vector{Float64}:
 -0.007643923412421664
  0.002966538851599437

As you can see, because the data frame was wide (10 columns), we saved a lot of
time by avoiding copying of the data.

Of course if the data frame were narrower we would not see such a difference:

julia> df = DataFrame(rand(10^5, 2), :auto);

julia> @time bootcor1(df, :x1, :x2, 10_000)
 10.829548 seconds (190.02 k allocations: 22.363 GiB, 1.60% gc time)
2-element Vector{Float64}:
 -0.006650139955186956
  0.0038227359319118795

julia> @time bootcor2(df, :x1, :x2, 10_000)
 10.963020 seconds (80.02 k allocations: 7.453 GiB, 0.53% gc time)
2-element Vector{Float64}:
 -0.006575024146232311
  0.0038253588364537162

The reason is that although using a view still allocates less memory, this
saving is offset by the computational overhead of working with views, as
explained above.
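
As noted earlier, both functions could be optimized further. One possible direction, shown here only as a sketch, is to avoid creating a data frame or a SubDataFrame in every iteration and to take views of the two column vectors directly while reusing a single index buffer. The name bootcor3 is hypothetical and no timings are claimed for it.

```julia
using DataFrames, Statistics, Random

function bootcor3(df, c1, c2, n)
    # get the two columns once, without copying
    x, y = df[!, c1], df[!, c2]
    # buffer for the bootstrap row indices, reused in every iteration
    idx = Vector{Int}(undef, nrow(df))
    cors = Vector{Float64}(undef, n)
    for i in 1:n
        # draw row indices with replacement, in place
        rand!(idx, 1:nrow(df))
        cors[i] = cor(view(x, idx), view(y, idx))
    end
    return quantile(cors, [0.05, 0.95])
end

# small example data, so that the sketch runs quickly
df_small = DataFrame(rand(1000, 3), :auto)
ci = bootcor3(df_small, :x1, :x2, 200)
```

The design choice is the same as in bootcor2 (index through a view instead of copying), just applied to individual columns rather than to the whole data frame.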

What is new for `SubDataFrame` in DataFrames.jl 1.3?

Let us start with our original small data frame:

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

Assume you want to assign the value 1.5 to the first row of column :a.
Before the upcoming DataFrames.jl 1.3 release this is quite cumbersome. If you
try doing it you get:

julia> df[1, :a] = 1.5
ERROR: InexactError: Int64(1.5)

You need to perform two steps:

- promote the element type of column :a to allow Float64 values;
- perform the assignment.

Here is a way to do it:

julia> df.a = Vector{Float64}(df.a)
3-element Vector{Float64}:
 1.0
 2.0
 3.0

julia> df[1, :a] = 1.5
1.5

julia> df
3×2 DataFrame
 Row │ a        b
     │ Float64  Int64
─────┼────────────────
   1 │     1.5      4
   2 │     2.0      5
   3 │     3.0      6

Here is one more well-known example of a similar situation that sometimes
surprises users:

julia> df[1, :b] = 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> df
3×2 DataFrame
 Row │ a        b
     │ Float64  Int64
─────┼────────────────
   1 │     1.5     97
   2 │     2.0      5
   3 │     3.0      6

In this case Julia silently converted the Char value 'a' to its Int
representation, which is 97.

The key change in the 1.3 release of DataFrames.jl is that views will allow
using ! as a row index (currently it is disallowed). The mechanics of this
functionality are the same as when ! is used for DataFrame objects: a
column will get replaced in the data frame.

A natural question is: with what will it get replaced? The question is
valid, as we are replacing only a portion of the column. The design decision we
took is that promote_type will be used to decide the element type of the new
column, combining the element type of the already present column and the element
type of the newly assigned values.

Therefore in our examples above, when using a view you get the following:

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> dfv = @view df[1:1, :]
1×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4

julia> dfv[!, :a] = [1.5]
1-element Vector{Float64}:
 1.5

julia> dfv[!, :b] .= 'a'
1-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> df
3×2 DataFrame
 Row │ a        b
     │ Float64  Any
─────┼──────────────
   1 │     1.5  a
   2 │     2.0  5
   3 │     3.0  6

julia> dfv
1×2 SubDataFrame
 Row │ a        b
     │ Float64  Any
─────┼──────────────
   1 │     1.5  a

As you can see it works both with standard assignment as well as with
broadcasted assignment.
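
The element types seen in the output above can be reproduced with promote_type alone. This short sketch uses only Julia Base and is meant as an illustration of the rule, not as the package's internal code:

```julia
# Int64 column receiving Float64 values: both fit into Float64
t_a = promote_type(Int64, Float64)

# Int64 and Char have no common concrete type, so the result is Any
t_b = promote_type(Int64, Char)

# combining a type with Missing yields a Union that allows missing values
t_c = promote_type(String, Missing)
```

This matches the Float64 and Any element types of columns a and b shown above.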

Admittedly you still have to take two steps in the process:

- create a view;
- perform an assignment to it.

This is a bit cumbersome. Fortunately we can expect that in the future
DataFramesMeta.jl will provide a convenience syntax to perform
conditional assignment using this feature, e.g. like in data.table, where you
can write something like df[x == 1, y := 2] to set column y to 2 if
column x is equal to 1.
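
Until such a syntax exists, the conditional assignment can be written with the two steps directly. A minimal sketch, assuming DataFrames.jl 1.3 or later; the column names x and y mirror the data.table example and the data is made up for illustration:

```julia
using DataFrames

df = DataFrame(x=[1, 2, 1], y=[10, 20, 30])

# step 1: create a view selecting the rows where x equals 1
dfv = @view df[df.x .== 1, :]

# step 2: assign through the view; y is promoted from Int64 to Float64
dfv[!, :y] .= 2.5
```

Rows not selected by the view keep their values, converted to the promoted element type.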

One special case that is often required is adding columns. It is supported
with both : and ! row selectors (like for DataFrame objects). In this case
we do not have a reference column in a parent data frame, so rows that are not
included in the view are filled with missing.

Here are two examples:

julia> dfv[!, :c] = ["x"]
1-element Vector{String}:
 "x"

julia> dfv[:, :d] .= true
1-element Vector{Bool}:
 1

julia> df
3×4 DataFrame
 Row │ a        b    c        d
     │ Float64  Any  String?  Bool?
─────┼────────────────────────────────
   1 │     1.5  a    x           true
   2 │     2.0  5    missing  missing
   3 │     3.0  6    missing  missing

julia> dfv
1×4 SubDataFrame
 Row │ a        b    c        d
     │ Float64  Any  String?  Bool?
─────┼──────────────────────────────
   1 │     1.5  a    x         true

The only limitation is that adding columns is allowed only if the SubDataFrame
was created with : as the column selector. The reason for this limitation is that
when one uses the : selector we are guaranteed that the SubDataFrame has the same
columns, in the same order, as its parent, so the requested operation is
guaranteed to be unambiguous (otherwise we would have to
handle e.g. the case when we want to add a column whose name is not present in
the SubDataFrame but is present in its parent, which could confuse users).
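
The limitation can be observed with a view that selects only some columns. The sketch below assumes a DataFrames.jl 1.x installation and simply captures whatever exception is raised; the exact exception type and message are not asserted here.

```julia
using DataFrames

df = DataFrame(a=1:3, b=4:6)

# this view selects a subset of columns, so it was not created with :
dfv = @view df[1:1, [:a]]

# try to add a new column through the view and capture the outcome
outcome = try
    dfv[!, :c] = ["x"]
    :added
catch err
    err
end

# the parent data frame is left without the new column
has_c = "c" in names(df)
```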

Conclusions

In summary, the new functionality allows replacing columns in a data frame
through its view. The two main intended use cases of this feature are:

- adding new columns for which we have data only for some rows
  (selected in the view); this is only allowed when the SubDataFrame
  was created with : as the column selector;
- updating data in existing columns even if the new elements cannot be converted
  to the element type of the existing column; in this case promote_type is used to
  determine the target column element type.

Vim for Julia — Another Look

By: DSB

Re-posted from: https://medium.com/coffee-in-a-klein-bottle/vim-for-julia-another-look-1dc4265bb49b?source=rss-8bd6ec95ab58------2

LunarVim as a Julia IDE

In the past, I wrote about how to use Vim for Julia. Recently, I’ve changed my setup and I’ve been using the new and amazing LunarVim. Here is a brief tutorial on how to set up Vim (actually Neovim) as your Julia IDE.

Introducing LunarVim

The first question to be answered is, what is LunarVim? Presto!

LunarVim is an opinionated, extensible, and fast IDE layer for Neovim >= 0.5.0. LunarVim takes advantage of the latest Neovim features such as Treesitter and Language Server Protocol support.

In simpler words, it’s a set of default configurations for Neovim that make it even more amazing. The first requirement for LunarVim is Neovim version 0.5 or later. Unfortunately, sudo apt install neovim will not work (at the time I’m writing this), because the version it installs is lower than the required one.

An easy way to install a proper version is to add the PPA for the unstable version and install it. Here are the easy steps:

sudo add-apt-repository ppa:neovim-ppa/unstable
sudo apt update
sudo apt install neovim

To install LunarVim, just run:

bash <(curl -s https://raw.githubusercontent.com/lunarvim/lunarvim/master/utils/installer/install.sh)

Running the command lvim in your terminal should start LunarVim! You can also add alias vim="lvim" to your .bashrc to run LunarVim instead of vim.

Setting up Julia

Now that you’ve got LunarVim working, let’s set up the Julia language. This is actually surprisingly simple. In your terminal, run the following:

julia --project=~/.julia/environments/nvim-lspconfig -e 'using Pkg; Pkg.add("LanguageServer")'

This installs LanguageServer.jl, the Julia implementation of the Language Server Protocol (LSP), which provides features such as auto-completion for Julia. Now, every time you open a “.jl” file, just wait a bit and the LSP will start.

Next, let’s install the Julia-Vim plugin, which enables Unicode input: by writing something like \alpha and pressing Tab, we’ll get the Unicode character α, which is allowed in Julia. To do this, go to your LunarVim configuration file, which can be accessed by running lvim in the terminal and selecting the Configuration option. Another way is to open the configuration file directly, which is ~/.config/lvim/config.lua.

Inside the configuration, there is a place where you can easily install any plugin you want. Just navigate to the “-- Additional Plugins” comment. Originally, everything there is commented out. Just uncomment the necessary lines, write {"JuliaEditorSupport/julia-vim"}, and save. This will prompt the installation of the plugin. Take a look at the figure below.

This is what your configuration should look like to install Julia-Vim. Note that you can add any other plugins you like.

Word of Caution!

Since Vim is inside your terminal, you need your terminal to have a font with Unicode enabled. I suggest you install JuliaMono, a beautiful font created for Julia :D. Once the font is installed, just go into your terminal configuration and change to it.

Even with a Unicode font enabled, on my notebook Julia-Vim was still freezing after I pressed Tab without any text. To solve this issue, I wrote the following two commands in my LunarVim configuration:

vim.cmd("let g:latex_to_unicode_tab = 'off'")
vim.cmd("let g:latex_to_unicode_keymap = 1")

Screenshot of my own configuration file.

Now everything should be working beautifully! Your new LunarVim IDE for Julia is ready to be used.

Fast Course on LunarVim + Julia

You can now read the documentation on LunarVim to better understand some of the default settings. But, I’ll give some tips on how LunarVim works, and how to use it with Julia to run your code. Here is a (very) fast course on some of the main commands for LunarVim:

In LunarVim, your <leader> is the space key, hence many shortcuts consist of pressing “space” followed by something else. Here is where LunarVim shines: if you just press “space” and wait a bit, a menu will pop up from below, showing possible commands.

LunarVim comes with a file explorer plugin (NvimTree), which allows you to navigate your files with a menu. Just press <space>+e.

Once you open another file, a buffer is created and shown at the top of the screen. You can click on the tab to change buffers, or you can press shift+h or shift+l to switch between them.

Press ctrl+w to see the commands related to splitting the screen and moving between windows. You can press ctrl+h, ctrl+l, ctrl+j, or ctrl+k to move between windows.

As in regular Vim, you can press / to start searching and then space+h clears the highlighted terms in the search.

When you open a Julia file, you just need to wait a bit for the LSP to start working. Once the LSP is loaded, auto-completion will work, along with many other helpful features, such as viewing the docstring of a function. For that, press g and a window will pop up with many shortcuts related to the LSP. For example, you can press g + p to take a peek at the docstring and g + d to actually open the function definition in another buffer. Look at the figures below.

Lastly, you can press ctrl+t to open and minimize a floating terminal. Once this is done, you can run the Julia REPL and copy/paste every line of code you want to run.

There are many other helpful commands. Check out the documentation on LunarVim or try them out yourself to learn more. I hope this was helpful.

Check out my Github for the complete configuration file.

Vim for Julia — Another Look was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.

Fixed width strings in CSV.jl

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2021/09/10/stringx.html

Introduction

In my recent post I wrote about how using inline strings can help to
increase join performance. Since with the 0.9 release of CSV.jl these strings
are used by default when reading CSV files, I want to focus solely on this
topic today.

The post was written under Julia 1.6.1, DataFrames.jl 1.2.2,
WeakRefStrings.jl 1.3.0, BenchmarkTools.jl 1.1.4, and CSV.jl 0.9.1.

Introducing inline strings

WeakRefStrings.jl defines 8 inline string types:

julia> using WeakRefStrings

julia> st = subtypes(InlineString)
8-element Vector{Any}:
 String1
 String127
 String15
 String255
 String3
 String31
 String63
 String7

What is important is that all these types are bits types, as you can see here:

julia> using DataFrames

julia> sort(DataFrame(st=st,
                      isbits = isbitstype.(st),
                      sizeof = sizeof.(st)), :sizeof)
8×3 DataFrame
 Row │ st         isbits  sizeof
     │ Any        Bool    Int64
─────┼───────────────────────────
   1 │ String1      true       1
   2 │ String3      true       4
   3 │ String7      true       8
   4 │ String15     true      16
   5 │ String31     true      32
   6 │ String63     true      64
   7 │ String127    true     128
   8 │ String255    true     256

The suffix of the name of the specific string type indicates the maximum size of
the string it can hold. So for example String3 can hold strings whose size,
as returned by sizeof, is at most 3. Here is an example:

julia> String3("123")
"123"

julia> String3("1234")
ERROR: ArgumentError: string too large (4) to convert to String3

julia> String3("∀")
"∀"

julia> String3("∀1")
ERROR: ArgumentError: string too large (4) to convert to String3

As you can see here, it is important to remember that some characters (like ∀)
take more than one code unit in UTF-8 encoding.
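
The difference between the number of characters and the number of code units can be checked with Julia Base alone. A short sketch:

```julia
s = "∀1"

# number of characters in the string
n_chars = length(s)

# number of bytes (UTF-8 code units): ∀ takes three, the digit takes one
n_bytes = sizeof(s)

# code units taken by the single character ∀
n_units = ncodeunits("∀")
```

It is the byte count, not the character count, that decides whether a string fits into a given inline string type, which is why "∀1" is too large for String3.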

You can use the InlineString function to create an inline string of
automatically selected minimal size:

julia> typeof(InlineString("∀"))
String3

julia> typeof(InlineString("∀1"))
String7

Finally, as we can see, the maximum size of an inline string is 255, so:

julia> InlineString("a"^256)
ERROR: ArgumentError: string too large (256) to convert to InlineString

In summary, as you can see, the String[N] types are similar to the CHAR(N)
types that are provided by many databases. The limitation is that N
can take only several fixed values.
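
To make the size selection concrete, here is a sketch of the rule written in plain Julia. The helper inline_capacity is hypothetical (it is not part of WeakRefStrings.jl) and only mirrors the behavior shown in the examples above: it picks the smallest of the eight capacities that fits the byte size of the string.

```julia
# return the smallest inline string capacity that fits `s`,
# or `nothing` if `s` is larger than 255 bytes
function inline_capacity(s::AbstractString)
    capacities = (1, 3, 7, 15, 31, 63, 127, 255)
    idx = findfirst(>=(sizeof(s)), capacities)
    return idx === nothing ? nothing : capacities[idx]
end

c1 = inline_capacity("∀")      # 3 bytes fit into capacity 3
c2 = inline_capacity("∀1")     # 4 bytes need capacity 7
c3 = inline_capacity("a"^256)  # too large for any inline string type
```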

Why and when to use inline strings?

There are two benefits of inline strings.

The first is that they can be faster. A special case where this is visible is
equality comparison, which is quite common in practice.

julia> using Random, BenchmarkTools

julia> Random.seed!(1234);

julia> s1 = [randstring(3) for i in 1:1000];

julia> s2 = String3.(s1);

julia> @benchmark $s1 .== permutedims($s1)
BenchmarkTools.Trial: 1064 samples with 1 evaluation.
 Range (min … max):  3.928 ms …   6.435 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.910 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.697 ms ± 404.737 μs  ┊ GC (mean ± σ):  0.06% ± 0.91%

  ▅▃                             ▆ ▃  ▃█▄▂
  ██▇▇▅▇▅▅▄▆▄▄▄▅▆▅▅▁▄▆▅▅▇▅▄▁▅█▄▄▁█▇█▇▇█████▇▅▄▅▅▄▁▄▁▄▄▅▄▁▁▅▅▄ █
  3.93 ms      Histogram: log(frequency) by time      5.49 ms <

 Memory estimate: 126.53 KiB, allocs estimate: 6.

julia> @benchmark $s2 .== permutedims($s2)
BenchmarkTools.Trial: 4879 samples with 1 evaluation.
 Range (min … max):  888.150 μs …  2.037 ms  ┊ GC (min … max): 0.00% … 47.53%
 Time  (median):       1.043 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.023 ms ± 77.610 μs  ┊ GC (mean ± σ):  0.23% ±  2.38%

  ▅     ▃ ▁        ▁    ▅▃   █▆                                ▁
  █▅▃▃█████▅▃▁█▆▅▅▁█▇▃▁▃██▅▁▄███▆▅▄▆▇▅▇▅▅▇▅▅▄▄▃▅▅▅▅▃▁▃▅▄▃▁▃▁▃▄ █
  888 μs        Histogram: log(frequency) by time      1.22 ms <

 Memory estimate: 126.53 KiB, allocs estimate: 6.

The second is that they are not heap allocated so they do not lead to
significant GC latency. This topic was presented in the post on join
performance.

So what are the potential shortcomings? There are several:

- they have a limited capacity, so one cannot always rely on
  the conversion to an inline string being possible;
- they are not efficient when we work with strings of highly variable length;
- mixing inline strings of different types in collections can lead to type
  instabilities.

I have already discussed the first topic. Now, let us handle the second and
third ones in the following sections.

Memory footprint of inline strings

Consider the following collection:

julia> x = [String127("a"^i) for i in 1:100]
100-element Vector{String127}:
 "a"
 "aa"
 "aaa"
 ⋮
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

Let us check how expensive it is to copy it:

julia> @benchmark copy($x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  677.100 ns … 131.135 μs  ┊ GC (min … max):  0.00% … 95.08%
 Time  (median):       1.224 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.464 μs ±   4.174 μs  ┊ GC (mean ± σ):  13.84% ±  4.88%

                    ▁▅███▇▆▄▂
  ▂▁▁▁▁▁▂▂▂▂▂▂▂▂▂▃▃▆██████████▇▅▅▅▅▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▁▁▁▂▂▂ ▃
  677 ns           Histogram: frequency by time         2.11 μs <

 Memory estimate: 12.62 KiB, allocs estimate: 1.

Now we convert it to a standard String:

julia> y = String.(x)
100-element Vector{String}:
 "a"
 "aa"
 "aaa"
 ⋮
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

julia> @benchmark copy($y)
BenchmarkTools.Trial: 10000 samples with 973 evaluations.
 Range (min … max):  64.867 ns …   1.592 μs  ┊ GC (min … max):  0.00% … 82.37%
 Time  (median):     80.387 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   98.944 ns ± 111.194 ns  ┊ GC (mean ± σ):  13.24% ± 10.98%

  ▆█▄▂                                                         ▁
  ██████▇▆▆▆▄▅▄▃▄▃▁▃▁▁▁▁▁▁▁▃▅▆▅▄▅▇▅▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▅▅▄▅▅▅▅▇▆▆▅▅ █
  64.9 ns       Histogram: log(frequency) by time       830 ns <

 Memory estimate: 896 bytes, allocs estimate: 1.

As you can see, the operation on String is much faster. The reason is that x
stores 100 entries of 128 bytes each, while y stores 100
pointers to strings, each pointer taking only 8 bytes. See:

julia> sizeof(x)
12800

julia> sizeof(y)
800

So much less data movement is involved when only the pointers are copied.

The conclusion is that it is probably better to use String type if we expect
to work with collections of strings that have a highly variable length.
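
The sizes above can be accounted for with a few lines of Julia Base. This is only a sketch of the arithmetic: sizeof on a Vector{String} counts the stored pointers, while the character data lives separately on the heap.

```julia
y = ["a"^i for i in 1:100]

# sizeof of the vector counts only the stored pointers
pointer_bytes = sizeof(y)

# the character data lives elsewhere on the heap: 1 + 2 + ... + 100 bytes
payload_bytes = sum(sizeof, y)

# a vector of 100 fixed-width 128-byte elements stores everything inline
inline_bytes = length(y) * 128
```

On a 64-bit machine pointer_bytes is 800 and payload_bytes is 5050, so even with the payload included the String representation is smaller than the 12800 bytes of the String127 vector, and copy moves only the pointers.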

Collections of inline strings

The last potential issue is working with collections of inline strings.

When you read in string data from a file CSV.jl will automatically create
columns of appropriate widths:

julia> using CSV

julia> str = """
       w1,w2,w3,w4
       a,a,a,a
       b,bb,bb,bb
       c,cc,ccc,ccc
       d,dd,ddd,dddd
       """
"w1,w2,w3,w4\na,a,a,a\nb,bb,bb,bb\nc,cc,ccc,ccc\nd,dd,ddd,dddd\n"

julia> df = CSV.read(IOBuffer(str), DataFrame)
4×4 DataFrame
 Row │ w1       w2       w3       w4
     │ String1  String3  String3  String7
─────┼────────────────────────────────────
   1 │ a        a        a        a
   2 │ b        bb       bb       bb
   3 │ c        cc       ccc      ccc
   4 │ d        dd       ddd      dddd

However, one has to be careful when converting to inline strings manually. Have a
look:

julia> strs = ["a"^i for i in 1:8];

julia> strs2 = InlineString.(strs)
8-element Vector{InlineString}:
 "a"
 "aa"
 "aaa"
 "aaaa"
 "aaaaa"
 "aaaaaa"
 "aaaaaaa"
 "aaaaaaaa"

julia> typeof.(strs2)
8-element Vector{DataType}:
 String1
 String3
 String3
 String7
 String7
 String7
 String7
 String15

Sometimes this is indeed what one would want (the narrowest possible
representation at the cost of having a collection of abstract element type).
But currently, if you want a concrete element type, there is no automatic type
promotion and you have to do the conversion manually, e.g.:

julia> String15.(strs)
8-element Vector{String15}:
 "a"
 "aa"
 "aaa"
 "aaaa"
 "aaaaa"
 "aaaaaa"
 "aaaaaaa"
 "aaaaaaaa"

Operations on inline strings

Additionally one should know that most operations transforming inline strings
will currently produce a String by default, e.g.:

julia> s = String3("a")
"a"

julia> typeof(s^2)
String

julia> typeof(uppercase(s))
String

which also is best kept in mind when working with them.
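
Both caveats from the last two sections can be handled explicitly. The following is a sketch, assuming WeakRefStrings.jl 1.3 or later; it uses only the InlineString function and the String3 constructor shown in this post, and the variable names are made up for illustration.

```julia
using WeakRefStrings

strs = ["a"^i for i in 1:8]

# find the widest string and let InlineString pick the type that fits it
widest = strs[argmax(sizeof.(strs))]
T = typeof(InlineString(widest))

# convert the whole collection to that single concrete type
strs3 = T.(strs)

# convert the result of a transformation back to an inline string explicitly
s = String3("a")
u = String3(uppercase(s))
```

Choosing the type from the widest element gives a concrete element type without hard-coding String15, at the cost of one extra pass over the data.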

Conclusions

Inline strings are an excellent addition to Julia; however, one should know the
type of task they were designed to help with.

As you could see in my examples, inline strings are ideal for situations where we
have millions of relatively small strings that have relatively homogeneous
size and are not mutated.

This use case might seem restrictive, but actually in practice it is quite
common as e.g. all sorts of customer or product identifiers have exactly this
nature.

Finally, some of the limitations I listed in this post might be lifted in the
future. If you need some functionality, please do not hesitate to open an issue
on the WeakRefStrings.jl GitHub repository.

juliabloggers.com

A Julia Language Blog Aggregator
