Tag Archives: julialang

Julia and Python better together

Re-posted from: https://bkamins.github.io/julialang/2023/02/17/python.html

Introduction

Many data scientists I talk with tell me that they like Julia, but that there
is some functionality in Python that they like and want to keep using.

In this post I want to show that if you are in this situation the answer is:
it is fine – you can work with Julia and keep using Python packages you like
as a part of your Julia workflow.

The post is tested under Julia 1.8.5, PyCall.jl 1.95.1, Conda.jl 1.8.0,
PyPlot.jl 2.11.0, and GLM.jl 1.8.1.

Level 1: Popular Python packages have an interface in Julia

Many (if not most) Python packages are written in other languages (like C++),
and the Python layer is only a wrapper around them. Most Python users do not
care (or even think) about it: they focus on getting their projects delivered.

The same approach can be used in Julia. Specifically, a Julia package can be
just a wrapper around some Python package. Examples of such popular packages are:

PyPlot.jl wrapping Matplotlib;
Pandas.jl wrapping Pandas;
ScikitLearn.jl wrapping scikit-learn;
Seaborn.jl wrapping Seaborn;
SymPy.jl wrapping SymPy.

This is an easy scenario as all you need to do is install a Julia package
and you are ready to go and use your favorite Python package.

I will give here one example from Matplotlib documentation. I want to reproduce
the Python code given here:

# Python code
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(5, 2.7))
t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2 * np.pi * t)
ax.plot(t, s, lw=2)
ax.annotate('local max', xy=(2, 1), xytext=(3, 1.5),
            arrowprops=dict(facecolor='black', shrink=0.05))
ax.set_ylim(-2, 2)

Now let us do the same in Julia using PyPlot.jl:

# Julia code
using PyPlot
fig, ax = plt.subplots(figsize=(5, 2.7))
t = 0.0:0.01:4.99
s = cos.(2 * π * t)
ax.plot(t, s, lw=2)
ax.annotate("local max", xy=(2, 1), xytext=(3, 1.5),
            arrowprops=Dict("facecolor" => "black", "shrink" => 0.05))
ax.set_ylim(-2, 2)

As you can see, the two snippets are almost identical. The only differences are
syntactic: for example, ' is replaced by ", dict by Dict, and np.arange by a
Julia range.

Both codes produce the following plot:

Example plot
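The one line that is not a purely mechanical translation is the time grid: np.arange(0.0, 5.0, 0.01) excludes its stop value, whereas a Julia range includes it, hence the 4.99. A small plain-Julia check (no PyPlot.jl needed; the name t2 is illustrative) that the range holds the same 500 points, with range(...; step, length) as an equivalent spelling:

```julia
# np.arange(0.0, 5.0, 0.01) excludes 5.0, so the Julia range stops at 4.99
t = 0.0:0.01:4.99

# the same points written with an explicit length instead of a stop value
t2 = range(0.0; step=0.01, length=500)

# the plotted signal; the dot broadcasts cos over every element of the range
s = cos.(2 * π * t)

println(length(t))       # number of samples
println(all(t .≈ t2))    # both ranges describe the same points
```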

Level 2: You can use any Python package from Julia

Not all Python packages have Julia wrappers. Also, in some cases you might
want to work with a different version or configuration of the Python package
than provided by the wrapper. Is this a problem? No, this is not a problem at
all:

You can load and use any Python package from Julia.

There are two Julia packages that allow for this: PyCall.jl and
PythonCall.jl. Both packages let you call Python directly and fully
interoperate with it from Julia. There are some technical differences
between them, which are described here, so that you can decide which one
you prefer.

Below I will give you an example of using PyCall.jl.

Assume you like using the statsmodels package from Python and want to
reproduce this example from its documentation:

# Python code
import numpy as np
import statsmodels.api as sm
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)
mod = sm.OLS(spector_data.endog, spector_data.exog)
res = mod.fit()
print(res.summary())

Let me show you how to reproduce it step-by-step in Julia.

We start with loading the packages:

using PyCall
sm = pyimport("statsmodels.api")

Note that with PyCall.jl we can load any Python package using the
pyimport function. The second line might give you an error like this:

julia> sm = pyimport("statsmodels.api")
ERROR: PyError (PyImport_ImportModule
...

This is not a problem. It just means that the statsmodels package is not
installed in the Python environment used by PyCall.jl. In this case you can
easily install it from Julia. Just do:

using Conda
Conda.add("statsmodels")

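and you are ready to go (this assumes PyCall.jl uses the Python distribution
managed by Conda.jl, which is its default on Windows and macOS but not on Linux).
Now sm = pyimport("statsmodels.api") will work.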

We are ready to build the model in Julia using statsmodels:

# Julia code
# assumes sm = pyimport("statsmodels.api") from above
spector_data = sm.datasets.spector.load()
spector_data["exog"] = sm.add_constant(spector_data["exog"], prepend=false)
mod = sm.OLS(spector_data["endog"], spector_data["exog"])
res = mod.fit()
res.summary()

and you get the output that is the same as in statsmodels documentation:

PyObject <class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:                  GRADE   R-squared:                       0.416
Model:                            OLS   Adj. R-squared:                  0.353
Method:                 Least Squares   F-statistic:                     6.646
Date:                Fri, 17 Feb 2023   Prob (F-statistic):            0.00157
Time:                        14:41:35   Log-Likelihood:                -12.978
No. Observations:                  32   AIC:                             33.96
Df Residuals:                      28   BIC:                             39.82
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
GPA            0.4639      0.162      2.864      0.008       0.132       0.796
TUCE           0.0105      0.019      0.539      0.594      -0.029       0.050
PSI            0.3786      0.139      2.720      0.011       0.093       0.664
const         -1.4980      0.524     -2.859      0.008      -2.571      -0.425
==============================================================================
Omnibus:                        0.176   Durbin-Watson:                   2.346
Prob(Omnibus):                  0.916   Jarque-Bera (JB):                0.167
Skew:                           0.141   Prob(JB):                        0.920
Kurtosis:                       2.786   Cond. No.                         176.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors
is correctly specified.

Note that the differences between the two snippets are minimal. Again I only
needed to adjust to Julia syntax by changing False to false and by doing
dictionary access with square brackets, as in spector_data["exog"]. Everything
else is identical (and for this reason I put a comment on top of each snippet
showing which language is used, as they could easily be confused).

You might, however, ask if it is possible to use data from Python in Julia (or
data from Julia in Python). Yes, this is also supported. The only thing to
remember is that automatic conversion of values between Python and Julia is done
for a predefined list of the most common types (like arrays and dictionaries).
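A minimal sketch of this automatic conversion, assuming PyCall.jl is installed and its Python environment has NumPy (statsmodels already depends on it). NumPy arrays and Python floats come back as Julia Arrays and Float64 values, a Julia Dict is passed to Python as a dict, and convert turns it back into a typed Julia Dict:

```julia
using PyCall

np = pyimport("numpy")

# a NumPy array returned from Python is converted to a Julia Vector{Float64}
a = np.linspace(0.0, 1.0, 5)

# a Julia Vector is passed to Python as a NumPy array; the scalar result is a Float64
m = np.mean([1.0, 2.0, 3.0])

# a Julia Dict becomes a Python dict; convert recovers a typed Julia Dict
pyd = PyObject(Dict("a" => 1, "b" => 2))
jd = convert(Dict{String,Int}, pyd)

println(typeof(a), " ", m, " ", jd)
```

Objects whose types are not on the predefined list stay wrapped as PyObject, which is why the statsmodels results above are displayed as PyObject values.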

Let me give an example of how this is done by estimating the same regression
using GLM.jl. I will transport the data as arrays from Python to Julia (I chose
this case as it is the most common one in my experience).

using GLM
lm(spector_data["exog"].to_numpy(), spector_data["endog"].to_numpy())

And you get the output:

LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64,
LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
───────────────────────────────────────────────────────────────────
         Coef.  Std. Error      t  Pr(>|t|)   Lower 95%   Upper 95%
───────────────────────────────────────────────────────────────────
x1   0.463852    0.161956    2.86    0.0078   0.132099    0.795604
x2   0.0104951   0.0194829   0.54    0.5944  -0.0294137   0.0504039
x3   0.378555    0.139173    2.72    0.0111   0.0934724   0.663637
x4  -1.49802     0.523889   -2.86    0.0079  -2.57115    -0.42488
───────────────────────────────────────────────────────────────────

As you can see, the results are the same, except that we have lost the column
names. The reason is that we used arrays to transport data from Python to
Julia (column names could also be transported, but I did not want to
complicate the example).

Conclusions

Today the conclusion is short (but for me extremely powerful):

From Julia you have all Julia and all Python packages available to use in
your projects.

If you know some package in Python and want to keep using it in Julia, it is
easy. In most cases you can just copy-paste your Python code to Julia, make
minor syntax adjustments, and you are done.

I find this interoperability of Julia and Python really amazing.

What is ∈ in Julia?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/02/10/in.html

Introduction

Today I decided to discuss the in function. It is a basic topic but, from my
teaching experience, it often surprises people learning Julia. I will cover
several concrete cases that are worth knowing, as you might either use them
yourself or encounter them in code that you read.

The post is tested under Julia 1.8.5.

The basic syntax of `in`

in is a function in Julia. It is used to determine whether an item is in the
given collection.

Since in is a function you can invoke it using the standard function call
syntax:

julia> in(1, [1, 2, 3])
true

However, this operation is so common that there are two other ways to perform
it:

julia> 1 in [1, 2, 3]
true

julia> 1 ∈ [1, 2, 3]
true

If you wonder how to type ∈ then you can check it in Julia’s help:

help?> ∈
"∈" can be typed by \in<tab>

Since ∈ is the same as in you can also write:

julia> ∈(1, [1, 2, 3])
true

although this is likely not the most readable way to do it.
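As a quick sketch (the variable v is illustrative), these spellings all agree, and in accepts any iterable collection, not only vectors. One exception worth knowing: for substring tests on strings, in throws an error and occursin is the function to use:

```julia
v = [1, 2, 3]

# the call syntax, the infix keyword, and the Unicode operator all agree
@assert in(1, v) == (1 in v) == (1 ∈ v) == ∈(1, v)

# `in` works with any iterable collection, not only vectors
println(2 in 1:3)        # a range
println(3 ∈ (1, 2, 3))   # a tuple
println('a' in "abc")    # a string iterates its characters

# for substring tests use occursin; "ab" in "abc" throws an error instead
println(occursin("ab", "abc"))
```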

Finally there is an accompanying ∋ syntax that has the order of arguments
reversed:

julia> ∋([1, 2, 3], 1)
true

julia> [1, 2, 3] ∋ 1
true

help?> ∋
"∋" can be typed by \ni<tab>

Negating `in`

Often you want to check if some element is not in a collection. Here are the
standard ways you can do it (you could similarly negate ∈ and ∋):

julia> !in(1, [1, 2, 3])
false

julia> !(1 in [1, 2, 3])
false

However, there are also convenience ∉ and ∌ operators:

julia> 1 ∉ [1, 2, 3]
false

julia> [1, 2, 3] ∌ 1
false

help?> ∉
"∉" can be typed by \notin<tab>

help?> ∌
"∌" can be typed by \nni<tab>

Higher-order function

For all of in, ∈, ∋, ∉, and ∌ you can easily create a function taking only
one argument by fixing the second argument of the operation.

For example, writing in([1, 2, 3]) is equivalent to creating an anonymous
function e -> e in [1, 2, 3]. Let us show this syntax at work:

julia> in([1, 2, 3])(1)
true

julia> ∋(1)([1, 2, 3])
true

julia> ∉([1, 2, 3])(1)
false

This syntax is particularly useful when working with higher-order functions:

julia> map(in(Set([1, 2, 3])), [-1, 1, 3, 5])
4-element Vector{Bool}:
 0
 1
 1
 0
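Under the hood, in(collection) returns a Base.Fix2 object, a callable that stores the fixed second argument. A small sketch (variable names are illustrative) checking that it behaves like the equivalent anonymous function, and combining the curried forms with filter and count:

```julia
s = Set([1, 2, 3])
xs = [-1, 1, 3, 5]

# in(s) is a callable object that fixes the collection argument
f = in(s)
@assert f isa Base.Fix2
@assert map(f, xs) == map(x -> x in s, xs)

# the curried forms compose with filter, count, any, all, ...
kept    = filter(in(s), xs)   # elements found in s
dropped = filter(∉(s), xs)    # elements not found in s
n       = count(∈(s), xs)     # how many elements are found in s
println(kept, " ", dropped, " ", n)
```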

Performance

In the map example above you probably noticed that I used a Set instead of a
vector for lookup. This is an important pattern:

lookup in a vector has no preprocessing cost, but each lookup then takes, on
average, time linear in the size of the vector (advanced tip: if the vector is
sorted you can use the insorted function instead and it will be faster);
lookup in a set has the cost of creating the set, but afterwards the lookup
time (on average) does not grow with the size of the collection.

In summary, if you have a large collection in which you want to perform lookup
many times then make sure to convert this collection to a set (timings are after
compilation):

julia> v = rand(1:1_000_000, 10_000);

julia> @time count(in(v), 1)
  0.000022 seconds (2 allocations: 48 bytes)
0

julia> @time count(in(Set(v)), 1)
  0.000199 seconds (10 allocations: 144.648 KiB)
0

julia> @time count(in(v), 1:1_000_000)
  6.104646 seconds (5 allocations: 112 bytes)
9941

julia> @time count(in(Set(v)), 1:1_000_000)
  0.017825 seconds (13 allocations: 144.711 KiB)
9941

Note that with a single lookup the cost of creating the Set dominated, but with
one million lookups creating the Set was crucial for good performance of the
operation.
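The insorted tip can be sketched as follows (insorted is in Base since Julia 1.6; the lookup range 1:10_000 is chosen only to keep the linear scan quick). Sorting costs O(n log n) once, after which each lookup is a binary search; all three approaches count the same hits:

```julia
v = rand(1:1_000_000, 10_000)

sv = sort(v)       # one-time O(n log n) preprocessing
set_v = Set(v)     # one-time hashing cost

# insorted performs a binary search, so each lookup is O(log n)
hits_sorted = count(x -> insorted(x, sv), 1:10_000)
hits_set    = count(in(set_v), 1:10_000)
hits_vec    = count(in(v), 1:10_000)   # linear scan per lookup, the slowest

println((hits_sorted, hits_set, hits_vec))
```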

Broadcasting

It is tempting to run the operation:

julia> map(in(Set([1, 2, 3])), [-1, 1, 3, 5])
4-element Vector{Bool}:
 0
 1
 1
 0

using broadcasting like this:

julia> in.([-1, 1, 3, 5], Set([1, 2, 3]))
ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 4 and 3

However, this fails, because broadcasting iterates over both arguments of the in
function: [-1, 1, 3, 5] and Set([1, 2, 3]). There are two ways to fix it. The
first is to protect the collection in which you want to perform the lookup
using Ref:

julia> in.([-1, 1, 3, 5], Ref(Set([1, 2, 3])))
4-element BitVector:
 0
 1
 1
 0

julia> [-1, 1, 3, 5] .∈ Ref(Set([1, 2, 3]))
4-element BitVector:
 0
 1
 1
 0

The other is to use the higher-order function approach:

julia> in(Set([1, 2, 3])).([-1, 1, 3, 5])
4-element BitVector:
 0
 1
 1
 0
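Besides Ref, a one-element tuple such as (s,) is another common way to make broadcasting reuse the collection as a single value, and a comprehension avoids the broadcasting rules altogether. A sketch comparing the three forms (variable names are illustrative):

```julia
xs = [-1, 1, 3, 5]
s = Set([1, 2, 3])

# Ref(s) makes s behave as a 0-dimensional container in broadcasting
r1 = in.(xs, Ref(s))

# a one-element tuple has the same effect: its only element is reused
r2 = in.(xs, (s,))

# a comprehension does not broadcast at all
r3 = [x in s for x in xs]

@assert r1 == r2 == r3 == [false, true, true, false]
```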

How does `in` lookup work?

The final issue is related to the definition of in. It states that in checks
if an item is in the given collection. But what does it mean exactly?

First you need to understand how collections are iterated. If you do not
know much about this topic, you can find a description of the iteration
interface in my recent post.

A typical example that is tricky is Dict lookup. Since Dict iterates
key-value pairs the following is incorrect:

julia> 1 in Dict(1 => "a", 2 => "b")
ERROR: AbstractDict collections only contain Pairs;

Instead you most likely wanted:

julia> 1 in keys(Dict(1 => "a", 2 => "b"))
true
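For reference, a short sketch of the usual membership tests on a dictionary: haskey and in keys(d) check keys using the hash table, in with a Pair checks a whole entry, and in values(d) scans the values one by one:

```julia
d = Dict(1 => "a", 2 => "b")

# key membership: both forms use the Dict hash table, not a linear scan
@assert haskey(d, 1)
@assert 1 in keys(d)

# membership of a whole key-value pair, which is what `x in d` expects
@assert (1 => "a") in d
@assert !((1 => "b") in d)

# value membership iterates over values(d)
@assert "b" in values(d)
```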

The second issue is how in checks for equality between an item and the
elements of the collection. This issue is particularly tricky. Normally the
== function is used, but for Set and Dict the isequal function is used.

Here are some examples showing you the difference:

julia> v = [1.0, missing, -0.0]
3-element Vector{Union{Missing, Float64}}:
  1.0
   missing
 -0.0

julia> s = Set(v)
Set{Union{Missing, Float64}} with 3 elements:
  missing
  -0.0
  1.0

julia> d = Dict(v .=> 'a':'c')
Dict{Union{Missing, Float64}, Char} with 3 entries:
  missing => 'b'
  -0.0    => 'c'
  1.0     => 'a'

julia> missing in v
missing

julia> missing in s
true

julia> missing in keys(d)
true

julia> (missing => 'b') in d
true

julia> 0.0 in v
true

julia> -0.0 in s
true

julia> 0.0 in s
false

julia> 0.0 in keys(d)
false

julia> -0.0 in keys(d)
true

julia> (0.0 => 'c') in d
false

julia> (-0.0 => 'c') in d
true

The reason for these results is:

julia> missing == missing
missing

julia> isequal(missing, missing)
true

julia> 0.0 == -0.0
true

julia> isequal(0.0, -0.0)
false
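NaN shows the same rule from the opposite side: NaN == NaN is false, but isequal(NaN, NaN) is true. The sketch below also shows how to request a specific equality explicitly with any, so that the answer does not depend on whether the container is a Vector or a Set:

```julia
# NaN: == says "not found", isequal says "found"
@assert !(NaN in [NaN])
@assert NaN in Set([NaN])

v = [1.0, -0.0]

# switching the container from Vector to Set changes the answer
@assert 0.0 in v
@assert !(0.0 in Set(v))

# isequal semantics on a vector, requested explicitly
@assert !any(isequal(0.0), v)
# == semantics on a Set, requested explicitly
@assert any(==(0.0), Set(v))
```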

Conclusions

As you can see the in function has several non-obvious behaviors in terms
of:

syntax: you can use five different operations: in, ∈, ∋, ∉, and ∌;
performance: be careful to avoid performance trap of doing many lookups in a
vector;
lookup rule: Set and Dict use the isequal test, while normally == is used;
this is especially relevant in combination with the performance recommendation:
you might get a different result if you switch from a vector
to a Set because you wanted to speed up your computations.

All topics I discussed today are documented in the Julia Manual. However,
I hope that having them presented by example in a single place in this post is
useful for you.

Grand Julia containers test. Can you get a perfect score?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/02/03/iterable.html

Introduction

Containers are objects grouping multiple values together.
In Julia, examples of containers are vectors or dictionaries.

To ensure that it is easy to work with containers Julia introduces four
interfaces:

iteration;
indexing;
array;
broadcasting.

They are described in the Interfaces section of the Julia Manual.
However, this description is a bit technical and is mostly aimed at developers
creating new container types. Therefore, I decided to write a post that
explains these interfaces by example from the user's perspective.

The material presented here is organized in two parts.

Part one covers the basic knowledge you should have. I tried to collect the
most relevant information in it.

Part two is organized as a quiz to check your understanding of the discussed
interfaces. There are ten questions in the quiz (plus one bonus question).
Do you know an answer to all of them?

The post is tested under Julia 1.8.5 and DataFrames.jl 1.4.4.

Container interfaces explained

Iteration

Iteration is the least demanding interface. It ensures that you can get elements
from a container sequentially.

Most containers are iterable, including arrays, dictionaries, sets,
I/O buffers, and strings.

If a container supports iteration, then it can be used in for loops,
comprehensions, and higher-order functions that rely only on iteration of
values, like foreach.

Here is a basic example of iterating a tuple:

julia> t = (1, 2, 3)
(1, 2, 3)

julia> for v in t
           println(v)
       end
1
2
3

julia> [v for v in t]
3-element Vector{Int64}:
 1
 2
 3

An important feature of the iteration interface is that it also works for
collections that get mutated when they are iterated. A basic example is reading
data from an IOBuffer:

julia> buf = IOBuffer(join('a':'d', '\n'))
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=1, mark=-1)

julia> foreach(println, eachline(buf))
a
b
c
d

julia> foreach(println, eachline(buf))

julia>

Note that by iterating buf line by line we were mutating it. Therefore, the
second time we iterate it we do not get any output.

When you use the iteration interface, always assume that you are only
guaranteed to be able to read each element of a collection once.
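Under the hood a for loop calls iterate repeatedly until it returns nothing, as described in the Interfaces section of the Julia Manual. A sketch using a hypothetical Countdown type (not part of Base) shows both sides: defining iterate (plus length and eltype, so that collect works) and consuming any iterable manually the way a loop does:

```julia
# hypothetical type: yields n, n-1, ..., 1
struct Countdown
    n::Int
end

# iterate returns (element, next_state), or nothing when iteration is finished
Base.iterate(c::Countdown, state::Int = c.n) = state < 1 ? nothing : (state, state - 1)
Base.length(c::Countdown) = c.n
Base.eltype(::Type{Countdown}) = Int

# roughly what a `for value in itr` loop does
function manual_sum(itr)
    total = 0
    next = iterate(itr)
    while next !== nothing
        (value, state) = next
        total += value
        next = iterate(itr, state)
    end
    return total
end

println(collect(Countdown(3)))
println(manual_sum(Countdown(4)))
```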

Indexing

Indexing allows you to access elements of a container using their identifier,
typically called an index. This implicitly means that you can read the same
element multiple times without a problem (as opposed to iteration).

The key feature of this interface is that it supports a convenient x[i] syntax
for getting elements from a container. Again, let us give a simple example:

julia> d = Dict(v => '0'+v for v in 1:4)
Dict{Int64, Char} with 4 entries:
  4 => '4'
  2 => '2'
  3 => '3'
  1 => '1'

julia> d[3]
'3': ASCII/Unicode U+0033 (category Nd: Number, decimal digit)

In practice in Julia this interface is implemented in several flavors.
The most important of them are the following.

First, indexing can support either only reading, or both reading and writing
of data in the container. For example, entries of a Vector can be mutated, but
ranges are read-only:

julia> v1 = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> v1[2] = 12
12

julia> v2 = 1:3
1:3

julia> v2[2] = 12
ERROR: CanonicalIndexError: setindex! not defined for UnitRange{Int64}

Lesson to remember: do not assume that indexable objects are always writeable.

Second, the indexing interface typically assumes that valid indices can be
ordered, so that a first and a last index are defined. This can be
conveniently exploited by using the begin and end keywords when indexing:

julia> v1[begin]
1

julia> v1[end]
3

However, this support is not guaranteed by all objects that support indexing.
A basic example is a dictionary:

julia> d[end]
ERROR: MethodError: no method matching lastindex(::Dict{Int64, Char})

So the second thing to remember is that you cannot assume that all indexable
objects will support passing begin and end keywords when indexing.
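The begin and end keywords are shorthand for calls to firstindex and lastindex, which is why they fail for types like Dict that define neither method. A small sketch; for dictionary keys that may be absent, get with a default value is a convenient alternative to d[k]:

```julia
v1 = [1, 2, 3]
d = Dict(v => '0' + v for v in 1:4)

# x[begin] and x[end] are rewritten to x[firstindex(x)] and x[lastindex(x)]
@assert v1[begin] == v1[firstindex(v1)]
@assert v1[end] == v1[lastindex(v1)]

# key-based indexing: get returns the default instead of throwing a KeyError
println(get(d, 3, '?'), get(d, 5, '?'))
```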

Array

Arrays extend the indexing syntax by allowing multiple values to be passed in
the x[i, j, ...] syntax and by introducing CartesianIndex indexing. Types that
support the array interface typically opt in to being a subtype of
AbstractArray. An important feature of arrays is that they specify their
dimensions, which you can get using the size function. Similarly, using the
axes function you can get the valid indices along each dimension.

Here is an example:

julia> mat = [1 2 3; 4 5 6]
2×3 Matrix{Int64}:
 1  2  3
 4  5  6

julia> size(mat)
(2, 3)

julia> axes(mat)
(Base.OneTo(2), Base.OneTo(3))

julia> mat[1, 2]
2

julia> mat[CartesianIndex(1, 2)]
2

An operation that is often needed is iterating over all indices of an array.
You can get an iterator of such indices using the eachindex function.

It is important to remember that this function only guarantees to return
values that are valid indices of the array. The type of these indices is
chosen based on indexing efficiency. Here is an example:

julia> for idx in eachindex(mat)
           print(idx, " ")
       end
1 2 3 4 5 6
julia> for idx in eachindex(@view mat[1:2, 2:3])
           print(idx, " ")
       end
CartesianIndex(1, 1) CartesianIndex(2, 1) CartesianIndex(1, 2) CartesianIndex(2, 2)

One notable thing here is that although mat is a matrix, and supports
passing indices for all dimensions when indexing, like mat[1, 2] or
mat[CartesianIndex(1, 2)], eachindex(mat) returns integer indices
(because they are faster). Such indices are called linear indices and are
supported by all arrays in Julia. Therefore you can write:

julia> mat[3]
2

This is an important feature to remember so let me stress it again: any array in
Julia can be indexed using a single integer index, called linear index.

The second, somewhat surprising, feature to remember is that when indexing any
array you can append as many trailing 1s as you like without affecting the
result (such extra 1s are ignored):

julia> mat[2, 2]
5

julia> mat[2, 2, 1, 1, 1]
5

Here you can find more explanations about this topic.
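To convert between the two indexing styles, Base provides LinearIndices and CartesianIndices. A sketch using the mat matrix from above (the names li and ci are illustrative):

```julia
mat = [1 2 3; 4 5 6]

# LinearIndices maps (row, column) positions to column-major linear indices
li = LinearIndices(mat)
@assert li[1, 2] == 3

# CartesianIndices maps linear indices back to CartesianIndex values
ci = CartesianIndices(mat)
@assert ci[3] == CartesianIndex(1, 2)

# all three ways of addressing the same element agree
@assert mat[1, 2] == mat[li[1, 2]] == mat[ci[3]] == 2

# trailing 1s are also accepted inside a CartesianIndex
@assert mat[2, 2, 1] == mat[CartesianIndex(2, 2, 1)] == 5
```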

Broadcasting

Broadcasting is a convenient way of performing operations on arrays. In
particular it allows combining arrays of different sizes in a single
operation. The rule is the following: when you operate on two arrays that do
not have matching dimensions, dimensions of length 1 are expanded to the
required length by repeating the values as needed. Here is a quick example:

julia> ["a" "b"] .^ [1, 2, 3]
3×2 Matrix{String}:
 "a"    "b"
 "aa"   "bb"
 "aaa"  "bbb"

We take a matrix ["a" "b"] with one row and two columns, and a vector with
three rows (and one column). As you can see, the single row of the matrix
gets repeated three times. Similarly, the single column of the vector gets
repeated two times. In this way the dimensions of both objects match and we can
perform the computation. Notably, our [1, 2, 3] vector has only one dimension,
so broadcasting added a column dimension of length 1 to it (as you likely
remember from the discussion of array indexing, you are allowed to add trailing
1s without affecting the result).

Broadcasting in Julia is requested with a dot (.); you can read more about it
here. This is a very convenient syntax, so people started to use it in many
contexts. Because of this popularity it quickly became clear that broadcasting
is also useful in cases that do not involve only arrays. Here is an example:

julia> ["a"] .^ [1, 2, 3]
3-element Vector{String}:
 "a"
 "aa"
 "aaa"

While this works, writing ["a"] is not very convenient. Therefore, broadcasting
was extended to allow non-arrays to take part in operations. You can thus
write:

julia> "a" .^ [1, 2, 3]
3-element Vector{String}:
 "a"
 "aa"
 "aaa"

Here "a" pretends to be an array having length 1 in all dimensions that
are defined by the other container. This pretending is implemented in
cases where it is useful to think of a given value as an array. A special,
and the most common, case is when the value pretends to be a scalar. The easiest
way to think about a scalar is that it has 0 dimensions (so effectively in every
dimension it can be treated as having length 1 and be expanded
appropriately).
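This pretending is controlled by the Base.broadcastable function, documented in the broadcasting part of the manual's Interfaces section; for strings it returns a Ref wrapper. A sketch with a hypothetical Scale type and a hypothetical apply function shows how a custom type can opt in to the same scalar behavior. Without the broadcastable method, the broadcast call below would fail, because Scale is not iterable:

```julia
# hypothetical type used only for this illustration
struct Scale
    factor::Float64
end

apply(s::Scale, x) = s.factor * x

# opt in to being treated as a scalar in broadcasting
Base.broadcastable(s::Scale) = Ref(s)

result = apply.(Scale(2.0), [1, 2, 3])

# strings get the same treatment in Base: they are wrapped as 0-dimensional
@assert size(Base.broadcastable("a")) == ()
```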

Sometimes you want a value to be treated as a scalar even if it is not a scalar.
In such case you can wrap it with Ref. Ref creates a 0-dimensional object
that behaves like a scalar. Here is an example:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> r = Ref(x)
Base.RefValue{Vector{Int64}}([1, 2, 3])

julia> size(r)
()

julia> r[]
3-element Vector{Int64}:
 1
 2
 3

Note that we could extract the value stored in r by indexing without passing
any index (because it is 0-dimensional). Now let us see when this is
useful:

julia> [1, 2, 3] .∈ [1, 3, 4]
3-element BitVector:
 1
 0
 0

julia> [1, 2, 3] .∈ Ref([1, 3, 4])
3-element BitVector:
 1
 0
 1

The first operation was not very useful. It was equivalent to writing
[1 ∈ 1, 2 ∈ 3, 3 ∈ 4], which is probably not what we wanted. In the second
operation we protected the second vector with Ref so that it was used as a
whole for every element of the [1, 2, 3] vector.

Using Ref is also needed when some object does not implement the default
behavior of pretending to be an array in broadcasting. A common example is
standard dictionaries:

julia> haskey.(d, [1, 2, 5])
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved

julia> haskey.(Ref(d), [1, 2, 5])
3-element BitVector:
 1
 1
 0

Now you should have an understanding of iteration, indexing (including array
indexing), and broadcasting.

Quiz: problems

What is the order of iteration of arrays?
Are tuples broadcastable?
Are numbers iterable and indexable?
Are atomic numbers iterable, indexable, or broadcastable?
Is Set iterable, indexable, or broadcastable?
Is Dict iterable, indexable, or broadcastable?
Is named tuple iterable, indexable, or broadcastable?
What is special about string iteration, indexing, and broadcasting?
Is Pair iterable, indexable, or broadcastable?
Is pairs(["a", "b", "c"]) iterable, indexable, or broadcastable?
Bonus question: is DataFrame iterable, indexable, or broadcastable? (it is
the only type not from Base Julia in the list, but I could not resist the
temptation to include it)

Quiz: answers

1. What is the order of iteration of arrays?

With arrays, you need to remember that they are iterated in column-major order
(just as in the linear indexing we discussed):

julia> x = [1 2; 3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

julia> for v in x
           print(v, " ")
       end
1 3 2 4
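If you need row-major order instead, you have to ask for it explicitly. A sketch (variable names are illustrative): the loop collects the default column-major order, while the comprehension with two for clauses and permutedims give row-major order:

```julia
x = [1 2; 3 4]

# default iteration: column-major, the same order as vec(x)
iterated = Int[]
for v in x
    push!(iterated, v)
end
@assert iterated == vec(x)

# row-major: the first `for` clause is the outer loop
row_major = [x[i, j] for i in axes(x, 1) for j in axes(x, 2)]

# equivalently, iterate the permuted matrix
@assert vec(permutedims(x)) == row_major
```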

2. Are tuples broadcastable?

Yes, you can use tuples in broadcasting. They are handled in the same way
as vectors. The only difference is that if you broadcast a tuple only together
with scalars or 0-dimensional arrays you get a tuple as a result, while if you
mix tuples with arrays of dimension at least 1 you get an array:

julia> 1 .+ (1, 2, 3)
(2, 3, 4)

julia> fill(1) .+ (1, 2, 3) # fill(1) produces 0-dimensional array
(2, 3, 4)

julia> [1] .+ (1, 2, 3) # here [1] is 1-dimensional array
3-element Vector{Int64}:
 2
 3
 4

3. Are numbers iterable and indexable?

Yes they are. They are treated as 0-dimensional containers.

julia> x = 1
1

julia> for v in x
           println(v)
       end
1

julia> x[]
1

julia> x[1, 1, 1]
1

julia> size(x)
()

julia> axes(x)
()

4. Are atomic numbers iterable, indexable, or broadcastable?

No. They only support indexing with no arguments passed:

julia> x = Threads.Atomic{Int}(1)
Base.Threads.Atomic{Int64}(1)

julia> for v in x
           println(v)
       end
ERROR: MethodError: no method matching iterate(::Base.Threads.Atomic{Int64})

julia> x[]
1

julia> x[1]
ERROR: MethodError: no method matching getindex(::Base.Threads.Atomic{Int64}, ::Int64)

julia> x .+ 1
ERROR: MethodError: no method matching length(::Base.Threads.Atomic{Int64})

5. Is Set iterable, indexable, or broadcastable?

It is iterable and broadcastable, but not indexable:

julia> s = Set([1, 2, 3])
Set{Int64} with 3 elements:
  2
  3
  1

julia> for v in s
           println(v)
       end
2
3
1

julia> s .+ 1
3-element Vector{Int64}:
 3
 4
 2

Note that when you iterate or broadcast a Set the order of returned values
is undefined. This is a common pitfall in broadcasting:

julia> s
Set{Int64} with 3 elements:
  2
  3
  1

julia> [1, 2, 3] .∈ s # note the iteration order of s
3-element BitVector:
 0
 0
 0

julia> [1, 2, 3] .∈ Ref(s) # this is what you really wanted
3-element BitVector:
 1
 1
 1

6. Is Dict iterable, indexable, or broadcastable?

It is iterable, and partially indexable, but not broadcastable:

julia> d = Dict(v => '0'+v for v in 1:4)
Dict{Int64, Char} with 4 entries:
  4 => '4'
  2 => '2'
  3 => '3'
  1 => '1'

julia> foreach(println, d)
4 => '4'
2 => '2'
3 => '3'
1 => '1'

julia> foreach(show, keys(d))
4231

julia> foreach(show, values(d))
'4''2''3''1'

For dictionaries, a basic thing to remember is that iterating them returns
key-value Pairs. If you want only keys, use keys to create an appropriate
iterator. Similarly to get values use the values function. Also the order of
iteration of a standard Dict is undefined.

julia> d[1]
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)

Dictionaries are only partially indexable, since they do not support the
firstindex and lastindex methods.

Finally they are not broadcastable:

julia> d .+ 1
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved

7. Is named tuple iterable, indexable, or broadcastable?

Named tuples are iterable and indexable, but not broadcastable:

julia> nt = (a=1, b=2, c=3)
(a = 1, b = 2, c = 3)

julia> foreach(println, nt)
1
2
3

julia> foreach(println, pairs(nt))
:a => 1
:b => 2
:c => 3

julia> nt[end]
3

julia> nt .+ 1
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved

Note that if you want to iterate key-value pairs of a named tuple you need to
use the pairs function.

8. What is special about string iteration, indexing, and broadcasting?

When you iterate strings you get characters. String indexing is supported, but
valid indices do not have to be consecutive, because strings use byte indexing,
as I have explained in detail in this post. In broadcasting strings behave like
a scalar:

julia> str = "1₂3"
"1₂3"

julia> foreach(display, str)
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
'₂': Unicode U+2082 (category No: Number, other)
'3': ASCII/Unicode U+0033 (category Nd: Number, decimal digit)

julia> collect(eachindex(str))
3-element Vector{Int64}:
 1
 2
 5

julia> str[2]
'₂': Unicode U+2082 (category No: Number, other)

julia> str[5]
'3': ASCII/Unicode U+0033 (category Nd: Number, decimal digit)

julia> string.(["a", "b"], str)
2-element Vector{String}:
 "a1₂3"
 "b1₂3"

Notice that the character '₂' has index 2, but the next valid index is 5,
because '₂' occupies three bytes.
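A few Base functions make byte indexing easier to work with. A sketch on the same str: length counts characters while ncodeunits counts bytes, nextind and prevind step between valid indices, and indexing into the middle of a character throws a StringIndexError:

```julia
str = "1₂3"

# characters versus UTF-8 code units (bytes)
@assert length(str) == 3
@assert ncodeunits(str) == 5

# valid indices are not consecutive: step between them with nextind / prevind
@assert nextind(str, 2) == 5
@assert prevind(str, 5) == 2
@assert !isvalid(str, 3)   # byte 3 lies inside the encoding of '₂'

# indexing into the middle of a character throws an error
err = try
    str[3]
catch e
    e
end
@assert err isa StringIndexError

# collect gives a Vector{Char} that can be indexed consecutively
@assert collect(str)[2] == '₂'
```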

9. Is Pair iterable, indexable, or broadcastable?

It is iterable and indexable. In broadcasting it is treated as a scalar:

julia> p = :a => sin
:a => sin

julia> foreach(display, p)
:a
sin (generic function with 14 methods)

julia> p[1]
:a

julia> p[end]
sin (generic function with 14 methods)

julia> tuple.(p, (1, 2))
((:a => sin, 1), (:a => sin, 2))

10. Is pairs(["a", "b", "c"]) iterable, indexable, or broadcastable?

It is iterable and partially indexable, but not broadcastable, although it
has length, size, and axes:

julia> pv = pairs(["a", "b", "c"])
pairs(::Vector{String})(...):
  1 => "a"
  2 => "b"
  3 => "c"

julia> foreach(println, pv)
1 => "a"
2 => "b"
3 => "c"

julia> pv[1]
"a"

julia> pv[3]
"c"

julia> length(pv)
3

julia> size(pv)
(3,)

julia> axes(pv)
(Base.OneTo(3),)

julia> identity.(pv)
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved

11. Is DataFrame iterable, indexable, or broadcastable?

A data frame is indexable (but always requires a two-dimensional index) and
broadcastable, but not iterable. To iterate it you need to choose whether you
want to iterate rows or columns:

julia> using DataFrames

julia> df = DataFrame(a=1:3, b=11:13)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12
   3 │     3     13

julia> df[2, 1]
2

julia> df[end, end]
13

julia> df[1]
ERROR: ArgumentError: syntax df[column] is not supported use df[!, column] instead

julia> df .^ 2
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1    121
   2 │     4    144
   3 │     9    169

julia> foreach(println, df)
ERROR: AbstractDataFrame is not iterable. Use eachrow(df) to get a row iterator or eachcol(df) to get a column iterator

julia> foreach(println, eachrow(df))
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   2 │     2     12
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   3 │     3     13

julia> foreach(println, eachcol(df))
[1, 2, 3]
[11, 12, 13]

Conclusions

I hope you found the examples I included in the post useful and that after
reading it your grasp of working with containers in Julia improved!

juliabloggers.com

A Julia Language Blog Aggregator
