By: Carl Vogel

Re-posted from: http://slendermeans.org/julia-iterators.html

## Introduction

I want to spend some time messing about with iterators in Julia. I think they

not only provide a familiar and useful entry point into Julia’s type system and dispatch

model, they’re also interesting in their own right.1 Clever application of iterators can

help to simplify complicated loops, better express their intent, and improve

memory usage.

A word of warning about the code here. Much of the it isn’t idiomatic Julia and I wouldn’t

necessarily recommend using this style in a serious project. I also can’t speak

to its performance vis-a-vis more obvious Julian alternatives. In some cases,

the style of the code examples below may help reduce memory usage, but

performance is not my main concern. (This may be the first blogpost about Julia

unconcerned with speed). Instead, I’m just interested in different ways of

expressing iteration problems.

For anyone who’d like to play along at home, there’s an IJulia notebook of

this material on Github, which can be viewed on nbviewer here.

## The Iterator Protocol

What do I mean by iterators?2 I mean any `I`

in Julia that works on

the right hand side of the statement `for i = I ...`

. That is, anything you can for-loop

over. This includes not only data collections like Arrays, Dicts, and Sets, but

also more abstract types like Ranges, as well as what I’ll call “higher order”

iterators such as those that result from `zip`

or `enumerate`

functions.

As an equivalent definition, an iterator in Julia is any type that implements

the **iterator protocol**. The iterator protocol is comprised of three methods:

`start`

, `next`

, and `done`

. So any type in Julia for which these three methods

are defined is an iterator. It might be a dumb iterator or a broken iterator,

but it’s an iterator.

Since the `for`

statement works on iterators, and iterators are just a

collection of methods, we can define any for loop using calls to those methods.

For example, this simple for loop

```
arr = [10:-2:1]
for i = arr
println(i^2)
end
```

```
100
64
36
16
4
```

is equivalent to this

```
state = start(arr);
while !done(arr, state)
i, state = next(arr, state)
println(i^2)
end
```

```
100
64
36
16
4
```

In this example, the `start`

method provides the initial state of the iterator;

the `next`

method returns the value of the array at a given state, as well as what the

next state is. Finally, the `done`

method returns `true`

when we’ve gone past

the end of the iterator, informing the loop that it should stop.

If you know Python, the idea of the iterator protocol is probably familiar. In

Python, any object can be an iterator if it has the methods `__iter__`

and

`__next__`

. But notice the lack of side effects in the Julia implementation

—calling `start`

or `next`

on the array has no affect on the array itself. `start`

is

basically a constant, always returning the value of the initial state whenever

you pass it the same type of iterator. And `next`

doesn’t really increment

anything; it’s just a mapping from *current state → (value, next
state)*. In general, the iterator itself has no internal state being incremented or

changed as you pass through a loop.

## An iterator’s state

More concretely, what’s the *state* of an iterator? How the state is

defined, and an iterator’s sequence of states depends on the type of

the iterator itself. It’s best to look at some examples.

### Arrays.

Arrays are very intuitive iterators. They have states that are just integer

values from 1 to the length of the array. So `start`

returns 1.

```
arr = ["one", "two", "three", "four", "five", "six"]
start(arr)
```

```
1
```

The `next`

mapping is *i* → *i+1*, and the value of the iterator at any state

`i`

is just `a[i]`

.

```
for i = 1:4
println("next(arr, $i) = ", next(arr, i))
end
```

```
next(arr, 1) = ("one",2)
next(arr, 2) = ("two",3)
next(arr, 3) = ("three",4)
next(arr, 4) = ("four",5)
```

If this were a multidimensional array, say 3×2 instead of 6×1, we’d

get the same result; iteration would just proceed along the rows of the matrix.

The `done`

method returns true when the state is `i =`

. You might think it’d be

length(a) + 1`length(a)`

, but recall the for-equivalent while loop

above. Having `done`

return true at the last index of the array would prevent

the loop from executing on the last element. So in our 6-element array, `done`

is true when the state hits 7.

```
println(done(arr, 6)) # not yet
println(next(arr, 6))
```

```
false
("six",7)
```

```
done(arr, 7)
```

```
true
```

### Ranges

Ranges have states that looks similar to arrays, except they start at zero.

```
rng = 11:20 # length 10 range
start(rng) # 0
```

```
0
```

But the relationship between the current and next state is the same: *i* → *i+1*.

```
for i = [0, 1, 9, 10]
println("next(rng, $i) = ", next(rng, i))
end
```

```
next(rng, 0) = (11,1)
next(rng, 1) = (12,2)
next(rng, 9) = (20,10)
next(rng, 10) = (21,11)
```

Since we start at zero, the done state is one less than the equivalent array.

```
done(rng, 10)
```

```
true
```

### Unordered collections: Dicts, Sets, etc.

Arrays and ranges have a natural order, so the evolution of state is

straightforward. But what about collections such as dictionaries and sets that have no inherent

order? Like in many languages, such things can be iterated over, but the order

of iteration is not easily predictable.

For example, here’s a dictionary:

```
dictit = {:one => 1, :three => 3, :five => 5, :five => "five!"}
```

```
{:one=>1,:three=>3,:five=>"five!"}
```

The starting state isn’t 0 or 1, as would be natural for an ordered collection.

```
s0 = start(dictit)
```

```
3
```

And while `next`

maps state *i* to state *j*, the relationship between *i* and *j*

is not obvious. Here, while the first state is 3, the second is 11, and the rest

are similarly weird.

```
_, s1 = next(dictit, s0)
```

```
((:one,1),11)
```

```
_, s2 = next(dictit, s1)
```

```
((:three,3),13)
```

```
_, s3 = next(dictit, s2)
```

```
((:five,"five!"),17)
```

```
done(dictit, s3)
```

```
true
```

The states, you probably and correctly suspect, are tied to the internal

implementation of the dictionary, e.g. how the keys are hashed. So the state

doesn’t follow a predictable 1, 2, 3, … order, and what order of elements we

get when iterating is essentially unpredictable.

But because for loops handle the iterator’s states for us, we rarely if ever have to worry about

the representation of an iterator’s state. The for loop implicitly calls the `start`

,

`done`

, and `next`

methods, which do all this bookkeeping for us.

## Iterators and Delayed Evaluation

While many iterators are collections of data in memory, like Arrays, Dicts, or

Sets, iterators can also represent abstract collections that aren’t held in memory.

Range is a good example. When we iterate over the range `1:10`

, we get the

sequence 1, 2, 3, …, 10. But in memory, this range is comprised of only two

integers, 1 and 10. The values in between are only evaluated when we’re looping over it.

From https://github.com/JuliaLang/julia/blob/master/base/range.jl, here’s how a

Range’s iterator protocol is defined:

```
start(r::Ranges) = 0
next{T}(r::Range{T}, i) = (oftype(T, r.start + i*step(r)), i+1)
next{T}(r::Range1{T}, i) = (oftype(T, r.start + i), i+1)
done(r::Ranges, i) = (length(r) <= i)
```

Notice that the `next`

method calculates the value of the iterator in state

`i`

. This is different from an Array iterator, which just reads the element

`a[i]`

from memory.

Iterators that exploit delayed evaluation like this can have important performance

benefits. If we want to iterate over the integers 1 to 10,000, iterating over an

Array means we have to allocate about 80MB to hold it. A Range only requires

16 bytes; the same size as the range 1 to 100,000 or 1 to 100,000,000.

### Application: Iterating over Fibonacci numbers

Here’s another example of an iterator which computes values on demand, using the

`next`

method to do the work. `fibit(n)`

is an iterator over the first `n`

Fibonacci numbers. When the iterator is constructed, it doesn’t calculate all of

these numbers. Instead it waits for its `next`

method to be called, providing

the next Fibonacci number depending on the current one.

```
# Iterator produces the first n Fibonacci numbers
immutable FibIt{T<:Integer}
last2::(T, T)
n::Integer
end
fibit(n::Integer) = FibIt((0, 1), n)
# Specify types, e.g. BigInt to prevent overflow.
fibit(n::Integer, T::Type) = FibIt{T}((0, 1), n)
Base.start(fi::FibIt) = 1
function Base.next(fi::FibIt, state)
if state == 1
return (1, 2)
else
fi.last2 = fi.last2[2], sum(fi.last2)
(fi.last2[2], state + 1)
end
end
Base.done(fi::FibIt, state) = state > fi.n
```

```
for i = fibit(10)
print(i, " ")
end
```

```
1 1 2 3 5 8 13 21 34 55
```

### Tasks/Co-routines

This talk of iterators with delayed evaluation may remind Pythonistas of

generators. And Julia has a type that is basically equivalent to Python’s

generators, called Tasks. A Task is constructed by calling the `Task()`

constructor (or

`@task`

macro) on a function with a `produce`

statement, which issimilar to Python’s

`yield`

.

Instead of using the `Fibit`

type above, we could get an equivalent iterator by

defining a Task that produces sequential Fibonacci numbers.

```
function fibtask(n::Integer, T::Type)
a, b = (zero(T), one(T))
produce(1)
function _it()
for i = 1:n
produce(b)
a, b = (b, a+b)
end
end
Task(_it)
end
fibtask(n::Integer) = fibtask(n, Int)
```

Once we’ve made the task, we get iteration for free.

```
for i in fibtask(10)
print(i, " ")
end
```

```
1 1 2 3 5 8 13 21 34 55
```

Whether you create an iterator using a type with the iterator protocol, or by

constructing a Task, is up to you. There are pros and cons to each approach. By

defining your iterator as a specific type, you can dispatch lots of other

functions on it. Here, on the other hand, `fibtask`

is just a `Task`

type, so

defining methods for it means defining methods for all Tasks, which may be

undesirable or infeasible. On the other hand Tasks give you iterators with less

code. Below I’ll show an example of an iterator that’s hard to define with the

iterator protocol methods, but easy to define as a Task. And of course, Tasks

are coroutines, and can be used in those contexts.

## Realizing Iterators without loops

So far, we’ve talked about iterators in the context of for loops. We saw that

`for i = I`

was a construct for calling `I`

‘s `start`

, `done`

and `next`

methods, letting us realize and operate on the values in the iterator.

But there are functions which can take iterators as inputs and implicitly iterate over them

to some desired result. This obviates the need for explicit for loops, and can

make for cleaner more functional code. Some examples follow.

`collect`

and `reduce`

The `collect`

function takes an iterator input, realizes all its values, and

*collects* them into an array.

```
collect(fibit(10))
```

```
10-element Array{Any,1}:
1
1
2
3
5
8
13
21
34
55
```

The `reduce`

function similarly realizes the values of an iterator, but then successively

applies an operator to them to give a scalar result.

```
reduce(+, fibit(10))
```

```
143
```

That reduce operation is equivalent to the `sum`

function called with an

iterator argument.

```
sum(fibit(10))
```

```
143
```

In this next line of code, I compute the sum of the reciprocals of the first

10,000 Fibonacci numbers (which should be close to this), using `collect`

to first put them into an array.

```
sum(1 ./ collect(BigInt, fibit(10_000, BigInt)))
```

```
3.359885666243177553172011302918927179688905133731968486495553815325130318996609
e+00 with 256 bits of precision
```

### Comprehensions

The `collect`

function may remind you of an array comprehension, and it is

similar, but here we see array comprehension don’t work on our iterator:

```
[f for f = fibit(10)]
```

```
no method length(FibIt{Int64})
while loading In[26], in expression starting on line 1
in anonymous at no file
```

What’s going on is that the array comprehension wants to allocate an array,

then fill it in with the values of the iterator. Since it doesn’t know the

iterator’s length (how many values it will produce), it doesn’t know how large

an array to allocate.3 We can fix this for our Fibonacci iterator by

giving it a `length`

method.

```
Base.length(it::FibIt) = it.n
[f for f = fibit(10)]
```

```
10-element Array{Int64,1}:
1
1
2
3
5
8
13
21
34
55
```

Now we can redefine our sum-of-reciprocals using a comprehension instead of `collect`

.

```
sum([1/f for f = fibit(10_000, BigInt)])
```

```
3.359885666243177553172011302918927179688905133731968486495553815325130318996712
e+00 with 256 bits of precision
```

What if we tried this with our Fibonacci task?

```
[f for f = fibtask(10)]
```

```
no method length(Task)
while loading In[27], in expression starting on line 1
in anonymous at no file
```

We get the same issue; Tasks don’t have a length method. The advantage of using

the `FibIt`

type is that we can easily define a length method for it. We can

only give our Fibonacci task a method if we give all Tasks a length method,

which wouldn’t make sense.

## The Iterator Package

When we calculated the sum of the reciprocals of Fibonacci number above, we had

to realize the values of the Fibonacci iterator before taking the

reciprocal, and then sum a collection of all those values. Alternatively, we could have called sum on an

iterator that produced *1/x* for each Fibonacci number *x*.

One way to do this would be to create a new iterator type, called

`ReciprocalFibIt`

, and given it its own `start`

, `next`

, and `done`

methods. But that

feels a little excessive. Wouldn’t it be nicer to be able to construct that iterator from

the Fibonacci iterator we already have? Essentially saying, “hey, I want another

iterator that gives one over the values of that other iterator.”

That would be an example of what I’ll call a *higher-order iterator*, which is

an iterator constructed from one or more other iterators. `zip`

and `enumerable`

are common examples.

It turns out Julia has a neat little package of useful higher-order iterators;

called, obviously, Iterators. In the rest of (this already very long) post, I’ll

explore some of things in the package. Pythonistas will notice similarities with

the itertools module in the Standard Library.

```
using Iterators
```

## Imap

The `Imap`

iterator provides us with our wish above: a new iterator that is the

result of applying a function to the values of an existing iterator. In the case of our

reciprocal Fibonacci numbers, that function is `x -> 1/x`

.

```
recipricalfib = imap(x -> 1/x, fibit(10_000, BigInt)) # A new iterator, composed
# from a FibIt
psi = sum(recipricalfib) # No collect needed
```

```
3.359885666243177553172011302918927179688905133731968486495553815325130318996609
e+00 with 256 bits of precision
```

So `reciprocalfib`

is itself an iterator, whose values are only realized when

it’s passed to the `sum`

function. We didn’t have to allocate any arrays before

calling sum as with the `collect`

and comprehension examples above.

## An IFilter iterator

Since we have a map-like iterator, why not a filter?4 How would it work? Given an

iterator that produces values *v1*, *v2*, *v3*, …, the filter iterator would

only produce the values that met some predicate, skipping any that didn’t.

This isn’t implemented in the Iterators package (because `Base.filter`

will

already do this, see footnote 4). It’s a neat idea, but it turns

out to be tricky to define in terms of the iterator protocol. It’s easy with a

Task, though.

```
function ifilter(f::Function, itr)
function _it()
for i = itr
if f(i)
produce(i)
end
end
end
Task(_it)
end
```

### Application: A list of primes whose digits sum to a prime

Here’s an example of it in action. We’ll begin with a Range iterator from 1 to

1,000. I want to list all of numbers in that range that are (1) prime and

(2) have digits that sum to a prime.

So `ifilter`

takes the predicate test and the original iterator, then produces only

those values from the original iterator that pass the test. Turns out there are

89 such primes between 1 and 1,000.

```
function funnyprimetest(n::Integer)
sumdigits = sum([parseint(string(c)) for c in string(n)])
isprime(n) & isprime(sumdigits)
end
collect(ifilter(funnyprimetest, 1:1000))
```

```
89-element Array{Any,1}:
2
3
5
7
11
23
29
41
43
47
61
67
83
⋮
829
863
881
883
887
911
919
937
953
971
977
991
```

## Repeat and RepeatForever

Another surprisingly useful iterator is `Repeat`

, which simply produces an object

some number of times. Here the iterator is just the string “echo!” five times.

```
for i = repeated("echo!", 5)
println(i)
end
```

```
echo!
echo!
echo!
echo!
echo!
```

If we didn’t provide the second argument, the result would be an iterator that

goes on infinitely, so its for loop would never terminate. Why would you want

that? I’ll show some examples of its use below.

### Extension: Repeating impure functions

One thing about the `Repeat`

iterator though, is that the object or value it

repeats is fixed at its construction. If you pass it a called function, it will

call that function once in the constructor, and then repeatedly return the

result of that first call. For pure functions, that’s fine. The first call of

`sqrt(100)`

is the same as the second, third, or ten-thousandth call of

`sqrt(100)`

.

If the function is impure, though, we’ll get undesired results.

```
for i = repeated(rand(), 10) println(i) end
```

```
0.30142748588653046
0.30142748588653046
0.30142748588653046
0.30142748588653046
0.30142748588653046
0.30142748588653046
0.30142748588653046
0.30142748588653046
0.30142748588653046
0.30142748588653046
```

Here, the `rand`

function was called once in the constructor, and that result was repeated again

and again. I’d prefer if I could get 10 separate calls to `rand`

. Here’s one way

to get this to work.

```
Base.next(it::Iterators.Repeat{Function}, state) = it.x(), state - 1
Base.next(it::Iterators.RepeatForever{Function}, state) = it.x(), nothing
# Note the function isn't called in the constructor;
# the `next` function does this.
for i = repeated(rand, 10) println(i) end
```

```
0.6621100826024566
0.755346320113107
0.021395943367805037
0.7304018818932669
0.22941680891855865
0.966762896262876
0.13729437119070198
0.028788242666101915
0.5584434146272579
0.09166900954689794
```

What I’ve done is create new `next`

methods for the `Repeat`

and `RepeatForever`

iterators. When the object of the iterators is a function, the `next`

methods

call the function. By passing the iterator an uncalled function object, I avoid the call

in the constructor, and defer it to the `next`

method.

## Take and Drop

The `Take`

iterator only iterates over some specified first values of its input

iterator. It works well in combination with infinite iterators, like `RepeatForever`

```
randsforever = repeated(rand)
[r for r = take(randsforever, 10)]
```

```
10-element Array{Any,1}:
0.719153
0.660597
0.280763
0.54125
0.427029
0.919311
0.165029
0.796911
0.354417
0.678271
```

```
[r for r = take(randsforever, 20)]
```

```
20-element Array{Any,1}:
0.568741
0.614644
0.49445
0.0942616
0.518134
0.126585
0.961748
0.698277
0.0805089
0.32351
0.797422
0.513762
0.601515
0.616174
0.460832
0.813204
0.172391
0.444915
0.732941
0.0550762
```

The `Drop`

iterator, on the other hand, *ignores* some specified first values of its

input iterator. So, how many values should be printed in this for loop?

```
for i = drop(take(randsforever, 10_000), 9998)
println(i)
end
```

Answer: just the last two, since we take 10,000 random numbers, but drop the first 9,998.

```
0.26830900957427684
0.5141969888172926
```

### Extension: TakeWhile and TakeUntil

In some cases you may not want to take a fixed number of values from an

iterator, but instead you want to take values until some condition is met.

To accomplish this, I’ll create a `TakeWhile`

iterator, which takes values from

its input iterator so long as they pass some test.

```
immutable TakeWhile{I}
xs::I
cond::Function
end
takewhile(xs, cond) = TakeWhile(xs, cond)
Base.start(it::TakeWhile) = start(it.xs)
Base.next(it::TakeWhile, state) = next(it.xs, state)
function Base.done(it::TakeWhile, state)
i, _ = next(it, state)
!it.cond(i) || done(it.xs, state)
end
tw = takewhile(1:10, x -> x^2 < 25)
collect(tw)
```

```
4-element Array{Int64,1}:
1
2
3
4
```

Let’s also create a `TakeUntil`

iterator, which takes values until it finds one that

passes the test. So the last value produced by this iterator will pass the test

and all values before that will have failed.

```
immutable TakeUntil{I}
xs::I
cond::Function
end
takeuntil(xs, cond) = TakeUntil(xs, cond)
Base.start(it::TakeUntil) = start(it.xs), false
function Base.next(it::TakeUntil, state)
i, s = next(it.xs, state[1])
i, (s, it.cond(i))
end
function Base.done(it::TakeUntil, state)
s, iscond = state
iscond || done(it.xs, s)
end
```

```
collect(takeuntil(1:10, x -> x*x >= 25)) # x <= sqrt(25) -> 1:5
```

```
5-element Array{Any,1}:
1
2
3
4
5
```

### Application: How long does it take a Poisson process to produce a prime number?

As an application of the `TakeUntil`

iterator, an experiment. How many draws do

we have to make from a Poisson process until we draw a prime number? For this

example, I’ll use a Poisson with mean 5,000.

In the code, we make a `Repeat`

iterator that repeatedly draws from the

Poisson. We pass this into `takeuntil`

and this creates an iterator that draws

from the Poisson until we find a prime number. While this is happening, we keep track of the

number of steps we took through this iterator.

```
# Draw random integers from a distibrution d until you get a prime number.
# Return the number of draws.
function primetime(d, dparams)
randgen = () -> rand(d(dparams...))
tu = takeuntil(repeated(randgen), isprime)
time = 0
for i = tu
time += 1
end
time
end
primetime_poiss5k = () -> primetime(Poisson, 5000)
```

What’s the average wait for a prime? Repeating the experiment 10,000 times, we

find the average number of draws is between 7 and 8.

```
mean(repeated(primetime_poiss5k, 10_000))
```

```
7.6783
```

To see the distribution of waiting times, I’ll collect each repetition of the

experiment in an array that we can plot.

```
times = [t::Int for t = repeated(primetime_poiss5k, 10_000)]
```

## Partition

The `Partition`

iterator split its input iterator into pieces, producing an

iterator over iterators. For example we could use it to partition the Range

iterator `1:100`

into two iterators, `1:50`

and `51:100`

. We can also make

overlapping partitions, for example, `1:50`

, `2:51`

, `3:52`

, etc.

### Application: Moving average

One useful application of overlapping partitions is computing moving

averages. The following code imports Google’s historical stock price from Yahoo

Finance and computes its 60-day moving average.

First, we download the data, creating a 2-D array containing dates, volumes, and prices.

```
const googdata =
"http://ichart.finance.yahoo.com/table.csv?s=GOOG&d=0&e=7&f=2014&g=d&a=0&b=7&c=2013&ignore=.csv" |>
download |>
open |>
readall |>
s -> split(s, "\n") |>
a -> map(l -> split(l, ","), a) |>
a -> filter(l -> contains(l[1], "201"), a) |>
reverse # Dates start at most recent, so reverse for chron order.
```

We then create iterators over the dates and closing prices in the Array. These

iteratively extract and parse values from the relevant columns.

```
dates = imap(r -> date(r[1]), googdata)
close = imap(r -> parsefloat(r[7]), googdata)
```

Now we can make 60-day sub-period partitions and compute the average of

each. Since I’m using `imap`

nothing has been calculated yet. These are all just

iterators promising to do work when called.

```
ma60 = imap(mean, partition(close, 60, 1))
# NB: The Julian way to do this would be
# [mean(price[i-59:i]) for i = 60:length(price)]
```

With all these useful iterators defined, I can just collect them into arrays for

plotting.

```
plot(layer(x=collect(dates), y=collect(close), Geom.line),
layer(x=collect(dates)[60:end], y = collect(ma60), Geom.line),
Guide.ylabel("Price"),
Guide.title("GOOG Daily Stock Price 60-Day Moving Avg."))
```

## Groupby

While the `Partition`

iterator makes partitions of specified lengths, the

`Groupby`

iterator splits an iterator based on some condition. One caveat is

that the input iterator has to be sorted in some way on the groupby condition,

so that values with the same condition are adjacent in the iterator.

### Application: Do Labor Force figures follow Benford’s Law?

In this example, I’m going to look at Benford’s Law using the `Groupby`

iterator. Benford’s Law posits that the leading digits of numbers in many

data sources follows a regular distribution. I’ll use the `Groupby`

iterator to

group the data by first digit and check this.

The data I’ll examine is the size

of the labor force population in each U.S. county in 2012.

```
const lfdata = "http://www.bls.gov/lau/laucnty12.txt" |>
download |>
open |>
readall |>
s -> split(s, "\r\n") |>
a -> filter(x -> length(x) == 125, a) |> # Rows with data
a -> map(x -> strip(x[79:92]), a) |> # Column w/ LF data
a -> map(x -> replace(x, ",", ""), a) |> # 1,000 -> 1000
x -> x[2:end] |> # Remove header
sort
```

The analysis is simple with a `Groupby`

iterator. It splits up the data by

leading digit, and then I just calculate the frequency of each leading digit in

the data by taking the length of each leading-digit group as a share of

the total length of the data.

```
dgroups = groupby(lfdata, s -> s[1]) # Groups by first digit
# Extract the digit from the group members
digits = imap(i -> parseint(string(i[1][1])), dgroups)
# Compute the frequency
frequency = imap(x -> length(x) / length(lfdata), dgroups)
```

Benford’s Law posits that the frequency of digit *d* in data should be

log_{10}(*d+1*) – log_{10}(*d*). This function prints out a

table of the observed frequencies next to the expected frequencies per Benford’s Law.

```
benfordcheck = function(obs_freqs, digits)
pred_freqs = map(d -> log10(d+1) - log10(d), digits)
println("")
println("Digit Frequency Compared to Benford's Law")
println("=========================================")
println("")
println("Digit Observed Expected Difference");
for (d, o, e) in zip(digits, obs_freqs, pred_freqs)
@printf( "%5d %9.2f %9.2f %11.2f\n", d, 100*o, 100*e, 100*(o-e))
end
end
```

We can see the labor force data follows Benford’s Law quite closely.

```
benfordcheck(frequency, digits)
```

```
Digit Frequency Compared to Benford's Law
=========================================
Digit Observed Expected Difference
1 30.09 30.10 -0.01
2 16.46 17.61 -1.15
3 12.02 12.49 -0.48
4 9.72 9.69 0.03
5 8.29 7.92 0.37
6 6.30 6.69 -0.39
7 6.02 5.80 0.23
8 5.84 5.12 0.72
9 5.25 4.58 0.67
```

To plot the comparison, I can collect the values from our iterators into a

DataFrame.

```
benford_df = DataFrame(# Extract the digit
digits = collect(digits),
observed = collect(frequency),
expected = map(d -> log10(d+1) - log10(d), digits))
```

## Iterate

Though its name might be a little confusing, the `Iterate`

iterator is one of my

favorites. It recursively applies a function to a starting value, that is

`f(...f(f(f(x)))...)`

. I come across applications for it all over the place.

### Application: Autoregressive time series processes

One application is producing autoregressive time series processes. An AR(1)

process has the form *x _{t+1} = px_{t} + e_{t+1}*, where

*e*is some random noise. If

We define the function

*f(x) = px + e*, then

*x*as a function

_{t+2}of

*x*is

_{t}*f(f(x*. Subsequent values can be similarly

_{t}))produced by iteratively applying the function.

First the code for the AR(1) function itself, along with a helper function for

plotting a realization of the process.

```
function ar(phi::Float64, sigma::Float64)
x -> phi * x + sigma * randn()
end
plotar(arseq, title) = plot(x=1:length(arseq), y=arseq, Geom.line,
Guide.xlabel("Time"), Guide.ylabel(""),
Guide.title(title))
```

Defining a coefficient and a standard deviation for the random variable, I pass

them through a process that creates an iterator that recursively applies the

function, starting with a randomly-drawn initial value. Then I collect 250 values of

that iterator and plot them.

```
const ar1coef = 0.9
const ar1sigma = 0.15
(ar1coef, ar1sigma) |>
x -> apply(ar, x) |>
f -> iterate(f, ar1sigma*rand()) |>
i -> collect(take(i, 250)) |>
s -> plotar(s, "AR(1) Time Series")
```

This idea can be extended an AR(p) process, where the current value of *x*

depends on several past values. Whereas the coefficient was a scalar in the

AR(1) model, it’s a matrix now, but the formula is otherwise the same.

```
function ar(coeffs::AbstractVector{Float64}, sigma::Float64)
p = length(coeffs)
Phi = [coeffs', eye(p)[1:(end-1),:]]
Sigma = [sigma, zeros(p-1)]
x -> Phi * x + Sigma .* randn(p)
end
```

For an example, here’s 250-periods simulated from an AR(3) model.

```
const ar3coeffs = [.9, -.1, -.25]
const ar3sigma = 0.15
(ar3coeffs, ar3sigma) |>
x -> apply(ar, x) |>
f -> iterate(f, ar3sigma*rand(3)) |>
i -> map(first, take(i, 250)) |>
s -> plotar(s, "AR(3) Time Series")
```

## Conclusion

Most iteration you’ll see in the wild uses simple collections or ranges as the

iterator, performing extensive work inside the loop. Sometimes our problem can be

better expressed using more complicated iterators whose structure represents the

logic of our iteration. One thing to notice in all the examples was that once

the iterators were defined, there was very little to do after iterating over

them. Typically I was just collecting the iteration values into an array, or

reducing them with an operation to a scalar result. We were also able to build

the problems in such a way that calculation of values in the iterators was

delayed until absolutely necessary.

There are tradeoffs to this sort of style, and much of the stuff in this

post was more cute than practical. But it was a fun exploration of how to create types

that interact with protocols in Julia. Julia’s type system and dispatch design

are very powerful and interesting, and gives programmers a lot of flexibility in

expressing their problems.

While

we’ll see lots of examples of extending Julia’s base functions dispatched on

newly-defined types, we won’t see much*multiple dispatch*, which is an

important design feature of Julia. In fact, pretty much everything here could be

implemented in a single-dispatch OO language.

Pythonistas

may be thinking about the distinction between*iterators*and*iterables*. (See,

e.g. this Stack Overflow thread.) That distinction doesn’t really apply to

Julia. So I won’t use the term iterable here, and I’ll define an iterator in the two

ways discussed above: (1) it is valid in a`for i = I`

statement, and (2) it

implements the iterator protocol.

This limitation seems

to come from the idea that only iterators with known lengths can be counted on

to produce multidimensional arrays. This may be changed in future versions of

Julia. See, e.g. Issue #550. The`collect`

function uses the

`push!`

function to dynamically allocate the array, but`collect`

can only give

a 1-D Array output, whereas comprehensions can be multidimensional.

Actually, Julia’s`filter`

function already does this. If you pass

that function a predicate or condition function and an iterator, it produce a

`Filter`

object that you can then iterate over. This is different from

`map`

which can take an input iterator, but returns the result of

mapping the function immediately.