# Lazy Sequences in Julia

I wanted to talk about using Coroutines for lazy sequences in julia.
Because I am rewriting CorpusLoaders.jl to do so in a nondeprecated way.

This basically corresponds to C# and Python’s yield return statements.
(Many other languages also have this but I think they are the most well known).

The goal of using lazy sequences is to be able to iterate though something,
without having to load it all into memory.
Since you are only going to be processing it a single element at a time.
Potentially for some kind of moving average, or for acausal language modelling,
a single window of elements at a time.
Point is, at no point do I ever want to load all 20Gb of wikipedia into my program,
nor all 100Gb of Amazon product reviews.

And I especially do not want to load $\infty$ bytes of every prime number.

First some definitions.

#### Iterators

An iterable is in essence something can processed sequentially.
The iterator is the tool uses to do this.
Sometimes the terms are uses casually and interchangably.
They exact implementation varies from language to language a surprising amount.
Though the core idea remains the same.

In julia they are define by (perhaps the most formal of all) informal interfaces.
In short iterators in julia are defined by a start function which takes an iterable,
and gives back the initial state for an iterator upon it.
a next function that takes an iterable and state and gives back the next value, and a new state,
and a done function that takes the same and returns a boolean sayign if there iteratator is complete.

They also have Holy traits to define what eltype they return and how long they are.
Not going to go in to this here, but I will note that specifying the elype is a feature channels have that the 0.5 producer based code does not.
and that 0.6 generators don’t have either right now.

Honestly if you are not familar with iterators the rest of this probably isn’t going to make much sense.

#### Coroutines.

Coroutines are the generalisation of Functions (subroutines),
to have have multiple entry and exit points.
In julia they are called Task.
They are most notable for being used in the implementation of @async etc.
For our purposes,
they have the ability to yield up control to another coroutine;
and when they get control passed back to them, they resume where they left off.

Its worth noting that they are not threads.
They do not themselves ever cause more than 1 thing to happen at a time on a computer.
Though there are similarities in how they operate.

Coroutines can be uses to implement iterators as we will show below

## Examples

I think this is best explained by examples.
So here are 3.

Input:

## Fibonacci sequence

We will start, just to explain the concepts with the fibonacci sequence.
We are not going to touch the slow recussive definition (recall that julia does not have tail call recursion).
That is not the point we want to illustrate here.

### Fibonacci Array

To begin with an array based methods.
This of course does not actually satisfy our goal of being an infinite sequence.
You must tell it how many to generate in advance.
As we will see later it is absurdly fast compaired to any of the lazy methods.
It does however use the full amount of RAM it ever needs all the time.
Where as the lazy methods that follow only need at any point a fixed amount of RAM.
This doesn’t matter for Fibonacci, as it doesn’t use much memory.
It would however matter if you were loading hundreds of gigabytes of machine learning data.

As a side note fibonacci numbers very quickly overflow an Int64 long before it can get to the 10_000. but that is ok for out demonstration purposes

Input:

Output:

fib_array (generic function with 1 method)


Input:

Output:

Tasks in 0.5 used to have a kind of built in functionality similar to Channel/put!.
(They still kind of do at a lower level, but it is no longer exposed as an iterable)
This was depreated in 0.6 at #19841,
and is not in 0.7-dev.
Here for reference is what it would look like.

Input:

Output:

fib_task (generic function with 1 method)


It is actually easier to explain than the newer Channels way so I will explain it first.
In the main task there is (will be) an iterator.
When next is called on that iterator,
the main task calls consume, which yields control to a Task which is running the code in the do block.
When that code hits a produce it yields control back to the main iterator Task, returning the value from produce to the consume call, which results n next returning that value.
When next is called again in the main task, it will again call consume which will result in the control being yielded back to the generating task, which will continue on from where it left, immediately after the produce call.
This is how coroutines are used for generating data.
As they can pause midway through a function return a result and contine after when asked to.

I’m not displaying its timings here as it basically shoots a giant pile of deprecation warnings.
It is about 5 seconds, though some of that slowness might be attributed to the deprecation warnings slowing things down.

### Fibonacci Channel

This is what we are really talking about.
From a pure logic standpoint on can think of this as functioning just like the task.
Where put! triggers a yeild to the main iterator task, and take! (inside next) triggers a yield back down the the generating task.
What is actually happening is similar, but with a buffer involved.

The function passed to a channel runs its own Task until it tries to do a put! when the buffer is full.
When that happens it yields control back to the main Task, and “sleeps” the generating task.
Which durng iteration will take! an element off the buffer,
which will rewake the Task that wants to put things on the buffer (it @scheduals it to act).
But it will not nesciarily get it to run straight away,
the main (iterator) Task does not yield until the buffer is empty and it can’t do a take!.
(though something else, like IO might triger a yield, but not in this code).

Input:

Output:

fib_ch (generic function with 2 methods)


Input:

Output:

It took me a little time to get my head around what the buffer was (because Task producer didn’t use them).
It retrospect is is obvious.
It is a buffer.
The point of the buffer from a practical point of view is that it means that there is less task switching.
Modern CPU speeds work because the CPU is very efficient at running predictable code.
Everything stays in cache, pipelining occurs, maybe even SIMD.
So the buffer lets that be avoided, while still avoiding doing all the (potentially infinite) amount of work that would be done in the array case.

Too large and you end up calculating more outputs than you will consume,
and use a lot of memory at any point in time.

I initially made the mistake of setting the buffer size to typemax(Int64),
which resulted in the code hanging, until I did an interupt with Ctrl + C.
Which resulted it in completing normally with no errors.
(because it killed the task, midway through filling the huge buffer, well after the points I needed were enqueued).

For interest below is a plot of the timings with different buffer sizes.

Input:

Output:

10^0

10^1

10^2

10^3

10^4

5

10

15

20

25

30

Effect of varying channel buffer size for fib_ch

Buffer Size

time (ms)

y1

### Fibonacci ResumableFunction

So channels are pretty great.
Easy to write and nice.

One problem is, as you can see compared to arrays they are very slow.
Especially if you have a buffer that is too small or too large.
They also allocate (and then free) a lot of memory.

Enter ResumableFunctions.jl.

ResumableFunctions lets you keep the same “coroutines pausing after every output” logical model.
But in implementation, it is actually using macros to rewrite your code into a normal iterator,
with a state machine (in the state) tracking where it is upto.
The result of this is (normally) faster, more memory efficient code.

It uses @yield instead of put! (or produce),
and the whole function has to be wrapped in a @resumable macro, which does the rewrite.

Input:

Output:

fib_rf (generic function with 1 method)


Input:

Output:

For interest this is what the code it is generating looks like:

Input:

Output:

quote
begin
mutable struct ##835 <: ResumableFunctions.FiniteStateMachineIterator
_state::UInt8
prev::Int64
cur::Int64
function ##835(; )::##835
fsmi = new()
fsmi._state = 0x00
fsmi
end
end
end
function (_fsmi::##835)(_arg::Any=nothing; )::Any
_fsmi._state == 0x00 && $(Expr(:symbolicgoto, Symbol("#367#_STATE_0"))) _fsmi._state == 0x01 &&$(Expr(:symbolicgoto, Symbol("#366#_STATE_1")))
error("@resumable function has stopped!")
$(Expr(:symboliclabel, Symbol("#367#_STATE_0"))) _fsmi._state = 0xff _arg isa Exception && throw(_arg) _fsmi.prev = 0 _fsmi.cur = 1 while true _fsmi._state = 0x01 return _fsmi.cur$(Expr(:symboliclabel, Symbol("#366#_STATE_1")))
_fsmi._state = 0xff
_arg isa Exception && throw(_arg)
(_fsmi.prev, _fsmi.cur) = (_fsmi.cur, _fsmi.prev + _fsmi.cur)
end
end
function fib_rf(; )::##835
##835()
end
end


Its pretty neat.

### Primes Channel

Input:

Output:

primes_ch (generic function with 1 method)


Input:

Output:

Input:

Output:

### Primes Resumable Function

Input:

Output:

primes_rf (generic function with 1 method)


Input:

Output:

Interestingly, the channel version is 2x as fast as the ResumableFunctions version.
I’m not sure why that is.
Could be cache related.

## Conclusions

These coroutine based sequence generators are pretty great.
They are much more flexible than generator expressions.
They are much less annoying to write than custom iterators.
They let you do things lazily to avoid using all your RAM.

They’ll probably get faster in future version of julia.
ResumableFunctions.jl is a neat package to keep an eye on.