Author Archives: OpenSourcES

Improving on the current Santa Kaggle Challenge: MIP and swapping

This is the second post on this years Santa Kaggle Challenge.

Haven’t read the first? -> Here you go!.

I wanted to blog more about it without sharing too many details as people might get mad, understandably. Unfortunately I fell back into competing mode and not into blogging mode. Sorry about that.

Feeling especially bad for my patrons as I also didn’t continue with my constraint solver posts. The gamification of Kaggle is too strong for me… Therefore I decided to not charge my patrons this month. My Christmas gift 🙂

In this post I want to share what have changed since last time by summarizing some ideas that came up in the forums and a bit about my own approach but I’ll not share code for that. Hopefully I will find the time to share all my code in a more structured way after the challenge. It would also be nice to know if someone wants to chat after the challenge and tell me about their ideas. That would be wonderful!

If you find any of this information useful and want to read more please comment and if you’re generally interested in optimization please check out other posts I wrote:

I don’t wanna beg for your money but if you consider to buy a second yacht you might give me some of your dollars 😉
You can donate via Patreon or PayPal.

Okay let’s start

I said last time that I want to take about linear programming for a bit.

A small recap on the problem: We want to match 5,000 families to 100 days and we want to minimize two types of costs. The preference costs which depend on how happy we can make the families (before giving them money) and the accounting costs which are an indication of how smooth the change between days is considering the number of people per day.

In mathematical terms:
Preference costs:

\[
\sum_{f \in F} \sum_{d \in D} \text{prefcost}_{f,d} x_{f,d}\]

Here prefcost depends on the preferences of each family and is a matrix of size \(5,000 \times 100\) and \(x_{f,d}\) describes the binary decision variable whether family \(f\) visits the workshop of Santa on day \(d\).

The accounting costs:

\[
\sum_{d=100}^{1} \frac{\left(N_{d}-125\right)}{400} N_{d}^{\left(\frac{1}{2}+\frac{\left|N_{d}-N_{d+1}\right|}{50}\right)}\]

Here \(N_d\) is the number of people assigned to day \(d\).
The preference cost is a linear function and is relatively easy. In that formulation it need still a lot of decision variables but okay.

Before we start with the linear program we might wanna have a short look at the accounting cost. Can it get huge? Like do we need to care about it? Or when is it small? Is there a way to linearize it?

Log accounting costs

Easy to visualize with Julia:

using Plots
acc(N,N1) = ((N-125)/400)*N^(0.5+abs(N-N1)/50)
acc_mat = zeros(176,176)
for i=125:300, j=125:300
    acc_mat[i-124,j-124] =...

First approach for the Kaggle Santa 2019 challenge

Okay here it is: I’m waiting for it like almost a year…
The new Santa Kaggle Challenge.

This year is a special year for me in that challenge. Normally I try to compete with all my time I have and get the best possible score I can achieve and maybe blog about it when it is all over.

But this year I blog about it while competing. I think it is more interesting for people to read the results in between instead of just the polished end result. I know that everyone can copy it now and I’ll not win this challenge but I never did anyway 😀

Maybe I can get a discussion medal with it…

Before we start I also want to thank my patrons again because it wouldn’t be that awesome without them. Normally you get posts 2 days earlier when you support me but this is against the Kaggle rules so everyone gets it as soon as I’ve finished writing.

Let’s start:

The problem

Some families are allowed to visit the workshop of Santa. Amazing news for 5,000 families. Good for them I would say 😉 His workshop is probably quite big but to be not too crowded per day only 300 people can visit the workshop and it isn’t reasonable with less than 125. Santa decided to open the workshop for 100 days. The families are super excited to see all the stuff but some days fit better than the others so they are ranked and sent to Santa in advance.

Basically the problem is: How to match each family to a day?

Of course Santa is Santa and therefore wants to be fair so if a family doesn’t get their best choice they should have something for it but Santa can’t produce more stuff so this time it’s just cash, food and a helicopter ride. Yes you heard that right: A north pole helicopter ride. Oh man that sounds … Okay got distracted here.

Now you can see that this costs money for Santa and you know the world we live in: He wants to minimize his expenses.

Additionally there is some non-linear cost which is called accounting penalty and it’s a complicated formula but I trust the Santa’s accountants so should be correct 😉 (Now there was actually a bug on the evaluation page before).

\[
\text{accounting penalty } =\sum_{d=100}^{1} \frac{\left(N_{d}-125\right)}{400} N_{d}^{\left(\frac{1}{2}+\frac{|N_{d}-N_{d+1}|}{50}\right)}\]

Where \(d\) is the day before Christmas so we are counting towards X-mas here. \(N_d\) is the number of people attending on that day and because we are counting down \(N_{d+1}\) is number of people attending the day before.

Now let’s have a deeper look into the other costs which are described here:

If the family gets their first choice for the workshop tour Santa doesn’t have to pay anything. Otherwise we have the preference costs:

  • 2nd choice: 50$
  • 3rd choice: 50$ + 9$ per family member
  • 4th choice: 100$ + 9$ per family…

Finding the maximum cardinality matching in a bipartite graph

This is kind of part 14 of the series about: How to build a Constraint Programming solver?
But it is also interesting for you if you don’t care about constraint programming.
In this post you’ll learn about bipartite graphs and matchings.

Anyway these are the other posts so far in the ongoing series:

All of that can only happen because of the generous support of some people.
I’m very happy that I can cover my server expenses now because of my Patrons.
To keep this going please consider supporting this blog on Patreon. If you do I will create a better/faster blog experience soon and you will get posts like this earlier!

If you want to just keep reading -> Here you go 😉

Last time I realized that the library I’m using doesn’t implement the easy algorithm I need which is bipartite maximum cardinality matching instead of bipartite maximum matching. This means they are solving a harder problem which I don’t have 😀

Additionally I like coding things from scratch so using a library is not the best choice but I am of course also limited in the amount of time I have and if people built it before it is often a good practice to not reinvent the wheel. Nevertheless this is a learning project where it is often good to know how a wheel works from the ground up.

The problem

We want to know whether a setting of numbers and possibilities in an all_different constraint is satisfiable or not.

For example we have 3 cells we want to fill with the numbers 1-3 and each should be used exactly once.

In the following examples our values are always at the top and the variables at the bottom. The goal is to match our variables to the values such that the all different constraint is satisfied. In this example there is no such matching:

Bipartite maximum cardinality smaller than #values

In a more precise definition a matching is a set of edges such that two edges don’t share the same vertex: Each vertex has at most one matching edge connected to it.

We are always interested in a matching which uses the most edges (maximum cardinality matching).

This is for example a matching (blue edges):

Small matching

We need this maximum cardinality matching to check whether it is possible to fulfill the constraint and later to also remove edges which would make it impossible when used (explained in the all different section).

How can we find those matchings?

Let’s have a look at a slightly bigger example:

Bipartite graph

I should mention that these graphs are called bipartite graphs as they have two…