Tag Archives: julia,optimization,kaggle

Improving on the current Santa Kaggle Challenge: MIP and swapping

This is the second post on this years Santa Kaggle Challenge.

Haven’t read the first? -> Here you go!.

I wanted to blog more about it without sharing too many details as people might get mad, understandably. Unfortunately I fell back into competing mode and not into blogging mode. Sorry about that.

Feeling especially bad for my patrons as I also didn’t continue with my constraint solver posts. The gamification of Kaggle is too strong for me… Therefore I decided to not charge my patrons this month. My Christmas gift 🙂

In this post I want to share what have changed since last time by summarizing some ideas that came up in the forums and a bit about my own approach but I’ll not share code for that. Hopefully I will find the time to share all my code in a more structured way after the challenge. It would also be nice to know if someone wants to chat after the challenge and tell me about their ideas. That would be wonderful!

If you find any of this information useful and want to read more please comment and if you’re generally interested in optimization please check out other posts I wrote:

I don’t wanna beg for your money but if you consider to buy a second yacht you might give me some of your dollars 😉
You can donate via Patreon or PayPal.

Okay let’s start

I said last time that I want to take about linear programming for a bit.

A small recap on the problem: We want to match 5,000 families to 100 days and we want to minimize two types of costs. The preference costs which depend on how happy we can make the families (before giving them money) and the accounting costs which are an indication of how smooth the change between days is considering the number of people per day.

In mathematical terms:
Preference costs:

\[
\sum_{f \in F} \sum_{d \in D} \text{prefcost}_{f,d} x_{f,d}\]

Here prefcost depends on the preferences of each family and is a matrix of size \(5,000 \times 100\) and \(x_{f,d}\) describes the binary decision variable whether family \(f\) visits the workshop of Santa on day \(d\).

The accounting costs:

\[
\sum_{d=100}^{1} \frac{\left(N_{d}-125\right)}{400} N_{d}^{\left(\frac{1}{2}+\frac{\left|N_{d}-N_{d+1}\right|}{50}\right)}\]

Here \(N_d\) is the number of people assigned to day \(d\).
The preference cost is a linear function and is relatively easy. In that formulation it need still a lot of decision variables but okay.

Before we start with the linear program we might wanna have a short look at the accounting cost. Can it get huge? Like do we need to care about it? Or when is it small? Is there a way to linearize it?

Log accounting costs

Easy to visualize with Julia:

using Plots
acc(N,N1) = ((N-125)/400)*N^(0.5+abs(N-N1)/50)
acc_mat = zeros(176,176)
for i=125:300, j=125:300
    acc_mat[i-124,j-124] =...

First approach for the Kaggle Santa 2019 challenge

Okay here it is: I’m waiting for it like almost a year…
The new Santa Kaggle Challenge.

This year is a special year for me in that challenge. Normally I try to compete with all my time I have and get the best possible score I can achieve and maybe blog about it when it is all over.

But this year I blog about it while competing. I think it is more interesting for people to read the results in between instead of just the polished end result. I know that everyone can copy it now and I’ll not win this challenge but I never did anyway 😀

Maybe I can get a discussion medal with it…

Before we start I also want to thank my patrons again because it wouldn’t be that awesome without them. Normally you get posts 2 days earlier when you support me but this is against the Kaggle rules so everyone gets it as soon as I’ve finished writing.

Let’s start:

The problem

Some families are allowed to visit the workshop of Santa. Amazing news for 5,000 families. Good for them I would say 😉 His workshop is probably quite big but to be not too crowded per day only 300 people can visit the workshop and it isn’t reasonable with less than 125. Santa decided to open the workshop for 100 days. The families are super excited to see all the stuff but some days fit better than the others so they are ranked and sent to Santa in advance.

Basically the problem is: How to match each family to a day?

Of course Santa is Santa and therefore wants to be fair so if a family doesn’t get their best choice they should have something for it but Santa can’t produce more stuff so this time it’s just cash, food and a helicopter ride. Yes you heard that right: A north pole helicopter ride. Oh man that sounds … Okay got distracted here.

Now you can see that this costs money for Santa and you know the world we live in: He wants to minimize his expenses.

Additionally there is some non-linear cost which is called accounting penalty and it’s a complicated formula but I trust the Santa’s accountants so should be correct 😉 (Now there was actually a bug on the evaluation page before).

\[
\text{accounting penalty } =\sum_{d=100}^{1} \frac{\left(N_{d}-125\right)}{400} N_{d}^{\left(\frac{1}{2}+\frac{|N_{d}-N_{d+1}|}{50}\right)}\]

Where \(d\) is the day before Christmas so we are counting towards X-mas here. \(N_d\) is the number of people attending on that day and because we are counting down \(N_{d+1}\) is number of people attending the day before.

Now let’s have a deeper look into the other costs which are described here:

If the family gets their first choice for the workshop tour Santa doesn’t have to pay anything. Otherwise we have the preference costs:

  • 2nd choice: 50$
  • 3rd choice: 50$ + 9$ per family member
  • 4th choice: 100$ + 9$ per family…