Kaggle Santa 2019: Final wrap up

Well guys and girls I’m happy to announce that I’m probably the last person telling you something about the last Kaggle Santa challenge. The reason is I’m late like super late but hopefully some of you are still interested in my thoughts. The reason I write this blog now: I haven’t found the optimal solution during the challenge and I wanted to find it before I write this and after I found it I got sick so here we are.

If you’re interested in building a constraint solver which is my current project on this blog you might want to check out the dozen of posts about it and consider subscribing on patreon to get post 48 hours earlier (This one is a special one ;))

Special limited offer only for X days… I either just lost your attention or I have it now 😀
I decided to offer a 1:1 (binary) talk where you can ask me about my work and just chat with me if you’re willing to join the Patron 4$ club. 😉 This offer exists until the 10th of February.

My final steps during the competition

Before I go over the optimal solution which I only found because I read what other teams did after the competition ended I want to go briefly over the things I didn’t talk about in my last two posts about the challenge:

There is not much more to add to the swapping part from the last post besides some things I encountered. In the following a,b,c are family ids and A,B,C are the days they currently visit the workshop.

  • Swaps of the form a->B, b->C, c->A didn’t work as well as a->B, b->C, c->D
    • this is normal as the last c->A normally destroys it
  • How did I choose a,b,c? + d,e,… for longer swap patterns?
    • a is chosen from randomly from the 5000 families
    • b is chosen by choosing B as a day of the choices from a
    • and then b is chosen randomly
    • this continues til the end
    • I reject the current set if I chosen the same family more than once
  • For performance reasons:
    • I precompute the days and choices for each family and update the assigned day after I make a change. (Probably everyone does that)
    • Only compute the resulting change in costs
    • Easy for preference cost
    • Bit harder for preference costs
    • => In the end I had about 1 million swaps per second on my machine.
  • In general I experimented a lot with the temperature in simulated annealing and with falling back to previous solutions when stuck which were quite helpful but I explained the interesting stuff in the last post.

Let’s move from swapping to the MIP approach. Last time I mentioned a way to find the optimal arrangement if we know how many people visit the workshop per day.

Which we normally don’t know 😀 Therefore that was quite pointless.
We want…