The return of the graphs (and an interesting puzzle)

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/06/23/graphspuzzle.html

Introduction

This week I come back to graphs. The reason is that I have participated
in an inspiring Social Networks & Complex Systems Workshop.

During this workshop Paweł Prałat asked the following puzzle:

You are given eight batteries, out of which four are good and four
are depleted, but visually they do not differ.
You have a flashlight that needs two good batteries to work.
You want your flashlight to work. What is the minimum number of times
you need to put two batteries into your flashlight to be sure it works
in the worst case.

In this post I want to discuss how it can be solved.
We will start with an analytical solution and then give a computational
one. You can judge for yourself which is harder.

The post was written under Julia 1.9.1,
Combinatorics v1.0.2, DataFrames v1.5.0,
SimpleGraphs v0.8.4, and SimpleGraphAlgorithms v0.4.21.

Analytical solution

Before we begin we need to make one observation. If I say that I want to use
a given number of trials in the worst case I can provide them upfront (before
making any tests). The reason is that if I put some batteries into a
flashlight it will work (and then we are done) or not work (and we need to continue).
So the situation is simpler than in many other puzzles when you might want
adjust your strategy conditional on the answer you see to your previous queries.

What we will want to show is that we need 7 trials. Start with showing
that 7 tests is enough. To see this notice that we have 4 good batteries.
Therefore if we split our 8 batteries into three groups there will be one
group that has two batteries (by the pigeonhole principle).
What we need to do is to show that using 7
comparisons we can find a pair of good batteries within one of these
three groups.

This is the way how you can do it.
Assume we number the batteries from 1 to 8. Split them into three groups:
{1, 2, 3}, {4, 5, 6}, and {7, 8}. Now we assume that we make all
possible comparisons within groups. We can see that in the first two groups
there are three comparisons possible, and in the last group only one. Seven
in total. This finishes the proof that 7 comparisons are enough.
We can visualize this solution as follows (line represents the comparison
we make):

  1      4    7
 / \    / \   |
2---3  5---6  8

We are left with showing that 6 comparisons are not enough. To see
this note the following. In the plot above we have a graph on 8 nodes
and having 7 edges. We could claim that we have a solution because in
any four element subset of its nodes there existed at least two connected
by an edge (within one group).

So to show that 6 comparisons are not enough we must show that no matter
how we make them there will always be 4 nodes that are mutually not
connected. We will prove that by contradiction. Assume we have some assignment
of edges under which maximally three nodes are not connected. Without loss
of generality take that their numbers are 1, 2, and 3. But this means that each
of the nodes 4, 5, 6, 7, and 8 must be connected to one of the 1, 2, or 3 nodes
by at least one edge. So we must use-up 5 edges to make these connections.
We are left with one edge (recall we assume that we can use 6 edges in total).
If this one last edge is connected to node 1, 2, or 3. Then nodes 4, 5, 6, 7, and 8
are not connected directly and we just found a 5-element set that is not connected by an edge.
So assume that this edge connects one pair of nodes from the set {4, 5, 6, 7, 8}.
However, since we only have one edge we still will have 4 nodes that are not connected.
E.g. if 4 and 5 are connected then the set of nodes {4, 6, 7, 8} are not connected,
so we have a 4 element set of nodes that are not connected. A contradiction to the assumption
that there were maximally three nodes that are not connected. In conclusion – 6 comparisons
are not enough.

(If you would like to see an alternative proof that uses the probabilistic method
you can reach out to Paweł Prałat who has shown it to me.)

Computational solution

Now let us move to the brute force and computation (and in the process hopefully learn some Julia tricks).

First load the required packages and do some setup:

julia> using Combinatorics

julia> using DataFrames

julia> using SimpleGraphs

julia> using SimpleGraphAlgorithms

julia> use_Cbc()
[ Info: Solver Cbc verbose is set to false

As we saw in the analytical solution we can represent our queries as graphs on 8 nodes
and some edges.

The problem is that there are potentially many such graphs. Therefore we will want to limit
our search to the graphs that are not isomorphic. Two graphs are isomorphic if you can
get one from the other by re-labelling the nodes. Clearly such two graphs are undistinguishable
for our purposes.

What we want to show is that all graphs on 8 nodes and 6 edges contain at least 4 nodes that
are not connected by an edge. And that there exist graphs on 8 nodes and 7 edges in which
the maximum number of unconnected nodes is 3.

So how do we create a list of all non-isomorphic graphs having 6 and 7 edges respectively?

Let us start with a simpler case of graphs on 8 nodes and 3 edges and list non-isomorphic graphs:

julia> function create_graph(es)
           g = IntGraph(8)
           for e in es
               add!(g, e...)
           end
           return g
       end
create_graph (generic function with 1 method)

julia> g3 = map(create_graph, combinations([(i, j) for i in 1:7 for j in i+1:8], 3));

julia> hg3 = uhash.(g3);

julia> g3df = DataFrame(hg=hg3, g=g3);

julia> g3gdf = groupby(g3df, :hg);

julia> redirect_stdout(devnull) do
           for sdf in g3gdf
               for i in 2:nrow(sdf)
                   @assert is_iso(sdf.g[1], sdf.g[i])
               end
           end
       end

julia> noniso3 = combine(g3gdf, first).g;

julia> elist.(noniso3)
5-element Vector{Vector{Tuple{Int64, Int64}}}:
 [(1, 2), (1, 3), (1, 4)]
 [(1, 2), (1, 3), (2, 3)]
 [(1, 2), (1, 3), (2, 4)]
 [(1, 2), (1, 3), (4, 5)]
 [(1, 2), (3, 4), (5, 6)]

Let us explain what we do step by step. The g3 object contains all
graphs on three edges. Let us check how many of them we have:

julia> length(g3)
3276

There are a lot of graphs. But most of them are isomorphic. How do we prune them?
Using the uhash function we compute the hash of each graph.
uhash guarantees us that graphs having a different hash are not isomorphic.
The g3gdf is a GroupedDataFrame that keeps these graphs grouped by their
hash value. We have 5 such groups as can be checked in:

julia> length(g3gdf)
5

However, maybe this is the case that we have non-isomorphic graphs that
have the same hash value (this is unlikely but possible). We check it with
the is_iso function. If they would not be isomorphic @assert would error.
Since it does not we are good. Note that I use the redirect_stdout(devnull)
trick to avoid printing any output that is_iso produces. The reason
is that it calls CBC solver which prints to the screen its status (and since
we do over 3000 calls the screen would be flooded by not very useful output).

With the elist.(noniso3) we can see what are the edges of the five non-isomorphic
graphs that exist on 3 edges.
(And since we have only 5 graphs you can probably convince yourself using pen
and paper that we have found all possible options.)

How do we do this process for a larger number of edges.
The same approach would work, but would be much more time consuming (there are
over 1 million graphs on 7 edges). So we use the following trick, we take the
non-isomorphic graphs with 3 edges and add one edge to them. We now get graphs
on 4 edges. Some of them are isomorphic. But we already know how to prune them
to be left only with non-isomorphic ones.

The procedure that iteratively performs this task up to 7 edges is as follows:

julia> function add_possible_edges(g::T) where T
           res = T[]
           for i in 1:7, j in i+1:8
               if !has(g, i, j)
                   newg = deepcopy(g)
                   add!(newg, i, j)
                   @assert NE(newg) == NE(g) + 1
                   push!(res, newg)
               end
                   end
           return res
       end
add_possible_edges (generic function with 1 method)

julia> function grow_graphs(noniso)
           g = reduce(vcat, add_possible_edges.(noniso))
           hg = uhash.(g)
           gdf = DataFrame(; hg, g)
           ggdf = groupby(gdf, :hg)
           redirect_stdout(devnull) do
               for sdf in ggdf
                   for i in 2:nrow(sdf)
                       @assert is_iso(sdf.g[1], sdf.g[i])
                   end
               end
           end
           return combine(ggdf, first).g
       end
grow_graphs (generic function with 1 method)

julia> noniso4 = grow_graphs(noniso3)
11-element Vector{UndirectedGraph{Int64}}:
 UndirectedGraph{Int64} (n=8, m=4)
 ⋮
 UndirectedGraph{Int64} (n=8, m=4)

julia> noniso5 = grow_graphs(noniso4)
24-element Vector{UndirectedGraph{Int64}}:
 UndirectedGraph{Int64} (n=8, m=5)
 ⋮
 UndirectedGraph{Int64} (n=8, m=5)

julia> noniso6 = grow_graphs(noniso5)
56-element Vector{UndirectedGraph{Int64}}:
 UndirectedGraph{Int64} (n=8, m=6)
 ⋮
 UndirectedGraph{Int64} (n=8, m=6)

julia> noniso7 = grow_graphs(noniso6)
115-element Vector{UndirectedGraph{Int64}}:
 UndirectedGraph{Int64} (n=8, m=7)
 ⋮
 UndirectedGraph{Int64} (n=8, m=7)

In the process we learn that there are 11 non-isomorphic graphs having 4 edges,
and respectively 24 for 5 edges, 56 for 6 edges and 115 for 7 edges.

Now for each of these graphs let us find a maximum number of nodes that are
not connected. This can be done using the max_indep_set function.
Again we use the devnull trick to avoid printing of the output:

julia> mis6 = redirect_stdout(devnull) do
           return max_indep_set.(noniso6)
       end
56-element Vector{Set{Int64}}:
 Set([5, 4, 6, 7, 2, 8, 3])
 ⋮
 Set([4, 7, 8, 3])

julia> minimum(length.(mis6))
4

So we first see that for graphs on 6 edges we indeed have at least 4 nodes in the
independent set.

Let us now check the 7 node case:

julia> mis7 = redirect_stdout(devnull) do
           return max_indep_set.(noniso7)
       end
115-element Vector{Set{Int64}}:
 Set([5, 4, 6, 7, 2, 8, 3])
 ⋮
 Set([7, 2, 8, 3])

julia> minimum(length.(mis7))
3

julia> elist.(noniso7[length.(mis7) .== 3])
1-element Vector{Vector{Tuple{Int64, Int64}}}:
 [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (7, 8)]

Here we see that there exists only one graph (up to isomorphism)
that has a property that at most three nodes are independent.
And looking at its edges it is the same graph that we have drawn
in our analytical solution.

Conclusions

So is the analytical or computational solution more interesting?
For me both have their value and were fun.

If you like such puzzles, and do plan ahead, please consider joining
us next year. From June 3 to 7, 2024 we are going to host
the WAW2024: 19th Workshop on Modelling and Mining Networks
at SGH Warsaw School of Economics. We invite all enthusiasts of graphs:
both theoreticians and practitioners.

Notes on Managing Open Source Organizations

By: Jacob Zelko

Re-posted from: https://jacobzelko.com/06202023182947-notes-open-orgs/index.html

Notes on Managing Open Source Organizations

Date: June 20 2023

Summary: A collected overview on various comments from Julia organization organizers on how to effectively build up an open source organization

Keywords: #github #open #source #community #organization #blog #archive #julia

Bibliography

J. Zelko, “Julia Orgs, How Do You Manage Logistics?,” Julia Programming Language, Jun. 16, 2023. https://discourse.julialang.org/t/julia-orgs-how-do-you-manage-logistics/100430 (accessed Jun. 20, 2023).

Table of Contents

    1. Motivation
    2. Structure Organization for Sustainability
    3. Organization Guidelines
    4. Managing Package Development
    5. Community Outreach and Communication
  1. How To Cite
  2. References:
  3. Discussion:

Motivation

As I have been involved with and have started various open source organizations in the past, I wanted to learn more about how other organizations within the Julia community manage this. In particular, as I am getting more serious about bolstering JuliaHealth, I want to make sure to do this as effectively as possible. These were some comments and notes from folks within the Julia Community who made suggestions on how to approach this effectively.

Structure Organization for Sustainability

  • Limit dependency on any one person – Jakob Nissen, George Datseris

  • All repositories and shared documents should have write access for multiple people, ensuring continuity even if some contributors are unavailable – Jakob Nissen, Guillaume Dalle

  • Furthermore, packages should be designed with longevity in mind, as maintenance can be a significant challenge – Jakob Nissen, Georgia Datseris

  • Build trust with frequent contributors – George Datseris

    • Could review their PRs more leniently

    • Provide detailed reviews for newcomers

Organization Guidelines

  • Think about what is the criteria (such as reviews or approvals) before action on a pull request is taken – Guillaume Dalle

  • Setting clear timeframes for feedback should be established – Guillaume Dalle

  • Allocate review time (around 70-80%) to evaluating design aspects, logical connections between input-output arguments, and documentation of features – George Datseris

  • Adapt the intensity of a PR review based on its potential impact to a package – George Datseris

    • PRs with package wide consequences should receive thorough evaluations

    • PRs that are independent of existing code could be reviewed more quickly

    • Bug fixes that affect small parts of the package could be reviewed faster

Managing Package Development

  • Use GitHub issues to outline development roadmaps for a given package – Guillaume Dalle

  • Use labels to prioritize GitHub issues as important and relevant to a given package – Guillaume Dalle

  • Prioritize discussions within GitHub issues to foster transparency and consolidate discussions in location – George Datseris

  • Maintain source code simplicity – George Datseris

Community Outreach and Communication

  • Avoid making promises about deliverables or timelines – Jakob Nissen

  • Community calls for triage discussions or long-term package orientation can help build community – Guillaume Dalle

  • Actively invite individuals to join the organization – George Datseris

    • Provide them with administrative rights to relevant GitHub teams and repositories to foster participation

How To Cite

Zelko, Jacob. Notes on Managing Open Source Organizations. https://jacobzelko.com/06202023182947-notes-open-orgs. June 20 2023.

References:

Discussion:


Summary of Julia Plotting Packages

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/summary-of-julia-plotting-packages/

This is a repost of my response on the Julia Discourse on this topic. I was asked to make a blog post so here you go!

The “Main” Plotting Packages

Here’s a quick summary of the most widely used plotting packages. I may have missed one, but I haven’t missed one that is very widely used.

  • Plots.jl is the most used. It’s probably the most documented, used in the most tutorials, and is used in many videos.
    • Pros: Its main draw is that it has a lot of plugins to other packages through its recipes system, which means that a lot of odd things like `plot(sol::ODESolution)` or showing the sparsity of a `BandedMatrix` just works. With all of these integrations, it’s normally what I would recommend first to newcomers since they will generally get the most done with the least work. It has a backend system, so you can make it generate plots via GR (the default), Plotly (i.e. make webpages), pyplot, PGFPlots (Latex output), UnicodePlots (i.e. output plots as text). This ease of use and flexibility is what its main draw is.
    • Cons: Its downside has traditionally been its startup time, though it’s nearly a second now so that’s fine. Its main downside now is mostly that it’s not as configurable as something like Makie, and it’s not as optimized if you get up to millions of points. Its flexibility means it’s not just for standard plots but also for animations, building small graphical user interfaces, and building small apps.
  • Makie is probably the second most popular. It’s natively Julia so it’s cool in that aspect, you can see code all the way down.
    • Pros: It’s very optimized for large plots, especially with GPU acceleration via the OpenGL backend (GLMakie). It has a lot of examples these days.
    • Cons: Its downside is that it’s a bit less “first user friendly”, given that its flexibility means there’s a lot more options you’re forced to specify everywhere. It has a recipe system now but it’s fairly new and not well-integrated with most of the ecosystem, so it’s not as seamless as Plots, though by 2024 I would assume that would largely be fixed. It has the longest startup time, used to be in minutes but now it’s like 5-10 seconds.
  • AlgebraOfGraphics.jl is a grammar of graphics front-end to Makie. This essentially means it has an API that looks and acts like R’s ggplot2. Thus it has largely the same pros and cons as Makie, since it’s just calling Makie under the hood, but with the additional pro of being more familiar to users coming from R or con if you haven’t worked with grammar of graphics before (or don’t like the style).
  • Gadfly is a grammar of graphics based library.
    • Pros: It’s very familiar to a ggplot2 user. Its default theme is pretty nice looking.
    • Cons: It’s a bit high on startup time, closer to Makie than Plots. Also, it’s pretty feature poor. In particular, it is missing 3D plots, animations, the ability to make interactive apps with buttons, etc. For these reasons more and more people are adopting AlgebraOfGraphics, but if you’re just doing some standard statistics it’s fine.
  • Vega and VegaLite are of the same camp as Gadfly in the focus towards “standard” statistics and data science, but using wrappers to Javascript libraries.
    • Pro: Fast startups
    • Cons Similar to Gadfly, little to no flexibility (making apps, animations, …) and integration with Julia libraries beyond Queryverse.
  • PlotlyLight is a no-frills wrapper to Plotly.
    • Pro: No startup time
    • Cons: Requires reading the Plotly docs to know how to use it and has little flexibility or integration into Julia libraries.
  • GR is a front end to a C library GR. It’s actually used as the default front-end from Plots.jl. Many more people use it from Plots.jl than directly due to the integrations and docs, but it is nice for some things on its own.
    • Pros: It’s fast, scales fairly well, has a fast startup time, has a nice GUI for investigating results, integrates well with ITerm, very flexible.
    • Cons: It’s docs are bit difficult, and it doesn’t have any integrations with Julia libraries.
  • PGFPlotsX.jl is a front-end to generate plots for Latex.
    • Pros: Fast startup, output to Latex which makes it easy to then further modify in publication documents.
    • Cons: Its interface is wonky, even if you are familiar with the pgfplots Latex package. This makes quite hard to use and teach. Very few integrations with Julia libraries (Measurements and Colors only?). Lacking flexibility in terms of animations and making apps, though it’s quite flexible in its ability to modify the plots and make weird things.
  • UnicodePlots.jl is very simple, fast startup, and plots to text. Its downside of course is that text is the only output it has.
  • Gaston.jl a front-end to gnuplot.
    • Pros: Fast startup.
    • Pretty basic, lacking flexibility and integrations with Julia packages. Requires gnuplot so limitations on where it can be installed (only supports linux?).
  • GMT.jl is “generic mapping tools”. It has some plotting tools highlighted here.
    • Pros: Has good examples in the docs. Nice extra tools for maps.
    • Cons: Missing some standard plot types like trisurf, missing integrations with other Julia packages.
  • GNUPlot.jl uses gnuplot under the hood.
    • Pros: Instant startup, has some interesting data science integrations for things like named datasets, very complete set of plots
    • Cons: Not the most complete documentation, requires gnuplot so limitations on where it can be installed (only supports linux?)

tl;dr on plotting in Julia

Plots.jl is the most used package in the Julia programming language for a reason. It’s very flexible, integrates with the most Julia packages so you’ll find it all throughout other docs, and it has many of the advantages of the other libraries through its backend system. Thus if you needed Latex output, use the pgfplots backend. If you needed a webpage, use the Plotly backend. Unicodeplots backend when you want text output. Or the GR default for the basics. With Julia v1.9 its startup time is much improved (and it’s like sub second on v1.10 beta), which was its major complaint before. If you’re going to use one plotting library and don’t care too much about every little detail, then Plots.jl is a good one to go with. It’s definitely not the best in any of the cases, animations are better in Makie, Latex is better in PGFPlotsX, etc., but it’s capable everywhere.

Makie.jl is catching up and may be the default in the near future. It scales well and its getting all of the niceties of Plots.jl. I wouldn’t learn it first if you’re new to Julia (right now, though that will likely change by 2024). But if you need animations or want to add custom buttons to a window (make a quick GUI-like thing), Makie is unmatched. If it makes its standard plotting interface a bit simpler, gets a few more integrations, and thus matches Plots.jl in simplicity, it may hit a “best of most worlds” soon.

Otherwise it’s a bit domain specific. If you were using Plots.jl and needed more flexibility for publication-quality plots, PGFPlotsX.jl can help. Or if you prefer grammar of graphics, AlgebraOfGraphics.jl is good. If you’re a stats person you may find Gadfly or VegaLite familiar, though I wouldn’t recommend them first because these don’t satisfy general user needs (try making a plot of an FEM output and see what I mean).

All of these are pretty good. You have a lot of options. In the end, pick the one that suits your needs best.

The post Summary of Julia Plotting Packages appeared first on Stochastic Lifestyle.