The DAG of Julia Packages

By: Júlio Hoffimann

Re-posted from: https://juliohm.github.io/science/DAG-of-Julia-packages/

If your package is listed below, please consider fixing it:


Instructions

This interactive visualization made with D3 shows the Directed Acyclic Graph (DAG) of all registered Julia packages up until 02-April-2017.

The size of a node represents its influence (i.e. out degree) in the DAG. The color represents the required Julia version.

Hover the mouse over the elements to get more information.

Data

The data was extracted from METADATA with the following script:

using JSON
using LightGraphs
using ProgressMeter

# find all packages in METADATA
pkgs = readdir(Pkg.dir("METADATA"))
filterfunc = p -> isdir(joinpath(Pkg.dir("METADATA"), p)) && p  [".git",".test"]
pkgs = filter(filterfunc, pkgs)

# assign each package an id
pkgdict = Dict{String,Int}()
for (i,pkg) in enumerate(pkgs)
  push!(pkgdict, pkg => i)
end

# build DAG
G = DiGraph(length(pkgs))
@showprogress 1 "Building graph..." for pkg in pkgs
  children = Pkg.dependents(pkg)
  for c in children
    add_edge!(G, pkgdict[pkg], pkgdict[c])
  end
end

# find required Julia version
juliaversions = String[]
for pkg in pkgs
  versiondir = joinpath(Pkg.dir("METADATA"), pkg, "versions")
  if isdir(versiondir)
    latestversion = readdir(versiondir)[end]
    reqfile = joinpath(versiondir, latestversion, "requires")
    juliaversion = string(get(Pkg.Reqs.parse(reqfile), "julia", "NA"))
    push!(juliaversions, juliaversion)
  else
    push!(juliaversions, "BOGUS")
  end
end

# construct JSON
nodes = [Dict("id"=>pkgs[v],
              "indegree"=>indegree(G,v),
              "outdegree"=>outdegree(G,v),
              "juliaversion"=>juliaversions[v]) for v in vertices(G)]
links = [Dict("source"=>pkgs[u], "target"=>pkgs[v]) for (u,v) in edges(G)]
data = Dict("nodes"=>nodes, "links"=>links)

# write to file
open("DAG-Julia-Pkgs.json", "w") do f
  JSON.print(f, data, 2)
end

Improvements

The DAG can be improved in many ways. Below is a list of issues that I would like to address, feel free to suggest more:

  1. The number of categories (i.e. Julia version strings) in the legend is too big. I wonder what would be a more sensible choice for coloring a version string [vmin,vmax), would it be the minimum vmin or the maximum vmax required Julia version?

  2. It would be great to present the data on a map. If you know how to get an approximate latitude/longitude for the author of a package (perhaps using GitHub API?), please leave a comment.

  3. Nodes could be collapsed by clicking with the mouse. Related visual elements would be updated accordingly.

  4. The evolution of the DAG with time should be interesting. How to extract time information from the commits in METADATA corresponding to new package tags?