Topological Search

2018, Nov 05

This week I spent some time working on a CS problem that required me to perform topological search of a graph. The problem didn’t seem so complicated at first glance:

Design a program which takes as input two parameters: an array of tasks and an array containing pairs of dependency relationships. Each dependency relationship consists of a pair of tasks wherein the first task must be completed before the second task is completed. Your program must must return an array consisting of a valid order in which each of the tasks may be completed or return an error if no valid order is possible.

Solving this problem, however, isn’t quite so straightforward as many of the string and array manipulation questions I typically get confronted with on a day to day basis. The question really requires diving into the topic of topological sorting of graphs. This question really excited me so I decided to sit down, put in the time, and work my way through a solution.

What captured my attention about this problem is that its practical application was immediately clear: this is dependency resolution. In other words, the problem solved by this exercise is the same fundamental problem that many of the tools I rely on every single day address such as Bundler to install Ruby gem dependencies, Apt to manage package dependencies on my Debian GNU/Linux laptop, and Systemd to manage process initialization on my servers.

Although the inputs in this exercise are arrays, they represent the edges of a directed graph, i.e. a graph in which nodes have a one-way or directional relationship to adjacent nodes. In this case the directionality is the `dependency -> dependant` relationship between some of the task nodes.

According to Wikipedia, topological sorting of a directed graph can be described as

[…] a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time.

Simple right? So how do we actually implement topological sorting and solve the problem at hand? Lets say that we are given a list of projects and their respective dependency relationships in the following form:

``````tasks = ["a", "b", "c", "d", "e", "f", "g"]

dependencies = [["b", "e"], ["a", "e"], ["f", "b"],
["f", "c"], ["c", "a"], ["f", "a"],
["b", "a"], ["d", "g"]]
``````

As a first step, let’s see if we can interpret this data to more naturally visualize it as a graph. Each element in the dependencies array represents a directional connection from, for example, task `b` to task `e`. While we want to capture this representation programattically in our code let’s start out by using the DOT graph description language to generate a graph diagram so we can better comprehend the data. The graph in this sample input can be described (with some tweaks for aesthetics and clarity) by the following DOT file:

``````digraph G {
rankdir=tb
d -> g;
f -> b;
f -> c;
f -> a;
b -> a;
c -> a;
a -> e;
b -> e;
{ rank=same "d" "f" }
{ rank=same "e" "g" }
{ rank=same "c" "b" }
}
``````

Once compiled, the above file produces this chart: From just a quick look at this image we can ascertain a few basic facts:

1. This graph is disconnected and therefore there is not a path between each node and all other nodes.
2. This graph contains no cycles and therefore should have at least one valid topological sorting order.
3. This graph contains nodes with multiple distinct dependencies and therefore likely has multiple valid topological sorting orders because the order of dependencies is not strictly prescribed.

Here is the first version of the code I came up with to solve this problem:

``````# Main method, takes specified input arrays
# returns array of tasks in valid order
# unless error is raised
# Converts graph data to hash representation
completion_order = []
# Loops through hash removing tasks which do not have
# dependencies and adding them to `completion_order`
while !deps.empty? do
completion_order << processing
# decrement the dependencies key for all dependents
processing.each do |t|
if !deps[t][:dependents].empty?
deps[t][:dependents].each do |d|
deps[d][:dependencies] -= 1
end
end
deps.delete(t)
end
end
# return the completion order array
return completion_order.flatten
end

# Parse graph and store in Hash datastructure
deps = Hash.new
# set the default deps hash value for each task
deps[t] = {dependencies: 0, dependents: []}
end

dependencies.each do |d|
# add each dependant to the dependency's :dependents array
deps[d][:dependents] << d
# increment the dependant task's :dependencies counter
deps[d][:dependencies] += 1
end

return deps
end

# Iterate through the `deps` hash to find tasks with
# no dependencies
processing = []
# Find all tasks which have no dependencies
deps.keys.each do |t|
if deps[t][:dependencies] == 0
processing << t
end
end

return processing if !processing.empty?
# if processing is empty than we have reached an irreconcilable cycle
raise "An error has occurred. No valid order detected."
end
``````

Here is a quick overview of how this code functions:

1. This solution starts out by parsing the input graph and transforming it to a hash where each task is a key and each value is a nested hash with a `:dependents` key containing an array of dependant tasks and a `:dependencies` key containing a counter of that task’s dependencies.
2. The solution then performs a loop through the graph hash looking for root tasks which have no dependencies.
3. The root tasks are then added to the `completion_order` array and removed from the graph hash.
4. The removed task’s dependents also have their `:dependencies` counter decremented.
5. On the next loop around any new root tasks are identified and the cycle continues until either the each task in the graph has been added to the `completion_order` array or no new root task can be found and an error is raised.

Given the sample input, the program returns:

``````=> ["d", "f", "b", "c", "g", "a", "e"]
``````

Checking our diagram this appears to be one of the many possible valid orders of task completion for the specified graph. What if we tweak the sample input just a bit by adding an element `["b", "f"]` to the `dependencies` array input? This graph now looks like this: When we run the program on this modified input we get the following output:

``````An error has occurred. No valid order detected.
``````

As you can see, this small change creates a big problem! The newly added `b -> f` dependency creates a bidirectional connection. In other words, without `f` there can be no `b` and without `b` there can be no `f`. Accordingly there is no valid order in which the tasks can be completed and the program correctly raises an error.

This code a solid solution and although I imagine there are some improvements I could make in my approach, I must say that I’m quite pleased with it. I did, however, discover something interesting: Topological sort is actually an included module in the Ruby standard library! Using the `TSort` module and a custom class, we can accomplish the same outcome with far fewer lines of code:

``````require 'tsort'

deps = create_hash(projects, dependencies)
graph = G.new(deps)
graph.tsort
end

deps = Hash.new

deps[t] = []
end

dependencies.each do |d|
deps[d] << d
end

return deps
end

class G
include TSort
def initialize(g)
@g = g
end

def tsort_each_child(n, &b)
@g[n].each(&b)
end

def tsort_each_node(&b)
@g.each_key(&b)
end
end
``````

This is one of the first graph problems I’ve had to deal with and I’m really excited to further explore how I can use graphs to solve challenging problems.