Graphs
Graph Basics
A graph is a data structure consisting of:
- A set of vertices $V$
- Also called nodes
- A set of edges $E$ between pairs of vertices
Types of Graphs
- Undirected
- In an undirected graph, edges do not have direction. For example, Facebook friends.
- Directed
- In a directed graph or digraph, each edge has a direction. For example, road maps, Twitter follows.
- Weighed
- In a weighted graph, each edge has an associated weight. For example, road maps, networks.
- With Loops
- A loop is an edge from a vertex to itself. Depending on the context, a graph may or may not be able to have loops.
- Multigraphs
- In a multigraph, multiple edges are allowed between two vertices. For example, call graphs, maps.
A simple graph is an undirected graph with no loops and no multiple edges.
For a simple graph with V vertices, the maximum possible number of edges is
$$E = V(V-1)/2$$
Graph Terminology
Two vertices $v$ and $w$ are adjacent if an edge $e := (v,w)$ connects them; we can $e$ is incident on $v$ and $w$.
The $degree$ of a vertex $v$ is the number of edges incident on $v$
The ratio $E:V$ can vary considerably.
If $E$ is closer to $V^2$, the graph is dense.
If $E$ is closer to $V$, the graph is sparse.
A path is aa sequence of vertices where each vertex has an edge to the next in the sequence
A path is simple if it has no repeating vertices
A cycle is a path where only the first and last vertices are the same
A complete graph is a graph where every vertex is connected to every other vertex via an edge.
In a complete graph, $E = \frac{1}{2}V(V-1)$
A connected graph is a graph where there is a path from every vertex to every other vertex
A tree is a connected graph with no cycles. A tree has exactly one path between each pair of vertices.
A subgraph of a graph G is a graph that contains a subset of the vertices of G and a subset of the edges between these vertices.
A connected component is a maximally connected subgraph. A connected graph has one connected component — the graph itself. A disconnected graph has two or more connected components.
A spanning tree of a graph G is a subgraph that contains all the vertices of G and is a single tree. Spanning trees only exist for connected graphs.
A spanning forest of a graph G is a subgraph that contains all the vertices of G and contains one tree for each connected component.
A clique is a complete subgraph. A clique is non-trivial if it has 3 or more vertices.
Graph Traversals
Graphs aren’t very useful if all we’re doing is storing data in them. We need to effectively search and traverse them to gather information.
There are three 3 main topics related to traversal:
- Traverse for Item Searching
- Move between nodes to look for a value
- Traverse for Path Finding
- Look for a value, but keep track on how to get there
- Traverse Fully
- Like path-finding but exploring the full graph
There are two primary methods to traverse a graph:
- Breadth-first search (BFS)
- Iterative solutions
- Depth-first search (DFS)
- Recursive & iterative solutions
- Recursive & iterative solutions
Both approaches ignore some edges by remembering previously visited vertices, usually done in a visited[] array (this also prevents cycles).
Breadth First Search
Start at a node, and expanding out equally from there. Visit all the neighbours of the current vertex, then visit all the neighbours of those neighbours, etc.
bfs(G, src):
Input: graph G, starting vertex src
create visited array, initialised to false
create predecessor array, initialised to -1
create queue Q
visited[src] = true
enqueue src into Q
while Q is not empty:
v = dequeue from Q
for each neighbour w of v in G where visited[w] = false:
visited[w] = true
predecessor[w] = v
enqueue w into Q
The previous code can be simplified:
- The predecessor array can double as a visited array.
- $predecessor[v] = -1$ means $v$ is not visited.
- The first vertex (src) can be marked as visited with $pred[src] = src$.
Analysis:
BFS is $O(V+E)$ when using adjacency list:
- Typical queue implementation has $O(1)$ enqueue and dequeue
- Each vertex is visited at most once $\implies$ $O(V)$
- For each vertex, all of its edges are considered once $\implies$ $O(E)$
Path Finding:
A BFS finds the shortest path between the starting vertex and all other vertices.
The shortest path between $src$ and $dest$ can be found by tracing backwards through the predecessor array (from $dest$ to $src$).
bfsFindPath(G, src, dest):
Input: graph G, vertices src and dest
... Execute BFS starting from src ...
if predecessor[dest] != -1:
v = dest
while v != src:
print v "<-"
v = predecessor[v]
print src
In short, we first check if $dest$ was reached or not, then backtrack from $dest$ to $src$ using the $predecessor$ array. For example, finding a path from vertex $3$ to $8$ can print out:
8 <- 2 <- 5 <- 1 <- 3
Depth First Search
Goes as far down one path as possible until it reaches a dead end, then backtracks until it finds a new path to take, then repeats.
DFS can be implemented recursively or iteratively.
dfs(G, src):
Input: graph G, starting vertex src
create visited array, initialised to false
dfsRec(G, src, visited)
dfsRec(G, v, visited):
Input: graph G, vertex v, visited array
visited[v] = true
for each neighbour w of v in G:
if visited[w] = false:
dfsRec(G, w, visited)
Analysis:
Recursive DFS is $O(V + E)$ when using adjacency list representation:
- Each vertex is visited at most once $\implies O(V)$
- Function is called on each vertex once
- For each vertex, all of its edges are considered once $\implies O(E)$
Path Checking:
Recursive DFS can be adapted to check if a path exists between two vertices.
dfsHasPath(G, src, dest):
Input: graph G, vertices src and dest
Output: true if there is a path from src to dest, false otherwise
create visited array, initialised to false
return dfsHasPathRec(G, src, dest, visited)
dfsHasPathRec(G, v, dest, visited):
Input: graph G, vertices v and dest, visited array
visited[v] = true
if (v = dest):
return true
for each neighbour w of v in G:
if visited[w] = false:
if dfsHasPathRec(G, w, dest, visited):
return true
return false
Path-Finding:
dfsFindPath(G, src, dest):
Input: graph G, vertices src and dest
create predecessor array, initialised to -1
predecessor[src] = src
if dfsFindPathRec(G, src, dest, predecessor):
v = dest
while v != src:
print v, "<-"
v = predecessor[v]
print src
Iterative Implementation:
DFS can be implemented iteratively using stack
dfs(G, src):
Input: graph G, vertex src
create visited array, initialised to false
create predecessor array, initialised to -1
create stack S
push src onto S
while S is not empty:
v = pop from S
if visited[v] = true:
continue
visited[v] = true
for each neighbour w of v in G where visited[w] = false
predecessor[w] = v
push w onto S
Spanning Trees
The edges traversed in a graph traversal form a spanning tree.
A traversal starting at vertex ‘a’ forms the following spanning trees:
Disconnected Graphs
If a graph is not connected, a graph traversal starting from a given vertex will not traverse the entire graph.
Solution:
After initial traversal is complete, perform traversal again on an unvisited vertex, repeat until all vertices are visited This produces a spanning forest
dfs(G):
Input: graph G
create predecessor array, initialised to -1
for each vertex v in G:
if predecessor[v] = -1:
dfsRec(G, v, predecessor)
...
Graph Problems
Cycle Checking
A cycle is a path of length > 2 where the start vertex = end vertex and no edge is used more than once.
This graph has three distinct cycles:
1-2-5-1, 2-5-6-2, 1-2-6-5-1
Idea to check cycles in a graph:
- Perform a DFS, starting from any vertex
- Keep track of the previous vertex during DFS (to ensure length > 2)
- During the DFS, if the current vertex has an edge to an already-visited vertex, which is not the previous vertex, then there is a cycle
- After the DFS, if any vertex has not yet been visited, perform another DFS, this time starting from that vertex
- Repeat until all vertices have been visited
hasCycle(G):
Input: graph G
Output: true if G has a cycle, false otherwise
create visited array, initialised to false
for each vertex v in G:
if visited[v] = false:
if dfsHasCycle(G, v, v, visited):
return true
dfsHasCycle(G, v, prev, visited):
visited[v] = true
for each neighbour w of v in G:
if w = prev:
continue
if visited[w] = true:
return true
else if dfsHasCycle(G, w, v, visited):
return true
return false
Analysis:
Using adjacency list representation,
- Algorithm is a slight modification of DFS
- A full DFS traversal is $O(V + E)$
- Thus, worst-case time complexity of cycle checking is $O(V + E)$
Connected Components
A connected component is a maximally connected subgraph. For example, this graph has three connected components:
Goal:
- Compute an array which indicates which connected component each vertex is in
- Let this array be called $componentOf$
- $componentOf[v]$ contains the component number of vertex $v$
Idea:
- Choose a vertex and perform a DFS starting at that vertex
- During the DFS, assign all vertices visited to component 0
- After the DFS, if any vertex has not been assigned a component, perform a DFS starting at that vertex
- During this DFS, assign all vertices visited to component 1
- Repeat until all vertices are assigned a component, increasing the component number each time
components(G):
Input: graph G
Output: componentOf array
create componentOf array, initialised to -1
compNo = 0
for each vertex v in G:
if componentOf[v] = -1:
dfsComponents(G, v, componentOf, compNo)
compNo = compNo + 1
return componentOf
dfsComponents(G, v, componentOf, compNo):
componentOf[v] = compNo
for each neighbour w of v in G:
if componentOf[w] = -1:
dfsComponents(G, w, componentOf, compNo)
Analysis using adjacency list:
- Algorithm performs a full DFS, which is $O(V + E)$
Suppose we frequently need to check:
- How many connected components there are
- Are $v$ and $w$ in the same connected component?
- Is there a path between $v$ and $w$?
Solution is to cache the components array in the graph struct
struct graph {
...
int nC; // number of connected components
int *cc; // componentOf array
};
However, this information needs to be maintained as the graph changes:
- Inserting an edge may reduce nC
- If the endpoint vertices were in different components
- Removing an edge may increase nC
- If the endpoint vertices were in the same component and there is no other path between them
Hamiltonian Path and Circuit
-
A Hamiltonian path is a path that includes each vertex exactly once
-
A Hamiltonian circuit is a cycle that includes each vertex exactly once
To check if a graph has a Hamiltonian path:
- Brute force
- Use DFS to check all possible paths
- Recursive DFS is perfect, as it naturally allows backtracking
- Keep track of the number of vertices left to visit
- Stop when this number reaches 0
hasHamiltonianPath(G):
Input: graph G
Output: true if G has a has a Hamiltonian path, false otherwise
create visited array, intialised to false
for each vertex v in G:
if dfsHamiltonianPath(G, v, visited, #vertices(G)):
return true
return false
dfsHamiltonianPath(G, v, visited, numVerticesLeft):
visited[v] = true
numVerticesLeft = numVerticesLeft - 1
if numVerticesLeft = 0:
return true
for each neighbour w of v in G:
if visited[w] = false:
if dfsHamiltonianPath(G, w, visited, numVerticesLeft):
return true
visited[v] = false
return false
How to check if a graph has a Hamiltonian circuit:
- Similar approach as Hamiltonian path
- Don’t need to try all starting vertices
- After a Hamiltonian path is found, check if the final vertex is adjacent to the starting vertex
hasHamiltonianCircuit(G):
Input: graph G
Output: true if G has a Hamiltonian circuit, false otherwise
if #vertices(G) < 3:
return false
create visited array, initialised to false
return dfsHamiltonianCircuit(G, 0, visited, #vertices(G))
dfsHamiltonianCircuit(G, v, visited, numVerticesLeft):
visited[v] = true
numVerticesLeft = numVerticesLeft - 1
if numVerticesLeft = 0 and adjacent(G, v, 0):
return true
for each neighbour w of v in G:
if visited[w] = false:
if dfsHamiltonianCircuit(G, w, visited, numVerticesLeft):
return true
visited[v] = false
return false
Analysis:
- Worst-case time complexity: $O(V!)$
- There are at most V! paths to check ($≈\sqrt{2\pi V}(V/e)^V$ by Stirling’s approximation)
- There is no known polynomial time algorithm
Eulerian Path and Circuit
- An Eulerian path is a path that visits each edge exactly once
- An Eulerian circuit is an Eulerian path that starts and ends at the same vertex
How to check if a graph has an Eulerian path or circuit:
- A graph has an Eulerian path if and only if exactly zero or two vertices have an odd degree, and all vertices with non-zero degree belong to the same connected component
- A graph has an Eulerian circuit if and only if every vertex has even degree, and all vertices with non-zero degree belong to the same connected component
hasEulerianPath(G):
Input: graph G
Output: true if G has an Eulerian path, false otherwise
numOddDegree = 0
for each vertex v in G:
if degree(G, v) is odd:
numOddDegree = numOddDegree + 1
return (numOddDegree = 0 or numOddDegree = 2) and eulerConnected(G)
hasEulerianCircuit(G):
Input: graph G
Output: true if G has an Eulerian circuit
false otherwise
for each vertex v in G:
if degree(G, v) is odd:
return false
return eulerConnected(G)
eulerConnected(G):
Input: graph G
Output: true if all vertices in G with non-zero degree
belong to the same connected component
false otherwise
create visited array, initialised to false
for each vertex v in G:
if degree(G, v) > 0:
dfsRec(G, v, visited)
break
for each vertex v in G:
if degree(G, v) > 0 and visited[v] = false
return false
return true
Analysis for adjacency list representation:
- Finding degree of every vertex is $O(V + E)$
- Checking connectivity requires a DFS which is $O(V + E)$
- Therefore, worst-case time complexity is $O(V + E)$
Other Graph Problems
Many graph problems are intractable, that is, there is no known “efficient” algorithm to solve them.
In this context, “efficient” usually means polynomial time.
A tractable problem is one that is known to have a polynomial-time solution.