Published on Sep 21, 2019
I/O-efficient graph algorithms have received considerable attention lately because massive graphs arise naturally in many applications. Recent web crawls, for example, produce graphs on the order of 200 million nodes and 2 billion edges.
Recent work in web modeling uses depth-first search, breadth-first search, shortest paths, and connected components as primitive operations for investigating the structure of the web.
Massive graphs are also often manipulated in Geographic Information Systems (GIS), where many fundamental problems can be formulated as basic graph problems. The graphs arising in GIS applications are often planar. Yet another example of massive graphs is AT&T's 20 TB phone call graph. When working with such large data sets, the transfer of data between internal and external memory, rather than the internal-memory computation, is often the bottleneck. Thus, I/O-efficient algorithms can lead to considerable run-time improvements.
Breadth-first search (BFS) and depth-first search (DFS) are the two most fundamental graph-searching strategies. They are used extensively in internal-memory algorithms because they are easy to perform in linear time, yet they provide valuable information about the structure of the given graph. Unfortunately, no I/O-efficient algorithms for BFS and DFS in arbitrary sparse graphs are known, whereas existing algorithms perform reasonably well on dense graphs. Together with recent results on single-source shortest paths (SSSP) and DFS, our algorithm leads to I/O-efficient algorithms for SSSP, BFS, and DFS on undirected embedded planar graphs.
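To make the "linear time in internal memory" claim concrete, here is a minimal internal-memory BFS sketch. It is a textbook queue-based BFS, not the algorithm from this text. The function name `bfs_levels` and the dictionary-of-lists graph representation are illustrative assumptions. Every vertex is enqueued at most once and every adjacency list is scanned once, giving O(|V| + |E|) time. As general background (not a claim from this text), the difficulty in external memory is that each `adj[u]` lookup may touch a different disk block, so a naive run can cost roughly one I/O per vertex instead of scan-like cost.

```python
from collections import deque

def bfs_levels(adj, source):
    """Return the BFS distance (in edges) from source to every reachable vertex.

    adj maps each vertex to a list of its neighbours (hypothetical in-memory
    representation). Each vertex is enqueued at most once and each adjacency
    list is scanned once, so the running time is O(|V| + |E|).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit: v lies one level below u
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Small undirected example graph (each edge listed in both directions).
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```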
Model of Computation
The algorithms in this paper are designed and analyzed in the Parallel Disk Model (PDM). In this model, D identical disks of unlimited size are attached to a machine with an internal memory capable of holding M data items. These disks constitute the external memory of the machine. Initially, all data is stored on disk. Each disk is partitioned into blocks of B data items each. An I/O-operation is the transfer of up to D blocks, at most one per disk, between internal and external memory (in either direction).
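The "at most one block per disk" rule means the cost of reading a set of blocks depends on how they are spread over the disks. The sketch below illustrates this under stated assumptions: the helper names `min_parallel_ios` and `blocks_needed` and the round-robin striping layout are hypothetical, not taken from the text. Since one I/O moves at most one block from each disk, the disk holding the most requested blocks forces that many I/Os. Reading one pending block from every disk per round achieves this bound.

```python
import math
from collections import Counter

def min_parallel_ios(block_disks):
    """Minimum number of PDM I/O-operations needed to read a set of blocks.

    block_disks[k] is the index of the disk holding the k-th requested block.
    One I/O moves at most one block per disk, so the most heavily loaded disk
    determines the number of I/Os (and that many rounds always suffice).
    """
    return max(Counter(block_disks).values(), default=0)

def blocks_needed(n_items, B):
    """Number of B-item blocks occupied by n_items consecutive items."""
    return math.ceil(n_items / B)

N, B, D = 10_000, 100, 4
n_blocks = blocks_needed(N, B)              # 100 blocks
striped = [k % D for k in range(n_blocks)]  # hypothetical round-robin layout
single_disk = [0] * n_blocks                # every block on disk 0
print(min_parallel_ios(striped))      # 25
print(min_parallel_ios(single_disk))  # 100
```

With striping, scanning the 100 blocks takes 100 / 4 = 25 parallel I/Os, matching the N/(DB) scanning cost. Placing everything on one disk forfeits the parallelism entirely.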
The complexity of an algorithm in the PDM is the number of I/O-operations it performs. Sorting, permuting, and scanning an array of N consecutive data items are primitive operations often used in external memory algorithms. Their I/O-complexities are $\mathrm{sort}(N) = \Theta\left(\frac{N}{DB}\log_{M/B}\frac{N}{B}\right)$, $\mathrm{perm}(N) = \Theta\left(\min(N, \mathrm{sort}(N))\right)$, and $\mathrm{scan}(N) = O\left(\frac{N}{DB}\right)$, respectively.
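To give a feel for the relative sizes of these bounds, the sketch below evaluates them with all constant factors dropped. The function name `pdm_bounds` and the sample parameters are illustrative assumptions, not values from the text, and the code assumes B < M and N > B so the logarithm is well-defined and positive.

```python
import math

def pdm_bounds(N, M, B, D):
    """Evaluate the PDM I/O bounds for N items, ignoring constant factors.

    scan(N) = N / (DB)
    sort(N) = (N / (DB)) * log_{M/B}(N / B)
    perm(N) = min(N, sort(N))
    Assumes B < M and N > B so the logarithm is well-defined and positive.
    """
    scan = N / (D * B)
    sort = scan * math.log(N / B, M / B)
    perm = min(N, sort)
    return {"scan": scan, "sort": sort, "perm": perm}

# Illustrative parameters: 2^30 items, an internal memory of 2^24 items,
# blocks of 2^10 items, and a single disk.
b = pdm_bounds(N=2**30, M=2**24, B=2**10, D=1)
print(b["scan"])         # 1048576.0
print(round(b["sort"]))  # 1497966
```

Here $\log_{M/B}(N/B) = 20/14 \approx 1.43$, so sorting costs less than 1.5 times a scan and several hundred times fewer I/Os than spending one I/O per item. This is also why perm(N) equals sort(N) for these parameters.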