miniVite: Problem size and input generation

Problem size

miniVite requires a graph to work on. The two main quantities that determine the problem size are the number of vertices and the number of edges in the graph. Meaningful problem sizes from the literature range from 5M to 150M vertices and from 50M to 7B edges. Larger problems may still be of interest, since the main limitation is computational resources. The largest real-world graph that has been tested on both miniVite and Vite is uk2007 (3.3B edges), although other, larger graphs could be used. The largest synthetic graph generated using miniVite had about 2.14B vertices and 27.8B edges.

Input generation

There are two options for an input graph: ask miniVite to generate a random graph, or provide a graph generated by an external application.

To run on a randomly generated graph, run miniVite with the -n option followed by the desired number of vertices. miniVite will both generate the graph and run the community detection algorithm in the same run. For randomly generated graphs, the total number of processes must be a power of 2, and the total number of vertices must be evenly divisible by the number of processes (this constraint does not apply to real-world graphs passed to miniVite).
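The two constraints on randomly generated graphs can be checked before launching a job. The following is a minimal sketch, not part of miniVite; the function names are hypothetical and chosen only for illustration.

```python
from typing import List


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of 2 (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def check_random_graph_config(num_vertices: int, num_processes: int) -> List[str]:
    """Hypothetical helper: list the violated constraints for a
    randomly generated graph. An empty list means the configuration
    satisfies both constraints described above."""
    problems = []
    if not is_power_of_two(num_processes):
        problems.append("number of processes must be a power of 2")
    if num_processes > 0 and num_vertices % num_processes != 0:
        problems.append("number of vertices must be divisible by the number of processes")
    return problems


if __name__ == "__main__":
    # 1,000,000 vertices on 64 processes: 64 is a power of 2 and divides 1,000,000.
    print(check_random_graph_config(1_000_000, 64))
    # 1,000,000 vertices on 48 processes: 48 is not a power of 2 and does not divide 1,000,000.
    print(check_random_graph_config(1_000_000, 48))
```

An empty result means the vertex count can be passed to the -n option with that process count.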

The second option is to supply an external file as input. The input graph must be in a specific binary format. The code for binary conversion (from a variety of common graph formats) is included only with the parent application, Vite. Please follow the instructions in the Vite README for binary file conversion.

For weak scaling tests of miniVite, you can use randomly generated graphs, or multiple graphs with a similar pattern, such as the RMAT graphs used in Graph500 BFS or graphs from the Sparse TAMU collection: https://sparse.tamu.edu/.
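For weak scaling with randomly generated graphs, one common approach is to keep the number of vertices per process fixed while doubling the process count. The sketch below assumes that approach; it is not prescribed by miniVite, and the function name and the choice of 1M vertices per process are illustrative only. Because each vertex count is a multiple of a power-of-2 process count, every entry satisfies the constraints on randomly generated graphs.

```python
from typing import List, Tuple


def weak_scaling_plan(vertices_per_process: int, max_processes: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (processes, total_vertices) pairs for
    power-of-2 process counts up to max_processes, holding the number
    of vertices per process constant."""
    plan = []
    processes = 1
    while processes <= max_processes:
        plan.append((processes, vertices_per_process * processes))
        processes *= 2
    return plan


if __name__ == "__main__":
    # Illustrative choice: 1M vertices per process, up to 16 processes.
    for processes, vertices in weak_scaling_plan(1_000_000, 16):
        print(f"{processes} processes: -n {vertices}")
```

Each printed vertex count is the value to pass to the -n option for the corresponding process count.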

For strong scaling, use a large file (over a billion edges) such as Friendster (the raw input in its native format is listed at http://graphchallenge.mit.edu/data-sets).