MACSio: Algorithms and key kernels

MACSio’s functionality can be extended using plugins that enable the tool to use a particular I/O library, such as HDF5, netCDF, etc. These plugins allow finer-grained control of performance and event logging, as well as providing additional capabilities for simulating more complex application models. MACSio enables accurate comparison with an existing code only when it has a plugin that uses the same data management and I/O middleware as the code it is being compared with. For example, without a LibMesh plugin for MACSio, it is difficult to compare against codes that use LibMesh for I/O. This is true even if LibMesh is itself built on a lower-level I/O library such as Exodus or HDF5 (for which MACSio does have plugins), because Exodus and HDF5 offer considerable flexibility in how I/O is performed.

MACSio supports two parallel I/O paradigms: Multiple Independent File (MIF), where each process or group of processes accesses a separate file, and Single Shared File (SIF), where all processes access a single file. It should be noted that the current version of MACSio only supports I/O write operations; there are no options available to specify read operations. The key configuration parameters for MACSio are listed below, followed by an illustrative invocation.

  • parallel_file_mode: chooses between Multiple Independent File (MIF) and Single Shared File (SIF) modes. This parameter takes an argument that specifies either the number of files to write (MIF) or how processes are grouped to produce the specified number of files (SIF).
  • part_type: specifies the mesh type. Default is ‘rectilinear’.
  • part_dim: the dimension of the mesh. Default 2, but this does not seem to affect the I/O behavior.
  • part_size: the nominal I/O request size used by each task.
  • avg_num_parts: the average number of mesh elements (parts) per task, which also
    configures the total number of elements in the mesh as avg_num_parts × tasks.
  • vars_per_part: the number of mesh variables per element, which specifies the number of
    I/O requests each task must make to complete a “dump”.
  • num_dumps: the number of “dumps” to perform. A “dump” roughly corresponds to a checkpoint, so this can be used to specify the number of checkpoints (typically based on checkpoint frequency and the number of timesteps executed).
  • dataset_growth: allows simulation of increasing dataset size by specifying a multiplier factor.
  • meta_size: used to simulate the creation of additional “metadata” in the output; specifies the size of the metadata objects written with each dump.
  • meta_type: specifies the type of the metadata objects (either tabular or amorphous).
  • compute_work_intensity: adds a compute workload between “dumps”. This appears to be the main mechanism for introducing a time delay between dumps, and it can also simulate some level of computation.
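
Taken together, these parameters map onto MACSio command-line arguments. The invocation below is illustrative only: it assumes the long-option syntax used by recent MACSio releases (including the --interface option for selecting a plugin), and the exact flag names, argument forms, and defaults should be checked against the installed version.

    mpirun -np 64 macsio --interface hdf5 --parallel_file_mode MIF 8 \
        --part_type rectilinear --part_dim 2 --part_size 100000 \
        --avg_num_parts 2 --vars_per_part 20 --num_dumps 10

In this hypothetical run, 64 ranks write 10 dumps through the HDF5 plugin into 8 independent files, with each rank issuing roughly vars_per_part requests of about part_size bytes for each of its parts per dump.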

For MACSio to be useful as an I/O workload proxy for a range of ECP applications, it must be able to accurately simulate a wide variety of workload patterns, and the selection of MACSio input parameters is crucial to how closely the target application’s I/O workload is reproduced. This can be a difficult task, since it requires detailed knowledge of the application’s data model, data distribution, and I/O behavior, so it would be advantageous to have some means of obtaining at least part of this information in an automated manner.

Dickson et al. discussed one approach to simplifying the determination of appropriate MACSio input parameters: post-processing the Darshan characterization logs generated from application runs. We decided to adopt this approach to ascertain whether it is a feasible way of reducing the complexity of generating parameters. In addition to the characterization logs, we also collected Darshan trace information to provide further detail about the I/O behavior of the application.

Our approach to configuring MACSio and assessing its I/O behavior was as follows:

  1. Instrument the application to be proxied and collect Darshan characterization and trace logs.
  2. Post-process these logs to extract useful information, such as the number of processes/ranks, the total number of files generated, the number of files per process, the average I/O request size, and the number of dumps written (see the sketch after this list).
  3. Obtain application-specific information that can’t be determined from the logs, such as mesh size and dimension, mesh levels, and parallel file mode.
  4. Run a Darshan instrumented version of MACSio at the same scale and collect the logs.
  5. Compare the application and MACSio logs to determine how closely they match.
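
In support of step 2, a small script can reduce the Darshan data to candidate MACSio parameters. The Python sketch below is illustrative only: it assumes the tab-separated text output of darshan-parser and POSIX counter names such as POSIX_WRITES and POSIX_BYTES_WRITTEN, and the field layout and counter names may differ between Darshan versions.

    import sys

    def summarize(parser_output_path):
        """Summarize write activity from darshan-parser text output.

        Assumed (tab-separated) record layout, which may vary by Darshan version:
          <module> <rank> <record id> <counter> <value> <file name> <mount pt> <fs type>
        """
        nprocs = 0          # number of MPI ranks in the instrumented run
        writes = 0          # total POSIX write calls
        bytes_written = 0   # total bytes written via POSIX
        files = set()       # distinct files that received writes
        with open(parser_output_path) as f:
            for line in f:
                if line.startswith("#"):
                    # The job header contains a line such as "# nprocs: 64".
                    if "nprocs:" in line:
                        nprocs = int(line.split()[-1])
                    continue
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 6 or fields[0] != "POSIX":
                    continue
                counter, value, filename = fields[3], fields[4], fields[5]
                if counter == "POSIX_WRITES" and int(float(value)) > 0:
                    writes += int(float(value))
                    files.add(filename)
                elif counter == "POSIX_BYTES_WRITTEN":
                    bytes_written += int(float(value))
        return {
            "ranks": nprocs,              # scale at which to run MACSio
            "files_written": len(files),  # candidate file count for parallel_file_mode
            "avg_write_request_size":     # candidate part_size
                bytes_written // writes if writes else 0,
            "total_bytes_written": bytes_written,
        }

    if __name__ == "__main__":
        for key, value in summarize(sys.argv[1]).items():
            print(f"{key}: {value}")

The derived rank count, file count, and average write size suggest the MACSio scale, the parallel_file_mode file count, and part_size, respectively; quantities such as the mesh dimension and the number of dumps still have to come from application knowledge or from the Darshan trace data, as reflected in step 3.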

After undertaking this exercise for a sample application, we assessed the feasibility of this approach, determined the limitations that currently exist in MACSio, and proposed a path forward for using MACSio as an I/O proxy application.