Modern parallel file systems achieve high performance by distributing (“striping”) the contents of a single file across multiple physical disks, which overcomes the I/O bandwidth limit of any single disk. The striping characteristics of a file determine how many disks it is striped across (the stripe count) and how large each stripe is (the stripe size). These characteristics can be set only when a file is created and cannot be changed later. Standard open-source tools typically do not take striping into account when creating files, so the files they create receive the default striping. The default stripe count is usually small, to favor small files, which are more numerous. A small default, however, penalizes large files: they are striped over fewer disks, so access to them achieves only a fraction of the performance possible with a larger stripe count. A large default, conversely, stripes small files over too many disks, which increases contention and reduces the performance of the file system as a whole.
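To make the effect of the stripe count concrete, the following minimal sketch maps a byte offset in a file to the disk that holds it. It assumes a simple round-robin layout in which consecutive stripes go to consecutive disks; real file systems may place data differently. The function name and parameters are illustrative and do not come from any particular file system.

```python
from __future__ import annotations


def locate_byte(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a file offset to (disk index, offset within that disk's data).

    Assumes round-robin striping: stripe 0 goes to disk 0, stripe 1 to
    disk 1, and so on, wrapping around after `stripe_count` disks.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")

    stripe_number = offset // stripe_size          # which stripe of the file
    disk_index = stripe_number % stripe_count      # which disk holds that stripe
    full_rounds = stripe_number // stripe_count    # stripes already stored on that disk
    disk_offset = full_rounds * stripe_size + offset % stripe_size
    return disk_index, disk_offset


if __name__ == "__main__":
    MIB = 1 << 20
    # With 4 disks and 1 MiB stripes, a sequential read touches all 4 disks in turn.
    for off in (0, MIB, 2 * MIB, 3 * MIB, 4 * MIB):
        print(off, locate_byte(off, MIB, 4))
```

With a stripe count of 1, every offset maps to disk 0, so a large file is limited to the bandwidth of that one disk. With a larger stripe count, consecutive stripes land on different disks and can be read or written in parallel.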
Retools is a set of modifications to the commonly used open-source utilities bzip2, gzip, rsync, and tar that automatically select the stripe size for created and/or extracted files according to the sizes of the files involved. These modifications make the tools “stripe-aware”, so that they can set an optimal stripe size for each file they create instead of using the default striping. The compression utilities bzip2 and gzip set the striping of a compressed file based on the size of the corresponding uncompressed file, and the striping of a decompressed file based on the size of the corresponding compressed file. The synchronization utility rsync sets the striping of each destination file based on the size of the corresponding source file. Finally, the archival utility tar sets the striping of each archived or extracted file based on the size of the corresponding source or archived file, respectively.
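The size-based selection described above can be sketched as a small policy function. This is an illustration only, not the Retools implementation: the rule of one stripe per GiB, the cap of 64 stripes, and the function names are all hypothetical. The sketch chooses a stripe count, and it assumes a Lustre file system, where `lfs setstripe -c <count> <file>` creates an empty file with the requested stripe count. It only builds the command and does not run it.

```python
from __future__ import annotations

GIB = 1 << 30

# Hypothetical policy values, chosen for illustration only.
BYTES_PER_STRIPE = GIB      # add one stripe per GiB of reference size
MAX_STRIPE_COUNT = 64       # never request more stripes than this


def choose_stripe_count(reference_size: int,
                        bytes_per_stripe: int = BYTES_PER_STRIPE,
                        max_count: int = MAX_STRIPE_COUNT) -> int:
    """Pick a stripe count from the size of the related file.

    `reference_size` is the size of the file the new file is derived from,
    e.g. the source file for a copy or the input file for compression.
    """
    if reference_size < 0:
        raise ValueError("reference_size must be non-negative")
    count = -(-reference_size // bytes_per_stripe)   # ceiling division
    return max(1, min(count, max_count))


def setstripe_command(new_path: str, reference_size: int) -> list[str]:
    """Build (but do not run) the Lustre command that would pre-create
    `new_path` with a stripe count chosen from `reference_size`."""
    count = choose_stripe_count(reference_size)
    return ["lfs", "setstripe", "-c", str(count), new_path]


if __name__ == "__main__":
    # A small file keeps a single stripe; a 200 GiB file gets the capped maximum.
    print(setstripe_command("small.out", 4096))
    print(setstripe_command("large.out", 200 * GIB))
```

Because striping is fixed when a file is created, a tool following this approach would have to create the destination file with the chosen layout before writing any data to it.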