In high-end computing environments, remote transfers of very large data sets to and from computational resources are commonplace: users are typically distributed across many organizations and must transfer data in for processing and transfer results out for further analysis. Administrators frequently perform local transfers of the same data across file systems to optimize resource utilization when new file systems come online or storage becomes imbalanced between existing file systems. In both cases, files traverse many components on the way from source to destination, and each component presents opportunities for performance optimization as well as for failure. A number of tools provide reliable and/or high-performance file transfer, but most do not support local transfers, require specific security models or transport applications, are difficult for individual users to deploy, or are not fully optimized for performance.

Shift is a lightweight framework for high-performance local and remote file transfers that provides resiliency across a wide variety of failure scenarios. It achieves this through several techniques: end-to-end integrity via cryptographic hashes, throttling of transfers to prevent resource exhaustion, balancing of transfers across resources based on load and availability, and parallelization of transfers across multiple source and destination hosts for increased redundancy and performance. In addition, Shift was specifically designed to accommodate the diverse, heterogeneous environments of a widespread user base, with minimal assumptions about operating environments.
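As an illustration of the end-to-end integrity idea, the following minimal Python sketch copies a file and then compares cryptographic hashes of the source and the destination. It is not Shift's implementation: the function names file_digest and verified_copy are hypothetical, and the choice of SHA-256 is an assumption, since the hash algorithm is not named here.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks.

    SHA-256 is an assumption made for this sketch.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so very large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_copy(src: Path, dst: Path) -> bool:
    """Copy src to dst, then confirm both files hash to the same value."""
    shutil.copyfile(src, dst)
    return file_digest(src) == file_digest(dst)
```

A caller would treat a False result as a corrupted transfer and schedule the copy again. Hashing both ends after the copy is what makes the check end-to-end: it catches corruption introduced anywhere along the path, not only on the network.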

Shift consists of a client and a manager component. A single transfer may consist of many different file operations such as creating directories, copying files, changing attributes, computing checksums, etc. The original client computes the operations that comprise the given transfer and initializes them on the manager. This client, together with any others spawned dynamically, then requests a set of operations from the manager, attempts those operations, and reports the results back to the manager. Clients may utilize different applications to carry out file operations depending on availability and underlying system characteristics.
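The request, attempt, and report cycle described above can be sketched in a few lines of Python. This is a simplified, single-process illustration under stated assumptions, not Shift's actual protocol: the Operation and Manager classes, the run_client function, and the batch size are all hypothetical names and parameters introduced for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Operation:
    kind: str    # e.g. "mkdir", "copy", "chattr", "checksum"
    target: str  # path the operation applies to


@dataclass
class Manager:
    """Hypothetical manager that tracks the operations of one transfer."""
    pending: deque = field(default_factory=deque)
    done: list = field(default_factory=list)
    failed: list = field(default_factory=list)

    def initialize(self, ops):
        # The original client registers every operation of the transfer.
        self.pending.extend(ops)

    def request(self, batch_size):
        # Hand out up to batch_size operations to the requesting client.
        batch = []
        while self.pending and len(batch) < batch_size:
            batch.append(self.pending.popleft())
        return batch

    def report(self, results):
        # Record the outcome of each attempted operation.
        for op, ok in results:
            (self.done if ok else self.failed).append(op)


def run_client(manager, attempt, batch_size=2):
    """Request, attempt, and report operations until none remain."""
    while True:
        batch = manager.request(batch_size)
        if not batch:
            break
        manager.report([(op, attempt(op)) for op in batch])
```

Because the manager holds the state, several clients could run this loop against the same manager, which is how the model allows additional clients to be spawned and the work to be divided among them.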

Shift is unique in its ability to provide advanced reliability and automatic single- and multi-file parallelization to nearly any stock local or remote transfer application, while being easily deployed by both individual users and entire organizations. This allows organizations to, in many cases, utilize their existing transfer infrastructures while significantly increasing performance and reliability. Since Shift does not require a complex infrastructure of its own, individual users can utilize its features even if the organizations with which they are associated do not.

In general, Shift replaces traditional sequential transfers, which are highly vulnerable to failures at every point along the path between the client and remote file systems, with a highly parallel model that resists such failures through multiple forms of redundancy and recovery. The remaining single points of failure are the client and remote file systems and the network interconnect between client and remote hosts; outages of these are tolerated through an intelligent retry mechanism that classifies failures by recoverability.
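A retry mechanism that classifies failures by recoverability might look like the following Python sketch. The classification rule is an assumption made for illustration: a small set of network-related errno values is treated as recoverable and everything else as permanent. The names TRANSIENT_ERRNOS, is_recoverable, and retry are hypothetical, as is the exponential backoff policy; the criteria Shift itself uses are not detailed here.

```python
import errno
import time

# Assumed set of error codes that usually indicate a temporary outage.
TRANSIENT_ERRNOS = {
    errno.ETIMEDOUT,
    errno.ECONNRESET,
    errno.ECONNREFUSED,
    errno.EHOSTUNREACH,
    errno.ENETUNREACH,
}


def is_recoverable(exc: Exception) -> bool:
    """Classify a failure: True if retrying later could plausibly succeed."""
    return isinstance(exc, OSError) and exc.errno in TRANSIENT_ERRNOS


def retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run operation, retrying recoverable failures with exponential backoff.

    Unrecoverable failures, and the final recoverable one, are re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if not is_recoverable(exc) or attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Separating classification from the retry loop means a permission error fails immediately, while a timeout during a network outage is retried rather than aborting the whole transfer.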

This work was done by Paul Kolano of Ames Research Center. NASA is seeking partners to further develop this technology through joint cooperative research and development. For more information about this technology and to explore opportunities, please contact Antoinette McCoy at 650-604-4270. ARC-16940-1