A software framework built on top of Hadoop Streaming enables the processing of binary data in the cloud while giving developers the freedom to implement their mapper and reducer programs in any language, rather than re-implementing existing solutions in Java or repackaging existing binary data into a text format. Binary data is partitioned into chunks that are kept in a persistent data storage medium. A textual list of the chunk filenames is piped into a Hadoop Streaming mapper program, which reads the corresponding files, computes block transforms locally, and writes the results back to persistent storage. The mapper program is installed on every compute node, and the filenames are distributed in parallel across the cluster, so the workload is evenly balanced and the end-to-end block-transform speedup is roughly equal to the number of nodes in the cluster.

This work was done by Michael K. Cheng and William D. Wu of Caltech for NASA’s Jet Propulsion Laboratory.

This software is available for commercial licensing. Please contact Dan Broderick. Refer to NPO-48908.



This Brief includes a Technical Support Package (TSP).
Fast Block Transforms on Large Binary Datasets in the Cloud Using Hadoop Streaming (reference NPO-48908) is currently available for download from the TSP library.





This article first appeared in the September 2015 issue of NASA Tech Briefs Magazine (Vol. 39 No. 9).



Overview

The document discusses the use of cloud computing, specifically Amazon Elastic Compute Cloud (EC2) and the MapReduce framework, to develop efficient software-based telecommunications decoders. The principal investigators, William D. Wu and Michael K. Cheng, aim to reduce development cost and increase design flexibility relative to traditional FPGA-powered systems.

The project focuses on applications such as Serially Concatenated Pulse Position Modulation (SCPPM) decoding, Reed-Solomon decoding, and Monte Carlo simulations for turbo codes. The document outlines a workflow for processing binary data, which involves uploading data to Amazon S3, executing mapper and reducer programs, and utilizing a cloud cluster for parallel processing. This approach allows for significant speed improvements, as demonstrated by the ability to simulate 100 million turbo codewords in just 1-2 hours using 150 eight-core machines, compared to 70 days without cloud resources.
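A quick arithmetic check shows those figures are internally consistent (assuming the 70-day baseline refers to a serial run, which the brief does not state explicitly):

```python
# Rough consistency check on the quoted turbo-code simulation figures.
baseline_hours = 70 * 24   # the reported 70-day baseline, in hours
cloud_hours = 1.5          # midpoint of the reported 1-2 hour cloud run
cores = 150 * 8            # 150 eight-core EC2 instances

speedup = baseline_hours / cloud_hours
# speedup is about 1120x, close to the 1200 cores available,
# which is what near-perfect parallel scaling would predict.
```

The close agreement between the observed speedup and the core count reflects how embarrassingly parallel Monte Carlo codeword simulation is: each codeword can be simulated independently.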

The document emphasizes the benefits of cloud computing, including scalability, flexibility, and cost-effectiveness. It highlights that virtual machines can be rented at low hourly rates, enabling users to harness thousands of computing cores for large datasets. The implementation of Hadoop Streaming allows for minimal changes to existing software decoders, facilitating quick adaptations to new signaling specifications.

Amdahl’s Law is referenced to validate the performance of the cloud computing solutions, confirming that the speedup achieved with multiple processors aligns with theoretical expectations. The results indicate that cloud computing is a viable option for NASA and JPL, providing a scalable solution for high-rate telecom decoding without the need for custom hardware.

The document concludes by noting the inherent redundancy of cloud storage, which offers free backups and enhances data reliability. Overall, the project demonstrates that cloud computing can significantly improve the efficiency and accuracy of telecommunications decoding, making it an attractive alternative for future NASA missions and other applications in aerospace and beyond.