Distributed RAT (DRAT) software overcomes limitations inherent in the Apache Release Audit Tool (RAT) and brings code auditing and open-source license analysis into the realm of Big Data by using scalable, open-source Apache technologies. DRAT leverages Apache Tika to automatically detect and classify files in source code repositories, determining which files are binary, which are source code, which are notes that should be skipped, and so on.
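As a rough illustration of the classification step, the following minimal sketch sorts repository paths into the categories named above. It is not DRAT's code: DRAT relies on Apache Tika's MIME detection, whereas this sketch substitutes a hand-written extension lookup, and the `classify` function and the extension sets are hypothetical names chosen for the example.

```python
from pathlib import PurePosixPath

# Hypothetical extension tables used only for illustration. DRAT itself
# relies on Apache Tika's MIME detection, not a hand-written lookup.
SOURCE_EXTS = {".c", ".cpp", ".h", ".java", ".js", ".py"}
NOTES_EXTS = {".md", ".rst", ".txt"}
BINARY_EXTS = {".class", ".jar", ".png", ".so", ".zip"}


def classify(path: str) -> str:
    """Return 'source', 'notes', 'binary', or 'unknown' for a file path."""
    ext = PurePosixPath(path).suffix.lower()
    if ext in SOURCE_EXTS:
        return "source"
    if ext in NOTES_EXTS:
        return "notes"
    if ext in BINARY_EXTS:
        return "binary"
    return "unknown"
```

A content-based detector such as Tika can also classify files whose names carry no useful extension, which this name-only stand-in cannot do.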

DRAT also uses Apache Solr to perform interactive analytics on code repositories, with metadata extracted by Apache Tika, and uses Apache OODT to run RAT in a MapReduce workflow. The repository is split using Tika by MIME type (e.g., C/C++, Java, JavaScript) and into chunks of a configurable size of K files. Each Mapper task is an instance of RAT running on one K-file-sized, per-MIME-type chunk. Each Mapper produces an incremental, intermediate log file, and the Reducer aggregates the individual log files.
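The chunk-then-aggregate workflow described above can be sketched in a few lines. This is a single-process illustration under stated assumptions, not DRAT's implementation: the real system runs RAT through Apache OODT, while here the `audit` callable is a hypothetical stand-in for a RAT run, and `partition`, `map_chunk`, `reduce_logs`, and `run_audit` are names invented for the example.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Chunk = Tuple[str, List[str]]          # (MIME type, file paths)
LogEntry = Tuple[str, str, str]        # (MIME type, path, verdict)


def partition(files_by_type: Dict[str, List[str]], k: int) -> List[Chunk]:
    """Split each per-MIME-type file list into chunks of at most k files."""
    if k < 1:
        raise ValueError("k must be at least 1")
    chunks: List[Chunk] = []
    for mime_type in sorted(files_by_type):
        files = files_by_type[mime_type]
        for start in range(0, len(files), k):
            chunks.append((mime_type, files[start:start + k]))
    return chunks


def map_chunk(chunk: Chunk, audit: Callable[[str], str]) -> List[LogEntry]:
    """Mapper: audit one chunk and return its intermediate log.

    `audit` is a hypothetical stand-in for running RAT on a file.
    """
    mime_type, files = chunk
    return [(mime_type, path, audit(path)) for path in files]


def reduce_logs(logs: List[List[LogEntry]]) -> Dict[Tuple[str, str], int]:
    """Reducer: merge intermediate logs into counts per (type, verdict)."""
    totals: Counter = Counter()
    for log in logs:
        for mime_type, _path, verdict in log:
            totals[(mime_type, verdict)] += 1
    return dict(totals)


def run_audit(files_by_type: Dict[str, List[str]], k: int,
              audit: Callable[[str], str]) -> Dict[Tuple[str, str], int]:
    """Run every mapper in turn, then aggregate with the reducer."""
    logs = [map_chunk(chunk, audit) for chunk in partition(files_by_type, k)]
    return reduce_logs(logs)
```

Because each chunk is audited independently, the mapper calls could be distributed across workers without changing the reducer, which is the property the workflow depends on.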

This work was done by Chris A. Mattmann, Paul M. Ramirez, Michael J. Joyce, Shakeh E. Khudikyan, Maziyar Boustani, Rishi Verma, and Lewis J. McGibbney of Caltech; and Tyler S. Palsulich for NASA’s Jet Propulsion Laboratory.

This software is available for commercial licensing. Please contact Dan Broderick. Refer to NPO-49562.