STITCH is a computer program that processes raw nucleotide-sequence data to automatically remove unwanted vector information, perform reverse-complement comparison, stitch shorter sequences together to make longer ones to which the shorter ones presumably belong, and search against the user's choice of private and Internet-accessible public 16S rRNA databases. ["16S rRNA" denotes a ribosomal ribonucleic acid (rRNA) sequence that is found in all bacteria and archaea.] In STITCH, a template 16S rRNA sequence is used to position forward and reverse reads. STITCH then automatically searches known 16S rRNA sequences in the user's chosen database(s) to find the sequence most similar to each spliced sequence, that is, the sequence that lies at the smallest edit distance from it.
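The reverse-complement and smallest-edit-distance steps described above can be illustrated with a minimal Python sketch. This is not STITCH's code: the function names, the toy sequences, and the choice of plain Levenshtein distance as the edit-distance measure are all assumptions made for illustration.

```python
from __future__ import annotations


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G, N stays N)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def closest_reference(query: str, references: dict[str, str]) -> tuple[str, int]:
    """Return (name, distance) of the reference at the smallest edit distance."""
    return min(
        ((name, edit_distance(query, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )


# Toy data invented for illustration; real 16S rRNA sequences are far longer.
references = {
    "ref_A": "ACGTACGTGGCA",
    "ref_B": "ACGTTCGTGGCA",
    "ref_C": "TTTTGGGGCCCC",
}
query = "ACGTTCGTGCCA"

print(reverse_complement("AACG"))            # CGTT
print(closest_reference(query, references))  # ('ref_B', 1)
```

A full-table edit distance costs time proportional to the product of the two sequence lengths, so a practical tool searching thousands of references would likely use a faster alignment or indexing method; the brute-force loop here only shows the idea.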
The result of processing by STITCH is the identification of the most similar well-described bacterium. Whereas previously available commercial software for analyzing genetic sequences operates on one sequence at a time, STITCH can manipulate multiple sequences simultaneously to perform the aforementioned operations. A typical analysis of several dozen sequences (each on the order of 1,000 base pairs long) by use of STITCH is completed in a few minutes, whereas such an analysis performed by use of prior software takes hours or days.
This program was written by Shariff Osman and Kasthuri Venkateswaran of Caltech; George Fox of Dept. of Biology and Biochemistry, University of Texas, Houston; and Dianhui Zhu of Dept. of Computer Sciences, University of Texas, Houston for NASA's Jet Propulsion Laboratory.
In accordance with Public Law 96-517, the contractor has elected to retain title to this invention. Inquiries concerning rights for its commercial use should be addressed to:
Innovative Technology Assets Management
JPL
Mail Stop 202-233
4800 Oak Grove Drive
Pasadena, CA 91109-8099
Refer to NPO-44785, volume and number of this NASA Tech Briefs issue, and the page number.
This Brief includes a Technical Support Package (TSP).

Automated Identification of Nucleotide Sequences (reference NPO-44785) is currently available for download from the TSP library.
Overview
The document outlines the development and capabilities of a software program designed for the automated identification of nucleotide sequences, particularly focusing on the 16S rRNA gene, which is crucial for characterizing and identifying bacteria. This software, referred to as "STITCH," was developed by a collaborative team from NASA's Jet Propulsion Laboratory (JPL) and the University of Houston.
STITCH addresses the challenges associated with traditional DNA sequencing methods, which often require extensive manual effort and time. For instance, tasks that could take approximately 40 man-hours using conventional techniques can be completed in about an hour with STITCH. The software is capable of rapidly processing several dozen sequences, each around 1,000 base pairs in length, in just a few minutes.
The program operates by utilizing a standard dataset of 56 phylogenetically diverse complete sequences to select an appropriate template for comparison. It employs algorithms to confirm the validity of read pairs by calculating edit distances between fragments and the template. The software only splices valid pairs of forward and reverse reads, ensuring that the resulting sequences are accurate and reliable. If discrepancies are found in overlapping regions of the reads, a consensus sequence is generated to maintain integrity.
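The validate-then-splice logic in this paragraph can be sketched in Python as follows. The sketch rests on several assumptions that the document does not state: reads are positioned on the template by a simple sliding mismatch count (a substitution-only stand-in for a true edit distance), the validity threshold `max_mismatch_fraction` is a hypothetical parameter, the reverse read is assumed to be already reverse-complemented, and disagreements in the overlap are written as the ambiguity code "N" because the actual consensus rule is not described.

```python
from __future__ import annotations


def best_offset(read: str, template: str) -> tuple[int, int]:
    """Slide the read along the template; return (offset, mismatches) of the best fit."""
    best = (0, len(read) + 1)  # sentinel: worse than any real placement
    for offset in range(len(template) - len(read) + 1):
        window = template[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best[1]:
            best = (offset, mismatches)
    return best


def stitch_pair(forward: str, reverse_rc: str, template: str,
                max_mismatch_fraction: float = 0.2) -> str | None:
    """Splice a read pair positioned on the template; return None if the pair is invalid."""
    f_off, f_mis = best_offset(forward, template)
    r_off, r_mis = best_offset(reverse_rc, template)

    # Hypothetical validity rule: each read must fit the template closely enough.
    if f_mis > max_mismatch_fraction * len(forward):
        return None
    if r_mis > max_mismatch_fraction * len(reverse_rc):
        return None

    start = min(f_off, r_off)
    end = max(f_off + len(forward), r_off + len(reverse_rc))
    merged = []
    for pos in range(start, end):
        f_base = forward[pos - f_off] if f_off <= pos < f_off + len(forward) else None
        r_base = reverse_rc[pos - r_off] if r_off <= pos < r_off + len(reverse_rc) else None
        if f_base is None and r_base is None:
            return None                      # gap between reads: nothing to splice
        if f_base is None or r_base is None:
            merged.append(f_base or r_base)  # position covered by one read only
        elif f_base == r_base:
            merged.append(f_base)            # reads agree in the overlap
        else:
            merged.append("N")               # reads disagree: mark as ambiguous
    return "".join(merged)


# Toy data invented for illustration.
template = "ACGTACGGTCAATGCCGTAG"
forward = "ACGTACGGTTAA"      # template[0:12] with one miscalled base
reverse_rc = "TCAATGCCGTAG"   # template[8:20], already reverse-complemented

print(stitch_pair(forward, reverse_rc, template))         # ACGTACGGTNAATGCCGTAG
print(stitch_pair("TTTTTTTTTTTT", reverse_rc, template))  # None
```

Using the template only to position the reads, and taking the spliced bases from the reads themselves, keeps the output a record of what was actually sequenced rather than a copy of the template.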
Additionally, STITCH can automatically search for the closest 16S rRNA sequences using either a local database or the online NCBI database. This feature enhances the identification process, although users may encounter uninformative hits when using the online option. The local database contains 5,482 type strain sequences, allowing for more precise identification of well-named bacteria.
One of the key advantages of STITCH over existing software is its ability to simultaneously trim, stitch, and identify sequences, a feature not available in other applications. This integration streamlines the workflow for researchers and significantly reduces the time required for analysis.
In summary, the document presents STITCH as a tool for microbial identification that combines trimming, stitching, and database searching in a single program to make the processing of nucleotide sequences faster and more accurate. It also suggests potential applications beyond traditional microbiology, in other scientific and commercial fields.

