Luthorj is a computer program that creates static lexical analyzers in the Java programming language, in the same sense in which Flex and Lex create lexical analyzers in the C programming language. Most users of Luthorj are expected to be familiar with Lex, and Luthorj parses input files that are largely the same as Lex files. However, Luthorj is not merely a Java look-alike of Lex. Its functionality is partly compatible with that of Flex, but Luthorj and Flex use different methods to provide similar functionality. The lexical analyzers created by Luthorj convert textual strings into tokens that, in turn, can be fed to parsers created by the Saxj program described in the preceding article. Luthorj converts input string specifications to lexical-analysis data structures by use of an algorithm that converts regular expressions to nondeterministic finite automata (NFA). The NFA are then mapped to deterministic finite automata (DFA). The combination of all DFA is represented as a transition table, which is stored in a file. The outputs of Luthorj are the transition table and the code to use it.
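
To make the idea of "a transition table and the code to use it" concrete, the following is a minimal hand-written sketch of a table-driven scanner in Java. It is not Luthorj output, and every name in it (DfaTableSketch, TRANSITIONS, ACCEPT, tokenize) is a hypothetical chosen for illustration. The table here is written by hand for two simple token types; a generator such as Luthorj would instead derive it from regular expressions by way of NFA and DFA.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT code generated by Luthorj.
final class DfaTableSketch {
    // Character classes (columns of the table).
    static final int LETTER = 0, DIGIT = 1, OTHER = 2;
    // States (rows of the table). DEAD means "no transition exists".
    static final int START = 0, IN_IDENT = 1, IN_NUMBER = 2, DEAD = -1;

    // TRANSITIONS[state][characterClass] gives the next state.
    static final int[][] TRANSITIONS = {
        /* START     */ { IN_IDENT, IN_NUMBER, DEAD },
        /* IN_IDENT  */ { IN_IDENT, IN_IDENT,  DEAD },
        /* IN_NUMBER */ { DEAD,     IN_NUMBER, DEAD },
    };

    // Token name for each accepting state; null marks a non-accepting state.
    static final String[] ACCEPT = { null, "IDENT", "NUMBER" };

    static int classOf(char c) {
        if (c >= 'a' && c <= 'z') return LETTER;
        if (c >= '0' && c <= '9') return DIGIT;
        return OTHER;
    }

    // Splits the input into tokens using longest match.
    // Characters that cannot start any token are skipped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int state = START;
            int lastAcceptEnd = -1;       // end index of the longest match so far
            String lastAcceptName = null; // token name of that match
            int i = pos;
            while (i < input.length()) {
                state = TRANSITIONS[state][classOf(input.charAt(i))];
                if (state == DEAD) break;
                i++;
                if (ACCEPT[state] != null) {
                    lastAcceptEnd = i;
                    lastAcceptName = ACCEPT[state];
                }
            }
            if (lastAcceptEnd < 0) {
                pos++; // no token starts at this character
            } else {
                tokens.add(lastAcceptName + ":" + input.substring(pos, lastAcceptEnd));
                pos = lastAcceptEnd;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 = 42 + y"));
    }
}
```

The scanning loop never inspects a regular expression at run time; it only indexes the table, which is why the table can be computed once and stored.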

This program was written by Richard Weidner of Caltech for NASA's Jet Propulsion Laboratory. For further information, access the Technical Support Package (TSP) free on-line at www.nasatech.com/tsp under the Software category.

This software is available for commercial licensing. Please contact Don Hart of the California Institute of Technology at (818) 393-3425. Refer to NPO-21054.



This Brief includes a Technical Support Package (TSP). "Program Creates Java Lexical Analyzers" (reference NPO-21054) is currently available for download from the TSP library.

This article first appeared in the May, 2001 issue of NASA Tech Briefs Magazine (Vol. 25 No. 5).


Overview

The document discusses Luthorj, a lexical analyzer generator developed at NASA's Jet Propulsion Laboratory (JPL). It creates static lexical analyzers in Java, much as Flex and Lex do in C. Luthorj aims to simplify lexical analysis by converting regular expressions into efficient data structures, which can then be used to recognize patterns in input strings.

The document begins by explaining the role of a lexical analyzer, which processes input sequences to identify tokens. It highlights that while traditional lexical analyzers may not return tokens directly, they can filter input strings and maintain context for subsequent analysis. Luthorj is specifically designed to work with input specifications used by Lex, producing Java classes and offering a more robust interface compared to the global interface of traditional tools.

Regular expressions are a key feature of Luthorj, providing a shorthand method for describing string patterns. The document emphasizes the importance of keeping regular expressions simple to enhance recognition speed and reduce complexity. It provides examples of regular expressions, illustrating how they can match specific strings or patterns, such as matching variations of the word "March" or recognizing sequences of whitespace characters.
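
As a rough illustration of the two kinds of pattern mentioned above, the sketch below uses Java's standard java.util.regex package as a stand-in. The exact expressions from the Luthorj documentation are not reproduced in this summary, so the patterns here are assumptions, and Lex-style syntax differs in some details from java.util.regex syntax. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration using java.util.regex, not Luthorj itself.
final class RegexExamplesSketch {
    // Assumed pattern: "Mar" or "March", with the first letter in either case.
    static final Pattern MONTH = Pattern.compile("[Mm]ar(ch)?");
    // One or more blanks, tabs, or newlines.
    static final Pattern WHITESPACE = Pattern.compile("[ \\t\\n]+");

    // True only if the whole string matches the month pattern.
    static boolean isMarch(String s) {
        return MONTH.matcher(s).matches();
    }

    // True only if the whole string is a non-empty run of whitespace.
    static boolean isWhitespace(String s) {
        return WHITESPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMarch("March"));
        System.out.println(isMarch("Marc"));
        System.out.println(isWhitespace(" \t\n"));
    }
}
```

Both patterns are deliberately short, in keeping with the advice to keep regular expressions simple.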

The document also discusses Equivalence Character Sets (ECS), which allow efficient matching of alternative characters. An ECS handles many alternatives with the same overhead as a single character, which makes it particularly useful in complex patterns. The document also covers optional sets of characters and how to define patterns that match specific lengths.
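
One common way to get "many alternatives for the cost of one" is to map every character to a small class number through a lookup array, so that testing membership in a set is a single array index no matter how large the set is. The sketch below assumes that ECS works along these lines, as equivalence classes do in Flex; the summary does not confirm Luthorj's actual mechanism, and all names are hypothetical.

```java
// Hypothetical illustration of character equivalence classes; not Luthorj code.
final class EquivalenceClassSketch {
    static final int OTHER = 0, VOWEL = 1, DIGIT = 2;

    // CLASS_OF[c] is the class of ASCII character c; entries default to OTHER (0).
    static final int[] CLASS_OF = new int[128];
    static {
        for (char c : "aeiouAEIOU".toCharArray()) CLASS_OF[c] = VOWEL;
        for (char c = '0'; c <= '9'; c++) CLASS_OF[c] = DIGIT;
    }

    // One array lookup, regardless of how many characters share a class.
    static int classOf(char c) {
        return c < 128 ? CLASS_OF[c] : OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classOf('e') == VOWEL);
        System.out.println(classOf('7') == DIGIT);
        System.out.println(classOf('x') == OTHER);
    }
}
```

A transition table indexed by class number instead of by raw character then needs only one column per class, which keeps the table small.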

Throughout the document, there is a focus on the challenges and intricacies of using regular expressions, acknowledging their potential for errors and difficulties in debugging. The author encourages users to familiarize themselves with the regular expression syntax and provides guidance on creating effective patterns.

In conclusion, the document serves as a technical overview of Luthorj, detailing its capabilities, the use of regular expressions, and best practices for lexical analysis. It is a valuable resource for developers looking to implement lexical analysis in Java, offering insights into the design and functionality of the Luthorj tool. For commercial licensing inquiries, the document directs readers to contact Don Hart at Caltech.