Program Creates Java Lexical Analyzers

NASA’s Jet Propulsion Laboratory, Pasadena, California

Luthorj is a computer program that creates static lexical analyzers in the Java programming language, in the same sense in which Flex and Lex create lexical analyzers in the C programming language. Most users of Luthorj are expected to be familiar with Lex, and Luthorj parses input files that are largely the same as Lex files. However, Luthorj is not merely a Java look-alike of Lex. The functionality of Luthorj is partly compatible with that of Flex, but Luthorj and Flex use different methods to provide similar functionality. The lexical analyzers created by Luthorj convert textual strings into tokens that, in turn, can be fed to parsers created by the Saxj program described in the preceding article. Luthorj converts input string specifications to lexical-analysis data structures by use of an algorithm that converts regular expressions to nondeterministic finite automata (NFA). The NFA are then mapped to deterministic finite automata (DFA). The combination of all the DFA is represented as a transition table, which is stored in a file. The outputs of Luthorj are the transition table and the code to use it.
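To make the idea of "a transition table and the code to use it" concrete, the following is a minimal sketch of a table-driven scanner in Java. It is not Luthorj's actual output: the class name TableScanner, the three character classes, the table contents, and the token names are all hypothetical, and the table is written by hand rather than derived from regular expressions. The scanner applies the usual longest-match rule and silently skips characters that start no token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the format Luthorj generates.
class TableScanner {
    // Columns of the table: 0 = letter, 1 = digit, 2 = anything else.
    static int classOf(char c) {
        if (Character.isLetter(c)) return 0;
        if (Character.isDigit(c)) return 1;
        return 2;
    }

    // Rows are DFA states: 0 = start, 1 = in identifier, 2 = in integer.
    // An entry of -1 means "no transition".
    static final int[][] NEXT = {
        { 1,  2, -1},  // start
        { 1,  1, -1},  // identifier: letters or digits continue it
        {-1,  2, -1},  // integer: only digits continue it
    };

    // Token name for each accepting state; null marks a non-accepting state.
    static final String[] ACCEPT = { null, "IDENT", "INT" };

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int state = 0;
            int lastAccept = -1;
            int lastEnd = pos;
            // Run the DFA as far as it goes, remembering the last accepting state.
            for (int i = pos; i < input.length(); i++) {
                state = NEXT[state][classOf(input.charAt(i))];
                if (state < 0) break;
                if (ACCEPT[state] != null) {
                    lastAccept = state;
                    lastEnd = i + 1;
                }
            }
            if (lastAccept < 0) {
                pos++;  // no token starts here; skip one character
                continue;
            }
            tokens.add(ACCEPT[lastAccept] + ":" + input.substring(pos, lastEnd));
            pos = lastEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("x1 = 42"));  // prints [IDENT:x1, INT:42]
    }
}
```

In a generated analyzer the table would be produced from the regular expressions through the NFA-to-DFA steps described above; only the small driver loop would stay essentially the same.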

This program was written by Richard Weidner of Caltech for NASA's Jet Propulsion Laboratory. For further information, access the Technical Support Package (TSP) free on-line at www.nasatech.com/tsp under the Software category.

This software is available for commercial licensing. Please contact Don Hart of the California Institute of Technology at (818) 393-3425. Refer to NPO-21054.

This Brief includes a Technical Support Package (TSP).

The Technical Support Package for "Program Creates Java Lexical Analyzers" (reference NPO-21054) is currently available for download from the TSP library.

Overview

The Technical Support Package describes Luthorj, a lexical analyzer generator developed at NASA's Jet Propulsion Laboratory (JPL). Luthorj creates static lexical analyzers in Java, much as Flex and Lex do in C. It converts regular expressions into efficient data structures, which are then used to recognize patterns in input strings.

The document begins by explaining the role of a lexical analyzer, which processes input sequences to identify tokens. It notes that a lexical analyzer need not return tokens directly: it can also filter input strings and maintain context for subsequent analysis. Luthorj accepts the input specifications used by Lex but produces Java classes, which offer a more robust interface than the global interface of the traditional tools.

Regular expressions are a key feature of Luthorj, providing a shorthand method for describing string patterns. The document emphasizes the importance of keeping regular expressions simple to enhance recognition speed and reduce complexity. It provides examples of regular expressions, illustrating how they can match specific strings or patterns, such as matching variations of the word "March" or recognizing sequences of whitespace characters.
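As a hedged illustration of the two kinds of pattern mentioned above, the sketch below uses the standard java.util.regex package rather than Luthorj itself. The exact expressions in the Technical Support Package are not reproduced here, so the "March" pattern is a hypothetical stand-in; the bracket and optional-group notation shown is common to Lex-style and Java regular expressions, although the two syntaxes differ in other respects.

```java
import java.util.regex.Pattern;

// Illustration with java.util.regex; the patterns are assumptions, not quotes from the TSP.
class RegexExamples {
    // Matches "March", "march", "Mar", or "mar": either case for the first
    // letter, with the trailing "ch" optional.
    static final Pattern MARCH = Pattern.compile("[Mm]ar(ch)?");

    // Matches one or more blanks, tabs, or newlines.
    static final Pattern WHITESPACE = Pattern.compile("[ \\t\\n]+");

    static boolean isMarch(String s) {
        return MARCH.matcher(s).matches();
    }

    static boolean isWhitespace(String s) {
        return WHITESPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMarch("March"));        // true
        System.out.println(isMarch("mar"));          // true
        System.out.println(isMarch("Marc"));         // false
        System.out.println(isWhitespace(" \t\n"));   // true
        System.out.println(isWhitespace(""));        // false
    }
}
```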

The document also discusses Equivalence Character Sets (ECS), which allow efficient matching of alternative characters. An ECS handles multiple alternatives with the same overhead as a single character, making it particularly useful for complex patterns. The document additionally covers optional sets of characters and how to define patterns that match strings of specific lengths.
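One common way to give a set of alternatives the same cost as a single character is to map every character to a class index once, so that matching needs only one array lookup however many alternatives the set contains. Whether Luthorj implements its ECS this way is an assumption; the class name EquivalenceClasses, the method names, and the ASCII-only restriction below are hypothetical choices made for this sketch.

```java
// Hypothetical sketch of character-to-class mapping; not taken from Luthorj.
class EquivalenceClasses {
    static final int OTHER = 0;  // class for characters that belong to no set

    // Builds a lookup array in which all members of sets[k] share class k + 1.
    // Assumes the sets are disjoint and contain only ASCII characters.
    static int[] buildClassMap(String... sets) {
        int[] classOf = new int[128];  // every entry starts as OTHER
        for (int k = 0; k < sets.length; k++) {
            for (char c : sets[k].toCharArray()) {
                classOf[c] = k + 1;
            }
        }
        return classOf;
    }

    // One array access per character, regardless of how large each set is.
    static int lookup(int[] classOf, char c) {
        return c < classOf.length ? classOf[c] : OTHER;
    }

    public static void main(String[] args) {
        int[] map = buildClassMap("0123456789", " \t\n");
        System.out.println(lookup(map, '7'));   // 1 (digit set)
        System.out.println(lookup(map, '\t'));  // 2 (whitespace set)
        System.out.println(lookup(map, 'x'));   // 0 (no set)
    }
}
```

A transition table indexed by class rather than by raw character then needs one column per set instead of one column per character.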

Throughout the document, there is a focus on the challenges and intricacies of using regular expressions, acknowledging their potential for errors and difficulties in debugging. The author encourages users to familiarize themselves with the regular expression syntax and provides guidance on creating effective patterns.

In conclusion, the document serves as a technical overview of Luthorj, detailing its capabilities, the use of regular expressions, and best practices for lexical analysis. It is a valuable resource for developers looking to implement lexical analysis in Java, offering insights into the design and functionality of the Luthorj tool. For commercial licensing inquiries, the document directs readers to contact Don Hart at Caltech.

May 1, 2001 | Software


