NASA Engineering & Safety Center (NESC) subject matter experts analyze records in various International Space Station and shuttle databases to identify recurring anomalies. The key problems these experts face in analyzing such database records are:
- It is difficult to conduct integrated searches across multiple databases and to customize how the results are presented.
- It is difficult to run the clustering algorithms that automate analysis of such data on data sets drawn from the various databases.
The objective of this invention is to make analysis of recurring anomalies easy for end users. It lets them select databases, search terms, and clustering algorithms, and then complete their analysis with a single button click.
The system integrates XML Query-Based Search with Text Mining. Its initial application is discovering recurring anomalies associated with NASA missions. Users select the database(s) they wish to query or mine and enter their search terms, then either view the search results or click a Text Mining button to start an analysis that uses several types of unsupervised clustering algorithms. When the analysis is complete, they receive an e-mail notification with a link to the text mining results. Initial clustering uses a modified version of the von Mises-Fisher (vMF) clustering algorithm. Other clustering algorithms use both content-based similarity and statistical similarity.
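The vMF clustering step can be illustrated with a minimal sketch. The modifications made to the vMF algorithm are not described here, so the code below shows plain spherical k-means instead. Spherical k-means is what a mixture of vMF distributions reduces to when every cluster shares the same concentration parameter and each document is hard-assigned to its most likely cluster. The sketch assumes simple term-frequency vectors, cosine (content-based) similarity, and farthest-first seeding; the function names and sample reports are hypothetical.

```python
import math
import re
from collections import defaultdict

# Plain spherical k-means: a hard-assignment vMF mixture with a shared concentration.
# This is NOT the modified vMF variant described in the text; all names are hypothetical.

def unit_vector(text):
    """Term-frequency vector for `text`, scaled to unit Euclidean length."""
    counts = defaultdict(float)
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        counts[term] += 1.0
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}

def cosine(u, v):
    """Cosine similarity; for unit vectors this is just the dot product."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def mean_direction(vectors):
    """Normalized vector sum: the maximum-likelihood vMF mean direction."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}

def spherical_kmeans(docs, k, iterations=20):
    """Return a cluster label (0..k-1) for each document string."""
    vectors = [unit_vector(d) for d in docs]
    # Farthest-first seeding: each new seed is the document least similar to existing seeds.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        # Assignment step: each document joins its most similar mean direction.
        new_labels = [max(range(k), key=lambda j: cosine(vec, centroids[j])) for vec in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each non-empty cluster's mean direction.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_direction(members)
    return labels

reports = [
    "valve leak in coolant loop",
    "coolant loop valve leak detected",
    "software reboot of flight computer",
    "flight computer software reboot anomaly",
]
print(spherical_kmeans(reports, 2))  # [0, 0, 1, 1]
```

Normalizing every document to unit length is what makes this a spherical method: clusters are defined by direction (which terms a report uses) rather than by document length.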
A user selects one or more databases to search, optionally enters context and/or content search terms, optionally selects one or more data mining algorithms, and then clicks Search and/or Text Mining. If Search is selected, the context+content search results for the selected databases are displayed in the user's web browser. If Text Mining is selected, a request is submitted to a text mining server with the specified databases, context+content search terms, and clustering algorithms. When text mining finishes, the results are formatted in XML and uploaded to the XML database server, and the user is sent an e-mail notification containing a URL link to the results. Users can also select sets of documents in the text mining viewer and initiate a query on just those documents.
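The Search and Text Mining paths can be sketched end to end as follows. This is a hypothetical, in-memory stand-in. The actual system queries an XML database server, whereas this sketch parses small XML strings with Python's `xml.etree.ElementTree`. It treats the *context* term as the name of an XML element and the *content* term as a keyword inside that element, which is an assumption about how the two combine. It also builds the notification e-mail but does not send it. The database contents, element names, function names, and URL are invented for illustration, and the clustering step is replaced by placeholder labels.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage

# Hypothetical stand-ins for two problem-report databases exported as XML.
DATABASES = {
    "iss_reports": """<records>
        <record id="ISS-1"><system>ECLSS</system><description>Coolant pump leak</description></record>
        <record id="ISS-2"><system>Avionics</system><description>Computer reboot</description></record>
    </records>""",
    "shuttle_reports": """<records>
        <record id="STS-7"><system>APU</system><description>Hydraulic leak at fitting</description></record>
    </records>""",
}

def context_content_search(db_names, context, content):
    """Return (database, record id) pairs whose <context> element contains `content`.

    `context` names the XML element to look in; `content` is a case-insensitive keyword.
    """
    hits = []
    for name in db_names:
        root = ET.fromstring(DATABASES[name])
        for record in root.iter("record"):
            for field in record.iter(context):
                if content.lower() in (field.text or "").lower():
                    hits.append((name, record.get("id")))
                    break  # one hit per record is enough
    return hits

def results_to_xml(hits, labels):
    """Serialize clustered hits as XML (what would be uploaded to the XML database server)."""
    root = ET.Element("textMiningResults")
    for (db, rec_id), label in zip(hits, labels):
        ET.SubElement(root, "document", database=db, id=rec_id, cluster=str(label))
    return ET.tostring(root, encoding="unicode")

def notification(user, results_url):
    """Build (but do not send) the e-mail that points the user at the results."""
    msg = EmailMessage()
    msg["To"] = user
    msg["Subject"] = "Text mining results are ready"
    msg.set_content(f"Your text mining results are available at {results_url}")
    return msg

hits = context_content_search(["iss_reports", "shuttle_reports"], "description", "leak")
print(hits)  # [('iss_reports', 'ISS-1'), ('shuttle_reports', 'STS-7')]
xml_results = results_to_xml(hits, [0] * len(hits))  # placeholder cluster labels
msg = notification("analyst@example.org", "https://example.org/results/42")
```

In a deployed setting the message would be handed to an SMTP client such as `smtplib.SMTP.send_message`, and the labels would come from a real clustering step rather than placeholders.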