As the cost of micro-disk and NAND Flash continue to drop, devices are storing more and more data. It is common now for a person's MP3 player to have more storage than their laptop. But this increase in storage capacity has not been matched with advances in the user interface. Typically, users still wrestle with a folder-based interface to find the data they want, searching by a few vendor-defined categories such as artist, album, and genre. But a new class of embedded database management systems (DBMS) is emerging to allow end users to search the way people think, rather than in this static manner. With a RAM footprint ranging from a few tens to a few hundred kilobytes, these products enable developers to deliver this sophisticated search on mobile devices. So how do they work? How do you write an embedded application to use a relational DBMS (RDBMS)? While there are a few kinds of DBMS, the relational model has triumphed over all the others, largely because it abstracts the data structures so that applications don't have to know them. A relational database management system offers a standard, high-level query language that allows access to data by content, not by pointer or location and offset.
An RDBMS stores data in tables made up of distinct rows. Each row is made up of named columns, each of a specific data type. Columns may be of variable length. Rows are identified by the primary key column, though an application can search on the contents of any column. It is common for a row in one table to hold the value of the primary key of another table to enable cross-referencing. These values are called foreign keys. For example, a row in the 'orders' table may hold the primary key of a row in the 'customers' table, enabling a quick lookup of the customer data for any given order.
Rapid access to rows in tables is by means of indexes that are defined by the developer and maintained by the RDBMS. Typically, these implement some version of B-Trees that allow a binary search of the index data to locate the desired rows.
Every RDBMS implements transactions. These are famously described as ACID, meaning atomic, consistent, isolated, and durable. The atomic guarantee is that a transaction occurs completely or not at all. Every transaction changes the database from one consistent state to another, and any change that would violate the data integrity rules is rejected. Transactions appear to occur in isolation because no application will ever see an intermediate state of a transaction. The changes made by transactions are durable, surviving power failure and restart.
An RDBMS offers fast, indexed access to data by content; controls data access so that data can be shared safely by other applications; and automatically maintains the database in a consistent state, even after unexpected shutdown. This is a lot of value from a small package, and it's easy to embed an RDBMS into a device application (see figure). The first step is to connect to the database. In this step, the RDBMS validates that the application has permission to connect, and the application can set parameters such as the amount of memory to allocate for buffers. Typically, an application will connect on startup and remain connected until it shuts down.
The next step is to allocate a statement handle. This defines a data structure that holds a SQL statement. Each handle manages exactly one SQL statement, but handles can be reused with different SQL statements once a query is completed. A program can allocate many statement handles to have them ready with predefined queries in order to increase application speed.
A SQL statement is "prepared" by copying it into the statement handle data structure, using the function supplied by the RDBMS vendor's library. For maximum flexibility, SQL statements usually are parameterized, making it easy to rerun the SQL with different parameter values. For example, an SQL query may look up an album title given an artist's name. The name can be defined as a parameter, so the same SQL statement can be re-used to look up the phone number for John Coltrane and for Moby. The SQL statement would look something like this:
SELECT album_title FROM album_ list WHERE artist_name = ?
Setting parameters is a matter of calling the appropriate API to plug the parameters into the statement.
Now, the SQL statement is ready to execute. Executing an SQL statement returns a result set made up of zero to many rows. The API library provides functions for stepping through the rows one at a time using a construct called a cursor. Each time a row is fetched into the application, the cursor automatically increments to point to the next row. An error return indicates that there are no more rows to process.
With each fetched row, column variables can be copied into program variables, or pointers can be set to point to the row variables in the database buffer. APIs are provided to perform these operations. The programmer must take care that the program variable data types are suitable to receive the column data. Usually, the library will perform the obvious conversions: integer to float, variable length character string to fixed length character string, etc. At this point, data has been returned from the database to the program, and the program is free to use it, stepping the cursor through each of the rows returned in the result set.
A program does not have to process all the returned rows; it can close the cursor at any time, abandoning any unprocessed rows. Or, it may process to the end of the result set. Once the cursor is closed, the SQL statement can be re-used with different parameters. Alternatively, the statement handle can be re-used to support an entirely different query by going back to the prepare statement step. Good practice is to disconnect from the database on application shutdown.
The great benefit of an RDBMS is that it provides this simple mechanism for searching data by content and handling the multiple matching results. Compared with setting up a program to sweep through files stored in the file system or scan the columns of a table to search for matching content, database access is much easier to program and much faster at execution time. With the RDBMS taking care of index maintenance, data integrity, data sharing, and recovery, the application programmer's job is simplified greatly. This means less code to write, fewer bugs to fix, reduced development costs, a more responsive user interface, and faster time to market. Additionally, it frees device developers from the more mundane tasks of data management and search, allowing them to focus instead on features to differentiate the device by providing innovative ways to search for local content.