Embedded Databases: Data Management for Real-Time and Embedded Systems
- Monday, 23 April 2007
The term embedded database was coined in the 1980s to mean a database management system (DBMS) that is embedded in an application, in contrast to large central databases (today, usually client/server DBMSs à la Oracle). The first embedded databases had little or nothing to do with embedded systems, which were largely 8-bit, or possibly 16-bit, devices that performed a very specific function; any data processing requirements were pushed up to a higher layer of the system architecture. Embedded systems, like every other facet of computing, have since matured, gaining faster (32-bit) processors, more memory, and greater complexity. This has further muddied conversations about embedded systems and embedded databases. Today, the term embedded database covers both databases embedded in software applications and the more modern client/server database design (although embedded client/server varieties are much smaller than their enterprise-level cousins such as Oracle or DB2). In fact, while embedded databases make up a sizeable chunk of the overall database market, they vary remarkably in important respects such as programming interfaces, storage modes, and system architecture. This article examines some of these differences to help in choosing the right embedded database system for a given project.
SQL vs. Navigational APIs
A database system’s application programming interface (API) strongly affects both the development process and run-time performance. Some embedded databases offer the SQL interface used universally in enterprise databases. Many developers already know SQL, and this familiarity can shorten the learning curve. In addition, because SQL expresses complex queries succinctly, it can sometimes do more in fewer lines of code, which supports a claim of greater efficiency. On the negative side, the higher level of abstraction that makes the language popular for corporate data management also makes SQL a “black box,” giving developers little control over predictability, performance, and memory consumption. In particular, efficiency depends on the database system’s SQL optimizer, the component that decides a SQL statement’s execution plan. Cost-based optimizers, which evaluate many possible execution plans for a statement, generally offer less control and are more likely to bog down. For real-time systems with deterministic performance requirements, a rule-based optimizer may be a better choice: it picks an execution plan by applying a limited set of predefined rules, so a given query on a given schema predictably gets the same plan. Navigational APIs are offered as an alternative to SQL in many embedded databases, and they are closely integrated with third-generation programming languages such as C and C++. Unlike high-level SQL, a navigational API works on one record at a time, moving through the database under program control. The main advantages cited for navigational APIs are determinism (the path the code takes through the data is fixed when the program is compiled) and performance, since they bypass the parsing, optimization, and execution of dynamic SQL. With SQL alone, by contrast, the program has little visibility into how the DBMS actually carries out a request.
This increases the chance that some “tweak” to the SQL statement, or even a DBMS vendor’s change to the optimizer, will cause the optimizer to choose an inferior execution plan. In contrast, with a navigational API, the programmer, by definition, writes the execution plan and can see how to compensate for a change in execution (by adding an index, for example).
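The contrast between the two interface styles can be sketched in Python. The SQL half uses the standard library’s sqlite3 module, itself an in-process SQL engine. There the query states what to fetch, and the engine’s planner decides how. The navigational half uses hypothetical Table and Cursor classes, invented for illustration and not modeled on any vendor’s API. In that half the code itself spells out the plan: it seeks in a sorted index, then steps forward one record at a time while the key matches. The schema and sample rows are assumptions.

```python
import bisect
import sqlite3

# Hypothetical sample data: (sensor_id, reading) records.
ROWS = [(7, 3.1), (2, 9.4), (5, 1.8), (2, 4.0), (9, 6.6)]


def sql_readings(sensor_id):
    """SQL style: state *what* to fetch; SQLite's planner chooses *how*."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id INTEGER, reading REAL)")
    conn.execute("CREATE INDEX idx_sensor ON readings (sensor_id)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", ROWS)
    rows = conn.execute(
        "SELECT reading FROM readings WHERE sensor_id = ? ORDER BY rowid",
        (sensor_id,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


class Table:
    """Hypothetical record store with one sorted index on the key column."""

    def __init__(self, rows):
        self.records = list(rows)
        # Index entries are (key, record position), kept in sorted order.
        self.index = sorted((key, pos) for pos, (key, _) in enumerate(self.records))


class Cursor:
    """Hypothetical navigational cursor that moves one index entry at a time."""

    def __init__(self, table):
        self.table = table
        self.pos = 0

    def seek(self, key):
        # Binary-search to the first index entry whose key is >= `key`.
        self.pos = bisect.bisect_left(self.table.index, (key, -1))
        return self.valid()

    def valid(self):
        return self.pos < len(self.table.index)

    def key(self):
        return self.table.index[self.pos][0]

    def record(self):
        return self.table.records[self.table.index[self.pos][1]]

    def next(self):
        self.pos += 1
        return self.valid()


def nav_readings(table, sensor_id):
    """Navigational style: the code itself is the plan (index seek, then scan)."""
    out = []
    cur = Cursor(table)
    found = cur.seek(sensor_id)
    while found and cur.key() == sensor_id:
        out.append(cur.record()[1])
        found = cur.next()
    return out


if __name__ == "__main__":
    print(sql_readings(2))               # [9.4, 4.0]
    print(nav_readings(Table(ROWS), 2))  # [9.4, 4.0]
```

Both functions return the same readings; what differs is who decides the access path. If the index were dropped, the SQL version would still run, because the engine would simply scan the table. The navigational version would have to be rewritten by hand. That is the visibility trade-off described above.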
Disk-Based vs. In-Memory
Most database systems, enterprise and embedded alike, are on-disk databases. They cache frequently requested data in memory for faster access, but write updates, insertions, and deletions through the cache to disk. A newer approach is the in-memory database system (IMDS), which eliminates disk access by keeping data in main memory and writes to disk only when the application explicitly commands it. The advantage of on-disk databases is persistence: data stored on disk is more permanent. An in-memory data store can disappear if its hardware or software environment is disrupted, for example when someone unplugs the power cord. In-memory databases can compensate for this risk in several ways. One is non-volatile RAM (NVRAM). Another is a high-availability scheme that keeps redundant copies of the database with automatic failover. A third is transaction logging, in which database changes are captured in a transaction log file so the database can be recovered after a system failure. Their speed advantage makes in-memory databases increasingly popular for real-time applications and embedded systems. IMDSs greatly improve performance by eliminating mechanical disk I/O, multiple copies of data, and certain logical processes (especially those related to caching) that an all-in-memory design makes unnecessary. Another key IMDS advantage is footprint: without caching functions and other unneeded logic, code size, CPU usage, and memory demands stay low. This can be critical when the software is embedded in a cell phone, MP3 player, or other consumer electronics device, where cost strictly limits system resources.
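Transaction logging, one of the safeguards mentioned above, can be sketched as follows. The hypothetical LoggedStore class keeps all data in a Python dict and appends each change to a log file. Each record is forced to disk with os.fsync before the change is applied in memory. Reopening the store replays the log to rebuild the dict and discards a torn final record left by a crash mid-write. The JSON-lines log format and the class name are assumptions for illustration. A production IMDS would more likely log whole multi-operation transactions and spread disk syncs across commits rather than syncing on every update.

```python
import json
import os
import tempfile


class LoggedStore:
    """Hypothetical in-memory key/value store protected by a transaction log.

    The dict is the database; every change is appended to the log and forced
    to disk *before* it is applied, so the dict can be rebuilt after a crash.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()  # replay any existing log into memory
        self.log = open(log_path, "a", encoding="utf-8")

    def _recover(self):
        if not os.path.exists(self.log_path):
            return
        valid_bytes = 0
        with open(self.log_path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break  # torn final record from a crash mid-write
                try:
                    entry = json.loads(line)
                except ValueError:
                    break  # unreadable record: stop replaying here
                if entry["op"] == "put":
                    self.data[entry["key"]] = entry["value"]
                else:  # "delete"
                    self.data.pop(entry["key"], None)
                valid_bytes += len(line)
        # Cut off any torn tail so new records start on a clean line.
        with open(self.log_path, "r+b") as f:
            f.truncate(valid_bytes)

    def _log(self, entry):
        self.log.write(json.dumps(entry) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # make the record durable before applying it

    def put(self, key, value):
        self._log({"op": "put", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._log({"op": "delete", "key": key})
        self.data.pop(key, None)

    def close(self):
        self.log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.log")
    store = LoggedStore(path)
    store.put("speed", 42)
    store.put("mode", "auto")
    store.delete("mode")
    store.close()  # the in-memory copy is now gone, as after a power loss
    print(LoggedStore(path).data)  # {'speed': 42}
```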
“Truly Embedded” vs. Client/Server
Some embedded databases are based on the client/server design used in most enterprise databases, in which client applications send requests to a database server. Even when client and server reside on the same computer, they are separate processes connected by an interprocess communication layer. But a subset of embedded databases, which might be called “more embedded” or “truly embedded,” consists not of client and server modules but of code libraries (for languages such as C, C++, and Java) that are linked into, and compiled with, the embedded software application itself. McObject's DBMS products follow this approach. This architecture is simpler, which generally means smaller code size and fewer defects, because less complexity leaves less room for things to go wrong. Removing interprocess communication eliminates a performance barrier. Storage and retrieval are further accelerated because no server has to manage sessions and connections or allocate and deallocate resources on behalf of clients. Client/server has its own advantages. It allows a network to be right-sized by installing the server software on a powerful computer and using thin clients elsewhere, and it lends itself naturally to supporting larger numbers of users. A server typically starts more threads as its workload increases, so on a multi-CPU machine with symmetric multiprocessing (SMP) it can scale. A database library, by contrast, has no independent processing of its own and relies on its host application to start additional threads.
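The following sketch illustrates the “truly embedded” model in Python. The hypothetical EmbeddedDB class is just an object whose methods run directly in the caller’s thread, with no server process, interprocess communication, sessions, or connections. It starts no threads of its own; the host application (the hypothetical run_host_app function) decides how many worker threads to start. A single lock serializing all access is a simplifying assumption. A real engine would more likely use finer-grained locking so that threads can proceed in parallel.

```python
import threading


class EmbeddedDB:
    """Hypothetical in-process database library: calls are plain method calls.

    It owns no threads; every call executes in whichever host thread made it.
    """

    def __init__(self):
        self._lock = threading.Lock()  # one lock for all data (simplification)
        self._counters = {}

    def increment(self, key, amount=1):
        # No IPC, no session or connection setup: just a locked update in memory.
        with self._lock:
            self._counters[key] = self._counters.get(key, 0) + amount

    def get(self, key):
        with self._lock:
            return self._counters.get(key, 0)


def run_host_app(db, n_threads, ops_per_thread):
    """The host application, not the database, decides how many threads run."""
    def worker():
        for _ in range(ops_per_thread):
            db.increment("events")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.get("events")


if __name__ == "__main__":
    print(run_host_app(EmbeddedDB(), n_threads=4, ops_per_thread=1000))  # 4000
```

Scaling this design to more CPUs means changing the host application’s threading, not reconfiguring a server.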
Embedded database systems’ ability to run with little or no end-user administration is a major selling point, especially to users familiar with the armies of database administrators (DBAs) and other specialists needed to manage enterprise databases. Because embedded database systems lack many of the bells and whistles of their enterprise-level counterparts, they avoid many management headaches and make embedded applications more robust. Simplicity is also why in-process databases are inherently more “zero-administration” than enterprise-level or embedded client/server DBMSs. A client/server DBMS requires logging into a server, which implies an administrator who manages users, passwords, and privileges. The even simpler architecture of an in-memory database system requires still less administration, because there are no files or table spaces, on disk or on other network devices, to create and manage; an in-memory database just needs memory. Many embedded databases handle administration through programming interfaces, so that administrative needs can be met within application code. This administrative API includes functions for handling backups, managing cache size, and setting the number of threads. This approach to minimizing administration might be called “encapsulated administration” or “automated administration.”
Regardless of the flavor, the embedded database market is growing, in both the business/consumer software and embedded systems spaces. Some of the hottest new areas for embedded databases are consumer electronics, network application acceleration devices, and automotive technology. Robust innovation in these sectors creates new needs for embedded data management and is sure to sustain the growth of embedded database systems in the coming decade.
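The encapsulated administration described above can be sketched as a handful of ordinary method calls on a hypothetical EmbeddedDatabase class. The names set_cache_size, set_thread_count, backup, and restore are illustrative, not any product’s API, and this sketch only records the tuning settings rather than acting on them. The backup writes a snapshot to a temporary file in the target directory and then renames it into place with os.replace. As a result, an interrupted backup does not leave a half-written file at the destination path.

```python
import json
import os
import tempfile


class EmbeddedDatabase:
    """Hypothetical embedded database whose administration is done in code."""

    def __init__(self):
        self.data = {}
        # Hypothetical tuning knobs; this sketch only records them.
        self.settings = {"cache_pages": 64, "threads": 1}

    def set_cache_size(self, pages):
        if pages <= 0:
            raise ValueError("cache size must be positive")
        self.settings["cache_pages"] = pages

    def set_thread_count(self, count):
        if count < 1:
            raise ValueError("at least one thread is required")
        self.settings["threads"] = count

    def backup(self, path):
        # Write the snapshot to a temporary file in the same directory, then
        # rename it over `path`, so a crash mid-backup leaves the old copy intact.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)

    def restore(self, path):
        with open(path, encoding="utf-8") as f:
            self.data = json.load(f)
```

An application might call backup on a timer or at shutdown and adjust the settings at startup, so no DBA ever has to log in.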