Google

Challenges in Embedded Database System Administration

Margo Seltzer, Harvard University

Michael Olson, Sleepycat Software, Inc.

{margo,mao}@sleepycat.com

Database configuration and maintenance have historically been complex tasks, often requiring expert knowledge of database design and application behavior. In an embedded environment, it is not feasible to require such expertise and ongoing database maintenance. This paper discusses the database administration challenges posed by embedded systems and describes how the Berkeley DB architecture addresses these challenges.

1. Introduction

Embedded systems provide a combination of opportunities and challenges in application and system configuration and management. As an embedded system is most often dedicated to a single application or small set of tasks, the operating conditions of the system are typically better understood than those of general purpose computing environments. Similarly, as embedded systems are dedicated to a small set of tasks, one would expect that the software to manage them should be small and simple. On the other hand, once an embedded system is deployed, it must continue to function without interruption and without administrator intervention.

Database administration consists of two components, initial configuration and ongoing maintenance. Initial configuration consists of database design, manifestation, and tuning. The instantiation of the design includes decomposing the design into tables, relations, or objects and designating proper indices and their implementations (e.g., Btrees, hash tables, etc.). Tuning a design requires selecting a location for the log and data files, selecting appropriate database page sizes, specifying the size of in-memory caches, and specifying the limits of multi-threading and concurrency. As embedded systems define a specific environment and set of tasks, requiring expertise during the initial system configuration process is acceptable, and we focus our efforts on the ongoing maintenance of the system. In this way, our emphasis differs from other projects such as Microsoft's AutoAdmin project [3], and the "no-knobs" administration that is identified as an area of important future research by the Asilomar authors[1].

In this paper, we focus on what the authors of the Asilomar report call "gizmo" databases [1], databases that reside in devices such as smart cards, toasters, or telephones. The key characteristics of such databases are that their functionality is completely transparent to users, no one ever performs explicit database operations or database maintenance, the database may crash at any time and must recover instantly, the device may undergo a hard reset at any time, requiring that the database return to its initial state, and the semantic integrity of the database must be maintained at all times. In Section 2, we provide more detail on the sorts of tasks typically performed by database administrators (DBAs) that must be automated in an embedded system.

The rest of this paper is structured as follows. In Section 2, we outline the requirements for embedded database support. In Section 3, we discuss how Berkeley DB is conducive to the hands-off management required in embedded systems. In Section 4, we discuss novel features that enhance Berkeley DB's suitability for the embedded applications. In Section 5, we discuss issues of footprint size. In Section 6 we discuss related work, and we conclude in Section 7.

2. Embedded Database Requirements

Historically, much of the commercial database industry has been driven by the requirements of high performance online transaction processing (OLTP), complex query processing, and the industry standard benchmarks that have emerged (e.g., TPC-C [9], TPC-D [10]) to allow for system comparisons. As embedded systems typically perform fairly simple queries, such metrics are not nearly as relevant for embedded database systems as are ease of maintenance, robustness, and small footprint. Of these three requirements, robustness and ease of maintenance are the key issues. Users must trust the data stored in their devices and must not need to manually perform anything resembling system administration in order to get their unit to work properly. Fortunately, ease of use and robustness are important side effects of simplicity and good design. These, in turn, lead to a small size, providing the third requirement of an embedded system.

2.1 The User Perspective

In the embedded database arena, it is the ongoing maintenance tasks that must be automated, not necessarily the initial system configuration. There are five tasks that are traditionally performed by DBAs, but must be performed automatically in embedded database systems. These tasks are log archival and reclamation, backup, data compaction/reorganization, automatic and rapid recovery, and reinitialization from scratch.

Log archival and backup are tightly coupled. Database backups are part of any large database installation, and log archival is analogous to incremental backup. It is not clear what the implications of backup and archival are in an embedded system. Consumers do not back up their VCRs or refrigerators, yet they do (or should) back up their personal computers or personal digital assistants. For the remainder of this paper, we assume that backups, in some form, are required for gizmo databases (imagine having to reprogram, manually, the television viewing access pattern learned by some set-top television systems today). Furthermore, we require that those backups are nearly instantaneous or completely transparent, as users should not be aware that their gizmos are being backed up and should not have to explicitly initiate such backups.

Data compaction or reorganization has traditionally required periodic dumping and restoration of database tables and the recreation of indices. In an embedded system, such reorganization must happen automatically.

Recovery issues are similar in embedded and traditional environments with a few exceptions. While a few seconds or even a minute recovery is acceptable for a large server installation, no one is willing to wait for their telephone or television to reboot. As with archival, recovery must be nearly instantaneous in an embedded product. Secondly, it is often the case that a system will be completely reinitialized, rather than simply rebooted. In this case, the embedded database must be restored to its initial state, freeing all its resources. This is not typically a requirement of large server systems.

2.2 The Developer Perspective

In addition to the maintenance-free operation required of the embedded systems, there are a number of requirements that fall out of the constrained resources typically found in the "gizmos" using gizmo databases. These requirements are: small footprint, short code-path, programmatic interface for tight application coupling and to avoid the overhead (in both time and size) of interfaces such as SQL and ODBC, application configurability and flexibility, support for complete memory-resident operation (e.g., these systems must run on gizmos without file systems), and support for multi-threading.

A small footprint and short code-path are self-explanatory, however what is not as obvious is that the programmatic interface requirement is the logical result of them. Traditional interfaces such as ODBC and SQL add significant size overhead and frequently add multiple context/thread switches per operation, not to mention several IPC calls. An embedded product is less likely to require the complex query processing that SQL enables. Instead, in the embedded space, the ability for an application to configure the database for the specific tasks in question is more important than a general query interface.

As some systems do not provide storage other than RAM and ROM, it is essential that an embedded database work seemlessly in memory-only environments. Similarly, many of today's embedded operating systems provide a single address space architecture, so a simple, multi-threaded capability is essential for application requiring any concurrency.

In general, embedded applications run on gizmos whose native operating system support varies tremendously. For example, the embedded OS may or may not support user-level processing or multi-threading. Even if it does, a particular embedded application may or may not need it. Not all applications need more than one thread of control. An embedded database must provide mechanisms to developers without deciding policy. For example, the threading model in an application is a matter of policy, and depends not on the database software, but on the hardware, operating system, and the application's feature set. Therefore, the data manager must provide for the use of multi-threading, but not require it.

3. Berkeley DB: A Database for Embedded Systems

Berkeley DB is the result of implementing database functionality using the UNIX tool-based philosophy. The current Berkeley DB package, as distributed by Sleepycat Software, is a descendant of the hash and btree access methods distributed with 4.4BSD and its descendents. The original package (referred to as DB-1.85), while intended as a public domain replacement for dbm and its followers (e.g., ndbm, gdbm, etc), rapidly became widely used as an efficient, easy-to-use data store. It was incorporated into a number of Open Source packages including Perl, Sendmail, Kerberos, and the GNU C-library.

Versions 2.X and higher are distributed by Sleepycat Software and add functionality for concurrency, logging, transactions, and recovery. Each piece of additional functionality is implemented as an independent module, which means that the subsystems can be used outside the context of Berkeley DB. For example, the locking subsystem can easily be used to implement locking for a non-DB application and the shared memory buffer pool can be used for any application caching data in main memory. This subsystem design allows a designer to pick and choose the functionality necessary for the application, minimizing memory footprint and maximizing performance. This addresses the small footprint and short code-path criteria mentioned in the previous section.

As Berkeley DB grew out of a replacement for dbm, its primary implementation language has always been C and its interface has been programmatic. The C interface is the native interface, unlike many database systems where the programmatic API is simply a layer on top of an already-costly query interface (e.g. embedded SQL). Berkeley DB's heritage is also apparent in its data model; it has none. The database stores unstructured key/data pairs, specified as variable length byte strings. This leaves schema design and representation issues the responsibility of the application, which is ideal for an embedded environment. Applications retain full control over specification of their data types, representation, index values, and index relationships. In other words, Berkeley DB provides a robust, high-performance, keyed storage system, not a particular database management system. We have designed for simplicity and performance, trading off complex, general purpose support that is better encapsulated in applications.

Another element of Berkeley DB's programmatic interface is its customizability; applications can specify Btree comparison and prefix compression functions, hash functions, error routines, and recovery models. This means that embedded applications can tailor the underlying database to best suit their data demands. Similarly, the utilities traditionally bundled with a database manager (e.g., recovery, dump/restore, archive) are implemented as tiny wrapper programs around library routines. This means that it is not necessary to run separate applications for the utilities. Instead, independent threads can act as utility daemons, or regular query threads can perform utility functions. Many of the current products built on Berkeley DB are bundled as a single large server with independent threads that perform functions such as checkpoint, deadlock detection, and performance monitoring.

As mentioned earlier, living in an embedded environment requires flexible management of storage. Berkeley DB does not require any preallocation of disk space for log or data files. While many commercial database systems take complete control of a raw device, Berkeley DB uses a normal file system, and can therefore, safely and easily share a data space with other programs. All databases and log files are native files of the host environment, so whatever utilities are provided by the environment can be used to manage database files as well.

Berkeley DB provides three different memory models for its management of shared information. Applications can use the IEEE Std 1003.1b-1993 (POSIX) mmap interface to share data, they can use system shared memory, as frequently provided by the shmget family of interfaces, or they can use per-process heap memory (e.g., malloc). Applications that require no permanent storage and do not provide shared memory facilities can still use Berkeley DB by requesting strictly private memory and specifying that all databases be memory-resident. This provides pure-memory operation.

Lastly, Berkeley DB is designed for rapid startup -- recovery can happen automatically as part of system initialization. This means that Berkeley DB works correctly in environments where gizmos are suddenly shut down and restarted.

4. Extensions for Embedded Environments

While the Berkeley DB library has been designed for use in embedded systems, all the features described above are useful in more conventional systems as well. In this section, we discuss a number of features and "automatic knobs" that are specifically geared toward the more constrained environments found in gizmo databases.

4.1 Automatic compression

Following the programmatic interface design philosophy, we support application-specific (or default) compression routines. These can be geared toward the particular data types present in the application's dataset, thus providing better compression than a general purpose routine. Note that the application could instead specify an encryption function and create encrypted databases instead of compressed ones. Alternately, the application might specify a function that performs both compression and encryption.

As applications are also permitted to specify comparison and hash functions, the application can chose to organize its data based either on uncompressed and clear-text data or compressed and encrypted data. If the application indicates that data should be compared in its processed form (i.e., compressed and encrypted), then the compression and encryption are performed on individual data items and the in-memory representation retains these characteristics. However, if the application indicates that data should be compared in its original form, then entire pages are transformed upon being read into or written out of the main memory buffer cache. These two alternatives provide the flexibility to trade space and security for performance.

4.2 In-memory logging & transactions

One of the four key properties of transaction systems is durability. This means that transaction systems are designed for permanent storage (most commonly disk). However, as mentioned above, embedded systems do not necessarily contain any such storage. Nevertheless, transactions can be useful in this environment to preserve the semantic integrity of the underlying storage. Berkeley DB optionally provides logging functionality and transaction support regardless of whether the database and logs are on disk or in memory.

4.3 Remote Logs

While we do not expect users to backup their television sets and toasters, it is conceivable that a set-top box provided by a cable carrier should, in fact, be backed up by that cable carrier. The ability to store logs remotely can provide "information appliance" functionality, and can also be used in conjunction with local logs to enhance reliability. Furthermore, remote logs provide for catastrophic recovery, e.g., loss of the gizmo, destruction of the gizmo, etc.

4.4 Application References to Database Buffers

Typically, when data is returned to the user, it must be copied from the data manager's buffer cache (or data page) into the application's memory. However, in an embedded environment, the robustness of the total software package is of paramount importance, not the isolation between the application and the data manager. As a result, it is possible for the data manager to avoid copies by giving applications direct references to data items in a shared memory cache. This is a significant performance optimization that can be allowed when the application and data manager are tightly integrated.

4.5 Recoverable database creation/deletion

In a conventional database management system, the creation of database tables (relations) and indices are heavyweight operations that are not recoverable. This is not acceptable in a complex embedded environment where instantaneous recovery and robust operation in the face of all types of database operations is essential. While Berkeley DB files can be removed using normal file system utilities, we provide transaction protected utilities that allow us to recover both database creation and deletion.

4.6 Adaptive concurrency control

The Berkeley DB package uses page-level locking by default. This trades off fine grain concurrency control for simplicity during recovery. (Finer grain concurrency control can be obtained by reducing the page size in the database.) However, when multiple threads/processes perform page-locking in the presence of writing operations, there is the potential for deadlock. As some environments do not need or desire the overhead of logging and transactions, it is important to provide the ability for concurrent access without the potential for deadlock.

Berkeley DB provides an option to perform coarser grain, deadlock-free locking. Rather than locking on pages, locking is performed at the interface to the database. Multiple readers or a single writer are allowed to be active in the database at any instant in time, with conflicting requests queued automatically. The presence of cursors, through which applications can both read and write data, complicates this design. If a cursor is currently being used for reading, but will later be used to write, the system will be deadlock prone if no special precautions are taken. To handle this situation, we require that, when a cursor is created, the application specify any future intention to write. If there is an intention to write, the cursor is granted an intention-to-write lock which does not conflict with readers, but does conflict with other intention-to-write locks and write locks. The end result is that the application is limited to a single potentially writing cursor accessing the database at any point in time.

Under periods of low contention (but potentially high throughput), the normal page-level locking provides the best overall throughput. However, as contention rises, so does the potential for deadlock. As some cross-over point, switching to the less concurrent, but deadlock-free locking protocol will result in higher throughput as operations must never be retried. Given the operating conditions of an embedded database manager, it is useful to make this change automatically as the system itself detects high contention.

4.7 Adaptive synchronization

In addition to the logical locks that protect the integrity of the database pages, Berkeley DB must synchronize access to shared memory data structures, such as the lock table, in-memory buffer pool, and in-memory log buffer. Each independent module uses a single mutex to protect its shared data structures, under the assumption that operations that require the mutex are very short and the potential for conflict is low. Unfortunately, in highly concurrent environments with multiple processors present, this assumption is not always true. When this assumption becomes invalid (that is, we observe significant contention for the subsystem mutexes), we can switch over to a finer-grained concurrency model for the mutexes. Once again, there is a performance trade-off. Fine-grain mutexes impose a penalty of approximately 25% (due to the increased number of mutexes required for each operation), but allow for higher throughput. Using fine-grain mutexes under low contention would cause a decrease in performance, so it is important to monitor the system carefully, so that the change can be executed only when it will increase system throughput without jeopardizing latency.

5. Footprint of an Embedded System

While traditional systems compete on price-performance, the embedded players will compete on price, features, and footprint. The earlier sections have focused on features; in this section we focus on footprint.

Oracle reports that Oracle Lite 3.0 requires 350 KB to 750 KB of memory and approximately 2.5 MB of hard disk space [7]. This includes drivers for interfaces such as ODBC and JDBC. In contrast, Berkeley DB ranges in size from 75 KB to under 200 KB, foregoing heavyweight interfaces such as ODBC and JDBC and providing a variety of deployed sizes that can be used depending on application needs. At the low end, applications requiring a simple single-user access method can choose from either extended linear hashing, B+ trees, or record-number based retrieval and pay only the 75 KB space requirement. Applications requiring all three access methods will observe the 110 KB footprint. At the high end, a fully recoverable, high-performance system occupies less than a quarter megabyte of memory. This is a system you can easily incorporate in your toaster oven. Table 1 shows the per-module break down of the entire Berkeley DB library. Note that this does not include memory used to cache database pages.
Object sizes in bytes
SubsystemTextDataBss
Btree-specific routines2881200
Recno-specific routines721100
Hash-specific routines2374200
Memory Pool1453500
Access method common code2325200
OS compatibility library4980520
Support utilities616500
All modules for Btree access method only77744520
All modules for Recno access method only84955520
All modules for Hash access method only72674520
All Access Methods108697520

Locking1253300
Recovery2694884
Logging3736700
Full Package185545604

6. Related Work

Every three to five years, leading researchers in the database community convene to identify future directions in database research. They produce a report of this meeting, named for the year and location of the meeting. The most recent of these reports, the 1998 Asilomar report, identifies the embedded database market as one of the high growth areas in database research [1]. Not surprisingly, market analysts identify the embedded database market as a high-growth area in the commercial sector as well [5].

The Asilomar report identifies a new class of database applications, which they term "gizmo" databases, small databases embedded in tiny mobile appliances, e.g., smart-cards, telephones, personal digital assistants. Such databases must be self-managing, secure and reliable. Thus, the idea is that gizmo databases require plug and play data management with no database administrator (DBA), no human settable parameters, and the ability to adapt to changing conditions. More specifically, the Asilomar authors claim that the goal is self-tuning, including defining the physical DB design, the logical DB design, and automatic reports and utilities [1] To date, few researchers have accepted this challenge, and there is a dearth of research literature on the subject.

Our approach to embedded database administration is fundamentally different than that described by the Asilomar authors. We adopt their terminology, but view the challenge in supporting gizmo databases to be that of self-sustenance after initial deployment. Therefore, we find it, not only acceptable, but desirable to assume that application developers control initial database design and configuration. To the best of our knowledge, none of the published work in this area addresses this approach.

As the research community has not provided guidance in this arena, most work in embedded database administration has fallen to the commercial vendors. These vendors fall into two camps, companies selling databases specifically designed for embedding or programmatic access and the major database vendors (e.g., Oracle, Informix, Sybase).

The embedded vendors all acknowledge the need for automatic administration, but fail to identify precisely how their products actually accomplish this. A notable exception is Interbase whose white paper comparison with Sybase and Microsoft's SQL servers explicitly address features of maintenance ease. Interbase claims that as they use no log files, there is no need for log reclamation, checkpoint tuning, or other tasks associated with log management. However, Interbase uses Transaction Information Pages, and it is unclear how these are reused or reclaimed [6]. Additionally, with a log-free system, they must use a FORCE policy (write all pages to disk at commit), as defined by Haerder and Reuter [4]. This has serious performance consequences for disk-based systems. The approach described in this paper does use logs and therefore requires log reclamation, but provides hooks so the application may reclaim logs safely and programmatically. While Berkeley DB does require checkpoints, the goal of tuning the checkpoint interval is to bound recovery time. Since the checkpoint interval in Berkeley DB can be expressed by the amount of log data written, it requires no tuning. The application designer sets a target recovery time, and selects the amount of log data that can be read in that interval and specifies the checkpoint interval appropriately. Even as load changes, the time to recover does not.

The backup approaches taken by Interbase and Berkeley DB are similar in that they both allow online backup, but rather different in their affect on transactions running during backup. As Interbase performs backups as transactions [6], concurrent queries can suffer potentially long delays. Berkeley DB uses native operating system system utilities and recovery for backups, so there is no interference with concurrent activity, other than potential contention on disk arms.

There are a number of database vendors selling in the embedded market (e.g., Raima, Centura, Pervasive, Faircom), but none highlight the special requirements of embedded database applications. On the other end of the spectrum, the major vendors, Oracle, Sybase, Microsoft, are all becoming convinced of the importance of the embedded market. As mentioned earlier, Oracle has announced its Oracle Lite server for embedded use. Sybase has announced its UltraLite platform for "application-optimized, high-performance, SQL database engine for professional application developers building solutions for mobile and embedded platforms." [8]. We believe that SQL is incompatible with the gizmo database environment or truly embedded systems for which Berkeley DB is most suitable. Microsoft research is taking a different approach, developing technology to assist in automating initial database design and index specification [2][3]. As mentioned earlier, we believe that such configuration is, not only acceptable in the embedded market, but desirable so that applications can tune their database management for the target environment.

7. Conclusions

The coming wave of embedded systems poses a new set of challenges for data management. The traditional server-based, big footprint systems designed for high performance on big iron are not the right approach in this environment. Instead, application developers need small, fast, versatile systems that can be tailored to a specific environment. In this paper, we have identified several of the key issues in providing these systems and shown how Berkeley DB provides many of the characteristics necessary for such applications.

8. References

[1] Bernstein, P., Brodie, M., Ceri, S., DeWitt, D., Franklin, M., Garcia-Molina, H., Gray, J., Held, J., Hellerstein, J., Jagadish, H., Lesk, M., Maier, D., Naughton, J., Pirahesh, H., Stonebraker, M., Ullman, J., "The Asilomar Report on Database Research," SIGMOD Record 27(4): 74-80, 1998.

[2] Chaudhuri, S., Narasayya, V., "AutoAdmin 'What-If' Index Analysis Utility," Proceedings of the ACM SIGMOD Conference, Seattle, 1998.

[3] Chaudhuri, S., Narasayya, V., "An Efficient, Cost-Driver Index Selection Tool for Microsoft SQL Server," Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.

[4] Haerder, T., Reuter, A., "Principles of Transaction-Oriented Database Recovery," Computing Surveys 15,4 (1983), 237-318.

[5] Hostetler, M., "Cover Is Off A New Type of Database," Embedded DB News, http://www.theadvisors.com/embeddeddbnews.htm, 5/6/98.

[6] Interbase, "A Comparison of Borland InterBase 4.0 Sybase SQL Server and Microsoft SQL Server," http://web.interbase.com/products/doc_info_f.html.

[7] Oracle, "Oracle Delivers New Server, Application Suite to Power the Web for Mission-Critical Business," http://www.oracle.com.sg/partners/news/newserver.htm, May 1998.

[8] Sybase, Sybase UltraLite, http://www.sybase.com/products/ultralite/beta.

[9] Transaction Processing Council, "TPC-C Benchmark Specification, Version 3.4," San Jose, CA, August 1998.

[10] Transaction Processing Council, "TPC-D Benchmark Specification, Version 2.1," San Jose, CA, April 1999.