Fundamental Specifications of Kyoto Cabinet Version 1

Copyright (C) 2009-2010 Mikio Hirabayashi
Last Update: Thu, 04 Mar 2010 17:27:26 +0900

Table of Contents

  1. Introduction
  2. Installation
  3. Tutorial
  4. Command Line Utilities
  5. License

Introduction

Kyoto Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. Each key must be unique within a database. There is neither concept of data tables nor data types. Records are organized in hash table or B+ tree.

The following access methods are provided to the database: storing a record with a key and a value, deleting a record by a key, retrieving a record by a key. Moreover, traversal access to every key are provided. These access methods are similar to ones of DBM (or its followers: NDBM and GDBM) library defined in the UNIX standard. Kyoto Cabinet is an alternative for DBM because of its higher performance.

Each operation of hash database has the time complexity of O(1). So, in theory, the performance is consant regardless of the scale of the database. In practice, the performance is determined by the speed of the main memory or the storage device. If the size of the database is less than the capacity of the main memory, the performance will seem on-memory speed which is fastar than std::map of STL. Of cource, the database size can be greater than the capacity of the main memory and the upper limit is 8 exabytes. Even in that case, each operation needs only one or two seeking of the storage device.

Each operation of B+ tree has the time complexity of O(log N). So, in theory, the performance is logarithmic about the scale of the database. Although the performance of random access of B+ tree is slower than that of hash database, B+ tree supports sequential access in order of the keys, which realizes forward matching search for strings and range search for integers. The performance of sequential access is much faster than that of random access.

As the API is based on object-oriented design, hash database and B+ tree database have same methods which inherited from the upper abstract class. Prototype database by containers of STL and cache database with LRU deletion algorithm are also provided under the same base class. All databases have practical utility methods related to transaction and cursor. Programs for command line interface are also included in the package.

Kyoto Cabinet is written in the C++ language, and provided as API of C++. Kyoto Cabinet is available on platforms which have API conforming to C++03 with the TR1 library extensions. Kyoto Cabinet is a free software licensed under the GNU General Public License.


Installation

This section describes how to install Kyoto Cabinet with the source package. As for a binary package, see its installation manual.

Preparation

Kyoto Cabinet is available on UNIX-like systems. At least, the following environments are supported.

gcc 4.1 or later and make are required to install Kyoto Cabinet with the source package. They are installed by default on Linux, FreeBSD and so on.

As Kyoto Cabinet depends on the following libraries, install them beforehand.

Installation

When an archive file of Kyoto Cabinet is extracted, change the current working directory to the generated directory and perform installation.

Run the configuration script.

./configure

Build programs.

make

Perform self-diagnostic test.

make check

Install programs. This operation must be carried out by the root user.

make install

Result

When a series of work finishes, the following files will be installed.

/usr/local/include/kccommon.h
/usr/local/include/kcutil.h
/usr/local/include/kcdb.h
/usr/local/include/kcthread.h
/usr/local/include/kcfile.h
/usr/local/include/kccompress.h
/usr/local/include/kccompare.h
/usr/local/include/kcmap.h
/usr/local/include/kcprotodb.h
/usr/local/include/kccachedb.h
/usr/local/include/kchashdb.h
/usr/local/include/kctreedb.h
/usr/local/include/kcpolydb.h
/usr/local/lib/libkyotocabinet.a
/usr/local/lib/libkyotocabinet.so.x.y.z
/usr/local/lib/libkyotocabinet.so.x
/usr/local/lib/libkyotocabinet.so
/usr/local/lib/pkgconfig/kyotocabinet.pc
/usr/local/bin/kcutiltest
/usr/local/bin/kcutilcodec
/usr/local/bin/kcprototest
/usr/local/bin/kccachetest
/usr/local/bin/kchashtest
/usr/local/bin/kchashmgr
/usr/local/bin/kctreetest
/usr/local/bin/kctreemgr
/usr/local/bin/kcpolytest
/usr/local/bin/kcpolymgr
/usr/local/share/kyotocabinet/...
/usr/local/man/man1/...

Options of Configure

The following options can be specified with `./configure'.

`--prefix' and other options are also available as with usual UNIX software packages. If you want to install Kyoto Cabinet under `/usr' not `/usr/local', specify `--prefix=/usr'. As well, the library search path does not include `/usr/local/lib', it is necessary to set the environment variable `LD_LIBRARY_PATH' to include `/usr/local/lib' before running applications of Kyoto Cabinet.

How to Use the Library

Kyoto Cabinet provides API of the C++ language and it is available by programs conforming to the C++03 standard. As the header files of Kyoto Cabinet are provided as `kcutil.h', `kchashdb.h', and so on, applications should include one or more of them accordingly to use the API. As the library is provided as `libkyotocabinet.a' and `libkyotocabinet.so' and they depends `libz.so', `libstdc++.so', `librt.so', `libpthread.so', `libm.so', and `libc.so', linker options corresponding to them are required by the build command. The typical build command is the following.

g++ -I/usr/local/include example.cc -o example \
  -L/usr/local/lib -lkyotocabinet -lz -lstdc++ -lrt -lpthread -lm -lc

Tutorial

This section describes how to use Kyoto Cabinet with the command line utilities and some sample application programs.

Command Line Utility for the File Hash Database

To begin with, let's build a file hash database with the command line utility `kchashmgr'. The database stores a list of staffs of a company. Each record is composed of the key and the value. The key is the staff ID and the value is person's name.

The database must be created just one time before any database operation. Let's create a database file "casket.kch" with the default configuration.

$ kchashmgr create staffs.kch

Register some staffs into the database.

$ kchashmgr set staffs.kch 1001 "George Washington"
$ kchashmgr set staffs.kch 1002 "John Adams"
$ kchashmgr set staffs.kch 1003 "Thomas Jefferson"
$ kchashmgr set staffs.kch 1004 "James Madison"

Check the current contents.

$ kchashmgr list -pv staffs.kch
1001    George Washington
1002    John Adams
1003    Thomas Jefferson
1004    James Madison

To retrieve the value of a record, search the database with the key.

$ kchashmgr get staffs.kch 1003
Thomas Jefferson

Of cource, you can remove a record with the key.

$ kchashmgr remove staffs.kch 1003

That's all for the fundamental operations. The DBM family have been improving performance thanks to discarding the functionality.

Sample Application of the File Hash Database

Next, let's write a sample application program handling a file hash database. See the following source code.

#include <kchashdb.h>

using namespace std;
using namespace kyotocabinet;

int main(int argc, char** argv) {

  // create the database object
  HashDB db;

  // open the database
  if (!db.open("casket.kch", HashDB::OWRITER | HashDB::OCREATE)) {
    cout << "open error: " << db.error().string() << endl;
  }

  // store records
  if (!db.set("foo", "hop") ||
     !db.set("bar", "step") ||
     !db.set("baz", "jump")) {
    cout << "set error: " << db.error().string() << endl;
  }

  // retrieve records
  string* value = db.get("foo");
  if (value) {
    cout << *value << endl;
    delete value;
  } else {
    cout << "get error: " << db.error().string() << endl;
  }

  // traverse records
  class Traverser : public DB::Visitor {
    const char* visit_full(const char* kbuf, size_t ksiz,
                           const char* vbuf, size_t vsiz, size_t *sp) {
      cout << string(kbuf, ksiz) << ":" << string(vbuf, vsiz) << endl;
      return NOP;
    }
  } traverser;
  db.iterate(&traverser, false);

  // close the database
  if (!db.close()) {
    cout << "close error: " << db.error().string() << endl;
  }

  return 0;
}

Save the above code as a file "example.cc". Then, perform the following command line. The command `kcutilcodec conf' prints the building configuration.

$ g++ `kcutilcodec conf -i` -o example example.cc `kcutilcodec conf -l`

Execute the application program built by the above.

$ ./example
hop
foo:hop
bar:step
baz:jump

The API of the file hash database is defined in the header `kchash.h'. So, include the header near the front of a source file. All symbols of Kyoto Cabinet are packaged in the name space `kyotocabinet'. So, import the name space is useful.

#include <kchashdb.h>
using namespace kyotocabinet;

The class `HashDB' contains all functionality of the file hash database and each instance expresses a file hash database file.

HashDB db;

Each database file must be opened by the `open' method before any database operation. The flag `HashDB::OWRITER' means the process will update the database. The flag `HashDB::OCREAT' means a database file will be created if it does not exist yet.

db.open("casket.kch", HashDB::OWRITER | HashDB::OCREATE);

Every opened database must be closed by the `close' method when it is no longer in use. Closing the database is very important to avoid data corruption and memory leak.

db.close();

To store a record, use the `set' method with the key and the value.

db.put("foo", "hop");

To retrieve the value of a record, use the `get' method with the key. The return value is NULL if no record corresponds to the key. On success, the return value is the pointer to a dynamic object and it should be deleted explicitly after the use.

string* value = db.get("foo", "hop");
if (value) {
  cout << *value << endl;
  delete value;
}

Except for `set' and `get', there are other methods; `add', `append', `remove'. Each method has two versions; for `std::string' parameters and for `char*' and `size_t' parameters.

Traversing records is a bit complicated task. It needs an object for call back operations. The class must be implement the interface `DB::Visitor'. The method `visit_full' is called for each record. The parameters specify the pointer to key buffer, the length of the key buffer, the pointer to the value buffer, the length of the value buffer, and the pointer to the variable to notice the length of the return value. The return value is `NOP' for no update, `REMOVE' to remove the record, or otherwise the pointer to the buffer contains arbitrary byte pattern to update the value. The `iterate' method begins the iteration. The second parameter is true for updating operation or false for non-updating operation.

class Traverser : public DB::Visitor {
  const char* visit_full(const char* kbuf, size_t ksiz,
                         const char* vbuf, size_t vsiz, size_t *sp) {
    cout << string(kbuf, ksiz) << ":" << string(vbuf, vsiz) << endl;
    return NOP;
  }
} traverser;
db.iterate(&traverser, false);

Though the `accept' method did not appear in the above sample, it is very useful. It calls an arbitrary function with the same signature of the above iterator for a specified record. As it is, almost all built-in operations as `set' and `get' are implemented with the `accept' method. The following code is to store a record.

class Setter : public DB::Visitor {
public:
  Setter(const char* vbuf, size_t vsiz) : vbuf_(vbuf), vsiz_(vsiz) {}
private:
  const char* visit_full(const char* kbuf, size_t ksiz,
                         const char* vbuf, size_t vsiz, size_t* sp) {
    *sp = vsiz_;
    return vbuf_;
  }
  const char* visit_empty(const char* kbuf, size_t ksiz, size_t* sp) {
    *sp = vsiz_;
    return vbuf_;
  }
  const char* vbuf_;
  size_t vsiz_;
};
Setter setter("foo", 3);
accept(kbuf, ksiz, &setter, true);

Not only internal iterator by the `iterate' method but also external iterator by the `Cursor' class is provided. While the internal iterator scans the whole database atomically, the external iterator visits each record by each method call. Cursor also has the `accept' method and you can perform arbitrary operation for each record.

DB::Cursor* cur = db.cursor();
cur->jump();
string* key;
while ((key = cur->get_key()) != NULL) {
  cout << *key << endl;
  delete key;
  cur->step();
}
delete cur;

You can use the following classes in the same way.

class file description
ProtoHashDB kcprotodb.h
prototype hash database
on-memory database implemented with std::unorderd_map of STL
ProtoTreeDB kcprotodb.h
prototype tree database
on-memory database implemented with std::map of STL
CacheDB kccachedb.h
cache database
on-memory database featuring LRU deletion
HashDB kchashdb.h
file hash database
file database of hash table; typical DBM
TreeDB kctreedb.h
file tree database
file database of B+ tree; DBM with order
PolyDB kcpolydb.h
polymorphic database
dynamic binding of the above databases

Each database has different features in persistence, algorithm, time complexity, record sequence, and lock model for concurrency.

ProtoHashDB ProtoTreeDB CacheDB HashDB TreeDB
persistence volatile volatile volatile persistent persistent
algorithm hash table red black tree hash table hash table B+ tree
complexity O(1) O(log N) O(1) O(1) O(log N)
sequence undefined lexical order undefined undefined custom order
lock unit whole (rwlock) whole (rwlock) record (mutex) record (rwlock) page (rwlock)

Please see the the API documents for detail. Writing your own sample application is the best way to learn this library.


Command Line Utilities

This section describes how to use command line utilities. They are useful to manage database contents and to test the library and its applications.

kcutiltest

The command `kcutiltest' is a utility for facility test and performance test of the utility functions. This command is used in the following format. `rnum' specifies the number of iterations. `path' specifies the path of a file.

kcutiltest mutex [-th num] [-iv num] rnum
Performs test of lock primitives.
kcutiltest file [-th num] [-rnd] [-msiz num] path rnum
Performs test of the filesystem abstraction.
kcutiltest map [-rnd] rnum
Performs test of data mapping structures.
kcutiltest misc rnum
Performs test of miscellaneous mechanisms.

Options feature the following.

This command returns 0 on success, another on failure.

kcutilcodec

The command `kcutilcodec' is a tool to use encoding and decoding features, or to show the configuration. This command is used in the following format.

kcutilcodec conf [-v|-i|-l|-p]
Shows the configuration of Kyoto Cabinet.

Options feature the following.

This command returns 0 on success, another on failure.

kcprototest

The command `kcprototest' is a utility for facility test and performance test of the prototype database. This command is used in the following format. `rnum' specifies the number of iterations.

kcprototest order [-tree] [-th num] [-rnd] [-etc] [-tran] rnum
Performs in-order tests.
kcprototest queue [-tree] [-th num] [-it num] [-rnd] rnum
Performs queuing operations.
kcprototest wicked [-tree] [-th num] [-it num] rnum
Performs mixed operations selected at random.
kcprototest tran [-tree] [-th num] [-it num] rnum
Performs test of transaction.

Options feature the following.

This command returns 0 on success, another on failure.

kccachetest

The command `kccachetest' is a utility for facility test and performance test of the cache database. This command is used in the following format. `rnum' specifies the number of iterations.

kccachetest order [-th num] [-rnd] [-etc] [-tran] [-bnum num] [-capcnt num] [-capsiz num] rnum
Performs in-order tests.
kccachetest queue [-th num] [-it num] [-rnd] [-bnum num] [-capcnt num] [-capsiz num] rnum
Performs queuing operations.
kccachetest wicked [-th num] [-it num] [-bnum num] [-capcnt num] [-capsiz num] rnum
Performs mixed operations selected at random.
kccachetest tran [-th num] [-it num] [-bnum num] [-capcnt num] [-capsiz num] rnum
Performs test of transaction.

Options feature the following.

This command returns 0 on success, another on failure.

kchashtest

The command `kchashtest' is a utility for facility test and performance test of the file hash database. This command is used in the following format. `path' specifies the path of a database file. `rnum' specifies the number of iterations.

kchashtest order [-th num] [-rnd] [-set|-get|-getw|-rem|-etc] [-tran] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-msiz num] [-dfunit num] [-erv] path rnum
Performs in-order tests.
kchashtest queue [-th num] [-it num] [-rnd] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-msiz num] [-dfunit num] [-erv] path rnum
Performs queuing operations.
kchashtest wicked [-th num] [-it num] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-msiz num] [-dfunit num] [-erv] path rnum
Performs mixed operations selected at random.
kchashtest tran [-th num] [-it num] [-hard] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-msiz num] [-dfunit num] [-erv] path rnum
Performs test of transaction.

Options feature the following.

This command returns 0 on success, another on failure.

kchashmgr

The command `kchashmgr' is a utility for test and debugging of the file hash database and its applications. `path' specifies the path of a database file. `key' specifies the key of a record. `value' specifies the value of a record. `file' specifies the input file.

kchashmgr create [-otr] [-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] path
Creates a database file.
kchashmgr inform [-onl|-otl|-onr] [-st] path
Prints status information.
kchashmgr set [-onl|-otl|-onr] [-add|-app|-inci|-incd] [-sx] path key value
Stores a record.
kchashmgr remove [-onl|-otl|-onr] [-sx] path key
Removes a record.
kchashmgr get [-onl|-otl|-onr] [-sx] [-px] [-pz] path key
Prints the value of a record.
kchashmgr list [-onl|-otl|-onr] [-max num] [-sx] [-pv] [-px] path [key]
Prints keys of all records, separated by line feeds.
kchashmgr import [-onl|-otl|-onr] [-sx] path [file]
Imports records from a TSV file.
kchashmgr defrag [-onl|-otl|-onr] path
Performs defragmentation.
kchashmgr check [-onl|-otl|-onr] path
Checks consistency.

Options feature the following.

This command returns 0 on success, another on failure.

kctreetest

The command `kctreetest' is a utility for facility test and performance test of the file tree database. This command is used in the following format. `path' specifies the path of a database file. `rnum' specifies the number of iterations.

kctreetest order [-th num] [-rnd] [-set|-get|-getw|-rem|-etc] [-tran] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-msiz num] [-dfunit num] [-pccap num] [-rcd] [-erv] path rnum
Performs in-order tests.
kctreetest queue [-th num] [-it num] [-rnd] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-msiz num] [-dfunit num] [-pccap num] [-rcd] [-erv] path rnum
Performs queuing operations.
kctreetest wicked [-th num] [-it num] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-msiz num] [-dfunit num] [-pccap num] [-rcd] [-erv] path rnum
Performs mixed operations selected at random.
kctreetest tran [-th num] [-it num] [-hard] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-msiz num] [-dfunit num] [-pccap num] [-rcd] [-erv] path rnum
Performs test of transaction.

Options feature the following.

This command returns 0 on success, another on failure.

kctreemgr

The command `kctreemgr' is a utility for test and debugging of the file tree database and its applications. `path' specifies the path of a database file. `key' specifies the key of a record. `value' specifies the value of a record. `file' specifies the input file.

kctreemgr create [-otr] [-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-rcd] path
Creates a database file.
kctreemgr inform [-onl|-otl|-onr] [-st] path
Prints status information.
kctreemgr set [-onl|-otl|-onr] [-add|-app|-inci|-incd] [-sx] path key value
Stores a record.
kctreemgr remove [-onl|-otl|-onr] [-sx] path key
Removes a record.
kctreemgr get [-onl|-otl|-onr] [-sx] [-px] [-pz] path key
Prints the value of a record.
kctreemgr list [-onl|-otl|-onr] [-max num] [-sx] [-pv] [-px] path [key]
Prints keys of all records, separated by line feeds.
kctreemgr import [-onl|-otl|-onr] [-sx] path [file]
Imports records from a TSV file.
kctreemgr defrag [-onl|-otl|-onr] path
Performs defragmentation.
kctreemgr check [-onl|-otl|-onr] path
Checks consistency.

Options feature the following.

This command returns 0 on success, another on failure.

kcpolytest

The command `kcpolytest' is a utility for facility test and performance test of the polymorphic database. This command is used in the following format. `path' specifies the path of a database file. `rnum' specifies the number of iterations.

kcpolytest order [-th num] [-rnd] [-set|-get|-getw|-rem|-etc] [-tran] [-oat|-onl|-onl|-otl|-onr] path rnum
Performs in-order tests.
kcpolytest queue [-th num] [-it num] [-rnd] [-oat|-onl|-onl|-otl|-onr] path rnum
Performs queuing operations.
kcpolytest wicked [-th num] [-it num] [-oat|-onl|-onl|-otl|-onr] path rnum
Performs mixed operations selected at random.
kcpolytest tran [-th num] [-it num] [-hard] [-oat|-onl|-onl|-otl|-onr] path rnum
Performs test of transaction.

Options feature the following.

This command returns 0 on success, another on failure.

kcpolymgr

The command `kcpolymgr' is a utility for test and debugging of the file hash database and its applications. `path' specifies the path of a database file. `key' specifies the key of a record. `value' specifies the value of a record. `file' specifies the input file.

kcpolymgr create [-otr] [-onl|-otl|-onr] path
Creates a database file.
kcpolymgr inform [-onl|-otl|-onr] [-st] path
Prints status information.
kcpolymgr set [-onl|-otl|-onr] [-add|-app|-inci|-incd] [-sx] path key value
Stores a record.
kcpolymgr remove [-onl|-otl|-onr] [-sx] path key
Removes a record.
kcpolymgr get [-onl|-otl|-onr] [-sx] [-px] [-pz] path key
Prints the value of a record.
kcpolymgr list [-onl|-otl|-onr] [-max num] [-sx] [-pv] [-px] path [key]
Prints keys of all records, separated by line feeds.
kcpolymgr import [-onl|-otl|-onr] [-sx] path [file]
Imports records from a TSV file.
kcpolymgr defrag [-onl|-otl|-onr] path
Performs defragmentation.
kcpolymgr check [-onl|-otl|-onr] path
Checks consistency.

Options feature the following.


License

Kyoto Cabinet is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

Kyoto Cabinet is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see `http://www.gnu.org/licenses/'.

Kyoto Cabinet was written by Mikio Hirabayashi. You can contact the author by e-mail to `hirarin@gmail.com'.