Kyoto Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. Each key must be unique within a database. There is neither concept of data tables nor data types. Records are organized in hash table or B+ tree.
The following access methods are provided to the database: storing a record with a key and a value, deleting a record by a key, retrieving a record by a key. Moreover, traversal access to every key are provided. These access methods are similar to ones of DBM (or its followers: NDBM and GDBM) library defined in the UNIX standard. Kyoto Cabinet is an alternative for DBM because of its higher performance.
Each operation of hash database has the time complexity of O(1). So, in theory, the performance is consant regardless of the scale of the database. In practice, the performance is determined by the speed of the main memory or the storage device. If the size of the database is less than the capacity of the main memory, the performance will seem on-memory speed which is fastar than std::map of STL. Of cource, the database size can be greater than the capacity of the main memory and the upper limit is 8 exabytes. Even in that case, each operation needs only one or two seeking of the storage device.
Each operation of B+ tree has the time complexity of O(log N). So, in theory, the performance is logarithmic about the scale of the database. Although the performance of random access of B+ tree is slower than that of hash database, B+ tree supports sequential access in order of the keys, which realizes forward matching search for strings and range search for integers. The performance of sequential access is much faster than that of random access.
As the API is based on object-oriented design, hash database and B+ tree database have same methods which inherited from the upper abstract class. Prototype database by containers of STL and cache database with LRU deletion algorithm are also provided under the same base class. All databases have practical utility methods related to transaction and cursor. Programs for command line interface are also included in the package.
Kyoto Cabinet is written in the C++ language, and provided as API of C++. Kyoto Cabinet is available on platforms which have API conforming to C++03 with the TR1 library extensions. Kyoto Cabinet is a free software licensed under the GNU General Public License.
This section describes how to install Kyoto Cabinet with the source package. As for a binary package, see its installation manual.
Kyoto Cabinet is available on UNIX-like systems. At least, the following environments are supported.
gcc 4.1 or later and make are required to install Kyoto Cabinet with the source package. They are installed by default on Linux, FreeBSD and so on.
As Kyoto Cabinet depends on the following libraries, install them beforehand.
When an archive file of Kyoto Cabinet is extracted, change the current working directory to the generated directory and perform installation.
Run the configuration script.
./configure
Build programs.
make
Perform self-diagnostic test.
make check
Install programs. This operation must be carried out by the root user.
make install
When a series of work finishes, the following files will be installed.
/usr/local/include/kccommon.h /usr/local/include/kcutil.h /usr/local/include/kcdb.h /usr/local/include/kcthread.h /usr/local/include/kcfile.h /usr/local/include/kccompress.h /usr/local/include/kccompare.h /usr/local/include/kcmap.h /usr/local/include/kcprotodb.h /usr/local/include/kccachedb.h /usr/local/include/kchashdb.h /usr/local/include/kctreedb.h /usr/local/include/kcpolydb.h /usr/local/lib/libkyotocabinet.a /usr/local/lib/libkyotocabinet.so.x.y.z /usr/local/lib/libkyotocabinet.so.x /usr/local/lib/libkyotocabinet.so /usr/local/lib/pkgconfig/kyotocabinet.pc /usr/local/bin/kcutiltest /usr/local/bin/kcutilcodec /usr/local/bin/kcprototest /usr/local/bin/kccachetest /usr/local/bin/kchashtest /usr/local/bin/kchashmgr /usr/local/bin/kctreetest /usr/local/bin/kctreemgr /usr/local/bin/kcpolytest /usr/local/bin/kcpolymgr /usr/local/share/kyotocabinet/... /usr/local/man/man1/...
The following options can be specified with `./configure'.
`--prefix' and other options are also available as with usual UNIX software packages. If you want to install Kyoto Cabinet under `/usr' not `/usr/local', specify `--prefix=/usr'. As well, the library search path does not include `/usr/local/lib', it is necessary to set the environment variable `LD_LIBRARY_PATH' to include `/usr/local/lib' before running applications of Kyoto Cabinet.
Kyoto Cabinet provides API of the C++ language and it is available by programs conforming to the C++03 standard. As the header files of Kyoto Cabinet are provided as `kcutil.h', `kchashdb.h', and so on, applications should include one or more of them accordingly to use the API. As the library is provided as `libkyotocabinet.a' and `libkyotocabinet.so' and they depends `libz.so', `libstdc++.so', `librt.so', `libpthread.so', `libm.so', and `libc.so', linker options corresponding to them are required by the build command. The typical build command is the following.
g++ -I/usr/local/include example.cc -o example \ -L/usr/local/lib -lkyotocabinet -lz -lstdc++ -lrt -lpthread -lm -lc
This section describes how to use Kyoto Cabinet with the command line utilities and some sample application programs.
To begin with, let's build a file hash database with the command line utility `kchashmgr'. The database stores a list of staffs of a company. Each record is composed of the key and the value. The key is the staff ID and the value is person's name.
The database must be created just one time before any database operation. Let's create a database file "casket.kch" with the default configuration.
$ kchashmgr create staffs.kch
Register some staffs into the database.
$ kchashmgr set staffs.kch 1001 "George Washington" $ kchashmgr set staffs.kch 1002 "John Adams" $ kchashmgr set staffs.kch 1003 "Thomas Jefferson" $ kchashmgr set staffs.kch 1004 "James Madison"
Check the current contents.
$ kchashmgr list -pv staffs.kch 1001 George Washington 1002 John Adams 1003 Thomas Jefferson 1004 James Madison
To retrieve the value of a record, search the database with the key.
$ kchashmgr get staffs.kch 1003 Thomas Jefferson
Of cource, you can remove a record with the key.
$ kchashmgr remove staffs.kch 1003
That's all for the fundamental operations. The DBM family have been improving performance thanks to discarding the functionality.
Next, let's write a sample application program handling a file hash database. See the following source code.
#include <kchashdb.h>
using namespace std;
using namespace kyotocabinet;
int main(int argc, char** argv) {
// create the database object
HashDB db;
// open the database
if (!db.open("casket.kch", HashDB::OWRITER | HashDB::OCREATE)) {
cout << "open error: " << db.error().string() << endl;
}
// store records
if (!db.set("foo", "hop") ||
!db.set("bar", "step") ||
!db.set("baz", "jump")) {
cout << "set error: " << db.error().string() << endl;
}
// retrieve records
string* value = db.get("foo");
if (value) {
cout << *value << endl;
delete value;
} else {
cout << "get error: " << db.error().string() << endl;
}
// traverse records
class Traverser : public DB::Visitor {
const char* visit_full(const char* kbuf, size_t ksiz,
const char* vbuf, size_t vsiz, size_t *sp) {
cout << string(kbuf, ksiz) << ":" << string(vbuf, vsiz) << endl;
return NOP;
}
} traverser;
db.iterate(&traverser, false);
// close the database
if (!db.close()) {
cout << "close error: " << db.error().string() << endl;
}
return 0;
}
Save the above code as a file "example.cc". Then, perform the following command line. The command `kcutilcodec conf' prints the building configuration.
$ g++ `kcutilcodec conf -i` -o example example.cc `kcutilcodec conf -l`
Execute the application program built by the above.
$ ./example hop foo:hop bar:step baz:jump
The API of the file hash database is defined in the header `kchash.h'. So, include the header near the front of a source file. All symbols of Kyoto Cabinet are packaged in the name space `kyotocabinet'. So, import the name space is useful.
#include <kchashdb.h> using namespace kyotocabinet;
The class `HashDB' contains all functionality of the file hash database and each instance expresses a file hash database file.
HashDB db;
Each database file must be opened by the `open' method before any database operation. The flag `HashDB::OWRITER' means the process will update the database. The flag `HashDB::OCREAT' means a database file will be created if it does not exist yet.
db.open("casket.kch", HashDB::OWRITER | HashDB::OCREATE);
Every opened database must be closed by the `close' method when it is no longer in use. Closing the database is very important to avoid data corruption and memory leak.
db.close();
To store a record, use the `set' method with the key and the value.
db.put("foo", "hop");
To retrieve the value of a record, use the `get' method with the key. The return value is NULL if no record corresponds to the key. On success, the return value is the pointer to a dynamic object and it should be deleted explicitly after the use.
string* value = db.get("foo", "hop");
if (value) {
cout << *value << endl;
delete value;
}
Except for `set' and `get', there are other methods; `add', `append', `remove'. Each method has two versions; for `std::string' parameters and for `char*' and `size_t' parameters.
Traversing records is a bit complicated task. It needs an object for call back operations. The class must be implement the interface `DB::Visitor'. The method `visit_full' is called for each record. The parameters specify the pointer to key buffer, the length of the key buffer, the pointer to the value buffer, the length of the value buffer, and the pointer to the variable to notice the length of the return value. The return value is `NOP' for no update, `REMOVE' to remove the record, or otherwise the pointer to the buffer contains arbitrary byte pattern to update the value. The `iterate' method begins the iteration. The second parameter is true for updating operation or false for non-updating operation.
class Traverser : public DB::Visitor {
const char* visit_full(const char* kbuf, size_t ksiz,
const char* vbuf, size_t vsiz, size_t *sp) {
cout << string(kbuf, ksiz) << ":" << string(vbuf, vsiz) << endl;
return NOP;
}
} traverser;
db.iterate(&traverser, false);
Though the `accept' method did not appear in the above sample, it is very useful. It calls an arbitrary function with the same signature of the above iterator for a specified record. As it is, almost all built-in operations as `set' and `get' are implemented with the `accept' method. The following code is to store a record.
class Setter : public DB::Visitor {
public:
Setter(const char* vbuf, size_t vsiz) : vbuf_(vbuf), vsiz_(vsiz) {}
private:
const char* visit_full(const char* kbuf, size_t ksiz,
const char* vbuf, size_t vsiz, size_t* sp) {
*sp = vsiz_;
return vbuf_;
}
const char* visit_empty(const char* kbuf, size_t ksiz, size_t* sp) {
*sp = vsiz_;
return vbuf_;
}
const char* vbuf_;
size_t vsiz_;
};
Setter setter("foo", 3);
accept(kbuf, ksiz, &setter, true);
Not only internal iterator by the `iterate' method but also external iterator by the `Cursor' class is provided. While the internal iterator scans the whole database atomically, the external iterator visits each record by each method call. Cursor also has the `accept' method and you can perform arbitrary operation for each record.
DB::Cursor* cur = db.cursor();
cur->jump();
string* key;
while ((key = cur->get_key()) != NULL) {
cout << *key << endl;
delete key;
cur->step();
}
delete cur;
You can use the following classes in the same way.
| class | file | description |
|---|---|---|
ProtoHashDB |
kcprotodb.h |
prototype hash database
on-memory database implemented with std::unorderd_map of STL
|
ProtoTreeDB |
kcprotodb.h |
prototype tree database
on-memory database implemented with std::map of STL
|
CacheDB |
kccachedb.h |
cache database
on-memory database featuring LRU deletion
|
HashDB |
kchashdb.h |
file hash database
file database of hash table; typical DBM
|
TreeDB |
kctreedb.h |
file tree database
file database of B+ tree; DBM with order
|
PolyDB |
kcpolydb.h |
polymorphic database
dynamic binding of the above databases
|
Each database has different features in persistence, algorithm, time complexity, record sequence, and lock model for concurrency.
ProtoHashDB |
ProtoTreeDB |
CacheDB |
HashDB |
TreeDB |
|
| persistence | volatile | volatile | volatile | persistent | persistent |
| algorithm | hash table | red black tree | hash table | hash table | B+ tree |
| complexity | O(1) | O(log N) | O(1) | O(1) | O(log N) |
| sequence | undefined | lexical order | undefined | undefined | custom order |
| lock unit | whole (rwlock) | whole (rwlock) | record (mutex) | record (rwlock) | page (rwlock) |
Please see the the API documents for detail. Writing your own sample application is the best way to learn this library.
This section describes how to use command line utilities. They are useful to manage database contents and to test the library and its applications.
The command `kcutiltest' is a utility for facility test and performance test of the utility functions. This command is used in the following format. `rnum' specifies the number of iterations. `path' specifies the path of a file.
kcutiltest mutex [-th num] [-iv num] rnumkcutiltest file [-th num] [-rnd] [-msiz num] path rnumkcutiltest map [-rnd] rnumkcutiltest misc rnumOptions feature the following.
This command returns 0 on success, another on failure.
The command `kcutilcodec' is a tool to use encoding and decoding features, or to show the configuration. This command is used in the following format.
kcutilcodec conf [-v|-i|-l|-p]Options feature the following.
This command returns 0 on success, another on failure.
The command `kcprototest' is a utility for facility test and performance test of the prototype database. This command is used in the following format. `rnum' specifies the number of iterations.
kcprototest order [-tree] [-th num] [-rnd] [-etc] [-tran] rnumkcprototest queue [-tree] [-th num] [-it num] [-rnd] rnumkcprototest wicked [-tree] [-th num] [-it num] rnumkcprototest tran [-tree] [-th num] [-it num] rnumOptions feature the following.
This command returns 0 on success, another on failure.
The command `kccachetest' is a utility for facility test and performance test of the cache database. This command is used in the following format. `rnum' specifies the number of iterations.
kccachetest order [-th num] [-rnd] [-etc] [-tran] [-bnum num] [-capcnt num] [-capsiz num] rnumkccachetest queue [-th num] [-it num] [-rnd] [-bnum num] [-capcnt num] [-capsiz num] rnumkccachetest wicked [-th num] [-it num] [-bnum num] [-capcnt num] [-capsiz num] rnumkccachetest tran [-th num] [-it num] [-bnum num] [-capcnt num] [-capsiz num] rnumOptions feature the following.
This command returns 0 on success, another on failure.
The command `kchashtest' is a utility for facility test and performance test of the file hash database. This command is used in the following format. `path' specifies the path of a database file. `rnum' specifies the number of iterations.
kchashtest order [-th num] [-rnd] [-set|-get|-getw|-rem|-etc] [-tran] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-msiz num] [-dfunit num] [-erv] path rnumkchashtest queue [-th num] [-it num] [-rnd] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-msiz num] [-dfunit num] [-erv] path rnumkchashtest wicked [-th num] [-it num] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-msiz num] [-dfunit num] [-erv] path rnumkchashtest tran [-th num] [-it num] [-hard] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-msiz num] [-dfunit num] [-erv] path rnumOptions feature the following.
This command returns 0 on success, another on failure.
The command `kchashmgr' is a utility for test and debugging of the file hash database and its applications. `path' specifies the path of a database file. `key' specifies the key of a record. `value' specifies the value of a record. `file' specifies the input file.
kchashmgr create [-otr] [-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] pathkchashmgr inform [-onl|-otl|-onr] [-st] pathkchashmgr set [-onl|-otl|-onr] [-add|-app|-inci|-incd] [-sx] path key valuekchashmgr remove [-onl|-otl|-onr] [-sx] path keykchashmgr get [-onl|-otl|-onr] [-sx] [-px] [-pz] path keykchashmgr list [-onl|-otl|-onr] [-max num] [-sx] [-pv] [-px] path [key]kchashmgr import [-onl|-otl|-onr] [-sx] path [file]kchashmgr defrag [-onl|-otl|-onr] pathkchashmgr check [-onl|-otl|-onr] pathOptions feature the following.
This command returns 0 on success, another on failure.
The command `kctreetest' is a utility for facility test and performance test of the file tree database. This command is used in the following format. `path' specifies the path of a database file. `rnum' specifies the number of iterations.
kctreetest order [-th num] [-rnd] [-set|-get|-getw|-rem|-etc] [-tran] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-msiz num] [-dfunit num] [-pccap num] [-rcd] [-erv] path rnumkctreetest queue [-th num] [-it num] [-rnd] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-msiz num] [-dfunit num] [-pccap num] [-rcd] [-erv] path rnumkctreetest wicked [-th num] [-it num] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-msiz num] [-dfunit num] [-pccap num] [-rcd] [-erv] path rnumkctreetest tran [-th num] [-it num] [-hard] [-oat|-onl|-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-msiz num] [-dfunit num] [-pccap num] [-rcd] [-erv] path rnumOptions feature the following.
This command returns 0 on success, another on failure.
The command `kctreemgr' is a utility for test and debugging of the file tree database and its applications. `path' specifies the path of a database file. `key' specifies the key of a record. `value' specifies the value of a record. `file' specifies the input file.
kctreemgr create [-otr] [-onl|-otl|-onr] [-apow num] [-fpow num] [-ts] [-tl] [-tc] [-bnum num] [-psiz num] [-rcd] pathkctreemgr inform [-onl|-otl|-onr] [-st] pathkctreemgr set [-onl|-otl|-onr] [-add|-app|-inci|-incd] [-sx] path key valuekctreemgr remove [-onl|-otl|-onr] [-sx] path keykctreemgr get [-onl|-otl|-onr] [-sx] [-px] [-pz] path keykctreemgr list [-onl|-otl|-onr] [-max num] [-sx] [-pv] [-px] path [key]kctreemgr import [-onl|-otl|-onr] [-sx] path [file]kctreemgr defrag [-onl|-otl|-onr] pathkctreemgr check [-onl|-otl|-onr] pathOptions feature the following.
This command returns 0 on success, another on failure.
The command `kcpolytest' is a utility for facility test and performance test of the polymorphic database. This command is used in the following format. `path' specifies the path of a database file. `rnum' specifies the number of iterations.
kcpolytest order [-th num] [-rnd] [-set|-get|-getw|-rem|-etc] [-tran] [-oat|-onl|-onl|-otl|-onr] path rnumkcpolytest queue [-th num] [-it num] [-rnd] [-oat|-onl|-onl|-otl|-onr] path rnumkcpolytest wicked [-th num] [-it num] [-oat|-onl|-onl|-otl|-onr] path rnumkcpolytest tran [-th num] [-it num] [-hard] [-oat|-onl|-onl|-otl|-onr] path rnumOptions feature the following.
This command returns 0 on success, another on failure.
The command `kcpolymgr' is a utility for test and debugging of the file hash database and its applications. `path' specifies the path of a database file. `key' specifies the key of a record. `value' specifies the value of a record. `file' specifies the input file.
kcpolymgr create [-otr] [-onl|-otl|-onr] pathkcpolymgr inform [-onl|-otl|-onr] [-st] pathkcpolymgr set [-onl|-otl|-onr] [-add|-app|-inci|-incd] [-sx] path key valuekcpolymgr remove [-onl|-otl|-onr] [-sx] path keykcpolymgr get [-onl|-otl|-onr] [-sx] [-px] [-pz] path keykcpolymgr list [-onl|-otl|-onr] [-max num] [-sx] [-pv] [-px] path [key]kcpolymgr import [-onl|-otl|-onr] [-sx] path [file]kcpolymgr defrag [-onl|-otl|-onr] pathkcpolymgr check [-onl|-otl|-onr] pathOptions feature the following.
Kyoto Cabinet is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
Kyoto Cabinet is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see `http://www.gnu.org/licenses/'.
Kyoto Cabinet was written by Mikio Hirabayashi. You can contact the author by e-mail to `hirarin@gmail.com'.