Tuesday, December 27, 2011

HBASE INDEXING LIBRARY: Copied from http://www.lilyproject.org/

WHAT?

This library aids in building and querying indexes on top of HBase, in the style of the Google App Engine datastore. In practice this means that queries are answered by range-scanning specially constructed index tables.

The goal of this library is to hide the details of working with HBase's byte[] row keys to construct indexes. Actually pushing data into the index is not handled by this package.
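
To give a feel for the kind of byte[] handling involved, here is a minimal sketch, not taken from the library, of one common way to encode an integer so that HBase's unsigned byte-wise row key ordering matches numeric ordering, with a bit-inverted variant for descending order. The names OrderedKeySketch, encodeAscending, encodeDescending and compareUnsigned are hypothetical, and the library's actual encoding may differ.

```java
import java.util.Arrays;

class OrderedKeySketch {
    /** Encodes an int so that unsigned lexicographic byte order matches numeric order. */
    static byte[] encodeAscending(int value) {
        int flipped = value ^ 0x80000000; // flip the sign bit so negative values sort first
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8), (byte) flipped
        };
    }

    /** Inverts every bit of the ascending encoding, which reverses the sort order. */
    static byte[] encodeDescending(int value) {
        byte[] bytes = encodeAscending(value);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) ~bytes[i];
        }
        return bytes;
    }

    /** Compares byte arrays as unsigned bytes, left to right, like HBase row keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Integer[] values = {3, -5, 0, Integer.MAX_VALUE, Integer.MIN_VALUE};
        // Sort by the encoded keys only; the numeric order falls out of the encoding.
        Arrays.sort(values, (x, y) -> compareUnsigned(encodeAscending(x), encodeAscending(y)));
        System.out.println("ascending keys:  " + Arrays.toString(values));
        Arrays.sort(values, (x, y) -> compareUnsigned(encodeDescending(x), encodeDescending(y)));
        System.out.println("descending keys: " + Arrays.toString(values));
    }
}
```

Without the sign-bit flip, negative numbers would sort after positive ones, because their two's-complement representation starts with a set high bit.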

This library is complementary to the tableindexed contrib module of HBase, which does not do much regarding row key construction, although at this time the two have not been specifically designed to work together.

Some background on the indexing approach in this package can be found in this blog post.

FEATURES

define indexes consisting of one or more fields
fields can be high-level datatypes like strings, integers, floats, decimals, date(time)s
strings can be indexed as plain UTF8, ASCII-folded (= accents removed), or by collator key
the entries in the index can be in ascending or descending order
perform equals and range queries on all or a subset of the fields in the index (fields must be used left to right, and only the last field used can have a range condition)
results from multiple indexes can be streamingly merge-joined
unrestricted scalability due to the automatic sharding (and replication) offered by HBase
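
As an illustration of the streaming merge-join idea from the list above, here is a minimal sketch that intersects two streams of identifiers. It assumes both streams are sorted ascending by identifier; how the library itself arranges or guarantees that ordering is not described here. The class and method names (MergeJoinSketch, mergeJoin) and the sample identifiers are hypothetical and are not part of the library's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class MergeJoinSketch {
    /**
     * Intersects two identifier streams that are both sorted ascending.
     * Only the side that is behind is advanced, so neither input is buffered.
     * The matches are collected in a list here purely for demonstration.
     */
    static List<String> mergeJoin(Iterator<String> left, Iterator<String> right) {
        List<String> matches = new ArrayList<>();
        String l = left.hasNext() ? left.next() : null;
        String r = right.hasNext() ? right.next() : null;
        while (l != null && r != null) {
            int cmp = l.compareTo(r);
            if (cmp == 0) {
                // Present in both streams: emit and advance both sides.
                matches.add(l);
                l = left.hasNext() ? left.next() : null;
                r = right.hasNext() ? right.next() : null;
            } else if (cmp < 0) {
                l = left.hasNext() ? left.next() : null;
            } else {
                r = right.hasNext() ? right.next() : null;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-ins for the results of two separate index queries.
        List<String> first = Arrays.asList("id-ape", "id-bee", "id-cat", "id-dog");
        List<String> second = Arrays.asList("id-bee", "id-bird", "id-dog");
        System.out.println(mergeJoin(first.iterator(), second.iterator()));
    }
}
```

Because each step only looks at the current element of each stream, memory use stays constant regardless of how many entries the two queries return.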

NEWS

April 2011

The indexmeta table, which contained the index definition for each index, has been dropped. The index definition is now simply stored as a custom attribute in the HBase table descriptor.

This has several advantages:

the indexmeta table and the actual index tables cannot get out of sync
the indexmeta table was trivially small and hardly queried, which does not help to spread load evenly on small HBase clusters.
the index definition (json) is now visible in the HBase web ui or shell.
the indexmeta table does not need to be created (one step less in setting things up).

Migration is easy (just take the definition out of the indexmeta table and put it in a LINK_INDEX attribute on the corresponding index table), and a tool is available to do this. From the Lily source tree, after building, do:

cd global/hbaseindex
mvn exec:java -Dexec.mainClass=org.lilyproject.hbaseindex.IndexMetaTableMigrationTool -Dzookeeper=localhost -Dtable=indexmeta

Change localhost to the name of your ZooKeeper server (or a comma-separated list of servers, although for this short-running tool that should not matter much). Note that this tool will disable and re-enable each of the index tables. The indexmeta table is not removed automatically; you can do that yourself after verifying that everything is fine.

As part of this change, the API for creating indexes has changed slightly, and the constructor which took the name of the indexmeta table was dropped. Creating indexes is now done through IndexManager.getIndex(IndexDefinition).

These changes were introduced in r4773.

January 2011

An important performance issue was fixed: now that we started making some real use of this library, we noticed a problem in the setup of the HBase scanners, causing the scan to always run until the end of the table.
Support for HBase multi-put: instead of adding index entries one by one, multiple ones can now be added in one call. These are added using one HBase put operation, which can be much faster than doing individual put operations.
A simple performance testing tool was added; see the description further on in this document.

DOWNLOAD

July 22, 2010: Subversion access

The latest HBase indexing library is now available from Lily's source tree. The downloads will no longer be maintained.

The project can be found in the global/hbaseindex subdirectory of the Lily source tree. Its dependencies can be found in the file global/hbaseindex/pom.xml.

To get & build the code, use:

svn checkout http://dev.outerthought.org/svn_public/outerthought_lilyproject/trunk/ lily-trunk
cd lily-trunk
mvn -Pfast install

Old downloads

The download includes source code, binary builds and javadocs, all under the Apache License.

April 6, 2010 snapshot

Download (application/x-gzip, 1.2 MB, info)

Indexes created with the previous snapshot are not compatible with those created by this release.

New in this snapshot:

support for variable-length strings. You do not need to set the length for string fields anymore, and there is no string-field-padding done anymore.
allow to store arbitrary key-value pairs with an index entry, which are stored as columns within the index table and can be retrieved upon index querying.
the build uses the Maven-based HBase.

February 22, 2010 snapshot

Download (application/x-gzip, 1.9 MB, info)

(initial release)

USAGE

Be sure to read the javadocs for the org.lilyproject.hbaseindex package.

If you are using Maven, you can simply declare a dependency on lily-hbaseindex to pull in everything it needs. Otherwise, to see the dependencies, run the following commands in the Lily source tree:

cd global/hbaseindex
mvn dependency:tree

Most of the dependencies will be listed below org.lilyproject:lily-testfw; none of those are needed to use hbaseindex.

Below is a complete sample application showing how to create an index and query it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.lilyproject.hbaseindex.*;

public class Test {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "localhost");

        //
        // Create the IndexManager. This is a lightweight object, so it doesn't have to
        // be managed as a singleton.
        //
        IndexManager indexManager = new IndexManager(config);

        //
        // Create/get an index. If the index already exists, this checks that its definition
        // matches the one provided here. To use an already existing index, you can also get
        // it by name.
        // The name of the created HBase table is the same as the name of the index.
        //
        IndexDefinition indexDef = new IndexDefinition("index1");
        StringIndexFieldDefinition stringField = indexDef.addStringField("stringfield");
        stringField.setCaseSensitive(false);
        Index index = indexManager.getIndex(indexDef);

        //
        // Add entries to the index
        //
        String[] values = {"bird", "brown", "bee", "ape", "dog", "cat"};

        for (String value : values) {
            IndexEntry entry = new IndexEntry();
            entry.addField("stringfield", value);
            entry.setIdentifier(Bytes.toBytes("id-" + value));
            index.addEntry(entry);
        }

        //
        // Query the index
        //
        Query query = new Query();
        query.setRangeCondition("stringfield", "b", "b");
        QueryResult result = index.performQuery(query);

        System.out.println("The identifiers of the matching index entries are:");
        byte[] identifier;
        while ((identifier = result.next()) != null) {
            System.out.println(Bytes.toString(identifier));
        }        
    }
}
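
To see why a query like this can be answered with a single scan, the following minimal sketch simulates an index table with a sorted in-memory map and runs a prefix query over the same sample values. It is an illustration of the range-scanning idea only, not the library's implementation: the class RangeScanSketch, its methods, the row key layout and the lower-casing used to mimic a case-insensitive field are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

class RangeScanSketch {
    // Stand-in for an index table: row key -> identifier, kept sorted by row key.
    private final TreeMap<String, String> rows = new TreeMap<>();

    /** Row key = lower-cased value + separator + identifier, so equal values stay distinct. */
    void addEntry(String value, String identifier) {
        rows.put(value.toLowerCase(Locale.ROOT) + '\u0000' + identifier, identifier);
    }

    /** Returns the identifiers of all entries whose value starts with the given prefix. */
    List<String> prefixQuery(String prefix) {
        String start = prefix.toLowerCase(Locale.ROOT);
        // Stop key just past the prefix range (assumes keys never contain U+FFFF).
        String stop = start + Character.MAX_VALUE;
        // One contiguous slice of the sorted keys is the whole answer.
        return new ArrayList<>(rows.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        RangeScanSketch index = new RangeScanSketch();
        for (String value : new String[] {"bird", "brown", "bee", "ape", "dog", "cat"}) {
            index.addEntry(value, "id-" + value);
        }
        System.out.println(index.prefixQuery("b"));
    }
}
```

The point of the design is that all matching entries sit next to each other in key order, so the query cost depends on the number of results rather than on the size of the table.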

PERFORMANCE TEST

A simple performance testing tool for the hbaseindex library can be found in the Lily source tree in the directory global/hbaseindex-perftest.

The test tool runs in two phases:

first it builds up an index
then it performs various operations on the index: querying and updating

At the time of this writing, the index that is created is called 'perftest1' and contains two fields: a word (selected from a dictionary of about 90000 words) and a randomly generated number. The test performs just two queries: an exact match on the word, and a range search on a prefix of the word. It reads at most 100 results from each query. (These tests should become broader, but you can always adjust the code to match your case.)

Here are the steps to use the tool:

If not done already, check out and compile Lily:

svn checkout http://dev.outerthought.org/svn_public/outerthought_lilyproject/trunk/ lily-trunk
cd lily-trunk
mvn -Pfast install

You need to have some HBase installation available. If you do not have any and you are a bit lazy, you can run a test instance like this:

cd global/test-fw
./target/launch-hadoop

and wait a few moments until it shows "Minicluster is up".

Then to use the test tool itself:

cd global/hbaseindex-perftest
./target/hbaseindex-perftest -h

This will show the various options available.

For example, I did a run using:

./target/hbaseindex-perftest --initial-entries 1000000 \
                             --initial-entries-batch 10000 \
                             --loops 100000 \
                             --hbase-metrics \
                             --workers 2 \
                             --zookeeper localhost

The option "--hbase-metrics" will read out the blockCacheHitRatio from your HBase regionserver(s). This requires that you have enabled their JMX ports (without password) on the default port (10102), which can be done in the hbase-env.sh on each of your HBase servers. With the launch-hadoop script mentioned above, this is enabled automatically.

The option 'workers' specifies how many threads to use in the test tool to perform operations concurrently.

The 'loops' option specifies how many iterations to perform in the second phase of the test tool, i.e. of the query/update scenario.

The results of the test are printed to the file HbaseIndexPerfTest-metrics:

tail -f HbaseIndexPerfTest-metrics

Every 30 seconds, a summary of the metrics for the last 30 seconds, as well as the total metrics up to then, is written to this file.

+-----------------------------------------------------------------------------------------------------------------------+
| Interval started at: 2011-01-06T09:25:09.606+01:00 (duration: 30s).
| Measurements started at: 2011-01-06T09:25:09.606+01:00 (duration: 00:00:30)
| HBase cluster status: avg load: 4.00, dead servers: 0, live servers: 1, regions: 4
+------------------------------------+----------+----------+----------+----------+----------+-------------+-------------+
| Name                               | Op count | Average  | Median   | Minimum  | Maximum  | Alltime ops | Alltime avg |
+------------------------------------+----------+----------+----------+----------+----------+-------------+-------------+
|I:Index insert in batch of 1        |       893|      1.50|      0.82|      0.50|    338.23|          893|         1.50|
|I:Index insert in batch of 10000    |   1000000|      0.03|      0.02|      0.01|      0.25|      1000000|         0.03|
|I:Index insert in batch of 5        |      4465|      0.30|      0.18|      0.11|     68.88|         4465|         0.30|
|Single field query # of results     |       894|     11.06|     11.00|      2.00|     21.00|          894|        11.06|
|Q:Single field query duration       |       894|     10.00|      1.65|      0.83|    425.27|          894|        10.00|
|Str rng query # of results          |       893|     95.89|    100.00|      7.00|    100.00|          893|        95.89|
|Q:Str rng query duration            |       893|      9.26|      3.11|      0.86|    403.11|          893|         9.26|
|blockCacheHitRatio                  |         1|     78.00|     78.00|     78.00|     78.00|            1|        78.00|
+------------------------------------+----------+----------+----------+----------+----------+-------------+-------------+
| I ops/sec: 33175.75 interval (29426.43 real), Q ops/sec: 58.97 interval (103.84 real)
+-----------------------------------------------------------------------------------------------------------------------+

The "ops/sec" displayed at the bottom adds up the number of operations labelled with "I:" or "Q:" and divides the sum by the duration of the interval (30 seconds). Since this first interval was spent mostly on the initial construction of the index, the number of queries per second is low. (The difference between the interval and real ops/sec is explained in the metrics file itself.)
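
As a worked example of that calculation, the sketch below sums the "I:" and "Q:" op counts from the first interval above and divides each sum by a nominal 30 seconds. The class and method names are hypothetical. The results (about 33512 and 59.6) come out slightly above the reported 33175.75 and 58.97, which would be consistent with the actual interval having been a little longer than 30 seconds; that is an inference, not something the tool's output states.

```java
class OpsPerSecSketch {
    /** Sums the op counts of one category and divides by the interval length in seconds. */
    static double opsPerSecond(long[] opCounts, double intervalSeconds) {
        long total = 0;
        for (long count : opCounts) {
            total += count;
        }
        return total / intervalSeconds;
    }

    public static void main(String[] args) {
        // Op counts of the "I:" rows in the first interval (batches of 1, 10000 and 5).
        long[] insertOps = {893, 1000000, 4465};
        // Op counts of the "Q:" rows in the first interval (single field and string range).
        long[] queryOps = {894, 893};
        double nominalInterval = 30.0; // seconds, as stated in the text

        System.out.println("I ops/sec: " + opsPerSecond(insertOps, nominalInterval));
        System.out.println("Q ops/sec: " + opsPerSecond(queryOps, nominalInterval));
    }
}
```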

A bit later, once the block cache has filled up:

+-----------------------------------------------------------------------------------------------------------------------+
| Interval started at: 2011-01-06T09:26:40.498+01:00 (duration: 30s).
| Measurements started at: 2011-01-06T09:25:09.606+01:00 (duration: 00:02:00)
| HBase cluster status: avg load: 5.00, dead servers: 0, live servers: 1, regions: 5
+------------------------------------+----------+----------+----------+----------+----------+-------------+-------------+
| Name                               | Op count | Average  | Median   | Minimum  | Maximum  | Alltime ops | Alltime avg |
+------------------------------------+----------+----------+----------+----------+----------+-------------+-------------+
|I:Index insert in batch of 1        |     14712|      0.51|      0.40|      0.29|    184.57|        36186|         0.58|
|I:Index insert in batch of 10000    |         0|      0.00|      0.00|      0.00|      0.00|      1000000|         0.03|
|I:Index insert in batch of 5        |     73565|      0.16|      0.14|      0.07|     37.22|       180930|         0.16|
|Single field query # of results     |     14713|     13.06|     13.00|      3.00|     30.00|        36186|        12.31|
|Q:Single field query duration       |     14713|      1.04|      1.02|      0.38|     50.56|        36186|         1.65|
|Str rng query # of results          |     14713|     95.97|    100.00|      5.00|    100.00|        36186|        95.73|
|Q:Str rng query duration            |     14713|      1.49|      1.37|      0.55|     68.47|        36186|         2.21|
|blockCacheHitRatio                  |         1|     97.00|     97.00|     97.00|     97.00|            4|        91.75|
+------------------------------------+----------+----------+----------+----------+----------+-------------+-------------+
| I ops/sec: 2941.98 interval (4581.17 real), Q ops/sec: 980.67 interval (790.48 real)
+-----------------------------------------------------------------------------------------------------------------------+

This tells us that, in this case, a query takes about 1 to 1.5 ms.

FEEDBACK

We are interested in hearing what you think of this library: feedback is welcome on the Lily mailing list.
