About Michael Hunger

Posts by Michael Hunger:

 

On Creating a MapDB Schema Index Provider for Neo4j 2.0

on May 11, 2013 in Uncategorized, code, java, neo4j

Writing a Neo4j 2.0 Schema Index Provider for MapDB

Neo4j 2.0 introduced the concept of real automatic indexes with a new underlying indexing subsystem SPI. So I thought it would be really helpful to try it out and provide a faster indexing implementation than the default Lucene one. I chose MapDB for it and the results are available on GitHub.

Using the index is quite easy from Cypher and the other APIs:

// Cypher
CREATE INDEX ON :Label(property)
// e.g.
CREATE INDEX ON :Person(name)

// the index is used automatically, but can be enforced with
MATCH (n:Person)
USING INDEX n:Person(name)
WHERE n.name = "Andres"
RETURN n

// Java
Label LABEL = DynamicLabel.label("foo");
String PROPERTY = "bar";

// Creation
Transaction tx = db.beginTx();
IndexCreator indexCreator = db.schema().indexCreator(LABEL).on(PROPERTY);
IndexDefinition indexDefinition = indexCreator.create();
tx.success(); tx.finish();
db.schema().awaitIndexOnline(indexDefinition, 5, TimeUnit.SECONDS);

// Usage, get Index Information
IndexDefinition index = IteratorUtil.single(db.schema().getIndexes(LABEL));
assertEquals(LABEL.name(), index.getLabel().name());

// Create matching Node
tx = db.beginTx(); // reuse the variable declared above
Node node = db.createNode(LABEL);
node.setProperty(PROPERTY, 42);
tx.success(); tx.finish();

// Find nodes
ResourceIterable<Node> nodes = db.findNodesByLabelAndProperty(LABEL, PROPERTY, 42);

MapDB is a very potent implementation of an efficient in-memory and persistent map structure, either as a B-tree or as a hash map. It supports optimized serialization of arbitrary Java objects including collections, compresses data on the fly (including id-compression) and much more. A very important feature for adding MapDB as an index provider is its support for snapshots.

Its support for transaction-like semantics allows for batch updates, which is also really handy because the index provider applies its updates in batches too.

A code-example from the MapDB website:

    import java.io.File;
    import java.util.concurrent.ConcurrentNavigableMap;
    import org.mapdb.*;

    //Configure and open database using builder pattern.
    DB db = DBMaker.newFileDB(new File("testdb")).closeOnJvmShutdown().make();

    //create new collection (or open existing)
    ConcurrentNavigableMap<Integer,String> map = db.getTreeMap("collectionName");
    map.put(1,"one");
    map.put(2,"two");

    //persist changes into disk, there is also rollback() method
    db.commit();

    db.close();

So choosing MapDB as an index provider was really straightforward. Now the only small task left is to implement the SPI.

The requirements for implementing the SPI are quite simple. We have to tie into Neo4j’s lifecycle management with an IndexProviderFactory that registers the index provider. The provider itself implements SchemaIndexProvider and supplies an IndexPopulator and an IndexAccessor, which handle index updates, and an IndexReader, which has to provide a repeatable-read snapshot view of the data in the index. Actually I just copied the code from org.neo4j.kernel.impl.api.index.InMemoryIndexProvider and adapted it for MapDB.

The MapDbIndexProviderFactory is tiny: it just returns the single instance of MapDbSchemaIndexProvider in the newKernelExtension lifecycle method.

@Service.Implementation(KernelExtensionFactory.class)
public class MapDbIndexProviderFactory extends KernelExtensionFactory<MapDbIndexProviderFactory.Dependencies> {
    public interface Dependencies {}

    private final MapDbSchemaIndexProvider singleProvider;

    public MapDbIndexProviderFactory() {
        // name and version
        super(new SchemaIndexProvider.Descriptor("mapdb-index", "1.0"));
        this.singleProvider = new MapDbSchemaIndexProvider();
    }

    @Override
    public Lifecycle newKernelExtension(Dependencies dependencies) throws Throwable {
        return singleProvider;
    }
}

To register the MapDbIndexProviderFactory we have to provide a file named org.neo4j.kernel.extension.KernelExtensionFactory in META-INF/services that contains the fully qualified name of our Factory, in its role as KernelExtensionFactory, which is: org.neo4j.index.mapdb.MapDbIndexProviderFactory.

The MapDbSchemaIndexProvider extends SchemaIndexProvider. It is also an instance of Lifecycle, so it implements init(), start(), stop() and shutdown(). In the constructor it registers itself with a descriptor and a priority (2 is higher than the default 1 for Lucene) and creates a MapDB database instance which is used later on.

public MapDbSchemaIndexProvider() {
    super(new SchemaIndexProvider.Descriptor("mapdb-index", "1.0"), 2);
    db = DBMaker.newFileDB(new File("mapdb-index"))
      .compressionEnable().closeOnJvmShutdown().make();
}
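
To illustrate why the priority matters, here is a minimal, self-contained sketch. It assumes that the kernel simply picks the registered provider with the highest priority value, as the text above implies. ProviderInfo and ProviderSelection are hypothetical names made up for this example and are not part of the Neo4j API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a registered schema index provider: a name plus a priority.
final class ProviderInfo {
    final String name;
    final int priority;

    ProviderInfo(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }
}

final class ProviderSelection {
    // Returns the provider with the highest priority value (assumed selection rule).
    static ProviderInfo highestPriority(List<ProviderInfo> providers) {
        ProviderInfo best = null;
        for (ProviderInfo candidate : providers) {
            if (best == null || candidate.priority > best.priority) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no providers registered");
        }
        return best;
    }

    public static void main(String[] args) {
        List<ProviderInfo> providers = Arrays.asList(
                new ProviderInfo("lucene", 1),        // default priority mentioned in the text
                new ProviderInfo("mapdb-index", 2));  // our provider registers with 2
        System.out.println(highestPriority(providers).name); // prints "mapdb-index"
    }
}
```

Under that assumption, registering with priority 2 is all it takes for the MapDB provider to win over the default one once both jars are on the classpath.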

It keeps an internal CopyOnWriteHashMap of index instances, each of which is backed by the MapDB tree-map named after its index id. The three methods from SchemaIndexProvider provide access to each concrete index (indexId is unique per declared :Label(property) combination).

@Override
public MapDbIndex getOnlineAccessor(long indexId) {
    MapDbIndex index = indexes.get(indexId);
    if (index == null || index.state != InternalIndexState.ONLINE)
        throw new IllegalStateException("Index " + indexId + " not online yet");
    return index;
}

@Override
public InternalIndexState getInitialState(long indexId) {
    MapDbIndex index = indexes.get(indexId);
    return index != null ? index.state : InternalIndexState.POPULATING;
}

@Override
public MapDbIndex getPopulator(long indexId) {
    BTreeMap<Object,Set<Long>> map = db.getTreeMap(String.valueOf(indexId));
    MapDbIndex index = new MapDbIndex(map,db);
    indexes.put(indexId, index);
    return index;
}

getPopulator returns the IndexPopulator, which is responsible for updating the index. That happens in a separate class called MapDbIndex, which handles the addition and removal of batches of value->nodeId pairs in the MapDB tree-map instance. All of this happens in the implementations of updateAndCommit and recover, which both call update(Iterable updates). That method then looks at the mode of each NodePropertyUpdate to decide whether to add, remove or update data. In this demo I base the implementation on storing Sets of Long values for the node ids. The real implementation is a bit more involved to save space and skip (un-)boxing.

private void add(Object value, Long id) {
    Set<Long> ids = indexData.get(value);
    if (ids == null) ids = new HashSet<Long>();
    ids.add(id);
    indexData.put(value, ids);
}
private void remove(Object value, Long id) {
    Set<Long> ids = indexData.get(value);
    if (ids == null) return;
    ids.remove(id);
    indexData.put(value, ids);
}

public void update(Iterable<NodePropertyUpdate> updates) {
    for (NodePropertyUpdate update : updates) {
        switch (update.getUpdateMode()) {
            case ADDED:
                add(update.getValueAfter(),update.getNodeId());
                break;
            case CHANGED:
                remove(update.getValueBefore(), update.getNodeId());
                add(update.getValueAfter(),update.getNodeId());
                break;
            case REMOVED:
                remove(update.getValueBefore(), update.getNodeId());
                break;
            default:
                throw new UnsupportedOperationException();
        }
    }
    db.commit();
}
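
One way to save space and skip (un-)boxing is to keep the node ids of each value in a sorted primitive long[] instead of a Set of Long. The sketch below is only an illustration of that idea and is not the code from the repository. LongArrayIndex is a hypothetical name, and a plain java.util.TreeMap stands in for the MapDB tree-map so that the example runs on its own.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: value -> sorted long[] of node ids, with a TreeMap standing in for MapDB.
final class LongArrayIndex {
    // Keys must be mutually comparable, as in any sorted map.
    private final Map<Object, long[]> indexData = new TreeMap<>();

    void add(Object value, long nodeId) {
        long[] ids = indexData.get(value);
        if (ids == null) {
            indexData.put(value, new long[] {nodeId});
            return;
        }
        int pos = Arrays.binarySearch(ids, nodeId);
        if (pos >= 0) return;                       // id already indexed for this value
        int insertAt = -(pos + 1);                  // binarySearch encodes the insertion point
        long[] grown = new long[ids.length + 1];
        System.arraycopy(ids, 0, grown, 0, insertAt);
        grown[insertAt] = nodeId;
        System.arraycopy(ids, insertAt, grown, insertAt + 1, ids.length - insertAt);
        indexData.put(value, grown);
    }

    void remove(Object value, long nodeId) {
        long[] ids = indexData.get(value);
        if (ids == null) return;
        int pos = Arrays.binarySearch(ids, nodeId);
        if (pos < 0) return;                        // id was never indexed for this value
        if (ids.length == 1) {
            indexData.remove(value);                // drop the key instead of storing an empty array
            return;
        }
        long[] shrunk = new long[ids.length - 1];
        System.arraycopy(ids, 0, shrunk, 0, pos);
        System.arraycopy(ids, pos + 1, shrunk, pos, ids.length - pos - 1);
        indexData.put(value, shrunk);
    }

    // Returns a copy so callers cannot modify the stored array.
    long[] lookup(Object value) {
        long[] ids = indexData.get(value);
        return ids == null ? new long[0] : ids.clone();
    }

    public static void main(String[] args) {
        LongArrayIndex index = new LongArrayIndex();
        index.add("Andres", 3L);
        index.add("Andres", 1L);
        index.add("Andres", 3L);                    // duplicate, ignored
        System.out.println(Arrays.toString(index.lookup("Andres"))); // prints "[1, 3]"
    }
}
```

Each update copies the array, which keeps the stored values immutable at the cost of some allocation. For values with very many node ids a chunked structure would probably be a better fit.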

The IndexReader must supply a repeatable-read view of the data, which it gets from MapDB’s treeMap.snapshot() facility. So implementing the MapDbIndexReader is not complicated.

// in MapDbIndex
@Override
public IndexReader newReader() {
    return new MapDbIndexReader((BTreeMap<Object, Set<Long>>) indexData.snapshot());
}

private static class MapDbIndexReader implements IndexReader {
    private final BTreeMap<Object, Set<Long>> snapshot;

    // the reader only ever sees the snapshot it was created with
    MapDbIndexReader(BTreeMap<Object, Set<Long>> snapshot) {
        this.snapshot = snapshot;
    }

    @Override
    public Iterator<Long> lookup(Object value) {
        final Set<Long> result = snapshot.get(value);
        return result == null ? IteratorUtil.<Long>emptyIterator() : result.iterator();
    }

}
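
The repeatable-read guarantee can be illustrated without MapDB. In the following sketch a deep copy of a java.util.TreeMap stands in for MapDB’s snapshot(), which is an assumption made purely to keep the example self-contained. SnapshotIndexSketch and its Reader are hypothetical names. The point is only that a reader keeps seeing the data as it was when the reader was created, even if the index is updated afterwards.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch only: a deep copy stands in for MapDB's snapshot facility.
final class SnapshotIndexSketch {
    private final TreeMap<Object, Set<Long>> indexData = new TreeMap<>();

    void add(Object value, long nodeId) {
        indexData.computeIfAbsent(value, k -> new HashSet<>()).add(nodeId);
    }

    // Copies keys and id sets so later writes cannot leak into the reader.
    Reader newReader() {
        TreeMap<Object, Set<Long>> copy = new TreeMap<>();
        for (Map.Entry<Object, Set<Long>> entry : indexData.entrySet()) {
            copy.put(entry.getKey(), new HashSet<>(entry.getValue()));
        }
        return new Reader(copy);
    }

    static final class Reader {
        private final Map<Object, Set<Long>> snapshot;

        Reader(Map<Object, Set<Long>> snapshot) {
            this.snapshot = snapshot;
        }

        Iterator<Long> lookup(Object value) {
            Set<Long> result = snapshot.get(value);
            return result == null ? Collections.<Long>emptyIterator() : result.iterator();
        }
    }

    public static void main(String[] args) {
        SnapshotIndexSketch index = new SnapshotIndexSketch();
        index.add("Andres", 1L);
        Reader reader = index.newReader();
        index.add("Andres", 2L);                    // written after the snapshot was taken
        Iterator<Long> ids = reader.lookup("Andres");
        System.out.println(ids.next() + " " + ids.hasNext()); // prints "1 false"
    }
}
```

A full copy per reader is of course far more expensive than a real snapshot, so this only demonstrates the semantics, not how MapDB achieves them.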

That’s about it.

You just have to clone the repository, build it with mvn package and put the jar file target/mapdb-index-1.0.jar as well as org.mapdb:mapdb:jar:0.9.1 on your classpath or in your server/plugins directory to use the index. For both you can also just use the contents of the generated target/target/mapdb-index-1.0-provider.zip.

So far in tests it was twice as fast as Lucene but there is certainly optimization potential.

In general it was really simple to implement the index provider, so I suggest you go ahead and try it for other NoSQL stores. I would really love to see some other implementations out there.

 

Cool first Neo4j 2.0 milestone – Now with Labels and “real” Indexes

on Apr 10, 2013 in neo4j

With the addition of node labels, the property graph model that is the foundation of Neo4j was changed for the first time. It has already been thirteen years since the founders (Emil, Johan and Peter) sketched the original property graph model over some beers.
With the new node-label feature you can assign any number of types from [...]

 

Parallel Batch Inserter with Neo4j imported 20 billion relationships on EC2

on Oct 27, 2012 in code, java, neo4j

As massive data insertion performance has bothered me for a while, I made it the subject of my last lab days (20% time) at Neo4j. The results of my work are available on GitHub and I explain the approach below.
Data Insertion issues

When getting started with a new database like the graph database Neo4j it is [...]

 

On Streaming Cypher

on Apr 13, 2012 in development, java, neo4j

After being annoyed for a long time about the Neo4j REST protocol performance I decided to have a look at streaming JSON last night. It seemed simple enough.
Today Peter pushed me to continue through and use the Lab day for finishing the lab-project.
So I started to create a server-extension project that does 2 things differently. [...]

 

MovieHackDay Berlin Recap

on Jun 6, 2011 in fun, neo4j

It was really a great event. Perfectly organized by MoviePilot.de (@Jannis) where Pere works as well.

You can find all information about the event on their site moviehackday.com and of course @moviehackday and #moviehackday (wiki)
There was free pizza, drinks (also beer), food coupons and cake from Pere, ping pong, lots of space, great conversations.
Some people have [...]

 

Industrial Grade Barcode Scanning w/ an Apple iPad USB barcode scanner and camera connection kit

on Oct 26, 2010 in fun, iOS

I was pondering the means of how to scan barcodes with an iPad for a while.
Today I found a really cheap solution. Just connect a stock USB keyboard barcode scanner via the camera connection kit USB connector to the iPad, and you’re done.
Now you can scan your book EANs for librarything or other things [...]

 

Keynote at 4developers: The Game Of Life – Java‘s Siblings and Heirs are populating the Ecosystem

on Mar 29, 2010 in code, development, java, programming languages


I was invited to give a keynote talk at the 4developers conference in Poznan, Poland.

I wanted to talk about the Java.next programming languages on the JVM and polyglot programming. When pondering how to address this issue, two things came to my mind.

 

Switching to Wordpress

on Mar 6, 2010 in blogging

After running this blog for some years on Serendipity, I finally switched to WordPress. All my other blogs are running on WordPress, so this was the only black sheep.
I use the Aspire theme for many of my other blogs but for this one a work/desk like theme seemed more appropriate. So I took a quick [...]

 

97TESPK: Scoping Methods

on Mar 5, 2010 in 97TESPK, code, patterns, writing

Now that 97 things every programmer should know lies on my compass table, I’ll post my contributions here that didn’t make it into the book.
The first is “scoping methods” which I thought about while reading Uncle Bob Martin’s Clean Code. He discussed scoping variables but only about putting methods near to each other. Obviously there [...]

 

12 patterns of development

on Oct 5, 2009 in patterns

Martin Fowler’s pattern talk at #jaoo made me write this “12 patterns of development” song.
Have fun
Michael
On the twelfth day of Development,
my true dev sent to me
Twelve bridges bridging,
Eleven factories making,
Ten observers observing,
Nine builders building,
Eight visitors a-visiting,
Seven composites composing,
Six iterators iterating,
Five golden states,
Four calling proxies,
Three nice adaptors,
Two commands commanding,
And a singleton in a pair tree!

Copyright © 2007-2013 Better Software Development All rights reserved.