The Story of GraphGen

Posted by Michael Hunger on Nov 1, 2014 in community, development, neo4j

This is the story behind the really useful and ingenious Neo4j example graph data generator developed by Christophe Willemsen.

I don’t just want to show you the tool but also tell the story of how it came to be.

First of all: The Neo4j Community is awesome.
There are so many enthusiastic and creative people that it is often humbling for me to be part of it.

So on October 1st, Christophe tweeted out a short screencast he had recorded about a new tool he was developing (NeoGen), which converted a YAML domain specification into Cypher statements to populate a Neo4j database.

Read more…



Posted by Michael Hunger on Oct 18, 2014 in cypher, import, neo4j

I have to admit that using our LOAD CSV facility is trickier than you and I would expect.
Several people ran into issues that they could not solve on their own.

My first blog post on LOAD CSV is still valid in its own right and contains important aspects that I won’t repeat here.
These cover data quality checking (broken CSV files, misspelt header names or incorrect data types) as well as the concern of transaction size, where PERIODIC COMMIT comes to the rescue.

To address the most frequent issues and questions, I decided to write this follow-up post.

In general you might have a better experience using Neo4j-Enterprise, as it contains some components which are more memory efficient.

If you want to import much more than 10-15 million lines of data, you might consider using our non-transactional batch-insertion facilities:

Read more…


Flexible Neo4j Batch Import with Groovy

Posted by Michael Hunger on Oct 9, 2014 in import, neo4j

You might have data as CSV files to create nodes and relationships from in your Neo4j Graph Database.
It might be a lot of data, like many tens of millions of lines.
Too much for LOAD CSV to handle transactionally.

Usually you can just fire up my batch-importer and prepare node and relationship files that adhere to its input format requirements.

Your Requirements

There are some things you probably want to do differently than the batch-importer does by default:

  • not create legacy indexes

  • not index properties at all that you just need for connecting data

  • create schema indexes

  • skip certain columns

  • rename properties from the column names

  • create your own labels based on the data in the row

  • convert column values into Neo4j types (e.g. split strings or parse JSON)
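Several of the requirements above (skipping columns, renaming properties, deriving labels from row data, converting values) boil down to a per-row transformation. The following is only a minimal sketch of that idea, written in Python rather than the Groovy the post title refers to. All column names, the rename map and the label rule are hypothetical and do not come from the batch-importer.

```python
import json

# Hypothetical configuration for illustration only.
SKIP = {"internal_id"}            # columns to drop entirely
RENAME = {"PersonName": "name"}   # column name -> property name

def transform(row):
    """Turn one CSV row (a dict of strings) into (labels, properties)."""
    props = {}
    for column, value in row.items():
        if column in SKIP or column == "type":
            continue  # "type" is only used to derive a label below
        key = RENAME.get(column, column)
        if column == "tags":
            value = value.split(";")    # split a delimited string into a list
        elif column == "scores":
            value = json.loads(value)   # parse an embedded JSON array
        props[key] = value

    # Derive labels from the data in the row.
    labels = ["Person"]
    if row.get("type"):
        labels.append(row["type"].capitalize())
    return labels, props

example_row = {
    "internal_id": "7",
    "PersonName": "Ann",
    "type": "developer",
    "tags": "a;b",
    "scores": "[1, 2]",
}
labels, props = transform(example_row)
```

Keeping the transformation in one small function makes it easy to test against a handful of rows before running a full import.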

Read more…


LOAD CSV into Neo4j quickly and successfully

Posted by Michael Hunger on Jun 25, 2014 in cypher, import

Since version 2.1, Neo4j provides out-of-the-box support for CSV ingestion. The LOAD CSV command that was added to the Cypher query language is a versatile and powerful ETL tool.
It allows you to ingest CSV data from any URL into a friendly parameter stream for your simple or complex graph update operation, that … conversion.

But hear my words of advice before you jump directly into using it. There are some tweaks and configuration aspects that you should know to be successful on the first run.

Data volume: LOAD CSV was built to support around 1M rows per import. It still works with 10M rows, but you have to wait a bit; at 100M it’ll try your patience.
Except for tiny datasets, never run it without the safeguard of periodic commits, which prevent large transactions from overflowing your available database memory (JVM heap).

The CSV used in this example is pretty basic, but enough to show some issues and make a point. It lists people and the companies they work(ed) for.

PersonName,"Company Name",year
"Kenny Bastani","Neo Technology",2013
"Michael Hunger","Neo Technology",2010
"James Ward","Heroku",2011
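
Before importing, it can help to sanity-check a file like this outside the database. The sketch below is only an illustration using Python’s standard csv module, not part of LOAD CSV itself. The function name check_rows and its expected_headers parameter are made up for this example. It checks that the expected header names are present (note the space in "Company Name") and that the year column can be converted to an integer, since every CSV value arrives as a string.

```python
import csv
import io

# The sample data from above, inlined so the sketch is self-contained.
SAMPLE = '''PersonName,"Company Name",year
"Kenny Bastani","Neo Technology",2013
"Michael Hunger","Neo Technology",2010
"James Ward","Heroku",2011
'''

def check_rows(text, expected_headers=("PersonName", "Company Name", "year")):
    """Return (rows, problems) after basic sanity checks on the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []

    # Misspelt or missing header names are a common source of trouble.
    missing = [h for h in expected_headers if h not in (reader.fieldnames or [])]
    if missing:
        problems.append(f"missing headers: {missing}")

    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            row["year"] = int(row["year"])  # values are strings until converted
        except (KeyError, TypeError, ValueError):
            problems.append(f"line {line_no}: bad year {row.get('year')!r}")
            continue
        rows.append(row)
    return rows, problems

rows, problems = check_rows(SAMPLE)
```

Running the same check on the real file before an import surfaces header typos and non-numeric years early, when they are still cheap to fix.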

Read more…


Rendering a Neo4j Database in UbiGraph

Posted by Michael Hunger on Jun 23, 2014 in cypher, server

I had never heard of UbiGraph before, but this tweet by @a61dr41n made me curious.

So I checked it out. UbiGraph is a graph rendering server that is controlled remotely and also interactively with an XML-RPC API (which is a weird choice).
It comes with example clients in Java, Python, Ruby and C.
You can download it from here. After unzipping the file and starting bin/ubigraph_server &, you should see a black window rendering the void, waiting for your commands.

Read more…


Presentation: “Using AsciiArt to Analyse your SourceCode with Neo4j and OSS Tools” at GeekOut.ee 2014

Posted by Michael Hunger on Jun 15, 2014 in conference, neo4j, programming languages

During the awesome GeekOut conference organized by my friends at ZeroTurnaround I was asked to stand in for Tim Fox who couldn’t come.

So instead of using an existing presentation, I decided to finally write one up overnight that covers one aspect of graph databases that is close to my heart:

Software Analytics with Graphs

When I first learned about Neo4j in 2008, my first project was pulling in Java class-file information into Neo4j, to find interesting tidbits about the JDK. Fast forward 4 years.

Other things kept me busy until 2012, when I was speaking at an InnoQ tech-day and thought this would be a good topic to talk about.

I was so amazed by projects that others did in this area that I published a blog post on “Graph Databases and Software Metrics” to show what I’d found. These were:

  • Raoul-Gabriel Urma: Expressive and Scalable Source Code Queries with Graph Databases (Paper)
  • Rickard Öberg: NeoMVN, tracing Maven dependencies (GitHub)
  • Pavlo Baron: Graphlr, an ANTLR storage in Neo4j (GitHub)

During a two-hour train ride with my friend Dirk from Buschmais, he got the full load of my excitement about this topic, and he saw a really good practical use for it in his daily work with large software projects. Having your project’s structure in a graph allows you to:

  1. Query the graph structures for insights on the code level (e.g. code-smells)
  2. Enrich the graph structure with higher level, technical, architectural and business concepts
  3. Define rules and metrics based on those higher level concepts
  4. Run the parsing, enrichment, metrics computation and rule checking as part of your build process, generating reports and failing the build when those rules are violated

All those ideas resulted in an impressive open-source project called jQAssistant which does all of the above (and much more).

So back to my GeekOut presentation. I sat down late at night until 3am and wrote it up in AsciiDoc (+ deck.js), so you can fork it from the GitHub repo, download the PDF or view the HTML slides online.

The session has been recorded, and I’ll embed the video as soon as it is online. Then you can even listen to my hoarse voice.


Styling Neo4j Server Visualisation

Posted by Michael Hunger on Jun 3, 2014 in neo4j, server

To give you a head start when using Neo4j-Browser I wanted to share these quick tips for styling and querying.


Read more…


Using LOAD CSV to import Git History into Neo4j

Posted by Michael Hunger on Jun 1, 2014 in cypher, neo4j

In this blog post, I want to show the power of LOAD CSV, which is much more than just a simple data ingestion clause for Neo4j’s Cypher.
I want to demonstrate how easy it is to use by importing a project’s git commit history into Neo4j. For demonstration purposes, I use Neo4j’s repository on GitHub, which contains about 27,000 commits.

It all started with this tweet by Paul Horn, a developer from Avantgarde Labs in my lovely Dresden.

I really liked the idea and wanted to take a look. His Python script takes the following approach:

Read more…


Importing Forests into Neo4j

Posted by Michael Hunger on Apr 10, 2014 in cypher, neo4j

Sometimes you don’t see the forest for the trees. But if you do, you probably use a graph database.

Giant Tree

Trees are among the simplest graph data structures, a special case of directed acyclic graphs (DAGs).

For our example we use a time-tree that we want to import into the database.

Data Volume

A quick Soulver script (thanks Mark) later, we know how many nodes and rels (nodes-1) we will have to import to represent a full year down to the second level.

1 year = 12 months = 365 days = 8,760 hours = 525,600 minutes = 31,536,000 seconds

So we have to import about 32M nodes and 32M relationships. Sounds good enough.
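
The estimate can be double-checked with a few lines of arithmetic. This is a small sketch, assuming a non-leap year and one node per unit at each level of the tree (year, month, day, hour, minute, second). The variable names are chosen for illustration only.

```python
# Number of nodes at each level of the time tree for one non-leap year.
levels = {
    "year": 1,
    "month": 12,
    "day": 365,
    "hour": 365 * 24,
    "minute": 365 * 24 * 60,
    "second": 365 * 24 * 60 * 60,
}

nodes = sum(levels.values())
relationships = nodes - 1  # a tree always has one fewer edge than nodes

print(f"{nodes:,} nodes, {relationships:,} relationships")
```

The second level alone accounts for 31,536,000 nodes, so the total of 32,070,738 nodes is dominated by the leaves of the tree.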

Read more…


Sampling A Neo4j Database

Posted by Michael Hunger on Mar 25, 2014 in cypher, neo4j

After reading the interesting blog post of my colleague Rik van Bruggen on “Media, Politics and Graphs”, I thought it would be really cool to render it as a GraphGist, especially as he had already shared all the queries as a GitHub Gist.


Unfortunately the dataset was a bit large for a sensible GraphGist representation, so I thought about means of extracting a smaller sample of his raw data that he made available (see his blog post for the link).

Read more…

Copyright © 2007-2014 Better Software Development All rights reserved.