Sometimes you don’t see the forest for the trees. But if you do, you probably use a graph database.
Trees are one of the simpler graph data structures, a special case of directed acyclic graphs (DAGs).
For our example we use a time-tree that we want to import into the database.
1 year = 12 months = 365 days = 8,760 hours = 525,600 minutes = 31,536,000 seconds
So we have to import about 32M nodes and 32M relationships. Sounds good enough.
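The arithmetic behind that estimate can be checked with a short sketch. It assumes a non-leap year, one node per unit on every level (year, month, day, hour, minute, second), and a single relationship from each node to its parent; the names `LEVELS` and `tree_size` are illustrative only.

```python
# Node counts per level of a one-year time-tree (non-leap year assumed).
LEVELS = {
    "year": 1,
    "month": 12,
    "day": 365,
    "hour": 365 * 24,
    "minute": 365 * 24 * 60,
    "second": 365 * 24 * 60 * 60,
}


def tree_size(levels):
    """Return (nodes, relationships) for a tree with the given level sizes."""
    nodes = sum(levels.values())
    # In a tree every node except the root has exactly one link to its parent.
    relationships = nodes - 1
    return nodes, relationships


nodes, relationships = tree_size(LEVELS)
print(f"{nodes:,} nodes, {relationships:,} relationships")
# prints: 32,070,738 nodes, 32,070,737 relationships
```

The seconds level dominates the total, which is why both counts land at roughly 32M.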
After reading the interesting blog post by my colleague Rik van Bruggen on “Media, Politics and Graphs”, I thought it would be really cool to render it as a GraphGist, especially as he had already shared all the queries as a GitHub Gist.
Unfortunately the dataset was a bit large for a sensible GraphGist representation, so I thought about means of extracting a smaller sample of his raw data that he made available (see his blog post for the link).
We want to run some test queries on an existing graph model but have no sample data at hand and also no input files (CSV, GraphML) that would provide it.
Why not quickly create it on our own, using just Cypher? At first I thought about using Cypher to generate CSV files and loading them back, but it is much easier than that.
The domain is simple, (:User)-[:OWN]->(:Product), but good enough for collaborative filtering or demographic analysis.
With Neo4j 2.0 we got automatic schema indexes based on labels and properties for exact lookups of nodes on property values.
Fulltext and other indexes (spatial, range) are on the roadmap but not addressed yet.
For fulltext indexes you still have to use legacy indexes.
As you probably don’t want to add nodes to an index manually, the existing “auto-index” mechanism should be a good fit.
If you want to delete lots of data from a Neo4j database with Cypher
Just stop the server and delete the directory and start again
Fastest way with no leftovers: just delete db/data/graph.db and you’re done.
Cypher Statement before 2.1
“Unknown Error” or OutOfMemoryException is a symptom that your transaction has grown too big and consumes too much memory.
That is unrelated to your configuration; you just have to keep the transaction size in check.
If you want to delete elements in a batched way, use something like this:
MATCH (a)
WITH a LIMIT 10000
OPTIONAL MATCH (a)-[r]-()
DELETE a, r
RETURN count(*);
Run it until the result stays 0. This query finds at most 10000 nodes, then finds all their relationships, and then deletes both. But how would you do it in Neo4j 2.1?
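The "run until the result stays 0" loop can be sketched in Python. This is a minimal sketch that is not tied to any particular driver: `run_query` is a hypothetical callable that executes one Cypher statement and returns the count it reports, `FakeDatabase` is a stand-in used only for the demonstration, and the statement text is an assumption reconstructed from the description above.

```python
# Assumed batch statement, reconstructed from the prose description:
# take at most 10000 nodes, find their relationships, delete both.
BATCH_DELETE = (
    "MATCH (a) WITH a LIMIT 10000 "
    "OPTIONAL MATCH (a)-[r]-() "
    "DELETE a, r RETURN count(*)"
)


def delete_in_batches(run_query, query=BATCH_DELETE, max_rounds=100_000):
    """Re-run the statement until it reports 0; return how many rounds deleted data."""
    rounds = 0
    while rounds < max_rounds:
        if run_query(query) == 0:
            return rounds
        rounds += 1
    raise RuntimeError("batch delete did not finish within max_rounds")


class FakeDatabase:
    """Stand-in for a server holding only isolated nodes (demonstration only)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def run(self, _query):
        # Mimic one batch: remove up to 10000 nodes and report how many went.
        deleted = min(self.nodes, 10000)
        self.nodes -= deleted
        return deleted


db = FakeDatabase(25000)
print(delete_in_batches(db.run))  # prints 3 (batches of 10000, 10000 and 5000)
```

Keeping each round in its own transaction is what bounds the memory use; the `max_rounds` guard is only there so a misbehaving query cannot loop forever.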
State of the Cypher UNION
Neo4j 2.0 introduced the UNION (ALL) clause, which can join the results of two or more complete statements into a single result. Each of the statements is fully formed; it can contain result projection, pagination and
The statements joined by a UNION must return the same number of columns with the same names. UNION by default returns the distinct set of results.
UNION ALL will return the full results (and will be faster and less memory intensive).
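The difference between the two forms, and the column rule, can be illustrated without a database. The sketch below models result rows as Python dicts; `columns`, `union` and the sample rows are made up for illustration and only mimic the semantics described above, not how Neo4j implements them.

```python
def columns(rows):
    """Column names of a result set, taken from its first row."""
    return sorted(rows[0].keys()) if rows else []


def union(first, second, keep_all=False):
    """Combine two result sets like UNION (default) or UNION ALL (keep_all=True)."""
    if first and second and columns(first) != columns(second):
        raise ValueError("UNION needs the same number and names of columns")
    combined = first + second
    if keep_all:
        # UNION ALL: every row is passed through untouched.
        return combined
    # UNION: duplicates are dropped, which requires remembering rows already seen.
    distinct = []
    for row in combined:
        if row not in distinct:
            distinct.append(row)
    return distinct


actors = [{"name": "Alice"}, {"name": "Bob"}]
directors = [{"name": "Carol"}, {"name": "Alice"}]

print(len(union(actors, directors)))                 # prints 3
print(len(union(actors, directors, keep_all=True)))  # prints 4
```

The extra bookkeeping in the distinct branch hints at why the plain UNION costs more time and memory than UNION ALL.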
The transactional HTTP endpoint that was added in Neo4j 2.0 is really easy to use.
You can stream batches of Cypher statements with their parameters to the server and receive the answers in a streaming fashion too.
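As a rough sketch of what such a batch looks like on the wire, the snippet below builds the JSON body for a single request. It assumes the `/db/data/transaction/commit` path on a local server at port 7474 and only constructs the payload; the helper name `build_payload` and the two example statements are illustrative assumptions, and actually sending the request is left out.

```python
import json

# Assumed endpoint of a local Neo4j 2.0 server; adjust host and port as needed.
COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"


def build_payload(statements):
    """Wrap (cypher, parameters) pairs in the 'statements' envelope of the endpoint."""
    return json.dumps({
        "statements": [
            {"statement": cypher, "parameters": params}
            for cypher, params in statements
        ]
    })


# Two example statements; {name} is the parameter placeholder syntax of Neo4j 2.x.
body = build_payload([
    ("CREATE (u:User {name: {name}}) RETURN id(u)", {"name": "Alice"}),
    ("MATCH (u:User) RETURN count(*)", {}),
])
print(body)
```

The resulting string would be sent as the body of a POST request with a JSON content type; all statements in one request run in the same transaction.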
Accessing your beloved Neo4j-Shell via RMI works OK on localhost or in your intranet. But over the internet you don’t really want to expose RMI ports.
So most installations of Neo4j, e.g. on EC2, use basic auth as the simplest security measure.
Also, Neo4j hosting providers like GrapheneDB.com or GraphHost.com offer basic auth by default to “secure” access to your Neo4j instance, hopefully with SSL soon too.
How do you access the Neo4j Shell on the server from your usual terminal command line?
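Whatever client ends up talking to such a server has to send the basic auth credentials with each HTTP request. A minimal sketch of building that header follows; the function name and the credentials are placeholders, and it only illustrates the standard HTTP Basic scheme, not the specific tool this post goes on to describe.

```python
import base64


def basic_auth_header(user, password):
    """Return the HTTP Basic authentication header (base64 of 'user:password')."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("neo4j", "secret")  # placeholder credentials
print(headers)
```

Because base64 is an encoding and not encryption, the credentials are readable by anyone who can see the traffic, which is why SSL matters here.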
Sometimes when you work with the Neo4j community you get handed a database that you know nothing about.
Then it is handy to get an idea of what’s in there: which kinds of node labels are used, what relationship types connect these nodes, and which properties are floating around.
Usually all those answers are just a Cypher statement away:
Labels and their occurrence:
MATCH (n) RETURN labels(n),count(*);
What is connected and how (stolen from Neo4j-Browser):
MATCH (a)-[r]->(b) RETURN labels(a) AS This, type(r) AS To, labels(b) AS That, count(*) AS Count;
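To make clear what these two aggregations compute, here is a small in-memory sketch. The toy graph and all names in it are invented for illustration; it only mirrors the grouping and counting, not how the database evaluates the queries.

```python
from collections import Counter

# Toy graph: node id -> labels, plus (start, type, end) relationships.
nodes = {1: ("User",), 2: ("User",), 3: ("Product",)}
rels = [(1, "OWN", 3), (2, "OWN", 3), (1, "KNOWS", 2)]

# Counterpart of: MATCH (n) RETURN labels(n), count(*)
label_counts = Counter(nodes.values())

# Counterpart of: MATCH (a)-[r]->(b) RETURN labels(a), type(r), labels(b), count(*)
pattern_counts = Counter(
    (nodes[start], rel_type, nodes[end]) for start, rel_type, end in rels
)

for labels, count in label_counts.items():
    print(labels, count)
for (this, to, that), count in pattern_counts.items():
    print(this, to, that, count)
```

Both queries group by everything that is not an aggregate, so each distinct label set (or label, type, label triple) yields one row with its count.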