Importing Mapping Metaphor into Neo4j

Posted by Michael Hunger on Sep 30, 2017 in Uncategorized

I came across this tweet, which sounded really interesting.

@MappingMetaphor: Metaphor Map now complete! Remaining data now up, and showing nearly 12,000 metaphorical connections: http://mappingmetaphor.arts.gla.ac.uk/

Mapping Metaphor

The Metaphor Map of English shows the metaphorical links which have been identified between different areas of meaning. These links can be from the Anglo-Saxon period right up to the present day so the map covers 1300 years of the English language. This allows us the opportunity to track metaphorical ways of thinking and expressing ourselves over more than a millennium; see the Metaphor in English section for more information.

The Metaphor Map was built as part of the Mapping Metaphor with the Historical Thesaurus project. This was completed by a team in English Language at the University of Glasgow and funded by the Arts and Humanities Research Council from 2012 to early 2015. The Metaphor Map is based on the Historical Thesaurus of English, which was published in 2009 by Oxford University Press as the Historical Thesaurus of the Oxford English Dictionary.

The site is really nice and fun to explore, with an interesting data visualization of the metaphoric connections between areas of language and thought:


When most people think of metaphor, they cast their minds back to school and remember examples from poetry and drama, such as Shakespeare’s “Juliet is the sun”. This is unsurprising; metaphor is usually described as a literary phenomenon used to create arresting images in the mind of the reader. However, linguists would argue that metaphor is far more pervasive within our language and indeed within thought itself.

Natural-language correlation networks are always fun to work with, so let’s have a look at this one in a graph database.

Install Neo4j & APOC

  1. Download and install Neo4j-Desktop from http://neo4j.com/download/other-releases

  2. Create a database and add the APOC procedure library.

  3. I also installed Neo4j Graph Algorithms to use later.

  4. Start the database.

Download Data

All the data is available from:
Mapping Metaphor with the Historical Thesaurus. 2015. Metaphor Map of English. Glasgow: University of Glasgow. http://mappingmetaphor.arts.gla.ac.uk.

To download it:

  1. Select “Advanced Search”

  2. select all categories (you’re interested in)

  3. select “Connections between selected sections and all other sections”

  4. Metaphor Strength: “Both”

  5. Click “Search”

  6. Select “View results as a table”

  7. Click the “Download” icon in the left box

The downloaded file “metaphor.csv” should contain almost 12k lines of metaphors.

Copy metaphor.csv into the import folder of your database (“Open Folder”) or in an http-accessible location to load via an http-url.

Run Import

Our data model is really simple. We have:

  1. :Category nodes with id and name.

  2. :Strong or :Weak relationships between them with the start property for the start era and examples for the example words.

A more elaborate model could represent each metaphor as a node of its own and connect it to Word and Era nodes.
I was just not sure what to name the metaphor, as that information is missing from the data.
For this demonstration the simpler model is good enough.

For good measure, create a uniqueness constraint on the category id first:

create constraint on (c:Category) assert c.id is unique;

Run this Cypher statement to import the data in a few seconds:

// load csv as individual lines keyed with header names
LOAD CSV WITH HEADERS FROM "file:///metaphor.csv" AS line

// get-or-create first category (note typo in name header)
merge (c1:Category {id:line.`Category 1 ID`}) ON CREATE SET c1.name=line.`Categroy 1 Name`
// get-or-create second category
merge (c2:Category {id:line.`Category 2 ID`}) ON CREATE SET c2.name=line.`Category 2 Name`

// depending on direction flip order of c1,c2
with line, case line.Direction when '>' then [c1,c2] else [c2,c1] end as cat,

// split words on ';' and remove last empty entry
     apoc.coll.toSet(split(line.`Examples of metaphor`,';'))[0..-1] as words

// create relationship with dynamic type, set era & words as relationship properties
call apoc.create.relationship(cat[0],line.Strength,{start:line.`Start Era`, examples:words},cat[1]) yield rel

// return rows processed
return count(*)
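
The per-row logic of the import statement (orienting the relationship by the Direction column and turning the semicolon-separated examples into a de-duplicated list) can also be sketched outside the database. The following is a minimal illustration in plain Java, not part of the original import; the class and method names are hypothetical. Unlike the Cypher version, it trims whitespace, and because Java's String.split already drops trailing empty strings, no [0..-1] slice is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper mirroring the per-row logic of the Cypher import above.
final class MetaphorRow {

    // Returns {start, end}: '>' keeps the order, anything else flips it,
    // just like the CASE expression in the Cypher statement.
    static String[] orient(String direction, String category1, String category2) {
        return ">".equals(direction)
                ? new String[] {category1, category2}
                : new String[] {category2, category1};
    }

    // Splits a string like "a;b;a;" into its distinct, non-empty entries.
    static List<String> parseExamples(String raw) {
        Set<String> distinct = new LinkedHashSet<>();
        for (String word : raw.split(";")) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                distinct.add(trimmed);
            }
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        // sample values chosen for illustration only
        System.out.println(String.join(" -> ", orient("<", "2D06", "2C02")));
        System.out.println(parseExamples("fire;blaze;fire;"));
    }
}
```

Keeping the insertion order with a LinkedHashSet is a design choice of this sketch; the order produced by apoc.coll.toSet is not something the import relies on.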

I rendered the category nodes pretty large so that you can read the names, and have the “Strong” links display their “words” instead.


To find categories quickly by name, create an index:

create index on :Category(name);

Run graph algorithms.

Degree distribution

│"type"  │"direction"│"total"│"p50"│"p75"│"p90"│"p95"│"p99"│"p999"│"max"│"min"│"mean"           │
│"Weak"  │"OUTGOING" │7908   │11   │31   │48   │61   │84   │100   │100  │0    │19.10144927536232│
│"Strong"│"OUTGOING" │3974   │3    │12   │28   │37   │86   │107   │107  │0    │9.599033816425122│

Top 10 Categories by in-degree:

MATCH (c:Category)
WITH c, size( (c)-->()) as out, size( (c)<--()) as `in`
RETURN c.id, c.name, `in`, out
ORDER BY `in` DESC LIMIT 10

│"c.id"│"c.name"                 │"in"│"out"│
│"2D06"│"Emotional suffering"    │119 │7    │
│"2C02"│"Bad"                    │119 │7    │
│"3M06"│"Literature"             │116 │29   │
│"1O22"│"Behaviour and conduct"  │109 │10   │
│"3L02"│"Money"                  │106 │44   │
│"2C01"│"Good"                   │105 │2    │
│"1P28"│"Greatness and intensity"│104 │2    │
│"2A22"│"Truth and falsity"      │104 │5    │
│"2D08"│"Love and friendship"    │100 │17   │
│"2A18"│"Intelligibility"        │99  │5    │

Outgoing Page-Rank of Categories

call algo.pageRank.stream(null,null) yield node, score
with node, toInt(score*10) as score order by score desc limit 10
return node.name, score/10.0 as score;

│"node.name"                           │"score"│
│"Greatness and intensity"             │5.6    │
│"Colour "                             │3.5    │
│"Unimportance"                        │3.5    │
│"Importance"                          │3.4    │
│"Hatred and hostility"                │3.4    │
│"Plants"                              │2.9    │
│"Good"                                │2.9    │
│"Age"                                 │2.8    │
│"Love and friendship"                 │2.7    │
│"Memory, commemoration and revocation"│2.6    │

Funny that both importance and unimportance have such a high rank.

call algo.pageRank.stream(null,null,{direction:'INCOMING'}) yield node, score
with node, toInt(score*10) as score order by score desc limit 10
return node.name, score/10.0 as score;

Betweenness Centrality

Which categories connect others:

call algo.betweenness.stream('Category','Strong') yield nodeId, centrality as score
match (node) where id(node) = nodeId
with node, toInt(score) as score order by score desc limit 10
return node.id, node.name, score;

│"node.id"│"node.name"                                │"score"│
│"2C01"   │"Good"                                     │165912 │
│"1E02"   │"Animal categories, habitats and behaviour"│131109 │
│"3D05"   │"Authority, rebellion and freedom"         │108292 │
│"2D06"   │"Emotional suffering"                      │87551  │
│"1J34"   │"Colour "                                  │83595  │
│"1E05"   │"Insects and other invertebrates"          │77171  │
│"3D01"   │"Command and control"                      │71873  │
│"1O20"   │"Vigorous action and degrees of violence"  │65028  │
│"1C03"   │"Mental health"                            │64567  │
│"1F01"   │"Plants"                                   │59444  │

There are many other explorative queries and insights we can draw from this.

Let me know in the comments what you’d be interested in.


Fullstack JavaScript – Neo4j Script Procedures

Posted by Michael Hunger on Apr 1, 2017 in cypher, neo4j

Imagine being a fullstack JavaScript developer and using the language not just in the frontend, middleware or backend, but also to create your user-defined procedures and functions in the database.

Several other databases support a similar approach for views and user defined extensions, and now you can do it with Neo4j too.

Early last year, Neo4j’s user-defined procedures were still in their infancy.
I had just written an article about Java’s JavaScript engine “Nashorn”.

So naturally I experimented with using procedures to dynamically create and run JavaScript functions.

The function mapping is stored in Neo4j’s graph properties.

You could create JavaScript functions with a name and body and then later call them by name, passing parameters along.

CALL scripts.function('users', '
function users(name) {
  return collection(db.findNodes(label("User"),"lastname",name));
}');

CALL scripts.run('users','Anderson') YIELD value as user;

// or call as function, returns a list
RETURN scripts.run('users','Smith');

That all worked quite well, but I didn’t find the time to turn it into a proper project.

Later in the year I got some feature and pull requests on the APOC procedure library to add such functions.

As there are some concerns esp. from corporate users about scripting support, I pulled my work into a separate project: Neo4j Script Procedures

So, when I came across this tweet, it reminded me of wanting to update the project.

I thought it was a good opportunity to upgrade and release the project.

So, now you can try to run JavaScript functions from Neo4j’s Cypher by grabbing the jar-file from the latest release.

Just put it into $NEO4J_HOME/plugins and restart your server.

Note: In Neo4j Community Desktop, there is a directory chooser for the plugins directory under “Options”.

The neo4j-script-procedures release does not support Neo4j 3.1.2, as there are some incompatibilities with procedures creating new property-names.
It should work with 3.1.0, 3.1.1 or 3.1.3 though.

Let me know what you think and how we can improve this useful little library. Please raise issues on the repository for feedback and problems.



Creating a Neo4j Example Graph with the Arrows Tool

Posted by Michael Hunger on Mar 21, 2017 in cypher, import

Some years ago my colleague Alistair Jones created a neat little tool in JavaScript to edit and render example graphs in a consistent way.

It is aptly named Arrows and you can find it here: http://www.apcjones.com/arrows

We mostly use it for presentations, but also to show data models for Neo4j GraphGists and Neo4j Browser Guides.
Because it also stores the positions of nodes, it’s always true to the same layout and doesn’t wiggle around.


Read more…


Academy Awards (Oscars) from Kaggle to Neo4j

Posted by Michael Hunger on Mar 9, 2017 in cypher, import, neo4j
This is part 1; in the next part we’ll look at using import.io to scrape IMDB and the Academy Awards Database.
You can query the imported data in this instance (user/pass:oscars) of the brand new Neo4j Sandbox.

I came across a tweet from @LynnLangit about first steps with mxnet, which I really liked.


So I wanted to do the same for Neo4j and was looking for a good dataset.

Then I realized that the 89th Academy Awards (Oscars) ceremony was the next day.
I was really looking forward to it, hoping it would come with some strong statements towards the current administration, followed by him rage-tweeting about it on Monday morning.

But instead we got a fun Jimmy Kimmel performance and the well-known Moonlight and La La Land mix-up by the (ex-)PwC people.

So I found the data, imported it and had this post ready to go.

But then I got distracted trying to scrape IMDB with import.io and missed the date.

But as it is a nice dataset that is, interestingly, not as widely available as you’d think, I feel it’s still worth publishing.

So enjoy my struggles with data (quality).

Read more…


5 Tips & Tricks for Fast Batched Updates of Graph Structures with Neo4j and Cypher

Posted by Michael Hunger on Mar 2, 2017 in Uncategorized, cypher, import, neo4j

Michael Hunger, @mesirii

When you’re writing a lot of data to the graph from your application or library, you want to be efficient.

Inefficient Solutions

These approaches are not very efficient:

  • hard coding values instead of using parameters

  • sending a single query / tx per individual update

  • sending many single queries within a single tx with individual updates

  • generating large, complex statements (hundreds of lines) and sending one of them per tx and update

  • sending a HUGE number (millions) of updates in a single tx, which will cause out-of-memory issues
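
A common way around most of these problems is to group updates into moderately sized batches and send each batch as a single parameter of one parameterized statement. The sketch below shows only the chunking step in plain Java; the class name, the batch size of 10,000 and the idea of passing each chunk as one list parameter per transaction are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a large list of updates into fixed-size batches.
final class UpdateBatcher {

    static <T> List<List<T>> partition(List<T> updates, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < updates.size(); from += batchSize) {
            int to = Math.min(from + batchSize, updates.size());
            // copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(updates.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> updates = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            updates.add(i);
        }
        // each batch would be sent as one list parameter in its own transaction
        for (List<Integer> batch : partition(updates, 10_000)) {
            System.out.println("batch of " + batch.size() + " updates");
        }
    }
}
```

The right batch size depends on the size of each update and the memory available, so treat the number used here as a starting point to tune rather than a recommendation.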

Read more…


The Reddit Meme Graph with Neo4j

Posted by Michael Hunger on Feb 25, 2017 in cypher, import

Saturday night after not enough drinks, I came across these tweets by @LeFloatingGhost.


This definitely looks like a meme graph.

We can do that too


Read more…



User Defined Functions in Neo4j 3.1.0-M10

Posted by Michael Hunger on Oct 6, 2016 in apoc, cypher

Neo4j 3.1 brings some really neat improvements in Cypher alongside other cool features.

I already demonstrated the – GraphQL inspired – map projections and pattern comprehensions in my last blog post.

User Defined Procedures

In the 3.0 release my personal favorite was user defined procedures which can be implemented using Neo4j’s Java API and called directly from Cypher.
You can tell, because I wrote about half of the 270 procedures for the APOC procedure collection, with the remainder provided by other contributors.

Remember the syntax: …​ CALL namespace.procedure(arg1, arg2) YIELD col1, col2 AS alias …​

MATCH (from:Place {coords:{from}}), (to:Place {coords:{to}})

CALL apoc.algo.dijkstra(from, to, "ROAD", "cost") YIELD path, weight

RETURN nodes(path)
ORDER BY weight LIMIT 10;

Read more…


Neo4j 3.0 Stored Procedures

Posted by Michael Hunger on Feb 29, 2016 in cypher, java

One of the many exciting features of Neo4j 3.0 is “Stored Procedures” that, unlike the existing Neo4j-Server extensions, are directly callable from Cypher.

At the time of this writing it is only possible to call them in a stand-alone statement with CALL package.procedure(params),
but the plan is to make them a fully integrated part of Cypher statements,
either by making CALL a clause or by turning procedures into function-expressions (which would be my personal favorite).

Currently procedures can only be written in Java (or other JVM languages).
You might say, “WTF …​ Java”, but it is less tedious than it sounds.

First of all, the effort of setting up a procedure project, writing and building it is minimal.

To get up and running you first need a recent copy of Neo4j 3.0,
either the 3.0.0-M04 milestone or the latest build from the Alpha Site.

To get you started you also need a JDK and a build tool like Gradle or Maven.

You can effectively copy the procedure template example that Jake Hansson provided in neo4j-examples as a starting point.

But let me quickly walk you through an even simpler example (GitHub Repository).

You need to declare the org.neo4j:neo4j:3.0.0[-M04] dependency in the provided scope, to get the necessary annotations and the Neo4j API to talk to the database.

project.ext {
    // left empty in the original; set it to your Neo4j version, e.g. the 3.0.0-M04 milestone mentioned above
    neo4j_version = ""
}
dependencies {
	compile group: "org.neo4j", name:"neo4j", version:project.neo4j_version
	testCompile group: "org.neo4j", name:"neo4j-kernel", version:project.neo4j_version, classifier:"tests"
	testCompile group: "org.neo4j", name:"neo4j-io", version:project.neo4j_version, classifier:"tests"
	testCompile group: "junit", name:"junit", version:"4.12"
}

If you have a great idea on what kind of procedure you want to write, just open a file with a new class.

Please note that only the package and method names become the procedure name (not the class name).

In our example we will create a very simple procedure that just computes the minimum and maximum degrees of a certain label.

The reference to Neo4j’s GraphDatabaseService instance is injected into your class into the field annotated with @Context.
As procedures are meant to be stateless, declaring non-injected non-static fields is not allowed.

In our case the procedure will be named stats.degree and called like CALL stats.degree('User').

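package stats;

import java.util.stream.Stream;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.ResourceIterator;
import org.neo4j.procedure.Context;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.Procedure;

public class GraphStatistics {

    // injected by Neo4j (released 3.x versions require this field to be public)
    @Context public GraphDatabaseService db;

    // Result class
    public static class Degree {
        public String label;
        // note, that "int" values are not supported
        public long count, max, min = Long.MAX_VALUE;

        public Degree(String label) { this.label = label; }

        // method to consume a degree and compute min, max, count
        private void add(long degree) {
            if (degree < min) min = degree;
            if (degree > max) max = degree;
            count++;
        }
    }

    @Procedure
    public Stream<Degree> degree(@Name("label") String label) {
        // create holder class for results
        Degree degree = new Degree(label);
        // iterate over all nodes with label
        try (ResourceIterator<Node> it = db.findNodes(Label.label(label))) {
            while (it.hasNext()) {
                // submit degree to holder for consumption (i.e. max, min, count)
                degree.add(it.next().getDegree());
            }
        }
        // we only return a "Stream" of a single element in this case.
        return Stream.of(degree);
    }
}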
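
The aggregation done by the holder class can be tried out without a database. As a rough sketch, plain Java's LongSummaryStatistics computes the same count, minimum and maximum; the class name DegreeSketch and the sample degrees (one person knowing two others, a fourth person knowing no one) are illustrative assumptions, not part of the procedure.

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical stand-alone sketch of the min/max/count aggregation.
final class DegreeSketch {

    // Summarizes the given node degrees in a single pass.
    static LongSummaryStatistics summarize(long... degrees) {
        return LongStream.of(degrees).summaryStatistics();
    }

    public static void main(String[] args) {
        // degrees of four sample nodes: 2, 1, 1 and 0
        LongSummaryStatistics stats = summarize(2, 1, 1, 0);
        System.out.println("count=" + stats.getCount()
                + " min=" + stats.getMin()
                + " max=" + stats.getMax());
    }
}
```

One difference to keep in mind: for an empty input, LongSummaryStatistics reports Long.MIN_VALUE as the maximum, whereas the holder class above would report 0.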

If you want to test the procedures quickly without spinning up an in-process server and connecting to it remotely (e.g. via the new binary bolt protocol as shown in the procedure-template), then you can use the test-facilities of Neo4j’s Java API.

Now we can test our new and shiny procedure by writing a small unit-test.

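package stats;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;

import java.util.Map;

import org.junit.Test;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Result;

public class GraphStatisticsTest {

    // test database with the procedure registered; its setup is not shown here
    private GraphDatabaseService db;

    @Test public void testDegree() {
        // given Alice knowing Bob and Charlie and Dan knowing no-one
        db.execute("CREATE (alice:User)-[:KNOWS]->(bob:User),(alice)-[:KNOWS]->(charlie:User),(dan:User)").close();

        // when retrieving the degree of the User label
        Result res = db.execute("CALL stats.degree('User')");

        // then we expect one result-row with min-degree 0 and max-degree 2
        Map<String,Object> row = res.next();
        assertEquals("User", row.get("label"));
        // Dan has no friends (values come back as long, hence the L suffix)
        assertEquals(0L, row.get("min"));
        // Alice knows 2 people
        assertEquals(2L, row.get("max"));
        // We have 4 nodes in our graph
        assertEquals(4L, row.get("count"));
        // only one result record was produced
        assertFalse(res.hasNext());
    }
}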

Of course you can use procedures to create procedures, e.g. in other languages that are supported natively on the JVM like JavaScript via Nashorn, or Clojure, Groovy, Scala, Frege (Haskell), (J)Ruby or (J/P)ython.
I wrote one for creating and running procedures implemented in JavaScript.

There are many other cool things that you can do with procedures, see the resources below.

If you have ideas for procedures or wrote some of your own, please let us know.

Join our public Slack channel and visit #neo4j-procedures.



Using XRebel 2 with Neo4j

Posted by Michael Hunger on May 5, 2015 in neo4j

At Spring.IO in Barcelona I met my pal Oleg from ZeroTurnaround and we looked at how the new XRebel 2
integrates with Neo4j, especially with the remote access using the transactional Cypher http-endpoint.

As you probably know, Neo4j currently offers a remoting API based on HTTP requests (a new binary protocol is in development).

Our JDBC driver utilizes that http-based protocol to connect to the database and execute parameterized statements while adhering to the JDBC APIs.

XRebel is a lightweight Java Application Profiler which is loaded as java-agent and instruments your application.
It traces runtime for web requests and records your backend-application CPU usage, database- (JDBC) and http-requests to other services.
For web-applications it integrates automatically with the http-processing and injects profiling information into the response.

Movies Webapp

For this quick demo, we use the example Movies application which is available for many programming languages from our developer resources.
The application is just a plain Java webapp that serves three JSON endpoints to a simple JavaScript frontend page.
The backend connects to Neo4j via JDBC to retrieve the requested information via our Cypher query language.

To prepare for running our app, just download, unzip and start Neo4j, open it on http://localhost:7474/ and run the :play movies statement in the Neo4j browser.
Then we can get and build the application and run it.
To test that it works, open the app in your browser at http://localhost:8080

git clone http://github.com/neo4j-contrib/developer-resources
cd developer-resources/language-guides/java/jdbc

mvn compile exec:java -DmainClass="org.neo4j.example.movies.Movies"

Setup with XRebel

To use XRebel we just download it, get an eval license and attach the jar as a java-agent to our application.

MAVEN_OPTS="-javaagent:$HOME/Downloads/xrebel/xrebel.jar" mvn compile exec:java -DmainClass="org.neo4j.example.movies.Movies"

If we check our example application page again, we see a small green XRebel icon in the left corner.
It provides access to the XRebel UI which has tabs for application performance, database queries, exceptions and more.

For our initial query for the “Matrix” movie, it shows both the request time for the web-application and the database calls to Neo4j.
Interestingly, both the JDBC-level calls and the underlying http calls to Neo4j are displayed.

If we uglify our app so that our queries are executed inefficiently, simulating an n+1 select, that clearly shows up in XRebel as massive database interaction.

Runtime Exceptions due to a programming error are also made immediately accessible from the XRebel UI.

For non-visual REST-services you can access the same profiling information via a special endpoint that is added to your application, in our case: http://localhost:8080/xrebel

As you can see, XRebel can give you quick insights into the performance profile of your Neo4j-backed application and highlights which queries / pages / secondary requests need further optimization.

Ping Oleg or me if you have more questions.

If you’re in London this week and want to have a relaxing election day,
make sure to grab a seat for GraphConnect on May 7, the Neo4j conference.
Ping me via email (michael at neo4j.org) for a steep discount as an avid reader of this blog.


How To: Neo4j Data Import – Minimal Example

Posted by Michael Hunger on Apr 18, 2015 in import, neo4j

There are many resources about importing data into Neo4j, and the sheer amount of information can make it confusing.
Here is the minimal thing you need to know.

Imagine the data coming from the export of a relational or legacy system, just plain CSV files without headers (this time).


Graph Model

Our graph model is very simple:

(p1:Person {userId:10, name:"Anne"})-[:KNOWS]->(p2:Person {userId:123,name:"John"})

Import with Neo4j Server & Cypher

  1. Download, install and start Neo4j Server.

  2. Open http://localhost:7474

  3. Run the following statements one by one:

I used http-urls here to run this as an interactive, live Graph Gist.

LOAD CSV FROM "https://gist.githubusercontent.com/jexp/d8f251a948f5df83473a/raw/people.csv" AS row
CREATE (:Person {userId: toInt(row[0]), name:row[1]});

LOAD CSV FROM "https://gist.githubusercontent.com/jexp/d8f251a948f5df83473a/raw/friendships.csv" AS row
MATCH (p1:Person {userId: toInt(row[0])}), (p2:Person {userId: toInt(row[1])})
CREATE (p1)-[:KNOWS]->(p2);

You can also use file-urls, ideally with absolute paths like file:/path/to/data.csv; on Windows use file:c:/path/to/data.csv

If you want to find your people not only by id but also by name quickly, also run:

CREATE INDEX ON :Person(name);

For instance, find all second-degree friends of “Anne” and the number of ways each of them can be reached:

MATCH (:Person {name:"Anne"})-[:KNOWS*2..2]-(p2)
RETURN p2.name, count(*) as freq
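
To make the semantics of the variable-length pattern concrete, here is a small in-memory sketch in plain Java. For each person exactly two undirected KNOWS hops away from a start person, it counts how many distinct two-hop paths lead there. The class name, the sample names other than Anne and John, and the assumption of at most one KNOWS relationship per pair of people are illustrative and not part of the original example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of an undirected KNOWS graph.
final class SecondDegreeFriends {

    private final Map<String, List<String>> neighbours = new HashMap<>();

    // Registers the relationship in both directions, as the query ignores direction.
    void knows(String a, String b) {
        neighbours.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        neighbours.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Counts the two-hop paths from start to every other person.
    Map<String, Integer> secondDegree(String start) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String friend : neighbours.getOrDefault(start, Collections.emptyList())) {
            for (String friendOfFriend : neighbours.getOrDefault(friend, Collections.emptyList())) {
                // going straight back would reuse the same relationship,
                // which a single Cypher pattern never does
                if (!friendOfFriend.equals(start)) {
                    freq.merge(friendOfFriend, 1, Integer::sum);
                }
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        SecondDegreeFriends graph = new SecondDegreeFriends();
        graph.knows("Anne", "John");
        graph.knows("Anne", "Bob");
        graph.knows("John", "Carl");
        graph.knows("Bob", "Carl");
        graph.knows("John", "Dave");
        System.out.println(graph.secondDegree("Anne"));
    }
}
```

In this sample graph Carl is reachable via both John and Bob, so he is counted twice, which corresponds to the freq column of the query.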

Bulk Data Import

For tens of millions up to billions of rows.

Shut down the server first!

Create two additional header files:


Execute from the terminal:

path/to/neo/bin/neo4j-import --into path/to/neo/data/graph.db  \
--nodes:Person people_header.csv,people.csv --relationships:KNOWS friendships_header.csv,friendships.csv

After starting your database again, run:


Copyright © 2007-2018 Better Software Development All rights reserved.