About Michael Hunger

Posts by Michael Hunger:

 

The Reddit Meme Graph with Neo4j

on Feb 25, 2017 in cypher, import

Saturday night after not enough drinks, I came across these tweets by @LeFloatingGhost.


This definitely looks like a meme graph.

We can do that too


Recorded Session

If you want to see me struggle to get this going live, watch the recorded session.


If you want to see an interactive version of this post, check it out at the Graph Gist Collection.


Find us some memes


There is this really nice CSV of the top Reddit memes: https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv

And grab an empty Neo4j Sandbox from http://neo4jsandbox.com.

What’s the data like?

Check CSV

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
RETURN count(*);
╒══════════╕
│"count(*)"│
╞══════════╡
│"1000"    │
└──────────┘
WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row limit 3;
╒════════════════════════════════════════════════════════════════════════════════════════════════════╕
│"row"                                                                                               │
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
│{"over_18":"False","name":"t3_1edsw9","permalink":"http://www.reddit.com/r/memes/comments/1edsw9/can│
│_we_please_start_a_crazy_amy_meme_for_amy_of/","url":"http://www.quickmeme.com/meme/3uer85/","domain│
│":"quickmeme.com","distinguished":null,"score":"1831","downs":"1010","link_flair_css_class":null,"su│
│breddit_id":"t5_2qjpg","thumbnail":"http://b.thumbs.redditmedia.com/qpz4enS1CCFIs8Ys.jpg","id":"1eds│
│w9","author_flair_css_class":null,"link_flair_text":null,"selftext":null,"ups":"2841","num_comments"│
│:"120","edited":"False","title":"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Compan│
│y?","created_utc":"1368627364.0","is_self":"False"}                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────┤
...
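
Note that every value in the row arrives as a string, including numeric columns such as score and ups. A minimal sketch, assuming you want the highest-scoring memes first: cast with toInteger() while reading. The column aliases are only illustrative.

```cypher
// Sketch: LOAD CSV yields strings, so cast numeric columns before sorting
WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' AS url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row.id AS id, row.title AS title, toInteger(row.score) AS score
ORDER BY score DESC
LIMIT 5;
```

Without the cast, ORDER BY would compare the scores as text, so "999" would sort above "1831".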

Load them memes

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
WITH row LIMIT 10000
CREATE (m:Meme) SET m=row // we take it all into Meme nodes

Added 100 labels, created 100 nodes, set 1700 properties, statement completed in 120 ms.
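
Running the CREATE statement a second time would duplicate every meme. If you want a repeatable load, one option (a sketch, not what the session used) is to MERGE on the id column and copy the remaining columns with +=.

```cypher
// Sketch: repeatable variant of the load, keyed on the CSV's id column
WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' AS url
LOAD CSV WITH HEADERS FROM url AS row
WITH row LIMIT 10000
MERGE (m:Meme {id: row.id})   // get-or-create the meme by its id
SET m += row;                 // add or overwrite properties from the row
```

Without an index on :Meme(id), each MERGE scans the existing Meme nodes, which is fine for a dataset this small.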

Get some memes

MATCH (m:Meme) return m limit 25;
MATCH (m:Meme) return m.id, m.title limit 5;
╒════════╤════════════════════════════════════════════════════════════════════════════════╕
│"m.id"  │"m.title"                                                                       │
╞════════╪════════════════════════════════════════════════════════════════════════════════╡
│"1edsw9"│"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Company?"         │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1ihc34"│"Given the competitive nature of redditors, I assume you all feel the same way."│
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1gmt99"│"This man left this woman..."                                                   │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1ds9y4"│"How to cure bad breath..."                                                     │
├────────┼────────────────────────────────────────────────────────────────────────────────┤

But we want the words!

Let’s grab the first meme and get going.

Split the text into words.

MATCH (m:Meme) WITH m limit 1
RETURN split(m.title, " ") as words;
["Can","We","Please","Start","a","Crazy","Amy","Meme","For","Amy","of","Amy's","Baking","Company?"]

CAN YOU HEAR ME?

MATCH (m:Meme) WITH m limit 1
RETURN split(toUpper(m.title), " ") as words;
["CAN","WE","PLEASE","START","A","CRAZY","AMY","MEME","FOR","AMY","OF","AMY'S","BAKING","COMPANY?"]

Remove Punctuation

Create an array of punctuation with split on empty string.

return split(",!?'.","") as chars;
[",","!","?","'","."]

And replace each of the characters with nothing (the empty string '').

with "a?b.c,d" as word
return word,
       reduce(s=word, c IN split(",!?'.","") | replace(s,c,'')) as no_chars;
╒═════════╤══════════╕
│"word"   │"no_chars"│
╞═════════╪══════════╡
│"a?b.c,d"│"abcd"    │
└─────────┴──────────┘

We got us some nice words

MATCH (m:Meme)  WITH m limit 1
// strip punctuation, uppercase, then split the text into words
RETURN split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words;
╒═════════════════════════════════════════════════════════════════════════════════════════════════╕
│"words"                                                                                          │
╞═════════════════════════════════════════════════════════════════════════════════════════════════╡
│["CAN","WE","PLEASE","START","A","CRAZY","AMY","MEME","FOR","AMY","OF","AMYS","BAKING","COMPANY"]│
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
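
One caveat: a title containing two consecutive spaces may leave empty strings in the list, and those would later become a Word node with empty text. A small sketch that filters them out (the sample string is made up):

```cypher
// Sketch: drop empty entries after splitting on a single space
WITH "HELLO  WORLD" AS title
RETURN [w IN split(title, " ") WHERE w <> ""] AS words;
```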

Enough words, where are the nodes?

Let’s create some word nodes

(MERGE does get-or-create: it matches an existing node or creates it if it is missing)

MATCH (m:Meme)  WITH m limit 1
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
MERGE (a:Word {text:words[0]})
MERGE (b:Word {text:words[1]});

Our first two words

MATCH (n:Word) RETURN n;

Unwind the ra(n)ge

But we want all the words in the array, so let’s unwind a range.

MATCH (m:Meme)  WITH m limit 1
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m

UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx

MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]});
MATCH (n:Word) RETURN n;
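
To see what the range trick produces, here is the same UNWIND on a literal list (shortened from the first title). Each row pairs a word with its successor, which is why the range stops at size(words)-2.

```cypher
// Sketch: adjacent word pairs from a literal list
WITH ["CAN","WE","PLEASE","START"] AS words
UNWIND range(0, size(words)-2) AS idx
RETURN idx, words[idx] AS word, words[idx+1] AS next;
// idx 0: CAN -> WE, idx 1: WE -> PLEASE, idx 2: PLEASE -> START
```

A single-word title yields range(0,-1), which is empty, so such a title creates no Word nodes with this approach.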

No Limits

MATCH (m:Meme) WITH m // no limits
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m

UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx

MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]});
MATCH (n:Word) RETURN count(*);

Chain up the memes

Connect the words via :NEXT and store the meme ids on each relationship in an ids property.

For the first word (idx = 0), let’s also connect the Meme node to it via :FIRST.

MATCH (m:Meme) WITH m
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx
MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]})

// Connect the words via :NEXT and store the meme-ids on each rel in an `ids` property
MERGE (a)-[rel:NEXT]->(b) SET rel.ids = coalesce(rel.ids,[]) + [m.id]

// to later recreate the meme along the next chain
// connect the first word to the meme itself
WITH * WHERE idx = 0
MERGE (m)-[:FIRST]->(a);

Set 546 properties, created 614 relationships, statement completed in 65 ms.
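
To check what ended up on the relationships, a quick sketch lists the word pairs used most often. size(rel.ids) counts the entries in the ids list, so a pair that occurs twice in the same title is counted twice.

```cypher
// Sketch: most frequently used word pairs
MATCH (a:Word)-[rel:NEXT]->(b:Word)
RETURN a.text AS word, b.text AS next, size(rel.ids) AS uses
ORDER BY uses DESC
LIMIT 10;
```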

Yay done!

MATCH (m:Meme)-[:FIRST]->(w:Word)-[:NEXT]->(w2:Word)
RETURN * LIMIT 33;

Which words appear most often

MATCH (w:Word)
WHERE size(w.text) > 4 // size() gives the string length; only look at longer words
RETURN w, size( (w)--() ) as relCount
ORDER BY relCount DESC LIMIT 10;
╒══════════════════╤══════════╕
│"w"               │"relCount"│
╞══════════════════╪══════════╡
│{"text":"AFTER"}  │"56"      │
├──────────────────┼──────────┤
│{"text":"REDDIT"} │"34"      │
├──────────────────┼──────────┤
│{"text":"ABOUT"}  │"33"      │
├──────────────────┼──────────┤
│{"text":"TODAY"}  │"33"      │
├──────────────────┼──────────┤
│{"text":"SCUMBAG"}│"32"      │
├──────────────────┼──────────┤
│{"text":"EVERY"}  │"31"      │
├──────────────────┼──────────┤
│{"text":"FIRST"}  │"30"      │
├──────────────────┼──────────┤
│{"text":"ALWAYS"} │"28"      │
├──────────────────┼──────────┤
│{"text":"FRIEND"} │"27"      │
├──────────────────┼──────────┤
│{"text":"THOUGHT"}│"24"      │
└──────────────────┴──────────┘

Now let’s find our memes again

// first meme
MATCH (m:Meme) WITH m limit 1
// from the :FIRST :Word follow the :NEXT chain
MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->() // let's follow the chain of words starting
// from the meme, where all relationships contain the meme-id
WHERE ALL(r in rels WHERE m.id IN r.ids)
RETURN *;

Show meme by id

We can also pick a meme from the CSV list by its id,
e.g. id '1kc9p2': 'As stupid as memes are they can actually make valid points'

MATCH (m:Meme) WHERE m.id = '1kc9p2'

MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->()
WHERE ALL(r in rels WHERE m.id IN r.ids)

RETURN *;
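
If you want the title back as text rather than as a picture, one possible sketch takes the longest matching path and collects the word texts. It assumes the longest path corresponds to the full title, which can fail when a title repeats words or has more than 16 words (the *..15 bound).

```cypher
// Sketch: rebuild the word list of one meme from its :NEXT chain
MATCH (m:Meme) WHERE m.id = '1kc9p2'
MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->()
WHERE ALL(r IN rels WHERE m.id IN r.ids)
WITH m, path
ORDER BY length(path) DESC LIMIT 1
// nodes(path) starts with the Meme node, so tail() keeps only the words
RETURN m.title AS title, [n IN tail(nodes(path)) | n.text] AS words;
```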

Done. Enjoy!

PS: If you want to connect your own stuff, grab a Neo4j Sandbox or use Neo4j on your machine.
If you have questions, ask me, Michael, on Twitter or on Slack.


User Defined Functions in Neo4j 3.1.0-M10

on Oct 6, 2016 in apoc, cypher

Neo4j 3.1 brings some really neat improvements in Cypher alongside other cool features.

I already demonstrated the GraphQL-inspired map projections and pattern comprehensions in my last blog post.

User Defined Procedures

In the 3.0 release my personal favorite was user defined procedures which can be implemented using Neo4j’s Java API and called directly from Cypher.
You [...]

 

Neo4j 3.0 Stored Procedures

on Feb 29, 2016 in cypher, java

One of the many exciting features of Neo4j 3.0 is “Stored Procedures”, which, unlike the existing Neo4j-Server extensions, are directly callable from Cypher.

At the time of this writing it is only possible to call them in a stand-alone statement with CALL package.procedure(params),
but the plan is to make them a fully integrated part of Cypher statements.
Either [...]

 

Using XRebel 2 with Neo4j

on May 5, 2015 in neo4j

At Spring.IO in Barcelona I met my pal Oleg from ZeroTurnaround and we looked at how the new XRebel 2
integrates with Neo4j, especially with remote access through the transactional Cypher HTTP endpoint.

As you probably know, Neo4j currently offers a remoting API based on HTTP requests (a new binary protocol is in development).

Our JDBC driver utilizes [...]

 

Neo4j Server Extension for Single Page Experiments

on Apr 24, 2015 in neo4j, server

Sometimes you have a nice dataset in Neo4j and you want to provide a self-contained way of quickly exposing it to the outside world without a multi-tier setup.

So for experiments and proofs of concept it would be helpful to be able to extend Neo4j Browser to accommodate new types of frames and commands.
Unfortunately we’re not [...]

 

How To: Neo4j Data Import – Minimal Example

on Apr 18, 2015 in import, neo4j

We want to import data into Neo4j, but there are so many resources with so much information that it gets confusing.
Here is the minimal thing you need to know.

Imagine the data coming from the export of a relational or legacy system, just plain CSV files without headers (this time).

people.csv

1,"John"
10,"Jane"
234,"Fred"
4893,"Mark"
234943,"Anne"

friendships.csv

1,234
10,4893
234,1
4893,234943
234943,234
234943,1

Graph Model

Our graph model would be very [...]

 

On Neo4j Indexes, Match & Merge

on Apr 11, 2015 in cypher, neo4j

We at Neo4j do our fair share to confuse our users. I’m talking about indexes, my friends.
My trusted colleagues Nigel Small (“Index Confusion”) and Stefan Armbruster (“Indexing an Overview”) already did a great job explaining the indexing situation in Neo4j;
I want to add a few more aspects here.

Since the release of [...]

 

Natural Language Analytics made simple and visual with Neo4j

on Jan 8, 2015 in cypher, fun

I was really impressed by this blog post on Summarizing Opinions with a Graph from Max and always waited for Part 2 to show up :)

The blog post explains a really interesting approach by Kavita Ganesan which uses a graph representation of sentences of review content to extract the most significant statements about a product.

Each [...]

 

Spring Data Neo4j 3.3.0 – Improving Remoting Performance

on Dec 9, 2014 in neo4j, spring-data-neo4j

With the first milestone of the Spring Data “Fowler” release train, Spring Data Neo4j 3.3.0.M1 was released. Besides a lot of smaller fixes, it contains one big improvement. I finally found some time to work on the remoting performance of the library, i.e. when used in conjunction with Neo4j Server. This blog post explains the [...]

 

The Story of GraphGen

on Nov 1, 2014 in community, development, neo4j

This is the story behind the really useful and ingenious Neo4j example graph data generator developed by Christophe Willemsen.

I don’t just want to show you the tool but also tell the story of how it came to be.

First of all: The Neo4j Community is awesome.
There are so many enthusiastic and creative people, that it is often [...]

Copyright © 2007-2017 Better Software Development All rights reserved.