The Reddit Meme Graph with Neo4j

Posted by Michael Hunger on Feb 25, 2017 in cypher, import

Saturday night after not enough drinks, I came across these tweets by @LeFloatingGhost.

memegraph tweet.jpg

This definitely looks like a meme graph.

We can do that too

memegraph meme.jpg

Recorded Session

If you want to see me struggle to get this going live, watch my recorded session here.

memegraph gif preview.jpg

If you want to see an interactive version of this post, check it out at the Graph Gist Collection.

memegraph graphgist.jpg

Find us some memes


There is this really nice CSV of the top Reddit memes in the umbrae/reddit-top-2.5-million repository on GitHub; its URL is used in the queries below.

And grab an empty Neo4j Sandbox from http://neo4jsandbox.com.

What’s the data like?

Check CSV

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
RETURN count(*);
╒══════════╕
│"count(*)"│
╞══════════╡
│"1000"    │
└──────────┘
WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row limit 3;
╒════════════════════════════════════════════════════════════════════════════════════════════════════╕
│"row"                                                                                               │
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
│{"over_18":"False","name":"t3_1edsw9","permalink":"http://www.reddit.com/r/memes/comments/1edsw9/can│
│_we_please_start_a_crazy_amy_meme_for_amy_of/","url":"http://www.quickmeme.com/meme/3uer85/","domain│
│":"quickmeme.com","distinguished":null,"score":"1831","downs":"1010","link_flair_css_class":null,"su│
│breddit_id":"t5_2qjpg","thumbnail":"http://b.thumbs.redditmedia.com/qpz4enS1CCFIs8Ys.jpg","id":"1eds│
│w9","author_flair_css_class":null,"link_flair_text":null,"selftext":null,"ups":"2841","num_comments"│
│:"120","edited":"False","title":"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Compan│
│y?","created_utc":"1368627364.0","is_self":"False"}                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────┤
...

Load them memes

WITH 'https://raw.githubusercontent.com/umbrae/reddit-top-2.5-million/master/data/memes.csv' as url
LOAD CSV WITH HEADERS FROM url AS row
WITH row LIMIT 10000
CREATE (m:Meme) SET m=row // we take it all into Meme nodes

Added 100 labels, created 100 nodes, set 1700 properties, statement completed in 120 ms.
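
One thing worth keeping in mind after the load: every CSV value arrives as a string (note "score":"1831" in the sample row), and empty fields show up as null, which do not end up as properties. The Python sketch below only illustrates that shape with a tiny inline sample; the three-column CSV is a made-up excerpt, not the real file.

```python
import csv
import io

# Made-up excerpt shaped like memes.csv (only three of its columns)
sample = io.StringIO(
    "id,score,distinguished\n"
    "1edsw9,1831,\n"
)

row = next(csv.DictReader(sample))

# Drop empty fields, roughly what happens to null values on SET m = row
props = {key: value for key, value in row.items() if value != ""}

print(props)                # {'id': '1edsw9', 'score': '1831'}
print(int(props["score"]))  # 1831, but only after an explicit conversion
```

So if you want to sort or filter memes by score later, convert the value to a number first.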

Get some memes

MATCH (m:Meme) return m limit 25;
memegraph memes.jpg
MATCH (m:Meme) return m.id, m.title limit 5;
╒════════╤════════════════════════════════════════════════════════════════════════════════╕
│"m.id"  │"m.title"                                                                       │
╞════════╪════════════════════════════════════════════════════════════════════════════════╡
│"1edsw9"│"Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Company?"         │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1ihc34"│"Given the competitive nature of redditors, I assume you all feel the same way."│
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1gmt99"│"This man left this woman..."                                                   │
├────────┼────────────────────────────────────────────────────────────────────────────────┤
│"1ds9y4"│"How to cure bad breath..."                                                     │
├────────┼────────────────────────────────────────────────────────────────────────────────┤

But we want the words!

Let’s grab the first meme and get going.

Split the text into words.

MATCH (m:Meme) WITH m limit 1
RETURN split(m.title, " ") as words;
["Can","We","Please","Start","a","Crazy","Amy","Meme","For","Amy","of","Amy's","Baking","Company?"]

CAN YOU HEAR ME?

MATCH (m:Meme) WITH m limit 1
RETURN split(toUpper(m.title), " ") as words;
["CAN","WE","PLEASE","START","A","CRAZY","AMY","MEME","FOR","AMY","OF","AMY'S","BAKING","COMPANY?"]

Remove Punctuation

Create an array of punctuation characters by splitting a string on the empty string.

return split(",!?'.","") as chars;
[",","!","?","'","."]

And replace each of the characters with nothing (the empty string '').

with "a?b.c,d" as word
return word,
       reduce(s=word, c IN split(",!?'.","") | replace(s,c,'')) as no_chars;
╒═════════╤══════════╕
│"word"   │"no_chars"│
╞═════════╪══════════╡
│"a?b.c,d"│"abcd"    │
└─────────┴──────────┘
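
If the reduce looks unfamiliar, here is the same fold as a small Python sketch. It is only an illustration of the idea, and the name strip_punctuation is made up for this example: start with the word, then replace one punctuation character after the other in the running string.

```python
from functools import reduce

PUNCTUATION = ",!?'."  # same characters as in the Cypher example


def strip_punctuation(text, chars=PUNCTUATION):
    """Fold over the characters, removing each one from the running string."""
    return reduce(lambda s, c: s.replace(c, ""), chars, text)


print(strip_punctuation("a?b.c,d"))  # abcd
```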

We got us some nice words

MATCH (m:Meme)  WITH m limit 1
// let's split the text into words
RETURN split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words;
╒═════════════════════════════════════════════════════════════════════════════════════════════════╕
│"words"                                                                                          │
╞═════════════════════════════════════════════════════════════════════════════════════════════════╡
│["CAN","WE","PLEASE","START","A","CRAZY","AMY","MEME","FOR","AMY","OF","AMYS","BAKING","COMPANY"]│
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
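
The order of the steps matters a little: upper-case first, strip the punctuation, then split on a single space. As a rough Python sketch of the same pipeline (title_to_words is a hypothetical helper, not something Neo4j provides), which also shows that a title can contain the same word more than once:

```python
def title_to_words(title, punctuation=",!?'."):
    """Upper-case the title, drop punctuation, then split on single spaces."""
    text = title.upper()
    for char in punctuation:
        text = text.replace(char, "")
    return text.split(" ")


words = title_to_words(
    "Can We Please Start a Crazy Amy Meme For Amy of Amy's Baking Company?"
)
print(len(words), len(set(words)))  # 14 13
```

The 14 words contain AMY twice, so only 13 distinct words remain. That is why get-or-create semantics are needed once these words become nodes.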

Enough words, where are the nodes?

Let’s create some word nodes

(MERGE does get-or-create)

MATCH (m:Meme)  WITH m limit 1
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
MERGE (a:Word {text:words[0]})
MERGE (b:Word {text:words[1]});

Our first two words

MATCH (n:Word) RETURN n;
memegraph two words.jpg

Unwind the ra(n)ge

But we want all the words in the array, so let’s unwind a range of indexes.

MATCH (m:Meme)  WITH m limit 1
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m

UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx

MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]});
MATCH (n:Word) RETURN n;
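
The range bounds are easy to get wrong, so here is the index arithmetic as a Python sketch (adjacent_pairs is a made-up name). Cypher's range(0, size(words)-2) includes both ends, while Python's range excludes the upper bound, hence len(words) - 1 here.

```python
def adjacent_pairs(words):
    """Return (word, following word) for every index that has a successor."""
    return [(words[idx], words[idx + 1]) for idx in range(len(words) - 1)]


words = ["CAN", "WE", "PLEASE", "START", "A", "CRAZY", "AMY", "MEME",
         "FOR", "AMY", "OF", "AMYS", "BAKING", "COMPANY"]

pairs = adjacent_pairs(words)
print(len(pairs))               # 13
print(pairs[0], pairs[-1])      # ('CAN', 'WE') ('BAKING', 'COMPANY')
print(adjacent_pairs(["WAT"]))  # []
```

The last line is worth a thought: a one-word title produces no pairs at all. As far as I can tell the same holds for the Cypher version, because unwinding an empty range yields no rows, so such a title would not create any Word node.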

No Limits

MATCH (m:Meme) WITH m // no limits
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m

UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx

MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]});
memegraph all words.jpg
MATCH (n:Word) RETURN count(*);

Chain up the memes

Connect the words via :NEXT and store the meme ids on each relationship in an ids property.

For the first word (idx = 0), let’s also connect the Meme node to that first Word via :FIRST.

MATCH (m:Meme) WITH m
WITH split(reduce(s=toUpper(m.title), c IN split(",!?'.","") | replace(s,c,'')), " ") as words, m
UNWIND range(0,size(words)-2) as idx // turn the range into rows of idx
MERGE (a:Word {text:words[idx]})
MERGE (b:Word {text:words[idx+1]})

// Connect the words via :NEXT and store the meme-ids on each rel in an `ids` property
MERGE (a)-[rel:NEXT]->(b) SET rel.ids = coalesce(rel.ids,[]) + [m.id]

// to later recreate the meme along the next chain
// connect the first word to the meme itself
WITH * WHERE idx = 0
MERGE (m)-[:FIRST]->(a);

Set 546 properties, created 614 relationships, statement completed in 65 ms.
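
To see what ends up in the graph, here is a rough in-memory Python sketch of the same bookkeeping. The dictionaries stand in for relationships, chain_up is a made-up name, and the second meme with id "demo" is invented for illustration; only "1ds9y4" comes from the data above.

```python
def chain_up(memes):
    """memes: dict of meme id -> list of words. Returns (next_rels, first_words)."""
    next_rels = {}    # (word, following word) -> list of meme ids, like rel.ids
    first_words = {}  # meme id -> first word, like (m)-[:FIRST]->(a)
    for meme_id, words in memes.items():
        for idx in range(len(words) - 1):
            pair = (words[idx], words[idx + 1])
            # same idea as coalesce(rel.ids, []) + [m.id]
            next_rels[pair] = next_rels.get(pair, []) + [meme_id]
            if idx == 0:
                first_words[meme_id] = words[0]
    return next_rels, first_words


memes = {
    "1ds9y4": ["HOW", "TO", "CURE", "BAD", "BREATH"],
    "demo": ["HOW", "TO", "CURE", "HICCUPS"],  # invented second meme
}
next_rels, first_words = chain_up(memes)
print(next_rels[("HOW", "TO")])  # ['1ds9y4', 'demo']
print(len(next_rels))            # 5
```

Two memes with nine words between them give only five :NEXT entries, because shared word pairs are merged and just collect more ids.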

Yay done!

MATCH (m:Meme)-[:FIRST]->(w:Word)-[:NEXT]->(w2:Word)
RETURN * LIMIT 33;
memegraph example.jpg

Which words appear most often?

MATCH (w:Word)
WHERE size(w.text) > 4
RETURN w, size( (w)--() ) as relCount
ORDER BY relCount DESC LIMIT 10;
╒══════════════════╤══════════╕
│"w"               │"relCount"│
╞══════════════════╪══════════╡
│{"text":"AFTER"}  │"56"      │
├──────────────────┼──────────┤
│{"text":"REDDIT"} │"34"      │
├──────────────────┼──────────┤
│{"text":"ABOUT"}  │"33"      │
├──────────────────┼──────────┤
│{"text":"TODAY"}  │"33"      │
├──────────────────┼──────────┤
│{"text":"SCUMBAG"}│"32"      │
├──────────────────┼──────────┤
│{"text":"EVERY"}  │"31"      │
├──────────────────┼──────────┤
│{"text":"FIRST"}  │"30"      │
├──────────────────┼──────────┤
│{"text":"ALWAYS"} │"28"      │
├──────────────────┼──────────┤
│{"text":"FRIEND"} │"27"      │
├──────────────────┼──────────┤
│{"text":"THOUGHT"}│"24"      │
└──────────────────┴──────────┘
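
A small caveat on reading these numbers: relCount is the number of relationships touching a word, not the number of memes that use it. Because each :NEXT relationship is merged per word pair, a pair used by many memes still counts once. The Python sketch below shows the difference on a hypothetical miniature graph (all words and ids are invented, and :FIRST relationships are left out for brevity).

```python
from collections import Counter

# Hypothetical miniature graph: (from word, to word) -> meme ids on that :NEXT rel
next_rels = {
    ("SCUMBAG", "STEVE"): ["m1", "m2", "m3"],
    ("STEVE", "BORROWS"): ["m1"],
    ("STEVE", "NEVER"): ["m2", "m3"],
}

degree = Counter()  # relationships touching the word, like size((w)--())
memes_using = {}    # word -> set of meme ids whose chain passes through it

for (a, b), ids in next_rels.items():
    for word in (a, b):
        degree[word] += 1
        memes_using.setdefault(word, set()).update(ids)

print(degree["SCUMBAG"], len(memes_using["SCUMBAG"]))  # 1 3
print(degree["STEVE"], len(memes_using["STEVE"]))      # 3 3
```

So the ranking above is closer to "which words have the most varied neighbours" than to a plain frequency count.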

Now let’s find our memes again

// first meme
MATCH (m:Meme) WITH m limit 1
// from the :FIRST :Word follow the :NEXT chain
MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->() // let's follow the chain of words starting
// from the meme, where all relationships contain the meme-id
WHERE ALL(r in rels WHERE m.id IN r.ids)
RETURN *;
memegraph.jpg

Show meme by id

We can also look up a meme from the CSV list by its id,
e.g. id '1kc9p2': 'As stupid as memes are they can actually make valid points'

MATCH (m:Meme) WHERE m.id = '1kc9p2'

MATCH path = (m)-[:FIRST]->(w)-[rels:NEXT*..15]->()
WHERE ALL(r in rels WHERE m.id IN r.ids)

RETURN *;
memegraph 2.jpg
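
The chain-following queries above boil down to: start at the :FIRST word and only walk :NEXT relationships whose ids contain the meme id. Here is a simplified Python sketch of that walk. It assumes no word repeats within a title, since a repeated word makes the next step ambiguous and the sketch simply stops there. rebuild_title and the "demo" meme are made up for illustration.

```python
def rebuild_title(meme_id, first_word, next_rels, max_hops=15):
    """Follow (word, next word) entries that carry meme_id, starting at first_word."""
    words = [first_word]
    current = first_word
    for _ in range(max_hops):  # same bound as NEXT*..15
        candidates = [b for (a, b), ids in next_rels.items()
                      if a == current and meme_id in ids]
        if len(candidates) != 1:
            break  # end of the chain, or an ambiguous step
        current = candidates[0]
        words.append(current)
    return words


next_rels = {
    ("HOW", "TO"): ["1ds9y4", "demo"],
    ("TO", "CURE"): ["1ds9y4", "demo"],
    ("CURE", "BAD"): ["1ds9y4"],
    ("BAD", "BREATH"): ["1ds9y4"],
    ("CURE", "HICCUPS"): ["demo"],  # invented second meme
}

print(" ".join(rebuild_title("1ds9y4", "HOW", next_rels)))  # HOW TO CURE BAD BREATH
print(" ".join(rebuild_title("demo", "HOW", next_rels)))    # HOW TO CURE HICCUPS
```

Both memes share HOW TO CURE, and the ids on each relationship are what keep the two chains apart after that.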

Done. Enjoy!

PS: If you want to connect your own stuff, grab a Neo4j Sandbox or use Neo4j on your machine.
If you have questions, ask me, Michael, on Twitter or on Slack.

Copyright © 2007-2017 Better Software Development All rights reserved.