Full-Text-Indexing (FTS) in Neo4j 2.0

Posted by Michael Hunger on Mar 17, 2014 in neo4j

With Neo4j 2.0 we got automatic schema indexes based on labels and properties for exact lookups of nodes on property values.

Fulltext and other indexes (spatial, range) are on the roadmap but not addressed yet.

For fulltext indexes you still have to use legacy indexes.

As you probably don’t want to add nodes to an index manually, the existing “auto-index” mechanism should be a good fit.

To use the auto-index for fulltext search, you first have to configure it as a fulltext index and then enable it in your settings.

Setup Node Auto-Index as Fulltext-Index

To configure the auto-index as fulltext index for your Neo4j Server use:

POST http://localhost:7474/db/data/index/node/
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
  "name" : "node_auto_index",
  "config" : {
    "type" : "fulltext",
    "provider" : "lucene"
  }
}

You should get a response like this:

201: Created
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/index/node/node_auto_index/
{
  "template" : "http://localhost:7474/db/data/index/node/node_auto_index/{key}/{value}",
  "type" : "fulltext",
  "provider" : "lucene"
}
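
If you would rather send that request from code than from an HTTP client, a minimal sketch with the JDK's HttpURLConnection could look like this. The class and method names (FulltextIndexSetup, payload, createIndex) are made up for this example; the URL, headers and body are the ones from the request above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any Neo4j API.
class FulltextIndexSetup {

    // Builds the JSON body shown above for the given index name.
    static String payload(String indexName) {
        return "{\"name\":\"" + indexName + "\","
             + "\"config\":{\"type\":\"fulltext\",\"provider\":\"lucene\"}}";
    }

    // Sends the POST and returns the HTTP status code (201 means the index was created).
    static int createIndex(String baseUrl, String indexName) throws IOException {
        HttpURLConnection con =
            (HttpURLConnection) new URL(baseUrl + "/db/data/index/node/").openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Accept", "application/json; charset=UTF-8");
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);
        try (OutputStream out = con.getOutputStream()) {
            out.write(payload(indexName).getBytes(StandardCharsets.UTF_8));
        }
        try {
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) {
        // Only prints the request body; createIndex needs a running server.
        System.out.println(payload("node_auto_index"));
    }
}
```

Against a local server you would call createIndex("http://localhost:7474", "node_auto_index") once, before any data is indexed.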

Enable Node Auto-Index for certain properties

Configure and enable the auto-index in your conf/neo4j.properties. You have to enable the auto-index and also list the properties to be indexed upfront, before you insert any data.

node_auto_indexing=true
node_keys_indexable=title,description

If you configure it after the fact, you have to re-set the properties with a Cypher statement like this:

MATCH (n)
WHERE has(n.title)
SET n.title=n.title

If you already have many nodes in your database, you have to batch the update manually to stay within the transaction size limits, like this (start with SKIP 0 and increase it by 50000 on each run until the query returns zero):

MATCH (n)
WHERE has(n.title)
WITH n
SKIP 150000 LIMIT 50000
SET n.title=n.title
RETURN COUNT(*)
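
The manual batching can be scripted. The following is only a sketch of the loop: BatchReindex and the runCypher callback are hypothetical, and you would plug in whatever you use to execute Cypher (REST, shell, embedded) and return the count. The generated statement puts SKIP and LIMIT after a WITH n, because Cypher only accepts them after WITH or RETURN.

```java
import java.util.function.ToLongFunction;

// Hypothetical helper, not part of any Neo4j API.
class BatchReindex {

    // Builds one batch statement for the given SKIP offset.
    static String statement(String property, long skip, long batchSize) {
        return "MATCH (n) WHERE has(n." + property + ") "
             + "WITH n SKIP " + skip + " LIMIT " + batchSize + " "
             + "SET n." + property + "=n." + property + " "
             + "RETURN count(*)";
    }

    // Runs batches with a growing SKIP until one reports zero touched nodes;
    // returns the total number of nodes that were re-set.
    static long reindex(String property, long batchSize, ToLongFunction<String> runCypher) {
        long total = 0;
        for (long skip = 0; ; skip += batchSize) {
            long touched = runCypher.applyAsLong(statement(property, skip, batchSize));
            if (touched == 0) {
                return total;
            }
            total += touched;
        }
    }

    public static void main(String[] args) {
        // Stand-in executor that pretends the database holds 120000 matching nodes.
        long[] remaining = {120_000};
        long total = reindex("title", 50_000, cypher -> {
            long n = Math.min(remaining[0], 50_000);
            remaining[0] -= n;
            return n;
        });
        System.out.println(total);
    }
}
```

Each statement should run in its own transaction, otherwise the batching does not help with the transaction size.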

Using the Fulltext Auto-Index

You can use the fulltext auto-index with a START clause in Cypher and pass in any Lucene query syntax there.

START movie=node:node_auto_index("title:matr*")
MATCH (movie:Movie)<-[r:RATED]-(user)
WHERE r.rating > 4
RETURN movie, count(*) AS number, avg(r.rating) AS ratings
ORDER BY ratings desc, number desc
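
Lucene splits text into terms when indexing, so user input with several words usually has to be turned into one clause per word. Here is a rough sketch of a helper that builds such a query string, treating the last word as a prefix. LuceneQueries, escape and prefixQuery are names invented for this example, and the escaping is a hand-rolled approximation of Lucene's special characters, not the library's own method.

```java
// Hypothetical helper, not part of Neo4j or Lucene.
class LuceneQueries {

    // Characters with special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Prefixes every special character with a backslash.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Joins one clause per word with AND; the last word becomes a prefix match.
    static String prefixQuery(String field, String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("input must contain at least one word");
        }
        String[] words = trimmed.split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(" AND ");
            }
            sb.append(field).append(':').append(escape(words[i]));
            if (i == words.length - 1) {
                sb.append('*');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("title", "matrix rev"));
    }
}
```

The resulting string goes inside the quotes of the START clause, e.g. node:node_auto_index("title:matrix AND title:rev*").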

Java-API

You can also set it up programmatically with the Java API. Pass the auto-index settings to your embedded database, then configure node_auto_index as a fulltext index:

GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(DB_PATH)
   .setConfig("node_auto_indexing","true").setConfig("node_keys_indexable","title,description")
   .newGraphDatabase();

// create the auto-index as a fulltext index before any data is indexed
db.index().forNodes( "node_auto_index",
   MapUtil.stringMap( IndexManager.PROVIDER, "lucene", "type", "fulltext" ) );

And then use it like this:

IndexHits<Node> nodes = db.index().forNodes( "node_auto_index").query("title:matr*");
for (Node n : nodes) {
   // do something
}
// remember to close the IndexHits if you don't exhaust them
nodes.close();

Custom Configuration

You can also configure additional specifics for the fulltext index, like a custom
analyzer class. Just pass them in the config:

{
  "name" : "node_auto_index",
  "config" : {
    "type" : "fulltext",
    "provider" : "lucene",
    "to_lower_case" : true,
    "analyzer" : "com.example.indexing.MyAnalyzer"
  }
}

20 Comments

  • Dipesh Mitthalal says:

    This helped me. Thanks for the effort.

  • mike says:

    Thanks for this very helpful post!

    I’m finding a problem with the node_auto_index full text search – it doesn’t seem to lower case the search index by default and I can’t seem to switch it on.

    My config is spring-data-neo4j 3.0.2.RELEASE which pulls in the 2.0.1 neo4j kernel.

    I’m starting the database programmatically and have tried various things, from

    node_keys_indexable.case_insensitive=name,description
    in conf/neo4j.properties to
    .setConfig("to_lower_case", "true")
    Neither changes the behaviour (is the setConfig method deprecated?). I’m feeding neo4j.properties in via
    .loadPropertiesFromFile(pathToProperties)

    Querying through Cypher or the Spring Data repository gives the same result, i.e.

    neo4j-sh (?)$ start n=node:node_auto_index("name:Neut*") return n;

    returns 1 row

    neo4j-sh (?)$ start n=node:node_auto_index("name:neut*") return n;

    returns 0 rows (the actual value is Neutron).

    Any help greatly appreciated as this has stopped my neo4j project in its tracks.

    thanks, mike.

  • You have to configure the FTS upfront, but if you use SDN it does it for you, just use

    @Indexed(type=FULLTEXT, indexName="search") String text;

    then your “search” index is a fulltext index.

    The node_auto_index is not involved in SDN’s indexing, you can make it work but there is no need to. You would have to configure it before you use the db for the first time, e.g. via the Neo4j-Shell.

  • mike says:

    Thanks Michael – your suggestion works a treat (with a rebuild of the index).

    Just to mention the gotcha that caught me out – the step above to rebuild the index
    MATCH (n) WHERE has(n.title) SET n.title=n.title

    I assumed this would also work for the SDN index and when it didn’t I went back to the auto_index method. Obviously I was confused here so just to mention that you have to do the equivalent from within the SDN code – i.e. fetch and save the nodes using repository methods.

  • Flavio says:

    Hi Michael,

    I was wondering if you have done experiments with elasticsearch (or lucene) in combination with Neo4j via a plugin like elasticsearch-river-neo4j or a custom setup.

    Don’t you believe the ES-Neo4j combo could be a much more feature-rich solution than the native Neo4j full text search? I mean, in terms of faceting, stats, algorithms, discrimination.
    Your thoughts will be appreciated. Thanks!

  • Hi Flavio,

    I personally haven’t (not much ES experience) but I think it could be a power combo where you can use ES to find the right starting points using facets etc. and then use a Neo4j graph traversal to get the insights you need for business decisions or recommendations. Integration would probably be best in a driver or unmanaged extension (Java extension of the Neo4j server). It will definitely be more powerful than the built-in FTS which is not yet supported as automatic schema index, but will probably be pretty basic. Perhaps we can ping @knutwalker about looking into such a server extension.

  • Fabio says:

    Hi Michael,

    I’m trying to use "analyzer" : "org.apache.lucene.analysis.standard.StandardAnalyzer".
    I moved the jar "lucene-analyzers-3.6.2.jar" to the lib folder.

    But I get the exception below when I create the index:

    14:28:12.248 [qtp1873051589-38] WARN o.e.jetty.servlet.ServletHandler – /db/data/index/node/
    java.lang.RuntimeException: java.lang.InstantiationException: org.apache.lucene.analysis.standard.StandardAnalyzer
    at org.neo4j.index.impl.lucene.IndexType.getByClassName(IndexType.java:272) ~[neo4j-lucene-index-2.1.2.jar:2.1.2]

    Do you know what could cause this exception?

    Thanks !

  • Fabio, it’s best to ask this question on Stack Overflow.

  • Krishna says:

    I have a case where a user is connected to multiple different patterns. I need to search
    in all patterns in a single query. Is this possible?

  • Yes, sure just add the different patterns that connect the user to the query and add filters/restrictions to those patterns.

  • Corey Subnet says:

    Hi, I just wanted to point out that there is an error underneath the “Enable Node Auto-Index for certain properties” heading. The auto index options are not found in the conf/neo4j-server.properties file, they are actually found, commented out, in conf/neo4j.properties. On a mac, this directory can be found by navigating to: /usr/local/Cellar/neo4j/2.0.2/libexec/conf Be aware that this directory is hidden by default.

  • You’re right, sorry, I fixed that.

  • Krishna Shetty says:

    Thanks for this blog.

    I am facing an issue.
    Search does not return any result when string being searched has a space.

    This is my search:
    start u=node:node_auto_index("name:krishna*") match (u)<-[:FRIEND]-(k{id:123}) return u.name,u.id
    This works fine.

    But the same query does not work when I search for 'krishna she'. That is,
    start u=node:node_auto_index("name:krishna she*") match (u)<-[:MEMBER]-(MemberGroup)
    does not return any results.

    I have tried with "name:krishna?she*" and "name:krishna*she*" , but no luck.

    Any hints to fix this?

  • That’s a bit tricky, but it’s a general issue with Lucene: it splits your text into words/terms when indexing.

    You can try to quote the string, but that will only work for exact matches: 'name:"foo bar"'
    Otherwise you’d have to use AND:

    'name:krishna AND name:she*'

  • Daniel Krizian says:

    Hi Michael, when following your example, and running

    START movie=node:node_auto_index("title:matr*")
    MATCH (movie:Movie)<-[r:RATED]-(user)
    WHERE r.rating > 4
    RETURN movie, count(*) AS number, avg(r.rating) AS ratings
    ORDER BY ratings desc, number desc

    I get Neo.ClientError.Statement.InvalidSyntax error:

    Cannot add labels or properties on a node which is already bound (line 2, column 7)
    "MATCH (movie:Movie)<-[r:RATED]-(user)"
    ^

    This is on Neo4j 2.1.1 Community Edition

  • Unfortunately there were some syntax/semantics changes in 2.1.x. I think those are reverted in 2.1.3, but otherwise just change the MATCH to a WHERE clause:

    START movie=node:node_auto_index("title:matr*")
    WHERE (movie:Movie)
    ....
    		
  • Mukesh says:

    Is it better to use a Patricia (Trie) data structure separately to store the nodes that need to be ‘wildcard searched’ ?

    How does the Neo4j match compare with storing the data in a Trie data structure in terms of performance, memory, etc?

    I am thinking of adding the nodes in the Neo4j as we do normally, but also create a Trie tree to store the node against fields that need to be searched. I am hoping that this Trie will be faster for retrieving matches, while Neo4j will be used to get relationships.

    Thoughts?

  • I replied to your question on the neo4j google group.

  • Mukesh says:

    There is a typo in GraphDatatabaseService (extra ‘ta’)

    1. GraphDatatabaseService db = new GraphDatabaseFactory().newEmbeddedGraphDatabaseBuilder(DB_PATH)

    2. There is no method newEmbeddedGraphDatabaseBuilder(DB_PATH). The method is newEmbeddedDatabaseBuilder(DB_PATH).

    3. What is the type of the variable db in the first line of the code? I am guessing it’s GraphDatabaseService, but it’s not very clear, since you suggest setting the config first and then creating the db.

Copyright © 2007-2014 Better Software Development All rights reserved.