Initialize a Neo4j Docker container using Cypher scripts

Neo4j is a native graph database that many organizations are adopting for use cases such as missing-data imputation, knowledge graphs, and advanced AI/ML workloads.

With Docker, this graph database can be containerized and run on a container management system, but there are several things to consider before you actually use the Docker version. The one I’ll highlight here is how to initialize the Neo4j Docker container so that, once the database is up, it already contains your schema and indexes.

Let’s first start a simple Neo4j Docker container:

docker run \
--name testneo4j \
-p7474:7474 -p7687:7687 \
--rm \
--env NEO4J_AUTH=neo4j/test \
neo4j:latest
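
If the command above is started from a script rather than an attached terminal, it helps to wait until the HTTP port answers before connecting. Here is a minimal sketch, assuming curl is available on the host; wait_for is a hypothetical helper of my own, not part of Docker or Neo4j:

```shell
#!/usr/bin/env bash
# wait_for: hypothetical helper that retries a probe command once per second.
# usage: wait_for <max_attempts> <command> [args...]
wait_for() {
  local attempts=$1
  shift
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # Succeed as soon as the probe command exits with status 0.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  # All attempts failed.
  return 1
}

# Example (assumption: port 7474 is published as in the command above):
#   wait_for 60 curl -sf http://localhost:7474/ && echo "Neo4j is up"
```

The probe is passed in as a command, so the same helper can poll the Bolt port or any other readiness check you prefer.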

You’ll see a log like the following (at the time of writing, neo4j:latest resolved to Neo4j 4.2.6):

Changed password for user 'neo4j'.
Directories in use:
home: /var/lib/neo4j
config: /var/lib/neo4j/conf
logs: /logs
plugins: /var/lib/neo4j/plugins
import: /var/lib/neo4j/import
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/lib/neo4j/run
Starting Neo4j.
2021-05-28 17:22:08.024+0000 INFO Starting...
2021-05-28 17:22:10.812+0000 INFO ======== Neo4j 4.2.6 ========
2021-05-28 17:22:13.258+0000 INFO Initializing system graph model for component 'security-users' with version -1 and status UNINITIALIZED
2021-05-28 17:22:13.270+0000 INFO Setting up initial user from `auth.ini` file: neo4j
2021-05-28 17:22:13.272+0000 INFO Creating new user 'neo4j' (passwordChangeRequired=false, suspended=false)
2021-05-28 17:22:13.287+0000 INFO Setting version for 'security-users' to 2
2021-05-28 17:22:13.296+0000 INFO After initialization of system graph model component 'security-users' have version 2 and status CURRENT
2021-05-28 17:22:13.306+0000 INFO Performing postInitialization step for component 'security-users' with version 2 and status CURRENT
2021-05-28 17:22:13.674+0000 INFO Bolt enabled on 0.0.0.0:7687.
2021-05-28 17:22:15.237+0000 INFO Remote interface available at http://localhost:7474/
2021-05-28 17:22:15.238+0000 INFO Started.

Let’s log in to Neo4j (using the neo4j/test credentials passed in NEO4J_AUTH) and check for nodes and indexes:

paras_bansal@cloudshell:~$ docker exec -it testneo4j cypher-shell
username: neo4j
password: ****
Connected to Neo4j using Bolt protocol version 4.2 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> show indexes;
+---------------------------------------------------------------------------------------------------------------------+
| id | name | state | populationPercent | uniqueness | type | entityType | labelsOrTypes | properties | indexProvider |
+---------------------------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------------------------+
0 rows available after 821 ms, consumed after another 2 ms
neo4j@neo4j> show constraints;
+---------------------------------------------------------------------------+
| id | name | type | entityType | labelsOrTypes | properties | ownedIndexId |
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
0 rows available after 33 ms, consumed after another 1 ms
neo4j@neo4j> match(n) return n;
+---+
| n |
+---+
+---+
0 rows available after 103 ms, consumed after another 1 ms
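
The same checks can be run without the interactive prompt, which is handy in scripts. This is only a sketch: run_cypher is a hypothetical wrapper, the credentials are the neo4j/test pair from the run command, and the DOCKER variable is my own addition so that the command can be printed instead of executed.

```shell
#!/usr/bin/env bash
# run_cypher: hypothetical wrapper around `docker exec ... cypher-shell`.
# usage: run_cypher <container> <query>
# Set DOCKER=echo to print the docker arguments instead of running them.
run_cypher() {
  local container=$1 query=$2
  # -u/-p pass the credentials, --format plain gives script-friendly output,
  # and the query is handed to cypher-shell as a positional argument.
  "${DOCKER:-docker}" exec "$container" \
    cypher-shell -u neo4j -p test --format plain "$query"
}

# Example (requires the container started above to be running):
#   run_cypher testneo4j "SHOW INDEXES;"
#   run_cypher testneo4j "MATCH (n) RETURN count(n);"
```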

Now let’s initialize Neo4j by creating indexes, constraints, and a couple of labeled nodes using the Cypher query language. One recommended way is to use the “apoc” plugin, so that is what we’ll use. The plugin can be installed at container start through an environment variable:

--env NEO4JLABS_PLUGINS='["apoc"]'

Let’s first write indexes.cypher:

~/neo4j$ cat indexes.cypher
//indexes
CREATE CONSTRAINT idx1 IF NOT EXISTS ON (p:Person) ASSERT p.name IS UNIQUE;
CREATE CONSTRAINT idx2 IF NOT EXISTS ON (a:Address) ASSERT a.id IS UNIQUE;

Let’s write schema.cypher, which creates two sample nodes:

~/neo4j$ cat schema.cypher
CREATE (p:Person{name: "Paras Bansal"});
CREATE (a:Address{id:1, firstline: "AAAA", zip: "ABCDEF", country: "Canada"});

Let’s write apoc.conf:

~/neo4j$ cat apoc.conf
apoc.import.file.use_neo4j_config=true
apoc.import.file.enabled=true
apoc.initializer.neo4j.1=CALL apoc.cypher.runSchemaFile("file:////var/lib/neo4j/db_init/indexes.cypher")
apoc.initializer.neo4j.2=CALL apoc.cypher.runFile("file:////var/lib/neo4j/db_init/schema.cypher");

apoc.conf → provides the configuration for the “apoc” plugin. This file needs to sit beside neo4j.conf in the conf directory.

runSchemaFile → runs only schema statements (creating indexes, constraints, etc.), separated by ;

runFile → runs any Cypher statements, separated by ;
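
Because runSchemaFile is meant for schema statements only, it can be worth checking a file before baking it into the image. The following is a rough sketch, not an APOC feature: is_schema_only is a hypothetical lint that assumes one statement per line and only recognizes statements that start with CREATE or DROP followed by CONSTRAINT or INDEX.

```shell
#!/usr/bin/env bash
# is_schema_only: hypothetical lint, assuming one statement per line.
# Returns 0 if every non-blank, non-comment line starts with
# CREATE/DROP followed by CONSTRAINT or INDEX (case-insensitive).
is_schema_only() {
  local file=$1
  # The first grep drops blank lines and // comments. The second grep -v
  # succeeds if any remaining line is NOT a schema statement, so the
  # pipeline result is negated with the leading "!".
  ! grep -Ev '^[[:space:]]*(//.*)?$' "$file" \
    | grep -Eiqv '^[[:space:]]*(CREATE|DROP)[[:space:]]+(CONSTRAINT|INDEX)'
}

# Example:
#   is_schema_only indexes.cypher && echo "ok for runSchemaFile"
#   is_schema_only schema.cypher  || echo "use runFile instead"
```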

The initializer setting has the following form:

apoc.initializer.<database_name>.<identifier> = <some cypher string>

Here the identifier is 1, 2, 3, …, so any number of Cypher statements, including “apoc” procedure calls, can be executed.
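
When there are many init files, the numbered entries can be generated instead of typed by hand. Here is a small sketch under my own assumptions: initializer_lines is a hypothetical helper, each argument is a procedure:file pair, and the output reproduces the file://// URL style used in the apoc.conf above.

```shell
#!/usr/bin/env bash
# initializer_lines: hypothetical generator for apoc.initializer entries.
# usage: initializer_lines <database> <init_dir> <procedure:file>...
# Identifiers are assigned 1, 2, 3, ... in argument order.
initializer_lines() {
  local db=$1 dir=$2
  shift 2
  local id=1 spec proc file
  for spec in "$@"; do
    proc="${spec%%:*}"   # text before the first colon, e.g. runSchemaFile
    file="${spec#*:}"    # text after the first colon, e.g. indexes.cypher
    printf 'apoc.initializer.%s.%d=CALL apoc.cypher.%s("file:///%s/%s")\n' \
      "$db" "$id" "$proc" "$dir" "$file"
    id=$((id + 1))
  done
}

# Prints the two initializer entries for the files used in this post.
initializer_lines neo4j /var/lib/neo4j/db_init \
  runSchemaFile:indexes.cypher runFile:schema.cypher
```

Appending this output to the two apoc.import.file lines gives an apoc.conf equivalent to the one shown earlier.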

Let’s write the final Dockerfile, which copies all of these files into the image:

~/neo4j$ cat Dockerfile
FROM neo4j:latest
COPY indexes.cypher /var/lib/neo4j/db_init/
COPY schema.cypher /var/lib/neo4j/db_init/
COPY apoc.conf /var/lib/neo4j/conf/

Let’s build the image from the Dockerfile and tag it as neo4j:custom:

~/neo4j$ docker build . -t neo4j:custom
Sending build context to Docker daemon 5.12kB
Step 1/4 : FROM neo4j:latest
---> 02a452151a82
Step 2/4 : COPY indexes.cypher /var/lib/neo4j/db_init/
---> 0fbeaf9ecb30
Step 3/4 : COPY schema.cypher /var/lib/neo4j/db_init/
---> 9c68a0374220
Step 4/4 : COPY apoc.conf /var/lib/neo4j/conf/
---> bc29855c9751
Successfully built bc29855c9751
Successfully tagged neo4j:custom
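
Before starting the container, you can confirm that the files landed where apoc.conf expects them. This is a sketch only: list_image_files is a hypothetical helper, and the DOCKER variable is my own addition for printing the command instead of running it.

```shell
#!/usr/bin/env bash
# list_image_files: hypothetical check that lists the init and conf directories
# inside an image by overriding its entrypoint with ls.
# usage: list_image_files <image>
# Set DOCKER=echo to print the docker arguments instead of running them.
list_image_files() {
  local image=$1
  "${DOCKER:-docker}" run --rm --entrypoint ls "$image" \
    /var/lib/neo4j/db_init /var/lib/neo4j/conf
}

# Example (requires the image built above):
#   list_image_files neo4j:custom
```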

Let’s start the container again. We’ll pass two additional environment variables this time:

NEO4J_dbms_directories_import="/"
NEO4JLABS_PLUGINS='["apoc"]'

By default, Neo4j can only access files inside its import directory. The first variable sets dbms.directories.import to /, so Neo4j can read files anywhere under the filesystem root, including the db_init directory we copied into the image.

The second one ensures that the “apoc” plugin is installed when the container starts, as described earlier.
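
The Neo4j Docker image maps neo4j.conf settings to environment variables by prefixing NEO4J_, replacing each dot with an underscore, and doubling any underscore that is already in the setting name. The conversion can be sketched as follows; neo4j_env_name is a hypothetical helper written for illustration. Note that NEO4J_AUTH and NEO4JLABS_PLUGINS are special variables handled by the image itself, not converted settings.

```shell
#!/usr/bin/env bash
# neo4j_env_name: hypothetical helper that converts a neo4j.conf setting name
# into the environment variable name expected by the Neo4j Docker image.
neo4j_env_name() {
  # Double existing underscores first, then turn dots into single underscores.
  printf 'NEO4J_%s\n' "$(printf '%s' "$1" | sed -e 's/_/__/g' -e 's/\./_/g')"
}

neo4j_env_name dbms.directories.import     # prints NEO4J_dbms_directories_import
neo4j_env_name dbms.tx_log.rotation.size   # prints NEO4J_dbms_tx__log_rotation_size
```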

Let’s start the container now:

~/neo4j$ docker run \
> --name neo4j \
> -p7474:7474 -p7687:7687 \
> --rm \
> --env NEO4J_AUTH=neo4j/test \
> --env NEO4J_dbms_directories_import="/" \
> --env NEO4JLABS_PLUGINS='["apoc"]' \
> neo4j:custom
Changed password for user 'neo4j'.
Fetching versions.json for Plugin 'apoc' from https://neo4j-contrib.github.io/neo4j-apoc-procedures/versions.json
Installing Plugin 'apoc' from https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.2.0.4/apoc-4.2.0.4-all.jar to /var/lib/neo4j/plugins/apoc.jar
Applying default values for plugin apoc to neo4j.conf
Directories in use:
home: /var/lib/neo4j
config: /var/lib/neo4j/conf
logs: /logs
plugins: /var/lib/neo4j/plugins
import: /
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/lib/neo4j/run
Starting Neo4j.
2021-05-28 18:05:19.108+0000 INFO Starting...
2021-05-28 18:05:22.106+0000 INFO ======== Neo4j 4.2.6 ========
2021-05-28 18:05:34.685+0000 INFO Initializing system graph model for component 'security-users' with version -1 and status UNINITIALIZED
2021-05-28 18:05:34.719+0000 INFO Setting up initial user from `auth.ini` file: neo4j
2021-05-28 18:05:34.721+0000 INFO Creating new user 'neo4j' (passwordChangeRequired=false, suspended=false)
2021-05-28 18:05:34.741+0000 INFO Setting version for 'security-users' to 2
2021-05-28 18:05:34.750+0000 INFO After initialization of system graph model component 'security-users' have version 2 and status CURRENT
2021-05-28 18:05:34.763+0000 INFO Performing postInitialization step for component 'security-users' with version 2 and status CURRENT
2021-05-28 18:05:36.861+0000 INFO Called db.clearQueryCaches(): Query caches successfully cleared of 1 queries.
2021-05-28 18:05:36.987+0000 INFO Bolt enabled on 0.0.0.0:7687.
2021-05-28 18:05:39.823+0000 INFO [neo4j/d421ed40] successfully initialized: CALL apoc.cypher.runSchemaFile("file:////var/lib/neo4j/db_init/indexes.cypher")
2021-05-28 18:05:39.928+0000 INFO Remote interface available at http://localhost:7474/
2021-05-28 18:05:39.932+0000 INFO Started.
2021-05-28 18:05:40.417+0000 INFO [neo4j/d421ed40] successfully initialized: CALL apoc.cypher.runFile("file:////var/lib/neo4j/db_init/schema.cypher");
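
The two “successfully initialized” lines in the log above are the signal that the initializers ran. If you want to check for them from a script, something like the following could work; check_initialized is a hypothetical helper, and it relies on the message text staying the same as in this log.

```shell
#!/usr/bin/env bash
# check_initialized: hypothetical check that reads log text on stdin and
# succeeds only if exactly <expected> initializer messages are present.
# usage: <log source> | check_initialized <expected>
check_initialized() {
  local expected=$1 actual
  # grep -c prints the number of matching lines; "|| true" keeps a zero count
  # from being treated as a failure by the shell.
  actual=$(grep -c 'successfully initialized' || true)
  [ "$actual" -eq "$expected" ]
}

# Example (requires the container started above to be running):
#   docker logs neo4j 2>&1 | check_initialized 2 && echo "both init files ran"
```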

Both files were initialized successfully. Let’s verify by logging in:

~$ docker exec -it neo4j cypher-shell
username: neo4j
password: ****
Connected to Neo4j using Bolt protocol version 4.2 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> show indexes;
+----------------------------------------------------------------------------------------------------------------------------------+
| id | name   | state    | populationPercent | uniqueness | type    | entityType | labelsOrTypes | properties | indexProvider      |
+----------------------------------------------------------------------------------------------------------------------------------+
| 1  | "idx1" | "ONLINE" | 100.0             | "UNIQUE"   | "BTREE" | "NODE"     | ["Person"]    | ["name"]   | "native-btree-1.0" |
| 3  | "idx2" | "ONLINE" | 100.0             | "UNIQUE"   | "BTREE" | "NODE"     | ["Address"]   | ["id"]     | "native-btree-1.0" |
+----------------------------------------------------------------------------------------------------------------------------------+
2 rows available after 67 ms, consumed after another 17 ms
neo4j@neo4j> show constraints;
+-------------------------------------------------------------------------------------+
| id | name   | type         | entityType | labelsOrTypes | properties | ownedIndexId |
+-------------------------------------------------------------------------------------+
| 2  | "idx1" | "UNIQUENESS" | "NODE"     | ["Person"]    | ["name"]   | 1            |
| 4  | "idx2" | "UNIQUENESS" | "NODE"     | ["Address"]   | ["id"]     | 3            |
+-------------------------------------------------------------------------------------+
2 rows available after 23 ms, consumed after another 1 ms
neo4j@neo4j> match(n) return n;
+-------------------------------------------------------------------------+
| n                                                                       |
+-------------------------------------------------------------------------+
| (:Person {name: "Paras Bansal"})                                        |
| (:Address {zip: "ABCDEF", country: "Canada", id: 1, firstline: "AAAA"}) |
+-------------------------------------------------------------------------+

There you go! Goal achieved.

Signing off here. I hope this helps. Please post your comments and I’ll try to improve it further.

Solutions Architect, Cloud Enthusiast