Neo4j data processing with Spring Boot

Paras Bansal
5 min read · Jun 27, 2021


Neo4j is an enterprise-strength, graph-native database that stores data in the form of nodes and relationships. It also provides features such as clustering, ACID transactions, scalability, and security.

Cloud-native architecture is getting popular nowadays, as many customers want to save on IT costs. One easy way is to use containerized apps, which scale easily and provide data-harnessing power on modern orchestration platforms such as Kubernetes.

In this article we’ll discuss how I processed data into Neo4j using a REST API; the same app can be dockerized into a container and deployed to any orchestration platform.

Before we go there, I want to outline the ways data can be ingested into Neo4j and why I chose my method. There are two ways data can be ingested into Neo4j from Spring:

  1. Create an object-oriented model (like we always prefer) — documented here: https://neo4j.com/developer/spring-data-neo4j/
  2. Map the data to Cypher queries (like SQL, which most people won’t prefer) — documented here: https://neo4j.com/developer/java-driver-spring-boot-starter/

I chose method #2 for the following reasons:

  1. There is currently an underlying problem in the Neo4j driver: when you load an entity, it opens a new transaction with a new session context, finds the relationships, and caches them in that context. The transaction then closes because the operation is done.

So the subsequent save/update does not find an open transaction and, as a consequence, opens a new one. It won’t see the old relationship; it will always see the entity in its current state, and it may end up deleting the old relationship.

2. Another reason is that, behind the scenes, the Neo4j driver executes a lot of Cypher queries even for a small operation (fetching data, caching it, etc.), which slows things down. For something as small as creating one node in an empty database, I ended up spending 1–2 seconds, which to me is not good at all.

All of this is evident from the debug logs: enable them and you’ll notice that a lot of Cypher is being executed by the Neo4j driver.

The suggested solution is Spring Boot’s @Transactional support, executing all of the operations in one transaction, but I was neither successful with it nor happy with the driver’s slowness, which was definitely outperformed by simple Cypher query execution.

Just to prove it, my first transaction resulted in this:

The Order node with id 7 was created perfectly, but when the same code was executed again for another order:

I lost the customer relationship on the previous order, while the new order was created fine.

Okay, coming back: here is how I processed a create-order request. My goal is to achieve this hierarchy:

My REST API request looks like:

{
  "customerId": "TEST-1",
  "name": "TestCustomer",
  "orderId": "1stOrder",
  "orderDate": "25-Jun-2021",
  "items": [
    {
      "item": "Basket",
      "unit_price": "10",
      "quantity": 2
    },
    {
      "item": "Flower",
      "unit_price": "20",
      "quantity": 5
    }
  ]
}

Data model:

import java.util.List;

public class OrderCartInput {
  private String customerId;
  private String name;
  private String orderId;
  private String orderDate;
  private List<OrderItem> items;
}

public class OrderItem {
  private String item;
  private String unit_price;
  private String quantity;
}

The cypher to process the order looks like:

MERGE (c:Customer{name: "${name}", customerId: "${customerId}"})
MERGE (o:Order{orderId: "${orderId}", orderDate: "${orderDate}"})<-[:PLACED]-(c)
MERGE (oitem:OrderItem{item: "${item}", unit_price: "${unit_price}", quantity: "${quantity}"})<-[:HAS_ITEMS]-(o)
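To make the substitution concrete: for the sample request above, the first item would produce the following query (customer- and order-level placeholders filled first, then the item-level ones):

```cypher
MERGE (c:Customer{name: "TestCustomer", customerId: "TEST-1"})
MERGE (o:Order{orderId: "1stOrder", orderDate: "25-Jun-2021"})<-[:PLACED]-(c)
MERGE (oitem:OrderItem{item: "Basket", unit_price: "10", quantity: "2"})<-[:HAS_ITEMS]-(o)
```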

The service class is as:

import java.util.Map;

import org.neo4j.driver.Driver;
import org.neo4j.driver.Session;
import org.neo4j.driver.Transaction;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import com.company.vo.CypherQueries;
import com.company.vo.OrderCartInput;
import com.company.vo.OrderItem;
import com.fasterxml.jackson.databind.ObjectMapper;

import lombok.extern.slf4j.Slf4j;

@Service
@Slf4j
public class OrderProcessingService {

  private final Driver driver;

  public OrderProcessingService(Driver driver) {
    this.driver = driver;
  }

  @Autowired
  CypherQueries cypherQueries;

  public String processOrder(OrderCartInput req) throws Exception {
    log.info("req --> " + req);
    // Fill the customer/order-level placeholders first
    String createOrder = mapStringToMapping(cypherQueries.getCypher_queries().get("createOrder"), req);
    for (OrderItem item : req.getItems()) {
      // Then fill the item-level placeholders for each item
      String createOrder1 = mapStringToMapping(createOrder, item);
      try (Session session = driver.session(); Transaction tx = session.beginTransaction()) {
        tx.run(createOrder1);
        tx.commit();
      } catch (Exception e) {
        log.error("Exception while processing data to Neo4j... ", e);
      }
    }
    return "Customer Order Processed Successfully";
  }

  private String mapStringToMapping(String query, Object mapping) {
    ObjectMapper oMapper = new ObjectMapper();
    Map<String, Object> query_params = oMapper.convertValue(mapping, Map.class);
    for (Map.Entry<String, Object> entry : query_params.entrySet()) {
      if (query.contains("${" + entry.getKey() + "}"))
        // String.valueOf copes with non-String values such as numeric quantities
        query = query.replace("${" + entry.getKey() + "}", String.valueOf(entry.getValue()));
    }
    log.info("Executing cypher query --> " + query);
    return query;
  }
}
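The placeholder substitution in mapStringToMapping is plain string templating. Stripped of Spring and Jackson, its core behaves like this self-contained sketch (the map literal stands in for the output of ObjectMapper.convertValue; TemplateDemo is a hypothetical class name):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TemplateDemo {

    // Replaces ${key} tokens in the query with values from the map,
    // mirroring the service's mapStringToMapping helper
    static String substitute(String query, Map<String, Object> params) {
        for (Map.Entry<String, Object> entry : params.entrySet()) {
            String token = "${" + entry.getKey() + "}";
            if (query.contains(token)) {
                // String.valueOf copes with non-String values such as numbers
                query = query.replace(token, String.valueOf(entry.getValue()));
            }
        }
        return query;
    }

    public static void main(String[] args) {
        String template = "MERGE (c:Customer{name: \"${name}\", customerId: \"${customerId}\"})";
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("name", "TestCustomer");
        params.put("customerId", "TEST-1");
        System.out.println(substitute(template, params));
        // MERGE (c:Customer{name: "TestCustomer", customerId: "TEST-1"})
    }
}
```

Note that this approach splices values straight into the query text; the driver also supports passing parameters to tx.run, which is generally safer for untrusted input.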

As shown above, the Neo4j driver’s Session and Transaction APIs are used to execute the Cypher queries.

More queries can be written, and the results can easily be mapped to POJOs, for example (not in the repo):

public List<Customer> runQuery(String query) {
  try (Session session = driver.session()) {
    Result result = session.run(query);
    return result.list().stream()
        .map(record -> {
          Customer newCustomer = new Customer();
          // Assumes the query returns the customer node bound to "c"
          newCustomer.setCustomer_name(record.get("c").get("name").asString());
          return newCustomer;
        }).collect(Collectors.toList());
  }
}

Similarly, if you have a bunch of Cypher queries creating/updating data, you can do transaction management using the code below (not in the repo):

public void runInTransaction(List<String> queries) {
try (Session session = driver.session();
Transaction tx = session.beginTransaction()) {

queries.forEach(tx::run);

tx.commit();
} catch (Exception e) {
log.error("Exception while processing data to Neo4j... ", e);
}
}

I was able to get responses in less than 100 ms. So, if a proper indexing strategy is adopted, this should easily work for a large dataset as well.
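For instance, indexes on the keys the MERGE clauses match on could look like this (Neo4j 4.x syntax; a sketch, and the index names are arbitrary):

```cypher
CREATE INDEX customer_id IF NOT EXISTS FOR (c:Customer) ON (c.customerId);
CREATE INDEX order_id IF NOT EXISTS FOR (o:Order) ON (o.orderId);
```

Without such indexes, each MERGE has to scan all nodes with that label to decide whether to match or create.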

That’s it for now. Please do leave your comments and suggestions on how to improve this.
