Hands on Neo4j

Hands on Neo4j

This is the script of a hands on session on MongoDB. The example refers to a typical retail example [ER diagram] with orders [CSV], products [CSV] and users [CSV].

getting a Neo4j running on your machine using docker

docker pull neo4j
docker run -p 7474:7474 -p 7687:7687 -d neo4j

then open http://localhost:7474 and enter the password (neo4j)

let’s see an example where neo4j rocks

Relational data describing a taxonomi

The categories

 

categrories
ID Name
0 top
1 sub1
2 sub2
11 sub1.1
12 sub1.2
121 sub1.2.1
the relationships among the categories
relationships
src dest
0 1
0 2
1 11
1 12
12 121

let’s now create this taxonomy in neo4j

syntax
  • nodes CREATE (<variabile>:<tipo> {<field>: <valore>}) and
  • edges CREATE (<variabile>)-[:<proriterĂ >]->(<variabile>)
example
CREATE (top:Category {name: "top"})
CREATE (sub1:Category {name: "sub1"})
CREATE (sub2:Category {name: "sub2"})
CREATE (sub11:Category {name: "sub1.1"})
CREATE (sub12:Category {name: "sub1.2"})
CREATE (sub121:Category {name: "sub1.2.1"})
CREATE (top)-[:narrower]->(sub1),
(top)-[:narrower]->(sub2),
(sub1)-[:narrower]->(sub11),
(sub1)-[:narrower]->(sub12),
(sub12)-[:narrower]->(sub121)

let’s query it in cypher

Syntax
MATCH (<variabile>?:<tipo>? {<field>: <valore>}*)
-[<variabile>?:<tipo>? {<field>: <valore>}*]->?
(<variabile>?:<tipo>? {<field>: <valore>}*)?
WHERE <condition>?
RETURN <variabile>+

examples

walking one edge

All the nodes that can be reached from a node named top following one edge of type narrower

SQL
SELECT *
FROM relationships AS R JOIN categories AS C ON R.src = C.ID
WHERE C.name = 'top'
cypher
MATCH p=(a)-[:narrower]->() WHERE a.name="top" RETURN p

NOTE: not such a big difference …

walking down the entire taxonomi

all the nodes that can be reached from a node named top following any number ofedges of type narrower

SQL
SELECT R.dest, R1.dest, R2.dest
FROM relationships AS R JOIN categories AS C ON R.src = C.ID JOIN
     relationships AS R1 ON R1.src = R.dest JOIN
     relationships AS R2 ON R2.src = R1.dest JOIN
     relationships AS R3 ON R3.src = R2.dest 
WHERE C.name = 'top'

NOTE: what if the taxonomy was deeper? What if you don’t know how deep the taxonomy is?

cypher
MATCH p=(a)-[:narrower*]->() WHERE a.name="top" RETURN p

NOTE: Wow! This is easy!!! Thank you cypher!

controlling the number of walks

all the nodes that can be reached from a node named top following a minimum of 3 and a maximum of 3 edges of type narrower

MATCH p=(a)-[:narrower*2..3]->() WHERE a.name="top" RETURN p

let’s add some relational data

syntax

importing users

LOAD CSV FROM
'http://emanueledellavalle.org/data/neo4j/retail/users.csv' AS line
CREATE (:User {uid: toInteger(line[0]), name: line[1]})

importing products

LOAD CSV FROM
'http://emanueledellavalle.org/data/neo4j/retail/products.csv' AS line
CREATE (:Product {uid: line[0], brand: line[2], name: line[1]})

importing and merging orders

LOAD CSV FROM
'http://emanueledellavalle.org/data/neo4j/retail/orders.csv' AS line
MATCH (user:User {uid: toInteger(line[1])})
MATCH (product:Product {uid: line[2]})
MERGE (order:Order {id:  toInteger(line[0])})
ON CREATE SET order.date = line[4]
MERGE (order)-[:contains {qty: toInteger(3)}]->(product)
MERGE (order)-[:orderedBy]->(user)

NOTE: the MERGE clause ensures that a pattern exists in the graph. Either the pattern already exists, or it needs to be created.

let’s add a category to a product and see neo4j rocking again!

MATCH (category:Category {name: "sub1.2.1"})
MATCH (product:Product {uid: "a"})
MERGE (product)-[:topic]->(category)
MATCH (order)-[:contains]->(product),
      (product)-[:topic]->(category),
      (supercategory)-[:narrower*]->(category)
WHERE supercategory.name="sub1" 
RETURN order, product, category

NOTE: WOW! This is SQL is, in general, impossible to write!

links

  • https://neo4j.com/docs/cypher-refcard/current/

Hands on MongoDB

Hands on MongoDB

This is the script of a hands on session on MongoDB. The example refers to a typical retail example with products [CSV], users [CSV], and orders [CSV].

getting a mongoDB running on your machine using docker

docker pull mongo
docker run --name some-mongo -d mongo
docker run -it --link some-mongo:mongo --rm mongo sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'

the basics

adding two users to obtain their ObjectId

db.user.insertMany([
    { "name" : "Alice"  },
    { "name" : "Bob"    }
])

looking up one of the user via ObjectId

db.user.find( {"_id": ObjectId("5bf6ce4dcdb0763e7b6b9138")} )

Note: the ObjectId on your machine will be different

adding products

db.products.insertMany([
    { name : "red apple", brand : "XYZ"},
    { name : "blue berry", brand : "XYZ"},
    { name : "cake", brand : "Acme"}
])

adding orders linked to users and products

Note 1: ObjectIds on your machine will be different Note 2: qty is on purpose written either as an int or as string. You’ll see why below in the part related to finding documents

db.orders.insert({
  "utente" : {
    id: ObjectId("5bf6ce4dcdb0763e7b6b9138"),
    name: "Alice"
  },
  products : [{ 
      id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
      name : "red apple",
      qty : 2
    },{
      id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
      name : "blue berry",
      qty : 3
    }],
  date : 1536656557
})
db.orders.insert({
  "utente" : {
    id: ObjectId("5bf6ce4dcdb0763e7b6b9139"),
    name: "Bob"
  },
  products : [{ 
      id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
      name : "red apple",
      qty : 4
    },{
      id: ObjectId("5bf6cfffcdb0763e7b6b913c"),
      name : "cake",
      qty : 2
    }],
  date : 1536656558
})
db.orders.insert({
  "utente" : {
    id: ObjectId("5bf6ce4dcdb0763e7b6b9139"),
    name: "Bob"
  },
  products : [{ 
      id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
      name : "red apple",
      qty : "2"
    },{
      id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
      name : "blue berry",
      qty : "2"
    }],
  date : "1536656560"
})
db.orders.insert({
  "utente" : {
    id: ObjectId("5bf6ce4dcdb0763e7b6b9138"),
    name: "Alice"
  },
  products : [{ 
      id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
      name : "blue berry",
      qty : "2",
      comment : "please, the freshest ones!"
    },{
      id: ObjectId("5bf6cfffcdb0763e7b6b913c"),
      name : "cake",
      qty : 2
    }],
  date : 1536656559
})

finding orders

get all orders

SQL: select * from orders

MongoDB: db.orders.find({})

one simple conditions on date

SQL: select * from orders where date = 1536656559

R: which(order$date==1536656559)

MongoDB: db.order.find({ date : 1536656559 })

another “simple” condition on the name of the buyer

SQL:

select * 
from orders join users on orders.user = users.id 
where users.name="Alice"

R: which((left_join(orders,users,by=c("user"="id")))$name=="Alice")

MongoDB:

db.orders.find({
    "utente.name": "Alice"
})

NOTE: MongoDB rocks! just some dot notation … NOTE: however there’s a bad side: all joins that are not paths in the document must be handled writing some code (see forEach)

a simple range query

SQL: select * from orders where date >1536656558

R: which(order$date>A1536656559)

MongoDB:

db.orders.find({
  date : {$gt : 1536656559}
})

some simple SQL queries becomes hard (because of arrays)

SQL:

select * 
from orders  
where qty = 2

R: which(order$qty=2)

MongoDB:

db.orders.find({
  products :  { $elemMatch: {qty: 2} }
})

NOTE: not dot notation here :-/ (see also elemMatch)

multiple conjunctive conditions (a.k.a. ANDs)

SQL:

select * 
from orders join users on orders.user = users.id 
where users.name="Alice" and oders.qty > 1

R: ...

MongoDB:

db.orders.find({ 
  "utente.name": "Alice",
  products :  { $elemMatch: {qty: {$gt: 1} } }
})

Note: it does not match the qty expressed as strings

multiple disjunctive conditions (a.k.a. ORs)

SQL:

select * 
from orders join users on orders.user = users.id 
where users.name="Alice" or oders.qty > 1

R: ...

MongoDB:

db.orders.find({ 
   $or : [
    {"utente.name": "Alice"},
    {products :  { $elemMatch: {qty: {$gt: 1} } }}
  ]
})

projections (a.k.a. SELECTs)

QL:

select date 
from orders join users on orders.user = users.id 
where users.name="Alice" 

R: (which((left_join(orders,users,by=c("user"="id")))$name=="Alice"))$date

MongoDB:

db.orders.find(
    {"utente.name": "Alice"},
    {date: 1, "_id":0, "products.qty": 1}
)

Aggregates: counting the orders for qty > 2 grouping by user

QL:

select user, count(*) 
from orders join users on orders.user = users.id 
where orders.qty > 2
group by users.name

R: ...

MongoDB:

db.orders.aggregate( [ 
    {$match: {
        products : { $elemMatch: {qty: {$gt: 2} } }
    }}, 
    {$group: { _id: "$utente.name", count: {$sum: 1} }} 
])

updates

Assigning all Bobs orders to Carl (simplified ignoring ObjectId)

db.orders.updateMany(
    { "utente.name": "Bob" },
    {
        $set: { "utente.name": "Carl" },
        $currentDate: { lastModified: true }
    }
)

indexes (advance topic)

introduction

Note: the explain() function returns the execution plan of a query

notice the difference between the following two simple queries:

db.orders.find( { "_id" : ObjectId("5bfd06bbf645af7d9cce94b6")} ).explain()

MongoDB uses a FETCH IDHACK to look up the object in the hash table it uses internally. This is an operation MongoDB can perform in constant time independently from the size of the collection.

db.orders.find( { "utente.name": "Carl" } ).explain()

MongoDB uses a COLSAN, i.e. it scans all the element in the collection to find those that match. This is an operation MongoDB cannot perform in constant time: the larger is the collection to scan, the longer it takes.

Adding a index

Let’s add an index on the name of the user

db.orders.createIndex( { "utente.name": 1 } )

let’s explain the previous query again

db.orders.find( { "utente.name": "Carl" } ).explain()

Now MongoDB uses an IXSCAN, i.e., it uses a binary tree to index documents with the same value. This is an operation MongoDB cannot perform in constant time, but when the collection grows the time to find documents does not grow linearly in the size of the collection. Instead, it grows sub-linearly in the number of distinct values present in the field.

For more information read Query Optimization > Explain Results

Remove index

db.orders.dropIndex( { "utente.name": 1 } )

indexing and horizontal scalability

Indexes are in memory and do not work well in presence of shards. They slow down query execution. If you want to read more https://stackoverflow.com/questions/9084569/mongodb-index-in-memory-with-sharding

links to learn more