Hands on MongoDB
This is the script of a hands on session on MongoDB. The example refers to a typical retail example with products [CSV], users [CSV], and orders [CSV].
getting a mongoDB running on your machine using docker
docker pull mongo
docker run --name some-mongo -d mongo
docker run -it --link some-mongo:mongo --rm mongo sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'
the basics
adding two users to obtain their ObjectId
db.user.insertMany([
{ "name" : "Alice" },
{ "name" : "Bob" }
])
looking up one of the user via ObjectId
db.user.find( {"_id": ObjectId("5bf6ce4dcdb0763e7b6b9138")} )
Note: the ObjectId on your machine will be different
adding products
db.products.insertMany([
{ name : "red apple", brand : "XYZ"},
{ name : "blue berry", brand : "XYZ"},
{ name : "cake", brand : "Acme"}
])
adding orders linked to users and products
Note 1: ObjectIds on your machine will be different Note 2: qty is on purpose written either as an int or as string. You’ll see why below in the part related to finding documents
db.orders.insert({
"utente" : {
id: ObjectId("5bf6ce4dcdb0763e7b6b9138"),
name: "Alice"
},
products : [{
id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
name : "red apple",
qty : 2
},{
id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
name : "blue berry",
qty : 3
}],
date : 1536656557
})
db.orders.insert({
"utente" : {
id: ObjectId("5bf6ce4dcdb0763e7b6b9139"),
name: "Bob"
},
products : [{
id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
name : "red apple",
qty : 4
},{
id: ObjectId("5bf6cfffcdb0763e7b6b913c"),
name : "cake",
qty : 2
}],
date : 1536656558
})
db.orders.insert({
"utente" : {
id: ObjectId("5bf6ce4dcdb0763e7b6b9139"),
name: "Bob"
},
products : [{
id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
name : "red apple",
qty : "2"
},{
id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
name : "blue berry",
qty : "2"
}],
date : "1536656560"
})
db.orders.insert({
"utente" : {
id: ObjectId("5bf6ce4dcdb0763e7b6b9138"),
name: "Alice"
},
products : [{
id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
name : "blue berry",
qty : "2",
comment : "please, the freshest ones!"
},{
id: ObjectId("5bf6cfffcdb0763e7b6b913c"),
name : "cake",
qty : 2
}],
date : 1536656559
})
finding orders
get all orders
SQL: select * from orders
MongoDB: db.orders.find({})
one simple conditions on date
SQL: select * from orders where date = 1536656559
R: which(order$date==1536656559)
MongoDB: db.order.find({ date : 1536656559 })
another “simple” condition on the name of the buyer
SQL:
select *
from orders join users on orders.user = users.id
where users.name="Alice"
R: which((left_join(orders,users,by=c("user"="id")))$name=="Alice")
MongoDB:
db.orders.find({
"utente.name": "Alice"
})
NOTE: MongoDB rocks! just some dot notation … NOTE: however there’s a bad side: all joins that are not paths in the document must be handled writing some code (see forEach)
a simple range query
SQL: select * from orders where date >1536656558
R: which(order$date>A1536656559)
MongoDB:
db.orders.find({
date : {$gt : 1536656559}
})
some simple SQL queries becomes hard (because of arrays)
SQL:
select *
from orders
where qty = 2
R: which(order$qty=2)
MongoDB:
db.orders.find({
products : { $elemMatch: {qty: 2} }
})
NOTE: not dot notation here :-/ (see also elemMatch)
multiple conjunctive conditions (a.k.a. ANDs)
SQL:
select *
from orders join users on orders.user = users.id
where users.name="Alice" and oders.qty > 1
R: ...
MongoDB:
db.orders.find({
"utente.name": "Alice",
products : { $elemMatch: {qty: {$gt: 1} } }
})
Note: it does not match the qty expressed as strings
multiple disjunctive conditions (a.k.a. ORs)
SQL:
select *
from orders join users on orders.user = users.id
where users.name="Alice" or oders.qty > 1
R: ...
MongoDB:
db.orders.find({
$or : [
{"utente.name": "Alice"},
{products : { $elemMatch: {qty: {$gt: 1} } }}
]
})
projections (a.k.a. SELECTs)
QL:
select date
from orders join users on orders.user = users.id
where users.name="Alice"
R: (which((left_join(orders,users,by=c("user"="id")))$name=="Alice"))$date
MongoDB:
db.orders.find(
{"utente.name": "Alice"},
{date: 1, "_id":0, "products.qty": 1}
)
Aggregates: counting the orders for qty > 2 grouping by user
QL:
select user, count(*)
from orders join users on orders.user = users.id
where orders.qty > 2
group by users.name
R: ...
MongoDB:
db.orders.aggregate( [
{$match: {
products : { $elemMatch: {qty: {$gt: 2} } }
}},
{$group: { _id: "$utente.name", count: {$sum: 1} }}
])
updates
Assigning all Bobs orders to Carl (simplified ignoring ObjectId)
db.orders.updateMany(
{ "utente.name": "Bob" },
{
$set: { "utente.name": "Carl" },
$currentDate: { lastModified: true }
}
)
indexes (advance topic)
introduction
Note: the explain() function returns the execution plan of a query
notice the difference between the following two simple queries:
db.orders.find( { "_id" : ObjectId("5bfd06bbf645af7d9cce94b6")} ).explain()
MongoDB uses a FETCH IDHACK to look up the object in the hash table it uses internally. This is an operation MongoDB can perform in constant time independently from the size of the collection.
db.orders.find( { "utente.name": "Carl" } ).explain()
MongoDB uses a COLSAN, i.e. it scans all the element in the collection to find those that match. This is an operation MongoDB cannot perform in constant time: the larger is the collection to scan, the longer it takes.
Adding a index
Let’s add an index on the name of the user
db.orders.createIndex( { "utente.name": 1 } )
let’s explain the previous query again
db.orders.find( { "utente.name": "Carl" } ).explain()
Now MongoDB uses an IXSCAN, i.e., it uses a binary tree to index documents with the same value. This is an operation MongoDB cannot perform in constant time, but when the collection grows the time to find documents does not grow linearly in the size of the collection. Instead, it grows sub-linearly in the number of distinct values present in the field.
For more information read Query Optimization > Explain Results
Remove index
db.orders.dropIndex( { "utente.name": 1 } )
indexing and horizontal scalability
Indexes are in memory and do not work well in presence of shards. They slow down query execution. If you want to read more https://stackoverflow.com/questions/9084569/mongodb-index-in-memory-with-sharding