Hands on MongoDB

Hands on MongoDB

This is the script of a hands on session on MongoDB. The example refers to a typical retail example with products [CSV], users [CSV], and orders [CSV].

getting a mongoDB running on your machine using docker

docker pull mongo
docker run --name some-mongo -d mongo
docker run -it --link some-mongo:mongo --rm mongo sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'

the basics

adding two users to obtain their ObjectId

db.user.insertMany([
    { "name" : "Alice"  },
    { "name" : "Bob"    }
])

looking up one of the user via ObjectId

db.user.find( {"_id": ObjectId("5bf6ce4dcdb0763e7b6b9138")} )

Note: the ObjectId on your machine will be different

adding products

db.products.insertMany([
    { name : "red apple", brand : "XYZ"},
    { name : "blue berry", brand : "XYZ"},
    { name : "cake", brand : "Acme"}
])

adding orders linked to users and products

Note 1: ObjectIds on your machine will be different Note 2: qty is on purpose written either as an int or as string. You’ll see why below in the part related to finding documents

db.orders.insert({
  "utente" : {
    id: ObjectId("5bf6ce4dcdb0763e7b6b9138"),
    name: "Alice"
  },
  products : [{ 
      id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
      name : "red apple",
      qty : 2
    },{
      id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
      name : "blue berry",
      qty : 3
    }],
  date : 1536656557
})
db.orders.insert({
  "utente" : {
    id: ObjectId("5bf6ce4dcdb0763e7b6b9139"),
    name: "Bob"
  },
  products : [{ 
      id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
      name : "red apple",
      qty : 4
    },{
      id: ObjectId("5bf6cfffcdb0763e7b6b913c"),
      name : "cake",
      qty : 2
    }],
  date : 1536656558
})
db.orders.insert({
  "utente" : {
    id: ObjectId("5bf6ce4dcdb0763e7b6b9139"),
    name: "Bob"
  },
  products : [{ 
      id: ObjectId("5bf6cfffcdb0763e7b6b913a"),
      name : "red apple",
      qty : "2"
    },{
      id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
      name : "blue berry",
      qty : "2"
    }],
  date : "1536656560"
})
db.orders.insert({
  "utente" : {
    id: ObjectId("5bf6ce4dcdb0763e7b6b9138"),
    name: "Alice"
  },
  products : [{ 
      id: ObjectId("5bf6cfffcdb0763e7b6b913b"),
      name : "blue berry",
      qty : "2",
      comment : "please, the freshest ones!"
    },{
      id: ObjectId("5bf6cfffcdb0763e7b6b913c"),
      name : "cake",
      qty : 2
    }],
  date : 1536656559
})

finding orders

get all orders

SQL: select * from orders

MongoDB: db.orders.find({})

one simple conditions on date

SQL: select * from orders where date = 1536656559

R: which(order$date==1536656559)

MongoDB: db.order.find({ date : 1536656559 })

another “simple” condition on the name of the buyer

SQL:

select * 
from orders join users on orders.user = users.id 
where users.name="Alice"

R: which((left_join(orders,users,by=c("user"="id")))$name=="Alice")

MongoDB:

db.orders.find({
    "utente.name": "Alice"
})

NOTE: MongoDB rocks! just some dot notation … NOTE: however there’s a bad side: all joins that are not paths in the document must be handled writing some code (see forEach)

a simple range query

SQL: select * from orders where date >1536656558

R: which(order$date>A1536656559)

MongoDB:

db.orders.find({
  date : {$gt : 1536656559}
})

some simple SQL queries becomes hard (because of arrays)

SQL:

select * 
from orders  
where qty = 2

R: which(order$qty=2)

MongoDB:

db.orders.find({
  products :  { $elemMatch: {qty: 2} }
})

NOTE: not dot notation here :-/ (see also elemMatch)

multiple conjunctive conditions (a.k.a. ANDs)

SQL:

select * 
from orders join users on orders.user = users.id 
where users.name="Alice" and oders.qty > 1

R: ...

MongoDB:

db.orders.find({ 
  "utente.name": "Alice",
  products :  { $elemMatch: {qty: {$gt: 1} } }
})

Note: it does not match the qty expressed as strings

multiple disjunctive conditions (a.k.a. ORs)

SQL:

select * 
from orders join users on orders.user = users.id 
where users.name="Alice" or oders.qty > 1

R: ...

MongoDB:

db.orders.find({ 
   $or : [
    {"utente.name": "Alice"},
    {products :  { $elemMatch: {qty: {$gt: 1} } }}
  ]
})

projections (a.k.a. SELECTs)

QL:

select date 
from orders join users on orders.user = users.id 
where users.name="Alice" 

R: (which((left_join(orders,users,by=c("user"="id")))$name=="Alice"))$date

MongoDB:

db.orders.find(
    {"utente.name": "Alice"},
    {date: 1, "_id":0, "products.qty": 1}
)

Aggregates: counting the orders for qty > 2 grouping by user

QL:

select user, count(*) 
from orders join users on orders.user = users.id 
where orders.qty > 2
group by users.name

R: ...

MongoDB:

db.orders.aggregate( [ 
    {$match: {
        products : { $elemMatch: {qty: {$gt: 2} } }
    }}, 
    {$group: { _id: "$utente.name", count: {$sum: 1} }} 
])

updates

Assigning all Bobs orders to Carl (simplified ignoring ObjectId)

db.orders.updateMany(
    { "utente.name": "Bob" },
    {
        $set: { "utente.name": "Carl" },
        $currentDate: { lastModified: true }
    }
)

indexes (advance topic)

introduction

Note: the explain() function returns the execution plan of a query

notice the difference between the following two simple queries:

db.orders.find( { "_id" : ObjectId("5bfd06bbf645af7d9cce94b6")} ).explain()

MongoDB uses a FETCH IDHACK to look up the object in the hash table it uses internally. This is an operation MongoDB can perform in constant time independently from the size of the collection.

db.orders.find( { "utente.name": "Carl" } ).explain()

MongoDB uses a COLSAN, i.e. it scans all the element in the collection to find those that match. This is an operation MongoDB cannot perform in constant time: the larger is the collection to scan, the longer it takes.

Adding a index

Let’s add an index on the name of the user

db.orders.createIndex( { "utente.name": 1 } )

let’s explain the previous query again

db.orders.find( { "utente.name": "Carl" } ).explain()

Now MongoDB uses an IXSCAN, i.e., it uses a binary tree to index documents with the same value. This is an operation MongoDB cannot perform in constant time, but when the collection grows the time to find documents does not grow linearly in the size of the collection. Instead, it grows sub-linearly in the number of distinct values present in the field.

For more information read Query Optimization > Explain Results

Remove index

db.orders.dropIndex( { "utente.name": 1 } )

indexing and horizontal scalability

Indexes are in memory and do not work well in presence of shards. They slow down query execution. If you want to read more https://stackoverflow.com/questions/9084569/mongodb-index-in-memory-with-sharding

links to learn more

Leave a Reply

Your email address will not be published.