A Multidisciplinary Perspective On Big Data 2015

This is the official web page of the 2015 edition of the PhD course of Politecnico di Milano on “A Multidisciplinary Perspective On Big Data”. If you are a student of the 2017 edition, this is the official page.

Lecturers

Responsible: Emanuele Della Valle

Lecturers: Danilo Ardagna, Cinzia Cappiello, Paolo Ciuccarelli, Paolo Cremonesi, Emanuele Della Valle, Elisabetta Di Nitto, and Letizia Tanca

Invited Speakers: Fabrizio Antonelli (Telecom Italia), Stefano Ceri, Piero Fraternali, Marco Monti (IBM), and Marco Tagliasacchi

Mission and Goals

The term Big Data refers to a growing torrent of information that, if successfully analyzed, can unleash new business opportunities and revenues. This course aims at introducing Big Data analytics methods and includes practical sessions on PoliMI’s Big Data computational infrastructure.

Classes

1) Introduction to big data – Emanuele Della Valle (2 hours) [pdf]
  • Why now?
  • What is Big Data? volume, velocity, variety, veracity, …
  • Paradigm shifts enabled
  • Tools and Market Landscape
2) Infrastructures for big data problems to master the volume dimension – Paolo Cremonesi & Danilo Ardagna (4 hours)
3) Integrating and analyzing massive and heterogeneous (relational, graph, semi-structured) data collections and streams – Elisabetta Di Nitto, Letizia Tanca, and Emanuele Della Valle (4 hours)
  • Overview of NoSQL databases and Big data and NoSQL [pdf]
  • Mastering the variety dimension [pdf]
  • Mastering the velocity dimension [pdf]
4) Mastering the veracity dimension – Cinzia Cappiello (3 hours) [pdf]
  • data quality
  • definition and dimensions
  • techniques for data quality assessment and improvement
  • uncertainty and data quality problems in big data
5) Making sense of Big Data – Paolo Ciuccarelli, Letizia Tanca, and Paolo Cremonesi (6 hours)
  • Big Data Visualisation [pdf]
  • Big Data Exploration, summarisation and context-aware reduction [pdf]
  • Recommender Systems [pdf]
6) Open Workshop on Big Data (4 hours) (storify)
  • genomics applications – Stefano Ceri [pdf]
  • cognitive computing – Marco Monti (IBM)
  • multimedia analytics – Marco Tagliasacchi [pdf]
  • Telecom Italia Big Data Challenge – Fabrizio Antonelli (Telecom Italia) [pdf]
  • round table
7) Putting it all together on the Telecom Italia Big Data Challenge dataset – Students (4 hours)
  • Assignments to students
  • Students’ reports

Calendar

The current version of the calendar is given below. Missing times will appear shortly. The open workshop (part 6) will most likely take place in the last week of January. The students' presentations (part 7) will take place in the last week of February.

| Day | Time | Room | Lecturer | Part | Title |
| --- | --- | --- | --- | --- | --- |
| 1.12.2014 | 10:00-12:00 | PT1 | Emanuele Della Valle | 1 | Introduction to big data [pdf]; course introduction [pdf] |
| 5.12.2014 | 9:30-12:30 | Sala Conferenze | Paolo Cremonesi | 2 | Infrastructures for big data problems to master the volume dimension [pdf1,pdf2] |
| 5.12.2014 | 14:30-16:30 | Sala Conferenze | Danilo Ardagna | 2 | Infrastructures for big data problems to master the volume dimension [pdf] |
| 13.1.2015 | 9:00-12:00 | Sala Conferenze | Elisabetta Di Nitto | 3 | Overview of NoSQL databases and Big data [pdf] |
| 13.1.2015 | 13:30-14:30 | Sala Conferenze | Filippo Mariani | 2 | BigSheet and BigSQL tools [pdf] |
| 14.1.2015 | 14:30-15:30 | Sala Conferenze | Letizia Tanca | 3 | Mastering the variety dimension [pdf] |
| 14.1.2015 | 15:45-16:45 | Sala Conferenze | Emanuele Della Valle | 3 | Mastering the velocity dimension [pdf] |
| 16.1.2015 | 9:30-12:30 | Sala Seminari | Cinzia Cappiello | 4 | Mastering the veracity dimension [pdf] |
| 20.1.2015 | 10:00-11:00 | Sala Conferenze | Paolo Ciuccarelli | 5 | Big Data Visualisation [pdf] |
| 20.1.2015 | 11:00-12:00, 14:00-16:00 | Sala Conferenze | Letizia Tanca | 5 | Big Data Exploration, summarisation and context-aware reduction [pdf] |
| 20.1.2015 | 16:00-17:00 | Sala Conferenze | Paolo Cremonesi | 5 | Recommender Systems [pdf] |
| 27.1.2015 | 14:30-17:30 | Sala Conferenze | Emanuele Della Valle | 6 | Open workshop on Application of Big Data (flyer [pdf]) |
| 27.2.2015 | 9:30-13:30 | Sala Seminari | Emanuele Della Valle | 7 | Students' reports |

Teaching material

The course material consists of slides prepared by the lecturers, links to online tutorials, and the open-source release of the Telecom Italia Big Data Challenge dataset. Students will gain enough background on the topics to be able to use the infrastructure made available by IBM as well as the poliCloud infrastructure donated by Yahoo!.

Giving the exam

The exam consists of:

  • formulating a Big Data problem: is it about volume? velocity? variety? veracity? a mix of them? Why?
  • solving such a problem with Big Data technologies illustrated during the course: which tools? Which methods? Why?
  • preparing a presentation where you discuss:
    • who you are (name, surname, skills, time spent)
    • the problem
    • the (partial) solution
    • what you learnt, i.e., what went as expected? what did not? why? (right/wrong skills/tools, lack of time, etc.)
  • giving your presentation
  • discussing the presentations of others

Students are expected to form groups of at most 4 people, but anyone who prefers to may work alone. Please fill in the form at the following link:

https://docs.google.com/forms/d/18vizeCbtt98017IXrlA7aQWTeSFdBxQR5ONnUGvLQyY/viewform