Streaming Data Analytics 2021-22

Objectives

The course provides the foundational concepts, methods, languages, and systems for ingesting, processing, and analyzing flowing data to enable real-time decisions. The course aims to tame the velocity dimension of Big Data without forgetting the volume and variety dimensions.

Topics covered and tentative scheduling

14/09/2021 – Administrative items

Foundations of streaming algorithms

  • 16/09/2021 – From the foundations of streaming algorithms to real-world languages and systems

Streaming Data Engineering

From Data Stream Management Systems (DSMS) and Time-Series Databases (TSDB) to Complex Event Recognition and Processing (CER/P)

  • 21/09/2021 DSMS & CER/P Theory – Vertically Scalable solutions illustrated via EPL and Esper
  • 23/09/2021 DSMS & CER/P Practice – Fire Alarm case study in EPL
  • 28/09/2021 DSMS & CER/P Theory – the every clause and the guard patterns in EPL
  • 30/09/2021 DSMS & CER/P Practice – Robotic Arm case study in EPL
  • 05/10/2021 DSMS Theory – Horizontally Scalable solutions illustrated via Spark
  • 19/10/2021 DSMS Theory – Spark Structured Streaming
  • 21/10/2021 DSMS Practice – Spark Structured Streaming – Fire Alarm
  • 26/10/2021 DSMS Theory – Horizontally Scalable solutions illustrated via Kafka
  • 28/10/2021 DSMS Practice – Spark Structured Streaming – Robotic Arm
  • 02/11/2021 DSMS Theory – ksqlDB: cutting all from the same mold
  • 04/11/2021 DSMS Practice – ksqlDB – Fire Alarm
  • 11/11/2021 DSMS Practice – ksqlDB – Materialized Views, Streaming ETLs and Robotic Arm
  • 16/11/2021 TSDB Theory – Time-series databases illustrated via InfluxDB and its Flux language
  • 18/11/2021 TSDB Practice – Flux – City Water Mng. Demo, Fire Alarm and Streaming Anomaly Detection

Streaming Data Science

From foundations of Time-series Analytics (TSA) and anomaly detection, to Streaming Machine Learning (SML) using Flux and River

  • 23/11/2021 SML Theory/Practice – Foundations: learning one sample at a time, prequential evaluation, and concept drift
  • 25/11/2021 SML Theory/Practice – Concept drift (cont.), methods for streaming classification illustrated via River
  • 30/11/2021 SML Theory/Practice – streaming ensemble methods illustrated via River, Challenge and Q/A time
  • 02/12/2021 TSA Theory/Practice – Decomposing and detrending time-series without seasonality
  • 09/12/2021 TSA Theory/Practice – stationary time-series, decomposition in presence of seasonality and forecasting (part 1 – baseline methods)
  • 14/12/2021 TSA Theory/Practice – Time-series forecasting (part 2 – exponential smoothing & SARIMA)
  • 16/12/2021 TSA Practice – Time-series analysis using SARIMA + Exam Preview
  • 23/12/2021 Stream Reasoning: Artificial Intelligence + Stream Processing (BONUS LECTURE)

NOTE: for the material and the recordings, refer to the Webeep page of the course.

Prerequisites

Students are expected to know the basics of database management, SQL, and Machine Learning.

For a refresher on SQL, I recommend https://www.w3schools.com/sql/. It is simple and comprehensive.

For a gentle introduction to Machine Learning, I recommend watching the following two videos by Luis Serrano:

I also recommend reading the visual introduction to Decision Trees by R2D3.

Expected learning outcomes

Knowledge and understanding: Students will learn how to identify problems that can be addressed with big data techniques tailored for velocity, and how to apply streaming data analysis technologies to solve real-world problems.
Applying knowledge and understanding: Given specific project cases, students will be able to define and implement a streaming data analysis solution for the problem, and apply it to real data streams from social media and IoT sensors.
Making judgements: Given specific project cases, students will learn how to decide which streaming data analysis solution to apply and how to evaluate this decision.
Communication: Students will learn to write a project report describing and motivating the decisions taken and the results obtained, and to present their work in front of their colleagues and teachers.
Lifelong learning skills: Students will learn how to develop a realistic streaming data analysis project in all its phases.

Evaluation

The exam consists of a theoretical part (written exam) and an optional practical part (project work with oral presentation).
The written exam is a mix of theoretical questions on any course topic and exercises on the technical content and how to apply it in practice. Students can get up to 30L in the written test.

The optional practical project requires the use of one or more of the technologies presented in the lectures. It consists of solving a realistic streaming data analysis problem based on real or realistic datasets that are publicly available or provided by the teachers. Only students who get at least 26/30 in the written exam can opt for it.

The final grade is computed as follows: written test result + practical project result. E.g., written test 26 + practical project 5 = 30L
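The grading rule can be sketched in a few lines of Python. This is only an illustration, not an official grading tool: the function name `final_grade` is hypothetical, the written result is simplified to an integer between 0 and 30, and the sketch assumes that any total above 30 is reported as 30L, which is consistent with the example above (26 + 5 = 30L) but not stated as a general rule.

```python
def final_grade(written: int, project: int = 0) -> str:
    """Combine the written-test score with the optional project points."""
    if not 0 <= written <= 30:
        raise ValueError("written score must be between 0 and 30")
    if project < 0:
        raise ValueError("project points cannot be negative")
    # Only students with at least 26/30 in the written exam can opt for the project.
    if written < 26:
        project = 0
    total = written + project
    # Assumption: a total above 30 is reported as 30L.
    return "30L" if total > 30 else str(total)


print(final_grade(26, 5))  # the example from the text
```

A student who skips the project simply keeps the written result, e.g. `final_grade(28)`.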

Bibliography