From Big Data to AI via Data Science [slides]
 Data-driven Decision Making for Data-driven Organizations
 What are Big Data, Data Science and AI?
 How do Big Data, Data Science and AI support Data-driven Decision Making?
 Who is using Big Data, Data Science and AI?
Key concepts

 Big Data
 Vertical vs. horizontal scalability [slides][drawing]
 Scaling storage horizontally with KV pairs [slides][drawings]
 Scaling processing horizontally with MapReduce [slides][drawing]
 Achieving usability by implementing high-level interfaces on top of horizontally scalable solutions for data access (e.g., Spark SQL), machine learning (e.g., SparkML), graph processing (e.g., Spark GraphX) and stream processing (e.g., Spark Streaming)
 Solving problems the AI way [drawings]
 programming vs. reasoning
 Gradient Descent and error minimization
 Understanding data visualization
 the “grammar of graphics”: data and aesthetic mappings, geometric objects, scales, facet specification, statistical transformations, and the coordinate system
 exploring the “diamonds” dataset in Tableau
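
To make the MapReduce item above concrete, here is a minimal single-machine sketch of the three phases (map, shuffle, reduce) applied to word counting. The function names and the sample sentences are illustrative assumptions, not part of the course material; on a real cluster each phase would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit one (word, 1) key-value pair per word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group together all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # On a cluster, each document (or block) would be mapped on a different node.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical input, chosen only for illustration.
counts = word_count(["big data", "data science", "Big Data and AI"])
print(counts)
```

Because every key can be reduced independently of the others, adding machines lets the same program handle more data, which is the point of horizontal scalability.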
The main Machine Learning algorithms


 Venn Diagram of ML algorithms [drawing]
 Predicting house prices using Linear Regression and Gradient Descent [drawings]
 Detecting spam emails using Naive Bayes Algorithm [drawings]
 Recommending Apps based on Decision Trees [drawings][animation]
 Finding the best location for a shop based on K-means clustering or Hierarchical Clustering [drawings]
 Deciding whether to admit students to a university based on Logistic Regression and Gradient Descent with the log-loss function [drawings]
 When a line is not enough … or the kernel trick of Support Vector Machines [drawings]
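
As an illustration of the house-price item in this list, the sketch below fits a line with Gradient Descent on the mean squared error. The data are made up (prices follow exactly price = 2 * size + 1), and the learning rate and number of epochs are assumptions picked so that this toy problem converges.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current line on every sample.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data: house size and price in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [3.0, 5.0, 7.0, 9.0, 11.0]

w, b = fit_line(sizes, prices)
print(f"price ~ {w:.3f} * size + {b:.3f}")
```

The same loop, with the squared error swapped for the log-loss and a sigmoid applied to the output, is the core of the Logistic Regression item.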
Introduction to Deep Learning


 Hands-on: the Linear Perceptron and the linear classification problems it can solve
 Non-Linearity + Perceptron = Universal Approximation
 Hands-on: Deep Learning and the non-linear classification problems it can solve
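
The contrast between linear and non-linear problems can be sketched with a perceptron written from scratch. The code below uses the classic perceptron learning rule with a step activation; the AND and XOR truth tables are the usual textbook examples and are an assumption here, not necessarily the datasets used in the hands-on sessions.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train a two-input perceptron with the classic perceptron learning rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - predicted
            # Move the decision line only when the sample is misclassified.
            w[0] += learning_rate * error * x1
            w[1] += learning_rate * error * x2
            b += learning_rate * error
    return w, b


def accuracy(w, b, samples):
    """Fraction of samples on the correct side of the learned line."""
    correct = sum(
        (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == target
        for (x1, x2), target in samples
    )
    return correct / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(*train_perceptron(AND), AND)
xor_accuracy = accuracy(*train_perceptron(XOR), XOR)
print(and_accuracy, xor_accuracy)
```

AND is linearly separable, so the rule finds a separating line. No single line separates XOR, so the perceptron can never classify all four points correctly, which is why non-linear activations and hidden layers are needed.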
Project work part 1


 grouping and idea generation via thinking hats
A case of Web Analytics


 Scraping Amazon’s product reviews
 Training a sentiment analysis model on Amazon’s product reviews using text analytics and logistic regression
 Testing the sentiment analysis model
 Classifying microposts about Amazon’s products as positive or negative using the trained sentiment analysis model
 Discussion on measuring classification errors
 MATERIAL: [drawing][notebook for Databricks]
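
For the discussion on measuring classification errors, a small sketch of the usual quantities (confusion counts, accuracy, precision and recall) may help. The labels below are invented for illustration and are not results from the course notebook.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical true labels and model predictions for eight reviews.
actual = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
predicted = ["positive", "negative", "positive", "negative",
             "positive", "negative", "negative", "positive"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)   # share of all predictions that are right
precision = tp / (tp + fp)           # share of predicted positives that are right
recall = tp / (tp + fn)              # share of actual positives that are found
print(tp, fp, fn, tn, accuracy, precision, recall)
```

Accuracy alone can be misleading when one class dominates, which is why precision and recall are usually reported alongside it.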
A case of Predictive Analytics


 The bike sharing dataset
 Building three predictive models using Decision Trees, Random Forests and Gradient Boosted Trees
 The role of K-fold cross-validation and hyperparameter exploration in comparing the results
 Discussion on bias, variance, error and model complexity
 Discussion on compatibility between Big Data, K-fold cross-validation and Random Forests
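
The way K-fold cross-validation splits a dataset can be sketched in a few lines. This is a simplified version that assumes the rows are already shuffled and only produces index lists; the function name is hypothetical, and a real pipeline would train and score a model on each split.

```python
def k_fold_indices(n_samples, k):
    """Yield (training, validation) index lists for each of the k folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


folds = list(k_fold_indices(10, 3))
for training, validation in folds:
    print("train:", training, "validate:", validation)
```

Every row is used for validation exactly once and for training k - 1 times, so each hyperparameter setting is trained k times. That multiplication of training cost is what makes the combination with large datasets and large ensembles worth discussing.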
Project work part 2


 identifying solutions based on ML and quick feedback
Convolutional Neural Networks (CNN)


 The problem of converting an image of a handwritten character into its correct classification (i.e., which character is it?)
 The MNIST Digits Dataset
 Understanding Convolutional Neural Networks
 Demo: how Keras + TensorFlow can correctly classify the MNIST Digits Dataset
 Demo: how Keras + TensorFlow can understand images using the Inception V3 model
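
To give an idea of what a convolutional layer computes, the sketch below slides a small kernel over a tiny image in pure Python. It is not the Keras demo from the course; the 4x4 image and the hand-written vertical-edge kernel are assumptions chosen so the result is easy to check by hand, whereas a CNN learns its kernels from data.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation,
    which is the operation deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Kernel that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is non-zero only in the column where the dark-to-bright edge sits, which is the kind of local pattern detection that stacked convolutional layers build on.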
Project work part 3


 identifying solutions based on Deep Learning and quick feedback
Two personal concrete experiences


 Predicting the aluminium price using Deep Learning (RNN)
 Listening to the pulse of our cities
Project work part 4

 Choosing between alternatives