From Big Data to AI via Data Science [slides]
- Data-driven Decision Making for Data-driven Organizations
- What are Big Data, Data Science and AI?
- How do Big Data, Data Science and AI support Data-driven Decision Making?
- Who is using Big Data, Data Science and AI?
Key concepts
- Big Data
- Vertical vs. horizontal scalability [slides][drawing]
- Scaling storage horizontally with K-V pairs [slides][drawings]
- Scaling processing horizontally with MapReduce [slides][drawing]
- Achieving usability by implementing high-level interfaces on top of horizontally scalable solutions for data access (e.g., Spark SQL), machine learning (e.g., SparkML), graph processing (e.g., Spark GraphX) and stream processing (e.g., Spark Streaming)
- Solving problems the AI way [drawings]
- programming vs. reasoning
- Gradient Descent and error minimization
- Understanding data visualization
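As a rough illustration of the MapReduce item in the list above, here is a minimal single-machine sketch in Python. It is not the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration. On a real cluster the map and reduce calls would run on different machines, and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) key-value pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different machine; here we just loop.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because every map call looks at one document only and every reduce call looks at one key only, the work can be spread over more machines as the data grows, which is the point of horizontal scaling.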
The main Machine Learning algorithms
- Venn Diagram of ML algorithms [drawing]
- Predicting house prices using Linear Regression and Gradient Descent [drawings]
- Detecting spam emails using Naive Bayes Algorithm [drawings]
- Recommending Apps based on Decision Trees [drawings][animation]
- Finding the best location for a shop based on K-means clustering or Hierarchical Clustering [drawings]
- Deciding whether to accept students at a university based on Logistic Regression and Gradient Descent with Log-loss function [drawings]
- When a line is not enough … or the kernel trick of Support Vector Machines [drawings]
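To make the "Linear Regression and Gradient Descent" item in the list above more concrete, the following is a minimal sketch in plain Python. It assumes a single input feature and mean squared error as the error to minimize; the function name `fit_line`, the default learning rate and the number of epochs are illustrative choices, not taken from the course.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up toy data (not a real housing dataset) that follows y = 2x + 1.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(sizes, prices)
```

With larger feature values the learning rate would need to be smaller, or the feature rescaled, for the updates to stay stable.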
Introduction to Deep Learning
- Hands-on with the Linear Perceptron and the linear classification problems it can solve
- Non-Linearity + Perceptron = Universal Approximation
- Hands-on with Deep Learning and the non-linear classification problems it can solve
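The kind of linear classification problem a single perceptron can solve can be sketched as follows. This is an illustrative example, not the hands-on material itself: it assumes two binary inputs, a step activation and the classic perceptron learning rule, and the names `train_perceptron` and `predict` are hypothetical.

```python
def predict(weights, bias, x1, x2):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule on ((x1, x2), target) samples."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(weights, bias, x1, x2)
            # Move the decision line only when the prediction was wrong.
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights, and_bias = train_perceptron(AND_SAMPLES)
```

The same code cannot classify XOR correctly on all four inputs, because no single straight line separates its classes; that limitation is what the non-linear layers of a deep network address.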
Project work part 1
- grouping and idea generation via thinking hats
A case of Web Analytics
- Scraping Amazon’s product reviews
- Training a sentiment analysis model on Amazon’s product reviews using text analytics and logistic regression
- Testing the sentiment analysis model
- Classifying microposts about Amazon’s products as positive or negative using the trained sentiment analysis model
- Discussion on measuring classification errors
- MATERIAL: [drawing][notebook for Databricks]
A case of Predictive Analytics
- The bike sharing dataset
- Building three predictive models using Decision Trees, Random Forests and Gradient Boosted Trees
- The role of K-fold cross validation and hyper-parameter exploration to compare the results
- Discussion on bias, variance, error and model complexity
- Discussion on compatibility between Big Data, K-fold cross validation and Random Forests
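The K-fold cross validation mentioned in the list above can be sketched in a few lines of plain Python. This is only an illustration of how the data is split; the function name `k_fold_indices` is an assumption, and the split is done in order without shuffling, which a real experiment would usually add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Each sample is used for validation exactly once across the k folds.
folds = list(k_fold_indices(10, 3))
```

Each model and hyper-parameter setting is trained k times, once per fold, and the validation scores are averaged. Since the folds are independent of each other, they can be evaluated in parallel, as can the individual trees of a Random Forest.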
Project work part 2
- identifying solutions based on ML and quick feedback
Convolutional Neural Networks (CNN)
- The problem of converting an image of a handwritten character into its correct classification (i.e., which character is it?)
- The MNIST Digits Dataset
- Understanding Convolutional Neural Networks
- Demo: how Keras + TensorFlow can correctly classify the MNIST Digits Dataset
- Demo: how Keras + TensorFlow can understand images using the Inception V3 model
Project work part 3
- identifying solutions based on Deep Learning and quick feedback
Two personal concrete experiences
- Predicting the aluminium price using Deep Learning (RNN)
- Listening to the pulse of our cities
Project work part 4
- Choosing between alternatives