Artificial Intelligence and Big Data [Q3, 2018]

From Big Data to AI via Data Science [slides]

  • Data-driven Decision Making for Data-driven Organizations
  • What are Big Data, Data Science and AI?
  • How do Big Data, Data Science and AI support Data-driven Decision Making?
  • Who is using Big Data, Data Science and AI?

Key concepts

    • Big Data
      • Vertical vs. horizontal scalability [slides][drawing]
      • Scaling storage horizontally with K-V pairs [slides][drawings]
      • Scaling processing horizontally with MapReduce [slides][drawing]
      • Achieving usability by implementing high-level interfaces on top of horizontally scalable solutions for: data access (e.g., Spark SQL), machine learning (e.g., SparkML), graph processing (e.g., Spark GraphX) and stream processing (e.g., Spark Streaming)
    • Solving problems the AI way [drawings]
      • programming vs. reasoning
      • Gradient Descent and error minimization
    • Understanding data visualization
      • the “grammar of graphics”: data and aesthetic mappings, geometric objects, scales, facet specification, statistical transformations, and
        the coordinate system. [article]
      • exploring the “diamonds” dataset in Tableau
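
As a companion to the item on scaling processing horizontally with MapReduce, the following is a minimal single-machine sketch of the map, shuffle and reduce phases applied to word counting. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are illustrative assumptions, not taken from the course slides; a real framework would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit one (word, 1) pair for every word in a document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different machine; here it is sequential.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical input, for illustration only.
counts = word_count(["big data", "big models need big data"])
print(counts)
```

Because every key is reduced independently of the others, the reduce step can be spread over as many machines as there are keys, which is what makes the approach horizontally scalable.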

The main Machine Learning algorithms

      • Venn Diagram of ML algorithms [drawing]
      • Predicting house prices using Linear Regression and Gradient Descent [drawings]
      • Detecting spam emails using the Naive Bayes algorithm [drawings]
      • Recommending Apps based on Decision Trees [drawings][animation]
      • Finding the best location for a shop based on K-means clustering or Hierarchical Clustering [drawings]
      • Deciding to accept students at a university based on Logistic Regression and Gradient Descent with Log-loss function [drawings]
      • When a line is not enough … or the kernel trick of Support Vector Machines [drawings]
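
To make the first algorithm in the list above concrete, here is a minimal sketch of Linear Regression fitted with Gradient Descent on the mean squared error. The house sizes and prices are made-up numbers chosen to lie exactly on the line price = 100 * size + 50, and the function name fit_line, the learning rate and the number of epochs are assumptions for illustration, not values from the course material.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current line on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data: size in hundreds of square meters, price in thousands.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [150.0, 250.0, 350.0, 450.0]

w, b = fit_line(sizes, prices)
print(f"price is about {w:.2f} * size + {b:.2f}")
```

The same loop applies to Logistic Regression once the squared error is swapped for the log-loss and the prediction is passed through a sigmoid.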

Introduction to Deep Learning

      • Hands-on with the Linear Perceptron and the linear classification problems it can solve
      • Non-Linearity + Perceptron = Universal Approximation
      • Hands-on with Deep Learning and the non-linear classification problems it can solve
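
The difference between linear and non-linear classification problems can be sketched with a single perceptron trained by the classic perceptron learning rule. The example below is an illustrative assumption rather than the hands-on material itself: it learns the logical AND (linearly separable) and then shows that the same model cannot classify every XOR sample correctly, which is where non-linearity and additional layers become necessary.

```python
def predict(weights, bias, x):
    """Step activation on the weighted sum: 1 if positive, otherwise 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: shift the weights toward every misclassified sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

def correct_count(samples, weights, bias):
    """Number of samples the trained perceptron classifies correctly."""
    return sum(predict(weights, bias, x) == target for x, target in samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_weights, and_bias = train_perceptron(AND)
xor_weights, xor_bias = train_perceptron(XOR)

print("AND correct:", correct_count(AND, and_weights, and_bias), "of 4")
print("XOR correct:", correct_count(XOR, xor_weights, xor_bias), "of 4")
```

No choice of weights lets a single line separate the XOR classes, so the second count always stays below 4 regardless of how long the perceptron is trained.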

Project work part 1

      • grouping and idea generation via thinking hats

A case of Web Analytics

      • Scraping Amazon’s product reviews
      • Training a sentiment analysis model on Amazon’s product reviews using text analytics and logistic regression
      • Testing the sentiment analysis model
      • Classifying microposts about Amazon’s products as positive or negative using the trained sentiment analysis model
      • Discussion on measuring classification errors
      • MATERIAL: [drawing][notebook for Databricks]
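
For the discussion on measuring classification errors, a common starting point is to count true and false positives and negatives and derive precision and recall from them. The sketch below uses made-up labels and hypothetical helper names (confusion_counts, precision_recall); it is not taken from the Databricks notebook.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive:
            fp += 1  # predicted positive, really negative
        elif a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Precision: share of predicted positives that are right.
    Recall: share of real positives that were found."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels for five microposts, for illustration only.
actual = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision, recall = precision_recall(tp, fp, fn)
accuracy = (tp + tn) / len(actual)
print(tp, fp, fn, tn, precision, recall, accuracy)
```

Looking at precision and recall separately matters when the classes are unbalanced, because accuracy alone can look good for a model that almost always predicts the majority class.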

A case of Predictive Analytics

      • The bike sharing dataset
      • Building three predictive models using Decision Trees, Random Forests and Gradient Boosted Trees
      • The role of K-fold cross validation and hyper-parameter exploration to compare the results
      • Discussion on bias, variance, error and model complexity
      • Discussion on compatibility between Big Data, K-fold cross validation and Random Forests
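
The role of K-fold cross validation can be illustrated by how it partitions the data: every sample is used for validation exactly once and for training in the remaining k - 1 rounds. The following sketch only generates the index splits; the function name k_fold_indices is an assumption, and shuffling and the actual model training are left out on purpose.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each fold trains an independent model, so the k rounds (and every hyper-parameter combination being explored) can run in parallel, which is one reason the technique fits well with horizontally scalable platforms.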

Project work part 2

      • identifying solutions based on ML and quick feedback

Convolutional Neural Networks (CNN)

      • The problem of converting an image of a handwritten character into its correct classification (i.e., which character is it?)
      • The MNIST Digits Dataset
      • Understanding Convolutional Neural Networks
      • Demo: how Keras + TensorFlow can correctly classify the MNIST Digits Dataset
      • Demo: how Keras + TensorFlow can understand images using the Inception V3 model

Project work part 3

      • identifying solutions based on Deep Learning and quick feedback

Two personal concrete experiences

      • Predicting the aluminium price using Deep Learning (RNN)
      • Listening to the pulse of our cities

Project work part 4

      • Choosing between alternatives