Machine Learning and Social Science

Xavier Vergés
@XavierVerges IBM

Agenda

  1. How do <insert noun> learn?
  2. AI, ML, NLP, DL… Hints to navigate the alphabet soup
  3. Five main routes to Machine Learning
  4. Data is the new…
  5. Machine Learning and Social Science: Two Cultures
  6. Activity: read and tell

Objectives

I want the audience to

  • Become familiar with some Machine Learning concepts
  • Understand some of the risks of using data
  • Become familiar with the differences between the main world views of Machine Learning and Econometrics
  • Consider adding a new feedback loop to their life

In Praise of Feedback Loops

feedback loop
Outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a loop. The system can then be said to feed back into itself.

Excited about a thermostat? Seriously?
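
A thermostat is about the smallest feedback loop one can write down. The sketch below is a toy simulation, not a model of any real device: the name `simulate_thermostat` and every parameter (heater power, heat leak, dead band, outside temperature) are made-up assumptions for illustration. At each step the measured temperature (the output) decides whether the heater runs (the next input).

```python
def simulate_thermostat(target, start_temp, steps,
                        outside=10.0, heater_power=1.5, leak=0.1, band=0.5):
    """Toy on/off thermostat. All physical constants are hypothetical."""
    temp = start_temp
    heater_on = False
    history = []
    for _ in range(steps):
        # Feedback: the system's output (temp) is routed back as its input.
        if temp < target - band:
            heater_on = True
        elif temp > target + band:
            heater_on = False
        # Heat gained from the heater minus heat leaked to the outside.
        temp += (heater_power if heater_on else 0.0) - leak * (temp - outside)
        history.append((temp, heater_on))
    return history


if __name__ == "__main__":
    for step, (temp, on) in enumerate(simulate_thermostat(20.0, 15.0, 30)):
        print(f"step {step:2d}  temp {temp:5.2f}  heater {'on' if on else 'off'}")
```

Without the feedback line the room would either drift to the outside temperature or overheat; with it, the temperature settles into a narrow band around the target.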

feedback learn
No feedback, no learning.
No feedback, no science.

Evolution:
learning for genes

Culture:
learning for societies
at a faster pace

Experience:
learning for individuals

Information technology:
a new way to learn
at a faster pace

Think, write,
send to typist,
wait for typist,
wait for computer time slot…

A/B testing

... vs learn in a few hours
if your users like your new idea
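
One common way to decide whether users "like your new idea" is a two-proportion z-test on conversion counts. The sketch below is a minimal version under the usual large-sample assumptions; the function name and the visitor numbers in the example are hypothetical, not taken from any real experiment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, two_sided_p_value) using the pooled-proportion z-test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: 2000 visitors per variant.
    z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone; it says nothing about why users prefer one variant.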

Ah! So Machine Learning is
just a new fancy name for
good old computing?

traditional algorithms

ml algorithms

Alphabet Soup

The AI Effect:
As soon as AI solves a problem,
the problem is no longer a part of AI

Project Debater

9:00 AM!

Supervised/Unsupervised Learning

Thanksgiving

The 5 Tribes of ML

From Pedro Domingos's The Master Algorithm:

  • The Symbolists

    • Work with high-level, human-readable representations of problems, logic, and search
    • Expert Systems

  • The Connectionists

    • Focus on re-engineering the brain
    • Artificial Neural Networks (Deep Learning)

  • The Evolutionaries

    • Genetic Algorithms

  • The Bayesians

    • Probability-based hypotheses that are updated as more data is processed
    • Spam filters

  • The Analogizers

    • Focus on techniques to match pieces of data to each other

How Machines Learn

How Machines Really Learn. [Footnote]

The more data,
the faster a field progresses

Data enables
evidence-informed policies
(that nobody cares about)

truth o meter

All the data sets share
the same summary statistics
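
The figure this slide refers to is not reproduced here. As a stand-in, and as an assumption about what the slide intends, the sketch below uses Anscombe's quartet, a classic set of four small data sets that look completely different when plotted yet agree on mean and correlation. The helper names are hypothetical.

```python
import math

X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

# Anscombe's quartet (1973): four data sets of eleven (x, y) points each.
QUARTET = {
    "I": (X_COMMON,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_COMMON,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_COMMON,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def summary(xs, ys):
    """Return (mean of x, mean of y, correlation)."""
    return mean(xs), mean(ys), pearson(xs, ys)


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        mx, my, r = summary(xs, ys)
        print(f"{name:>3}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The numbers come out nearly identical for all four sets, which is exactly why plotting the data before trusting a summary matters.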

Google News corpus:
father is to doctor as
mother is to nurse
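
Analogies like this are typically found with vector arithmetic on word embeddings: take doctor minus father plus mother and look for the nearest word. The sketch below shows only the mechanism. The two-dimensional vectors are invented by hand, with the bias deliberately built in, and are not taken from the Google News corpus or any trained model; `TOY_VECTORS` and `analogy` are hypothetical names.

```python
import math

# Invented 2-D "embeddings": [gender direction, medical direction].
# Real embeddings have hundreds of dimensions learned from text.
TOY_VECTORS = {
    "father": [1.0, 0.0],
    "mother": [-1.0, 0.0],
    "doctor": [0.5, 1.0],
    "nurse": [-0.5, 1.0],
    "engineer": [0.6, -1.0],
    "teacher": [-0.2, -0.8],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


if __name__ == "__main__":
    print("father : doctor :: mother :", analogy("father", "doctor", "mother"))
```

The arithmetic is neutral; the stereotype comes from where the words sit in the space, and in a trained model that placement is learned from the text the model was fed.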

Machine Learning and
Social Science Statistics

Understand or Predict?

  • Traditional Econometric Models

    • Seek to give understanding
    • Based on assumptions and probabilities
    • Hand-crafted feature selection
    • Mostly linear
    • data + simple models + advanced math

  • Machine Learning Models

    • Seek to give predictive accuracy
    • Fewer assumptions
    • Automatic feature selection
    • Non-linear
    • lots of data + complex models + optimizer

Overfitting

[Two figures illustrating overfitting]
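
A small numeric illustration of overfitting, as a sketch under stated assumptions: the true relation is assumed to be y = 2x, the training "noise" is a contrived alternating +1/-1 so the result is reproducible, and the two models (a straight line and a memorizing nearest-neighbour lookup) are chosen only for contrast. All names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a straight line; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def memorize(xs, ys):
    """1-nearest-neighbour: answer with the y of the closest training x."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a predictor on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: y = 2x plus a contrived alternating +1 / -1 disturbance.
train_x = [float(i) for i in range(20)]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]
# Held-out data: new x values on the undisturbed line.
test_x = [x + 0.4 for x in train_x]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
memo = memorize(train_x, train_y)

if __name__ == "__main__":
    for name, model in (("line", line), ("memorizer", memo)):
        print(f"{name:>9}: train MSE = {mse(model, train_x, train_y):.3f}, "
              f"test MSE = {mse(model, test_x, test_y):.3f}")
```

The memorizer is perfect on the data it has seen and clearly worse on new points, while the simpler line has a higher training error but generalizes well. That gap between training and held-out error is the signature of overfitting.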

Causality and Predictions

  • Examples

    • Hotel prices: predicting occupancy vs estimating the effect of raising prices
    • Crime rates in a zone and the number of police officers there

  • Determining causation requires understanding
  • Accurate predictions can support experimentation by serving as a (sort-of) control group
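
The hotel example can be sketched with invented numbers. Assume, purely for illustration, that season drives both price and occupancy, and that within a season each extra unit of price lowers occupancy by 0.2 points. A regression on the pooled data then predicts occupancy reasonably well yet gets the sign of the price effect wrong. The data and names below are hypothetical.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


# Invented (price, occupancy %) observations.
# Within each season, higher prices go with lower occupancy.
LOW_SEASON = [(80, 64), (90, 62), (100, 60), (110, 58), (120, 56)]
HIGH_SEASON = [(150, 94), (160, 92), (170, 90), (180, 88), (190, 86)]

if __name__ == "__main__":
    print("pooled slope:     ", round(ols_slope(LOW_SEASON + HIGH_SEASON), 3))
    print("low-season slope: ", round(ols_slope(LOW_SEASON), 3))
    print("high-season slope:", round(ols_slope(HIGH_SEASON), 3))
```

The pooled slope is positive because expensive nights are also busy nights; that is useful for predicting occupancy, but acting on it by raising prices would lower occupancy in this toy world. Knowing to split by season is the kind of understanding the prediction alone does not supply.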

Activity: Read and Tell

Image Attributions
