15  Machine Learning I: Introduction to Machine Learning

Section Learning Objectives
  • Introduction to machine learning
  • Simple example
  • More examples

This chapter and the next draw heavily on two chapters in Géron (2022), “Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow,” Third Edition. You can find a link to the GitHub page of this textbook at Geron GitHub

Géron, Aurélien. 2022. Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow. 3rd ed. O’Reilly Media.

Machine learning is ubiquitous. Machine learning algorithms guide your daily Google searches, determine how Netflix presents its offerings to you, guide your selections when shopping on sites such as Amazon, translate your spoken words into code that your phone or any of the many voice assistants can process into meaningful services for you, drive Teslas semi-autonomously, or simply recognize your face in a photo you upload to Facebook. These are just a few of the many, many examples where machine learning has entered your life, whether you are aware of it or not.

Brief History of Machine Learning

One of the earliest examples of a machine learning algorithm that you are already familiar with is the spam filter. We will use this example to explain what machine learning does and how different machine learning algorithms can be classified.

15.1 Different Types of Machine Learning Algorithms

Machine learning algorithms can be classified according to the following criteria:

  1. Supervised vs. unsupervised vs. reinforcement learning

     Are they trained (estimated) with human supervision, without
     supervision, or do they reinforce actions based on rewards and
     penalties?
  2. Online vs. batch learning

     Do they learn incrementally as data becomes available, or do
     they require "all of the data" at once?
  3. Instance-based vs. model-based learning

     Do they compare new data points to known data points, or do
     they detect patterns and build a predictive model (based on
     parameters)? A short sketch contrasting the two approaches
     follows this list.
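To make the third distinction concrete, here is a minimal sketch (assuming scikit-learn and NumPy are available; the data are made up for illustration): a k-nearest-neighbors regressor is instance-based because it predicts by comparing a new point to stored training points, while linear regression is model-based because it summarizes the training data with estimated parameters.

```python
# Minimal sketch (hypothetical data): an instance-based learner (k-nearest
# neighbors) memorizes the training points and compares new points to them,
# while a model-based learner (linear regression) summarizes the training
# data with estimated parameters.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))          # one feature
y = 2.0 * X.ravel() + rng.normal(0, 1, 50)    # noisy linear relationship

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)   # instance-based
ols = LinearRegression().fit(X, y)                   # model-based

x_new = np.array([[4.2]])
print("KNN prediction:", knn.predict(x_new))          # average of nearby points
print("OLS prediction:", ols.predict(x_new))          # uses estimated parameters
print("OLS parameters (weights):", ols.intercept_, ols.coef_)
```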

Let’s discuss this classification in some more detail. In supervised learning, the training set (i.e., the data) you feed to the algorithm includes the desired outcome or solution, called a label (i.e., the dependent or outcome variable). In other words, if you know what your outcome variable is, i.e., what it measures, then we say it has a label, because you are able to classify the outcome variable according to some criteria.
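As a minimal sketch of supervised learning (the feature values and labels below are made up, and scikit-learn is assumed to be available), a spam-filter-style classifier is trained on observations whose labels are already known:

```python
# Minimal sketch of supervised learning (hypothetical toy data): each
# observation has features X and a known label y (1 = spam, 0 = not spam),
# and the algorithm is trained on these labeled examples.
from sklearn.linear_model import LogisticRegression

# Features: [number of links, number of ALL-CAPS words] (made up)
X = [[8, 12], [1, 0], [6, 9], [0, 1], [7, 15], [2, 1]]
y = [1, 0, 1, 0, 1, 0]   # labels supplied by a human ("supervision")

clf = LogisticRegression().fit(X, y)
print(clf.predict([[5, 10]]))   # predicted label for a new, unseen email
```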

If, on the other hand, you do not know exactly what your outcome variable is, i.e., it is missing a label that would allow a quick classification of this variable, then we are talking about so-called unsupervised learning, which deals with unlabeled data. In this case we usually try to find patterns in the data that we can then use to interpret what the outcome variable actually measures.
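By contrast, a minimal unsupervised sketch (again with made-up data) has no label at all; a clustering algorithm such as k-means only groups the observations, and interpreting the groups is left to us:

```python
# Minimal sketch of unsupervised learning (hypothetical data): there is no
# label y, only the features X; KMeans looks for groups (clusters) that we
# then have to interpret ourselves.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(30, 2)),    # one cloud of points
               rng.normal(5, 1, size=(30, 2))])   # a second cloud

kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
print(kmeans.labels_[:10])        # cluster assignments, not true labels
print(kmeans.cluster_centers_)    # the two detected group centers
```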

Figure 15.1 summarizes the different types of machine learning according to our first classification above, where we distinguished between

  1. Supervised Learning,

  2. Unsupervised Learning, and

  3. Reinforcement Learning.

Figure 15.1: Classification of ML algorithms.

Table 15.1 contrasts the language we use in Econometrics with the language commonly used in Machine Learning.

Table 15.1: The Language of Econometrics and Machine Learning
Item        | Econometrics     | Machine Learning
Data        | Data/Obs.        | Training data/set
y           | Dependent var    | Label
x or X      | Independent var  | Feature/predictor
Estimation  | Estimation       | Training an algorithm or model
\(\beta\)   | Parameter        | Weight

15.2 Stuff from ML Course

Supervised learning - we have an outcome together with predictors, i.e., we have labeled data.

Unsupervised learning - you do not have labels, so you only have the X data. We can cluster it, etc., so it is more descriptive.

We focus on supervised learning and on out-of-sample prediction (not out-of-domain prediction). The outcome variable can be either of the following (a short sketch follows this list):

  • Continuous (regression)
  • Discrete (classification)
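A minimal sketch of out-of-sample prediction for both cases (made-up data; scikit-learn assumed): fit on a training split, then evaluate on a held-out test split, using a regression for the continuous outcome and a classification for the discrete one.

```python
# Minimal sketch of out-of-sample prediction (hypothetical data): fit on a
# training set, evaluate on a held-out test set. A continuous outcome calls
# for regression; a discrete outcome calls for classification.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y_cont = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 1, 200)  # continuous
y_disc = (y_cont > 0).astype(int)                                # discrete

X_tr, X_te, yc_tr, yc_te, yd_tr, yd_te = train_test_split(
    X, y_cont, y_disc, test_size=0.3, random_state=2)

reg = LinearRegression().fit(X_tr, yc_tr)
clf = LogisticRegression().fit(X_tr, yd_tr)
print("Regression R^2 out of sample:", reg.score(X_te, yc_te))
print("Classification accuracy out of sample:", clf.score(X_te, yd_te))
```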

Lasso introduces bias into the coefficient estimates and into the predictions (OLS, by contrast, is unbiased and consistent), but we get something in return: it is a tradeoff between better out-of-sample prediction and introducing a bias.

Lasso can push coefficient estimates to zero.
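The following sketch illustrates this with simulated data (the design and the penalty value alpha = 0.3 are arbitrary choices for illustration): OLS keeps all ten coefficients, while the Lasso shrinks them and sets several exactly to zero.

```python
# Minimal sketch of the Lasso bias/variance tradeoff (hypothetical data):
# OLS keeps every coefficient, while Lasso shrinks them and pushes some
# exactly to zero, trading a little bias for better out-of-sample prediction
# when many predictors are irrelevant.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
beta = np.array([3.0, -2.0] + [0.0] * 8)          # only 2 of 10 predictors matter
y = X @ beta + rng.normal(0, 1, 100)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.3).fit(X, y)                # alpha controls the penalty

print("OLS coefficients:  ", np.round(ols.coef_, 2))    # all nonzero
print("Lasso coefficients:", np.round(lasso.coef_, 2))  # several pushed to zero
```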

15.3 Support Vector Machines

This is still labeled data, because the label tells us which class an (x1, x2) observation stems from.
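A minimal sketch (with made-up two-dimensional data) of fitting a support vector machine to labeled observations:

```python
# Minimal sketch of a support vector machine on labeled data (hypothetical
# numbers): the label y tells us which class each (x1, x2) observation
# belongs to, and the SVM finds a boundary separating the classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, size=(25, 2)),    # class 0 around (0, 0)
               rng.normal(3, 1, size=(25, 2))])   # class 1 around (3, 3)
y = np.array([0] * 25 + [1] * 25)                 # the labels

svm = SVC(kernel="linear").fit(X, y)
print(svm.predict([[0.5, 0.5], [3.2, 2.8]]))      # classify two new points
print("Number of support vectors per class:", svm.n_support_)
```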

15.4 Decision Tree

Think of splitting the covariate space of X (say, income) according to its marginal effect on Y, which could differ by income decile, for instance. A decision tree is going to give us the partition within which income has a similar marginal effect.

It does this “greedily” (myopic optimization), so the result is not necessarily globally optimal. This procedure is called recursive partitioning (or building a decision tree).
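The sketch below illustrates recursive partitioning on simulated data (the income threshold and effect sizes are invented for illustration): a shallow regression tree recovers split points that partition income into regions with roughly constant marginal effects.

```python
# Minimal sketch of recursive partitioning (hypothetical data): a decision
# tree splits the covariate space for income into regions where income has a
# roughly constant effect on y; export_text shows the chosen split points.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(5)
income = rng.uniform(10, 100, size=(300, 1))                # in $1,000s
# Marginal effect of income on y differs below/above an (assumed) threshold
y = np.where(income.ravel() < 50, 0.5 * income.ravel(), 2.0 * income.ravel() - 75)
y = y + rng.normal(0, 2, 300)

tree = DecisionTreeRegressor(max_depth=2).fit(income, y)    # greedy, top-down splits
print(export_text(tree, feature_names=["income"]))
```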

Key Concepts and Summary

Machine learning

  • The basics
  • Why is regression analysis machine learning? What type of machine learning is it?