Machine Learning for Economics

Author

Juergen Jung

Published

January 20, 2025

1 Course Administration and Introduction

A 600-level graduate economics course offered by the Department of Economics at Towson University in Maryland, USA.

Course Instructor: Prof. Juergen Jung

email: jjung@towson.edu

web: https://juejung.github.io/

This book was typeset with Quarto. Visit https://quarto.org/docs/books.

1.1 Course Administration

We use Python for a number of reasons:

It is open source and free
It is platform independent, that is, it runs on Windows and Linux PCs as well as on Apple computers
Python has a large user base and is still growing in popularity. You will therefore be able to find a lot of material online in case you run into trouble and need expert help or good sources
It’s superfun!

1.1.1 Installing Python

Next you have to install Python. If you are running Linux or macOS, some version of Python may already be installed. However, these basic Python installations lack some of the important scientific packages, which you will still have to install. The most important ones are numpy, scipy, and matplotlib. A quick web search for these package names will lead you to their installation instructions.
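Once the packages are installed, a short script can confirm that Python actually finds them. The following is a minimal sketch that uses only the standard library; the function name `check_packages` is a hypothetical helper introduced here for illustration.

```python
# Check whether the core scientific packages are importable in the
# Python installation that runs this script.
from importlib import metadata, util

# Package names taken from the text above.
PACKAGES = ["numpy", "scipy", "matplotlib"]


def check_packages(names):
    """Map each package name to its version string, or to None if missing.

    Hypothetical helper for illustration, not part of any library.
    """
    report = {}
    for name in names:
        if util.find_spec(name) is None:
            report[name] = None  # Python cannot import this package
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no version metadata
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:<12} {status}")
```

If a package is reported as not installed, it is missing from the Python installation that ran the script, which may differ from other Python installations on the same computer.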

I also recommend that you install an IDE for Python. One that is well suited to scientific computing is called Spyder. Spyder is part of most Linux repositories and can easily be installed from there.

The easiest way to install Python and Spyder and all sorts of other useful packages for scientific computing is to install Python via the Anaconda distribution from: https://www.anaconda.com/download

This is a simple one-click installation process that works on Linux, Windows, and Mac, and it installs everything! When prompted for the version you want to install, go with the latest one for your system, which at the time of writing is Python 3.12. On relatively new computers you want the 64-bit version, not the 32-bit version.

1.1.2 Installing Python libraries ISLP and PyTorch

Method I

The basic setup without the PyTorch library can be installed as follows. It should run everything except the neural network code in Chapter 10. Before installing any libraries, let’s make a new environment first. Open a terminal window and type:

conda create --name myspyder python=3.11
conda activate myspyder
conda install pip
pip install islp
conda install spyder

Method II

In order to install PyTorch and the pytorch_lightning library, which facilitates working with PyTorch, first make a new environment that is different from the earlier one. Don’t install Python 3.13 here, as PyTorch is not fully supported on the latest Python version. Open a terminal window and type:

conda create --name islp python=3.11
conda activate islp
conda install pip
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip install torchinfo
pip install pytorch_lightning
pip install islp
pip install spyder-kernels==3.0.*
pip install jupyter

In Spyder you can now open the islp console under Consoles \(\rightarrow\) New Console in Environment. You should see the islp (Python 3.11) environment as one of the options. Choose it, and all the code should now run.

You need Spyder 6.0.1 or higher

Only Spyder 6.0.1 or higher has the option to choose a new console in a specific environment.
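To confirm that a console really runs inside the islp environment, you can run a few lines like the following in it. This is a minimal sketch: `version_matches` and `describe_environment` are hypothetical helpers, and the `CONDA_DEFAULT_ENV` variable is set when an environment is activated in a terminal, so it may be missing or stale inside an IDE console. The `prefix` entry is the more reliable indicator.

```python
import os
import sys


def version_matches(version_info, major, minor):
    """Return True if the interpreter version equals major.minor."""
    return (version_info[0], version_info[1]) == (major, minor)


def describe_environment():
    """Collect a few facts about the interpreter running this code."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Folder of the active environment; for a conda environment
        # this path usually ends with the environment name.
        "prefix": sys.prefix,
        # Set by `conda activate`; may be absent in some consoles.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key:<10} {value}")

if not version_matches(sys.version_info, 3, 11):
    print("Note: this console is not running Python 3.11.")
```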

1.1.3 Submitting homework, the midterms and the final via Dropbox

A homework assignment is due every week. You need to submit the homework and all other assignments via Dropbox. There is a little bit of setup involved, but it is not very complicated.

Go to Dropbox and sign up for a free account, which gives you 2GB of cloud disk space. It’s not immediately obvious from the Dropbox starting page how to sign up for the free account. On the left, below the two boxes, you see a link that says get Dropbox Basic. Follow that link.
Download the Dropbox client/app to your computer and install it. This may take a couple of minutes. You should now see a new Dropbox folder with a green marking on it in your folder structure, usually under your user folder, but the location depends on whether you are on Windows, Mac, or Linux.
Note
- If you are on Windows you may have to start the Dropbox client by hand each time you reboot the computer. Simply go to the Start menu button, start typing Dropbox in the search field, and once the Dropbox link appears in the Start menu, right-click on it and click on Run as Administrator. This will start the client.
- You should see a small Dropbox symbol in your status line at the bottom that should say something like Dropbox Up to Date when you hover over it with the mouse pointer.
- Maybe you can figure out how to add Dropbox to your autostart menu, so that every time you turn on your computer Dropbox starts automatically. This is up to you.
Accept the share-a-folder invitation that I sent out via Dropbox. I will email this invitation after our first class session.
You will find a folder structure in this shared folder that should be self-explanatory, i.e., a homework folder for homework, a midterm folder for the midterm, etc. All the homework script files, with the extension .py, are already inside this folder.
When you work on a homework, simply open Spyder first. Then, from within Spyder, navigate to this shared folder and open the homework script file that you want to work on. For the first homework this would be the file homework1.py. So open homework1.py from within Spyder and start editing it. Once you hit save, the file will automatically be mirrored via Dropbox and I will see the updated homework1.py file on my computer.
No further action is required: the homework is already submitted, provided your Dropbox client is active (i.e., turned on).
PS: Please do not change the names of the script files.
A day or two after the due date of a homework assignment, you will find a file called: homework1_graded.py with my comments and the point score for this assignment.

1.2 Brief Introduction to Machine Learning

Section Learning Objectives

Introduction to machine learning
Econometrics vs ML terminology

This chapter and the next build heavily on two chapters in Géron (2022) “Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow,” Third Edition. You can find a link to the GitHub page of this textbook at Geron GitHub.

Géron, Aurélien. 2022. Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow. 3rd ed. O’Reilly Media.

Machine learning is ubiquitous. Machine learning algorithms guide your daily Google searches, determine the way Netflix presents its offerings to you, guide your selections when shopping on sites such as Amazon, translate your spoken words into code that your phone or any of the many voice assistants can process further into meaningful services for you, drive Teslas semi-autonomously, and recognize your face in a photo you upload to Facebook. These are just a few of the many examples where machine learning has entered your life, whether you are aware of it or not.

One of the earliest examples of a machine learning algorithm that you are familiar with is the spam filter. We will use this example to further explain what machine learning does and how different machine learning algorithms can be classified.

1.2.1 Different Types of Machine Learning Algorithms

Machine learning algorithms can be classified according to the following criteria:

Supervised vs. unsupervised vs. reinforcement learning

Are they trained (estimated) with human supervision, without supervision, or do they reinforce actions based on rewards and penalties?

Online vs. batch learning

Do they learn incrementally as data becomes available, or do they require "all of the data" at once?

Instance-based vs. model-based learning

Do they compare new data points to known data points, or do they detect patterns, building a predictive model (based on parameters)?
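The last two criteria above can be illustrated with a few lines of plain Python. This is a minimal sketch on made-up numbers (years of schooling and hourly wages, assumed for illustration only). The first part computes the same mean once in batch mode and once by updating it one observation at a time. The second part predicts a new point once by copying the nearest known observation (instance-based) and once from a fitted line (model-based).

```python
# Made-up data for illustration: x = years of schooling, y = hourly wage.
x = [10.0, 12.0, 14.0, 16.0, 18.0]
y = [12.0, 15.0, 19.0, 22.0, 27.0]

# --- Batch vs. online learning: estimating the mean of y ---
batch_mean = sum(y) / len(y)  # needs all observations at once

online_mean = 0.0
for n, value in enumerate(y, start=1):
    # Update the current estimate with one new observation at a time.
    online_mean += (value - online_mean) / n

# --- Instance-based vs. model-based learning: predicting y at x_new ---
x_new = 15.5

# Instance-based: find the most similar known observation and reuse its y.
nearest = min(range(len(x)), key=lambda i: abs(x[i] - x_new))
instance_pred = y[nearest]

# Model-based: summarize the data by two parameters (intercept and slope
# of a least-squares line) and predict from these parameters.
x_bar = sum(x) / len(x)
sxy = sum((xi - x_bar) * (yi - batch_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = batch_mean - slope * x_bar
model_pred = intercept + slope * x_new

print("batch mean:", batch_mean, "online mean:", online_mean)
print("instance-based prediction:", instance_pred)
print("model-based prediction:", model_pred)
```

The online update reaches the same mean as the batch calculation without ever holding all observations at once. The instance-based prediction needs the full data set at prediction time, whereas the model-based prediction only needs the two estimated parameters.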

Let’s discuss this classification in some more detail. In supervised learning, the training set (i.e., the data) you feed to the algorithm includes the desired outcome or solution, called the label (i.e., the dependent or outcome variable). In other words, every observation comes with a known value of the outcome variable, and the algorithm learns to predict that value from the other variables. For a spam filter, the label records whether each email in the training set is spam or not.

If, on the other hand, the data come without a label, i.e., there is no recorded outcome variable that tells you how each observation should be classified, then we are talking about so-called unsupervised learning, which deals with unlabeled data. In this case we usually try to find patterns in the data that we can then use to interpret what the observations have in common.
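The difference can be sketched with a toy version of the spam filter. The numbers below are made up for illustration: each email is described by a count of suspicious words, and the functions `fit_threshold` and `two_means` are hypothetical helpers, not library functions. The supervised learner uses the labels to pick a cutoff. The unsupervised learner sees only the counts and splits them into two groups.

```python
# Made-up data: number of suspicious words in each of eight emails.
counts = [0, 1, 1, 2, 6, 7, 8, 9]
# Labels (1 = spam, 0 = not spam); only the supervised learner sees these.
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_threshold(xs, ys):
    """Supervised: pick the cutoff with the fewest training errors.

    The rule is "predict spam if x > cutoff".
    """
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        return sum((x > cutoff) != bool(lab) for x, lab in zip(xs, ys))

    return min(candidates, key=errors)


def two_means(xs, n_iter=20):
    """Unsupervised: split 1-D data into two groups (k-means with k = 2).

    Assumes the data contain at least two distinct values.
    Returns a list of booleans, True for members of the "high" group.
    """
    low, high = min(xs), max(xs)  # initial group centers
    groups = []
    for _ in range(n_iter):
        # Assign each point to the closer center.
        groups = [abs(x - high) < abs(x - low) for x in xs]
        high_members = [x for x, g in zip(xs, groups) if g]
        low_members = [x for x, g in zip(xs, groups) if not g]
        # Move each center to the mean of its members.
        low = sum(low_members) / len(low_members)
        high = sum(high_members) / len(high_members)
    return groups


cutoff = fit_threshold(counts, labels)
groups = two_means(counts)
print("supervised cutoff:", cutoff)
print("unsupervised groups:", groups)
```

In this toy data set the two groups found without labels happen to coincide with the spam labels. The clustering itself does not know that one group is "spam", though; that interpretation has to come from the analyst.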

Figure 1.1 summarizes the different types of machine learning according to our first classification above where we distinguished between

Supervised Learning,
Unsupervised Learning, and
Reinforcement Learning.

Figure 1.1: Classification of ML algorithms.

Supervised learning - we observe an outcome together with its predictors, i.e., we have labeled data
Unsupervised learning - we do not have labels, only the X data. We can, for example, cluster the observations, so the analysis is more descriptive

We focus on supervised learning, where the goal is out-of-sample prediction (not out-of-domain prediction). The outcome variable is either

Continuous (regression)
Discrete (classification)
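Out-of-sample prediction for the continuous (regression) case can be sketched as follows. The data are made up for illustration, and `fit_line` and `mse` are hypothetical helpers. The model is fitted on the training observations only and then evaluated on held-out observations that were not used for fitting.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mse(xs, ys, intercept, slope):
    """Mean squared prediction error on the given observations."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared) / len(squared)


# Made-up data. The held-out x values lie inside the range of the
# training x values: out-of-sample, but not out-of-domain.
x_train = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y_train = [3.1, 4.9, 9.2, 10.8, 15.1, 16.9]
x_test = [3.0, 6.0]
y_test = [7.0, 13.2]

intercept, slope = fit_line(x_train, y_train)        # uses training data only
train_mse = mse(x_train, y_train, intercept, slope)  # in-sample error
test_mse = mse(x_test, y_test, intercept, slope)     # out-of-sample error

print("intercept:", intercept, "slope:", slope)
print("in-sample MSE:", train_mse)
print("out-of-sample MSE:", test_mse)
```

The out-of-sample error is the quantity of interest for prediction, because the in-sample error is computed on the same observations that determined the parameters.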

Table 1.1 contrasts the language we use in Econometrics with the language commonly used in Machine Learning.

Table 1.1: The Language of Econometrics and Machine Learning

Item	Econometrics	Machine Learning
Data	Data/Obs.	Training data/set
y	Dependent var	Label
x or X	Independent var	Feature/predictor
	Estimation	Training an algorithm or model
\(\beta\)	Parameter	Weight
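To see how the two vocabularies in Table 1.1 describe the same object, consider a simple linear model as an illustration. The single predictor, the sample size \(N\), and the squared-error criterion are assumptions made here for the example and are not part of the table.

\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{N} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
\]

An econometrician would say that the parameters \(\beta_0\) and \(\beta_1\) are estimated from the data with the dependent variable \(y\) and the independent variable \(x\). In machine learning language, the same calculation trains a model on a training set with label \(y\) and feature \(x\), and the result is a set of learned weights.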