15  Working with Data from the Web II

Chapter Learning Objectives
  • Discuss APIs
  • Yelp API Example
  • Federal Reserve Bank - FRED API Example

Application Programming Interfaces (APIs) provide convenient interfaces between different applications, independent of each application's architecture and the programming language it is written in.

15.1 Intro to APIs

A web application API allows the user to make requests via HTTP for some type of data and the API will return this data in pre-packaged XML or JSON formats.

An API request is very much like accessing a website with a browser. Both use the HTTP protocol to transfer data. The main difference is that an API call follows a much stricter syntax and returns data in XML (eXtensible Markup Language) or JSON (JavaScript Object Notation) format rather than HTML (HyperText Markup Language).
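JSON returned by an API maps directly onto Python dictionaries and lists. The payload below is made up for illustration, but the standard json module parses any such response the same way:

```python
import json

# A made-up JSON payload, similar in shape to what a web API might return
payload = '{"businesses": [{"name": "Example Creamery", "rating": 4.5}], "total": 1}'

# Parse the JSON text into nested Python dictionaries and lists
data = json.loads(payload)

print(data['businesses'][0]['name'])  # Example Creamery
print(data['total'])                  # 1
```

XML responses can be handled analogously with the standard xml.etree.ElementTree module.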

A web API uses four separate HTTP methods to interact with a web server:

  1. GET
  2. POST
  3. PUT
  4. DELETE

The GET method downloads data. The POST method submits information to the server, for example the contents of a form. PUT is similar to POST and is used to update objects that already exist; most APIs use POST for updates as well, so PUT is comparatively rare. Finally, the DELETE method removes information/data from a web server.
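To make the GET method concrete: a GET request is essentially a URL with query parameters attached. The endpoint and parameter names below are hypothetical, but the standard library shows how such a request URL is assembled:

```python
from urllib.parse import urlencode

# Hypothetical API endpoint and query parameters
base_url = 'https://api.example.com/v1/search'
params = {'term': 'ice cream', 'location': 'austin, tx', 'limit': 5}

# Build the full request URL; urlencode percent-encodes special characters
url = base_url + '?' + urlencode(params)
print(url)
# https://api.example.com/v1/search?term=ice+cream&location=austin%2C+tx&limit=5
```

A library such as requests performs this assembly for you and also sends the HTTP request itself.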

Most websites that provide APIs require prior authentication before you can interact with the web server through the API. In other words, you first need to 'apply for an API key' by registering with the website; after registering you may have to wait a few days. Once your application is granted, you are issued an api_key value that identifies you to the server whenever you make a call.

Some APIs require more detailed forms of identification using the OAuth protocol parameters for each request:

OAuth Parameter          Value
oauth_consumer_key       Your OAuth consumer key (from Manage API Access)
oauth_token              The access token obtained (from Manage API Access)
oauth_signature_method   hmac-sha1
oauth_signature          The generated request signature, signed with the oauth_token_secret obtained
oauth_timestamp          Timestamp for the request in seconds since the Unix epoch
oauth_nonce              A unique string randomly generated per request
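As a rough sketch of how a client produces these values, the timestamp, nonce, and an HMAC-SHA1 signature can be generated with the standard library alone. The secrets and the signature base string below are placeholders; real OAuth 1.0 signing additionally requires strict percent-encoding and sorting of all request parameters, which is omitted here:

```python
import base64
import hashlib
import hmac
import secrets
import time

oauth_timestamp = str(int(time.time()))  # seconds since the Unix epoch
oauth_nonce = secrets.token_hex(16)      # unique random string per request

# Placeholder secrets; real values come from 'Manage API Access'
consumer_secret = 'CONSUMER_SECRET'
oauth_token_secret = 'TOKEN_SECRET'

# Placeholder base string; in real OAuth 1.0 it is built from the HTTP
# method, the URL, and the percent-encoded, sorted request parameters
base_string = 'GET&https%3A%2F%2Fapi.example.com%2Fresource&param%3Dvalue'

# Sign the base string with HMAC-SHA1 and base64-encode the digest
signing_key = (consumer_secret + '&' + oauth_token_secret).encode()
digest = hmac.new(signing_key, base_string.encode(), hashlib.sha1).digest()
oauth_signature = base64.b64encode(digest).decode()
```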

This may seem a bit complicated, but most larger websites provide sample code for using their APIs in various popular programming languages. Sometimes they even provide small Python libraries that facilitate the interaction with their website.

15.2 Yelp-API

For interacting with the Yelp API, for instance, you can find information here. There is also a Python library, yelpapi, that you can download from here.

To install this library, simply open a command line window (in Windows, click the Start menu, type cmd, and press Enter) and type:

pip install yelpapi

Before you can use this library in a Python script you need a Yelp account, so create one and log in. You then need to apply for a 'new app', which usually takes only a few minutes. You will then be issued a unique client_id as well as a unique client_secret key. These two keys identify you as the user; do not make them public.

Now, in order to use this library you can then simply write a small script as follows:

from yelpapi import YelpAPI   # import the Yelp API
from pprint import pprint

client_id = 'YOUR_CLIENT_ID_FROM_YOUR_YELP_APPLICATION'
client_secret = 'YOUR_CLIENT_SECRET_FROM_YOUR_YELP_APPLICATION'

yelp_api = YelpAPI(client_id, client_secret)

You can then search the yelp_api client object with:

response = yelp_api.search_query(term='ice cream', \
    location='austin, tx', sort_by='rating', limit=5)
pprint(response)

This searches for ice cream places in Austin, TX, orders the results by rating, and returns the top five. The pprint() function prints the dictionary data with nice indentation so that you can see the data structure more easily.

Note

The Yelp API returns data in the form of a Python dictionary object. Go back to the earlier chapter on data types to refresh your memory. It is often convenient to transform the dictionary data into a Pandas DataFrame, since data is ultimately easier to analyze in a DataFrame.

Here is a small example script that shows an entire solution for this:

from yelpapi import YelpAPI
from pprint import pprint
import pandas as pd

client_id = 'YOUR_CLIENT_ID_FROM_YOUR_YELP_APPLICATION'
client_secret = 'YOUR_CLIENT_SECRET_FROM_YOUR_YELP_APPLICATION'

def parse_dict(init, lkey=''):
    """
    This function 'flattens' a nested dictionary so that all subkeys become
    primary keys and can then be read into a DataFrame as separate columns
    """
    ret = {}
    for rkey,val in init.items():
        key = lkey+rkey
        if isinstance(val, dict):
            ret.update(parse_dict(val, key+'_'))
        else:
            ret[key] = val
    return ret

# Access Yelp with the user identification codes
yelp_api = YelpAPI(client_id, client_secret)

# Query Yelp via its API and search for the 5 top-rated ice cream places
# in Austin, Texas. This returns a 'dictionary' data object.
response = yelp_api.search_query(term='ice cream', \
    location='austin, tx', sort_by='rating', limit=5)

# Print the results nicely, with indentation for sub-keys in the
# resulting dictionary
pprint(response)

# Prepare a Pandas DataFrame so we can transform the dictionary-data into
# a Pandas DataFrame object

# Nr. of obs from our search
nrObs = len(response['businesses'])
index = range(nrObs)

# Take first observation as an example for key-extraction
#print(parse_dict(response['businesses'][0],''))
tempDict = parse_dict(response['businesses'][0],'')

# Make a list that contains all the keys in the dictionary
columns = []
for keys in tempDict:
    columns.append(keys)

# Generate the empty dataframe with all the key-columns
df = pd.DataFrame(index=index, columns=columns)

# Run a loop through the dictionary data and extract it into the DataFrame
for i in range(nrObs):
    tempDict = parse_dict(response['businesses'][i],'')
    for key in tempDict:
        # From dictionary into dataframe
        df.loc[df.index[i], key] = tempDict[key]

# Sort by rating
df = df.sort_values('rating', ascending=False)
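The flattening helper can be understood independently of Yelp. Applied to a small made-up record with the kind of nesting Yelp returns, it lifts every sub-key to the top level:

```python
def parse_dict(init, lkey=''):
    """Flatten a nested dictionary: sub-keys become prefixed top-level keys."""
    ret = {}
    for rkey, val in init.items():
        key = lkey + rkey
        if isinstance(val, dict):
            ret.update(parse_dict(val, key + '_'))
        else:
            ret[key] = val
    return ret

# Made-up business record with one level of nesting
business = {
    'name': 'Example Creamery',
    'rating': 4.5,
    'coordinates': {'latitude': 30.27, 'longitude': -97.74},
}

flat = parse_dict(business)
print(flat)
# {'name': 'Example Creamery', 'rating': 4.5,
#  'coordinates_latitude': 30.27, 'coordinates_longitude': -97.74}
```

pandas.json_normalize performs the same kind of flattening (with '.' as the default separator) and returns a DataFrame directly.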

15.3 Economic Data from the Federal Reserve

The Federal Reserve Economic Data (FRED) can be accessed with the library pandas_datareader.data.

Warning

If you google how to access FRED, you may come across a library (or service) called Quandl. Do not use it; in my experience it does not work well.

Here is the link with instructions on how to install the pandas-datareader library.

Simply open a terminal window and type:

conda install -c anaconda pandas-datareader

You will be prompted whether you want to install this library and possibly a couple of others. Simply type y to accept and the pandas-datareader library will install (you need internet access for this). Then restart the IPython console within Spyder by simply closing it; Spyder will open a new interactive console. Now you are good to go.

You can now easily download data from the FRED database as follows:

import datetime

import pandas_datareader.data as web

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid", {'grid.linestyle': ':'})

start = datetime.datetime(1947, 1, 1)
end = datetime.datetime(2024, 3, 1)

# Quarterly data
# -----------------------------------------------------------------------------
print('Downloading quarterly data ... ')

data_list = ['USRECQ', 'GDP', 'GNP', 'GDPC1']
name_list =  ['Recession', 'GDP', 'GNP', 'rGDP']

df_q = web.DataReader(data_list, 'fred', start, end)
df_q.columns = name_list

# Print the data
print(df_q)

# Print a single column
print(df_q['GDP'])

# You can also save your graphs using
#plt.savefig(gtexpath+'GDP_Fig1.png', format='png')
#plt.show()
Downloading quarterly data ... 
            Recession        GDP        GNP       rGDP
DATE                                                  
1947-01-01          0    243.164    244.142   2182.681
1947-04-01          0    245.968    247.063   2176.892
1947-07-01          0    249.585    250.716   2172.432
1947-10-01          0    259.745    260.981   2206.452
1948-01-01          0    265.742    267.133   2239.682
...               ...        ...        ...        ...
2022-10-01          0  26408.405  26593.998  21989.981
2023-01-01          0  26813.601  26972.528  22112.329
2023-04-01          0  27063.012  27236.100  22225.350
2023-07-01          0  27610.128  27774.189  22490.692
2023-10-01          0  27944.627        NaN  22668.986

[308 rows x 4 columns]
DATE
1947-01-01      243.164
1947-04-01      245.968
1947-07-01      249.585
1947-10-01      259.745
1948-01-01      265.742
                ...    
2022-10-01    26408.405
2023-01-01    26813.601
2023-04-01    27063.012
2023-07-01    27610.128
2023-10-01    27944.627
Freq: QS-OCT, Name: GDP, Length: 308, dtype: float64

The following R code, which uses the tidyquant package, downloads the same data but organizes it differently: in 'long' format, with one row per series per date.

library(tidyquant)

start <- as.Date("1947-01-01")
end <- as.Date("2024-03-01")

# Quarterly data
# -----------------------------------------------------------------------------
cat('Downloading quarterly data ... \n')

data_list <- c('USRECQ', 'GDP', 'GNP', 'GDPC1')

df_q <- tq_get(data_list, from = start, to = end, get = "economic.data")

# Print the data
print(df_q)

This will download GDP and other variables. When you download the data you can specify the time frame; this will of course depend on the specific time series you are interested in. Read the documentation on FRED to see what data is available.

We can next plot some of the data.

df_q[['GDP', 'rGDP']].plot(title='Nominal and Real US-GDP')
plt.show()

Nominal and Real GDP

Since the tidyquant data is organized in long format, you would first have to reorganize it so that each variable has its own column before you plot it.

# Try it.
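The idea behind that reorganization, going from 'long' format (one row per series per date) to 'wide' format (one column per series), can be sketched with plain dictionaries; the sample rows below are made up to mimic tq_get output:

```python
# Long format: one (symbol, date, value) record per row
long_rows = [
    {'symbol': 'GDP',   'date': '1947-01-01', 'price': 243.164},
    {'symbol': 'GDPC1', 'date': '1947-01-01', 'price': 2182.681},
    {'symbol': 'GDP',   'date': '1947-04-01', 'price': 245.968},
    {'symbol': 'GDPC1', 'date': '1947-04-01', 'price': 2176.892},
]

# Wide format: one entry per date, one key per series
wide = {}
for row in long_rows:
    wide.setdefault(row['date'], {})[row['symbol']] = row['price']

print(wide['1947-01-01'])  # {'GDP': 243.164, 'GDPC1': 2182.681}
```

With a DataFrame, the equivalent one-liner is df.pivot(index='date', columns='symbol', values='price').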

Similarly you can use daily or monthly data using:

import datetime

import pandas_datareader.data as web

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid", {'grid.linestyle': ':'})

start = datetime.datetime(1947, 1, 1)
end = datetime.datetime(2024, 3, 1)

# Daily data
# -----------------------------------------------------------------------------
print('Downloading daily data ... ')
data_list = ['USRECD', 'DFF', 'DPCREDIT']
name_list = ['Recession', 'FedFunds', 'FedDiscountRate']

df_d = web.DataReader(data_list, 'fred', start, end)
df_d.columns = name_list

# Monthly data
# -----------------------------------------------------------------------------
print('Downloading monthly data ... ')
data_list = ['USREC', 'UNRATE', 'FEDFUNDS', 'CPIAUCSL', 'CPILFESL', 'CIVPART', \
 'LNS14000003', 'LNS14000006', 'LNS14000009', 'LNU04032183',\
 'LNS14000001', 'LNS14000002', \
 'LNS14000028', 'LNS14000029', \
 'LNS14000031', 'LNS14000032', 'LNU04000034', 'LNU04000035', \
 'LNS14000036', 'LNS14000089', 'LNS14000091', \
 'LNS14000093', 'LNS14024230', \
 'LNS14027659', 'LNS14027660', \
 'LNS14027662', 'CGMD25O', 'CGDD25O', \
 'M2SL']

name_list = ['Recession', 'Unemployment', 'Interest-Rate', 'CPI_Urban', \
  'CPI_Core', 'LaborForcePart', \
  'UE_White', 'UE_Black', 'UE_Hispanic','UE_Asian', \
  'UE_Men', 'UE_Women', \
  'UE_White_Men', 'UE_White_Women', 'UE_Black_Men', \
  'UE_Black_Women', 'UE_Hispanic_Men', 'UE_Hispanic_Women',
  '20-24', '25-34', '35-44', '45-54', '55 and up', \
  'UE (>25, less than HS)', 'UE (>25, HS)', 'UE (>25, College)', \
  'UE (>25, Master)', 'UE (>25, Doctoral)', \
  'M2']

df_m = web.DataReader(data_list, 'fred', start, end)
df_m.columns = name_list
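As an example of what you would typically compute from such monthly series, year-over-year inflation is the percent change of the CPI level over twelve months. The CPI values below are made up:

```python
# Hypothetical monthly CPI index levels for 13 consecutive months
cpi = [100.0, 100.2, 100.5, 100.9, 101.1, 101.4,
       101.8, 102.0, 102.3, 102.6, 102.9, 103.1, 103.5]

# Year-over-year inflation for the latest month, in percent
yoy_inflation = (cpi[-1] / cpi[-13] - 1) * 100
print(round(yoy_inflation, 2))  # 3.5
```

On the downloaded DataFrame, the same computation for the whole series is df_m['CPI_Urban'].pct_change(12) * 100.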

15.4 More Tutorials

Here are additional web tutorials about how to use Python together with the API of popular websites:

Key Concepts and Summary
  • Using APIs to download FRED data
  • User token management
  1. Download GDP data
  2. Plot GDP data for the 20 most recent years