14. Working with Data from the Web II

Application Programming Interfaces (APIs) provide convenient interfaces between different applications, independent of each application's architecture and the programming language it is written in.

14.1. Intro to APIs

A web application API allows the user to request some type of data via HTTP, and the API returns this data in a pre-packaged XML or JSON format.

An API request is very much like accessing a website with a browser. Both calls use the HTTP protocol to download a file. The only difference is that an API follows a much more regulated syntax and downloads data in XML (eXtensible Markup Language) or JSON (JavaScript Object Notation) formats as opposed to HTML (HyperText Markup Language).
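To see the difference, here is a minimal sketch of how a JSON response can be parsed in Python with the standard json module. The payload below is made up for illustration, not an actual API reply:

```python
import json

# A hypothetical JSON payload, as an API might return it
raw = '{"name": "Ice Cream Shop", "rating": 4.5, "location": {"city": "Austin"}}'

# json.loads() turns the JSON text into a Python dictionary
data = json.loads(raw)

print(data['name'])              # access a top-level key
print(data['location']['city'])  # access a nested key
```

Once parsed, the JSON data behaves like any ordinary Python dictionary.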

An API typically uses four HTTP methods to access a web server:

  1. GET

  2. POST

  3. PUT

  4. DELETE

The GET method is used to download data. The POST method is used to submit a form or send information to the server. PUT is similar to POST and is used to update already existing objects; most APIs use POST instead of PUT, so PUT is comparatively rare. Finally, the DELETE method is used to delete information/data from a web server.
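The four methods can be illustrated with Python's built-in urllib module; the URL below is only a placeholder, and no network traffic is generated until a request is actually opened:

```python
from urllib.request import Request

# A placeholder endpoint for illustration
url = 'http://example.com/api/items'

# Build a request object for each HTTP method
get_req    = Request(url)                       # GET is the default
post_req   = Request(url, data=b'x=1')          # supplying data implies POST
put_req    = Request(url, data=b'x=2', method='PUT')
delete_req = Request(url, method='DELETE')

print(get_req.get_method())     # GET
print(post_req.get_method())    # POST
print(put_req.get_method())     # PUT
print(delete_req.get_method())  # DELETE
```

In practice you would pass such a request object to urllib.request.urlopen() to actually contact the server.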

Most websites that provide APIs require prior authentication before you can interact with the web server via the API. In other words, you first need to 'apply for API access' by registering with the website. After registering, you may have to wait for a few days. Once your application is granted, you are provided with an api_key value that allows the server to identify you when you make your calls.
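In the simplest case, the api_key is passed along as a query parameter in the request URL. Here is a hedged sketch with made-up endpoint and parameter names; real APIs document their own:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and key (every API defines its own parameter names)
base_url = 'http://example.com/api/search'
params = {'api_key': 'YOUR_API_KEY', 'term': 'ice cream', 'limit': 5}

# urlencode() turns the dictionary into a properly escaped query string
request_url = base_url + '?' + urlencode(params)
print(request_url)
```

Note how urlencode() takes care of escaping special characters such as the space in 'ice cream'.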

Some APIs require a more elaborate form of identification based on the OAuth protocol, which involves signing each request with additional parameters.

This may seem a bit complicated, but most larger websites provide sample code for using their APIs in various popular programming languages. Sometimes they even provide small Python libraries to facilitate the interaction with their website.

14.2. Yelp-API

For interacting with the Yelp API, for instance, you can find information here. There is also a Python library, yelpapi, that you can download from here.

If you want to install this library, simply open a command-line window (in Windows, click the Start menu, type cmd, and press Enter) and type:

pip install yelpapi

Before you can use this library in a Python script, you need a Yelp account, so create one and log in. You then need to apply for a 'new app'. This usually takes only a few minutes. You will then be issued a unique client_id as well as a unique client_secret key. These two keys identify you as the user; do not make them public.

Now, in order to use this library you can then simply write a small script as follows:

from yelpapi import YelpAPI
from pprint import pprint

client_id = 'YOUR_CLIENT_ID_FROM_YOUR_YELP_APPLICATION'
client_secret = 'YOUR_CLIENT_SECRET_FROM_YOUR_YELP_APPLICATION'

yelp_api = YelpAPI(client_id, client_secret)

You can then search with the yelp_api client object:

response = yelp_api.search_query(term='ice cream', \
    location='austin, tx', sort_by='rating', limit=5)
pprint(response)

This searches for the top five rated ice cream places in Austin, TX, with the results sorted by rating. The pprint() function prints the dictionary with nice indentation so that you can see the data structure more clearly.
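The exact keys Yelp returns may change over time, but the response is structured roughly as a dictionary holding a 'businesses' list. Using a small mock dictionary in that assumed shape, individual fields can be accessed like this:

```python
# A small mock response in the assumed shape of a Yelp search result;
# the real response contains many more fields per business
response = {
    'businesses': [
        {'name': 'Shop A', 'rating': 5.0, 'location': {'city': 'Austin'}},
        {'name': 'Shop B', 'rating': 4.5, 'location': {'city': 'Austin'}},
    ],
    'total': 2,
}

# Loop over the list of businesses and pull out individual fields
for business in response['businesses']:
    print(business['name'], business['rating'])

# Index into the list to reach a single result
top = response['businesses'][0]['name']
```

Because the sub-entries are themselves dictionaries (e.g. 'location'), this nesting is exactly what the flattening step below deals with.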

Note

The Yelp API returns data in the form of a Python dictionary object. Go back to the earlier chapter on data types to refresh your memory. It is often convenient to transform the dictionary data into a Pandas DataFrame, since data is easier to analyse in that form.

Here is a small example script that shows an entire solution for this:

from yelpapi import YelpAPI
from pprint import pprint
import pandas as pd

client_id = 'YOUR_CLIENT_ID_FROM_YOUR_YELP_APPLICATION'
client_secret = 'YOUR_CLIENT_SECRET_FROM_YOUR_YELP_APPLICATION'

def parse_dict(init, lkey=''):
    """
    This function 'flattens' a nested dictionary so that all subkeys become
    primary keys and can then be read into a DataFrame as separate columns
    """
    ret = {}
    for rkey,val in init.items():
        key = lkey+rkey
        if isinstance(val, dict):
            ret.update(parse_dict(val, key+'_'))
        else:
            ret[key] = val
    return ret

# Access Yelp with the user identification codes
yelp_api = YelpAPI(client_id, client_secret)

# Query Yelp via its API and search for the 5 top-rated ice cream places
# in Austin, TX. This returns a 'dictionary' data object.
response = yelp_api.search_query(term='ice cream', \
    location='austin, tx', sort_by='rating', limit=5)

# Print the results nicely with indents for sub-keys in the resulting
# dictionary
pprint(response)

# Prepare a Pandas DataFrame so we can transform the dictionary-data into
# a Pandas DataFrame object

# Number of observations returned by our search
nrObs = len(response['businesses'])
index = range(nrObs)

# Take first observation as an example for key-extraction
#print(parse_dict(response['businesses'][0],''))
tempDict = parse_dict(response['businesses'][0],'')

# Make a list that contains all the keys in the flattened dictionary
columns = list(tempDict.keys())

# Generate the empty dataframe with all the key-columns
df = pd.DataFrame(index=index, columns=columns)

# Run a loop through the dictionary data and extract it into the DataFrame
for i in range(nrObs):
    tempDict = parse_dict(response['businesses'][i],'')
    for key in tempDict:
        # From dictionary into dataframe
        df.loc[df.index[i], key] = tempDict[key]

# Sort by rating
df = df.sort_values('rating', ascending=False)
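To see what the flattening step does in isolation, here is the same parse_dict() function applied to a small hand-made dictionary:

```python
def parse_dict(init, lkey=''):
    """Flatten a nested dictionary: sub-keys become top-level keys
    joined to their parent key with an underscore."""
    ret = {}
    for rkey, val in init.items():
        key = lkey + rkey
        if isinstance(val, dict):
            ret.update(parse_dict(val, key + '_'))
        else:
            ret[key] = val
    return ret

nested = {'name': 'Shop A', 'location': {'city': 'Austin', 'state': 'TX'}}
flat = parse_dict(nested)
print(flat)
# {'name': 'Shop A', 'location_city': 'Austin', 'location_state': 'TX'}
```

Each flattened key such as 'location_city' then becomes its own column in the DataFrame.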

14.3. Economic Data from the Federal Reserve

The website Quandl provides a very convenient API for downloading economic data from various databases. Here is a list of the accessible databases.

The Federal Reserve Economic Data (FRED) is one of the accessible databases. Quandl provides a nice Python wrapper, that is, a Python library that helps you interact easily with the website's API, similar to the Yelp library above.

After applying for a free account with Quandl, you will receive an author-token.

Note

In Quandl the author-token is referred to as the API Key.

Every time you access the API you need to provide this code. In addition, you need to install the Quandl library. Open a command-line terminal on your computer.

Warning

This is not the iPython command-line terminal in Spyder. Under Windows, go to Start, search for cmd, and click on it. This will open a command-line prompt (black window).

Now type:

conda install quandl

You will be prompted whether you want to install this library and possibly a couple of others that Quandl needs. Simply type y to accept and the Quandl library will install. You need internet access for this. Then restart the iPython console within Spyder by simply closing it; a new interactive console (command-line window) will open within Spyder. Now you are good to go.

You can now easily download data from the FRED database as follows:

import Quandl

# Download data from Quandl
# -----------------------------------------------------------------------------
# Quarterly data
# -----------------------------------------------------------------------------
df_q = Quandl.get(["FRED/GDP", "FRED/GDPC1", "FRED/PCECC96", \
    "FRED/PCNDGC96", "FRED/DCLORX1Q020SBEA", "FRED/UNRATE"], \
    authtoken = 'author-token', \
    trim_start="1950-1-1", trim_end="2015-12-31", collapse = 'quarterly')
df_q.columns = ['GDP', 'realGDP', 'Consumption', 'Non-Durables', \
    'ClothingShoes','Unemployment']

# Monthly data
# -----------------------------------------------------------------------------
df_m = Quandl.get(["FRED/UNRATE", "FRED/FEDFUNDS", "FRED/CPIAUCSL"], \
    authtoken = 'author-token', \
    trim_start="1950-1-1", trim_end="2015-12-31", collapse = 'monthly')
df_m.columns = ['Unemployment', 'Interest-Rate', 'CPI_Urban']

# Annual data
# -----------------------------------------------------------------------------
df_a = Quandl.get(["FRED/UNRATE", "FRED/FEDFUNDS", "FRED/CPIAUCSL"], \
    authtoken = 'author-token', \
    trim_start="1950-1-1", trim_end="2015-12-31", collapse = 'annual')
df_a.columns = ['Unemployment', 'Interest-Rate', 'CPI_Urban']

# Plot data
# -----------------------------------------------------------------------------
df_q[['GDP', 'realGDP']].plot(title='Nominal and Real US-GDP')
# You can also save your graphs (requires: import matplotlib.pyplot as plt)
#plt.savefig(gtexpath+'GDP_Fig1.png', format='png')
#plt.show()

This will download GDP and other variables and plot them. When you download the data, you can specify the time frame, which will of course depend on the specific time series you are interested in. Read the documentation on the FRED website to see what data is available.
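Once a series such as the CPI sits in a DataFrame, derived quantities are easy to compute. As a small illustration (using made-up CPI numbers rather than the downloaded data), the inflation rate is the percentage change of the CPI from one period to the next:

```python
import pandas as pd

# Made-up CPI values for three consecutive years (not real FRED data)
df = pd.DataFrame({'CPI_Urban': [100.0, 102.0, 104.04]},
                  index=[2013, 2014, 2015])

# Inflation rate in percent: period-over-period change of the CPI
df['Inflation'] = df['CPI_Urban'].pct_change() * 100
print(df)
```

The first entry of the new column is NaN, since there is no earlier period to compare against; the remaining entries are 2 percent per year in this example.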

14.4. More Tutorials

Here are additional web tutorials on how to use Python together with the APIs of popular websites: