4 Data Structures

Chapter Learning Objectives

Lists
Tuples
Dictionaries
JSON format

4.1 Lists

A list is a simple container object that can hold an arbitrary number of Python objects. A list can be a list of numbers, words, or a combination. Here is an example:

Python Code
R Code

alist = [1, "me", 3.456, "you", 50]
print("The list is alist = {}".format(alist))

The list is alist = [1, 'me', 3.456, 'you', 50]

alist <- c(1, "me", 3.456, "you", 50)
cat("The list is alist =", alist)

The list is alist = 1 me 3.456 you 50

Alternatively, we could convert the list into a string and then print it. You can join strings with the plus sign, which here combines the string "The list is alist = " with another string, str(alist). The Python function str() converts an object (here, the list) into a string; print() then prints the combined string.

Python Code
R Code

print("The list is alist = " + str(alist))

The list is alist = [1, 'me', 3.456, 'you', 50]

print(paste("The list is alist =", alist))

[1] "The list is alist = 1"     "The list is alist = me"   
[3] "The list is alist = 3.456" "The list is alist = you"  
[5] "The list is alist = 50"
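As a small sketch of why str() is needed in the Python version: the plus sign only joins strings with strings, so joining a string and a list directly raises a TypeError. The variable names message, converted and formatted are illustrative and not part of the original example.

```python
alist = [1, "me", 3.456, "you", 50]

# Joining a string and a list directly fails with a TypeError
try:
    message = "The list is alist = " + alist
except TypeError as err:
    message = "TypeError: " + str(err)
print(message)

# Converting the list to a string first works
converted = "The list is alist = " + str(alist)
print(converted)

# An f-string performs the conversion automatically
formatted = f"The list is alist = {alist}"
print(formatted)
```

Both working variants produce the same string, so which one to use is mostly a matter of taste.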

We can access the elements of a list one at a time using list indexing. Note that the first element in the list is at position 0.

Warning

Python is zero-indexed. This means that the first element in a list (or other collection object such as tuple or array, more on these later) is at position “zero”.

R, on the other hand, is one-indexed. This means that the first element in a list is at position “one”.

Most general programming languages such as C, Java, Ruby, PHP, Python etc. are zero indexed, whereas most applied programming “languages” such as R, Matlab, Julia or Stata are one-indexed.

There is a long-running debate about zero-based versus one-based indexing. The gist is that most computer scientists strongly prefer zero-based indexing, whereas most math-adjacent fields such as statistics and economics prefer one-based indexing because it seems more “natural.” See Wikipedia on Zero Based Indexing

Python Code
R Code

print(alist[0])
print(alist[1])
print(alist[2])
print(alist[3])
print(alist[4])

1
me
3.456
you
50

Warning

In R, indexing starts at one rather than zero, so the first element in the list has to be indexed as alist[1].

cat(alist[1])
cat(alist[2])
cat(alist[3])
cat(alist[4])
cat(alist[5])

1me3.456you50

Or prettier

Python Code
R Code

print("alist[0] = {}".format(alist[0]))
print("alist[1] = {}".format(alist[1]))
print("alist[2] = {}".format(alist[2]))
print("alist[3] = {}".format(alist[3]))
print("alist[4] = {}".format(alist[4]))

alist[0] = 1
alist[1] = me
alist[2] = 3.456
alist[3] = you
alist[4] = 50

print(paste("alist[1] =", alist[1]))
print(paste("alist[2] =", alist[2]))
print(paste("alist[3] =", alist[3]))
print(paste("alist[4] =", alist[4]))
print(paste("alist[5] =", alist[5]))

[1] "alist[1] = 1"
[1] "alist[2] = me"
[1] "alist[3] = 3.456"
[1] "alist[4] = you"
[1] "alist[5] = 50"
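Typing one print statement per element becomes tedious for longer lists. A minimal sketch of the same output produced with a loop follows; it assumes the built-in enumerate(), which yields each position together with its element, and the names lines, position and element are illustrative.

```python
alist = [1, "me", 3.456, "you", 50]

# enumerate() yields (position, element) pairs, starting at position 0
lines = []
for position, element in enumerate(alist):
    lines.append("alist[{}] = {}".format(position, element))

print("\n".join(lines))
```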

If you want to extract more than one element of a list you can use the slice operator, which places the colon symbol : between a start and a stop position. The element at the start position is included and the element at the stop position is excluded. If you want to extract the first three elements you can simply index the list as alist[:3]. If you want to get everything from position 2 (the third element) onwards you can use alist[2:].

Python Code
R Code

print(alist)
print("alist[0:4] = {}".format(alist[0:4]))
print("alist[:3] = {}".format(alist[:3]))
print("alist[2:] = {}".format(alist[2:]))

[1, 'me', 3.456, 'you', 50]
alist[0:4] = [1, 'me', 3.456, 'you']
alist[:3] = [1, 'me', 3.456]
alist[2:] = [3.456, 'you', 50]

This alist is already a vector, or atomic vector, and hence you can use cat() to print it to the screen.

alist <- c(1, 2, 3, 4, 5)

cat(alist, "\n")
cat("alist[1:5] =", alist[1:5], "\n")
cat("alist[1:3] =", alist[1:3], "\n")
cat("alist[3:5] =", alist[3:5], "\n")

1 2 3 4 5 
alist[1:5] = 1 2 3 4 5 
alist[1:3] = 1 2 3 
alist[3:5] = 3 4 5
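The following Python sketch spells out the slicing rules in a little more detail: the start position is included, the stop position is excluded, and an optional third number sets the step size. The names middle and every_second are illustrative.

```python
alist = [1, "me", 3.456, "you", 50]

# A slice start:stop includes position start but excludes position stop
middle = alist[1:3]
print(middle)            # ['me', 3.456]

# The length of a slice is stop - start
print(len(alist[0:4]))   # 4

# An optional third number is the step size
every_second = alist[::2]
print(every_second)      # [1, 3.456, 50]

# Slicing returns a new list; the original is unchanged
print(alist)             # [1, 'me', 3.456, 'you', 50]
```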

If you want to grab the last element of the list you can start indexing with negative numbers.

Python Code
R Code

alist = [1, "me", 3.456, "you", 50]
print(alist)
print("alist[-1] = {}".format(alist[-1]))
print("alist[-2] = {}".format(alist[-2]))
print("alist[-3] = {}".format(alist[-3]))

[1, 'me', 3.456, 'you', 50]
alist[-1] = 50
alist[-2] = you
alist[-3] = 3.456

Warning

In R you cannot use cat to print the elements of a list, you need to use either print() or str(). Alternatively, you can first unlist() the list object and then use cat() to “print” the resulting atomic vector.

alist <- list(1, "me", 3.456, "you", 50)


print(alist)
str(alist)

# or unlist first
my_atomic_vec = unlist(alist)
cat(my_atomic_vec)

print(paste("alist[length(alist)] =", alist[length(alist)]))
cat('\n')
print(paste("alist[length(alist) - 1] =", alist[length(alist) - 1]))
cat('\n')
print(paste("alist[length(alist) - 2] =", alist[length(alist) - 2]))
cat('\n')

[[1]]
[1] 1

[[2]]
[1] "me"

[[3]]
[1] 3.456

[[4]]
[1] "you"

[[5]]
[1] 50

List of 5
 $ : num 1
 $ : chr "me"
 $ : num 3.46
 $ : chr "you"
 $ : num 50
1 me 3.456 you 50[1] "alist[length(alist)] = 50"

[1] "alist[length(alist) - 1] = you"

[1] "alist[length(alist) - 2] = 3.456"
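In Python, a negative index is shorthand for counting from the end, which is what the R code above does by hand with length(alist). A short sketch of that equivalence, and of negative positions inside slices, with illustrative variable names:

```python
alist = [1, "me", 3.456, "you", 50]

# alist[-k] is the same element as alist[len(alist) - k]
last = alist[-1]
same_last = alist[len(alist) - 1]
print(last, same_last)       # 50 50

# Negative positions also work in slices
last_two = alist[-2:]        # the last two elements
all_but_last = alist[:-1]    # everything except the last element
print(last_two)              # ['you', 50]
print(all_but_last)          # [1, 'me', 3.456, 'you']
```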

You can change elements of a list by reassigning them using their index. So if you want to replace the third element of the list with the word "Mom" you simply assign it as

Python Code
R Code

alist[2] = "Mom"
print(alist)

[1, 'me', 'Mom', 'you', 50]

# R is one-indexed, so the third element is at position 3
alist[3] <- "Mom"

# Again: Don't use cat() here
print(alist)

[[1]]
[1] 1

[[2]]
[1] "me"

[[3]]
[1] "Mom"

[[4]]
[1] "you"

[[5]]
[1] 50

What about more complicated lists, where the elements inside the list are lists themselves? In this case we are dealing with nested lists. Here is an example:

Python Code
R Code

myNestedList = [['Mom', 42], ['Dad', 41], ['Kids', 10, 12]]
print(myNestedList)

[['Mom', 42], ['Dad', 41], ['Kids', 10, 12]]

myNestedList <- list(
  list(name = "Mom", age = 42),
  list(name = "Dad", age = 41),
  list(name = "Kids", ages = c(10, 12))
)

print(myNestedList)

[[1]]
[[1]]$name
[1] "Mom"

[[1]]$age
[1] 42


[[2]]
[[2]]$name
[1] "Dad"

[[2]]$age
[1] 41


[[3]]
[[3]]$name
[1] "Kids"

[[3]]$ages
[1] 10 12

Note

A quick note on the R implementation: R has no separate data type for nested lists. An R list can hold other lists, however, so you can build a similar structure from lists and named lists.

Now let's see what happens if we index this list. Try the following:

Python Code
R Code

print(myNestedList[0])
print('---------------')
print(myNestedList[1])
print('---------------')

['Mom', 42]
---------------
['Dad', 41]
---------------

print(myNestedList[1])
print('---------------')
print(myNestedList[2])
print('---------------')

[[1]]
[[1]]$name
[1] "Mom"

[[1]]$age
[1] 42


[1] "---------------"
[[1]]
[[1]]$name
[1] "Dad"

[[1]]$age
[1] 41


[1] "---------------"

Then try

Python Code
R Code

print(myNestedList[0][0])
print('---------------')
print(myNestedList[1][0])
print('---------------')
print(myNestedList[1][1])
print('---------------')

Mom
---------------
Dad
---------------
41
---------------

# Double brackets [[ ]] extract an element; single brackets [ ] return a sub-list
print(myNestedList[[1]][[1]])
print('---------------')
print(myNestedList[[2]][[1]])
print('---------------')
print(myNestedList[[2]][[2]])
print('---------------')

[1] "Mom"
[1] "---------------"
[1] "Dad"
[1] "---------------"
[1] 41
[1] "---------------"

Now let's go one step deeper into the list:

Python Code
R Code

print(myNestedList[1][0][0])
print('---------------')
print(myNestedList[1][0][1])
print('---------------')
print(myNestedList[1][0][2])
print('---------------')

D
---------------
a
---------------
d
---------------

# R strings cannot be indexed by character; use substr() to extract single characters
print(substr(myNestedList[[2]][[1]], 1, 1))
print('---------------')
print(substr(myNestedList[[2]][[1]], 2, 2))
print('---------------')
print(substr(myNestedList[[2]][[1]], 3, 3))
print('---------------')

[1] "D"
[1] "---------------"
[1] "a"
[1] "---------------"
[1] "d"
[1] "---------------"

This example shows how you can extract content from a list inside of a list by simply adding brackets with indexing positions to the list name. If you go outside the range of the inside list, the interpreter will throw an error:

Python Code
R Code

print(myNestedList[1][0][3])

Error: IndexError: string index out of range

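# The second inner list has only two elements, so position 3 is out of range
# and R stops with a "subscript out of bounds" error
print(myNestedList[[2]][[3]])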
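Instead of typing each pair of brackets by hand, you can loop over the outer list and index each inner list in turn. The sketch below assumes that the first entry of every inner list is a label and the remaining entries are ages, which matches the example data; the names ages_by_label, dad_name and position are illustrative. It also shows one way to avoid the out-of-range error by checking the length first.

```python
myNestedList = [['Mom', 42], ['Dad', 41], ['Kids', 10, 12]]

# The first element of each inner list is a label, the rest are ages
ages_by_label = []
for inner in myNestedList:
    label = inner[0]
    ages = inner[1:]
    ages_by_label.append((label, ages))
    print("{}: {}".format(label, ages))

# Guard against going out of range by checking the length first
position = 3
dad_name = myNestedList[1][0]
if position < len(dad_name):
    print(dad_name[position])
else:
    print("Position {} is outside '{}'".format(position, dad_name))
```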

4.2 Tuples

Tuples are immutable lists: once defined, they cannot be changed. A tuple is a read-only list.

Python Code
R Code

a_tuple = (1, "me", 3.456, "you", 50)
print(a_tuple)

(1, 'me', 3.456, 'you', 50)

In order to use tuples in R you first need to install the sets package with install.packages("sets") and then load it with library(sets). Now you are ready to use tuples.

Also, in R the elements are passed to tuple() as separate arguments rather than wrapped in an extra pair of parentheses.

library(sets)

# tuple() takes the elements as separate arguments
a_tuple <- tuple(1, "me", 3.456, "you", 50)
print(a_tuple)

Now try to change an element of the tuple and see what happens.

Python Code
R Code

a_tuple[2] = "Mom"
print(a_tuple)

Error: TypeError: 'tuple' object does not support item assignment

(1, 'me', 3.456, 'you', 50)

# Position 3 in R corresponds to position 2 in Python
a_tuple[3] <- "Mom"
print(a_tuple)

Slice operators work exactly the same way as they work on lists.
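A brief Python sketch of both points, slicing and immutability. It catches the TypeError so that the script keeps running, and it shows one common workaround when a changed version of a tuple is needed: build a new tuple from a modified list. The names first_three, changed, as_list and new_tuple are illustrative.

```python
a_tuple = (1, "me", 3.456, "you", 50)

# Slicing a tuple returns a new tuple
first_three = a_tuple[:3]
print(first_three)           # (1, 'me', 3.456)

# Assignment to an element fails; catch the error to keep running
try:
    a_tuple[2] = "Mom"
    changed = True
except TypeError:
    changed = False
print(changed)               # False

# To "change" a tuple, build a new one from a modified list copy
as_list = list(a_tuple)
as_list[2] = "Mom"
new_tuple = tuple(as_list)
print(new_tuple)             # (1, 'me', 'Mom', 'you', 50)
```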

4.3 Dictionaries

Dictionaries, or "dicts" for short, are general mappings, also known as associative arrays or hashes. They are key-value pairs where a key can be almost any Python type. So instead of indexing a list with numbers (starting from 0, 1, etc.), you index a dictionary with keys, which can be words or other data types.

Here is a brief example where we use names of people as keys and store various information together with those keys. We can then retrieve the information for each person using the person's name.

Python Code
R Code

# Defining a dictionary as adict = {'name': income}
adict = {'James': 20000}
adict['Jim'] = 50000
adict['Tom'] = 80000

# Defining a named list as adict <- list(name = income)
# A named list can play the role of a dictionary in R
adict <- list(James = 20000)
adict[["Jim"]] <- 50000
adict[["Tom"]] <- 80000

We can now retrieve this info using the key.

Python Code
R Code

print(adict)
print(adict['James'])
print(adict['Jim'])
print(adict['Tom'])

{'James': 20000, 'Jim': 50000, 'Tom': 80000}
20000
50000
80000

print(adict)
print(adict[["James"]])
print(adict[["Jim"]])
print(adict[["Tom"]])

$James
[1] 20000

$Jim
[1] 50000

$Tom
[1] 80000

[1] 20000
[1] 50000
[1] 80000

You can delete an element with

Python Code
R Code

del adict['James']
print(adict)

{'Jim': 50000, 'Tom': 80000}

# Assigning NULL removes an element from a named list
adict[["James"]] <- NULL
print(adict)

$Jim
[1] 50000

$Tom
[1] 80000
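A few more everyday dictionary operations, sketched in Python on the same data. The key 'Anna' is a made-up name used only to show what happens when a key is missing; the variable names anna_income and removed are likewise illustrative.

```python
adict = {'James': 20000, 'Jim': 50000, 'Tom': 80000}

# Test whether a key exists before using it
print('Jim' in adict)            # True
print('Anna' in adict)           # False

# get() returns a default instead of raising KeyError for a missing key
anna_income = adict.get('Anna', 0)
print(anna_income)               # 0

# Loop over all key-value pairs
for name, income in adict.items():
    print("{} earns {}".format(name, income))

# pop() removes a key and returns its value
removed = adict.pop('James')
print(removed)                   # 20000
print(sorted(adict.keys()))      # ['Jim', 'Tom']
```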

4.4 Pitfalls

We next give a couple of examples of potential pitfalls that can cause programming mistakes.

Be careful how you copy a list:

Python Code
R Code

list1 = [ 1,2,3,4 ]
list2 = list1
list1.append(5)
print(list1)
print(list2)

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]

# R copies on modification, so this pitfall does not arise here
list1 <- c(1, 2, 3, 4)
list2 <- list1
list1 <- append(list1, 5)
print(list1)
print(list2)

[1] 1 2 3 4 5
[1] 1 2 3 4

What happens here is that the list [1,2,3,4] is assigned to two separate names list1 and list2. These two names now point to the same, identical list. As soon as you change the list using list1.append(5) this change will be reflected in both names of the list. If you really just want to copy the list and give that separate copy the name list2, do the following:

Python Code
R Code

from copy import deepcopy

list1 = [ 1,2,3,4 ]
list2 = deepcopy(list1)
list1.append(5)
print(list1)
print(list2)

[1, 2, 3, 4, 5]
[1, 2, 3, 4]

# In R, you can use 'list()' to create lists, and 'c()' to create vectors.
list1 <- list(1, 2, 3, 4)
list2 <- list1  # R copies on modification, so list2 stays independent of later changes to list1
list1 <- c(list1, 5)  # Append 5 to list1
print(list1)
print(list2)

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4
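The Python sketch below makes the difference between a second name and a real copy visible with the is operator, which tests whether two names refer to the same object. It also illustrates why deepcopy() is the safer choice for nested lists: a shallow copy made with list.copy() still shares the inner lists. All variable names are illustrative.

```python
from copy import deepcopy

list1 = [1, 2, 3, 4]
alias = list1            # a second name for the same list
shallow = list1.copy()   # a new list with the same elements

print(alias is list1)    # True
print(shallow is list1)  # False

list1.append(5)
print(alias)             # [1, 2, 3, 4, 5]
print(shallow)           # [1, 2, 3, 4]

# For nested lists a shallow copy still shares the inner lists
nested = [[1, 2], [3, 4]]
shallow_nested = nested.copy()
deep_nested = deepcopy(nested)
nested[0].append(99)
print(shallow_nested[0])  # [1, 2, 99]
print(deep_nested[0])     # [1, 2]
```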

The list methods sort() and reverse() change the list in place and do not return a list object:

Python Code
R Code

list1 = [4, 3, 2, 1]
list1.sort()
print(list1)  # You cannot do: print(list1.sort())

[1, 2, 3, 4]

list1 <- c(4, 3, 2, 1)
list1 <- sort(list1)
print(list1)

[1] 1 2 3 4

But if you now try:

Python Code
R Code

list1 = [4, 3, 2, 1]
list2 = [-1, 0] + list1.sort()
print(list2)

Error: TypeError: can only concatenate list (not "NoneType") to list

[1, 2, 3, 4]

list1 <- c(4, 3, 2, 1)
list1 <- sort(list1)
list2 <- c(-1, 0, list1)

print(list2)

[1] -1  0  1  2  3  4

In Python this raises an error, because sort() sorts the list in place and returns None rather than a list. Since we then try to add the list [-1, 0] to something that is not a list, the interpreter throws an error. One way to fix this is to use the sorted() function, which returns a new sorted list:

Python Code
R Code

list1 = [4, 3, 2, 1]
list2 = [-1, 0] + sorted(list1)
print(list1)
print(list2)

[4, 3, 2, 1]
[-1, 0, 1, 2, 3, 4]

list1 <- c(4, 3, 2, 1)
list2 <- c(-1, 0, sort(list1))

cat("list1 =", list1, "\n")
cat("list2 =", list2, "\n")

list1 = 4 3 2 1 
list2 = -1 0 1 2 3 4
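To make the distinction concrete, the following Python sketch prints what sort() actually returns and contrasts it with sorted() and reversed(), which leave the original list untouched. The names result, descending and backwards are illustrative.

```python
list1 = [4, 1, 3, 2]

# In-place methods change the list and return None
result = list1.sort()
print(result)                      # None
print(list1)                       # [1, 2, 3, 4]

# sorted() leaves the original alone and returns a new list
descending = sorted(list1, reverse=True)
print(descending)                  # [4, 3, 2, 1]
print(list1)                       # [1, 2, 3, 4]

# reversed() returns an iterator, so wrap it in list()
backwards = list(reversed(list1))
print(backwards)                   # [4, 3, 2, 1]
```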

4.5 JSON Data Format

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write and can be used to store data and to exchange it between programs.

It is based on key:value pairs. Many programming languages support the JSON data format. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

In brief, JSON is a text-based syntax for storing and exchanging data, and it is used in many web applications. Its human-readable format is one reason it is popular for data transmission, and it works well with APIs (Application Programming Interfaces).

An example of JSON-formatted data is as follows:

{"name": "Frank", "age": 39, "isEmployed": true}

Python has a built-in JSON library called json that needs to be imported if you want to convert a JSON string into a Python object such as a dictionary or a list.

Python Code
R Code

import json

jsonData = '{"name": "Frank", "age": 39}'
jsonToPython = json.loads(jsonData)
print(jsonToPython)

{'name': 'Frank', 'age': 39}

library(jsonlite)

jsonData <- '{"name": "Frank", "age": 39}'
jsonToR <- fromJSON(jsonData)

print(jsonToR)

$name
[1] "Frank"

$age
[1] 39


The JSON string has been converted to a dictionary. You can now use it as such.

Python Code
R Code

print(jsonToPython['name'])

Frank

print(jsonToR['name'])

$name
[1] "Frank"

If you want to convert a Python dictionary into a JSON string that can then be written to a file and read by other programs you can use the json.dumps() function.

Python Code
R Code

import json

pythonDictionary = {'name':'Bob', 'age':44, 'isEmployed':True}
dictionaryToJson = json.dumps(pythonDictionary)

print(dictionaryToJson)

{"name": "Bob", "age": 44, "isEmployed": true}

library(jsonlite)

pythonDictionary <- list(name = "Bob", age = 44, isEmployed = TRUE)
# auto_unbox = TRUE writes length-one vectors as plain values rather than arrays
dictionaryToJson <- toJSON(pythonDictionary, auto_unbox = TRUE)

print(dictionaryToJson)

{"name":"Bob","age":44,"isEmployed":true}
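Writing the JSON string to a file and reading it back can be sketched in Python as follows. The example uses json.dump() and json.load(), the file-based counterparts of json.dumps() and json.loads(). The file name person.json and the temporary directory are assumptions made for illustration only.

```python
import json
import os
import tempfile

pythonDictionary = {'name': 'Bob', 'age': 44, 'isEmployed': True}

# Write the dictionary to a JSON file in a temporary directory
# (person.json is a hypothetical file name)
file_path = os.path.join(tempfile.mkdtemp(), "person.json")
with open(file_path, "w") as json_file:
    json.dump(pythonDictionary, json_file, indent=2)

# Read the file back into a Python dictionary
with open(file_path) as json_file:
    restored = json.load(json_file)

print(restored)
print(restored == pythonDictionary)   # True
```

The indent argument is optional; it only adds line breaks and spaces so that the file is easier to read.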

Self-check questions

Generate a list with numbers from 1 to 10
Print the first 5 elements of this list
Replace the last entry of the list with 100 and print the list again
Sort the list from largest to smallest element