Data Structures in R


There are different data structures in R. In this document, I briefly explain individual types. If you need more details, you can find them in R’s documentation.

R’s base data structures can be organized by their dimensionality (1 dimension, 2 dimensions, or N dimensions) and by whether their contents are all of the same type (homogeneous) or of different types (heterogeneous). This gives rise to the five data structures most often used in data analysis:

Dimension   Homogeneous   Heterogeneous
1d          Vector        List
2d          Matrix        Data frame
Nd          Array
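
As a quick orientation, the sketch below builds one small object of each of the five structures in the table. The variable names and values are made up for illustration.

```r
# One example of each structure; names and values are illustrative only
v   <- c(1.5, 2.5, 3.5)                           # 1d, homogeneous: vector
l   <- list(1.5, "a", TRUE)                       # 1d, heterogeneous: list
m   <- matrix(1:6, nrow = 2)                      # 2d, homogeneous: matrix
d   <- data.frame(id = 1:2, ok = c(TRUE, FALSE))  # 2d, heterogeneous: data frame
arr <- array(1:24, dim = c(2, 3, 4))              # Nd, homogeneous: array

dim(m)    # 2 3
dim(arr)  # 2 3 4
```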

Scalars and types of variables

Types of variables

Note that R has no separate scalar type: a single value is simply a vector of length one. Such a value holds one atomic element, which does not have to be numeric (stored as type double by default); it can also be a character string, an integer, or a logical value. We can check the type of a variable by using the typeof() function:

typeof(1)
## [1] "double"
typeof("politics")
## [1] "character"
typeof(TRUE)
## [1] "logical"

Note that having quotation marks around a number will give you a character variable, instead of a numeric variable. For example,

typeof("1")
## [1] "character"
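
A single value in R behaves as a vector of length one, which a short sketch can confirm. The L suffix used here to write an integer literal is not covered in the text above.

```r
x <- 1
length(x)      # 1: a single value is a vector of length one
is.vector(x)   # TRUE
typeof(1L)     # "integer": the L suffix requests an integer instead of a double
typeof(1 > 0)  # "logical"
```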

Factors

Aside from its type, an object can also carry attributes. Attributes can be thought of as a named list (with unique names), and can be accessed individually with attr() or all at once (as a list) with attributes().

One important use of attributes is to define factors. A factor is a vector that can contain only predefined values, and is used to store categorical data. Factors are built on top of integer vectors using two attributes: the class, “factor”, which makes them behave differently from regular integer vectors, and the levels, which define the set of allowed values.

x <- factor(c("a", "b", "b", "a"))
x
## [1] a b b a
## Levels: a b
class(x)
## [1] "factor"
levels(x)
## [1] "a" "b"
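
To see how a factor sits on top of an integer vector, the following sketch inspects the underlying storage and then tries to assign a value outside the predefined levels. suppressWarnings() is used only to keep the output quiet; R would otherwise warn about the invalid level.

```r
x <- factor(c("a", "b", "b", "a"))
typeof(x)           # "integer": the underlying storage
as.integer(x)       # 1 2 2 1: positions in levels(x)
attr(x, "levels")   # "a" "b"
attributes(x)       # a list that includes the levels and the class

# Assigning a value that is not among the levels produces NA
suppressWarnings(x[1] <- "c")
is.na(x[1])         # TRUE
```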

Coercion

We can change the type of a variable to type x using the function as.x. This process is called “coercion”. For example, the following code changes the number 65 to the string “65”:

as.character(65)
## [1] "65"
typeof(65)
## [1] "double"
typeof(as.character(65))
## [1] "character"

Similarly, you can coerce one type to another by using as.character(), as.double(), as.integer(), or as.logical().
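
A few more coercions, as a small sketch of what these functions return, including the case where a string cannot be interpreted as a number:

```r
as.integer("42")     # 42
as.double("2.5")     # 2.5
as.integer(3.9)      # 3: the fractional part is dropped, not rounded
as.logical(0)        # FALSE; any non-zero number gives TRUE
as.logical("TRUE")   # TRUE

# A string that cannot be parsed becomes NA (R also emits a warning)
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)           # TRUE
```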

Vectors

The basic data structure in R is the vector, a 1-dimensional array whose entries are the same type.
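
Because all entries must share one type, combining values of different types with c() silently converts them to the most flexible type among them (logical, then integer, then double, then character). A brief sketch of that behavior, which the text above does not spell out:

```r
typeof(c(TRUE, 2L))   # "integer"
typeof(c(1L, 2.5))    # "double"
typeof(c(1, "a"))     # "character"
mixed <- c(1, "a", TRUE)
mixed                 # "1" "a" "TRUE"
```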

Creation

The following code produces a vector containing the numbers 1, 3, 5, 7, and 9:

vec <- c(1,3,5,7,9)
vec
## [1] 1 3 5 7 9

We don’t have to type out all the numbers. The following code assigns a vector of the numbers from 1 to 100 to vec:

vec <- 1:100
vec
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100

What if I only want even numbers from 1 to 100 (inclusive)? We can manipulate vectors using arithmetic operations (just like numbers). Note that arithmetic operations happen element-wise.

even <- 1:50 * 2
even
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72
## [37]  74  76  78  80  82  84  86  88  90  92  94  96  98 100

Or we can use the seq() function:

even <- seq(2, 100, by = 2) # seq(from, to, by)
even
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72
## [37]  74  76  78  80  82  84  86  88  90  92  94  96  98 100

We can also use the c() function to combine (“concatenate”) several small vectors into one large vector.

z <- 1:5
z <- c(z,3,z)
z
##  [1] 1 2 3 4 5 3 1 2 3 4 5
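
The element-wise arithmetic and seq() calls used in this subsection can be sketched a little further. The vectors p and q are illustrative, and the length.out argument of seq() is an addition not used in the text above.

```r
p <- c(1, 2, 3)
q <- c(10, 20, 30)
p + q    # 11 22 33: elements are paired up by position
p * q    # 10 40 90
p * 2    # 2 4 6: the single value 2 is reused for every element

odd_start <- seq(1, 99, by = 2)[1:3]
odd_start                            # 1 3 5
quarters <- seq(0, 1, length.out = 5)
quarters                             # 0.00 0.25 0.50 0.75 1.00
```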

Checking

We can check whether a variable is a vector using is.vector() or is.atomic(). The type of its elements can also be checked using is.character(), is.double(), is.integer(), and is.logical().

is.vector(vec)
## [1] TRUE
is.atomic(vec)
## [1] TRUE

Use the length() function to figure out how many elements there are in a vector.

odd <- seq(1, 99, 2)
length(odd)
## [1] 50
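
The checking functions differ in ways that are easy to miss, so here is a hedged sketch of their behavior on an integer vector, a list, and a factor. The objects lst and f are illustrative.

```r
vec <- 1:100
length(vec)       # 100
is.integer(vec)   # TRUE: the colon operator produces integers
is.double(vec)    # FALSE

# is.vector() is also TRUE for lists, while is.atomic() is not
lst <- list(1, "a")
is.vector(lst)    # TRUE
is.atomic(lst)    # FALSE

# is.vector() is FALSE once attributes other than names are attached
f <- factor(c("a", "b"))
is.vector(f)      # FALSE
is.atomic(f)      # TRUE
```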

Extracting elements

We can extract elements of a vector by position using square brackets. The following code extracts the 5th to 9th even numbers (inclusive) and assigns them to the variable y:

y <- even[5:9]
y
## [1] 10 12 14 16 18

This extracts just the 3rd and 5th even numbers:

even[c(3,5)]
## [1]  6 10

We can also exclude certain elements using negative indexing; this returns a new vector and leaves even itself unchanged. Let’s say I want all even numbers except the first two:

even[-c(1,2)]
##  [1]   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [37]  78  80  82  84  86  88  90  92  94  96  98 100
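
A few further indexing cases, sketched under the same definition of even: a single position, a logical condition (not covered above), and a check that negative indexing does not alter the original vector.

```r
even <- seq(2, 100, by = 2)
even[1]                     # 2
even[even > 90]             # 92 94 96 98 100
tail_part <- even[-(1:2)]   # everything except the first two
length(tail_part)           # 48
length(even)                # 50: the original vector is unchanged
even[51]                    # NA: indexing past the end does not raise an error
```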

Matrices and Arrays

Matrices are just the 2-dimensional analogs of vectors while arrays are the n-dimensional analogs of vectors. As with vectors, elements of matrices and arrays have to be of the same type. Matrices are used commonly as part of the mathematical machinery of statistics. Arrays are much rarer, but worth being aware of.

Creation

Matrices and arrays are created with matrix() and array(), or by using the assignment form of dim():

a <- matrix(1:6, ncol = 3, nrow = 2)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
b <- array(1:12, c(2, 3, 2))
b
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
m <- 1:6 # named m rather than c to avoid masking the c() function
dim(m) <- c(2, 3)
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

Notice that R takes the elements in the vector you give it and fills in the matrix column by column. If we want the elements to be filled in by row instead, we have to put in a byrow = TRUE argument:

A <- matrix(1:6, nrow = 2, byrow=TRUE)
A
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
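
One way to convince yourself of the fill order is to compare a row-filled matrix with the transpose of a column-filled one. The sketch below also shows that mixing types in a matrix coerces every element, just as with vectors; the objects B and M are illustrative.

```r
A <- matrix(1:6, nrow = 2, byrow = TRUE)   # ncol is inferred as 3
B <- t(matrix(1:6, nrow = 3))              # fill 3 x 2 by column, then transpose
all(A == B)                                # TRUE

# Mixing types coerces every element to a common type
M <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(M)                                  # "character"
```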

Checking

To get the dimensions of a matrix, we can use the dim(), nrow() and ncol() functions; length() returns the total number of elements.

dim(a)
## [1] 2 3
length(a)
## [1] 6
nrow(a)
## [1] 2
ncol(a)
## [1] 3

You can also set the row and column names.

rownames(a) <- c("A", "B")
colnames(a) <- c("a", "b", "c")
a
##   a b c
## A 1 3 5
## B 2 4 6
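
Once a matrix has row and column names, they can be used in place of numeric positions. A minimal sketch, rebuilding the matrix a so the block runs on its own:

```r
a <- matrix(1:6, ncol = 3, nrow = 2)
rownames(a) <- c("A", "B")
colnames(a) <- c("a", "b", "c")

dim(a)        # 2 3
a["B", "c"]   # 6
a["A", ]      # a named vector holding 1, 3, 5
dimnames(a)   # a list: row names first, then column names
```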

Extracting elements

To access the element in the ith row and jth column of the matrix A, use the index A[i, j]. Leaving one index blank selects an entire row or column:

A[1,2]
## [1] 2
A[1,]
## [1] 1 2 3
A[,1]
## [1] 1 4
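
The same bracket notation extends to sub-matrices and to arrays with more than two dimensions. The sketch below uses the drop = FALSE argument, which is not covered above, to keep a single row as a matrix instead of a plain vector.

```r
A <- matrix(1:6, nrow = 2, byrow = TRUE)
sub <- A[1:2, 2:3]          # rows 1 to 2, columns 2 to 3: a 2 x 2 matrix
row1 <- A[1, , drop = FALSE]
dim(row1)                   # 1 3: still a matrix

b <- array(1:12, c(2, 3, 2))
b[1, 2, 2]                  # 9: row 1, column 2 of the second slice
```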

Lists

In all the data structures so far, the elements have to be of the same type.

Creation

To have elements of different types in one data structure, we can use a list, which we create with list(). We can think of a named list as a collection of key-value pairs, where the keys (names) are strings.

event <- list(year = "2021", month = "Aug")
event
## $year
## [1] "2021"
## 
## $month
## [1] "Aug"

The str() function can be used to inspect what is inside event:

str(event)
## List of 2
##  $ year : chr "2021"
##  $ month: chr "Aug"

To access the year element of event, we have 2 options:

event[["year"]]
## [1] "2021"
# or
event$year
## [1] "2021"
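
Single brackets behave differently from the two options above: they return a smaller list rather than the element itself. A short sketch of the difference, along with access by position:

```r
event <- list(year = "2021", month = "Aug")
class(event["year"])     # "list": single brackets return a sub-list
class(event[["year"]])   # "character": double brackets return the element
event[[2]]               # "Aug": position works as well as name
is.null(event$day)       # TRUE: a missing name gives NULL rather than an error
```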

Checking

The elements of a list can be anything, even another data structure! Let’s add the Saturdays in August:

event$saturday <- c(7,14,21,28)
str(event)
## List of 3
##  $ year    : chr "2021"
##  $ month   : chr "Aug"
##  $ saturday: num [1:4] 7 14 21 28

To see the keys associated with a list, use the names() function:

names(event)
## [1] "year"     "month"    "saturday"
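
Names are optional, and list elements can themselves be lists. The sketch below illustrates both points and also shows one way to remove an element; the venue entry and its values are hypothetical.

```r
unnamed <- list(1, "a", TRUE)
names(unnamed)                                  # NULL: no names were given

event <- list(year = "2021", month = "Aug")
event$venue <- list(city = "Oslo", room = 12)   # hypothetical nested list
event$venue$city                                # "Oslo"
event$month <- NULL                             # assigning NULL removes the element
names(event)                                    # "year" "venue"
length(event)                                   # 2
```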

Data frames

A data frame is the most common way of storing data in R, and if used systematically makes data analysis easier. Under the hood, a data frame is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list. This means that a data frame has names(), colnames(), and rownames(), although names() and colnames() are the same thing. The length() of a data frame is the length of the underlying list and so is the same as ncol(); nrow() gives the number of rows.

You can subset a data frame like a 1d structure (where it behaves like a list), or a 2d structure (where it behaves like a matrix). We will talk about subsetting later when we cover how to manipulate data in R.
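
The list-like and matrix-like sides of a data frame can be sketched briefly as follows. The data frame d is a small illustrative example.

```r
d <- data.frame(x = 1:3, y = c("a", "b", "c"))
is.list(d)                         # TRUE
length(d) == ncol(d)               # TRUE
identical(names(d), colnames(d))   # TRUE

d$x            # 1 2 3: list-style access to a column
d[["y"]]       # list-style access by name
d[2, "x"]      # 2: matrix-style access by row and column
d[d$x > 1, ]   # the rows where x exceeds 1
```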

Creation

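# stringsAsFactors = TRUE is written out because it is the default only in R versions before 4.0.0
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = TRUE)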
str(df)
## 'data.frame':    3 obs. of  2 variables:
##  $ x: int  1 2 3
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3
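
Whether y is stored as a factor or as character depends on the stringsAsFactors argument of data.frame(), whose default changed from TRUE to FALSE in R 4.0.0. A small sketch setting it explicitly both ways (d_fac and d_chr are illustrative names):

```r
d_fac <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = TRUE)
d_chr <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
class(d_fac$y)   # "factor"
class(d_chr$y)   # "character"
```

Writing the argument out makes the result independent of the R version in use.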

Checking

To check if an object is a data frame, use class() or test explicitly with is.data.frame():

class(df)
## [1] "data.frame"
is.data.frame(df)
## [1] TRUE

You can check the names of rows and columns.

colnames(df)
## [1] "x" "y"
rownames(df)
## [1] "1" "2" "3"

You can also check the numbers of rows and columns.

ncol(df)
## [1] 2
nrow(df)
## [1] 3

Coercion

You can coerce an object to a data frame with as.data.frame():

  • A vector will create a one-column data frame.
  • A list will create one column for each element; it’s an error if they’re not all the same length.
  • A matrix will create a data frame with the same number of columns and rows as the matrix.

vec <- 1:5
vec <- as.data.frame(vec)
str(vec)
## 'data.frame':    5 obs. of  1 variable:
##  $ vec: int  1 2 3 4 5
lst <- list(1:2, 1:3, 1:4) # named lst to avoid masking the list() function
lst <- as.data.frame(lst)
## Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 3, 4
mat <- a # named mat to avoid masking the matrix() function
mat <- as.data.frame(mat)
str(mat)
## 'data.frame':    2 obs. of  3 variables:
##  $ a: int  1 2
##  $ b: int  3 4
##  $ c: int  5 6
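
For comparison, here is a sketch of the cases where coercion succeeds, plus one way to detect the failure for unequal lengths with tryCatch() instead of stopping the script. The object names are illustrative.

```r
# A list whose elements all have the same length converts cleanly
ok <- as.data.frame(list(id = 1:3, label = c("a", "b", "c")))
dim(ok)       # 3 2
names(ok)     # "id" "label"

# A matrix without column names gets default names V1, V2, ...
m <- matrix(1:6, nrow = 2)
names(as.data.frame(m))   # "V1" "V2" "V3"

# Unequal lengths raise an error, caught here and turned into a string
res <- tryCatch(as.data.frame(list(1:2, 1:3)), error = function(e) "error")
res           # "error"
```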