R - Factors


What is a factor in R?

A factor is a vector that stores categorical data: values that each fall into one of a finite set of categories. These categories are called the levels of the factor.

# A character vector of category labels
x <- c("b", "c", "b", "a", "c", "c")
# Convert it to a factor
x <- factor(x)

The factor() function converts the atomic character vector into a factor. Unless told otherwise, R determines the levels itself, using the sorted unique values of the vector (here "a", "b" and "c"). factor() produces an error when it is given a non-atomic argument, such as a list.
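A short sketch of what factor() builds by default, using the same vector as above: the levels are the sorted unique values, and each element is stored as an integer code pointing into those levels. The names chr and f are illustrative only.

```r
# Same character vector as in the example above
chr <- c("b", "c", "b", "a", "c", "c")
f <- factor(chr)

# By default the levels are the sorted unique values of the input
levels(f)                                  # "a" "b" "c"
identical(levels(f), sort(unique(chr)))    # TRUE

# Internally, each element is an integer index into levels(f)
as.integer(f)                              # 2 3 2 1 3 3

# Looking up the levels by those codes recovers the original values
identical(levels(f)[as.integer(f)], chr)   # TRUE
```

This integer-code representation is why a factor is a compact way to store a long vector that repeats a small number of labels.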

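str(x)     # Factor w/ 3 levels "a","b","c": 2 3 2 1 3 3
levels(x)  # [1] "a" "b" "c"
table(x)   # counts per level: a = 1, b = 2, c = 3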

What does the levels() function return?

The levels() function returns a character vector containing the names of the factor's levels, each listed once, in level order.
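A minimal sketch contrasting the levels with the data itself: the factor has six elements but only three levels, and nlevels() is a base R shortcut for counting them.

```r
x <- factor(c("b", "c", "b", "a", "c", "c"))

lv <- levels(x)
lv                # "a" "b" "c"
is.character(lv)  # TRUE: the level names, not the data values
length(x)         # 6 elements in the factor
nlevels(x)        # 3, the same as length(levels(x))
```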

What does the table() function return?

The table() function returns a table counting how many elements of the factor fall into each level. When printed, it shows the variable name (x), then the levels of x on one line, and underneath each level the number of elements of x that have that level. So for the x above, there is one instance of level "a", two instances of level "b", and three instances of level "c".
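A small sketch of reading counts back out of the table by level name, and of the fact that table() reports every level of a factor, including one that never occurs. The names counts and y are illustrative.

```r
x <- factor(c("b", "c", "b", "a", "c", "c"))
counts <- table(x)

# Counts can be looked up by level name
counts[["a"]]        # 1
counts[["c"]]        # 3

# The counts as a plain vector, and the level names they belong to
as.vector(counts)    # 1 2 3
names(counts)        # "a" "b" "c"

# table() lists every level, even one with no occurrences
y <- factor(c("a", "a"), levels = c("a", "b"))
as.vector(table(y))  # 2 0
```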

If the levels of our factor need to be in a particular order, we can pass that order to the levels argument of factor(). Setting the argument ordered to TRUE additionally tells R that the order is meaningful, which produces an ordered factor:

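x <- c("b", "a", "b", "c", "a", "a")
# Levels given explicitly, from lowest to highest
x <- factor(x, levels = c("c", "b", "a"), ordered = TRUE)
x          # prints the values, then "Levels: c < b < a"
str(x)     # Ord.factor w/ 3 levels "c"<"b"<"a": 2 3 2 1 3 3
levels(x)  # [1] "c" "b" "a"
table(x)   # counts per level: c = 1, b = 2, a = 3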

R now returns the levels in the order given to the levels argument. The < symbols in the printed output of x and str(x) indicate that the levels are ordered, and str(x) reports that the object is an ordered factor (Ord.factor).
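A sketch of what ordered = TRUE adds beyond setting the level order. Both factors below have levels c, b, a, but only the ordered one supports comparisons such as < and summaries such as max(), which then follow the level order rather than the alphabet. The names unord and ord are illustrative.

```r
x <- c("b", "a", "b", "c", "a", "a")

# levels alone fixes the level order but does not make the factor ordinal
unord <- factor(x, levels = c("c", "b", "a"))
levels(unord)      # "c" "b" "a"
is.ordered(unord)  # FALSE

# ordered = TRUE records that c < b < a is meaningful
ord <- factor(x, levels = c("c", "b", "a"), ordered = TRUE)
is.ordered(ord)    # TRUE
class(ord)         # "ordered" "factor"

# Comparisons and max() use the level order: "a" is the highest level here
ord < "a"          # TRUE FALSE TRUE TRUE FALSE FALSE
max(ord)           # a (printed with Levels: c < b < a)
```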

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License