What is a factor in R?
A factor is a vector that stores categorical data — data that can be classified by a finite number of categories. These categories are known as the levels of a factor.
x <- c("b", "c", "b", "a", "c", "c")
x <- factor(x)
Using the factor() function, we can have R convert the character vector into a factor. R determines the levels automatically: by default, they are the distinct values of the vector in sorted order. Passing factor() a non-atomic argument, such as a list, produces an error.
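As a minimal sketch of how the default levels are chosen, the example below compares them with the sorted distinct values of the input. The vector name fruit and its contents are made up for illustration and are not part of the example above.

```r
# Hypothetical input vector, used only for this illustration
fruit <- c("pear", "apple", "pear", "fig")
f <- factor(fruit)

# With no levels argument, the levels are the distinct values in sorted order
default_levels <- levels(f)
print(default_levels)

# The same values can be computed by hand
by_hand <- sort(unique(fruit))
print(identical(default_levels, by_hand))
```

Because the order comes from sorting, it can depend on the locale for strings that mix case or contain accented characters.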
str(x)
levels(x)
table(x)
What does the levels() function return?
The levels() function returns a character vector containing only the distinct levels of the factor, not the individual values.
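The relationship between a factor's values and its levels can be sketched as follows. The variable names lv and codes are chosen for this illustration; the data is the same vector used above.

```r
x <- factor(c("b", "c", "b", "a", "c", "c"))

# levels() gives the distinct categories as a character vector
lv <- levels(x)
print(lv)

# Internally, each element of x is an integer index into that vector
codes <- as.integer(x)
print(codes)

# Indexing the levels by the codes recovers the original labels
labels_again <- lv[codes]
print(labels_again)
```

This is why levels(x) has three entries even though x has six elements: the levels describe the categories, while the integer codes record which category each element belongs to.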
What does the table() function return?
The table() function returns a table summarizing the factor. Calling table() on x printed the name of the variable, then the levels of x, and underneath each level the number of values in x that belong to it. In the example above, there is one instance of level "a", two instances of level "b", and three instances of level "c".
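A short sketch of working with the result of table() is shown below. The second factor, y, is a made-up example added to show how a level with no observations is reported.

```r
x <- factor(c("b", "c", "b", "a", "c", "c"))

# Count how many elements fall into each level
counts <- table(x)
print(counts)

# The counts can be extracted as a plain integer vector, named by level
count_values <- as.vector(counts)
count_names <- names(counts)

# Hypothetical factor with declared levels that never occur in the data
y <- factor(c("b", "b"), levels = c("a", "b", "c"))
y_counts <- as.vector(table(y))
print(table(y))
```

Unused levels still appear in the table with a count of zero, because table() counts against the factor's levels rather than against the values that happen to be present.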
If the levels of our factor need to be in a particular order, we can use the factor() argument levels to define the order, and set the argument ordered to TRUE:
x <- c("b", "a", "b", "c", "a", "a")
x <- factor(x, levels = c("c", "b", "a"), ordered = TRUE)
str(x)
levels(x)
table(x)
Now R returns the levels in the order specified by the vector given to the levels argument. The < symbols in the printed output of x and str(x) indicate that these levels are ordered, and str(x) reports that the object is an ordered factor.
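One practical consequence of an ordered factor is that comparisons respect the declared order. The sketch below reuses the factor defined above; the names above_b and largest are chosen for this illustration.

```r
x <- factor(c("b", "a", "b", "c", "a", "a"),
            levels = c("c", "b", "a"), ordered = TRUE)

# Confirm that the factor is ordered
print(is.ordered(x))

# Comparisons follow the declared order c < b < a, not alphabetical order
above_b <- x > "b"
print(above_b)

# max() returns the highest level present, here "a"
largest <- as.character(max(x))
print(largest)

# sort() arranges the values by level order
print(sort(x))
```

On an unordered factor, the same comparison returns NA with a warning, so setting ordered = TRUE matters whenever the categories have a meaningful rank.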