Vector Types and Declarations in R
Let's dive into how R stores and views data with its most primitive data type - vectors.
A vector holds values of a single type.
No declarations needed!
In statically typed languages such as Java or C, you must declare the type of each variable. R, however, is dynamically typed, meaning you may assign a value to a variable without declaring its type.
As shown below, there is no need to declare a vector type for either a string or numeric vector.
> x <- 4 # A numeric vector
> s <- "hello world!" # A string vector
To check the type of a vector, you can use the mode() function.
> mode(x)
[1] "numeric"
> mode(s)
[1] "character"
Vectors are able to take on the values NA and NULL. Both indicate that a value is missing, but there is a clear distinction between the two.
NA indicates that the data could have some value, which is unknown.
NULL, on the other hand, indicates that the data simply doesn't exist.
For example, a sample subject could have a variable "Gender" set to NA, meaning that a value is present, but unknown. However, if there were a follow-up variable such as "Son's name" and the subject has no children, the value here can be NULL, indicating that it simply does not exist.
Another important distinction is that some functions cannot compute a result when the vector contains an NA value. If the value is NULL, however, R treats it as if it didn't exist in the vector.
> x <- c(1,2,3,NA,5,6)
> mean(x)
[1] NA
> x <- c(1,2,3,NULL,5,6)
> mean(x)
[1] 3.4
Neither value changes the type of a vector. NULL is dropped entirely, and NA takes on the type of the other elements in the vector.
> x <- c("hello", "hi", NULL)
> mode(x)
[1] "character"
> y <- c(1, 2, 3, NA)
> mode(y)
[1] "numeric"
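The difference between NA and NULL can be checked directly. The following is a minimal sketch using base R only; the variable names are illustrative and not part of the original lesson.

```r
# A numeric vector with one unknown (NA) observation
x <- c(1, 2, 3, NA, 5, 6)

# is.na() flags the unknown positions
na_flags <- is.na(x)

# na.rm = TRUE drops NA values before the mean is computed
mean_known <- mean(x, na.rm = TRUE)

# NA occupies a slot in the vector, NULL does not
len_with_na   <- length(c(1, NA, 3))
len_with_null <- length(c(1, NULL, 3))
```

Passing na.rm = TRUE is usually the simplest way to get a summary statistic from data that contains unknown values.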
Creating a multi-valued vector
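To instantiate an empty multi-valued vector, you can use the vector() function. By default it creates a logical vector, so every slot starts out as FALSE.
> x <- vector(length=5)
> x
[1] FALSE FALSE FALSE FALSE FALSE
To check the length of any existing vector, use the length() function.
> length(x)
[1] 5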
Instead of instantiating an empty vector, you can jump straight to assigning values to each of its slots. Simply use the c() function (short for concatenation).
> v <- c("abc", 123)
> v
[1] "abc" "123"
Notice one thing here - all values are stored as one type. Thus, even though we supplied the numeric value 123, R converted it to the character string "123". This brings us to the point that all elements of an R vector must share a single type.
> mode(v)
[1] "character"
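The conversion follows a fixed order: logical values can become numeric, and anything can become character. Here is a short sketch of that behavior, with illustrative variable names.

```r
# Logical and numeric values combine into a numeric vector (TRUE becomes 1)
a <- c(TRUE, 2, 3)
mode_a <- mode(a)

# Adding a single character value turns every element into a string
b <- c(TRUE, 2, "three")
mode_b <- mode(b)

# as.numeric() converts strings back to numbers explicitly
n <- as.numeric(c("1", "2", "3"))
```

Because the conversion is silent, checking mode() after building a vector from mixed input is a cheap safeguard.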
Accessing vectors by index
Each element in a vector may be accessed by its index value. Indices start at 1, unlike most programming languages, where the first index is 0.
> s <- c(1,2,3,4)
> s[1]
[1] 1
> s[3]
[1] 3
> s[4] <- 20
> s
[1]  1  2  3 20
> s[5]
[1] NA
> s[5] <- 24
> s
[1]  1  2  3 20 24
> s[8] <- 32
> s
[1]  1  2  3 20 24 NA NA 32
Notice that R won't raise an error when we read or assign past the end of the vector. Reading returns NA, and assigning extends the vector, filling any gap with NA. This could be either a good thing or a bad thing, depending on whether you know exactly what you're doing.
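When silent growth is not what you want, one option is to check the index against length() before assigning. The sketch below is only one possible guard; the helper name set_checked is hypothetical and not a base R function.

```r
# Hypothetical helper: assign only when the index is within bounds
set_checked <- function(vec, i, value) {
  stopifnot(i >= 1, i <= length(vec))
  vec[i] <- value
  vec
}

s <- c(1, 2, 3, 4)
s <- set_checked(s, 4, 20)   # within bounds, so the assignment goes through

# An out-of-bounds index now raises an error instead of padding with NA
result <- tryCatch(set_checked(s, 8, 32), error = function(e) "out of bounds")
```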
Specifying multiple indices
You can also pull out specific indices by using a vector. A partial listing of a vector is known as a subvector.
> x <- c(1,2,3,4,5,6)
> x[c(1,3,5)]
[1] 1 3 5
Note that you can index a value multiple times.
> x[c(1,1,1,1)]
[1] 1 1 1 1
To pull out every item except specific ones, use negative indices.
> x <- c(1,2,3,4,5,6)
> x[c(-3,-4)]
[1] 1 2 5 6
Naming Vector Elements
To name the vector elements, we can use the names() function.
> x <- c(97, 84, 85)
> names(x) <- c("Sarah", "Mickey", "Jessica")
> x
  Sarah  Mickey Jessica
     97      84      85
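Once elements are named, they can be looked up by name as well as by position. A brief sketch, reusing the names from the example above (the variable names scores and scores2 are illustrative):

```r
scores <- c(97, 84, 85)
names(scores) <- c("Sarah", "Mickey", "Jessica")

# Index by a single name or by a vector of names
mickey <- scores["Mickey"]
two    <- scores[c("Sarah", "Jessica")]

# unname() strips the label and leaves the bare value
mickey_value <- unname(mickey)

# Names can also be supplied when the vector is created
scores2 <- c(Sarah = 97, Mickey = 84, Jessica = 85)
```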
Sequences and Repetitions in R seq(), rep()
To avoid tediously filling in the elements of a vector by hand, we may use the seq() and rep() functions.
We can create a vector with a known pattern or sequence with either the seq() command or : (colon) notation.
The : operator generates a sequence of consecutive integer values.
> x <- c(1:9)
> x
[1] 1 2 3 4 5 6 7 8 9
To get more specific, we can use the seq() function.
> x <- seq(from=15, to=45, by=3)
> x
 [1] 15 18 21 24 27 30 33 36 39 42 45
Notice how the values from and to are inclusive.
We can also use the length parameter instead, which splits the range into that many equally spaced values.
> x <- seq(from=1.1, to=3, length=20)
> x
 [1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1
[12] 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0
> # Every tenth number up to 100
> 1:10*10
 [1]  10  20  30  40  50  60  70  80  90 100
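A few related details are easy to trip over: the colon operator binds more tightly than arithmetic, sequences can count downward, and seq_len() is a safer choice than 1:n when n might be zero. The sketch below illustrates each with base R calls only; variable names are illustrative.

```r
# ':' is evaluated before '*', so these two differ
tens    <- 1:10 * 10      # (1:10) * 10
hundred <- 1:(10 * 10)    # 1, 2, ..., 100

# A negative 'by' counts downward
down <- seq(10, 1, by = -3)

# length.out is the full name of the 'length' argument
quarters <- seq(0, 1, length.out = 5)

# seq_len(0) is empty, whereas 1:0 counts down from 1 to 0
empty     <- seq_len(0)
not_empty <- 1:0
```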
Remember that you can open the manual page for any function by typing ? followed by the function name (for example, ?seq).
Indexing with Sequences
Indexing can be performed not just to return one value, but multiple. We can do this using the : notation or the seq() function above.
> x <- 1:20
> # Pull out just the first five elements
> x[1:5]
[1] 1 2 3 4 5
> # Pull out every third element
> x[seq(1,20,3)]
[1]  1  4  7 10 13 16 19
R also allows you to easily create vectors containing repetitions with the rep() function.
> x <- rep(c("hello there"), 4)
> x
[1] "hello there" "hello there" "hello there" "hello there"
The first parameter is the constant to be repeated, while the second parameter is the number of times.
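rep() also accepts a multi-element vector and a few named arguments that control how the repetition is laid out. A minimal sketch using the times, each, and length.out arguments:

```r
# Repeat the whole vector twice
whole <- rep(c(1, 2, 3), times = 2)

# Repeat each element twice before moving on
each_twice <- rep(c(1, 2, 3), each = 2)

# A vector of counts gives a separate count per element
per_element <- rep(c("a", "b"), times = c(3, 1))

# Repeat until a target length is reached
to_length <- rep(c(1, 2), length.out = 5)
```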
Vector Arithmetic and Recycling
Vectors can be combined with simple arithmetic operators (+, -, *, /). Let's first look at addition, then discuss a caveat of vector arithmetic.
Addition and Subtraction
You can add or subtract the corresponding elements of two or more vectors of the same length together.
> c(1,2,3) + c(99,98,97)
[1] 100 100 100
> c(1,2,3) + c(4,5,6)
[1] 5 7 9
> c(1,2,3) - c(1,1,1)
[1] 0 1 2
But what would happen if all the vectors weren't of the same length? Instead of erroring out, R performs recycling.
Recycling occurs when vector arithmetic is performed on vectors of different lengths. R takes the shorter vector and repeats it until it is long enough to match the longer one.
> c(1,2,3,4,5,6) + c(1,3)
[1] 2 5 4 7 6 9
As you can see, the c(1,3) vector repeated itself to form c(1,3,1,3,1,3) so that it could match the length of the first operand.
If the length of the longer vector is not a multiple of the length of the shorter one, a warning message appears, but the operation still takes place.
> c(1,2,3,4,5) + c(1,3)
[1] 2 5 4 7 6
Warning message:
In c(1, 2, 3, 4, 5) + c(1, 3) :
  longer object length is not a multiple of shorter object length
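Recycling can be made explicit with rep_len(), which repeats a vector until it reaches a requested length. The sketch below shows that the explicit and implicit forms agree; the variable names are illustrative.

```r
long  <- c(1, 2, 3, 4, 5, 6)
short <- c(1, 3)

# Build the recycled operand by hand: 1 3 1 3 1 3
recycled <- rep_len(short, length(long))

implicit <- long + short
explicit <- long + recycled

# A single number is a length-one vector, so it is recycled to every element
doubled <- long * 2
```

Writing the recycled operand out by hand can be a useful check when a result looks surprising.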
Multiplication and Division
Multiplying or dividing vectors is similar to addition and subtraction in that corresponding elements are paired up and the operation is applied to each pair. When the lengths differ, recycling occurs.
> c(1,2,3) * c(0,3,6)
[1]  0  6 18
> c(1,3,5) * c(2,4)
[1]  2 12 10
Warning message:
In c(1, 3, 5) * c(2, 4) :
  longer object length is not a multiple of shorter object length
Operators are mere functions
One small detail to notice is that these common arithmetic operators are actually functions. Thus, they can be called with ordinary function notation.
> "*"(5,6)
[1] 30
We can also use the modulo operator, %%, which outputs the remainder after division of two numbers.
> c(55,54,53) %% c(3)
[1] 1 0 2
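Because operators are functions, they can be passed to other functions. The following sketch uses Reduce() from base R, which folds a two-argument function over a vector; backticks are one way to refer to an operator by name.

```r
# Call operators with function notation
five      <- "+"(2, 3)
remainder <- `%%`(17, 5)

# Pass an operator as an argument: ((((1 + 2) + 3) + 4) + 5)
total   <- Reduce(`+`, 1:5)
product <- Reduce(`*`, 1:5)
```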
Advanced Linear Algebra Operations
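You can also apply linear algebra operations to your vectors in R. To calculate the inner (dot) product of two vectors, use the crossprod() function, which computes t(x) %*% y. Despite its name, it does not compute the three-dimensional vector cross product.
> crossprod(1:3, 4:6)
     [,1]
[1,]   32
You'll notice that the return value isn't a new vector, but a 1-by-1 matrix. We'll look at matrices in the next lesson.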
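For two plain vectors, crossprod(x, y) amounts to the sum of the element-wise products. The sketch below compares the two and shows drop(), which discards the matrix dimensions when a plain number is wanted.

```r
x <- 1:3
y <- 4:6

# Matrix result with dimensions 1 x 1
m <- crossprod(x, y)

# The same number computed element by element: 1*4 + 2*5 + 3*6
by_hand <- sum(x * y)

# drop() removes the dimensions, leaving an ordinary length-one vector
plain <- drop(m)
```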
Filtering Vectors all(), any(), which(), subset()
When working with vectors, you'll often need to pull out the values that meet certain criteria. In R, this process is known as filtering, and the language provides several functions to help you extract subsets of data.
The any() function returns TRUE or FALSE, depending on whether any of the elements meet the criteria. TRUE and FALSE are of type logical.
> x <- 1:100
> any(x > 101)
[1] FALSE
> any(x == 2)
[1] TRUE
> any(x <= 50)
[1] TRUE
On the flip side, we can use the all() function to test whether all values meet certain criteria.
> x <- 1:100
> all(x > 40)
[1] FALSE
> all(x > 0)
[1] TRUE
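Both functions have to deal with NA when the vector contains unknown values. The sketch below shows the cases where the answer is still determined and where it is not; na.rm = TRUE tells either function to ignore NA.

```r
# One TRUE is enough for any(), regardless of the unknown value
any_true <- any(c(TRUE, NA))

# With no TRUE present, the unknown value could change the answer
any_unknown <- any(c(FALSE, NA))

# One FALSE is enough to settle all()
all_false <- all(c(FALSE, NA))

# Ignore the unknown values explicitly
all_known <- all(c(TRUE, NA), na.rm = TRUE)
```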
With vectors, we may run comparison operations to return a vector containing logical values. For example:
> x <- c(1,2,3,4,5,6)
> x > 3
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
As you can see, we are returned a logical vector containing TRUE and FALSE values, depending on how that positional element was evaluated.
How is this useful? We can use these resulting logical vectors to pull out subvectors. Let's say we only want to pull out odd values - we can write:
> x <- c(12,423,52,21,324)
> x[x %% 2 == 1]
[1] 423  21
x %% 2 == 1 returns a logical vector. The elements at every position holding TRUE are then returned.
We can further use this feature to replace values that meet a certain criteria:
> x <- c(1,2,3,4,5,6)
> x[x*x>20] <- 1337
> x
[1]    1    2    3    4 1337 1337
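Conditions can be combined with the element-wise operators & (and), | (or), and ! (not). A short sketch with illustrative variable names:

```r
x <- c(12, 423, 52, 21, 324)

# Even values greater than 50
big_even <- x[x %% 2 == 0 & x > 50]

# Values below 20 or above 400
extremes <- x[x < 20 | x > 400]

# Everything that is not odd
not_odd <- x[!(x %% 2 == 1)]

# TRUE counts as 1, so sum() counts the matches
n_big <- sum(x > 50)
```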
Pulling out subvectors with subset()
In the methods mentioned above, NA values are included in the subvector, no matter the condition. Comparing NA with anything yields NA, and indexing with NA returns NA.
> x <- c(1,2,3,NA,5,6)
> x[x>2]
[1]  3 NA  5  6
When you need to exclude the NA values, you may use the subset() function.
> subset(x, x>2)
[1] 3 5 6
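subset() is not the only way to leave NA out. The sketch below shows two alternatives built from is.na() and which(), both of which give the same result here.

```r
x <- c(1, 2, 3, NA, 5, 6)

# Require the value to be known and to meet the condition
with_isna <- x[!is.na(x) & x > 2]

# which() returns only the positions that are TRUE, so NA is skipped
with_which <- x[which(x > 2)]

with_subset <- subset(x, x > 2)
```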
Pulling out indices with which()
If you need not the actual values but the indices at which a certain condition holds, use the which() function. It returns all the indices that match the condition.
> z <- c(1,2,3,4,5,6)
> which(z > 3)
[1] 4 5 6
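Base R also provides which.max() and which.min(), which return the position of the largest and smallest element. A brief sketch on unsorted data, where positions and values differ (variable names are illustrative):

```r
z <- c(10, 40, 20, 50, 30)

# Positions of the elements greater than 25
positions <- which(z > 25)

# The positions can be fed back in as indices to recover the values
values <- z[positions]

# Position of the largest and smallest element
largest  <- which.max(z)
smallest <- which.min(z)

# No match gives an empty result rather than an error
none <- which(z > 100)
```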