## Vector Types and Declarations in R

Let's dive into how R stores and views data through its most basic data structure: the vector.

### Types

Every element of a vector has the same type.

#### No declarations needed!

In statically typed languages such as Java or C, you must declare the type of each variable. R, by contrast, is dynamically typed, meaning you may assign a value to a variable without declaring its type.

As shown below, there is no need to declare a vector type for either a character or a numeric vector.

```
> x <- 4
# A numeric vector
> s <- "hello world!"
# A string vector
```

#### Checking Types

To check the type of a vector, you can use the `mode()` function.

```
> mode(x)
[1] "numeric"
> mode(s)
[1] "character"
```
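`mode()` reports a fairly coarse category. As a small aside, base R's `typeof()` gives a finer-grained answer about how the values are stored. The sketch below assumes nothing beyond base R; the variable `i` is introduced here purely for illustration.

```r
x <- 4               # numbers typed without a suffix are stored as doubles
i <- 4L              # the L suffix asks for an integer instead
s <- "hello world!"

mode(x)              # "numeric"
typeof(x)            # "double"
mode(i)              # "numeric"
typeof(i)            # "integer"
typeof(s)            # "character"
```

Both `x` and `i` have mode `"numeric"`, so `typeof()` is the one to reach for when the integer/double distinction matters.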

#### `NA` and `NULL`

R has two ways to mark missing data: `NA` and `NULL`. Both indicate that a value is missing, but there is a clear distinction between them. `NA` indicates that the data has some value, which is unknown. `NULL`, on the other hand, indicates that the data simply doesn't exist.

For example, a sample subject could have the variable "Gender" set to `NA`, meaning that a value exists but is unknown. However, for a follow-up variable such as "Son's name," when the subject has no children, the value can be `NULL`, indicating that it simply does not exist.

Another important distinction is how functions treat the two. Many functions, such as `mean()`, return `NA` when any element is `NA`. A `NULL` passed to `c()`, however, is dropped, so the resulting vector behaves as if that value had never been there.

```
> x <- c(1,2,3,NA,5,6)
> mean(x)
[1] NA
> x <- c(1,2,3,NULL,5,6)
> mean(x)
[1] 3.4
```
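When an `NA` should be skipped rather than propagated, many base R summary functions accept `na.rm = TRUE`. The following is a minimal sketch using only base R; the names `mean_known` and `y` are made up for the example.

```r
x <- c(1, 2, 3, NA, 5, 6)

is.na(x)                               # TRUE only at position 4
mean_known <- mean(x, na.rm = TRUE)    # drop NA before averaging
mean_known                             # 3.4

# NULL never occupies a slot: c() silently drops it.
y <- c(1, 2, 3, NULL, 5, 6)
length(y)                              # 5
```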

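An `NA` takes on the type of the vector that contains it. A `NULL` is dropped by `c()`, so `x` below has only two elements, and the out-of-range index `x[3]` returns an `NA` of the vector's type.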

```
> x <- c("hello", "hi", NULL)
> mode(x[3])
[1] "character"
> y <- c(1, 2, 3, NA)
> mode(y[4])
[1] "numeric"
```
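To see what is going on in the example above, it can help to inspect lengths and types directly. This is a small sketch using base R only.

```r
x <- c("hello", "hi", NULL)
length(x)            # 2, because NULL was dropped
is.na(x[3])          # TRUE: index 3 is past the end of x

typeof(NA)           # "logical", the type of a bare NA
typeof(c(1, NA))     # "double": NA is coerced to match its neighbours
typeof(c("a", NA))   # "character"

length(NULL)         # 0
is.null(NULL)        # TRUE
```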

### Creating a multi-valued vector

To instantiate an empty multi-valued vector, you can use the `vector()` function with a `length` argument. The default mode is logical, so every slot starts out as `FALSE`.

```
> x <- vector(length=5)
> x
[1] FALSE FALSE FALSE FALSE FALSE
```
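If a type other than logical is wanted, `vector()` also takes a mode as its first argument, and base R has shorthand constructors such as `numeric()` and `character()`. A brief sketch follows; the variable names are illustrative.

```r
v_logical <- vector(length = 3)         # FALSE FALSE FALSE
v_numeric <- vector("numeric", 3)       # 0 0 0
v_char    <- vector("character", 2)     # "" ""

# Shorthand constructors that build the same vectors
n  <- numeric(3)
ch <- character(2)

identical(v_numeric, n)    # TRUE
identical(v_char, ch)      # TRUE
```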

To check the length of any existing vector, use the `length()` function.

```
> length(x)
[1] 5
```

Instead of instantiating an empty vector, you can jump straight to assigning a value to each of its slots. Simply use the `c()` function (short for combine, or concatenate).

```
> v <- c("abc", 123)
> v
[1] "abc" "123"
```

Notice one thing here: all values are stored as one type. Even though we passed in the number `123`, R converts it to the character string `"123"`. This conversion is called coercion, and it happens because an R vector can hold only one type.

```
> mode(v)
[1] "character"
```
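When types are mixed inside `c()`, R picks the most general one: logical gives way to integer, integer to double, and everything to character. The sketch below illustrates this and shows one way to convert back explicitly with `as.numeric()`; the name `num` is an assumption for the example.

```r
typeof(c(TRUE, 2L))       # "integer": TRUE becomes 1L
typeof(c(1L, 2.5))        # "double"
typeof(c(1.5, "a"))       # "character"

v <- c("abc", 123)
num <- as.numeric(v[2])   # convert "123" back to a number explicitly
num + 1                   # 124
```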

### Accessing vectors by index

Each element in a vector may be accessed by its index. Indices start at 1, unlike in most programming languages, where the first index is 0.

```
> s <- c(1,2,3,4)
> s[1]
[1] 1
> s[3]
[1] 3
> s[4] <- 20
> s
[1]  1  2  3 20
> s[5]
[1] NA
> s[5] <- 24
> s
[1]  1  2  3 20 24
> s[8] <- 32
> s
[1]  1  2  3 20 24 NA NA 32
```

Notice that R doesn't raise an error when we assign to an index beyond the end of the vector. Instead, it extends the vector and fills any gap with `NA`. This can be either a good thing or a bad thing, depending on whether you know exactly what you're doing.
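If silent growth is not what you want, one option is to check the index before assigning. The helper `set_at()` below is hypothetical (it is not part of base R) and is only a sketch of the idea.

```r
# set_at() is a hypothetical helper, not a base R function.
set_at <- function(v, i, value) {
  stopifnot(i >= 1, i <= length(v))   # refuse to grow the vector
  v[i] <- value
  v
}

s <- c(1, 2, 3, 4)
s <- set_at(s, 4, 20)                 # fine: index 4 exists

# Index 8 is out of range, so the guard raises an error.
grew <- tryCatch(set_at(s, 8, 32), error = function(e) "out of range")
grew                                  # "out of range"
```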

#### Specifying multiple indices

You can also pull out specific elements by indexing with a vector of indices. A partial listing of a vector is known as a subvector.

```
> x <- c(1,2,3,4,5,6)
> x[c(1,3,5)]
[1] 1 3 5
```

Note that you can index a value multiple times.

```
> x[c(1,1,1,1)]
[1] 1 1 1 1
```

#### Excluding items

To pull out all items *except* specific ones, use negative indices.

```
> x <- c(1,2,3,4,5,6)
> x[c(-3,-4)]
[1] 1 2 5 6
```
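A few more patterns with negative indices, sketched with base R only. The last lines rely on the fact that R rejects a mix of positive and negative indices; `mixed` is an illustrative name.

```r
x <- c(1, 2, 3, 4, 5, 6)

x[-1]            # drop the first element: 2 3 4 5 6
x[-(1:2)]        # drop the first two:     3 4 5 6
x[-length(x)]    # drop the last element:  1 2 3 4 5

# Positive and negative indices cannot be mixed in one call.
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")
mixed            # "error"
```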

### Naming Vector Elements

To name the vector elements, we can use the `names()` function:

```
> x <- c(97, 84, 85)
> names(x) <- c("Sarah", "Mickey", "Jessica")
> x
  Sarah  Mickey Jessica 
     97      84      85 
```
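Once elements are named, the names can be used as indices. The sketch below also shows that names can be given directly inside `c()`; the vector `scores` is made up for the example.

```r
scores <- c(Sarah = 97, Mickey = 84, Jessica = 85)

names(scores)                          # "Sarah" "Mickey" "Jessica"
scores["Mickey"]                       # element with its name attached
scores[["Mickey"]]                     # bare value: 84
picked <- scores[c("Sarah", "Jessica")]   # several names at once
```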

## Sequences and Repetitions in R: `seq()`, `rep()`

To avoid tediously filling in the elements of a vector by hand, we can use the `seq()` and `rep()` functions.

### Sequences

We can create a vector with a known pattern or sequence with either the `seq()` command or `:` notation.

#### 1) `:` (colon) notation

The `:` operator produces a sequence of values in steps of 1, most often integers.

```
> x <- c(1:9)
> x
[1] 1 2 3 4 5 6 7 8 9
```

#### 2) `seq()` function

For more control over the start, end, and step size, we can use the `seq()` function.

```
> x <- seq(from=15, to=45, by=3)
> x
[1] 15 18 21 24 27 30 33 36 39 42 45
```

Notice how the values `from` and `to` are inclusive. (`to` appears in the result only when the step lands on it exactly.)

We can also use the `length` parameter (short for `length.out`) instead, which splits the range into that many equally spaced values.

```
> x <- seq(from=1.1, to=3, length=20)
> x
 [1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1
[12] 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0
# Every tenth number up to 100
> 1:10*10
[1] 10 20 30 40 50 60 70 80 90 100
```

Remember that you can open the manual page for any function by typing `?` followed by its name, for example `?seq`.
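A few related variations are sketched below using base R's `seq()`, `seq_along()`, and `seq_len()`. The variable names are illustrative only.

```r
s1 <- seq(1, 10, by = 4)            # 1 5 9: the step never lands on 10
s2 <- seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00

v <- c("a", "b", "c")
s3 <- seq_along(v)                  # 1 2 3: one index per element of v
s4 <- seq_len(0)                    # integer(0): an empty sequence
s5 <- 1:0                           # 1 0: the colon counts down instead
```

The last two lines show why `seq_len(n)` is often preferred over `1:n` in loops: when `n` is 0, `seq_len()` yields nothing, while `1:0` yields two values.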

#### Indexing with Sequences

Indexing can return not just one value, but several. We can do this with the `:` notation and the `seq()` function described above.

```
> x <- 1:20
# Pull out just the first five elements
> x[1:5]
[1] 1 2 3 4 5
# Pull out every third element
> x[seq(1,20,3)]
[1]  1  4  7 10 13 16 19
```

### Repetitions

R also allows you to easily create vectors containing repetitions with the `rep()` function.

```
> x <- rep(c("hello there"), 4)
> x
[1] "hello there" "hello there" "hello there"
[4] "hello there"
```

The first parameter is the value (or vector) to be repeated, while the second parameter is the number of times to repeat it.
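`rep()` has a few other arguments worth knowing. The sketch below uses the `times`, `each`, and `length.out` arguments from base R; the result names are illustrative.

```r
r1 <- rep(c("a", "b"), times = 2)       # "a" "b" "a" "b"
r2 <- rep(c("a", "b"), each = 2)        # "a" "a" "b" "b"
r3 <- rep(1:3, times = c(3, 2, 1))      # 1 1 1 2 2 3
r4 <- rep(1:2, length.out = 5)          # 1 2 1 2 1
```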

## Vector Arithmetic and Recycling

Vectors support the simple arithmetic operators (+, -, *, /), which are applied element by element. Let's first look at addition, then discuss a caveat of vector arithmetic.

### Addition and Subtraction

You can add or subtract the corresponding elements of two or more vectors of the same length together.

```
> c(1,2,3) + c(99,98,97)
[1] 100 100 100
> c(1,2,3) + c(4,5,6)
[1] 5 7 9
> c(1,2,3) - c(1,1,1)
[1] 0 1 2
```

But what would happen if all the vectors weren't of the same length? Instead of erroring out, R performs recycling.

### Recycling

Recycling occurs when vector arithmetic is performed on vectors of different lengths. R takes the shorter vector and repeats it until it is long enough to match the longer one.

```
> c(1,2,3,4,5,6) + c(1,3)
[1] 2 5 4 7 6 9
```

As you can see, the `c(1,3)` vector repeated itself to form `c(1,3,1,3,1,3)` so that it could match the longer vector element by element.

If the length of the longer vector is not a multiple of the length of the shorter one, then a warning message appears, but the operation still takes place.

```
> c(1,2,3,4,5) + c(1,3)
[1] 2 5 4 7 6
Warning message:
In c(1, 2, 3, 4, 5) + c(1, 3) :
longer object length is not a multiple of shorter object length
```
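Recycling can be made explicit with `rep()` and its `length.out` argument, which is one way to convince yourself of what R is doing. This is a sketch with illustrative names (`long`, `short`, `recycled`).

```r
long  <- c(1, 2, 3, 4, 5, 6)
short <- c(1, 3)

recycled <- rep(short, length.out = length(long))   # 1 3 1 3 1 3
manual   <- long + recycled
auto     <- long + short

identical(manual, auto)    # TRUE
scaled <- long * 2         # a single number is recycled too: 2 4 6 8 10 12
```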

### Multiplication and Division

Multiplying or dividing vectors is similar to addition and subtraction in that each corresponding element matches up and a product is formed. When the sizes differ, recycling occurs.

```
> c(1,2,3) * c(0,3,6)
[1] 0 6 18
> c(1,3,5) * c(2,4)
[1]  2 12 10
Warning message:
In c(1, 3, 5) * c(2, 4) :
  longer object length is not a multiple of shorter object length
```

#### Operators are mere functions

One small detail to notice is that these common arithmetic operators are actually functions. Thus, they can be called using ordinary function notation.

```
> "*"(5,6)
[1] 30
```
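Because operators are functions, they can also be passed to other functions. The sketch below uses backticks to refer to the operators and base R's `Reduce()` and `sapply()`; `total` and `doubled` are illustrative names.

```r
`+`(1, 2)                                # 3, same as 1 + 2

total <- Reduce(`+`, 1:4)                # ((1 + 2) + 3) + 4 = 10
doubled <- sapply(c(1, 2, 3), `*`, 2)    # multiply each element by 2: 2 4 6
```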

### Modulo

We can also apply the modulo operator `%%`, which returns the remainder after division of two numbers.

```
> c(55,54,53) %% c(3)
[1] 1 0 2
```
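The companion to `%%` is integer division, `%/%`. The sketch below shows how the two fit together and how `%%` behaves with a negative left-hand side; `rebuilt` is an illustrative name.

```r
7 %/% 2         # 3: integer division
7 %% 2          # 1: remainder
(-7) %% 3       # 2: the result takes the sign of the divisor
(-7) %/% 3      # -3: rounds toward negative infinity

# Quotient and remainder together reconstruct the original values.
x <- c(55, 54, 53)
y <- 3
rebuilt <- (x %/% y) * y + x %% y
```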

### Advanced Linear Algebra Operations

You can also apply linear algebra to your vectors in R. To calculate the inner (dot) product of two vectors, use `crossprod()`. Despite its name, it computes `t(x) %*% y`, not the geometric cross product:

```
> crossprod(1:3, 4:6)
     [,1]
[1,]   32
```

You'll notice that the return type isn't a new vector, but instead a 1-by-1 matrix. We'll look at matrices in the next lesson.
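For two plain vectors, the value returned by `crossprod()` is the sum of the element-wise products, that is, the dot product. The sketch below checks this and, for contrast, computes the geometric cross product of two 3-element vectors. Base R has no built-in for the latter, so `cross3()` is a hypothetical helper written for this example.

```r
a <- 1:3
b <- 4:6

dot_manual <- sum(a * b)               # 1*4 + 2*5 + 3*6 = 32
dot_matrix <- drop(crossprod(a, b))    # same value, matrix wrapper removed

# cross3() is a hypothetical helper for 3-element vectors, not base R.
cross3 <- function(u, v) {
  c(u[2] * v[3] - u[3] * v[2],
    u[3] * v[1] - u[1] * v[3],
    u[1] * v[2] - u[2] * v[1])
}
cp <- cross3(a, b)                     # -3 6 -3
```

A quick sanity check on `cp`: a cross product is perpendicular to both inputs, so `sum(cp * a)` and `sum(cp * b)` are both 0.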

## Filtering Vectors: `all()`, `any()`, `which()`, `subset()`

When working with vectors, you'll often need to pull out the values that meet some criterion. In R, this process is known as filtering, and the language provides several functions to help you extract subsets of data.

### The `any()` Function

The `any()` function returns `TRUE` if at least one element meets the criterion and `FALSE` otherwise. `TRUE` and `FALSE` are of type `logical`.

```
> x <- 1:100
> any(x > 101)
[1] FALSE
> any(x == 2)
[1] TRUE
> any(x <= 50)
[1] TRUE
```

### The `all()` Function

On the flip side, we can use the `all()` function to test whether *all* values meet a certain criterion.

```
> x <- 1:100
> all(x > 40)
[1] FALSE
> all(x > 0)
[1] TRUE
```
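Two edge cases are worth keeping in mind with `any()` and `all()`: missing values and empty vectors. The sketch below uses base R only; `na.rm = TRUE` tells both functions to ignore `NA`.

```r
any(c(FALSE, NA))                  # NA: the unknown value might be TRUE
any(c(TRUE, NA))                   # TRUE: one TRUE is enough
all(c(TRUE, NA))                   # NA: the unknown value might be FALSE
all(c(FALSE, NA))                  # FALSE: one FALSE is enough
any(c(FALSE, NA), na.rm = TRUE)    # FALSE once NA is ignored

any(logical(0))                    # FALSE for an empty vector
all(logical(0))                    # TRUE for an empty vector
```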

### Comparison Operations

With vectors, we may run comparison operations that return a vector of logical values. For example:

```
> x <- c(1,2,3,4,5,6)
> x > 3
[1] FALSE FALSE FALSE TRUE TRUE TRUE
```

As you can see, we are returned a logical vector containing TRUE and FALSE values, depending on how that positional element was evaluated.

How is this useful? We can use these resulting logical vectors to pull out subvectors. Let's say we only want to pull out odd values - we can write:

```
> x <- c(12,423,52,21,324)
> x[x %% 2 == 1]
[1] 423 21
```

The expression `x %% 2 == 1` returns a logical vector. Only the elements at positions where it is `TRUE` are returned.

We can further use this feature to replace values that meet a certain criterion:

```
> x <- c(1,2,3,4,5,6)
> x[x*x>20] <- 1337
> x
[1]    1    2    3    4 1337 1337
```
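Conditions can be combined with the element-wise logical operators `&` (and), `|` (or), and `!` (not). A short sketch with illustrative result names:

```r
x <- c(1, 2, 3, 4, 5, 6)

big_even <- x[x > 2 & x %% 2 == 0]    # greater than 2 AND even: 4 6
extremes <- x[x < 2 | x > 5]          # less than 2 OR greater than 5: 1 6
odd      <- x[!(x %% 2 == 0)]         # NOT even: 1 3 5
```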

### Pulling out subvectors with `subset()`

With the methods mentioned above, `NA` values end up in the subvector regardless of the condition, because comparing `NA` with anything yields `NA`, and indexing with `NA` returns `NA`.

```
> x <- c(1,2,3,NA,5,6)
> x[x>2]
[1] 3 NA 5 6
```

When you need to exclude the `NA` values, you may use the `subset()` function.

```
> subset(x, x>2)
[1] 3 5 6
```
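`subset()` is not the only way to drop the `NA`. Two alternatives, sketched here with base R, are to test for `NA` explicitly with `is.na()` or to index through `which()`, which ignores `NA` positions. The result names are illustrative.

```r
x <- c(1, 2, 3, NA, 5, 6)

by_isna   <- x[!is.na(x) & x > 2]    # 3 5 6
by_which  <- x[which(x > 2)]         # 3 5 6
by_subset <- subset(x, x > 2)        # 3 5 6
```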

### Pulling out indices with `which()`

If you need not the actual values but just the indices at which a certain condition holds, use the `which()` function. It returns all the indices that match the criterion.

```
> z <- c(1,2,3,4,5,6)
> which(z > 3)
[1] 4 5 6
```
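Base R also provides `which.max()` and `which.min()` for the common case of locating an extreme value. The sketch below uses a made-up vector `z` with a repeated maximum to show the difference from `which()`.

```r
z <- c(3, 9, 2, 9)

first_max <- which.max(z)         # 2: index of the first maximum only
first_min <- which.min(z)         # 3
all_max   <- which(z == max(z))   # 2 4: every position holding the maximum
none      <- which(z > 100)       # integer(0): no element matches
```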