Tuesday, December 21, 2010

Set Operations in R

R can perform common set operations on vectors, such as the union, intersection, and asymmetric difference of two sets. The following functions are available in R for set operations.

Function     Usage                 Definition
union        union(x, y)           Union of sets x and y
intersect    intersect(x, y)       Intersection of sets x and y
setdiff      setdiff(x, y)         Asymmetric difference of sets x and y (elements in x but not in y)
setequal     setequal(x, y)        Tests whether sets x and y have the same elements
is.element   is.element(el, set)   Tests whether el is an element of set

Examples:

> x <- c(sort(sample(1:20, 9)),NA)
> y <- c(sort(sample(3:23, 7)),NA)
> x
[1]  1  3  5  8 11 17 18 19 20 NA
> y
[1]  7 11 15 16 17 19 22 NA
> union(x, y)
[1]  1  3  5  8 11 17 18 19 20 NA  7 15 16 22
> intersect(x, y)
[1] 11 17 19 NA
> setdiff(x, y)
[1]  1  3  5  8 18 20
> setdiff(y, x)
[1]  7 15 16 22
> setequal(x, y)
[1] FALSE
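
Since setdiff is asymmetric, a symmetric difference (elements in exactly one of the two sets) has to be built from the other functions. The following is a minimal sketch; sym_diff is a hypothetical helper name, not a base R function, and the vectors are typed in by hand from the output above so the result does not depend on sample().

```r
# Fixed copies of the vectors printed above, so the result is reproducible
x <- c(1, 3, 5, 8, 11, 17, 18, 19, 20, NA)
y <- c(7, 11, 15, 16, 17, 19, 22, NA)

# sym_diff is a hypothetical helper (not part of base R):
# elements that are in a but not b, together with those in b but not a
sym_diff <- function(a, b) union(setdiff(a, b), setdiff(b, a))

sym_diff(x, y)
# [1]  1  3  5  8 18 20  7 15 16 22
```

NA is present in both vectors, so it is treated as a shared element and does not appear in the result.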

Note that each of union, intersect, setdiff and setequal will discard any duplicated values in the arguments. Look at the following example:

> x
[1]  1  3  5  8 11 17 18 19 20 NA
> x2 <- c(x, 1, 3, 5, 8)
> x2
[1]  1  3  5  8 11 17 18 19 20 NA  1  3  5  8
> setdiff(x, y)
[1]  1  3  5  8 18 20
> setdiff(x2, y)
[1]  1  3  5  8 18 20
> setequal(x, x2)
[1] TRUE

Although x and x2 have different lengths, they have the same UNIQUE elements, so setequal(x, x2) returns TRUE.
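
If duplicates do matter for a comparison, one possible approach is to compare the sorted vectors instead of using setequal. This is only a sketch with hand-typed copies of x and x2; na.last = TRUE is passed to sort() because sort() drops NA values by default.

```r
x  <- c(1, 3, 5, 8, 11, 17, 18, 19, 20, NA)
x2 <- c(x, 1, 3, 5, 8)

# setequal ignores both duplicates and ordering
setequal(x, x2)        # TRUE
setequal(x, rev(x))    # TRUE

# Comparing sorted vectors takes duplicates into account
identical(sort(x, na.last = TRUE), sort(x2, na.last = TRUE))   # FALSE

# A union of a vector with itself is just its unique elements
identical(union(x2, x2), unique(x2))                            # TRUE
```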

is.element(x, y) is equivalent to x %in% y, which was discussed in an earlier post. It returns a logical vector with the same length as x, indicating whether each element of x is an element of y.

> is.element(x, y)  # vector of length 10
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
> is.element(y, x)  # vector of length 8
[1] FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE
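
Because the result is a logical vector, it can be used directly for indexing. The sketch below checks the equivalence with %in% and contrasts indexing with intersect; the vector z is made up for illustration and is not part of the example above.

```r
x <- c(1, 3, 5, 8, 11, 17, 18, 19, 20, NA)
y <- c(7, 11, 15, 16, 17, 19, 22, NA)

# Both forms give the same logical vector
identical(is.element(x, y), x %in% y)   # TRUE

# z is a hypothetical vector containing a repeated value
z <- c(11, 11, 17, 2)

z[z %in% y]       # 11 11 17  (indexing keeps duplicates)
intersect(z, y)   # 11 17     (set operation drops duplicates)
```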
