Agile Java Man: R Tip of the Day

Monday, October 13, 2014

R Tip of the Day

If I'd known this a month ago, I'd have saved myself a lot of time. I was pulling GC data from a JVM's logs, but my pattern matching wasn't perfect, so I ended up with a lot of dirty data (NA in R).

To clean these from a vector, you can do something similar to this:

data <- c( 1, 2, 3, NA, 5, 6 ) # data with some dirty elements
badData <- is.na(data) # logical vector: TRUE where an element is NA
cleanData <- data[!badData] # keep only the elements that are not NA
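As a minimal sketch of the same idea, the intermediate vector can be skipped by indexing directly with the negated is.na result. Many base R summary functions also accept na.rm = TRUE, which ignores NA values without building a cleaned copy. The name meanSkippingNA is just an illustrative choice, not something from the original snippet.

```r
data <- c(1, 2, 3, NA, 5, 6)

# Filter in one step: keep the elements for which is.na() is FALSE
cleanData <- data[!is.na(data)]

# Or leave the vector alone and let the summary function skip the NAs
meanSkippingNA <- mean(data, na.rm = TRUE)
```

Filtering is the better fit when the cleaned vector is reused several times; na.rm is handy for a one-off summary.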

Similarly, if you have 2-dimensional (or higher) data, you can drop an entire tuple when even one of its elements is dirty. For example:

x <- c( 1, 2, 3, NA, 5, 6 )
y <- c( 10, 20, NA, 40, 50, 60 )
cleaned <- complete.cases(x, y)

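complete.cases returns a logical vector that is TRUE only at positions where neither x nor y is NA, so indexing with it drops every tuple where either x or y (or both) is dirty: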

> x[cleaned]
[1] 1 2 5 6
> y[cleaned]
[1] 10 20 50 60
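If the two vectors live in a data frame, the same filtering can be done on whole rows. The sketch below assumes a hypothetical data frame called samples holding the x and y values from above; complete.cases accepts a data frame directly, and na.omit is a base R shortcut that should keep the same rows.

```r
# Hypothetical data frame built from the x and y vectors used above
samples <- data.frame(
  x = c(1, 2, 3, NA, 5, 6),
  y = c(10, 20, NA, 40, 50, 60)
)

# TRUE for rows in which no column is NA
keep <- complete.cases(samples)

# Drop every row that has at least one NA
cleanRows <- samples[keep, ]

# na.omit does the filtering in one call
sameRows <- na.omit(samples)
```

Keeping the columns together in one data frame avoids having to index x and y separately with the same logical vector.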
