If I'd known this a month ago, I'd have saved myself a lot of time. I was pulling GC data from a JVM's logs, but my pattern matching wasn't perfect, so I ended up with a lot of dirty data (NA in R).
To clean these from a vector, you can do something similar to this:
data <- c( 1, 2, 3, NA, 5, 6 ) # data with some dirty elements
badData <- is.na(data) # TRUE where an element is NA, FALSE otherwise
cleanData <- data[!badData] # keep only the elements that are not NA
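If you don't need the intermediate logical vector, the same filter fits on one line, and base R's na.omit() does much the same job. A minimal sketch (the name alsoClean is just for illustration):

```r
data <- c(1, 2, 3, NA, 5, 6)

# One-step version of the is.na() filter
cleanData <- data[!is.na(data)]

# na.omit() also drops the NAs, but it attaches an "na.action" attribute
# recording which positions were removed; as.vector() strips that attribute
alsoClean <- as.vector(na.omit(data))
```

Both give the same five values. I tend to prefer the explicit is.na() form when I also want to know where the bad elements were.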
Similarly, if you have data in two (or more) dimensions, you can drop every tuple in which at least one element is dirty. For example:
x <- c( 1, 2, 3, NA, 5, 6 )
y <- c( 10, 20, NA, 40, 50, 60 )
cleaned <- complete.cases(x, y)
gives a logical vector that is TRUE only where both x and y are present, so indexing with it drops every position where either x or y (or both) is dirty:
> x[cleaned]
[1] 1 2 5 6
> y[cleaned]
[1] 10 20 50 60
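If the values live in a data frame rather than in separate vectors, complete.cases() can take the whole data frame and filter its rows in one go. A rough sketch, assuming the same x and y as above stored as columns (the names df, keep and cleanDf are made up for this example):

```r
# Hypothetical data frame holding the same x and y as columns
df <- data.frame(
  x = c(1, 2, 3, NA, 5, 6),
  y = c(10, 20, NA, 40, 50, 60)
)

# complete.cases() also accepts a data frame; TRUE marks rows with no NA
keep <- complete.cases(df)

# Keep only the complete rows, across all columns at once
cleanDf <- df[keep, ]
```

This keeps the columns lined up, which is easy to get wrong when indexing x and y separately. na.omit(df) should leave you with the same rows if you'd rather skip the logical vector.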