Friday, December 3, 2010

Running time in R

In R, proc.time reports how much real and CPU time (in seconds) the currently running R process has used so far. proc.time returns five elements for backwards compatibility, but its print method shows a named vector of length 3. The first two entries are the total user and system CPU times of the current R process and of any child processes on which it has waited, and the third entry is the ‘real’ elapsed time since the process was started. system.time(expr) times a valid R expression: it calls proc.time, evaluates expr, calls proc.time once more, and returns the difference between the two proc.time calls. For example,

> ptm <- proc.time()
> for (i in 1:10000) x <- rnorm(1000)
> proc.time() - ptm
   user  system elapsed
   2.10    0.01    2.14
> system.time(for (i in 1:10000) x <- rnorm(1000))
   user  system elapsed
   2.01    0.00    2.06 
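
The sequence described above (proc.time, evaluate, proc.time, subtract) can be sketched in a few lines. This is only a rough illustration, not the actual source of system.time; the helper name time_expr is made up for this post, and the loop is kept short so it finishes quickly.

```r
# Hypothetical helper (not the real system.time implementation):
# it only mirrors the proc.time / evaluate / proc.time sequence.
time_expr <- function(expr) {
  start <- proc.time()   # snapshot before evaluation
  expr                   # arguments are lazy, so expr is evaluated here
  proc.time() - start    # difference between the two snapshots
}

# proc.time() holds five values, although only three are printed
print(names(proc.time()))

elapsed <- time_expr(for (i in 1:100) x <- rnorm(1000))
print(elapsed)
```

Because R evaluates function arguments lazily, the loop only runs when expr is touched inside the helper, which is why the work falls between the two snapshots.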

The definitions of 'user' and 'system' time come from your operating system. Typically they read something like this:
The 'user time' is the CPU time charged for the execution of user instructions of the calling process. The 'system time' is the CPU time charged for execution by the system on behalf of the calling process.
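
To get a feel for the difference between CPU time and elapsed time, a small sketch like the following may help. It assumes that sleeping uses almost no CPU, which should hold on common systems; the sleep length and loop size are arbitrary choices for illustration.

```r
# Sleeping takes wall-clock time but very little CPU time
t_sleep <- system.time(Sys.sleep(0.5))

# Arithmetic in a loop keeps the CPU busy while it runs
t_busy <- system.time(for (i in 1:200000) sqrt(i))

print(t_sleep)
print(t_busy)
```

For the sleep, the elapsed entry should be about half a second while user and system stay close to zero; for the loop, user time should account for most of the elapsed time.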

proc.time and system.time can be used to compare the running speed of different methods that do the same job. For example, suppose we want to find the maximum of a vector of 10,000,000 randomly generated Uniform[0,1] values.

> x<-runif(10000000)
> system.time(max(x))
   user  system elapsed
   0.05    0.00    0.05
> pc <- proc.time()
> cmax <- x[1]
> for (i in 2:10000000)
+ {
+   if(x[i] > cmax) cmax <- x[i]
+ }
> proc.time() - pc
   user  system elapsed
  16.88    0.11   18.21

We can see that there is a huge difference in elapsed time between the two methods (0.05 seconds versus 18.21 seconds). The explicit loop is interpreted one iteration at a time, whereas max does its work in compiled code. Use R built-in functions whenever possible, and do not 'grow' data sets in loops or recursive function calls.
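
The advice about not growing data sets can be tried out with a sketch along these lines. The function names grow, prealloc and vectorized are invented for this illustration, and n is kept small so the script runs quickly; timings will differ from machine to machine.

```r
n <- 20000  # arbitrary size, small enough to run quickly

# Growing: c() builds a new, longer vector on every iteration
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating: the result vector is created once and filled in place
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorized: no explicit loop at all
vectorized <- function(n) seq_len(n)^2

print(system.time(a <- grow(n)))
print(system.time(b <- prealloc(n)))
print(system.time(d <- vectorized(n)))
```

All three functions return the same values, so any difference in the printed times comes from how the result is built rather than from what is computed.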
