# devtools::install_github("hadley/lineprof")
library(lineprof)

Profiling code helps locate bottlenecks, so that time and memory use can be optimized by removing them.

code = '
# read_delim: a simple delimited-file reader
# (the function name and the argument defaults are assumed)
read_delim <- function(file, header = TRUE, sep = ",") {
  # Determine number of fields by reading first line
  first <- scan(file, what = character(1), nlines = 1,
    sep = sep, quiet = TRUE)
  p <- length(first)

  # Load all fields as character vectors
  all <- scan(file, what = as.list(rep("character", p)),
    sep = sep, skip = if (header) 1 else 0, quiet = TRUE)

  # Convert from strings to appropriate types (never to factors)
  all[] <- lapply(all, type.convert, as.is = TRUE)

  # Set column names
  if (header) {
    names(all) <- first
  } else {
    names(all) <- paste0("V", seq_along(all))
  }

  # Convert list into data frame
  as.data.frame(all)
}
'

write(code, "source.R")
source("source.R")  # lineprof needs source references, so the code must be sourced from a file
library(ggplot2)
write.csv(diamonds, "diamonds.csv", row.names = FALSE)

# Profile a call to the function (this call is assumed) and print the result
l <- lineprof(read_delim("diamonds.csv"))
l
## Reducing depth to 2 (from 16)
##      time alloc release dups                                      ref
## 1   0.005 0.018   0.000    2                        "lazyLoadDBfetch"
## 2   0.001 0.003   0.000    0                                   "scan"
## 3   0.020 0.005   0.000   62                        c("scan", "file")
## 4   0.004 0.006   0.000    1                                   "scan"
## 5   0.001 0.003   0.000    1                       c("scan", "close")
## 6   0.001 0.007   0.000    0                                   "scan"
## 7   0.022 0.003   0.000   62                        c("scan", "file")
## 8   0.001 0.003   0.000    1                     c("scan", "as.list")
## 9   0.001 0.001   0.000    1                   c("scan", "identical")
## 10 10.991 2.359   0.890    0                                   "scan"
## 11  0.007 0.002   0.000    1                       c("scan", "close")
## 12  0.002 0.004   0.000    0                 c("lapply", "match.fun")
## 13  0.001 0.003   0.000    0                                 "lapply"
## 14  2.709 0.227   0.337   15                       c("lapply", "FUN")
## 15  0.001 0.001   0.000    0                                 "lapply"
## 16  3.323 0.344   0.000   18                       c("lapply", "FUN")
## 17  0.001 0.002   0.000    1                             character(0)
## 18  0.001 0.001   0.000    0                          "as.data.frame"
## 19  0.008 0.022   0.000    0    c("as.data.frame", "lazyLoadDBfetch")
## 20  0.361 0.931   0.524  294 c("as.data.frame", "as.data.frame.list")
## 21  0.001 0.000   0.000    0                             character(0)
##                                 src
## 2  scan
## 3  scan/file
## 4  scan
## 5  scan/close
## 6  scan
## 7  scan/file
## 8  scan/as.list
## 9  scan/identical
## 10 scan
## 11 scan/close
## 12 lapply/match.fun
## 13 lapply
## 14 lapply/FUN
## 15 lapply
## 16 lapply/FUN
## 17
## 18 as.data.frame
## 20 as.data.frame/as.data.frame.list
## 21
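
To see at a glance which call dominates, the times in the table above can be summed per top-level call. The following is only a sketch: the numbers are copied by hand from the `time` column, each row is labelled with the first element of its `ref` entry, and the names `prof` and `total_by_call` are made up for this illustration.

```r
# Times (in seconds) copied from the "time" column of the output above.
# Each row is labelled with the first element of its "ref" entry;
# rows whose ref is character(0) are labelled "(none)".
prof <- data.frame(
  call = c("lazyLoadDBfetch", rep("scan", 10), rep("lapply", 5),
           "(none)", rep("as.data.frame", 3), "(none)"),
  time = c(0.005, 0.001, 0.020, 0.004, 0.001, 0.001, 0.022, 0.001, 0.001,
           10.991, 0.007, 0.002, 0.001, 2.709, 0.001, 3.323, 0.001,
           0.001, 0.008, 0.361, 0.001),
  stringsAsFactors = FALSE
)

# Sum the time per top-level call and sort, largest first
total_by_call <- sort(
  vapply(split(prof$time, prof$call), sum, numeric(1)),
  decreasing = TRUE
)

# Share of the total profiled time, in percent
share <- round(100 * total_by_call / sum(total_by_call), 1)

print(data.frame(seconds = total_by_call, percent = share))
```

In this run the second `scan` call (row 10) accounts for most of the time, followed by the `lapply` over `type.convert`, so those are the lines worth optimizing first.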

A convenient way to explore the results is the interactive explorer, which is built on the shiny package:

library(shiny)
# opens a web page that shows your source code annotated with information about how long each line took to run
shine(l)
• The t column visualises how much time in seconds is spent on each line.

• The a column shows the memory (in megabytes) allocated by that line of code.

• The r column shows the memory (in megabytes) released by that line of code (this may vary between runs, since it depends on when the garbage collector runs).

• The d column shows the number of vector duplications that occurred. A vector duplication occurs when R copies a vector as a result of its copy-on-modify semantics.

To see the exact values, hover the mouse over the corresponding bar.
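
The duplications counted in the d column can also be observed directly in a small example. The sketch below uses base R's `tracemem()`, which only works when R was built with memory profiling support, so the call is guarded with `capabilities("profmem")`. The variable names are made up for this illustration.

```r
x <- c(1, 2, 3)
y <- x                      # no copy yet: x and y refer to the same vector

# tracemem() reports duplications of the traced object, but it is only
# available if R was built with memory profiling, so guard the call.
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(x))

y[1] <- 10                  # modifying y makes R duplicate the shared vector first

if (can_trace) untracemem(x)

print(x)                    # x is unchanged by the modification of y
print(y)
```

When tracing is available, the assignment to `y[1]` prints a `tracemem[...]` message. That message marks one duplication of the kind lineprof counts in the d column.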