Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OPTIMIZATION] Profiling of influx_query() #28

Closed
ckatsulis opened this issue Aug 20, 2017 · 8 comments
Closed

[OPTIMIZATION] Profiling of influx_query() #28

ckatsulis opened this issue Aug 20, 2017 · 8 comments

Comments

@ckatsulis
Copy link

ckatsulis commented Aug 20, 2017

Dominik,

First, a thanks for you effort and attention to this package. Hopefully you will be getting more and more help in the upcoming months as influxdb continues to gain ground.

Observation

One interesting observation with the package is the time it takes to pull down large data sets relative to the network activity on the querying host where the R call is made.

I have noticed large lags between when the data connection ceasing activity and when the R call eventually returns a result. As I'm on 155Mbits/sec download, I see my network connection peak at around 35-50Mbits for about 10-20 seconds, then nothing. I then believe it takes R several minutes to assemble what has been pulled. Is this consistent with your experience?

Investigation

I tried to get a profile using profvis and profr, however it causes my windows R session to crash. Are you able to try this on your end for windows?

However, I was able to get this to work for my linux machine. I wanted to give you one from about 100k+ lines, but the file size was around 6MB and takes 35 seconds on my little VM. This was the result from a call with around 10k lines:

image

The problem looks fairly well nested in some character / string handling. In my experience, it might benefit all if those calls could be replaced with optimized Rcpp equivalents. I don't know if you have a need for speed, but a change would drastically elevate R's competitiveness to python or the like in these situations.

profile = dget("profile.txt")
profile.txt

System Info

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] profvis_0.3.3     dygraphs_1.1.1.4  basePricer_0.1.0  profr_0.3.1       nnls_1.4          xts_0.10-1        zoo_1.8-0         influxdbr_0.11.18

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11    lattice_0.20-35 digest_0.6.12   R6_2.2.2        grid_3.4.0      plyr_1.8.4      magrittr_1.5    httr_1.3.0      stringi_1.1.5   curl_2.7        tools_3.4.0     stringr_1.2.0  
[13] htmlwidgets_0.8 purrr_0.2.2.2   compiler_3.4.0  htmltools_0.3.6
@dleutnant
Copy link
Owner

@ckatsulis Thanks for your profiling!!! Could you please check the performance again with latest dev version 8b0568d

@ckatsulis
Copy link
Author

ckatsulis commented Aug 21, 2017 via email

@dleutnant
Copy link
Owner

dleutnant commented Aug 21, 2017

devtools::install_github("dleutnant/influxdbr@dev")

@ckatsulis
Copy link
Author

ckatsulis commented Aug 21, 2017

That works exactly as anticipated. Thanks again for the quick turn around! I'm guessing 20x improvement.

You have recently changed your main wiki page, but you used to remind patrons how to install the dev version until you can push these to CRAN. People are definately gonna want these last three improvements!

@dleutnant
Copy link
Owner

dleutnant commented Aug 22, 2017

CRAN maintainers are quite fast these days: New release of influxdbr 0.13.0. is already on CRAN! Thanks for your support, @ckatsulis !

@ckatsulis
Copy link
Author

ckatsulis commented Aug 22, 2017 via email

@dleutnant
Copy link
Owner

I'm going to close this issue for now as it addressed a specific parsing problem (which has been solved).

@ckatsulis
Copy link
Author

ckatsulis commented Aug 31, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants