Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write data.frame (or list of xts) with influx_write #20

Closed
dleutnant opened this issue Jun 6, 2017 · 21 comments
Closed

write data.frame (or list of xts) with influx_write #20

dleutnant opened this issue Jun 6, 2017 · 21 comments
Labels

Comments

@dleutnant
Copy link
Owner

dleutnant commented Jun 6, 2017

xts matrix objects fail to support to write tag values per field value (#19).
There are probably two ways to solve this:

  1. Write a data.frame and specify which columns are tags, fields and time
  2. Write a list of xts; each object holds either values or tags. All xts objects must have the same index.
@deric
Copy link

deric commented Jun 6, 2017

This would be cool, writing each row with separate request is quite inefficient. Are you planning to merge this feature to master soon?

@dleutnant
Copy link
Owner Author

dleutnant commented Jun 6, 2017

I haven't started coding the required parsing scheme because it is out of my use case. I can't tell when I'll begin to implement that feature. However, it is on my "todo"-list (thus this issue).

@dleutnant
Copy link
Owner Author

@deric btw: Do you prefer one way over the other?

@deric
Copy link

deric commented Jun 6, 2017

The first one makes more sense to me. As I'm looking at the xts objects I don't see an easy way how to pass row-related tags.

I'd like to have something like this:

myxts <- xts(x = df[, c("value1", "value2")], order.by = df$time)
influxdbr::influx_write(con = con, db = dbname, 
      xts = myxts, measurement = "measure",
      tags = df[, c("tag1", "tag2")]
)

Currently all development is done in prepare_cran branch? Are you planning the CRAN release soon? For implementing this feature is it better to fork prepare_cran branch?

@dleutnant
Copy link
Owner Author

dleutnant commented Jun 6, 2017

Currently all development is done in prepare_cran branch?

YES/NO. From now on, all development takes places in "dev" (#13 shows more information about the why).

Are you planning the CRAN release soon?

YES. After some final tests, this will be happen shortly. Submitted version will be in branch "master".

For implementing this feature is it better to fork prepare_cran branch?

NO. Please feel free to start working in dev branch (which I've just created). Thank you!

@JonasGutermuth
Copy link
Contributor

JonasGutermuth commented Aug 28, 2017

I would also prefer the first solution. Thank you for your work solving this problem.

Maybe you could implement the specify which columns are tags, fields and time with the group_by and nest function:
data %>% group_by(time, columnWithTag1,columnWithTag2) %>% nest

@dleutnant
Copy link
Owner Author

I'm experimenting a bit how to design the function call in dev branch. I think the easiest way would be to have a influx_write function which takes care of the object provided (either xts or tibble) and provides two additional arguments, namely time_col , tag_cols (both optional). The other columns of the tibble are interpreted as fields.

@dleutnant
Copy link
Owner Author

dleutnant commented Aug 29, 2017

data %>% group_by(time, columnWithTag1,columnWithTag2) %>% nest

Could you provide a simple example? How does data look like?

@JonasGutermuth
Copy link
Contributor

Data:

time country city population health
2017-04-01 USA New York 1000000 0.89
2017-04-01 Germany Berlin 100000 0.95
2017-04-01 Germany Berlin 100000 0.95

the result would be (data %>% group_by(country,city) %>% nest):

country city data
USA New York <dataframe>
Germany Berlin <dataframe>

So you can use the nested dataframes as fields (+ time) and the others as tags.

@dleutnant
Copy link
Owner Author

dleutnant commented Aug 29, 2017

How about:
data %>% influx_write(con = con, db = "db", measurement = "mes", time_col = "time", tag_cols = c("country", "city")) (population and health are used as field columns)?

@JonasGutermuth
Copy link
Contributor

This is a good solution. I like to have the possibility to use it like this:

influx_select(con = con, db = "db", measurement = "mes", ...) %>% modifyFunction %>% influx_write(con = con, db = "db", measurement = "mes", time_col = "time", tag_cols = c("country", "city"))

The only reason why I brought a nested way into play was the attempt, to add the information of tag_cols = c("country", "city") into the data. It was a maybe to complicated idea, to fulfill the idea of adding every information to the dataframe (vgl. tidyverse idea).

@dleutnant
Copy link
Owner Author

dleutnant commented Aug 30, 2017

current dev branch has a new generic influx_write function, which handles both data.frames and xts objects. Data.frames must be free of NA's (at the moment).
Hint: There is a tiny breaking change in the function argument: xts is now x !

Any feedback is greatly appreciated:)

@JonasGutermuth
Copy link
Contributor

The code looks really good. I can offer you to test it in the next week.

@JonasGutermuth
Copy link
Contributor

JonasGutermuth commented Sep 7, 2017

I installed the development version but I get errors:
Error: http: unable to parse 'df,p=Block,bn=WR 01-10,b=Block 1,dd=WR 02,d=PowerOneInverter_02,iid=6107,level=dd Capacity=30,Yield=100.809999999998,SpYield=3.36033333333326,Availability=1,SOLAR_RADIATION=4.62707916666667,PR=0.726232081253546 1499558401': invalid boolean

Query:
influxdbr::influx_write(con = con, db = db, x = (t), measurement = "df",time_col = "time", tag_cols = c("p", "bn", "b", "dd", "d", "iid","level"))

My datas are:
bildschirmfoto vom 2017-09-07 08-49-14

@dleutnant
Copy link
Owner Author

Could you please post the column types with str() ?

@JonasGutermuth
Copy link
Contributor

'data.frame': 6660 obs. of 14 variables:
$ time : POSIXct, format: "2017-07-09 00:00:01" "2017-07-09 00:00:01" "2017-07-09 00:00:01" "2017-07-09 00:00:01" ...
$ p : chr "Block" "Block" "Block" "Block" ...
$ bn : chr "WR 01-10" "WR 01-10" "WR 01-10" "WR 01-10" ...
$ b : chr "Block 1" "Block 1" "Block 1" "Block 1" ...
$ dd : chr "WR 01" "WR 02" "WR 03" "WR 04" ...
$ d : chr "PowerOneInverter_02" "PowerOneInverter_03" "PowerOneInverter_04" "PowerOneInverter_05" ...
$ iid : chr "6107" "6107" "6107" "6107" ...
$ Capacity : num 30 30 30 30 30 30 30 30 30 30 ...
$ Yield : num 101 103 102 103 101 ...
$ SpYield : num 3.36 3.43 3.41 3.42 3.37 ...
$ Availability : num 1 1 1 1 1 1 1 1 1 1 ...
$ SOLAR_RADIATION: num 4.63 4.63 4.63 4.63 4.63 ...
$ PR : num 0.726 0.741 0.737 0.739 0.729 ...
$ level : chr "dd" "dd" "dd" "dd" ...

@dleutnant
Copy link
Owner Author

Yep, error reproducible (special characters in tag values) and fixed. Can you confirm?

@JonasGutermuth
Copy link
Contributor

Problem is still not fixed (or I'm not able to use the feature)

@dleutnant
Copy link
Owner Author

Did you restart your R session? Do you get exactly the same error message?

@dleutnant
Copy link
Owner Author

dleutnant commented Sep 7, 2017

for the record and to compare insert statements (new escapes spaces in tag values):
old (with error):
> insert df,p=Block,bn=WR 01-10,b=Block 1,dd=WR 02,d=PowerOneInverter_02,iid=6107,level=dd Capacity=30,Yield=100.809999999998,SpYield=3.36033333333326,Availability=1,SOLAR_RADIATION=4.62707916666667,PR=0.726232081253546 1499558401 ERR: {"error":"unable to parse 'df,p=Block,bn=WR 01-10,b=Block 1,dd=WR 02,d=PowerOneInverter_02,iid=6107,level=dd Capacity=30,Yield=100.809999999998,SpYield=3.36033333333326,Availability=1,SOLAR_RADIATION=4.62707916666667,PR=0.726232081253546 1499558401': invalid boolean"}
new (no error):
> insert df,p=Block,bn=WR\ 01-10,b=Block\ 1,dd=WR\ 02,d=PowerOneInverter_02,iid=6107,level=dd Capacity=30,Yield=100.809999999998,SpYield=3.36033333333326,Availability=1,SOLAR_RADIATION=4.62707916666667,PR=0.726232081253546 1499558401
check:

> select * from df
name: df
time       Availability Capacity PR                SOLAR_RADIATION  SpYield          Yield            b       bn       d                   dd    iid  level p
----       ------------ -------- --                ---------------  -------          -----            -       --       -                   --    ---  ----- -
1499558401 1            30       0.726232081253546 4.62707916666667 3.36033333333326 100.809999999998 Block 1 WR 01-10 PowerOneInverter_02 WR 02 6107 dd    Block

@JonasGutermuth
Copy link
Contributor

sorry my fault. It is working

Speedtest
Old Version: 4,56634sec
Neu Version: 1,21665sec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants