
Regarding the missing site info in the datasets #81

Open
erensezener opened this issue Jul 13, 2016 · 5 comments

@erensezener (Contributor) commented Jul 13, 2016

I have been told by @ge00rg and @clauslang that the data in the daily DB comes from only one site. I don't see that this is the case; see the snippet below. But why are there sites like 0.5, 2.5, etc.? Is this expected?

import h5py
import numpy as np

h5 = h5py.File('daily_database.hdf5', 'r')
data = h5['weather_data'][:]
np.unique(data[:, 1])  # column 1 is the site ID
>>> array([ 0. ,  0.5,  1. ,  2.5,  3.5,  4. ])
@ge00rg (Contributor) commented Jul 13, 2016

I don't remember the exact query we made, nor do I know why these are the indices; they should be contiguous integers. Did you test the other DB? Can you try using the query engine on it and see whether you get plausible values for get_data and/or get_val_range?
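
Something along these lines is what I mean (just a sketch: the module name and call signatures below are guesses, so adapt them to whatever the query engine actually exposes):

# Sketch only: 'query_engine' and the signatures of get_data /
# get_val_range are assumptions, not the actual API.
from query_engine import get_data, get_val_range  # hypothetical import

sites = get_data('site')                     # all values of the site column
print('unique sites:', sorted(set(sites)))   # should be small integers
print('site range:', get_val_range('site'))  # plausible min/max?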


@erensezener (Contributor, Author)

The site values in the hourly DB are badly corrupted:

>>> import h5py
>>> import numpy as np
>>> h5 = h5py.File('hourly_database.hdf5', 'r'); data = h5['weather_data'][:]
>>> np.unique(data[:,2])
array([  0.00000000e+00,   1.00000000e+00,   4.00000000e+00,
         2.01606212e+11])

So we have sites 0, 1, and 4, plus what looks like a date (2.01606212e+11 reads like a 20160621... timestamp) in the site column. How did that get there?
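
To track down where that value comes from, something like this lists the offending rows (again a sketch, assuming column 2 is the site column as in the snippet above):

import h5py
import numpy as np

# Sketch: assumes column 2 of 'weather_data' is the site column,
# as in the snippet above.
h5 = h5py.File('hourly_database.hdf5', 'r')
data = h5['weather_data'][:]
site = data[:, 2]
bad = data[site > 1e6]  # anything this large is not a site ID
print(len(bad), 'rows with a timestamp-like value in the site column')
print(bad[:5])  # neighbouring columns may show whether the row is shifted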

@erensezener (Contributor, Author)

Can you try using the query engine on it and see whether you get
plausible values for get_data and/or get_val_range?

The DB should be essentially the same. You can check it yourself.

@erensezener (Contributor, Author)

I am re-running all the scrapers so that each one writes its output to a separate DB. Every scraper author should then review their own data, since each DB will contain only that scraper's output.

@denisalevi (Contributor)

If you want scraper authors to review their DBs, please provide a clear code snippet explaining how to access the data or use the query engine, and specify where each kind of data should be stored.
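
For example, something at roughly this level of detail (a sketch only; the filename and column layout here are assumptions based on the snippets above):

import h5py
import numpy as np

# Rough sketch of the walkthrough needed; 'my_scraper_database.hdf5' is a
# hypothetical per-scraper filename and the column layout is assumed.
h5 = h5py.File('my_scraper_database.hdf5', 'r')
print(list(h5.keys()))        # which datasets the file contains
data = h5['weather_data'][:]
print(data.shape)             # rows x columns
print(np.unique(data[:, 1]))  # e.g. the site column, if column 1 is site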
