pymc3.Data converts input data to float64 type - so int data cannot later be used as an index #3813
Comments
A temporary workaround seems to be to use theano to cast my index data back to ints:
I have something like this in my code:
So I believe that what happens here is that TBQH, I am not sure why this works, because it looks like the code inside One thing to do would be to simply sidestep |
Hi! Robert's solution (using |
FWIW, this makes me wonder why |
Yeah, I wondered the same thing. Could be a useful PR.
I don't have time to work on it right now but I'll keep that in mind 😉
On Wed, 19 Feb 2020 at 22:49, rpgoldman <[email protected]> wrote:
… FWIW, this makes me wonder why pm.Data does not accept a dtype argument.
Hi guys, thanks for your comments! I would be happy to have a go at a PR, but what form should the fix take? From what you said I think the choices are 1&2 and A&B below? either:
and B. or, changing Cheers |
My preference would be to do number 1, and ideally to have I don't have a strong opinion on A vs. B, but i haven't looked into this yet. The argument for this is the principle of minimal surprise (I didn't expect that integer to turn into a float!), but the argument against this is that programmers using numpy are notoriously sloppy about types, because numpy tries to be clever about this. Personally, I'm not a big fan of that approach, but my bet is that there are a ton of people who sloppily use 1 or 0 to initialize arrays, confident that they will turn into 1.0 or 0.0... |
Yeah, I think the best would probably be 1 and 2 😆 But if too big a change, and if we assume that people are sloppy about types, then maybe 2 would be better, as it automates the process. No strong opinion on A and B either -- maybe a slight preference for A: if |
@hottwaj To return to this, I think 1 + B would be the right approach. I note that there is already an I'm not sure what dtypes we should permit, though. Anything other than just How about making a |
Yeah I'm not sure we need any other types than |
Great thanks guys, I will go with 1+B and add support for |
…nput data (previously all input data was coerced to float) WIP for pymc-devs#3813
Actually rather than sitting on this I have done some initial changes and submitted a PR.
(minor issue: should e.g. int8, int16 be accepted? is there a generic way of testing that a dtype is a numpy int/float type?)
So I've deviated a bit from what we agreed and implemented more of a 2+B :) Happy to revert to an implementation in the style of 1+B if that's what you'd prefer though. Thanks! |
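On the question above about a generic test for NumPy int/float dtypes: one option is `np.issubdtype`, which checks a dtype against NumPy's abstract type hierarchy, so `int8`, `int16`, and unsigned types all count as integers. The helper name `classify_dtype` below is hypothetical and only for illustration; this is a sketch of the check, not necessarily what the PR ended up doing.

```python
import numpy as np


def classify_dtype(dtype):
    """Hypothetical helper: return 'int', 'float', or 'other' for a NumPy dtype."""
    dtype = np.dtype(dtype)
    # np.integer covers signed and unsigned integers of every width.
    if np.issubdtype(dtype, np.integer):
        return "int"
    # np.floating covers float16, float32, float64, etc.
    if np.issubdtype(dtype, np.floating):
        return "float"
    return "other"


for name in ("int8", "int16", "uint32", "int64", "float32", "float64", "bool"):
    print(name, "->", classify_dtype(name))
```

Note that `bool` is not a subtype of `np.integer` in NumPy's hierarchy, so it falls through to "other" here.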
* Initial changes to allow pymc3.Data() to support both int and float input data (previously all input data was coerced to float) WIP for #3813
* added exception for invalid dtype input to pandas_to_array
* Refined implementation
* Finished dtype conversion handling
* Added SharedVariable option to getattr_value
* Added dtype handling to set_data function
* Added tests for pm.Data used for index variables
* Added tests for using pm.data as RV input
* Ran Black on data tests files
* Added release note
* Updated release notes
* Updated code in light of Luciano's comments
* Fixed implementation of integer checking
* Simplified implementation of type checking
* Corrected implementation for other uses of pandas_to_array

Co-authored-by: hottwaj <[email protected]>
Hi there guys
I'd like to create a model that I want to fit many times to different datasets for cross validation purposes.
One of my columns of input data is categorical, so I use it to index a vector of RVs depending on which category is presented in each sample of data. Something like this:
Note that my category_codes data is a numpy array of integers.
That last line of code above triggers an error; here's the traceback within pymc3:
It seems that within pymc3.Data(), my category_codes data is being coerced to float64, which is not a valid indexing type.
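The same failure mode can be reproduced in plain NumPy, which may make the diagnosis easier to see. This is only an analogue: the variable names and values are made up, and the actual error in pymc3 is raised by Theano rather than NumPy, so the exception type and message there may differ.

```python
import numpy as np

# Made-up example data: integer category codes and one value per category.
category_codes = np.array([0, 2, 1, 2])
effects = np.array([10.0, 20.0, 30.0])

# Simulate the coercion described above: the codes become float64.
coerced = category_codes.astype("float64")

try:
    effects[coerced]
except IndexError as err:
    # NumPy refuses to use a float array as an index.
    print("float index rejected:", err)

# Casting back to an integer dtype makes the indexing work again.
restored = coerced.astype("int64")
print(effects[restored])
```

Casting back to an integer dtype is the same idea as the Theano cast workaround mentioned in the comments.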
Looking at the source for pymc3.Data(), I think the problem is ultimately in the called function pymc3.model.pandas_to_array, which converts its input data to a float on its last line; see https://github.com/pymc-devs/pymc3/blob/master/pymc3/model.py#L1495
Can pymc3.Data() and/or pymc3.model.pandas_to_array be changed to preserve the input data type?
Thanks!