Uniquely identifying derivation pathways/provenance for featurization #191
Comments
As of December or so, every propnet quantity is assigned a unique ID (a random UUID) when it is created. It was intended as a bookkeeping mechanism so that we wouldn't have to save the values of quantities in provenance trees and could instead refer to the quantity object by ID. These IDs may be sufficiently unique for your featurizer, although on their own they hold no information about provenance. With the new PR, the hash value of a quantity will take provenance into account, although a matching hash does not guarantee equality because the value itself isn't hashed.
Right, I could certainly distinguish among the quantities generated for a single material using that. What I'm saying is that I want to be able to identify distinct quantities that were derived in exactly the same way across a set of multiple materials, so I can use them as features for a dataset. For example, I might get 50 Vickers hardness values per material with the standard MP dataset. If I want to use these as features, I'd like to put them into columns that correspond to "identical" features, which in my mind means identical derivation paths.
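To make the column idea concrete, here is a minimal sketch of how per-material quantities could be lined up once each one carries some derivation-path key. Everything here is hypothetical: `build_feature_table`, the record layout, and the pathway keys are illustrative and not part of propnet.

```python
from typing import Dict, List, Optional, Tuple

# Each record is (material_id, pathway_key, value). The pathway_key is assumed
# to be some identifier that is equal for quantities derived in exactly the
# same way; how to compute it is the open question in this thread.
Record = Tuple[str, str, float]


def build_feature_table(
    records: List[Record],
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Pivot records into one row per material and one column per pathway key."""
    # Sort the keys so the column order is stable across runs.
    columns = sorted({key for _, key, _ in records})
    by_material: Dict[str, Dict[str, float]] = {}
    for material_id, key, value in records:
        by_material.setdefault(material_id, {})[key] = value
    # Pathways that were not available for a material are left as None.
    table = {
        material_id: [values.get(column) for column in columns]
        for material_id, values in by_material.items()
    }
    return columns, table
```

With a table like this, two Vickers hardness values derived through different pathways end up in different columns, and a material that lacks a given pathway simply has a missing entry there.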
Oh, I see what you're getting at. Hmm, yeah, it's not immediately obvious to me how to do that either. I imagine you'd have to hash the whole model tree in some deterministic way.
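One way that deterministic hashing of the model tree might look is sketched below. It is not propnet's implementation: `ProvNode` is a hypothetical stand-in for a provenance tree node, and the sketch assumes that the order of a model's inputs does not matter (if two inputs of the same symbol play different roles in a model, the role would need to be included in the payload). Values are deliberately left out so that the same pathway yields the same hash for every material.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProvNode:
    """Hypothetical provenance node: a symbol, the model that produced it, and its inputs."""
    symbol: str
    model: Optional[str] = None          # None for quantities that were given, not derived
    inputs: Tuple["ProvNode", ...] = ()


def pathway_hash(node: ProvNode) -> str:
    """Hash the symbol and derivation pathway of a node, ignoring values."""
    # Hash children first, then sort so that input order does not change the result.
    child_hashes = sorted(pathway_hash(child) for child in node.inputs)
    # JSON gives an unambiguous, deterministic encoding of the three fields.
    payload = json.dumps([node.symbol, node.model, child_hashes])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Unlike Python's built-in `hash`, a SHA-256 digest is stable across interpreter sessions, so it could serve as a column identifier in a saved dataset.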
Yeah, that's what I was thinking too. It might be an interesting idea to do that for other reasons as well. For example, graph evaluation might be really facile if you could "cache" the action of the graph for datasets that are isomorphic, which I think might be easier than doing the logic of graph evaluation every time.
@dmrdjenovich Do you have any thoughts about this? Since you were just working with tree traversal. |
I have a keen interest in making a featurizer that uses propnet-derived features, but I'm not sure how to create an identifier for every Quantity that contains the information for its symbol and evaluation pathway (which I'd want to keep separate to maximize my feature set). I think a provenance could probably be meaningfully hashed, but I'm not sure how to do it off the top of my head.