Including more features apart from diagnostic codes #8

Open
sarwart opened this issue Jul 30, 2020 · 8 comments

Comments

@sarwart

sarwart commented Jul 30, 2020

Hi Retain team,

I am interested in using RETAIN for my research. How should I prepare the data when there are additional features, such as vital signs or medications, besides diagnostic codes? Do I need to concatenate all the features together? For example, for a patient with diagnostic codes (c1 to c3) and vital signs (v1 to v3), should the input for a single visit be [c1, c2, c3, v1, v2, v3]?

@sarwart sarwart changed the title Including extra features apart from diagnostic codes Including more features apart from diagnostic codes Jul 30, 2020
@mp2893
Owner

mp2893 commented Jul 30, 2020

Hi Tabinda,

Yes, that's how you use additional features. However, note that the contributions of continuous features (e.g. blood pressure) are calculated a bit differently from the contributions of discrete features. (See the original paper for the actual equation.)

Best
Ed
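The concatenation discussed above can be sketched as follows. This is a minimal illustration, not the repository's own preprocessing: it assumes the model consumes one dense vector per visit, with diagnostic codes encoded as a multi-hot block followed by the continuous values. The helper name `build_visit_vector` and the toy sizes are hypothetical, and the vitals are assumed to be normalized already.

```python
def build_visit_vector(code_ids, vitals, num_codes):
    """Return one visit as [multi-hot codes | continuous vitals]."""
    multi_hot = [0.0] * num_codes
    for code_id in code_ids:
        multi_hot[code_id] = 1.0  # mark each diagnostic code present in this visit
    return multi_hot + [float(v) for v in vitals]


NUM_CODES = 5  # hypothetical vocabulary of 5 diagnostic codes
visit = build_visit_vector(code_ids=[0, 2, 3], vitals=[0.5, -1.2, 0.1], num_codes=NUM_CODES)
input_dim_size = len(visit)  # number of codes + number of continuous features

print(visit)           # [1.0, 0.0, 1.0, 1.0, 0.0, 0.5, -1.2, 0.1]
print(input_dim_size)  # 8
```

With this layout the input dimension is simply the code vocabulary size plus the number of continuous features.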

@sarwart
Author

sarwart commented Jul 31, 2020

Dear Ed,

Thank you for your reply. Yes, I noticed, but the paper mentions that continuous features can be used:

In case of learning to diagnose (L2D) [27], the input vector xi consists of continuous clinical measures.

When you say "contributions", are you talking about the interpretation of the attention model? What exactly do I need to change to include continuous features?

Tabinda

@mp2893
Owner

mp2893 commented Jul 31, 2020

Contributions in RETAIN are different from simply studying the attention. RETAIN lets users decompose the prediction value into the contributions of individual features in each visit, using Eq.2 in the paper.
You don't need to change anything; you can simply append more features to the input vector.
I also recommend using the TensorFlow version implemented by Optum (https://github.com/Optum/retain-keras), which is better maintained than this one.
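To make the idea of a contribution concrete, here is a small sketch. It assumes the contribution of feature k in visit j has the form alpha_j * w_out . (beta_j * W_emb[:, k]) * x_jk, where alpha_j is the visit-level attention, beta_j the variable-level attention, and x_jk the input value; please check this against the equation in the paper. Under that assumption a discrete code contributes with x_jk = 1, while a continuous feature is scaled by its actual value. All names and numbers below are hypothetical, and the output bias is left out.

```python
def contribution(alpha_j, beta_j, w_out, w_emb, k, x_jk):
    """Contribution of feature k in visit j to the output logit.

    Computes alpha_j * w_out . (beta_j * W_emb[:, k]) * x_jk.
    """
    weight = sum(w * b * row[k] for w, b, row in zip(w_out, beta_j, w_emb))
    return alpha_j * weight * x_jk


# Toy values, all hypothetical: 2 visits, 3 features, embedding size 2.
w_emb = [[0.2, -0.1, 0.5],
         [0.4, 0.3, -0.2]]          # embedding matrix, shape (2, 3)
w_out = [1.0, -2.0]                 # output weights for a single logit
alphas = [0.3, 0.7]                 # visit-level attention weights
betas = [[0.5, -0.5], [0.9, 0.1]]   # variable-level attention, one vector per visit
visits = [[1.0, 0.0, 0.7],          # features 0 and 1 are codes, feature 2 is continuous
          [0.0, 1.0, -1.5]]

# Logit computed directly: w_out . sum_j alpha_j * (beta_j * (W_emb x_j)).
logit = 0.0
for alpha_j, beta_j, x_j in zip(alphas, betas, visits):
    v_j = [sum(row[k] * x_j[k] for k in range(len(x_j))) for row in w_emb]
    logit += alpha_j * sum(w * b * v for w, b, v in zip(w_out, beta_j, v_j))

# The same logit recovered as a sum of per-feature contributions.
contributions = [
    [contribution(alphas[j], betas[j], w_out, w_emb, k, visits[j][k])
     for k in range(len(visits[j]))]
    for j in range(len(visits))
]
total = sum(sum(row) for row in contributions)
print(abs(total - logit) < 1e-9)
```

The check at the end illustrates why this is a decomposition: the per-feature contributions add up to the logit (before the bias), and a code that is absent from a visit contributes zero.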

@sarwart
Author

sarwart commented Jul 31, 2020

Thank you Ed

@lucasliu0928

Hi Retain team,
I am really interested in this work and am trying to understand the framework. I have a couple of questions:

  1. How does the model deal with patients who have different numbers of visits, and with visits that have different numbers of codes/features? Do you pad zeros to the end of each patient's visit list and do the same for the codes within each visit?
  2. If we only use continuous features, what should the "inputDimSize" argument be? It is 942 in your example, which covers all the 3-digit ICD9 codes. Would it be the number of features?

@mp2893
Owner

mp2893 commented Aug 17, 2020

Hi Lucas,

Thanks for taking an interest in this work.

  1. In principle you are correct. This RETAIN code was implemented in Theano, which is a really old library, so I didn't do it exactly the way you describe, but you should follow your logic when implementing RETAIN with modern libraries such as TensorFlow or PyTorch. However, don't forget to use proper masks to deal with the padded elements.
  2. Yes, it would be the total number of features.
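The padding and masking advice above might look like the following in plain Python. This is a hedged sketch rather than the code in this repository: it assumes each visit is already a fixed-width vector (so visits with different numbers of codes need no padding of their own), pads only along the visit axis, and applies the mask inside a softmax over visits so that padded steps receive zero attention. The function names are hypothetical, and every patient is assumed to have at least one real visit.

```python
import math


def pad_patients(patients, input_dim):
    """Pad every patient to the longest visit sequence and build a 0/1 mask."""
    max_visits = max(len(visits) for visits in patients)
    padded, mask = [], []
    for visits in patients:
        n_pad = max_visits - len(visits)
        padded.append([list(v) for v in visits] + [[0.0] * input_dim for _ in range(n_pad)])
        mask.append([1.0] * len(visits) + [0.0] * n_pad)
    return padded, mask


def masked_softmax(scores, mask_row):
    """Softmax over visits that gives padded positions a weight of exactly zero."""
    peak = max(s for s, m in zip(scores, mask_row) if m)  # for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


# Two toy patients with 1 and 3 visits, 3 features per visit.
patients = [
    [[1.0, 0.0, 0.5]],
    [[0.0, 1.0, -0.2], [1.0, 1.0, 0.3], [0.0, 0.0, 1.1]],
]
padded, mask = pad_patients(patients, input_dim=3)
print(mask)  # [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

# Large scores at padded positions are ignored thanks to the mask.
weights = masked_softmax([2.0, 9.0, 9.0], mask[0])
print(weights)  # [1.0, 0.0, 0.0]
```

Without the mask, the zero-padded visits would still receive attention weight and would distort both the prediction and the contributions.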

Best,
Ed

@lucasliu0928

Hi Ed,
Thank you for the explanations!

Best,
Lucas

@sarwart
Author

sarwart commented Aug 17, 2020

Hi Lucas,

If you implement RETAIN for continuous features, it would be great if you could share the code on GitHub.

Tabinda
