This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Backward ForgetMult + Bidirectional QRNN #24

Open · wants to merge 2 commits into base: master

Conversation

elmarhaussmann

I had some time on the weekend and thought this might be fun. The PR implements a backward ForgetMult for CPU and CUDA, and changes QRNN and QRNNLayer to make them bidirectional. I tried to keep changes to the original code minimal without duplicating a lot of code.
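For context, the core of ForgetMult is the recurrence h_t = f_t * x_t + (1 - f_t) * h_{t-1}; the "backward" variant runs the same recurrence from the last timestep toward the first, which is what the reverse direction of a bidirectional QRNN needs. A minimal plain-Python sketch (not the PR's actual CPU/CUDA implementation; the function name and signature here are illustrative):

```python
def forget_mult(f, x, h0=0.0, backward=False):
    """Reference ForgetMult recurrence: h_t = f_t * x_t + (1 - f_t) * h_prev.

    f, x: sequences of floats of equal length (forget gates and inputs).
    backward=True iterates from the last timestep toward the first,
    so h_prev at step t is h_{t+1} instead of h_{t-1}.
    """
    steps = range(len(x) - 1, -1, -1) if backward else range(len(x))
    h = [0.0] * len(x)
    prev = h0
    for t in steps:
        h[t] = f[t] * x[t] + (1.0 - f[t]) * prev
        prev = h[t]
    return h

# Forward:  h_0 = 2.0, h_1 = 0.5*4 + 0.5*2 = 3.0
print(forget_mult([1.0, 0.5], [2.0, 4.0]))                 # [2.0, 3.0]
# Backward: h_1 = 0.5*4 + 0.5*0 = 2.0, h_0 = 1.0*2 = 2.0
print(forget_mult([1.0, 0.5], [2.0, 4.0], backward=True))  # [2.0, 2.0]
```

A bidirectional layer then concatenates the forward and backward hidden states per timestep.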

I tested this with PyTorch 0.4.0 and Python 3.6.5. Gradient checks etc. pass. On preliminary results with IMDB movie reviews, a bidirectional QRNN (2 layers, each with forward and backward, 256 hidden units) performs slightly better (~0.5% accuracy) than a unidirectional QRNN of the same size (4 layers, 256 hidden units), but I haven't had enough time to finish the experiments to be certain of this.

Let me know what you think and where the code still needs changes (I know it's not perfect in some places, especially QRNNLayer).

@salesforce-cla

Thanks for the contribution! Before we can merge this, we need @elmarhaussmann to sign the Salesforce.com Contributor License Agreement.

@elmarhaussmann
Author

@Smerity: ping. No interest in merging? (I know it's old stuff now)

@dnbaker
Copy link

dnbaker commented Feb 14, 2019

I'd be interested in having this merged as well, though in the interim I'm glad to have the use of your fork. I imagine adding half/double precision support would also be worth doing.

2 participants