Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Paragram Embedding #30

Merged
merged 13 commits into from
Feb 8, 2020
Merged

Conversation

tejasvaidhyadev
Copy link
Member

I am adding Paragram Embedding and Google-drive download fetch method for datadeps in Paragram.I think it will be great to have default G-drive download fetch method in Datadeps

src/Paragram.jl Outdated Show resolved Hide resolved
@oxinabox
Copy link
Member

I am adding Paragram Embedding and Google-drive download fetch method for datadeps in Paragram.

I think it will be great to have default G-drive download fetch method in Datadeps

I made a prototype of this a few years back. It would be nice to see something more polished.
https://github.com/oxinabox/PyDrive.jl
idk if in DataDeps or in a GoogleDrive.jl package that supports DataDeps (potetially similar to how AWSS3.jl supports DataDeps)
I guess its not much code so it could go in DataDeps.
But being able to add Auth (like the PyDrive.jl demo does) the code code grows so might be better to have it else-where

I agree that it doesn't belong here.
It would be kind if nice to hold on this PR til that code found a better home.
But if you need these for a project we can look at keeping it here for now.

@tejasvaidhyadev
Copy link
Member Author

For now, I don't need it for any project but would love to find a home to code.
Any suggestion how to proceed ?

@oxinabox
Copy link
Member

I suggest you create a GoogleDrive.jl package,
based on the code you have.
(use PkgTemplates if you are not already).

Register that package, then we will add it as a dependency of Embeddings.jl

@tejasvaidhyadev
Copy link
Member Author

tejasvaidhyadev commented Jan 16, 2020

Hi @oxinabox I created GoogleDrive.jl Package , and also opened a Registration PR
After the PR is merged I will push all the above suggested changes to this PR.

@tejasvaidhyadev
Copy link
Member Author

@oxinabox as suggested GoogleDrive.jl as dependency and common.jl is added

@codecov-io
Copy link

codecov-io commented Jan 22, 2020

Codecov Report

Merging #30 into master will decrease coverage by 0.81%.
The diff coverage is 88.46%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #30      +/-   ##
==========================================
- Coverage   96.77%   95.95%   -0.82%     
==========================================
  Files           4        6       +2     
  Lines          93       99       +6     
==========================================
+ Hits           90       95       +5     
- Misses          3        4       +1
Impacted Files Coverage Δ
src/Embeddings.jl 100% <100%> (ø) ⬆️
src/glove.jl 100% <100%> (+8%) ⬆️
src/Paragram.jl 80% <80%> (ø)
src/common.jl 89.47% <89.47%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65390f1...ac45848. Read the comment docs.

src/Embeddings.jl Outdated Show resolved Hide resolved
src/Paragram.jl Outdated Show resolved Hide resolved
src/glove.jl Outdated Show resolved Hide resolved
src/common.jl Outdated
@@ -0,0 +1,28 @@

function _load_embeddings_csv( embedding_file, max_vocab_size, keep_words)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also use this for FastText maybe? there is at least some code overlap.
Might need to factor out the common inner parts.

We should probably not call it a CSV since it is space seperated,
Or we call it a CSV and pass in a deliminator option

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deliminator option is added as suggested

src/Paragram.jl Outdated Show resolved Hide resolved
src/common.jl Outdated Show resolved Hide resolved
src/common.jl Outdated Show resolved Hide resolved
src/common.jl Outdated Show resolved Hide resolved
@oxinabox
Copy link
Member

oxinabox commented Feb 8, 2020

thanks

@oxinabox oxinabox merged commit 73e09d7 into JuliaText:master Feb 8, 2020
@tejasvaidhyadev
Copy link
Member Author

Hi @oxinabox ,
I will soon implement _load_embeddings_csv for FastText as well.
sorry for delay because of my exams are going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants