MovieLens Datasets #147

Merged 16 commits on Jun 30, 2022

Dsantra92 (Collaborator) commented Jun 23, 2022

  • MovieLens Struct
  • Base.show
  • Docs
  • MovieLens 100K
  • MovieLens 1m
  • MovieLens 20m
  • MovieLens 25m
  • Tests

Dsantra92 changed the title from "MovieLens 100k" to "MovieLens Datasets" on Jun 23, 2022
Dsantra92 linked an issue on Jun 23, 2022 that may be closed by this pull request

codecov-commenter commented Jun 23, 2022

Codecov Report

Merging #147 (71ee554) into master (63b865f) will increase coverage by 5.32%.
The diff coverage is 76.27%.

@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   38.68%   44.01%   +5.32%     
==========================================
  Files          39       40       +1     
  Lines        1755     2029     +274     
==========================================
+ Hits          679      893     +214     
- Misses       1076     1136      +60     
Impacted Files                      Coverage Δ
src/datasets/graphs/movielens.jl    76.19% <76.19%> (ø)
src/MLDatasets.jl                   100.00% <100.00%> (ø)
src/utils.jl                        61.22% <0.00%> (+10.20%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

user_data["gender"] = user_df[!, 3] .== "M" # I hope I don't get cancelled for binarizing this field
user_data["occupation"] = user_df[!, 4]
user_data["zipcode"] = user_df[!, 5]
return user_data
Member

The indentation is not uniform in this file. It should be 4 spaces everywhere.

Collaborator Author

Yeah, Vim messed it up somewhere. I will fix these things in a cleanup commit later.

@Dsantra92
Collaborator Author

I forgot to add the indentation fix; it will be added in a later commit.

@Dsantra92
Collaborator Author

There are inconsistencies in the data storage format across the three variations: 100k, 1m, and the current datasets (20m, 25m, etc.). I will address the issue once all of them have working APIs.
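
Editor's illustration (not code from this PR): one concrete difference between the upstream GroupLens archives is the field delimiter of the ratings files. The 100k `u.data` file is tab-separated, the 1m `ratings.dat` file uses `::`, and the 20m/25m `ratings.csv` files are comma-separated with a header row. The sketch below assumes a hypothetical helper, `parse_rating_line`, and a hypothetical lookup table, `RATING_DELIMS`; neither name is part of MLDatasets.jl, and the comment above may refer to other inconsistencies as well.

```julia
# Hypothetical lookup table (an assumption for illustration, not MLDatasets.jl API):
# field delimiter used by the ratings file of each MovieLens variant.
const RATING_DELIMS = Dict(
    "100k" => "\t",   # u.data
    "1m"   => "::",   # ratings.dat
    "20m"  => ",",    # ratings.csv (header row must be skipped by the caller)
    "25m"  => ",",    # ratings.csv (header row must be skipped by the caller)
)

# Hypothetical helper: parse one ratings line into a uniform named tuple.
# Ratings are parsed as Float64 because 20m/25m allow half-star values.
function parse_rating_line(line::AbstractString, variant::AbstractString)
    fields = split(strip(line), RATING_DELIMS[variant])
    length(fields) == 4 ||
        throw(ArgumentError("expected 4 fields, got $(length(fields))"))
    return (user      = parse(Int, fields[1]),
            movie     = parse(Int, fields[2]),
            rating    = parse(Float64, fields[3]),
            timestamp = parse(Int, fields[4]))
end
```

For example, `parse_rating_line("1::1193::5::978300760", "1m")` and `parse_rating_line("196\t242\t3\t881250949", "100k")` would both return named tuples with the same four fields, which is the kind of uniform representation a shared API would need.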

Dsantra92 marked this pull request as ready for review on June 28, 2022 15:03
Dsantra92 requested a review from CarloLucibello on June 28, 2022 15:03
@CarloLucibello
Member

Are all tests passing locally?

@Dsantra92
Collaborator Author

> Are all tests passing locally?

Yes

CarloLucibello merged commit 917665f into JuliaML:master on Jun 30, 2022

Successfully merging this pull request may close these issues.

Movielens datasets