confused about dataset split #4

Open
junkangwu opened this issue Jul 23, 2021 · 3 comments

Comments

@junkangwu

junkangwu commented Jul 23, 2021

Hi, nice work on variational autoencoders for recommendation. However, I am confused about the data split, which follows the same approach as the WWW 2018 paper "Variational Autoencoders for Collaborative Filtering".
In the line

unique_uid = user_activity.index

unique_uid is the index of the active users rather than the actual uid (unique_uid['userId']). Because of the earlier filtering step, some userIds have been removed. So if we use the index of user_activity rather than the actual uids, some valid userIds at the end will not be considered. I guess this might be an error, or is there some other reason for it?

Looking forward to your reply, Thanks.
Best.
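
To make the concern concrete, here is a minimal sketch on toy data. It assumes `get_count` is a `groupby(..., as_index=False).size()` helper (its body is not shown in this thread) and pandas 1.1 or later, where that call returns a DataFrame with a fresh 0..n-1 index. The toy table, the threshold of 3, and the helper body are all hypothetical.

```python
import pandas as pd

# Toy ratings table (hypothetical): user 2 has a single rating.
raw_data = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "movieId": [1, 2, 3, 1, 1, 2, 3, 2, 3, 4, 1, 3, 4],
})


def get_count(tp, col):
    # Assumed helper: one row per id plus a 'size' column (pandas >= 1.1).
    return tp[[col]].groupby(col, as_index=False).size()


# Drop users with fewer than 3 ratings, which removes user 2.
counts = get_count(raw_data, "userId")
active = counts.loc[counts["size"] >= 3, "userId"]
filtered = raw_data[raw_data["userId"].isin(active)]

user_activity = get_count(filtered, "userId")

index_based = user_activity.index.tolist()      # row positions
id_based = user_activity["userId"].tolist()     # actual user ids

# Users that a split built on the index would actually select.
covered = sorted(
    filtered.loc[filtered["userId"].isin(index_based), "userId"].unique().tolist()
)

print(index_based)  # [0, 1, 2, 3]
print(id_based)     # [1, 3, 4, 5]
print(covered)      # [1, 3]
```

In this toy case users 4 and 5 are valid but never selected when the index is used. On pandas versions before 1.1 the same `size()` call returned a Series indexed by the id, which may be why `user_activity.index` gave the expected ids when the code was first written.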

@shashankg7

I have the same doubt. I am not sure why the index is used instead of the actual uid?

@YvetteLi

YvetteLi commented Oct 4, 2022

Hi,

I agree with you, and I think it's a bug in the code. Initially, I wasn't able to run the code and thought it was probably a data issue, so I went back and changed the code as follows.

In preprocess.py:

def filter_triplets(tp, min_uc=min_uc, min_sc=min_sc):
    if min_sc > 0:
        itemcount = get_count(tp, 'movieId')
        tp = tp[tp['movieId'].isin(itemcount[itemcount >= min_sc].movieId)]
        # tp = tp[tp['movieId'].isin(itemcount.index[itemcount >= min_sc])]
    if min_uc > 0:
        usercount = get_count(tp, 'userId')
        tp = tp[tp['userId'].isin(usercount[usercount >= min_uc].userId)]
        # tp = tp[tp['userId'].isin(usercount.index[usercount >= min_uc])]

    usercount = get_count(tp, 'userId').set_index('userId')
    itemcount = get_count(tp, 'movieId').set_index('movieId')
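
One thing worth checking in the snippet above: if `get_count` returns a DataFrame, an expression like `usercount[usercount >= min_uc]` compares every column, including the id column itself, so it may be safer to filter on the count column explicitly. A minimal sketch follows. It assumes the same `groupby(..., as_index=False).size()` helper, a count column named `size` (pandas 1.1 or later), literal default thresholds, and a `return` statement, none of which are confirmed in this thread.

```python
import pandas as pd


def get_count(tp, col):
    # Assumed helper: one row per id plus a 'size' column (pandas >= 1.1).
    return tp[[col]].groupby(col, as_index=False).size()


def filter_triplets(tp, min_uc=5, min_sc=0):
    # Keep items that have at least min_sc ratings.
    if min_sc > 0:
        itemcount = get_count(tp, 'movieId')
        keep_items = itemcount.loc[itemcount['size'] >= min_sc, 'movieId']
        tp = tp[tp['movieId'].isin(keep_items)]
    # Keep users that have at least min_uc ratings.
    if min_uc > 0:
        usercount = get_count(tp, 'userId')
        keep_users = usercount.loc[usercount['size'] >= min_uc, 'userId']
        tp = tp[tp['userId'].isin(keep_users)]
    # Recompute both counts after filtering, indexed by the actual ids.
    usercount = get_count(tp, 'userId').set_index('userId')
    itemcount = get_count(tp, 'movieId').set_index('movieId')
    return tp, usercount, itemcount


# Toy data (hypothetical): user 2 has a single rating and is dropped.
ratings = pd.DataFrame({
    'userId':  [1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    'movieId': [1, 2, 3, 1, 1, 2, 3, 2, 3, 4, 1, 3, 4],
})
tp, usercount, itemcount = filter_triplets(ratings, min_uc=3, min_sc=0)

unique_uid = usercount.index    # now holds the real user ids
print(unique_uid.tolist())      # [1, 3, 4, 5]
```

Because the counts are indexed by `userId` and `movieId` here, a later `unique_uid = user_activity.index` would pick up the real ids rather than row positions.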

@LiaoYunxi

Thanks for your advice~
