
[Question] Performance Issue on General Methods #753

Closed

johnny12150 opened this issue Mar 5, 2021 · 10 comments

Labels: question (Further information is requested)

johnny12150 commented Mar 5, 2021

I have tested some general methods such as Pop and ItemKNN on the Tmall dataset.
However, their top-k metrics seem a bit odd to me.
This is what I get with Pop and ItemKNN respectively.

Fri 05 Mar 2021 15:03:20 INFO valid result: 
recall@20 : 0.0413    mrr@20 : 0.1413    ndcg@20 : 0.0688    hit@20 : 0.3017    precision@20 : 0.0536    
Fri 05 Mar 2021 15:03:20 INFO Saving current best: saved/Pop-Mar-05-2021_15-00-52.pth
Fri 05 Mar 2021 15:03:20 INFO Loading model structure and parameters from saved/Pop-Mar-05-2021_15-00-52.pth
Fri 05 Mar 2021 15:04:09 INFO best valid result: {'recall@20': 0.0413, 'mrr@20': 0.1413, 'ndcg@20': 0.0688, 'hit@20': 0.3017, 'precision@20': 0.0536}
Fri 05 Mar 2021 15:04:09 INFO test result: {'recall@20': 0.032, 'mrr@20': 0.2363, 'ndcg@20': 0.1468, 'hit@20': 0.464, 'precision@20': 0.1399}
Fri 05 Mar 2021 15:10:54 INFO valid result: 
recall@20 : 0.2272    mrr@20 : 0.5338    ndcg@20 : 0.3037    hit@20 : 0.8744    precision@20 : 0.2156    
Fri 05 Mar 2021 15:10:54 INFO Saving current best: saved/ItemKNN-Mar-05-2021_15-07-31.pth
Fri 05 Mar 2021 15:10:54 INFO Loading model structure and parameters from saved/ItemKNN-Mar-05-2021_15-07-31.pth
Fri 05 Mar 2021 15:12:43 INFO best valid result: {'recall@20': 0.2272, 'mrr@20': 0.5338, 'ndcg@20': 0.3037, 'hit@20': 0.8744, 'precision@20': 0.2156}
Fri 05 Mar 2021 15:12:43 INFO test result: {'recall@20': 0.1657, 'mrr@20': 0.7576, 'ndcg@20': 0.5654, 'hit@20': 0.9826, 'precision@20': 0.5355}

I picked a paper that also uses this dataset; its reported results are in the picture below.
[image: results table from the paper]

The MRR and NDCG shouldn't be that high (about 10x higher than what most papers report).

By the way, this is my config in YAML.

USER_ID_FIELD: user_id
load_col:
  inter: [user_id, item_id, timestamp]
epochs: 30
topk: [20]
valid_metric: MRR@20
split_ratio: [0.7,0.1,0.2]
training_neg_sample_num: 100
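
For reference, a config like this is usually handed to RecBole's quick-start entry point; a minimal sketch (the file name tmall.yaml and the dataset name 'tmall' are my assumptions):

# Minimal runner for the config above -- a sketch, assuming the YAML is saved
# as tmall.yaml and the Tmall atomic files are registered as dataset 'tmall'.
from recbole.quick_start import run_recbole

# config_file_list merges the YAML over RecBole's defaults; model and dataset
# can be given here instead of in the YAML.
run_recbole(model='Pop', dataset='tmall', config_file_list=['tmall.yaml'])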
tsotfsk (Contributor) commented Mar 5, 2021

Hi @johnny12150, can you tell me which version of RecBole you are using?

Actually, we have fixed this bug twice, and the results of the rule-based models (Pop, ItemKNN) are affected, while neural network models are almost unaffected. You can find details in issues #699 and #622.

Let me give you an example to illustrate the changes. Suppose there are 5 items in our dataset and we evaluate the model by Recall@3. One user has just one ground-truth item, and the model's output is [0, 0, 0, 0, 0].

- Before PR #658, all items tie at rank 1, so we get Recall@3 = 1.
- After PR #658 and before PR #731, all items tie at rank 5, so we get Recall@3 = 0.
- After PR #731, Recall@3 is 0 or 1, because we randomly choose three of the five items as the recommendation.

The results you listed look like they came from the earliest version.
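
To make the tie handling concrete, here is a small illustrative sketch (plain Python, not RecBole's implementation):

import random

# Illustration of the tie-handling change (plain Python, not RecBole code):
# 5 items, all scored 0.0 by the model, one ground-truth item, Recall@3.
scores = [0.0] * 5
ground_truth = 2   # index of the single relevant item
k = 3

# Before PR #658: tied items all share the best rank (rank 1), so the
# ground truth always counts as recommended -> Recall@3 = 1.

# After PR #658 and before PR #731: tied items all share the worst rank
# (rank 5), so nothing makes the top-3 -> Recall@3 = 0.

# After PR #731: ties are broken at random, so the ground truth is picked
# with probability k/5 -> Recall@3 is 0 or 1 for each user.
random_topk = random.sample(range(len(scores)), k)
recall_at_3 = 1.0 if ground_truth in random_topk else 0.0
print(recall_at_3)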

johnny12150 (Author) commented Mar 6, 2021

I have tested it with the newest version released on pip and it seems the same.
I also ran print(recbole.__version__) to check that the version is correct.
The previous results were produced with the previous version.

Sat 06 Mar 2021 18:21:15 INFO best valid result: {'recall@5': 0.1116, 'recall@10': 0.1658, 'recall@20': 0.23, 'recall@50': 0.3315, 'mrr@5': 0.5435, 'mrr@10': 0.5568, 'mrr@20': 0.5624, 'mrr@50': 0.5647, 'ndcg@5': 0.3558, 'ndcg@10': 0.3043, 'ndcg@20': 0.2921, 'ndcg@50': 0.3172, 'hit@5': 0.7152, 'hit@10': 0.8138, 'hit@20': 0.8929, 'hit@50': 0.9603, 'precision@5': 0.3305, 'precision@10': 0.257, 'precision@20': 0.1854, 'precision@50': 0.111}
Sat 06 Mar 2021 18:21:15 INFO test result: {'recall@5': 0.0891, 'recall@10': 0.1412, 'recall@20': 0.2051, 'recall@50': 0.3078, 'mrr@5': 0.6739, 'mrr@10': 0.6837, 'mrr@20': 0.6872, 'mrr@50': 0.6883, 'ndcg@5': 0.488, 'ndcg@10': 0.4229, 'ndcg@20': 0.3571, 'ndcg@50': 0.354, 'hit@5': 0.8396, 'hit@10': 0.9114, 'hit@20': 0.9601, 'hit@50': 0.9916, 'precision@5': 0.4635, 'precision@10': 0.3811, 'precision@20': 0.2889, 'precision@50': 0.182}

The MRR is even higher with ItemKNN.
I assume the parameter training_neg_sample_num doesn't affect the candidate set the recommender can recommend from, right?

tsotfsk (Contributor) commented Mar 6, 2021

Thanks for the information. The version you used is between #658 and #731, so the results should not be that high. Did you download your dataset from our library RecDatasets? If so, which type are you using (click data or buy data)? If not, could you provide me with a copy? My e-mail is [email protected]

johnny12150 (Author) commented Mar 6, 2021

Yes, I have tried both click and buy, without removing duplicates.
I am going to try diginetica now, since this problem happened in the last version too.
If it's the same, then I will give yoochoose a shot.

tsotfsk (Contributor) commented Mar 6, 2021

OK. This dataset is so big that it took me an hour to test it, and my Pop result is very different from yours.

06 Mar 22:39    INFO best valid result: {'recall@20': 0.0191, 'mrr@20': 0.0175, 'ndcg@20': 0.0155, 'hit@20': 0.0255, 'precision@20': 0.0013}
06 Mar 22:39    INFO test result: {'recall@20': 0.028, 'mrr@20': 0.0349, 'ndcg@20': 0.0256, 'hit@20': 0.0477, 'precision@20': 0.0024}

This result was produced with the latest version. Maybe your settings are inconsistent with the paper's, or you are using a sample of the dataset. Please check, and I will test ItemKNN and let you know the result as soon as possible.

johnny12150 (Author) commented Mar 6, 2021

I reinstalled the package and tested the datasets again; all three datasets now match expectations with Pop.
However, ItemKNN still seems a little higher than expected, around 0.25 for MRR@20 on Tmall.

07 Mar 01:00    INFO best valid result: {'recall@5': 0.1857, 'recall@10': 0.2688, 'recall@20': 0.3655, 'recall@50': 0.5049, 'mrr@5': 0.17, 'mrr@10': 0.1838, 'mrr@20': 0.1914, 'mrr@50': 0.196, 'ndcg@5': 0.1457, 'ndcg@10': 0.1741, 'ndcg@20': 0.2021, 'ndcg@50': 0.2346, 'hit@5': 0.2849, 'hit@10': 0.3883, 'hit@20': 0.498, 'hit@50': 0.6389, 'precision@5': 0.068, 'precision@10': 0.0494, 'precision@20': 0.0339, 'precision@50': 0.019}
07 Mar 01:00    INFO test result: {'recall@5': 0.1964, 'recall@10': 0.282, 'recall@20': 0.3789, 'recall@50': 0.5126, 'mrr@5': 0.2236, 'mrr@10': 0.2381, 'mrr@20': 0.2455, 'mrr@50': 0.2495, 'ndcg@5': 0.1736, 'ndcg@10': 0.2005, 'ndcg@20': 0.2298, 'ndcg@50': 0.264, 'hit@5': 0.3569, 'hit@10': 0.4657, 'hit@20': 0.5722, 'hit@50': 0.6945, 'precision@5': 0.0958, 'precision@10': 0.0703, 'precision@20': 0.0481, 'precision@50': 0.0268}

2017pxy added the question (Further information is requested) label Mar 7, 2021
johnny12150 (Author) commented Mar 7, 2021

I tested ItemKNN with tmall-click and it has been stuck before training for more than 10 hours.

07 Mar 02:54    INFO Build [ModelType.TRADITIONAL] DataLoader for [evaluation] with format [InputType.POINTWISE]
07 Mar 02:54    INFO Evaluation Setting:
        Group by user_id
        Ordering: {'strategy': 'shuffle'}
        Splitting: {'strategy': 'by_ratio', 'ratios': [0.8, 0.1, 0.1]}
        Negative Sampling: {'strategy': 'full', 'distribution': 'uniform'}
07 Mar 02:54    INFO batch_size = [[100, 100]], shuffle = [False]

07 Mar 02:54    WARNING Batch size is changed to 2200292.
07 Mar 02:54    WARNING Batch size is changed to 2200292.

Is there any config setting I missed in the YAML file?
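
For what it's worth, the repeated warning looks like full-ranking evaluation at work rather than a missing setting. A sketch of the arithmetic I suspect is behind it (the rounding rule and the item count are my assumptions, not RecBole's actual source):

# Hypothetical reconstruction of the batch-size warning -- an assumption about
# the dataloader's rounding, not RecBole's actual code. Under full negative
# sampling ('strategy': 'full'), every user is scored against every item, so a
# batch must hold a whole number of users' full item lists.
eval_batch_size = 100      # from the YAML config
n_items = 2_200_292        # assumed item count of tmall-click (matches the log)

batch_num = max(eval_batch_size // n_items, 1)  # at least one full user per batch
adjusted_batch_size = batch_num * n_items
print(adjusted_batch_size)  # 2200292, the value in the WARNING lines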

EliverQ (Collaborator) commented Mar 7, 2021

> I tested ItemKNN with tmall-click and it has been stuck before training for more than 10 hours. […] Is there any config setting I missed in the YAML file?

Hi @johnny12150, could you please provide your complete YAML file? I'll test it and let you know the result as soon as possible.

johnny12150 (Author) commented Mar 7, 2021

@EliverQ This is the one I currently use.

USER_ID_FIELD: user_id
load_col:
  inter: [user_id, item_id, timestamp]
epochs: 30
train_batch_size: 100
eval_batch_size: 100  # batch size for validation and test
topk: [10, 20]
valid_metric: MRR@20
stopping_step: 5  # early-stopping patience (validation steps without improvement)
split_ratio: [0.8,0.1,0.1]

johnny12150 (Author) commented Mar 7, 2021

> I reinstalled the package and tested the datasets again; all three datasets now match expectations with Pop. However, ItemKNN still seems a little higher than expected, around 0.25 for MRR@20 on Tmall. […]

I used the code provided with a survey paper and the result is much lower.

https://github.com/rn5l/session-rec/blob/master/algorithms/knn/iknn.py
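
For context, both implementations reduce to item-to-item similarity scoring. A generic sketch of that shared idea (not the exact code of RecBole's ItemKNN or session-rec's iknn.py):

import numpy as np
from scipy.sparse import csr_matrix

# Generic item-based KNN scoring: items are compared by cosine similarity over
# the user-item matrix, and unseen items are scored by the summed similarity
# to the user's history. A sketch of the common idea only.
def item_knn_topk(interactions: csr_matrix, user: int, k: int = 20) -> np.ndarray:
    # column-normalize so that A^T A yields cosine similarity between items
    norms = np.sqrt(np.asarray(interactions.power(2).sum(axis=0))).ravel()
    norms[norms == 0] = 1.0
    normalized = interactions.multiply(1.0 / norms).tocsr()
    sim = (normalized.T @ normalized).toarray()   # item x item similarities
    np.fill_diagonal(sim, 0.0)                    # drop self-similarity
    history = interactions[user].indices          # items this user interacted with
    scores = sim[history].sum(axis=0)             # aggregate over the history
    scores[history] = -np.inf                     # never re-recommend seen items
    return np.argsort(scores)[::-1][:k]           # top-k item ids

# Tiny usage example on a toy 3-user x 4-item matrix:
toy = csr_matrix(np.array([[1, 1, 0, 0],
                           [1, 0, 1, 0],
                           [0, 1, 0, 1]], dtype=float))
print(item_knn_topk(toy, user=0, k=2))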
