
Overriding newest solution with oldest solution #22

Closed
rajatgoyal715 opened this issue Aug 4, 2019 · 2 comments · Fixed by #31
Labels
bug Something isn't working

Comments

@rajatgoyal715
Member

Right now, if two or more solutions to the same problem are found, we override the existing solution with the newly crawled one. But the crawling order is from newest to oldest, so we end up overriding the newest solution with an older one, which is not what we want.

Steps to reproduce the behavior:

  1. This is the query we are using to get the submissions: https://www.hackerrank.com/rest/contests/master/submissions/?offset=0&limit=100
  2. Here, we traverse through the list of submissions: https://github.com/Nullifiers/Hackerrank-Solution-Crawler/blob/master/hsc/crawler.py#L85

Solution:

  1. We can reverse the order of traversing.
  2. Or, we can get the submissions in reverse order, maybe by tweaking some query parameters.

We also need to handle the offset and limit feature in this case.
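The first option above (reversing the traversal order) could be sketched roughly as below. This is only an illustration, not the crawler's actual code: `challenge_slug` and `language` are assumed field names, not necessarily the real API schema.

```python
def pick_latest_solutions(submissions):
    """Keep only the newest solution per (problem, language) pair.

    `submissions` is assumed to be ordered newest-to-oldest, as the
    submissions endpoint returns them. Traversing in reverse means a
    newer solution naturally replaces an older one in the dict.
    """
    kept = {}
    for sub in reversed(submissions):  # oldest first
        key = (sub["challenge_slug"], sub["language"])
        kept[key] = sub  # a later (newer) submission overwrites an earlier one
    return kept
```

If submissions are fetched page by page with offset/limit, the pages would also need to be collected (or walked from the last page backwards) before reversing, since each page is itself newest-first.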

@rajatgoyal715 rajatgoyal715 added the bug Something isn't working label Aug 4, 2019
@rishabhsingh971
Member

Currently, we don't override the existing file, as you can see here. So in the same crawler session, a new file won't be overwritten by an old one.

@rishabhsingh971
Member

rishabhsingh971 commented Aug 4, 2019

If we crawl solutions and multiple accepted solutions (with the same file extension) are found for a problem, only the solution encountered first will be saved (the latest at that time).
But if we crawl again and an even newer solution is found, it also won't overwrite the file saved in the previous session.

Possible solutions:

  1. Store last crawl time and only overwrite if created_at of the current file is greater than last crawl time.
  2. Store created_at of each file and only overwrite if created_at of the current file is greater than the created_at of the last solution.

This can be done after #23
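Option 2 could be sketched as a small guard before writing a file. How the per-file `created_at` gets persisted between sessions (e.g. a metadata file) is an assumption here; this only shows the comparison itself:

```python
def should_overwrite(saved_created_at, new_created_at):
    """Overwrite only when the crawled solution is newer than the saved one.

    `saved_created_at` is None when no solution has been saved yet.
    Timestamps are assumed to be directly comparable (e.g. epoch seconds).
    """
    return saved_created_at is None or new_created_at > saved_created_at
```

With this guard, re-crawling is idempotent: an older submission seen later in the same session, or in a later session, never replaces a newer saved solution.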
