maybe add some advanced duplicate removals #29

Open
jensbri opened this issue Jul 12, 2024 · 0 comments

Comments


jensbri commented Jul 12, 2024

Some background:

  • In Option B, in the task that concatenates a df, the challenge was that I pressed the play button repeatedly and ended up concatenating onto my df over and over again.
  • Therefore, I wanted to check programmatically whether I had done that, i.e. find duplicates. In other words: if the core dataset has 245 rows and is concatenated with 500 rows, and I end up with 245+500+500 rows of which 500 are 100% duplicates, detecting that would be the main intended outcome.
  • What Ben mentioned in the first run was removing duplicates from the tips data by requiring that a subset of the columns is a 100% match.
  • This is probably an important task for researchers dealing with imperfect data collection: defining rules for which columns may contain NaN in a row that still counts as valid, and which may not.
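
To make the three cases above concrete, here is a minimal pandas sketch. The DataFrames, column names (`id`, `value`, `subject`, `day`, `score`, `note`), and the rule that `score` is required while `note` is optional are all made up for illustration; they are not taken from the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the core dataset (245 rows) and the new batch (500 rows).
core = pd.DataFrame({"id": range(245), "value": np.arange(245) * 0.5})
batch = pd.DataFrame({"id": range(1000, 1500), "value": np.arange(500) * 2.0})

# Simulate pressing the play button twice on the concat cell.
df = pd.concat([core, batch], ignore_index=True)
df = pd.concat([df, batch], ignore_index=True)

# 1) Exact duplicates: rows that match an earlier row in every column.
n_exact_dupes = int(df.duplicated().sum())
deduped = df.drop_duplicates(ignore_index=True)

# 2) Duplicates on a subset of columns: two rows count as the same
#    record if the (assumed) key columns match, whatever the rest says.
records = pd.DataFrame({
    "subject": ["a", "a", "b", "c"],
    "day": [1, 1, 1, 2],
    "score": [10.0, 11.0, 7.0, np.nan],
    "note": ["ok", "retyped", None, "ok"],
})
by_key = records.drop_duplicates(subset=["subject", "day"], keep="first")

# 3) NaN rules: assume 'score' is required but 'note' may be missing.
#    Only rows with NaN in a required column are dropped.
valid = records.dropna(subset=["score"])

print(len(df), n_exact_dupes, len(deduped))
print(by_key)
print(valid)
```

With `keep="first"` the earliest occurrence survives; `keep=False` would instead drop every row that has a twin, which may be the safer choice when it is unclear which copy is correct.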