Improve performance of reading files with duplicate column names
I need to load a file with 30k columns, 10k of which have the same name. Currently this is practically impossible because makeunique(), which produces unique column names, has cubic complexity. This change switches the algorithm to use a Set and a Dict to quickly check whether a column name already exists and to cache the last numeric suffix used to uniquify each name. Care has been taken to ensure that columns are named the same way as before; to that end, additional tests were added in the previous commit.
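For illustration, a minimal Julia sketch of the Set/Dict approach described above; the function name dedup_names, its signature, and the "_<n>" suffix format are assumptions for this example, not the library's actual makeunique implementation:

```julia
# Hypothetical sketch: deduplicate column names using a Set of emitted names
# and a Dict caching the last suffix tried per base name, so each duplicate
# is resolved in (amortized) constant time instead of rescanning all columns.
function dedup_names(names::Vector{String})
    seen = Set{String}()             # names already emitted
    lastsuffix = Dict{String,Int}()  # last numeric suffix used per base name
    out = Vector{String}(undef, length(names))
    for (i, name) in enumerate(names)
        if name in seen
            k = get(lastsuffix, name, 0) + 1
            candidate = string(name, "_", k)
            # advance the suffix until the candidate itself is unused
            while candidate in seen
                k += 1
                candidate = string(name, "_", k)
            end
            lastsuffix[name] = k
            out[i] = candidate
            push!(seen, candidate)
        else
            out[i] = name
            push!(seen, name)
        end
    end
    return out
end

# Example: dedup_names(["x", "x", "x"]) == ["x", "x_1", "x_2"]
```

Caching the last suffix per base name is what avoids restarting the suffix search from 1 for every duplicate, which is the source of the superlinear blow-up when thousands of columns share one name.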