At the moment, we have three rules that do more or less the same thing, i.e. find duplicate items by one or more columns:
https://github.com/scrapinghub/arche/blob/master/src/arche/rules/duplicates.py
find_by (multiple columns support, "Duplicates")
find_by_unique (one field, "Uniqueness/Duplicates By unique Tag")
find_by_name_url (tags, named "Duplicated Items/Duplicates By name_field, product_url_field Tags")
That's confusing. As part of #123, I propose moving from tags to arguments, which removes the need for tags and makes the last two rules obsolete.
Arche(uniques=["url", ("id", "color")])
a single entry such as "url" will check that the column contains only unique values
a tuple such as ("id", "color") will check that every row has a unique combination of id and color
@peonone
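To make the proposed semantics concrete, here is a minimal sketch of how a `uniques` argument could be interpreted. It is not Arche's implementation: `find_duplicates` and `UniqueSpec` are hypothetical names, and plain dicts stand in for the items so the example stays self-contained.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple, Union

# Hypothetical type for one `uniques` entry: a column name or a tuple of column names.
UniqueSpec = Union[str, Tuple[str, ...]]


def find_duplicates(
    items: Sequence[Dict[str, Any]], uniques: Sequence[UniqueSpec]
) -> Dict[UniqueSpec, Dict[Tuple[Any, ...], List[int]]]:
    """For each entry in `uniques`, map every duplicated value to the row indexes sharing it."""
    result = {}
    for spec in uniques:
        # A single string means one column; a tuple means a combination of columns.
        columns = (spec,) if isinstance(spec, str) else tuple(spec)
        groups = defaultdict(list)
        for index, item in enumerate(items):
            key = tuple(item.get(column) for column in columns)
            groups[key].append(index)
        # Keep only the values that occur in more than one row.
        result[spec] = {key: rows for key, rows in groups.items() if len(rows) > 1}
    return result


items = [
    {"url": "a.com/1", "id": 1, "color": "red"},
    {"url": "a.com/2", "id": 1, "color": "blue"},
    {"url": "a.com/2", "id": 2, "color": "green"},
    {"url": "a.com/3", "id": 2, "color": "green"},
]
duplicates = find_duplicates(items, ["url", ("id", "color")])
```

With this sample data, `duplicates["url"]` flags rows 1 and 2, while `duplicates[("id", "color")]` flags rows 2 and 3. The `id` value 1 repeats on its own, but it is not reported because only the combination is checked.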