Tags are not deduplicated before saving the page #23

Closed
Truncated opened this issue May 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

Truncated commented May 9, 2024

Summary

Some websites put two or three sets of keyword metadata in the page header. Sometimes the sets are identical, sometimes not. Could a de-dupe routine be added that runs over the final list of found tags, maybe using a.filter(onlyUnique); as described here, or the distinct method?
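
For reference, a minimal sketch of the a.filter(onlyUnique) idea. The onlyUnique predicate and the foundTags sample are illustrative assumptions, not code from the plugin. A Set-based one-liner is shown next to it as a swapped-in alternative that gives the same result for strings.

```typescript
// Predicate for Array.prototype.filter: keep a value only where it first occurs.
function onlyUnique<T>(value: T, index: number, array: T[]): boolean {
  return array.indexOf(value) === index;
}

// Hypothetical input: tags collected from several meta elements.
const foundTags: string[] = ["AI", "technology", "AI", "jobs", "technology"];

// Approach 1: filter with the predicate (quadratic, fine for short tag lists).
const viaFilter: string[] = foundTags.filter(onlyUnique);

// Approach 2: a Set keeps insertion order and drops exact repeats.
const viaSet: string[] = Array.from(new Set(foundTags));

console.log(viaFilter); // [ 'AI', 'technology', 'jobs' ]
console.log(viaSet);    // [ 'AI', 'technology', 'jobs' ]
```

Both approaches compare strings exactly, so "AI" and "ai" would still count as different tags.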

Details:

Their header metadata appears to follow a "throw it against the wall" approach:

Example: https://www.forbes.com/sites/jiawertz/2024/05/07/ai-can-boost-solopreneurs-productivity-by-40/?sh=234850ea4cd1

<meta name="keywords" itemprop="keywords" content="Generative AI,AI,artificial intelligence,solopreneur,automation,chatGPT,content creation,personalization">
<meta name="news_keywords" itemprop="keywords" content="Generative AI,AI,artificial intelligence,solopreneur,automation,chatGPT,content creation,personalization">
.... further down...
<meta name="news_keywords" itemprop="keywords" content="Generative AI,AI,artificial intelligence,solopreneur,automation,chatGPT,content creation,personalization">

At least Forbes is consistent... economictimes mixes it up:
Example: https://m.economictimes.com/small-biz/sme-sector/how-software-and-it-jobs-are-disappearing-in-favour-of-ai-and-what-is-going-to-fill-that-vacuum/amp_articleshow/109640608.cms

<meta name="news_keywords" content="startups,Small Business,AI,technology,IT,skills,workforce,jobs,future">
<meta content="AI,technology,IT,skills,workforce,jobs,future" name="keywords">
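
A rough sketch of how the two economictimes lists could be merged into one de-duplicated set. The mergeKeywords helper is hypothetical, and trimming plus case-insensitive matching are assumptions here, not a description of how the plugin actually handles tags.

```typescript
// Hypothetical helper: merge several comma-separated keyword strings into one
// list, dropping blanks and case-insensitive repeats (the first spelling wins).
function mergeKeywords(contents: string[]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const content of contents) {
    for (const raw of content.split(",")) {
      const tag = raw.trim();
      const key = tag.toLowerCase();
      // Skip empty entries and anything already seen under another casing.
      if (tag.length === 0 || seen.has(key)) continue;
      seen.add(key);
      merged.push(tag);
    }
  }
  return merged;
}

// The two content attributes from the economictimes example above.
const economicTimesTags: string[] = mergeKeywords([
  "startups,Small Business,AI,technology,IT,skills,workforce,jobs,future",
  "AI,technology,IT,skills,workforce,jobs,future",
]);

console.log(economicTimesTags);
// [ 'startups', 'Small Business', 'AI', 'technology', 'IT',
//   'skills', 'workforce', 'jobs', 'future' ]
```

One trade-off of case-insensitive matching: a keyword like "IT" would be treated as a repeat of a lowercase "it" if a page ever listed both.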
inhumantsar added the bug label May 9, 2024
inhumantsar changed the title from "Tripled Tags... what are they good for... absolutely nothing!" to "Tags are not deduplicated before saving the page" May 9, 2024
inhumantsar added a commit that referenced this issue May 11, 2024
- fix: refactor new note modal, add validation (#23)
- fix: remove broken github link and useless log refresh button
- fix: avoid saving settings if no changes are detected
inhumantsar added a commit that referenced this issue Jun 8, 2024
inhumantsar added a commit that referenced this issue Jun 8, 2024
- fix: strip hash marks from file names (#20)
- fix: clean and dedupe tags (#23 #27)
- fix: ensure slurpedTime is formatted with its settings template (#33)
@inhumantsar
Owner

should be fixed in 0.1.12!
