Archiving URL references #14

Open
zmanion opened this issue Nov 30, 2022 · 11 comments

Comments

@zmanion
Contributor

zmanion commented Nov 30, 2022

Link rot is a problem; how serious is it? Vulnerability information is often conveyed on social media (e.g., Twitter/Mastodon posts), which is typically more ephemeral than other types of references. What options do we have? archive.org and the Library of Congress? wget or some other in-house solution?

CC @todb

@kurtseifried

kurtseifried commented Nov 30, 2022 via email

@todb

todb commented Nov 30, 2022

Thanks @kurtseifried ! Yeah I'll poke you if I end up on the same path with them.

@todb

todb commented Nov 30, 2022

For references' sake, I did grab all the extant Twitter references a couple weeks ago when Twitter doom was becoming obvious -- they're now stashed on https://archive.today

https://github.com/todb/junkdrawer/blob/master/cve-twitter-refs/archives.csv

@todb-r7

todb-r7 commented Dec 6, 2022

I've put out a call for help to the US Library of Congress, which I suspect is a more stable institution than my current solution of archive.today. I'll hit up archive.org next, but I suspect that the LoC's Web Archive Team would be all over this. In the grand scheme of things, it's not an impossibly huge list of references to archive (several thousand but not several million).

@zmanion
Contributor Author

zmanion commented Jan 5, 2023

A rare but real case brought up in CNA Slack: URLs might be videos, which are often short-lived (they get flagged, made private, etc.).

@todb-r7

todb-r7 commented Jan 5, 2023

Just as a quick update, I'm actually in touch with both archive.org and LoC people, so stay tuned. Holidays slowed down comms but I expect that to pick up again!

@zmanion
Contributor Author

zmanion commented Mar 2, 2023

I have ArchiveBox running, and while further testing, use, and discussion are needed, so far it looks like a reasonable self-hosted option.

@zmanion
Contributor Author

zmanion commented Mar 2, 2023

> A rare but real case brought up in CNA Slack: URLs might be videos, which are often short-lived (they get flagged, made private, etc.).

I really dislike videos as (primary) vulnerability reports, but ArchiveBox supports grabbing videos.

@zmanion
Contributor Author

zmanion commented Mar 2, 2023

> Just as a quick update, I'm actually in touch with both archive.org and LoC people, so stay tuned. Holidays slowed down comms but I expect that to pick up again!

I think you mentioned that the archive.org and LoC options will not work?

One idea I had early on was just to submit every reference to archive.org. Pay for an API key/sufficient rate limits if needed.
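
A minimal sketch of the "submit every reference" idea, assuming the Wayback Machine's public Save Page Now URL form (`https://web.archive.org/save/<url>`) and its availability lookup (`https://archive.org/wayback/available?url=...`). The function names and the `delay_seconds` pacing value are hypothetical, chosen for illustration; real rate limits would have to be confirmed with archive.org. The sketch only builds the request plan and makes no network calls.

```python
from urllib.parse import urldefrag, urlencode

# Assumed public endpoints; confirm current behavior and limits with archive.org.
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_url(reference: str) -> str:
    """Build the Save Page Now URL for one reference.

    The fragment (#...) is dropped because it is never sent to a server
    and would otherwise be read as a fragment of the save URL itself.
    """
    cleaned, _fragment = urldefrag(reference.strip())
    return SAVE_ENDPOINT + cleaned


def availability_url(reference: str) -> str:
    """Build the lookup URL that reports whether a snapshot already exists."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": reference.strip()})


def plan_submissions(references, delay_seconds=5.0):
    """Return (reference, save_url, start_offset_seconds) for each unique reference.

    delay_seconds is a hypothetical pacing value, not a documented limit.
    """
    seen = set()
    plan = []
    for reference in references:
        cleaned, _fragment = urldefrag(reference.strip())
        if not cleaned or cleaned in seen:
            continue  # skip blanks and duplicates
        seen.add(cleaned)
        plan.append((cleaned, save_url(cleaned), len(plan) * delay_seconds))
    return plan
```

Feeding the plan to an HTTP client that sleeps until each offset would keep submissions spaced out; checking `availability_url` first would avoid re-submitting references that already have a snapshot.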

@zmanion
Contributor Author

zmanion commented Mar 17, 2023

@todb-cisa

ArchiveTeam did a thing!

https://wiki.archiveteam.org/index.php/CVE_References

@zmanion zmanion moved this to Backlog in CVE Oct 5, 2024
@zmanion zmanion added this to CVE Oct 5, 2024

5 participants