Archiving URL references #14

zmanion · 2022-11-30T14:58:06Z

Link rot is a problem; how serious is it? Vulnerability information is often conveyed on social media (e.g., Twitter/Mastodon posts), which is typically more ephemeral than other types of references. What options do we have? archive.org and the Library of Congress? wget or some other in-house solution?

CC @todb


kurtseifried · 2022-11-30T17:33:19Z

About 50% of cve.org URL data is dead, either dead dead or pointing at something like a marketing page. This is after all the sun.com and similar references were removed. I grabbed two domains that had expired and are listed in the data set. You also need to archive references for the simple reason that what you downloaded and processed may not be what I downloaded and processed, assuming the site is even up and running. E.g., hunter.dev was down for a few days one time while I wanted to get some data from it. Also, I contacted archive.org sales (they do private/public custom archives/service/etc.) twice and started a sales process, but they went dark and I gave up. So good luck with that; if you do manage to get their sales people to sell something, please let me know who so I can contact them.
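To make the "what you downloaded may not be what I downloaded" point concrete, here is a minimal sketch of recording a fingerprint alongside each archived reference so two parties can compare copies. The `Snapshot` record and the `fingerprint` and `same_content` helpers are hypothetical names for illustration, not part of any existing CVE tooling.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record describing one fetched copy of a reference."""
    url: str
    fetched_at: str  # ISO 8601 timestamp, UTC
    sha256: str      # hex digest of the raw response body
    size: int        # body length in bytes


def fingerprint(url: str, body: bytes, fetched_at: Optional[datetime] = None) -> Snapshot:
    """Build a Snapshot for a response body that was already downloaded."""
    when = fetched_at or datetime.now(timezone.utc)
    return Snapshot(
        url=url,
        fetched_at=when.isoformat(),
        sha256=hashlib.sha256(body).hexdigest(),
        size=len(body),
    )


def same_content(a: Snapshot, b: Snapshot) -> bool:
    """True when two snapshots hold byte-identical bodies."""
    return a.sha256 == b.sha256
```

Comparing raw bytes is deliberately strict: pages with rotating ads or timestamps will differ on every fetch, so a real pipeline would probably need some normalization before hashing.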


todb · 2022-11-30T19:22:29Z

Thanks @kurtseifried ! Yeah I'll poke you if I end up on the same path with them.

todb · 2022-11-30T20:37:45Z

For references' sake, I did grab all the extant Twitter references a couple weeks ago when Twitter doom was becoming obvious -- they're now stashed on https://archive.today

https://github.com/todb/junkdrawer/blob/master/cve-twitter-refs/archives.csv

todb-r7 · 2022-12-06T17:33:37Z

I've put out a call for help to the US Library of Congress, which I suspect is a more stable institution than my current solution of archive.today. I'll hit up archive.org next, but I suspect that the LoC's Web Archive Team would be all over this. In the grand scheme of things, it's not an impossibly huge list of references to archive (several thousand but not several million).

zmanion · 2023-01-05T15:54:16Z

A rare but real case brought up in CNA Slack: URLs might be videos, which are often short-lived (get flagged, made private, etc.).

todb-r7 · 2023-01-05T16:04:52Z

Just as a quick update, I'm actually in touch with both archive.org and LoC people, so stay tuned. Holidays slowed down comms but I expect that to pick up again!

zmanion · 2023-03-02T20:00:31Z

I have ArchiveBox running and while further testing, use, and discussion is needed, so far it looks like a reasonable self-hosted option.

zmanion · 2023-03-02T20:02:02Z

> A rare but real case brought up in CNA Slack: URLs might be videos, which are often short-lived (get flagged, made private, etc.).

I really dislike videos as (primary) vulnerability reports, but ArchiveBox supports grabbing videos.

zmanion · 2023-03-02T20:03:07Z

> Just as a quick update, I'm actually in touch with both archive.org and LoC people, so stay tuned. Holidays slowed down comms but I expect that to pick up again!

I think you mentioned that the archive.org and LoC options will not work?

One idea I had early on was just to submit every reference to archive.org. Pay for an API key/sufficient rate limits if needed.
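A rough sketch of the "submit every reference" idea, assuming the public Wayback Machine availability API (`https://archive.org/wayback/available`) and the unauthenticated save endpoint (`https://web.archive.org/save/`). The function names, the User-Agent string, and the 5-second delay are placeholders chosen for illustration; actual rate limits and whether an authenticated API key is required at this volume would need to be confirmed with archive.org.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, Optional

AVAILABILITY_API = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_url(ref: str) -> str:
    """URL that asks the Wayback Machine whether `ref` has a snapshot."""
    return AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": ref})


def save_url(ref: str) -> str:
    """URL that requests a fresh capture of `ref`."""
    return SAVE_ENDPOINT + ref


def closest_snapshot(payload: dict) -> Optional[str]:
    """Extract the closest snapshot URL from an availability response."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Plain GET; the User-Agent value is a placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": "cve-ref-archiver-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def archive_missing(
    refs: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
    delay: float = 5.0,  # assumed polite pause between captures
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Map each reference to its snapshot URL, or "submitted" if a capture was requested."""
    results: Dict[str, str] = {}
    for ref in refs:
        payload = json.loads(fetch(availability_url(ref)))
        snapshot = closest_snapshot(payload)
        if snapshot is None:
            fetch(save_url(ref))  # request a capture only when none exists
            results[ref] = "submitted"
            sleep(delay)
        else:
            results[ref] = snapshot
    return results
```

Passing `fetch` and `sleep` as parameters keeps the loop testable offline and makes it easy to swap in an authenticated client later. Error handling and retries are left out; a real run over several thousand references would need both.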

zmanion · 2023-03-17T21:23:14Z

https://github.com/todb/cve-archive

todb-cisa · 2023-06-29T16:30:05Z

ArchiveTeam did a thing!

https://wiki.archiveteam.org/index.php/CVE_References

zmanion moved this to Backlog in CVE Oct 5, 2024

zmanion added this to CVE Oct 5, 2024
