Stabilize the JSON format #101

munntjlx · 2023-12-11T19:34:29Z

Is your feature request related to a problem? Please describe.
We are using noseyparker as part of a 'defect dojo' secret tracking and discovery process. We use the JSONL format, but it seems to change arbitrarily, even between minor versions. This forces us to modify our parser scripts (for defect dojo), since our unit tests fail whenever the JSON format changes.

Describe the solution you'd like
Can we stabilize the JSON format, or agree to only change it on x versions? This would give derivative works and programs a 'stable' base upon which to build.

Describe alternatives you've considered
Perhaps change the JSON format only every .2 or .4 version increment? Or use an 'odd' vs. 'even' versioning scheme, which would give us some stability in the JSON format?

Additional context
Just to note this is a GREAT project and we really appreciate the work you are doing!

bradlarsen · 2023-12-11T20:15:35Z

Thanks for the request @munntjlx. Yes, this is a worthy task. See #72 for related.

I have tried with mixed success up to this point to only add fields to the JSON format. With a lenient parser — one that ignores unknown additional fields — the core information should still be parseable in the presence of additions, without having to change the parser.
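A minimal sketch of such a lenient parser in Python, assuming each JSONL line is one finding object carrying `rule_name` and `finding_id` keys. Those field names, the `Finding` class, and `parse_findings` are hypothetical illustrations and should be checked against the schema for the Nosey Parker version in use.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Finding:
    # Hypothetical subset of fields a downstream tool might care about.
    rule_name: str
    finding_id: Optional[str] = None


def parse_findings(lines: Iterable[str]) -> Iterator[Finding]:
    """Leniently parse JSONL report lines, ignoring unknown fields."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        # Pick out only the keys we need. Extra keys added in newer
        # releases are never read, so they cannot break parsing.
        yield Finding(
            rule_name=record["rule_name"],        # assumed required field
            finding_id=record.get("finding_id"),  # assumed optional field
        )
```

With this approach a field added in a new release is silently ignored, while a renamed or removed required field still fails loudly with a `KeyError`, which is the kind of change a unit test should catch.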

It may take another release or three, but this is in my plans. (Relatedly, I'd like to also stabilize the SQL schema for the datastore, but that will likely take longer.)

bradlarsen · 2023-12-11T20:22:50Z

P.S. @munntjlx out of curiosity, are your defect dojo parser scripts proprietary, or are they part of the DefectDojo project?

munntjlx · 2023-12-11T21:11:11Z

They will eventually be part of the main project. Tanvi is working on these, which we hope to mainline once they get around to importing the parser.

bradlarsen · 2023-12-11T22:23:26Z

Anyway, aside from trying hard to not change the JSON output between releases, the place to start with stabilization is to write a schema and validate against that (at least in testing): #72.

bradlarsen · 2024-02-16T23:43:14Z

@munntjlx there are more changes to the JSON report coming in the next release (#122). Some fundamentals have changed in the Nosey Parker data model that require some visible changes to the JSON report format.

I'm hoping that these changes are the last major ones, and that future format changes will be infrequent and involve only adding new fields.

I'm also hoping to put together a JSON Schema definition of the report format for the next release. This should help both with documenting the format and with identifying changes to it.

This is a big PR that makes a number of significant changes to the Nosey Parker data model.

- The minimum supported Rust version has been changed from 1.70 to 1.76.
- The data model and datastore have been significantly overhauled:
  - The rules used during scanning are now explicitly recorded in the datastore. Each rule is additionally accompanied by a content-based identifier that uniquely identifies the rule based on its pattern.
  - Each match is now associated with the rule that produced it, rather than just the rule's name (which can change as rules are modified).
  - Each match is now assigned a unique content-based identifier.
  - Findings (i.e., groups of matches with the same capture groups, produced by the same rule) are now represented explicitly in the datastore. Each finding is assigned a unique content-based identifier.
  - Now, each time a rule matches, a single match object is produced. Each match in the datastore is now associated with an array of capture groups. Previously, a rule whose pattern had multiple capture groups would produce one match object for each group, with each one being associated with a single capture group.
  - Provenance metadata for blobs is recorded in a much simpler way than before. The new representation explicitly records file and git-based provenance, but also adds explicit support for _extensible_ provenance. This change will make it possible in the future to have Nosey Parker scan and usefully report blobs produced by custom input data enumerators (e.g., a Python script that lists files from the Common Crawl WARC files).
  - Scores are now associated with matches instead of findings.
  - Comments can now be associated with both matches and findings, instead of just findings.
- The JSON and JSONL report formats have changed. These will stabilize in a future release ([#101](#101)).
  - The `matching_input` field for matches has been removed and replaced with a new `groups` field, which contains an array of base64-encoded bytestrings.
  - Each match now includes additional `rule_text_id`, `rule_structural_id`, and `structural_id` fields.
  - The `provenance` field of each match is now slightly different.
- Schema migration of older Nosey Parker datastores is no longer performed. Previously, this would automatically and silently be done when opening a datastore from an older version. Explicit support for datastore migration may be added back in a future release.
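Because the new `groups` field holds base64-encoded bytestrings, a consumer has to decode them before displaying or comparing secrets. A minimal Python sketch, assuming a match object is available as a parsed `dict` with an optional `groups` list; `decode_groups` and `groups_as_text` are hypothetical helper names.

```python
import base64
from typing import List


def decode_groups(match: dict) -> List[bytes]:
    """Decode the base64-encoded capture groups of one match object."""
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return [base64.b64decode(g, validate=True) for g in match.get("groups", [])]


def groups_as_text(match: dict) -> List[str]:
    """Render decoded groups for display; the bytes need not be valid UTF-8."""
    # Undecodable bytes become U+FFFD rather than raising an error.
    return [g.decode("utf-8", errors="replace") for g in decode_groups(match)]
```

Keeping the raw bytes and decoding to text only at display time avoids corrupting secrets that are not valid UTF-8.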

munntjlx · 2024-02-20T13:57:55Z

Thanks for the heads up. We have been standardizing on 'specific' versions to help reduce the workload associated with schema changes. A JSON schema would go a long way toward helping us 'stabilize' final version formats. Thomas

bradlarsen · 2024-03-07T21:56:55Z

> Describe the solution you'd like
> Can we stabilize the json format or agree to only change it on x versions? This would make derivitave works or programs having a 'stable' base upon which to build.

Possible changes to the JSON format will become less frequent as Nosey Parker becomes more mature.

Though I'm not going to promise not to change the JSON format at this point, I will provide an updated JSON schema and an announcement in the release notes when this does happen. The JSON schemas for different versions can be diffed to understand what has changed. (You can get the JSON schema from the v0.17.0 release, or by using the new `noseyparker generate json-schema` command.)

I don't have any changes that I want to make to the JSON format at the moment. I suspect that future modifications to it would be in the form of additional data rather than renaming fields or changing its organization.
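Diffing the published schemas can be partly automated. The Python sketch below lists the property paths added or removed between two JSON Schema documents, for example ones saved from `noseyparker generate json-schema` for two releases. An empty `removed` list matches the expectation that future changes mostly add data. It compares only declared property names, not types or `required` lists, so it is a coarse first check rather than a full schema diff; `property_paths` and `diff_schemas` are hypothetical names.

```python
from typing import Any, List, Set, Tuple


def property_paths(node: Any, pointer: str = "") -> Set[str]:
    """Collect a pointer-like path for every property declared in a JSON Schema.

    Walks the whole document (including `$defs`, `items`, `anyOf`, ...),
    so nested objects and shared definitions are covered. Assumes no
    property is itself named "properties".
    """
    paths: Set[str] = set()
    if isinstance(node, dict):
        props = node.get("properties")
        if isinstance(props, dict):
            for name in props:
                paths.add(f"{pointer}/properties/{name}")
        for key, value in node.items():
            paths |= property_paths(value, f"{pointer}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            paths |= property_paths(item, f"{pointer}/{index}")
    return paths


def diff_schemas(old: dict, new: dict) -> Tuple[List[str], List[str]]:
    """Return (added, removed) property paths between two schema versions."""
    old_paths, new_paths = property_paths(old), property_paths(new)
    return sorted(new_paths - old_paths), sorted(old_paths - new_paths)
```

To use it, load each saved schema file with `json.load` and call `diff_schemas(old, new)`; a non-empty `removed` list flags a change that is likely to break an existing parser.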

munntjlx · 2024-03-08T05:32:12Z

Our Nosey Parker decoder has been added to the main DefectDojo project. It probably needs to be updated to handle output from versions beyond v0.16.

munntjlx · 2024-03-08T23:12:45Z

Additions would be preferable!

munntjlx added the enhancement New feature or request label Dec 11, 2023

bradlarsen added documentation Improvements or additions to documentation reporting Related to reporting of findings labels Dec 11, 2023

bradlarsen mentioned this issue Feb 16, 2024

Simplify and enhance the datastore #122

Merged

bradlarsen closed this as completed Mar 7, 2024