Is this project still maintained or abandoned #137

kmrsh · 2021-02-06T12:26:38Z

We're planning to use this library in our project and would like to know whether it is still maintained. Could we also get some clarification on its usage?

Can we generate JSON output from the dom, or is there a guide on how to do it?
Can we parse the template once and reuse it (or serialize it to disk), so that later data files can be parsed without re-parsing the template?


d0c-s4vage · 2021-02-08T02:18:41Z

Hi! I wouldn't say that this project is unmaintained so much as not high on my priority list at the moment.

To answer your questions:

Can we generate JSON output from the dom, or is there a guide on how to do it?

Yes, you could! You would have to implement it yourself, but the dom structure is iterable - you could recurse into each substructure/etc.

Do you know what type of information you would want in the JSON document? I'm assuming you'd want pretty much all metadata available. I may be able to add this as part of working on #136.
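A minimal sketch of the recursion described above. The `Struct`, `Array`, and `Leaf` classes are hypothetical stand-ins for a parsed dom, not pfp's field classes; with a real dom, the same walk would rely on whatever iteration and value accessors the pfp fields actually expose.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-ins for a parsed dom; these are NOT pfp classes.
@dataclass
class Leaf:
    value: Any


@dataclass
class Array:
    items: List[Any] = field(default_factory=list)


@dataclass
class Struct:
    children: Dict[str, Any] = field(default_factory=dict)


def to_plain(node):
    """Recursively convert a node tree into dicts, lists, and scalars."""
    if isinstance(node, Struct):
        return {name: to_plain(child) for name, child in node.children.items()}
    if isinstance(node, Array):
        return [to_plain(item) for item in node.items]
    value = node.value
    # Raw bytes are not JSON-serializable, so render them as hex text.
    return value.hex() if isinstance(value, bytes) else value


dom = Struct({
    "fileheader": Struct({"MainVersion": Leaf(1), "SubVersion": Leaf(0)}),
    "EventRecord": Array([Struct({"EventId": Leaf(4870)})]),
})
print(json.dumps(to_plain(dom), indent=4))
```

Separating the walk (`to_plain`) from the serialization (`json.dumps`) means the same plain tree could be fed to other writers later.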

d0c-s4vage · 2021-02-08T02:19:28Z

Can we parse the template once and reuse it (or serialize it to disk), so that later data files can be parsed without re-parsing the template?

Hmm, you should be able to pickle the AST that results from parsing the template so that you can load it directly again without the template.
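A rough sketch of that pickle-based caching idea, using only the standard library. `parse_template` is a hypothetical stand-in for the expensive template parse and returns a plain dict here; whether the real AST pickles cleanly depends on every object it contains.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return the cached object if present, else build it and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    obj = build()
    with open(cache_path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return obj


# Hypothetical stand-in for the expensive template parse; a real caller
# would return the AST produced by the template parser instead.
calls = []


def parse_template():
    calls.append(1)
    return {"type": "FileAST", "decls": ["fileheader", "EventRecord"]}


cache = os.path.join(tempfile.mkdtemp(), "template.ast.pickle")
first = load_or_build(cache, parse_template)   # parses and writes the cache
second = load_or_build(cache, parse_template)  # loads from disk, no parse
```

Pickle files should only be loaded from locations you control, since unpickling untrusted data can execute arbitrary code.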

d0c-s4vage · 2021-02-08T02:19:59Z

Expect some movement on this this week - I have a number of items queued up for pfp/py010parser.

kmrsh · 2021-02-08T02:51:48Z

Hi! I wouldn't say that this project is unmaintained so much as not high on my priority list at the moment.

To answer your questions:

Can we generate JSON output from the dom, or is there a guide on how to do it?

Yes, you could! You would have to implement it yourself, but the dom structure is iterable - you could recurse into each substructure/etc.

Do you know what type of information you would want in the JSON document? I'm assuming you'd want pretty much all metadata available. I may be able to add this as part of working on #136.

I'm really impressed with your reply. This is the best library I have found so far that works easily with 010 templates, so I was a bit worried about using it, as there have been no updates since the middle of last year.

Yes, I simply want to output the same information that is printed on the screen when we invoke dom._pfp__show. It would also be ideal to provide an interface for adding custom converters, e.g. to XML, YAML, JSON, or any other format a user wishes to write a converter for.
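One way such a converter interface could look, sketched as a small registry. Everything here (`CONVERTERS`, `converter`, `convert`) is hypothetical and not part of pfp; the input is assumed to be a plain dict/list tree already extracted from the dom.

```python
import json
import xml.etree.ElementTree as ET

CONVERTERS = {}


def converter(name):
    """Register a function that turns a plain dict tree into text."""
    def register(func):
        CONVERTERS[name] = func
        return func
    return register


@converter("json")
def to_json(tree):
    return json.dumps(tree, indent=4)


def _fill(parent, value):
    # Mirror dicts as child elements, lists as <item> elements, scalars as text.
    if isinstance(value, dict):
        for key, child in value.items():
            _fill(ET.SubElement(parent, key), child)
    elif isinstance(value, list):
        for child in value:
            _fill(ET.SubElement(parent, "item"), child)
    else:
        parent.text = str(value)


@converter("xml")
def to_xml(tree):
    root = ET.Element("dom")
    _fill(root, tree)
    return ET.tostring(root, encoding="unicode")


def convert(tree, fmt):
    """Look up the converter registered for fmt and apply it."""
    return CONVERTERS[fmt](tree)


tree = {"fileheader": {"MainVersion": 1, "SubVersion": 0}}
print(convert(tree, "xml"))
```

A user-written format would only need another function decorated with `@converter("name")`.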

Believe it or not, these binary files are heavily used in the telco industry, and some companies use 010 templates to define the schema. A converter like pfp that produces JSON output would be ideal, so that the data can be processed further in a big data environment.

"Hmm, you should be able to pickle the AST that results from parsing the template so that you can load it directly again without the template."

Well, I tried, but it failed with this error:

TypeError: cannot pickle '_regex.Match' object

This is extremely useful in a high-velocity environment where thousands of files are consumed; parsing the template every time is unnecessary overhead. If we could parse the template once and persist it in memory, that would be ideal.
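The TypeError quoted above can be reproduced in miniature with the standard library, since `re.Match` objects are also unpicklable. The sketch below uses `re` as a stand-in for the third-party `regex` module, and the `node` dict is a hypothetical AST fragment; whether the real AST's match objects can be dropped without losing needed information is an assumption the thread does not confirm.

```python
import pickle
import re


def strip_matches(obj):
    """Recursively replace match objects with the text they matched."""
    if isinstance(obj, re.Match):
        return obj.group(0)
    if isinstance(obj, dict):
        return {key: strip_matches(val) for key, val in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(strip_matches(val) for val in obj)
    return obj


# Hypothetical AST fragment that still holds the match that produced it.
node = {"name": "MainVersion", "token": re.match(r"\w+", "uchar value")}

try:
    pickle.dumps(node)
    pickled_raw = True
except TypeError:
    pickled_raw = False  # match objects cannot be pickled

# After replacing the match with plain text, the round trip succeeds.
restored = pickle.loads(pickle.dumps(strip_matches(node)))
```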

I followed a few steps based on the code you provided.

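    import os

    import pfp
    from pfp.bitwrap import BitwrappedStream

    # Parse the template once and reuse the interpreter for every data file.
    interp = pfp.create_interp(template_file=template_path)

    for data_file in data_files:
        # Close each file once it has been parsed and printed.
        with open(os.path.expanduser(data_file), "rb") as fh:
            dom = interp.parse(stream=BitwrappedStream(fh))
            print(dom._pfp__show(include_offset=False))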

This is fine for a single execution, but if the script runs again it will load and parse the template again. This is what I want to eliminate.

Another possible bug (or perhaps intended behavior): pfp.parse() takes interp as a parameter, but it checks for template or template_file before checking interp and throws an error. Otherwise I could have simplified the logic above to the following:

    interp = pfp.create_interp(template_file=template_path)

    for data_file in data_files:
        dom = pfp.parse(
            data_file=data_file,
            interp=interp
        )
        print(dom._pfp__show(include_offset=False))
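To illustrate the check order being asked for, here is a hypothetical sketch in which a prebuilt interpreter is honored before any template argument is required. This is not pfp's source; `parse` and `build_interp` are stand-ins written only to show the ordering.

```python
def build_interp(template, template_file):
    # Stand-in for the expensive template parse; returns a callable "interpreter".
    source = template if template is not None else template_file
    return lambda data_file: (source, data_file)


def parse(data_file=None, template=None, template_file=None, interp=None):
    """Hypothetical argument check that accepts a prebuilt interpreter."""
    if interp is None:
        # Only demand a template when there is no interpreter to reuse.
        if template is None and template_file is None:
            raise ValueError("template, template_file, or interp is required")
        interp = build_interp(template, template_file)
    return interp(data_file)


# Build the interpreter once, then reuse it for every data file.
shared = build_interp(None, "schema.bt")
results = [parse(data_file=name, interp=shared) for name in ("a.bin", "b.bin")]
```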

kmrsh · 2021-02-08T03:00:46Z

For example, the current printed output could be produced in a different format (e.g. JSON).

Current output:


struct {
    fileheader = struct {
        MainVersion = UChar(1 [01])
        SubVersion = UChar(0 [00])
        TraceNo    = UInt(4294967295 [ffffffff])
        Reserved   = UChar[26] ('ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ')
    }
    EventRecord = EVENTRECORED[2]
        EventRecord[0] = struct {
                CHRHeader  = struct {
                    EventId    = UShort(4870 [1306])
                    EventLength = UInt(42 [0000002a])
                    eNodeBId   = UInt(21156 [000052a4])

Simplified JSON output:

{
    fileheader: {
        MainVersion: 1,
        SubVersion: 0,
        TraceNo: 4294967295,
        Reserved: ''
    },
    EventRecordCollection: [
        {
            CHRHeader: {
                EventId: 4870,
                EventLength: 42,
                ...
            }
        }
    ]
}

Extended JSON output, including types and other properties shown in the current print output:

{
    fileheader: {
        MainVersion: {type: 'UChar', value: 1, bin: '01'},
        SubVersion: {type: 'UChar', value: 0, bin: '00'},
        ...
    }
}

Since you've already implemented _pfp__show in Fields, this should be very similar, except that it generates JSON output and returns it (instead of printing it).
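As an illustration of the extended record shape, the sketch below builds `{type, value, bin}` entries from raw bytes with the standard `struct` module. The `describe` helper is hypothetical, and big-endian byte order is assumed for the multi-byte field; the real field classes would supply the type name and raw data themselves.

```python
import json
import struct


def describe(type_name, fmt, raw):
    """Build a {type, value, bin} record from raw bytes."""
    (value,) = struct.unpack(fmt, raw)
    return {"type": type_name, "value": value, "bin": raw.hex()}


header = {
    "MainVersion": describe("UChar", "B", b"\x01"),
    "SubVersion": describe("UChar", "B", b"\x00"),
    # Assumption: big-endian layout for the 4-byte unsigned integer.
    "TraceNo": describe("UInt", ">I", b"\xff\xff\xff\xff"),
}
print(json.dumps({"fileheader": header}, indent=4))
```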
