Is this project still maintained or abandoned #137

Open
kmrsh opened this issue Feb 6, 2021 · 5 comments

Comments

@kmrsh

kmrsh commented Feb 6, 2021

We're planning to use this library in our project and would like to know whether it is still maintained. Could we also get some clarification on its usage?

  1. Can we generate JSON output from the dom, or is there a guide on how to do it?
  2. Can we parse the template once and reuse it (or serialize it to disk), so that later data files can be parsed without re-parsing the template?
@d0c-s4vage
Owner

Hi! I wouldn't say that this project is unmaintained so much as not high on my priority list at the moment.

To answer your questions:

Can we generate a json output out of the dom or any guide on how to do it

Yes, you could! You would have to implement it yourself, but the dom structure is iterable - you could recurse into each substructure/etc.

Do you know what type of information you would want in the JSON document? I'm assuming you'd want pretty much all metadata available. I may be able to add this as part of working on #136.
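
As a rough illustration of the recursion described above, the sketch below walks a nested structure and builds plain dicts and scalars that `json.dumps` can serialize. The `Field` and `Struct` classes are hypothetical stand-ins, not pfp's own field classes; a real version would iterate the dom's actual children instead.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Field:
    """Hypothetical leaf node: a named scalar value."""
    name: str
    value: Any


@dataclass
class Struct:
    """Hypothetical container node: a named list of children."""
    name: str
    children: List[Any] = field(default_factory=list)


def to_jsonable(node):
    """Recursively convert a node tree into dicts and scalars."""
    if isinstance(node, Struct):
        return {child.name: to_jsonable(child) for child in node.children}
    if isinstance(node.value, bytes):
        # JSON has no bytes type, so fall back to a hex string.
        return node.value.hex()
    return node.value


# Sample values taken from the fileheader shown later in this thread.
dom = Struct("root", [
    Struct("fileheader", [
        Field("MainVersion", 1),
        Field("SubVersion", 0),
        Field("TraceNo", 4294967295),
    ]),
])

json_text = json.dumps(to_jsonable(dom), indent=4)
print(json_text)
```

Arrays of structs are not handled here; they would most likely map to JSON lists in the same recursive step.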

@d0c-s4vage
Owner

Can we parse the template once and reuse it (or serialize to disk) in order to parse the data files thereafter instead of re-parsing the template.

Hmm, you should be able to pickle the AST that results from parsing the template so that you can load it directly again without the template.

@d0c-s4vage
Owner

Expect some movement on this this week - I have a number of items queued up for pfp/py010parser.

@kmrsh
Author

kmrsh commented Feb 8, 2021

Hi! I wouldn't say that this project is unmaintained so much as not high on my priority list at the moment.

To answer your questions:

Can we generate a json output out of the dom or any guide on how to do it

Yes, you could! You would have to implement it yourself, but the dom structure is iterable - you could recurse into each substructure/etc.

Do you know what type of information you would want in the JSON document? I'm assuming you'd want pretty much all metadata available. I may be able to add this as part of working on #136.

I'm really impressed with your reply. This is the best library I've found so far that works easily with 010 templates, but I was a bit worried about using it since there have been no updates since the middle of last year.

Yes, I simply want to output the same information that is printed on the screen when we invoke dom._pfp__show. It would also be ideal to provide an interface for adding custom converters, e.g. to XML, YAML, JSON, or any other format a user wishes to write a converter for.
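
To make the converter idea concrete, here is a minimal sketch of a registry that maps a format name to a converter function. Everything in it (`CONVERTERS`, `register_converter`, `convert`) is hypothetical and not part of pfp; it assumes the dom has already been reduced to nested dicts of scalars.

```python
import json
from xml.etree import ElementTree as ET

# Hypothetical registry mapping a format name to a converter function.
CONVERTERS = {}


def register_converter(fmt):
    """Decorator that registers a converter under the given format name."""
    def decorator(func):
        CONVERTERS[fmt] = func
        return func
    return decorator


@register_converter("json")
def to_json(tree):
    return json.dumps(tree, indent=4)


def _build_xml(parent, tree):
    # Nested dicts become nested elements; scalars become element text.
    for key, value in tree.items():
        child = ET.SubElement(parent, key)
        if isinstance(value, dict):
            _build_xml(child, value)
        else:
            child.text = str(value)


@register_converter("xml")
def to_xml(tree):
    root = ET.Element("dom")
    _build_xml(root, tree)
    return ET.tostring(root, encoding="unicode")


def convert(tree, fmt):
    """Look up the converter for fmt and apply it to the tree."""
    try:
        converter = CONVERTERS[fmt]
    except KeyError:
        raise ValueError("no converter registered for {!r}".format(fmt))
    return converter(tree)


tree = {"fileheader": {"MainVersion": 1, "SubVersion": 0}}
print(convert(tree, "json"))
print(convert(tree, "xml"))
```

A user-supplied format would then only need one decorated function, with no changes to the code that walks the dom.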

Believe it or not, these binary files are heavily used in the telco industry, and some companies use 010 templates to define the schema. A converter like pfp that produces JSON output would be ideal, so that the data can be processed further in a big data environment.

"Hmm, you should be able to pickle the AST that results from parsing the template so that you can load it directly again without the template."

Well, I tried, but it gave this error:

TypeError: cannot pickle '_regex.Match' object

This is extremely useful in a high-velocity environment where thousands of files are consumed; parsing the template every time is unnecessary overhead. If we could parse the template once and persist it in memory, that would be ideal.
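
For context on the TypeError above: regex match objects cannot be pickled, so any object that holds one fails to serialize. A general Python workaround is to drop such attributes in `__getstate__` and rebuild them in `__setstate__`. The sketch below uses the standard `re` module and a hypothetical `Token` class; whether the AST nodes produced by py010parser can be patched this way is an assumption I have not verified.

```python
import pickle
import re


class Token:
    """Hypothetical AST-like node that keeps a regex match around."""

    def __init__(self, text):
        self.text = text
        # Match objects cannot be pickled, which triggers the TypeError.
        self.match = re.match(r"\w+", text)

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable attribute.
        state = self.__dict__.copy()
        state.pop("match", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recompute the match instead of storing it.
        self.match = re.match(r"\w+", self.text)


token = Token("struct")
restored = pickle.loads(pickle.dumps(token))
print(restored.text, restored.match.group(0))
```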

I actually followed a few steps by looking at the code you provided.

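    import os

    import pfp
    from pfp.bitwrap import BitwrappedStream

    # Parse the template once and reuse the interpreter for every data file
    interp = pfp.create_interp(template_file=template_path)

    for data_file in data_files:
        with open(os.path.expanduser(data_file), "rb") as f:
            # Wrap the raw file before handing it to the interpreter
            data = BitwrappedStream(f)
            dom = interp.parse(stream=data)
            print(dom._pfp__show(include_offset=False))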

This is fine for a single execution, but if the script runs again it will load and parse the template again. This is what I want to eliminate.

Another possible bug (or intended behavior): the pfp.parse() function takes interp as a parameter, but it checks for template or template_file before checking interp and throws an error. Otherwise I could have simplified the above logic to the following:

    interp = pfp.create_interp(template_file=template_path)

    for data_file in data_files:
        # Desired usage: pass the prepared interpreter instead of a template
        dom = pfp.parse(
            data_file=data_file,
            interp=interp
        )
        print(dom._pfp__show(include_offset=False))
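
The ordering I have in mind could look something like the sketch below, where a supplied interpreter short-circuits the template checks. `resolve_interp` is a hypothetical helper written for illustration; it is not pfp's actual code, and the error messages are invented.

```python
def resolve_interp(template=None, template_file=None, interp=None):
    """Hypothetical helper: prefer a supplied interpreter over template args."""
    if interp is not None:
        # A ready interpreter already carries its template, so skip the checks.
        return interp
    if template is None and template_file is None:
        raise ValueError("a template, template_file, or interp is required")
    if template is not None and template_file is not None:
        raise ValueError("specify only one of template and template_file")
    # Placeholder for building a new interpreter from the template source.
    return {"template": template, "template_file": template_file}


existing = object()  # stands in for an interpreter created earlier
print(resolve_interp(interp=existing) is existing)
```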

@kmrsh
Author

kmrsh commented Feb 8, 2021

Example: the current print output rendered in a different output format (e.g. JSON).

Current output:


struct {
    fileheader = struct {
        MainVersion = UChar(1 [01])
        SubVersion = UChar(0 [00])
        TraceNo    = UInt(4294967295 [ffffffff])
        Reserved   = UChar[26] ('ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ')
    }
    EventRecord = EVENTRECORED[2]
        EventRecord[0] = struct {
                CHRHeader  = struct {
                    EventId    = UShort(4870 [1306])
                    EventLength = UInt(42 [0000002a])
                    eNodeBId   = UInt(21156 [000052a4])

Simplified JSON output:

{
    fileheader: {
        MainVersion: 1,
        SubVersion: 0,
        TraceNo: 4294967295,
        Reserved: ''
    },
    EventRecordCollection: [
        {
            CHRHeader: {
                EventId: 4870,
                EventLength: 42,
                ...
            }
        }
    ]
}

Extended JSON output, including types and other properties as shown in the current print output:

{
    fileheader: {
        MainVersion: {type: 'UChar', value: 1, bin: '01'},
        SubVersion: {type: 'UChar', value: 0, bin: '00'}
    }
}

Since you've already implemented _pfp__show in Fields, this should be very similar, except that it generates JSON output and returns it (instead of printing it).
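
A small sketch of how one extended entry could be built, to show the shape I mean. The `TYPE_WIDTHS` table and the `describe` function are hypothetical; the byte widths are assumed from the zero-padded hex values in the printed output above.

```python
import json

# Assumed byte widths for the type names shown in the printed output.
TYPE_WIDTHS = {"UChar": 1, "UShort": 2, "UInt": 4}


def describe(type_name, value):
    """Build an extended entry with type, value, and zero-padded hex."""
    width = TYPE_WIDTHS[type_name]
    return {
        "type": type_name,
        "value": value,
        # Two hex digits per byte, matching the bracketed hex in _pfp__show.
        "bin": format(value, "0{}x".format(width * 2)),
    }


header = {
    "MainVersion": describe("UChar", 1),
    "SubVersion": describe("UChar", 0),
    "TraceNo": describe("UInt", 4294967295),
}
print(json.dumps({"fileheader": header}, indent=4))
```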


2 participants