Add address property to physicalLocation object #302
Comments
I agree that it is neither fish nor fowl, so it belongs on the |
The location object could potentially contain this, or perhaps some other property could. Stack frames do have this notion. Some tools provide a relative virtual address against some other unknown base. To be discussed at the F2F. |
I asked some of our local binary analysis experts about this topic. Below are two responses: From Tom Johnson:
From Eric Schulte:
|
Sorry, closed in error! |
Thanks to Tom and Eric for that detailed analysis. The only thing I will add is that SARIF already has a construct for the "file offset/address" concept: it is I agree with you that "image or section offset" makes sense to express an address that might be relocated. We might imagine an
Tom, could you please explain the scenario where we'd need to specify multiple bases? If we need to support that, we could simply give the
|
I'm not sure there's a need for multiple bases for a single location. My comment was that you might want to recognize multiple kinds of bases. And it sounds like you're already considering that with file-offset and section-offset. So for a structured executable file, you have a clear notion of content organized by sections. And those sections could be relocated in memory at different locations at load time. So, a section-based offset makes sense here. Another possible subject might be a ROM image for firmware that just gets directly injected into memory somewhere. There may be no internal structure to the image; it's just a block of bytes. In this case, the file-offset notion is appropriate. We had an in-house discussion that in ELF binaries, there is also a notion of "segment". Typically a segment is a combination of sections that all get loaded contiguously in memory with the same access permissions. Sometimes, though, segments include additional parts of the binary file (for example, headers). Someone may find it useful to talk in terms of segment+offset locations rather than section+offset locations. |
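To make the two kinds of base concrete, here is a minimal Python sketch. It assumes a hypothetical `BinaryLocation` record and a hypothetical `resolve` helper; neither name comes from SARIF or from any tool mentioned in this thread. A section-relative offset is resolved against the address where that section was loaded, while a plain file offset (the flat ROM image case) is resolved against the address where the whole image was placed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BinaryLocation:
    """Hypothetical record: an offset plus the kind of base it is relative to."""
    offset: int
    section: Optional[str] = None  # None means "offset from the start of the file"


def resolve(loc: BinaryLocation,
            section_load_addresses: Dict[str, int],
            image_load_address: int) -> int:
    """Return the in-memory address for a location, given load-time information."""
    if loc.section is None:
        # Flat image (e.g. firmware ROM): the file offset maps directly onto memory.
        return image_load_address + loc.offset
    # Structured executable: each section may have been relocated independently.
    return section_load_addresses[loc.section] + loc.offset


# Example load-time layout (values are illustrative only).
sections = {".text": 0x401000, ".data": 0x404000}
in_text = resolve(BinaryLocation(0x10, ".text"), sections, image_load_address=0)
in_rom = resolve(BinaryLocation(0x10), {}, image_load_address=0x08000000)
```

A segment-relative offset would follow the same pattern with a different lookup table; the point is only that the kind of base has to be recorded alongside the offset.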
Got it. How about this for an
Each pair is optional, both pairs can be present. |
I'm not sure we want to create a bucket object that contains mutually exclusive properties that define the address. It would be preferable to provide a generic object that could be repurposed for various scenarios. Conceptually, as Tom notes, these addresses are chained, e.g., address of PE + sections header.virtual address (gives us start of a section) + offset. If you think about the problem this way, it is similar to logical locations (except that we're using actual binary internals to build the 'path to address'). The size of these pieces is relevant to analysis. The BinSkim code may be helpful in illustrating how that particular tool manipulates low-level binary address information. Poking around, I notice one other small wrinkle, which relates to padding that may be required to produce the address. This BinSkim ImageFieldData construct actually looks close to what might work. I'm thinking something along the lines of: { Again, very similar to the logical location and nested files mechanisms already in the format, and so it shares the same advantages and disadvantages. Note that the non-deterministic base address is similar in concept to the value associated with uriBaseId. Section header addresses + offsets are similar to the deterministic relative paths of files under source control. Just trying to make SARIF connections to help connect this concept to existing patterns. Here's the basic view of what we're up to: (Virtual address of something) - (base address) = relative virtual address (RVA). The RVA is then combined with some unknown base address (such as the base address of the loaded module). |
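The arithmetic at the end of this comment can be sketched in a few lines of Python. The function names are hypothetical and chosen only for illustration; the sketch assumes all addresses are non-negative integers and that the base address is not greater than the virtual address.

```python
def relative_virtual_address(virtual_address: int, base_address: int) -> int:
    """(virtual address of something) - (base address) = relative virtual address."""
    if virtual_address < base_address:
        raise ValueError("virtual address lies below the base address")
    return virtual_address - base_address


def rebase(rva: int, load_base: int) -> int:
    """Apply an RVA to whatever base the module happened to be loaded at."""
    return load_base + rva


# Illustrative values only: a preferred base, and a different base chosen at load time.
rva = relative_virtual_address(0x00401A30, base_address=0x00400000)
relocated = rebase(rva, load_base=0x7FF600000000)
```

The RVA is the deterministic part that a static tool can always report; the load base is the non-deterministic part, which is why the comment compares it to the value behind `uriBaseId`.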
I don't think it needs to be that complicated. Unlike nested files and logical locations, these address objects have a limited nesting. I think the notion of a "parent" is overkill. The design I proposed is specific to ELFs; you might consider that a drawback. OTOH, it supports ELFs well, and takes into account that in ELFs, sections have names but segments do not (hence the string-valued
... but it does run into the usual problem where our code gen can't deal with a property that might have more than one type. Tom mentioned (see his bullet point on "effective address") that the absolute address might be hard to define in a clear way; I took that as a clue not to include it in the format. OTOH, for tools that know how to compute it, like BinSkim, we could include |
The limited nesting is a good point, one that occurred to me as well after posting the above. For me, I think the next best step is to look in detail at some analysis results of this kind. The SARIF, as always, should be a support for developers to prove to themselves that there is, in fact, a problem to act on. We should ask ourselves what that process would look like. Consider a simple example, 'your imports section is executable'. Here are the addresses of interest:
The interesting question is how a viewer wants to take the user through this information to conclude, yes, the section that contains my import section is marked executable. I could imagine getting a hex view of the binary with some outlines/chrome demarcating various things. You could imagine a structural view of that data within the binary. |
A proposed design for an address descriptor: baseAddress We should prepare a change draft and review it in advance of a TC discussion. |
In addition to this core design, SARIF could populate a load map that would be persisted in a way similar to the files and logicalLocations tables. The address object could refer to these entries, which would also allow for parenting/nesting as other constructs do. |
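As a rough illustration of the load map idea, the Python sketch below models the table as a list whose entries point at their parent by index, in the same spirit as the files and logicalLocations tables. The entry shape (`name`, `offset`, `parent`) and the helper `offset_from_root` are assumptions made for this example; they are not taken from the SARIF schema.

```python
from typing import List, Optional, Set, TypedDict


class LoadMapEntry(TypedDict):
    """Hypothetical load map entry; field names are illustrative only."""
    name: str
    offset: int             # offset from the parent entry (or from the root)
    parent: Optional[int]   # index of the parent entry, None for a root


def offset_from_root(load_map: List[LoadMapEntry], index: int) -> int:
    """Sum the offsets along the parent chain, from the entry up to its root."""
    total = 0
    seen: Set[int] = set()
    current: Optional[int] = index
    while current is not None:
        if current in seen:
            raise ValueError("cycle in load map parent chain")
        seen.add(current)
        entry = load_map[current]
        total += entry["offset"]
        current = entry["parent"]
    return total


load_map: List[LoadMapEntry] = [
    {"name": "image", "offset": 0x0, "parent": None},
    {"name": ".idata", "offset": 0x3000, "parent": 0},
    {"name": "import table", "offset": 0x40, "parent": 1},
]
import_table_offset = offset_from_root(load_map, 2)  # 0x3000 + 0x40
```

An address in a result would then only need to carry an index into this table, plus its own offset, rather than repeating the whole chain.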
EBALLOT PROPOSAL Provide an API IMPACT |
Feedback from MS [UPDATED]:
By associating an address with a PLC, we will naturally have a mechanism for specifying regions, etc. The standard artifact parenting mechanism can be leveraged. An artifact has a timestamp, settling that concern. |
… (#1323) * solution builds post change + transformer logic (UTCs fail) * fixing utcs and release md * manually merging some files cleanly * modifying transformer logic due to change in node name! * rc++
Should add a snippet object for the contents |
TC33 conclusions: address moves to physicalLocation |
E-BALLOT #3 PROPOSAL Provide an SCHEMA CHANGES
NOTES An earlier proposal included adding |
The types of baseAddress and offset should be integer, as this is their natural type. Other than for display purposes, they need to be integers to compute absolute addresses or differences of addresses. Making them integers means that JSON libraries will produce the correct type automatically. The viewer should determine the output format, and displaying the value as hexadecimal is generally no more difficult than outputting a string or decimal representation of a number. |
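A small Python sketch of this argument, using only the standard `json` module. The property names `baseAddress` and `offset` follow the proposal discussed in this thread; the JSON fragment and the helper `absolute_address` are made up for illustration and are not a complete SARIF object.

```python
import json


def absolute_address(address: dict) -> int:
    """Compute baseAddress + offset, insisting that both are JSON integers."""
    base = address["baseAddress"]
    offset = address["offset"]
    for value in (base, offset):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError("baseAddress and offset must be integers")
    return base + offset


# With integer-typed properties, the parser already yields usable numbers.
parsed = json.loads('{"baseAddress": 4194304, "offset": 4112}')
absolute = absolute_address(parsed)

# Hexadecimal is purely a presentation choice made by the viewer.
print(hex(absolute))  # prints 0x401010
```

Had the values been strings such as `"0x400000"`, every consumer would need its own parsing step before doing this addition.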
Approved in e-ballot-3. |
I confirm that this change is correctly in the schema. |
Binary analysis tools may report issues in terms of an effective address. These are not physical locations (as we have been thinking of them) and none of the options that the regions offer suffice to express such addresses adequately. Addresses are not offsets in any meaningful sense. One could shoe-horn them into location.fullyQualifiedLogicalName using kind="address", but that feels unsatisfactory.
The only place in the spec where an address is allowed is in a stackFrame object, but note that stackFrame.location exists too.
Consequently, I propose that we move the address object into the location. It could either go in as a new property location.address, or it could be pushed down into either the physical or logical location properties. Conceptually, it is neither, but I concede that an address is more like a physical thing than a logical thing, so it could go into a new optional property of physicalLocation named "address".