Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FEAT_LRCPC support to the aarch64 decoder #5744

Closed
derekbruening opened this issue Nov 18, 2022 · 6 comments
Closed

Add FEAT_LRCPC support to the aarch64 decoder #5744

derekbruening opened this issue Nov 18, 2022 · 6 comments

Comments

@derekbruening
Copy link
Contributor

Our armv8.2 processors have FEAT_LRCPC (lrcpc in Linux cpuinfo) and our binaries are executing opcodes like LDAPR yet DR's decoder does not decode it, causing drmemtrace to not trace it, and possibly leading to crashes when it is inside rseq regions (that is still being investigated; we are not 100% sure that is the cause; will be filed separately):

$ disasm_a64 f8bfc0c5
bfd:       f8bfc0c5 ldapr x5, [x6]
DynamoRIO: f8bfc0c5 xx $0xf8bfc0c5 %x5 %x6 %x16 %sp -> %x5 %x6 %x16 %sp

Xref #2626, #4847, #4848

@AssadHashmi AssadHashmi self-assigned this Nov 18, 2022
AssadHashmi added a commit that referenced this issue Nov 22, 2022
This patch adds the appropriate macros, tests and codec entries
to encode the following:
LDAPR <Wt>, [<Xn|SP> {,#0}]
LDAPR <Xt>, [<Xn|SP> {,#0}]
LDAPRB <Wt>, [<Xn|SP> {,#0}]
LDAPRH <Wt>, [<Xn|SP> {,#0}]

It also adds FEATURE_LRCPC which introduces these 3 instructions
supporting the weaker release consistency processor consistent (RCpc)
model that enables the reordering of a store-Release followed by a
load-acquire to a different address.

Issues: #5744, #4847
AssadHashmi added a commit that referenced this issue Nov 23, 2022
This patch adds the appropriate macros, tests and codec entries
to encode the following:
LDAPR <Wt>, [<Xn|SP> {,#0}]
LDAPR <Xt>, [<Xn|SP> {,#0}]
LDAPRB <Wt>, [<Xn|SP> {,#0}]
LDAPRH <Wt>, [<Xn|SP> {,#0}]

It also adds FEATURE_LRCPC which introduces these 3 instructions
supporting the weaker release consistency processor consistent (RCpc)
model that enables the reordering of a store-release followed by a
load-acquire to a different address.

Issues: #5744, #4847
@derekbruening
Copy link
Contributor Author

We were observing regular crashes in rseq reqions containing LDAPR in large proprietary apps and those crashes are now gone with PR #5756 in place. However, we never figured out exactly why it was crashing and why the OP_xx operands didn't prevent the crash. The OP_xx fails to indicate a load, so drmemtrace failed to record a load address, but for rseq it seems like that should not matter (a missing store would be a problem).

The crash seems to be a corruption of an input register to the rseq region: x1 ended up incorrectly with x0's application value, so it points to some register spill/restore problem. But, the OP_xx includes a superset of the actual source registers so preserving input registers to rseq should still be fine. I tried creating a reproducer by adding LDAPR to the linux.rseq test but I failed to reproduce any problem. I had something like this in an rseq region:

interp: start_pc = 0x0000ffff8938af80
  0x0000ffff8938af80  910023e6   add    %sp $0x0008 lsl $0x00 -> %x6
  0x0000ffff8938af84  f8bfc0c5   xx     $0xf8bfc0c5 %x5 %x6 %x16 %sp -> %x5 %x6 %x16 %sp
        reads flag before writing it!
  0x0000ffff8938af88  f80343e0   stur   %x0 -> +0x34(%sp)[8byte]
Stopping bb at rseq boundary 0x0000ffff8938af8f
end_pc = 0x0000ffff8938af8c

That's the ldapr x5, [x6] from above. I was not able to reconstruct all the same register patterns as in the actual crash so maybe there is some other factor missing from my attempt.

Maybe the registers as destinations in the OP_xx caused the problem? But with the sources they are not dead prior to that point and I could not see any issue in manual examination of the DR and drreg instrumentation.

Unfortunately we may have to leave this as a mystery. If we could figure out precisely what happened maybe we could improve the OP_xx scheme or improve the rseq mangling or drreg spill robustness.

@derekbruening
Copy link
Contributor Author

I think this can be closed as all the FEAT_LRCPC opcodes are now supported?

@derekbruening
Copy link
Contributor Author

We didn't realize this before, but adding FEAT_LRCPC support ends up breaking old traces! Traces gathered w/o LDAPR as a load do not have a record for a load address. So when such a trace is analyzed with a new reader with LDAPR support as part of say a simulator it will recognize it as a load and expect a load record but not find one. This caused us some headaches.

How to solve this? For a planned compatibility break we'd bump the trace version and add support for the old version to the reader. What would that look like here: pretending it's not really a load if we detect the prior version? Should we be doing that in general whenever adding new load/store opcode support?? It's not clear.

@AssadHashmi
Copy link
Contributor

How to solve this? For a planned compatibility break we'd bump the trace version and add support for the old version to the reader. What would that look like here: pretending it's not really a load if we detect the prior version? Should we be doing that in general whenever adding new load/store opcode support?? It's not clear.

Would this apply just to memory instructions?

@AssadHashmi
Copy link
Contributor

I think this can be closed as all the FEAT_LRCPC opcodes are now supported?

Yes, but there is a later feature FEAT_LRCPC2, which introduces:
LDAPURH, LDAPURSH, LDAPUR, LDAPURSW, LDAPURSB, LDAPURB, STLUR, STLURH, STLURB

I prefer to close this and create another issue for that if we have not added support by the time it is required.

@derekbruening
Copy link
Contributor Author

How to solve this? For a planned compatibility break we'd bump the trace version and add support for the old version to the reader. What would that look like here: pretending it's not really a load if we detect the prior version? Should we be doing that in general whenever adding new load/store opcode support?? It's not clear.

Would this apply just to memory instructions?

Yes, it seems like it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants