Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use __tdata_align to align thread local storage #504

Merged
merged 2 commits into from
Sep 14, 2022

Conversation

ian-h-chamberlain
Copy link
Contributor

Fixes #497

Accompanying PR: devkitPro/devkitarm-crtls#6

Cases I was able to test:
Create static __thread variables of the following types and verify their values are initialized as expected, both in the main thread and one created with threadCreate:

  • Align of 4 then align of 16
  • Align of 4 then u8
  • Align 16 then align of 4
  • Align of 16 then u8
  • u8 then align of 4
  • u8 then align of 16
  • u8 then u16
  • u16 then u8

@@ -168,7 +164,7 @@ void initThreadVars(struct Thread_tag *thread)
tv->thread_ptr = thread;
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
tv->tls_tp = (thread != NULL ? (u8*)thread->stacktop : __tls_start) - 8; // Arm ELF TLS ABI mandates an 8-byte header
tv->tls_tp = (thread != NULL ? (u8*)thread->stacktop : __tls_start);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a change in behavior. 8 additional bytes are allocated for each thread to be used as the header, which seemed necessary for the alignment to be correct as I was experimenting, but there might be a way to avoid it.

Copy link
Member

@fincs fincs Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double pointer sized TLS header is only used by more general, shared-library-oriented TLS models that we don't use (we effectively use local-exec), thus the reason why libctru (and libnx) fake the existence of the header by simply subtracting its size from the real start of the TLS area. I don't believe it necessary to introduce this extra header at all, and I am not sure why you have had problems with alignment without it, I can only think of the fact that the stack on 32-bit machines typically needs 8-byte alignment, so that should be the baseline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took a bit of trial and error to get right, since the location of the header also needs to be aligned properly, but I think I figured it out, and pushed my changes.

This appears to work with all the test cases I listed above in the description as well.

Since only ARM_TLS_LE32 is used in practice with this library, the
8-byte TLS header goes unused, so we can just fake it by subtracting 8
from the dato offset and using that as tls_tp instead.
@fincs fincs merged commit b20ac22 into devkitPro:master Sep 14, 2022
@ian-h-chamberlain ian-h-chamberlain deleted the feature/overalign-tdata branch December 28, 2022 20:19
@gyrovorbis
Copy link

gyrovorbis commented Aug 21, 2023

Hey, guys, so this is going to sound kind of random, buuuut...

I'm a developer on the Sega Dreamcast indie SDK, KallistiOS, and I recently finished up my own quest to implement static TLS on the thing, and followed in the footsteps you guys laid out here and some of the alignment issues you ran into. Wound up making a really big multithreaded test suite just doing terrible things with .tdata and .tbss manually, automatically, and over-aligned variables to get things working.

First of all, thanks. Secondly, I just wanted to make sure I'm not missing something and you guys don't have any issues, but don't you also have to align the .tbss data similarly? Check what I had to do here:
KallistiOS/KallistiOS#111

Anyway, kudos from the DC scene. Looking forward to working with devkitPro some day with my own engine. :)

@ian-h-chamberlain
Copy link
Contributor Author

First of all, thanks. Secondly, I just wanted to make sure I'm not missing something and you guys don't have any issues, but don't you also have to align the .tbss data similarly? Check what I had to do here:

@gyrovorbis wow, I'm amazed that this somehow helped someone on another project! Glad you were able to find it and found it useful for your purposes as well.

As for the alignment of .tbss, we did find two issues which led to devkitPro/devkitarm-crtls#7:

  1. Our linker script was forcing alignment incorrectly for both .tdata and .tbss sections – ALIGN(4) : vs : ALIGN(4) 🤦
  2. Even after fixing that, we weren't accounting for the proper alignment of the .tbss section before marking the __tls_end, which resulted in the wrong size for the main thread's local storage section. Part of the PR above was also adding some extra padding using ALIGNOF(.tbss), which as far as I could tell was sufficient to resolve the problem.

It looks like your code is doing some of the work in its thd_create_tls_data function instead of the linker script, but I think the end result is about the same? You might have a better understanding, having encountered this problem more recently and being more familiar with the KallistiOS code — if you think we're still missing something it might be good to revisit on our side too. Thanks for reaching out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS memory errors with aligned thread-local variables
3 participants