Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable LLD_REPORT_UNDEFINED by default #16003

Merged
merged 1 commit into from
Dec 8, 2022
Merged

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Jan 13, 2022

This makes undefined symbol errors more precise by including the name of the
object that references the undefined symbol.

Its also paves the way (in my mind anyway) for finally fixing reverse dependencies
in a salable way. See #15982. That PR uses an alternative script for the pre-processing
of dependencies but also fundamentally relies on processing JS libraries both before
and after linking.

The cost is about 300ms per link operation due to double processing of
the JS libraries, but results are cached so in practice this only
happens the first time a given link command is run (see #18326).

@sbc100 sbc100 force-pushed the lld_report_undefined_more branch 9 times, most recently from 4f59e74 to ab49e53 Compare December 4, 2022 21:20
@sbc100 sbc100 requested a review from kripken December 4, 2022 23:06
@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 4, 2022

This change has been behind a flag for a long time now and I think its finally ready to be the default.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the change you mentioned adds 300ms to startup? (that we might want to profile)

ChangeLog.md Outdated Show resolved Hide resolved
sbc100 added a commit that referenced this pull request Dec 5, 2022
Avoid generating the full JS output, only generate the metadata.  This
saves about 10% of the cost.

See: #16003
sbc100 added a commit that referenced this pull request Dec 5, 2022
Avoid generating the full JS output, only generate the metadata.  This
saves about 10% of the cost.

See: #16003
sbc100 added a commit that referenced this pull request Dec 5, 2022
Avoid generating the full JS output, only generate the metadata.  This
saves about 10% of the cost.

See: #16003
@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 5, 2022

#18320 reduces the overhead to 275ms on my machine.

@sbc100 sbc100 force-pushed the lld_report_undefined_more branch from ab49e53 to 922deca Compare December 5, 2022 23:55
sbc100 added a commit that referenced this pull request Dec 6, 2022
…8320)

Avoid generating the full JS output, only generate the metadata.  This
saves about 10% of the cost.

See: #16003
sbc100 added a commit that referenced this pull request Dec 6, 2022
This means that the JS libraries only only need to be processed when
there is cache miss.  The cost of processing the JS libraries is about
300ms on my machine which is about 30% of the link time for hello
world.  When there is cache hit this cost is reduced to 3ms.

This change is in prepartion for switching this mode on my default in.

See: #16003
sbc100 added a commit that referenced this pull request Dec 6, 2022
This means that the JS libraries only only need to be processed when
there is cache miss.  The cost of processing the JS libraries is about
300ms on my machine which is about 30% of the link time for hello
world.  When there is cache hit this cost is reduced to 3ms.

This change is in prepartion for switching this mode on my default in.

See: #16003
sbc100 added a commit that referenced this pull request Dec 6, 2022
This means that the JS libraries only only need to be processed when
there is cache miss.  The cost of processing the JS libraries is about
300ms on my machine which is about 30% of the link time for hello
world.  When there is cache hit this cost is reduced to 3ms.

This change is in prepartion for switching this mode on my default in.

See: #16003
@sbc100 sbc100 force-pushed the lld_report_undefined_more branch from 922deca to 30c3a36 Compare December 6, 2022 19:38
sbc100 added a commit that referenced this pull request Dec 6, 2022
This means that the JS libraries only only need to be processed when
there is cache miss.  The cost of processing the JS libraries is about
300ms on my machine which is about 30% of the link time for hello
world.  When there is cache hit this cost is reduced to 3ms.

This change is in prepartion for switching this mode on my default in.

See: #16003
@sbc100 sbc100 force-pushed the lld_report_undefined_more branch from 30c3a36 to 531a1f6 Compare December 6, 2022 19:41
sbc100 added a commit that referenced this pull request Dec 6, 2022
This means that the JS libraries only only need to be processed when
there is cache miss.  The cost of processing the JS libraries is about
300ms on my machine which is about 30% of the link time for hello
world.  When there is cache hit this cost is reduced to 3ms.

This change is in prepartion for switching this mode on my default in.

See: #16003
sbc100 added a commit that referenced this pull request Dec 7, 2022
This means that the JS libraries only only need to be processed when
there is cache miss.  The cost of processing the JS libraries is about
300ms on my machine which is about 30% of the link time for hello
world.  When there is cache hit this cost is reduced to 3ms.

This change is in prepartion for switching this mode on my default in.

See: #16003
sbc100 added a commit that referenced this pull request Dec 7, 2022
This means that the JS libraries only only need to be processed when
there is cache miss.  The cost of processing the JS libraries is about
300ms on my machine which is about 30% of the link time for hello
world.  When there is cache hit this cost is reduced to 3ms.

This change is in prepartion for switching this mode on my default in.

See: #16003
@sbc100 sbc100 force-pushed the lld_report_undefined_more branch from 531a1f6 to a110a42 Compare December 7, 2022 20:04
@kripken
Copy link
Member

kripken commented Dec 8, 2022

Speed sounds good!

sbc100 added a commit that referenced this pull request Dec 8, 2022
- Use `elif` here so its clear that we either include one block or the
  next, but never both.
- Fix signature of setjmp.
- Include the `_emscripten_throw_longjmp` stub regardless of
  `INCLUDE_FULL_LIBRARY` settings.

This final fix is a prerequisite for #16003.
This makes undefined symbol errors more precise by including the name of
the object that references the undefined symbol.

Its also paves the way (in my mind anyway) for finally fixing reverse
dependencies in a salable way.  See #15982.  That PR uses an alternative
script for the pre-processing of dependencies but also fundamentally
relies on processing JS libraries both before and after linking.

The cost is about 300ms per link operation due to double processing of
the JS libraries, but results are cached so in practice this only
happens the first time a given link command is run (see #18326).
@sbc100 sbc100 force-pushed the lld_report_undefined_more branch from 5a1fae6 to 39214db Compare December 8, 2022 18:02
@sbc100 sbc100 requested a review from dschuff December 8, 2022 18:03
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks, lgtm.

@sbc100 sbc100 enabled auto-merge (squash) December 8, 2022 19:14
@sbc100 sbc100 merged commit f11d619 into main Dec 8, 2022
@sbc100 sbc100 deleted the lld_report_undefined_more branch December 8, 2022 19:15
sbc100 added a commit that referenced this pull request Dec 8, 2022
This setting became the default in #16003 and there is no reason to
keep it around anymore.
sbc100 added a commit that referenced this pull request Feb 24, 2023
This setting became the default in #16003 and there is no reason to
keep it around anymore.
sbc100 added a commit that referenced this pull request Feb 24, 2023
This setting became the default in #16003.  There is no reason to keep
it around anymore and we have has no issues reported in the last 3
months (4 releases).
sbc100 added a commit that referenced this pull request Feb 24, 2023
This setting became the default in #16003.  There is no reason to keep
it around anymore and we have has no issues reported in the last 3
months (4 releases).
sbc100 added a commit that referenced this pull request Feb 24, 2023
This setting became the default in #16003.  There is no reason to keep
it around anymore and we have has no issues reported in the last 3
months (4 releases).
@curiousdannii
Copy link
Contributor

curiousdannii commented Mar 1, 2023

Sorry to raise an error just after you concluded there were no issues 😛 But you know, sometimes it takes a while to try updating dependencies.

I tried updating to 3.1.28+ and I now get errors for undefined symbols when an EM_JS function is defined in a library.

wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_lower_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_lower_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_upper_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_upper_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_title_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_title_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_canon_decompose_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_canon_decompose_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_canon_normalize_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_canon_normalize_uni

The EM_JS functions are defined in a library. If the functions are used in a executable then it makes these errors.

I'm not sure why, but another EM_ASYNC_JS function doesn't cause any errors. Maybe because it's only used by the same file?

Building with -sLLD_REPORT_UNDEFINED=0 fixes the issue.

@sbc100
Copy link
Collaborator Author

sbc100 commented Mar 1, 2023

Sorry to raise an error just after you concluded there were no issues stuck_out_tongue But you know, sometimes it takes a while to try updating dependencies.

I tried updating to 3.1.28+ and I now get errors for undefined symbols when an EM_JS function is defined in a library.

wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_lower_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_lower_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_upper_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_upper_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_title_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_to_title_case_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_canon_decompose_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_canon_decompose_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_canon_normalize_uni
wasm-ld: error: libremglk.a(gi_dispa.c.o): undefined symbol: glk_buffer_canon_normalize_uni

I think the reason for this is that the EM_JS macro basically expands to a declaration of the EM_JS function as external with an __attribute__((import_name)) attached to it.

However, when you use it outside in other compilation units they don't see that declaration. Actually that sounds odd now that I say its since those attributes should probably attached to the final symbol.

In any case, I think the soltution that others have uses it to only call the EM_JS function within the file where they are defined, and then define small (difined) wrappers for them for use outside the compilation unit.

e.g. declare glk_buffer_to_title_case_uni_js as the EM_JS function then define a local glk_buffer_to_title_case_uni which then just calls it.

(I will investigate further to see if we can actually this fix with you having to change any code).

@curiousdannii
Copy link
Contributor

Sure, that makes sense. If you can find a work around that would be great, otherwise that is a simple solution.

If you can't find a workaround, then the docs should also get an update saying that it's only safe to call a EM_JS etc function from the same file.

sbc100 added a commit that referenced this pull request Mar 9, 2023
When no local usage of an EM_JS function is present the compiler was
completely removing the import of the function (and its import_name).

With this change we force at least one reference to the function by
taking its address in an otherwise unused (and GC-able) pointer.

This use case was broken by the switch to LLD_REPORT_UNDEFINED in #16003.

Fixes: #18927
sbc100 added a commit that referenced this pull request Mar 9, 2023
When no local usage of an EM_JS function is present the compiler was
completely removing the import of the function (and its import_name).

With this change we force at least one reference to the function by
taking its address in an otherwise unused (and GC-able) pointer.

This use case was broken by the switch to LLD_REPORT_UNDEFINED in #16003.

Fixes: #18927
sbc100 added a commit that referenced this pull request Mar 9, 2023
When no local usage of an EM_JS function is present the compiler was
completely removing the import of the function (and its import_name).

With this change we force at least one reference to the function by
taking its address in an otherwise unused (and GC-able) pointer.

This use case was broken by the switch to LLD_REPORT_UNDEFINED in #16003.

Fixes: #18927
impact-maker pushed a commit to impact-maker/emscripten that referenced this pull request Mar 17, 2023
This setting became the default in emscripten-core#16003.  There is no reason to keep
it around anymore and we have has no issues reported in the last 3
months (4 releases).
impact-maker pushed a commit to impact-maker/emscripten that referenced this pull request Mar 17, 2023
When no local usage of an EM_JS function is present the compiler was
completely removing the import of the function (and its import_name).

With this change we force at least one reference to the function by
taking its address in an otherwise unused (and GC-able) pointer.

This use case was broken by the switch to LLD_REPORT_UNDEFINED in emscripten-core#16003.

Fixes: emscripten-core#18927
impact-maker pushed a commit to impact-maker/emscripten that referenced this pull request Mar 17, 2023
This setting became the default in emscripten-core#16003.  There is no reason to keep
it around anymore and we have has no issues reported in the last 3
months (4 releases).
impact-maker pushed a commit to impact-maker/emscripten that referenced this pull request Mar 17, 2023
When no local usage of an EM_JS function is present the compiler was
completely removing the import of the function (and its import_name).

With this change we force at least one reference to the function by
taking its address in an otherwise unused (and GC-able) pointer.

This use case was broken by the switch to LLD_REPORT_UNDEFINED in emscripten-core#16003.

Fixes: emscripten-core#18927
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants