Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

print tracebacks and line numbers for Lua runtime errors #6564

Merged
merged 8 commits into from
Mar 23, 2023

Conversation

mattwigway
Copy link
Contributor

@mattwigway mattwigway commented Mar 3, 2023

Issue

This fixes #6563, and provides tracebacks and line number for Lua runtime errors. It does this by replacing all sol::functions with sol::protected_function and checking for errors after a Lua function is called. If one is detected, a traceback is printed to stderr, and an exception is thrown which causes OSRM to exit (as it did before when runtime errors were encountered).

Before:

$ osrm-extract -p ../profiles/bicycle_lts.lua bicycle_lts/bicycle_lts.osm.pbf
[lots of intermediate output]
libc++abi: terminating with uncaught exception of type sol::error: lua: error: attempt to compare two nil values

After:

[lots of intermediate output]
attempt to compare two nil values
stack traceback:
	[C]: in function 'math.max'
	...bc/git/vmt-mode-shift-study/src/routing/profiles/lts.lua:128: in function 'lts.lts_for_way'
	../profiles/bicycle_lts.lua:515: in function 'process_way'
attempt to compare two nil values
stack traceback:
	[C]: in function 'math.max'
	...bc/git/vmt-mode-shift-study/src/routing/profiles/lts.lua:128: in function 'lts.lts_for_way'
	../profiles/bicycle_lts.lua:515: in function 'process_way'
attempt to compare two nil values
stack traceback:
	[C]: in function 'math.max'
	...bc/git/vmt-mode-shift-study/src/routing/profiles/lts.lua:128: in function 'lts.lts_for_way'
	../profiles/bicycle_lts.lua:515: in function 'process_way'
attempt to compare two nil values
stack traceback:
	[C]: in function 'math.max'
	...bc/git/vmt-mode-shift-study/src/routing/profiles/lts.lua:128: in function 'lts.lts_for_way'
	../profiles/bicycle_lts.lua:515: in function 'process_way'
libc++abi: terminating with uncaught exception of type osrm::util::exception: Lua error (see stderr for traceback)

I believe the error message is printed multiple times because the error is getting hit in multiple threads before the exception kills the entire process. My understanding is that a single exception should kill the entire process, not just the thread, but I'm new to C++ so if that's incorrect changes will be required to ensure runtime errors actually terminate extraction.

Performance impact is negligible/nonexistent. With my small test PBF file, it took 3.511s to build and partition with this patch and 3.563 without.

I ran the format.sh script as requested and it made some format changes elsewhere in the files. If this is an issue I can go back and manually revert those changes. It doesn't seem to pollute the diff too much. went ahead and reverted these for ease of review.

Tasklist

I'm new to C++ and contributing to OSRM, so there's a few things I'm not completely sure of and would love feedback on.

  • Make sure that errors in any Lua thread will bring the entire extraction process to a halt, not just that thread.
  • I don't know what source_function and GetStringListFromFunction do. I changed these to have Lua error handling but it I'm not sure how to test them.
  • Most places where profile functions are called there's a switch statement based on version; I define the result variable that I use to check for the error outside the statement. If the profile API version is increased, and the switch statements are not updated appropriately so that the lua function is always called, it's possible that the code would check for an error using an uninitialized result. I know C++ does weird things in this case - is it desirable to add an explicit check to each function to ensure the result has actually been initialized before checking if it is valid?
  • CHANGELOG.md entry (How to write a changelog entry)
  • update relevant Wiki pages
  • add tests (see testing documentation)

Requirements / Relations

N/A

@mattwigway mattwigway marked this pull request as draft March 3, 2023 14:20
mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 3, 2023
@SiarheiFedartsou
Copy link
Member

Most places where profile functions are called there's a switch statement based on version; I define the result variable that I use to check for the error outside the statement. If the profile API version is increased, and the switch statements are not updated appropriately so that the lua function is always called, it's possible that the code would check for an error using an uninitialized result. I know C++ does weird things in this case - is it desirable to add an explicit check to each function to ensure the result has actually been initialized before checking if it is valid?

Not sure I fully get it, but we 100% control profile API versions, i.e. we won't have any bump of profile API version until we will do that on our own, so I think we can just go with this risk that in the future we may forget to update those switches, but I hope in this case some compiler warning or sanitizer check should catch it. Btw why we have those switches is just because we have to be backward compatible with all old versions of Lua profiles.

I don't know what source_function and GetStringListFromFunction do. I changed these to have Lua error handling but it I'm not sure how to test them.

As I can see both GetStringListFromFunction and source_function are just to support VERY old version 1 of Lua API, tbh it is hard to say without doing some archeology in old commits what exactly it is supposed to do, but can we may be workaround it somehow? E.g. I don't think we really need to support this feature with tracebacks for so old version of API - I think we can just agree that we support it for only the recent version(s) of Lua API.

Let me please know if you need more help here :)

mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 8, 2023
mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 8, 2023
mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 8, 2023
@mattwigway mattwigway force-pushed the lua-runtime-errors branch from 08f6eef to 974b777 Compare March 8, 2023 17:46
@mattwigway mattwigway marked this pull request as ready for review March 8, 2023 19:55
@mattwigway
Copy link
Contributor Author

@SiarheiFedartsou I think this is ready to go! I added unit tests, backed out the changes for those functions we weren't sure about, and confirmed that an error on any lua thread shuts OSRM down entirely. I don't see any wiki pages that would be affected by this change, so haven't made any updates there.

@SiarheiFedartsou
Copy link
Member

Thanks @mattwigway !
Can you please rebase/merge this PR with master? It contains fixes for CI which was not working before...

@SiarheiFedartsou
Copy link
Member

@mattwigway
and you should reformat code here. It can be done with ./scripts/format.sh script, make sure to use clang-format-10 please(and let me know in case of any problems please)

mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 9, 2023
mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 9, 2023
mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 9, 2023
@mattwigway mattwigway force-pushed the lua-runtime-errors branch from 974b777 to 6abc896 Compare March 9, 2023 17:28
mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 9, 2023
@mattwigway
Copy link
Contributor Author

Ah, clang-format-10 is the piece I was missing before. Rebased and formatted.

@SiarheiFedartsou
Copy link
Member

SiarheiFedartsou commented Mar 9, 2023

@mattwigway you are probably missing #include <fstream> - that’s why CI is red. Please fix when you have a time… Or let me know and I will do it on my own in the nearest days.

@mattwigway
Copy link
Contributor Author

Oops, I meant to delete the function that uses fstream as it's from an earlier way I tried to specify the test. Will fix.

mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Mar 10, 2023
@mattwigway
Copy link
Contributor Author

@SiarheiFedartsou I think this should be fixed now.

@mattwigway
Copy link
Contributor Author

I'm not sure what the memory leak test is flagging. I'll investigate later this week.

@SiarheiFedartsou
Copy link
Member

I'm not sure what the memory leak test is flagging. I'll investigate later this week.

Yeah, would be nice, usually it does not have false positives, but taking into account use case(it happens only on error and we exit program anyway) I think we can just suppress it if we won’t find a reason or it will be too complicated to fix. Also I was going to look on my own, but I think I will be able to do that only on the next week in the best case.

@mattwigway
Copy link
Contributor Author

I think what's going on here is that when process_way crashes and the traceback is printed, the way that it's currently processing never gets freed. This would have happened before, and the only reason it's showing up now is that there is now a test of bad code fed to the extractor. Normally the exception shuts down the whole OSRM process so it doesn't matter as you say, but in the test we catch the exception to make sure the exception handling worked. So I think suppressing the warning is probably appropriate, as it won't be an issue in production.

@SiarheiFedartsou
Copy link
Member

@mattwigway then I think you can configure it somewhere here

AFAIR this file follows syntax described in https://clang.llvm.org/docs/SanitizerSpecialCaseList.html, so you can suppress this specific issue.

When the extractor encounters a lua runtime error, some osmium objects are not freed. In production this doesn't matter because these errors bring down OSRM. In the tests we catch them to ensure they occur, and the leaksanitizer flags them.
@mattwigway
Copy link
Contributor Author

@SiarheiFedartsou looks like that change fixed it. I rebased from master, LMK if anything else is needed.

@SiarheiFedartsou
Copy link
Member

@mattwigway many thanks for this contribution and I will be happy to review any other PRs from you 👍

@SiarheiFedartsou SiarheiFedartsou self-requested a review March 23, 2023 18:18
@SiarheiFedartsou SiarheiFedartsou merged commit d6afe91 into Project-OSRM:master Mar 23, 2023
@mattwigway mattwigway deleted the lua-runtime-errors branch March 23, 2023 18:32
whytro pushed a commit to whytro/osrm-backend that referenced this pull request Apr 27, 2023
…M#6564)

* print tracebacks and line numbers for Lua runtime errors

* revert format changes

* update changelog with lua traceback, Project-OSRM#6564

* revert using protected_function for old GetStringListFromFunction and source_function Project-OSRM#6564

* add unit test for line numbers in tracebacks, Project-OSRM#6564

* apply clang-format (Project-OSRM#6564)

* remove unused test helper function, Project-OSRM#6564

* suppress leaksanitizer warnings in extract-tests, Project-OSRM#6564

When the extractor encounters a lua runtime error, some osmium objects are not freed. In production this doesn't matter because these errors bring down OSRM. In the tests we catch them to ensure they occur, and the leaksanitizer flags them.
mattwigway added a commit to mattwigway/osrm-backend that referenced this pull request Jul 20, 2023
…M#6564)

* print tracebacks and line numbers for Lua runtime errors

* revert format changes

* update changelog with lua traceback, Project-OSRM#6564

* revert using protected_function for old GetStringListFromFunction and source_function Project-OSRM#6564

* add unit test for line numbers in tracebacks, Project-OSRM#6564

* apply clang-format (Project-OSRM#6564)

* remove unused test helper function, Project-OSRM#6564

* suppress leaksanitizer warnings in extract-tests, Project-OSRM#6564

When the extractor encounters a lua runtime error, some osmium objects are not freed. In production this doesn't matter because these errors bring down OSRM. In the tests we catch them to ensure they occur, and the leaksanitizer flags them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No line numbers in Lua runtime errors
2 participants