[CI] Try to fix test_model.sh #5361
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5361
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Cancelled Job
As of commit e5509b0 with merge base 034e098:
NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
5fd7bc2 to 2d3c2e7
Trying to fix failures such as https://github.com/pytorch/executorch/actions/runs/10855583772/job/30128512970
When the missing operator error reproduced on my Mac, I saw that cmake-out/ had not been cleaned up and that merged.yaml was missing linear.out because it was very old.
Without a deep understanding of how CI job workers cache previous runs, I'm refactoring `build_cmake_executor_runner` in test_model.sh to make sure it always builds clean.
2d3c2e7 to e5509b0
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
}

run_portable_executor_runner() {
  # Run test model
  if [[ "${BUILD_TOOL}" == "buck2" ]]; then
    buck2 run //examples/portable/executor_runner:executor_runner -- --model_path "./${MODEL_NAME}.pte"
  elif [[ "${BUILD_TOOL}" == "cmake" ]]; then
    if [[ ! -f ${CMAKE_OUTPUT_DIR}/executor_runner ]]; then
not sure that this would've broken anything, but it does look suspicious. nice find!
If this does fix it, it feels like there's a higher-level issue that this is working around. For a given commit/PR, calling `build_cmake_executor_runner` once or five times should create the same result, so caching should be safe. But if skipping the cache fixes things, then it implies that there's possibly an older version of `CMAKE_OUTPUT_DIR` sitting around. And if it's left over from a previous job run, then we have some pretty serious hermeticity issues.
But if it's not left over from a previous run, then are we calling `run_portable_executor_runner` multiple times in a single job? And why would one call produce a different `executor_runner` binary than another call? The code in the repo hasn't changed, and it's always built with the same cmake flags.
Unless calls to this are interleaved with some other "cmake" call that itself overwrites `CMAKE_OUTPUT_DIR` with a different build configuration.
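To make the concern concrete, here is a minimal sketch of the two policies being compared. The function names `should_build_cached` and `should_build_clean` are hypothetical and do not exist in test_model.sh; only the `executor_runner` file check is taken from the hunk above. It shows how a leftover binary from an earlier run would cause the cached check to skip the build entirely.

```shell
# Hypothetical helpers for illustration only; not part of test_model.sh.

# Mirrors the check in the hunk above: build only when the binary is absent.
# A stale executor_runner left behind by an earlier run makes this say "skip".
should_build_cached() {
  local out_dir="$1"
  if [[ ! -f "${out_dir}/executor_runner" ]]; then
    echo "build"
  else
    echo "skip"
  fi
}

# Always-clean policy: ignore whatever is already in the output directory.
should_build_clean() {
  echo "build"
}
```

If the output directory can survive between job runs, only the second policy guarantees that the binary matches the current commit; the first trades that guarantee for a faster job.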
if it fixes the broken builds, ship it!
-    && cd ${CMAKE_OUTPUT_DIR} \
-    && retry cmake -DCMAKE_BUILD_TYPE=Release \
+  rm -rf ${CMAKE_OUTPUT_DIR}
+  cmake -DCMAKE_BUILD_TYPE=Debug \
Should this use `retry cmake ...` to keep the logic from before? Or are you removing it intentionally?
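For readers unfamiliar with the pattern, a retry wrapper for flaky commands might look roughly like the sketch below. This is not the repository's actual `retry` definition; `retry_sketch` and `RETRY_DELAY_SECONDS` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper; NOT the `retry` function used by the CI scripts.
# Usage: retry_sketch <max_attempts> <command> [args...]
retry_sketch() {
  local max_attempts="$1"
  shift
  local attempt=1
  # Re-run the command until it succeeds or the attempt budget is spent.
  until "$@"; do
    if (( attempt >= max_attempts )); then
      return 1
    fi
    attempt=$(( attempt + 1 ))
    # Delay between attempts; defaults to 1 second (assumed value).
    sleep "${RETRY_DELAY_SECONDS:-1}"
  done
}
```

A wrapper like this helps with transient failures such as network fetches during configuration, but it cannot help with a deterministic failure caused by a stale build directory.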
-    && cd ${CMAKE_OUTPUT_DIR} \
-    && retry cmake -DCMAKE_BUILD_TYPE=Release \
+  rm -rf ${CMAKE_OUTPUT_DIR}
+  cmake -DCMAKE_BUILD_TYPE=Debug \
Please mention the move from Release to Debug in the PR summary. Consider adding a comment here explaining why we use this build mode here.
But besides that and removing `retry`, I don't see a behavior change here: it still removes the directory and generates the cmake system.
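The "remove the directory, then generate the cmake system" behavior described here can be sketched as follows. This is a simplified illustration, not the actual `build_cmake_executor_runner`: the function name `clean_configure_sketch` and the `CMAKE_BIN` override are hypothetical, and the real script passes more cmake flags than shown.

```shell
# Hypothetical sketch; the real function in test_model.sh passes more flags.
# Usage: clean_configure_sketch <output_dir> <build_type>
clean_configure_sketch() {
  local out_dir="${1:?output directory required}"
  local build_type="${2:?build type required}"
  # CMAKE_BIN is an assumed override so the sketch can be dry-run (e.g. with echo).
  local cmake_bin="${CMAKE_BIN:-cmake}"

  # Drop any stale artifacts (old merged.yaml, old binaries) before configuring.
  rm -rf "${out_dir}"

  # Generate the build system into a fresh directory with an explicit build type.
  "${cmake_bin}" -DCMAKE_BUILD_TYPE="${build_type}" -B "${out_dir}" .
}
```

Taking the build type as a parameter keeps the Release versus Debug choice visible at the call site, which is where a comment explaining the choice would naturally go.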
@larryliu0820 merged this pull request in bfce743.