Elixir 1.15 sometimes fails during mix release due to missing module #12777
Comments
Initially, we blamed the cache, so we reset it and then split up the cache for Docker, but it did not help. I also hit compilation races once in a while locally (by running …). One thing to notice is that we use the module name …
Thank you. However, without an isolated mechanism to reproduce the error, there isn't much we can do.
It depends. If you are using the module at compile time, then it can be a problem (for example, a race condition). Elixir v1.15 did not explicitly change anything related to that, but a race condition may surface when we make parts of the system faster, because that makes the race more likely to be hit. In both cases, though, we need an isolated and minimal way to reproduce it. :)
I will report back if we find a way to reproduce it reliably. So far we can't even do it reliably on CI, because the issue resolves itself after a restart.
No, it's just an atom there, so it should not be the root cause.
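To illustrate the distinction drawn above between using a module at compile time and merely referencing its name as an atom, here is a minimal sketch. All module names are hypothetical. Per the `mix xref` documentation, invoking a function in a module body creates a compile-time dependency, while a call inside a function body is only a runtime dependency. Storing the module name only stores an atom, so compiling that module never has to load the referenced one.

```elixir
defmodule HypotheticalAdapter do
  def name, do: :noop
end

defmodule AtomOnly do
  # Only returns the module name. It is an atom, so compiling AtomOnly
  # never needs HypotheticalAdapter to be loaded.
  def adapter, do: HypotheticalAdapter
end

defmodule RuntimeCall do
  # Call inside a function body: a runtime dependency. HypotheticalAdapter
  # only has to be available when adapter_name/0 is actually invoked.
  def adapter_name, do: HypotheticalAdapter.name()
end

defmodule CompileTimeCall do
  # Call in the module body: it runs while CompileTimeCall is being compiled,
  # so HypotheticalAdapter must already be compiled and loadable. This is the
  # kind of reference where compilation order (and races) can matter.
  @adapter_name HypotheticalAdapter.name()
  def adapter_name, do: @adapter_name
end
```

In a single script the modules compile sequentially, so all three work. In a project compiled in parallel, only the `CompileTimeCall` style forces the compiler to wait for another module.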
Workaround for this: elixir-lang/elixir#12777
Looks like this commit fixed the issue: firezone/firezone@1ffd08f. I think we might get a hint from it since it only does two things:
We also noticed that the race only happens with the Swoosh adapter and that other code is not affected. I checked the Swoosh source, and the only suspicious thing is … Will report more if I find the root cause.
I actually think that explains a lot. If the @on_load hook fails, the module is not loaded, which is why it says it cannot find the module. The actual root cause is logged only a couple of lines above:
The reason why this happens is because … There are two fixes here: …
I will move the issue to Swoosh. :) Thank you for the follow-up!
Done, please consider sending a PR there too: swoosh/swoosh#792 :)
Give 1.11.4 a go! @AndrewDryga
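The `@on_load` explanation above can be reproduced in isolation with a minimal sketch. It is based on the `Module` documentation, which states that the `@on_load` function must have arity 0 and that if it does not return `:ok`, loading of the module is aborted. The module names and `dependency_available?/0` are hypothetical stand-ins, not Swoosh's actual code.

```elixir
defmodule LoadsFine do
  # @on_load names a zero-arity function that runs when the module is loaded.
  @on_load :load_check

  def load_check, do: :ok

  def hello, do: :world
end

defmodule Aborts do
  @on_load :load_check

  # Any return value other than :ok aborts loading, so Aborts never becomes
  # available even though it compiled without errors.
  def load_check do
    if dependency_available?(), do: :ok, else: :abort
  end

  # Hypothetical stand-in for an environment-dependent check,
  # e.g. whether an optional dependency can be found.
  def dependency_available?, do: false

  def hello, do: :world
end

IO.inspect(Code.ensure_loaded?(LoadsFine), label: "LoadsFine loaded")
IO.inspect(Code.ensure_loaded?(Aborts), label: "Aborts loaded")
```

Only `LoadsFine` should end up loaded. Calling `Aborts.hello()` would then fail as if the module did not exist, which matches the "missing module" symptom: the real cause is the failed hook, reported separately in the log.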
Thank you @josevalim and @princemaple ❤️ |
Elixir and Erlang/OTP versions
elixir 1.15.2-otp-26
erlang 26.0.2
Operating system
Ubuntu 22.04.2
Current behavior
We started to have issues with `mix release` failing once in a while. Restarting the CI job usually fixes the issue, so it's most likely some sort of race condition during compilation. For the crash errors, see:
https://github.com/firezone/firezone/actions/runs/5491388141/jobs/10007864417#step:4:1638
https://github.com/firezone/firezone/actions/runs/5510881208/jobs/10045788818#step:9:1009
You can see other builds that succeeded on the same branch without any code changes, e.g.:
https://github.com/firezone/firezone/actions/runs/5491969244/jobs/10009045418#step:4:2783 (the latter failed job was triggered on the same codebase by GitHub merge queue checks).
Expected behavior
The module compiles consistently, as it did before 1.15.