Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add experimental support for peer-to-peer updates #595

Merged
merged 238 commits into from
Jul 23, 2021

Conversation

gmaclennan
Copy link
Member

@gmaclennan gmaclennan commented May 17, 2021

This PR started as an attempt to add strict TypeScript type-checking to this code, but that work resulted in identifying several bugs and some issues with the existing architecture that created many opportunities for race conditions.

This is a really complicated piece of code, with some complicated logic for managing peers appearing and disappearing from a local network. It's important that we can know it works reliably, but also the code needs to be well documented and easy to maintain by the entire team. This PR attempts to address that.

In general I've found that writing this with Typescript and async / await has resulted in much faster development - the code here took about 25 hours to write (excluding thinking time away from the computer) and it should be easier and quicker to add features and fix bugs.

Typescript

Everything is strictly typed now.

State Management

We were using the module rwlock to manage async state management. The main async state that we need to maintain is that the service can start and stop many times (because the user can navigate to and from the sync screen, or close and re-open the app) and the start and stop functions can be async. The use of rwlock meant that the starting and closing states were hidden.

I created an abstraction, AsyncService, which manages start and stop of an async service according to tested rules. This means that we can test this complex code and simplify the code where the upgrade logic lies.

async / await

Using callbacks with code that has so many async actions makes it hard to follow what is happening and catch race conditions. I have changed all the code to use async / await patterns, which I think makes the code easier to follow and maintain by avoiding nested callbacks and ensuring that the code flow is the same as the logic flow (top-to-bottom).

State complexity

The shape of the state emitted by the UpgradeManager was complex, which resulted in a lot of code to decide what state to show to the user in the UX. The main things that the UX needs to know are:

  1. Is the p2p upgrade service running (without error)?
  2. Progress of any downloads
  3. Progress of any uploads
  4. Is there an update downloaded and available to install?

This PR simplifies the state to just capture that information, it is documented in lib/types.d.js as UpgradeState

Code documentation

It's really important that this code is easy for others on the team to understand and maintain. It took me quite a while to parse the original code, so I have written many comments to explain why the code is written the way it is. I have also tried to be really clear about what are public methods and what methods are private (e.g. internal to the implementation and not something to be tested or used outside each class), with explanations of what each class and method does. I have used JSDoc format for comments, extended with Typescript annotations.

Server

  • No need for drain logic, since server.close() will stop accepting new connections, then wait for existing connections to close before closing.
  • Include url property in installer schema returned by server (url is for downloading the APK)
  • Fix start/stop logic in server to avoid race conditions
  • Validate list installers response body matches schema
  • Use fastify for better error handling and simpler API
  • Simplified emitted state (only need a list of active uploads from server)

Storage

  • Simplify state logic and async to remove race conditions
  • Read info about APKs from files themselves
  • Cache installer info rather than read it each time
  • Don't read files into memory for generating hash
  • Use async methods for disk operations
  • Turn into EventEmitter so can listen to available installers
  • Async initialization to allow async cleanup of files (and reading info about current installers)
  • createWriteStream() will only finish (e.g. emit "finish" event) when the installer is completely written to storage. This was tricky code, so I abstracted it as a util startFinishStream() and thoroughly tested it.

Discovery

  • Ensure onPeer event is removed when discovery is stopped, to avoid the event being added multiple times
  • Add TTL & timeout to discovered installers from peers, so that installers from peers that go offline are removed from state
  • Validate lists of installers from peers and protect against prototype pollution
  • Simplified state that emits a list of available installers
  • No logic about evaluating installers - will return all discovered installers no matter platform, version etc.
  • Check port is available on each start (since it could become unavailable between stop() and start())
  • createReadStream() for downloading (but abstracts how this is done) and ensures that an installer is removed from "available" if download fails (to avoid repeat attempts at download - will become available again if re-discovered by discovery.lookup())

Evaluating potential upgrades

We had several places for comparing and evaluating upgrades. I tried to move all this logic into testable utility functions, with two separate tests:

  1. Checking if a given installer is compatible with the current device
  2. Getting the most recent of compatible installers

This should make it easier to add additional logic for selecting a suitable upgrade candidate.

Next Steps

I really hope that these changes make this code more reliable and easier to maintain in the future.

There are a few tasks remaining (with estimated time):

When complete this code should close these issues: #589, #585, #576, #562, #559, #537

gmaclennan and others added 30 commits February 8, 2021 11:45
@gmaclennan
Copy link
Member Author

I identified 2 bugs in my own QA testing:

  • A device would show "No app updates found / checked 1 device" for a second or two before an update started downloading. E.g. it was reporting a device was checked before it had actually checked it.
  • The checked devices list would not be reset when navigating away from and returning to the sync screen.

These bugs are fixed now in this PR, and there are more comprehensive tests for the "checkedPeers" functionality.

* develop:
  chore: Update instructions on NDK version in contributing docs
  chore: Update current NDK version in docs
@gmaclennan gmaclennan marked this pull request as ready for review July 21, 2021 10:36
@gmaclennan gmaclennan changed the base branch from share-apk-p2p to develop July 21, 2021 10:38
@gmaclennan gmaclennan changed the title chore: Typescript & async/await refactor feat: Add experimental support for peer-to-peer updates Jul 21, 2021
This was referenced Jul 21, 2021
@gmaclennan gmaclennan merged commit 871fce8 into develop Jul 23, 2021
@gmaclennan gmaclennan deleted the p2p-update/typescript branch July 23, 2021 17:19
@achou11 achou11 mentioned this pull request Sep 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants