feat: Add experimental support for peer-to-peer updates #595

gmaclennan · 2021-05-17T20:01:44Z

This PR started as an attempt to add strict TypeScript type-checking to this code. That work uncovered several bugs, along with issues in the existing architecture that created many opportunities for race conditions.

This is a really complicated piece of code, with intricate logic for managing peers appearing and disappearing from a local network. It's important that we know it works reliably, and the code also needs to be well documented and easy for the entire team to maintain. This PR attempts to address that.

In general I've found that writing this with TypeScript and async / await has resulted in much faster development: the code here took about 25 hours to write (excluding thinking time away from the computer), and it should be easier and quicker to add features and fix bugs.

TypeScript

Everything is strictly typed now.

State Management

We were using the rwlock module to manage async state. The main async state that we need to maintain is that the service can start and stop many times (because the user can navigate to and from the sync screen, or close and re-open the app), and the start and stop functions can themselves be async. The use of rwlock meant that the starting and closing states were hidden.

I created an abstraction, AsyncService, which manages start and stop of an async service according to tested rules. This means that we can test this complex code and simplify the code where the upgrade logic lies.
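As an illustration of the idea only, here is a minimal sketch of a service wrapper that serializes start and stop calls and exposes the intermediate states. The class name `AsyncServiceSketch`, the state names, and the queueing rule are assumptions made for this example; the actual AsyncService in this PR may follow different rules.

```typescript
// Hypothetical sketch: names and rules are assumptions, not the PR's AsyncService.
type ServiceState = "stopped" | "starting" | "started" | "stopping";

class AsyncServiceSketch {
  private state: ServiceState = "stopped";
  // Tail of the transition queue; each start/stop waits for the previous one
  private queue: Promise<void> = Promise.resolve();
  private doStart: () => Promise<void>;
  private doStop: () => Promise<void>;

  constructor(doStart: () => Promise<void>, doStop: () => Promise<void>) {
    this.doStart = doStart;
    this.doStop = doStop;
  }

  getState(): ServiceState {
    return this.state;
  }

  /** Resolves once the service is started; a no-op if it already is. */
  start(): Promise<void> {
    return this.enqueue(async () => {
      if (this.state === "started") return;
      this.state = "starting";
      try {
        await this.doStart();
        this.state = "started";
      } catch (err) {
        // A failed start leaves the service stopped
        this.state = "stopped";
        throw err;
      }
    });
  }

  /** Resolves once the service is stopped; a no-op if it already is. */
  stop(): Promise<void> {
    return this.enqueue(async () => {
      if (this.state === "stopped") return;
      this.state = "stopping";
      try {
        await this.doStop();
      } finally {
        this.state = "stopped";
      }
    });
  }

  // Run transitions strictly one after another so they can never interleave
  private enqueue(task: () => Promise<void>): Promise<void> {
    const run = this.queue.then(task);
    // Swallow errors on the queue itself so one failure does not block later calls
    this.queue = run.catch(() => undefined);
    return run;
  }
}
```

With this shape, calling `start()`, `stop()`, `start()` in quick succession runs the three transitions in order, and the "starting" and "stopping" states are visible through `getState()` rather than hidden behind a lock.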

async / await

Using callbacks with code that has so many async actions makes it hard to follow what is happening and catch race conditions. I have changed all the code to use async / await patterns, which I think makes the code easier to follow and maintain by avoiding nested callbacks and ensuring that the code flow is the same as the logic flow (top-to-bottom).
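To make the difference concrete, the sketch below writes the same two-step flow both ways. The helpers `findPeer` and `listInstallers` are hypothetical stand-ins for real async work and are not functions from this codebase.

```typescript
type Callback<T> = (err: Error | null, value?: T) => void;

// Hypothetical callback-style helpers standing in for real async work
function findPeer(cb: Callback<string>): void {
  Promise.resolve().then(() => cb(null, "peer-1"));
}
function listInstallers(peer: string, cb: Callback<string[]>): void {
  Promise.resolve().then(() => cb(null, [peer + "/app.apk"]));
}

// Callback style: each step nests inside the previous one,
// and every level needs its own error check
function checkForUpdatesCb(cb: Callback<string[]>): void {
  findPeer((err, peer) => {
    if (err || peer === undefined) return cb(err || new Error("no peer"));
    listInstallers(peer, (err2, installers) => {
      if (err2) return cb(err2);
      cb(null, installers);
    });
  });
}

// Wrap a callback-style function in a Promise
function toPromise<T>(fn: (cb: Callback<T>) => void): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    fn((err, value) => (err ? reject(err) : resolve(value as T)));
  });
}

// async / await style: the code reads top-to-bottom,
// and a rejection at any step propagates to the caller
async function checkForUpdates(): Promise<string[]> {
  const peer = await toPromise<string>(findPeer);
  return toPromise<string[]>((cb) => listInstallers(peer, cb));
}
```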

State complexity

The shape of the state emitted by the UpgradeManager was complex, which resulted in a lot of code to decide what state to show to the user in the UX. The main things that the UX needs to know are:

Is the p2p upgrade service running (without error)?
Progress of any downloads
Progress of any uploads
Is there an update downloaded and available to install?

This PR simplifies the state to capture just that information. It is documented in lib/types.d.js as UpgradeState.
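As a rough illustration of how a state limited to those four questions keeps the UX logic small, here is a sketch. The field names and the `describeState` helper are assumptions for this example and are not copied from the UpgradeState definition in lib/types.d.js.

```typescript
// Hypothetical shape; the real UpgradeState may use different fields.
interface TransferProgressSketch {
  id: string;
  sofar: number; // bytes transferred so far
  total: number; // total bytes expected
}

interface UpgradeStateSketch {
  // Is the p2p upgrade service running (without error)?
  value: "stopped" | "starting" | "started" | "stopping" | "error";
  error?: Error;
  // Progress of any downloads and uploads
  downloads: TransferProgressSketch[];
  uploads: TransferProgressSketch[];
  // Set when an update is downloaded and ready to install
  availableUpgrade?: { versionName: string };
}

// With a state this small, choosing what to show the user is a short function
function describeState(state: UpgradeStateSketch): string {
  if (state.value === "error") return "Update service failed";
  if (state.availableUpgrade) {
    return "Update " + state.availableUpgrade.versionName + " ready to install";
  }
  const active = state.downloads[0];
  if (active) {
    const percent = active.total > 0 ? Math.round((100 * active.sofar) / active.total) : 0;
    return "Downloading update: " + percent + "%";
  }
  if (state.value !== "started") return "Update service not running";
  return "No updates found";
}
```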

Code documentation

It's really important that this code is easy for others on the team to understand and maintain. It took me quite a while to parse the original code, so I have written many comments to explain why the code is written the way it is. I have also tried to be really clear about which methods are public and which are private (i.e. internal to the implementation and not something to be tested or used outside each class), with explanations of what each class and method does. I have used JSDoc format for comments, extended with TypeScript annotations.

Server

No need for drain logic, since server.close() will stop accepting new connections, then wait for existing connections to close before closing.
Include url property in installer schema returned by server (url is for downloading the APK)
Fix start/stop logic in server to avoid race conditions
Validate list installers response body matches schema
Use fastify for better error handling and simpler API
Simplified emitted state (only need a list of active uploads from server)
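The item above about validating the list-installers response body could look roughly like the hand-written check below. This is a sketch under assumptions: the field names (`url`, `hash`, `versionName`, `arch`) are illustrative, and the PR itself may rely on a JSON schema validator rather than a manual type guard.

```typescript
// Hypothetical installer shape, for illustration only
interface InstallerSketch {
  url: string;
  hash: string;
  versionName: string;
  arch: string[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a clean installer object, or undefined if the input does not match
function parseInstaller(value: unknown): InstallerSketch | undefined {
  if (!isRecord(value)) return undefined;
  const { url, hash, versionName, arch } = value;
  if (typeof url !== "string" || typeof hash !== "string" || typeof versionName !== "string") {
    return undefined;
  }
  if (!Array.isArray(arch) || !arch.every((a): a is string => typeof a === "string")) {
    return undefined;
  }
  // Copy only the known fields, so unexpected keys sent by a peer
  // (including "__proto__") are never carried into application state
  return { url, hash, versionName, arch: arch.slice() };
}

// Validates a whole response body and throws on the first bad entry
function parseInstallerList(body: unknown): InstallerSketch[] {
  if (!Array.isArray(body)) throw new TypeError("Expected an array of installers");
  const result: InstallerSketch[] = [];
  for (const item of body) {
    const installer = parseInstaller(item);
    if (!installer) throw new TypeError("Installer does not match schema");
    result.push(installer);
  }
  return result;
}
```

Rejecting the whole list on one bad entry is a design choice for this sketch; skipping invalid entries would be an equally reasonable rule.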

Storage

Simplify state logic and async to remove race conditions
Read info about APKs from files themselves
Cache installer info rather than read it each time
Don't read files into memory for generating hash
Use async methods for disk operations
Turn into EventEmitter so can listen to available installers
Async initialization to allow async cleanup of files (and reading info about current installers)
createWriteStream() will only finish (i.e. emit the "finish" event) when the installer is completely written to storage. This was tricky code, so I abstracted it as a util startFinishStream() and thoroughly tested it.

Discovery

Ensure onPeer event is removed when discovery is stopped, to avoid the event being added multiple times
Add TTL & timeout to discovered installers from peers, so that installers from peers that go offline are removed from state
Validate lists of installers from peers and protect against prototype pollution
Simplified state that emits a list of available installers
No logic about evaluating installers - will return all discovered installers no matter platform, version etc.
Check port is available on each start (since it could become unavailable between stop() and start())
createReadStream() for downloading (but abstracts how this is done) and ensures that an installer is removed from "available" if download fails (to avoid repeat attempts at download - will become available again if re-discovered by discovery.lookup())
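The TTL behaviour in the list above can be sketched as a small expiring collection. The class name `ExpiringInstallers`, its methods, and the injectable clock are assumptions for this example, not the discovery module's actual API.

```typescript
// Hypothetical sketch of TTL-based tracking of installers seen from peers
class ExpiringInstallers<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so tests can control time
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Add an installer, or refresh its TTL if it is re-discovered. */
  seen(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  /** Remove an installer immediately, for example after a failed download. */
  remove(key: string): void {
    this.entries.delete(key);
  }

  /** Installers whose TTL has not elapsed; expired entries are dropped. */
  list(): T[] {
    const t = this.now();
    const live: T[] = [];
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= t) this.entries.delete(key);
      else live.push(entry.value);
    });
    return live;
  }
}
```

Because a re-discovered installer simply calls `seen()` again, an installer removed after a failed download becomes available again on the next successful lookup, which matches the behaviour described in the last list item.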

Evaluating potential upgrades

We had several places for comparing and evaluating upgrades. I tried to move all this logic into testable utility functions, with two separate tests:

Checking if a given installer is compatible with the current device
Getting the most recent of compatible installers

This should make it easier to add additional logic for selecting a suitable upgrade candidate.
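A minimal sketch of the two utilities follows, assuming hypothetical field names and a plain numeric comparison of dotted version strings. The real utilities may use different fields, and a proper semver comparison would be needed to handle pre-release tags.

```typescript
// Hypothetical shapes, for illustration only
interface CandidateSketch {
  versionName: string;
  arch: string[];
  minSdkVersion: number;
}
interface DeviceSketch {
  currentVersion: string;
  supportedAbis: string[];
  sdkVersion: number;
}

// Compares dotted numeric versions such as "5.10.0" and "5.9.2".
// Assumption: no pre-release or build suffixes.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Check 1: is a given installer compatible with the current device?
function isCompatible(candidate: CandidateSketch, device: DeviceSketch): boolean {
  const abiMatch = candidate.arch.some((abi) => device.supportedAbis.indexOf(abi) !== -1);
  return abiMatch && candidate.minSdkVersion <= device.sdkVersion;
}

// Check 2: the most recent compatible installer that is newer than the
// installed version, or undefined if there is none
function getLatestUpgrade(
  candidates: CandidateSketch[],
  device: DeviceSketch
): CandidateSketch | undefined {
  let best: CandidateSketch | undefined;
  for (const candidate of candidates) {
    if (!isCompatible(candidate, device)) continue;
    if (compareVersions(candidate.versionName, device.currentVersion) <= 0) continue;
    if (!best || compareVersions(candidate.versionName, best.versionName) > 0) {
      best = candidate;
    }
  }
  return best;
}
```

Keeping the compatibility check separate from the selection step means a new rule (for example, a bundle ID check) only touches `isCompatible`.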

Next Steps

I really hope that these changes make this code more reliable and easier to maintain in the future.

There are a few tasks remaining (with estimated time):

Tests for upgrade-discovery (3h)
Tests for upgrade-server (2h)
Tests for upgrade-manager (4h)
Integration of new API with server.js (1h)
Update frontend code for new state shape (2h)
Clean up AsyncService code (see comment on #595) (1h) - non-essential chore
Change all _name to #name (see comment on #595) (1h) - non-essential chore
Evaluate how long to wait for initial download (1h)
Accidentally fixed beforeAfterStream tests for node@16, breaking node@12. Revert (1h)

When complete, this code should close these issues: #589, #585, #576, #562, #559, #537

gmaclennan · 2021-06-03T18:55:44Z

I identified 2 bugs in my own QA testing:

A device would show "No app updates found / checked 1 device" for a second or two before an update started downloading. That is, it was reporting a device as checked before it had actually checked it.
The checked devices list would not be reset when navigating away from and returning to the sync screen.

These bugs are fixed now in this PR, and there are more comprehensive tests for the "checkedPeers" functionality.

gmaclennan and others added 30 commits February 8, 2021 11:45

Add AppInfo native module for accessing sourceDir

c825111

Add share sheet for sharing Mapeo APK

d169df6

Test auto-updating app from an APK

0dd3aa5

Test reading package signatures

28271b1

Prep APK for sharing and start upgrade server.

710aee9

get correct mapeo app version + show on settings

5b71221

remove old comments

afd1c8f

implement upgrade storage module

257a046

prevent 'filename' from being exposed on upgrades

4344edd

add createReadStream API to stream upgrade data

0330de8

chore: add eslint linting

5175afa

test: move fake.apk to static/ subdir

7017242

test: formatting

e61fae4

fix: make 'arch' an array of strings

b31cbc6

chore: add 'test' npm script

5049f32

add initial UpgradeServer implementation

e289d74

add new UpgradeServer implementation

b401b1f

test: add edge case tests for upgrade server

ca1e0c6

move the upgrade discovery key to its own file

f7b8b2b

have UpgradeServer accept a port to advertise on

bc37167

make UpgradeStorage#getAvailableUpgrades sync

9a49773

refactor: consolidate apk->upgradeoption logic

5cf5717

Revert "make UpgradeStorage#getAvailableUpgrades sync"

e9cfc9f

This reverts commit 2b4e3ac.

refactor: use a file for downloaded upgrade info

9aae25f

refactor: move LocalUpgradeInfo into a nodule

672fde7

refactor: rename createApkWriteStream

e12d102

add work-in-progress UpgradeDownloader component

29d73f5

chore: lint

2f1ad41

add download progress tracking to state

c89a0e8

implement the UpgradeDownloader checker component

8082c9c

gmaclennan added 9 commits June 3, 2021 11:01

Remove debug logging on CI (was used for debugging test failure)

3e5eb67

chore: Fix to backend build script

fa065aa

chore: Clean up logging code

84fbde0

Failing test for checkedPeers

bc3bef7

fix: checkedPeers should only update after potential upgrades r checked

598a4c1

chore: typings for tests

ec1f6cb

failing test for checkedPeers reseting after stop,start

da66f31

fix: Checked Peers should reset after start and stop

cbac680

fix tests, stricter testing for empty arrays in state

ae824ca

the lodash.isMatch function we are using will match an empty array against any array, so to check the state is an empty array, we need to use the compare function to strictly check the array length.

gmaclennan mentioned this pull request Jun 4, 2021

feat(p2p-upgrade): Read APK version, bundleId and supported arch before installing #585

Closed

gmaclennan added 4 commits June 4, 2021 11:55

fix: Don't use same keep-alive connection for downloads and polling

781d040

fixes #612

Update upgrades QA build numbers

c2827bb

fix typo

3c6d684

Merge branch 'develop' into p2p-update/typescript

5f4520a

* develop:
  chore: Update instructions on NDK version in contributing docs
  chore: Update current NDK version in docs

gmaclennan marked this pull request as ready for review July 21, 2021 10:36

gmaclennan changed the base branch from share-apk-p2p to develop July 21, 2021 10:38

gmaclennan changed the title ~~chore: Typescript & async/await refactor~~ feat: Add experimental support for peer-to-peer updates Jul 21, 2021

Merge branch 'develop' into p2p-update/typescript

0934614

This was referenced Jul 21, 2021

Peer-to-peer upgrades #524

Closed

Delete old binaries #592

Closed

gmaclennan added 5 commits July 21, 2021 17:04

Merge branch 'develop' into p2p-update/typescript

602c8e9

Fix "Go to Map" button style for empty observation list screen

3d41403

fix e2e debug test config (for local testing)

d77f67c

Increase server start timeout in attempt to fix e2e tests

56cda61

Merge branch 'develop' into p2p-update/typescript

7deb3df

gmaclennan merged commit 871fce8 into develop Jul 23, 2021

gmaclennan deleted the p2p-update/typescript branch July 23, 2021 17:19

achou11 mentioned this pull request Sep 27, 2021

Release v5.3.0 #750

Merged
