Lightning fast messaging and cold phone #55
I pledge 5 hours a week as tree tender / Clojure-Go integrator / Clojure contributor, as this is what has happened in practice (more than 5h last week, hopefully less this week for the offline inbox MVP, TBD in two weeks' time).
I've done some Whisper tests: on a small network (<100 Whisper nodes) we got a lot of message duplication, x5-x10. Each node receives each message many times, restores the message, does some preparation and checks, and only after that discards the duplicate.
@JekaMas that sounds good! As far as I know, they're implementing bloom filters to do this more intelligently. Another approach for us could be to set various topics instead of keeping a single fixed one for messaging. But it won't scale for the "real" network, only for our small subset. Though at first glance it sounds like a good approach even for the real network.
I pledge a focus of 1 (30h/week). After the end of #1: 40h/week.
The same as @b00ris wrote.
I pledge 30h/week.
To get this moving quickly, I sketched out an idea for how I think we can get to MVP. Minimum Viable Product description:
Example: As a user on iPhone 7 I want to be able to sign up in <30s without feeling like my phone gets hot.
Example: Figure out how to get disk IO statistics over time on Android.
Example: Assuming the phone gets hot due to high CPU usage, Whisper decryption is causing the high CPU usage. Quickest way to test: disable Whisper and simulate the user story without it.
Updated MVP in iteration. The only change from the comment above is the goal date, which moved due to uncertain availability / a slow start.
@divan do you want to commit to this swarm?
Brief update on the CPU investigation. Tool for monitoring Status CPU (feel free to extend it to support more metrics, forwarding to metrics collection platforms, etc.): https://github.com/status-im/statusmonitor Yesterday I tested CPU usage for the idle screen on a release build with and without status-go (using the STUB_STATUS_GO build flag). This flag effectively disables status-go usage (it is still compiled into the app, though). I wanted to figure out what portion of background CPU activity for the app in the idle state (i.e. chat screen open and no interaction) is introduced by status-go and what by status-react code. Here are the results I got for idle with and without status-go (note: this phone has 8 cores, so the max is 800%). I'm going to do more measurements and data collection, including rebuilding and repeating a couple of times from scratch, to make sure I'm not making silly mistakes while running the experiments.
Should we split this issue into two parts: a cold phone and fast messaging? As far as I can see, high CPU usage != high battery consumption in the common case. One thread could spin waiting for a resource, so CPU usage shows 100% and no other thread can use that CPU, yet battery consumption is low in that case.
I'm going to describe a possible issue related to Whisper group chat. Each chat user receives as many copies of a message as they have connections (if the TTL is not reached). It looks like this graph and streams (the first number in a node is the node name and the second is the message copy count): So if a user has 50 connections he receives 50 copies of the message, and each receive does 2 stages of decryption and one decode using reflection. That is 100 decryptions and 50 decodes in total for each message in the chat. The work grows with the number of connections and there is 'no free lunch'. I think this could hurt chat performance. A possible solution is to change the current Ethereum protocol to be able to discard messages from given services with given hashes or a unique ID (like a GUID, Twitter's Snowflake, or Lazada's Luigi).
I am going to add the ability to measure CPU and RAM differences between two provided builds across different user flows (create/recover user, send transaction/location/request, chat, group chat, etc.) which are covered by the Appium automated tests.
To not lose it in Slack history: Jarrad asked "so what's the summary on performance issues so far? do we understand the underlying causes?" My reply: Work in progress.
Not sure what @tiabc is investigating. For you:
I'm going to commit 30-40h/week to this swarm, starting with status-im/status-mobile#2852
Investigation from me and @JekaMas. Problem: we use one topic for all Whisper messages (https://github.com/status-im/status-react/blob/484e982bdf4b09c168aab142f190ef9427cbbfa3/src/status_im/protocol/web3/filtering.cljs#L6). Hypothesis:
Experiments (see https://github.com/status-im/status-go/tree/debug/whisper_perf_topic):
a) one topic (https://github.com/status-im/status-go/blob/5a7e4c3a0019a3b5bf38cc3c76dadfefa7d14749/e2e/whisper/whisper_send_message_test.go#L89, run:
b) many topics (https://github.com/status-im/status-go/blob/5a7e4c3a0019a3b5bf38cc3c76dadfefa7d14749/e2e/whisper/whisper_send_message_test.go#L161, run:
c) many topics (https://github.com/status-im/status-go/blob/5a7e4c3a0019a3b5bf38cc3c76dadfefa7d14749/e2e/whisper/whisper_send_message_test.go#L16). Note that

```go
func matchSingleTopic(topic TopicType, bt []byte) bool {
	if len(bt) > 4 {
		bt = bt[:4]
	}
	for j, b := range bt {
		if topic[j] != b {
			return false
		}
	}
	return true
}
```

can't return false if bt is an empty slice.

Result: migrating from one topic to many topics strongly reduces CPU usage for the Status app.
As part of MVP:
I spent some time formulating precise hypotheses for some of the major pieces of work we are currently doing. They can be found here:
The main one missing is status-im/status-mobile#2852, which will probably be factored out into two by @janherich, @dmitryn and me sometime soon. Please have a look at the above. Also note that big infrastructure/tooling stuff, LES2, etc. are out of scope for this swarm. This is purely about the critical path for MVP and doing the minimal coding necessary to test and eventually fix hypotheses. I believe if we are in rough agreement with this we should consider MVP done (it's been a rough month) and figure out where we want to be in the next 1-2 weeks for iteration 1, as part of the swarm group call tomorrow.
Swarm 55 update: Meeting notes: https://docs.google.com/document/d/1KEqE3JGpA4ZKmpbffZZubcRkVbV6v9DtTxzs5EV8Z68/edit# Swarm people:
Iteration 1 scope: Goal date: January 19th Deliverable 1: Test hypotheses:
Deliverable 2: User stories that are unacceptable right now for supported devices (Anna, Chad?) Deliverable 3: Come up with new hypotheses based on what we learn in 1. |
Updated original issue. #55 |
@yenda Do many topics solve the issue completely? If so, we need one more patch on Ethereum (ethereum/go-ethereum#15811). This PR is already approved but not merged yet; it should decrease CPU usage even more. I really want to hear about testing #55. It's hard to believe that one fix solves such an issue :) Is the app now fast and cold?
Swarm update. Misc
Iteration 1 update. Deliverable 1: Test hypotheses:
Partial progress: status-im/status-mobile#3072
Partial progress: status-im/status-mobile#2922 and basic logging for verification in status-go. @yenda blocked by some local tooling issues and critical release bugs.
Progress: status-im/status-mobile#2965 (comment)
Complete: status-im/status-mobile#3045 (solves/brings us back to baseline for status-im/status-mobile#2852) Deliverable 2: User stories that are unacceptable right now for supported devices (Anna, Chad?) Partial progress. Release blocker in terms of priority:
Main user story partially solved is: status-im/status-mobile#2852 Deliverable 3: Come up with new hypotheses based on what we learn in 1. See iteration 2 for this. Iteration 2
Additionally, some RN navigation PR (Roman) and general investigation into RN workers (Roman, Dmitry). User stories will continue to be formulated as 0.9.13 is released as well. The main surprising thing learned (at least by OP): Whisper overhead appears to be only 20% compared to baseline geth with upstream RPC, suggesting the network overhead is at the p2p/discovery layer. Iteration 2 goal date: 2018-02-01. New project board for iteration 2: https://github.com/orgs/status-im/projects/8
For the board, in general, we have:
From Slack (tl;dr: splitting into two swarms): Hey all! So this swarm has grown quite a bit and is currently a bit too big. Initially we thought a lot of problems were on the status-go side, but then it turned out a bunch are on the status-react/app side. This has led to the surface area being kind of big. @themue (and others) have brought up the idea that it'd be good to split the swarm up a bit. After talking to some people, this is what we are going to do. Both groups would still practice the same methodology (starting with an end user story and testing a hypothesis that requires the minimal amount of work to impact the user story), but the specific goals would be a bit different. One swarm/working group, say 55a, will be around UI/rendering stuff (Dmitry, Roman, Jan, Igor currently, I think). Specifically this means hypotheses centered around status-im/status-mobile#3095 right now. @dmitryn has agreed to lead this one. Another swarm, say 55b, is largely about network overhead, status-im/status-mobile#2931, but also about the Whisper many-topics work (though that one doesn't have a specific user story attached to it, it generally seems promising). This one is largely Go related, but it also requires Clojure integration, so it's currently Eugene, Boris, and Eric. @jeka has agreed to lead this one. As @anna and others identify more user stories we'll see where they best fit, but we can play this by ear, I think. As a start, to reduce coordination costs and decouple efforts a bit, we can just create two channels and have two separate meetings (the last one was a bit rushed and covered a lot, as I'm sure some of you felt). Then it is up to @jeka and @dmitryn how they want to organize things, like creating a new idea or setting iterations or whatever.
Preamble
Summary
Make the app usable from a performance point of view for all the supported user flows for product MVP.
Vision
Use basic hypothesis testing to solve the following qualitative user statement:
As well as focusing on the critical path, making assumptions explicit, operating under assumptions/uncertainty, and doing the minimal coding and tooling work necessary to solve the user story.
Swarm Participants
Swarm size: 10 people.
Requirements
Loosely: Understanding of supported user stories and devices. No additional requirements in this idea outside of what has been or will be specified as part of everyday work. Creating additional tooling is not part of this swarm's work.
Goals & Implementation Plan
(a) Get a global overview of this problem
(b) Use rigor and hypothesis testing to pursue the most fruitful directions
(c) Work as close to the root of the assumption tree as possible while still having leverage
See https://docs.google.com/document/d/1OZtzfojToJtZhj2LnokA9-YU7aL_gm9NzaGtI-vuZ6E/edit# for original doc.
General heuristics:
Testing hypotheses quickly. If it takes 1 month to test a hypothesis then we can only do 2-3 in 3 months, and we don't learn a lot. But if we can do 2 a week we can do 20+ and our knowledge will be correspondingly higher.
Minimum Viable Product
Goal Date: 2017-12-29
Completed: 2018-01-10 (partial completion)
Description:
performance on supported devices and are part of Status Core MVP supported flows.
Example: As a user on iPhone 7 I want to be able to sign up in <30s without feeling like my phone gets hot.
us; identify needs for plausible but currently opaque metrics.
Example: Figure out how to get disk IO statistics over time on Android.
well the quickest way to test these hypotheses.
Example: Assuming phone gets hot due to high CPU usage, Whisper decryption is causing high CPU usage. Quickest way to test: disable Whisper and simulate user story without it.
Iteration 1
Goal Date: 2018-01-18
Description:
Deliverable 1: Test hypotheses:
Deliverable 2: User stories that are unacceptable right now for supported devices (Anna, Chad?)
Deliverable 3: Come up with new hypotheses based on what we learn in 1.
Iteration 2...N
Goal Date: TBD
Once we have all the tools for benchmarking in place and most bottlenecks are fixed, we need to ensure we have documented how to avoid performance regressions in the future, as well as automated performance tests developed under #22.
Supporting Role Communication
Post-Mortem
Copyright
Copyright and related rights waived via CC0.