-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duplicate GATT service registrations #107
Comments
Thanks Jimmo. Confirmed this is not an issue in Herald. Android line to clear services before adding: https://github.com/vmware/herald-for-android/blob/develop/herald/src/main/java/com/vmware/herald/sensor/ble/ConcreteBLETransmitter.java#L537 iOS line to clear services before adding: https://github.com/vmware/herald-for-ios/blob/develop/herald/herald/Sensor/BLE/BLETransmitter.swift#L153 |
I understand that the code does call I also see this issue with the Herald test app -- just doing a direct Looking more closely, in ConcreteBLETransmitter.java, the |
Re-opening to re-check the specific use case of deliberately turning BLe on/off/on. We have tested this code a significant number of times, continually, and run full suites every week of manual testing. It is always possible in real world code that a regression can slip in. I would remind you that this is a network system whose behaviour on individual phones - despite manufacturers and OS API documentation's insistence to the contrary - varies widely across many devices and OS versions. It's also worth noting that our code and protocol is completely open and so you are able to test ours - not always the case for DCT protocols / subsystems. @jimmo "did you actually test it?" is rather an unprofessional comment, shows an immaturity in correspondence with your peers, is unkind and unwarranted. I have been very open with you on here and will always continue to do so. We've been working on this for months including in our limited spare time and do so to help all DCT teams to further our common understanding of Bluetooth performance, vagaries of devices, and to help improve human health outcomes globally. Please consider your language in future reports, and please re-read our code of conduct before reporting a new issue. |
This is an unfair characterisation. I can see that it could be interpreted sarcastically, however, this is my understanding of the above issue:
Asking if it was subsequently tested after the issue was reported was a fair question, given how quickly it was dismissed as |
@adamfowleruk Please use nRF Connect on Android -- for the reasons you've outlined above regarding CoreBluetooth limitations, nRF Connect on iPhone isn't very useful for any sort of low-level debugging. I have repro'ed this in bluez on Linux and with the BlueKitchen stack (normally used on embedded devices but in this case talking to a HCI controller over UART from Linux). Here are some additional screenshots showing it in nRF Connect for Android. You can see three different RPAs all currently sending beacons (I cleared the scan results immediately before taking this screenshot). On the next screenshot you can see I am concurrently connected to all three (in three tabs), and the three identical services being returned (it's the same in all three tabs).
See above "When the polling notices the adaptor is powered off, it immediately null's bluetoothGattServer without clearing services or close()ing it." I'm also not entirely convinced that clearServices itself always works but haven't investigated in detail. |
Please also see https://github.com/AU-COVIDSafe/mobile-android/issues/33#issuecomment-745038577 for details of another repro. |
Hi @jimmo @HendX - again our testing and regular Bluetooth scans don't show this issue. I've even just redone the tests against the latest code today in case I'm going mad in my old age, and still cannot reproduce the issue, hence closing it quickly before. We perform regular checks and scans and have never seen the issue you mention. I would advise you to try a raw Bluetooth scanner at this point. I use dedicated devices and software rather than my linux box here just to be safe on my test results. My testing certainly isn't showing an issue with the Herald code on our develop branch. Attached is a wireshark capture of an ios transmitting device. No duplicate services shown on restarting BLe with the app running. I believe this is the scenario you are alluding to @jimmo, correct? I want to be sure because we cannot replicate it. Capture of an iOS Herald demo app transmitter:- Also a capture from an android transmitter:- We've also tested it on Android by looking at the raw scan results on the phones themselves.
2020-12-15 08:18:11.913 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=59:44:24:42:A8:19, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[9, 109, -74, -59, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-59, timestampNanos=252009707856595, eventType=27, primaryPhy=1, secondaryPhy=0, advertisingSid=255, txPower=127, periodicAdvertisingInterval=0}, data=02010209fffaff096db6c5000011076560e199d144848d5242b503b7c80acc00000000000000000000000000000000000000000000000000000000000000)
2020-12-15 08:18:28.241 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=5E:95:C9:DE:90:C1, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[125, 110, 113, -30, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-58, timestampNanos=252026020657506, eventType=27, primaryPhy=1, secondaryPhy=0, advertisingSid=255, txPower=127, periodicAdvertisingInterval=0}, data=02010209fffaff7d6e71e2000011076560e199d144848d5242b503b7c80acc00000000000000000000000000000000000000000000000000000000000000)
2020-12-15 08:18:47.250 6768-6768/com.vmware.herald.app D/Sensor::BLE.ConcreteBLEReceiver: onScanResult (result=ScanResult{device=56:19:EF:B1:73:E6, scanRecord=ScanRecord [mAdvertiseFlags=2, mServiceUuids=[cc0ac8b7-03b5-4252-8d84-44d199e16065], mManufacturerSpecificData={65530=[120, -61, 98, 20, 0, 0]}, mServiceData={}, mTxPowerLevel=-2147483648, mDeviceName=null], rssi=-58, timestampNanos=252045048708454, eventType=27, primaryPhy=1, secondaryPhy=0, advertisingSid=255, txPower=127, periodicAdvertisingInterval=0}, data=02010209fffaff78c36214000011076560e199d144848d5242b503b7c80acc00000000000000000000000000000000000000000000000000000000000000)
At this point it appears the problem is with your test methodology and not the Herald code. If you can please be extremely specific about the exact test scenario, test against the develop branch of Herald on all devices, and let us know the device model and OS versions that would be extremely useful. We may well have the same device here to test. Also please test our develop branch code just to be certain. At the moment though we may well have to close this as cannot reproduce/wontfix because our testing has never exhibited this behaviour. We do regular testing, and a full suite across all our phones for every major release (our majors are dot releases. i.e. v1.0, v1.1 etc.), including with Bluetooth scanners to verify what goes over the network in our advertisements. This is why we were able to close your report so quickly the first time. Thanks for your efforts. Like I said before, I'll keep this open another day in case you find evidence I can replicate on the Herald demo app on the develop branch, but I really can't replicate your issue here with Herald currently. |
I've created a new label that's less aggressive than the 'invalid' 'wontfix' pairing. ❤️ |
"Transmitter" is the peripheral/broadcaster role, right? This bug is specifically for the case of an Android peripheral/broadcaster.
I looked at this capture and do not see anything other than advertising and scan response payloads. I do agree that there are no concurrent RPAs (but see below). However, the main issue that I'm raising here is that the services are registered multiple times. I think perhaps you might be confusing the advertised services (i.e. the (IN)COMPLETE_SERVICE_UUID field in the advertising payload) with the actual services being hosted by the GATT server (i.e. what you see in a service discovery). I don't see anywhere in your nRF Connect screenshots or logs where there's a connection to a device and a service discovery. See my second screenshot for an example. I'm not sure why you don't see the duplicate RPAs but there is one crucial difference which is that your Pixel 2 is running 29 (Android 10) whereas mine is Android 11, so perhaps the ability to have multiple concurrent addresses is an Android 11 feature, and now happens with multiple gatt server instances. Wild guess, but I mention this Android 10 vs 11 point specifically because I vaguely remember Apple/Google ENF calls out running the ENF advertising on a dedicated address in their design -- perhaps they added support for that in Android 11.
I think the key thing here is that a scanner wouldn't show the duplicate service registration. You need a full BLE stack. As described above, I'm using a minimal-but-extremely-well-qualified embedded stack (BlueKitchen), there's very little magic between my test code and the radio.
Please take a look at the code that I have highlighted twice above -- when the adaptor is powered off, it nulls the gatt server, never getting a chance to clear services or stop the server.
Just out of curiosity -- all the the manufacturer data fields in the logs above (i.e. PseudoDeviceAddress) still seem to end in two zeros? This should be fixed in the latest code? |
@jimmo I don't see any new data in your reply about the issue raised. As asked twice already please provide the exact details of your test devices. Also provide your exact methodology. Without this data I will not be able to replicate your issue so please be specific. As it stands you have not provided enough information for me to reproduce. Again please provide this information below:-
You are incorrect about the scanner not seeing duplicate device service advertisements. If they are not observed over the network via a scanner, how can an observer see multiple services? My scanner monitors the three advertising channels simulataneously unless it sees a connection, and then it flips one of the channels to follow that data packet. Herald rarely uses connections to transfer this data. I'm pretty sure I saw one of these exchanges whilst performing the captures so the data is in there. Filter for non ATT and SCAN_REQ/RES packets. If I receive no confirmation of the above information, and no new data, by morning I shall close this issue. |
@adamfowleruk and @jimmo I have just had a chance to test the Herald demo application yesterday, using the latest 'develop' branch (as of yesterday). Here's what I found:
I suspected that COVIDSafe integration with herald might have introduced a new bug that caused multi-advertisement with the same UUID. But there seems also a separate issue of GATT services not being properly cleared. From the preliminary tests above, it seems this may be specific to Samsung phones, but @jimmo confirmed that the issue manifested in Pixel 2, so I may be wrong here. It would be worthwhile checking whether bluetoothgatt.clearServices() really cleared the services properly. If not, this could point to a firmware bug (but this is purely speculation at this stage). Details of Samsung Galaxy S7 Edge tests
The phone was factory rest, no other bluetooth functionalities enabled (so no Samsung Share, or Google Nearby Share) to ensure no interference from other GATT servers. For the scanner, I use a standard Ubuntu 20.04 LTS, running on Lenovo Thinkpad T460s, with a bluetooth 4.2 adapter. For the scanning software, I use the bluetoothctl tool that's part of the BlueZ suite (provided in the default Ubuntu installation). I use adb logcat to capture the logcat output. For the test, I run Herald-for-android app ('develop' branch). I first scanned for the service UUID 428132af-4746-42d3-801e-4572d65bfd9b, and connected to the phone. The services listed for the phone are as expected (no duplicates). Then I turned bluetooth off on the phone and turned it back on. This causes the bluetooth address of the phone to rotate. bluetoothctl sessionlogcat excerpt (startAdvert events) |
Once again, I am not talking about service advertisements. I am talking about service discovery, where the client requests the list of services from the server and it replies with the 16-bit handle ranges. This has nothing to do with advertising. In fact, a peripheral may also act as a client and request the supported services from the central. This is how (for example) a peripheral device discovers the device name of the central.
This happens behind the scenes on Android/iOS when you request the services. The OS libraries abstract away the handles that are returned. A similar process happens for characteristic discovery within a service.
I did.
Thanks for the ultimatum. Feel free to close any time. |
@jimmo "Thanks for the ultimatum. Feel free to close any time." - What do you hope to achieve by trying to provoke an emotional response? Or by your last post in general? If you truly care about engaging in an open-source community to help improve software then this kind of baiting language is extremely unhelpful. If you wish to be taken seriously by your peers in our industry then please communicate in a professional and mature manner. Also, please respond to specific questions with specific answers rather than by re-quoting other parts of our responses. Forgive me if this observation sounds harsh or is incorrect, but your last reply reads more as a rant rather than a real attempt at engaging with our community. This is certainly how many of your replies read. If this was not your intention then please reconsider how you engage in this forum in future. The Herald Project members are providing their spare time to help create a fully open source set of software around reliable Bluetooth communication, that can be fully inspected for security and efficacy, and is developed fully in the open. Of course there are going to be issues - the ability to inspect and find them is the beauty of working in an open-source community rather than a closed source one! These issues should be expected, however irksome. When they are found, and we can reproduce them, we will fix them. Of course we will. I think our record on this has been pretty good on the Herald project. We do, however, need the data we are asking for though as without being able to reproduce an issue we cannot ensure through testing that it is fixed. Saying we think something is fixed only to find later it is not would be even more frustrating for all concerned, I'm sure you will agree. Us asking for specific information is not an unreasonable request, and is in fact standard in any bug remediation process. You should not take these requests as us not believing you - we merely need the data to help us identify the issues at hand. We go to great lengths to publish as much information as possible on this project. We may occasionally clumsily word things - we're only human after all. If clarification is sought we shall address questions. We are trying to do the right thing, do what works, and to always act with kindness. Being on the receiving end of comments like yours harms our community's volunteers emotional well-being and welfare, and takes their time away from improving software. It is not right, and it is not warranted. Further, we have shown a great deal of patience in asking for clarification, with a small number of specific questions, which you still have not provided an adequate response to. If your aim is to help improve open-source software then please start replying to specific questions with specific answers and ensure that your words are professional, that you respond with the relevant data when asked, and that most importantly you treat people with kindness and respect. I understand that Digital Contact Tracing apps and approaches, and their supporting technology, elicit an emotional response from people. Passion is good in the software industry. Passion drives us forward to achieve new things. Passion though doesn't give people the right to target their displeasure at individuals and groups rather than engage in scientific debate as it should be - open, data-driven, facts based, and above all respectful. I again for a second time direct you to our community code of conduct. This makes it clear that unprofessional behaviour shall not be tolerated. Please ensure that your future correspondence either via email or GitHub issues meets these standards. Up to this point it, at times, has not. Otherwise the project team shall have to, regretfully, consider what enforcement action we take on your interactions with our community. This may include removing entirely, or redacting partially, your comments where they contain unprofessional content. We prefer a completely open community, but you should be aware that as per the code of conduct this is one avenue of enforcement that is open to us. Our sincere hope is that you engage more professionally in future. Some of your reports have been useful. I'm sure there is value in continuing to collaborate with you - but not at any welfare cost to our team. If you wish to discuss these issues privately, or need support in some other way, then please do email me directly. You have my email address and I shall always be available to you for support. Yours Sincerely, Adam. |
@alwentiu Thank you very much for this very useful diagnostic information. Sharing which phone models and OS' were used, and which showed this issue and which did not is precisely the information I have been looking for, so thank you for sharing. Could I please also ask that you provide the Herald debug txt files from the S7 and S20 too? Thanks. Separately a UK based collaborator found an issue yesterday with a Samsung S20. This is particularly relevant as they only came on the market in March 2020. In this instance the phone incorrectly reported its advertising capabilities to Herald. Some manufacturers re-implement some parts of the Android APIs. This is why we're somewhat paranoid about not relying on API documentation in Android alone, and instead prefer to test everything and to collect real data. These are the kind of advertising issues on both older and modern phones which we have encountered many times over recent months. We're testing a workaround to the issues we found yesterday over the next couple of days, but your additional data will help confirm whether this is, as we currently suspect, a separate issue or instead a related one. It's worth noting that the S20 in particular has two architecture variants. The one most prevalent seems to be SnapDragon based (The one a colleague here has), whereas the one likely to be present in Australia is Exynos based. It may be that these different stacks are causing inconsistency even though they are both "Samsung S20" phones. Our debug logs log the exact model code and so are very useful in debugging. Thank you also for your detailed explanation of your test procedure, and for providing your logs. They have been very useful. I hope to hear back from you soon. |
Hey @adamfowleruk, Since you're from the UK, and most of us here are from Australia, I think there's definitely a significant cultural difference in terms of what constitutes professional behaviour. I don't think anything from @jimmo has been untoward, except perhaps the last line about the ultimatum (though I do think that your required turnaround time is unreasonable - analyzing this is our side project as much as building Herald is yours, and we can't necessarily meet demands at the drop of the hat). However, you do read as somewhat defensive and panicky, and in your last reply to @alwentiu, also somewhat passive-aggressive towards @jimmo, which I would definitely consider to be unprofessional behaviour. The people in this issue and related ones are some of the most skilled and esteemed people in the Australian technology community, and I would hazard a guess that you didn't expect this kind of scrutiny to descend so rapidly on your little two-man band (plus various government teams and contractors in a Slack group that is closed off from the rest of the world - it doesn't feel very open-source or "developed in the open" or "open community" to me). However, this was pitched somehow to the Digital Transformation Agency of the Australian Government, and they have publicly requested for input, though we probably would have torn it apart regardless. We have spent the months since late April analyzing COVIDSafe, uncovering flaws from significant UX issues through to significant functionality bugs and - somewhat impressively IMO - a path to RCE on many devices. Getting these fixed has been like talking to a wall, and I believe the Select Senate Committee has not yet gotten to reading the tech community's submissions. Ultimately the approach taken by Herald is not new, and follows in the footsteps of the failures of OpenTrace, NHS v1, COVIDSafe, ABTraceTogether, and TousAntiCovid et al. It is fundamentally flawed and the only way that Herald makes it remotely work is a series of tricks and hacks that are not palatable to the general populace (i.e. the end users). But given the strange constraints internal to the DTA and Department of Health (mostly a fundamental misunderstanding of how ENF works), this is all we have, and we do not want to see our hard work and input get thrown away by regressions in a hasty migration to another library. We want it to be as functional, private, and secure as possible, and we want an accurate depiction of its capabilities presented rather than a magical "yeah it works in 100% of all scenarios" chart like the DTA has presented. Regardless of the warranty and liability disclaimer buried in the Apache licence, it needs to be fit for purpose - even if we disagree with everyone from BCG to the Minister about what exactly that purpose is. People's personal data, movements, social graphs, and lives are at stake with an app like this, particularly when millions of people have downloaded it and the reopening of the country [foolishly] relied upon it. I may have misinterpreted parts of the thread as the very low level details of BTLE are a bit beyond my current understanding, but it seems to me that Jim raised an issue with X, you've responded that you can't see Y in your tests and need further info, Jim has stated that the issue is not with Y but with X, and you have finished up by stating that unless he answers your questions about Y you will close this issue. That's hardly conducive to "... gracefully accepting constructive feedback" as listed in the CoC. Please re-evaluate your approach to community feedback. If you don't understand the details of something that has been raised or need further clarification, I'm sure Jim or others will be happy to step in and explain it, but you seem to be getting increasingly hostile and it's not going to help anyone. I can just imagine the headlines if our experts get banned from discussion for raising bugs with COVIDSafe's stack. |
@yaakov-h I'm afraid I do not agree with your characterisation of my responses, or of your interpretation of this thread. You appear to have inferred both emotion and motivation in my responses where there are none. As I have stated, where issues are raised and data provided we shall address them. If the community needs more time then they are free to ask for it. No 'extra time' was asked for in this thread. I shall not be leaving issues open for long periods of time by default, however, as this doesn't help anyone. I have asked for clarification multiple times on the issues raised. Multiple times they have not been responded to with the requested information. @alwentiu did provide some useful data though. I note much of your response is around the COVIDsafe app. Please direct that feedback to the Australian government. |
Extra debug logging to try and replicate theheraldproject#107 - Now issue potentially traced to Exynos based Samsung devices we can try and replicate - This change only applies extra logging to confirm the issue cause locally - ISV partner has an Exynos S20 for testing Signed-off-by: Adam Fowler <[email protected]>
Related to theheraldproject#107. Potential workaround for Exynos gatt server implementation - Existing gattServer instances on devices with the Exynos base chipset do not remove adverts from the same source app - Affects Exynos version of some Samsung handsets, but not the more prevalent SnapDragon versions - Only causes an issue in testing if Bluetooth is turned on/off 23 times - a rare occurrence in normal usage - Without the fix Herald would still allow an affected device to interact with other devices (iOS and Android) thanks to the 'calling card' / write characteristic - Does not cause an issue for detection unless write characteristic/calling card is disabled (Not done in the Herald code base or library) - Issue theheraldproject#110 also occurs on Exynos based handsets only - Committing to develop to allow third party ISV testing today Signed-off-by: Adam Fowler <[email protected]>
@alwentiu Can you please retest your three devices using the latest Herald demo app code on develop? We may have identified two issues (#107 and #111) with how the Exynos chipset works differently from other Android devices. We've tested here on a limited number of handsets (Exynos variants of devices are not that common here), but would be great to get your devices tested too please. If they still fail then please include their Herald debug log files, as there's extra logging enabled in debug mode now too. |
@adamfowleruk I've just tested the fix on Samsung Galaxy S7 Edge and Samsung Galaxy S20 (both are apparently running Exynos ...), and yes the issue seems to be fixed. |
The isMultipleAdvertisementSupported is more complicated than this. I don't have access to an Exynos based phone, but the behavior on my Pixel 2 (Android 11, 5th October Security Update) is that isMultipleAdvertisementSupported returns True when the adaptor is powered on, but False when it isn't. This means that ConcreteBLETransmitter::isSupported() is confusing the bleTimer() method.
However, with the new commits added to develop, I no longer see duplicate service registrations (i.e. service discovery), but I do see duplicate RPAs still (i.e. concurrent advertisers). So... The two commits added to develop do two things:
It's very hard to follow this -- bluetoothGattServer is both a member of the AdvertLoopTask class and a local to startAdvert (now a member of the ConcreteBLETransmitter class). @alwentiu 's fix in AU-COVIDSafe/mobile-android#36 works because in bleTimer() it clearServices() and stops() the bluetoothGattServer before null'ing it. I haven't tested this to see if it also solves the RPA issue, but I suspect that it might also need to stopAdvertising. However, I would suggest a simpler fix, now that isSupported() is doing the right thing. in bleTimer(), remove the power off check at the start -- just I have confirmed that this solves both issues (duplicate RPAs and duplicate service registration) on my Pixel 2. |
Please see https://github.com/AU-COVIDSafe/mobile-android/issues/33
I don't think this is necessarily happening in Herald, but raising here FYI as you might want to implement some sort of guard in SensorArray or ConcreteBLETransmitter to prevent this from happening.
The text was updated successfully, but these errors were encountered: