EnviroDIYPublisher POST enhancements #454

Open
neilh10 opened this issue Sep 20, 2023 · 6 comments

@neilh10
Contributor

neilh10 commented Sep 20, 2023

For EnviroDIYPublisher, tpwrules has proposed an enhanced method for uploading to the server that allows faster server processing. It results in a larger overall packet, because several readings are packed into one packet using an enhanced JSON format. This means fewer POSTs, less repetition of the UUIDs, and therefore less traffic on the wireless communications channel.

However, larger packets on a marginal wireless channel can mean fewer successful POSTs, so it is going to be site dependent whether this results in less overall data transferred on a specific wireless channel.
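One way to make that site dependence concrete is a toy model. The sketch below is an illustration only, not something from the proposal: it assumes every bit is lost independently, that a POST succeeds only if every bit arrives, and that a failed POST is resent whole. It ignores link-layer and TCP retransmission entirely. The function names and the overhead and reading sizes are hypothetical.

```cpp
#include <cmath>

// Toy model (assumption, not from the thread): each bit is lost independently
// with probability bitErrorRate, and a POST succeeds only if all bits arrive.
// Link-layer and TCP retransmission are deliberately ignored.
double postSuccessProbability(double postBytes, double bitErrorRate) {
    return std::pow(1.0 - bitErrorRate, 8.0 * postBytes);
}

// Expected bytes sent per delivered reading when batchSize readings share a POST.
// overheadBytes: per-POST cost (HTTP headers, UUIDs); readingBytes: per-reading cost.
double bytesPerDeliveredReading(int batchSize, double overheadBytes,
                                double readingBytes, double bitErrorRate) {
    const double postBytes = overheadBytes + batchSize * readingBytes;
    const double p = postSuccessProbability(postBytes, bitErrorRate);
    // A failed POST is resent whole, so the expected number of attempts is 1/p.
    return postBytes / (p * batchSize);
}
```

With, say, 300 bytes of per-POST overhead and 100 bytes per reading, a batch of 8 costs fewer bytes per delivered reading than single-reading POSTs at a bit error rate of 1e-6, and far more at 1e-3. Real channels retransmit at lower layers, so the crossover point has to be measured per site.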

I'm originating this issue, as a software best practice, to discuss the options.

There is a server component that explains the enhancement in more detail:
WikiWatershed/monitor-my-watershed#649
WikiWatershed/monitor-my-watershed#674

Currently the discussion is #434
based on https://github.com/tpwrules/ModularSensors/tree/batch-transmission

neilh10 referenced this issue in tpwrules/ModularSensors Sep 20, 2023
@neilh10
Contributor Author

neilh10 commented Sep 20, 2023

I’m commenting from the perspective of reliable delivery of messages, modeled on the standard Computer Science communications model https://en.wikipedia.org/wiki/OSI_model .

Typically the Data Link Layer implements retransmission to ensure successful delivery of a packet.
I would suggest that for ModularSensors reliable delivery, once a reading is taken and committed to the .csv file on the uSD, it should be reliably delivered to the server. Wireless radio waves also have their own specific transmission issues: particularly for larger data buffers, there is less chance of the data being received and more retransmission of data.

A practical problem with the ModularSensors architecture of allowing multiple Publishers is that the servers can all be in different states, and a readings queue needs to be established per server. I’m not sure that multiple publishers is actually that useful, but that is the current architecture.

tpwrules@74cac0d
The proposed implementation of ::xxPublisher uses a logbuffer in RAM, filled every period by a reading action, and spoofs a 201 response. That breaks the layered model and the meaning of 201. By spoofing a 201, I would expect the implementation to be guaranteeing the meaning of the 201: that the data will be delivered to the server.
Data also sits in the logbuffer for a while: with 15-minute sampling, 4 records is one hour and 8 records is two hours. If during that period someone walks up to the system and plugs in a USB monitor, or there is a watchdog reset or a maintenance action (reset), the last set of readings stored in the RAM buffer is lost.
In addition, due to limited RAM, it's not scalable to the other xxPublishers.

Readings can be stored in a file on the uSD, which I’ve done in a very similar manner to the logBuffer functionality. This effectively gives a large buffer, though characterizing the real-time effects is ongoing and the buffer size needs to be limited for reasonable response times. I recently had field systems that stored close to 3 months of readings at 15-minute intervals (~9000+), and in a realistic test situation with a good cell connection they were all delivered to the server. WikiWatershed/monitor-my-watershed#673 (comment).
On the original field device, which has a noisy wireless channel, uploading the readings is taking weeks.
However, it looks good, and if the wireless conditions allow, I expect it will complete the reliable upload:
https://monitormywatershed.org/sites/TUCA_MW12/

The concept of building the JSON to only have one UUID per transaction is a nice industry standard, and could be implemented reliably by writing the readings to the uSD. The use of a RAM store is a quick fix that is IMHO NOT extensible to the general case of reliable delivery of all readings to the server.
The system I implemented for writing to the uSD could be adapted by changing the parameter
bool useQueDataSource = false;
to
uint8_t useQueDataSource = 0; // Where 0 represents the current transaction type, 1-n would represent using a queued uSD source, and 2-n would use JSON, attempting minimal channel overhead.

This would give flexibility to optimize the method based on field conditions, either at compile time or by an adaptive retransmission algorithm based on RSSI and previous failures. WikiWatershed/monitor-my-watershed#485
In the EnviroDIYPublisher implementation, it's likely that it will still need a RAM buffer, but this is short term (ms) and can be on the stack.
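As a rough illustration of how a multi-valued useQueDataSource might be read, here is one possible interpretation. The enum and helper names are hypothetical, and the mapping of exactly three values (0, 1, 2) is an assumption about what the comment on the parameter intends, not code from any branch.

```cpp
#include <cstdint>

// Hypothetical interpretation of the proposed uint8_t useQueDataSource values.
enum class PublishSource : std::uint8_t {
    Immediate   = 0,  // current behaviour: POST each reading as it is taken
    QueuedOnSd  = 1,  // POST readings from a queue file on the uSD
    BatchedJson = 2   // as QueuedOnSd, but pack several readings into one JSON POST
};

// Does this mode keep readings on the uSD until the server confirms them?
constexpr bool usesSdQueue(PublishSource mode) {
    return mode != PublishSource::Immediate;
}

// How many readings may share one POST in this mode (maxBatch is a site-tuned cap).
constexpr std::uint8_t readingsPerPost(PublishSource mode, std::uint8_t maxBatch) {
    return mode == PublishSource::BatchedJson ? maxBatch : 1;
}
```

Keeping the mode as a small integer type means it could be fixed at compile time or changed at run time by an adaptive scheme.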

However, the data does go over wireless radio, and at the edge of the radio signal range there is less chance of larger buffers being received. A side effect of larger buffers is that they reduce the effective wireless range.
With an adaptive transmission scheme that starts with a large JSON buffer (N large), a high failure rate would reduce N, down to 1 if necessary. This is likely to be harder to test, though.
Just my two cents 😊
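A minimal sketch of such an adaptive scheme follows, assuming N is halved after each failed POST and never drops below 1. The class name, the halving rule, and the rule for growing N again after three clean POSTs are all assumptions for illustration. RSSI is not used here, although the comment above suggests it could be an input.

```cpp
#include <cstdint>

// Hypothetical controller for N, the number of readings packed into one POST.
class AdaptiveBatchSize {
public:
    explicit AdaptiveBatchSize(std::uint8_t maxN)
        : maxN_(maxN < 1 ? 1 : maxN), n_(maxN_) {}

    std::uint8_t current() const { return n_; }

    // Call after each POST attempt; delivered is true only for an HTTP 201.
    void report(bool delivered) {
        if (!delivered) {
            // Failure: halve N, never going below one reading per POST.
            n_ = (n_ > 1) ? static_cast<std::uint8_t>(n_ / 2) : 1;
            streak_ = 0;
        } else if (n_ < maxN_ && ++streak_ >= kGrowAfter) {
            // Several clean POSTs in a row: cautiously probe a larger batch.
            ++n_;
            streak_ = 0;
        }
    }

private:
    static constexpr std::uint8_t kGrowAfter = 3;  // assumed threshold
    std::uint8_t maxN_;
    std::uint8_t n_;
    std::uint8_t streak_ = 0;
};
```

Starting from N = 8, three consecutive failures bring N to 1, and it stays there until the channel produces a run of successes. A controller like this can be exercised on a desktop by feeding it scripted success and failure sequences, which may ease some of the testing difficulty.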

@aufdenkampe
Member

@neilh10, a quick correction to this comment:

A practical problem with ModularSensor architecture of allowing multiple Publishers, is that the servers can all be in diffent states, and a readings queue needs to be established per server. I’m not sure that multiple publishers is actually that useful, but that is the current architecture. I believe it would save flash program space by removing with a condition compile the unused publishers

The ModularSensors architecture is very efficient, because the compilers only include the code from files that are specified in the include statements, so if a publisher, or sensor, or modem isn't included in a sketch, then the code for that feature isn't compiled. That's the genius of the ModularSensors architecture, and why it was named "Modular".

@neilh10
Contributor Author

neilh10 commented Sep 20, 2023

aufdenkampe thanks for the observation, and yes that is the theory ... and C++ is known for code bloat.
I'll edit the wording to remove the sentence (and clean up all that spelling - how did I miss all those foibles!).
I've made a note to myself to experiment a little in neilh10#138

@aufdenkampe
Member

@neilh10, it's more than theory, it's how it works. Only files that are included are compiled.

The code bloat that can and often happens with C++ is when a lot of optional functionality is in a single file. @SRGDamia1 has done an excellent job with the Object Oriented Programming (OOP) design of ModularSensors, separating concerns, so that all the optional functionality is separated into different files. She's done continuous refactoring to maintain these strict separations of concerns by abstracting out shared functions into very lean base classes and putting all the specifics into optional source files for every subclass. Her code is exceptionally DRY (Don't Repeat Yourself) and easy-to-read. I don't see any bloat.

@tpwrules
Contributor

tpwrules commented Sep 21, 2023

Point taken on the 201 response, I've fixed that.

As for the reliability, I simply have not seen the behavior you see. The only issue I ran into with the larger requests is that it would crash my modem, but I was able to work around that too. If the timeout is increased, TCP should be able to manage the dropped packets in theory. If a user was concerned about that behavior, they could reduce sendEveryX and MS_LOG_DATA_BUFFER_SIZE to reduce the mean and maximum packet sizes, respectively.

Our operation is severely power constrained (and somewhat cost constrained) and admittedly we optimized for that case. But even then the only time over the past six months we have lost data was during a global outage of our cellular provider for several days. And that data isn't truly lost, it's still stored on the SD card for when maintenance is done. I understand that doesn't quite meet your (or my) definition of reliable, but it's still an aberration.

I would be very happy to see an extension of my work that could buffer data on the SD card too and plan to work a little on that in the future. Yours did not meet our needs at the time which is why we developed our own solution. We also had concerns about the SD card activity and processing further increasing power consumption.

@neilh10
Contributor Author

neilh10 commented Sep 21, 2023

Practically speaking, if there is a FIFO and EnviroDIYPublisher::publishData builds the JSON request from that FIFO, then the FIFO could be implemented in either RAM or uSD flash, which meets both requirements.
If the HTTP response is passed up to the higher layer to report the success of that request, then the higher layer can manage the FIFO and however many readings were in that POST.
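The FIFO idea can be sketched roughly as follows. Everything here is hypothetical: the ReadingFifo interface, the RamFifo backend, and publishFromFifo are illustration names, the bracketed body is a placeholder rather than the real EnviroDIY JSON, and the desktop standard library is used for brevity even though a Mayfly build would need something lighter. A uSD-backed queue would implement the same interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical FIFO interface: a RAM-backed or uSD-backed queue could implement it.
class ReadingFifo {
public:
    virtual ~ReadingFifo() = default;
    virtual void push(const std::string& reading) = 0;
    virtual std::size_t size() const = 0;
    virtual const std::string& peek(std::size_t index) const = 0;  // 0 = oldest
    virtual void drop(std::size_t count) = 0;  // remove the oldest count readings
};

// RAM-backed implementation, used here only to keep the sketch self-contained.
class RamFifo : public ReadingFifo {
public:
    void push(const std::string& reading) override { q_.push_back(reading); }
    std::size_t size() const override { return q_.size(); }
    const std::string& peek(std::size_t index) const override { return q_.at(index); }
    void drop(std::size_t count) override {
        count = std::min(count, q_.size());
        q_.erase(q_.begin(), q_.begin() + static_cast<std::ptrdiff_t>(count));
    }

private:
    std::deque<std::string> q_;
};

// Builds one request from up to maxBatch readings, sends it through send, and
// removes those readings only if the server answers 201. Returns readings delivered.
template <typename SendFn>
std::size_t publishFromFifo(ReadingFifo& fifo, std::size_t maxBatch, SendFn send) {
    const std::size_t n = std::min(fifo.size(), maxBatch);
    if (n == 0) return 0;
    std::string body = "[";  // placeholder framing, not the real request format
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) body += ",";
        body += fifo.peek(i);
    }
    body += "]";
    if (send(body) != 201) return 0;  // keep the readings queued for the next attempt
    fifo.drop(n);
    return n;
}
```

Because readings leave the queue only after a real 201, no response needs to be spoofed, and the choice of backing store stays hidden behind the interface.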

Practically speaking, until it's implemented (and tested on the production server), it can't be tested from a Mayfly. WikiWatershed/monitor-my-watershed#649 (comment) (I'm not clear whether you actually have a server instance with your code well tested?)

@tpwrules I would be interested to hear about your power model.
There is a lot of value in characterizing the real-world conditions and then sharing that data for a better understanding of optimizations.

My reference is cell phone coverage, with some of the Mayflys at the limit of the cell phone range.
Actually making a connection can be weather dependent and, since it's about surface streams in the riparian area of a stream, season dependent with the growth of vegetation.
I'm looking to make the delivery of time series measurements as reliable as "Boot net": walking up to the system and offloading the data. My power model is solar collection, with the possibility of a storm reducing the solar collection for two weeks. For the people I'm working with, periodically collecting a uSD to make up for technical shortcomings of the telemetry isn't something they are likely to do.

A reference model of using battery power and delivering status (rather than time series readings) would be a severely constrained power model that would need specific optimizations to extend the battery life as long as possible.

Practically speaking, software can be adapted through compile options for different models, so it seems to me both models could be made to work.
FYI I documented my approach as working Aug 16, 2020 - #194 (comment)
so I've been testing it for over three years, and have it in multiple field systems.
I restated it July 10 - #194 (comment)

The core of what I do is to have the upper layer setup for
https://github.com/neilh10/ModularSensors/blob/release1/src/publishers/EnviroDIYPublisher.cpp#L162
then I've adapted it in a distributed FIFO to read from the buffered FIFO
https://github.com/neilh10/ModularSensors/blob/release1/src/publishers/EnviroDIYPublisher.cpp#L334
https://github.com/neilh10/ModularSensors/blob/release1/src/publishers/EnviroDIYPublisher.cpp#L352
and it works.

In addition, I log each cell phone call and the time taken for the response (DBGxxx.log on the uSD), which has been a valuable view of the server's responses.

My measurements in Aug indicate

For discussion/comparison, I would assume a JSON extension with 4 readings takes the same amount of time, and negligible extra modem communication time.

So for 4 POSTs the current system takes
4 * (26 + 2.5 secs first POST) = 114 seconds of cell phone on-time (power)

With my uSD-based queue, making 4 POSTs per cell phone call, it takes
26 + 4*4 = 38 seconds, a big improvement on 114 seconds,
and gives reliable queuing of all undelivered data that does not receive a 201

With the JSON extension for 4 readings in a POST
26 + 2.5 = 28.5 seconds, also an improvement on 38 seconds, plus a 4x throughput improvement for the server.
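The three cases can be written out as a small calculation. This is a sketch under assumptions: the constants are the figures quoted above, reading 26 s as the time to bring up a cell connection and 2.5 s as the first POST on it is an interpretation, and the function names are hypothetical.

```cpp
// Figures quoted in the thread, in seconds. Their roles are an interpretation.
constexpr double kConnectSecs    = 26.0;  // assumed: bring-up time per cell connection
constexpr double kFirstPostSecs  = 2.5;   // first POST on a connection
constexpr double kQueuedPostSecs = 4.0;   // each POST in the queued case, as written

// Current behaviour: every reading gets its own connection and POST.
constexpr double onePostPerConnection(int readings) {
    return readings * (kConnectSecs + kFirstPostSecs);
}

// uSD queue: one connection, then one POST per queued reading.
constexpr double queuedPostsOneConnection(int readings) {
    return kConnectSecs + readings * kQueuedPostSecs;
}

// JSON extension: one connection and a single POST carrying all the readings.
constexpr double batchedJsonOnePost() {
    return kConnectSecs + kFirstPostSecs;
}
```

For 4 readings these evaluate to 114 s, 42 s, and 28.5 s. Note that 26 + 4*4 evaluates to 42 s rather than the 38 s quoted, so one of the operands in that line may differ from the measurement. The ordering of the three schemes is the same either way.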

My suggestion would be to complete the current integration, which only impacts ModularSensors and is an optional call for all users to try out. It is now on https://github.com/EnviroDIY/ModularSensors/tree/reliable_delivery
Then refactor for a better FIFO API.
When/if the server integrates the JSON extension, that would be the time to make ModularSensors upgradeable to your instance of a JSON extension.
