-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Versioning reporting across components #200
Comments
Could the problem be restated in the context of our current process/reporting? When I read the problem I can't identify what requirement/desire isn't met with our current reporting (with the context of the changes Joe is already addressing such as adding PSP info, reporting module info, etc). Basically I continue to lack the understanding of why any additional "solution" is required when I don't understand the problem with what we are currently doing. What is the real problem you are trying to solve (that isn't solved already by historical reporting + our current set of updates in work)? |
IMO the problem we need to solve is two fold:
Edit: correction - both development and real operations require the first point - when you boot the software you really want to have a way to confirm you are running the version you intend to be running. That's really the important part. |
Also worth keeping in mind in flight there are com blackouts, watchdogs, reboots, multiple images, etc. I often see "red limits" put on version numbers in telemetry to immediately identify changes (and have utilized this in practice on projects where it was essential and the warning was critical to success). But I'm still not following the motivation for any of the "Solutions" so I'm unable to trade the negative impact of changes (re-educating all our current users who depend on how things currently work, impact analysis on sustaining engineering/ops/ground stations/etc) vs these proposed changes. |
Speaking from experience - on more than one occasion I've struggled with making some change but not getting expected behavior out of the target system after loading. Only to find out the code didn't actually get loaded for one reason or another - so I wasn't even running the version that had whatever I was looking for. Furthermore, also consider issues like nasa/cFE#952 -- This is in fact a great example of why a version report at boot time is important, otherwise you never would know that you just restarted the same version of your app rather than the updated version. Ideally it works best when the identifier gets updated automatically with every change (i.e. not a hand-edit). It's usefulness is inversely proportional to the likelihood that more than one build reports the same info. |
@skliper I see the comm blackouts + watchdog resets as a good reason for putting an identifier in the TLM housekeeping stream - it can happen that you get a failure which involves a watchdog reset and fallback to a backup image or something of that nature. If there is no identifying into in the TLM - then you may not realize right away that this happened. (uptime is another potential indicator of anomalies like that). |
@jphickey - yup, I'm agreeing with everything you are saying. What I don't understand (still) is what problem we are trying to solve relative to our current implementation (+ your changes already in work). We have the version info in TLM (although not PSP, it'd be good to add) and it's identifiable when it's a development version (along with what the development version was based on), we have enhanced versioning (info from git) reported at startup, we have development build number and baseline reported at startup. Why would we do Solution 8? Seems like that would break things. How is 7 different than what we already get with enhanced version reporting? 6 - we've got a versioning section in the users guide, what else is being proposed? 5 - we report checksums already (checksum app), what change is being proposed? 4 - why? for ops the real time telemetry and events seem sufficient? 1/2/3 - what's the motivation to change, when any change requires re-education of ALL users this doesn't seem to solve an issue with new users not understanding the current process. |
IMO much of the info we currently report is really superfluous -- the most critical/important aspect of this version report is to let the user/operator know they are running the version they are supposed to be running. The specific details of that build (i.e. where it came from, whether it is official release) should be known to the user already - they presumably built it and loaded it, after all - running on a target is fairly late in the game to be figuring out that that what you built wasn't an official release - the documentation should reveal it long before the user ever got to that stage. That's why I think the most important aspect is that: a) modules all self-report some version identification info If you have a+b then all details like whether it is a dev build or not, or the last official release, should already be known implicitly from that. |
@jphickey Agreed. I like the enhanced version reporting, everything is uniquely identifiable and can be reported in event messages. I also think there's good reasoning behind the "simple" tlm of integers that communicate official release numbers or that it's a development version along with tracing to at least the MAJOR.MINOR release it was based on. |
Beyond that, I struggle to understand what additional build numbers, strings, etc gives us. For reference, when I say enhanced versioning I'm referring to git describe, build username, and build date. |
At any rate - going back to the original question here - as a developer, I'm most interested in, and pay most attention to item (7) from the initial summary (git-derived version description). We already implement this, and it is reported separately from other version info on a different event ID. So I think this fully addresses the developer use case because:
But for operations the requirements/concerns are quite different:
|
I think the enhanced versioning (git-derived) covers almost all the requirements. The only issues are:
|
BTW - item (2) in my comment is easily solvable, its just a matter of providing a hook so the user can easily tie into whatever the equivalent of "git describe" is on their version system of choice. It's probably safe to assume everyone uses a VCS of some sort, even if not git. (if they don't, then this issue is almost certainly moot). |
Jake and I talked and agreed on the three levels of information Integers in HK TLM - mean the last official release or, if REV=99, that it's a development build and the MAJOR.MINOR of the last official release Static human-readable numbers (version and baseline) in version.h - Indicates what "we", the cfs team, delivered At-build enhanced-versioning - Indicates any local changes the user has added on top of our delivery, assuming they're using git. |
So path forward,
|
Maybe HK could report MAJOR.MINOR.REVISION.MISSION_REV on releases and MAJOR(of last baseline).MINOR(of last baseline).99.BUILDNUMBER? Then the build number is traceable between the two? |
where for 99.BUILDNUMBER we are reporting build number from version.h... (which could be filled in w/ a git hook?) EDIT: where build number comes from git describe... so would match enhanced versioning (as long as the buildnumber is incremented by 1 as part of the update to version.h) |
I think that could work:
For the same codebase |
The only issue is that the "last official release" has 3 numbers not just 2. So how can one differentiate a development build based off 6.7.0 vs 6.7.1? This is why I was hoping we could use the 4th number alone as a dev flag, while keeping MAJOR.MINOR.REV the same between dev/official releases. How about MAJOR.MINOR.REV always refers to the last official release, and we overload the 4th number as (build# + 1000). So 4th number >= 1000 means a dev build, subtract 1000 to get commits since epoch. In official releases it is set to 0, and missions can use it as their local rev number (or however they want) so long as they keep the number <= 999. |
If we always use the MAJOR.MINOR.0 release as the baseline for counting our build number then we can stay with MAJOR.MINOR.99.BUILDNUMBER? Basically you could tell what the real baseline was from buildnumber (that is written to our header). EXAMPLE: we'd know that MAJOR.MINOR.5 buildnumber relative to MAJOR.MINOR.0 is 200, so MAJOR.MINOR.99.201 is really MAJOR.MINOR.5 + 1 commit. EDIT: I guess w/ 1 byte we'd wrap at 256... but as long as the TLM is really the "hint" but the event contains all the info maybe that's ok? If REVs diverge then the scheme really needs to consider buildnumber in the TLM as a "hint" since there could be a MAJOR.MINOR.4 that diverged that could also have a buildnumber of 201... but I don't think we are trying to provide something in TLM that is 100% unique for every possible situation (that's the job of the event, to the best we can), it's just helpful I'm OK with treating the HK packet version info as a "hint" when it's a development build since it is typically used in situations where it's less critical. |
I still would think we'd want the MAJOR.MINOR.REV content in the TLM packet to be true/correct - this is why I don't like clobbering/overriding the REV field - it's relevant. If the dev baseline release was 6.8.1 then we should report 6.8.1 in the TLM. There might be some ground system logic that looks for a specific value in these fields or otherwise does something different based on what it receives for the version tuple. I'm not sure that a build# can always reliably indicate the baseline - think about the case where development of the next major release is underway (like now, with 7.0.0) and if we had to go back and make a 6.8.1 or 6.8.2 patch release (still possible!). For instance the dev build# of a 6.8.x+ build would have a build# that looks similar to an early 7.0.0 dev build. So a build# alone is not sufficient to unambiguously identify a baseline. In reality as a developer the best way to unambiguously identify the baseline is with the VCS-based version ID (git describe). Speaking for myself, this is all I really pay attention to in a dev build, because it is has the complete info. The main goal of the TLM info in a dev build should not be to identify the build at all - it is just to tell the operator that the TLM info is not the whole story. Since the fields are 8-bits each I would consider just defining the MISSION_REV (4th byte) as:
Of course we could still use 99 and just reserve 1-98 for mission use but that leaves over half the range unused.... Again, the objective should not be to fully identify a dev build in HK TLM - there isn't enough bits for that. The only objective is to give a hint/indicator that this is not an official release. One simple, reliable indicator (such as 0xFF, or 99, or whatever, in the MISSION_REV) is good enough - and doesn't clobber the real MAJOR.MINOR.REV info which still might be important. |
Is it worth changing version reporting in TLM for development versions 3 times in 3 releases vs just 6.7 cycle being unique (since I broke the pattern) just to get the REVISION information in the TLM packet for development versions? Since it's development maybe it doesn't really matter, it is a lot of thrashing though. I think it's fine since I don't really care (I broke it the first time)... any community or former/active/future project input? |
Yes, because we haven't gotten it right yet :-). Technically 6.8.0/bootes is still pre-release so in theory we could patch it, right? - if we agree to my suggestion it's just a matter of changing the "dev flag" from revision to mission_rev - trivial to fix and make consistent. |
And just to note - whenever bootes is officially released we have to change the version info anyway. |
I'm fine with your suggested change (0xFF in MISSION_REV). Seems like a fair balance of info vs bandwidth. It's a change to the hk packet either way since neither PSP or the mission rev info is in there now. |
Yes, we'd need to add PSP info to the ES HK TLM. The mission rev is already in there though, for OSAL and CFE at least. The only change is adding PSP info. |
To summarize what I'd like to do here, each module (OSAL, PSP, CFE) would publicize the following info via their API/headers:
For reporting:
Alternatively: we could make build number 16 bits instead of 32, then it could theoretically also be included in HK TLM without bloating it too much. But I don't think this is necessary, since no two builds should report the same MAJOR.MINOR.REV.MISSION_REV in a real FSW deployment - only dev environments would have this possibility, and devs should be focusing on VERSION_INF event, not HK TLM. Edit: originally I omitted code name, I figure this should be part of the set though, since we are using this to identify compatible module groups, its important to report as well. |
Good point, I missed that and with no change required for cFE or OSAL is it worth changing a tlm packet at this point just to add PSP (current content has been "good enough" up to this point)? Maybe in the event is enough? Projects typically own PSP's anyways... I could be convinced either way at this point. |
@jphickey I love the MISSION_REV: 255 solution! |
I think we should add it for consistency. Caelum is a breaking-change release and we can consider tlm part of the "public interface" so it is fair game. |
My opinion on the CFE_MISSION_REV location has drifted back to putting it in the mission config file in sample defs. I know I pushed for the opposite, but since modules can be replaced without changing actual code it seems useful (along with the original point brought up that config parameter changes may be useful to track w/ an incremented MISSION_REV). Related to nasa/cFE#1084. |
@skliper are you saying having a single mission_rev for the mission implementation of cfe+osal+psp or having multiple COMPONENT_MISSION_REV in sample_defs? |
I was only referring to the existing CFE_MISSION_REV, which used to have duplicate defines in both *_platform_cfg.h and cfe_version.h. In nasa/cFE#1084 it was removed from *_platform_cfg.h. It seems to make more sense to be defined in the sample_mission_cfg.h. |
I would agree with putting MISSION_REV in the CMake file, I think it does fit better the conceptually (particularly because one can substantially change behavior with a mission header change in the defs dir). It throws a bit of a monkey wrench into my previous suggestion to use 255 as a flag for dev build though. That flag needs to be part of the version.h header within the module - same one that contains major/minor/rev. |
Here's another option: do both! The "defs" dir state should be captured in the extended info as it is. That is, the defs dir is included in the set of things we call "git describe" on and this string is reported separately, as the mission version. So any changes that are made to the "defs" dir should show up here. So what if we do not replace the CFE_MISSION_REV with a CMake defs-based number - but rather keep both of them? (That is, since we already report extended version info for the "defs" directory - could we add a new, mission-assigned rev number there?) I do think there is some value to having a numeric value that the mission can bump up if they ever change the source code of each individual module, including CFE. That has some value - and so does having a rev number in the "defs" directory, but the two kinda cover different things - they don't conflict, they complement. |
Good points! I feel like with the extended version info on defs that I had forgotten about (and all the modules getting reported after the update in event messages) covers what I was missing. What we have already is sufficient for me. Not that we couldn't add some other number, but just the commit count is enough for me. |
OK then! From the sounds of things, we might have consensus now? I'd like to go ahead and complete the PR for nasa/osal#824 according to this paradigm, if its settled enough. |
Add a PSP implementation of the version API discussed in nasa/cFS#200
Add a PSP implementation of the version API discussed in nasa/cFS#200
Here's another interesting system https://github.community/ |
Fix nasa#200, Follow include standard
Follow up to discussion in nasa/osal#824 and in CCB:2021-02-24
Here's my view of the situation so far.
Problem:
Each of these mechanisms should "report" a minimal set of non-contradictory, self-evident information.
Solution approaches for reporting the version state during development
These are not mutually exclusive, I foresee landing in some sort of combination between them.
REVISION
number to99
MISSION_REV
number99
99.99.99.99
or0.0.0.0
Constraints
The text was updated successfully, but these errors were encountered: