-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify check attempt data type to uint32 already used somewhere #656
Conversation
Before
AfterNo crash so far. |
855190f
to
f26ecbf
Compare
ref/IP/48137 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a corresponding schema change as well. Currently, the check_attempt
columns have type tinyint unsigned
(MySQL) or tinyuint
(PostgreSQL).
f26ecbf
to
2a53b1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. However, given that the schema upgrade changes column types in history tables (thus potentially resulting in time-consuming rewrites thereof), I'm not yet sure if we'd want to backport this and what this means for the schema version number change (which I want to consider before a final approval).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on Friday, please adapt this PR so that the change to the state_history
is optional in the migration, so that only users who actually need it have to apply it. Otherwise, everyone, even all the users that just have regular checks with a few attempts would have to take the penalty of rewriting the that whole table for no real benefit.
Unfortunately, this will imply a possible discrepancy between fresh and upgraded installations, so there should be a corresponding hint in the full schema file.
2a53b1e
to
8272f26
Compare
expectedMysqlSchemaVersion = 5 | ||
expectedPostgresSchemaVersion = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done separately and even in a separate PR, probably as part of #707.
schema/mysql/schema.sql
Outdated
@@ -1343,4 +1343,4 @@ CREATE TABLE icingadb_schema ( | |||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=DYNAMIC; | |||
|
|||
INSERT INTO icingadb_schema (version, timestamp) | |||
VALUES (4, CURRENT_TIMESTAMP() * 1000); | |||
VALUES (5, CURRENT_TIMESTAMP() * 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, there are already upgrade files w/o even such changes. But the specific things our 1.1.2 upgrade files already do don't break the daemon and are even idempotent:
- MySQL/MariaDB: Fix
icingadb_schema.timestamp
not being UNX time #700 - PostgreSQL: get_sla_ok_percent to return decimal #710
So they don't need a new schema version, but this PR does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So they don't need a new schema version, but this PR does.
Of course they do! What makes them different in any way from this PR?
As I said before, these specific schema version changes should be committed either in a separate commit or even better in a separate PR. This schema change has nothing to do with the actual fix and as such, should better be part of the #707 PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our upgrading docs should mention this bug and the schema fix and why it is not included in the upgrade scripts. It should also mention a workaround for this bug, which is to increase retry_interval
so that time_to_hard_state = 255 * x * retry_interval
. We should also document the statements to be executed and make clear that they take a very long time and should not be aborted under any circumstances.
Edit: I don't like the additional upgrade scripts. Since there's something to be written in the upgrading docs anyway, the statements should be there.
There and only there? I mean if these exist as a file like all the other ones, one can apply them the same way and in one go like this:
If it's only in the upgrading docs, you have to copy & paste, just another step where mistakes can be made. |
We'd have to have full documentation in both places to make absolutely sure people read and understand it, although I'm pretty sure there'll still be people who just import the upgrade script without reading it, because why not. Of course, we could rename the file to |
1790fa4
to
89f43ac
Compare
I don't think it's possible to make everyone ignoring the upgrading instructions happy here. If it was, there shouldn't be an optional part. There will probably be users that ignore the upgrading docs, upgrade their setup, and then run into the bug a year later.
Sure, the name
But that special handling would be necessary regardless where we put it now. But if the optional part is only mentioned in the upgrading docs and not where all the other schema upgrades are, I think there's a higher chance that we would simply forget that there was this optional upgrade some time in the past. |
That is a good idea. |
89f43ac
to
af61e66
Compare
af61e66
to
0ae093a
Compare
4dff7fc
to
34fb7ed
Compare
doc/04-Upgrading.md
Outdated
### Upgrading the state_history Table (optional) | ||
|
||
Icinga DB crashes if hosts/services reach check attempt 256. The `check_attempt` column of the `state_history` table | ||
is too small to fit values greater 255. ([#655](https://github.com/Icinga/icingadb/issues/655)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that if you read that section top to bottom, with the information you were given up to that sentence, you may ask yourself "well, if it still crashes, what does this update even do?".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your claim sounds better with if you read only the 1st 1/3 of that section. Yes. There's a crash. And the update doesn't fix it.* But we have good reasons and wrote them down in the upgrading docs.
* Actually it does, you "just" have to apply two files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your claim sounds better with if you read only the 1st 1/3 of that section.
Well yes, people tend to read text from top to bottom. And usually it's a feature if understanding a sentence doesn't depend on information only given later in the text. Otherwise, you end up with a text you have to read multiple times until you understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've prepared a suggestion for how this can be written in 1298c30. It's based on the text provided in this PR (hence the Co-authored-by
) but restructured so that it first explains why the schema upgrade is split and then mentions the options including the "may crash" warning if the upgrade is skipped. Apart from that, I've tweaked the wording in a few places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good in general, but for max_check_attempts=256 (which already works btw.) the highest possible check_attempt is 255: https://github.com/Icinga/icinga2/blob/9e31b8b5590c6d67b7dd538b2c884bd377a4e486/lib/icinga/checkable-check.cpp#L233-L237
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Feel free to change it, but I think it's fine to leave it as is with 255 obviously being a safe value for uint8 (with extra room for an off by one somewhere that hopefully doesn't exist 🙈)
34fb7ed
to
204ef91
Compare
204ef91
to
6e3a96c
Compare
A float isn't necessary as in Icinga 2 Checkable#max_check_attempts and check_attempt are ints. But uint8 isn't enough for e.g. 1 check/s to get HARD after 5m (300s > 255).
Co-authored-by: Alexander A. Klimov <[email protected]>
6e3a96c
to
ac85b52
Compare
A float isn't necessary as in Icinga 2 Checkable#max_check_attempts and check_attempt are ints. But uint8 isn't enough for e.g. 1 check/s to get HARD after 5m (300s > 255).