etcd crashes due to corrupt data dir (3.2.16, Fedora 28) #10012
Comments
Oh! Wiping data No worries for a dev workstation. Less cool if this were production... normal? |
@vorburger can you tell us about the data that is recovered from snapshot? I believe this panic can be caused by restoring from snapshot that has /v2 data only and no /v3. ref: #9890 |
I'm new to etcd and have not, or at least not intentionally, used v2 at all. But I actually didn't rm but mv, so I would still have that data - would it be useful if I ZIP and shared it for reproduction? I don't remember how I got to that state though, sorry. Unless I've had the etcd RPM package installed for longer than I thought, and it used to be v2, and then during a Fedora upgrade it became v3... is that possible? FYI this is NOT blocking me - I just thought I would file it if it adds value to the project. If you are sure that this is "just" #9890 and no need to repro with a ZIP of my data, then just close it? Although, in an ideal world, I guess it still never should just crash like this? 😄 |
Sure, zip up the data and I can take a look. The v3 binary also exposes the v2 API, so if ETCDCTL_API=3 was not set, etcdctl could have written to the v2 store, perhaps causing the issue.
I agree on the panic, let's see what we can figure out.
|
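To illustrate the point about ETCDCTL_API: in the 3.2.x series, etcdctl talks to the v2 API unless ETCDCTL_API=3 is set, so the same binary can write to either store. A minimal sketch follows; the wrapper name `etcdctl3` is made up for illustration and is not part of etcd.

```shell
# Hypothetical wrapper (not shipped with etcd): pins the v3 API for one call,
# so writes go to the v3 store rather than the v2 store.
etcdctl3() {
    ETCDCTL_API=3 etcdctl "$@"
}

# With etcdctl from 3.2.x (v2 API is the default there):
#   etcdctl set /foo bar     # v2 command, writes to the v2 store
#   etcdctl3 put foo bar     # v3 command, writes to the v3 store
```

Exporting ETCDCTL_API=3 once in the shell profile would have the same effect for every call; the wrapper just makes the choice explicit per command.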
Here is my |
@vorburger just checking in, I will have some time this week to dig into this. |
@vorburger I took a closer look and found data in the existing snapshots pointing to an existing v2.2 member.
If I try to start the cluster with the data dir you provided, I get the same panic noted above. As noted in the upgrade documents (upgrade-checklists), the etcd cluster needs to be at least 2.3 before upgrading to 3.0. So for this to work we would have needed to upgrade from 2.2 to 2.3 and then move towards 3.2.16. The panic is originally pinned to #9480. So while a panic is not what we would want to see, I feel this is fairly well documented. But since we were here, I figured let's try to migrate and see if it would work. So I did a backup and restore of the directory you provided with a 2.3.8 binary. After starting the node, we see the update succeeds.
Next with v3.0.17
Next we add some v3 data to avoid the panic: if I just start this with v3.1.19, we also get a panic because there is no v3 data.
Upgrade works as expected.
Now we are in the clear and can just upgrade the v3 binary; the panic is no longer an issue. I hope you find this information useful. |
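The walk-through above moves through one release series at a time (2.2 data restored with 2.3.8, then 3.0.17, then 3.1.19, then on towards 3.2.16). As a rough sketch of that ordering, the helper below prints the series to step through between two versions. The names `SERIES` and `upgrade_path` are invented for this example, and the list only covers the series mentioned in this thread.

```shell
# Release series in upgrade order, as mentioned in this thread (assumption:
# one hop per series, and v3 data is written before moving from 3.0 to 3.1).
SERIES="2.2 2.3 3.0 3.1 3.2"

# Print the series to step through, excluding the starting one.
upgrade_path() {
    from=$1
    to=$2
    started=0
    path=""
    for s in $SERIES; do
        if [ "$started" -eq 1 ]; then
            path="$path $s"
        fi
        if [ "$s" = "$from" ]; then
            started=1
        fi
        if [ "$s" = "$to" ]; then
            break
        fi
    done
    # Drop the leading space left by the concatenation above.
    echo "${path# }"
}
```

For example, `upgrade_path 2.2 3.2` lists every intermediate series that a 2.2 data dir would need to pass through before a 3.2 binary can use it.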
Cool. I don't mind if you just close this issue with this. What I do find a little bit curious is that I have no memory of ever having explicitly done anything to "point to an existing v2.2 member" (barely understand what that even means TBH). So just wondering if I've had the etcd RPM package installed when it was only v2 before v3, and played with v2, and then during a Fedora upgrade it became v3, and that caused this. That, in theory, in an ideal world, would mean the "upgrade path" is broken. Or maybe it's just something I did explicitly, like 2-3 years ago, and forgot about. Unless you have plans to better handle this crash, just close this issue - but I wanted to at least record this thought here, just in case this crash ever comes up anywhere again. |
@vorburger you bet, and I appreciate the issue; please don't think twice about that. I can leave the issue open.
The data was clearly from v2.2, which is not supported for upgrade to v3.x. Even if etcd had been upgraded per the documentation, the panic would still have happened when moving from 3.0.x to 3.1.x. The reason for this panic is to prevent accidental v3 data loss (e.g. |
Tempting (and thanks for the offer, and faith in me) but I'm already spread too thin (FYI I'm helping out over on jetcd!) and most importantly would first have to do some catching up re. Go... 😈 |
Yes I am watching this and thank you! Interesting I am trying to help out at jetcd but need catching up with Java :) |
cc @jpbetz |
This is resolved right? Okay to close? |
@jpbetz re. "This is resolved right? Okay to close?" note @hexfusion saying "I feel handling the error is probably a reasonable goal and I will try to take a look at it soonish." |
To elaborate: etcd conditionally calls panic, so it is intentional in this case as a safeguard. I think a user seeing this panic will eventually find these GitHub issues, but a message to the effect that the v3 store must have data could be helpful. Again, I need to review this, but I did want to add a little clarity to my statement. Lines 441 to 447 in 34fcaba
|
I will create a new issue to track this change, closing this as the question was answered and I do plan on reviewing further. |
I have installed etcd on Fedora 28 via
sudo dnf install etcd
and am up-to-date. If I do something like
cd /tmp ; etcd
(without parameters), it starts just fine. But if I
systemctl start etcd
it crashes, similarly to below.
I realize this could be some kind of an issue with how systemd launches etcd, like a wrong parameter, and thus possibly more of a mistake in the packaging of the service configuration file than a core etcd bug. But it probably still should not "crash hard" (with coredump); it should print some sort of more useful message about whatever it is not happy about?
Glancing at
/usr/lib/systemd/system/etcd.service
and
/etc/etcd/etcd.conf
I have been able to reproduce it without systemd by just launching it with the same parameters from a CLI like so: