percona56-cluster-1.2 - fails to transfer state snapshot with xtrabackup (2015Q4) #357

Closed
pannon opened this issue May 1, 2016 · 13 comments

pannon commented May 1, 2016

In a nutshell, cluster setup fails at the SST (state snapshot transfer) step.

The following appears in /var/log/mysql/error.log:

WSREP_SST: [INFO] Stale sst_in_progress file: /var/mysql//sst_in_progress (20160502 09:47:12.438)
usage:  pfiles [-F] { pid | core } ...
  (report open files of each process)
  -F: force grabbing of the target process
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20160502 09:47:12.591)
2016-05-02 09:47:12 69144 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.x.x' --datadir '/var/mysql/' --defaults-file '/opt/local/etc/my.cnf' --defaults-group-suffix '' --parent '69144'  '' 
        Read: '69429:   socat -u TCP-LISTEN:4444,reuseaddr stdio'
2016-05-02 09:47:14 69144 [Note] WSREP: (42459ef3, 'tcp://0.0.0.0:4567') turning message relay requesting off

The same steps (and SST) work with percona55-cluster-1.2.

mamash commented May 2, 2016

Sounds like a regression, I'll investigate.

mamash commented May 5, 2016

Peter, can you help reproduce the situation where this happens, please?

pannon commented May 5, 2016

Hi Filip, yes. I did a fresh install and followed the Joyent Percona guide.

Same error.

What information do you need at this stage?

@dcrudgington

For the "WSREP: Failed to read" error, it looks like it may just be a PATH problem or a command that can't be found, which may be related to the pfiles call. You could run "wsrep_sst_xtrabackup" manually to see what it can't find.

pannon commented May 11, 2016

Will try to dig a bit deeper and report back.

pannon commented Jul 23, 2016

Sorry for the long delay (I've been a bit busy lately). I finally managed to track this down. The issue is in the /opt/local/bin/wsrep_sst_xtrabackup-v2 file.

There are basically two typos.

The first one is with pfiles, around line 513.
It should be: (pfiles $(pgrep 'socat|nc') || true) | grep "AF_INET.* ${PORT}$" >/dev/null && break
but the latest version is missing the parentheses: pfiles $(pgrep 'socat|nc') || true | grep "AF_INET.* ${PORT}$" >/dev/null && break
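
To illustrate why the parentheses matter: in POSIX shell, `|` binds tighter than `||`, so the ungrouped line parses as `pfiles ... || { true | grep ...; }`. Below is a minimal sketch, assuming a hypothetical `fake_pfiles` function as a stand-in for the real `pfiles $(pgrep ...)` call (its output line is made up for illustration and is not real pfiles output):

```shell
#!/bin/sh
# Hypothetical stand-in for `pfiles $(pgrep 'socat|nc')`; not part of
# wsrep_sst_xtrabackup-v2. It succeeds and prints one line ending in the port.
fake_pfiles() {
    echo "sockname: AF_INET 0.0.0.0  port: 4444"
}
PORT=4444

# Ungrouped: parsed as `fake_pfiles || { true | grep ...; }`.
# fake_pfiles succeeds, so the right-hand side never runs and the
# stand-in's output reaches stdout without passing through grep.
ungrouped=$(fake_pfiles || true | grep -c "AF_INET.* ${PORT}$")

# Grouped: the output of the whole `(... || true)` list is piped into grep.
grouped=$( (fake_pfiles || true) | grep -c "AF_INET.* ${PORT}$")

echo "ungrouped: $ungrouped"
echo "grouped:   $grouped"
```

In the ungrouped form the command's output is never filtered and goes straight to stdout, which looks consistent with the Read: '69429:   socat ...' line in the error log above, although that connection is an inference rather than something confirmed in this thread.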

The second is with the use of pgid (multiple occurrences).

For example, around line 406 there is pgid=$(ps -o pgid= $$ | grep -o '[0-9]*'); this should definitely be pgid=$(ps -o pgid=$$ | grep -o '[0-9]*') (without the space in pgid= $$).

But even then it will throw an error; I don't have a proper fix for this yet.

SST is working now and I have a working Percona56 cluster on SmartOS.

pannon commented Jul 24, 2016

Attaching my working copy of /opt/local/bin/wsrep_sst_xtrabackup-v2.

wsrep_sst_xtrabackup-v2.zip

mamash referenced this issue in TritonDataCenter/pkgsrc-joyent Aug 26, 2016

mamash commented Aug 26, 2016

Peter, I implemented the first correction today. The ps call should actually be ps -o pgid= -p $$. Omitting the -p argument works on some systems, but not on SunOS. The 2015Q4 cluster packages were just updated to 5.6.0, including this fix.
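
As a minimal sketch of the corrected lookup (assuming a `ps` that accepts the POSIX `-o` and `-p` options, and a `grep` with `-o` as already used by the script):

```shell
#!/bin/sh
# Look up the process group ID of the current shell.
# `-o pgid=` selects the pgid column with an empty header;
# `-p $$` names the target process explicitly instead of relying on
# a bare PID operand, which not every ps implementation accepts.
pgid=$(ps -o pgid= -p $$ | grep -o '[0-9]*')
echo "pgid of shell $$: $pgid"
```

The `grep -o '[0-9]*'` step only strips the padding that ps puts around the number.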

pannon commented Aug 29, 2016

Thanks Filip, would you like me to test anything at this stage? I need to spin up a new cluster anyway...

mamash commented Aug 29, 2016

Feel free to test away. If those two changes were all you had to make, things should work out of the box now.

jperkin pushed a commit that referenced this issue Oct 18, 2016

pannon commented Nov 21, 2016

Verified and tested, thanks again.

pannon closed this as completed Nov 21, 2016

alcir commented Dec 1, 2016

Every once in a while I try to install Percona cluster on SmartOS, and every time I am unable to get it to work.
This time I get:

2016-12-01 17:01:16 84268 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.96.11.215:4444/xtrabackup_sst//1' --socket '/tmp/mysql.sock' --datadir '/var/mysql/' --defaults-file '/opt/local/etc/my.cnf' --defaults-group-suffix '' '' --gtid '48aa3e0c-b7d8-11e6-be4c-d385ab82e6e9:5': 22 (Invalid argument)

SmartOS/2016Q3/x86_64

alcir commented Dec 1, 2016

Wow. Increasing the memory assigned to the zone solved the issue.

jperkin pushed a commit that referenced this issue Jan 16, 2017
jperkin pushed a commit that referenced this issue Feb 1, 2017
jperkin pushed a commit that referenced this issue Jun 13, 2017