Parallelised execution of static content deploy is broken on 2.3-develop #22563

Closed
hostep opened this issue Apr 30, 2019 · 12 comments
Labels: Component: Deploy, Fixed in 2.2.x, Fixed in 2.3.x, Issue: Clear Description, Issue: Format is valid, Reproduced on 2.3.x

Comments


hostep commented Apr 30, 2019

Preconditions (*)

  1. Tested on Magento 2.3-develop branch (works fine on version 2.3.1 btw)
  2. PHP 7.1.27 with pcntl extension enabled

Steps to reproduce (*)

  1. Checkout the 2.3-develop branch from Magento 2 repo (I've used commit 93a8162)
  2. Run composer install
  3. Run bin/magento setup:install ...
  4. Run bin/magento setup:static-content:deploy -f --jobs=2

Expected result (*)

  1. SCD runs without errors

Actual result (*)

$ bin/magento setup:static-content:deploy -f --jobs=2

Deploy using quick strategy
PHP Fatal error:  Uncaught TypeError: pcntl_waitpid() expects parameter 1 to be integer, boolean given in app/code/Magento/Deploy/Process/Queue.php:342
Stack trace:
#0 app/code/Magento/Deploy/Process/Queue.php(342): pcntl_waitpid(false, NULL, 1)
#1 app/code/Magento/Deploy/Process/Queue.php(234): Magento\Deploy\Process\Queue->isDeployed(Object(Magento\Deploy\Package\Package))
#2 app/code/Magento/Deploy/Process/Queue.php(215): Magento\Deploy\Process\Queue->executePackage(Object(Magento\Deploy\Package\Package), 'frontend/Magent...', Array, false)
#3 app/code/Magento/Deploy/Process/Queue.php(168): Magento\Deploy\Process\Queue->assertAndExecute('frontend/Magent...', Array, Array)
#4 app/code/Magento/Deploy/Strategy/QuickDeploy.php(76): Magento\Deploy\Process\Queue->process()
#5  in app/code/Magento/Deploy/Process/Queue.php on line 342

Fatal error: Uncaught TypeError: pcntl_waitpid() expects parameter 1 to be integer, boolean given in app/code/Magento/Deploy/Process/Queue.php:342
Stack trace:
#0 app/code/Magento/Deploy/Process/Queue.php(342): pcntl_waitpid(false, NULL, 1)
#1 app/code/Magento/Deploy/Process/Queue.php(234): Magento\Deploy\Process\Queue->isDeployed(Object(Magento\Deploy\Package\Package))
#2 app/code/Magento/Deploy/Process/Queue.php(215): Magento\Deploy\Process\Queue->executePackage(Object(Magento\Deploy\Package\Package), 'frontend/Magent...', Array, false)
#3 app/code/Magento/Deploy/Process/Queue.php(168): Magento\Deploy\Process\Queue->assertAndExecute('frontend/Magent...', Array, Array)
#4 app/code/Magento/Deploy/Strategy/QuickDeploy.php(76): Magento\Deploy\Process\Queue->process()
#5  in app/code/Magento/Deploy/Process/Queue.php on line 342

Discussion

Issue was probably introduced in 7421dfb#diff-415a71577d8921cc45d695471db3a36fL340 (not verified yet)


m2-assistant bot commented Apr 30, 2019

Hi @hostep. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • Summary of the issue
  • Information on your environment
  • Steps to reproduce
  • Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento-engcom-team give me 2.3-develop instance - upcoming 2.3.x release

For more details, please, review the Magento Contributor Assistant documentation.

@hostep do you confirm that you were able to reproduce the issue on a vanilla Magento instance following the steps to reproduce?

  • yes
  • no

@magento-engcom-team magento-engcom-team added the Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed label Apr 30, 2019

hostep commented Apr 30, 2019

Reverting that getPid method to the original doesn't solve the issue:

return isset($this->processIds[$package->getPath()])
            ? $this->processIds[$package->getPath()]
            : null;

But it gives a slightly different error message:

$ bin/magento setup:static-content:deploy -f --jobs=2

Deploy using quick strategy
PHP Fatal error:  Uncaught TypeError: pcntl_waitpid() expects parameter 1 to be integer, null given in app/code/Magento/Deploy/Process/Queue.php:342
Stack trace:
#0 app/code/Magento/Deploy/Process/Queue.php(342): pcntl_waitpid(NULL, NULL, 1)
#1 app/code/Magento/Deploy/Process/Queue.php(234): Magento\Deploy\Process\Queue->isDeployed(Object(Magento\Deploy\Package\Package))
#2 app/code/Magento/Deploy/Process/Queue.php(215): Magento\Deploy\Process\Queue->executePackage(Object(Magento\Deploy\Package\Package), 'frontend/Magent...', Array, false)
#3 app/code/Magento/Deploy/Process/Queue.php(168): Magento\Deploy\Process\Queue->assertAndExecute('frontend/Magent...', Array, Array)
#4 app/code/Magento/Deploy/Strategy/QuickDeploy.php(76): Magento\Deploy\Process\Queue->process()
#5 /Use in app/code/Magento/Deploy/Process/Queue.php on line 342

Fatal error: Uncaught TypeError: pcntl_waitpid() expects parameter 1 to be integer, null given in app/code/Magento/Deploy/Process/Queue.php:342
Stack trace:
#0 app/code/Magento/Deploy/Process/Queue.php(342): pcntl_waitpid(NULL, NULL, 1)
#1 app/code/Magento/Deploy/Process/Queue.php(234): Magento\Deploy\Process\Queue->isDeployed(Object(Magento\Deploy\Package\Package))
#2 app/code/Magento/Deploy/Process/Queue.php(215): Magento\Deploy\Process\Queue->executePackage(Object(Magento\Deploy\Package\Package), 'frontend/Magent...', Array, false)
#3 app/code/Magento/Deploy/Process/Queue.php(168): Magento\Deploy\Process\Queue->assertAndExecute('frontend/Magent...', Array, Array)
#4 app/code/Magento/Deploy/Strategy/QuickDeploy.php(76): Magento\Deploy\Process\Queue->process()
#5 /Use in app/code/Magento/Deploy/Process/Queue.php on line 342

Notice the difference between "expects parameter 1 to be integer, boolean given" and "expects parameter 1 to be integer, null given".
So there is a more complex problem going on here.
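
For readers wondering where the boolean comes from, here is a minimal sketch. It assumes the 2.3-develop lookup is the `isset(...) ?? null` expression that appears as the removed line in the diff later in this thread. The array and the `area/Vendor/theme/locale` key are hypothetical stand-ins for the queue's process ID map, and both function names are made up for illustration.

```php
<?php
// Hypothetical stand-in for the queue's map of package path => child PID.
$processIds = ['area/Vendor/theme/locale' => 12345];

// Lookup as it appears on 2.3-develop: isset() already yields a bool,
// so the ?? operator never falls through to null.
function getPidBuggy(array $processIds, string $path)
{
    return isset($processIds[$path]) ?? null;
}

// Original lookup: the stored PID, or null when the path is unknown.
function getPidOriginal(array $processIds, string $path)
{
    return isset($processIds[$path]) ? $processIds[$path] : null;
}

var_dump(getPidBuggy($processIds, 'area/Vendor/theme/locale'));    // bool(true)
var_dump(getPidBuggy($processIds, 'missing'));                     // bool(false)
var_dump(getPidOriginal($processIds, 'area/Vendor/theme/locale')); // int(12345)
var_dump(getPidOriginal($processIds, 'missing'));                  // NULL
```

Neither variant ever hands `pcntl_waitpid()` an integer for an unknown package, which matches the two different TypeError messages above.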


hostep commented Apr 30, 2019

Git bisecting reveals the following commit to be the cause: 147d11d

And indeed, removing the line declare(strict_types=1); makes it "work" again on my machine.

But this doesn't feel right. I think there is still a deeper problem here: making PHP execution stricter just reveals the problem more clearly. I believe there is still an actual issue, namely passing null to pcntl_waitpid. This is probably why the SCD command never seems to end when I run it with pcntl enabled on my local machine; it keeps running forever.
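
A small sketch of why the `declare(strict_types=1);` line changes the outcome. `str_repeat()` is only a stand-in for any built-in function with an int parameter, so the example runs without the pcntl extension, and `describeNullAsInt()` is a hypothetical helper name. As far as I know, without the declare line PHP 7 silently coerces the null to 0 for built-in functions (PHP 8.1 and later emit a deprecation notice), so please treat that part as an assumption to verify on your own version.

```php
<?php
declare(strict_types=1);

// Reports how a built-in function reacts to null for an int parameter
// when the calling file uses strict types.
function describeNullAsInt(): string
{
    try {
        // Stand-in for pcntl_waitpid(null, ...): any built-in int parameter behaves alike.
        str_repeat('x', null);
        return 'accepted';
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

echo describeNullAsInt(), PHP_EOL; // TypeError
```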

Will do some more digging.


hostep commented May 1, 2019

I can get it more or less working again with the following code changes:

diff --git a/app/code/Magento/Deploy/Process/Queue.php b/app/code/Magento/Deploy/Process/Queue.php
index fd7aad44e0a..383b67b5d3c 100644
--- a/app/code/Magento/Deploy/Process/Queue.php
+++ b/app/code/Magento/Deploy/Process/Queue.php
@@ -340,7 +340,7 @@ class Queue
             if ($package->getState() === null) {
                 // phpcs:ignore Magento2.Functions.DiscouragedFunction
                 $pid = pcntl_waitpid($this->getPid($package), $status, WNOHANG);
-                if ($pid === $this->getPid($package)) {
+                if ($this->getPid($package) !== 0 && $pid === $this->getPid($package)) {
                     $package->setState(Package::STATE_COMPLETED);

                     unset($this->inProgress[$package->getPath()]);
@@ -354,14 +354,14 @@ class Queue
     }

     /**
-     * Returns process ID or null if not found.
+     * Returns process ID or 0 if not found.
      *
      * @param Package $package
-     * @return int|null
+     * @return int
      */
     private function getPid(Package $package)
     {
-        return isset($this->processIds[$package->getPath()]) ?? null;
+        return $this->processIds[$package->getPath()] ?? 0;
     }

     /**

But the command still seems to run endlessly on my local machine. Never quits and doesn't seem to execute anything.
This also happens on Magento 2.3.1, so I'm thinking this is a local environment problem I'm having (macOS 10.14.4, and PHP installed with MacPorts).

According to a discussion on Slack yesterday, there was at least one person who got it actually executing fine using Docker, but I don't have that setup here right now. So maybe somebody who is able to run this parallelised SCD can test on 2.3.1 and then also on 2.3-develop after applying the above changes and see if that restores the functionality again?

Thanks!

@Nazar65 Nazar65 added Issue: Clear Description Gate 2 Passed. Manual verification of the issue description passed Component: Deploy Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release labels May 1, 2019
@mahesh-rajawat mahesh-rajawat added the Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed label May 1, 2019
@ghost ghost removed the Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed label May 1, 2019
@magento-engcom-team
Contributor

✅ Confirmed by @maheshWebkul721
Thank you for verifying the issue! 👍 Your confirmation will help us to acknowledge and process this report.

@mahesh-rajawat mahesh-rajawat self-assigned this May 1, 2019

m2-assistant bot commented May 1, 2019

Hi @maheshWebkul721. Thank you for working on this issue.
To make sure that the issue has enough information and is ready for development, please read and check the following instructions: 👇

  • 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).

    Details: If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.

  • 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • 4. Verify that the issue is reproducible on 2.3-develop branch

    Details:
    - Add the comment @magento-engcom-team give me 2.3-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.3-develop branch, please, add the label Reproduced on 2.3.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

  • 5. Verify that the issue is reproducible on 2.2-develop branch.
    Details:
    - Add the comment @magento-engcom-team give me 2.2-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.2-develop branch, please add the label Reproduced on 2.2.x

@davidalger davidalger self-assigned this May 2, 2019

@davidalger
Member

@hostep I wish I'd seen this issue before I spent hours figuring out why I was getting the infamous [RuntimeException] Error while waiting for package deployed error on about 50% of the builds where parallelization was used on a Concourse CI setup I'm testing. The patch you show above touches exactly the spot where the underlying problem behind the SCD hang lies. There is a problem with returning 0 from getPid, though: given a PID of 0, pcntl_waitpid will reap any child process, which doesn't work because that method is supposed to remove the package from inProgress only when it reaps that package's child process.

I have a patch which I've been working on locally that fixes the random failures resulting in #21852 on 2.2.x, and which adds better error handling around both points where pcntl_waitpid is used. In applying this to the 2.3 branch, I'll fix the boolean type on the getPid function as well.

In my Concourse CI pipeline I'm testing, about 50% of builds were failing with parallel jobs with the RuntimeException noted in #21852, which is what set me down the path of fixing this. I'm working on building a fast CI/CD pipeline using build artifacts for a project with over a dozen themes (takes over 10 minutes for SCD without multiple threads being used).

Update coming soon.
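
To illustrate the kind of error handling described above, here is a rough sketch. It is not the code from the actual patch; `hasChildExited()` is a hypothetical helper, and the example assumes the pcntl extension is loaded and the script runs on the CLI. The idea is to never call `pcntl_waitpid()` without a real child PID, since 0 or -1 would match other children, and to surface a -1 result instead of ignoring it.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true once the child with $pid has exited,
// false while it is still running or when no child was recorded.
function hasChildExited(?int $pid): bool
{
    if ($pid === null || $pid <= 0) {
        // No recorded child: skip pcntl_waitpid(), because 0 or -1 would match other children.
        return false;
    }

    // WNOHANG makes the call return 0 immediately if the child has not exited yet.
    $result = pcntl_waitpid($pid, $status, WNOHANG);
    if ($result === -1) {
        throw new RuntimeException(
            'pcntl_waitpid() failed: ' . pcntl_strerror(pcntl_get_last_error())
        );
    }

    return $result === $pid;
}

// Usage: fork one child that exits right away, then poll until it is reaped.
$pid = pcntl_fork();
if ($pid === -1) {
    throw new RuntimeException('pcntl_fork() failed');
}
if ($pid === 0) {
    exit(0); // child process
}
while (!hasChildExited($pid)) {
    usleep(10000); // poll every 10 ms
}
echo "child {$pid} reaped", PHP_EOL;
```

Polling a specific PID keeps the bookkeeping per package intact: only the package whose child was actually reaped gets marked as completed.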

davidalger added a commit to davidalger/magento2 that referenced this issue May 2, 2019
Reverts inadvertent change in 7421dfb causing the improper type to be returned as noted on magento#22563
davidalger added a commit to davidalger/magento2 that referenced this issue May 2, 2019
davidalger added a commit to davidalger/magento2 that referenced this issue May 2, 2019
….3-develop and issue where SCD hangs ending in RuntimeException on 2.2-develop

Resolves magento#22563 and magento#21852

hostep commented May 2, 2019

@davidalger: nice! So based on #21852 it looks like this problem has been happening since at least 2.2.7? So then it probably makes sense that it's not working properly over here on my local environment using Magento 2.3.1.

The reason I changed it from null to 0 was to restore the old behavior. PHP probably casted the null to 0 before strict type checking was enabled. But I wasn't aware that the old behavior didn't actually work properly either. I also have zero experience with the pcntl functions, so this is all a bit new to me :)

Looking forward to a fix. I've never really played with the --jobs flag before, but we could most likely use it on some projects with a high number of locales/themes.

@davidalger
Member

@hostep

"PHP probably casted the null to 0 before strict type checking was enabled."

I thought that as well, until I added proper error checking around that. If you pass null to pcntl_waitpid it actually returns -1 and the pcntl_errno function will indicate a PCNTL_ECHILD error ("No child process"):
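
A minimal sketch of that check, assuming the pcntl extension is loaded and the process has not forked any children. It passes 0 explicitly rather than null, on the assumption that null is coerced to 0 when strict types are off; `pollWithoutChildren()` is a hypothetical helper name, and `pcntl_get_last_error()` is the function that `pcntl_errno()` aliases.

```php
<?php
// Polls for any child without blocking and reports what pcntl says about it.
function pollWithoutChildren(): array
{
    $result = pcntl_waitpid(0, $status, WNOHANG);
    $errno = pcntl_get_last_error();

    return [$result, $errno, pcntl_strerror($errno)];
}

[$result, $errno, $message] = pollWithoutChildren();
var_dump($result);                 // int(-1) when there is no child to wait for
var_dump($errno === PCNTL_ECHILD); // bool(true) in that case
echo $message, PHP_EOL;            // system error text for ECHILD
```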

"this problem has been happening since at least 2.2.7"

Not sure when it was introduced on 2.2.x, but it has been there since at least 2.2.6, because that's the version on which I hit the RuntimeException while building my pipeline. After my fix I ran 60+ builds overnight (one every 5 minutes) without a single failure. My guess is that it's been around for a while.

@magento-engcom-team
Contributor

Hi @hostep. Thank you for your report.
The issue has been fixed in #22607 by @davidalger in 2.3-develop branch
Related commit(s):

The fix will be available with the upcoming 2.3.3 release.


sidolov commented May 24, 2019

Hi @hostep. Thank you for your report.
The issue has been fixed in #22610 by @davidalger in 2.2-develop branch
Related commit(s):

The fix will be available with the upcoming 2.2.10 release.

@sidolov sidolov added the Fixed in 2.2.x The issue has been fixed in 2.2 release line label May 24, 2019