Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process files in parallel #32

Merged
merged 1 commit into from
Jan 28, 2018

Conversation

theofidry
Copy link
Member

@theofidry theofidry commented Jan 21, 2018

The process of adding files with Box works like the following:

  • Collect all the files to add with Configuration
  • Add the files to `Box
    • Find the path that will be used to add the file to the PHAR
    • Replace the placeholder values
    • Runs the compactors on the file content
    • Add the file to the PHAR

There is two expansive processes here:

  • collecting the files
  • processing the file contents

The first one is out of scope of this PR, but this PR address the second point which is reducing the processing time by parallelising this work.


Note for myself: extract from this PR building files from an iterator/array of files which will drastically simplify this PR.

@theofidry
Copy link
Member Author

Hello @kelunik. I'm trying to integrate AMP here for doing some parallel processing but I fail to make it work. The error I get is:

First reason
#0 /path/to/box/src/Compactor/FileExtensionCompactor.php(47): in_array('php', NULL, true)
#1 /path/to/box/src/Compactor/BaseCompactor.php(29): KevinGH\Box\Compactor\FileExtensionCompactor->supports('vendor/amphp/am...')
#2 closure://function (string $filePath) use ($retrieveBasePath, $mapFile, $placeholders, $compactors) {
            \Assert\Assertion::file($filePath);
            \Assert\Assertion::readable($filePath);

            // Resolve paths
            $relativePath = $retrieveBasePath($filePath);
            $mappedPath = $mapFile($relativePath);

            if (null !== $mappedPath) {
                $relativePath = $mappedPath;
            }

            if (null === $relativePath) {
                $relativePath = $filePath;
            }

            // Process content
            $contents = file_get_contents($filePath);

            $contents = str_replace(
                array_keys($placeholders),
                array_values($placeholders),
                $contents
            );

            $contents = array_reduce(
                $compactors,
                function (string $contents, \KevinGH\Box\Compactor $compactor) use ($relativePath): string {
                    return $compactor->compact($relativePath, $contents);
                },
                $contents
            );

            return [$relativePath, $contents];
        }(30): KevinGH\Box\Compactor\BaseCompactor->compact('vendor/amphp/am...', '<?php\n\nnamespac...')
#3 [internal function]: Opis\Closure\SerializableClosure->{closure}('<?php\n\nnamespac...', Object(KevinGH\Box\Compactor\Json))
#4 closure://function (string $filePath) use ($retrieveBasePath, $mapFile, $placeholders, $compactors) {
            \Assert\Assertion::file($filePath);
            \Assert\Assertion::readable($filePath);

            // Resolve paths
            $relativePath = $retrieveBasePath($filePath);
            $mappedPath = $mapFile($relativePath);

            if (null !== $mappedPath) {
                $relativePath = $mappedPath;
            }

            if (null === $relativePath) {
                $relativePath = $filePath;
            }

            // Process content
            $contents = file_get_contents($filePath);

            $contents = str_replace(
                array_keys($placeholders),
                array_values($placeholders),
                $contents
            );

            $contents = array_reduce(
                $compactors,
                function (string $contents, \KevinGH\Box\Compactor $compactor) use ($relativePath): string {
                    return $compactor->compact($relativePath, $contents);
                },
                $contents
            );

            return [$relativePath, $contents];
        }(32): array_reduce(Array, Object(Closure), '<?php\n\nnamespac...')
#5 [internal function]: Opis\Closure\SerializableClosure->{closure}('/Users/tfidry/P...')
#6 /path/to/box/vendor/opis/closure/src/SerializableClosure.php(110): call_user_func_array(Object(Closure), Array)
#7 /path/to/box/vendor/amphp/parallel-functions/src/Internal/ParallelTask.php(28): Opis\Closure\SerializableClosure->__invoke('/Users/tfidry/P...')
#8 /path/to/box/vendor/amphp/amp/lib/functions.php(55): Amp\ParallelFunctions\Internal\ParallelTask->run(Object(Amp\Parallel\Worker\BasicEnvironment))
#9 /path/to/box/vendor/amphp/parallel/lib/Worker/TaskRunner.php(41): Amp\call(Array, Object(Amp\Parallel\Worker\BasicEnvironment))
#10 [internal function]: Amp\Parallel\Worker\TaskRunner->execute()
#11 /path/to/box/vendor/amphp/amp/lib/Coroutine.php(74): Generator->send(Object(Amp\Parallel\Worker\Internal\Job))
#12 /path/to/box/vendor/amphp/amp/lib/Internal/Placeholder.php(127): Amp\Coroutine->Amp\{closure}(NULL, Object(Amp\Parallel\Worker\Internal\Job))
#13 /path/to/box/vendor/amphp/amp/lib/Coroutine.php(79): Amp\Coroutine->resolve(Object(Amp\Parallel\Worker\Internal\Job))
#14 /path/to/box/vendor/amphp/amp/lib/Internal/Placeholder.php(127): Amp\Coroutine->Amp\{closure}(NULL, '\x00\x10\x0E\x00\x00O:32:"Amp\\...')
#15 /path/to/box/vendor/amphp/amp/lib/Deferred.php(36): class@anonymous->resolve('\x00\x10\x0E\x00\x00O:32:"Amp\\...')
#16 /path/to/box/vendor/amphp/byte-stream/lib/ResourceInputStream.php(72): Amp\Deferred->resolve('\x00\x10\x0E\x00\x00O:32:"Amp\\...')
#17 /path/to/box/vendor/amphp/amp/lib/Loop/NativeDriver.php(172): Amp\ByteStream\ResourceInputStream::Amp\ByteStream\{closure}('b', Resource id #1, NULL)
#18 /path/to/box/vendor/amphp/amp/lib/Loop/NativeDriver.php(68): Amp\Loop\NativeDriver->selectStreams(Array, Array, -0.001)
#19 /path/to/box/vendor/amphp/amp/lib/Loop/Driver.php(130): Amp\Loop\NativeDriver->dispatch(true)
#20 /path/to/box/vendor/amphp/amp/lib/Loop/Driver.php(70): Amp\Loop\Driver->tick()
#21 /path/to/box/vendor/amphp/amp/lib/Loop.php(76): Amp\Loop\Driver->run()
#22 /path/to/box/vendor/amphp/parallel/lib/Context/Internal/process-runner.php(85): Amp\Loop::run(Object(Closure))
#23 {main}

Which points to this line saying $this->extensions is null.

However this seems incorrect and hints on a serialization issue. But when I try to serialize/unserialize it manually with the regular PHP functions everything looks fine. Is there any way to debug this? I can't pinpoint where this is happening exactly.

Any idea?

@kelunik
Copy link

kelunik commented Jan 21, 2018

@theofidry What do I have to run to reproduce the error?

@theofidry
Copy link
Member Author

php -d phar.readonly=0 bin/box build

@kelunik
Copy link

kelunik commented Jan 21, 2018

I can reproduce that with php -d phar.readonly=0 bin/box build -vv, but where is the exception caught?

Nevermind, will just catch it in src/Box.php.

@theofidry
Copy link
Member Author

The exception is not caught: it's thrown at this line and then output in the console. I used break points to look at the details of the exception

src/Box.php Outdated
function (\SplFileInfo $fileInfo): string {
return $fileInfo->getPathname();
},
iterator_to_array($files)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$files may be an array here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, it's not in practice but I should account for that still 👍

@kelunik
Copy link

kelunik commented Jan 21, 2018

Seems like there's some bug in the closure serialization of https://github.com/opis/closure.

It works if you map $file in $files like that instead of using use (...) and replace the argument with $args and then add list($file, $retrieveBasePath, $mapFile, $placeholders, $compactors) = $args:

$args = [];                                          
foreach ($files as $file) {            
    $args[] = [$file, $retrieveBasePath, $mapFile, $placeholders, $compactors];
}

Now I get Fatal error: Uncaught Amp\ByteStream\StreamException: Failed to write to socket in /app/box/vendor/amphp/byte-stream/lib/ResourceOutputStream.php:91.

@kelunik
Copy link

kelunik commented Jan 21, 2018

Completes successfully now. I removed a var_dump($compactors) inside the closure executed in the workers.

@kelunik
Copy link

kelunik commented Jan 21, 2018

diff --git a/src/Box.php b/src/Box.php
index dc69c6c..7d1f73b 100644
--- a/src/Box.php
+++ b/src/Box.php
@@ -157,25 +157,16 @@ final class Box
         $compactors = $this->compactors;
         $placeholders = $this->placeholders;
 
-        //Debug: the values passed to the $processFile closure seems to be working fine
-        $x0 = \serialize($retrieveBasePath);
-        $x1 = \serialize($mapFile);
-        $x3 = \serialize($placeholders);
-        $x4 = \serialize($compactors);
-
-        $y0 = \unserialize($x0, []);
-        $y1 = \unserialize($x1, []);
-        $y3 = \unserialize($x3, []);
-        $y4 = \unserialize($x4, []);
-
         $files = array_map(
-            function (\SplFileInfo $fileInfo): string {
-                return $fileInfo->getPathname();
+            function (\SplFileInfo $fileInfo) use ($retrieveBasePath, $mapFile, $compactors, $
placeholders): array {
+                return [$fileInfo->getPathname(), $retrieveBasePath, $mapFile, $compactors, $p
laceholders];
             },
             iterator_to_array($files)
         );
 
-        $processFile = function (string $filePath) use ($retrieveBasePath, $mapFile, $placehol
ders, $compactors) {
+        $processFile = function ($args) {
+list($filePath, $retrieveBasePath, $mapFile, $compactors, $placeholders) = $args;
+
             Assertion::file($filePath);
             Assertion::readable($filePath);

@theofidry
Copy link
Member Author

theofidry commented Jan 21, 2018

Indeed, that works for building the PHAR from the source but it still fails in the PHAR:

php -d phar.readonly=0 bin/box build   # works; output bin/box.phar
mv bin/box.phar .
php -d phar.readonly=0 box.phar build  # fails

The error I get is:

Sending on the channel failed. Did the context die?

Which do be honest I have no idea what this means. Do you also have it on your end?

If you want a more automated way to check this you can simply run make e2e

@kelunik
Copy link

kelunik commented Jan 21, 2018

I have created opis/closure#18, without the mapByReference() call it works fine with use (...).

amphp/parallel needs a fix to work inside PHARs. I noticed that some weeks ago, but didn't fix it, yet. It needs to run worker-process.php, which doesn't work if that file is inside the PHAR. It's probably best to just copy it into the system's temporary directory and use that file when inside a PHAR, similar to this. Would you like to provide a PR?

@kelunik
Copy link

kelunik commented Jan 21, 2018

opis/closure#19 fixes it.

@theofidry
Copy link
Member Author

Thanks for the input @kelunik that's been of great help!

Regarding the usage in amphp/parallel inside a PHAR do you have an idea as to why it's not working or should I understand that inside a PHAR, it's best to copy worker-process.php somewhere else and use it from there?

To be used outside a PHAR I guess I can always have a custom stub which would require the copy instead of autoloading the file from inside the PHAR

theofidry added a commit that referenced this pull request Jan 22, 2018
Paves the way for #32.

As it stands, #32 requires some functions from `Configuration` which would require `Configuration` to be serializable. This is a lot of work so it makes sense to rather encapsulate some behaviour in utility classes and leverage them instead. As a result only those utility classes would require to be serializable which is trivial.

This also prepares a change for the `Box` class which IMO should handle the file mapping as it already handles the placeholders replacement and compactor processing.
@kelunik
Copy link

kelunik commented Jan 22, 2018

@theofidry My guess is this, which will be inside the PHAR and not directly runnable, or does PHP support running phar://... files?

    private static $pharScriptPath;
                                       
    public function __construct(string $envClassName = BasicEnvironment::class, array $env = []
        if (\strpos(self::SCRIPT_PATH, "phar://") === 0) {                                     
            if (self::$pharScriptPath) {              
                $script = self::$pharScriptPath;
            } else {
                $scriptContent = \file_get_contents(self::SCRIPT_PATH);
                self::$pharScriptPath = $script = \tempnam(\sys_get_temp_dir(), "amp-worker-process-");
                \file_put_contents(self::$pharScriptPath, $scriptContent);                         

                \register_shutdown_function(static function() {
                    @\unlink(self::$pharScriptPath);          
                });
            }
        } else {
            $script = self::SCRIPT_PATH;
        }

        $script = [$script, $envClassName];

        /* ... */

^ This is what I tried in the linked class, but it didn't work.

@theofidry
Copy link
Member Author

Hm that's weird I though PHARs would intercept those kind of things; I'll check it out I'd prefer to solve the real issue rather than trying to find a workaround

@kelunik
Copy link

kelunik commented Jan 22, 2018

There's something else strange:

In AbstractWorker.php line 78:
                           
  The worker was shutdown  

But that should actually be line 75.

@theofidry
Copy link
Member Author

Hm depends, there's the Php compactor enabled (see "compactors" in the box.json.dist) which will alter the content of the files. Might be better to disable it for debugging purposes as it's not completely unwarranted that this transformation is not causing any bug

@kelunik
Copy link

kelunik commented Jan 22, 2018

I've pushed a fix (amphp/parallel@d16da46), please verify it before I tag a new version.

@kelunik
Copy link

kelunik commented Jan 24, 2018

We've released v0.2.2 of amphp/parallel.

@theofidry
Copy link
Member Author

theofidry commented Jan 24, 2018 via email

@theofidry theofidry force-pushed the feature/parallel-processing branch from 8c3e433 to e79bb42 Compare January 28, 2018 14:33
@theofidry
Copy link
Member Author

Update.

So I've been doing a lot of work (see the PR description) to pave the way for this PR. Turns out that besides fixing a lot of bugs, harden tests, this also provided a speed increase of 30% which is due to optimising the collection of the files. Instead of having different config entries and adding the files one by one per config entry, they have been aggregated which allows me to ensure a file is not added twice. As files needs to be processed, this turned out to be a bigger perf gain than I expected.

That said the above is a bonus, another slow part (which will become even slower with #31) is the file processing. So as you can see by the diff, the integration is actually quite trivial now and the perf gain is of 25% for now.

Thanks you a lot @kelunik and @trowski, it's really cool to see it working well.

Also @kelunik, apparently the issue with the usage in the PHAR has been taken care of? I though I would have to look at it today but looks to be working fine now

@theofidry theofidry changed the title PoC: Process files in parallel Process files in parallel Jan 28, 2018
@theofidry theofidry merged commit ca7377c into box-project:master Jan 28, 2018
@theofidry theofidry deleted the feature/parallel-processing branch January 28, 2018 15:09
@kelunik
Copy link

kelunik commented Jan 28, 2018

@theofidry Yes, the v0.2.2 release fully fixed the usage inside PHARs.

Great to hear about the gain!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants