Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[9.x] Get last item of takeUntilTimeout method on a LazyCollection #41275

Closed

Conversation

gdebrauwer
Copy link
Contributor

@gdebrauwer gdebrauwer commented Feb 28, 2022

This PR allows the takeUntilTimeout method of a LazyCollection to accept a second parameter: a closure. This closure will be executed when the timeout has been reach and it will return the first item that it was not able to process anymore.

$lazyCollection->takeUntilTimeout(now()->addMinute(), function ($firstNotTakenItem, $firstNotTakenItemKey) {
    // ...
});

Why?

Sometimes you need to process a lot of data in a job and you can not do it in one job because the job will timeout. Let's say you have thousands of users that need to be processed.

class ProcessUsersJob
{
    public $timeout = 60;

    public function handle()
    {
        User::query()
            ->where(...)
            ->cursor()
            ->each(fn ($user) => ...);
    }
}

This job will timeout if there a lot of users. This is were the current takeUntilTimeout method comes into play. We could just process users until we reach the timeout, but than we still need to know which users we have not processed yet. So we need to keep track of the last processed user in order to prevent duplicate processing. This creates a lot of overhead code which will be the same in every job that processes lots of data.

class ProcessUsersJob
{
    public $timeout = 60;

    public function __construct(
       public ?int $lastProcessedUserId = null
    ) {}

    public function handle()
    {
        $lastUserId = User::query()
            ->where(...)
            ->when($this->lastProcessedUserId, fn ($query) => $query->where('id', '>', $this->lastProcessedUserId)
            ->orderBy('id')
            ->cursor()
            ->takeUntilTimeout(now()->addSeconds(55))
            ->each(function ($user) {
                // Remember the last user that has been processed
                $this->lastProcessedUserId = $user->id

                // ...
            })
            ->last()?->id;

        // If the last user id equals the last processed user id, then that means
        // we have processed all users and we do not need to dispatch a new job
        // to process the remaining unprocessed users.
        if ($lastUserId === $this->lastProcessedUserId) {
            return;
        }

        static::dispatch($this-> lastProcessedUserId);
    }
}

This can be simplified if the takeUntilTimeout returns the first item that it could not process anymore. Now we can just dispatch a new job from inside that closure and we don't need a that boilerplate code anymore.

class ProcessUsersJob
{
    public $timeout = 60;

    public function __construct(
       public ?int $firstUnprocessedUserId = null
    ) {}

    public function handle()
    {
        User::query()
            ->where(...)
            ->when($this->firstUnprocessedUserId, fn ($query) => $query->where('id', '>=', $this->firstUnprocessedUserId)
            ->orderBy('id')
            ->cursor()
            ->takeUntilTimeout(now()->addSeconds(55), function ($user) {
                static::dispatch($user->id);
            })
            ->each(fn ($user) => ...);
    }
}

@gdebrauwer
Copy link
Contributor Author

I'm not really familiar with the types of static analysis. Can anyone help me and tell me what I did wrong? 🤔

@JosephSilber
Copy link
Contributor

This closure will be executed when the timeout has been reach and it will return the first item that it was not able to process anymore.

Hmmm. This actually surfaced a bug in the current implementation of takeUntilTimeout. It currently fetches one extra item, which it shouldn't.

Instead of the implementation here, I think we should instead fix takeUntilTimeout first. Then, we can call the callback (which should really be called $onTimeout) and pass it the iterator, so that it can decide itself whether to pull an additional item.

It's gonna be a little more work in the callback, but I think that's more correct.

@taylorotwell
Copy link
Member

@gdebrauwer do you know how to address @JosephSilber's comments?

@netpok
Copy link
Contributor

netpok commented Mar 4, 2022

This PR changes the public interface so it should target master.

@JosephSilber
Copy link
Contributor

do you know how to address @JosephSilber's comments?

@gdebrauwer if you need help, just say the word and I'm there with assistance.

@nunomaduro
Copy link
Member

This pull request should target master, as it is a breaking change.

@taylorotwell
Copy link
Member

taylorotwell commented Mar 4, 2022

@netpok @nunomaduro is the fear that people are overriding takeUntilTimeout with their own custom implementation? If so, I'm not sure if that would be common at all in practice?

@netpok
Copy link
Contributor

netpok commented Mar 4, 2022

For me it's more like "a major version means there won't be a breaking change" thing, also php already provides "magical ways" to add additional parameters to a function without breaking compatibility:

    public function takeUntilTimeout(DateTimeInterface $timeout)
    {
        $callback = func_num_args() > 1 ? func_get_arg(1) : null;

        // rest of code...
    }

@gdebrauwer
Copy link
Contributor Author

I created a seperate PR that fixes the implementation of the method. Let me know what I should do this PR: target master or 9.x 🙂 I assumed the change would not be a breaking change, because it is an optional parameter and the method is not a method from an interface

@driesvints
Copy link
Member

@gdebrauwer is this PR superseded by #41370?

@gdebrauwer
Copy link
Contributor Author

That PR fixes an issue that was already present in the takeUntilTimeout method. When that PR has been merged, I will need to create a new PR to add the new extra callback parameter to takeUntilTimeout method, but I don't know yet if I should target master or 9.x 🙂

@driesvints
Copy link
Member

An extra argument is added to a method here so it should target master. Thanks

@driesvints driesvints closed this Mar 7, 2022
@netpok
Copy link
Contributor

netpok commented Mar 7, 2022

@gdebrauwer it does not matter if its an optional parameter.

Take a look here: https://3v4l.org/h5gkL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants