Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saving documents with embedded collections and default storage strategy leads to inconsistent state between in memory objects and database #2621

Open
xammmue opened this issue Apr 12, 2024 · 1 comment

Comments

@xammmue
Copy link

xammmue commented Apr 12, 2024

Bug Report

Q A
BC Break no
Version 2.7.0

Summary

I created a document structure that has two levels of embeddable document collections. Changing those collections by removing objects or setting them again leads to unexpected states being saved to the database. The state inside the database doesn't match the state of the object in memory anymore.
Changing the storage strategy to 'setArray' instead of using the default one resolves the issue

Current behavior

I setup the following example with the objects Bookshelf, Book and Pages.
A bookshelf has a collection of embedded books and a book has a collection of embedded pages.
I noticed unexpected behavior the following cases:

Removing one book from the collection of books and removing a page of another book.
In this case, the book gets removed as expected but the page that should have been removed stays.

Removing one book from the collection of books and resetting the pages of one of the remaining books.
In this case, the book gets removed as expected but the pages of an unrelated book get removed and the pages which were reset are duplicated.

How to reproduce

https://github.com/xammmue/doctrine_odm_save_storage_strategy_issue_poc

I created this repository which includes the classes Bookshelf, Book and Page. I also added some tests with assertions that would match the behavior I expected. The two failing tests test_removeBookAndPage and test_reinitializePagesAndDeleteOtherBook reproduce the (in my opinion) faulty behavior.

Changing the storage strategy for the books and pages to e.g. 'setArray' resolves the problems and all cases I observed work as expected.

Expected behavior

I expected that the state saved in the database matches the state of the objects in memory.

@alcaeus
Copy link
Member

alcaeus commented Apr 15, 2024

After investigating this for a while, I was able to figure out what exactly is happening for the first failing test (where pages in one book are not removed when a different book is removed entirely):

The default strategy for collections (so embedMany and referenceMany) is pushAll. With this strategy, ODM first uses the $push operator to push new elements in the collection to the document in the database. It then handles deletions by first using $unset on the elements to be deleted, followed by a $pull to extract the values previously set to null by $unset. setArray on the other hand uses the $set operator to update collections in one go - with the downside that any changes to the collection since reading the document are completely ignored. This shouldn't be an issue in many cases, but be aware that in applications with a high write contention it can cause data loss!

Back to the pushAll strategy, the issue arises when running multiple updates in a nested collection. Based on the example above, we have the following document:

{
    "_id": "661cd2c1e676fe8fe10e0fb3",
    "books": [
        { "pages": [ { "pageNumber": 1, "font": "default" } ], "name": "red" },
        { "pages": [ { "pageNumber": 1, "font": "fancy" } ], "name": "green" },
        {
            "pages": [
                { "pageNumber": 1, "font": "beauty" },
                { "pageNumber": 2, "font": "beauty" }
            ],
            "name": "blue"
        }
    ]
}

We now want to remove the red book and page one from the blue book. If we were to write the queries ourselves, it would look something like this. Note that these are NOT the queries ODM generates. First, we use $unset to remove the red book and the first page of the blue book:

{
    "$unset": {
        "books.0": true,
        "books.2.pages.0": true
    }
}

Then, we use $pull to remove the null elements left over in the two collections:

{
    "$pull": {
        "books": true,
        "books.2.pages": true
    }
}

It is at this stage that MongoDB complains:

Updating the path 'books.2.pages' would create a conflict at 'books'

In short, we can't update any collection nested in books at the same time as updating books in a single query but would instead have to make multiple queries. There is logic in place to prevent the error and it works most of the time, but not in this particular case. The reason why MongoDB forbids these kinds of updates is exactly the reason becomes apparent when we look at what ODM actually does. Here's the update that uses $unset:

{
    "$unset": {
        "books.0": true
    }
}

Followed by the $pull:

{
    "$pull": {
        "books": true
    }
}

This is the result of an optimisation made in #1880, which consolidated multiple collection updates into a single query and introduced logic to avoid the conflict error highlighted above. When looking at property paths that have changed (books and books.2.pages), the logic correctly detects that books.2.pages is a subpath of books and thus excludes it to avoid the error.

This issue is tricky to fix without completely reverting #1880. While that may fix the bug introduced, it also re-introduces a host of other problem due to each document requiring multiple queries to update each contained collection individually. A quick attempt at fixing this not only failed because of the conflict when using $pull, but also due to incorrect handling property paths in collections during the update process.

If the setArray strategy works for you, I would suggest sticking with it for the time being. I have also yet to look into the other issue, where I suspect a similar issue with pushAll, but this time related to the aforementioned property path handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants