Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batches_to_sort_string differing from similar implementation in assert_batches_sorted_eq #15312

Open
Shreyaskr1409 opened this issue Mar 19, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@Shreyaskr1409
Copy link
Contributor

Shreyaskr1409 commented Mar 19, 2025

Describe the bug

I was migrating tests to insta in a PR #15248 and got a problem. For same expected output in a test, I was finding differing old and new snapshots while using batches_to_sort_string and no differing snapshots while using assert_batches_sorted_eq. I did not encounter this issue while migrating many other tests in /datafusion/physical-plan, this weirdly was the first time I encountered this issue.

Edit: Also referencing at the following PR comment #15288 (comment) . This is also a problem discovered so far.

Previous code (using assert_batches_sorted_eq) :

        let expected = [
            "+---+---+---+----+---+---+",
            "| a | b | c | a  | b | c |",
            "+---+---+---+----+---+---+",
            "|   |   |   | 30 | 3 | 6 |",
            "|   |   |   | 40 | 4 | 4 |",
            "| 2 | 7 | 9 | 10 | 2 | 7 |",
            "| 2 | 7 | 9 | 20 | 2 | 5 |",
            "| 0 | 4 | 7 |    |   |   |",
            "| 1 | 5 | 8 |    |   |   |",
            "| 2 | 8 | 1 |    |   |   |",
            "+---+---+---+----+---+---+",
        ];
        assert_batches_sorted_eq!(expected, &batches);

New code (using batches_to_sort_string) :

        allow_duplicates! {
            assert_snapshot!(batches_to_sort_string(&batches), @r#"
            +---+---+---+----+---+---+
            | a | b | c | a  | b | c |
            +---+---+---+----+---+---+
            |   |   |   | 30 | 3 | 6 |
            |   |   |   | 40 | 4 | 4 |
            | 2 | 7 | 9 | 10 | 2 | 7 |
            | 2 | 7 | 9 | 20 | 2 | 5 |
            | 0 | 4 | 7 |    |   |   |
            | 1 | 5 | 8 |    |   |   |
            | 2 | 8 | 1 |    |   |   |
            +---+---+---+----+---+---+
                "#)
        }

In both cases, I had made sure several times that the expected output is the same.

I am getting the following output while using new code:
Image

To Reproduce

In /datafusion/physical-plan/src/joins/hash_join.rs,
replace following part in async fn join_full_with_filter(batch_size: usize) -> Result<()>:

        let expected = [
            "+---+---+---+----+---+---+",
            "| a | b | c | a  | b | c |",
            "+---+---+---+----+---+---+",
            "|   |   |   | 30 | 3 | 6 |",
            "|   |   |   | 40 | 4 | 4 |",
            "| 2 | 7 | 9 | 10 | 2 | 7 |",
            "| 2 | 7 | 9 | 20 | 2 | 5 |",
            "| 0 | 4 | 7 |    |   |   |",
            "| 1 | 5 | 8 |    |   |   |",
            "| 2 | 8 | 1 |    |   |   |",
            "+---+---+---+----+---+---+",
        ];
        assert_batches_sorted_eq!(expected, &batches);

with

        allow_duplicates! {
            assert_snapshot!(batches_to_sort_string(&batches), @r#"
            +---+---+---+----+---+---+
            | a | b | c | a  | b | c |
            +---+---+---+----+---+---+
            |   |   |   | 30 | 3 | 6 |
            |   |   |   | 40 | 4 | 4 |
            | 2 | 7 | 9 | 10 | 2 | 7 |
            | 2 | 7 | 9 | 20 | 2 | 5 |
            | 0 | 4 | 7 |    |   |   |
            | 1 | 5 | 8 |    |   |   |
            | 2 | 8 | 1 |    |   |   |
            +---+---+---+----+---+---+
                "#)
        }

Expected behavior

Similar results for both the tests.

Additional context

I did not encounter this issue while migrating many other tests in /datafusion/physical-plan, this weirdly was the first time I encountered this issue.

@blaginin
Copy link
Contributor

blaginin commented Mar 19, 2025

Hey, I think this is happening because assert_batches_sorted_eq sorts both lhs and rhs, while batches_to_sort_string sorts its only input - and because the snapshot is one big string, we don't currenly sort it.

When I was working on that, I manually verified the new lines make sense. Alternatively, you can explore #15165 (comment) and https://docs.rs/insta/latest/insta/fn.sorted_redaction.html

Context: #15165 (comment)

alamb pushed a commit that referenced this issue Mar 22, 2025
* Basic setup done for physical plan insta tests

* Migrated tests to insta in aggregate/mod.rs

* Migrated tests to insta in aggregate/topk/priority_map.rs

* Migrated tests to insta in joins

* Fallback to previous implementation in a test due to an issue (#15312)

* Fix formatting

* Update Cargo.toml

* Revert any Cargo.lock changes

* Clean workspace and attempt fixing failing build test

* format Cargo.toml

* Add insta as a dev-dependency instead

* Update datafusion/physical-plan/src/aggregates/mod.rs

Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>

* Preserved comments from aggregates/mod.rs

* Resolved errors regarding dependencies and formatting

---------

Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
@Shreyaskr1409
Copy link
Contributor Author

I am facing yet another issue same as mentioned in the following PR:
#15288 (comment)

@alamb
Copy link
Contributor

alamb commented Mar 23, 2025

TLDR is I think this is "working as expected" (though somewhat confusing) and we can just refer to this ticket when migrating tests to insta if the output changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants