[Spark] Auto Compaction was incorrectly including large files towards minNumFiles #4045 #4178
Which Delta project/connector is this regarding?
Spark
Description
If DELTA_AUTO_COMPACT_MIN_FILE_SIZE was unset, it defaulted to Long.MaxValue, which caused large files to count towards the minNumFiles threshold for auto compaction (AC) to be triggered. As a result, compaction ran more and more frequently as the table grew, eventually running after every write.
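A minimal sketch of the problem, using hypothetical names (`AddedFile`, `countSmallFiles`, `minNumFiles`) rather than the actual Delta internals: when the minimum file size falls back to Long.MaxValue, every file written passes the "small file" filter, so already-compacted large files push the count past minNumFiles.

```scala
// Hypothetical illustration of the bug, not the actual Delta source.
// A file is considered "small" (a compaction candidate) when it is
// below minFileSize.
case class AddedFile(path: String, sizeInBytes: Long)

def countSmallFiles(files: Seq[AddedFile], minFileSize: Long): Int =
  files.count(_.sizeInBytes < minFileSize)

val files = Seq(
  AddedFile("part-0", 2L * 1024 * 1024 * 1024), // 2 GB, already compacted
  AddedFile("part-1", 4L * 1024 * 1024),        // 4 MB
  AddedFile("part-2", 8L * 1024 * 1024)         // 8 MB
)

// Buggy fallback: with minFileSize = Long.MaxValue every file counts,
// so the large file also contributes to the minNumFiles threshold.
val buggyCount = countSmallFiles(files, Long.MaxValue)           // 3

// Intended behavior: only genuinely small files should count.
val intendedCount = countSmallFiles(files, 128L * 1024 * 1024)   // 2
```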


The following is the expected behavior on the same test suite, as produced by Databricks:
The AC evaluation criteria also didn't always require enough small files: AC would trigger whenever it hadn't run as part of the last operation. AC should only evaluate `shouldCompact` as true if compaction did not just run AND there are enough small files; a hedged sketch of this predicate is shown below.

Resolves #4045. See the issue for more details.
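A sketch of the intended trigger condition, with illustrative names (`autoCompactRanInLastOp`, `numSmallFiles`) that are not the actual Delta identifiers: both conditions must hold, instead of triggering merely because the previous operation didn't compact.

```scala
// Illustrative sketch of the corrected trigger condition, not the actual
// Delta implementation.
def shouldCompact(
    autoCompactRanInLastOp: Boolean,
    numSmallFiles: Int,
    minNumFiles: Int): Boolean = {
  // Before: AC could trigger just because the last operation did not run
  // auto compaction, even without enough small files.
  // After: require BOTH that compaction did not just run AND that there
  // are enough small files to make compaction worthwhile.
  !autoCompactRanInLastOp && numSmallFiles >= minNumFiles
}
```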
How was this patch tested?
Does this PR introduce any user-facing changes?
No.