[SPARK-49985][SQL] Remove support for interval types in Variant
### What changes were proposed in this pull request?

Support for interval types was previously added to the Variant spec. This PR removes that support, along with the ability to cast between interval types and Variant.

### Why are the changes needed?

I previously implemented interval support for Variant. However, the Variant spec is meant to be open and compatible with other engines, which may not support all the ANSI interval types, so the design of intervals in Variant needs more thought.

### Does this PR introduce _any_ user-facing change?

Yes. After this change, users can no longer cast between variants and intervals.

### How was this patch tested?

Unit tests verifying that:
1. It is not possible to construct variants containing intervals.
2. It is not possible to cast variants to intervals.
3. Interval IDs in variants are treated just like other unknown type IDs.
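
To illustrate point 3, here is a minimal sketch of a type ID lookup in which IDs 19 and 20 fall through to the same path as any other unassigned ID. It is not the code from this patch: the class and method names are hypothetical, the header layout (2-bit basic type in the low bits, 6-bit type ID above it) is taken from the Variant README, and rejecting unknown IDs with an exception is an assumption about what "treated like unknown" means.

```java
import java.util.Map;

// Hypothetical sketch; not Spark's VariantUtil.
final class PrimitiveTypeIdSketch {
  // Assumed header layout: low 2 bits = basic type (0 means primitive),
  // upper 6 bits = primitive type ID.
  static final int BASIC_TYPE_BITS = 2;
  static final int BASIC_TYPE_MASK = 0x3;
  static final int TYPE_INFO_MASK = 0x3F;
  static final int PRIMITIVE = 0;

  // A few IDs from the README table. 19 and 20 (the former interval IDs)
  // are deliberately absent, so they are handled like any unassigned ID.
  static final Map<Integer, String> KNOWN =
      Map.of(14, "FLOAT", 15, "BINARY", 16, "STRING");

  // Builds a header byte for a primitive value with the given type ID.
  static byte primitiveHeader(int typeId) {
    return (byte) ((typeId << BASIC_TYPE_BITS) | PRIMITIVE);
  }

  // Returns the type name, or throws for any ID that is not in KNOWN.
  static String describe(byte header) {
    if ((header & BASIC_TYPE_MASK) != PRIMITIVE) {
      return "NOT_PRIMITIVE";
    }
    int typeId = (header >> BASIC_TYPE_BITS) & TYPE_INFO_MASK;
    String name = KNOWN.get(typeId);
    if (name == null) {
      throw new IllegalStateException("Unknown primitive type ID: " + typeId);
    }
    return name;
  }

  public static void main(String[] args) {
    System.out.println(describe(primitiveHeader(16)));
    try {
      describe(primitiveHeader(19));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The point of the sketch is that no special case exists for the former interval IDs: removing them from the table of known types is enough to make them behave like every other unassigned ID.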

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48215 from harshmotw-db/harshmotw-db/disable_interval_2.

Authored-by: Harsh Motwani <harsh.motwani@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
harshmotw-db authored and HyukjinKwon committed Oct 23, 2024
1 parent 2bf41a6 commit 2957069
Showing 12 changed files with 123 additions and 1,002 deletions.

This file was deleted.

This file was deleted.

8 changes: 0 additions & 8 deletions common/variant/README.md
@@ -352,8 +352,6 @@ The Decimal type contains a scale, but no precision. The implied precision of a
| Float | float | `14` | FLOAT | IEEE little-endian |
| Binary | binary | `15` | BINARY | 4 byte little-endian size, followed by bytes |
| String | string | `16` | STRING | 4 byte little-endian size, followed by UTF-8 encoded bytes |
| YMInterval | year-month interval | `19` | INT(32, signed)<sup>1</sup> | 1 byte denoting start field (1 bit) and end field (1 bit) starting at LSB followed by 4-byte little-endian value. |
| DTInterval | day-time interval | `20` | INT(64, signed)<sup>1</sup> | 1 byte denoting start field (2 bits) and end field (2 bits) starting at LSB followed by 8-byte little-endian value. |

| Decimal Precision | Decimal value type |
|-----------------------|--------------------|
@@ -364,12 +362,6 @@ The Decimal type contains a scale, but no precision. The implied precision of a

The *Logical Type* column indicates logical equivalence of physically encoded types. For example, a user expression operating on a string value containing "hello" should behave the same, whether it is encoded with the short string optimization, or long string encoding. Similarly, user expressions operating on an *int8* value of 1 should behave the same as a decimal16 with scale 2 and unscaled value 100.

The year-month and day-time interval types have one byte at the beginning indicating the start and end fields. In the case of the year-month interval, the least significant bit denotes the start field and the next least significant bit denotes the end field. The remaining 6 bits are unused. A field value of 0 represents YEAR and 1 represents MONTH. In the case of the day-time interval, the least significant 2 bits denote the start field and the next least significant 2 bits denote the end field. The remaining 4 bits are unused. A field value of 0 represents DAY, 1 represents HOUR, 2 represents MINUTE, and 3 represents SECOND.
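
For readers who need to interpret values written with the removed encoding, the field byte described in the paragraph above can be sketched as follows. The class and method names are hypothetical; the masks and shifts mirror those in the `appendYearMonthInterval` and `appendDayTimeInterval` methods removed by this commit, and the field numbering (0 = YEAR, 1 = MONTH; 0 = DAY through 3 = SECOND) comes from the same paragraph.

```java
// Hypothetical sketch of the removed start/end field byte; not Spark code.
final class IntervalFieldsByteSketch {
  // Year-month: bit 0 = start field, bit 1 = end field, upper 6 bits unused.
  static byte packYearMonth(int startField, int endField) {
    return (byte) ((startField & 0x1) | ((endField & 0x1) << 1));
  }

  static int yearMonthStart(byte fields) {
    return fields & 0x1;
  }

  static int yearMonthEnd(byte fields) {
    return (fields >> 1) & 0x1;
  }

  // Day-time: bits 0-1 = start field, bits 2-3 = end field, upper 4 bits unused.
  static byte packDayTime(int startField, int endField) {
    return (byte) ((startField & 0x3) | ((endField & 0x3) << 2));
  }

  static int dayTimeStart(byte fields) {
    return fields & 0x3;
  }

  static int dayTimeEnd(byte fields) {
    return (fields >> 2) & 0x3;
  }

  public static void main(String[] args) {
    // INTERVAL HOUR TO SECOND: start field = 1 (HOUR), end field = 3 (SECOND).
    byte hourToSecond = packDayTime(1, 3);
    System.out.println(dayTimeStart(hourToSecond) + " to " + dayTimeEnd(hourToSecond));
  }
}
```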

Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types.

[1] The parquet format does not have pure equivalents for the year-month and day-time interval types. Year-month intervals are usually represented using int32 values and the day-time intervals are usually represented using int64 values. However, these values don't include the start and end fields of these types. Therefore, Spark stores them in the column metadata.

# Field ID order and uniqueness

For objects, field IDs and offsets must be listed in the order of the corresponding field names, sorted lexicographically. Note that the fields themselves are not required to follow this order. As a result, offsets will not necessarily be listed in ascending order.
@@ -34,9 +34,6 @@
import java.util.Base64;
import java.util.Locale;

import org.apache.spark.util.DayTimeIntervalUtils;
import org.apache.spark.util.YearMonthIntervalUtils;

import static org.apache.spark.types.variant.VariantUtil.*;

/**
@@ -91,16 +88,6 @@ public long getLong() {
    return VariantUtil.getLong(value, pos);
  }

  // Get the start and end fields of a year-month interval from the variant.
  public IntervalFields getYearMonthIntervalFields() {
    return VariantUtil.getYearMonthIntervalFields(value, pos);
  }

  // Get the start and end fields of a day-time interval from the variant.
  public IntervalFields getDayTimeIntervalFields() {
    return VariantUtil.getDayTimeIntervalFields(value, pos);
  }

  // Get a double value from the variant.
  public double getDouble() {
    return VariantUtil.getDouble(value, pos);
@@ -334,22 +321,6 @@ static void toJsonImpl(byte[] value, byte[] metadata, int pos, StringBuilder sb,
      case BINARY:
        appendQuoted(sb, Base64.getEncoder().encodeToString(VariantUtil.getBinary(value, pos)));
        break;
      case YEAR_MONTH_INTERVAL:
        IntervalFields ymFields = VariantUtil.getYearMonthIntervalFields(value, pos);
        int ymValue = (int) VariantUtil.getLong(value, pos);
        appendQuoted(sb, YearMonthIntervalUtils
          .toYearMonthIntervalANSIString(ymValue, ymFields.startField, ymFields.endField));
        break;
      case DAY_TIME_INTERVAL:
        IntervalFields dtFields = VariantUtil.getDayTimeIntervalFields(value, pos);
        long dtValue = VariantUtil.getLong(value, pos);
        try {
          appendQuoted(sb, DayTimeIntervalUtils.toDayTimeIntervalANSIString(dtValue,
            dtFields.startField, dtFields.endField));
        } catch(Exception e) {
          throw malformedVariant();
        }
        break;
    }
  }
}
@@ -216,22 +216,6 @@ public void appendTimestampNtz(long microsSinceEpoch) {
    writePos += 8;
  }

  public void appendYearMonthInterval(long value, byte startField, byte endField) {
    checkCapacity(1 + 5);
    writeBuffer[writePos++] = primitiveHeader(YEAR_MONTH_INTERVAL);
    writeBuffer[writePos++] = (byte) ((startField & 0x1) | ((endField & 0x1) << 1));
    writeLong(writeBuffer, writePos, value, 4);
    writePos += 4;
  }

  public void appendDayTimeInterval(long value, byte startField, byte endField) {
    checkCapacity(1 + 9);
    writeBuffer[writePos++] = primitiveHeader(DAY_TIME_INTERVAL);
    writeBuffer[writePos++] = (byte) ((startField & 0x3) | ((endField & 0x3) << 2));
    writeLong(writeBuffer, writePos, value, 8);
    writePos += 8;
  }

  public void appendFloat(float f) {
    checkCapacity(1 + 4);
    writeBuffer[writePos++] = primitiveHeader(FLOAT);