[SPARK-49985][SQL] Remove support for interval types in Variant #48215

Closed

This file was deleted.

This file was deleted.

8 changes: 0 additions & 8 deletions common/variant/README.md
@@ -352,8 +352,6 @@ The Decimal type contains a scale, but no precision. The implied precision of a
| Float | float | `14` | FLOAT | IEEE little-endian |
| Binary | binary | `15` | BINARY | 4 byte little-endian size, followed by bytes |
| String | string | `16` | STRING | 4 byte little-endian size, followed by UTF-8 encoded bytes |
| YMInterval | year-month interval | `19` | INT(32, signed)<sup>1</sup> | 1 byte denoting start field (1 bit) and end field (1 bit) starting at LSB followed by 4-byte little-endian value. |
| DTInterval | day-time interval | `20` | INT(64, signed)<sup>1</sup> | 1 byte denoting start field (2 bits) and end field (2 bits) starting at LSB followed by 8-byte little-endian value. |

| Decimal Precision | Decimal value type |
|-----------------------|--------------------|
@@ -364,12 +362,6 @@ The Decimal type contains a scale, but no precision. The implied precision of a

The *Logical Type* column indicates logical equivalence of physically encoded types. For example, a user expression operating on a string value containing "hello" should behave the same, whether it is encoded with the short string optimization, or long string encoding. Similarly, user expressions operating on an *int8* value of 1 should behave the same as a decimal16 with scale 2 and unscaled value 100.
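The decimal equivalence described above can be checked numerically. The following is a minimal sketch using `java.math.BigDecimal`; the class name `LogicalEquivalence` and its helper are hypothetical and are not part of the Variant library.

```java
import java.math.BigDecimal;

public class LogicalEquivalence {
    // Hypothetical helper: true when an integer value and a decimal given as
    // (unscaled value, scale) denote the same number.
    static boolean sameLogicalValue(long intValue, long unscaled, int scale) {
        BigDecimal asDecimal = BigDecimal.valueOf(unscaled, scale);
        // compareTo ignores scale, whereas equals would treat 1.00 and 1 as different.
        return asDecimal.compareTo(BigDecimal.valueOf(intValue)) == 0;
    }

    public static void main(String[] args) {
        // int8 value 1 versus a decimal with scale 2 and unscaled value 100 (1.00).
        System.out.println(sameLogicalValue(1, 100, 2));
        // Unscaled value 101 with scale 2 is 1.01, which is a different logical value.
        System.out.println(sameLogicalValue(1, 101, 2));
    }
}
```

The sketch uses `compareTo` rather than `equals` because the physical scale should not affect the logical comparison.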

The year-month and day-time interval types have one byte at the beginning indicating the start and end fields. In the case of the year-month interval, the least significant bit denotes the start field and the next least significant bit denotes the end field. The remaining 6 bits are unused. A field value of 0 represents YEAR and 1 represents MONTH. In the case of the day-time interval, the least significant 2 bits denote the start field and the next least significant 2 bits denote the end field. The remaining 4 bits are unused. A field value of 0 represents DAY, 1 represents HOUR, 2 represents MINUTE, and 3 represents SECOND.
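For reference, the field byte layout described in this removed paragraph can be sketched as follows. The packing expressions use the same masks and shifts as the removed `appendYearMonthInterval` and `appendDayTimeInterval` methods in this diff; the class name `IntervalFieldByte` and the decoding helpers are hypothetical, since the body of the library's decoding code is not shown in this diff.

```java
public class IntervalFieldByte {
    // Year-month: bit 0 is the start field, bit 1 is the end field (0 = YEAR, 1 = MONTH).
    static byte packYearMonth(int startField, int endField) {
        return (byte) ((startField & 0x1) | ((endField & 0x1) << 1));
    }

    // Day-time: bits 0-1 are the start field, bits 2-3 are the end field
    // (0 = DAY, 1 = HOUR, 2 = MINUTE, 3 = SECOND).
    static byte packDayTime(int startField, int endField) {
        return (byte) ((startField & 0x3) | ((endField & 0x3) << 2));
    }

    // Hypothetical decoders that reverse the packing above.
    static int yearMonthStart(byte b) { return b & 0x1; }
    static int yearMonthEnd(byte b) { return (b >> 1) & 0x1; }
    static int dayTimeStart(byte b) { return b & 0x3; }
    static int dayTimeEnd(byte b) { return (b >> 2) & 0x3; }

    public static void main(String[] args) {
        byte ym = packYearMonth(0, 1); // YEAR TO MONTH
        byte dt = packDayTime(1, 3);   // HOUR TO SECOND
        System.out.println(yearMonthStart(ym) + " to " + yearMonthEnd(ym));
        System.out.println(dayTimeStart(dt) + " to " + dayTimeEnd(dt));
    }
}
```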

Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types.

[1] The parquet format does not have pure equivalents for the year-month and day-time interval types. Year-month intervals are usually represented using int32 values and the day-time intervals are usually represented using int64 values. However, these values don't include the start and end fields of these types. Therefore, Spark stores them in the column metadata.

# Field ID order and uniqueness

For objects, field IDs and offsets must be listed in the order of the corresponding field names, sorted lexicographically. Note that the fields themselves are not required to follow this order. As a result, offsets will not necessarily be listed in ascending order.
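The ordering rule above can be illustrated with a small sketch. The `FieldOrder` class and its `Field` holder are hypothetical and not part of the Variant library, and the sketch assumes that `String.compareTo` is an adequate stand-in for the lexicographic order the specification intends, which holds for the ASCII names used here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FieldOrder {
    // Hypothetical holder: a field name, its ID, and the offset of its value
    // within the object's value data.
    static final class Field {
        final String name;
        final int id;
        final int offset;

        Field(String name, int id, int offset) {
            this.name = name;
            this.id = id;
            this.offset = offset;
        }
    }

    // Returns the fields in the order their IDs and offsets would be listed:
    // sorted by field name, regardless of where the values were written.
    static List<Field> inListingOrder(List<Field> fields) {
        List<Field> sorted = new ArrayList<>(fields);
        sorted.sort(Comparator.comparing((Field f) -> f.name));
        return sorted;
    }

    public static void main(String[] args) {
        // "b" was written first (offset 0) and "a" second (offset 2).
        List<Field> written = List.of(new Field("b", 0, 0), new Field("a", 1, 2));
        for (Field f : inListingOrder(written)) {
            System.out.println(f.name + " id=" + f.id + " offset=" + f.offset);
        }
    }
}
```

In this example "a" is listed before "b" even though its value was written later, so the listed offsets are 2 and then 0, which is the non-ascending case the paragraph describes.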
@@ -34,9 +34,6 @@
import java.util.Base64;
import java.util.Locale;

import org.apache.spark.util.DayTimeIntervalUtils;
import org.apache.spark.util.YearMonthIntervalUtils;

import static org.apache.spark.types.variant.VariantUtil.*;

/**
@@ -91,16 +88,6 @@ public long getLong() {
return VariantUtil.getLong(value, pos);
}

// Get the start and end fields of a year-month interval from the variant.
public IntervalFields getYearMonthIntervalFields() {
return VariantUtil.getYearMonthIntervalFields(value, pos);
}

// Get the start and end fields of a day-time interval from the variant.
public IntervalFields getDayTimeIntervalFields() {
return VariantUtil.getDayTimeIntervalFields(value, pos);
}

// Get a double value from the variant.
public double getDouble() {
return VariantUtil.getDouble(value, pos);
@@ -334,22 +321,6 @@ static void toJsonImpl(byte[] value, byte[] metadata, int pos, StringBuilder sb,
case BINARY:
appendQuoted(sb, Base64.getEncoder().encodeToString(VariantUtil.getBinary(value, pos)));
break;
case YEAR_MONTH_INTERVAL:
IntervalFields ymFields = VariantUtil.getYearMonthIntervalFields(value, pos);
int ymValue = (int) VariantUtil.getLong(value, pos);
appendQuoted(sb, YearMonthIntervalUtils
.toYearMonthIntervalANSIString(ymValue, ymFields.startField, ymFields.endField));
break;
case DAY_TIME_INTERVAL:
IntervalFields dtFields = VariantUtil.getDayTimeIntervalFields(value, pos);
long dtValue = VariantUtil.getLong(value, pos);
try {
appendQuoted(sb, DayTimeIntervalUtils.toDayTimeIntervalANSIString(dtValue,
dtFields.startField, dtFields.endField));
} catch(Exception e) {
throw malformedVariant();
}
break;
}
}
}
@@ -216,22 +216,6 @@ public void appendTimestampNtz(long microsSinceEpoch) {
writePos += 8;
}

public void appendYearMonthInterval(long value, byte startField, byte endField) {
checkCapacity(1 + 5);
writeBuffer[writePos++] = primitiveHeader(YEAR_MONTH_INTERVAL);
writeBuffer[writePos++] = (byte) ((startField & 0x1) | ((endField & 0x1) << 1));
writeLong(writeBuffer, writePos, value, 4);
writePos += 4;
}

public void appendDayTimeInterval(long value, byte startField, byte endField) {
checkCapacity(1 + 9);
writeBuffer[writePos++] = primitiveHeader(DAY_TIME_INTERVAL);
writeBuffer[writePos++] = (byte) ((startField & 0x3) | ((endField & 0x3) << 2));
writeLong(writeBuffer, writePos, value, 8);
writePos += 8;
}

public void appendFloat(float f) {
checkCapacity(1 + 4);
writeBuffer[writePos++] = primitiveHeader(FLOAT);