Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initializes Datum and adds functionality to evaluator #1451

Merged
merged 9 commits into from
Jun 11, 2024

Conversation

johnedquinn
Copy link
Member

@johnedquinn johnedquinn commented Apr 27, 2024

Relevant Issues

  • None

Description

  • Initializes Datum, the engine's representation of a value
  • Also and adds functionality to evaluator. Currently, Datum is all over the evaluator, however, we convert to PartiQLValue when calling UDFs and doing comparisons.
  • While working on the JOIN PR (Simplifies joins and fixes bugs #1438) and PartiQL Cursor PR (Initializes PartiQLData and PartiQLValueLoader #1443), it became apparent to me that our modelling of PartiQLValue is tightly coupled with Java's type semantics. In many places, we rely on Java's types (example: x is StructValue<*>) rather than relying on our plumbing: (x.type == PartiQLValueType.STRUCT). On top of that, as part of the JOIN PR, I found it incredibly difficult (and sub-optimal performance-wise) to coerce nulls of a particular type to a null of another type. This PR takes these learnings and completely overhauls PartiQLValue to a representation that we are all familiar with: Ion's AnyElement. In a way, this modelling is as close as we can get to C's Union type, and by modelling it this way, we gain control over our type system.
  • Notice that the API proposed is called PQLValue. I intend for its core to be copied and pasted directly into PartiQLValue. I have also written this in Java, as I've noticed our inability to control our public API by writing in Kotlin. On top of that, PQLValue is written in Java to get us closer to using primitives directly. That being said, PQLValue will NOT exist during the V1 release. Its contents will replace the contents of PartiQLValue -- and PartiQLValue will be written in Java.

Internal Notes

  • Notice that I attempt to keep the null-safety that Kotlin provides by marking all applicable API's with @NotNull.
  • Similarly, I attempt to use primitives as much as possible -- limiting our memory footprint and maintaining the null-safety.
  • I've left remaining action items to fully replace PartiQLValue in the KDocs of PQLValue. These include adding the comparator, equals, hashCode, etc. These should be able to be added fairly quickly.

Other Information

  • Updated Unreleased Section in CHANGELOG: NO
  • Any backward-incompatible changes? NO -- not really, however, PQLValue intends to replace PartiQLValue
  • Any new external dependencies? NO
  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? YES

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@johnedquinn johnedquinn force-pushed the v1-eval-value-rebased branch from 4ab73d6 to eee755f Compare April 27, 2024 17:01
@johnedquinn johnedquinn force-pushed the v1-eval-value-rebased branch 2 times, most recently from 5d9957f to d25de2d Compare April 30, 2024 22:52
@johnedquinn johnedquinn force-pushed the v1-eval-value-rebased branch from d25de2d to d201667 Compare April 30, 2024 23:19
@johnedquinn johnedquinn marked this pull request as ready for review May 1, 2024 21:40
@johnedquinn johnedquinn requested a review from rchowell May 6, 2024 15:42
class CharValue implements PQLValue {

@NotNull
final String _value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final String _value;
final char _value;

char primitive is 16-bit unicode

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The character data type in SQL doesn't represent the char in Java. It's more like a char array

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the fixed-length character. In my head I had

char <-> char(1) -- fixed-length character with length 1

Comment on lines 571 to 574
@NotNull
static PQLValue nullValue(@NotNull PartiQLValueType type) {
return new NullValue(type);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not sure about typed nulls if we are going ahead and changing the value system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to pass existing tests, this is necessary. I can mark it as deprecated, but for this PR I'd like to avoid removing PartiQLValueType.NULL as it touches many things

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PR #1463. This will have a pretty significant impact on this comment.

/**
* Represents the key-value pairs that are embedded within values of type {@link org.partiql.value.PartiQLValueType#STRUCT}.
*/
public interface StructField {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why an interface?

Struct fields should have just two values,

- name   -> field name original casing
- value  -> field value

We should also consider private, but non-final field members

Comment on lines 86 to 96
@OptIn(PartiQLValueExperimental::class)
fun PQLValue.getInt32Coerced(): Int {
return when (this.type) {
PartiQLValueType.INT8 -> this.int8Value.toInt()
PartiQLValueType.INT16 -> this.int16Value.toInt()
PartiQLValueType.INT32 -> this.int32Value
PartiQLValueType.INT64 -> this.int64Value.toInt()
PartiQLValueType.INT -> this.intValue.toInt()
else -> throw TypeCheckException()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding all of these coerces to the PQLValue implementations.

class Int32Value ... {

    private Int value;

    @Override
    short toShort() { 
       // throw DataException on overflow  etc.
      return value as short;
     }

    @Override
    int getInt() { return value };

    @Override
    long getLong() { return value };
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (v1@607c4c0). Learn more about missing BASE report.

Additional details and impacted files
@@          Coverage Diff          @@
##             v1    #1451   +/-   ##
=====================================
  Coverage      ?   72.30%           
  Complexity    ?     2539           
=====================================
  Files         ?      283           
  Lines         ?    20282           
  Branches      ?     3700           
=====================================
  Hits          ?    14664           
  Misses        ?     4596           
  Partials      ?     1022           
Flag Coverage Δ
CLI 13.71% <ø> (?)
EXAMPLES 80.07% <ø> (?)
LANG 77.69% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

/**
* This shall always be package-private (internal).
*/
class DecimalArbitraryValue implements PQLValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need both Decimal and DecimalArbitrary for runtime values.

Comment on lines 9 to 26
class Float32Value implements PQLValue {

private final float _value;

Float32Value(float value) {
_value = value;
}

@Override
public boolean isNull() {
return false;
}

@Override
public float getFloat() {
return _value;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look exactly like PartiQLValue just in Java and with the getFloat() .. getLong() etc. I have a couple of questions.

  • What's the difference now? (besides flat class hierarchy)
  • Why don't we have the coercions in the PQLValue? (in this case getDouble())

Perhaps its been too long since I looked, but I don't think I understand the value prop here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this PR aims to replace PartiQLValue incrementally. I'd suggest:

  1. Add PQLValue without releasing.
  2. Update functions to use PQLValue. (subsequent PR)
  3. Move the existing PartiQLValue to the plan package (maybe called PlanLiteral) (since it's use is more for the DOM)
  4. Rename PQLValue to PartiQLValue

The difference is that we are moving away from using Java's type semantics in favor of our own type system. With this, we can/should be using our own type information to determine how to extract the underlying data. The idea is that this logic should be exposed by partiql-eval instead of partiql-types.

There shouldn't be a need for coercions that are not represented in the plan. What I'd like to see is that when the executor receives a plan, without any modification, the executor shall execute the plan. If a JVM double is expected, then the planner should have inserted the appropriate coercion.

Consider Rex.Op.Path.Index. The planner currently allows all integer types as the index, and then the executor is actually under the hood coercing any integer value into a JVM int. In my opinion, the "function signature" of the path index should have required that the index be a PartiQL int_32. With this mindset, the expressions representing the index (say its typed to be an INT16) would be wrapped with a coercion by the planner. We're currently making up for our lack of planning by adding all of these additional coercions in the execution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed other points offline to create more distance from DOM.

There shouldn't be a need for coercions that are not represented in the plan. What I'd like to see is that when the executor receives a plan, without any modification, the executor shall execute the plan. If a JVM double is expected, then the planner should have inserted the appropriate coercion.

Agreed, makes sense now. Thank you.

* - The comparator for ordering and aggregations
* - Adding support for annotations
*/
public interface PQLValue extends Iterable<PQLValue> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should be public if it's not a replacement. I also thought we discussed them solving different problems?

We shouldn't introduce another duplication of something existing when PartiQLValue is still under experimental. We either replace it completely or change it however we need to.

Comment on lines 188 to 190
* @return the underlying value applicable to the types:
* {@link PartiQLValueType#INT32}
* @throws UnsupportedOperationException if the operation is not applicable to the type returned from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused why coercions wouldn't fit like Int16Value

/**
* This shall always be package-private (internal).
*/
class BagValue implements PQLValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need Bag/List/Sexp classes if we have a type enum. Because this is always internal, we can add additional classes later, but it doesn't provide additional value. Same for Decimal/DecimalArbitrary.

/**
* This shall always be package-private (internal).
*/
class CharValue implements PQLValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment, but seems we don't need Char and String considering their identical implementations.

/**
* This shall always be package-private (internal).
*/
class ClobValue implements PQLValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clob/Blob -> ByteArray or Lob

Updates all implementing classes of Datum
@johnedquinn johnedquinn changed the title Initializes PQLValue and adds functionality to evaluator Initializes Datum and adds functionality to evaluator Jun 5, 2024
Copy link
Contributor

@rchowell rchowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only change is to fix the test/partiql-tests submodule otherwise ship it.

Comment on lines 9 to 13
internal class RecordValueIterator(
collectionValue: CollectionValue<*>
collectionValue: Iterator<Datum>
) : Iterator<Record> {

private val collectionIter = collectionValue.iterator()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but now we can have private val iterator: Iterator<Dataum>

@johnedquinn
Copy link
Member Author

Only change is to fix the test/partiql-tests submodule otherwise ship it.

I believe there is a bug in GitHub's rendering of the submodule diff. For some reason, the commit is the same on v1 as on johnedquinn:v1-eval-value-rebased. Checked the files being "changed" and they're all already in v1. I pushed twice to move to some older submodule commit and then back to what is currently on v1, but it's still not rendering appropriately.

@rchowell rchowell merged commit ed882f7 into partiql:v1 Jun 11, 2024
7 checks passed
@johnedquinn johnedquinn deleted the v1-eval-value-rebased branch June 11, 2024 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants