-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: Add TableDefinition wrapper for python #5892
feat!: Add TableDefinition wrapper for python #5892
Conversation
Sketch of impl. Manual testing works. Need to add real tests. |
Any use cases to show that the new wrapper is better than a simple new property |
Good Q. The advantage to this approach:
I think we could ask the same question about the java side; why don't we use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor docs comment, but I'm otherwise not opposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the TableDefinition
wrapper needs work, mostly re: ordering.
py/server/deephaven/column.py
Outdated
|
||
j_object_type = _JColumnDefinition | ||
|
||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a constructor that directly wraps an existing _JColumnDefinition
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This includes that case:
Column(j_column_definition=...)
And we use this construction as the basis for TableDefinition as a Mapping[str, Column].
It looks funky because we need to maintain compatibility with the existing constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just agreed on a call to introduce a superclass called ColumnDefinition
, maybe, in order to preserve the existing behavior and also provide the kind of wrapper we want users to use.
py/server/deephaven/column.py
Outdated
|
||
j_object_type = _JColumn | ||
|
||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, this is supported via j_column_definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We said this may not actually be a wrapper. I'm not clear that it's a wrapper around a JColumnDefinition, regardless.
py/server/deephaven/table.py
Outdated
@cached_property | ||
def is_refreshing(self) -> bool: | ||
"""Whether this table is refreshing.""" | ||
if self._is_refreshing is None: | ||
self._is_refreshing = self.j_table.isRefreshing() | ||
return self._is_refreshing | ||
return self.j_table.isRefreshing() | ||
|
||
@property | ||
@cached_property | ||
def is_blink(self) -> bool: | ||
"""Whether this table is a blink table.""" | ||
return _JBlinkTableTools.isBlink(self.j_table) | ||
|
||
@property | ||
@cached_property | ||
def update_graph(self) -> UpdateGraph: | ||
"""The update graph of the table.""" | ||
if self._update_graph is None: | ||
self._update_graph = UpdateGraph(self.j_table.getUpdateGraph()) | ||
return self._update_graph | ||
return UpdateGraph(self.j_table.getUpdateGraph()) | ||
|
||
@property | ||
@cached_property | ||
def is_flat(self) -> bool: | ||
"""Whether this table is guaranteed to be flat, i.e. its row set will be from 0 to number of rows - 1.""" | ||
if self._is_flat is None: | ||
self._is_flat = self.j_table.isFlat() | ||
return self._is_flat | ||
return self.j_table.isFlat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isFlat
is technically mutable, as is isRefreshing
. That said, a Python wrapper should never see a table that hasn't had these set in their final form, so I don't know that it actually matters.
isBlink
is effectively immutable, on the other hand; that is, once you look at it, it can't be changed (like anything else stored in attributes). updateGraph
is final
, also OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a possibility for mutation, I would remove the caching because we'll never remember to make a change here if we do get to a case where mutation happens. I also doubt that these get called enough to make a significant performance difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm convinced the caching is fine. But I also agree that we are likely not calling these anywhere performance critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- some concerns about the use of mutable @cached_property but can accept the trade-off
- there seems to be a small breaking change
py/server/deephaven/column.py
Outdated
@property | ||
def j_column_header(self): | ||
return _JColumnHeader.of(self.name, self.data_type.qst_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't need to worry about the mutability of @cached_property
, should we use that here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like more of an internal property and only used when creating an InputColumn; I don't think we expect users to call j_column_header
? I would be okay removing it or forcing it to start w/ _
, and I think tracing through the callstack, it doesn't make sense to cache it (I don't think we'll ever re-use it).
Where we might conceivably re-use it in the future is if we had a helper from Column -> InputColumn. IE,
my_col : InputColumn = Column("Foo", dtypes.int32).with_data([1, 2, 3])
But at least with the current interfaces, users are always going through InputColumn constructor or helper methods (bool_col
, byte_col
, etc).
def j_object(self) -> jpy.JType: | ||
return self.j_table_definition | ||
|
||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cached_property
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me if we want to cache Table; this would mean that the table would exist until TableDefinition is GCd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur with Devin.
!2, 3.9 /home/runner/work/deephaven-core/deephaven-core/py/server/deephaven/table.py builtin generic type annotation (collections.abc.Iterable[..]) requires !2, 3.9 builtin generic type annotation (collections.abc.Mapping[..]) requires !2, 3.9 Minimum required versions: 3.9 Incompatible versions: 2 Target versions not met: 3.8
@@ -75,7 +73,7 @@ def publish_failure(self, failure: Exception) -> None: | |||
|
|||
def table_publisher( | |||
name: str, | |||
col_defs: Union[Dict[str, DType], List[Column]], | |||
table_definition: TableDefinitionAlias, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change since the arg is renamed. Should we temporarily support the old parameter with a deprecation warning, or should we directly break code? This only ends up being breaking if the user code is using named parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've changed all the relevant param names back.
with self.subTest("from table definition"): | ||
append_only_input_table = input_table(col_defs=col_defs) | ||
append_only_input_table = input_table(col_defs=t.definition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col_defs
is now a misnomer. This is a table definition.- We are inconsistent on if we name the variable as
col_defs
ortable_def
across the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This item needs some attention.
py/server/deephaven/table.py
Outdated
@cached_property | ||
def is_refreshing(self) -> bool: | ||
"""Whether this table is refreshing.""" | ||
if self._is_refreshing is None: | ||
self._is_refreshing = self.j_table.isRefreshing() | ||
return self._is_refreshing | ||
return self.j_table.isRefreshing() | ||
|
||
@property | ||
@cached_property | ||
def is_blink(self) -> bool: | ||
"""Whether this table is a blink table.""" | ||
return _JBlinkTableTools.isBlink(self.j_table) | ||
|
||
@property | ||
@cached_property | ||
def update_graph(self) -> UpdateGraph: | ||
"""The update graph of the table.""" | ||
if self._update_graph is None: | ||
self._update_graph = UpdateGraph(self.j_table.getUpdateGraph()) | ||
return self._update_graph | ||
return UpdateGraph(self.j_table.getUpdateGraph()) | ||
|
||
@property | ||
@cached_property | ||
def is_flat(self) -> bool: | ||
"""Whether this table is guaranteed to be flat, i.e. its row set will be from 0 to number of rows - 1.""" | ||
if self._is_flat is None: | ||
self._is_flat = self.j_table.isFlat() | ||
return self._is_flat | ||
return self.j_table.isFlat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a possibility for mutation, I would remove the caching because we'll never remember to make a change here if we do get to a case where mutation happens. I also doubt that these get called enough to make a significant performance difference.
|
||
self._schema = _td_to_columns(self._definition) | ||
return self._schema | ||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary reason to cache is to prevent jpy java calls. In all the cases where cache_property isn't used, it's either all pure python, or the delegated layer is handling the caching.
"""The column names of the table.""" | ||
return list(self.definition.keys()) | ||
|
||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, assuming you guys get on the same page regarding deprecation/breaking changes, etc.
@@ -327,35 +326,6 @@ def _j_array_to_series(dtype: DType, j_array: jpy.JType, conv_null: bool) -> pd. | |||
|
|||
return s | |||
|
|||
def j_table_definition(table_definition: Union[Dict[str, DType], List[Column], None]) -> Optional[jpy.JType]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this internal before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't have a _
, so it wasn't fully private / internal. It is possible someone used it, so it would be a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added it back with deprecation for removal warning.
@@ -71,7 +71,7 @@ def _validate(inputs: Input, outputs: Output, table: Table): | |||
input_columns_list = [input_.input.getColNames()[i] for input_ in inputs for i in | |||
range(len(input_.input.getColNames()))] | |||
input_columns = set(input_columns_list) | |||
table_columns = {col.name for col in table.columns} | |||
table_columns = set(table.definition.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not order-preserving. Why is Python like this? dict
is ordered, but set
isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous set construction was not order preserving either. Looking at the code, the order doesn't matter, it's just after fast check (and I think intersection / difference).
py/server/deephaven/numpy.py
Outdated
@@ -95,7 +95,7 @@ def to_numpy(table: Table, cols: List[str] = None) -> np.ndarray: | |||
if table.is_refreshing: | |||
table = table.snapshot() | |||
|
|||
col_def_dict = {col.name: col for col in table.columns} | |||
col_def_dict = table.definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline and dump the var?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The renamed the variable, but it's a simpler change to not inline... Looking at the method, I think there is room for further improvements, but I don't want to muck with any of this logic atm.
py/server/deephaven/table.py
Outdated
@cached_property | ||
def is_refreshing(self) -> bool: | ||
"""Whether this table is refreshing.""" | ||
if self._is_refreshing is None: | ||
self._is_refreshing = self.j_table.isRefreshing() | ||
return self._is_refreshing | ||
return self.j_table.isRefreshing() | ||
|
||
@property | ||
@cached_property | ||
def is_blink(self) -> bool: | ||
"""Whether this table is a blink table.""" | ||
return _JBlinkTableTools.isBlink(self.j_table) | ||
|
||
@property | ||
@cached_property | ||
def update_graph(self) -> UpdateGraph: | ||
"""The update graph of the table.""" | ||
if self._update_graph is None: | ||
self._update_graph = UpdateGraph(self.j_table.getUpdateGraph()) | ||
return self._update_graph | ||
return UpdateGraph(self.j_table.getUpdateGraph()) | ||
|
||
@property | ||
@cached_property | ||
def is_flat(self) -> bool: | ||
"""Whether this table is guaranteed to be flat, i.e. its row set will be from 0 to number of rows - 1.""" | ||
if self._is_flat is None: | ||
self._is_flat = self.j_table.isFlat() | ||
return self._is_flat | ||
return self.j_table.isFlat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm convinced the caching is fine. But I also agree that we are likely not calling these anywhere performance critical.
input_data: Any = field(default=None) | ||
|
||
def __post_init__(self): | ||
def j_object(self) -> jpy.JType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, but do a quick check.
return ColumnType(self.j_column_definition.getColumnType()) | ||
|
||
|
||
class Column(ColumnDefinition): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with deprecation with a plan to remove in the next release. Create a ticket for removal and assign it to one of us.
@@ -327,35 +326,6 @@ def _j_array_to_series(dtype: DType, j_array: jpy.JType, conv_null: bool) -> pd. | |||
|
|||
return s | |||
|
|||
def j_table_definition(table_definition: Union[Dict[str, DType], List[Column], None]) -> Optional[jpy.JType]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't have a _
, so it wasn't fully private / internal. It is possible someone used it, so it would be a breaking change.
with self.subTest("from table definition"): | ||
append_only_input_table = input_table(col_defs=col_defs) | ||
append_only_input_table = input_table(col_defs=t.definition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This item needs some attention.
Labels indicate documentation is required. Issues for documentation have been opened: Community: deephaven/deephaven-docs-community#295 |
deephaven.table.TableDefinition
: new python wrapper forio.deephaven.engine.table.TableDefinition
deephaven.column.ColumnDefinition
: new python wrapper forio.deephaven.engine.table.ColumnDefinition
deephaven.table.TableDefinitionLike
: new type alias to allow for consistent public APIs and conversions intodeephaven.table.TableDefinition
deephaven.column.Column
: deprecated for removaldeephaven.jcompat.j_table_definition
: deprecated for removaldeephaven.stream.kafka.consumer.json_spec
:cols_defs
specified asList[Tuple[str, DType]]
deprecated for removalFixes #4822
BREAKING CHANGE:
deephaven.column.InputColumn
no longer inherits fromdeephaven.column.Column
; as such, it no longer exposes Column's attributes. This is unlikely to affect users as InputColumn is really a structure meant to supportnew_table
.