Tabulator: custom datetime sorter #3839
Merged
As already pointed out before, Tabulator JS defines its own sorters, which can behave differently from `df.sort_values`. The two sorters gave different results when applied to datetime columns that contain NaN values. This causes issues with the index returned by edit/click events, since that index is wrong whenever the two sorters disagree on the row order.

Datetime columns are actually serialized as UNIX timestamps by Bokeh, and NaN values as -9223372036854776 (see bokeh/bokeh#10448). Tabulator JS therefore ended up using its built-in number sorter for this kind of column, which puts the NaN values first when sorting in ascending order. The default behavior of `df.sort_values` is to put NaNs at the end (see https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html#pandas.Series.sort_values). This is where the two sorting algorithms diverged.

Pandas' `df.sort_values` has the `na_position` parameter, which can be set to `'first'`, but it applies to all the columns and so affects all the dtypes, which I didn't want to do for now. Instead, this PR adds a custom sorter for datetime objects that always puts the NaNs at the end, as `df.sort_values` does.
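
To illustrate the divergence described above, here is a minimal sketch in plain Python. It is not the code added by this PR: the helper names `number_sort` and `datetime_sort` and the sample timestamps are hypothetical, and only the sentinel value is taken from the description. It assumes timestamps in milliseconds and mimics the ordering rule only (missing values last in both directions), not Tabulator's sorter API.

```python
# Value Bokeh emits for NaN datetimes, as quoted in the description above.
NAT_SENTINEL = -9223372036854776


def number_sort(values, ascending=True):
    """Plain numeric sort: the sentinel is treated as a very small number."""
    return sorted(values, reverse=not ascending)


def datetime_sort(values, ascending=True):
    """Sort timestamps but always keep the sentinel at the end,
    similar to pandas' default na_position='last'."""
    valid = sorted((v for v in values if v != NAT_SENTINEL), reverse=not ascending)
    missing = [v for v in values if v == NAT_SENTINEL]
    return valid + missing


# Hypothetical sample: 2020-01-01, a missing value, 2019-01-01 (ms since epoch).
timestamps = [1577836800000, NAT_SENTINEL, 1546300800000]

print(number_sort(timestamps))                     # sentinel comes first
print(datetime_sort(timestamps))                   # sentinel comes last
print(datetime_sort(timestamps, ascending=False))  # sentinel still last
```

With a plain numeric sort the sentinel lands at the top in ascending order, whereas the datetime-aware variant keeps it at the bottom in both directions, which is the ordering `df.sort_values` produces by default.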