-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Change datatype of column type in BaseColumn to allow larger datatype names for complexed columns #17360
fix: Change datatype of column type in BaseColumn to allow larger datatype names for complexed columns #17360
Conversation
… names for complexed columns
Codecov Report
@@ Coverage Diff @@
## master #17360 +/- ##
==========================================
- Coverage 68.11% 68.07% -0.05%
==========================================
Files 1653 1653
Lines 66374 66374
Branches 7121 7121
==========================================
- Hits 45211 45182 -29
- Misses 19266 19295 +29
Partials 1897 1897
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Should we use a VARCHAR here with a longer char limit instead of TEXT? I think there are performance implications of using TEXT, but i'm not knowledgeable enough of a backend engineer to know for sure |
We could... but how longer? We deal with complexed columns (think many levels of nested elements) and the schema becomes the default type when saving the query as a dataset in sqllab. Others reported similar symptoms in the above issues. But yeah, more than happy to hear from the engineers for the potential side effects of this change. |
how long are we talking? more than 255 characters? more than 1000? sorry, i don't know just how complex this can get |
I'm wondering if we can have two columns for this... store 95% cases in VACHAR with a reasonable limit and use TEXT to store large ENUMs and more advanced structs. Then there can be some helper functions to translate the complex types to more generic types to be used by the UI. |
I have use cases with more than 3000 characters, hard to predict. |
❗ Please consider rebasing your branch to avoid db migration conflicts. |
I know for fact that for Postgres there's no cost in using TEXT instead of VARCHAR, and it might even be faster in some cases. Not sure about MySQL and other DBs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
For reference, in the models for SIP-68 I'm also using TEXT for column types.
It's also my experience that VARCHAR and TEXT have pretty similar performance on all databases I've used. I don't think it will have any performance impact in this case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - just wondering if we should add a note in UPDATING.md
, as this migration will probably may take some time to complete on large deployments?
Thanks to people smarter than I double checking the perf implications. We definitely should have a note in UPDATING though, as this will probably be a table lock on mysql dbs. otherwise, lgtm |
IIRC, there does have some implications on search performance for VARCHAR vs TEXT in MySQL. Search w/ indexing is either not possible or much slower doe TEXT depending on which storage engine you use and which MySQL version you are on. |
Sorry, I'm new to this process, should I write this note? |
Thanks for you input, I appreciate as this is not my expertise, I still need guidance whether or not we should use a VARCHJAR with a higher number or a TEXT for that specific column. |
❗ Please consider rebasing your branch to avoid db migration conflicts. |
Yeah, just add it to the file (https://github.com/apache/superset/blob/master/UPDATING.md) under the next release. |
Looks like with MySQL we can use still Performance-wise there seems to be an extra cost when operating on Since we don't know the maximum expected size of this column I think it's OK to:
|
|
❗ Please consider rebasing your branch to avoid db migration conflicts. |
❗ Please consider rebasing your branch to avoid db migration conflicts. |
Can someone take a look at this PR, some checks didn't pass for obscure reasons but other than that, seems ready to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cccs-joel this looks great, but we need to update the down revision in your migration before merging.
|
||
# revision identifiers, used by Alembic. | ||
revision = "3ba29ecbaac5" | ||
down_revision = "b92d69a6643c" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your tests are failing because your migration is now introducing a second HEAD. You can change the down revision here and in line 20 to abe27eaf93db
(which you can see if you run superset db heads
).
Accept proposed changes. Co-authored-by: Beto Dealmeida <roberto@dealmeida.net>
❗ Please consider rebasing your branch to avoid db migration conflicts. |
SUMMARY
Some columns such as the ones representing a complex structure (array, struct, enum or a combination of these) may require more than 32 chars to store the datatype. Changing datatype to TEXT and no limit was suggested by @villebro in the 1st associated issue listed below.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Before:
After:
TESTING INSTRUCTIONS
After the migration, easiest way to test is to edit an existing dataset and change the type value of a column (using the legacy datasource editor) to something larger than 32 characters, Superset should accept the change and confirm the row was changed in the database.
ADDITIONAL INFORMATION