-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Issue 13956 #13980
fix: Issue 13956 #13980
Conversation
a16244a
to
7d7d16e
Compare
Codecov Report
@@ Coverage Diff @@
## master #13980 +/- ##
==========================================
- Coverage 79.40% 79.30% -0.11%
==========================================
Files 938 938
Lines 47541 47523 -18
Branches 5940 5940
==========================================
- Hits 37749 37687 -62
- Misses 9666 9710 +44
Partials 126 126
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
7d7d16e
to
2dcb5cf
Compare
|
||
|
||
def downgrade(): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we impl a down migration here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing to downgrade to, i.e., the upgrade removes all the key and associated values from the extra blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but if one needed to revert, would that even make sense to put the keys back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@craig-rueda the key is optional and thus unnecessary to re-add if a downgrade was performed.
DATASET_HEALTH_CHECK = None | ||
# A SQL dataset health check. Note if enabled the callable should be cached to aid with | ||
# performance. | ||
DATASET_HEALTH_CHECK: Optional[Callable[["SqlaTable"], str]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably be a BaseDatasource
not a SqlaTable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@etr2460 this functionality is only provided for the SQLA connector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice do add an health check callable example here, or add it to the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dpgaspar I added an example.
check = config["DATASET_HEALTH_CHECK"] | ||
return check(self) if check else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's more pythonic? this or:
check = config["DATASET_HEALTH_CHECK"] | |
return check(self) if check else None | |
return (config["DATASET_HEALTH_CHECK"] or lambda: None)(self) |
is this even valid python? 🤷
superset/config.py
Outdated
# It will get executed each time when user open a chart's explore view. | ||
DATASET_HEALTH_CHECK = None | ||
# A SQL dataset health check. Note if enabled the callable should be cached to aid with | ||
# performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If the healthy status is in
cache
but not persist in database, when restart server will we lost all the status? - right now we can run some queries to see healthy status. if use
cache
solution we will not be easy to know this number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The Redis (or similar) cache rarely is restarted.
- It's definitely less clear though possible via a Redis (or similar) query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution makes sense. I guess the lesson is there should be less implicit side-effects.
if config.get("DATASET_HEALTH_CHECK") | ||
else None | ||
) | ||
data_["health_check_message"] = self.health_check_message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data_["health_check_message"] = self.health_check_message | |
data_["health_check_message"] = self.check_health() |
If this will run an external function every time this line is visited, maybe it's better to change this property to an actual function call so it "looks" more expensive...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ktmud I thought about that. I think we could also just use @functools.lru_cache
for the property so at least the property is cached locally.
@dpgaspar Do you have any suggestion for this issue and fix? When a user open explore view to see a chart, we want to add a health check for the sql in the datasource, and want to save the check results into dataset |
Removing the |
@robdiciuccio I can break the migration into a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment to improve docs
DATASET_HEALTH_CHECK = None | ||
# A SQL dataset health check. Note if enabled the callable should be cached to aid with | ||
# performance. | ||
DATASET_HEALTH_CHECK: Optional[Callable[["SqlaTable"], str]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice do add an health check callable example here, or add it to the docs
On second thought, I'm still a little torn between having Superset manage the cache vs shifting the responsibility to the end users. IMO the config file should be as lightweight as possible. The more logics we force users to put in there, the more likely things would break. I'm wondering whether we can utilize sqla's |
I haven't tested it but explicitly setting the changed_by_fk on a separated merge maybe? |
@ktmud I prefer the proposed approach as it provides the user with a lot more flexibility and per the PR description it means that the health check can relate to the actual underlying data and just not the datasource metadata. Squeezing cached datasource health check information into the datasource record (the current approach) has pros and cons. Additionally by defining a caching strategy one can leverage Flask-Caching to clear the cache whenever the function byte-code changes and thus there's no longer a requirement for the user to ensure that they'll need to bump the version. Finally in terms of making the config as light as possible, this hasn't been the case in the past. Superset historically has provided endless flexibility (via callbacks etc.) in the configs which places the ownership on the user which can be problematic if implemented incorrectly, but the upside can be tremendous. |
db migration can only fix the old records right? right now the behavior is: when user opens explore view, health check function will add some message in the "extra" column, and the dataset's changed_by_fk will be changed to chart viewer. |
b009f90
to
9ab9ca8
Compare
Actually @robdiciuccio reading through SIP-57 this change is non-breaking, i.e., the functionality remains unchanged, though performance could be degraded if the callback function is expensive and non-cached. |
e769fec
to
6164d19
Compare
6db8674
to
f5bbaf9
Compare
@dpgaspar @etr2460 @graceguo-supercat @ktmud et al. this is now ready for re-review. Note although this fix is not perfect it does:
|
df92188
to
9a7a7fe
Compare
# | ||
# if cache_manager.cache.get(name) != code: | ||
# cache_manager.cache.delete_memoized(func) | ||
# cache_manager.cache.set(name, code, timeout=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice example!
superset/views/core.py
Outdated
@@ -734,7 +734,7 @@ def explore( # pylint: disable=too-many-locals,too-many-return-statements,too-m | |||
|
|||
# if feature enabled, run some health check rules for sqla datasource | |||
if hasattr(datasource, "health_check"): | |||
datasource.health_check() | |||
datasource.health_check_message = datasource.health_check() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this override the default @property
getter or just update the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ktmud good call. Lines 735–737 can actually be removed.
superset/views/core.py
Outdated
@@ -734,7 +734,7 @@ def explore( # pylint: disable=too-many-locals,too-many-return-statements,too-m | |||
|
|||
# if feature enabled, run some health check rules for sqla datasource | |||
if hasattr(datasource, "health_check"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be always false because you remove the health_check
method/attribute from SqlaDatasource
?
session = db.Session(bind=bind) | ||
|
||
for datasource in session.query(SqlaTable): | ||
if datasource.extra: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if datasource.extra: | |
if datasource.extra and "health_check" in datasource.extra: |
This should limit following ops to a smaller subset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is risky as your checking for the existence of a substring in a JSON encoded string rather than the existence of a key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is. It at least saves json.loads
op for datasources without health_check info, but can be guaranteed to be correct as long as you keep all following steps.
To really save some database IO, we can do the pre-filtering in the SQLA query step:
for datasource in session.query(SqlaTable).filter(
SqlaTable.extra.like('%"health_check"%')
):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Granted the downstream logic ensures that the string is actually from a JSON key (I misspoke earlier). That said this migration only takes a few seconds and thus I think the existing logic is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me as well. Just thought it was a good practice to always do pre-filtering when running db migrations.
3d4b74a
to
cc89ef5
Compare
cc89ef5
to
57f4d74
Compare
Co-authored-by: John Bodley <john.bodley@airbnb.com> (cherry picked from commit a3b41e2)
Co-authored-by: John Bodley <john.bodley@airbnb.com>
Co-authored-by: John Bodley <john.bodley@airbnb.com>
SUMMARY
Fixes #13956. More specifically the issue with "caching" the the dataset health check message (#11970) on the associated datasource record was that if the callback was defined, the first time a user when to explorer the record was updated with the callback response where the user was defined as the updater.
Additionally tying the health check solely to the metadata of the datasource is likely too restrictive, i.e., it seems plausible that an institution may want to have custom logic which actually introspects the underlying data. Hence the fix was to remove the caching on the datasource record and instead place on ownership on the institution to write the necessary logic for memoizing the callback (which includes the specific datasource as an argument).
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TEST PLAN
CI.
ADDITIONAL INFORMATION