-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Or property filtering API #8012
Conversation
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
If we add OR filters, this leaves a few of the current assumptions we make in ambiguous states.
|
@@ -11,7 +11,7 @@ | |||
|
|||
class PropertyMixin(BaseParamMixin): | |||
@cached_property | |||
def properties(self) -> List[Property]: | |||
def properties(self) -> Union[List[Property], List[List[Property]]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR will come down to typing making sense IMO.
Let's rename current properties to something like VanillaProperty
.
I think List[AnyProperty]
is the proper value here, with AnyProperty = Union[VanillaProperty, OrProperty]
OrProperty should be a separate class from the property containing neccessary values.
How I imagine the class will look like:
class OrProperty:
type = 'or'
def __init__(properties: List[AnyProperty]): # ...
From the frontend you'd pass a payload like properties: [{ type: 'or', properties: [...] }]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm i see, will give that a shot when i loop back around to this
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
(some context below to familiarise with the problem, more a stream of consciousness as I discover/remember high school boolean logic. I could simply write the final conclusion I arrive on, but that misses valuable context & tradeoffs imo, so bear with me) I think this still isn't flexible enough for all use cases? We have enough feedback here to know what all to support (pending UX stuff, which might indeed change this decision, but there's still an argument to make around making the backend as flexible as possible, such that all future iterations on OR property filtering are frontend/UX bound only) What we currently have is a default AND between members of a list. A terrible way to introduce ORs is to keep a single list with a property type per property, like:
The fundamental problem is that The way around this, if we want to preserve the existing data model is to only allow everything to be AND'ed or OR'ed together. This sounds not-flexible-enough to me. One step above this is the current solution in this PR: We have two lists, the outer list is implicit AND, and inner lists are implicit OR:
This is thankfully deterministic, and reasonably flexible, but misses this entire other class of combinations: Outer ORs with inner ANDs. We can convert the above into this by using De Morgan's Law, and the NOT operator. But going down this route doesn't seem reasonable to me: It adds a lot more complexity around ensuring transformations are correct, figuring out Negation of properties, and also raises the question of where and how this happens (how does UI tell us this, and if the UI is general enough, there's no good reason for the backend to restrict itself to a constrained data model) But, there's another class of property filtering this can't(?) account for. Something like: At this point, worth mentioning(remembering) a very important theorem from boolean logic: Any boolean function may be written using only two levels of logic and possible negation of variables. What this means is that the conjunctive normal form, or the representation we have in this PR, is good enough to represent any boolean operation on any number of operands.
The crux for the data model then is such: AND of ORs is good enough, if we're okay with running transformations on what users input. But, the user input is also in our control, so what if the UI restricts filters to only AND of ORs, and negations? This is technically enough, but here's the contention: It's terrible UX for users to do these transformations themselves. "You want to see a trend line of domain A, B, and D only for country C?" "Yeah, 100% possible, you just need to insert: Not of Not domain A or Not country C, AND, not domain B or not country C, AND not domain D or not Country C; et viola!". I think users shouldn't have to do this. Which begs the question, do we have to do this internally? What if we let users say whatever, and instead of simplifying it, just parse out the SQL as it comes out? Our internal representation then is something like an AST: ORs and ANDs of whatever properties users select, created into a tree, and parsed to spit out an SQL that works, sans clever mods. (I don't know yet what implications this has for query running time, or if it's better/worse than CNF form queries) Another option is to allow both CNF and DNF forms (AND of ORs, and ORs of ANDs), which might help make a bit more intuitive sense without allowing any deeply nested combos, and still be generic enough for most use cases. For the example above, this means we stop before deMorgan's, at Opening this up to discussion, I can be convinced that the premise is wrong and we don't want to have as flexible as this APIs, i.e. either CNF or DNF is good enough & we don't need both / something more generic like AST. For @clarkus : UX wise, the theorem implies that you can do whatever you want with 2 levels of logical operators only. I think it's reasonable to restrict it to 2 levels. |
Here is the latest idea for balancing simplicity of composing filters with the power of 2 levels of logical operators. The component here uses groups with group-level operators that apply to all conditions. These are and / or groups. When multiple groups exist, we'll expose a higher level control that allows for matching all / any for the entire set of filters. This should allow us to meet all the demonstrated cases so far. |
🎉 yupp, this is what I was trying to get at with this PR because I was going to just draw a line that it would be sufficient but your following points are 100% correct I think @macobo's point was onto something though. Just have an (AND, OR) along with a group of properties. Pair this with @clarkus's switchers and some method for creating groups 1 layer deep (just an ability to +property within a group or a new group) and you should have all the abilities listed above. The only pattern missing is adding a negation on an entire group but this would be covered per demorgans law with the switcher EDIT: just was writing this as the previous comment got posted too. |
Makes sense, this appears the best of both worlds to me so far as well! Thought it made sense to line out all usecases + tradeoffs here. And yep, AST is overkill, and what Karl mentioned sounds reasonable enough:
|
Latest updates to improve affordance for operators. This also demonstrates the workflow for the updated component. You'll note that last group has different icon actions. I was wondering if there are any other group level utility actions for a filter group. We have duplication and removal now, but is there anything else that could add value? Also of note is that we need a default for the match any / all control at the global level. I would default towards being more inclusive (any). Please share thoughts here or in figma. |
ee/clickhouse/models/property.py
Outdated
params.update(filter_params) | ||
|
||
return " ".join(final), params | ||
# TODO: clean | ||
joined = f"AND ({' '.join(final).replace(property_operator, '', 1)})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically want to AND whatever comes out of this function from now on, so other filters, like team_id & datetimes etc. etc. can be ANDed as expected.
Earlier this wasn't a problem since everything was ANDed anyway
ee/clickhouse/models/property.py
Outdated
@@ -83,17 +175,19 @@ def parse_prop_clauses( | |||
"{}person".format(prepend), | |||
prop_var="person_props" if is_direct_query else "properties", | |||
allow_denormalized_props=allow_denormalized_props and is_direct_query, | |||
property_operator="AND", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super tricky. This is a solo condition that's always pushed into a subquery so it needs to be AND. It's unrelated from the larger logic for the overall condition it's forming (addressed in test_parse_groups_persons
)
Made some more progress on the grouping logic + tests. It should be able to handle any depth, not that it's necessary. TODO:
|
Thanks! I've been doing the filter parsing today. Not quite sure yet how to combine things (and whether to use properties & Not yet sure either about the migration either: Do we want to do this change on the DB, or can we perhaps make |
(oh and moved a few things around to prevent circular imports). Test currently fail because we removed that first AND from parse_prop_clauses , which is ok. I'm also wondering about scope with the switch-over: This PR would become absolutely massive if we try and do everything in here. This is attracting me towards the partial approach: Make some things support the new stuff, and transform old props into new props for that piece of code. Break up the PRs. Continue. New insights being modded save properties as new type(? not sure how complicated this gets to split, idk yet). And then, once we know this works well, decide to migrate old props to new. |
) | ||
if is_direct_query: | ||
final.append(filter_query) | ||
params.update(filter_params) | ||
else: | ||
# Subquery filter here always should be blank as it's the first | ||
filter_query = filter_query.replace(property_operator, "", 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@neilkakkar I changed the above "AND" back to the variable so it's less confusing and made it remove the operator. GET_DISTINCT_IDS_BY_PROPERTY_SQL
doesn't need an operator because i removed the dummy 1 = 1
I think this is ready for a lookover to ship @neilkakkar. since we don't need a migration, this should be able to support a new frontend directly. But, we need to do some checks on how the caching logic works to make sure everything is accounted for |
Thanks for fixing this bit^! Clever, to not worry about generating the entire valid test case, and focusing only on the direct_query part. Fix looks reasonable to me. Tested out running cache update, dashboards and saved insight refreshes, everything seems to work okay. Only new change I made: renaming Agreed though, that we'll need new checks and tests for updating caching, etc. when we start saving new filters with |
This issue has 2004 words at 15 comments. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:
Is this issue intended to be sprawling? Consider adding label |
Further, since we're making no db changes, this change, if things do go wrong, should be very easy to rollback. |
Actually, just reminded of one thing (which you rightly mentioned 15 days ago): How does this now work with simplifying filters? It should be fine for now since default properties are ANDed, but when frontend starts sending |
The cool thing about our recursive structure now is that we could easily just add another nesting so that entity properties and test account properties are an AND group with the rest of the |
That's neat! I think I'm in favor of leaving it as an |
Yep, makes sense! Just meant to note that this behaviour isn't implemented here, but something we definitely would need to do before releasing groups. This will probs change lots of simplify_filters logic. Oh, also, a third place to keep in mind: |
Definitely! I think the concern is that this can be confusing (just simply ANDing) when filters also have ORs in them. So need a way to tell users that test account filters are alwayss ANDed. |
|
||
|
||
def is_property_group(group: Union[Property, "PropertyGroup"]): | ||
if isinstance(group, PropertyGroup): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return isinstance(group, PropertyGroup)
Or even inline?
return final, final_params | ||
|
||
|
||
def is_property_group(group: Union[Property, "PropertyGroup"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a smell in that PropertyGroup just wants to be another kind of Property rather than it's own entity.
I suggested this in my initial review - not sure why a different approach was chosen but I strongly disagree with it.
if len(property_group.groups) == 0: | ||
return "", {} | ||
|
||
if isinstance(property_group.groups[0], PropertyGroup): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious why this is correct - why can you only send one PropertyGroup at a time?
This isn't verified anywhere -> leads to unexpected behavior. Note that these APIs are not completely private and will be more documented and public long-term
At a minimum, an assert should be here.
This PR is pretty broken:
I also think data structures need improvements - typing-wise propertygroups should just be a different kind of properties rather than it's own object. This makes it easier to make sure that all code consuming property objects is correct and handles every case. The current implementation extends the API of filter objects, making it easy to miss cases like the above and introduce bugs long-term. Given how far and wide |
Changes
Please describe.
add support for or filtering
OR properties filtering #5035
This PR doesn't change how the frontend needs to be structured. It sets up the logic to handle grouped filters but is compatible with the properties array as is. Follow up PRs will create migrations such that filters are stored in the new format
Current property payload:
All properties in the array are AND'd by default
Proposed structure:
There will be a group operator that determines the relationship between properties within the list.
property_groups
is a new accepted filter parameter. When theproperties
field is empty,property_groups
will be parsedHow did you test this code?
Please describe.
-add tests