-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
10 add support for condominium sale flagging #13
10 add support for condominium sale flagging #13
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall good work @wagnerlmichael! My main concern here is that there's already a bit of code duplication across the various flagging scripts, and this PR really only adds to that. I would refactor a little bit to try to reduce some of the larger duplicate bits. Hopefully Jean's concurrent work on #17 will let us reduce some of the duplicate code as well.
See my comments for necessary changes, but don't get hung up on them. Would rather get this merged and then fixed up later than sit on this PR for ages.
When you merge this, let's keep the merge history (rather than squashing) since this is a pretty big change.
As a general aside, this is a pretty large PR and isn't super easy to review. In the future, try to:
- Break up something like this into a series of smaller PRs
- Add PR comments for context on any thing that might be confusing
f"sv_price_per_sqft_deviation_{group_string}" | ||
).get(key) | ||
sq_lower, sq_upper = sq_std_range | ||
if condos == True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (non-blocking): To the extent possible, I would try to control flow only the parts of this function that depend on having square footage. There's a lot of duplicate code here since you're basically making two copies of the same function with a very marginal difference between them. This also applies in other places i.e. pricing_info()
. However, could be that I'm missing a constraint that makes such refactoring unwieldy!
if condos == True: | ||
conditions = [ | ||
(df["sv_short_owner"] == "Short-term owner") | ||
& (df["sv_pricing"].str.contains("High")), | ||
(df["sv_name_match"] != "No match") | ||
& (df["sv_pricing"].str.contains("High")), | ||
(df["sv_transaction_type"] == "legal_entity-legal_entity") | ||
& (df["sv_pricing"].str.contains("High")), | ||
(df["sv_anomaly"] == "Outlier") & (df["sv_pricing"].str.contains("High")), | ||
(df["sv_pricing"].str.contains("High price swing")), | ||
(df["sv_pricing"].str.contains("High")), | ||
(df["sv_short_owner"] == "Short-term owner") | ||
& (df["sv_pricing"].str.contains("Low")), | ||
(df["sv_name_match"] != "No match") | ||
& (df["sv_pricing"].str.contains("Low")), | ||
(df["sv_transaction_type"] == "legal_entity-legal_entity") | ||
& (df["sv_pricing"].str.contains("Low")), | ||
(df["sv_anomaly"] == "Outlier") & (df["sv_pricing"].str.contains("Low")), | ||
(df["sv_pricing"].str.contains("Low price swing")), | ||
(df["sv_pricing"].str.contains("Low")), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Again, try to reduce the duplication of code here. I would define the set of conditions and labels shared by both res and condos, then just use .insert()
to add list elements for condos.
|
||
df_condo_flagged = go( | ||
df=df_condo_to_flag, | ||
groups=tuple(stat_groups_list), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: We actually don't want condos to use class as a grouping variable, as there are very few 297s or 399s. Instead, all condo classes should be considered one class, then partitioned/grouped by township. Let's table this for now and break it out into a separate issue so as not to block this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know about this. I'm happy to make an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wagnerlmichael Please do make a separate issue for this.
@@ -173,7 +172,7 @@ def pricing_info( | |||
) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was able to refactor pricing_info()
a bit. price_column()
was giving me trouble so I skipped it for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup @wagnerlmichael! Really condensed the code quite a lot this round. See my minor comments/suggestions. Otherwise, this looks good to go.
if not condos: | ||
df = deviation_dollars(df, groups) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Can you collapse this into a single conditional with line 399?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it seems like this needs to be run in this order for proper column name creation.
…sale-flagging 10 add support for condominium sale flagging
First pass at condo inclusion. Largely incorporated from Billy's changed to the flagging code here with boolean switches.
Everything seems to run well, tested
initial_flagging.py
,manual_update.py
, and the glue job.