-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A new diffusive_utils file without features designed only for refactored hydrofabric #710
A new diffusive_utils file without features designed only for refactored hydrofabric #710
Conversation
…drofabric found in diffusive_utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ran fine for me. While you're updating this file, do you think we should fix this warning message?:
t-route/src/troute-routing/troute/routing/diffusive_utils_v02.py:539: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling "frame.insert" many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use "newframe = frame.copy()" usgs_df_complete.insert(i, timestamps[i], -4444.0*np.ones(len(usgs_df)), allow_duplicates=False)
Yes. Sean. please take a try if you have time after this PR is merged, I think. |
I have a fix for it: In line 536 instead of for loop replace it with this
|
@kumdonoaa , @AminTorabi-NOAA 's fix looks good. Might want to add another line after to make sure columns are in the correct order. Something like: |
|
||
usgs_df_complete = usgs_df.replace(np.nan, -4444.0) | ||
|
||
for i in range(len(timestamps)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of for loop because we keep adding column to dataframe it slow it down and give warning. This should solve the issue
missing_timestamps = [ts for ts in timestamps if ts not in usgs_df.columns]
missing_data = pd.DataFrame(-4444.0*np.ones((len(usgs_df), len(missing_timestamps))),
columns=missing_timestamps,
index=usgs_df.index)
usgs_df_complete = pd.concat([usgs_df_complete, missing_data], axis=1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AminTorabi-NOAA @shorvath-noaa Do these lines replace existing lines 537~539?:
https://github.com/kumdonoaa/t-route/blob/4c56cc615b49f6cc75ede3c6b8879689940bee9a/src/troute-routing/troute/routing/diffusive_utils_v02.py#L537-L539
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. As Sean suggested you can add usgs_df_complete = usgs_df_complete[timestamps]
at the end of those lines too
…eature only with syn x-sec for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it, and it works.
The current diffusive_utils.py includes features utilizing refactored hydrofabric that was originally developed for removing short stream segments of NHDv2.0 hydrofabric of NWMv3.0. The removal was mainly for reducing routing compute time and possible numerical instability. The development of refactored hydrofabric wasn't completed and was stopped. Now the nextgen HYfeature hydrofabric replaces the refactored hydrofabric, so features related to the refactored hydrofabric aren't needed any more. So the cleaned diffusive_utils_v02.py is created to be used for both NHDv2.0 hydrofabric and HYfeature hydrofabric. More features to come in this file for HYfeature pretty soon.
Additions
Removals
Changes
Testing
Screenshots
Notes
Todos
Checklist
Testing checklist
Target Environment support
Accessibility
Other