A new diffusive_utils file without features designed only for refactored hydrofabric #710

Merged
merged 2 commits into from
Dec 8, 2023

Conversation

kumdonoaa

The current diffusive_utils.py includes features that rely on the refactored hydrofabric, which was originally developed to remove short stream segments from the NHDv2.0 hydrofabric of NWMv3.0. The removal was mainly intended to reduce routing compute time and avoid possible numerical instability. Development of the refactored hydrofabric was stopped before it was completed. The NextGen HYfeature hydrofabric now replaces the refactored hydrofabric, so the features tied to the refactored hydrofabric are no longer needed. A cleaned-up diffusive_utils_v02.py has therefore been created for use with both the NHDv2.0 hydrofabric and the HYfeature hydrofabric. More HYfeature features will be added to this file soon.

Additions

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing


@shorvath-noaa shorvath-noaa left a comment


This ran fine for me. While you're updating this file, do you think we should fix this warning message?:

t-route/src/troute-routing/troute/routing/diffusive_utils_v02.py:539: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling "frame.insert" many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use "newframe = frame.copy()"
  usgs_df_complete.insert(i, timestamps[i], -4444.0*np.ones(len(usgs_df)), allow_duplicates=False)

@kumdonoaa

kumdonoaa commented Dec 7, 2023

Yes, Sean. Please give it a try after this PR is merged, if you have time.

@AminTorabi-NOAA

AminTorabi-NOAA commented Dec 7, 2023

I have a fix for it: at line 536, replace the for loop with this:

missing_timestamps = [ts for ts in timestamps if ts not in usgs_df.columns]

missing_data = pd.DataFrame(-4444.0*np.ones((len(usgs_df), len(missing_timestamps))),
                            columns=missing_timestamps,
                            index=usgs_df.index)
usgs_df_complete = pd.concat([usgs_df_complete, missing_data], axis=1)

@shorvath-noaa

@kumdonoaa, @AminTorabi-NOAA's fix looks good. You might want to add another line afterward to make sure the columns are in the correct order, something like:
usgs_df_complete = usgs_df_complete[timestamps]
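
Putting the two suggestions together, a minimal sketch of the vectorized fill followed by the column reordering might look like the following. The gauge IDs, timestamps, and observation values are hypothetical stand-ins, and the real `usgs_df` in diffusive_utils_v02.py may be shaped differently; only the `replace`, `pd.concat`, and `[timestamps]` steps come from this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: four expected timestamps, but observations
# exist for only two of them, and those columns are stored out of order.
timestamps = pd.to_datetime(
    ["2023-12-01 00:00", "2023-12-01 01:00", "2023-12-01 02:00", "2023-12-01 03:00"]
)
usgs_df = pd.DataFrame(
    [[1.5, np.nan], [2.0, 3.0]],
    index=[101, 102],  # hypothetical gauge identifiers
    columns=[timestamps[2], timestamps[0]],
)

# Step already present in the file: replace NaN observations with the sentinel.
usgs_df_complete = usgs_df.replace(np.nan, -4444.0)

# Build every missing column in one DataFrame instead of inserting one by one.
missing_timestamps = [ts for ts in timestamps if ts not in usgs_df.columns]
missing_data = pd.DataFrame(
    -4444.0 * np.ones((len(usgs_df), len(missing_timestamps))),
    columns=missing_timestamps,
    index=usgs_df.index,
)
usgs_df_complete = pd.concat([usgs_df_complete, missing_data], axis=1)

# Restore the column order given by timestamps.
usgs_df_complete = usgs_df_complete[timestamps]
```

Without the last line, the columns created by `pd.concat` are appended after the existing ones, so the frame would not be in chronological order.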


usgs_df_complete = usgs_df.replace(np.nan, -4444.0)

for i in range(len(timestamps)):

The for loop keeps adding columns to the DataFrame one at a time, which slows it down and triggers the warning. Replacing the loop with the following should solve the issue:

        missing_timestamps = [ts for ts in timestamps if ts not in usgs_df.columns]

        missing_data = pd.DataFrame(-4444.0*np.ones((len(usgs_df), len(missing_timestamps))),
                                    columns=missing_timestamps,
                                    index=usgs_df.index)

        usgs_df_complete = pd.concat([usgs_df_complete, missing_data], axis=1)


@kumdonoaa kumdonoaa Dec 8, 2023



Yes. As Sean suggested, you can also add usgs_df_complete = usgs_df_complete[timestamps] after those lines.
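
As an aside, the fill and the reordering could probably be collapsed into a single `DataFrame.reindex` call. This is an alternative that was not proposed in this thread, and the helper name `complete_usgs_frame` and the sample data are hypothetical. Note that `fill_value` only applies to newly created columns, so the existing `replace` step is still needed for NaN values in observed columns.

```python
import numpy as np
import pandas as pd


def complete_usgs_frame(usgs_df, timestamps, fill=-4444.0):
    """Hypothetical helper: return usgs_df with one column per timestamp.

    NaN observations and timestamps with no column are both set to `fill`,
    and the columns follow the order of `timestamps`. Columns that are not
    in `timestamps` are dropped, as with `usgs_df_complete[timestamps]`.
    """
    return usgs_df.replace(np.nan, fill).reindex(columns=timestamps, fill_value=fill)


# Hypothetical stand-in data, for illustration only.
timestamps = pd.to_datetime(
    ["2023-12-01 00:00", "2023-12-01 01:00", "2023-12-01 02:00", "2023-12-01 03:00"]
)
usgs_df = pd.DataFrame(
    [[1.5, np.nan], [2.0, 3.0]],
    index=[101, 102],
    columns=[timestamps[2], timestamps[0]],
)
usgs_df_complete = complete_usgs_frame(usgs_df, timestamps)
```

Whether this is preferable to the `pd.concat` version is a judgment call; it is shorter, but the explicit list of missing timestamps in the suggested fix may be easier to debug.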


@JurgenZach-NOAA JurgenZach-NOAA left a comment


I tested it, and it works.

@kumdonoaa kumdonoaa merged commit 07d511b into NOAA-OWP:master Dec 8, 2023
4 participants