Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve reprs for key Parcels classes #1693

Closed
5 tasks done
erikvansebille opened this issue Sep 6, 2024 · 6 comments · Fixed by #1766
Closed
5 tasks done

Improve reprs for key Parcels classes #1693

erikvansebille opened this issue Sep 6, 2024 · 6 comments · Fixed by #1766
Labels

Comments

@erikvansebille
Copy link
Member

erikvansebille commented Sep 6, 2024

So far, we haven't really worked on the output of print() on objects like Field, FieldSet etc. While there is a ParticleSet.__repr__, even that is pretty clunky and doesn't take into account notebook functionality.

It would be nice to have better print() implementation for at least the following classes

  • Field
  • FieldSet
  • ParticleSet
  • Particle
  • ParticleFile
  • Kernel(?) EDIT Vecko: Out of scope
@rehamansoor
Copy link

Hi,
I’d like to contribute to this issue. Could you please specify how you would like the print() implementation to be improved for the Field, FieldSet, and ParticleSet classes?

@erikvansebille
Copy link
Member Author

Thanks for offering to contribute to this Issue, @rehamansoor! I don't have a very clear idea what to print in mind yet, but was inspired by the html-version __repr__ in xarray (see e.g. pydata/xarray#1627). So a table-like/collapsable view that shows the attributes of these objects that are relevant to users

Since a FieldSet is a collection of Fields, I suggest we start with a Field.__repr__? This should include information on the name, grid, dimension sizes, interpolation_method etc.
@VeckoTheGecko, since you are working on redefining the API, perhaps you also have ideas what attributes to show in a notebook?

@VeckoTheGecko
Copy link
Contributor

VeckoTheGecko commented Sep 10, 2024

Hi @rehamansoor, great to hear that you’re interested in this issue.

Before the wall of text that is about to follow, please let me know if you have any questions about anything below, or if you'd like any clarification. I'm not sure what your Python level is so - if you feel like this is a bit much for where you're at - let me know (there are issues in the codebase that are better suited to new Python developers 😁). I imagine these won't all be fixed in one PR, or by the same person.

Useful reference

TLDR: I think we shouldn't do HTML outputs for now. Let's focus on text reprs

HTML repr outputs would be amazing in theory, but after doing some diving into the implementation in xarray (see entrypoint ds._repr_html_() for the curious) their implementation is complex requiring custom CSS (vendored in the package itself in a static folder), HTML generation (including escaping characters to produce valid HTML), and they even use custom decorators to drive the reprs.

import xarray as xr
import numpy as np
import pandas as pd

ds = xr.Dataset(
    data_vars=dict(
        temperature=(["loc", "time"], np.random.randn(2, 4)),
        precipitation=(["loc", "time"], 10 * np.random.rand(2, 4)),
    ),
    coords=dict(
        lat=("loc", [42.25, 42.21]),
        time=pd.date_range("2014-09-06", periods=4),
        reference_time=pd.Timestamp("2014-09-05"),
    ),
    attrs=dict(description="Weather related data."),
)

print(ds._repr_html_())

I think there is quite some room for improvement in our text reprs (we'll need them anyway for scripts) and down the line see if there is a need for HTML ones as well (/investigate how we can better customise iPython notebooks output). At the moment the apparent complexity/maintenance burden makes me hesitate. Note customising iPython notebook reprs is easily done with the _repr_html_() method.

Starting with the smallest objects to the largest, this is the information (self.values) that would be good to have in the repr for each:

  • Field

    • self. information to include
      • grid
      • mesh
      • allow_time_extrapolation
      • time_periodic
      • gridindexingtype
      • to_write
    • notes: pretty formatting with indentation
  • VectorField

    • self. information to include
      • name
      • U
      • V
      • W
    • notes: pretty formatting with indentation
  • FieldSet

    • information to include
      • Fields (call .get_fields()). To be formatted as a list going down
  • ParticleFile

    • self. information to include
      • name
      • outputdt
      • chunks
      • create_new_zarrfile
      • (intentionally left out particleset)
    • notes: have a simple oneline repr (e.g., ParticleFile(name='my_name', outputdt='...)
  • ParticleSet

    • self. information to include
      • fieldset
      • pclass
      • repeatdt
    • Other information to include:
      • number of particles: do len(self)
      • particle information (only first x and last y particles):
        • Currently the __repr__ print out all particles, which isn’t really useful in the case of thousands of particles (which occurs frequently). Current implementation is return "\n".join([str(p) for p in self]) which can be adapted accordingly. Choose sensible defaults for x and y
    • notes: pretty formatting with indentation
  • Kernel

    • Hmmmm, not sure for the timebeing. No changes needed
  • Particle and Variable

    • @erikvansebille I saw some reprs of the format f"PType<{self.name}>::{self.variables}". Does Parcels dynamically evaluate these in the C-code? Are these worth avoiding changes to?

I would recommend working on the simple reprs first. Also at this stage we are not too concerned about user customising the repr output (like the set_options() class in xarray options.py allowing to set max_width), but having a sensible output of 80 should be good. Please rely on formatting helper functions to reduce code duplication, you can put them in tools/_formatting.py.

This isn't the be-all and end-all for the reprs. We'll be reworking Parcels internals over the coming months cleaning things up. But having a starting point to go from would be super helpfull.

If there are numpy arrays in the repr, please use the context manager with np.printoptions(threshold=5, suppress=True, linewidth=120): or some other sensible defaults (see docs) so that the output is managable.

@rehamansoor
Copy link

Thank you for the detailed breakdown, and I apologize for the delay in replying—I got a bit busy with school. I really appreciate the opportunity to contribute. While I have experience in Python through my college courses, I think these tasks might be a bit complex for me at the moment. Could you suggest an issue that might be a better fit for my current skill level? I’m eager to help and continue improving my abilities.

Thanks again, and I look forward to contributing!

@VeckoTheGecko
Copy link
Contributor

VeckoTheGecko commented Sep 30, 2024

Hi @rehamansoor . All good! We do have a few issues that would be better suited:

In #1620 with have the subtask Remove erroneous calls to os.path in cleanup_remove_files() and clean up function. This is a purely Pythonic task and, once you understand that section of code, should be quite straightforward. We also have #1701 which is just documenting adding metadata to the particlefile output (for example an entry "contact_email": "email@example.com") . There is also #1511 which is more specific to Parcels itself

@VeckoTheGecko
Copy link
Contributor

  • @erikvansebille I saw some reprs of the format f"PType<{self.name}>::{self.variables}". Does Parcels dynamically evaluate these in the C-code? Are these worth avoiding changes to?

bump

@VeckoTheGecko VeckoTheGecko changed the title Improve print for key Parcels classes Improve reprs for key Parcels classes Oct 24, 2024
VeckoTheGecko added a commit that referenced this issue Oct 24, 2024
VeckoTheGecko added a commit that referenced this issue Oct 25, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Done in Parcels development Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants