Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor property fitting interface #4471

Merged
merged 97 commits into from
Dec 25, 2024

Conversation

Chengqian-Zhang
Copy link
Collaborator

@Chengqian-Zhang Chengqian-Zhang commented Dec 13, 2024

Two main changes:

  1. The program can read data files in npy format with any prefix(tc.npy, band_gap.npy...). One just needs to write the name of the property and the corresponding dimension in model/fitting in input.json.
  2. Data normalisation has been added to the program. Specifically, the mean and standard deviation of the properties are calculated when calculating the stat, the output of fitting_net is multiplied by the standard deviation and then added to the mean to get the output of the PropertyModel, and when calculating the loss, the loss is again normalised.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new parameters for property loss calculations and model fitting, enhancing flexibility and functionality.
    • Added methods for retrieving property names and checking property intensity across various classes.
  • Bug Fixes

    • Improved validation and error handling for property-related calculations.
  • Documentation

    • Enhanced documentation for property fitting, including detailed parameter explanations and structured training examples.
    • Added documentation for new parameters in the JSON configuration related to property fitting.
  • Tests

    • Added new test cases to validate the functionality of updated methods and properties.
    • Updated existing tests to utilize dynamic property names instead of hardcoded strings.
  • Chores

    • Updated configuration files and test data to align with new property handling features.

@Chengqian-Zhang Chengqian-Zhang marked this pull request as draft December 13, 2024 10:51
Copy link
Contributor

coderabbitai bot commented Dec 13, 2024

📝 Walkthrough

Walkthrough

This pull request introduces a comprehensive enhancement to the DeepMD-kit's property fitting functionality. The changes primarily focus on introducing a dynamic property_name parameter across multiple components of the model, replacing hardcoded "property" references. The modifications enable more flexible property handling, support for intensive and extensive properties, and improved statistical computations. Key changes include updating model initialization, loss calculations, output statistics, and documentation to accommodate these new capabilities.

Changes

File Change Summary
deepmd/pt/loss/property.py Added var_name, out_bias, out_std, and intensive parameters to PropertyLoss class; updated forward method for new parameters.
deepmd/pt/model/task/property.py Introduced property_name and get_intensive method in PropertyFittingNet; updated constructor logic.
deepmd/utils/stat.py Added compute_stats_do_not_distinguish_types function for computing element-independent statistics; modified existing functions to accommodate new parameters.
deepmd/utils/argcheck.py Removed bias_method, added property_name and property_dim parameters to fitting_property and loss_property functions.
deepmd/pt/model/atomic_model/property_atomic_model.py Added get_compute_stats_distinguish_types and get_intensive methods; modified apply_out_stat logic.
deepmd/dpmodel/fitting/property_fitting.py Updated PropertyFittingNet constructor to include property_name and removed bias_method.
Multiple test files Updated configurations to use new property_name parameter; modified tests to reflect changes in property handling.
Documentation files Added comprehensive documentation for property fitting, including new JSON structures and parameter explanations.

Possibly related PRs

Suggested reviewers

  • njzjz
  • wanghan-iapcm
  • iProzd

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
deepmd/pt/utils/stat.py (1)

506-507: Simplify dictionary membership checks

Use direct dictionary membership tests for better readability and efficiency.

Apply this diff to simplify the code:

-                assert ii in bias_atom_e.keys()
-                assert ii in std_atom_e.keys()
+                assert ii in bias_atom_e
+                assert ii in std_atom_e
🧰 Tools
🪛 Ruff (0.8.2)

506-506: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


507-507: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

deepmd/utils/argcheck.py (1)

1584-1584: Provide documentation for doc_property_name

The placeholder "TODO" indicates missing documentation. Please provide a meaningful description for doc_property_name to enhance clarity.

Apply this diff to address the issue:

-        doc_property_name = "TODO"
+        doc_property_name = "The name(s) of the property(ies), which should correspond to the properties in the dataset."
deepmd/pt/model/task/property.py (1)

131-131: Consider documenting the sub_var_name parameter

The sub_var_name parameter is being passed to OutputVariableDef, but its purpose and expected values aren't documented in the class docstring.

Add documentation for the new parameter:

    """Fitting the rotationally invariant properties of `task_dim` of the system.

    Parameters
    ----------
+   property_name : str|list
+       Name of the property or list of property names.
    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9ed267 and 157f70c.

📒 Files selected for processing (7)
  • deepmd/dpmodel/output_def.py (2 hunks)
  • deepmd/pt/loss/property.py (4 hunks)
  • deepmd/pt/model/atomic_model/property_atomic_model.py (1 hunks)
  • deepmd/pt/model/task/property.py (3 hunks)
  • deepmd/pt/utils/stat.py (4 hunks)
  • deepmd/utils/argcheck.py (4 hunks)
  • deepmd/utils/out_stat.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/stat.py

506-506: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


507-507: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (8)
deepmd/pt/model/atomic_model/property_atomic_model.py (1)

38-41: Application of output statistics is correctly streamlined

The updated code simplifies the application of output statistics to the atomic outputs, enhancing readability and maintainability.

deepmd/pt/loss/property.py (4)

27-28: Effective addition of property_name and property_dim parameters

Introducing property_name and property_dim enhances the flexibility to handle multiple properties with varying dimensions.


49-55: Proper handling of property names and dimensions

Converting property_name and property_dim to lists ensures that multiple properties are managed consistently. The assertion confirms that the sum of property_dim matches task_dim, maintaining integrity.


81-95: Correct concatenation and normalization of properties

The code appropriately concatenates the property labels and normalizes predictions and labels using out_bias and out_std. The assertions ensure dimensional consistency.


163-171: Dynamic label requirements for multiple properties

The label_requirement method correctly generates data requirements for each property, facilitating proper data handling during training.

deepmd/utils/argcheck.py (2)

Line range hint 1618-1623: Addition of property_name parameter

Including property_name as an argument aligns with modifications in other components, facilitating flexible property handling.


Line range hint 2486-2523: Appropriate integration of property parameters in loss configuration

The addition of property_name and property_dim parameters in the loss configuration allows for precise specification of property names and dimensions during loss computation.

deepmd/pt/model/task/property.py (1)

85-85: LGTM: Property name parameter addition

The addition of property_name parameter with a default value "property" maintains backward compatibility while adding flexibility for property naming.

Also applies to: 99-99

deepmd/pt/utils/stat.py Outdated Show resolved Hide resolved
deepmd/utils/out_stat.py Outdated Show resolved Hide resolved
deepmd/dpmodel/output_def.py Outdated Show resolved Hide resolved
deepmd/pt/utils/stat.py Fixed Show fixed Hide fixed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
deepmd/pt/loss/property.py (3)

27-28: LGTM! Consider using typing.Union for type hints

The implementation correctly handles both single and multiple property configurations. The validation ensures property dimensions sum up to task_dim.

Consider using typing.Union instead of the | operator for better compatibility with older Python versions:

-        property_name: str | list = "property",
-        property_dim: int | list = 1,
+        property_name: Union[str, list] = "property",
+        property_dim: Union[int, list] = 1,

Also applies to: 49-55


81-98: Consider optimizing property concatenation and enhancing validation

The implementation correctly handles multiple properties, but there's room for improvement in validation and performance.

Consider these improvements:

  1. Validate dimensions before concatenation:
def _validate_property_shapes(self, label, nbz):
    for name, dim in self.property_name_dim_mapping.items():
        if label[name].shape != (nbz, dim):
            raise ValueError(
                f"Property {name} shape mismatch. "
                f"Expected ({nbz}, {dim}), got {label[name].shape}"
            )
  1. Use torch.cat with pre-allocated list for better performance:
concat_property = [None] * len(self.property_name)
for i, name in enumerate(self.property_name):
    concat_property[i] = label[name]
label["property"] = torch.cat(concat_property, dim=1)

105-106: Consider reducing code duplication in loss calculations

The normalization logic is correctly implemented but duplicated across different loss functions.

Consider extracting the normalization logic:

def _normalize_predictions(self, label_tensor, pred_tensor, out_bias, out_std):
    return (label_tensor - out_bias) / out_std, (pred_tensor - out_bias) / out_std

# Usage in loss functions
norm_label, norm_pred = self._normalize_predictions(
    label["property"], model_pred["property"], out_bias, out_std
)
loss += F.smooth_l1_loss(norm_label, norm_pred, reduction="sum", beta=self.beta)

Also applies to: 112-114, 118-119, 125-126

deepmd/utils/argcheck.py (3)

1584-1584: Add proper documentation for property_name parameter

The documentation for property_name is marked as TODO. Please add proper documentation explaining the purpose and usage of this parameter.


2492-2493: Add proper documentation for property parameters

Both property_name and property_dim parameters need proper documentation to explain:

  • Their purpose and usage
  • Expected value types and formats
  • Relationship with dataset properties

Line range hint 515-516: Simplify dictionary key checks

Remove unnecessary .keys() calls when checking for key existence in dictionaries.

Apply this diff:

-            assert ii in bias_atom_e.keys()
-            assert ii in std_atom_e.keys()
+            assert ii in bias_atom_e
+            assert ii in std_atom_e
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 157f70c and 6172dc4.

📒 Files selected for processing (6)
  • deepmd/pt/loss/property.py (4 hunks)
  • deepmd/pt/model/atomic_model/property_atomic_model.py (1 hunks)
  • deepmd/pt/model/task/property.py (3 hunks)
  • deepmd/pt/utils/stat.py (4 hunks)
  • deepmd/utils/argcheck.py (4 hunks)
  • deepmd/utils/out_stat.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/pt/model/task/property.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/stat.py

515-515: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


516-516: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (7)
deepmd/pt/loss/property.py (1)

168-176: LGTM! Clear and correct implementation of label requirements

The implementation correctly generates requirements for each property with appropriate dimensions.

deepmd/utils/out_stat.py (1)

133-179: 🛠️ Refactor suggestion

Fix documentation and optimize implementation

The implementation has similar issues to those previously identified.

  1. Update the docstring to accurately reflect the implementation:
-    compute the least-squares solution as the atomic output bias and std.
+    compute the mean as the atomic output bias and std.
  1. Optimize the implementation using numpy's broadcasting:
-    computed_output_bias = np.repeat(
-        np.mean(output_redu, axis=0)[np.newaxis, :], ntypes, axis=0
-    )
+    computed_output_bias = np.broadcast_to(
+        np.mean(output_redu, axis=0), (ntypes, output_redu.shape[1])
+    )
  1. Consider adding input validation for assigned_bias parameter since it's currently unused:
if assigned_bias is not None:
    warnings.warn("assigned_bias parameter is not used in compute_stats_property")
deepmd/pt/model/atomic_model/property_atomic_model.py (1)

38-41: LGTM! Clean simplification of output statistics application

The implementation is now more straightforward and consistent with the changes in other files.

deepmd/pt/utils/stat.py (4)

32-32: LGTM: Import added for property statistics support

The import of compute_stats_property is properly placed and used in the property statistics computation logic.


294-298: LGTM: Property sub-variable handling added

The code correctly extends the keys list with sub-variable names from property definitions, enabling proper handling of property statistics.


495-527: LGTM: Property statistics computation implemented correctly

The implementation properly:

  • Computes property statistics using the appropriate method
  • Concatenates bias and standard deviation values
  • Handles tensor shape transformations correctly
🧰 Tools
🪛 Ruff (0.8.2)

515-515: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


516-516: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


382-391: ⚠️ Potential issue

Fix generator expression evaluation

The generator expression is used directly in a boolean context, which doesn't evaluate as intended. Wrap it with any() to properly evaluate the condition.

Apply this diff to fix the issue:

-            if (
-                "property" in atomic_output.var_defs
-                and (
-                    ii in keys for ii in atomic_output.var_defs["property"].sub_var_name
-                )
-            )
+            if (
+                "property" in atomic_output.var_defs
+                and any(
+                    ii in keys for ii in atomic_output.var_defs["property"].sub_var_name
+                )
+            )

Likely invalid or redundant comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
deepmd/utils/argcheck.py (1)

1584-1584: Documentation needed for property_name parameter.

The TODO comment indicates missing documentation for the property_name parameter. Please add a description explaining its purpose and usage.

deepmd/utils/out_stat.py (2)

171-174: Optimize numpy operations

The current implementation can be optimized using numpy's broadcasting capabilities.

-    computed_output_bias = np.repeat(
-        np.mean(output_redu, axis=0)[np.newaxis, :], ntypes, axis=0
-    )
+    computed_output_bias = np.broadcast_to(np.mean(output_redu, axis=0), (ntypes, output_redu.shape[1]))

133-137: Document or remove unused parameter

The assigned_bias parameter is declared but never used in the function implementation.

Either document why this parameter exists but is unused, or remove it if it's not needed.

deepmd/pt/utils/stat.py (2)

294-298: Simplify sub-keys extraction

The sub-keys extraction can be written more concisely using list comprehension.

-        sub_keys = []
-        for key in keys:
-            if atomic_output.var_defs[key].sub_var_name is not None:
-                sub_keys.extend(atomic_output.var_defs[key].sub_var_name)
-        keys.extend(sub_keys)
+        keys.extend([
+            sub_key for key in keys 
+            if atomic_output.var_defs[key].sub_var_name is not None
+            for sub_key in atomic_output.var_defs[key].sub_var_name
+        ])

495-506: Simplify dictionary key checks

The code can be simplified by removing unnecessary .keys() calls.

-            if "property" in atomic_output.var_defs:
+            if "property" in atomic_output.var_defs and kk in stats_input:
                 bias_atom_e[kk], std_atom_e[kk] = compute_stats_property(
                     stats_input[kk],
                     merged_natoms[kk],
                     assigned_bias=assigned_atom_ener[kk],
                 )
-            else:
+            elif kk in stats_input:
                 bias_atom_e[kk], std_atom_e[kk] = compute_stats_from_redu(
                     stats_input[kk],
                     merged_natoms[kk],
                     assigned_bias=assigned_atom_ener[kk],
                     rcond=rcond,
                 )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 157f70c and 6172dc4.

📒 Files selected for processing (6)
  • deepmd/pt/loss/property.py (4 hunks)
  • deepmd/pt/model/atomic_model/property_atomic_model.py (1 hunks)
  • deepmd/pt/model/task/property.py (3 hunks)
  • deepmd/pt/utils/stat.py (4 hunks)
  • deepmd/utils/argcheck.py (4 hunks)
  • deepmd/utils/out_stat.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/pt/model/task/property.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/stat.py

515-515: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


516-516: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (10)
deepmd/pt/model/atomic_model/property_atomic_model.py (1)

38-41: LGTM! Clean refactor of bias application logic.

The simplified implementation is more maintainable while preserving the same functionality. The direct multiplication by standard deviation followed by bias addition is cleaner than the previous conditional approach.

deepmd/pt/loss/property.py (5)

81-83: LGTM! Proper shape validation.

The shape validation ensures the property predictions match the expected dimensions.


84-92: LGTM! Well-structured property concatenation.

The implementation properly:

  1. Handles multiple properties
  2. Validates individual property dimensions
  3. Concatenates properties while maintaining their order

94-97: LGTM! Proper normalization setup.

The code correctly retrieves and validates the normalization parameters (out_std and out_bias).


105-106: LGTM! Consistent normalization across loss functions.

The normalization is consistently applied across all supported loss functions (smooth_mae, mae, mse, rmse).

Also applies to: 112-113, 118-119, 125-126


168-176: LGTM! Well-structured label requirements.

The implementation properly generates label requirements for each property with appropriate dimensions.

deepmd/utils/argcheck.py (3)

1618-1624: LGTM! Property configuration parameters.

The addition of property_name parameter with proper type checking and defaults is well-structured.


2492-2493: LGTM! Clear documentation for property parameters.

The documentation clearly explains the relationship between property names/dimensions and the dataset.


2516-2529: LGTM! Well-structured property configuration.

The implementation properly handles both single and multiple property configurations with appropriate defaults.

deepmd/utils/out_stat.py (1)

138-142: ⚠️ Potential issue

Fix incorrect docstring description

The docstring states that the function computes "the least-squares solution" but the implementation actually computes the mean value.

-    compute the least-squares solution as the atomic output bias and std.
+    compute the mean value as the atomic output bias and std.

Likely invalid or redundant comment.

deepmd/utils/out_stat.py Outdated Show resolved Hide resolved
deepmd/pt/utils/stat.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
deepmd/pt/loss/property.py (3)

Line range hint 28-56: Consider adding validation for property dimensions

While the code correctly validates that the sum of property dimensions equals task_dim, consider adding additional validation:

  1. Ensure property_dim values are positive integers
  2. Verify that property_name and property_dim lists have the same length
        if isinstance(property_dim, int):
            property_dim = [property_dim]
        self.property_name = property_name
+       if not all(isinstance(dim, int) and dim > 0 for dim in property_dim):
+           raise ValueError("All property dimensions must be positive integers")
+       if len(property_name) != len(property_dim):
+           raise ValueError("Length of property_name and property_dim must match")
        assert self.task_dim == sum(property_dim)

95-99: Consider adding shape validation for normalization parameters

The code assumes out_bias has the same shape as out_std. Add explicit validation:

        out_std = model.atomic_model.out_std[0][0]
        out_bias = model.atomic_model.out_bias[0][0]
        assert len(out_std.shape) == 1
+       assert out_bias.shape == out_std.shape, "out_bias and out_std must have the same shape"
        assert out_std.shape[0] == self.task_dim

106-127: Consider refactoring repeated normalization logic

The normalization logic (x - out_bias) / out_std is repeated across different loss functions. Consider extracting it to a helper method:

+    def _normalize(self, x):
+        """Normalize input using out_bias and out_std."""
+        return (x - self.out_bias) / self.out_std

     def forward(self, input_dict, model, label, natoms, learning_rate=0.0, mae=False):
         # ... existing code ...
         if self.loss_func == "smooth_mae":
             loss += F.smooth_l1_loss(
-                (label["property"] - out_bias) / out_std,
-                (model_pred["property"] - out_bias) / out_std,
+                self._normalize(label["property"]),
+                self._normalize(model_pred["property"]),
                 reduction="sum",
                 beta=self.beta,
             )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6172dc4 and 4c2033f.

📒 Files selected for processing (3)
  • deepmd/dpmodel/output_def.py (2 hunks)
  • deepmd/pt/loss/property.py (5 hunks)
  • deepmd/pt/model/task/property.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/dpmodel/output_def.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/task/property.py

85-85: Undefined name Union

(F821)

🔇 Additional comments (4)
deepmd/pt/model/task/property.py (2)

Line range hint 85-99: LGTM: Constructor parameter addition

The addition of property_name parameter with a sensible default value maintains backward compatibility while enabling more flexible property handling.

🧰 Tools
🪛 Ruff (0.8.2)

85-85: Undefined name Union

(F821)


131-131: LGTM: Output definition enhancement

The sub_var_name parameter is correctly utilized to pass the property name to the output definition.

deepmd/pt/loss/property.py (2)

85-94: LGTM: Property concatenation and validation

The property concatenation logic is well-implemented with proper shape validation for each property.


169-177: LGTM: Dynamic label requirements

The label requirements are correctly updated to handle multiple properties, with proper dimension mapping and settings.

deepmd/pt/model/task/property.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
deepmd/pt/loss/property.py (2)

Line range hint 30-58: Enhance type hints and validation messages

The constructor changes look good overall, but could benefit from some improvements:

  1. Consider using more specific type hints:
- property_name: Union[str, list] = "property"
- property_dim: Union[int, list] = 1
+ property_name: Union[str, List[str]] = "property"
+ property_dim: Union[int, List[int]] = 1
  1. Add more descriptive validation:
- assert self.task_dim == sum(property_dim)
+ assert self.task_dim == sum(property_dim), (
+     f"Sum of property dimensions {sum(property_dim)} does not match task_dim {self.task_dim}"
+ )

97-129: Refactor loss computation for better maintainability

Consider extracting the normalization logic and improving the loss function selection:

  1. Extract normalization to a separate method:
def _normalize_property(self, property_tensor: torch.Tensor, out_bias: torch.Tensor, out_std: torch.Tensor) -> torch.Tensor:
    assert len(out_std.shape) == 1, f"Expected 1D tensor for out_std, got shape {out_std.shape}"
    assert out_std.shape[0] == self.task_dim, (
        f"Dimension mismatch: out_std has shape {out_std.shape}, "
        f"expected ({self.task_dim},)"
    )
    return (property_tensor - out_bias) / out_std
  1. Consider using a dictionary for loss functions to improve maintainability:
LOSS_FUNCTIONS = {
    "smooth_mae": lambda x, y, beta: F.smooth_l1_loss(x, y, reduction="sum", beta=beta),
    "mae": lambda x, y, _: F.l1_loss(x, y, reduction="sum"),
    "mse": lambda x, y, _: F.mse_loss(x, y, reduction="sum"),
    "rmse": lambda x, y, _: torch.sqrt(F.mse_loss(x, y, reduction="mean")),
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c2033f and ac5fa5c.

📒 Files selected for processing (1)
  • deepmd/pt/loss/property.py (5 hunks)
🔇 Additional comments (2)
deepmd/pt/loss/property.py (2)

171-179: LGTM! Label requirements properly handle multiple properties

The implementation correctly generates requirements for each property while maintaining consistency with the must=True and high_prec=True flags.


Line range hint 1-179: Verify integration with dependent components

Please ensure that all components using PropertyLoss have been updated to handle multiple properties correctly.

✅ Verification successful

Let me gather more information about the test files and training integration to ensure proper handling of multiple properties.


Based on the gathered information, I can now provide the final response:

Multiple property support has been properly integrated

The changes to PropertyLoss are well-integrated with the rest of the codebase:

  • Training integration is clean through deepmd/pt/train/training.py
  • Test coverage exists in source/tests/pt/model/test_property_fitting.py and source/tests/pt/test_training.py
  • Property access patterns are consistently used only within the PropertyLoss class
  • The property concatenation logic is self-contained and doesn't affect other components
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of PropertyLoss and property handling in the codebase

# Check for PropertyLoss usage
echo "Checking PropertyLoss usage..."
rg -l "PropertyLoss" --type py

# Check for property access patterns that might need updating
echo "Checking property access patterns..."
rg -A 2 "model_pred\[.property.\]|label\[.property.\]" --type py

# Check for potential property-related tests
echo "Checking property-related tests..."
rg -l "test.*property" --type py

Length of output: 3309


Script:

#!/bin/bash
# Check test implementations and training integration

# Check property-related test implementations
echo "Checking property test implementations..."
rg -A 10 "test.*property" source/tests/pt/model/test_property_fitting.py source/tests/pt/test_training.py

# Check training integration
echo "Checking training integration..."
rg -A 5 "PropertyLoss" deepmd/pt/train/training.py

# Check property model predictions
echo "Checking property model predictions..."
ast-grep --pattern 'def forward($$$) { $$$ property $$$ }'

Length of output: 2695

deepmd/pt/loss/property.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
deepmd/pt/loss/property.py (5)

30-31: Add documentation for new parameters

The docstring is missing documentation for the new parameters property_name and property_dim.

Add the following to the Parameters section of the docstring:

        beta:
            The 'beta' parameter in 'smooth_mae' loss.
+       property_name : Union[str, list]
+           Name(s) of the property/properties to fit. Can be a single string or list of strings.
+       property_dim : Union[int, list]
+           Dimension(s) of the property/properties. Can be a single integer or list of integers.

52-58: Enhance input validation

The validation logic could be more robust to handle edge cases.

Consider adding these validations:

         if isinstance(property_name, str):
             property_name = [property_name]
         if isinstance(property_dim, int):
             property_dim = [property_dim]
+        if not property_name:
+            raise ValueError("property_name cannot be empty")
+        if not property_dim:
+            raise ValueError("property_dim cannot be empty")
+        if len(property_name) != len(property_dim):
+            raise ValueError("Length of property_name must match property_dim")
+        if any(dim <= 0 for dim in property_dim):
+            raise ValueError("All property dimensions must be positive")
         self.property_name = property_name
         assert self.task_dim == sum(property_dim)

84-95: Improve error messages and performance

The shape assertions could be more informative, and the property concatenation could be optimized.

Consider these improvements:

-        assert model_pred["property"].shape == (nbz, self.task_dim)
+        assert model_pred["property"].shape == (nbz, self.task_dim), \
+            f"Expected property shape ({nbz}, {self.task_dim}), got {model_pred['property'].shape}"

-        concat_property = []
-        for property_name in self.property_name:
-            assert label[property_name].shape == (
-                nbz,
-                self.property_name_dim_mapping[property_name],
-            )
-            concat_property.append(label[property_name])
-        label["property"] = torch.cat(concat_property, dim=1)
+        # Pre-allocate tensor for better performance
+        label["property"] = torch.empty((nbz, self.task_dim), 
+            dtype=label[self.property_name[0]].dtype,
+            device=label[self.property_name[0]].device)
+        start_idx = 0
+        for property_name in self.property_name:
+            dim = self.property_name_dim_mapping[property_name]
+            prop_tensor = label[property_name]
+            expected_shape = (nbz, dim)
+            assert prop_tensor.shape == expected_shape, \
+                f"Expected {property_name} shape {expected_shape}, got {prop_tensor.shape}"
+            label["property"][:, start_idx:start_idx + dim] = prop_tensor
+            start_idx += dim

108-129: Reduce code duplication in loss normalization

The normalization logic (x - out_bias) / out_std is repeated across different loss functions.

Consider extracting the normalization into a helper method:

+    def _normalize(self, x, out_bias, out_std):
+        return (x - out_bias) / out_std

     def forward(self, input_dict, model, label, natoms, learning_rate=0.0, mae=False):
         # ... existing code ...
         
         if self.loss_func == "smooth_mae":
             loss += F.smooth_l1_loss(
-                (label["property"] - out_bias) / out_std,
-                (model_pred["property"] - out_bias) / out_std,
+                self._normalize(label["property"], out_bias, out_std),
+                self._normalize(model_pred["property"], out_bias, out_std),
                 reduction="sum",
                 beta=self.beta,
             )

171-179: Consider pre-allocating the requirements list

The current implementation builds the list one item at a time.

Consider pre-allocating the list for better performance:

     @property
     def label_requirement(self) -> list[DataRequirementItem]:
         """Return data label requirements needed for this loss calculation."""
-        label_requirement = []
+        label_requirement = [None] * len(self.property_name)
+        for i, property_name in enumerate(self.property_name):
-        for property_name in self.property_name:
-            label_requirement.append(
+            label_requirement[i] = (
                 DataRequirementItem(
                     property_name,
                     ndof=self.property_name_dim_mapping[property_name],
                     atomic=False,
                     must=True,
                     high_prec=True,
                 )
             )
         return label_requirement
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c2033f and 69803d6.

📒 Files selected for processing (2)
  • deepmd/dpmodel/output_def.py (2 hunks)
  • deepmd/pt/loss/property.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/dpmodel/output_def.py

200-200: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (2)
deepmd/dpmodel/output_def.py (1)

Line range hint 200-224: Verify integration of sub_var_name parameter

The sub_var_name parameter has been added but appears to be unused in the class methods. Please clarify:

  1. The intended purpose of this parameter
  2. How it integrates with the existing output variable handling
  3. Whether additional methods need to be updated to support sub-variables

Let's check for any existing usage or similar patterns in the codebase:

🧰 Tools
🪛 Ruff (0.8.2)

200-200: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

deepmd/pt/loss/property.py (1)

30-31: Verify backward compatibility

The new parameters have default values that should maintain backward compatibility, but we should verify this doesn't break existing models.

Let's check for existing usage patterns:

Also applies to: 52-58

✅ Verification successful

The initial results show PropertyLoss usage in training.py with simple instantiation. Let's examine the test configurations and test files to understand existing usage patterns and verify backward compatibility implications.


The configuration files show simple property loss configurations with just "type": "property" without any specific property_name or property_dim parameters, which is good for backward compatibility. Let's examine the test files to complete our verification.


We've found several test files that could show how PropertyLoss is used in tests. Let's examine their content to verify backward compatibility implications.


Based on the comprehensive test implementations and configuration files examined, I can now generate the final response:

Backward compatibility is maintained

The changes to PropertyLoss are backward compatible for the following reasons:

  • All test cases use default values for property_name and property_dim
  • Configuration files (input.json) only specify "type": "property" without any additional parameters
  • Test implementations verify various scenarios (intensive/non-intensive, different dimensions) using the default parameters
  • PropertyLoss instantiations in training.py use **loss_params without explicit property_name/property_dim
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for existing PropertyLoss instantiations that might be affected

# Search for PropertyLoss instantiations
rg -A 5 "PropertyLoss\(" 

# Search for property-related configuration files
fd -e json -e yaml | xargs rg -l "property"

Length of output: 817


Script:

#!/bin/bash
# Let's check the actual content of these configuration files and any test files
# to understand the existing usage patterns

# Check property test configurations
cat source/tests/pt/property/input.json
cat examples/property/train/input_torch.json

# Look for test files that might show usage patterns
rg -l "test.*property" --type python

Length of output: 3144


Script:

#!/bin/bash
# Let's check test files with a more reliable pattern
rg -l "test.*property" --type-add 'py:*.py' --type py

# Also check for actual property loss test implementations
fd -e py | xargs rg -l "PropertyLoss"

# And look at the actual test implementations
fd -e py -p "test.*property.*\.py" 

Length of output: 538


Script:

#!/bin/bash
# Let's examine the actual test implementations to verify usage patterns
cat source/tests/universal/pt/loss/test_loss.py
cat source/tests/pt/model/test_property_fitting.py
cat source/tests/consistent/fitting/test_property.py

# Also check the training implementation for any specific property handling
rg -A 5 "property.*loss" source/tests/pt/test_training.py

Length of output: 24056

deepmd/dpmodel/output_def.py Outdated Show resolved Hide resolved
deepmd/pt/utils/stat.py Fixed Show fixed Hide fixed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
deepmd/dpmodel/fitting/property_fitting.py (1)

78-78: Document the new property_name parameter

The new parameter needs documentation in the class docstring to explain its purpose and accepted values.

Add this to the class docstring parameters section:

    Parameters
    ----------
+   property_name : Union[str, list], optional
+       Name(s) of the property. Can be a single string or list of strings. Defaults to "property".
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5be3457 and 5ce6d31.

📒 Files selected for processing (3)
  • deepmd/dpmodel/fitting/property_fitting.py (2 hunks)
  • deepmd/dpmodel/output_def.py (3 hunks)
  • deepmd/pt/utils/stat.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/stat.py

516-516: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


517-517: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (3)
deepmd/dpmodel/fitting/property_fitting.py (1)

94-94: LGTM: Property name assignment

The property name is correctly assigned to the instance variable.

deepmd/pt/utils/stat.py (2)

512-528: 🛠️ Refactor suggestion

Optimize property concatenation logic

The property concatenation logic can be improved for better efficiency and safety.

Apply this diff to optimize the code:

    if "property" in atomic_output.var_defs:
        concat_bias = []
        concat_std = []
        for ii in atomic_output.var_defs["property"].sub_var_name:
-            assert ii in bias_atom_e.keys()
-            assert ii in std_atom_e.keys()
+            if ii not in bias_atom_e or ii not in std_atom_e:
+                raise ValueError(f"Missing property {ii} in bias or std arrays")
+            if not concat_bias or concat_bias[-1].shape[0] == bias_atom_e[ii].shape[0]:
+                concat_bias.append(bias_atom_e[ii])
+                concat_std.append(std_atom_e[ii])
+            else:
+                raise ValueError(f"Inconsistent shapes in property arrays for {ii}")
-        del bias_atom_e, std_atom_e
-        bias_atom_e = {}
-        std_atom_e = {}
+        bias_atom_e = {"property": np.concatenate(concat_bias, axis=-1)}
+        std_atom_e = {"property": np.tile(
+            np.concatenate(concat_std, axis=-1),
+            (bias_atom_e["property"].shape[0], 1)
+        )}
-        bias_atom_e["property"] = np.concatenate(concat_bias, axis=-1)
-        std_atom_e["property"] = np.concatenate(concat_std, axis=-1)
-        std_atom_e["property"] = np.tile(
-            std_atom_e["property"], (bias_atom_e["property"].shape[0], 1)
-        )

The changes:

  1. Replace assertions with proper error handling
  2. Add shape validation
  3. Optimize dictionary operations
  4. Improve memory usage by avoiding intermediate variables

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

516-516: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


517-517: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


383-392: ⚠️ Potential issue

Fix generator expression evaluation

The condition using a generator expression is incorrect and won't evaluate as intended.

Apply this diff to fix the issue:

    keys = (
        ["property"]
        if (
            "property" in atomic_output.var_defs
            and (
-                ii in keys for ii in atomic_output.var_defs["property"].sub_var_name
+                any(ii in keys for ii in atomic_output.var_defs["property"].sub_var_name)
            )
        )
        else keys
    )

Likely invalid or redundant comment.

deepmd/dpmodel/output_def.py Outdated Show resolved Hide resolved
deepmd/pt/utils/stat.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
deepmd/dpmodel/output_def.py (2)

6-8: Consider using modern Python type hints

For better readability and to align with modern Python practices, consider using the union operator (|) instead of Union if you're targeting Python 3.10+:

-from typing import (
-    Union,
-)

And update the type hint:

-sub_var_name: Union[list[str], str, None] = None,
+sub_var_name: list[str] | str | None = None,

Line range hint 203-229: Add tests for sub_var_name functionality

The new sub_var_name parameter needs test coverage to ensure it works correctly with:

  1. String input
  2. List of strings input
  3. None input
  4. Invalid inputs (validation)

Would you like me to help create test cases for this new functionality?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ce6d31 and 5a3bf94.

📒 Files selected for processing (1)
  • deepmd/dpmodel/output_def.py (3 hunks)
🔇 Additional comments (2)
deepmd/dpmodel/output_def.py (2)

203-203: ⚠️ Potential issue

Add parameter documentation

The sub_var_name parameter needs to be documented in the class docstring. Add the following to the Parameters section:

    Parameters
    ----------
+   sub_var_name : Union[list[str], str, None], optional
+       List of sub-variable names or a single sub-variable name for the output variable.
+       If a string is provided, it will be converted to a single-item list.
+       Defaults to None.

Likely invalid or redundant comment.


227-229: ⚠️ Potential issue

Add type validation for sub_var_name

The current implementation lacks type validation, which could lead to runtime errors. Add validation to ensure the input is either a string, list of strings, or None:

         if isinstance(sub_var_name, str):
             sub_var_name = [sub_var_name]
+        if sub_var_name is not None and not (
+            isinstance(sub_var_name, list) 
+            and all(isinstance(x, str) for x in sub_var_name)
+        ):
+            raise ValueError(
+                "sub_var_name must be a list of strings, a single string, or None"
+            )
         self.sub_var_name = sub_var_name

Likely invalid or redundant comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)
deepmd/pt/loss/property.py (1)

59-60: Simplify kwargs.get() calls

The None default is redundant as it's the default behavior of dict.get().

-self.out_bias = kwargs.get("out_bias", None)
-self.out_std = kwargs.get("out_std", None)
+self.out_bias = kwargs.get("out_bias")
+self.out_std = kwargs.get("out_std")
🧰 Tools
🪛 Ruff (0.8.2)

59-59: Use kwargs.get("out_bias") instead of kwargs.get("out_bias", None)

Replace kwargs.get("out_bias", None) with kwargs.get("out_bias")

(SIM910)


60-60: Use kwargs.get("out_std") instead of kwargs.get("out_std", None)

Replace kwargs.get("out_std", None) with kwargs.get("out_std")

(SIM910)

source/tests/pt/property/input.json (1)

30-30: Add comments to document property configuration

Consider adding comments to explain the purpose and requirements of the new property configuration parameters.

 "fitting_net": {
   "type": "property",
   "intensive": true,
+  "_comment": "property_name defines the name of the property to be fitted",
   "property_name": "band_property",
   "task_dim": 3,
   ...
 },
 "loss": {
   "type": "property",
+  "_comment": "property_name and property_dim define the property to be fitted and its dimensionality",
   "property_name": "band_property",
   "property_dim": [3],
   ...
 }

Also applies to: 52-53

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a3bf94 and ff4650e.

📒 Files selected for processing (8)
  • checkpoint (1 hunks)
  • deepmd/pt/loss/property.py (5 hunks)
  • source/tests/pt/model/test_permutation.py (1 hunks)
  • source/tests/pt/property/double/type.raw (1 hunks)
  • source/tests/pt/property/double/type_map.raw (1 hunks)
  • source/tests/pt/property/input.json (2 hunks)
  • source/tests/pt/test_training.py (1 hunks)
  • source/tests/universal/dpmodel/loss/test_loss.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • source/tests/pt/property/double/type.raw
  • checkpoint
  • source/tests/pt/property/double/type_map.raw
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/loss/property.py

59-59: Use kwargs.get("out_bias") instead of kwargs.get("out_bias", None)

Replace kwargs.get("out_bias", None) with kwargs.get("out_bias")

(SIM910)


60-60: Use kwargs.get("out_std") instead of kwargs.get("out_std", None)

Replace kwargs.get("out_std", None) with kwargs.get("out_std")

(SIM910)

🔇 Additional comments (5)
deepmd/pt/loss/property.py (3)

3-5: LGTM: Type hints and new parameters are well-defined

The addition of Union type hints and new parameters property_name and property_dim improves the interface flexibility.

Also applies to: 30-31


86-98: Improve error handling in property concatenation

The previous review comment about improving error handling is still valid. Consider implementing the suggested validation method for better robustness.


173-181: LGTM: Label requirements are properly updated

The dynamic generation of label requirements based on property names is well-implemented.

source/tests/pt/test_training.py (1)

467-467: Verify test data compatibility with the new property interface.

The change from "single" to "double" suggests testing with more complex property data. Ensure that the test data at the new path exists and is formatted correctly for the refactored property interface.

✅ Verification successful

Based on the output, I can see that the "property/double" directory exists and contains numerous NumPy array files with various shapes and data types. The test data appears to be well-structured with consistent formats, containing both float (f4, f8) and integer (i8) arrays of different dimensions that are commonly used in property testing.

The presence of arrays with shapes like (2, 441), (2, 9), (2, 1), etc., suggests that the data is properly formatted for the property interface, with paired inputs and outputs. The data types and dimensions are consistent with what would be expected for property testing, including feature vectors, labels, and property-related data.

Test data is properly structured and compatible with the property interface.

The change from "single" to "double" appears to be well-supported by the existing test data files, which show appropriate dimensionality and data types for property testing.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and structure of property test data
# Check if the test data directory exists and examine its contents
fd -t d "property/double$" 
fd -t f . -e json -e npy --exec head -n 1 {} \; 2>/dev/null

Length of output: 34323

source/tests/pt/model/test_permutation.py (1)

334-334: Verify integration with property interface changes

The addition of property_name should be verified against the broader property interface refactor.

Let's verify the property interface implementation:

✅ Verification successful

Let me gather more information about the property interface and its implementation.


Based on the gathered information, I can now provide the final response:

Property interface implementation is consistent

The band_property usage in the test file is consistent with the broader property interface implementation. The codebase shows a well-structured property model system with:

  • Proper inheritance hierarchy through PropertyModel, PropertyFittingNet, and related classes
  • Consistent implementation across different backends (PyTorch, JAX)
  • Test coverage for property-related functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for property interface implementation and usage

# Check for property interface implementation
echo "Checking property interface implementation..."
rg -l "property_name.*band_property" --type py

# Check for property model usage
echo "Checking property model usage..."
ast-grep --pattern 'class $name implements PropertyModel {
  $$$
}'

# Check for test coverage
echo "Checking test coverage..."
rg -l "test.*property" --type py

Length of output: 571


Script:

#!/bin/bash
# Check the actual implementation and usage of property_name and band_property
echo "Checking property_name implementation..."
rg -A 5 "property_name.*=.*band_property" --type py

echo "\nChecking band_property usage..."
rg -A 5 "band_property" --type py

echo "\nChecking property model definitions..."
rg -A 10 "class.*Property" --type py

Length of output: 21097

deepmd/pt/loss/property.py Outdated Show resolved Hide resolved
deepmd/pt/loss/property.py Outdated Show resolved Hide resolved
source/tests/universal/dpmodel/loss/test_loss.py Outdated Show resolved Hide resolved
source/tests/pt/model/test_permutation.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
deepmd/infer/deep_property.py (2)

49-74: Add error handling for edge cases

While the implementation is correct, consider adding error handling for potential edge cases:

  1. Validate that get_var_name() returns a non-empty string
  2. Handle potential exceptions during output definition creation
 def change_output_def(self) -> None:
+    var_name = self.get_var_name()
+    if not var_name:
+        raise ValueError("Variable name cannot be empty")
+    try:
         self.output_def = ModelOutputDef(
             FittingOutputDef(
                 [
                     OutputVariableDef(
-                        self.get_var_name(),
+                        var_name,
                         shape=[self.get_task_dim()],
                         reducible=True,
                         atomic=True,
                         intensive=self.get_intensive(),
                     ),
                 ]
             )
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to create output definition: {e}")

139-144: Add shape validation before reshaping

Consider adding shape validation before reshaping to provide clearer error messages:

+    expected_shape = (-1, -1, self.get_task_dim())
+    if atomic_property.ndim != 3 or atomic_property.shape[2] != expected_shape[2]:
+        raise ValueError(
+            f"Expected atomic_property shape (*,*,{self.get_task_dim()}), "
+            f"got {atomic_property.shape}"
+        )
     atomic_property = results[self.get_var_name()].reshape(
         nframes, natoms, self.get_task_dim()
     )
deepmd/pt/loss/property.py (1)

Line range hint 129-181: Standardize reduction methods across loss calculations

The code uses different reduction methods (sum vs mean) across different loss calculations, which could affect loss scaling. Consider standardizing to mean reduction for consistency:

-                reduction="sum",
+                reduction="mean",

This change should be applied consistently across all loss calculations to ensure proper scaling.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e428a7 and f7cf9e1.

📒 Files selected for processing (13)
  • deepmd/dpmodel/fitting/property_fitting.py (5 hunks)
  • deepmd/entrypoints/test.py (4 hunks)
  • deepmd/infer/deep_eval.py (1 hunks)
  • deepmd/infer/deep_property.py (3 hunks)
  • deepmd/pt/infer/deep_eval.py (2 hunks)
  • deepmd/pt/loss/property.py (6 hunks)
  • deepmd/pt/model/model/property_model.py (4 hunks)
  • deepmd/pt/model/task/property.py (5 hunks)
  • deepmd/pt/train/training.py (1 hunks)
  • source/tests/consistent/fitting/test_property.py (5 hunks)
  • source/tests/pt/model/test_property_fitting.py (11 hunks)
  • source/tests/pt/test_dp_test.py (1 hunks)
  • source/tests/universal/dpmodel/loss/test_loss.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • source/tests/pt/test_dp_test.py
  • source/tests/universal/dpmodel/loss/test_loss.py
  • source/tests/consistent/fitting/test_property.py
  • deepmd/entrypoints/test.py
  • deepmd/infer/deep_eval.py
  • deepmd/pt/infer/deep_eval.py
  • deepmd/pt/model/task/property.py
  • source/tests/pt/model/test_property_fitting.py
  • deepmd/dpmodel/fitting/property_fitting.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/loss/property.py

29-29: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (6)
deepmd/pt/model/model/property_model.py (3)

40-41: LGTM: Dynamic property name handling implemented correctly

The change from static "property" keys to dynamic keys using get_var_name() improves flexibility while maintaining consistency.


65-66: LGTM: Consistent dynamic property handling in forward method

The changes maintain consistency with the output definition changes, properly using dynamic property names.


81-84: LGTM: Well-documented getter method

The get_var_name method is clearly documented and provides essential functionality for dynamic property handling.

deepmd/infer/deep_property.py (1)

41-48: LGTM: Clear documentation of design choice

The placeholder implementation with detailed documentation clearly explains why the output definition is deferred until model loading.

deepmd/pt/loss/property.py (1)

97-99: ⚠️ Potential issue

Add validation for natoms to prevent division by zero

When dividing by natoms for non-intensive properties, add validation:

     if not self.intensive:
+        if torch.any(natoms == 0):
+            raise ValueError("natoms contains zero values, which would cause division by zero")
         model_pred[var_name] = model_pred[var_name] / natoms
         label[var_name] = label[var_name] / natoms

Likely invalid or redundant comment.

deepmd/pt/train/training.py (1)

1243-1247: LGTM! Verify model interface implementation.

The addition of var_name and intensive parameters enhances the property loss configuration, aligning with the PR's goal of flexible property fitting. The changes look correct and well-structured.

Let's verify that the required model interface methods are implemented:

✅ Verification successful

Based on the search results, I can see that both get_var_name() and get_intensive() methods are properly implemented in multiple model classes, particularly in the property model classes that are relevant to this PR:

  1. In deepmd/pt/model/model/property_model.py:
def get_var_name(self) -> str:
    """Get the name of the property."""
    return self.get_fitting_net().var_name

def get_intensive(self) -> bool:
    """Get whether the property is intensive."""
    return self.model_output_def()[self.get_var_name()].intensive
  1. In deepmd/pt/model/task/property.py:
def get_intensive(self) -> bool:
    """Whether the fitting property is intensive."""
    return self.intensive

The methods are properly implemented in the model hierarchy and provide the required functionality for property fitting. Therefore, I can now provide the final response:

LGTM! Required model interface methods are properly implemented.

The get_var_name() and get_intensive() methods are correctly implemented in the property model classes, providing the necessary interface for property loss configuration. The implementation properly propagates the property name and type information through the model hierarchy.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the model interface methods are implemented
# Test: Search for the getter method implementations
ast-grep --pattern 'def get_var_name($$$) {
  $$$
}'

ast-grep --pattern 'def get_intensive($$$) {
  $$$
}'

Length of output: 110


Script:

#!/bin/bash
# Let's try a different approach using ripgrep to search for these method definitions
# The methods might be defined with different indentation or style

# Search for get_var_name method
echo "Searching for get_var_name:"
rg "def get_var_name" -A 5

# Search for get_intensive method
echo -e "\nSearching for get_intensive:"
rg "def get_intensive" -A 5

# Also search for class definitions that might contain these methods
echo -e "\nSearching for relevant model classes:"
rg "class.*Model" -A 10

Length of output: 66402

deepmd/pt/loss/property.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7cf9e1 and ce24784.

📒 Files selected for processing (2)
  • deepmd/infer/deep_eval.py (1 hunks)
  • deepmd/pt/model/model/property_model.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/infer/deep_eval.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/model/property_model.py

84-84: Undefined name obj

(F821)


84-84: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


84-84: Undefined name obj

(F821)

🔇 Additional comments (5)
deepmd/pt/model/model/property_model.py (5)

40-41: LGTM: Dynamic property naming implementation looks good!

The changes correctly implement dynamic property naming by using get_var_name() to construct output definition keys, which aligns with the PR's objective of supporting flexible property names.


65-66: LGTM: Consistent implementation of dynamic property naming!

The changes maintain consistency with the translated_output_def method and correctly implement dynamic property naming in the forward pass.


79-79: LGTM: Correctly adapted intensive property access!

The method properly uses the dynamic property name to access the intensive flag from the model output definition.


113-114: LGTM: Consistent implementation in lower-level forward pass!

The changes maintain consistency with other methods in implementing dynamic property naming for the lower-level forward pass.


Line range hint 1-117: Verify integration with normalization features

While the dynamic property naming is well implemented, let's verify the integration with the normalization features mentioned in the PR objectives.

✅ Verification successful

Let me search more specifically for normalization in the property model implementation.


Let me check one more specific aspect about the property model's normalization implementation.


Property model correctly integrates with normalization features

Based on the verification results, the property model properly integrates with normalization features through:

  1. Proper handling of statistical normalization in DPPropertyAtomicModel:

    • Uses out_bias and out_std for normalization
    • Applies normalization through ret[kk] = ret[kk] * out_std[kk][0] + out_bias[kk][0]
  2. Support for property-specific statistics:

    • Handles bias and standard deviation per property type
    • Properly manages intensive vs. extensive properties via model_output_def()
  3. Integration with the fitting framework:

    • Supports type-specific normalization through type_map
    • Handles vacuum contribution removal when needed
    • Properly manages statistical parameters through the fitting network
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for normalization-related code to ensure proper integration
echo "Searching for normalization implementation..."
rg -l "mean|std|norm" --type py

# Look for statistical calculations
echo "Checking for statistical calculations..."
ast-grep --pattern 'def $_(self, $$$) {
  $$$
  mean = $$$
  std = $$$
  $$$
}'

Length of output: 11730


Script:

#!/bin/bash
# Check for normalization in property model and related files
echo "Searching for normalization in property model..."
rg -A 5 "norm|mean|std" deepmd/pt/model/atomic_model/property_atomic_model.py deepmd/pt/model/model/property_model.py

# Look for statistical handling in property fitting
echo "Checking property fitting implementation..."
rg -A 5 "fitting|stat" deepmd/pt/model/task/fitting.py

Length of output: 4814


Script:

#!/bin/bash
# Check for output definition and normalization handling
echo "Checking output definition handling..."
rg -A 10 "model_output_def" deepmd/pt/model/model/property_model.py

# Look for base implementation of normalization
echo "Checking base model implementation..."
ast-grep --pattern 'class DPPropertyAtomicModel($$$) {
  $$$
}'

Length of output: 1084

🧰 Tools
🪛 Ruff (0.8.2)

84-84: Undefined name obj

(F821)


84-84: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


84-84: Undefined name obj

(F821)

deepmd/pt/model/model/property_model.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/pt/infer/deep_eval.py (1)

187-195: Simplify attribute check in get_var_name

The current implementation uses unnecessary getattr. Simplify the check.

Apply this change:

     def get_var_name(self) -> str:
         """Get the name of the property."""
-        if hasattr(self.dp.model["Default"], "get_var_name") and callable(
-            getattr(self.dp.model["Default"], "get_var_name")
-        ):
+        if hasattr(self.dp.model["Default"], "get_var_name"):
             return self.dp.model["Default"].get_var_name()
-        else:
-            raise NotImplementedError
+        raise NotImplementedError
🧰 Tools
🪛 Ruff (0.8.2)

190-190: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce24784 and b473b6a.

📒 Files selected for processing (4)
  • deepmd/dpmodel/fitting/property_fitting.py (5 hunks)
  • deepmd/pt/infer/deep_eval.py (2 hunks)
  • deepmd/pt/model/model/property_model.py (4 hunks)
  • deepmd/pt/model/task/property.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/dpmodel/fitting/property_fitting.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/infer/deep_eval.py

190-190: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)

🔇 Additional comments (8)
deepmd/pt/model/model/property_model.py (4)

40-41: LGTM: Dynamic property name handling in output definition

The changes correctly use get_var_name() to dynamically construct output keys, improving flexibility.


65-66: LGTM: Consistent property name handling in forward method

The forward method properly uses get_var_name() to maintain consistency with the output definition.


79-80: LGTM: Property intensity check uses dynamic property name

The get_intensive method correctly uses get_var_name() to check the property's intensive status.


81-84: LGTM: Clean implementation of get_var_name

Simple and straightforward implementation that delegates to the fitting network.

deepmd/pt/model/task/property.py (3)

128-130: LGTM: Clear implementation of get_intensive

The method provides a clean way to check if the property is intensive.


46-49: ⚠️ Potential issue

Add validation for property_name parameter

The property_name parameter needs validation to prevent security issues when loading files.

Add this validation at the beginning of init:

     def __init__(
         self,
         ntypes: int,
         dim_descrpt: int,
         property_name: str,
         task_dim: int = 1,
         ...
     ) -> None:
+        if not isinstance(property_name, str) or not property_name.strip():
+            raise ValueError("property_name must be a non-empty string")
+        if any(c in property_name for c in '/\\:*?"<>|') or '..' in property_name:
+            raise ValueError("property_name contains invalid characters")
         self.task_dim = task_dim

Likely invalid or redundant comment.


135-135: Verify version compatibility changes

The version has been bumped from 3 to 4 and serialization format updated.

Run this script to check version compatibility:

Also applies to: 137-137, 148-148, 150-150

✅ Verification successful

Version compatibility changes are safe

Based on the codebase analysis, the version bump from 3 to 4 in deepmd/pt/model/task/property.py is a safe change:

  1. The version compatibility check is properly implemented using check_version_compatibility() function that ensures backward compatibility
  2. The version bump is consistently applied across related files:
    • deepmd/pt/model/task/property.py (version 4)
    • deepmd/dpmodel/fitting/property_fitting.py (version 4)
    • deepmd/pt/model/task/polarizability.py (version 4)
    • deepmd/dpmodel/fitting/polarizability_fitting.py (version 4)
  3. No existing models with older versions (1-3) were found in the codebase that would be affected by this change
  4. The version bump follows the established pattern used throughout the codebase for managing model versioning
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any existing models that might break with the new version
rg -l '"@version":\s*[123]' --type json

Length of output: 41


Script:

#!/bin/bash
# Let's check for version compatibility checks in the codebase
rg -A 2 "check_version_compatibility" 

# Also search for version-related constants or variables
rg "SUPPORTED_VERSION|MODEL_VERSION|@version" 

Length of output: 66887

deepmd/pt/infer/deep_eval.py (1)

212-212: LGTM: Model type check uses dynamic property name

The model_type property correctly uses get_var_name() for consistency.

doc/model/train-fitting-property.md Outdated Show resolved Hide resolved
Delete `fparam` in doc train-fitting-property.md

Signed-off-by: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com>
Change json format

Signed-off-by: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
doc/model/train-fitting-property.md (6)

7-8: Enhance the introduction with specific examples

Consider expanding the introduction with concrete examples of properties that have been successfully fitted using this model, along with typical use cases. This would help users better understand the model's capabilities and limitations.


25-35: Fix JSON formatting and enhance configuration example

  1. Replace hard tabs with spaces for consistent formatting.
  2. Consider adding validation rules for each parameter (e.g., valid ranges for task_dim, required neuron array length).
 "fitting_net" : {
-	"type": "property",
-        "intensive": true,
-        "property_name": "band_prop",
-        "task_dim": 3,
-	"neuron": [240,240,240],
-	"resnet_dt": true,
-	"seed": 1,
+    "type": "property",
+    "intensive": true,
+    "property_name": "band_prop",
+    "task_dim": 3,
+    "neuron": [240, 240, 240],
+    "resnet_dt": true,
+    "seed": 1,
 },
🧰 Tools
🪛 Markdownlint (0.37.0)

27-27: Column: 1
Hard tabs

(MD010, no-hard-tabs)


31-31: Column: 1
Hard tabs

(MD010, no-hard-tabs)


32-32: Column: 1
Hard tabs

(MD010, no-hard-tabs)


33-33: Column: 1
Hard tabs

(MD010, no-hard-tabs)


49-55: Fix JSON formatting and enhance loss function documentation

  1. Replace hard tabs with spaces.
  2. Consider adding examples and explanations for each loss function, particularly the differences between 'smooth_mae' and 'mae'.
 "loss" : {
-	"type": "property",
-	"metric": ["mae"],
-	"loss_func": "smooth_mae"
+    "type": "property",
+    "metric": ["mae"],
+    "loss_func": "smooth_mae"
 },
🧰 Tools
🪛 Markdownlint (0.37.0)

51-51: Column: 1
Hard tabs

(MD010, no-hard-tabs)


52-52: Column: 1
Hard tabs

(MD010, no-hard-tabs)


53-53: Column: 1
Hard tabs

(MD010, no-hard-tabs)


67-115: Enhance data preparation example

  1. Add error handling examples for common scenarios (e.g., missing files, incorrect dimensions).
  2. Consider adding validation for property values.
  3. Include examples of handling multiple properties simultaneously.

Here's an enhanced version of the code with error handling:

import dpdata
import numpy as np
from dpdata.data_type import Axis, DataType

def register_property(property_name: str, task_dim: int):
    try:
        # Validate inputs
        if not isinstance(task_dim, int) or task_dim <= 0:
            raise ValueError(f"task_dim must be a positive integer, got {task_dim}")
        
        datatypes = [
            DataType(
                property_name,
                np.ndarray,
                shape=(Axis.NFRAMES, task_dim),
                required=False,
            ),
        ]
        # ... rest of the registration code ...
        
        return True
    except Exception as e:
        print(f"Error registering property: {e}")
        return False

def prepare_training_data(poscar_path: str, property_values: np.ndarray):
    try:
        # Validate property values
        if property_values.shape[1] != task_dim:
            raise ValueError(
                f"Property values must have shape (n, {task_dim}), "
                f"got {property_values.shape}"
            )
        
        ls = dpdata.MultiSystems()
        frame = dpdata.System(poscar_path, fmt="vasp/poscar")
        # ... rest of the preparation code ...
        
        return ls
    except FileNotFoundError:
        print(f"POSCAR file not found: {poscar_path}")
    except Exception as e:
        print(f"Error preparing training data: {e}")
    return None

135-149: Add language specification to code blocks and enhance output examples

  1. Add language specification to the output example:
-```
+```text
 # step        mae_val     mae_trn   lr
  1. Consider adding examples of:
    • How to interpret the metrics
    • Common convergence patterns
    • Warning signs of overfitting/underfitting
🧰 Tools
🪛 Markdownlint (0.37.0)

135-135: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


185-194: Enhance data normalization documentation

Consider adding:

  1. Practical examples showing how normalization affects the training process
  2. Guidelines for when normalization is particularly important
  3. Code examples showing how to access and interpret the normalization parameters

Example addition:

# Example of accessing normalization parameters
import numpy as np

def analyze_normalization(property_values):
    mean = np.mean(property_values)
    std = np.std(property_values)
    print(f"Property statistics:")
    print(f"- Mean: {mean:.6f}")
    print(f"- Std: {std:.6f}")
    print(f"- Range: [{np.min(property_values):.6f}, {np.max(property_values):.6f}]")
    
    # Guidelines
    if std < 1e-6:
        print("Warning: Very small standard deviation might cause numerical issues")
    if abs(mean) > 1e3:
        print("Warning: Large mean values might need pre-normalization")
🧰 Tools
🪛 LanguageTool

[uncategorized] ~193-~193: Loose punctuation mark.
Context: ...(D_i))= MLP(D_i) * std+ mean$$ 1. std: The standard deviation of the property ...

(UNLIKELY_OPENING_PUNCTUATION)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b473b6a and 6638ba6.

📒 Files selected for processing (1)
  • doc/model/train-fitting-property.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
doc/model/train-fitting-property.md

[uncategorized] ~58-~58: Loose punctuation mark.
Context: ...distinction from ener mode. - metric: The metric for display, which will be p...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~59-~59: Loose punctuation mark.
Context: ..., 'mae', 'mse' and 'rmse'. - loss_func: The loss function to minimize, you can ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~193-~193: Loose punctuation mark.
Context: ...(D_i))= MLP(D_i) * std+ mean$$ 1. std: The standard deviation of the property ...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 Markdownlint (0.37.0)
doc/model/train-fitting-property.md

27-27: Column: 1
Hard tabs

(MD010, no-hard-tabs)


31-31: Column: 1
Hard tabs

(MD010, no-hard-tabs)


32-32: Column: 1
Hard tabs

(MD010, no-hard-tabs)


33-33: Column: 1
Hard tabs

(MD010, no-hard-tabs)


51-51: Column: 1
Hard tabs

(MD010, no-hard-tabs)


52-52: Column: 1
Hard tabs

(MD010, no-hard-tabs)


53-53: Column: 1
Hard tabs

(MD010, no-hard-tabs)


135-135: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


172-172: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


178-178: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Dec 25, 2024
Merged via the queue into deepmodeling:devel with commit f9f1759 Dec 25, 2024
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants