Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default values and missing value handling to agentset.get #2279

Merged
merged 31 commits into from
Sep 6, 2024

Conversation

quaquel
Copy link
Member

@quaquel quaquel commented Sep 4, 2024

Currently, agentset.get raises an attribute error if the attribute does not exist on an agent.

This PR enhances the agentset.get method by introducing two new parameters: handle_missing and default_value. These additions give more control over how missing attributes are handled.

  • handle_missing: Determines the behavior when an attribute is missing. Options are:

    • 'error' (default): Raises an AttributeError.
    • 'default': Returns the specified default_value if the attribute is missing.
  • default_value: Used when handle_missing='default', providing a fallback value when an attribute is missing.

Usage examples:

# Normal usage, raises AttributeError if "unknown_attribute" doesn't exist
agentset.get("unknown_attribute")

# Use a default value for missing attributes
agentset.get("unknown_attribute", handle_missing="default", default_value=True)

This PR supersedes #2067 and closes #2045.

@quaquel quaquel requested a review from EwoutH September 4, 2024 13:13
@quaquel quaquel added the enhancement Release notes label label Sep 4, 2024
Copy link

github-actions bot commented Sep 4, 2024

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 -3.4% [-4.4%, -2.3%] 🔵 -0.4% [-0.9%, -0.0%]
BoltzmannWealth large 🔵 -0.4% [-2.1%, +1.2%] 🔵 +3.9% [-0.1%, +7.5%]
Schelling small 🔵 +0.2% [-0.7%, +1.2%] 🔵 -0.3% [-1.7%, +0.9%]
Schelling large 🔵 -1.0% [-3.6%, +0.8%] 🔵 -3.1% [-7.6%, +0.3%]
WolfSheep small 🔵 -0.8% [-2.3%, +0.8%] 🔵 -2.9% [-4.6%, -1.2%]
WolfSheep large 🔵 +3.0% [-0.3%, +6.6%] 🔴 +9.4% [+4.0%, +15.5%]
BoidFlockers small 🔵 +0.8% [-0.2%, +2.3%] 🔵 -2.7% [-3.4%, -2.0%]
BoidFlockers large 🔵 -0.4% [-1.5%, +0.5%] 🟢 -4.2% [-4.9%, -3.5%]

mesa/agent.py Outdated
@@ -318,12 +318,16 @@ def agg(self, attribute: str, func: Callable) -> Any:
values = self.get(attribute)
return func(values)

def get(self, attr_names: str | list[str]) -> list[Any]:
def get(self, *args) -> list[Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me guess, default=None wouldn't work, because then you can't assign None as default value. As would True, False, etc.

What I don't like about this is that now type hinting and IDE lookups are gone. What about

Suggested change
def get(self, *args) -> list[Any]:
def get(self, attr_names: str | list[str], default: Any = "throw_error") -> list[Any]:

Can't imagine someone wants the exact string "throw_error". For any other value than that you can assign it as the default.

Copy link
Member

@EwoutH EwoutH Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks promising:

# Sentinel object to detect when no default value is provided
NO_DEFAULT = object()

class AgentSet:
    def __init__(self, agents):
        self._agents = agents

    def get(self, attr_names: Union[str, List[str]], default: Any = NO_DEFAULT) -> Union[List[Any], List[List[Any]]]:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are correct. I'll take a closer look at your alternative suggestions.

@EwoutH
Copy link
Member

EwoutH commented Sep 4, 2024

But I can also see the scenario where you just don't want to collect it. Just skip it.

So I think we need a third parameter, something like handle_missing, which can be "error", "skip", and "default".

You would get something like this:

    def get(self, attr_names: Union[str, List[str]], handle_missing: str = "error", default: Any = None) -> Union[List[Any], List[List[Any]]]:
        """
        Retrieve the specified attribute(s) from each agent in the AgentSet.

        Args:
            attr_names (str | list[str]): The name(s) of the attribute(s) to retrieve from each agent.
            handle_missing (str, optional): How to handle missing attributes. Can be 'error' (default), 
                                            'skip' to ignore missing attributes, or 'default' to return 
                                            the specified default value.
            default (Any, optional): The default value to return if 'handle_missing' is set to 'default'
                                     and the agent does not have the attribute.

        Returns:
            list[Any]: A list of attribute values for each agent if attr_names is a str.
            list[list[Any]]: A list of lists of attribute values for each agent if attr_names is a list of str.

        Raises:
            AttributeError: If 'handle_missing' is 'error' and the agent does not have the specified attribute(s).
        """

        def get_attr(agent, attr_name: str) -> Any:
            """ Helper function to get the attribute based on the handle_missing policy """
            try:
                return getattr(agent, attr_name)
            except AttributeError:
                if handle_missing == "error":
                    raise
                elif handle_missing == "default":
                    return default
                elif handle_missing == "skip":
                    return None  # Will be filtered out later
                else:
                    raise ValueError(f"Unknown handle_missing option: {handle_missing}")

        if isinstance(attr_names, str):
            # Single attribute, return a flat list, handling 'skip' by filtering out None values
            return [value for value in (get_attr(agent, attr_names) for agent in self._agents) if value is not None or handle_missing != "skip"]
        else:
            # Multiple attributes, return a list of lists, handling 'skip' by filtering out lists with None
            result = [
                [get_attr(agent, attr_name) for attr_name in attr_names]
                for agent in self._agents
            ]
            if handle_missing == "skip":
                result = [res for res in result if all(val is not None for val in res)]
            return result

Additional advantage: It now doesn't break models for users that have defined .get(attr_names="some_attr") with a keyword argument.

@EwoutH
Copy link
Member

EwoutH commented Sep 4, 2024

This is probably faster:

    def get(self, attr_names: Union[str, List[str]], handle_missing: str = "error", default: Any = None) -> Union[List[Any], List[List[Any]]]:
        """
        Retrieve the specified attribute(s) from each agent in the AgentSet.

        Args:
            attr_names (str | list[str]): The name(s) of the attribute(s) to retrieve from each agent.
            handle_missing (str, optional): How to handle missing attributes. Can be 'error' (default), 
                                            'skip' to ignore missing attributes, or 'default' to return 
                                            the specified default value.
            default (Any, optional): The default value to return if 'handle_missing' is set to 'default'
                                     and the agent does not have the attribute.

        Returns:
            list[Any]: A list of attribute values for each agent if attr_names is a str.
            list[list[Any]]: A list of lists of attribute values for each agent if attr_names is a list of str.

        Raises:
            AttributeError: If 'handle_missing' is 'error' and the agent does not have the specified attribute(s).
        """

        # Check if the attr_names is a single string or a list of attributes.
        is_single_attr = isinstance(attr_names, str)

        # Branch earlier based on the `handle_missing` option to avoid repeated checks
        match handle_missing:
            case "error":
                if is_single_attr:
                    return [self._get_or_raise(agent, attr_names) for agent in self._agents]
                else:
                    return [[self._get_or_raise(agent, attr) for attr in attr_names] for agent in self._agents]

            case "default":
                if is_single_attr:
                    return [self._get_with_default(agent, attr_names, default) for agent in self._agents]
                else:
                    return [[self._get_with_default(agent, attr, default) for attr in attr_names] for agent in self._agents]

            case "skip":
                if is_single_attr:
                    return [val for agent in self._agents if (val := self._get_skip(agent, attr_names)) is not None]
                else:
                    return [
                        [val for attr in attr_names if (val := self._get_skip(agent, attr)) is not None]
                        for agent in self._agents
                    ]

            case _:
                raise ValueError(f"Unknown handle_missing option: {handle_missing}")

    # Helper method to get an attribute or raise an error if it's missing
    def _get_or_raise(self, agent: Any, attr_name: str) -> Any:
        return getattr(agent, attr_name)

    # Helper method to get an attribute with a default value if missing
    def _get_with_default(self, agent: Any, attr_name: str, default: Any) -> Any:
        return getattr(agent, attr_name, default)

    # Helper method to skip missing attributes by returning None
    def _get_skip(self, agent: Any, attr_name: str) -> Any:
        try:
            return getattr(agent, attr_name)
        except AttributeError:
            return None

@EwoutH
Copy link
Member

EwoutH commented Sep 5, 2024

I dumped a lot of information yesterday. Can I help you move this forward?

@quaquel
Copy link
Member Author

quaquel commented Sep 5, 2024

I dumped a lot of information yesterday. Can I help you move this forward?

meetings all morning, hope to have time this afternoon.

Copy link
Member

@EwoutH EwoutH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like it. It's a bit verbose, but I think in this case the explicitness helps. There just isn't a good single-parameter that describes both well.

@EwoutH EwoutH changed the title Add a default value to agentset.get Add default values and missing value handling to agentset.get Sep 5, 2024
@EwoutH
Copy link
Member

EwoutH commented Sep 5, 2024

Updated the PR description and added a few more test cases, given the complicated branching structure we now have.

Copy link
Contributor

@Corvince Corvince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being annoying after 2 approvals already. But I don't think the "skip" part is handled correctly. If the attributes value is "None" it will be skipped, because the code does not differentiate between attribute value None and a caught AttrributeError

mesa/agent.py Outdated Show resolved Hide resolved
mesa/agent.py Outdated Show resolved Hide resolved
mesa/agent.py Outdated Show resolved Hide resolved
@quaquel
Copy link
Member Author

quaquel commented Sep 5, 2024

@Corvince no worries, I'll take a look tomorrow and update the code based on the suggestions.

quaquel and others added 2 commits September 5, 2024 22:41
Co-authored-by: Corvince <13568919+Corvince@users.noreply.github.com>
mesa/agent.py Outdated Show resolved Hide resolved
@EwoutH
Copy link
Member

EwoutH commented Sep 5, 2024

@Corvince thanks for the critical review. I trusted GPT-generated code a little too much, it remains a fine balance.

My code, my bad: I will clean it up tomorrow.

@EwoutH
Copy link
Member

EwoutH commented Sep 6, 2024

Fixed and updated the tests cases.

There are a few weird edge cases in here, due to all the possible combinations. For example:

    # Case 11: Skip handling when one attribute is completely missing from some agents
    values = agentset.get(["a", "c"], handle_missing="skip")
    assert all(5 in a for a in values)

returns: [[5, 8], [5, 8], [5, 8], [5, 8], [5, 8], [5], [5], [5], [5], [5]]

But when it's completely missing, it doesn't even include an empty list, but just no element at all. For example:

returns: [[5, 8], [5, 8], [5, 8], [5, 8], [5], [5]]
you might expect: [[5, 8], [5, 8], [5, 8], [5, 8], [], [], [], [], [5], [5]]

Things get complicated with multi-attribute lookups.

What I don't like - and why I propose different functions for get and get_multiple, it that the output formats is different.

Returns:
    list[Any]: A list of attribute values for each agent if attr_names is a str.
    list[list[Any]]: A list of lists of attribute values for each agent if attr_names is a list of str.

Because now you can't consistently trust get to return the same data structure - which gets really unpredictable once you do things like skip.

So I'm thinking about splitting it into two funtions: get and get_multiple (name can be discussed), such that each format has one, consistent output format.

Also in some cases returning dicts might be more useful. Especially because .agg already takes some of the "I want to quickly get a metric so let's get all agents values" load of .get.

@Corvince
Copy link
Contributor

Corvince commented Sep 6, 2024

Tricky! @EwoutH Whats the use case for the "skip" operation? Do we really need it? Even without the corner cases I don't like that you just get a list back with the values, but no way to associate it with specific agents. So you don't know which agents were skipped. For some workflows that doesn't matter, but for others its suboptimal.

I think we can potentially skip a lot of complexiness without "skip". So lets reevaluate its usefulness (you could still "skip" those agents by doing agentset.select(lambda a: hasattr(a, "foo")).get("foo")

Lets evaluate the APIs a bit more

agentset.get("foo", handle_missing="skip")
agentset.select(lambda a: hasattr(a, "foo")).get("foo")
[getattr(a, "foo") for a in agentset if hasattr(a, "foo")]

Hmm I have to say API wise the skip option wins. But how common is it?

@quaquel
Copy link
Member Author

quaquel commented Sep 6, 2024

I was also thinking about the issue this morning. I am still wondering whether we need skip. So, @EwoutH, you suggested it in the first place. Can you give some use cases?

Also, API-wise, without skip you could do something simple like

values = [entry for entry in some_agentset.get(some_list_of_attributes, default_value = np.NaN) if np.NaN not in set(entry)]

if you want to have skip-like behavior.

Removing skip makes the code much simpler and it was not in scope for the initial version of this PR, the PR that preceded it, nor the original issue that is being addressed by both PRs.

@EwoutH
Copy link
Member

EwoutH commented Sep 6, 2024

I actually don't think there are that many. It just seemed like a logical option to have.

More often it would be some placeholder value like 0, None or False.

Technically with type hinting you can no initialize variables without values:

class MyAgent(Agent):
    def __init__(self, unique_id: int):
        super().__init__(unique_id, model)
        self.name: str
        self.age: int

But I haven't seen it much in the wild.

Let's remove it until we have concrete use cases.

I'm still a proponent of splitting get into two methods, one for single-attribute and one for multi-attribute.

@quaquel
Copy link
Member Author

quaquel commented Sep 6, 2024

I'm still a proponent of splitting get into two methods, one for single-attribute and one for multi-attribute.

Why? Your original motivation was strongly tied to skip. Also, we might be better off finishing this PR (I'll try to take a closer look later this afternoon) and leaving the possible split for a separate PR.

@EwoutH
Copy link
Member

EwoutH commented Sep 6, 2024

What I don't like - and why I propose different functions for get and get_multiple, it that the output formats is different.

Returns:
    list[Any]: A list of attribute values for each agent if attr_names is a str.
    list[list[Any]]: A list of lists of attribute values for each agent if attr_names is a list of str.

So I'm thinking about splitting it into two funtions: get and get_multiple (name can be discussed), such that each format has one, consistent output format.

I just don't like functions to have different return formats.

we might be better off finishing this PR (I'll try to take a closer look later this afternoon) and leaving the possible split for a separate PR.

Agreed, and thanks

@quaquel
Copy link
Member Author

quaquel commented Sep 6, 2024

removed skip, updated tests, and slightly tweaked docstring. Should be good to go now.

Copy link
Contributor

@Corvince Corvince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I like that the API remains flexible enough to later (possibly) accommodate "skip".

I also don't like the different return types for single vs multiple attributes. But I don't have a better idea for now

@quaquel quaquel merged commit 8043bc0 into projectmesa:main Sep 6, 2024
11 of 12 checks passed
@quaquel quaquel deleted the default_value_get branch September 6, 2024 15:33
@EwoutH
Copy link
Member

EwoutH commented Sep 6, 2024

Thanks for the clean-up, it’s appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Release notes label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AgentSet.get is hardcoded to raise AttributeError for agents with missing attribute
4 participants