Add AgentSet.groupby #2220

quaquel · 2024-08-17T15:23:43Z

This PR adds pandas style groupby to AgentSet (a first draft for this was #2092, but this PR has stalled). It also adds, as in pandas, a GroupBy helper class to enable method chaining. The resulting API is illustrated below.

Key Changes

adds a group_by method to AgentSet; this method enables pandas style group_by operations on the agents in the set.
adds a GroupBy helper class; this helper class is returned by the new group_by method and, as in pandas, has a few methods to enable method chaining in a split, apply, combine style syntax.

usage examples

# eppstein civil violence
# will give you a dict with the number of agents for each condition
# the result_type optional kwarg allows finer control over the group data structure
# which enhances performance if you don't need the fancy stuff in agentset
# map returns a dict with group_name and the return of whatever function is applied to each group
some_agentset.groupby("condition", result_type="list").map(len)

# random activation by type
groups = some_agentset.groupby(type)
for group, agents in groups: # we can iterate over the groups
    agents.shuffle(inplace=True).do("step")

# we can easily get the number of groups
len(groups)

#  we can get a dict with the group_name and the groups
groups.groups

# one liner random activation by type
# do returns the underlying agent set
# note how do and appply can take either a callable or a string analogous to AgentSet.do
some_agentset.groupby(type).do("shuffle", inplace=True).do("step")

github-actions · 2024-08-17T15:28:46Z

Performance benchmarks:

Model	Size	Init time [95% CI]	Run time [95% CI]
Schelling	small	🔴 +6.1% [+5.8%, +6.5%]	🔵 -0.3% [-0.5%, -0.1%]
Schelling	large	🔴 +5.8% [+5.1%, +6.5%]	🔵 +0.3% [-0.3%, +0.9%]
WolfSheep	small	🔵 +3.5% [+2.3%, +4.7%]	🔵 +2.4% [+2.1%, +2.8%]
WolfSheep	large	🔴 +3.7% [+3.3%, +4.0%]	🔵 +1.2% [+0.8%, +1.6%]
BoidFlockers	small	🔵 +2.9% [+2.2%, +3.6%]	🔵 +1.9% [+1.3%, +2.7%]
BoidFlockers	large	🔵 +3.2% [+2.8%, +3.7%]	🔵 +2.0% [+1.3%, +2.7%]

EwoutH · 2024-08-19T17:52:20Z

specifically enable user choice between agentsets and lists. Lists are faster but should be used with care

It seems that this is a recurring issue.

I have no idea if something like the following is possible in Python, but ideally you would have a fast construct that some function can accept and some functions can output aside from the agentset. As soon as you match or chain two of such functions, they recognize they can take the "fast path" and avoid compiling an recompiling an agentset.

But that's on a high level, no idea if it's possible/feasible to implement in our case in Python.

EwoutH · 2024-08-19T18:09:38Z

Apparently this is a concept called "Deferred Execution". Basically, you can do something like this:

Instead if directly performing each operation, you chain all operations that need to be performed in an iterator
Once you have gathered all operations, you check for each out if they support the fast data construct as both input and output
If there are any matches, flip a boolean for those operations
They you execute them as the original chain

You need some wrapper and/or decorator magic to make this work, and it adds a lot of code complexity. But then it could prevent unnecessary conversions.

It's kind of related to the lazy execution concept, but now you're able to look forward instead of only backwards.

quaquel · 2024-08-19T20:24:51Z

It seems that this is a recurring issue.

Yes it is

Apparently this is a concept called "Deferred Execution". Basically, you can do something like this:

I had a look but this seems a lot of trouble to get working. You need to use reflection to figure out all calls and then complex logic to figure out what do to. The short term and easier solution is to allow users to control this explicitly. With the inplace argument on for example shuffle we are already doing so. It is just a matter of documenting it carefully and making clear that it can be beneficial to profile this carefully.

quaquel · 2024-08-19T20:27:33Z

This is ready for review. All tests are passing and the docstring is in place. The list/agentset stuff has been raised by Ewout and I believe giving users control is the least worst solution at the moment.

What remains is the split apply combine stuff. Split/groupby is there. Apply is available on the GroupBy helper class. I am less sure on the need and desirability of an explicit control for combine. In pandas this is needed because everything needs to be transformed back to a DataFrame/Series. We don't have that need in MESA.

EwoutH · 2024-08-19T20:29:52Z

Thanks a lot for this. Will take a close look tomorrow. If you have the chance, could you maybe add a few more user API examples in the PR start?

github-actions · 2024-08-19T20:29:57Z

Performance benchmarks:

Model	Size	Init time [95% CI]	Run time [95% CI]
Schelling	small	🔵 -2.5% [-2.8%, -2.3%]	🔵 +0.6% [+0.4%, +0.8%]
Schelling	large	🔵 -2.6% [-2.9%, -2.3%]	🔵 +2.6% [+1.2%, +4.2%]
WolfSheep	small	🔵 -3.4% [-4.2%, -2.6%]	🔵 +0.5% [+0.2%, +0.8%]
WolfSheep	large	🟢 -3.6% [-3.8%, -3.4%]	🔵 -0.2% [-0.5%, +0.1%]
BoidFlockers	small	🟢 -6.9% [-7.5%, -6.2%]	🔵 -1.4% [-2.0%, -0.7%]
BoidFlockers	large	🟢 -6.4% [-6.9%, -5.9%]	🔵 -0.9% [-1.4%, -0.5%]

EwoutH · 2024-08-19T20:54:47Z

mesa/agent.py

+        """Apply the specified callable to each group
+
+        Args:
+            callable (Callable): The callable to apply to each group, it will be called with the group as first argument


Would it be useful to also pass the group key here?

No. Apply in pandas goes over all groups and if you want to do it on a single group you can just do

groupes = some_agentset.groupby("condition") callable(groupes.get_group["quiet"])

EwoutH · 2024-08-19T21:02:31Z

On initial look this is a clean and solid start, especially API wise. I don't expect much work to be needed, we can extend what we find missing in later PRs.

Tomorrow I will play with some models to get a real feel for this function.

quaquel · 2024-08-20T06:10:07Z

On initial look this is a clean and solid start, especially API wise. I don't expect much work to be needed, we can extend what we find missing in later PRs.

I came up with a couple of simple additions to GroupBy.apply. First, it now only takes a callable, this should be changed to a callable or string so you can call methods on the underlying agentset if desired. Next, it might be nice to control the return type to be either the return of the callable/str or the GroupBy instance itself (as we do in AgentSet in places). This allows for chaining like

# random activation by type
some_agentset.groupby(type).apply("shuffle", returntype="groupby", inplace=true).apply("do")

EwoutH · 2024-08-20T06:39:13Z

Thanks. I think the best to keep group_by properly scoped is to have a few very clear use cases. Data collection (with aggregation) is clearly one, RandomActivationByType could be another one.

What I find weird about that scheduler by the way, is that is shuffles the order of the agent groups and the agents within the group both by default:

mesa/mesa/time.py

Line 349 in 3cf1b76

    
           def step(self, shuffle_types: bool = True, shuffle_agents: bool = True) -> None:

Which means the same group can go twice in a row, etc. Not a weird option to have, but strange by default.

Luckily the AgentSet is way more flexible. A shuffle_groups() method might be useful in GroupBy if you want to replicate this behavior.

But all based on clear use cases.

What would be interesting is to see if it allows us to make our current examples better or easier. For example, can something in Game of Life be improved with group by? Or in Boltzmann wealth?

mesa/agent.py

Corvince · 2024-08-20T10:29:01Z

I think this PR adds a nice feature, but I would advice to carefully reconsider the return types. Currently the flow is like the following, if we consider all possible return types (including the ones proposed for apply)

agentset.group_by("foo").apply(bar)
AgentSet -> GroupBy[str, AgentSet | list] -> Dict[str, Any] | GroupBy[str, AgentSet | list]

It took me quite some time to actually figure this out, because you really have to think through what can be returned by each function. Compare this to Pandas

dataframe.groupby("foo").apply(bar)
DataFrame -> DataFrameGroupBy[str, DataFrame] -> DataFrame

I would recommend following a similar path and staying as close to our AgentSet as possible. I think defining return types is also a bit unpythonic. Instead of agentset.groupby(foo, response_type="list") it is more pythonic to just use list(agentset.groupby(foo).

Also please note the different spelling of group_by (this PR) and groupby (pandas)

quaquel · 2024-08-20T15:06:08Z

I think this PR adds a nice feature, but I would advice to carefully reconsider the return types. Currently the flow is like the following, if we consider all possible return types (including the ones proposed for apply)

I have been somewhat concerned about this as well, but you drive home the importance of sorting it out. Its even more confusing then I thought.

I think defining return types is also a bit unpythonic. Instead of agentset.groupby(foo, response_type="list") it is more pythonic to just use list(agentset.groupby(foo).

Here I disagree somewhat. Pandas has control for returntype in various places E.g. DataFrame.apply has a way of specifying the result_type. We also do some of it in AgentSet already (i.e., AgentSet.do). The main problem is that in pandas, groupby will allways go back to a dataframe. I don't think there is such a obvious default in our case.

One possible solution is to only add group_by (renamed as @Corvince points out to groupby) and don't add the GroupBy class with the apply function.

rht · 2024-08-21T06:50:31Z

One possible solution is to only add group_by (renamed as @Corvince points out to groupby) and don't add the GroupBy class with the apply function.

I thought this was because Polars had decided to use group_by instead of groupby. In dplyr, it is group_by. I'm not sure of Polars' reasoning, but it seems to be for consistency with dplyr and that Polars is a cross-language library. I'm fine with either way.

mesa/agent.py

for more information, see https://pre-commit.ci

quaquel · 2024-08-21T16:58:38Z

I have added __len__, seperated apply and do, and removed get_groups. Unittests have also been updated. This is ready for a last review.

quaquel · 2024-08-22T08:42:56Z

Based on conversation in #2237, changed GroupBy.apply to GroupBy.map.

EwoutH · 2024-08-22T10:25:45Z

Thanks for the work, both this PR and the do split ended up great!

Adds Agentset.groupby and Groupby Helper class

quaquel added 5 commits January 31, 2024 18:59

further updates

d90b0f7

Update benchmarks/WolfSheep/__init__.py

9586490

Merge remote-tracking branch 'upstream/main'

4aaa35d

Merge remote-tracking branch 'upstream/main'

d31478c

Merge remote-tracking branch 'upstream/main'

6e4c72e

quaquel added the enhancement Release notes label label Aug 17, 2024

EwoutH added feature Release notes label and removed enhancement Release notes label labels Aug 17, 2024

Merge remote-tracking branch 'upstream/main'

70fbaf5

quaquel mentioned this pull request Aug 18, 2024

Boltzman performance issue #2224

Closed

quaquel force-pushed the agentset_group_by branch from 71b414a to c90a6ea Compare August 19, 2024 14:35

quaquel marked this pull request as ready for review August 19, 2024 20:25

quaquel requested review from EwoutH and rht August 19, 2024 20:28

EwoutH reviewed Aug 19, 2024

View reviewed changes

rht reviewed Aug 20, 2024

View reviewed changes

mesa/agent.py Show resolved Hide resolved

rht reviewed Aug 21, 2024

View reviewed changes

mesa/agent.py Outdated Show resolved Hide resolved

quaquel and others added 13 commits August 21, 2024 15:46

Update test_agent.py

e9be4cc

[pre-commit.ci] auto fixes from pre-commit.com hooks

6a329bd

for more information, see https://pre-commit.ci

ruff fixes

16d75c5

[pre-commit.ci] auto fixes from pre-commit.com hooks

e09f5a2

for more information, see https://pre-commit.ci

additional docs

a4d9559

add test for callable

7db02b8

[pre-commit.ci] auto fixes from pre-commit.com hooks

8aa56da

for more information, see https://pre-commit.ci

rename groupby and additional docs

943a907

Update agent.py

9fc7963

add str option to GroupBy.apply

c481281

[pre-commit.ci] auto fixes from pre-commit.com hooks

585025b

for more information, see https://pre-commit.ci

remove unnecessary list comprehension

f0dccf6

seperate apply and do

eaf0a5b

quaquel force-pushed the agentset_group_by branch from 332645b to eaf0a5b Compare August 21, 2024 13:46

quaquel and others added 3 commits August 21, 2024 15:48

remove unintened update to __init__.py for wolfsheep benchmark

7c3a78f

additional tests

a25dd06

[pre-commit.ci] auto fixes from pre-commit.com hooks

5544cba

for more information, see https://pre-commit.ci

quaquel changed the title ~~Add AgentSet.group_by~~ Add AgentSet.groupby Aug 21, 2024

quaquel mentioned this pull request Aug 21, 2024

Split AgentSet into map and do to seperate return types #2237

Merged

Change apply to map

e075bcb

also update tests

c72aa37

Corvince approved these changes Aug 22, 2024

View reviewed changes

quaquel merged commit a48dd80 into projectmesa:main Aug 22, 2024
11 of 12 checks passed

quaquel mentioned this pull request Aug 22, 2024

add group_by method to AgentSet for attribute-based grouping #2092

Closed

EwoutH mentioned this pull request Aug 29, 2024

AgentSet: Allow selecting a fraction of agents in the AgentSet #2253

Merged

EwoutH pushed a commit to EwoutH/mesa that referenced this pull request Sep 24, 2024

Add AgentSet.groupby (projectmesa#2220)

d721576

Adds Agentset.groupby and Groupby Helper class

quaquel deleted the agentset_group_by branch November 4, 2024 19:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add AgentSet.groupby #2220

Add AgentSet.groupby #2220

quaquel commented Aug 17, 2024 •

edited

Loading

github-actions bot commented Aug 17, 2024

EwoutH commented Aug 19, 2024

EwoutH commented Aug 19, 2024 •

edited

Loading

quaquel commented Aug 19, 2024

quaquel commented Aug 19, 2024

EwoutH commented Aug 19, 2024

github-actions bot commented Aug 19, 2024

EwoutH Aug 19, 2024

quaquel Aug 20, 2024

EwoutH commented Aug 19, 2024

quaquel commented Aug 20, 2024

EwoutH commented Aug 20, 2024 •

edited

Loading

Corvince commented Aug 20, 2024

quaquel commented Aug 20, 2024

rht commented Aug 21, 2024

quaquel commented Aug 21, 2024

quaquel commented Aug 22, 2024

EwoutH commented Aug 22, 2024 •

edited

Loading

Add AgentSet.groupby #2220

Add AgentSet.groupby #2220

Conversation

quaquel commented Aug 17, 2024 • edited Loading

github-actions bot commented Aug 17, 2024

EwoutH commented Aug 19, 2024

EwoutH commented Aug 19, 2024 • edited Loading

quaquel commented Aug 19, 2024

quaquel commented Aug 19, 2024

EwoutH commented Aug 19, 2024

github-actions bot commented Aug 19, 2024

EwoutH Aug 19, 2024

Choose a reason for hiding this comment

quaquel Aug 20, 2024

Choose a reason for hiding this comment

EwoutH commented Aug 19, 2024

quaquel commented Aug 20, 2024

EwoutH commented Aug 20, 2024 • edited Loading

Corvince commented Aug 20, 2024

quaquel commented Aug 20, 2024

rht commented Aug 21, 2024

quaquel commented Aug 21, 2024

quaquel commented Aug 22, 2024

EwoutH commented Aug 22, 2024 • edited Loading

quaquel commented Aug 17, 2024 •

edited

Loading

EwoutH commented Aug 19, 2024 •

edited

Loading

EwoutH commented Aug 20, 2024 •

edited

Loading

EwoutH commented Aug 22, 2024 •

edited

Loading