Add AgentSet.groupby #2220
It seems that this is a recurring issue. I have no idea whether something like the following is possible in Python, but ideally you would have a fast construct that some functions can accept and some functions can output alongside the agentset. As soon as you chain two such functions, they recognize that they can take the "fast path" and avoid compiling and recompiling an agentset. That is the high-level idea; I have no idea whether it is possible or feasible to implement in our case in Python.
Apparently this is a concept called "Deferred Execution". Basically, you can do something like this:
You need some wrapper and/or decorator magic to make this work, and it adds a lot of code complexity. But then it could prevent unnecessary conversions. It's kind of related to the lazy execution concept, but now you're able to look forward instead of only backwards.
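To make the deferred-execution idea concrete, here is a minimal sketch in plain Python. It is not Mesa code and not the snippet originally posted in this comment: `LazyAgents`, `select`, `map`, and `collect` are hypothetical names, and plain integers stand in for agents. Each chained call only records a step; the data is walked once, when `collect` is called, so no intermediate collection is built between steps.

```python
from typing import Callable, Iterable, Iterator


class LazyAgents:
    """Hypothetical wrapper that records operations instead of running them."""

    def __init__(self, source: Iterable, steps: tuple = ()):
        self._source = source
        self._steps = steps  # recorded (kind, function) pairs, not yet executed

    def select(self, predicate: Callable) -> "LazyAgents":
        # Record a filter step; nothing is evaluated here.
        return LazyAgents(self._source, self._steps + (("filter", predicate),))

    def map(self, func: Callable) -> "LazyAgents":
        # Record a transformation step; nothing is evaluated here.
        return LazyAgents(self._source, self._steps + (("map", func),))

    def __iter__(self) -> Iterator:
        # Build one chained iterator over the source from the recorded steps.
        stream = iter(self._source)
        for kind, func in self._steps:
            stream = filter(func, stream) if kind == "filter" else map(func, stream)
        return stream

    def collect(self) -> list:
        # The only place where a concrete collection is materialized.
        return list(self)


wealth = [3, 0, 7, 1, 0, 5]  # stand-in for agent attributes
pipeline = LazyAgents(wealth).select(lambda w: w > 0).map(lambda w: w * 2)
result = pipeline.collect()
```

This only covers the lazy, look-backwards half of the idea. Letting a function detect what will be chained after it, as described above, would need additional machinery.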
Yes it is
I had a look, but this seems like a lot of trouble to get working. You need to use reflection to figure out all calls and then complex logic to figure out what to do. The short-term and easier solution is to allow users to control this explicitly. With the
This is ready for review. All tests are passing and the docstring is in place. The list/agentset issue has been raised by Ewout, and I believe giving users control is the least bad solution at the moment. What remains is the split-apply-combine functionality. Split/groupby is there. Apply is available on the
Thanks a lot for this. Will take a close look tomorrow. If you have the chance, could you maybe add a few more user API examples at the start of the PR?
mesa/agent.py
    """Apply the specified callable to each group

    Args:
        callable (Callable): The callable to apply to each group, it will be called with the group as first argument
Would it be useful to also pass the group key here?
No. Apply in pandas goes over all groups, and if you want to apply the callable to a single group you can just do
groups = some_agentset.groupby("condition")
callable(groups.get_group("quiet"))
At first glance this is a clean and solid start, especially API-wise. I don't expect much work to be needed; we can extend what we find missing in later PRs. Tomorrow I will play with some models to get a real feel for this function.
I came up with a couple of simple additions to
# random activation by type
some_agentset.groupby(type).apply("shuffle", returntype="groupby", inplace=True).apply("do")
Thanks. I think the best way to keep group_by properly scoped is to have a few very clear use cases. Data collection (with aggregation) is clearly one, RandomActivationByType could be another one. What I find weird about that scheduler, by the way, is that it shuffles both the order of the agent groups and the agents within each group by default: Line 349 in 3cf1b76
This means the same group can go twice in a row, etc. Not a weird option to have, but strange as a default. Luckily the AgentSet is way more flexible. A But all based on clear use cases. What would be interesting is to see if it allows us to make our current examples better or easier. For example, can something in Game of Life be improved with group by? Or in Boltzmann wealth?
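The two shuffling choices discussed here can be separated explicitly. The following is a minimal sketch in plain Python, not the Mesa scheduler: `group_by_type`, `step_by_type`, and the `shuffle_types` / `shuffle_agents` flags are hypothetical names chosen for illustration. Agents are grouped by class, and the order of the groups and the order within each group are randomized independently.

```python
import random
from collections import defaultdict


class Agent:
    def __init__(self, unique_id, log):
        self.unique_id = unique_id
        self.log = log  # shared list recording the activation order

    def step(self):
        self.log.append((type(self).__name__, self.unique_id))


class Wolf(Agent):
    pass


class Sheep(Agent):
    pass


def group_by_type(agents):
    """Split agents into lists keyed by their class (insertion order kept)."""
    groups = defaultdict(list)
    for agent in agents:
        groups[type(agent)].append(agent)
    return dict(groups)


def step_by_type(agents, rng, shuffle_types=False, shuffle_agents=True):
    """Activate agents one type at a time; both shuffles are opt-in/opt-out."""
    groups = group_by_type(agents)
    agent_types = list(groups)
    if shuffle_types:
        rng.shuffle(agent_types)  # randomize the order of the groups
    for agent_type in agent_types:
        members = list(groups[agent_type])
        if shuffle_agents:
            rng.shuffle(members)  # randomize the order within a group
        for agent in members:
            agent.step()


log = []
agents = [Wolf(0, log), Sheep(1, log), Wolf(2, log), Sheep(3, log)]
step_by_type(agents, random.Random(42))
```

With `shuffle_types=False`, groups are activated in the order their type first appears, so only the order inside each group varies between steps.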
I think this PR adds a nice feature, but I would advise carefully reconsidering the return types. Currently the flow is like the following, if we consider all possible return types (including the ones proposed for apply):
agentset.group_by("foo").apply(bar)
AgentSet -> GroupBy[str, AgentSet | list] -> Dict[str, Any] | GroupBy[str, AgentSet | list]
It took me quite some time to actually figure this out, because you really have to think through what can be returned by each function. Compare this to Pandas:
dataframe.groupby("foo").apply(bar)
DataFrame -> DataFrameGroupBy[str, DataFrame] -> DataFrame
I would recommend following a similar path and staying as close to our AgentSet as possible. I think defining return types is also a bit unpythonic. Instead of Also please note the different spelling of
I have been somewhat concerned about this as well, but you drive home the importance of sorting it out. It's even more confusing than I thought.
Here I disagree somewhat. Pandas has control over the return type in various places, e.g., DataFrame.apply has a way of specifying the result_type. We also do some of it in AgentSet already (i.e., AgentSet.do). The main problem is that in pandas, groupby will always go back to a dataframe. I don't think there is such an obvious default in our case. One possible solution is to only add
I thought this was because Polars had decided to use
I have added
Based on conversation in #2237, changed
Thanks for the work, both this PR and the do split ended up great! |
Adds AgentSet.groupby and GroupBy helper class
This PR adds pandas-style groupby to AgentSet (a first draft for this was #2092, but that PR has stalled). It also adds, as in pandas, a GroupBy helper class to enable method chaining. The resulting API is illustrated below.
Key Changes
- group_by method to AgentSet; this method enables pandas-style group_by operations on the agents in the set.
- GroupBy helper class; this helper class is returned by the new group_by method and, as in pandas, has a few methods to enable method chaining in a split, apply, combine style syntax.
Usage examples
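The usage examples from the PR description are not preserved here. As a stand-in, the sketch below illustrates the split, apply, combine flow with toy classes in plain Python. It is not the Mesa implementation: `ToyAgent`, `ToyAgentSet`, and `ToyGroupBy` are hypothetical, and only the method names `group_by`, `get_group`, and `apply` are taken from the conversation above.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Union


class ToyAgent:
    def __init__(self, unique_id: int, state: str, wealth: int):
        self.unique_id = unique_id
        self.state = state
        self.wealth = wealth


class ToyGroupBy:
    """Hypothetical helper holding one agent set per group key."""

    def __init__(self, groups: Dict[Hashable, "ToyAgentSet"]):
        self.groups = groups

    def get_group(self, name: Hashable) -> "ToyAgentSet":
        # Retrieve a single group by its key.
        return self.groups[name]

    def apply(self, func: Callable) -> Dict[Hashable, Any]:
        # Call func with each group as first argument; combine into a dict.
        return {name: func(group) for name, group in self.groups.items()}


class ToyAgentSet:
    def __init__(self, agents):
        self.agents = list(agents)

    def __len__(self):
        return len(self.agents)

    def __iter__(self):
        return iter(self.agents)

    def group_by(self, by: Union[str, Callable]) -> ToyGroupBy:
        # Split: the key is an attribute name or a callable taking an agent.
        groups = defaultdict(list)
        for agent in self.agents:
            key = by(agent) if callable(by) else getattr(agent, by)
            groups[key].append(agent)
        return ToyGroupBy({k: ToyAgentSet(v) for k, v in groups.items()})


agents = ToyAgentSet(
    [ToyAgent(0, "quiet", 2), ToyAgent(1, "active", 5), ToyAgent(2, "quiet", 4)]
)
grouped = agents.group_by("state")
counts = grouped.apply(len)
total_wealth = grouped.apply(lambda group: sum(a.wealth for a in group))
quiet = grouped.get_group("quiet")
```

Here the flow is agent set, then group-by helper, then a plain dict keyed by group. Whether `apply` should instead hand back another group-by object or an agent set is the return-type question debated in the review comments.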