refactor: Remove dependence on model.schedule, add clock to Model #1942

Merged
merged 17 commits into projectmesa:main from rm_schedule
Jan 26, 2024

Conversation

rht
Contributor

@rht rht commented Jan 7, 2024

The current AgentSet API allows users to define the structure and order of agent actions without initializing a scheduler object. However, a scheduler object is still required, because the steps and time attributes were previously tracked only in the scheduler. This pull request removes that requirement by tracking steps and time directly in the model.

This creates a complication for users who still use a scheduler object: they would need to call model.advance_time() manually within the model's step() function. To resolve this, the pull request proposes a solution, drafted with assistance from ChatGPT, that injects model.advance_time() into the scheduler's step() function, so that scheduler-based models keep their clock in sync without users having to add this call themselves.

@quaquel
Member

quaquel commented Jan 7, 2024

The use of model.agents looks good to me.

One quick question: you also changed how time is handled. It was retrieved from the schedule, and now you track it internally. This is because you don't want to depend on schedule, which makes sense. However, from a conceptual point of view, is this the nicest, longer-term way of handling time?

@rht
Contributor Author

rht commented Jan 7, 2024

It was retrieved from the schedule, and now you track it internally.
This is because you don't want to depend on schedule, which makes sense. However, from a conceptual point of view, is this the nicest, longer-term way of handling time?

Thank you for raising the issue. In this PR, I track it within the datacollector object, which represents the observer's clock. This is not ideal, and I am still looking for a better solution. One alternative would be to define a method in mesa.Model:

def advance_time(self):
    self.time += 1

Then, in the user's model step():

def step(self):
    self.agents.shuffle().do("step")
    self.datacollector.collect(self)
    self.advance_time()

This is how it is done in abcEconomics.

@quaquel
Member

quaquel commented Jan 7, 2024

I think tying time to model.step is indeed the way to go. The simple solution would be to handle it through a super call or through some kind of annotation. To be discussed separately at some point.

@EwoutH
Member

EwoutH commented Jan 7, 2024

I think tying time to model.step is indeed the way to go.

At least by default. I think some concept of actual time would also be very useful. See #1912 (reply in thread)

@rht
Contributor Author

rht commented Jan 7, 2024

The simple solution would be to handle it through a super call or through some kind of annotation. To be discussed separately at some point.

I had considered super().step(). The problem is that it is very easy to forget, and the call does not directly convey that the time is being incremented.

At least by default. I think some concept of actual time would also be very useful. See #1912 (reply in thread)

There would be a ContinuousSpace equivalent for time, in which the system may advance by a float amount of time; in that case, steps indeed need to remain separate from time.

@rht
Contributor Author

rht commented Jan 7, 2024

Regarding the name: advance_time is more general than increment_time or step_time, since the latter two are specific to discrete time steps. At the same time, the name also needs to cover incrementing the model's step count, so advance_time alone is not sufficient.

@rht
Contributor Author

rht commented Jan 7, 2024

To be pedantic, step_and_advance_time would be the term for incrementing both time and steps.

@quaquel
Member

quaquel commented Jan 7, 2024

There would be the ContinuousSpace equivalent of time in that the system may advance a float-type amount of time

In my view, the moment you allow for this, you should go all the way and have a discrete-event-style event list at the heart of everything. If you want traditional ABM behavior, you just schedule evenly spaced events, each of which is a call to model.step. If you want full-blown discrete-event behavior, you can schedule events (i.e., a combination of a time instant and a callable) at any other, non-discrete time instant. In fact, you can simply hybridize the two by allowing both. Time is then held by the event list.

Building on @rht's point about advance_time, increment_time, and step_time: with an event list, all you would need is advance, which means going to the next scheduled event and updating the time to that event's time. So the entire naming problem simply disappears.

No idea what a resulting clear API could look like.
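To make the event-list idea more concrete, here is a minimal sketch in plain Python, assuming events are (time, callable) pairs kept in a heapq priority queue. The EventList class and its schedule, advance, and run_until methods are hypothetical names for illustration, not an existing Mesa API. It shows the hybrid case: evenly spaced "model.step" events plus one event at a non-integer time.

```python
import heapq
import itertools


class EventList:
    """Hypothetical event list: events are (time, callable) pairs kept in a heap."""

    def __init__(self):
        self.time = 0.0
        self._queue = []
        # Tie-breaker so events with equal times run in insertion order
        self._counter = itertools.count()

    def schedule(self, time, callback):
        heapq.heappush(self._queue, (time, next(self._counter), callback))

    def advance(self):
        """Go to the next scheduled event and set time to that event's time."""
        time, _, callback = heapq.heappop(self._queue)
        self.time = time
        callback()

    def run_until(self, end_time):
        # Process every event scheduled at or before end_time
        while self._queue and self._queue[0][0] <= end_time:
            self.advance()


# Hybrid use: evenly spaced "model.step" events plus one irregular event
log = []
events = EventList()
for t in (1.0, 2.0, 3.0):
    events.schedule(t, lambda t=t: log.append(("step", t)))
events.schedule(2.5, lambda: log.append(("event", 2.5)))
events.run_until(3.0)
print(log)  # [('step', 1.0), ('step', 2.0), ('event', 2.5), ('step', 3.0)]
```

In this sketch advance is the only operation that moves time forward, which is why the advance_time vs. step_time naming question does not arise.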

@EwoutH
Member

EwoutH commented Jan 7, 2024

Another thought I had was to give Model.step() a timestep argument. The default would be timestep=1, but you could change it (once, or every step if you want).

This could integrate the advance time part into the step, right?
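A minimal sketch of what a timestep argument could look like, assuming subclasses forward it via super().step(timestep). The Model class here is a hypothetical stand-in, not mesa.Model.

```python
class Model:
    """Hypothetical base class; not Mesa's actual Model."""

    def __init__(self):
        self.steps = 0
        self.time = 0

    def step(self, timestep=1):
        # Every call counts as one step; time advances by timestep
        self.steps += 1
        self.time += timestep


class MyModel(Model):
    def step(self, timestep=1):
        # ... agent activation would go here ...
        super().step(timestep)


model = MyModel()
model.step()
model.step(timestep=0.5)
print(model.steps, model.time)  # 2 1.5
```

Steps and time diverge as soon as timestep differs from 1, so both counters still need to be kept.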

@quaquel
Member

quaquel commented Jan 7, 2024

There is a way to track the number of calls to model.step without relying on super. It involves the use of metaclasses.

from functools import wraps


def count_calls(func):
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        # Increment the instance counter, creating it on the first call
        counter = getattr(self, "time", 0)
        setattr(self, "time", counter + 1)
        return func(self, *args, **kwargs)

    wrapper._is_count_call_wrapper = True
    return wrapper


class CountStep(type):
    def __new__(cls, name, bases, attrs):
        # Skip the base Model class itself (it has no bases) and wrap
        # step() on every subclass that defines one. Comparing against
        # Model.__name__ here would raise NameError, because Model does
        # not exist yet while it is being created.
        if bases and "step" in attrs:
            attrs["step"] = count_calls(attrs["step"])
        return super().__new__(cls, name, bases, attrs)


class Model(metaclass=CountStep):
    pass


class MyModel(Model):
    def step(self):
        print(self.time)

If we run this

model = MyModel()
for _ in range(10):
    model.step()

we nicely get 1 ... 10.

@quaquel
Member

quaquel commented Jan 7, 2024

Another thought I have was giving Model.step() a timestep argument. Default would be timestep=1, but you can change that (once or every step if you want).

This could integrate the advance time part into the step, right?

Not sure how to read something like this. What would it mean if you say step(timestep=3.1415)? Do you mean to take the current time and add the timestep to it and execute all events scheduled between the current time and the new endtime? Or something else?

@rht
Contributor Author

rht commented Jan 7, 2024

(I'm probably digressing too much here on DES...)

Considering #1912 (reply in thread)

In case of ABM, there are only fixed time intervals, or ticks, between sets of events.
MESA lacks an eventlist. Instead, it is up to the user to advance the eventlist by one tick at a time by calling model.step().

I'm drawing an example from the Eurace@Unibi model, one of the most elaborate macroeconomic models that has existed.

An excerpt from the paper on Eurace@Unibi:

Concerning the activation of agents, the actions can be calendar-based (time-driven) or event-based, where the former can follow either subjective or objective time schedules (agent-time vs. clock-time). Furthermore, the economic activities take place on a hierarchy of time-scales: yearly, monthly, weekly and daily activities all take place following the calendar-time or subjective agent-time. Agents are activated asynchronously according to their subjective time schedules that are anchored on an individual activation day. These activation days are uniformly and randomly distributed among the agents at the start of the simulation but may change endogenously.

To model even a reduced version of Eurace@Unibi in Mesa for pedagogical purposes, Mesa would need to be extended to describe events.
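A simplified, calendar-based sketch of the activation scheme in the excerpt. It is an illustration under assumptions, not how Eurace@Unibi is implemented: it covers only monthly activities, assumes a fixed 20-day month, and keeps each agent's activation day fixed even though the paper says activation days may change endogenously. The Agent class and run function are hypothetical.

```python
import random


class Agent:
    """Hypothetical agent with a fixed individual activation day."""

    def __init__(self, unique_id, days_per_month, rng):
        self.unique_id = unique_id
        # Drawn uniformly at random at the start of the simulation
        self.activation_day = rng.randrange(days_per_month)
        self.activations = []

    def monthly_activity(self, day):
        self.activations.append(day)


def run(n_agents=5, days_per_month=20, n_months=3, seed=42):
    rng = random.Random(seed)
    agents = [Agent(i, days_per_month, rng) for i in range(n_agents)]
    for day in range(days_per_month * n_months):
        # Only agents anchored on today's day of the month are activated
        for agent in agents:
            if agent.activation_day == day % days_per_month:
                agent.monthly_activity(day)
    return agents


agents = run()
for agent in agents:
    print(agent.unique_id, agent.activation_day, agent.activations)
```

Each agent acts once per month on its own day, so activity is spread asynchronously over the month instead of happening in lockstep.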

@quaquel
Member

quaquel commented Jan 7, 2024

Colleagues of mine have been doing pandemic modeling with models involving between 150 thousand and 25 million agents. The only way to make this computationally feasible was to switch from calendar-based to event-based activation. So at some point, figuring out how to support this in MESA would be great.

In the meantime, however, there is still the issue of tracking the time of the simulation. Would it make sense to do it along the lines of the metaclass example I have given above?

@rht
Contributor Author

rht commented Jan 7, 2024

In the meantime, however, there is still the issue of tracking the time of the simulation. Would it make sense to do it along the lines of the metaclass example I have given above?

While it is more convenient for the user, I find the implementation too complex for readers of the Mesa code. The library code needs to be simple enough that nobody has to spend effort deciphering an implementation of what amounts to tracking time automatically.

@rht
Contributor Author

rht commented Jan 7, 2024

Not sure how to read something like this. What would it mean if you say step(timestep=3.1415)? Do you mean to take the current time and add the timestep to it and execute all events scheduled between the current time and the new endtime? Or something else?

I see it as a step period that is not necessarily an integer. All events that happen within the step are sorted by their activation times and executed in that order. For example, if event1 fires at time 0.06674, event2 at 0.1054, and event3 at 2.9979, those times determine their execution order.

Edit:
It seems that the AgentSet implementation in #1916 cannot yet replace DiscreteEventScheduler.
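A sketch of this interpretation, assuming events are (activation time, callable) pairs and that a step with period deltat covers the half-open interval [start, start + deltat). The run_step function is hypothetical; it is not DiscreteEventScheduler's actual code.

```python
def run_step(events, start, deltat):
    """Run the events whose activation time lies in [start, start + deltat),
    in order of activation time, and return the new time."""
    end = start + deltat
    due = sorted((e for e in events if start <= e[0] < end), key=lambda e: e[0])
    for _, callback in due:
        callback()
    return end


order = []
events = [
    (2.9979, lambda: order.append("event3")),
    (0.06674, lambda: order.append("event1")),
    (0.1054, lambda: order.append("event2")),
]
end_time = run_step(events, start=0.0, deltat=3.1415)
print(order)  # ['event1', 'event2', 'event3']
```

The events are listed out of order on purpose: only their activation times decide the execution order within the step.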

@rht
Contributor Author

rht commented Jan 7, 2024

What do you think of this approach instead?

import mesa


class Clock:
    def __init__(self):
        self.steps = 0
        self.time = 0

    def step(self, deltat=1):  # deltat is more mnemonic than timestep
        self.steps += 1
        self.time += deltat


class MyModel(mesa.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.clock = Clock()
        # ... create agents and self.datacollector here ...

    def step(self):
        self.agents.shuffle().do("step")
        self.datacollector.collect(self)
        # This is sufficiently mnemonic, as a replacement for self.schedule.step()
        self.clock.step()

And so, we reuse the existing ABM terms without having to invent new terms. It's FSM all the way down.

@EwoutH
Member

EwoutH commented Jan 7, 2024

That's exactly what I had in mind. Only instead of creating a new class, I would just integrate it in the Model class.

Edit: and don't have to call the clock explicitly, that should

@quaquel
Member

quaquel commented Jan 7, 2024

Why do I use a library for something? To avoid having to write boilerplate code. All models need to track time, so if I use a library, the library should handle this for me. With this suggestion, I must add Clock myself and remember to advance it. It also adds two lines of code to any model I make. Also, speaking from experience teaching MESA for the last 3 years at the MSc level, this is something that will easily trip up new users. So, no, I don't like this suggestion.

While it is more convenient to the user, I find the implementation to be too complex for the reader of Mesa code. The library code needs to be simple enough without requiring one to spend an effort to decipher the implementation to what amount to track the time automatically.

So, here I have a different view. For me, the cleanliness of the API and the use of the library come first. So be it if a clean and easy-to-use API requires some more obscure Python machinery. Because, who is going to read the source code? Only users who are invested in the library and already have some programming background. So, as long as the code is well documented and explained at a high level (i.e., what does it do) and with some detail on how this is achieved, I prefer such a solution over forcing my user always to add boilerplate code.

@rht
Contributor Author

rht commented Jan 7, 2024

I appreciate the criticism and see the point about the annoyance of having to manually carry the Clock object to wherever the model's dynamics happen. But I do think that a simple library encourages users to read the source code, to extend and experiment with it, and makes them more likely to contribute back. From the maintainers' perspective, obscure code only works if a small number of maintainers understand it and no one else does. If #1942 (comment) were incorporated, then for an uninitiated developer new to this section of the code, the Git archaeology would be more involved than the issue I encountered with __new__ and __init__ in the model initialization.

At the very least, model.clock simply replaces model.schedule in the previous code, and students can understand what is going on under the hood instead of using a library that "just works".

Regarding PEP 20:

Simple is better than complex.

I interpret it as overall simplicity, instead of simple API but obscure implementation.

@rht
Contributor Author

rht commented Jan 7, 2024

That said, I still think there might be a solution where neither model.clock nor model.steps and model.time is needed, depending on how model.datacollector, as an observer of the system, should be redefined.

@rht
Contributor Author

rht commented Jan 7, 2024

That's exactly what I had in mind. Only instead of creating a new class, I would just integrate it in the Model class.

This seems the simplest approach for now. model.steps and model.time are automatically initialized during super().__init__(). Users then just need to remember to call self.step_and_advance_time() inside the model's step(), as the only line of boilerplate.

@quaquel
Member

quaquel commented Jan 7, 2024

I am fine with a simple solution, although I would advocate calling super over adding another method to the model class. It is still only one additional line, but at least for me and in my teaching, I always advocated calling super anyway.

Some other thoughts

  1. The discussion for me is not about simplicity versus complexity. I agree that PEP 20 should guide all Python projects. Here, however, there is a trade-off between the simplicity of the API and the simplicity of the implementation. I am personally also doubtful whether using metaclasses for something trivial like tracking time is defensible. However, it is the only solution I could devise which avoids forcing the user to write additional boilerplate code.
  2. The argument that implementation simplicity would stimulate users to contribute back to MESA makes no sense to me. In my experience, other factors drive that choice. With respect to Mesa, the fact that ruff is not automated, that there are many open issues where it is unclear whether the maintainers actually want to address them (see, e.g., my discussion with @Corvince in Add system state tracking #1933 on Model state format #574), the lack of milestones, and the many stale pull requests are much more important in my decision on whether to continue to contribute than readability. In fact, having spent the better part of yesterday getting my head around the current space implementation, there are more important parts of the current code base that are hard to comprehend than a relatively small part of the code that could be explained quite easily with some comments in the code (also PEP 20: "If the implementation is easy to explain, it may be a good idea."). Here, the suggestion is a combination of an annotation (count_calls) and a metaclass that automatically applies this annotation to the step method of the user's model.
  3. Yes, data collection is in need of an overhaul. However, I believe the data collector should retrieve time from the model rather than be responsible for maintaining time. As @rht stated, the data collector observes the model, and a model should run fine without any data collection.

@rht
Contributor Author

rht commented Jan 7, 2024

The argument that implementation simplicity would stimulate users to contribute back to MESA makes no sense to me. In my experience, other factors drive that choice. WRT to mesa, the fact that ruff is not automated, that there are many open issues where it is unclear whether the maintainers actually want to address them ...

I think the emphasis on the implementation simplicity of the code should be orthogonal to the questioning of the maintainers' time commitment. There would definitely be a situation where both the code is simple and readable, together with active maintenance. (On my end, #1933 on #574 is definitely on my radar; I just need some time to digest them.)

To keep the discussion focused on the system clock: #1942 (comment) handles the model.steps update by counting the number of step() calls, but it does not account for the model.time update. You could specify the timestep at model __init__, but that design choice is implicit.
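One way to address both points with less machinery than a metaclass is __init_subclass__ (available since Python 3.6). The sketch below is an illustration under stated assumptions, not Mesa's implementation: the Model base class, the _advance_time method, and the timestep constructor argument are hypothetical. Every subclass's step() is wrapped so that steps and time advance once after each call, with time advancing by a timestep passed explicitly at construction. A simple flag keeps a super().step() call in a subclass from advancing the clock twice, a pitfall the metaclass sketch above would share.

```python
import functools


class Model:
    """Hypothetical base class, not Mesa's actual implementation."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        user_step = cls.__dict__.get("step")
        if user_step is None:
            return  # this subclass does not define its own step()

        @functools.wraps(user_step)
        def step(self, *args, **kwargs):
            if getattr(self, "_in_step", False):
                # Nested call via super().step(): the outermost call advances time
                return user_step(self, *args, **kwargs)
            self._in_step = True
            try:
                result = user_step(self, *args, **kwargs)
            finally:
                self._in_step = False
            self._advance_time()
            return result

        cls.step = step

    def __init__(self, timestep=1):
        self.timestep = timestep  # the time increment is stated explicitly
        self.steps = 0
        self.time = 0

    def _advance_time(self):
        self.steps += 1
        self.time += self.timestep


class MyModel(Model):
    def step(self):
        pass  # agent activation would go here


class MyChildModel(MyModel):
    def step(self):
        super().step()  # nested wrapped call; the clock still advances once


model = MyModel(timestep=0.5)
for _ in range(4):
    model.step()
print(model.steps, model.time)  # 4 2.0

child = MyChildModel()
child.step()
print(child.steps, child.time)  # 1 1
```

The user never calls _advance_time directly, and the only "magic" is a wrapper installed when the subclass is defined, which may be easier to document than a metaclass.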

@EwoutH
Member

EwoutH commented Jan 7, 2024

Let’s separate some issues here:

  1. Tracking time in the model
  2. Decoupling the data collector from the scheduler
  3. State tracking
  4. Event based activation
  5. Complexity in implementation vs user API
  6. Mesa maintenance

1 and 2 are implementation discussions. I think everyone agrees they should be done, so let’s (continue) discussing how. Maybe in separate issues or PRs though, and I think it might be useful to do 1 first and then 2.

3 and 4 are long term and conceptual. In any case, you probably want a central clock in the model, right? So it doesn’t block 1 or 2, and we can continue discussing 3 and 4 in their respective discussions.

5 is important, but can quickly get very broad. If it's no longer about this specific implementation, I would say spin it off into a new discussion.

6 is also important, but can get personal, so it may be better suited to a face-to-face conversation (or a very well-thought-out written one).

(might still be missing some stuff)


In general, I would suggest issues and PRs to be atomic, and only focus on one coherent issue. Of course it can touch other stuff, and therefore spin-off new discussions (which is great in general, also about meta things like user/contributor friendliness), but let’s try to spin-off those discussions in separate threads, or discuss them in a face to face dev meeting. That helps to keep the PRs on topic.

@Corvince
Contributor

Corvince commented Jan 8, 2024

Thanks a lot for summarizing the sometimes confusing discussion @EwoutH !

However I disagree on

  2. Decoupling the data collector from the scheduler
    [...]
    1 and 2 are implementation discussions. I think everyone agrees they should be done, so let’s (continue) discussing how. Maybe in separate issues or PRs though, and I think it might be useful to do 1 first and then 2.

From the discussion in #1912, I think it is still unclear whether we actually want to get rid of schedulers. I think we need to continue discussing this before we go further down this path here. If we want to keep schedulers, I think they are the right place to keep track of time, so there is no need to decouple the logic. In fact, the whole discussion can be viewed as an argument for schedulers: it is clear that they need to track time. For example, if we remove schedulers but add a Clock instance, we don't really gain anything. The same goes for tracking time inside the model instance: we just further clutter the model namespace but keep a tight coupling. This isn't necessarily bad, but we should really first discuss the future of schedulers before arguing about implementation details. And the best place for that discussion is #1912.

@rht
Contributor Author

rht commented Jan 8, 2024

For example, if we remove schedulers, but add a Clock instance we don't really gain anything.

The gain: model.agents.shuffle().do("step") and model.agents.shuffle().do("advance") is conceptually clearer than the term StagedActivation with lots of boilerplate code. The Clock instance has a very specific purpose and is easy to conceptualize and explain.
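The two-stage pattern described above can be sketched with plain lists. The Agent class and staged_step function are hypothetical; staged_step mimics the model.agents.shuffle().do("step") / model.agents.shuffle().do("advance") pair using random.shuffle instead of Mesa's AgentSet. Because every agent computes its next value in the first pass before anyone commits in the second, the result does not depend on the activation order.

```python
import random
from statistics import mean


class Agent:
    """Hypothetical agent with a two-stage update."""

    def __init__(self, value):
        self.value = value
        self._next = value
        self.population = []  # set once all agents exist

    def step(self):
        # Stage 1: read current values only; do not change self.value yet
        self._next = mean(a.value for a in self.population)

    def advance(self):
        # Stage 2: commit the value computed in stage 1
        self.value = self._next


def staged_step(agents, rng):
    """Plain-list analogue of agents.shuffle().do("step")
    followed by agents.shuffle().do("advance")."""
    order = list(agents)
    rng.shuffle(order)
    for agent in order:
        agent.step()
    rng.shuffle(order)
    for agent in order:
        agent.advance()


agents = [Agent(v) for v in (1.0, 2.0, 6.0)]
for agent in agents:
    agent.population = agents
staged_step(agents, random.Random(0))
print([a.value for a in agents])  # [3.0, 3.0, 3.0]
```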

@EwoutH
Member

EwoutH commented Jan 26, 2024

Right, I now understand the complication:

  • The data collector depended on the schedule.
  • If an agent is only removed from the schedule (and not from the model) the data collection used to stop.
  • Now it doesn’t, and data keeps being collected.

While it isn’t best practice, there might be models out there that rely on this behavior.

@EwoutH EwoutH closed this Jan 26, 2024
@EwoutH EwoutH reopened this Jan 26, 2024
@EwoutH
Member

EwoutH commented Jan 26, 2024

(sorry misclicked)

Let me think a bit about possible solutions. Maybe we can get away with throwing a clear warning in the right place.

Edit: I feel in both the old and new implementation we make assumptions about for which agents data will be collected. In the new one we definitely have to make that explicit.

@EwoutH
Member

EwoutH commented Jan 26, 2024

Another option could be adding some switch like old_datacollector_behaviour which we flip on 3.0.

@rht
Contributor Author

rht commented Jan 26, 2024

I am removing the commit "time: Remove agent.remove in remove" so that this PR is ready to merge as is.

@EwoutH
Member

EwoutH commented Jan 26, 2024

Sorry but for me this doesn't solve the issue:

  • There might be models which remove the agent after removing it from the schedule. Those will now crash, since the agent is already removed.
  • There might be models that remove it from the schedule temporarily and then add it back again.
  • There might be models that remove it from the schedule but keep it on the grid

I'm not happy about altering the time module's behavior. I would like to solve it in the datacollector, with some kind of flag or switch: "If an agent is removed from model.schedule, we will stop collecting data for it. With Mesa 3.0 this might change. We recommend explicitly removing your agent from the model with agent.remove() if you want to remove the agent completely."

If you want, I can try to come up with an implementation in the weekend.

@Corvince
Contributor

Corvince commented Jan 26, 2024

I think the simplest solution would be to check if model.schedule exists and if it does collect data from its agents. Otherwise use model.agents. This would be backwards compatible, but allow removing the scheduler. And agree that in a future datacollector it should be made explicit which agent data is collected.

/edit and of course don't remove agents from model.agents if they are only removed from the scheduler
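A sketch of the fallback proposed above. The agents_to_collect helper is hypothetical, the models are SimpleNamespace stand-ins rather than real Mesa objects, and it assumes a scheduler exposes its agents as schedule.agents.

```python
from types import SimpleNamespace


def agents_to_collect(model):
    """Pick the agents whose data is recorded (hypothetical helper).

    Backwards compatible: if the model still has a scheduler, collect
    from the scheduler's agents; otherwise fall back to model.agents.
    """
    schedule = getattr(model, "schedule", None)
    if schedule is not None:
        return list(schedule.agents)
    return list(model.agents)


# Stand-in models: "c" was removed from the schedule but not from the model
with_schedule = SimpleNamespace(
    agents=["a", "b", "c"], schedule=SimpleNamespace(agents=["a", "b"])
)
without_schedule = SimpleNamespace(agents=["a", "b", "c"])

print(agents_to_collect(with_schedule))     # ['a', 'b']
print(agents_to_collect(without_schedule))  # ['a', 'b', 'c']
```

An agent removed only from the schedule stays in model.agents but, as before, is no longer collected, which preserves the old datacollector behavior for scheduler-based models.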

@quaquel
Member

quaquel commented Jan 26, 2024

I agree with @Corvince proposed solution.

@EwoutH
Member

EwoutH commented Jan 26, 2024

Good idea, also agreed. @rht would you like to implement it?

@rht
Contributor Author

rht commented Jan 26, 2024

Done. You can check the last commit of this PR.

@EwoutH EwoutH changed the title refactor: Remove dependence on model.schedule refactor: Remove dependence on model.schedule, add time andto Model Jan 26, 2024
@EwoutH EwoutH changed the title refactor: Remove dependence on model.schedule, add time andto Model refactor: Remove dependence on model.schedule, add clock to Model Jan 26, 2024
Member

@EwoutH EwoutH left a comment


Good to go! Thanks a lot, we churned out a lot of conceptual things together with this PR. I will try to do a quick write up tomorrow.

I can merge later today if preferred. I recommend either squashing or cleaning up the commits.

@EwoutH EwoutH added the enhancement Release notes label label Jan 26, 2024
@rht rht merged commit 003cbe3 into projectmesa:main Jan 26, 2024
12 of 13 checks passed
@rht rht deleted the rm_schedule branch January 26, 2024 16:36
EwoutH added a commit that referenced this pull request Jan 26, 2024
)

* refactor: Remove dependence on model.schedule

* model: Implement internal clock

* time: Call self.model.advance_time() in step()

This ensures that the scheduler's clock and the model's clock are
updated and are in sync.

* Ensure advance_time call in schedulers happen only once in a model step

* Turn model steps and time to be private attribute

* Rename advance_time to _advance_time

* Annotate model._steps

* Remove _advance_time from tests

This is because schedule.step already calls _advance_time under the
hood.

* model: Rename _time to time_

* Rename _steps to steps_

* Revert applying _advance_time in schedulers step

* feat: Automatically call _advance_time right after model step()

Solution drafted by and partially attributed to ChatGPT: https://chat.openai.com/share/d9b9c6c6-17d0-4eb9-9eae-484402bed756

* fix: Make sure agent removes itself in schedule.remove

* fix: Do step() wrapping in scheduler instead of model

* fix: JupyterViz: replace model.steps with model.steps_

* Rename steps_ -> _steps, time_ -> _time

* agent_records: Use model.agents only when model has no scheduler

---------

Co-authored-by: Ewout ter Hoeven <E.M.terHoeven@student.tudelft.nl>
@EwoutH
Member

EwoutH commented Jul 3, 2024

The only thing I currently still dislike is that time and steps have to be explicitly increased (by using _advance_time). The model should just keep track of the number of steps by default, with the option to override this if you want to do something else.

This is still the case in the current codebase, right? If so, it's a bit odd that users have to call a private function to increase the time.

Labels
enhancement Release notes label
5 participants