You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Wouldn't it be cool if we can replay trajectories? This means,
Experiment results are easily reproducible (since evaluation harness usually provides clean initial states)
Easier to record demo videos
Integration testing
Make debugging scenarios easier to reproduce
Describe the UX of the solution you'd like
I don't know how it would look like on web GUI, but I think backend support would be a good first step.
Do you have thoughts on the technical implementation?
At this stage, I am not 100% confident that trajectories information is enough to replay.
Dealing with non-determinism (what if environment changes?) would be tricky, but for simplicity (and I doubt we have to), let's ignore whether real observation matches the observation from the trajectory. Let's simply execute actions from the trajectory and assume everything is deterministic.
This is so cool! My first thought was that it's the same with going back in time at a random time, but it's not, right? Full replay should be possible and easier!
This is so cool! My first thought was that it's the same with going back in time at a random time, but it's not, right? Full replay should be possible and easier!
Yes! Time travel is even cooler but significantly harder. Full replay is easier to achieve and it would do what I need - to be able to replay benchmark results, and to enable end-to-end testing.
What problem or use case are you trying to solve?
Wouldn't it be cool if we can replay trajectories? This means,
Describe the UX of the solution you'd like
I don't know how it would look like on web GUI, but I think backend support would be a good first step.
Do you have thoughts on the technical implementation?
At this stage, I am not 100% confident that trajectories information is enough to replay.
Dealing with non-determinism (what if environment changes?) would be tricky, but for simplicity (and I doubt we have to), let's ignore whether real observation matches the observation from the trajectory. Let's simply execute actions from the trajectory and assume everything is deterministic.
Describe alternatives you've considered
Additional context
SWE-agent supports trajectory replay: https://github.com/SWE-agent/SWE-agent/blob/main/sweagent/run/run_replay.py
Roadmap:
The text was updated successfully, but these errors were encountered: