[Feature] Trajectory Replay #6049

Open
2 of 4 tasks
li-boxuan opened this issue Jan 5, 2025 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@li-boxuan
Collaborator

li-boxuan commented Jan 5, 2025

What problem or use case are you trying to solve?

Wouldn't it be cool if we could replay trajectories? This would mean:

  1. Experiment results are easily reproducible (since the evaluation harness usually provides a clean initial state)
  2. Easier to record demo videos
  3. Integration testing
  4. Make debugging scenarios easier to reproduce

Describe the UX of the solution you'd like

I don't know what it would look like in the web GUI, but I think backend support would be a good first step.

Do you have thoughts on the technical implementation?

At this stage, I am not 100% confident that the information in a trajectory is enough to replay it.
Dealing with non-determinism (what if the environment changes?) would be tricky. For simplicity, let's not check whether the real observation matches the observation recorded in the trajectory (I doubt we have to). Let's simply execute the actions from the trajectory and assume everything is deterministic.
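
To make the "just execute the actions" idea concrete, here is a minimal sketch. It assumes a trajectory is dumped as a JSON array of entries, each with an `action` and an optional `observation`. That schema, and the names `ReplayStep`, `load_trajectory`, and `replay`, are hypothetical and only for illustration; they are not existing APIs in this project.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ReplayStep:
    """One recorded action plus the observation seen when it was recorded.

    Hypothetical schema, assumed only for this sketch.
    """
    action: Dict[str, Any]
    recorded_observation: Optional[str] = None


def load_trajectory(raw: str) -> List[ReplayStep]:
    """Parse a JSON array of {"action": ..., "observation": ...} entries."""
    return [
        ReplayStep(action=entry["action"],
                   recorded_observation=entry.get("observation"))
        for entry in json.loads(raw)
    ]


def replay(steps: List[ReplayStep],
           execute: Callable[[Dict[str, Any]], str]) -> List[str]:
    """Execute every recorded action in order and return the new observations.

    Recorded observations are deliberately ignored: the run is assumed to be
    deterministic, so no comparison against the trajectory is made.
    """
    return [execute(step.action) for step in steps]


if __name__ == "__main__":
    raw = json.dumps([
        {"action": {"command": "ls"}, "observation": "README.md"},
        {"action": {"command": "pwd"}},
    ])

    # Stand-in for a real runtime; it only echoes the command it was given.
    def fake_execute(action: Dict[str, Any]) -> str:
        return "ran " + action["command"]

    print(replay(load_trajectory(raw), fake_execute))
```

Passing the executor in as a callable keeps the replay loop independent of any particular runtime, which may also help with the integration testing use case: a test could plug in a fake executor like the one above.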

Describe alternatives you've considered

Additional context

SWE-agent supports trajectory replay: https://github.com/SWE-agent/SWE-agent/blob/main/sweagent/run/run_replay.py

Roadmap:

  • Support replay in headless mode
  • Support trajectory dump in GUI mode
  • Support trajectory replay in GUI mode
  • Add E2E tests for replay in headless mode
li-boxuan added the "enhancement" (New feature or request) label on Jan 5, 2025
li-boxuan changed the title from "Trajectory Replay" to "[Feature] Trajectory Replay" on Jan 5, 2025
li-boxuan self-assigned this on Jan 6, 2025
@enyst
Collaborator

enyst commented Jan 10, 2025

This is so cool! My first thought was that it's the same as going back in time to an arbitrary point, but it's not, right? Full replay should be possible and easier!

@li-boxuan
Collaborator Author

> This is so cool! My first thought was that it's the same as going back in time to an arbitrary point, but it's not, right? Full replay should be possible and easier!

Yes! Time travel is even cooler but significantly harder. Full replay is easier to achieve, and it would do what I need: replaying benchmark results and enabling end-to-end testing.
