Hi there, I am trying to write a script to visualise the cartpole swingup task under the cart sparse policy, but I am a little confused and an error occurs when I run it. Could you show me a simple script to visualise the task, please? Here is my code; the 'cart_sparse_policy' is generated by cart.py:

import numpy as np
import matplotlib.pyplot as plt
from ray.rllib.env.wrappers.dm_control_wrapper import DMCEnv
import sindy_rl.policy
from sindy_rl.sindy_utils import build_optimizer
from pysindy import PolynomialLibrary
import pickle
env = DMCEnv('cartpole', task_name='swingup', height=480, width=480)
env.reset()
cam_id = 0
plt.ion()
fig, ax = plt.subplots(figsize=(6, 6))
img = ax.imshow(np.zeros((480, 480, 3), dtype=np.uint8))
alpha = 1e-6
thresh = 1e-5
n_models = 20
poly_deg = 3
include_bias = False
optimizer_config = {
    'base_optimizer': {
        'name': 'STLSQ',
        'kwargs': {
            'alpha': alpha,
            'threshold': thresh,
        }
    },
    'ensemble': {
        'bagging': True,
        'library_ensemble': True,
        'n_models': n_models,
    }
}
optimizer = build_optimizer(optimizer_config)
with open('cart_sparse_policy.pkl', 'rb') as f:
    cart_sparse_policy = pickle.load(f)
print(cart_sparse_policy)
# Polynomial Library
feature_library = PolynomialLibrary(degree=poly_deg, include_bias=include_bias, include_interaction=True)
n_control = env.action_space.shape[0]
SparseEnsemblePolicy = cart_sparse_policy
for episode in range(100):
    print(f"Episode {episode + 1}")
    obs = env.reset()
    # obs = env.step(0.0 * env.action_space.sample())
    print("Observation shape:", np.array(obs).shape)
    done = False
    total_reward = 0
    while not done:
        pixels = env.render(camera_id=cam_id)
        print("Rendered Image Shape:", pixels.shape)
        print("Pixel values range:", pixels.min(), pixels.max())
        if pixels is None or pixels.size == 0:
            print("Warning: Rendered image is empty!")
            continue
        img.set_data(pixels)
        plt.draw()
        plt.pause(0.01)
        observation = obs['observation'] if isinstance(obs, dict) and 'observation' in obs else np.array(obs)
        action = SparseEnsemblePolicy.compute_action(observation)
        # action = random_policy.compute_action(observation)
        step_output = env.step(action)
        print(f"Step output: {step_output}")
        obs = step_output[0]        # observation
        reward = step_output[1]     # reward
        done = step_output[2]       # done
        truncated = step_output[3]  # truncated
        info = step_output[4]       # info
        total_reward += reward
        print(f"Total Reward so far: {total_reward}")
    print(f"Episode {episode + 1} finished. Total Reward: {total_reward}")
Hi there @Vcbby! So sorry for the delay, this flew under my radar. What error are you getting? And do you mind letting me know what OS you're running the code on (e.g. Windows, Linux, macOS) and whether you're using Docker? I've noticed that dm_control can act kind of weird in Docker (and on machines you ssh into) because it searches for a way to access a GUI and can't find one. I think there are some tricks to fixing this, but I'm less experienced with them. In theory, something like the following should work; I just got this to run locally in a Jupyter notebook on an M2 MacBook Air with Python=3.9.13:

import numpy as np
import matplotlib.pyplot as plt
from ray.rllib.env.wrappers.dm_control_wrapper import DMCEnv
import pickle
import os
# setup environment
cam_id = 0
env = DMCEnv('cartpole', task_name='swingup', height=480, width=480)
# load policy
policy_path = '/path/to/policy'
with open(policy_path, 'rb') as f:
    cart_sparse_policy = pickle.load(f)
print(cart_sparse_policy)
num_steps = 1000
obs_list = []
pixel_list = []
obs = env.reset()
# obs, info = env.reset() <--- Depends on the version of ray/gymnasium you have installed
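# Note (untested sketch): if you're unsure which API version you have installed,
# one defensive pattern is to only unpack when reset() returns a (obs, info) tuple:
# reset_out = env.reset()
# obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out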
for step_idx in range(num_steps):
    # query action
    action = cart_sparse_policy.compute_action(obs)
    # step in environment
    result = env.step(action)
    obs = result[0]
    obs_list.append(obs)
    # extract pixels to render later
    pixels = env.render(camera_id=cam_id)
    pixel_list.append(pixels)
# render pixels
plt.imshow(pixel_list[-1])
plt.show()
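If you are running headless (e.g. inside Docker or over ssh) and the render call fails, one trick I've seen (sketched below, not tested in this exact setup) is to force MuJoCo's off-screen renderer through the MUJOCO_GL environment variable before anything dm_control-related gets imported:

import os
# MUJOCO_GL must be set before MuJoCo/dm_control is first imported.
os.environ['MUJOCO_GL'] = 'egl'  # or 'osmesa' if EGL drivers aren't available

from ray.rllib.env.wrappers.dm_control_wrapper import DMCEnv

env = DMCEnv('cartpole', task_name='swingup', height=480, width=480)
obs = env.reset()
pixels = env.render(camera_id=0)  # (480, 480, 3) uint8 array, rendered off-screen
print(pixels.shape)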
@Vcbby, glad you were able to solve this! I can't claim to be a roboticist, so I'm afraid I'm not going to be very helpful here. MuJoCo is just a physics engine, but people have been using it to create custom robotics models (which I think can be defined using XML?) and propagate the physics/constraints within it. It looks like there have been previous attempts at building Cassie models in MuJoCo [1,2].
gymnasium, on the other hand, is just a convenient API for wrapping simulators and is commonly accepted by many of the RL packages out there; I tried to make my code compliant with it. It looks like [1] tried to do this for Cassie by wrapping the MuJoCo environment. I believe that dm_control
…
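(In case it's useful: MuJoCo models are indeed defined in its MJCF XML format, and dm_control can load an XML string directly. A minimal sketch, not Cassie-specific and just to illustrate the idea:)

from dm_control import mujoco

# A tiny MJCF (MuJoCo XML) model: a single free-falling box.
MINIMAL_MJCF = '''
<mujoco>
  <worldbody>
    <body name='box' pos='0 0 1'>
      <joint type='free'/>
      <geom type='box' size='0.1 0.1 0.1'/>
    </body>
  </worldbody>
</mujoco>
'''

physics = mujoco.Physics.from_xml_string(MINIMAL_MJCF)
for _ in range(100):
    physics.step()            # propagate the physics forward in time
print(physics.data.qpos)      # generalized positions after 100 steps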