[Bug]: Action Repetition Issue #436
Comments
Hello @Mulcek04, sorry, but I'm currently very busy with maintenance tasks for the tool. There are a few issues similar to the one you describe. Have you been able to find a solution? I believe it might be due to the implementation of the EnergyPlus API itself. My plan was to update it to the latest version (#430) and then continue investigating to see if I can identify the cause. In any case, if you've found any information that could be useful, I'd really appreciate it :)
Hi @AlejandroCN7, class EnergyPlus(object):
I'm still a beginner in programming, but I hope this helps you or anyone else with a similar problem.
Hi @Mulcek04, thank you so much for your help! I'll run some tests with the solution you suggested, and if it works, I'll create a patch for the tool. I'll keep this issue open until I release that version. Thanks again!
Hi @Mulcek04! Starting from version 3.5.9 of Sinergym, the issue with the delay in the effect of actions should be resolved. The sleep you added to the code was correct because it prevented an extra skip when processing the actions, and it helped me identify the problem. However, I’ve implemented a more stable solution. You can check the commits related to the simulator in the mentioned PR (#443). Essentially, the EnergyPlus execution thread is interrupted when the action is sent. Then, the reset collects the observation and waits to process the action with step(), ensuring no cycles are lost in the process. In earlier versions (starting from v3.0.0), I couldn't control the order in which actions and observations were processed in the same step, but now I can process the observation first 😄. If you check the CSV files generated with the Logger, you'll notice that the observations have one more row than the actions (due to the reset). For any given row index, you'll see the observation and info, the action taken in that state, and the reward obtained from that action. So, the action takes effect as soon as it’s sent and is reflected in the observation of the next step. I’ve run tests, and everything seems to be working well, but if I missed anything, I imagine more issues will come up. I’d like to thank everyone who commented on this issue and helped resolve it; your support has been invaluable ❤️
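The ordering described above can be illustrated with a toy handshake between an agent and a simulator thread. This is only a sketch under stated assumptions, not the Sinergym or EnergyPlus code: `run_simulator`, `rollout`, and the two queues are hypothetical names, and the "simulator" simply echoes the last setpoint it received. It shows why observations end up with one more row than actions and why each action appears in the very next observation.

```python
import queue
import threading


def run_simulator(obs_q, act_q, n_steps):
    """Toy simulator thread (hypothetical, not EnergyPlus).

    Publishes an observation, then blocks until the agent sends an action,
    so no simulation cycle runs without a fresh action.
    """
    setpoint = 0.0  # value before any action has been applied
    for t in range(n_steps + 1):
        obs_q.put((t, setpoint))  # observation reflects the last applied action
        if t == n_steps:
            break
        setpoint = act_q.get()  # wait here until step() delivers an action


def rollout(actions):
    """Agent side: one reset-like read, then one send/read pair per action."""
    obs_q = queue.Queue(maxsize=1)
    act_q = queue.Queue(maxsize=1)
    sim = threading.Thread(
        target=run_simulator, args=(obs_q, act_q, len(actions)), daemon=True
    )
    sim.start()
    observations = [obs_q.get()]  # "reset": first observation, no action yet
    for a in actions:
        act_q.put(a)  # "step": send the action...
        observations.append(obs_q.get())  # ...then read the resulting observation
    sim.join()
    return observations


if __name__ == "__main__":
    acts = [20.0, 35.0, 50.0]
    for row in rollout(acts):
        print(row)
```

With three actions the rollout returns four observations, and the setpoint in observation i+1 equals action i, which matches the row alignment described for the Logger CSV files.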
Hi @AlejandroCN7,
Bug 🐛
I am currently using Sinergym to simulate a residential heating system modelled on a typical European radiator setup. The heating system consists of a condensing boiler with a heating valve and a radiant convective baseboard (you can see the heating system in the OpenStudio screenshot I uploaded).
I've developed a Soft Actor-Critic (SAC) agent using Stable Baselines3 (SB3), based on the Sinergym example notebook "drl.ipynb". The agent has a single action: the supply hot water setpoint, which ranges from 10 to 70 degrees. The reward combines energy consumption and temperature violations.
However, after reviewing the log files generated during training, I noticed an issue. In some episodes, an action is repeated twice at a certain timestep, and in subsequent timesteps an action taken at time t is only executed at time t+2. For example, as shown in the attached image, the action [44.685577] on row 409 is applied again in the following two timesteps (rows 410 and 411).
merged_data-5-BUG.csv
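A quick way to locate such repeats in a logger CSV is to scan the action column for consecutive identical values. The sketch below is only an illustration: `parse_action`, `load_actions`, and `find_repeated_actions` are hypothetical helpers, and it assumes the action column is named Temp_supply_water and stores values formatted like [44.685577]. Since SAC outputs continuous actions, exact repeats in consecutive rows are unlikely to occur by chance.

```python
import csv


def parse_action(cell):
    """Convert a logged cell such as '[44.685577]' to a float."""
    return float(str(cell).strip().strip("[]"))


def load_actions(path, column="Temp_supply_water"):
    """Read one action per row from a CSV file (column name is an assumption)."""
    with open(path, newline="") as f:
        return [parse_action(row[column]) for row in csv.DictReader(f)]


def find_repeated_actions(actions, tol=1e-9):
    """Return indices i where actions[i] equals actions[i - 1] within tol."""
    return [
        i
        for i in range(1, len(actions))
        if abs(actions[i] - actions[i - 1]) <= tol
    ]


if __name__ == "__main__":
    # Made-up values mimicking the pattern: one action followed by two repeats.
    demo = [41.2, 44.685577, 44.685577, 44.685577, 39.0]
    print(find_repeated_actions(demo))
```

For a real file, `find_repeated_actions(load_actions("merged_data-5-BUG.csv"))` would list the row indices to inspect, provided the column name matches.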
Moreover, this error occurs randomly. For example, in a 30-episode SAC training run, the issue appeared in episodes 5, 6, 7, 22, and 27. Even though the SAC agent still trains and converges, I am concerned that this misalignment between actions and their effects may lead to a suboptimal policy.
Can anyone help or provide some insights?
To Reproduce
I have attached my epJSON file and some code. If you need anything else, please let me know.
Residential_heating_system.zip
Expected behavior
In addition to the CSV file showing the error, I have also uploaded a bug-free logger CSV file. It records the data from the 8th training episode, where the actions (Temp_supply_water) provided by the agent at time t are correctly executed at time t+1 (SP_T_supply) without any repeated executions.
merged_data-8.csv
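The difference between the two files can be checked by measuring, for each row, how many rows later the action shows up in the applied setpoint. The following is a minimal sketch, assuming the two columns (Temp_supply_water and SP_T_supply) have already been loaded as lists of floats; `lag_per_row` is a hypothetical helper and the numbers are made up for illustration.

```python
def lag_per_row(actions, applied, max_lag=3, tol=1e-6):
    """For each action at row t, return the smallest k in [0, max_lag]
    such that applied[t + k] matches it, or None if no match is found."""
    lags = []
    for t, action in enumerate(actions):
        found = None
        for k in range(max_lag + 1):
            if t + k < len(applied) and abs(applied[t + k] - action) <= tol:
                found = k
                break
        lags.append(found)
    return lags


if __name__ == "__main__":
    actions = [30.0, 40.0, 50.0, 60.0]
    healthy = [10.0, 30.0, 40.0, 50.0, 60.0]      # each action applied at t+1
    buggy = [10.0, 30.0, 30.0, 40.0, 50.0, 60.0]  # one repeat shifts the rest to t+2
    print(lag_per_row(actions, healthy))
    print(lag_per_row(actions, buggy))
```

A constant lag of 1 corresponds to the expected behavior, while a jump from 1 to 2 marks the row where the misalignment starts. If the same action value occurs in nearby rows, the smallest matching lag is reported, so results around repeated values should be read with care.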