[Bug]: Action Repetition Issue #436

Mulcek04 · 2024-09-03T14:43:47Z

Bug 🐛

I am using Sinergym to simulate a typical European residential radiator heating system. The system consists of a condensing boiler with a heating valve and a radiant-convective baseboard (you can see the heating system in the OpenStudio screenshot I uploaded).

I've developed a Soft Actor-Critic (SAC) agent using Stable Baselines3 (SB3), based on the Sinergym example notebook "drl.ipynb". The agent has a single action: the supply hot water setpoint, between 10 and 70 degrees. The reward combines energy consumption and temperature violations.

However, after reviewing the log files generated during training, I noticed an issue. In some episodes, an action is applied again at a later timestep instead of being replaced by the next one, and from then on an action taken at time t is only executed at time t+2. For example, as shown in the attached image, the action [44.685577] on row 409 is applied repeatedly in the following two timesteps (rows 410 and 411).

merged_data-5-BUG.csv

Moreover, this error occurs randomly. For example, in a 30-episode SAC training run, it appeared in episodes 5, 6, 7, 22, and 27. Although the SAC agent can still be trained and converges, I am concerned that this "misalignment" may lead to a suboptimal policy.
Can anyone help or provide some insights?

To Reproduce

Here I attach my epjson file and some code. If you need me to provide other content, please let me know.
Residential_heating_system.zip

# Assumed imports (not shown in the original snippet)
import gymnasium as gym
import numpy as np

environment='Eplus-20240716_rbc-hot-continuous-v1'
new_variables={
    'T_amb': ('Site Outdoor Air DryBulb Temperature', 'Environment'),
    'DNI':('Site Direct Solar Radiation Rate per Area', 'Environment'),
    'T_diff_upper': ('Zone Air Temperature', 'ROOMS'),
    'T_diff_lower': ('Zone Air Temperature', 'ROOMS'),
    
    'T_rooms': ('Zone Air Temperature', 'ROOMS'),
    'Baseboard_T_inlet': ('Baseboard Water Inlet Temperature', 'ZONE HVAC BASEBOARD RAD CONV WATER'),
    'Baseboard_T_outlet': ('Baseboard Water Outlet Temperature', 'ZONE HVAC BASEBOARD RAD CONV WATER'),
    'SP_T_supply':('System Node Setpoint Temperature', 'Hot Water Loop Supply Outlet Node'),
    'Pump_mass_flow':('Pump Mass Flow Rate', 'CONST SPD PUMP'),
    'Boiler_mass_flow':('Boiler Mass Flow Rate', '90.1-2019 BOILER'),
    'PLR': ('Boiler Part Load Ratio', '90.1-2019 BOILER'),
    'E_boiler[W]': ('Boiler Heating Rate', '90.1-2019 BOILER'),
    'E_boiler[J]':('Boiler Heating Energy', '90.1-2019 BOILER'),
    'E_baseboard[W]':('Baseboard Total Heating Rate', 'ZONE HVAC BASEBOARD RAD CONV WATER'),
    'E_baseboard[J]':('Baseboard Total Heating Energy', 'ZONE HVAC BASEBOARD RAD CONV WATER'),
}
new_meters = {}

from sinergym.utils.rewards import MyReward_24h
reward_kwargs={
    'temperature_variables': ['T_rooms'], 
    'energy_variables': ['E_boiler[W]'],
    'range_comfort_winter': (19, 21),
    'range_comfort_summer': (24, 26),
    'SP_int': 20.0,
    'lambda_energy': lambda_energy, 
    'lambda_temperature': lambda_temperature,       
    'energy_weight': energy_weight 
    }

new_actuators = {'T_water_outlet': ('Schedule:Year','Schedule Value', 'HOT WATER TEMPERATURE LOOP')}
new_action_space = gym.spaces.Box(
    low=np.array([10.], dtype=np.float32),
    high=np.array([70.], dtype=np.float32),
    shape=(len(new_actuators),),
    dtype=np.float32
    )
extra_conf={
    'timesteps_per_hour': timesteps_per_hour,
    'runperiod':(1,12,2018, 31,12,2018)
    }

env= gym.make(environment,
            env_name=experiment_name,
            reward=MyReward_24h,
            reward_kwargs = reward_kwargs,
            weather_files= 'Juvara_1819_solarModify.epw',
            variables= new_variables,
            meters = new_meters,
            actuators= new_actuators,
            action_space= new_action_space,
            config_params= extra_conf)

env = ConvertDeltaTempWrapper(env, comfort_range = reward_kwargs['range_comfort_winter'])
env = Previous_nStep_ObservationWrapper(env, previous_variables=['T_diff_upper', 'T_diff_lower', 'T_amb'], n= 4)
env = NormalizeObservation(env)
env = NormalizeAction(env, normalize_range=(-1., 1.))
env = LoggerWrapper(env)
env = CSVLogger(env)

obs_reduction=[
  'month', 'day_of_month',
  'PLR', 'Baseboard_T_inlet', 'Baseboard_T_outlet', 'SP_T_supply', 'Pump_mass_flow', 'Boiler_mass_flow',
  'E_boiler[W]', 'E_boiler[J]', 'E_baseboard[W]', 'E_baseboard[J]', 'T_rooms', 
  ]
env = ReduceObservationWrapper(env, obs_reduction=obs_reduction)
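When reading the logs, keep in mind that NormalizeAction(env, normalize_range=(-1., 1.)) means the SAC policy emits values in [-1, 1], which have to be mapped back to the 10 to 70 degree action space before they reach the actuator. The sketch below shows such a mapping, assuming it is a plain linear rescale; rescale_action is a hypothetical helper written for illustration, not Sinergym's own code.

```python
def rescale_action(a, low=10.0, high=70.0, norm_low=-1.0, norm_high=1.0):
    """Linearly map a normalized action in [norm_low, norm_high] to [low, high].

    Hypothetical helper: assumes the wrapper performs a plain linear rescale.
    """
    return low + (a - norm_low) * (high - low) / (norm_high - norm_low)


# The bounds and the midpoint of the normalized range
print(rescale_action(-1.0), rescale_action(0.0), rescale_action(1.0))
```

If the action column in a log is in normalized units while SP_T_supply is in degrees, one of the two has to be converted before the columns can be compared.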

Traceback (most recent call last): File ...

Expected behavior

In addition to the CSV file showing the error, I have also uploaded a bug-free logger CSV file. It records the data from the 8th episode of training, where the actions (Temp_supply_water) provided by the agent at time t are correctly executed at time t+1 (SP_T_supply) without any repeated executions.
merged_data-8.csv
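The expected alignment (action at time t visible as the setpoint at time t+1) can be checked mechanically. Below is a minimal sketch, assuming the two columns have already been read from the CSV into plain lists and are in the same units; find_misaligned_steps and the sample values are hypothetical and only loosely modelled on the report.

```python
def find_misaligned_steps(actions, setpoints, tol=1e-3):
    """Return every index t where setpoints[t + 1] differs from actions[t].

    actions[t]   : action sent by the agent at timestep t
    setpoints[t] : setpoint observed at timestep t
    """
    bad = []
    for t in range(min(len(actions), len(setpoints) - 1)):
        if abs(setpoints[t + 1] - actions[t]) > tol:
            bad.append(t)
    return bad


# Hypothetical data: same actions, once applied on time and once with a repeat
actions = [40.0, 44.685577, 51.2, 38.9]
aligned_setpoints = [20.0, 40.0, 44.685577, 51.2, 38.9]
delayed_setpoints = [20.0, 40.0, 44.685577, 44.685577, 51.2]

print(find_misaligned_steps(actions, aligned_setpoints))  # no misaligned steps
print(find_misaligned_steps(actions, delayed_setpoints))  # steps after the repeat
```

In the delayed case every step from the repeated action onward is flagged, which matches the pattern described above where the offset persists for the rest of the episode.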

System Info

Describe the characteristic of your environment:

Describe how Sinergym was installed: docker
Sinergym Version: e.g. 3.5.2

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)


AlejandroCN7 · 2024-09-06T13:49:50Z

Hello @Mulcek04,

Sorry, but I'm currently very busy with maintenance tasks for the tool. There are a few issues similar to the one you're talking about—have you been able to find a solution? I believe it might be due to the implementation of the EnergyPlus API itself. My plan was to update it to the latest version (#430) and then continue investigating to see if I can identify the cause.

In any case, if you've found any information or something that could be useful, I'd really appreciate it :)

Mulcek04 · 2024-09-06T14:01:41Z

Hi @AlejandroCN7 ,
I found a potential solution to the alignment problem based on issue #416. The idea is to add a time.sleep() to the _process_action(self, state_argument: int) method in the eplus.py file, as follows:

import time

class EnergyPlus(object):
    def _process_action(self, state_argument: int) -> None:
        """EnergyPlus callback that sets output actuator value(s) from last received action.

        Args:
            state_argument (int): EnergyPlus API state
        """

        # If simulation is complete or not initialized --> do nothing
        if self.simulation_complete:
            return
        # Check system is ready (only executed if it is not)
        self._init_system(self.energyplus_state)
        if not self.system_ready:
            return
        # If there is no value in the action queue --> do nothing
        if self.act_queue.empty():
            return
        # Get next action from queue and check type
        next_action = self.act_queue.get()
        # self.logger.debug('ACTION get from queue: {}'.format(next_action))

        # Modification: wait until the next action has arrived in the queue
        while self.act_queue.empty():
            time.sleep(0.01)

        # Set the action values obtained in actuator handlers
        for i, (act_name, act_handle) in enumerate(self.actuator_handlers.items()):
            self.exchange.set_actuator_value(
                state=state_argument,
                actuator_handle=act_handle,
                actuator_value=next_action[i]
            )

I'm still a beginner in programming, so I hope this helps you or anyone else with a similar problem.
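To illustrate why an empty queue at callback time produces both the repeat and the lasting one-step offset, here is a single-threaded toy model. It is only a sketch under simplifying assumptions, not Sinergym's actual logic: run_callbacks, late_steps and the sample actions are hypothetical, and "blocking" stands in for any approach that makes the callback wait for the agent.

```python
from collections import deque


def run_callbacks(actions, late_steps, blocking):
    """Toy model: return the actuator value in effect at each timestep.

    late_steps : timesteps at which the agent's action arrives only after
                 the callback has already fired
    blocking   : if True, the callback waits for the action instead of
                 returning early on an empty queue
    """
    queue = deque()
    actuator = None
    applied = []
    for t, action in enumerate(actions):
        late = t in late_steps
        if not late or blocking:
            queue.append(action)        # action is available when the callback fires
        if queue:
            actuator = queue.popleft()  # callback applies the oldest queued action
        applied.append(actuator)        # empty queue: the previous value stays active
        if late and not blocking:
            queue.append(action)        # late action lands after the callback returned
    return applied


actions = [40.0, 44.7, 51.2, 38.9, 42.0]
print(run_callbacks(actions, late_steps={2}, blocking=False))
print(run_callbacks(actions, late_steps={2}, blocking=True))
```

In the non-blocking run the value 44.7 is applied twice and every later action arrives one step behind; in the blocking run the applied values equal the sent actions.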

AlejandroCN7 · 2024-09-09T14:42:30Z

Hi @Mulcek04

Thank you so much for your help! I'll run some tests with the solution you suggested, and if it works, I'll create a patch for the tool.

I'll keep this issue open until I release that version. Thanks again!

AlejandroCN7 · 2024-09-13T16:22:54Z

Hi @Mulcek04!

Starting from version 3.5.9 of Sinergym, the issue with the delay in the effect of actions should be resolved. The sleep you added to the code was correct because it prevented an extra skip when processing the actions, and it helped me identify the problem.

However, I’ve implemented a more stable solution. You can check the commits related to the simulator in the mentioned PR (#443). Essentially, the EnergyPlus execution thread is interrupted when the action is sent. Then, the reset collects the observation and waits to process the action with step(), ensuring no cycles are lost in the process. In earlier versions (starting from v3.0.0), I couldn't control the order in which actions and observations were processed in the same step, but now I can process the observation first 😄.
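The observation-first handshake described above can be pictured with two blocking queues. This is a minimal sketch of the general pattern, not the code from PR #443: simulator, run_episode and the policy are hypothetical names, and the simulation thread is reduced to a loop over timestep indices.

```python
import queue
import threading


def simulator(obs_queue, act_queue, n_steps, applied):
    """Toy simulation thread: publish an observation, then block for an action."""
    for t in range(n_steps):
        obs_queue.put(t)                   # observation for timestep t
        action = act_queue.get(timeout=5)  # blocks: no timestep runs without an action
        applied.append((t, action))
    obs_queue.put(None)                    # sentinel: episode finished


def run_episode(policy, n_steps):
    """Drive the toy simulator the way reset() and step() would."""
    obs_queue, act_queue = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    applied, sent = [], []
    worker = threading.Thread(
        target=simulator, args=(obs_queue, act_queue, n_steps, applied), daemon=True
    )
    worker.start()
    obs = obs_queue.get(timeout=5)         # reset(): first observation, no action yet
    while obs is not None:
        action = policy(obs)
        sent.append((obs, action))
        act_queue.put(action, timeout=5)   # step(): send the action ...
        obs = obs_queue.get(timeout=5)     # ... then wait for the next observation
    worker.join(timeout=5)
    return sent, applied


sent, applied = run_episode(lambda obs: 10.0 + obs, n_steps=4)
print(sent == applied)
```

Because each side blocks until the other has delivered, the action chosen for timestep t is always the one applied at timestep t, regardless of how slow the agent is.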

If you check the CSV files generated with the Logger, you'll notice that the observations have one more row than the actions (due to the reset). For any given row with the same index, you'll see the observation and info, the action taken in that state, and the reward obtained from that action. So, the action takes effect as soon as it’s sent and is reflected in the observation of the next step. I’ve run tests, and everything seems to be working well, but if I missed anything, I imagine more issues will come up.
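Given that row layout, transitions can be rebuilt from the logs by pairing rows with the same index and taking the following observation row as the next state. The sketch below assumes the columns have already been loaded into plain lists and that rewards have the same number of rows as actions; to_transitions and the sample values are hypothetical.

```python
def to_transitions(observations, actions, rewards):
    """Pair row i of each log into (obs_i, action_i, reward_i, obs_{i+1})."""
    if len(observations) != len(actions) + 1 or len(actions) != len(rewards):
        raise ValueError("expected one more observation row than action/reward rows")
    return [
        (observations[i], actions[i], rewards[i], observations[i + 1])
        for i in range(len(actions))
    ]


# Hypothetical values: 4 observation rows (including the reset row), 3 actions
observations = [20.0, 40.0, 44.7, 51.2]
actions = [40.0, 44.7, 51.2]
rewards = [-1.0, -0.8, -0.5]

transitions = to_transitions(observations, actions, rewards)
print(transitions[0])
```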

I’d like to thank everyone who commented on the incident and helped resolve it; your support has been invaluable ❤️

Mulcek04 · 2024-09-14T17:48:54Z

Hi @AlejandroCN7 ,
Thank you for the update and solution. I appreciate the detailed explanation and I’m glad you were able to address this issue.
Thanks again for your hard work and keeping the community informed. If I meet any further issues, I’ll be sure to reach out.

Mulcek04 added the bug Something isn't working label Sep 3, 2024

Mulcek04 closed this as completed Sep 6, 2024

AlejandroCN7 reopened this Sep 6, 2024

AlejandroCN7 added the BackEnd Simulator and communication interface label Sep 6, 2024

AlejandroCN7 mentioned this issue Sep 13, 2024

(v3.5.9) - Action delay fix #443

Merged

16 tasks

AlejandroCN7 closed this as completed in #443 Sep 14, 2024
