
Explicit engine Markov chain actions and transition rewards #247

Merged
merged 13 commits on Aug 8, 2024

Conversation

davexparker
Member

Add support for Markov chain actions and transition rewards to explicit engine.

Includes significant refactoring/simplification of reward storage in explicit engine.

  • Push more into Rewards/RewardsExplicit, and move away from the distinction between MCRewards and MDPRewards

  • All Rewards objects now define getStateReward/getTransitionReward. The meaning of getTransitionReward(s,i) differs depending on the model type: i indexes a choice for a NondetModel, and a transition for a Markov chain.

  • RewardsExplicit, the base implementation for all mutable reward classes, now includes setStateReward and setTransitionReward, and also gives a default implementation of addToStateReward/addToTransitionReward. So, code to build/edit Rewards can be unaware of the underlying class.

  • RewardsSimple becomes the default "Simple" class for reward storage. It can store both state/transition rewards, but does so in a sparse fashion, with no storage costs if one or the other is missing.

  • Existing code should mostly still work unmodified, but should consider using Rewards instead of MCRewards/MDPRewards and moving to the use of RewardsSimple over other *Simple reward implementations.

  • The interfaces MCRewards, MDPRewards, STPGRewards are retained but should no longer be needed - Rewards can be used instead. Similarly, implementations MDPRewardsSimple and STPGRewardsSimple are kept, but could/should be replaced with RewardsSimple.
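The interplay described in the bullets above can be sketched as follows. The type and method names mirror those in the PR (Rewards, RewardsExplicit, RewardsSimple, get/set/addTo...), but the bodies are an illustrative stand-in, not PRISM's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Read-only view: every Rewards object defines both accessors.
interface Rewards {
    // Default implementations just return zero.
    default double getStateReward(int s) { return 0.0; }
    // For a NondetModel, i indexes a choice; for a Markov chain, a transition.
    default double getTransitionReward(int s, int i) { return 0.0; }
}

// Mutable base: building/editing code only needs these, so it can be
// unaware of the underlying concrete class.
interface RewardsExplicit extends Rewards {
    void setStateReward(int s, double r);
    void setTransitionReward(int s, int i, double r);
    default void addToStateReward(int s, double r) {
        setStateReward(s, getStateReward(s) + r);
    }
    default void addToTransitionReward(int s, int i, double r) {
        setTransitionReward(s, i, getTransitionReward(s, i) + r);
    }
}

// Sparse storage: maps are allocated lazily, so there is no storage
// cost when state or transition rewards are absent.
class RewardsSimple implements RewardsExplicit {
    private Map<Integer, Double> stateRewards; // lazily allocated
    private Map<Long, Double> transRewards;    // lazily allocated

    public double getStateReward(int s) {
        return stateRewards == null ? 0.0 : stateRewards.getOrDefault(s, 0.0);
    }
    public double getTransitionReward(int s, int i) {
        long key = ((long) s << 32) | i;
        return transRewards == null ? 0.0 : transRewards.getOrDefault(key, 0.0);
    }
    public void setStateReward(int s, double r) {
        if (stateRewards == null) stateRewards = new HashMap<>();
        stateRewards.put(s, r);
    }
    public void setTransitionReward(int s, int i, double r) {
        if (transRewards == null) transRewards = new HashMap<>();
        transRewards.put(((long) s << 32) | i, r);
    }
}
```

Builder code written against RewardsExplicit, as above, keeps working if a different concrete reward class is swapped in later.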

Also includes refactor of DTMCSimple implementation.

Replace with custom storage of successor states/probabilities.
In particular, this means that the order of transitions is fixed.
This will ease the addition of actions and transition rewards.

Also remove tracking of transition count (not really needed).
Still not enabled for .tra regression tests, to align with symbolic engines.
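The DTMCSimple storage change described above could look roughly like this sketch. The class and method names here are hypothetical, not PRISM's actual code; the point is that successors are kept in insertion order, so every transition has a stable index, which is what makes per-transition actions and rewards possible:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: per-state parallel lists of successor states
// and probabilities, replacing distribution-based storage.
class DTMCSketch {
    private final List<List<Integer>> succ = new ArrayList<>();
    private final List<List<Double>> prob = new ArrayList<>();

    public int addState() {
        succ.add(new ArrayList<>());
        prob.add(new ArrayList<>());
        return succ.size() - 1;
    }

    // Appends transition s -> t with probability p and returns its
    // index, which stays fixed because insertion order is preserved.
    public int addTransition(int s, int t, double p) {
        succ.get(s).add(t);
        prob.get(s).add(p);
        return succ.get(s).size() - 1;
    }

    public int getNumTransitions(int s) { return succ.get(s).size(); }
    public int getSuccessor(int s, int i) { return succ.get(s).get(i); }
    public double getProbability(int s, int i) { return prob.get(s).get(i); }
}
```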
- Update MCRewards interface
- New RewardsSimple class that supports state/trans rewards
- Update code for constructing and exporting rewards
- Model checker (mostly) collapses to state rewards before checking
  (i.e. adds expected transition reward to state reward)
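The collapse in the last bullet computes, for each state s, the expected one-step transition reward and folds it into the state reward. A minimal sketch, with plain arrays as hypothetical stand-ins for the model and rewards objects:

```java
// Collapse transition rewards into state rewards, so model checking
// can proceed on state rewards only. Arrays are illustrative:
// prob[s][i] is the probability of transition i from state s,
// transRew[s][i] its reward, stateRew[s] the state reward.
class CollapseRewards {
    static double[] collapse(double[] stateRew, double[][] prob,
                             double[][] transRew) {
        double[] collapsed = stateRew.clone();
        for (int s = 0; s < stateRew.length; s++) {
            for (int i = 0; i < prob[s].length; i++) {
                // Add expected one-step transition reward from s.
                collapsed[s] += prob[s][i] * transRew[s][i];
            }
        }
        return collapsed;
    }
}
```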

The default implementations of getStateReward/getTransitionReward just return zero.


ConstructRewards is refactored accordingly, with one method that
works for any model (MC, MDP, or otherwise).


A few other changes may require alteration of calling code.
The StateRewards abstract class is removed, and the methods
deepCopy() and buildMDPRewards() in STPGRewards are removed,
since they are no longer needed.
This means that rewards for LTSs can be exported too,
so we add an additional regression test for this.
- Align symbolic/export tests as much as possible.
- Also stick to defaults wherever possible
  (include reward headers, don't change precision)
@davexparker davexparker merged commit 1de42fa into prismmodelchecker:master Aug 8, 2024
6 checks passed