Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workorder-detail-fix: Enforce order item details for buggy job types #929

Open
wants to merge 38 commits into
base: master
Choose a base branch
from

Conversation

flashy-man
Copy link

The fix: An 'eventful' handler that checks each job after it's dispatched to a workshop, correcting the details of any job items that have been altered from its parent order's.

The bug: For certain work orders that have been changed via gui/job-details (or other means), jobs created from the order will revert the changes. This is because a job_item's type doesn't exactly match what is expected for the job. Particularly, the "any" type will pedantically only match other "any" types, so for jobs like PrepareMeal, a "plant" or "liquid" type job_item will trigger a revert when it's dispatched to the workshop. ("plant" =/= "any", technically)

With that fixed, it's easier to order lavish meals of meticulous selection, or silk images sewn specifically on hemp ropes.

One possible issue: Eventful frequency is set to max (every tick) for job initiated event, which will affect every other script using it.

Before, I had it set up as a 100-tick periodic script, but there was some hassle with active jobs. Eventful's JOB_INITIATED check needs to run frequently enough to ensure that the job is caught before anyone starts gathering items. The items vector can be cleared if caught in time, just not sure if it's wise. I had to go in and manually un-task the cleared job items, and who knows what else could go wrong. Doing it just every 10 ticks might be fine as long as fastdwarf isnt on.

@TymurGubayev
Copy link
Contributor

Btw, the pre-commit checks are failing because of trailing whitespace, mixed line ending, and something about the end of file.

Copy link
Member

@myk002 myk002 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the code yet, but here are some higher-order notes:

  • script should be in the fix/ subdir: e.g. fix/workorder-details
  • If this is useful to run for all players all the time, it should be added as an entry in the control panel registry and enabled by default
  • you'll need to implement the enabled API

@myk002
Copy link
Member

myk002 commented Jan 7, 2024

Ok, I finally got a chance to look at this properly.

So, for my context, is this a problem that only comes up when using gui/job-details, or can this happen with purely vanilla UI actions? Did you get gui/job-details from #799 ?

Is the code in this PR required for gui/job-details to function correctly? If so, maybe this should be part of that tool and be automatically engaged instead of being a separate "fix' command.

@TymurGubayev what do you think about this?

@flashy-man
Copy link
Author

flashy-man commented Jan 7, 2024

Yes the gui/job-details from #799 is the only tool I know to directly edit the order details like this, and this fix is required for the full expected functionality there. Shouldn't affect vanilla users at all.

However gui/job-details might not be the only tool benefiting in the long run. In theory it should also affect work orders imported via the orders plugin, though my testing says that orders currently destroys item details on export. workorder does the same.

As things are now it's only benefiting gui/job-details, but I can see that changing if any tool wants to get a hair more specific with work orders.

@TymurGubayev
Copy link
Contributor

The workorder script behaving similarly to the orders plugin is kind of as expected because it's basically just a part of the plugin translated into Lua, plus minor adjustments about exactly how the imported manager orders are handled.

It'd be good to adjust both of them before the job-details PR is merged, otherwise, players are in for nasty surprises.

As for the strategy about when should this fix run: since there are 3 different sources of possible problems (orders plugin, workorder, and job-details scripts), we need to either always have it on; or periodically check the manager orders for problematic things and then turn the main fix on (is there maybe a hook we can use, f.e. when new manager order is created or the manager is about to perform their job?); or we adjust the 3 sources to make them enable this fix in some way when necessary (maybe set and store a flag in save data).

@flashy-man
Copy link
Author

flashy-man commented Jan 7, 2024

periodically check the manager orders for problematic things and then turn the main fix on

I like this one. Then it's not running a bunch of handlers when it doesn't need to, and no other scripts need to know about it*. In lieu of some appropriate hook, I could change the PR to be a periodic script like:

  • Load save, no handlers on yet
  • Check periodically for a funky order in orders list
  • Fix handler is enabled for the rest of the session if so (we don't need to check for funky orders anymore)

And then make that make sense with enabled API

*It's kind of a tough call on whether even the order checker should be on by default. Realistically only those who tamper with work orders will need to worry about this bug. Since only a few scripts mess with work orders right now, maybe it's best to let those scripts turn it on.

I'd maybe want to check on init when the fort is loaded. That way it could catch orders that have been modified previously without having to rely on save data. I'm fairly sure I can avoid any false positives and activating unnecessarily. But either way

@flashy-man
Copy link
Author

flashy-man commented Jan 9, 2024

As outlined, now it checks the order list first before arming the handler. Still to do:

  • Do some structured testing, auto-generate lots of orders, run checker every tick, etc, to weed out any missed edge cases or errors.

  • Think about checking the orders list once on fort init, even if we decide the fix shouldn't be enabled by default. It should be trivial (just add it as a recommended/default init somewhere in dfhack config.) It's just a matter of whether you guys think it's warranted. I've been annoyed by this bug for a while, so of course I think it is.

  • Public "check order list once" module function so other tools can check explicitly after editing/importing orders.

  • This could be a full daemon that auto-disarms the handlers when not needed. But with no way (yet) to undo the side-effects to the eventful handler frequency, it wouldn't make me feel much better.

Some semi-relevant ramblings i took out of my comments: (click me)

Only SewImage orders have the item-to-improve in the first slot of their items list. SewImage also isn't affected by Bug#10092 Whereas all other improvement jobs are. Coincidence? Perhaps not.

I looked deep into that particular bug and got stuck looking for the source of "some stack variable" that appeared to come in way before the manager order checker process touched it. Would make a lot of sense if it's related to the index of the improved item. The other thing is that the bugged improvement jobs' `item_type` gets mangled to a large 2-byte number, and neither of the bytes are familiar to me.

One thing that worries me about these bugs is, these checks could just be redundant and safe, or they could be guarding an actual issue with how the item details get filled. Limiting the scope of my fixes to just a few job types makes me feel better about it.

I do have a half-fix for 10092, however it isn't able to complete periodic orders yet. If there's not an "issue" for it yet I may raise one (along with others that came up in this PR)

@ab9rf
Copy link
Member

ab9rf commented Jan 9, 2024

i'm not thrilled with the idea of a script traversing the job list with each tick, might want to do some load testing to see what the performance impact is

@myk002
Copy link
Member

myk002 commented Jan 9, 2024

i'm not thrilled with the idea of a script traversing the job list with each tick, might want to do some load testing to see what the performance impact is

at least the iteration is done in cpp and not here. On the other hand, this will be a 5x increase over prioritize, which sets the period to 5 ticks for JOB_INITIATED.

I want to think about this a little more to see if there is a way to solve the problems in the source tools instead of as a periodic fixup. Alternately, is there behavior that could be merged into vanilla when creating the jobs? If we can identify legitimately "wrong" behavior, Putnam might make a vanilla fix.

@TymurGubayev
Copy link
Contributor

TymurGubayev commented Jan 9, 2024

If we can identify legitimately "wrong" behavior, Putnam might make a vanilla fix.

well, some job types are spawned with adjusted details, while others aren't, so IMHO it is a bug, even though it's not possible in vanilla.

<deleted, I'm stupid>

@TymurGubayev
Copy link
Contributor

I think there are 3 possibilities

  • this gets recognized as a bug and is fixed in vanilla;
  • this fix is run every tick - since it only affects new jobs, maybe it's not that bad?
  • this fix is run less frequently, in which case it needs to deal with started item gathering, which gets us back to

The items vector can be cleared if caught in time, just not sure if it's wise. I had to go in and manually un-task the cleared job items, and who knows what else could go wrong.

I think the manual part here can be automated, and then maybe it'll just work?

Another possibility to solve the cleaning up is to cancel all related jobs, including the main one, and create a new one instead with the fix already applied.

@flashy-man
Copy link
Author

flashy-man commented Jan 9, 2024

I saw the word 'hook' near onJobInitiated event in eventful.cpp, so i hoped that it traverses a list of events collected via hooks, rather than traversing the whole job list. The only performance check i did was handler CPU time, which i can scarcely measure. (Better performance testing is on my list, also trying to understand eventful.cpp)

Also a period of 5-10 ticks could be just fine for this fix to be ~99% effective. I'm probably thinking too much of the odd dwarf who happens to be near the workshop at just the wrong time. He'd have to pick up the new job, start working on it, and get to the stage where the actual items are designated from job.job_items, all within X ticks. Worst case scenario here is that the wrong items are gathered, because someone probably had fastdwarf on, and the script has to have some spaghetti code to fix it. (or just leave it alone. We tried eh?)

Re: Editing the affected source tools instead. I've looked hard for ways to beat this bug at the source on order creation. I couldn't find any way to make the item types pass properly from the order, other than the binpatching I was originally doing. There's an instruction in the order-job-dispatch process that really does not want to let any item type pass other than -1 for the 'bugged' jobs. If it would consider job_item.vector_id like dfhack.job.isSuitableItem does, and also consider that the "any" type should match any type, it would probably be fine.

As far as asking for a vanilla fix: @TymurGubayev said it, it's basically almost a vanilla bug. Maybe if we ask really nicely. We just want to make our nobles happy right?

so, TODO: Test for handler triggering too late at 5-10 ticks. Get a nice lua performance test. (Maybe) workshop persuasive rant about how players need highly specific work orders

@ab9rf
Copy link
Member

ab9rf commented Jan 10, 2024

I saw the word 'hook' near onJobInitiated event in eventful.cpp, so i hoped that it traverses a list of events collected via hooks, rather than traversing the whole job list.

unfortunately the only way we have to detect new jobs is... to traverse the job list and look for any jobs that weren't on it the last time we traversed it. (see https://github.com/DFHack/dfhack/blob/180fd34a96e7ef2bbdb77c2a172ba43ade182464/library/modules/EventManager.cpp#L435C1-L463) so subscribing the "job initialized" hook forces the core event system to do a job list traversal at the frequency requested.

@flashy-man
Copy link
Author

flashy-man commented Jan 10, 2024

Hmmm. Triggering on every job init isn't ideal for this anyway. It'd be nice if there was an event hook on the order-job-dispatch process, to only catch those jobs created by the manager. But let's not go there ...

I'm going to look into this idea from @TymurGubayev:

(is there maybe a hook we can use, f.e. when new manager order is created or the manager is about to perform their job?)

Only thing: The job-order-dispatch process has nothing to do with manager's activities. It seems to just happen every 1650 ticks exactly (i totally knew this and i didnt write it down :')*

df.global.cur_year_tick / dfhack.getTickCount()
order dispatch tick: 106530, 621526015
order dispatch tick: 108180, 621542515 +(1650, 16500)
order dispatch tick: 109830, 621559015 +(1650, 16500)
order dispatch tick: 111480, 621575515 +(1650, 16500)

Instead of worrying about events on every job, it should be fine to scan the job list then (even in Lua, every 1650 ticks is not bad right?). I think the schedule formula is (cur_year_tick + 720) % 1650 == 0, but i'll have to test it on a longer game/more saves

If we go this route, workorder-job-repeater may have to be changed (since i think the point is to bypass this dispatch event) nvm it just copies the old job... but maybe it shouldnt in case the order has been changed? graagh

Maybe we should ask/see if there are any planned vanilla updates to this system any time soon

@flashy-man
Copy link
Author

flashy-man commented Jan 13, 2024

Got the manager schedule (thanks to whoever found manager_timer).
It was really A) manager_timer == 0 and tick % 150 == 30. Timer decrements on the latter condition, starting at 10 until it reaches 0 (then it dispatches the jobs, and resets.) Ends up so it goes 11 times between dispatches. (11 * 150 = 1650)

Now the fix code is different. Still needs more testing, but I went through some different saves to make sure i could sync with all the manager schedules. (it even works on 47.05). Here is the flow:

  1. Enable: determine next tick of manager job dispatch & schedule the handler for that tick

    next_dispatch = tick + (-(tick - 30) % 150) + manager_timer * 150
  2. Handler: runs every 150 ticks. Only on conditions A)***, B) manager exists and C) bugged orders exist, will it check the job list for new jobs, and correct them on the same tick they were dispatched. So order-traversing once every 1650 ticks, and only job-traversing then if we find a bugged order.

*** A) is at the top. Also, the script runs after the manager process, so the condition is slightly different. Not sure about sub-tick timing yet but it seems Lua scripts always run after game code

  1. Very small fix for the active job issue. It still happens on that first run.
    (i didnt test this at all aside from having done it several times in gm-editor)
    -- experimental: remove the items and un-task them
    --while job.items[0] do -- bad idea
    while #job.items ~= 0 do
        job.items[0].item.flags.in_job = false
        job.items:erase(0)
    end
    

Still a periodic script technically, but it's better. Since the main concern is doing less to fix it, i'll make an overdue write-up of the bug with some snapshots of the code. Could be something I missed, but I couldn't find any way to bypass that filtering process in the job dispatch.

binpatching was best for my personal use, but with all these OS and steam versions and blah blah, probably not the way to go.

(commits incoming)

@myk002
Copy link
Member

myk002 commented Jan 17, 2024

-- experimental: remove the items and un-task them
--while job.items[0] do -- bad idea
while #job.items ~= 0 do
    job.items[0].item.flags.in_job = false
    job.items:erase(0)
end

In cpp we have Job::disconnectJobItem(), which is more thorough than this code (e.g. it removes specific refs on the item). it's not exposed to Lua yet, but it could be exposed and used by this script, if needed.

internal/control-panel/registry.lua Outdated Show resolved Hide resolved
docs/fix/workorder-detail-fix.rst Outdated Show resolved Hide resolved
docs/fix/workorder-detail-fix.rst Outdated Show resolved Hide resolved
fix/workorder-detail-fix.lua Outdated Show resolved Hide resolved
fix/workorder-detail-fix.lua Outdated Show resolved Hide resolved
fix/workorder-detail-fix.lua Outdated Show resolved Hide resolved
fix/workorder-detail-fix.lua Outdated Show resolved Hide resolved
fix/workorder-detail-fix.lua Outdated Show resolved Hide resolved
fix/workorder-detail-fix.lua Outdated Show resolved Hide resolved
fix/workorder-detail-fix.lua Outdated Show resolved Hide resolved
@flashy-man
Copy link
Author

Woohoo review-- Main things:

  1. New name: fix/workorder-details

  2. Now re-syncing when dispatch tick is lost (only happens using timestream that I have seen)

  3. No more trying to fix already-gathered items. It was crashing either due to the vector checking while loop, or not clearing item refs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants