Add repair step to repair mismatch filecache paths #28253
additional stuff to try to see if the breakage looks the same:
|
Just tried that, moving a file from sharer1's storage to sharer2's storage through received shares. This creates a cross-storage situation. Basically the broken entry "folderA/one/two" is still on the first sharer's storage and the rest is on the second sharer's storage:
|
deleting "folderB/one" preserves the wrong path, but now its parent is in trash:
|
Updated branch: cd73a0c to 573ac39
Adjusted to also fix the storage value. And now I realize that there could be rare cases where the path value is still correct but the storage value is wrong... Ah... Yet another case. But for the sake of completeness:
|
Uh oh, and what if the target "name" is not the same as the original ? 😢
|
Updated branch: 5737bb2 to 36d30e9
Also note that it is not possible to create "folderA/one/two" even though it doesn't appear in the listing. |
|
My mega test case:
|
Possible solutions:
or
|
Updated branch: 36d30e9 to 728e855
What a nightmare, there's even more to it: if a filecache entry with the target name already exists, we delete it. But if that entry had child entries that were created after the breakage, deleting it would orphan these child entries. So after deleting we need to reparent the child entries to attach them to the repaired entry.
My worry here is that this is yet another can of worms and that the subtree reparenting has some hidden special cases. |
|
We likely already have some code somewhere to reparent child entries, but that code should not be used within repair steps, so we'll likely have to do it manually with several DB commands |
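The delete-then-reparent idea discussed above could look roughly like this. It is only a minimal sketch in Python against an in-memory SQLite table, not ownCloud code: the `filecache` table is a simplified stand-in for `oc_filecache`, and `replace_target` and the sample ids are hypothetical.

```python
import sqlite3


def replace_target(db, target_id, repaired_id):
    """Delete the conflicting target entry and attach its children to the repaired entry.

    Returns the number of child entries that were reparented.
    """
    with db:  # one transaction: commit on success, roll back on error
        db.execute("DELETE FROM filecache WHERE fileid = ?", (target_id,))
        cur = db.execute(
            "UPDATE filecache SET parent = ? WHERE parent = ?",
            (repaired_id, target_id),
        )
    return cur.rowcount


db = sqlite3.connect(":memory:")
# Simplified stand-in for oc_filecache (assumption: only the columns needed here).
db.execute(
    "CREATE TABLE filecache ("
    "fileid INTEGER PRIMARY KEY, storage INTEGER, parent INTEGER, path TEXT)"
)
db.executemany(
    "INSERT INTO filecache VALUES (?, ?, ?, ?)",
    [
        (26, 3, 25, "files/folderB/one"),
        (27, 3, 26, "files/folderA/one/two"),    # broken entry, path not yet fixed
        (37, 3, 26, "files/folderB/one/two"),    # entry created at the target later
        (39, 3, 37, "files/folderB/one/two/x"),  # child of the later entry
    ],
)

changed = replace_target(db, target_id=37, repaired_id=27)
print(changed, db.execute("SELECT parent FROM filecache WHERE fileid = 39").fetchone())
```

The sketch only covers direct children and leaves the path of the repaired entry untouched. The hidden special cases mentioned above are not addressed.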
Yet another alternative repair solution would be to keep the wrong path but adjust the parent id to point to the correct one. In this case it would be like cancelling the move operation. But it would also require moving the parent entry that was moved to the target back to the old location and setting its path, with a slight risk that someone might have recreated the missing entry. This could also be problematic if someone has already worked a lot with the partially moved target and added new entries there. So maybe instead of moving the target entry back we just create a new parent entry to fill the gap. This way the already moved metadata is in the new folder and the old subdir metadata is in the old folder. Then user interaction is required to fix this: the user needs to decide what to do with the data, whether to reattempt the move or move everything back. This approach sounds much less risky than the one in this PR. |
arghh wait, the above is only about file cache stuff. The actual physical data is already on the target location. So this would require also moving the files physically, which can be tricky from within a repair step as we need to use the |
Forget about the alternative, I think I have a plan that could work out: Let's illustrate what is happening currently. At some point we get into this broken situation:
Here I do a This results in this situation which I expect might exist for some people already who continued to use the broken folder:
At this point we could run into a conflict when moving "folderA/one/two" to "folderB/one/two" for repair purposes. The repair routine in this PR already deletes the target. This means that all former "folderA/**" entries can be moved to their targets with the current code by repeatedly running the repair step until done (the repeating is an issue, see #28253 (comment)). Once there is nothing else to repair, the filecache looks like this:
If you look closely you'll notice that the fileid of "folderB/one/two" is the same as the "folderA/one/two" we had before. So here the repair was successful. Now since we added a new folder "x" inside there, that one is still pointing to the old parent fileid (37). This one was deleted when we replaced "folderB/one/two". Instead of trying to manually reparent the whole tree, I suggest the following approach: Write an additional repair step (which will be useful anyway!) that finds all entries that have a missing parent:
Then, using the The above repair needs to be repeated until there are no more unrepairable entries with missing parent.
|
Select all entries where the parent points to a non-existing entry but where an entry exists at the entry's parent path:
select storage, fileid, parent, path, etag
from oc_filecache fc
where parent <> -1
  and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent)
  and exists (
    select fc3.path from oc_filecache fc3
    where fc3.storage = fc.storage
      and fc3.path = substring(fc.path, 1, length(fc.path) - length(substring_index(fc.path, '/', -1)) - 1)
  )
order by path;
+---------+--------+--------+-------------------------+---------------+
| storage | fileid | parent | path                    | etag          |
+---------+--------+--------+-------------------------+---------------+
|       3 |     39 |     37 | files/folderB/one/two/x | 59562454a5133 |
+---------+--------+--------+-------------------------+---------------+
|
if the above turns out to not work cross-DB, fall back to simply iterating over all found entries and keep a blacklist of file ids of non-repairable entries to exclude in subsequent queries. |
alternative query that uses a join and provides the id of the parent to correct it to:
select fc.storage, fc.fileid, fc.parent as "wrongparent", fc3.fileid "correctparent", fc.path, fc.etag
from oc_filecache fc, oc_filecache fc3
where fc.parent <> -1
  and fc.parent != fc3.fileid
  and fc3.storage = fc.storage
  and fc3.path = substring(fc.path, 1, length(fc.path) - length(substring_index(fc.path, '/', -1)) - 1)
  and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent)
order by path;
+---------+--------+-------------+---------------+-------------------------+---------------+
| storage | fileid | wrongparent | correctparent | path                    | etag          |
+---------+--------+-------------+---------------+-------------------------+---------------+
|       3 |     39 |          37 |            27 | files/folderB/one/two/x | 59562454a5133 |
+---------+--------+-------------+---------------+-------------------------+---------------+
1 row in set (0.00 sec)
I'm not confident about removing the check that the parent entry doesn't exist, because removing it could return entries that are broken differently: parent id pointing to a different entry. |
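The repeat-until-done approach with a blacklist of non-repairable ids could be sketched as follows. This is an illustration only, written in Python against an in-memory SQLite table rather than the real repair step: the `filecache` table is a simplified stand-in for `oc_filecache`, the function names and the entry with fileid 50 are made up, and the parent path is computed in Python with `posixpath.dirname` instead of `substring_index` so the lookup does not depend on one database's string functions.

```python
import posixpath
import sqlite3


def find_missing_parent(db, skip, limit=100):
    """Return up to `limit` entries whose parent id matches no existing fileid."""
    sql = (
        "SELECT fc.fileid, fc.storage, fc.path FROM filecache fc "
        "WHERE fc.parent <> -1 "
        "AND NOT EXISTS (SELECT 1 FROM filecache p WHERE p.fileid = fc.parent) "
    )
    if skip:
        # Exclude ids already known to be unrepairable (the "blacklist").
        sql += "AND fc.fileid NOT IN (%s) " % ",".join("?" * len(skip))
    sql += "ORDER BY fc.path LIMIT ?"
    return db.execute(sql, [*sorted(skip), limit]).fetchall()


def repair_missing_parents(db, batch_size=100):
    """Reattach orphans to the entry found at their parent path, batch by batch."""
    unrepairable = set()
    repaired = 0
    while True:
        rows = find_missing_parent(db, unrepairable, batch_size)
        if not rows:
            return repaired, unrepairable
        for fileid, storage, path in rows:
            match = db.execute(
                "SELECT fileid FROM filecache WHERE storage = ? AND path = ?",
                (storage, posixpath.dirname(path)),
            ).fetchone()
            if match is None or match[0] == fileid:
                unrepairable.add(fileid)  # no usable parent entry at that path
                continue
            with db:
                db.execute(
                    "UPDATE filecache SET parent = ? WHERE fileid = ?",
                    (match[0], fileid),
                )
            repaired += 1


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE filecache ("
    "fileid INTEGER PRIMARY KEY, storage INTEGER, parent INTEGER, path TEXT)"
)
db.executemany(
    "INSERT INTO filecache VALUES (?, ?, ?, ?)",
    [
        (1, 3, -1, ""),
        (2, 3, 1, "files"),
        (25, 3, 2, "files/folderB"),
        (26, 3, 25, "files/folderB/one"),
        (27, 3, 26, "files/folderB/one/two"),
        (39, 3, 37, "files/folderB/one/two/x"),  # parent 37 no longer exists
        (50, 3, 99, "files/gone/y"),             # hypothetical: no entry at files/gone
    ],
)

repaired, unrepairable = repair_missing_parents(db)
print(repaired, unrepairable)
```

Every row in a batch is either repaired or added to the blacklist, so the loop terminates. A very large blacklist would eventually exceed the database's bound-parameter limit, which this sketch ignores.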
|
Updated branch: 602dc84 to bd952f8
The current approach already works now, which is great. But at least we know the general approach works.
|
@tomneedham stable10 backport please |
@PVince81 Could you please let me know how I can apply this fix on my production 9.0.10? |
@socrat3000 you can't, the code is not compatible and would need porting and careful retesting. Is anything preventing you from upgrading to 10.0.3? This fix will be in 10.0.4 |
@PVince81 Is there any way I could apply a workaround at least for now till we upgrade? |
@socrat3000 if you only have one affected folder, look at "oc_filecache" entry for this folder and check that all parents properly connect. The "parent" column should point at the "fileid" of the actual parent based on the "path" value. You need to manually adjust the "path" column values to match the hierarchy from parent<->fileid relationship. |
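A read-only check along the lines described above might look like the sketch below. It is not an official query: it runs against an in-memory SQLite table that stands in for `oc_filecache` (reduced to a few columns), `find_path_mismatches` is a hypothetical helper, and the ids are sample values. For each entry it derives the expected path from the parent's path plus the entry's own `name` and reports the rows where the stored path differs.

```python
import posixpath
import sqlite3


def find_path_mismatches(db, storage):
    """List (fileid, stored_path, expected_path) where the path disagrees with the parent."""
    rows = db.execute(
        "SELECT c.fileid, c.path, c.name, p.path "
        "FROM filecache c JOIN filecache p ON p.fileid = c.parent "
        "WHERE c.storage = ? ORDER BY c.path",
        (storage,),
    ).fetchall()
    mismatches = []
    for fileid, path, name, parent_path in rows:
        expected = posixpath.join(parent_path, name)  # "" + "files" gives "files"
        if path != expected:
            mismatches.append((fileid, path, expected))
    return mismatches


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, storage INTEGER, "
    "parent INTEGER, path TEXT, name TEXT)"
)
db.executemany(
    "INSERT INTO filecache VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3, -1, "", ""),
        (2, 3, 1, "files", "files"),
        (10, 3, 2, "files/folderB", "folderB"),
        (11, 3, 10, "files/folderB/one", "one"),
        (12, 3, 11, "files/folderA/one/two", "two"),            # stale path
        (13, 3, 12, "files/folderA/one/two/three", "three"),    # follows its parent
    ],
)

mismatches = find_path_mismatches(db, storage=3)
print(mismatches)
```

Only the topmost broken entry is reported, because its children still agree with its stale path. After correcting it, the check has to be run again for the next level. On a real instance the `path_hash` column (the MD5 of `path`) would also need to be kept in sync when a path is changed by hand.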
Can you help with the query for this?
Kind regards,
أحمد عمرو
Ahmad Amr
|
@PVince81 I have now upgraded to 10.0.3, is there any way I could get that fixed before 10.0.4 is released? |
@socrat3000 just wait for @tomneedham to finish the stable10 backport/PR then you could apply it to your instance and run "occ files:scan --all --repair" |
@socrat3000 See #29074 |
Added doc ticket for future reference to the new commands: owncloud-archive/documentation#3446 |
Hello there, got the problem on Owncloud 9.1.5, the option occ files:scan --repair is not available. Should I wait for the next version ? Thanks. |
The backports for OC 9.1 and 9.0 are still work in progress and should hopefully be available in the next versions. OC 10.0.4 will have the command too once released. |
Hi @PVince81,
sudo -u apache path to/occ files:scan --all --repair
Answer this:
Where am I wrong? Thanks |
@enc98 try updating to the latest 9.1.* version first |
There is no other way, sorry. you need access to command line because this repair is an expensive operation that cannot be run from a regular button in the web UI, mostly because it would likely run into PHP timeouts, especially on shared hosters which have low timeout values. |
Ok. With version 10.0.6 installed, is it possible that the problem occurs again? Or did you manage to repair its cause? |
The cause itself has definitely been fixed for a while, see #28018 |
Problem solved. Although not an official ownCloud hosting partner, my provider offered a solution using a shell script. PHP timeout was no problem in this case. |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Description
Whenever a parent-child relationship describes a specific path but the
entry's actual path is different, this new repair step will adjust it.
In the event where another entry already exists under that name, the
latter is deleted and replaced by the above entry. This should help
preserve the metadata associated with the original entry.
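As a rough illustration of the behaviour described above (and not the code of this PR), the following Python sketch runs against an in-memory SQLite table. The `filecache` table is a simplified stand-in for `oc_filecache`, `repair_paths` and `max_fixes` are hypothetical names, and the ids are sample values. It repeatedly picks one entry whose path or storage disagrees with its parent, deletes any other entry already sitting at the expected path, and rewrites the entry in place so that its fileid survives.

```python
import posixpath
import sqlite3

MISMATCH_SQL = (
    "SELECT c.fileid, c.name, p.storage, p.path "
    "FROM filecache c JOIN filecache p ON p.fileid = c.parent "
    "WHERE c.fileid <> c.parent "  # skip self-parented rows, they would loop forever
    "AND (c.storage <> p.storage OR c.path <> "
    "CASE WHEN p.path = '' THEN c.name ELSE p.path || '/' || c.name END) "
    "ORDER BY c.fileid LIMIT 1"
)


def repair_paths(db, max_fixes=10_000):
    """Align path and storage with the parent chain; return the number of fixes."""
    fixed = 0
    while fixed < max_fixes:  # safety cap in case of parent cycles
        row = db.execute(MISMATCH_SQL).fetchone()
        if row is None:
            break
        fileid, name, storage, parent_path = row
        target = posixpath.join(parent_path, name)
        with db:
            # Another entry already uses the target path: drop it, keep ours.
            db.execute(
                "DELETE FROM filecache WHERE storage = ? AND path = ? AND fileid <> ?",
                (storage, target, fileid),
            )
            db.execute(
                "UPDATE filecache SET storage = ?, path = ? WHERE fileid = ?",
                (storage, target, fileid),
            )
        fixed += 1
    return fixed


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE filecache (fileid INTEGER PRIMARY KEY, storage INTEGER, "
    "parent INTEGER, path TEXT, name TEXT)"
)
db.executemany(
    "INSERT INTO filecache VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3, -1, "", ""),
        (2, 3, 1, "files", "files"),
        (10, 3, 2, "files/folderB", "folderB"),
        (11, 3, 10, "files/folderB/one", "one"),
        (12, 3, 11, "files/folderA/one/two", "two"),          # stale path
        (13, 3, 12, "files/folderA/one/two/three", "three"),  # child of the stale entry
        (37, 3, 11, "files/folderB/one/two", "two"),          # conflicting entry at target
    ],
)

fixed = repair_paths(db)
print(fixed, db.execute("SELECT fileid, path FROM filecache ORDER BY fileid").fetchall())
```

Children of a deleted conflicting entry would be left with a dangling parent id, which this sketch does not handle. The real table also carries a `path_hash` column that would have to be updated together with `path`.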
This kind of situation can be created by the bug from #28018
This is a better fix than #28217
The cases that this repair step fixes:
parent = fileid: not reproducible at all, was reported by @butonic as seen in the wild (clustered env). I added it because such entries would have caused an infinite loop in the repair step, which is bad.
This repair step does not:
Here are the queries that are run to find broken entries, could be used to pre-check an instance:
Related Issue
Fixes the fallout created by #28018
Motivation and Context
Because we hate filecache inconsistencies.
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: