Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add repair step to repair mismatch filecache paths #28253

Merged
merged 13 commits into from
Sep 20, 2017

Conversation

PVince81
Copy link
Contributor

@PVince81 PVince81 commented Jun 28, 2017

Description

Whenever a parent-child relationship describes a specific path but the
entry's actual path is different, this new repair step will adjust it.

In the event where another entry already exists under that name, the
latter is deleted and replaced by the above entry. This should help
preserve the metadata associated with the original entry.

This kind of situation can be create through the bug from #28018

This is a better fix than #28217

The cases that this repair step fixes:

  • current path doesn't match the one from the parent on same storage: reproducible before Fix cross-storage move info when moving between two received shares #28022 by moving a file between two received shares from the same user or by moving a file out of a received share.
  • current path doesn't match the one from the parent on a different storage: reproducible before Fix cross-storage move info when moving between two received shares #28022 by moving a file between two received shares where each share comes from a different user (two storages)
  • parentid pointing to a non-existing entry: not reproducible in the wild, however the first phase of the repair step will create such entries which the second phase fixes. I assume that there might be such cases existing out there so this would fix them too.
  • parent = fileid: not reproducible at all, was reported by @butonic as seen in the wild (clustered env). I added it because such entries would have caused an infinite loop in the repair step which is bad.

This repair step does not:

  • repair mismatch path_hashes
  • mime types

Here are the queries that are run to find broken entries, could be used to pre-check an instance:

select fc.storage,fc.fileid,fc.parent as "wrongparent",fc3.fileid "correctparent",fc.path,fc.etag from oc_filecache fc, oc_filecache fc3 where fc.parent <> -1 and fc.parent != fc3.fileid and fc3.storage=fc.storage and fc3.path = substring(fc.path, 1, length(fc.path) - length(substring_index(fc.path, '/', -1)) - 1) and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent) order by path;
SELECT fc.`storage`, fc.`fileid`, fc.`path` as `path`, fc.`name`, fcp.`storage` as `parentstorage`, fcp.`path` as `parentpath`     FROM `oc_filecache` fc, `oc_filecache` fcp     WHERE (fc.`parent` = fcp.`fileid`)     AND (         (CONCAT(fcp.`path`, '/', fc.`name`) <> fc.`path`)         OR (fc.`storage` <> fcp.`storage`)     )     AND (fcp.`path` <> '')     AND (fc.`fileid` <> fcp.`fileid`);
select fc.storage,fc.fileid,fc.parent as "wrongparent",fc.path,fc.etag from oc_filecache fc where fc.parent <> -1 and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent) order by path;
select * from oc_filecache where parent=fileid;

Related Issue

Fixes the fallout created by #28018

Motivation and Context

Because we hate filecache inconsistencies.

How Has This Been Tested?

  • TEST: Unit test (some pending)
  • TEST: manual test

⚠️ Use the test steps from here: #28253 (comment)

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

additional stuff to try to see if the breakage looks the same:

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

also try reproducing the breakage by moving through two received shares but not from the same user

Just tried that, moving file from sharer1's storage to sharer2's storage through received shares. This creates a cross-storage situation. Basically the broken entry "folderA/one/two" is still on the first sharer's storage and the rest is on the second sharer's storage:

+---------+--------+--------+-----------------------+----------------------------------+
| storage | fileid | parent | path                  | etag                             |
+---------+--------+--------+-----------------------+----------------------------------+
|       3 |     31 |     25 | files/folderA         | 5953e2eadd88a                    |
|       3 |     46 |     45 | files/folderA/one/two | 5953e297167fa                    |
|       3 |     26 |     25 | files/welcome.txt     | 5db5bea08897a6709d234df3ccbb000f |
|       4 |     40 |     34 | files/folderB         | 5953e2eadedad                    |
|       4 |     45 |     40 | files/folderB/one     | 5953e29719c40                    |
|       4 |     35 |     34 | files/welcome.txt     | a05b5f63ba63bb154b29d0538491009f |
+---------+--------+--------+-----------------------+----------------------------------+
  • TODO: adjust repair step to adjust storage value: if the parent of a child is on another storage, then set the child's storage value to the same as the parent

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

what happens if someone deletes the bogus entries, do they land broken in trash ? might need to repair that too, or at least discard

deleting "folderB/one" preserves the wrong path, but now its parent is in trash:

+--------+-----------------------+------+--------+--------------------------------------+
| fileid | path                  | name | fileid | path                                 |
+--------+-----------------------+------+--------+--------------------------------------+
|     31 | files/folderA/one/two | two  |     30 | files_trashbin/files/one.d1498670525 |
+--------+-----------------------+------+--------+--------------------------------------+
  • make sure repair step can also repair when entries are in trash

@PVince81 PVince81 force-pushed the repair-filecache-mismatch-path branch from cd73a0c to 573ac39 Compare June 28, 2017 18:59
@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

Adjusted to also fix the storage value.

And now I realize that there could be rare cases where the path value is still correct but the storage value is wrong... Ah... Yet another case.
This is likely very rare as both sharers need to have their source folder have exactly the same name and also that the recipient decides to do the cross-storage move...

But for the sake of completeness:

  • also handle case where path is the same but storage value of parent is wrong/different

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

Uh oh, and what if the target "name" is not the same as the original ? 😢

  • check case where target name is not same as original (ex: move "folderA/one" to "folderB/oneX")

@PVince81 PVince81 force-pushed the repair-filecache-mismatch-path branch from 5737bb2 to 36d30e9 Compare June 28, 2017 20:02
@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

  • what if recreated the source parent folder "folderA/one" ? => this folder stays legit and isn't touched by the repair routine. So "folderA/one" stays where it is while the other part of the tree gets moved.

Also note that it is not possible to create "fodlerA/one/two" even though it doesn't appear in the listing.
It is possible to PROPFIND "folderA/one/two" as it would directly jump into the folder in the correct subtree.

@PVince81 PVince81 changed the title Add repair step to repair mismatch filecache paths [WIP] Add repair step to repair mismatch filecache paths Jun 28, 2017
@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

  • BUG: manual test destroyed legitimate file cache entries ⚠️ => it was a missing use statement so mysql concat used the wrong operators

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

My mega test case:

  1. Check out a commit before the fix from Creation of subfolders using desktop client after a move across shares can lead to duplicated folders. #28018, for example e04de08
  2. Create three users "user0", "user1" and "user2". user1 will be the recipient
  3. Login as "user0", create "folderA" and "folderB" and share with "user1"
  4. Login as "user2", create "folderA" and share with "user1"
  5. Login as "user1", rename "folderA (2)" to "folderA2" (for convenience)
  6. Login as "user1" with cadaver
  7. Create step by step a folder mkdir folderA/trash1/a/b/c
  8. Create step by step a folder mkdir folderA/trash2/a/b/c
  9. Create step by step a folder mkdir folderA/samestorage/a/b/c
  10. Create step by step a folder mkdir folderA/samecrossstorage/a/b/c
  11. Move "folderA/trash1" to "folderB/trash1"
  12. Move "folderA/trash2" to "folderA2/trash2"
  13. Delete (rmcol) both "folderB/trash1" and "folderA2/trash2", they'll go to trash
  14. Move "folderA/samestorage" to "folderB/samestorage_renamed"
  15. Move "folderA/crossstorage" to "folderA2/crossstorage_renamed"
  16. Run SELECT fc.storage, fc.fileid, fc.path as path, fc.name, fcp.storage as parentstorage, fcp.path as parentpath FROM oc_filecachefc,oc_filecache fcp WHERE (fc.parent = fcp.fileid) AND ((CONCAT(fcp.path, '/', fc.name) <> fc.path) OR (fc.storage <> fcp.storage)) AND (fcp.path <> ''); which should give 4 results (broken entries)
  17. Checkout this PR and run occ upgrade
  18. Check the query again: no results
  19. Login as "user1" in the web UI and navigate through all folders and trashbin and see that all is well

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

  • BUG: 🚫 doesn't work for sub-entries. It seems that after the repair, the base directories "a" are fixed, but then suddenly the child directories "a/b" are broken. I guess this is normal because at the time of the original select only the "a" had a wrong path, but now that the path of "a" was fixed, the entries of "a/b" now have the wrong path. => works now

Possible solutions:

  1. Rerun the select loop until there are no more entries to process

or

  1. When moving an entry, also search for all child entries and fix them as well.

@PVince81 PVince81 force-pushed the repair-filecache-mismatch-path branch from 36d30e9 to 728e855 Compare June 28, 2017 20:44
@PVince81
Copy link
Contributor Author

PVince81 commented Jun 28, 2017

What a night mare, there's even more to it: if a filecache entry with the target name already exist, we delete it. But now if that entry had child entries that were created after the breakage, deleting it would orphan these child entries. So after deleting we need to reparent the child entries to attach them to the repaired entry.

  • reparent children of deleted entry

My worry here is that this is yet another can of worms and that the subtree reparenting has some hidden special cases.

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 30, 2017

  • there were reports of cases where "fileid = parent", so need to:
    • find a way to make it happen
    • make sure the repair step can repair it too, or at least leave these alone for another future step to repair

@PVince81
Copy link
Contributor Author

We likely already have some code somewhere to reparent child entries, but that code should not be used within repairs steps so we'll likely have to do manually by doing several DB commands

@PVince81
Copy link
Contributor Author

Yet another alternative repair solution would be to keep the wrong path but adjust the parent id to point to the correct one. In this case it would be like cancelling the move operation. But it would also require moving back the parent entry that was moved to the target, move it back to the old location and set its path, with a slight risk that someone might have recreated the missing entry.

Now this could also be problemative if someone already worked a lot with the partially moved target and added new entries there. So maybe instead of moving the target entry back we just recreate another new parent entry to fill the gap. This way the already moved metadata is in the new folder and the old subdir metadata is on the old folder.

Then user interaction is required to fix this: the user needs to decide what to do with the data, whether to reattempt the move or move everything back.

This approach sounds much less risky than the one in this PR.
So let's give it a try in a separate PR.

@PVince81
Copy link
Contributor Author

arghh wait, the above is only about file cache stuff. The actual physical data is already on the target location. So this would require also moving the files physically, which can be tricky from within a repair step as we need to use the View to make sure we trigger all hooks (versions, etc)

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 30, 2017

Forget about the alternative, I think I have a plan that could work out:

Let's illustrate what is happen currently, at some point we get into this broken situation:

+---------+--------+--------+------------------------------------------------+----------------------------------+
| storage | fileid | parent | path                                           | etag                             |
+---------+--------+--------+------------------------------------------------+----------------------------------+
|       3 |     20 |     14 | files/folderA                                  | 59562123b68de                    |
|       3 |     27 |     26 | files/folderA/one/two                          | 5956211e9f5b6                    |
|       3 |     28 |     27 | files/folderA/one/two/three                    | 5956211e9a79a                    |
|       3 |     21 |     14 | files/folderB                                  | 59562449a3ac3                    |
|       3 |     26 |     21 | files/folderB/one                              | 5956211e9f5b6                    |
|       3 |     36 |     21 | files/folderB/x                                | 595624499eaae                    |

Here I do a mkdir x then mv x folderB/one/two. Internally the code will notice that the parent entries for "folderB/one/two/x" do not exist in the cache. (they do but have the wrong path with "folderA")

This results in this situation which I expect might exist for some people already who continued to use the broken folder:

+---------+--------+--------+------------------------------------------------+----------------------------------+
| storage | fileid | parent | path                                           | etag                             |
+---------+--------+--------+------------------------------------------------+----------------------------------+
|       3 |     20 |     14 | files/folderA                                  | 59562123b68de                    |
|       3 |     27 |     26 | files/folderA/one/two                          | 5956211e9f5b6                    |
|       3 |     28 |     27 | files/folderA/one/two/three                    | 5956211e9a79a                    |
|       3 |     21 |     14 | files/folderB                                  | 59562458ca486                    |
|       3 |     26 |     21 | files/folderB/one                              | 59562458ca486                    |
|       3 |     37 |     26 | files/folderB/one/two                          | 59562458ca486                    |
|       3 |     38 |     37 | files/folderB/one/two/three                    | 595624519b05a                    |
|       3 |     39 |     37 | files/folderB/one/two/x                        | 59562454a5133                    |
|       3 |     36 |     21 | files/folderB/x                                | 595624499eaae                    |

At this point we could run into a conflict when moving "folderA/one/two" to "folderB/one/two" for repair purposes. The repair routine in this PR already deletes the target. This means that for every former "folderA/**" entries we can move them to the targets with the current code by repeatingly running the repair step until done. (the repeating is an issue, see #28253 (comment)).

One there is nothing else to repair the filecache looks like this:

+---------+--------+--------+------------------------------------------------+----------------------------------+
| storage | fileid | parent | path                                           | etag                             |
+---------+--------+--------+------------------------------------------------+----------------------------------+
|       3 |     20 |     14 | files/folderA                                  | 59562123b68de                    |
|       3 |     21 |     14 | files/folderB                                  | 59562458ca486                    |
|       3 |     26 |     21 | files/folderB/one                              | 59562458ca486                    |
|       3 |     27 |     26 | files/folderB/one/two                          | 5956211e9f5b6                    |
|       3 |     28 |     27 | files/folderB/one/two/three                    | 5956211e9a79a                    |
|       3 |     39 |     37 | files/folderB/one/two/x                        | 59562454a5133                    |
|       3 |     36 |     21 | files/folderB/x                                | 595624499eaae                    |

If you look closely you'll notice that the fileid of "folderB/one/two" is the same as the "folderA/one/two" we had before. So here the repair was successful.

Now since we added a new folder "x" inside there, that one is still pointing to the old parent fileid (37). This one was deleted when we replaced "folderB/one/two".

Instead of trying to manually reparent the whole tree, I suggest the following approach:

Write an additional repair step (which will be useful anyway!) that finds all entries that have a missing parent:

select storage,fileid,parent,path,etag from oc_filecache fc where parent <> -1 and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent) order by path;
+---------+--------+--------+-------------------------+---------------+
| storage | fileid | parent | path                    | etag          |
+---------+--------+--------+-------------------------+---------------+
|       3 |     39 |     37 | files/folderB/one/two/x | 59562454a5133 |
+---------+--------+--------+-------------------------+---------------+

Then, using the dirname() of the path, we can find the actual parent entry to reattach to.
If there is no parent to repair then we're out of luck and need to leave the entry or even delete it... a topic for another time.

The above repair needs to be repeated until there are no more unrepairable entries with missing parent.

  • add second repair routine to repair filecache entry where "parent" points to a non-existing entry but a real parent entry based on path exists

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 30, 2017

Select all entries where the parent points to a non-existing entry but where an entry exist with the entry's path dirname:

select storage,fileid,parent,path,etag from oc_filecache fc where parent <> -1 and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent) and exists (select fc3.path from oc_filecache fc3 where fc3.storage = fc.storage and fc3.path = substring(fc.path, 1, length(fc.path) - length(substring_index(fc.path, '/',  -1)) - 1)) order by path;

+---------+--------+--------+-------------------------+---------------+
| storage | fileid | parent | path                    | etag          |
+---------+--------+--------+-------------------------+---------------+
|       3 |     39 |     37 | files/folderB/one/two/x | 59562454a5133 |
+---------+--------+--------+-------------------------+---------------+

@PVince81
Copy link
Contributor Author

if the above turns out to not work cross-DB, fall back to simply iterating over all found entries and keep a blacklist of file ids of non-repairable entries to exclude in subsequent queries.

@PVince81
Copy link
Contributor Author

alternative query that uses a join and provides the id of the parent to correct it to:

select fc.storage,fc.fileid,fc.parent as "wrongparent",fc3.fileid "correctparent",fc.path,fc.etag from oc_filecache fc, oc_filecache fc3 where fc.parent <> -1 and fc.parent != fc3.fileid and fc3.storage=fc.storage and fc3.path = substring(fc.path, 1, length(fc.path) - length(substring_index(fc.path, '/',  -1)) - 1) and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent) order by path;
+---------+--------+-------------+---------------+-------------------------+---------------+
| storage | fileid | wrongparent | correctparent | path                    | etag          |
+---------+--------+-------------+---------------+-------------------------+---------------+
|       3 |     39 |          37 |            27 | files/folderB/one/two/x | 59562454a5133 |
+---------+--------+-------------+---------------+-------------------------+---------------+
1 row in set (0.00 sec)

I'm not confident to remove the check whether the parent entry doesn't exist because removing it could return entries that are broken differently: parent id pointing to a different entry.

@PVince81
Copy link
Contributor Author

PVince81 commented Jun 30, 2017

  • should we update etags to trigger sync client redownload for repaired entries ? need to check if necessary => apparently not. The client already sees the correct target structure

@PVince81 PVince81 force-pushed the repair-filecache-mismatch-path branch from 602dc84 to bd952f8 Compare June 30, 2017 12:58
@PVince81
Copy link
Contributor Author

PVince81 commented Jun 30, 2017

The current approach already works now, which is great.
However I used some Mysql-specific functions like SUBSTRING_INDEX which aren't available in other DBs. I'll have to rewrite that query, possibly separate it in two queries.

But at least we know the general approach works.

  • make it work for all DBs

@PVince81 PVince81 merged commit f964147 into master Sep 20, 2017
@PVince81 PVince81 deleted the repair-filecache-mismatch-path branch September 20, 2017 12:19
@PVince81
Copy link
Contributor Author

@tomneedham stable10 backport please

@ahmedatawfik
Copy link

@PVince81 Could you please let me know how I can apply this fix on my production 9.0.10?

@PVince81
Copy link
Contributor Author

@socrat3000 you can't, the code is not compatible and would need porting and careful retesting.

Anything preventing you to upgrade to 10.0.3 ? This fix will be in 10.0.4

@ahmedatawfik
Copy link

@PVince81 Is there any way I could apply a workaround at least for now till we upgrade?
screenshot from 2017-09-20 15-34-22

@PVince81
Copy link
Contributor Author

@socrat3000 if you only have one affected folder, look at "oc_filecache" entry for this folder and check that all parents properly connect. The "parent" column should point at the "fileid" of the actual parent based on the "path" value. You need to manually adjust the "path" column values to match the hierarchy from parent<->fileid relationship.

@ahmedatawfik
Copy link

ahmedatawfik commented Sep 20, 2017 via email

@ahmedatawfik
Copy link

@PVince81 I have now upgraded to 10.0.3, is there any way I could get that fixed before 10.0.4 is realeased?

@PVince81
Copy link
Contributor Author

@socrat3000 just wait for @tomneedham to finish the stable10 backport/PR then you could apply it to your instance and run "occ files:scan --all --repair"

@tomneedham
Copy link
Contributor

@socrat3000 See #29074

@PVince81
Copy link
Contributor Author

PVince81 commented Oct 9, 2017

Added doc ticket for future reference to the new commands: owncloud-archive/documentation#3446

@jdelvecchio
Copy link

Hello there,

got the problem on Owncloud 9.1.5, the option occ files:scan --repair is not available.
How can I fix this ? I tried in Owncloud 9.1.6, still no option --repair either.

Should I wait for the next version ?

Thanks.

@PVince81
Copy link
Contributor Author

The backports for OC 9.1 and 9.0 are still work in progress and should hopefully be available in the next versions.

OC 10.0.4 will have the command too once released.

@PVince81
Copy link
Contributor Author

PVince81 commented Nov 7, 2017

@enc98
Copy link

enc98 commented Feb 12, 2018

Hi @PVince81,
When I run /occ files: scan --all --repair into my OC (9.1.4)

sudo -u apache path to/occ files:scan --all --repair

Answer this:
[Symfony\Component\Console\Exception\RuntimeException] The "--repair" option does not exist.

Where am I wrong?

Thanks

@PVince81
Copy link
Contributor Author

@enc98 try updating to the latest 9.1.* version first

@ZennoB
Copy link

ZennoB commented Feb 20, 2018

Seems like I have the same Problem:
180220_csync_failed_to_access

Unfortunately I am running my owncloud on a shared Webserver. I haven't tried before, but it seems like I cannot run occ commands. What else can I do to solve this problem?

I have updated to 10.0.6 after the problem occured, that did not help.

Thanks for help!

@PVince81
Copy link
Contributor Author

There is no other way, sorry. you need access to command line because this repair is an expensive operation that cannot be run from a regular button in the web UI, mostly because it would likely run into PHP timeouts, especially on shared hosters which have low timeout values.

See owncloud-archive/documentation#3831

@ZennoB
Copy link

ZennoB commented Feb 21, 2018

Ok. With version 10.0.6 installed, is it possible that the problem occurs again? Or did you manage to repair its cause?

@PVince81
Copy link
Contributor Author

The cause itself is definitely repaired since a while, see #28018

@ZennoB
Copy link

ZennoB commented Feb 26, 2018

Problem solved. Although no offical owncloud hosting partner, my provider offered a solution, using shell script. PHP timeout was no problem in this case.

@lock
Copy link

lock bot commented Jul 31, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jul 31, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.