Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(commands): return error if check fails #224

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Conversation

aawsome
Copy link
Member

@aawsome aawsome commented May 6, 2024

closes #222

@aawsome aawsome changed the title check: return error if check fails fix(commands): return error if check fails May 6, 2024
Copy link

codecov bot commented May 6, 2024

Codecov Report

Attention: Patch coverage is 54.49735% with 86 lines in your changes missing coverage. Please review.

Project coverage is 43.4%. Comparing base (1e7b4a8) to head (5b98b3b).
Report is 82 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/commands/check.rs 54.9% 82 Missing ⚠️
crates/core/src/repository.rs 42.8% 4 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
crates/core/src/error.rs 14.2% <ø> (ø)
crates/core/src/repository.rs 34.7% <42.8%> (+0.2%) ⬆️
crates/core/src/commands/check.rs 55.8% <54.9%> (-7.8%) ⬇️

... and 21 files with indirect coverage changes

crates/core/src/commands/check.rs Outdated Show resolved Hide resolved
crates/core/src/error.rs Outdated Show resolved Hide resolved
crates/core/src/lib.rs Outdated Show resolved Hide resolved
@aawsome aawsome requested a review from simonsan May 28, 2024 15:26
@simonsan
Copy link
Contributor

I'll look through it tomorrow. 👍🏽

pub enum CheckErrorLevel {
/// A warning: Something is strange and should be corrected, but repository integrity is not affected.
Warn,
/// An erro: Something in the repository is not as it should be.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// An erro: Something in the repository is not as it should be.
/// An error: Something in the repository is not as it should be.

Comment on lines +50 to +55
#[derive(Debug)]
/// `CheckResults` is a list of errors encountered during the check.
pub struct CheckResults {
/// The list of errors with severity level.
pub errors: Vec<(CheckErrorLevel, CheckError)>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we could replace these with: https://crates.io/crates/lazy_errors

Didn't use it, but it sounded interesting on Reddit last week: https://www.reddit.com/r/rust/comments/1ddi1nb/i_published_my_first_crate_lazy_errors/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks still relevant to me, lazy_errors looks actually really good for collecting multiple errors like CheckResults does but looks more ergonomic to me at first look.

crates/core/src/commands/check.rs Show resolved Hide resolved
crates/core/src/commands/check.rs Show resolved Hide resolved
Comment on lines 124 to +125
// TODO: Make concurrency (20) customizable
check_cache_files(20, cache, raw_be, file_type, &p)?;
self.check_cache_files(20, cache, raw_be, file_type, &p)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, if that could be just a field in CheckOptions? or maybe another struct ParallelismOptions composed into CheckOptions that we can use in rustic_core.

Comment on lines 148 to +149
// TODO: Make concurrency (5) customizable
check_cache_files(5, cache, raw_be, FileType::Pack, &p)?;
self.check_cache_files(5, cache, raw_be, FileType::Pack, &p)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, maybe ParallelismOptions could be useful here as well?

Err(err) => error!("Error reading pack {id} : {err}",),
Err(err) => self.add_error(CheckError::ErrorReadingPack {
id,
err: Box::new(err),
Copy link
Contributor

@simonsan simonsan Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would call it source so it's more clear also because thiserror infers from it

}
self.errors
.lock()
.unwrap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would at least prefer an expect here

@simonsan simonsan added A-errors Area: error handling needs improvement C-enhancement Category: New feature or request A-commands Area: Related to commands in `rustic_core` labels Oct 5, 2024

#[non_exhaustive]
#[derive(Error, Debug, Display)]
pub enum CheckError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that also be convertable to RusticError?

crates/core/src/commands/check.rs Show resolved Hide resolved
Comment on lines +50 to +55
#[derive(Debug)]
/// `CheckResults` is a list of errors encountered during the check.
pub struct CheckResults {
/// The list of errors with severity level.
pub errors: Vec<(CheckErrorLevel, CheckError)>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks still relevant to me, lazy_errors looks actually really good for collecting multiple errors like CheckResults does but looks more ergonomic to me at first look.

@simonsan
Copy link
Contributor

simonsan commented Oct 5, 2024

After looking through it, I understand better now, what I found confusing about it, back in May. It's really because it introduces a special way to handle errors only for check (via self), although we want to have a unified way of handling errors. And apply that throughout rustic_core, which makes it easier to maintain, than having a special way to handle check errors.

The CheckError in errors.rs is really nice though, and this is probably the approach, we should take all over the code base. Then we might construct them in-place with all the information we need and create a well-formed error message towards the user from it.

For collecting the errors in a method/function we could use lazy_errors or change the signature to something like either:

  • Result<T, Vec<RusticError>>
  • struct ResultWithErrors<T> {
        value: Option<T>,
        soft_errors: Vec<RusticError>,
        hard_errors: Vec<RusticError>,
    }
  • or return a tuple of (T, Vec<RusticError>) from a function, in case we want to handle multiple errors
  • we could also go for making RusticError a trait and implement common behaviour on structs for each possible error struct

But this topic is definitely something we should talk about to find a good approach.

@aawsome
Copy link
Member Author

aawsome commented Oct 6, 2024

In principle I like the ides of lazy_errors - I however think that we want to collect not only errors, but also warnings - or, more general, a kind of ErrorLevel together with the error.
Also, we have two cases:

  • situations where we want to collect errors, but continue, i.e. we want to return a result and a error list
  • situation where we want to abort on errors, i.e. we want to return only the error list and indicate that things went wrong...

@simonsan
Copy link
Contributor

simonsan commented Oct 6, 2024

I sketched out an idea, to fix the pain points you mentioned in a more generic way, than introducing special error handling/collection for check only. Could you take a look at #318 ?

@aawsome
Copy link
Member Author

aawsome commented Oct 6, 2024

I think the main question here is:
Should we merge this PR with maybe just small modifications (quick&dirty) to fix the check issues and then overwork the error handling once we have a general error handling framework? Or should we wait for the framework?
I have a tendency towards a quick&dirty merge here, actually...

@simonsan
Copy link
Contributor

simonsan commented Oct 6, 2024

I think the main question here is: Should we merge this PR with maybe just small modifications (quick&dirty) to fix the check issues and then overwork the error handling once we have a general error handling framework? Or should we wait for the framework? I have a tendency towards a quick&dirty merge here, actually...

I would wait, I don't think (no offence really, we are doing a lot here) this should be merged as it is. We are introducing new errors here (CheckError), etc. We should just focus on this error handling topic, after the next rustic release and have it fixed long-term. This would just move it further away, while we feel we 'fixed' something, but we didn't really. We just moved a well-formed fix further away.

Also, the design implies, we implement checking on some ResultCollector, which I find counterintuitive due to the refactor in #317. That should probably still be freestanding functions, so they are decoupled and easier to test.

@simonsan
Copy link
Contributor

simonsan commented Oct 6, 2024

Let's rather do it right one time with a bit more effort now, than doing it more than once, just to be quick and dirty. 😅

@simonsan
Copy link
Contributor

simonsan commented Oct 7, 2024

  • situations where we want to collect errors, but continue, i.e. we want to return a result and a error list

For this situation, can you give an example, because this is worth to explore why it is like that. There could be many reasons for it, e.g. architectural problems, prolonged error handling, instead of doing it locally, etc.

When it comes to collecting errors in a thread, we can go for a channel and handle it in the main thread.

github-merge-queue bot pushed a commit that referenced this pull request Oct 9, 2024
…ne, repair, key, and restore (#317)

I saw in #224 that `CheckOptions::run()` has been moved to type
`CheckResultsCollector`. In other commands we actually have functions,
not associated to any type, that take their options as parameters. I
applied this here to check, prune, repair, key, and restore as well.
Because I think it reduces coupling and increases testability.

The idea behind having these standalone functions is, that check, prune,
repair, key, and restore are not run on their options (as a method of
e.g. `CheckOptions`), but rather take parameters, where one is the e.g.
`CheckOption` itself.

In principle: methods that implement commands don't operate on their own
options and have side effects on other types - options are passed into
functions as parameters.

Furthermore, a `check()` on `CheckOptions` sounds if it validates these
options itself, rather than being run on an external type. I think from
that POV it also makes sense to have such freestanding functions as
entry point to our commands in `rustic_core`.

---------

Signed-off-by: simonsan <14062932+simonsan@users.noreply.github.com>
@aawsome
Copy link
Member Author

aawsome commented Oct 11, 2024

  • situations where we want to collect errors, but continue, i.e. we want to return a result and a error list

For this situation, can you give an example, because this is worth to explore why it is like that. There could be many reasons for it, e.g. architectural problems, prolonged error handling, instead of doing it locally, etc.

An example is Repository::get_snapshots. This loads all Snapshotfiles from the repository and decrypts them into a `Vec'.

There are errors like not being able to list snapshots which will cause this command to fail. But each individual snapshot may fail either to load from backend or to decrypt (e.g. data corruption). Currently, this functions fails if any snapshot fails to load+decrypt. This means if we have a corrupt snapshot file, the repository can be basically no longer used until this is fixed.

We however may want to introduce resiliency: It is e.g. very nasty if the whole repository is not accessible just because that old and now unimportant snapshotfile has an issue. If we introduce this, then the command would return the successful loaded snapshots and a list of failed snapshots with the error(s) - and we would at least need to indicate that something went wrong by not sending a 0 return code in some cases...

@simonsan
Copy link
Contributor

simonsan commented Oct 14, 2024

I would imagine it like this somehow (draft):

2024-10-14 22_00_49-snapshotfile rs (Working Tree) (snapshotfile rs) - rustic (Workspace) - Rust Dev

So we have a Vec<RusticError> that we return from SnapshotFile::fill_missing:

2024-10-14 22_04_30-snapshotfile rs (Working Tree) (snapshotfile rs) - rustic (Workspace) - Rust Dev

, and we reserve the Err(E) of RusticResult for hard errors. But I think this is not nice. 😬

But in the end, it's the question of why we want to return a list of the non-working snapshots and not just log an error there. What value has the information for the caller, that there are broken snapshots it cannot be operated on? We just don't want it to fail, right? If we return that Vec<RusticError> to the caller, what else will it do, than just logging it/printing to the user?
If we would want to handle the broken snapshots as well, we might rather return something like:
RusticResult<Vec<RusticResult<SnapshotFile>>> with outer Result being the operation itself that is fallible and indicates something went horribly wrong. And the inner Results being the Snapshots itself, that the caller wants to handle the Ok(T) values of and do something with the errors.

@simonsan simonsan marked this pull request as draft October 24, 2024 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-commands Area: Related to commands in `rustic_core` A-errors Area: error handling needs improvement C-enhancement Category: New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve error handling in check
2 participants