API: Display a correct status when removing a downtime #8104

Merged: 1 commit, Jan 14, 2021


yhabteab
Member

@yhabteab yhabteab commented Jul 9, 2020

resolves #7408

@yhabteab yhabteab requested a review from Al2Klimov July 9, 2020 13:40
lib/icinga/apiactions.cpp (outdated, resolved)
Member

@Al2Klimov Al2Klimov left a comment

  • git reset --soft HEAD~$(git rev-list --count $(git merge-base HEAD master)..HEAD^)
  • git commit --amend --no-edit
  • git push -f origin bugfix/remove-downtime-returns-wrong-status-7408

@yhabteab yhabteab force-pushed the bugfix/remove-downtime-returns-wrong-status-7408 branch from f4a3f85 to 9ce579d Compare July 9, 2020 14:07
@yhabteab yhabteab requested a review from Al2Klimov July 9, 2020 14:07
Member

@Al2Klimov Al2Klimov left a comment

Write a before/after test protocol for both cases you handle.

@yhabteab
Member Author

yhabteab commented Jul 9, 2020

Test protocol:
The problem was that Icinga returned a success response from the API even though the downtime could not be removed. Although a log message states that the downtime cannot be removed, the API response in the console still reads "Successfully removed downtime ...".

First, create a ScheduledDowntime object like this:

object ScheduledDowntime "some-downtime" {
  host_name = "example"
  service_name = "ping4"

  author = "root"
  comment = "Some comment Here"

  fixed = false
  duration = 30m

  ranges = {
    "thursday" = "14:29-15:00"
  }
}

Now restart the Icinga 2 daemon and try to remove the downtime: curl -k -s -u root:icinga -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/actions/remove-downtime' -d '{ "downtime": "example!ping4!94c0de19-1dcd-429c-9613-9a9ecd537d3e", "pretty": true }'

Before:
Log message:

[2020-07-09 16:31:59 +0200] information/ApiListener: New client connection from [127.0.0.1]:38918 (no client certificate)
[2020-07-09 16:31:59 +0200] information/HttpServerConnection: Request: POST /v1/actions/remove-downtime (from [127.0.0.1]:38918), user: root, agent: curl/7.58.0).
[2020-07-09 16:31:59 +0200] warning/Downtime: Cannot remove downtime 'example!ping4!94c0de19-1dcd-429c-9613-9a9ecd537d3e'. It is owned by scheduled downtime object 'example!ping4!some-downtime'
[2020-07-09 16:31:59 +0200] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:38918)

API response:

"results": [
        {
            "code": 200.0,
            "status": "Successfully removed downtime 'example!ping4!94c0de19-1dcd-429c-9613-9a9ecd537d3e'."
        }
    ]

After:
Log message:

[2020-07-09 16:27:37 +0200] information/HttpServerConnection: Request: POST /v1/actions/remove-downtime (from [127.0.0.1]:38908), user: root, agent: curl/7.58.0).
[2020-07-09 16:27:37 +0200] warning/Downtime: Cannot remove downtime 'example!ping4!94c0de19-1dcd-429c-9613-9a9ecd537d3e'. It is owned by scheduled downtime object 'example!ping4!some-downtime'
[2020-07-09 16:27:37 +0200] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:38908)

API response:

"results": [
        {
            "code": 400.0,
            "status": "Cannot remove downtime 'example!ping4!94c0de19-1dcd-429c-9613-9a9ecd537d3e'. It is owned by scheduled downtime object 'example!ping4!some-downtime'"
        }
    ]

Or, if you try to remove all downtimes of a host or service whose downtime belongs to the ScheduledDowntime:

Before:

[2020-07-09 16:33:12 +0200] information/ApiListener: New client connection from [127.0.0.1]:38922 (no client certificate)
[2020-07-09 16:33:12 +0200] information/HttpServerConnection: Request: POST /v1/actions/remove-downtime (from [127.0.0.1]:38922), user: root, agent: curl/7.58.0).
[2020-07-09 16:33:12 +0200] warning/Downtime: Cannot remove downtime 'example!ping4!94c0de19-1dcd-429c-9613-9a9ecd537d3e'. It is owned by scheduled downtime object 'example!ping4!some-downtime'
[2020-07-09 16:33:12 +0200] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:38922)
"results": [
        {
            "code": 200.0,
            "status": "Successfully removed all downtimes for object 'example!ping4'."
        }
    ]

After:

[2020-07-09 16:38:17 +0200] information/ApiListener: New client connection from [127.0.0.1]:38946 (no client certificate)
[2020-07-09 16:38:17 +0200] information/HttpServerConnection: Request: POST /v1/actions/remove-downtime (from [127.0.0.1]:38946), user: root, agent: curl/7.58.0).
[2020-07-09 16:38:17 +0200] warning/Downtime: Cannot remove downtime 'example!ping4!94c0de19-1dcd-429c-9613-9a9ecd537d3e'. It is owned by scheduled downtime object 'example!ping4!some-downtime'
[2020-07-09 16:38:17 +0200] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:38946)
"results": [
        {
            "code": 400.0,
            "status": "Cannot remove downtime 'example!ping4!94c0de19-1dcd-429c-9613-9a9ecd537d3e'. It is owned by scheduled downtime object 'example!ping4!some-downtime'"
        }
    ]

@yhabteab yhabteab requested a review from Al2Klimov July 9, 2020 14:39
@Al2Klimov Al2Klimov added this to the 2.13.0 milestone Jul 9, 2020
@Al2Klimov Al2Klimov added area/api REST API bug Something isn't working labels Jul 9, 2020
@Al2Klimov Al2Klimov requested a review from julianbrost November 3, 2020 16:19
Contributor

@julianbrost julianbrost left a comment

Have you considered passing the error from Downtime::RemoveDowntime(...) instead of duplicating the check, for example by raising an exception? That function is used in a few places, though, and probably all of them would have to be updated. But this could also reveal other places where the caller should know about this failure.

I think it could also be possible that downtime->IsExpired() changes between the check here and in Downtime::RemoveDowntime(...), so in theory you could get different results in the log and the API response, but nobody would really care I guess (log says it can't be removed right before it expires, API response says it was removed as it just expired).
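
A minimal, standalone sketch of the exception-based approach suggested above. It is not the code from this PR: `DowntimeRemovalError`, `DowntimeModel`, `RemoveDowntimeModel`, `ActionResult` and `RemoveDowntimeAction` are hypothetical names, and plain standard-library types stand in for Icinga's own. The point is that the ownership check lives in one place, so the log message and the API response cannot disagree.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the error type discussed in this PR.
struct DowntimeRemovalError : std::runtime_error {
	using std::runtime_error::runtime_error;
};

// Hypothetical, heavily simplified model of a downtime.
struct DowntimeModel {
	std::string name;
	std::string configOwner; // empty if not owned by a ScheduledDowntime
};

// The ownership check exists only here and reports failure by throwing.
void RemoveDowntimeModel(const DowntimeModel& downtime)
{
	if (!downtime.configOwner.empty()) {
		throw DowntimeRemovalError("Cannot remove downtime '" + downtime.name
			+ "'. It is owned by scheduled downtime object '" + downtime.configOwner + "'");
	}
	// ... the actual removal would happen here ...
}

struct ActionResult {
	int code;
	std::string status;
};

// The caller does not repeat the check; it only translates the outcome.
ActionResult RemoveDowntimeAction(const DowntimeModel& downtime)
{
	try {
		RemoveDowntimeModel(downtime);
	} catch (const DowntimeRemovalError& error) {
		return {400, error.what()};
	}

	return {200, "Successfully removed downtime '" + downtime.name + "'."};
}
```

Because the caller never evaluates the condition itself, the window between "check" and "remove" described above does not exist in this shape.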

@yhabteab yhabteab force-pushed the bugfix/remove-downtime-returns-wrong-status-7408 branch 2 times, most recently from 8643cbc to 4167624 Compare November 15, 2020 10:49
Contributor

@julianbrost julianbrost left a comment

There are also two more uses of Downtime::RemoveDowntime in the code base, have you checked these?

  • Checkable::RemoveAllDowntimes
    void Checkable::RemoveAllDowntimes()
    {
        for (const Downtime::Ptr& downtime : GetDowntimes()) {
            Downtime::RemoveDowntime(downtime->GetName(), true, true);
        }
    }
  • Downtime::DowntimesExpireTimerHandler
    for (const Downtime::Ptr& downtime : downtimes) {
        /* Only remove downtimes which are activated after daemon start. */
        if (downtime->IsActive() && (downtime->IsExpired() || !downtime->HasValidConfigOwner()))
            RemoveDowntime(downtime->GetName(), false, true);
    }

lib/icinga/externalcommandprocessor.cpp (outdated, resolved)
lib/icinga/externalcommandprocessor.cpp (outdated, resolved)
lib/icinga/externalcommandprocessor.cpp (outdated, resolved)
lib/icinga/externalcommandprocessor.cpp (outdated, resolved)
lib/icinga/apiactions.cpp (resolved)
@yhabteab
Member Author

There are also two more uses of Downtime::RemoveDowntime in the code base, have you checked these?

  • Checkable::RemoveAllDowntimes
    void Checkable::RemoveAllDowntimes()
    {
        for (const Downtime::Ptr& downtime : GetDowntimes()) {
            Downtime::RemoveDowntime(downtime->GetName(), true, true);
        }
    }
  • Downtime::DowntimesExpireTimerHandler
    for (const Downtime::Ptr& downtime : downtimes) {
        /* Only remove downtimes which are activated after daemon start. */
        if (downtime->IsActive() && (downtime->IsExpired() || !downtime->HasValidConfigOwner()))
            RemoveDowntime(downtime->GetName(), false, true);
    }

Yeah, I've seen them before. But since we only throw the exception when the downtime has not expired and is owned by another object, these two callers should not be a problem: both set the expired parameter to true when they call Downtime::RemoveDowntime().
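
The rule described in this reply can be illustrated with a small standalone sketch. `MayRemove` is a hypothetical helper, not a function in the code base, and `std::invalid_argument` merely stands in for the exception type added by this PR. It shows why callers that pass expired = true never see the exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if removal may proceed, throws if it is refused.
// Removal is refused only when the downtime has an owner AND is not being expired.
bool MayRemove(const std::string& configOwner, bool expired)
{
	if (!configOwner.empty() && !expired)
		throw std::invalid_argument("downtime is owned by '" + configOwner + "'");

	return true;
}
```

Under this rule, only a caller that passes expired = false for an owned downtime has to handle the exception.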

@yhabteab yhabteab requested a review from julianbrost November 26, 2020 15:45
Contributor

@julianbrost julianbrost left a comment

Commit history obviously still needs cleanup and once you say you're done, I'll spin this up in my test setup for my final review.

In general, can you explain why you added the individual members of invalid_downtime_removal_error? I'd be perfectly fine with it only having the constructor it actually uses (i.e. the one with a String parameter). There's probably no reason to allow constructing an exception of this type without a message. Also, is there any particular reason for listing the copy constructor and operator=, or is it just to make them explicit?

lib/base/exception.cpp (outdated, resolved)
lib/icinga/apiactions.cpp (outdated, resolved)
lib/base/exception.cpp (resolved)
lib/icinga/externalcommandprocessor.cpp (outdated, resolved)
@yhabteab
Member Author

Also, is there any particular reason for listing the copy constructor and operator=, or just to make it explicit?

No, there is no special reason for this, but since this exception type is thrown with BOOST_THROW_EXCEPTION(), I wanted the copy constructor to be declared explicitly. If you do not like it, I can remove it right away!
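
For illustration, a standalone sketch of an exception type that declares only the message constructor. `RemovalErrorSketch` and `ThrowAndCatch` are hypothetical names, `std::string` stands in for Icinga's String, and the sketch uses a plain `throw` rather than BOOST_THROW_EXCEPTION(), so it says nothing about Boost-specific requirements. The static assertions show that the copy operations are still generated implicitly.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <type_traits>
#include <utility>

class RemovalErrorSketch : public std::exception {
public:
	// The only declared constructor: an error always carries a message.
	explicit RemovalErrorSketch(std::string message) : m_Message(std::move(message)) {}

	const char* what() const noexcept override { return m_Message.c_str(); }

private:
	std::string m_Message;
};

// No copy constructor or operator= is declared above, yet both exist implicitly.
static_assert(std::is_copy_constructible<RemovalErrorSketch>::value, "implicitly copyable");
static_assert(std::is_copy_assignable<RemovalErrorSketch>::value, "implicitly assignable");
// Declaring the message constructor suppresses the default constructor.
static_assert(!std::is_default_constructible<RemovalErrorSketch>::value, "requires a message");

// Throws the sketch type and returns the message seen by the handler.
std::string ThrowAndCatch(const std::string& message)
{
	try {
		throw RemovalErrorSketch(message);
	} catch (const std::exception& ex) {
		return ex.what();
	}
}
```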

@yhabteab yhabteab force-pushed the bugfix/remove-downtime-returns-wrong-status-7408 branch 2 times, most recently from 7e62648 to e630dd1 Compare November 30, 2020 11:39
@yhabteab yhabteab requested a review from julianbrost November 30, 2020 11:40
Contributor

@julianbrost julianbrost left a comment

See comments. Sorry for requesting changes yet again, I missed this during previous reviews.

lib/icinga/downtime.cpp (outdated, resolved)
lib/icinga/externalcommandprocessor.cpp (outdated, resolved)
@Al2Klimov Al2Klimov removed the needs feedback We'll only proceed once we hear from you again label Dec 1, 2020
@Al2Klimov
Member

Why discuss this with team Web? Have a look at these (onSuccess). They don't even care about errors.

@yhabteab yhabteab force-pushed the bugfix/remove-downtime-returns-wrong-status-7408 branch from ff11e34 to 3cae544 Compare December 1, 2020 16:25
@julianbrost
Contributor

Sure, you can also answer questions regarding the current behavior by trying it out or reading the source.

@yhabteab yhabteab force-pushed the bugfix/remove-downtime-returns-wrong-status-7408 branch from 3cae544 to 3ea1289 Compare December 1, 2020 16:30
@yhabteab yhabteab requested a review from Al2Klimov December 2, 2020 17:04
Member

@Al2Klimov Al2Klimov left a comment

  1. Rebase against master
  2. Reproduce this statement of @julianbrost:

Somehow it now returns an error code of 400 in the JSON wrapped in a HTTP response with code 500

@yhabteab
Member Author

yhabteab commented Dec 3, 2020

2. Reproduce this statement of @julianbrost:

< HTTP/1.1 500 Internal Server Error
< Server: Icinga/v2.12.0-rc1-54-g198177cc5
< Content-Type: application/json
< Content-Length: 248
< 
{
    "results": [
        {
            "code": 400.0,
            "status": "Cannot remove downtime 'ha-master!ping4!8912c953-19b5-4cd6-93d0-65fc98c5bf80'. It is owned by scheduled downtime object 'ha-master!ping4!run-downtime'"
        }
    ]
}
* Connection #0 to host localhost left intact

@Al2Klimov
Member

Good! Now have a look at the code and tell where this behavior comes from.

@yhabteab
Member Author

yhabteab commented Dec 3, 2020

As @julianbrost already mentioned above, the HTTP response status is 500 (Internal Server Error) unless at least one result has code 200. This behavior comes from here:

icinga2/lib/remote/actionshandler.cpp Lines 94 to 103 in 653c6f4

int statusCode = 500;

for (const Dictionary::Ptr& res : results) {
    if (res->Contains("code") && res->Get("code") == 200) {
        statusCode = 200;
        break;
    }
}

response.result(statusCode);

@Al2Klimov
Member

Any ideas how we could do this better?

@yhabteab
Member Author

yhabteab commented Dec 3, 2020

Isn't it obvious 🙄? You just have to add an else-if condition in the range-based for loop.

@Al2Klimov
Member

No. Make a separate PR for that code:

  1. Collect all unique status codes
  2. If there's only one: take it
  3. If there's only one non-2xx: take it
  4. Else: take 500

@yhabteab yhabteab force-pushed the bugfix/remove-downtime-returns-wrong-status-7408 branch from 198177c to 572af6a Compare December 3, 2020 15:12
@yhabteab yhabteab force-pushed the bugfix/remove-downtime-returns-wrong-status-7408 branch from 572af6a to dd02e3b Compare December 7, 2020 12:20
@yhabteab yhabteab requested a review from Al2Klimov December 7, 2020 12:20
@Al2Klimov
Member

Al2Klimov commented Dec 14, 2020

@Al2Klimov Al2Klimov removed their request for review December 14, 2020 14:30
@Al2Klimov Al2Klimov marked this pull request as draft December 14, 2020 14:30
@Al2Klimov Al2Klimov requested a review from julianbrost January 14, 2021 09:32
@Al2Klimov Al2Klimov marked this pull request as ready for review January 14, 2021 09:32
@julianbrost
Contributor

In fact this didn't change what Web 2 reports when trying to delete a downtime. But I guess in the end this error page is better than silently ignoring an error. I'll open an issue there.

@Al2Klimov Al2Klimov merged commit 70b438a into master Jan 14, 2021
@icinga-probot icinga-probot bot deleted the bugfix/remove-downtime-returns-wrong-status-7408 branch January 14, 2021 16:49
Labels
area/api REST API bug Something isn't working
Linked issue: API: remove-downtime returns wrong status code/message if it fails
3 participants