Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize quaternions from msgs before Ogre use #1179

Merged
merged 12 commits into from
Jan 5, 2018

Conversation

rhaschke
Copy link
Contributor

This is a follow-up to #1167, which added tests for invalid quaternions that caused crashes in Ogre.
However, instead of rejecting those invalid quaternions, I propose to normalize them (or use identity if a zero quaternion is passed). Actually this was already the default behaviour of FrameManager::transform() which is used by most routines anyway.
However, in rare cases ROS quaternions were used directly to set an Ogre::Quaternion, which then indeed requires normalization and special handling of the zero quaternion.

This PR reverts most changes introduced in #1167 and thus fixes the regression not handling non-normalized quaternions gracefully: #1178, moveit/moveit#732.

@davetcoleman
Copy link
Member

davetcoleman commented Jan 2, 2018

I just ran into this issue today, spent ~4 hours trying to understand why some machines it was working until I upgraded them all. Thanks for this patch @rhaschke!

Copy link
Member

@davetcoleman davetcoleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would it be cleaner if you reverted #1167 then only had your proposed changes here

Ogre::Quaternion ori(pose.pose.orientation.w, pose.pose.orientation.x, pose.pose.orientation.y, pose.pose.orientation.z);
Ogre::Quaternion ori;
if (!normalizeQuaternion(pose.pose.orientation, ori))
ROS_WARN("invalid quaternion (zero length)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: capitalize Invalid here and in rest of this PR

{
return validateQuaternions( msg.orientation );
float norm2 = quaternionNorm2(w, x, y, z);
if (norm2 < 10e-3f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here why this threshold was chosen?

return validateQuaternion(msg.pose.orientation);
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one line break

@davetcoleman
Copy link
Member

I've tested this patch locally and it fixes the issue for me, thanks again

@tfoote
Copy link
Contributor

tfoote commented Jan 3, 2018

I'm not sure that I agree with the logic to just "normalize" the quaternion on input. Sending an unnormalized quaternion is like sending a NaN for a float value. Just pushing it to the nearest valid float real value is not representing the same thing.

When doing quaternion math floating point arithmetic errors can induce quaternions that are non-normalized and they do periodically need to be renormalized. Renormalizing at every step is a very high overhead, normalizing never will end up with NaN values. There is a balance in between to two extremes. This is typically done in a specific time in processing. Often I would expect it to be done shortly before being output from an algorithm not on the input. As the input will not have semantic knowledge of the previous potential cause of non-normalization. And I think it's much more correct for input validation instead of input normalization.

Related to this arbitrarily normalizing all inputs can cause completely invalid quaternions (such as unset) to become valid and be rendered. Normalization can only be considered valid if the original is close to normalized, or in a known state of non-normalization. (for example unitialized with all zeros) I can certainly take 4 numbers and turn them into a normalized quaternion, but if they still represent the same underlying information is not clear. Thus it would be better to reject the invalid data rather than pushing it into the right form.

Another example of this might be if the throttle commands for a vehicle are on the range 0 to 1. If I receive a command of 1.1 what does that mean? It's undefined as it's not in the interval, it could be someone trying to dial it up to 11. Or it could be someone sending 1.1 out of a range on 100. The meaning of a non-normalized quaternions is similarly undefined. I would strongly advocate for resolving this issue by making sure that all published quaternions are valid rather than accepting invalid quaternions by auto-renormalizing them.

@rhaschke
Copy link
Contributor Author

rhaschke commented Jan 3, 2018

@tfoote To some extend I can agree to your argumentation. However, it's a plain fact, that rviz did (and still does if allowed to) normalize quaternions incoming via ROS msgs and now it simply rejects them. This dramatically changes behaviour of rviz and - besides others - breaks the plain interactive marker tutorial (which could be thought of a best-practice example).
So, I'm fine to change the semantics in some future. But, please give us some time for transition and don't break a released tool.
Also, we should provide some tools to normalize quaternions in a ROS pose msg. Then it becomes feasible to normalize at the output generator (instead of the input in rviz).

@tfoote
Copy link
Contributor

tfoote commented Jan 3, 2018

We're looking at restoring the permissive behavior to restore existing behavior, but adding warnings. Anywhere that is known to be sending unnormalized quaternions should be ticketed for resolving at the source. Interactive markers tutorials or libraries should not be sending unnormalized quaternions. Please open tickets for anything that's having trouble right now while we're working to resolve this.

@wjwwood
Copy link
Member

wjwwood commented Jan 3, 2018

However, it's a plain fact, that rviz did (and still does if allowed to) normalize quaternions incoming via ROS msgs

Is that true? Looking at #1167, I don't see where code that did normalize incoming marker message quaternions was removed. Perhaps you mean in other places?

This dramatically changes behaviour of rviz and - besides others - breaks the plain interactive marker tutorial (which could be thought of a best-practice example).
So, I'm fine to change the semantics in some future. But, please give us some time for transition and don't break a released tool.

I can't speak for @tfoote, but we're on the same page here, the disruptive change in behavior was unexpected and we're going to revert it and do something about it. I think the idea of requiring valid quaternions would be a future change or at least would be opt-in if back ported.

I think it's worth pursuing to get valid quaternions in rviz (either fixed on the rviz side or on the publishing side), because there have been other issues which attribute crashes to invalid quaternions, but we haven't conclusively linked those two things yet, e.g.:

#1082 (comment)

@dhood
Copy link
Contributor

dhood commented Jan 3, 2018

I don't believe we ever normalised un-normalised quaternions before 1167, but @rhaschke pointed out that, previously, null quaternions would get set to identity in FrameManager::transform() while now they are rejected before they get to that point, which may be what they were referring to wrt "normalising" in the past

@gavanderhoorn
Copy link
Contributor

The unexpected breakage of RViz is not nice, but patching up invalid data just so it can be rendered makes no sense to me, so my vote would be for the warn/error-and-discard option.

That would be similar to TF and other incoming msgs that can't be transformed properly: they're discarded and a warning/error with sufficient visibility is output (ie: "Dropped N% of msgs so far, ..").

Normalising unconditionally will probably just mean that other nodes will crash/error out on the same data, which could be equally confusing ("but RViz does accept my quaternion?").

@rhaschke
Copy link
Contributor Author

rhaschke commented Jan 3, 2018

To clarify my point: rviz currently adapts invalid quaternions in at least two central places (since ages):

So, its not only rviz, which re-normalized quaternions, but that's already part of more basic tf library!

I fully understand the wish to handle normalization once at generation time, but current practice is fundamentally different. I guess, if you enable a warning in tf you will get hundreds of warnings. It's common practice to create quaternions like (1,0,0,1) to represent 90° rotations about some axes and then relying on normalization.

Normalization of quaternions usually is done by rviz::FrameManager::transform()
which transforms a ROS pose into an Ogre position + orientation.
Only in some rare cases, a ROS quaternion was directly used as an
Ogre::Quaternion, which then requires handling of null quaternions (as
they arise from uninitialized ROS pose msgs).
@rhaschke rhaschke force-pushed the normalize-quaternions branch from 1d30875 to 76e3a52 Compare January 3, 2018 10:17
@rhaschke
Copy link
Contributor Author

rhaschke commented Jan 3, 2018

I addressed @davetcoleman's comments.

@davetcoleman
Copy link
Member

I agree we should require all input to be normalized in the future but for now it should just produce warnings instead of breaking Rviz worldwide. I have found regressions in several different uses of Rviz, including MoveIt!.

@dhood
Copy link
Contributor

dhood commented Jan 3, 2018

@rhaschke thank you for clarifying that tf will normalise the quaternions inside FrameManager::transform. In light of this it makes sense to continue normalising in the missing places as you have proposed.

Since we still want warnings for unnormalised quaternions I'll remove the reverting of 1167 from this PR in favour of #1182, and this PR will just focus on normalising the quaternions from ROS msgs that are not otherwise normalised.

Additionally I'll invert the logic to consider null quaternions valid (users being lazy) and unnormalised ones invalid (potentially sign of an error)

dhood added 6 commits January 3, 2018 16:06
Don't mix logic for what is considered a valid quat into this function;
matches Ogre
Logic of what makes a quaternion valid isn't in normalizeQuaternion;
also gets the invalid quaternion value logged to console
@dhood
Copy link
Contributor

dhood commented Jan 4, 2018

@rhaschke thank you for kick starting this and for giving permission for this branch to be adapted by us.

As I mentioned, this PR now addresses the remaining places in the codebase @rhaschke identified where quaternions may be null/unnormalised. It addresses the issue that initially prompted rejecting invalid quaternions (#1137) and potentially others. It therefore removes the need for us to reject invalid quaternions (we now give warnings instead: #1182)

I've minimised the diff to simplify review, including removing printing of warnings for invalid quaternions that are covered by #1182. normalizeQuaternions still sets null quaternions to identity and normalises all others, but no longer returns a boolean; callers must use validateQuaternions to determine if a warning should be printed (and to get the invalid quaternion's magnitude logged to debug).

@rhaschke The previous state of this PR had map_display rejecting null quaternions which is no longer the case since I imagine it was unintentional, please correct me if that's not the case

@dhood
Copy link
Contributor

dhood commented Jan 4, 2018

@ros-pull-request-builder retest this please

@dhood
Copy link
Contributor

dhood commented Jan 4, 2018

@ros-pull-request-builder retest this please

(issues with the git refs)

@wjwwood
Copy link
Member

wjwwood commented Jan 4, 2018

Obviously need to get the CI passing or verify it locally, otherwise lgtm.

Copy link
Contributor Author

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
I my proposal, I tried to avoid computing the quaternion norm twice (once for validation and once for normalization). Hence the boolean return value of normalizeQuaternion().
The current proposal explicitly separates validation and normalization which is fine if the additional overhead is accepted. As I understood @tfoote in #1179 (comment), he wanted to reduce the computational overhead of normalization to a minimum ;-)

if ( 0.0f == x && 0.0f == y && 0.0f == z && 0.0f == w )
{
w = 1.0f;
x = y = z = 0.0f;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is superfluous. x,y,z are zero already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gah, thanks for pointing that out: df6dfd4

if ( 0.0 == x && 0.0 == y && 0.0 == z && 0.0 == w )
{
w = 1.0;
x = y = z = 0.0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous again.

template<typename T>
inline bool validateQuaternions(const T &vec)
inline bool validateQuaternions( const std::vector<T> &vec )
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I more like the old version as it is more compact and easier to grasp ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I know, but I'm trying to keep unnecessary changes out for this PR 😃

@rhaschke
Copy link
Contributor Author

rhaschke commented Jan 4, 2018

@rhaschke The previous state of this PR had map_display rejecting null quaternions which is no longer the case since I imagine it was unintentional, please correct me if that's not the case.

@dhood I think, converting null quaternions in a message to identity is what the user expects here too.

@dhood
Copy link
Contributor

dhood commented Jan 4, 2018

The current proposal explicitly separates validation and normalization which is fine if the additional overhead is accepted.

Yeah, for most of these msgs they're not being normalised elsewhere so it's roughly the same overhead as the validateQuaternions + tf normalisation in other places

The devel jobs are behaving now so we'll just wait for that to turn over. Appreciate your contributions/insight on this @rhaschke

@VictorLamoine
Copy link
Contributor

VictorLamoine commented Jan 4, 2018

I think we should reject quaternions that are not valid in the future.
I understand it will break a lot of code but this clearly needs to be fixed at the source; quaternions must be defined.

I'm in favour of:

  • Normalizing/initializing incoming quaternions AND printing warnings about invalid quaternions
  • Reporting and fixing of as much packages as possible during a defined time (6 months? 1 year?)
  • After this time, rejecting invalid quaternions (hoping that most packages are already fixed)

It will take some time to fix the code base but the effort is worth it.
What is the plan?

@VictorLamoine
Copy link
Contributor

Did someone test the overhead of this patch? eg: when displaying a large number of markers?

@dhood
Copy link
Contributor

dhood commented Jan 5, 2018

OK, looks like we're all in agreement.

Summarising the thread: invalid quaternions will be permitted for now to not break displays; publishers of invalid quaternions should be reported/updated; invalid quaternions will be rejected in the future.

What is the plan?

Pretty much as you said, @VictorLamoine. This is the timeline to get to the end goal of not normalising inputs:

  • Now: normalizing/initializing incoming quaternions AND printing warnings about invalid quaternions, with the 'once' log filter to ease into it.
  • After 6 months: more + stronger warnings to encourage the remaining offenders.
  • After 1 year: reject invalid quaternions with a warning.

Regarding the overhead, we can consider adding an option to displays to assume valid quaternions if there are use cases for which the overhead needs to be avoided.

@dhood dhood changed the title normalize invalid quaternions instead of rejecting them Normalize quaternions from msgs before Ogre use Jan 5, 2018
@dhood dhood merged commit 866d376 into ros-visualization:kinetic-devel Jan 5, 2018
@rhaschke
Copy link
Contributor Author

rhaschke commented Jan 5, 2018

@dhood @VictorLamoine In order to enforce valid quaternions on the generating side, you should provide tools to:

@wjwwood
Copy link
Member

wjwwood commented Jan 5, 2018

correctly initialize an identity quaternion in corresponding ROS msgs by default. Currently, the default is (0,0,0,0).

That's not possible because there are no default values for generated messages in ROS atm. In ROS 2 we have this (specifically for this case with quaternions), but I don't think there are plans to back port it, but with some community interest and help it might happen.

@rhaschke
Copy link
Contributor Author

rhaschke commented Jan 5, 2018

correctly initialize an identity quaternion in corresponding ROS msgs by default. Currently, the default is (0,0,0,0).

That's not possible because there are no default values for generated messages in ROS atm.

Actually, I was afraid of this. For me, that's a clear reason that generation-side normalization is currently not feasible in a comfortable fashion.

@wjwwood
Copy link
Member

wjwwood commented Jan 5, 2018

That's always been the case though. I don't see why requiring people to initialize their quaternions is a problem. Also for the special case of all zeros, we can continue to change that to identity for people, since that's not as expensive as always normalizing. That case, in my mind, is separate from giving unnormalized quaternions.

x /= norm2;
y /= norm2;
z /= norm2;
float invnorm = 1.0f / norm2;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of this? Multiplication and division have identical costs on the CPU, haven't they?

@rhaschke
Copy link
Contributor Author

rhaschke commented Jan 5, 2018

Also for the special case of all zeros, we can continue to change that to identity for people.

Agreed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants