Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SkelParser::readSkeleton assertion failure #412

Closed
jturner65 opened this issue Jun 17, 2015 · 37 comments
Closed

SkelParser::readSkeleton assertion failure #412

jturner65 opened this issue Jun 17, 2015 · 37 comments

Comments

@jturner65
Copy link

At line 544 :
it = joints.find(order.begin()->second);

after the first joint is processed this line fails due to order being empty.

The skeleton file i am using is based on (and nearly identical to, except for also having markers) fullbody1.skel, and the joint assigned to index just above is ("ground",0).

Edit : note this skeleton file has multiple skeletons (ground and biped), and the first one only has 1 joint defined. the code in this section looks like it may be predicated on the assumption that more than 1 joint will always exist in any skeleton.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

It doesn't assume the .skel file has multiple joints, but it is, however, exhibiting undefined behavior, because it calls ->second on order.begin(), regardless of whether there are any entries in order (this will always happen, no matter how many Joints are in the Skeleton... although it won't happen with a zero-Joint Skeleton, I suppose). Somehow this hasn't been triggering any issues on any of the platforms that we use for testing, but it's definitely bad bad bad.

I'm about to submit a pull request that checks whether order.begin() is equal to order.end() before using it.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

Hopefully the assertion failure is fixed in pull request #415. If you get the chance, try it out and let me know if it's helped. If the problem was the undefined behavior that I noticed, then I won't have a way of checking for myself if the issue is resolved.

@jturner65
Copy link
Author

this seems to address the issue, although i can't be sure (getting many other issues in skelparser).

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

Are the other issues related to the Eigen alignment problems, or are they specific to the parser?

@jturner65
Copy link
Author

there are many alignment things that i think i have addressed, but i also had to add a copy constructor for MultiDofJoint::UniqueProperties, since one is called in the init list for a free joint and i am currently attempting to address the upcast of properties to uniqueproperties in dynamics::detail::multidofjoint.h - an assertion is failing in MultiDofJoint::setDampingCoefficient since d (the coefficient value) doesn't appear to be initialized, and so is some negative garbage value.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

i also had to add a copy constructor for MultiDofJoint::UniqueProperties

I find this surprising, since all the contents of MultiDofJoint::UniqueProperties should be entirely copy-safe. Could this be an alignment issue? What if you added EIGEN_MAKE_ALIGNED_OPERATOR_NEW to UniqueProperties instead of creating a user-defined copy constructor?

@jturner65
Copy link
Author

nope. i get a bad_alloc on the freejoint multi-dof constructor.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

Has EIGEN_MAKE_ALIGNED_OPERATOR_NEW been added to every struct in the inheritance hierarchy of MultiDofJoint::Properties? I'm really just spitballing here, but I don't know what else would cause the behavior you're experiencing, since everything in those structs are just basic types and Eigen types.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

We would want EIGEN_MAKE_ALIGNED_OPERATOR_NEW to be added to FreeJoint::Properties, MultiDofJoint::UniqueProperties, MultiDofJoint::Properties, and Joint::Properties. If that doesn't take care of the issue, then the cause is definitely something else.

@jslee02
Copy link
Member

jslee02 commented Jun 17, 2015

Is this still happening in eigen_aligned_allocator branch?

@jturner65
Copy link
Author

I've added those directives in all those places. (need to have one in Inertia.h, btw, too). The data in the properties structure is definitely garbage - i'm stepping through it, and the assertion in setDampingCoefficient is firing because d is ~ -9e-308.

stepthrough :

  1. line 419 skelParser.cpp
    pair = createJointAndNodePair<dynamics::BodyNode>(skeleton, parent, joint, static_cast<const dynamics::BodyNode::Properties&>(*body.properties));

2)line 398 skelParser.cpp
return skeleton->createJointAndBodyNodePair<dynamics::FreeJoint, BodyType>(parent, static_cast<const dynamics::FreeJoint::Properties&>(*joint.properties), body);
3)line 81 skeleton.h
JointType* joint = new JointType(_jointProperties);
4)line 141 freejoint.cpp
setProperties(_properties);
5) line 123 detail::multidofjoint.h (my line numbers are larger here because of cpy ctor - this is in function MultiDofJoint::setProperties(const Properties& _properties))
setProperties(static_cast<const UniqueProperties&>(_properties));
6) line 143 detail::multidofjoint.h (in MultiDofJoint::setProperties(const UniqueProperties& _properties))
setDampingCoefficient(i, _properties.mDampingCoefficients[i]);
7) line 1094 detail::multidofjoint.h (in MultiDofJoint::setDampingCoefficient(size_t _index, double _d))
assert(_d >= 0.0);

until i can fix this i won't be able to tell if all the alignment issues are addressed. (i will temporarily remove the assertion)

@jturner65
Copy link
Author

this should probably move to another issue.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

Yeah, that's clearly an uninitialized garbage number.

Did you say that creating a user-defined copy constructor helped with this damping coefficient issue?

We could migrate this to another issue I suppose.

@jturner65
Copy link
Author

i'm sorry, i had something of a run on there and was unclear.
the user-defined uniqueproperties copy ctor was required to address the bad_alloc i was seeing in the init list of the multidof joint ctor, called from freejoint ctor.

the garbage data causing the assertion is later in the execution path - in the body of the freejoint ctor.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

Wow, so the default copy constructor was resulting in a bad_alloc exception? That's even more insane than what I was interpreting.

Are you able to monitor the memory on your system as it runs? Could there be some huge memory leak going on? I don't know what causes bad_alloc except for running out of memory.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

According to this thread, bad_alloc can occur from a race condition. You mentioned that you were running multithreaded applications, right? Could that have anything to do with what's going on?

@jturner65
Copy link
Author

there are no threads instantiated yet, this is from main. my multithreaded code is in a particular section much further down the execution path.

could this be the issue :
http://stackoverflow.com/questions/12577907/default-copy-constructor
"In C++11 you will not get a copy constructor if you have declared a move constructor or move assignment operator. Furthermore if you have declared a copy assignment operator or a destructor, the implicit generation of the copy constructor is deprecated. 12.8 [class.copy]:"

@jturner65
Copy link
Author

as i recall that struct has a destructor - could the "deprecated" == "removed in some compilers, not in others"?

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

At worst, the deprecation means it is a discouraged feature which will be removed in a later standard (probably C++17). That is a good thing to know about, though; I wasn't aware of that change.

The reason for the destructors being defined is to allow inheritance to work properly by making their destructors virtual. We don't do anything special in the destructors, so we're not violating The Rule of 3 / The Rule of 5. In the future we can declare this for each struct:

struct Properties
{
  Properties(const Properties& _other) = default;
  Properties& operator=(const Properties& _other) = default;
  // ... etc ...
};

But for right now, I'm fairly confident that this is not the issue.

When you created the copy constructor for UniqueProperties, how did you define it? Did you just do a memberwise setMember(_other.member)?

@jturner65
Copy link
Author

yep, pretty much.

in multidofjoint.h, the copy ctor declaration :
UniqueProperties(const UniqueProperties& _o);
in detail::multidofjoint.h :
template <size_t DOF>//copy ctor

MultiDofJoint<DOF>::UniqueProperties::UniqueProperties(const UniqueProperties& _o)
    : mPositionLowerLimits(_o.mPositionLowerLimits),
    mPositionUpperLimits(_o.mPositionUpperLimits),
    mVelocityLowerLimits(_o.mVelocityLowerLimits),
    mVelocityUpperLimits(_o.mVelocityUpperLimits),
    mAccelerationLowerLimits(_o.mAccelerationLowerLimits),
    mAccelerationUpperLimits(_o.mAccelerationUpperLimits),
    mForceLowerLimits(_o.mForceLowerLimits),
    mForceUpperLimits(_o.mForceUpperLimits),
    mSpringStiffnesses(_o.mSpringStiffnesses),
    mRestPositions(_o.mRestPositions),
    mDampingCoefficients(_o.mDampingCoefficients),
    mFrictions(_o.mFrictions) {
    for (size_t i = 0; i < DOF; ++i)
        mPreserveDofNames[i] = _o.mPreserveDofNames[i];
}

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

This is really bizarre, because the implicit copy constructor should be doing this same exact thing. I'm starting to wonder if the bad_alloc is due to some insane compiler error in Visual Studio.

@jturner65
Copy link
Author

i do recognize that i don't copy mDofNames, nor do i initialize either array's sizes. could this be an issue?

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

Ah, good point: Your constructor should actually be resulting in undefined behavior, because mPreserveDofNames was not given a size before you start to set its entries.

You should actually be able to include mDofNames and mPreserveDofNames in the initialization list, because std::vector has a perfectly reasonable copy constructor.

The fact that this copy constructor is circumventing a problem that was created by the implicit copy constructor is completely mind-boggling to me.

@jturner65
Copy link
Author

I actually copied the regular constructor - it too lacks both arrays getting initialized (so default initializer) and the mDofNames is not given any initial values.

i changed the constructor to this :

MultiDofJoint<DOF>::UniqueProperties::UniqueProperties(const UniqueProperties& _o)
    : mPositionLowerLimits(_o.mPositionLowerLimits),
    mPositionUpperLimits(_o.mPositionUpperLimits),
    mVelocityLowerLimits(_o.mVelocityLowerLimits),
    mVelocityUpperLimits(_o.mVelocityUpperLimits),
    mAccelerationLowerLimits(_o.mAccelerationLowerLimits),
    mAccelerationUpperLimits(_o.mAccelerationUpperLimits),
    mForceLowerLimits(_o.mForceLowerLimits),
    mForceUpperLimits(_o.mForceUpperLimits),
    mSpringStiffnesses(_o.mSpringStiffnesses),
    mRestPositions(_o.mRestPositions),
    mDampingCoefficients(_o.mDampingCoefficients),
    mFrictions(_o.mFrictions), 
    mPreserveDofNames(_o.mPreserveDofNames),
    mDofNames(_o.mDofNames)
{
    // Do Nothing
}

and i now get a bad_alloc again heh.

@jturner65
Copy link
Author

oh, btw those two are actually std::array, not vectors. so that's why the initialization wasn't failing before.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

those two are actually std::array, not vectors.

Ding ding ding. I forgot that little tidbit, and I think it might explain everything. I vaguely remember reading that VS2013 does not fully support std::array yet. I'll look around to see if that's still true.

In the meantime, creating a user-defined copy constructor and setting each of the components of the arrays in a for-loop.

@jturner65
Copy link
Author

getting a bad_alloc in the copy ctor now - apparently the mdofnames array is never initialized, despite that i added an initializer in the ctor. the values in _o.mDofNames are :
{_Elems=0x006db7f8 {"+Ü", <Error reading characters of string.>, <Error reading characters of string.>, ...} }

@jturner65
Copy link
Author

this looks like it also is the result of an upcast from a "properties" (weirdly enough a dart::planarjoint::properties, even though this is being called from the freejoint code in skelparser.cpp)

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

So you're saying that the std::strings in mDofNames are not being initialized to empty strings by the default constructor for MultiDofJoint::UniqueProperties? I didn't realize VS2013 support for arrays was that bad. In the default constructor for MultiDofJoint::UniqueProperties, you could try changing

  for (size_t i = 0; i < DOF; ++i)
    mPreserveDofNames[i] = false;

to

  for (size_t i = 0; i < DOF; ++i)
  {
    mPreserveDofNames[i] = false;
    mDofNames[i] = "";
  }

That should ensure that they get initialized. Then you'll want to make sure that the user-defined copy constructor has something similar.

this looks like it also is the result of an upcast from a "properties" (weirdly enough a dart::planarjoint::properties, even though this is being called from the freejoint code in skelparser.cpp)

I'm not really clear on what you mean here. If a PlanarJoint::Properties is being handled somewhere that a FreeJoint::Properties is expected, then that sounds like a bug to me. PlanarJoint and FreeJoint have a different number of DOFs, so their MultiDofJoint base classes are incompatible.

@jturner65
Copy link
Author

apologies, the planar joint thing was a bug from me redoing the make_shared to the alloc_shared. copypasta error.

on that note, in skelparser.cpp, line 527 i replaced the make_shared to allocate_shared as so :
was :

      //rootJoint.properties = std::make_shared<dynamics::FreeJoint::Properties>(
      //      dynamics::Joint::Properties("root", rootNode->second.initTransform));

to

      auto alloc = Eigen::aligned_allocator < dynamics::Joint::Properties >();
      rootJoint.properties = std::allocate_shared<dynamics::FreeJoint::Properties>(alloc, dynamics::Joint::Properties("root", rootNode->second.initTransform));

and my question is should the allocator be a Joint::Properties or a FreeJoint::Properties ?

thanks, and sorry again, that was weird.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

I would think you want Eigen::aligned_allocator<FreeJoint::Properties>(). I imagine more detail is better in this case.

Alternatively, I think you could merge in #414 and use Eigen::make_aligned_shared. It looks to me like that should work directly.

@jturner65
Copy link
Author

ah, and that's it. by replacing all the make_shared calls, putting the alignment macros in, putting in the initialization and the copy ctor, and putting the allocators in for the stl containers, it fixed everything (skel reads in). obviously the bug i introduced (planar instead of free) was the source of the weird properties nonsense we've been discussing the last few posts. thanks for helping.

@mxgrey
Copy link
Member

mxgrey commented Jun 17, 2015

Phew. Thanks for doing all the testing on this. I should really try to set up my Windows machine to run DART, but I can never decipher the linking issues I run into with Visual Studio.

So in summary: VS2013 has a half-baked implementation of std::array, and we cannot rely on all compilers to support C++11 alignment.

@jturner65
Copy link
Author

my pleasure, and yeah, that gives a pretty good rough summary heh.

so, to recap, what i did (other than the check for order being empty) :

  1. needed EIGEN_MAKE_ALIGNED_OPERATOR_NEW in many locations, listed above in post Robot feet seem to stick to the floor #10 (along with Inertia.h) - i also put this in structures that had structures containing Eigen vectorizable structures, and so on.
  2. replace all make_shared with allocate_shared for any structures that contain Eigen vectorizable structs, or structs holding such structs.
  3. allocators for all stl containers that hold structures that have Eigen vectorizable constructs
  4. needed a copy ctor for multidofjoint::UniqueProperties

and that's about it. if i think of anything else that I had to do i'll edit this post, but i'm pretty sure that's it.

@mxgrey
Copy link
Member

mxgrey commented Jun 18, 2015

I think the branch of Pull Request #414 should now work for you. Whenever you get a chance, try it out and let me know if it's working.

@jturner65
Copy link
Author

thanks, i should be able to get to it later today, and i'll update you with what i find.

@jturner65
Copy link
Author

sorry took so long to get back to you. other than the GL_MULTISAMPLE support, your fix + #414 appear to have fixed the problems discussed in this issue as well as the other skelparser issue, so i will close them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants