Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(ring_outlier_filter): performance tuning #3014

Merged
merged 13 commits into from
May 9, 2023

Conversation

sykwer
Copy link
Member

@sykwer sykwer commented Mar 6, 2023

Description

This PR makes ring_outlier_filter faster without changing the logical output. The tail latency of the ring_outlier_filter node gets about x3 faster with the introduction of the TLSF allocator (see this Autoware Discussion) and this PR merged.

ring_outlier_filter_performance_improvement_page-0001

Measurement Condition

Related links

Tests performed

Check if ring_outlier_filter publishes the same logical output as before (using Autoware Universe rosbag simulation)

Notes for reviewers

Pre-review checklist for the PR author

The PR author must check the checkboxes below when creating the PR.

In-review checklist for the PR reviewers

The PR reviewers must check the checkboxes below before approval.

  • The PR follows the pull request guidelines.
  • The PR has been properly tested.
  • The PR has been reviewed by the code owners.

Post-review checklist for the PR author

The PR author must check the checkboxes below before merging.

  • There are no open discussions or they are tracked via tickets.
  • The PR is ready for merge.

After all checkboxes are checked, anyone who has write access can merge the PR.

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>
@github-actions github-actions bot added the component:sensing Data acquisition from sensors, drivers, preprocessing. (auto-assigned) label Mar 6, 2023
for (int i = walk_first_idx; i <= walk_last_idx; i++) {
auto output_ptr = reinterpret_cast<PointXYZI *>(&output.data[output_size]);
*output_ptr = *reinterpret_cast<const PointXYZI *>(&input->data[indices[i]]);
output_size += output.point_step;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part assume data layout is PointXYZI or at least that input memory layout starts the same than PointXYZI. In practice it will just copy the first 4 fields. Is there a reason to ignore the other fields from input (ring, azimut, etc?) I know today Autoware use the ring filter as the last filter that needs "extended" fields, but who knows if it will still be true in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, it is fine with no changes (cf.

float x{0.0F};
float y{0.0F};
float z{0.0F};
char padding1[4]{0U};
float intensity{0.0F};
)

But we have a bug and are dealing with it in #2618.

In the long-term perspective, we will make chagne according to this design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thank you for the link. I find the manual 4-bytes padding quite strange... but that's another topic.
Still, I think this way of using reinterpret_cast is UB. For example, since the the compiler is not told not to pad the 2 structs with annotations such as __attribute__((packed)), the compiler is "free" to pad the structs as it sees fit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a universal problem in Autoware and not related to this Pull Request, so please raise another Issue.

std::max(current_pt_distance, next_pt_distance) <
std::min(current_pt_distance, next_pt_distance) * distance_ratio_ &&
std::max(current_distance, next_distance) <
std::min(current_distance, next_distance) * distance_ratio_ &&
azimuth_diff < 100.f) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should azimuth_diff < 100.f be a parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This number has been hard-coded for a long period of time and should, of course, be parameterised, but this is not necessary.

for (size_t idx = 0U; idx < indices.size() - 1; ++idx) {
const size_t & current_data_idx = indices[idx];
const size_t & next_data_idx = indices[idx + 1];
walk_last_idx = idx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are skipping the last point with this logic.
For example, let's imagine a perfect circle ring with N points.

When we first enter the first loop we have :

idx = 0
walk_first_idx = 0
walk_last_idx = idx = 0
current_data_idx = indices[0]
next_data_idx= indices[1]

The ring is a perfect circle, so all the points are part of the same walk. So we always continue here:

      if (
        std::max(current_distance, next_distance) <
          std::min(current_distance, next_distance) * distance_ratio_ &&
        azimuth_diff < 100.f) {
        continue;  // Determined to be included in the same walk
      }

On the last loop we have:

idx = N-2
walk_first_idx = 0
walk_last_idx = idx = N-2
current_data_idx = indices[N-2]
next_data_idx= indices[N-1]

So after the loop, we have these lines:

  if (isCluster(
          input, indices[walk_first_idx], indices[walk_last_idx],
          walk_last_idx - walk_first_idx + 1)) {
      for (int i = walk_first_idx; i <= walk_last_idx; i++) {
        auto output_ptr = reinterpret_cast<PointXYZI *>(&output.data[output_size]);
        *output_ptr = *reinterpret_cast<const PointXYZI *>(&input->data[indices[i]]);
        output_size += output.point_step;
      }
    }
  }

which basically copy points from walk_first_idx = 0 to walk_last_idx = N-2, thus skipping the last point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as below

}
}
tmp_indices.clear();
}
Copy link
Contributor

@VRichardJP VRichardJP Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the line above is unrelated to my comment.

Another approximation I have found with this implementation is that the ring end is not "connected" to its start.

For example, let's imagine this ring:

ring : xxxx..xxxxxxxxxxx..xxxx.xx
index: abcdefghijklmnopqrstuvwxyz

Where chains of x represent points from the same "walk", and . the noisy points ment to be filtered. The index alphabet is just for the explanation. So in the situation here, there are 3 rings:

  1. from 'g' to 'q'
  2. from 't' to 'w'
  3. from 'y' to 'd' (wrapped around)

However in your implementation, we start at index 0 ('a') and the start/end of the ring are not considered connected. So you would find the rings:

  1. from 'a' to 'd'
  2. from 'g' to 'q'
  3. from 't' to 'w'

The algorithm would consider the walk 'y' -> 'z', but as it is too short it would ignore the points.

With the default parameters num_points_threshold_ = 4, up to 3 points could be missing because of that. When using bigger num_points_threshold_ value, the filter would sometimes produce some big blinspot in the data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this problem has been around for a while and has nothing to do with this Pull Request, can you please create a new Issue?

Copy link
Contributor

@VRichardJP VRichardJP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm annoying with all my comments ;-)

sykwer added 2 commits March 29, 2023 22:14
Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>
Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>
Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>
@github-actions github-actions bot added the type:documentation Creating or refining documentation. (auto-assigned) label Apr 3, 2023
@sykwer sykwer marked this pull request as ready for review April 3, 2023 18:22
@sykwer sykwer requested review from amc-nu, miursh, yukkysaito and a team as code owners April 3, 2023 18:22
pre-commit-ci bot and others added 2 commits April 3, 2023 18:24
Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>
@codecov
Copy link

codecov bot commented Apr 4, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.15 ⚠️

Comparison is base (54d45e5) 12.75% compared to head (378b023) 12.61%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3014      +/-   ##
==========================================
- Coverage   12.75%   12.61%   -0.15%     
==========================================
  Files        1218     1223       +5     
  Lines       85981    86953     +972     
  Branches    24469    24469              
==========================================
  Hits        10965    10965              
- Misses      63655    64627     +972     
  Partials    11361    11361              
Flag Coverage Δ *Carryforward flag
differential 0.00% <0.00%> (?)
total 12.75% <ø> (+<0.01%) ⬆️ Carriedforward from 36cdac6

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
...sor/outlier_filter/ring_outlier_filter_nodelet.hpp 0.00% <0.00%> (ø)
sensing/pointcloud_preprocessor/src/filter.cpp 0.00% <0.00%> (ø)
...src/outlier_filter/ring_outlier_filter_nodelet.cpp 0.00% <0.00%> (ø)

... and 12 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

auto y = p1[1] - p2[1];
auto z = p1[2] - p2[2];

return x * x + y * y + z * z >= object_length_threshold_ * object_length_threshold_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further optimization idea:
Another way to compute the distance between the 2 points is to use the law of cosines: dist2(p1,p2) = d1^2 + d2^2 - 2*d1*d2*cos(a). Where d1 and d2 are the distance attribute of p1 and p2, and a the azimut diff between the 2 points.

What is interesting with this approach is that a is always small. Indeed, since isCluster has early returns when walk has more than num_points_threshold_ (3 by default), it means the current walk has 3 points or less. But to be added to the walk, the azimut diff between 2 points must be less than 1 degree. So the a value is at most 3 degrees. Thus cos(a) ~= 1.

The distance calculation can be simplified to: dist2(p1,p2) = (d1 - d2)^2.

-> we don't need to fetch the xyz points anymore, just the 2 distance attributes are necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal changes the logical output by approximation. It is beyond the scope of this Pull Request to speed up the processing without changing the logical output. It would be appreciated if you could start a new issue for discussion and agreement.

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>
@sykwer sykwer enabled auto-merge (squash) April 17, 2023 11:08
This reverts commit 1463b7d.
@sykwer sykwer disabled auto-merge April 18, 2023 14:35
@sykwer sykwer enabled auto-merge (squash) April 18, 2023 15:33
@sykwer sykwer requested a review from Shin-kyoto April 28, 2023 21:33
@sykwer
Copy link
Member Author

sykwer commented Apr 28, 2023

@Shin-kyoto
I've confirmed by your tests that the number of points is unchanged before and after this PR.

@Shin-kyoto
Copy link
Contributor

@Shin-kyoto I've confirmed by your tests that the number of points is unchanged before and after this PR.

@sykwer Thanks! I have also confirmed that the number of points and the x,y,z value of each point is unchanged before and after this PR.

Copy link
Contributor

@Shin-kyoto Shin-kyoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have also confirmed that the number of points and the x,y,z value of each point is unchanged before and after this PR.
I checked whether the number of point clouds per topic and the values (x, y, z) of each point in the point clouds remained unchanged before and after the pull request. I used sample rosbag used in tutorial of rosbag replay simulation
The results are as follows:

  • Number of point clouds per topic: No difference before and after the pull request.
  • Values (x, y, z) of each point in the point clouds within the topic: No difference before and after the pull request for each of x, y, and z.

LGTM

@sykwer sykwer merged commit 81e03ae into autowarefoundation:main May 9, 2023
asa-naki added a commit to T4-FY23-AW-Training-Team1/autoware.universe that referenced this pull request May 19, 2023
@miursh
Copy link
Contributor

miursh commented May 27, 2023

I just found a small bug in output pointcloud filed.
The output says the offset of intensity is 16 in the pointfiled even the point_step is 16, which causes the next point's x value is read out as the previous point's intensity value.

height: 1
fields:
- name: x
  offset: 0
  datatype: 7
  count: 1
- name: y
  offset: 4
  datatype: 7
  count: 1
- name: z
  offset: 8
  datatype: 7
  count: 1
- name: intensity
  offset: 16
  datatype: 7
  count: 1
is_bigendian: false
point_step: 16

mojomex added a commit to mojomex/autoware.universe that referenced this pull request Jul 6, 2023
The ring indices were not read correctly from point buffers.
The ring/azimuth/etc. offsets were set to indices of the respective
fields in the input buffer's fields vector.
The missing lookup of the offset in the respective index's field
definition was added.

As discussed in autowarefoundation#3014 the law of cosines approach for calculating
point distances in isCluster ields logically different results
and thus needs more disussion before being merged. The isCluster
implementation in this commit has been reverted to the previous
euclidean distance check.

LiDAR point representation in the ad-hoc WalkInfo struct has been
replaced by a more readable ad-hoc PointXYZAD struct representation.
Once the input data has an enforced, consistent format, existing
point types can be used instead.

Signed-off-by: Maximilian Schmeller <maximilian.schmeller@tier4.jp>
mojomex added a commit to mojomex/autoware.universe that referenced this pull request Jul 7, 2023
The ring indices were not read correctly from point buffers.
The ring/azimuth/etc. offsets were set to indices of the respective
fields in the input buffer's fields vector.
The missing lookup of the offset in the respective index's field
definition was added.

As discussed in autowarefoundation#3014 the law of cosines approach for calculating
point distances in isCluster ields logically different results
and thus needs more disussion before being merged. The isCluster
implementation in this commit has been reverted to the previous
euclidean distance check.

LiDAR point representation in the ad-hoc WalkInfo struct has been
replaced by a more readable ad-hoc PointXYZAD struct representation.
Once the input data has an enforced, consistent format, existing
point types can be used instead.

Signed-off-by: Maximilian Schmeller <maximilian.schmeller@tier4.jp>
mojomex added a commit to mojomex/autoware.universe that referenced this pull request Jul 10, 2023
The ring indices were not read correctly from point buffers.
The ring/azimuth/etc. offsets were set to indices of the respective
fields in the input buffer's fields vector.
The missing lookup of the offset in the respective index's field
definition was added.

As discussed in autowarefoundation#3014 the law of cosines approach for calculating
point distances in isCluster ields logically different results
and thus needs more disussion before being merged. The isCluster
implementation in this commit has been reverted to the previous
euclidean distance check.

LiDAR point representation in the ad-hoc WalkInfo struct has been
replaced by a more readable ad-hoc PointXYZAD struct representation.
Once the input data has an enforced, consistent format, existing
point types can be used instead.

Signed-off-by: Maximilian Schmeller <maximilian.schmeller@tier4.jp>
1222-takeshi added a commit to 1222-takeshi/autoware.universe that referenced this pull request Jul 31, 2023
1222-takeshi added a commit to 1222-takeshi/autoware.universe that referenced this pull request Jul 31, 2023
1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 10, 2023
1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 14, 2023
1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 15, 2023
1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 16, 2023
1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 24, 2023
1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Sep 1, 2023
1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Sep 6, 2023
1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:sensing Data acquisition from sensors, drivers, preprocessing. (auto-assigned) type:documentation Creating or refining documentation. (auto-assigned)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants