perf(ring_outlier_filter): performance tuning #3014

sykwer · 2023-03-06T18:16:18Z

Description

This PR makes ring_outlier_filter faster without changing its logical output. With the TLSF allocator introduced (see this Autoware Discussion) and this PR merged, the tail latency of the ring_outlier_filter node becomes about 3x faster.

Measurement Condition

Ubuntu22.04 + ROS2 Humble + Autoware Universe rosbag simulation
Core Isolated
Core Frequency Fixed (2.6GHz)
L3 Cache: 12MB

Tests performed

Check if ring_outlier_filter publishes the same logical output as before (using Autoware Universe rosbag simulation)

Notes for reviewers

Pre-review checklist for the PR author

The PR author must check the checkboxes below when creating the PR.

I've confirmed the contribution guidelines.
The PR follows the pull request guidelines.

In-review checklist for the PR reviewers

The PR reviewers must check the checkboxes below before approval.

The PR follows the pull request guidelines.
The PR has been properly tested.
The PR has been reviewed by the code owners.

Post-review checklist for the PR author

The PR author must check the checkboxes below before merging.

There are no open discussions or they are tracked via tickets.
The PR is ready for merge.

After all checkboxes are checked, anyone who has write access can merge the PR.

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>

VRichardJP · 2023-03-07T23:50:39Z

sensing/pointcloud_preprocessor/src/outlier_filter/ring_outlier_filter_nodelet.cpp

+        for (int i = walk_first_idx; i <= walk_last_idx; i++) {
+          auto output_ptr = reinterpret_cast<PointXYZI *>(&output.data[output_size]);
+          *output_ptr = *reinterpret_cast<const PointXYZI *>(&input->data[indices[i]]);
+          output_size += output.point_step;


This part assumes the data layout is PointXYZI, or at least that the input memory layout starts the same way as PointXYZI. In practice it will just copy the first 4 fields. Is there a reason to ignore the other fields from the input (ring, azimuth, etc.)? I know that today Autoware uses the ring filter as the last filter that needs "extended" fields, but who knows whether that will still be true in the future?

At the moment, it is fine with no changes (cf. autoware.universe/common/autoware_point_types/include/autoware_point_types/types.hpp, lines 59 to 63 in c7d84c6):

    float x{0.0F};
    float y{0.0F};
    float z{0.0F};
    char padding1[4]{0U};
    float intensity{0.0F};

However, we have a bug and are dealing with it in #2618.

In the long term, we will make changes according to this design.

I see. Thank you for the link. I find the manual 4-byte padding quite strange... but that's another topic.
Still, I think this way of using reinterpret_cast is UB. For example, since the compiler is not told to avoid padding the 2 structs (with annotations such as __attribute__((packed))), it is "free" to pad the structs as it sees fit.

It is a universal problem in Autoware and not related to this Pull Request, so please raise another Issue.
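As a hedged illustration of the concern raised in this thread, the sketch below copies whole point records with std::memcpy and point_step instead of assigning through a reinterpret_cast to a fixed struct. It is not the code of this PR: copy_points, read_float, and write_float are hypothetical helper names, and plain byte vectors stand in for the data arrays of the input and output clouds.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies whole point records (every field, not only
// x/y/z/intensity). `byte_offsets` holds the byte offset of each selected
// point inside `input`.
std::vector<std::uint8_t> copy_points(
  const std::vector<std::uint8_t> & input, std::size_t point_step,
  const std::vector<std::size_t> & byte_offsets)
{
  if (point_step == 0) {
    return {};
  }
  std::vector<std::uint8_t> output(byte_offsets.size() * point_step);
  std::size_t output_size = 0;
  for (const std::size_t offset : byte_offsets) {
    if (offset + point_step > input.size()) {
      continue;  // skip offsets that would read past the end of the input
    }
    std::memcpy(output.data() + output_size, input.data() + offset, point_step);
    output_size += point_step;
  }
  output.resize(output_size);
  return output;
}

// Hypothetical helpers: read or write one float at a byte offset without
// casting the buffer pointer to a struct or float pointer.
float read_float(const std::vector<std::uint8_t> & data, std::size_t offset)
{
  float value = 0.0F;
  std::memcpy(&value, data.data() + offset, sizeof(value));
  return value;
}

void write_float(std::vector<std::uint8_t> & data, std::size_t offset, float value)
{
  std::memcpy(data.data() + offset, &value, sizeof(value));
}
```

Copying point_step bytes keeps any extended fields (ring, azimuth, and so on) and does not depend on how the compiler lays out a particular struct.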

VRichardJP · 2023-03-07T23:55:57Z

sensing/pointcloud_preprocessor/src/outlier_filter/ring_outlier_filter_nodelet.cpp

-        std::max(current_pt_distance, next_pt_distance) <
-          std::min(current_pt_distance, next_pt_distance) * distance_ratio_ &&
+        std::max(current_distance, next_distance) <
+          std::min(current_distance, next_distance) * distance_ratio_ &&
        azimuth_diff < 100.f) {


should azimuth_diff < 100.f be a parameter?

This number has been hard-coded for a long period of time and should, of course, be parameterised, but this is not necessary.

VRichardJP · 2023-03-08T00:25:00Z

sensing/pointcloud_preprocessor/src/outlier_filter/ring_outlier_filter_nodelet.cpp

+    for (size_t idx = 0U; idx < indices.size() - 1; ++idx) {
+      const size_t & current_data_idx = indices[idx];
+      const size_t & next_data_idx = indices[idx + 1];
+      walk_last_idx = idx;


I think we are skipping the last point with this logic.
For example, let's imagine a perfect circle ring with N points.

When we first enter the loop, we have:

    idx = 0
    walk_first_idx = 0
    walk_last_idx = idx = 0
    current_data_idx = indices[0]
    next_data_idx = indices[1]

The ring is a perfect circle, so all the points are part of the same walk. So we always continue here:

    if (
      std::max(current_distance, next_distance) <
        std::min(current_distance, next_distance) * distance_ratio_ &&
      azimuth_diff < 100.f) {
      continue;  // Determined to be included in the same walk
    }

On the last iteration, we have:

    idx = N-2
    walk_first_idx = 0
    walk_last_idx = idx = N-2
    current_data_idx = indices[N-2]
    next_data_idx = indices[N-1]

So after the loop, we have these lines:

    if (isCluster(
          input, indices[walk_first_idx], indices[walk_last_idx],
          walk_last_idx - walk_first_idx + 1)) {
      for (int i = walk_first_idx; i <= walk_last_idx; i++) {
        auto output_ptr = reinterpret_cast<PointXYZI *>(&output.data[output_size]);
        *output_ptr = *reinterpret_cast<const PointXYZI *>(&input->data[indices[i]]);
        output_size += output.point_step;
      }
    }

which basically copies the points from walk_first_idx = 0 to walk_last_idx = N-2, thus skipping the last point.
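The scenario above can be reproduced with a small model of the loop. This is only a sketch under simplifying assumptions: emit_walks is a hypothetical function, joins_next[i] stands in for the distance-ratio and azimuth test between point i and point i + 1, and a plain minimum walk size replaces isCluster. The include_last_point flag shows one possible way to extend the final walk to the last point; it is not necessarily the fix adopted in the repository.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of the points that would be copied to the output.
// joins_next.size() + 1 is the number of points on the ring.
std::vector<std::size_t> emit_walks(
  const std::vector<bool> & joins_next, std::size_t min_points, bool include_last_point)
{
  std::vector<std::size_t> emitted;
  const std::size_t n = joins_next.size() + 1;

  // Stand-in for the isCluster check followed by the copy loop.
  auto flush = [&](std::size_t first, std::size_t last) {
    if (last + 1 - first < min_points) {
      return;
    }
    for (std::size_t i = first; i <= last; ++i) {
      emitted.push_back(i);
    }
  };

  std::size_t walk_first = 0;
  std::size_t walk_last = 0;
  for (std::size_t idx = 0; idx + 1 < n; ++idx) {
    walk_last = idx;
    if (joins_next[idx]) {
      continue;  // next point belongs to the same walk
    }
    flush(walk_first, walk_last);
    walk_first = idx + 1;
  }

  if (include_last_point) {
    walk_last = n - 1;  // the final walk always ends at the last point
  }
  if (walk_first <= walk_last) {
    flush(walk_first, walk_last);
  }
  return emitted;
}
```

With five points that all join their successor, the modeled loop emits indices 0 to 3 when include_last_point is false and indices 0 to 4 when it is true.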

same as below

VRichardJP · 2023-03-08T00:42:06Z

sensing/pointcloud_preprocessor/src/outlier_filter/ring_outlier_filter_nodelet.cpp

      }
    }
-    tmp_indices.clear();
  }


Note: the line above is unrelated to my comment.

Another approximation I have found with this implementation is that the ring end is not "connected" to its start.

For example, let's imagine this ring:

ring : xxxx..xxxxxxxxxxx..xxxx.xx
index: abcdefghijklmnopqrstuvwxyz

Here, chains of x represent points from the same "walk", and . the noisy points meant to be filtered out. The index alphabet is just for the explanation. In this situation, there are 3 walks:

from 'g' to 'q'

from 't' to 'w'

from 'y' to 'd' (wrapped around)

However, in your implementation, we start at index 0 ('a') and the start and end of the ring are not considered connected. So you would find the walks:

from 'a' to 'd'

from 'g' to 'q'

from 't' to 'w'

The algorithm would consider the walk 'y' -> 'z', but as it is too short it would ignore the points.

With the default parameter num_points_threshold_ = 4, up to 3 points could be missing because of that. With a bigger num_points_threshold_ value, the filter would sometimes produce a big blind spot in the data.
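The difference between the two scans can be sketched on the ring string used in this comment. This is an illustration under simplifying assumptions, not the filter's implementation: find_walks_linear and find_walks_circular are hypothetical names, 'x' marks a point that continues a walk, any other character is a gap, and a plain minimum point count stands in for isCluster.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A walk is reported as (first index, number of points). With wrap-around,
// a walk may run past the end of the ring and continue at index 0.
using Walk = std::pair<std::size_t, std::size_t>;

// Linear scan: the start and the end of the ring are treated as unconnected.
std::vector<Walk> find_walks_linear(const std::string & ring, std::size_t min_points)
{
  std::vector<Walk> walks;
  std::size_t i = 0;
  while (i < ring.size()) {
    if (ring[i] != 'x') {
      ++i;
      continue;
    }
    const std::size_t first = i;
    while (i < ring.size() && ring[i] == 'x') {
      ++i;
    }
    if (i - first >= min_points) {
      walks.emplace_back(first, i - first);
    }
  }
  return walks;
}

// Circular scan: start right after a gap so that no walk is split at index 0.
std::vector<Walk> find_walks_circular(const std::string & ring, std::size_t min_points)
{
  std::vector<Walk> walks;
  const std::size_t n = ring.size();
  const std::size_t gap = ring.find_first_not_of('x');
  if (gap == std::string::npos) {
    // No gap at all: the whole ring is one walk.
    if (n > 0 && n >= min_points) {
      walks.emplace_back(0, n);
    }
    return walks;
  }
  std::size_t k = 1;  // offset from the gap; the position is (gap + k) % n
  while (k < n) {
    if (ring[(gap + k) % n] != 'x') {
      ++k;
      continue;
    }
    const std::size_t first_k = k;
    while (k < n && ring[(gap + k) % n] == 'x') {
      ++k;
    }
    if (k - first_k >= min_points) {
      walks.emplace_back((gap + first_k) % n, k - first_k);
    }
  }
  return walks;
}
```

On the ring above with a minimum of 3 points, the linear scan keeps the walks starting at 'a', 'g', and 't' and drops 'y' to 'z', while the circular scan keeps 'g', 't', and a six-point walk from 'y' wrapping around to 'd'.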

Since this problem has been around for a while and has nothing to do with this Pull Request, can you please create a new Issue?

VRichardJP

Sorry I'm annoying with all my comments ;-)

codecov · 2023-04-04T04:35:18Z

Codecov Report

Patch coverage has no change and project coverage change: -0.15 ⚠️

Comparison is base (54d45e5) 12.75% compared to head (378b023) 12.61%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3014      +/-   ##
==========================================
- Coverage   12.75%   12.61%   -0.15%     
==========================================
  Files        1218     1223       +5     
  Lines       85981    86953     +972     
  Branches    24469    24469              
==========================================
  Hits        10965    10965              
- Misses      63655    64627     +972     
  Partials    11361    11361

Flag	Coverage Δ		*Carryforward flag
differential	`0.00% <0.00%> (?)`
total	`12.75% <ø> (+<0.01%)`	⬆️	Carriedforward from 36cdac6

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files	Coverage Δ
...sor/outlier_filter/ring_outlier_filter_nodelet.hpp	`0.00% <0.00%> (ø)`
sensing/pointcloud_preprocessor/src/filter.cpp	`0.00% <0.00%> (ø)`
...src/outlier_filter/ring_outlier_filter_nodelet.cpp	`0.00% <0.00%> (ø)`

... and 12 files with indirect coverage changes

VRichardJP · 2023-04-05T03:08:03Z

..._preprocessor/include/pointcloud_preprocessor/outlier_filter/ring_outlier_filter_nodelet.hpp

+    auto y = p1[1] - p2[1];
+    auto z = p1[2] - p2[2];
+
+    return x * x + y * y + z * z >= object_length_threshold_ * object_length_threshold_;


Further optimization idea:
Another way to compute the distance between the 2 points is to use the law of cosines: dist2(p1,p2) = d1^2 + d2^2 - 2*d1*d2*cos(a), where d1 and d2 are the distance attributes of p1 and p2, and a is the azimuth difference between the 2 points.

What is interesting with this approach is that a is always small. Indeed, since isCluster returns early when the walk has more than num_points_threshold_ points (3 by default), the current walk has 3 points or fewer. But to be added to the walk, the azimuth difference between 2 points must be less than 1 degree. So a is at most 3 degrees, and thus cos(a) ~= 1.

The distance calculation can then be simplified to: dist2(p1,p2) = (d1 - d2)^2.

-> We don't need to fetch the xyz coordinates anymore; only the 2 distance attributes are necessary.
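To make the size of the approximation concrete, the sketch below compares the exact law-of-cosines value with the simplified one. The function names are hypothetical, angles are in radians, and a is assumed to be the angle between the two rays. The term dropped by setting cos(a) to 1 is 2*d1*d2*(1 - cos(a)), which equals 4*d1*d2*sin^2(a/2).

```cpp
#include <cmath>

// Exact squared distance between two points at ranges d1 and d2 whose rays
// are separated by the angle a (radians).
double dist2_law_of_cosines(double d1, double d2, double a)
{
  return d1 * d1 + d2 * d2 - 2.0 * d1 * d2 * std::cos(a);
}

// Simplified form obtained by assuming cos(a) ~= 1.
double dist2_small_angle(double d1, double d2)
{
  const double d = d1 - d2;
  return d * d;
}

// Term dropped by the simplification: 2*d1*d2*(1 - cos(a)) = 4*d1*d2*sin^2(a/2).
double dropped_term(double d1, double d2, double a)
{
  const double s = std::sin(a / 2.0);
  return 4.0 * d1 * d2 * s * s;
}
```

The dropped term grows with the product of the two ranges. For instance, with d1 = d2 = 50 and a equal to 3 degrees, the simplified form returns 0 while the exact value is about 6.85, a chord of roughly 2.6 if the ranges are in meters.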

This proposal changes the logical output through approximation, so it is beyond the scope of this Pull Request, which aims to speed up the processing without changing the logical output. It would be appreciated if you could open a new issue for discussion and agreement.

sykwer · 2023-04-28T21:59:49Z

@Shin-kyoto
I've confirmed by your tests that the number of points is unchanged before and after this PR.

Shin-kyoto · 2023-05-09T12:51:17Z

@Shin-kyoto I've confirmed by your tests that the number of points is unchanged before and after this PR.

@sykwer Thanks! I have also confirmed that the number of points and the x,y,z value of each point is unchanged before and after this PR.

Shin-kyoto

I have also confirmed that the number of points and the x,y,z value of each point is unchanged before and after this PR.
I checked whether the number of point clouds per topic and the values (x, y, z) of each point in the point clouds remained unchanged before and after the pull request. I used the sample rosbag from the rosbag replay simulation tutorial.
The results are as follows:

Number of point clouds per topic: No difference before and after the pull request.
Values (x, y, z) of each point in the point clouds within the topic: No difference before and after the pull request for each of x, y, and z.

LGTM

miursh · 2023-05-27T17:57:55Z

I just found a small bug in the output pointcloud fields.
The output says the offset of intensity is 16 in the point fields even though point_step is 16, which causes the next point's x value to be read out as the previous point's intensity value.

height: 1
fields:
- name: x
  offset: 0
  datatype: 7
  count: 1
- name: y
  offset: 4
  datatype: 7
  count: 1
- name: z
  offset: 8
  datatype: 7
  count: 1
- name: intensity
  offset: 16
  datatype: 7
  count: 1
is_bigendian: false
point_step: 16
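The inconsistency in the message above can be checked mechanically: every field must end at or before point_step. The sketch below is a minimal, hypothetical check; FieldLayout and fields_outside_point are illustrative names rather than part of any ROS or Autoware API, and the field size of 4 bytes assumes datatype 7, which is FLOAT32 in sensor_msgs/PointField.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one entry of a point cloud's field list.
struct FieldLayout
{
  std::string name;
  std::uint32_t offset;  // byte offset inside one point record
  std::uint32_t size;    // field size in bytes (4 for a 32-bit float)
};

// Returns the names of the fields that extend past the end of a point record.
std::vector<std::string> fields_outside_point(
  const std::vector<FieldLayout> & fields, std::uint32_t point_step)
{
  std::vector<std::string> bad;
  for (const auto & field : fields) {
    // Widen before adding so that the sum cannot overflow 32 bits.
    if (static_cast<std::uint64_t>(field.offset) + field.size > point_step) {
      bad.push_back(field.name);
    }
  }
  return bad;
}
```

For the layout printed above (x at 0, y at 4, z at 8, intensity at 16, point_step 16), the check reports intensity, since its 4 bytes would occupy bytes 16 to 19 of a 16-byte record.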

The ring indices were not read correctly from point buffers. The ring/azimuth/etc. offsets were set to the indices of the respective fields in the input buffer's fields vector. The missing lookup of the offset in the respective index's field definition was added.

As discussed in autowarefoundation#3014, the law of cosines approach for calculating point distances in isCluster yields logically different results and thus needs more discussion before being merged. The isCluster implementation in this commit has been reverted to the previous Euclidean distance check.

LiDAR point representation in the ad-hoc WalkInfo struct has been replaced by a more readable ad-hoc PointXYZAD struct representation. Once the input data has an enforced, consistent format, existing point types can be used instead.

Signed-off-by: Maximilian Schmeller <maximilian.schmeller@tier4.jp>
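The index-versus-offset mix-up described in this commit message can be sketched as follows. This is a hypothetical illustration, not the code of that commit: NamedField, find_field_index, and find_field_offset are made-up names, and the field list in the usage note is an assumed example layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified field description: a name and a byte offset.
struct NamedField
{
  std::string name;
  std::uint32_t offset;
};

// Position of the field inside the fields vector, or -1 if it is absent.
// Using this value directly as a byte offset is the mistake described above.
std::int64_t find_field_index(const std::vector<NamedField> & fields, const std::string & name)
{
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (fields[i].name == name) {
      return static_cast<std::int64_t>(i);
    }
  }
  return -1;
}

// Byte offset of the field inside one point record, or -1 if it is absent.
// The extra step is the lookup of `offset` in the field found at that index.
std::int64_t find_field_offset(const std::vector<NamedField> & fields, const std::string & name)
{
  const std::int64_t index = find_field_index(fields, name);
  if (index < 0) {
    return -1;
  }
  return static_cast<std::int64_t>(fields[static_cast<std::size_t>(index)].offset);
}
```

With an assumed layout of x at 0, y at 4, z at 8, intensity at 12, and ring at 16, the index of ring is 4 while its byte offset is 16, so reading at the index would land inside y.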

Perforance tuning

db2d220

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>

github-actions bot added the component:sensing Data acquisition from sensors, drivers, preprocessing. (auto-assigned) label Mar 6, 2023

style(pre-commit): autofix

cb103ec

VRichardJP reviewed Mar 7, 2023

VRichardJP reviewed Mar 7, 2023

VRichardJP reviewed Mar 7, 2023

VRichardJP reviewed Mar 8, 2023

VRichardJP reviewed Mar 8, 2023

sykwer added 2 commits March 29, 2023 22:14

parameterize

e6e1a77

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>

fix bug

cd99895

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>

This was referenced Mar 30, 2023

On abusive use of reinterpret_cast UB #3215

Closed

ring_outlier_filter skips last input point #3217

Closed

ring_outlier_filter ignores ring's "end" point and "start" point are neighbors #3218

Closed

add transform computation

390b279

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>

github-actions bot added the type:documentation Creating or refining documentation. (auto-assigned) label Apr 3, 2023

sykwer marked this pull request as ready for review April 3, 2023 18:22

sykwer requested review from amc-nu, miursh, yukkysaito and a team as code owners April 3, 2023 18:22

pre-commit-ci bot and others added 2 commits April 3, 2023 18:24

style(pre-commit): autofix

88da12b

fix

7213591

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>

VRichardJP reviewed Apr 5, 2023

VRichardJP reviewed Apr 5, 2023

VRichardJP mentioned this pull request Apr 6, 2023

perf(ring_outlier_filter): a cache friendly impl #3293

Closed

7 tasks

change param default value

36cdac6

Signed-off-by: Takahiro Ishikawa <sykwer@gmail.com>

sykwer enabled auto-merge (squash) April 17, 2023 11:08

Revert "fix"

378b023

This reverts commit 1463b7d.

sykwer disabled auto-merge April 18, 2023 14:35

sykwer enabled auto-merge (squash) April 18, 2023 15:33

sykwer requested a review from Shin-kyoto April 28, 2023 21:33

Shin-kyoto approved these changes May 9, 2023

miursh approved these changes May 9, 2023

sykwer merged commit 81e03ae into autowarefoundation:main May 9, 2023

tier4-autoware-public-bot bot mentioned this pull request May 10, 2023

chore: sync upstream tier4/autoware.universe#418

Merged

asa-naki added a commit to T4-FY23-AW-Training-Team1/autoware.universe that referenced this pull request May 19, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

89a7a75

This reverts commit 81e03ae.

miursh mentioned this pull request May 28, 2023

fix(pointcloud_preprocessor): fix output intensity value of ring outlier filter #2618

Merged

4 tasks

VRichardJP mentioned this pull request Jun 1, 2023

ring_outlier_filter blows up my cache (and yours too) #3269

Closed

3 tasks

mojomex mentioned this pull request Jul 6, 2023

perf(ring_outlier_filter): a cache friendly impl (continuation of VRichardJP's work) #4185

Merged

7 tasks

1222-takeshi added a commit to 1222-takeshi/autoware.universe that referenced this pull request Jul 31, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

27811b5

This reverts commit 81e03ae.

1222-takeshi added a commit to 1222-takeshi/autoware.universe that referenced this pull request Jul 31, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

6834a0d

This reverts commit 81e03ae.

1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 10, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

2a4f383

This reverts commit 81e03ae.

1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 14, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

ba6d3eb

This reverts commit 81e03ae.

1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 15, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

a14a072

This reverts commit 81e03ae.

1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 16, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

864092f

This reverts commit 81e03ae.

1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Aug 24, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

126fa33

This reverts commit 81e03ae.

1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Sep 1, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

2ac4431

This reverts commit 81e03ae.

1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Sep 6, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

eca3df2

This reverts commit 81e03ae.

1222-takeshi added a commit to tier4/autoware.universe that referenced this pull request Sep 8, 2023

Revert "perf(ring_outlier_filter): performance tuning (autowarefoundation#3014)"

a98dacb

This reverts commit 81e03ae.
