
Evaluation

This page describes the evaluation metrics used by this benchmark. Currently, three types of map elements are evaluated: pedestrian crossings, lane dividers, and road boundaries.

Rasterized Map Construction

Output

The final output of this task is a BEV segmentation image for each input sample. The image size is set to $400\times 200$. The ground-truth labels are generated by drawing the map elements on the BEV canvas with line width set to $3$.
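Ground-truth generation can be sketched as stamping each polyline onto the BEV canvas. The helper below is an illustrative approximation (it marks a square neighbourhood around densely sampled segment points to imitate a width-3 stroke); the benchmark likely uses a proper rasterizer such as `cv2.polylines`, and whether 400 is the width or the height of the canvas is an assumption here.

```python
import numpy as np

def rasterize_polyline(canvas, pts, width=3):
    """Approximate drawing a polyline with the given line width on a bool canvas.

    Densely samples each segment and marks a (width x width) neighbourhood
    around every sample. A sketch, not the benchmark's exact rasterizer.
    """
    h, w = canvas.shape
    r = width // 2
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        for t in np.linspace(0.0, 1.0, n + 1):
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            canvas[max(0, y - r):min(h, y + r + 1),
                   max(0, x - r):min(w, x + r + 1)] = True
    return canvas

canvas = np.zeros((200, 400), dtype=bool)  # 400 x 200 BEV image (orientation assumed)
rasterize_polyline(canvas, [(10, 10), (100, 10)])  # a horizontal lane divider
```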

Metrics

As is common in semantic segmentation tasks, Intersection over Union (IoU) is used to characterize the performance of a model on rasterized map construction.

$$ IoU=\frac{TP}{TP+FP+FN} $$
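In mask terms, TP is the intersection of the predicted and ground-truth masks and TP + FP + FN is their union, so the formula reduces to intersection over union of two boolean arrays:

```python
import numpy as np

def iou(pred, gt):
    """IoU = TP / (TP + FP + FN) for a pair of boolean masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0  # both masks empty: define as 1

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(iou(pred, gt))  # 2 pixels intersect, 4 pixels in the union -> 0.5
```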

*(Figure: illustration of IoU.)*

Format

Format for submission file

rasterized_submission {
  "meta": {
    "use_camera":   <bool>  -- Whether this submission uses camera data as an input.
    "use_lidar":    <bool>  -- Whether this submission uses lidar data as an input.
    "use_radar":    <bool>  -- Whether this submission uses radar data as an input.
    "use_external": <bool>  -- Whether this submission uses external data as an input.
    "output_format": "raster" -- This submission uses the rasterized format.
  },
  "results": {
    token <str>: {     -- Maps each sample token to its prediction.
      "semantic_mask": Array<bool, (C, H, W)>,   -- Mask with 3 channels (C=0: ped crossing; C=1: divider; C=2: boundary). Values are 0 or 1.
    }
  }
}
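Assembled in Python, a rasterized submission might look like the sketch below. The token `"sample_token_0"` is a placeholder; real tokens come from the dataset, and the (3, 200, 400) channel/height/width ordering is assumed from the format above.

```python
import numpy as np

# All-zero masks as a stand-in for model predictions.
semantic_mask = np.zeros((3, 200, 400), dtype=bool)  # C=3 map-element channels

submission = {
    "meta": {
        "use_camera": True,
        "use_lidar": False,
        "use_radar": False,
        "use_external": False,
        "output_format": "raster",
    },
    "results": {
        "sample_token_0": {"semantic_mask": semantic_mask},  # placeholder token
    },
}
```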

Vectorized Map Construction

Output

The final output for each input sample is a set of polylines, which is similar to a set of bounding boxes in object detection. Every line has a class label and a confidence score.

Metrics

Average Precision (AP) is used to characterize the performance of models on vectorized map construction. Matching between predicted lines and ground-truth lines is based on their spatial distance, calculated as the Chamfer Distance (CD) over interpolated sample points. For two lines $L_1$ and $L_2$, two sets of points $S_1$ and $S_2$ are sampled from them, respectively. The CD is calculated as follows; note that the direction of the lines is not considered, since the CD is symmetric.

$$ CD_{dir}(S_1, S_2)=\frac{1}{|S_1|}\sum_{x\in S_1}\min_{y\in S_2}\Vert x-y\Vert_2 $$

$$ CD(S_1, S_2)=\frac{1}{2}CD_{dir}(S_1, S_2)+\frac{1}{2}CD_{dir}(S_2, S_1) $$
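The two formulas translate directly to numpy: build the pairwise distance matrix, take nearest-neighbour means in each direction, and average. A minimal sketch, assuming each point set is an `(N, 2)` array:

```python
import numpy as np

def chamfer_distance(s1, s2):
    """Symmetric Chamfer Distance between two sampled point sets.

    s1: (N, 2) array, s2: (M, 2) array of 2D points.
    """
    d = np.linalg.norm(s1[:, None, :] - s2[None, :, :], axis=-1)  # (N, M) pairwise L2
    cd_12 = d.min(axis=1).mean()  # directed CD: S1 -> S2
    cd_21 = d.min(axis=0).mean()  # directed CD: S2 -> S1
    return 0.5 * (cd_12 + cd_21)

s1 = np.array([[0.0, 0.0], [1.0, 0.0]])
s2 = np.array([[0.0, 1.0], [1.0, 1.0]])
print(chamfer_distance(s1, s2))  # every point is exactly 1 from its nearest -> 1.0
```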

Then AP is calculated from the matching results in the same way as in object detection.

$$ AP=\frac{1}{|R|}\sum_{r\in R}{\mathrm{Pr}_r} $$

where $\mathrm{Pr}_r$ is the precision when recall is at $r$, averaged over the set of recall thresholds $R$.
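Concretely, with interpolated precision (the usual object-detection convention: take the best precision achievable at recall at least $r$), the computation can be sketched as follows. The 11-point threshold set is an assumption; the benchmark's exact recall grid may differ.

```python
import numpy as np

def average_precision(precisions, recalls, thresholds=np.linspace(0.0, 1.0, 11)):
    """Interpolated AP: mean over recall thresholds of the max precision
    among operating points whose recall is at least that threshold."""
    ap = 0.0
    for r in thresholds:
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / len(thresholds)

# Example precision/recall operating points from a hypothetical detector.
prec = np.array([1.0, 1.0, 0.67, 0.75])
rec  = np.array([0.25, 0.5, 0.5, 0.75])
ap = average_precision(prec, rec)
```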

Format

Format for submission file

vectorized_submission {
  "meta": {
    "use_camera":   <bool>  -- Whether this submission uses camera data as an input.
    "use_lidar":    <bool>  -- Whether this submission uses lidar data as an input.
    "use_radar":    <bool>  -- Whether this submission uses radar data as an input.
    "use_external": <bool>  -- Whether this submission uses external data as an input.
    "output_format": "vector"	-- This submission uses the vectorized format.
  },
  "results": {
    token <str>: {     -- Maps each sample token to its prediction.
      "vectors": 	list[Array<float, (N, 2)>],		-- List of lines; each line is an array of points.
      "scores":		list[float],	-- List of confidence scores, one per line.
      "labels": 	list[int],		-- List of class labels, one per line.
    }
  }
}
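As with the rasterized case, a vectorized submission entry can be sketched in Python. The token, score, and label values below are placeholders for illustration only:

```python
import numpy as np

# One predicted polyline: an (N, 2) array of points.
line = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 2.0]])

submission = {
    "meta": {
        "use_camera": True,
        "use_lidar": False,
        "use_radar": False,
        "use_external": False,
        "output_format": "vector",
    },
    "results": {
        "sample_token_0": {        # placeholder token
            "vectors": [line],     # one line per prediction
            "scores": [0.9],       # one confidence score per line
            "labels": [1],         # one class label per line
        }
    },
}
```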