Update README.md and blog.md
Peterande committed Oct 19, 2024
1 parent fb6ea64 commit 2f91a64
Showing 4 changed files with 9 additions and 9 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
<!--# [D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/xxxxxx) -->

-English | [简体中文](README_cn.md) | [Blog](src/zoo/dfine/blog.md)
+English | [简体中文](README_cn.md) | [English Blog](src/zoo/dfine/blog.md) | [中文博客](src/zoo/dfine/blog_cn.md)

<h2 align="center">
D-FINE: Redefine Regression Task of DETRs as Fine&#8209;grained&nbsp;Distribution&nbsp;Refinement
2 changes: 1 addition & 1 deletion README_cn.md
@@ -1,6 +1,6 @@
<!--# [D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/xxxxxx) -->

-[English](README.md) | 简体中文 | [博客](src/zoo/dfine/blog_cn.md)
+[English](README.md) | 简体中文 | [English Blog](src/zoo/dfine/blog.md) | [中文博客](src/zoo/dfine/blog_cn.md)

<h2 align="center">
D-FINE: Redefine Regression Task of DETRs as Fine&#8209;grained&nbsp;Distribution&nbsp;Refinement
6 changes: 3 additions & 3 deletions src/zoo/dfine/blog.md
@@ -1,4 +1,4 @@
-[English Blog](blog.md) | [中文博客](blog_cn.md)
+English Blog | [中文博客](blog_cn.md)

## 🔥 Revolutionizing Real-Time Object Detection: D-FINE vs. YOLO and Other DETR Models

@@ -43,7 +43,7 @@ The main advantages of redefining the bounding box regression task as **FDR** ar

Based on the above, object detectors equipped with the **FDR** framework satisfy the following two points:

-1. **Ability to Achieve Knowledge Transfer**: As Hinton mentioned in the paper **"Distilling the Knowledge in a Neural Network"**, probabilities are "knowledge." The network's outputs become probability distributions, and these distributions carry localization knowledge. By calculating the KLD loss, this "knowledge" can be transferred from deeper layers to shallower layers. This is something that traditional fixed box representations (Dirac δ functions) cannot achieve.
+1. **Ability to Achieve Knowledge Transfer**: As Hinton mentioned in the paper *"Distilling the Knowledge in a Neural Network"*, probabilities are "knowledge." The network's outputs become probability distributions, and these distributions carry localization knowledge. By calculating the KLD loss, this "knowledge" can be transferred from deeper layers to shallower layers. This is something that traditional fixed box representations (Dirac δ functions) cannot achieve.

2. **Consistent Optimization Objectives**: Because every decoder layer in the **FDR** framework shares one goal, reducing the residual between the initial bounding box and the ground truth bounding box, the precise probability distributions generated by the final layer can serve as the ultimate target for each preceding layer and guide those layers through distillation.

@@ -55,7 +55,7 @@ Thus, based on **FDR**, we propose **GO-LSD (Global Optimal Localization Self-Di

Similarly, for readability, we will not elaborate on the mathematical formulas and the Decoupled Distillation Focal (DDF) Loss that aids optimization here. Interested readers can refer to the original paper for derivations.

-This creates a win-win synergistic effect: as training progresses, the predictions of the final layer become increasingly accurate, and its generated soft labels can better help the preceding layers improve prediction accuracy. Conversely, the earlier layers learn to localize accurately more quickly, simplifying the optimization tasks of the deeper layers and further enhancing overall accuracy.
+This results in a synergistic win-win effect: as training progresses, the predictions of the final layer become increasingly accurate, and its generated soft labels can better help the preceding layers improve prediction accuracy. Conversely, the earlier layers learn to localize accurately more quickly, simplifying the optimization tasks of the deeper layers and further enhancing overall accuracy.

---

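The self-distillation idea in the blog text (the final layer's distribution acts as the target for every earlier layer through a KLD loss) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not D-FINE's implementation: it uses plain KL divergence rather than the paper's Decoupled Distillation Focal (DDF) loss, handles a single box edge, and the function names, the `temperature` parameter, and the example logits are all hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one edge's bin logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(layer_logits: List[List[float]], temperature: float = 1.0) -> float:
    """Average KL from the final layer (teacher) to each earlier layer (student).

    Hypothetical helper: layer_logits holds one logit vector per decoder
    layer for a single box edge, ordered shallow -> deep.
    """
    if len(layer_logits) < 2:
        return 0.0  # no earlier layer to distill into
    teacher = softmax(layer_logits[-1], temperature)
    # In an autograd framework the teacher would typically be detached
    # (stop-gradient) so only the earlier layers are pulled toward it.
    losses = [
        kl_divergence(teacher, softmax(logits, temperature))
        for logits in layer_logits[:-1]
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical logits for one edge over 4 bins from three decoder layers.
    layers = [
        [0.0, 0.0, 0.0, 0.0],  # shallow layer: no preference yet
        [0.2, 1.5, 0.3, 0.0],
        [0.1, 3.0, 0.2, 0.0],  # final layer: confident about bin 1
    ]
    print(round(self_distillation_loss(layers), 4))
```

The loss is zero when an earlier layer already matches the final layer and grows as the shallow distributions diverge from it, which is the "knowledge transfer from deeper to shallower layers" the text describes.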
8 changes: 4 additions & 4 deletions src/zoo/dfine/blog_cn.md
@@ -4,7 +4,7 @@

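In the rapidly evolving field of real-time object detection, **D-FINE** is a revolutionary approach that significantly surpasses existing models (such as **YOLOv10**, **YOLO11**, and **RT-DETR v1/v2/v3**) and raises the performance ceiling of real-time object detection. After pretraining on the large-scale Objects365 dataset, **D-FINE** far outperforms its competitor **LW-DETR**, reaching up to **59.3%** AP on COCO while maintaining excellent frame rate, parameter count, and computational complexity. This makes **D-FINE** a standout in real-time object detection and lays a foundation for future research.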

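-All of D-FINE's code, weights, logs, compilation tools, and the Fiftyone visualization tool are now fully open source; thanks to RT-DETR for providing the code-base. Tutorials for pretraining and for custom datasets are also included. We will gradually post improvement notes and tuning guides, so please feel free to open issues and help carry the D-FINE series forward together. We also hope you will leave a ⭐; it is the best encouragement for us.
+All of D-FINE's code, weights, logs, compilation tools, and the FiftyOne visualization tool are now fully open source; thanks to RT-DETR for providing the codebase. Tutorials for pretraining and for custom datasets are also included. We will gradually post improvement notes and tuning guides, so please feel free to open issues and help carry the D-FINE series forward together. We also hope you will leave a ⭐; it is the best encouragement for us.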

**GitHub Repo**: https://github.com/Peterande/D-FINE

@@ -29,7 +29,7 @@

The main advantages of redefining the bounding box regression task as FDR are:

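-1. **Simplified Supervision**: While the boxes are optimized with the conventional L1 loss and IOU loss, the "residual" between the label and the prediction can additionally be used to constrain these intermediate probability distributions. This lets each decoder layer focus more effectively on the localization error it currently faces; as the layers get deeper, their optimization targets become progressively simpler, which simplifies the overall optimization process.
+1. **Simplified Supervision**: While the boxes are optimized with the conventional L1 loss and IoU loss, the "residual" between the label and the prediction can additionally be used to constrain these intermediate probability distributions. This lets each decoder layer focus more effectively on the localization error it currently faces; as the layers get deeper, their optimization targets become progressively simpler, which simplifies the overall optimization process.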

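2. **Robustness in Complex Scenarios**: The values of these probability distributions essentially represent the confidence in each "fine-tuning" adjustment of every edge. This allows **D-FINE** to model the uncertainty of each edge independently at different network depths, giving it stronger robustness in challenging real-world scenes such as occlusion, motion blur, and low light, and making it more stable than directly regressing four fixed values.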

@@ -43,7 +43,7 @@

Based on the above, object detectors equipped with the FDR framework satisfy the following two points:

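-1. **Ability to Achieve Knowledge Transfer**: Hinton already stated in **Distilling the Knowledge in a Neural Network** that probabilities are "knowledge". The network's outputs become probability distributions, these distributions carry localization knowledge, and by computing the KLD loss this "knowledge" can be transferred from deeper layers to shallower layers. This is something traditional fixed box representations (Dirac δ functions) cannot achieve.
+1. **Ability to Achieve Knowledge Transfer**: Hinton already stated in *"Distilling the Knowledge in a Neural Network"* that probabilities are "knowledge". The network's outputs become probability distributions, these distributions carry localization knowledge, and by computing the KLD loss this "knowledge" can be transferred from deeper layers to shallower layers. This is something traditional fixed box representations (Dirac δ functions) cannot achieve.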

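2. **Consistent Optimization Objectives**: Because every decoder layer in the FDR architecture shares one goal, reducing the residual between the initial bounding box and the ground truth bounding box, the precise probability distributions generated by the final layer can serve as the ultimate target for each preceding layer and guide those layers through distillation.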

@@ -83,7 +83,7 @@

#### Question 2: Do FDR and GO-LSD add training cost?

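-**Answer**: The extra training cost comes mainly from generating the distribution labels. We have optimized this process, keeping training time and GPU memory usage within 6% and 2%, which is barely noticeable.
+**Answer**: The extra training cost comes mainly from generating the distribution labels. We have optimized this process, keeping the additional training time and GPU memory usage within 6% and 2%, respectively, which is barely noticeable.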

#### Question 3: Why is D-FINE faster and lighter than the RT-DETR series?

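The robustness argument in the blog (distribution values express per-edge confidence, unlike regressing four fixed values) can be illustrated by turning one edge's distribution into an expected offset plus a spread. This is a sketch under assumptions: `edge_refinement` is a hypothetical helper, not part of the repository, and the bin offsets are a hypothetical uniform grid rather than the values D-FINE actually uses, which are defined in the paper.

```python
import math
from typing import List, Tuple


def edge_refinement(probs: List[float], bin_offsets: List[float]) -> Tuple[float, float]:
    """Expected offset and its standard deviation for ONE box edge.

    probs: non-negative mass over candidate offsets (one entry per bin).
    bin_offsets: the offset each bin represents (hypothetical values here).
    A small standard deviation means the layer is confident about how far
    to move this edge; a large one signals uncertainty.
    """
    if not probs or len(probs) != len(bin_offsets):
        raise ValueError("probs and bin_offsets must be non-empty and equal length")
    total = sum(probs)
    p = [x / total for x in probs]  # re-normalise defensively
    mean = sum(pi * b for pi, b in zip(p, bin_offsets))
    var = sum(pi * (b - mean) ** 2 for pi, b in zip(p, bin_offsets))
    return mean, math.sqrt(var)


if __name__ == "__main__":
    bins = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical uniform offset grid
    sharp = [0.0, 0.0, 0.0, 1.0, 0.0]   # confident: move the edge by +0.5
    flat = [0.2, 0.2, 0.2, 0.2, 0.2]    # uncertain, e.g. an occluded edge
    print(edge_refinement(sharp, bins))
    print(edge_refinement(flat, bins))
```

A fixed-value regressor would report only the mean; keeping the whole distribution also exposes the spread, which is what lets each edge carry its own uncertainty.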
