CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation

*Equal contribution. ¹University of Michigan

We propose CRKD, a novel cross-modality knowledge distillation path from a LiDAR-camera teacher to a camera-radar student.

Abstract

In the field of 3D object detection for autonomous driving, LiDAR-Camera (LC) fusion is the top-performing sensor configuration. However, LiDAR is relatively expensive, which hinders its adoption in consumer automobiles. Camera and radar, by contrast, are commonly deployed on vehicles already on the road today, but the performance of Camera-Radar (CR) fusion falls behind that of LC fusion. In this work, we propose CRKD to bridge the performance gap between LC and CR detectors with a novel cross-modality knowledge distillation (KD) framework. We use the Bird's-Eye-View (BEV) representation as the shared feature space to enable effective knowledge distillation. To accommodate the unique cross-modality KD path, we propose four distillation losses to help the student learn crucial features from the teacher model. We present extensive evaluations on the nuScenes dataset to demonstrate the effectiveness of the proposed CRKD framework.

Overview

We propose a novel cross-modality KD framework to enable LC-to-CR distillation in the BEV feature space. With the transferred knowledge from an LC teacher detector, the CR student detector can outperform existing baselines without additional cost during inference.
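To make the idea of distilling in a shared BEV feature space concrete, the following is a minimal sketch of a generic feature-imitation loss. It is not one of the four CRKD losses; the function name `bev_feature_kd_loss`, the `(C, H, W)` layout, and the optional per-cell `mask` are assumptions made purely for illustration.

```python
import numpy as np


def bev_feature_kd_loss(student_bev, teacher_bev, mask=None):
    """Mean squared error between student and teacher BEV feature maps.

    Hypothetical helper for illustration only.
    student_bev, teacher_bev: arrays of shape (C, H, W) on the same BEV grid.
    mask: optional (H, W) array of non-negative weights (for example, 1 for
          cells the caller considers important, 0 elsewhere).
    """
    student_bev = np.asarray(student_bev, dtype=np.float64)
    teacher_bev = np.asarray(teacher_bev, dtype=np.float64)
    if student_bev.shape != teacher_bev.shape:
        raise ValueError("student and teacher BEV maps must share a shape")

    # Per-element squared difference, shape (C, H, W).
    sq_err = (student_bev - teacher_bev) ** 2

    if mask is None:
        return float(sq_err.mean())

    mask = np.asarray(mask, dtype=np.float64)
    # Broadcast the (H, W) mask over the channel axis.
    weighted = sq_err * mask[None, :, :]
    # Normalize by the total weight across all channels.
    denom = mask.sum() * student_bev.shape[0]
    return float(weighted.sum() / denom) if denom > 0 else 0.0
```

Because both detectors produce features on the same BEV grid, such a loss can compare them cell by cell regardless of which sensors produced them. The teacher is only needed to compute the loss during training, which is consistent with the statement that distillation adds no cost at inference.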

We design four KD modules that address the notable discrepancies between sensors and enable effective cross-modality KD. Because KD operates in the BEV space, the proposed loss designs can be applied to other KD configurations. We also add a gated network to the baseline model for adaptive fusion.
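As a rough illustration of what adaptive fusion with a gate can look like, here is a sketch of a common gated-fusion pattern: a per-cell, per-channel gate computed from both modalities decides how much of each feature map to keep. The exact gated network used in CRKD may differ; `gated_fusion`, `weight`, and `bias` are hypothetical names, and the gate is modeled as a 1x1 convolution followed by a sigmoid.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(cam_bev, radar_bev, weight, bias):
    """Fuse camera and radar BEV features with a learned gate (sketch).

    cam_bev, radar_bev: arrays of shape (C, H, W) on the same BEV grid.
    weight: (C, 2C) array acting as a hypothetical 1x1 convolution.
    bias:   (C,) array.
    Returns an array of shape (C, H, W).
    """
    # Stack both modalities along the channel axis: (2C, H, W).
    stacked = np.concatenate([cam_bev, radar_bev], axis=0)
    # A 1x1 convolution is a linear map over channels at every BEV cell.
    logits = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    gate = sigmoid(logits)  # values in (0, 1), shape (C, H, W)
    # Convex combination: gate near 1 favors camera, near 0 favors radar.
    return gate * cam_bev + (1.0 - gate) * radar_bev
```

With zero weights and bias the gate is 0.5 everywhere, so the output is the plain average of the two maps; training would move the gate toward whichever modality is more reliable at each location.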

We conduct an extensive evaluation on nuScenes to demonstrate the effectiveness of CRKD. CRKD improves the mAP and NDS of student detectors by 3.5% and 3.2%, respectively. Because our method targets a novel KD path with a distinctively large modality gap, we provide a thorough study and analysis to support our design choices.

CRKD Overview

BibTeX

@inproceedings{zhao2024crkd,
  author    = {Zhao, Lingjun and Song, Jingyu and Skinner, Katherine A},
  title     = {CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation},
  booktitle = {2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024},
}