JRDB Leaderboard

CW Panoptic Segmentation Submissions

Name	OSPA↓	OSPA^K_Thing↓	OSPA^K_Stuff↓	PQ↑
MaskDINO	0.666	0.726	0.511	32.857

Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum. Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation. In CVPR, 2023.

Details

Name	MaskDINO
Submission Date	2024-05-02 16:29:04+00:00
Abstract	In this paper we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch which supports all image segmentation tasks (instance, panoptic, and semantic). It makes use of the query embeddings from DINO to dot-product a high-resolution pixel embedding map to predict a set of binary masks. Some key components in DINO are extended for segmentation through a shared architecture and training process. Mask DINO is simple, efficient, and scalable, and it can benefit from joint large-scale detection and segmentation datasets. Our experiments show that Mask DINO significantly outperforms all existing specialized segmentation methods, both on a ResNet-50 backbone and a pre-trained model with SwinL backbone. Notably, Mask DINO establishes the best results to date on instance segmentation (54.5 AP on COCO), panoptic segmentation (59.4 PQ on COCO), and semantic segmentation (60.8 mIoU on ADE20K) among models under one billion parameters.
Publication title	Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
Publication authors	Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum
Publication venue and year	CVPR 2023
Publication URL	https://arxiv.org/abs/2206.02777
Code Language	N/A
Hardware	N/A
Code Website	N/A
Code URL	N/A

Per-sequence Results

Sequence	OSPA	OSPA_CARD	OSPA_LOC	OSPA_KNOWN_THING	OSPA_STUFF	OSPA_SMALL	OSPA_MEDIUM	OSPA_LARGE
cubberly-auditorium-2019-04-22_1	0.705	0.519	0.186	0.789	0.479	0.814	0.49	0.536
discovery-walk-2019-02-28_0	0.542	0.278	0.263	0.58	0.465	0.611	0.536	0.558
discovery-walk-2019-02-28_1	0.634	0.419	0.215	0.701	0.508	0.723	0.471	0.576
food-trucks-2019-02-12_0	0.647	0.379	0.267	0.729	0.414	0.801	0.492	0.43
gates-ai-lab-2019-04-17_0	0.691	0.465	0.226	0.727	0.417	0.858	0.623	0.42
gates-basement-elevators-2019-01-17_0	0.53	0.361	0.17	0.56	0.404	0.638	0.412	0.361
gates-foyer-2019-01-17_0	0.749	0.501	0.247	0.807	0.574	0.754	0.503	0.606
gates-to-clark-2019-02-28_0	0.529	0.335	0.193	0.61	0.39	0.721	0.504	0.475
hewlett-class-2019-01-23_0	0.731	0.506	0.225	0.752	0.648	0.935	0.544	0.422
hewlett-class-2019-01-23_1	0.64	0.436	0.204	0.669	0.47	0.84	0.53	0.43
huang-2-2019-01-25_1	0.532	0.342	0.19	0.601	0.286	0.557	0.401	0.444
huang-intersection-2019-01-22_0	0.646	0.473	0.172	0.783	0.315	0.82	0.466	0.264
indoor-coupa-cafe-2019-02-06_0	0.838	0.557	0.281	0.872	0.718	0.863	0.675	0.548
lomita-serra-intersection-2019-01-30_0	0.611	0.436	0.174	0.712	0.422	0.847	0.446	0.399
meyer-green-2019-03-16_1	0.701	0.561	0.14	0.806	0.43	0.847	0.577	0.284
nvidia-aud-2019-01-25_0	0.773	0.531	0.242	0.802	0.69	0.827	0.592	0.61
nvidia-aud-2019-04-18_1	0.548	0.324	0.224	0.663	0.394	0.627	0.378	0.376
nvidia-aud-2019-04-18_2	0.702	0.423	0.278	0.781	0.542	0.837	0.51	0.564
outdoor-coupa-cafe-2019-02-06_0	0.797	0.569	0.228	0.859	0.653	0.874	0.622	0.61
quarry-road-2019-02-28_0	0.703	0.491	0.212	0.76	0.564	0.794	0.616	0.531
serra-street-2019-01-30_0	0.688	0.507	0.18	0.787	0.46	0.831	0.467	0.418
stlc-111-2019-04-19_1	0.667	0.447	0.22	0.697	0.591	0.698	0.581	0.571
stlc-111-2019-04-19_2	0.67	0.46	0.21	0.693	0.605	0.674	0.584	0.603
tressider-2019-03-16_2	0.332	0.096	0.236	0.427	0.174	0.573	0.385	0.243
tressider-2019-04-26_0	0.805	0.598	0.207	0.83	0.713	0.828	0.602	0.458
tressider-2019-04-26_1	0.759	0.503	0.255	0.786	0.673	0.805	0.497	0.49
tressider-2019-04-26_3	0.813	0.605	0.208	0.821	0.789	0.851	0.566	0.468
COMBINED	0.666	0.449	0.217	0.726	0.511	0.772	0.521	0.47
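
In the rows above, OSPA appears to equal OSPA_CARD + OSPA_LOC up to rounding of the published figures (for example, 0.449 + 0.217 = 0.666 in the COMBINED row). The short check below illustrates this on a few rows copied from the table; the helper names and the 0.0015 tolerance are illustrative choices, not part of the benchmark.

```python
# Rows copied from the per-sequence table: (sequence, OSPA, OSPA_CARD, OSPA_LOC).
ROWS = [
    ("cubberly-auditorium-2019-04-22_1", 0.705, 0.519, 0.186),
    ("discovery-walk-2019-02-28_0", 0.542, 0.278, 0.263),
    ("tressider-2019-03-16_2", 0.332, 0.096, 0.236),
    ("tressider-2019-04-26_3", 0.813, 0.605, 0.208),
    ("COMBINED", 0.666, 0.449, 0.217),
]

# Published values are rounded to three decimals, so allow a small slack.
TOLERANCE = 0.0015


def decomposition_gap(ospa, card, loc):
    """Absolute difference between OSPA and the sum of its two components."""
    return abs(ospa - (card + loc))


def max_gap(rows):
    """Largest decomposition gap over the given rows."""
    return max(decomposition_gap(o, c, l) for _, o, c, l in rows)


if __name__ == "__main__":
    for name, o, c, l in ROWS:
        print(f"{name}: gap = {decomposition_gap(o, c, l):.4f}")
    print("all within tolerance:", max_gap(ROWS) < TOLERANCE)
```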

Sequence	PQ	SQ	RQ	PQ_Things	SQ_Things	RQ_Things	PQ_Stuff	SQ_Stuff	RQ_Stuff
cubberly-auditorium-2019-04-22_1	30.096	59.301	38.677	24.351	57.466	30.689	45.69	64.28	60.359
discovery-walk-2019-02-28_0	43.145	59.206	56.049	41.828	64.019	53.865	45.78	49.581	60.417
discovery-walk-2019-02-28_1	37.109	64.702	48.959	33.383	66.596	43.112	44.095	61.153	59.922
food-trucks-2019-02-12_0	29.286	55.128	37.52	20.889	55.852	27.233	53.079	53.079	66.667
gates-ai-lab-2019-04-17_0	30.965	60.487	38.01	27.86	58.971	34.316	55.801	72.616	67.562
gates-basement-elevators-2019-01-17_0	49.572	62.429	59.603	46.067	61.949	55.981	64.469	64.469	75.0
gates-foyer-2019-01-17_0	25.547	55.572	33.428	22.376	52.196	27.478	35.059	65.698	51.277
gates-to-clark-2019-02-28_0	47.222	63.162	58.092	40.509	59.416	51.326	58.729	69.584	69.69
hewlett-class-2019-01-23_0	25.558	53.62	31.463	25.192	54.728	30.237	27.019	49.19	36.366
hewlett-class-2019-01-23_1	36.838	62.425	45.122	34.307	61.002	41.209	51.183	70.495	67.296
huang-2-2019-01-25_1	46.289	69.771	56.424	44.008	70.931	53.823	54.501	65.594	65.789
huang-intersection-2019-01-22_0	35.216	51.684	44.004	22.625	41.414	30.088	67.952	78.387	80.183
indoor-coupa-cafe-2019-02-06_0	14.634	52.999	20.094	14.278	53.888	18.795	15.88	49.886	24.643
lomita-serra-intersection-2019-01-30_0	37.703	48.874	46.902	27.656	44.36	34.73	57.796	57.902	71.248
meyer-green-2019-03-16_1	30.593	51.251	38.632	21.287	48.454	27.998	54.523	58.445	65.977
nvidia-aud-2019-01-25_0	21.692	59.445	28.56	20.744	61.977	26.197	24.416	52.165	35.352
nvidia-aud-2019-04-18_1	42.877	53.779	52.988	35.382	54.064	43.464	52.871	53.398	65.686
nvidia-aud-2019-04-18_2	29.165	52.275	38.916	23.833	49.791	30.306	39.829	57.245	56.134
outdoor-coupa-cafe-2019-02-06_0	19.983	54.224	26.329	16.941	54.73	22.066	27.08	53.044	36.277
quarry-road-2019-02-28_0	30.636	56.494	39.594	27.758	58.481	36.067	37.626	51.669	48.159
serra-street-2019-01-30_0	31.118	53.05	38.977	23.718	51.887	30.888	48.034	55.707	57.466
stlc-111-2019-04-19_1	33.87	55.752	43.266	34.369	63.613	42.631	32.623	36.098	44.853
stlc-111-2019-04-19_2	30.696	57.121	39.586	29.267	56.892	36.546	34.697	57.763	48.101
tressider-2019-03-16_2	66.171	79.524	81.11	56.338	77.704	69.776	82.558	82.558	100.0
tressider-2019-04-26_0	20.135	59.599	26.92	18.609	62.223	24.197	25.806	49.853	37.035
tressider-2019-04-26_1	22.978	55.47	29.795	23.487	59.991	30.131	21.364	41.155	28.729
tressider-2019-04-26_3	18.037	46.594	23.251	18.292	53.842	23.081	17.273	24.85	23.761
COMBINED	32.857	57.553	41.566	28.717	57.646	36.157	43.546	57.254	55.702
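
The PQ, SQ and RQ columns correspond to panoptic quality, segmentation quality and recognition quality as defined by Kirillov et al. [6]. For a single class, a predicted segment and a ground-truth segment match when their IoU exceeds 0.5, and PQ = (sum of matched IoUs) / (TP + 0.5 FP + 0.5 FN) = SQ × RQ. The sketch below computes the three values for one class; the function name and inputs are illustrative and this is not the benchmark's evaluation code. The scores in the table appear to be reported on a 0 to 100 scale. The product identity does not hold for the aggregated figures (in the COMBINED row, 57.553 × 41.566 / 100 ≈ 23.92 rather than 32.857), which would be consistent with the values being averaged over classes, although this page does not state so.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for a single class, each in the range [0, 1].

    matched_ious: IoU of every matched (prediction, ground truth) pair;
                  by definition each value is greater than 0.5.
    num_fp:       number of unmatched predicted segments.
    num_fn:       number of unmatched ground-truth segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # No segments at all for this class; return zeros by convention here.
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0  # mean IoU of the matches
    rq = tp / denom                             # F1-style recognition term
    pq = sum(matched_ious) / denom              # equals sq * rq
    return pq, sq, rq


if __name__ == "__main__":
    pq, sq, rq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
    print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")
```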

Additional Information Used

Symbol	Description
Individual Image	Method uses individual images from each camera
Stitched Image	Method uses stitched images combined from the individual cameras
Pointcloud	Method uses 3D pointcloud data
Online Tracking	Method does frame-by-frame processing with no lookahead
Offline Tracking	Method does not do in-order frame processing
Public Detections	Method uses publicly available detections
Private Detections	Method uses its own private detections

Evaluation Measures [1]

Measure	Better	Perfect	Description
OSPA
OSPA	lower	0.0	OSPA is a set-based metric that directly measures the distance between two sets of mask tracks without a thresholding parameter [2,3].
OSPA Localization	lower	0.0	Captures prediction errors such as displacement, track ID switches, track fragmentation, and late track initiation or early termination [2,3].
OSPA Cardinality	lower	0.0	Captures the cardinality mismatch between the two sets, penalizing missed or false tracks without requiring an explicit definition of either [2,3].
PQ
PQ	higher	1.0	Measures how closely the matched segments agree with the ground truth [6].
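
The split of OSPA into localization and cardinality terms can be sketched as follows. This is a minimal illustration that assumes the order-1 OSPA with a base distance of 1 − IoU and a cutoff of c = 1; those parameter choices, the function names, and the representation of masks as sets of pixel coordinates are assumptions made here, not the benchmark's evaluation code. The optimal assignment is found by brute force, so the sketch only suits very small sets.

```python
from itertools import permutations


def iou_distance(a, b):
    """1 - IoU between two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    iou = len(a & b) / union if union else 1.0
    return 1.0 - iou


def ospa(pred, gt, c=1.0):
    """Return (OSPA, cardinality term, localization term) for two mask sets."""
    # The distance is symmetric, so make `pred` the smaller set.
    if len(pred) > len(gt):
        pred, gt = gt, pred
    m, n = len(pred), len(gt)
    if n == 0:
        return 0.0, 0.0, 0.0  # both sets are empty
    # Best total distance over all ways to assign the m masks to n masks.
    best = min(
        sum(min(c, iou_distance(x, y)) for x, y in zip(pred, perm))
        for perm in permutations(gt, m)
    )
    loc = best / n             # localization error of the matched masks
    card = c * (n - m) / n     # penalty for unmatched (missed or false) masks
    return loc + card, card, loc


if __name__ == "__main__":
    gt = [frozenset({(0, 0), (0, 1)}), frozenset({(5, 5)})]
    pred = [frozenset({(0, 0), (0, 1)})]
    print(ospa(pred, gt))
```

With one perfectly matched mask and one missed mask, the localization term is zero and the whole error comes from the cardinality term.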

Reference

[1] The style and content of the Evaluation Measures section are based on the MOT Challenge.
[2] Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi. JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments. In CVPR, 2024.
[3] Hamid Rezatofighi, Tran Thien Dat Nguyen, Ba-Ngu Vo, Ba-Tuong Vo, Silvio Savarese, and Ian Reid. How Trustworthy are Performance Evaluations for Basic Vision Tasks? IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2023.
[4] Keni Bernardin and Rainer Stiefelhagen. Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. Image and Video Processing, 2008(1):1-10, 2008.
[5] Yuan Li, Chang Huang and Ram Nevatia. Learning to Associate: HybridBoosted Multi-Target Tracker for Crowded Scene. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009.
[6] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollar. Panoptic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
