JRDB Leaderboard

OW Panoptic Segmentation Submissions

Name

OSPA↓

OSPA^K_Thing↓

OSPA^K_Stuff↓

OSPA^U_Thing↓

PQ↑

FC_Clip+

0.788

0.803

0.649

0.979

22.336

Anonymous Submission

Details

Name	FC_Clip+
Submission Date	2024-05-24 13:51:44+00:00
Abstract	Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a shared embedding space, which bridges the gap between closed-vocabulary and open-vocabulary recognition. Hence, existing methods often adopt a two-stage framework to tackle the problem, where the inputs first go through a mask generator and then through the CLIP model along with the predicted masks. This process involves extracting features from images multiple times, which can be ineffective and inefficient. By contrast, we propose to build everything into a single-stage framework using a shared Frozen Convolutional CLIP backbone, which not only significantly simplifies the current two-stage pipeline, but also remarkably yields a better accuracy-cost trade-off. The proposed FC-CLIP, benefits from the following observations: the frozen CLIP backbone maintains the ability of open-vocabulary classification and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to a larger input resolution than the one used during contrastive image-text pretraining. When training on COCO panoptic data only and testing in a zero-shot manner, FC-CLIP achieve 26.8 PQ, 16.8 AP, and 34.1 mIoU on ADE20K, 18.2 PQ, 27.9 mIoU on Mapillary Vistas, 44.0 PQ, 26.8 AP, 56.2 mIoU on Cityscapes, outperforming the prior art by +4.2 PQ, +2.4 AP, +4.2 mIoU on ADE20K, +4.0 PQ on Mapillary Vistas and +20.1 PQ on Cityscapes, respectively. Additionally, the training and testing time of FC-CLIP is 7.5x and 6.6x significantly faster than the same prior art, while using 5.9x fewer parameters. FC-CLIP also sets a new state-of-the-art performance across various open-vocabulary semantic segmentation datasets
Publication title	N/A	Publication authors:	N/A
Publication venue and year	N/A	Publication URL:	N/A
Code Language	N/A	Hardware	N/A
Code Website	N/A	Code URL	N/A

Visualisation

Best 3 By OSPA ↓

Generating tressider-2019-03-16_2, Come Back Later!

Worst 3 By OSPA ↓

Per-sequence Results

Sequence	OSPA	OSPA_CARD	OSPA_LOC	OSPA_KNOWN_THING	OSPA_UNKNOWN_THING	OSPA_STUFF	OSPA_SMALL	OSPA_MEDIUM	OSPA_LARGE
cubberly-auditorium-2019-04-22_1	0.806	0.619	0.186	0.837	0.999	0.61	0.911	0.512	0.423
discovery-walk-2019-02-28_0	0.712	0.513	0.199	0.718	1.0	0.652	0.828	0.561	0.472
discovery-walk-2019-02-28_1	0.766	0.573	0.192	0.789	1.0	0.635	0.838	0.514	0.423
food-trucks-2019-02-12_0	0.807	0.507	0.3	0.882	0.805	0.597	0.923	0.536	0.465
gates-ai-lab-2019-04-17_0	0.851	0.661	0.19	0.833	0.994	0.656	0.954	0.712	0.427
gates-basement-elevators-2019-01-17_0	0.751	0.534	0.217	0.723	0.961	0.661	0.746	0.486	0.284
gates-foyer-2019-01-17_0	0.823	0.629	0.194	0.818	0.908	0.754	0.903	0.613	0.494
gates-to-clark-2019-02-28_0	0.72	0.552	0.168	0.787	0.959	0.57	0.815	0.556	0.441
hewlett-class-2019-01-23_0	0.812	0.668	0.144	0.783	0.999	0.694	0.903	0.618	0.387
hewlett-class-2019-01-23_1	0.766	0.642	0.124	0.747	1.0	0.561	0.958	0.619	0.367
huang-2-2019-01-25_1	0.684	0.477	0.208	0.738	1.0	0.427	0.769	0.429	0.425
huang-intersection-2019-01-22_0	0.724	0.523	0.201	0.829	0.994	0.418	0.878	0.422	0.444
indoor-coupa-cafe-2019-02-06_0	0.895	0.703	0.192	0.887	0.986	0.832	0.881	0.699	0.526
lomita-serra-intersection-2019-01-30_0	0.709	0.522	0.187	0.752	0.92	0.479	0.862	0.466	0.379
meyer-green-2019-03-16_1	0.803	0.594	0.209	0.881	1.0	0.575	0.908	0.566	0.426
nvidia-aud-2019-01-25_0	0.851	0.63	0.221	0.85	0.997	0.799	0.891	0.595	0.561
nvidia-aud-2019-04-18_1	0.689	0.508	0.181	0.692	0.999	0.651	0.838	0.44	0.376
nvidia-aud-2019-04-18_2	0.789	0.516	0.273	0.852	1.0	0.634	0.899	0.583	0.535
outdoor-coupa-cafe-2019-02-06_0	0.843	0.615	0.227	0.854	1.0	0.781	0.915	0.583	0.527
quarry-road-2019-02-28_0	0.835	0.652	0.183	0.833	1.0	0.746	0.877	0.626	0.513
serra-street-2019-01-30_0	0.782	0.576	0.207	0.845	nan	0.64	0.851	0.494	0.501
stlc-111-2019-04-19_1	0.781	0.589	0.192	0.755	0.992	0.635	0.919	0.598	0.468
stlc-111-2019-04-19_2	0.831	0.645	0.186	0.779	0.999	0.706	0.936	0.632	0.463
tressider-2019-03-16_2	0.585	0.352	0.233	0.61	1.0	0.405	0.654	0.451	0.4
tressider-2019-04-26_0	0.899	0.715	0.184	0.885	0.986	0.801	0.899	0.654	0.443
tressider-2019-04-26_1	0.885	0.663	0.222	0.877	0.997	0.781	0.901	0.612	0.456
tressider-2019-04-26_3	0.868	0.649	0.22	0.854	0.972	0.833	0.936	0.619	0.437
COMBINED	0.788	0.586	0.202	0.803	0.979	0.649	0.874	0.563	0.447

Sequence	PQ	SQ	RQ	PQ_Things	SQ_Things	RQ_Things	PQ_Stuff	SQ_Stuff	RQ_Stuff
cubberly-auditorium-2019-04-22_1	20.877	50.036	26.669	16.664	46.312	20.27	34.718	62.271	47.694
discovery-walk-2019-02-28_0	30.932	55.849	41.016	31.544	65.666	40.396	29.606	34.58	42.361
discovery-walk-2019-02-28_1	24.551	53.174	31.829	20.397	50.395	24.638	33.899	59.427	48.008
food-trucks-2019-02-12_0	18.503	50.462	23.659	13.135	51.97	15.627	36.396	45.434	50.433
gates-ai-lab-2019-04-17_0	16.077	54.446	19.701	14.767	55.171	18.189	29.177	47.198	34.815
gates-basement-elevators-2019-01-17_0	27.803	54.871	36.453	26.758	58.981	33.872	33.288	33.288	50.0
gates-foyer-2019-01-17_0	19.277	64.134	24.742	19.316	69.115	23.861	19.118	44.21	28.27
gates-to-clark-2019-02-28_0	29.054	64.184	37.703	22.316	63.468	29.268	41.566	65.512	53.368
hewlett-class-2019-01-23_0	21.607	40.188	27.788	19.265	40.794	24.335	33.902	37.009	45.915
hewlett-class-2019-01-23_1	26.815	38.961	33.547	23.453	37.196	28.995	50.347	51.316	65.409
huang-2-2019-01-25_1	36.095	67.205	44.319	33.753	68.409	40.885	44.997	62.629	57.368
huang-intersection-2019-01-22_0	27.728	51.797	33.701	20.117	47.803	25.53	47.517	62.183	54.945
indoor-coupa-cafe-2019-02-06_0	11.722	47.075	15.555	11.606	50.905	14.923	12.248	29.84	18.396
lomita-serra-intersection-2019-01-30_0	30.85	53.116	39.844	23.514	49.145	30.264	49.713	63.328	64.478
meyer-green-2019-03-16_1	20.814	50.529	27.493	13.86	49.774	18.263	39.689	52.58	52.548
nvidia-aud-2019-01-25_0	15.694	47.039	20.166	16.615	48.713	20.293	12.699	41.599	19.755
nvidia-aud-2019-04-18_1	31.627	50.889	38.769	33.711	63.006	40.044	28.617	33.386	36.928
nvidia-aud-2019-04-18_2	21.615	43.156	29.437	16.983	45.361	22.194	31.54	38.431	44.958
outdoor-coupa-cafe-2019-02-06_0	15.277	55.452	19.942	16.117	56.644	20.615	13.128	52.405	18.222
quarry-road-2019-02-28_0	18.435	46.662	23.925	18.018	50.062	22.029	19.686	36.463	29.611
serra-street-2019-01-30_0	21.641	48.062	27.68	18.344	47.461	24.249	29.176	49.436	35.52
stlc-111-2019-04-19_1	22.582	54.944	28.233	22.529	56.394	27.66	22.768	49.869	30.239
stlc-111-2019-04-19_2	17.823	44.095	22.434	17.516	45.745	20.984	19.171	36.836	28.81
tressider-2019-03-16_2	41.236	60.774	51.978	34.048	60.392	41.339	55.613	61.536	73.256
tressider-2019-04-26_0	10.559	46.739	13.603	9.477	45.841	12.241	16.283	51.488	20.806
tressider-2019-04-26_1	10.772	45.026	13.977	10.19	50.439	13.113	13.294	21.57	17.722
tressider-2019-04-26_3	13.104	43.379	16.713	13.038	48.335	16.23	13.35	24.796	18.527
COMBINED	22.336	51.194	28.551	19.891	52.722	24.826	30.056	46.245	40.31

Symbol	Description
Individual Image	Method uses individual images from each camera
Stitched Image	Method uses stitched images combined from the individual cameras
Pointcloud	Method uses 3D pointcloud data
Online Tracking	Method does frame-by-frame processing with no lookahead
Offline Tracking	Method does not do in-order frame processing
Public Detections	Method uses publicly available detections
Private Detections	Method uses its own private detections

Measure	Better	Perfect	Description
OSPA
OSPA	lower	0.0	OSPA is a set-based metric which can directly capture a distance, between two sets of mask tracks without a thresholding parameter [2,3].
OSPA Localization	lower	0.0	Representing prediction error such as the displacement, track ID switches, track fragmentation or even track late initiation/early termination [2,3].
OSPA Cardinality	lower	0.0	Representing cardinality mismatch between two sets, penalizing missed or false tracks without an explicit definition for them [2,3].
PQ
PQ	higher	1.0	Measure how closely matched segments are with the ground truths [6]

Leaderboard

Instructions

OW Panoptic Segmentation Submissions

Additional Information Used

Evaluation Measures [1]

Reference