Panoptic Segmentation and Tracking

Segmentation Classes

There are 71 object classes in JRDB-PanoTrack, which are classified into 60 thing (such as pedestrians, cars, and laptops) and 11 stuff (like sky and walls) classes. In the figure below, we show all object classes in train and test sets, together with mask counts.

Closed and Open-world settings

 Closed-world settings
1.  pedestrian
2.  chair/sofa
3.  bag
4.  table
5.  machines
6.  door
7.  hanging_light
8.  glass
9.  vegetation
10. board
11. sign
12. wall
13. barrier/fence
14. floor/walking_path
15. ceiling
16. bicycle/scooter
17. light_pole
18. trash_bin
19. pole/trafficcone
20. building
21. tableware
22. column
23. cabinet
24. shelf
25. window
26. terrain
27. stair
28. big_socket/button
29. television
30. sky
31. waterbottle
32. picture_frames
33. box
34. car
35. monitor
36. bicyclist/rider
37. controller
38. manhole
39. jacket
40. decoration
41. standee
42. road
43. phone

Open-world settings
1.  poster
2.  store_sign
3.  carpet
4.  trolley
5.  other
6.  document
7.  child
8.  airpot
9.  cargo
10. peripheral
11. vent
12. skateboard/segway/hoverboard
13. clock
14. door_handle
15. statues
16. helmet
17. curtain
18. golfcart
19. fountain
20. wall_panel
21. emergency_pole
22. accessory
23. crutch
24. big_vehicle
25. fire_extinguisher
26. umbrella
27. lift
28. animal
29. ladder

Annotation Format

For each scene, we provide a .json file with the COCO-style annotations dictionary for that scene. The segmentation data itself is contained in the annotations list, which looks like the following. Annotations with body pose information use category_id=2.

"categories": [
                {'id': 1, 'name': 'road', 'isthing': True},
                {'id': 2, 'name': 'terrain', 'isthing': True},
                ...
            ],
"images": [
                {'id': 1, 'file_name': 'sequence_name/000000.jpg', 'width': ..., 'height': ...},
                {'id': 2, 'file_name': 'sequence_name/000001.jpg', 'width': ..., 'height': ...},
                ...
            ],
"annotations": [
                {
                'id': 1,
                'image_id': 1,
                'category_id': 1,
                'segmentation': {
                    'counts': '...RLE encoded mask...',
                    'size': [height, width]
                },
                'bbox': [x, y, w, h],
                'area': 10000,
                'iscrowd': 0,
                'attributes': {
                    'occluded': False,
                    'tracking_id': 1, # track id for thing classes only
                    'floor_id': 1, # floor id if the class is 'floor', we track different floor in JRDB-PanoTrack
                },
                ...
            ]

Annotation Details

These annotations were collected over a period of several months. The annotation process starts with an unlimited list of classes which can be extended by all annotators, any clearly visible and semantically meaningful objects would be annotated. We generally annotate all visible objects in a scene that have bounding box area at least 100 pixels; below this, we find that the objects are too small to be semantically meaningful. While JRDB videos are 15fps, we annotate at 1fps (1 frame per second). These annotations are provided as JRDB-PanoTrack.

Panoptic Tracking

Every individual segmentation annotation comes with a tracking_id property, which remains consistent for the person across the video sequence and accross multiple camera views.

Evaluation Protocol

For Panoptic Segmentation (Closed and Open-world), we evaluate the submissions using PQ (Panoptic Quality) and OSPA (O_PS) metrics. For Panoptic Tracking, we use STQ (Segmentation Tracking Quality) and OSPA (O_PT²) metrics. OSPA and OSPA² will be the main metrics for the challenge.

There will be 4 tracks in the challenge : Open-world Panoptic Segmentation, Closed-world Panoptic Segmentation, Open-world Panoptic Tracking, and Closed-world Panoptic Tracking. Participants can submit to any or all of the tracks.

All methods are evaluated on stitched images of the entire scene, which are provided in the dataset. Participants are encouraged to train their models on stitched images. However, they can also use the individual camera views for training, we will provide the camera calibration parameters for each camera and the stitching code in the toolkit.