Pre-training for Video Understanding Challenge

Track 1 Leaderboard

Rank  Team Name      BLEU@4  METEOR  CIDEr-D  SPICE
1     CASIA_IVA       26.13   20.86    35.09   7.85
2     Gene            23.67   19.63    31.19   7.52
3     aimc_21         20.66   20.13    30.18   7.40
4     Nameless        22.80   18.87    27.95   6.40
5     Micro Genius    20.93   17.34    24.42   5.60
6     MSVLPT          21.26   17.10    23.35   5.50
7     tsinghua_hhh     7.98   13.90    17.28   5.16

Track 2 Leaderboard

Rank  Team Name      Top-1 Accuracy (%)
1     Silver_Bullet  62.28
2     MSVLPT         56.77
3     sunny_flower   54.33
4     ethan          53.66
5     ghost_rider    50.83

Metrics

For the downstream task of video captioning, we will evaluate submissions on the testing set of the MSR-VTT dataset with the automatic metrics BLEU@4, METEOR, CIDEr-D, and SPICE, and publish the results on the leaderboard.
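
For reference, the sketch below shows how these four metrics are commonly computed with the pycocoevalcap package (the COCO caption evaluation toolkit). This is an assumption about tooling, not the challenge's official scoring script; the video IDs and captions in the example are hypothetical, and the toolkit's Cider scorer is taken here to correspond to the CIDEr-D numbers on the leaderboard.

```python
# Minimal sketch of caption-metric computation with pycocoevalcap
# (pip install pycocoevalcap). Hypothetical illustration; the official
# challenge evaluation script may differ. METEOR and SPICE shell out to Java.
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

def evaluate_captions(references, predictions):
    """references/predictions: dict video_id -> list of caption strings."""
    # The toolkit expects {"caption": ...} entries before tokenization.
    gts = {vid: [{"caption": c} for c in caps] for vid, caps in references.items()}
    res = {vid: [{"caption": c} for c in caps] for vid, caps in predictions.items()}
    tokenizer = PTBTokenizer()
    gts, res = tokenizer.tokenize(gts), tokenizer.tokenize(res)

    scores = {}
    bleu, _ = Bleu(4).compute_score(gts, res)   # returns BLEU@1..BLEU@4
    scores["BLEU@4"] = bleu[3]
    scores["METEOR"], _ = Meteor().compute_score(gts, res)
    # Assumption: this Cider scorer matches the leaderboard's CIDEr-D column.
    scores["CIDEr-D"], _ = Cider().compute_score(gts, res)
    scores["SPICE"], _ = Spice().compute_score(gts, res)
    return scores

# Hypothetical example: one video, two reference captions, one prediction.
refs = {"video0": ["a man is playing a guitar", "someone plays guitar"]}
preds = {"video0": ["a man plays a guitar"]}
print(evaluate_captions(refs, preds))
```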

For the downstream task of video categorization, we will report top-1 accuracy on the testing set of the downstream dataset.
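
Top-1 accuracy is the fraction of videos whose highest-scoring predicted class matches the ground-truth label. The NumPy sketch below is a hypothetical illustration of that computation, not the official scoring script.

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction equals the true label.

    logits: (num_videos, num_classes) array of class scores.
    labels: (num_videos,) array of ground-truth class indices.
    """
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))

# Hypothetical example: 3 videos, 4 classes; 2 of 3 argmax predictions
# match their labels, so top-1 accuracy is 66.67%.
logits = np.array([[0.10, 0.70, 0.10, 0.10],
                   [0.30, 0.20, 0.40, 0.10],
                   [0.25, 0.25, 0.40, 0.10]])
labels = np.array([1, 2, 0])
print(top1_accuracy(logits, labels) * 100)  # reported as a percentage
```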



Citations

@article{autogif2020,
  title={Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training},
  author={Yingwei Pan and Yehao Li and Jianjie Luo and Jun Xu and Ting Yao and Tao Mei},
  journal={arXiv preprint arXiv:2007.02375},
  year={2020}
}

@inproceedings{msrvtt,
  title={MSR-VTT: A Large Video Description Dataset for Bridging Video and Language},
  author={Jun Xu and Tao Mei and Ting Yao and Yong Rui},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016}
}