Pre-training for Video Captioning Challenge

Leaderboard (Top-3 Winners)

| # | Team Name  | Affiliation | BLEU@4 | METEOR | CIDEr-D | SPICE |
|---|------------|-------------|--------|--------|---------|-------|
| 1 | Old Boys   | Tsinghua University; Beijing University of Posts and Telecommunications; Shanghai Ocean University | 21.14 | 17.38 | 24.42 | 5.65 |
| 2 | sysu-cs    | Sun Yat-sen University | 20.41 | 17.02 | 23.80 | 5.39 |
| 3 | IVIPC-King | University of Electronic Science and Technology of China | 18.24 | 16.46 | 21.36 | 5.25 |


We computed multiple common metrics, including BLEU@4, METEOR, CIDEr-D, and SPICE. The best-performing run from each team is used for comparison across teams.


The competition ranking combines the results across all four metrics. Specifically, teams are sorted by their score on each metric, producing four ranked lists. The final rank of a team is the sum of its positions in the four lists:
  R(team) = R(team)@BLEU@4 + R(team)@METEOR + R(team)@CIDEr-D + R(team)@SPICE,
where R(team)@metric denotes the team's position in the ranked list for that metric; e.g., if the team achieves the best performance in terms of BLEU@4, then R(team)@BLEU@4 is 1. The smaller the final rank, the better the performance.
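As a sketch, the combined ranking above can be computed from the leaderboard scores as follows. The scores are taken from the table above; the source does not specify a tie-breaking rule, so this sketch simply assigns positions in sort order (ties get distinct consecutive positions).

```python
def final_rank_scores(scores):
    """Sum each team's rank position across all metrics (lower is better)."""
    metrics = next(iter(scores.values())).keys()
    total = {team: 0 for team in scores}
    for m in metrics:
        # Higher metric score = better = smaller rank position.
        ranked = sorted(scores, key=lambda t: scores[t][m], reverse=True)
        for pos, team in enumerate(ranked, start=1):
            total[team] += pos
    return total

# Top-3 scores from the leaderboard above.
scores = {
    "Old Boys":   {"BLEU@4": 21.14, "METEOR": 17.38, "CIDEr-D": 24.42, "SPICE": 5.65},
    "sysu-cs":    {"BLEU@4": 20.41, "METEOR": 17.02, "CIDEr-D": 23.80, "SPICE": 5.39},
    "IVIPC-King": {"BLEU@4": 18.24, "METEOR": 16.46, "CIDEr-D": 21.36, "SPICE": 5.25},
}

print(final_rank_scores(scores))
# Old Boys leads on every metric, so its final rank is 1+1+1+1 = 4.
```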


@article{autogif2020,
  title={Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training},
  author={Yingwei Pan and Yehao Li and Jianjie Luo and Jun Xu and Ting Yao and Tao Mei},
  journal={arXiv preprint arXiv:2007.02375},
  year={2020}
}

@inproceedings{msrvtt,
  title={MSR-VTT: A Large Video Description Dataset for Bridging Video and Language},
  author={Jun Xu and Tao Mei and Ting Yao and Yong Rui},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016}
}