---
license: cc-by-nc-4.0
tags:
- vision
- video-classification
pipeline_tag: video-classification
---

# VideoMAE-v2 (Huge-sized model, pre-trained on UnlabeledHybrid-1M)

The VideoMAEv2-Huge model was pre-trained for 1200 epochs in a self-supervised way on the UnlabeledHybrid-1M dataset. It was introduced in the paper [[CVPR23] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking](https://arxiv.org/abs/2303.16727) by Wang et al. and first released on [GitHub](https://github.com/OpenGVLab/VideoMAEv2).

## Intended uses & limitations

You can use the raw model for video feature extraction.

### How to use

Here is how to use this model to extract video features:
21
```python
from transformers import VideoMAEImageProcessor, AutoModel, AutoConfig
import numpy as np
import torch

config = AutoConfig.from_pretrained("OpenGVLab/VideoMAEv2-Huge", trust_remote_code=True)
processor = VideoMAEImageProcessor.from_pretrained("OpenGVLab/VideoMAEv2-Huge")
model = AutoModel.from_pretrained("OpenGVLab/VideoMAEv2-Huge", config=config, trust_remote_code=True)

# A dummy clip of 16 frames, each with shape (channels, height, width) = (3, 224, 224).
video = list(np.random.rand(16, 3, 224, 224))

inputs = processor(video, return_tensors="pt")
# The processor returns pixel_values of shape (B, T, C, H, W);
# the model expects (B, C, T, H, W), so swap the time and channel axes.
inputs["pixel_values"] = inputs["pixel_values"].permute(0, 2, 1, 3, 4)

with torch.no_grad():
    outputs = model(**inputs)
```
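The exact structure of what `model(**inputs)` returns depends on the repository's remote modeling code, so the snippet below is a minimal sketch rather than the official API. It assumes the forward pass yields token-level embeddings of shape `(batch, num_tokens, hidden_dim)`, either as a bare tensor or inside a `last_hidden_state`-style output, and mean-pools them into one clip-level feature vector.

```python
# Minimal sketch (the output layout below is an assumption, not the official API):
# unwrap a ModelOutput/tuple if present, otherwise use the returned tensor directly.
features = getattr(outputs, "last_hidden_state", None)
if features is None:
    features = outputs[0] if isinstance(outputs, (tuple, list)) else outputs

# Mean-pool over the token dimension to get one feature vector per clip.
clip_feature = features.mean(dim=1)
print(clip_feature.shape)  # e.g. (1, hidden_dim) for a single clip
```

Pooled features like this can then be fed to a lightweight downstream head (for example, a linear classifier) without fine-tuning the backbone.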
### BibTeX entry and citation info

```bibtex
@InProceedings{wang2023videomaev2,
    author    = {Wang, Limin and Huang, Bingkun and Zhao, Zhiyu and Tong, Zhan and He, Yinan and Wang, Yi and Wang, Yali and Qiao, Yu},
    title     = {VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {14549-14560}
}

@misc{videomaev2,
    title={VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking},
    author={Limin Wang and Bingkun Huang and Zhiyu Zhao and Zhan Tong and Yinan He and Yi Wang and Yali Wang and Yu Qiao},
    year={2023},
    eprint={2303.16727},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```