---
license: cc-by-nc-4.0
library_name: timm
tags:
- image-classification
- timm
- transformers
datasets:
- imagenet-1k
- imagenet-22k
---
# Model card for convnextv2_nano.fcmae_ft_in22k_in1k

A ConvNeXt-V2 image classification model. Pretrained with a fully convolutional masked autoencoder framework (FCMAE) and fine-tuned on ImageNet-22k and then ImageNet-1k.

## Model Details
- **Model Type:** Image classification / feature backbone
- **Model Stats:**
  - Params (M): 15.6 (see the check below)
  - GMACs: 2.5
  - Activations (M): 8.4
  - Image size: train = 224 x 224, test = 288 x 288
- **Papers:**
  - ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders: https://arxiv.org/abs/2301.00808
- **Original:** https://github.com/facebookresearch/ConvNeXt-V2
- **Dataset:** ImageNet-1k
- **Pretrain Dataset:** ImageNet-22k
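
To sanity-check the parameter count above, you can count parameters directly (a minimal sketch; the weights are not needed just to count, so `pretrained=False` keeps it offline):

```python
import timm

model = timm.create_model('convnextv2_nano.fcmae_ft_in22k_in1k', pretrained=False)

# sum over all parameter tensors; should print roughly 15.6M for this variant
num_params = sum(p.numel() for p in model.parameters())
print(f'{num_params / 1e6:.1f}M params')
```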

## Model Usage
### Image Classification
```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('convnextv2_nano.fcmae_ft_in22k_in1k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
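
The snippet above leaves you with raw ImageNet-1k class indices; mapping them to names requires a separate index-to-label file, which this card does not ship. A minimal follow-up sketch to inspect the results as-is:

```python
# continuing from the classification snippet above
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    # idx is an ImageNet-1k class index, prob is already scaled to percent
    print(f'class index {idx.item():4d}: {prob.item():5.2f}%')
```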

### Feature Map Extraction
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnextv2_nano.fcmae_ft_in22k_in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 80, 56, 56])
    #  torch.Size([1, 160, 28, 28])
    #  torch.Size([1, 320, 14, 14])
    #  torch.Size([1, 640, 7, 7])

    print(o.shape)
```
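
If you only need some of the stages, `features_only` models can be created with a subset of outputs, and `model.feature_info` describes what each output carries. A minimal sketch, assuming current `timm` behavior for `out_indices` and `feature_info`:

```python
import timm

# keep only the last two of the four feature stages shown above
model = timm.create_model(
    'convnextv2_nano.fcmae_ft_in22k_in1k',
    pretrained=True,
    features_only=True,
    out_indices=(2, 3),
)
model = model.eval()

# channel counts and downsampling factors of the selected outputs
print(model.feature_info.channels())   # e.g. [320, 640]
print(model.feature_info.reduction())  # e.g. [16, 32]
```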

### Image Embeddings
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnextv2_nano.fcmae_ft_in22k_in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 640, 7, 7) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
```
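
A common use of the pooled embedding is image-to-image similarity. A minimal sketch on top of the embedding snippet above (it reuses the same `img` twice purely for illustration, so the similarity is trivially close to 1.0):

```python
import torch.nn.functional as F

# embed two images with the classifier-free (num_classes=0) model from above
emb_a = model(transforms(img).unsqueeze(0))
emb_b = model(transforms(img).unsqueeze(0))

# L2-normalize, then cosine similarity via dot product
emb_a = F.normalize(emb_a, dim=-1)
emb_b = F.normalize(emb_b, dim=-1)
print((emb_a * emb_b).sum(dim=-1))  # ~1.0 for identical inputs
```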

## Model Comparison
Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).

All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 with AMP.
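
The table below is a static snapshot; for current numbers you can load the results CSVs directly. A minimal sketch, assuming the `results-imagenet.csv` layout in the timm repository and an installed `pandas`:

```python
import pandas as pd

# read timm's ImageNet-1k validation results straight from GitHub
url = ('https://raw.githubusercontent.com/huggingface/pytorch-image-models/'
       'main/results/results-imagenet.csv')
df = pd.read_csv(url)
print(df[df['model'].str.contains('convnextv2_nano')])
```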

| model |top1 |top5 |img_size|param_count|gmacs |macts |samples_per_sec|batch_size|
|------------------------------------------------------------------------------------------------------------------------------|------|------|--------|-----------|------|------|---------------|----------|
| [convnextv2_huge.fcmae_ft_in22k_in1k_512](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in22k_in1k_512) |88.848|98.742|512 |660.29 |600.81|413.07|28.58 |48 |
| [convnextv2_huge.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in22k_in1k_384) |88.668|98.738|384 |660.29 |337.96|232.35|50.56 |64 |
| [convnext_xxlarge.clip_laion2b_soup_ft_in1k](https://huggingface.co/timm/convnext_xxlarge.clip_laion2b_soup_ft_in1k) |88.612|98.704|256 |846.47 |198.09|124.45|122.45 |256 |
| [convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384](https://huggingface.co/timm/convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384) |88.312|98.578|384 |200.13 |101.11|126.74|196.84 |256 |
| [convnextv2_large.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in22k_in1k_384) |88.196|98.532|384 |197.96 |101.1 |126.74|128.94 |128 |
| [convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320](https://huggingface.co/timm/convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320) |87.968|98.47 |320 |200.13 |70.21 |88.02 |283.42 |256 |
| [convnext_xlarge.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_xlarge.fb_in22k_ft_in1k_384) |87.75 |98.556|384 |350.2 |179.2 |168.99|124.85 |192 |
| [convnextv2_base.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in22k_in1k_384) |87.646|98.422|384 |88.72 |45.21 |84.49 |209.51 |256 |
| [convnext_large.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_large.fb_in22k_ft_in1k_384) |87.476|98.382|384 |197.77 |101.1 |126.74|194.66 |256 |
| [convnext_large_mlp.clip_laion2b_augreg_ft_in1k](https://huggingface.co/timm/convnext_large_mlp.clip_laion2b_augreg_ft_in1k) |87.344|98.218|256 |200.13 |44.94 |56.33 |438.08 |256 |
| [convnextv2_large.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in22k_in1k) |87.26 |98.248|224 |197.96 |34.4 |43.13 |376.84 |256 |
| [convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384](https://huggingface.co/timm/convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384) |87.138|98.212|384 |88.59 |45.21 |84.49 |365.47 |256 |
| [convnext_xlarge.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_xlarge.fb_in22k_ft_in1k) |87.002|98.208|224 |350.2 |60.98 |57.5 |368.01 |256 |
| [convnext_base.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_base.fb_in22k_ft_in1k_384) |86.796|98.264|384 |88.59 |45.21 |84.49 |366.54 |256 |
| [convnextv2_base.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in22k_in1k) |86.74 |98.022|224 |88.72 |15.38 |28.75 |624.23 |256 |
| [convnext_large.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_large.fb_in22k_ft_in1k) |86.636|98.028|224 |197.77 |34.4 |43.13 |581.43 |256 |
| [convnext_base.clip_laiona_augreg_ft_in1k_384](https://huggingface.co/timm/convnext_base.clip_laiona_augreg_ft_in1k_384) |86.504|97.97 |384 |88.59 |45.21 |84.49 |368.14 |256 |
| [convnext_base.clip_laion2b_augreg_ft_in12k_in1k](https://huggingface.co/timm/convnext_base.clip_laion2b_augreg_ft_in12k_in1k) |86.344|97.97 |256 |88.59 |20.09 |37.55 |816.14 |256 |
| [convnextv2_huge.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_huge.fcmae_ft_in1k) |86.256|97.75 |224 |660.29 |115.0 |79.07 |154.72 |256 |
| [convnext_small.in12k_ft_in1k_384](https://huggingface.co/timm/convnext_small.in12k_ft_in1k_384) |86.182|97.92 |384 |50.22 |25.58 |63.37 |516.19 |256 |
| [convnext_base.clip_laion2b_augreg_ft_in1k](https://huggingface.co/timm/convnext_base.clip_laion2b_augreg_ft_in1k) |86.154|97.68 |256 |88.59 |20.09 |37.55 |819.86 |256 |
| [convnext_base.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_base.fb_in22k_ft_in1k) |85.822|97.866|224 |88.59 |15.38 |28.75 |1037.66 |256 |
| [convnext_small.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_small.fb_in22k_ft_in1k_384) |85.778|97.886|384 |50.22 |25.58 |63.37 |518.95 |256 |
| [convnextv2_large.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_large.fcmae_ft_in1k) |85.742|97.584|224 |197.96 |34.4 |43.13 |375.23 |256 |
| [convnext_small.in12k_ft_in1k](https://huggingface.co/timm/convnext_small.in12k_ft_in1k) |85.174|97.506|224 |50.22 |8.71 |21.56 |1474.31 |256 |
| [convnext_tiny.in12k_ft_in1k_384](https://huggingface.co/timm/convnext_tiny.in12k_ft_in1k_384) |85.118|97.608|384 |28.59 |13.14 |39.48 |856.76 |256 |
| [convnextv2_tiny.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in22k_in1k_384) |85.112|97.63 |384 |28.64 |13.14 |39.48 |491.32 |256 |
| [convnextv2_base.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_base.fcmae_ft_in1k) |84.874|97.09 |224 |88.72 |15.38 |28.75 |625.33 |256 |
| [convnext_small.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_small.fb_in22k_ft_in1k) |84.562|97.394|224 |50.22 |8.71 |21.56 |1478.29 |256 |
| [convnext_large.fb_in1k](https://huggingface.co/timm/convnext_large.fb_in1k) |84.282|96.892|224 |197.77 |34.4 |43.13 |584.28 |256 |
| [convnext_tiny.in12k_ft_in1k](https://huggingface.co/timm/convnext_tiny.in12k_ft_in1k) |84.186|97.124|224 |28.59 |4.47 |13.44 |2433.7 |256 |
| [convnext_tiny.fb_in22k_ft_in1k_384](https://huggingface.co/timm/convnext_tiny.fb_in22k_ft_in1k_384) |84.084|97.14 |384 |28.59 |13.14 |39.48 |862.95 |256 |
| [convnextv2_tiny.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in22k_in1k) |83.894|96.964|224 |28.64 |4.47 |13.44 |1452.72 |256 |
| [convnext_base.fb_in1k](https://huggingface.co/timm/convnext_base.fb_in1k) |83.82 |96.746|224 |88.59 |15.38 |28.75 |1054.0 |256 |
| [convnextv2_nano.fcmae_ft_in22k_in1k_384](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in22k_in1k_384) |83.37 |96.742|384 |15.62 |7.22 |24.61 |801.72 |256 |
| [convnext_small.fb_in1k](https://huggingface.co/timm/convnext_small.fb_in1k) |83.142|96.434|224 |50.22 |8.71 |21.56 |1464.0 |256 |
| [convnextv2_tiny.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_tiny.fcmae_ft_in1k) |82.92 |96.284|224 |28.64 |4.47 |13.44 |1425.62 |256 |
| [convnext_tiny.fb_in22k_ft_in1k](https://huggingface.co/timm/convnext_tiny.fb_in22k_ft_in1k) |82.898|96.616|224 |28.59 |4.47 |13.44 |2480.88 |256 |
| [convnext_nano.in12k_ft_in1k](https://huggingface.co/timm/convnext_nano.in12k_ft_in1k) |82.282|96.344|224 |15.59 |2.46 |8.37 |3926.52 |256 |
| [convnext_tiny_hnf.a2h_in1k](https://huggingface.co/timm/convnext_tiny_hnf.a2h_in1k) |82.216|95.852|224 |28.59 |4.47 |13.44 |2529.75 |256 |
| [convnext_tiny.fb_in1k](https://huggingface.co/timm/convnext_tiny.fb_in1k) |82.066|95.854|224 |28.59 |4.47 |13.44 |2346.26 |256 |
| [convnextv2_nano.fcmae_ft_in22k_in1k](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in22k_in1k) |82.03 |96.166|224 |15.62 |2.46 |8.37 |2300.18 |256 |
| [convnextv2_nano.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_nano.fcmae_ft_in1k) |81.83 |95.738|224 |15.62 |2.46 |8.37 |2321.48 |256 |
| [convnext_nano_ols.d1h_in1k](https://huggingface.co/timm/convnext_nano_ols.d1h_in1k) |80.866|95.246|224 |15.65 |2.65 |9.38 |3523.85 |256 |
| [convnext_nano.d1h_in1k](https://huggingface.co/timm/convnext_nano.d1h_in1k) |80.768|95.334|224 |15.59 |2.46 |8.37 |3915.58 |256 |
| [convnextv2_pico.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_pico.fcmae_ft_in1k) |80.304|95.072|224 |9.07 |1.37 |6.1 |3274.57 |256 |
| [convnext_pico.d1_in1k](https://huggingface.co/timm/convnext_pico.d1_in1k) |79.526|94.558|224 |9.05 |1.37 |6.1 |5686.88 |256 |
| [convnext_pico_ols.d1_in1k](https://huggingface.co/timm/convnext_pico_ols.d1_in1k) |79.522|94.692|224 |9.06 |1.43 |6.5 |5422.46 |256 |
| [convnextv2_femto.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_femto.fcmae_ft_in1k) |78.488|93.98 |224 |5.23 |0.79 |4.57 |4264.2 |256 |
| [convnext_femto_ols.d1_in1k](https://huggingface.co/timm/convnext_femto_ols.d1_in1k) |77.86 |93.83 |224 |5.23 |0.82 |4.87 |6910.6 |256 |
| [convnext_femto.d1_in1k](https://huggingface.co/timm/convnext_femto.d1_in1k) |77.454|93.68 |224 |5.22 |0.79 |4.57 |7189.92 |256 |
| [convnextv2_atto.fcmae_ft_in1k](https://huggingface.co/timm/convnextv2_atto.fcmae_ft_in1k) |76.664|93.044|224 |3.71 |0.55 |3.81 |4728.91 |256 |
| [convnext_atto_ols.a2_in1k](https://huggingface.co/timm/convnext_atto_ols.a2_in1k) |75.88 |92.846|224 |3.7 |0.58 |4.11 |7963.16 |256 |
| [convnext_atto.d2_in1k](https://huggingface.co/timm/convnext_atto.d2_in1k) |75.664|92.9 |224 |3.7 |0.55 |3.81 |8439.22 |256 |

## Citation
```bibtex
@article{Woo2023ConvNeXtV2,
  title={ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders},
  author={Sanghyun Woo and Shoubhik Debnath and Ronghang Hu and Xinlei Chen and Zhuang Liu and In So Kweon and Saining Xie},
  year={2023},
  journal={arXiv preprint arXiv:2301.00808},
}
```
```bibtex
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
```