---
license: mit
base_model:
- timm/eva02_base_patch14_448.mim_in22k_ft_in22k_in1k
pipeline_tag: image-classification
tags:
- pytorch
- transformers
---

# EVA-based Fast NSFW Image Classifier

## Table of Contents
- [Model Description](#model-description)
- [Try it Online!](#try-it-online-)
- [Model Performance Comparison](#model-performance-comparison)
  - [Global Performance](#global-performance)
  - [Accuracy by AI Content](#accuracy-by-ai-content)
    - [AI-Generated Content](#ai-generated-content)
    - [Non-AI-Generated Content](#non-ai-generated-content)
- [Usage](#usage)
  - [Quick Start via pip](#quick-start-via-pip)
  - [Quick Start with Pipeline](#quick-start-with-pipeline)
  - [Avoid installation of pip dependency](#avoid-installation-of-pip-dependency)
- [Training](#training)
- [Speed and Memory Metrics](#speed-and-memory-metrics)

## Model Description

This model is a vision transformer based on the **EVA architecture**, fine-tuned for **NSFW content classification**. It has been trained to classify visual content into **four categories** (neutral, low, medium, high) using **100,000 synthetically labeled images**.

The model can be used as a **binary (true/false) classifier**, or you can obtain the **full output probabilities**. In our internal benchmarks it **outperforms other excellent publicly available models** such as [Falconsai/nsfw_image_detection](https://huggingface.co/Falconsai/nsfw_image_detection) and [AdamCodd/vit-base-nsfw-detector](https://huggingface.co/AdamCodd/vit-base-nsfw-detector), with the added benefit of letting you choose the NSFW level that suits your use case.

## Try it Online! 🚀

You can try this model directly in your browser through our [Hugging Face Space](https://huggingface.co/spaces/ccabrerafreepik/nsfw_image_detector). Upload any image and get instant NSFW classification results without any installation required.

## Model Performance Comparison

### Global Performance

| Category | Freepik | Falconsai | AdamCodd |
|----------|---------|-----------|----------|
| High     | 99.54%  | 97.92%    | 98.62%   |
| Medium   | 97.02%  | 78.54%    | 91.65%   |
| Low      | 98.31%  | 31.25%    | 89.66%   |
| Neutral  | 99.87%  | 99.27%    | 98.37%   |

In the table above, the results are obtained as follows (a minimal sketch of this scoring is shown after the list):

* For the **Falconsai and AdamCodd** models:
  * A prediction is considered correct if the image is labeled "low", "medium", or "high" and the model returns true.
  * If the label is "neutral", the correct output is false.

* For the **Freepik model**:
  * If the image is labeled "low", "medium", or "high", the model should return at least "low".
  * If the label is "neutral", the correct output is "neutral".

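
For clarity, here is a minimal sketch (an assumption for illustration, not the original evaluation script) of how a single prediction is scored as correct under these two criteria:

```python
def is_correct_binary(true_label: str, predicted_nsfw: bool) -> bool:
    """Scoring for the binary Falconsai / AdamCodd models."""
    # Any non-neutral image should be flagged as NSFW; neutral images should not.
    if true_label in {"low", "medium", "high"}:
        return predicted_nsfw
    return not predicted_nsfw

def is_correct_freepik(true_label: str, predicted_label: str) -> bool:
    """Scoring for the Freepik model."""
    # Non-neutral images should be predicted as at least "low";
    # neutral images should be predicted as "neutral".
    if true_label == "neutral":
        return predicted_label == "neutral"
    return predicted_label in {"low", "medium", "high"}
```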
**Conclusions:**

* Our model **outperforms AdamCodd and Falconsai in accuracy**. A direct comparison is entirely fair on the "high" and "neutral" labels, since both of those models are binary classifiers.
* Our model **offers greater granularity**. It is not only suitable for detecting "high" and "neutral" content, but also performs excellently at identifying "low" and "medium" NSFW content.
* Falconsai flags some "medium" and "low" images as NSFW but marks others as safe for work (SFW), which can lead to unexpected results.
* AdamCodd classifies both "low" and "medium" categories as NSFW, which may not be desirable depending on your use case. Furthermore, around 10% of "low" and "medium" images are considered SFW.

### Accuracy by AI Content

We have created a **manually labeled dataset** with careful attention to **avoiding biases** (gender, ethnicity, etc.). While the sample size is relatively small, it provides meaningful insights into model performance across different scenarios and was very useful during the training process for avoiding biases.

The following tables show detection accuracy percentages across different NSFW categories and content types:

#### AI-Generated Content

| Category | Freepik Model | Falconsai Model | AdamCodd Model |
|----------|---------------|-----------------|----------------|
| High     | 100.00%       | 84.00%          | 92.00%         |
| Medium   | 96.15%        | 69.23%          | 96.00%         |
| Low      | 100.00%       | 35.71%          | 92.86%         |
| Neutral  | 100.00%       | 100.00%         | 66.67%         |

**Conclusions:**
* **Avoid using Falconsai for AI-generated content** to prevent prediction errors.
* **Our model is the best option for detecting NSFW content in AI-generated images**.

## Usage

### Quick Start via pip

```sh
pip install nsfw-image-detector
```

```python
from PIL import Image
from nsfw_image_detector import NSFWDetector
import torch

# Initialize the detector
detector = NSFWDetector(dtype=torch.bfloat16, device="cuda")

# Load and classify an image
image = Image.open("path/to/your/image.jpg")

# Check whether the image contains NSFW content at sensitivity level "medium" or higher
is_nsfw = detector.is_nsfw(image, "medium")

# Get probability scores for all categories
probabilities = detector.predict_proba(image)
print(f"Is NSFW: {is_nsfw}")
print(f"Probabilities: {probabilities}")
```

Example output:
```python
Is NSFW: False
Probabilities:
[
    {<NSFWLevel.HIGH: 'high'>: 0.00372314453125,
     <NSFWLevel.MEDIUM: 'medium'>: 0.1884765625,
     <NSFWLevel.LOW: 'low'>: 0.234375,
     <NSFWLevel.NEUTRAL: 'neutral'>: 0.765625}
]
```

### Quick Start with Pipeline

```python
from transformers import pipeline
from PIL import Image

# Create classifier pipeline
classifier = pipeline(
    "image-classification",
    model="Freepik/nsfw_image_detector",
    device=0  # Use GPU (0) or CPU (-1)
)

# Load and classify an image
image = Image.open("path/to/your/image.jpg")
predictions = classifier(image)
print(predictions)
```

Example output:
```python
[
    {'label': 'neutral', 'score': 0.92},
    {'label': 'low', 'score': 0.05},
    {'label': 'medium', 'score': 0.02},
    {'label': 'high', 'score': 0.01}
]
```
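
The pipeline returns independent per-label scores. If you want a binary decision at a given NSFW level from this output, a minimal sketch (an assumption for illustration, not part of the published package) is to sum the scores of the chosen level and all higher levels, mirroring the cumulative logic described later in this document:

```python
LEVELS = ["low", "medium", "high"]

def is_nsfw_at_level(predictions, level="medium", threshold=0.5):
    """Return True if the combined probability of `level` and higher levels reaches the threshold."""
    scores = {p["label"]: p["score"] for p in predictions}
    cumulative = sum(scores.get(lvl, 0.0) for lvl in LEVELS[LEVELS.index(level):])
    return cumulative >= threshold

print(is_nsfw_at_level(predictions, "medium"))  # False for the example output above
```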

The model supports efficient batch processing for multiple images:

```python
images = [Image.open(path) for path in ["image1.jpg", "image2.jpg", "image3.jpg"]]
predictions = classifier(images)
```

**Note**: If you intend to use the model in production, review the [Speed and Memory Metrics](#speed-and-memory-metrics) section before relying on this approach.

### Avoid installation of pip dependency

The following example demonstrates how to **customize the NSFW detection label**; it is very similar to the code in the [PyPI package](https://pypi.org/project/nsfw-image-detector/0.1.0/). This code returns `True` if the NSFW level is "medium" or higher:

```python
from transformers import AutoModelForImageClassification
import torch
from PIL import Image
from typing import List, Dict
import torch.nn.functional as F
from timm.data.transforms_factory import create_transform
from torchvision.transforms import Compose
from timm.data import resolve_data_config
from timm.models import get_pretrained_cfg


device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model
model = AutoModelForImageClassification.from_pretrained(
    "Freepik/nsfw_image_detector", torch_dtype=torch.bfloat16
).to(device)

# Load the original timm processor (faster for tensors)
cfg = get_pretrained_cfg("eva02_base_patch14_448.mim_in22k_ft_in22k_in1k")
processor: Compose = create_transform(**resolve_data_config(cfg.__dict__))


def predict_batch_values(model, processor: Compose, img_batch: List[Image.Image] | torch.Tensor) -> List[Dict[str, float]]:
    """
    Process a batch of images and return prediction scores for each NSFW category
    """
    idx_to_label = {0: 'neutral', 1: 'low', 2: 'medium', 3: 'high'}

    # Prepare the batch: PIL images go through the processor, tensors are assumed to be preprocessed already
    if isinstance(img_batch, torch.Tensor):
        inputs = img_batch
    else:
        inputs = torch.stack([processor(img) for img in img_batch])
    inputs = inputs.to(device=model.device, dtype=model.dtype)

    output = []
    with torch.inference_mode():
        logits = model(inputs).logits
        batch_probs = F.log_softmax(logits, dim=-1)
        batch_probs = torch.exp(batch_probs).cpu()

        for i in range(len(batch_probs)):
            element_probs = batch_probs[i]
            output_img = {}
            danger_cum_sum = 0

            # Walk from the highest class ("high") down to "neutral", accumulating probability mass,
            # so every non-neutral label holds the probability of that level or higher
            for j in range(len(element_probs) - 1, -1, -1):
                danger_cum_sum += element_probs[j]
                if j == 0:
                    # "neutral" keeps its own probability instead of the cumulative sum
                    danger_cum_sum = element_probs[j]
                output_img[idx_to_label[j]] = danger_cum_sum.item()
            output.append(output_img)

    return output


def prediction(model, processor, img_batch: List[Image.Image], class_to_predict: str, threshold: float = 0.5) -> List[bool]:
    """
    Predict whether each image meets or exceeds a specific NSFW level
    """
    if class_to_predict not in ["low", "medium", "high"]:
        raise ValueError("class_to_predict must be one of: low, medium, high")

    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")

    output = predict_batch_values(model, processor, img_batch)
    return [output[i][class_to_predict] >= threshold for i in range(len(output))]


# Example usage
image = Image.open("path/to/your/image.jpg")
print(predict_batch_values(model, processor, [image]))
print(prediction(model, processor, [image], "medium"))  # Options: low, medium, high
```

Example output:

```python
[{'high': 0.01, 'medium': 0.03, 'low': 0.08, 'neutral': 0.92}]

[False]
```
**Note**: The scores sum to more than one because, except for "neutral", each score is the cumulative sum of the probabilities of that label and all higher labels. For instance, the "medium" score is the sum of the "medium" and "high" probabilities. In our opinion, this approach is more effective than selecting only the highest-probability label.

## Training

* **100,000 images** were used during training.
* The model was trained for **3 epochs** on **3 NVIDIA GeForce RTX 3090** GPUs.
* The model was trained using two sets: training and validation.
* There are **no images with a cosine similarity higher than 0.92** between training and validation, to avoid duplicates and biases. The model used for deduplication is "openai/clip-vit-base-patch32".
* A **custom loss** was created to minimize predictions that are lower than the true class; a rough sketch of this idea is shown below. For instance, it is very rare for an image labeled as "high" to be predicted as "neutral" (this happens only 0.46% of the time).

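
As a rough, illustrative sketch only (the exact training loss is not published here), one way to penalize predictions that fall below the true class is to add the probability mass assigned to lower classes onto a standard cross-entropy term:

```python
import torch
import torch.nn.functional as F

def under_prediction_penalized_loss(logits: torch.Tensor, targets: torch.Tensor,
                                    penalty: float = 2.0) -> torch.Tensor:
    """Cross-entropy plus a penalty on probability mass assigned below the true class.

    Classes are assumed ordered by severity: 0=neutral, 1=low, 2=medium, 3=high.
    This is an assumption for illustration, not the exact loss used to train the model.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")
    probs = F.softmax(logits, dim=-1)
    # Mask of classes strictly below the true class for each sample
    class_idx = torch.arange(logits.size(-1), device=logits.device).unsqueeze(0)
    below_mask = class_idx < targets.unsqueeze(1)
    under_mass = (probs * below_mask).sum(dim=-1)
    return (ce + penalty * under_mass).mean()
```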
## Speed and Memory Metrics

| Batch Size | Avg. time per batch (ms) | VRAM (MB) | Optimizations |
|------------|--------------------------|-----------|------------------------|
| 1          | 28                       | 540       | BF16 using PIL images  |
| 4          | 110                      | 640       | BF16 using PIL images  |
| 16         | 412                      | 1144      | BF16 using PIL images  |
| 1          | 10                       | 540       | BF16 using torch tensor |
| 4          | 33                       | 640       | BF16 using torch tensor |
| 16         | 102                      | 1144      | BF16 using torch tensor |

**Notes:**
* The model has been trained in bf16, so it is **recommended to use it in bf16**.
* **Using torch tensors**: the speed reported for torch tensors cannot be achieved through the pipeline. Avoid using the pipeline in production.
* Measurements were taken on an **NVIDIA RTX 3090**; expect better metrics on more powerful hardware.
* Throughput increases with larger batch sizes due to better GPU utilization. Consider your use case when selecting a batch size.
* The optimizations listed are suggestions that could further improve performance.
* **Using torch tensors is especially recommended** when the model is used alongside **text-to-image models or similar**, because their output is already in tensor format; a minimal sketch of this path follows these notes.

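
The following is a minimal sketch of the torch-tensor path (an assumption built on the `model`, `device`, `processor`, and `predict_batch_values` names from the previous section, with a hypothetical `images` tensor of shape `(B, 3, H, W)` holding values in `[0, 1]`, e.g. the decoded output of a text-to-image model). The batch is resized and normalized on the GPU and passed to the model without going through PIL:

```python
import torch
import torch.nn.functional as F
from timm.data import resolve_data_config
from timm.models import get_pretrained_cfg

# Resolve the expected input size and normalization from the base model's config
data_cfg = resolve_data_config(get_pretrained_cfg("eva02_base_patch14_448.mim_in22k_ft_in22k_in1k").__dict__)
side = data_cfg["input_size"][-1]  # 448 for this model
mean = torch.tensor(data_cfg["mean"], device=device).view(1, 3, 1, 1)
std = torch.tensor(data_cfg["std"], device=device).view(1, 3, 1, 1)

def preprocess_tensor_batch(images: torch.Tensor) -> torch.Tensor:
    """Resize and normalize a (B, 3, H, W) float tensor with values in [0, 1].

    Note: this approximates the timm eval transform (which resizes and center-crops).
    """
    images = images.to(device)
    images = F.interpolate(images, size=(side, side), mode="bicubic", align_corners=False)
    return ((images - mean) / std).to(torch.bfloat16)

# images = ...  # hypothetical (B, 3, H, W) tensor from your generation pipeline
# batch = preprocess_tensor_batch(images)
# print(predict_batch_values(model, processor, batch))
```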

## License

This project is licensed under the MIT License - Copyright 2025 Freepik Company S.L.

## Citation

If you use this model in your research or project, please cite it as:

```bibtex
@software{freepik2025nsfw,
  title = {EVA-based Fast NSFW Image Classifier},
  author = {Freepik Company S.L.},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Freepik/nsfw_image_detector},
  organization = {Freepik Company S.L.}
}
```

## Acknowledgements

This model is based on the EVA architecture ([timm/eva02_base_patch14_448.mim_in22k_ft_in22k_in1k](https://huggingface.co/timm/eva02_base_patch14_448.mim_in22k_ft_in22k_in1k)), as described in the following paper:

EVA-02: A Visual Representation for Neon Genesis - https://arxiv.org/abs/2303.11331

```bibtex
@article{EVA02,
  title = {EVA-02: A Visual Representation for Neon Genesis},
  author = {Fang, Yuxin and Sun, Quan and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
  journal = {arXiv preprint arXiv:2303.11331},
  year = {2023}
}
```