---
license: apache-2.0
pipeline_tag: image-classification
---

# Model Card: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification


## Model Description

This model is a fine-tuned **Vision Transformer (ViT)**, a transformer encoder architecture similar to BERT that has been adapted for image classification. It starts from the "google/vit-base-patch16-224-in21k" checkpoint, which was pre-trained in a supervised manner on the ImageNet-21k dataset with images resized to 224x224 pixels, a resolution suitable for a wide range of image recognition tasks.
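
The checkpoint name encodes the input geometry: 16x16 patches on a 224x224 image. As a rough sketch of what that implies for the transformer's input sequence (the helper name `vit_sequence_shape` is hypothetical, and RGB input with a single classification token is assumed):

```python
# Token arithmetic for a ViT with 16x16 patches on a 224x224 RGB input.
# Values are read off the checkpoint name "vit-base-patch16-224";
# the helper name is hypothetical.

def vit_sequence_shape(image_size: int = 224, patch_size: int = 16, channels: int = 3):
    """Return (num_patches, sequence_length_with_cls, values_per_flattened_patch)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    seq_len = num_patches + 1  # one extra [CLS] token used for classification
    patch_dim = patch_size * patch_size * channels
    return num_patches, seq_len, patch_dim

print(vit_sequence_shape())  # (196, 197, 768)
```

Each image therefore becomes 196 patch tokens plus one classification token before it reaches the encoder.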

Hyperparameters were chosen with care during fine-tuning. The batch size was 16, which balanced computational efficiency against exposing the model to a diverse set of images in each update.

The learning rate was 5e-5. The learning rate determines the magnitude of each adjustment to the model's parameters during training; 5e-5 was selected to balance fast convergence against stable optimization.
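
To make the role of the learning rate concrete, here is a minimal sketch of a single plain gradient-descent update. It is for illustration only: the model card does not say which optimizer was used, and the weights and gradients below are hypothetical numbers.

```python
# One plain gradient-descent step, w <- w - lr * grad, for illustration only.
# The optimizer actually used for fine-tuning is not stated in this model card.
LEARNING_RATE = 5e-5  # value stated in this model card

def sgd_step(weights, grads, lr=LEARNING_RATE):
    """Return new weights after one step of plain gradient descent."""
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [0.5, -1.0]  # hypothetical parameter values
grads = [2.0, -4.0]    # hypothetical gradients of the loss
print(sgd_step(weights, grads))
```

With a rate this small, each parameter moves by only a tiny fraction of its gradient per step, which is why fine-tuning a pre-trained checkpoint tends to stay close to the starting weights.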

Fine-tuning used a proprietary dataset of 80,000 images with substantial variability, curated into two classes: "normal" and "nsfw." This diversity helps the model learn the visual patterns that distinguish safe from explicit content.
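
For a sense of scale, the dataset size and batch size together fix the number of optimizer steps in one pass over the data. The sketch below assumes all 80,000 images are used for training; any held-out evaluation split would reduce the count. The helper name is hypothetical.

```python
import math

def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of batches needed to see every image once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)

# Assumption: all 80,000 images are used for training, with the stated batch size of 16.
print(steps_per_epoch(80_000, 16))  # 5000
```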

The goal of this training process was a model that recognizes the relevant visual cues robustly enough for NSFW image classification, so that it can support content safety and moderation workflows. Evaluation results are reported under Training Stats.


## Intended Uses & Limitations


### Intended Uses

- **NSFW Image Classification**: This model is primarily intended for classifying NSFW (Not Safe for Work) images. It has been fine-tuned for this purpose, making it suitable for filtering explicit or inappropriate content in applications.


### How to use

Here is how to use this model to classify an image into one of two classes (normal, nsfw):

```python
# Use a pipeline as a high-level helper
from PIL import Image
from transformers import pipeline

img = Image.open("<path_to_image_file>")
classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
print(classifier(img))
```
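
The image-classification pipeline returns a list of dictionaries, each with a `label` and a `score`. A minimal sketch of turning that output into a moderation decision follows. The 0.5 threshold and the helper name `is_nsfw` are assumptions, and the example scores are hard-coded rather than real model output.

```python
def is_nsfw(predictions, threshold=0.5):
    """Return True if the 'nsfw' score meets the (assumed) threshold."""
    scores = {p["label"]: p["score"] for p in predictions}
    return scores.get("nsfw", 0.0) >= threshold

# Hard-coded stand-in for what classifier(img) might return
example = [{"label": "normal", "score": 0.97}, {"label": "nsfw", "score": 0.03}]
print(is_nsfw(example))  # False
```

A stricter or looser threshold trades false positives against false negatives, so it is worth tuning on your own data.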


<hr>


```python
# Load model directly
import torch
from PIL import Image
from transformers import AutoModelForImageClassification, ViTImageProcessor

img = Image.open("<path_to_image_file>")
model = AutoModelForImageClassification.from_pretrained("Falconsai/nsfw_image_detection")
processor = ViTImageProcessor.from_pretrained("Falconsai/nsfw_image_detection")

# Run inference without tracking gradients
with torch.no_grad():
    inputs = processor(images=img, return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits

# Map the highest-scoring class index to its label
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
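
Taking the argmax of the logits yields only the winning label. If a confidence value is also needed, the logits can be passed through a softmax. The sketch below uses plain Python and hypothetical logit values, not real model output:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, -1.0]  # hypothetical values, not real model output
probs = softmax(example_logits)
print(probs)
```

With PyTorch tensors, `logits.softmax(-1)` computes the same quantity along the class dimension.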


<hr>

Run the YOLOv9 version (ONNX):

```python
import json
import os

import matplotlib.pyplot as plt
import numpy as np
import onnxruntime as ort
from PIL import Image


# Predict using YOLOv9 model
def predict_with_yolov9(image_path, model_path, labels_path, input_size):
    """
    Run inference using the converted YOLOv9 model on a single image.

    Args:
        image_path (str): Path to the input image file.
        model_path (str): Path to the ONNX model file.
        labels_path (str): Path to the JSON file containing class labels.
        input_size (tuple): The expected input size (height, width) for the model.

    Returns:
        str: The predicted class label.
        PIL.Image.Image: The original loaded image.
    """
    def load_json(file_path):
        with open(file_path, "r") as f:
            return json.load(f)

    # Load labels
    labels = load_json(labels_path)

    # Preprocess image
    original_image = Image.open(image_path).convert("RGB")
    # PIL's resize expects (width, height), while input_size is (height, width)
    height, width = input_size
    image_resized = original_image.resize((width, height), Image.Resampling.BILINEAR)
    image_np = np.array(image_resized, dtype=np.float32) / 255.0
    image_np = np.transpose(image_np, (2, 0, 1))  # [C, H, W]
    input_tensor = np.expand_dims(image_np, axis=0).astype(np.float32)

    # Load YOLOv9 model
    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name  # Assuming classification output

    # Run inference
    outputs = session.run([output_name], {input_name: input_tensor})
    predictions = outputs[0]

    # Postprocess predictions (assuming classification output)
    # Adapt this section if your model output is different (e.g., detection boxes)
    predicted_index = int(np.argmax(predictions))
    predicted_label = labels[str(predicted_index)]  # Assumes labels are indexed by string numbers

    return predicted_label, original_image


# Display prediction for a single image
def display_single_prediction(image_path, model_path, labels_path, input_size):
    """
    Predicts the class for a single image and displays the image with its prediction.

    Args:
        image_path (str): Path to the input image file.
        model_path (str): Path to the ONNX model file.
        labels_path (str): Path to the JSON file containing class labels.
        input_size (tuple): The expected input size (height, width) for the model.
    """
    try:
        # Run prediction
        prediction, img = predict_with_yolov9(image_path, model_path, labels_path, input_size)

        # Display image and prediction
        fig, ax = plt.subplots(1, 1, figsize=(8, 8))  # Create a single plot
        ax.imshow(img)
        ax.set_title(f"Prediction: {prediction}", fontsize=14)
        ax.axis("off")  # Hide axes ticks and labels

        plt.tight_layout()
        plt.show()

    except FileNotFoundError:
        print(f"Error: Image file not found at {image_path}")
    except Exception as e:
        print(f"An error occurred: {e}")


# --- Main Execution ---

# Paths and parameters - **MODIFY THESE**
single_image_path = "path/to/your/single_image.jpg"  # <--- Replace with the actual path to your image file
model_path = "path/to/your/yolov9_model.onnx"  # <--- Replace with the actual path to your ONNX model
labels_path = "path/to/your/labels.json"  # <--- Replace with the actual path to your labels JSON file
input_size = (224, 224)  # Standard input size, adjust if your model differs

# Check if the image file exists before proceeding (optional but recommended)
if os.path.exists(single_image_path):
    # Run prediction and display for the single image
    display_single_prediction(single_image_path, model_path, labels_path, input_size)
else:
    print(f"Error: The specified image file does not exist: {single_image_path}")
```
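
The script above looks up the predicted class with a string key, so the labels file needs to map string indices to class names. A small sketch of such a file being written and read back follows. The index-to-label order shown here is an assumption; check the labels file that ships with your model.

```python
import json
import os
import tempfile

# Hypothetical mapping; the real index order depends on how the model was exported.
labels = {"0": "normal", "1": "nsfw"}

# Write the mapping to a temporary labels.json and read it back
path = os.path.join(tempfile.mkdtemp(), "labels.json")
with open(path, "w") as f:
    json.dump(labels, f, indent=2)

with open(path) as f:
    loaded = json.load(f)

predicted_index = 1  # stand-in for the argmax result
print(loaded[str(predicted_index)])  # nsfw
```

JSON object keys are always strings, which is why the lookup converts the integer index with `str()`.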


<hr>


### Limitations

- **Specialized Task Fine-Tuning**: The model is fine-tuned specifically for NSFW image classification, so its performance may vary when applied to other tasks.
- Users who need a model for a different task should look for versions fine-tuned for that task on the model hub.


## Training Data

The model was trained on a proprietary dataset of approximately 80,000 images. The dataset has substantial variability and two classes: "normal" and "nsfw." Training on this data aimed to teach the model to distinguish safe from explicit content.


### Training Stats

```markdown
- 'eval_loss': 0.07463177293539047,
- 'eval_accuracy': 0.980375,
- 'eval_runtime': 304.9846,
- 'eval_samples_per_second': 52.462,
- 'eval_steps_per_second': 3.279
```
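
These figures can be cross-checked against each other. The sketch below derives the implied evaluation set size, evaluation batch size, and number of misclassified images. It assumes that runtime multiplied by samples per second gives the sample count and that the rates are rounded as reported; the model card does not state these quantities directly, so treat the results as estimates.

```python
# Reported evaluation statistics from this model card
stats = {
    "eval_accuracy": 0.980375,
    "eval_runtime": 304.9846,  # seconds
    "eval_samples_per_second": 52.462,
    "eval_steps_per_second": 3.279,
}

# Implied quantities (estimates; not stated in the model card)
eval_samples = round(stats["eval_runtime"] * stats["eval_samples_per_second"])
eval_batch_size = round(stats["eval_samples_per_second"] / stats["eval_steps_per_second"])
correct = round(stats["eval_accuracy"] * eval_samples)
errors = eval_samples - correct

print(eval_samples, eval_batch_size, correct, errors)  # 16000 16 15686 314
```

The implied batch size of 16 matches the batch size stated in the model description.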


<hr>


**Note:** It's essential to use this model responsibly and ethically, adhering to content guidelines and applicable regulations when implementing it in real-world applications, particularly those involving potentially sensitive content.


For more details on model fine-tuning and usage, please refer to the model's documentation and the model hub.


## References

- [Hugging Face Model Hub](https://huggingface.co/models)
- [Vision Transformer (ViT) Paper](https://arxiv.org/abs/2010.11929)
- [ImageNet-21k Dataset](http://www.image-net.org/)


**Disclaimer:** The model's performance may be influenced by the quality and representativeness of the data it was fine-tuned on. Users are encouraged to assess the model's suitability for their specific applications and datasets.