---
license: mit
base_model: MCG-NJU/videomae-base
tags:
- video-classification
- crime-detection
- violence-detection
- videomae
- computer-vision
- security
- surveillance
- generated_from_trainer
language:
- en
datasets:
- jinmang2/ucf_crime
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: video-classification
model-index:
- name: videomae-crime-detector-maxdata-v1
  results:
  - task:
      name: Violence Detection
      type: video-classification
    dataset:
      name: UCF Crime Dataset (Subset)
      type: jinmang2/ucf_crime
      args: violence_detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.7292
    - name: Precision
      type: precision
      value: 0.7289
    - name: Recall
      type: recall
      value: 0.7292
    - name: F1
      type: f1
      value: 0.7287
---

# Nikeytas/videomae-crime-detector-maxdata-v1

This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base) on the UCF Crime dataset with **event-based binary classification**. It achieves the following results on the validation set:

- **Loss**: 0.8405
- **Accuracy**: 0.7292
- **Precision**: 0.7289
- **Recall**: 0.7292
- **F1 Score**: 0.7287

## 🎯 Model Overview

This VideoMAE model has been fine-tuned for **binary violence detection** in video content. The model classifies videos into two categories:

- **Violent Crime** (1): Videos containing violent criminal activities
- **Non-Violent Incident** (0): Videos with non-violent or normal activities

The model is based on the **VideoMAE architecture** and was trained on a curated subset of the UCF Crime dataset, using event-based categorization to reflect realistic crime detection scenarios.

## 📊 Dataset & Training

### Dataset Composition

**Total Videos**: 600
- **Violent Crime Videos**: 300
- **Non-Violent Incident Videos**: 300

**Class Balance**: 50.0% violent crimes

**Event Distribution**:
- **Abuse**: 28 videos
- **Arrest**: 18 videos
- **Arson**: 16 videos
- **Assault**: 62 videos
- **Burglary**: 120 videos
- **Explosion**: 54 videos
- **Fighting**: 48 videos
- **RoadAccidents**: 58 videos
- **Robbery**: 184 videos
- **Shoplifting**: 36 videos
- **Stealing**: 46 videos
- **Vandalism**: 72 videos

**Data Splits**:
- **Training**: 384 videos
- **Validation**: 96 videos
- **Test**: 120 videos
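
The split sizes are consistent with holding out 20% of the 600 videos for testing and then 20% of the remaining 480 for validation (a 64/16/20 split). The sketch below reproduces those counts with a simple shuffled two-stage split; the `two_stage_split` helper, the seed, and the placeholder video IDs are hypothetical, and the card does not say whether the real split was stratified by class (a stratified variant would also keep the 50/50 balance in every split).

```python
import random

def two_stage_split(items, test_frac=0.2, val_frac=0.2, seed=42):
    """Hypothetical helper: hold out test_frac for test, then val_frac of the rest for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n_test = round(len(items) * test_frac)
    test, rest = items[:n_test], items[n_test:]
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

videos = [f"video_{i:03d}" for i in range(600)]  # placeholder IDs, not real file names
train, val, test = two_stage_split(videos)
print(len(train), len(val), len(test))  # 384 96 120
```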

## 🎯 Performance

### Performance Metrics

**Validation Performance**:
- **eval_loss**: 0.8405
- **eval_accuracy**: 0.7292
- **eval_precision**: 0.7289
- **eval_recall**: 0.7292
- **eval_f1**: 0.7287
- **eval_runtime**: 11.5149
- **eval_samples_per_second**: 8.3370
- **eval_steps_per_second**: 4.1690
- **epoch**: 8.0000
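
Recall equals accuracy exactly here (0.7292) and in the test results (0.6750). That is what support-weighted averaging produces: weighting each class's recall by its share of the true labels reduces to the overall fraction of correct predictions. The card does not state the averaging mode, so the sketch below assumes `average="weighted"` semantics (as in scikit-learn) and implements them with the standard library only. The toy labels are invented for illustration and are not model outputs. It also shows why weighted F1 need not equal the harmonic mean of weighted precision and recall.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging mode)."""
    n = len(y_true)
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        p_c = tp / predicted_c if predicted_c else 0.0
        r_c = tp / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        w = n_c / n  # weight = this class's share of the ground-truth labels
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return accuracy, precision, recall, f1

# Toy labels (1 = Violent Crime, 0 = Non-Violent Incident)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"acc={acc:.4f} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f}")
```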

**Test Performance**:
- **eval_loss**: 0.8573
- **eval_accuracy**: 0.6750
- **eval_precision**: 0.6749
- **eval_recall**: 0.6750
- **eval_f1**: 0.6749
- **eval_runtime**: 13.8665
- **eval_samples_per_second**: 8.6540
- **eval_steps_per_second**: 4.3270
- **epoch**: 8.0000
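
The reported figures can be cross-checked with simple arithmetic: 0.7292 accuracy on 96 validation videos corresponds to 70 correct predictions, 0.6750 on 120 test videos corresponds to 81, and the steps-per-second values are half the samples-per-second values, as expected with an eval batch size of 2. The `implied_correct` helper below is hypothetical and only illustrates the check.

```python
def implied_correct(accuracy, n_samples):
    """Number of correct predictions implied by a (rounded) accuracy value."""
    return round(accuracy * n_samples)

EVAL_BATCH_SIZE = 2  # from the hyperparameters listed on this card

for split, acc, n, runtime in [
    ("validation", 0.7292, 96, 11.5149),
    ("test", 0.6750, 120, 13.8665),
]:
    correct = implied_correct(acc, n)
    steps = -(-n // EVAL_BATCH_SIZE)  # ceiling division: batches per evaluation pass
    print(f"{split}: {correct}/{n} correct, "
          f"{n / runtime:.3f} samples/s, {steps / runtime:.3f} steps/s")
```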

**Training Information**:
- **Training Time**: 33.7 minutes
- **Best Validation Accuracy**: 0.7292
- **Model Architecture**: VideoMAE Base (fine-tuned)
- **Fine-tuning Approach**: Event-based binary classification

## 🚀 Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- **Learning Rate**: 5e-05
- **Train Batch Size**: 2
- **Eval Batch Size**: 2
- **Optimizer**: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- **LR Scheduler Type**: Linear
- **Training Epochs**: 8
- **Weight Decay**: 0.01
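
With a linear scheduler, the learning rate decays from 5e-05 to zero over the whole run. Assuming a single device, no gradient accumulation, and no warmup (none of which the card states), 384 training videos at batch size 2 give 192 optimizer steps per epoch and 1,536 steps over 8 epochs. The sketch mirrors the formula used by `get_linear_schedule_with_warmup` in Transformers; the constants are taken from this card except `WARMUP_STEPS`, which is an assumption.

```python
import math

BASE_LR = 5e-05
TRAIN_VIDEOS = 384
BATCH_SIZE = 2
EPOCHS = 8
WARMUP_STEPS = 0  # assumption: the card does not mention warmup

steps_per_epoch = math.ceil(TRAIN_VIDEOS / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def linear_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - WARMUP_STEPS)

print(steps_per_epoch, total_steps)  # 192 1536
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```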

### Training Results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| N/A | 8.00 | N/A | 0.8405 | 0.7292 |

### Framework Versions

- **Transformers**: 4.30.2+
- **PyTorch**: 2.0.1+
- **Datasets**: Latest
- **Device**: Apple Silicon MPS / CUDA / CPU (Auto-detected)

## 🚀 Quick Start

### Installation

```bash
pip install transformers torch torchvision opencv-python pillow
```

### Basic Usage

```python
import cv2
import numpy as np
import torch
from transformers import AutoModelForVideoClassification, AutoProcessor

MODEL_ID = "Nikeytas/videomae-crime-detector-maxdata-v1"

# Load the fine-tuned model and its preprocessing configuration
model = AutoModelForVideoClassification.from_pretrained(MODEL_ID)
processor = AutoProcessor.from_pretrained(MODEL_ID)
model.eval()

def classify_video(video_path, num_frames=16):
    """Classify a video from num_frames frames sampled uniformly across its length."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {video_path}")

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total_frames <= 0:
        cap.release()
        raise ValueError(f"No frames found in: {video_path}")

    # Evenly spaced frame indices from the first to the last frame
    indices = np.linspace(0, total_frames - 1, num_frames, dtype=int)

    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if ret:
            # OpenCV decodes frames as BGR; the processor expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()

    if not frames:
        raise ValueError(f"Could not decode any frames from: {video_path}")
    # The model expects exactly num_frames frames; pad by repeating the last one
    while len(frames) < num_frames:
        frames.append(frames[-1])

    # Resize/normalize the frames and stack them into a (1, T, C, H, W) tensor
    inputs = processor(frames, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()
    confidence = probabilities[0, predicted_class].item()

    # Label mapping from this card: 1 = Violent Crime, 0 = Non-Violent Incident
    label = "Violent Crime" if predicted_class == 1 else "Non-Violent"
    return label, confidence

# Example usage
video_path = "path/to/your/video.mp4"
prediction, confidence = classify_video(video_path)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")
```

### Batch Processing

```python
from pathlib import Path

# Reuses classify_video() from the Basic Usage example above
def process_video_directory(video_dir, output_file="results.txt"):
    results = []

    for video_file in sorted(Path(video_dir).glob("*.mp4")):
        try:
            prediction, confidence = classify_video(str(video_file))
            results.append({
                "file": video_file.name,
                "prediction": prediction,
                "confidence": confidence,
            })
            print(f"✅ {video_file.name}: {prediction} ({confidence:.3f})")
        except Exception as e:
            print(f"❌ Error processing {video_file.name}: {e}")

    # Save one line per successfully processed video
    with open(output_file, "w", encoding="utf-8") as f:
        for result in results:
            f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n")

    return results

# Process all videos in a directory
results = process_video_directory("./videos/")
```

## 📈 Technical Specifications

- **Base Model**: MCG-NJU/videomae-base
- **Architecture**: Vision Transformer (ViT) adapted for video
- **Input Resolution**: 224x224 pixels per frame
- **Temporal Resolution**: 16 frames per video clip
- **Output Classes**: 2 (Binary classification)
- **Training Framework**: HuggingFace Transformers
- **Optimization**: AdamW optimizer with learning rate 5e-5
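
To give a sense of the input size: VideoMAE splits a clip into space-time "tubelets", each becoming one token. Assuming the default VideoMAE base configuration (spatial patch size 16, tubelet size 2 frames, hidden size 768; these values come from the base model's configuration, not from this card), a 16-frame 224x224 clip becomes 8 × 14 × 14 = 1,568 tokens. Check the model's `config.json` to confirm before relying on these numbers.

```python
# Assumed VideoMAE-base defaults (verify against the model's config.json)
NUM_FRAMES = 16
IMAGE_SIZE = 224
PATCH_SIZE = 16     # spatial patch edge, in pixels
TUBELET_SIZE = 2    # consecutive frames merged into one token
HIDDEN_SIZE = 768

def num_tokens(num_frames=NUM_FRAMES, image_size=IMAGE_SIZE,
               patch_size=PATCH_SIZE, tubelet_size=TUBELET_SIZE):
    """Number of space-time patch tokens the encoder sees for one clip."""
    temporal = num_frames // tubelet_size
    spatial = (image_size // patch_size) ** 2
    return temporal * spatial

tokens = num_tokens()
print(f"input pixel_values: (1, {NUM_FRAMES}, 3, {IMAGE_SIZE}, {IMAGE_SIZE})")
print(f"tokens per clip: {tokens}, encoder output: (1, {tokens}, {HIDDEN_SIZE})")
# → tokens per clip: 1568, encoder output: (1, 1568, 768)
```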

## ⚠️ Limitations

1. **Dataset Scope**: Trained on a 600-video subset of the UCF Crime dataset; may not generalize to all types of violence
2. **Temporal Context**: Each prediction is based on only 16 frames, which can miss short events or context in longer videos
3. **Environmental Bias**: Performance may vary with lighting, camera angle, and video quality
4. **False Positives**: May misclassify intense but non-violent activities (e.g., sports, action movies)
5. **Real-time Performance**: Throughput depends on hardware; evaluation ran at roughly 8.3 to 8.7 videos per second with batch size 2 on the training device

## 🔒 Ethical Considerations

### Intended Use
- **Primary**: Research and development in video analysis
- **Secondary**: Security system enhancement with human oversight
- **Educational**: Computer vision and AI safety research

### Prohibited Uses
- **Surveillance without consent**: Do not use for unauthorized monitoring
- **Discriminatory profiling**: Avoid bias against specific groups or communities
- **Automated punishment**: Never use for automated legal or disciplinary actions
- **Privacy violation**: Respect privacy laws and individual rights

### Bias and Fairness
- The model was trained on a specific dataset that may not represent all populations
- Regular evaluation is needed for bias detection and mitigation
- Human oversight is required for critical applications
- Consider demographic representation in deployment scenarios

## 📝 Model Card Information

- **Developed by**: Research Team
- **Model Type**: Video Classification (Binary)
- **Training Data**: UCF Crime Dataset (Subset)
- **Training Date**: 2025-06-02 00:28:28 UTC
- **Evaluation Metrics**: Accuracy, Precision, Recall, F1-Score
- **Intended Users**: Researchers, Security Professionals, Developers

## 📚 Citation

If you use this model in your research, please cite:

```bibtex
@misc{Nikeytas_videomae_crime_detector_maxdata_v1,
  title={VideoMAE Fine-tuned for Crime Detection},
  author={Research Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Nikeytas/videomae-crime-detector-maxdata-v1}
}
```

## 🤝 Contributing

We welcome contributions to improve the model! Please:
1. Report issues with specific examples
2. Suggest improvements for bias reduction
3. Share evaluation results on new datasets
4. Contribute to documentation and examples

## 📞 Contact

For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team.

---

*Last updated: 2025-06-02 00:28:28 UTC*
*Model version: 1.0*
*Framework: HuggingFace Transformers*