1	`---`
2	`license: mit`
3	`base_model: MCG-NJU/videomae-base`
4	`tags:`
5	`- video-classification`
6	`- crime-detection`
7	`- violence-detection`
8	`- videomae`
9	`- computer-vision`
10	`- security`
11	`- surveillance`
12	`- generated_from_trainer`
13	`language:`
14	`- en`
15	`datasets:`
16	`- jinmang2/ucf_crime`
17	`metrics:`
18	`- accuracy`
19	`- precision`
20	`- recall`
21	`- f1`
22	`pipeline_tag: video-classification`
23	`model-index:`
24	`- name: videomae-crime-detector-maxdata-v1`
25	`results:`
26	`- task:`
27	`name: Violence Detection`
28	`type: video-classification`
29	`dataset:`
30	`name: UCF Crime Dataset (Subset)`
31	`type: jinmang2/ucf_crime`
32	`args: violence_detection`
33	`metrics:`
34	`- name: Accuracy`
35	`type: accuracy`
36	`value: 0.7292`
37	`- name: Precision`
38	`type: precision`
39	`value: 0.7289`
40	`- name: Recall`
41	`type: recall`
42	`value: 0.7292`
43	`- name: F1`
44	`type: f1`
45	`value: 0.7287`
46	`---`
47
# Nikeytas/Videomae Crime Detector Maxdata V1

This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base), trained on a subset of the UCF Crime dataset for event-based binary classification. It achieves the following results on the validation set:

- Loss: 0.8405
- Accuracy: 0.7292
- Precision: 0.7289
- Recall: 0.7292
- F1 Score: 0.7287

## 🎯 Model Overview

This VideoMAE model has been fine-tuned for binary violence detection in video. It assigns each video one of two labels:
- Violent Crime (1): videos containing violent criminal activity
- Non-Violent Incident (0): videos showing non-violent incidents or normal activity

The model was trained on a curated, class-balanced subset of the UCF Crime dataset, with the binary labels derived from the dataset's event categories.

## 📊 Dataset & Training

### Dataset Composition

Total Videos: 600
- Violent Crime Videos: 300
- Non-Violent Incident Videos: 300

Class Balance: 50.0% violent crimes

Event Distribution:
- Abuse: 28 videos
- Arrest: 18 videos
- Arson: 16 videos
- Assault: 62 videos
- Burglary: 120 videos
- Explosion: 54 videos
- Fighting: 48 videos
- RoadAccidents: 58 videos
- Robbery: 184 videos
- Shoplifting: 36 videos
- Stealing: 46 videos
- Vandalism: 72 videos

Data Splits:
- Training: 384 videos
- Validation: 96 videos
- Test: 120 videos

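The split sizes above correspond to a 64/16/20 partition of the 600 videos. The card does not say how the split was drawn; the sketch below assumes a stratified random split over hypothetical `(video_id, label)` pairs with a fixed seed, and only illustrates how such a partition yields 384/96/120 videos.

```python
import random
from collections import defaultdict


def stratified_split(items, fractions=(0.64, 0.16, 0.20), seed=42):
    """Split (video_id, label) pairs into train/val/test, preserving class balance.

    Assumption: the fractions and the seed are illustrative; the model card
    only reports the resulting split sizes.
    """
    by_label = defaultdict(list)
    for video_id, label in items:
        by_label[label].append((video_id, label))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Hypothetical file names: 300 violent (1) and 300 non-violent (0) videos
videos = [(f"video_{i:03d}.mp4", i % 2) for i in range(600)]
train, val, test = stratified_split(videos)
print(len(train), len(val), len(test))
```

Splitting per class keeps each split at the same 50/50 balance as the full set.
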
## 🎯 Performance

### Performance Metrics

Validation Performance:
- eval_loss: 0.8405
- eval_accuracy: 0.7292
- eval_precision: 0.7289
- eval_recall: 0.7292
- eval_f1: 0.7287
- eval_runtime: 11.5149 (seconds)
- eval_samples_per_second: 8.3370
- eval_steps_per_second: 4.1690
- epoch: 8.0000

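In the validation figures, recall equals accuracy to four decimal places. That is what support-weighted averaging over both classes produces, because weighted recall reduces algebraically to accuracy. The card does not state the averaging mode, so treat this as a likely reading rather than a confirmed one. A minimal sketch with a hypothetical confusion matrix (the counts are made up to total 70 correct out of 96 and are not the model's actual predictions):

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall and F1 from a square confusion matrix.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[c][c] for c in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        support = sum(confusion[c])                   # true samples of class c
        predicted = sum(row[c] for row in confusion)  # samples predicted as class c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical counts for a 96-video validation set (rows: true 0, true 1)
confusion = [[36, 12],
             [14, 34]]
accuracy, precision, recall, f1 = weighted_metrics(confusion)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```
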
Test Performance:
- eval_loss: 0.8573
- eval_accuracy: 0.6750
- eval_precision: 0.6749
- eval_recall: 0.6750
- eval_f1: 0.6749
- eval_runtime: 13.8665 (seconds)
- eval_samples_per_second: 8.6540
- eval_steps_per_second: 4.3270
- epoch: 8.0000

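With only 96 validation and 120 test videos, these accuracies carry wide sampling uncertainty. As a rough guide, and not a figure reported by the authors, a normal-approximation 95% interval can be computed from each accuracy and its split size:

```python
import math


def accuracy_interval(accuracy, n_samples, z=1.96):
    """Normal-approximation confidence interval for an accuracy estimate.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


for split, acc, n in [("validation", 0.7292, 96), ("test", 0.6750, 120)]:
    low, high = accuracy_interval(acc, n)
    print(f"{split}: {acc:.4f} (95% interval roughly {low:.3f} to {high:.3f})")
```

The two intervals overlap substantially, so the validation and test accuracies are not clearly distinguishable at these sample sizes.
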
Training Information:
- Training Time: 33.7 minutes
- Best Accuracy Achieved: 0.7292 (validation)
- Model Architecture: VideoMAE Base (fine-tuned)
- Fine-tuning Approach: Event-based binary classification

## 🚀 Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- Learning Rate: 5e-05
- Train Batch Size: 2
- Eval Batch Size: 2
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler Type: Linear
- Training Epochs: 8
- Weight Decay: 0.01

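With 384 training videos and a batch size of 2, one epoch is 192 optimizer steps, so 8 epochs come to 1,536 steps. The sketch below traces the linear schedule over those steps; it assumes no warmup, no gradient accumulation, and a single device, none of which the card specifies.

```python
import math

TRAIN_VIDEOS = 384
BATCH_SIZE = 2
EPOCHS = 8
BASE_LR = 5e-5

steps_per_epoch = math.ceil(TRAIN_VIDEOS / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step, base_lr=BASE_LR, total=total_steps, warmup=0):
    """Learning rate at an optimizer step under linear warmup, then linear decay.

    Assumption: warmup=0, since the card lists no warmup setting.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


for epoch in (0, 4, 8):
    step = epoch * steps_per_epoch
    print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.2e}")
```
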
### Training Results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 0.7291666666666666 | 8.00 | N/A | 0.8405 | 0.7292 |

### Framework Versions

- Transformers: 4.30.2+
- PyTorch: 2.0.1+
- Datasets: Latest
- Device: Apple Silicon MPS / CUDA / CPU (Auto-detected)

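The auto-detected device choice can be reproduced with a few lines of PyTorch. This is a sketch of one common pattern, not necessarily the authors' training code; the helper name `pick_device` is hypothetical.

```python
import torch


def pick_device():
    """Prefer CUDA, then Apple Silicon MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
print(f"Using device: {device}")
```

After loading, move the model and each batch of inputs to the same device, for example with `model.to(device)` and `{k: v.to(device) for k, v in inputs.items()}`.
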
## 🚀 Quick Start

### Installation

```bash
pip install transformers torch torchvision opencv-python pillow
```

### Basic Usage

```python
import cv2
import numpy as np
import torch
from transformers import AutoImageProcessor, AutoModelForVideoClassification

MODEL_ID = "Nikeytas/videomae-crime-detector-maxdata-v1"

# Load model and processor
model = AutoModelForVideoClassification.from_pretrained(MODEL_ID)
processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model.eval()


def classify_video(video_path, num_frames=16):
    """Classify one video from num_frames uniformly sampled frames."""
    # Extract frames
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise ValueError(f"Could not open video: {video_path}")

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total_frames - 1, 0), num_frames, dtype=int)

    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if ret:
            # OpenCV decodes to BGR; the processor expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()

    if not frames:
        raise ValueError(f"No frames could be read from: {video_path}")
    # The model expects exactly num_frames frames; repeat the last one if reads failed
    while len(frames) < num_frames:
        frames.append(frames[-1])

    # Process with model
    inputs = processor(frames, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()
    confidence = probabilities[0][predicted_class].item()

    label = "Violent Crime" if predicted_class == 1 else "Non-Violent"
    return label, confidence


# Example usage
video_path = "path/to/your/video.mp4"
prediction, confidence = classify_video(video_path)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")
```

### Batch Processing

```python
from pathlib import Path


def process_video_directory(video_dir, output_file="results.txt"):
    """Classify every .mp4 file in video_dir and write one result per line."""
    results = []

    for video_file in sorted(Path(video_dir).glob("*.mp4")):
        try:
            # classify_video is defined in the Basic Usage example
            prediction, confidence = classify_video(str(video_file))
            results.append({
                "file": video_file.name,
                "prediction": prediction,
                "confidence": confidence,
            })
            print(f"✅ {video_file.name}: {prediction} ({confidence:.3f})")
        except Exception as e:
            print(f"❌ Error processing {video_file.name}: {e}")

    # Save results
    with open(output_file, "w", encoding="utf-8") as f:
        for result in results:
            f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n")

    return results


# Process all videos in a directory
results = process_video_directory("./videos/")
```

## 📈 Technical Specifications

- Base Model: MCG-NJU/videomae-base
- Architecture: Vision Transformer (ViT) adapted for video
- Input Resolution: 224x224 pixels per frame
- Temporal Resolution: 16 frames per video clip
- Output Classes: 2 (Binary classification)
- Training Framework: HuggingFace Transformers
- Optimization: AdamW optimizer with learning rate 5e-5

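To make the input size concrete: a 16-frame clip at 224x224 becomes a tensor of shape `(batch, 16, 3, 224, 224)`. Assuming the VideoMAE base defaults of 16x16 spatial patches and a tubelet size of 2 frames (values taken from the base architecture, not stated in this card), the encoder sees the following number of tokens per clip:

```python
def videomae_token_count(num_frames=16, image_size=224, patch_size=16, tubelet_size=2):
    """Number of spatio-temporal patch tokens for one clip.

    Assumption: patch_size=16 and tubelet_size=2 follow the VideoMAE base
    defaults; this card does not list them.
    """
    patches_per_frame = (image_size // patch_size) ** 2
    temporal_slices = num_frames // tubelet_size
    return temporal_slices * patches_per_frame


input_shape = (1, 16, 3, 224, 224)  # (batch, frames, channels, height, width)
tokens = videomae_token_count()
print(f"Input shape: {input_shape}, tokens per clip: {tokens}")
```
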
## ⚠️ Limitations

1. Dataset Scope: Trained on a 600-video subset of the UCF Crime dataset, so it may not generalize to all types of violence
2. Temporal Context: Uses 16-frame clips, which may miss context in longer sequences
3. Environmental Bias: Performance may vary with lighting, camera angle, and video quality
4. False Positives: May misclassify intense but non-violent activity (sports, action movies)
5. Real-time Performance: Processing time depends on hardware capabilities

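One way to soften the 16-frame limit (item 2) is to score several overlapping clips and pool the results. The sketch below only computes the clip windows and averages per-clip scores; the stride of 8 frames and mean pooling are illustrative assumptions, and the helper names are hypothetical.

```python
def clip_windows(total_frames, clip_len=16, stride=8):
    """Return (start, end) frame ranges, end exclusive, that cover the video."""
    if total_frames <= 0:
        return []
    if total_frames < clip_len:
        return [(0, total_frames)]  # short video: a single partial clip
    starts = list(range(0, total_frames - clip_len + 1, stride))
    if starts[-1] + clip_len < total_frames:
        starts.append(total_frames - clip_len)  # extra clip so the tail is covered
    return [(start, start + clip_len) for start in starts]


def mean_violence_score(clip_scores):
    """Average the per-clip probabilities of the 'Violent Crime' class."""
    if not clip_scores:
        raise ValueError("clip_scores must not be empty")
    return sum(clip_scores) / len(clip_scores)


windows = clip_windows(total_frames=45)
print(windows)

scores = [0.20, 0.35, 0.80, 0.75, 0.60]  # made-up scores, one per window
print(f"video-level score: {mean_violence_score(scores):.3f}")
```

Mean pooling can dilute a short violent segment in a long video; taking the maximum per-clip score is a common alternative when missed detections cost more than false alarms.
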
## 🔒 Ethical Considerations

### Intended Use
- Primary: Research and development in video analysis
- Secondary: Security system enhancement with human oversight
- Educational: Computer vision and AI safety research

### Prohibited Uses
- Surveillance without consent: Do not use for unauthorized monitoring
- Discriminatory profiling: Avoid bias against specific groups or communities
- Automated punishment: Never use for automated legal or disciplinary actions
- Privacy violation: Respect privacy laws and individual rights

### Bias and Fairness
- Model trained on specific dataset that may not represent all populations
- Regular evaluation needed for bias detection and mitigation
- Human oversight required for critical applications
- Consider demographic representation in deployment scenarios

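The human-oversight requirement can be built into the calling code. The sketch below routes every "Violent Crime" prediction, and any low-confidence prediction, to a reviewer instead of acting on it. The 0.9 threshold and the function name are hypothetical and would need tuning on held-out data.

```python
def route_prediction(label, confidence, min_confidence=0.9):
    """Decide whether a prediction needs human review.

    Assumption: min_confidence=0.9 is a placeholder, not a validated threshold.
    Returns "human_review" or "log_only"; it never triggers an automated action.
    """
    if label == "Violent Crime" or confidence < min_confidence:
        return "human_review"
    return "log_only"


examples = [("Violent Crime", 0.97), ("Non-Violent", 0.62), ("Non-Violent", 0.95)]
for label, confidence in examples:
    print(f"{label} ({confidence:.2f}) -> {route_prediction(label, confidence)}")
```
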
## 📝 Model Card Information

- Developed by: Research Team
- Model Type: Video Classification (Binary)
- Training Data: UCF Crime Dataset (Subset)
- Training Date: 2025-06-02 00:28:28 UTC
- Evaluation Metrics: Accuracy, Precision, Recall, F1-Score
- Intended Users: Researchers, Security Professionals, Developers

## 📚 Citation

If you use this model in your research, please cite:

```bibtex
@misc{Nikeytas_videomae_crime_detector_maxdata_v1,
  title={VideoMAE Fine-tuned for Crime Detection},
  author={Research Team},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Nikeytas/videomae-crime-detector-maxdata-v1}
}
```

## 🤝 Contributing

We welcome contributions to improve the model! Please:
1. Report issues with specific examples
2. Suggest improvements for bias reduction
3. Share evaluation results on new datasets
4. Contribute to documentation and examples

## 📞 Contact

For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team.

---

Last updated: 2025-06-02 00:28:28 UTC
Model version: 1.0
Framework: HuggingFace Transformers