---
license: apache-2.0
tags:
- depth-estimation
- computer-vision
- monocular-depth
- multi-view-geometry
- pose-estimation
library_name: depth-anything-3
pipeline_tag: depth-estimation
---

# Depth Anything 3: DA3METRIC-LARGE

<div align="center">

[![Project Page](https://img.shields.io/badge/Project_Page-Depth_Anything_3-green)](https://depth-anything-3.github.io)
[![Paper](https://img.shields.io/badge/arXiv-Depth_Anything_3-red)](https://arxiv.org/abs/)
[![Demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue)](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)

</div>

## Model Description

DA3 Metric Large is a model specialized for metric depth estimation in monocular settings, ideal for applications requiring real-world scale. It predicts canonical metric depth; multiplying by the focal length yields metric depth.

| Property | Value |
|----------|-------|
| **Model Series** | Monocular Metric Depth |
| **Parameters** | 0.35B |
| **License** | Apache 2.0 |
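
As noted above, the model predicts canonical metric depth that is converted to metric depth by scaling with the focal length. Below is a minimal, illustrative sketch of that conversion; the variable names and values are assumptions for demonstration, not part of the model API:

```python
import numpy as np

# Hypothetical canonical depth map (as predicted by the model) and the camera's
# focal length in pixels; both names and values are illustrative assumptions.
canonical_depth = np.full((480, 640), 0.002, dtype=np.float32)
focal_length_px = 1000.0

# Per the description above: metric depth = canonical depth * focal length.
metric_depth = canonical_depth * focal_length_px
print(metric_depth.mean())  # depth at real-world scale
```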

## Capabilities

- ✅ Relative Depth
- ✅ Metric Depth
- ✅ Sky Segmentation

## Quick Start

### Installation

```bash
git clone https://github.com/ByteDance-Seed/depth-anything-3
cd depth-anything-3
pip install -e .
```
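
To verify the installation, the import used in the example below should succeed (a minimal check; it assumes no extra setup beyond the editable install above):

```python
# Quick sanity check that the package is importable after installation.
from depth_anything_3.api import DepthAnything3

print(DepthAnything3.__name__)
```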

### Basic Example

```python
import torch
from depth_anything_3.api import DepthAnything3

# Load model from Hugging Face Hub
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DepthAnything3.from_pretrained("depth-anything/da3metric-large")
model = model.to(device=device)

# Run inference on images
images = ["image1.jpg", "image2.jpg"]  # List of image paths, PIL Images, or numpy arrays
prediction = model.inference(
    images,
    export_dir="output",
    export_format="glb"  # Options: glb, npz, ply, mini_npz, gs_ply, gs_video
)

# Access results
print(prediction.depth.shape)       # Depth maps: [N, H, W] float32
print(prediction.conf.shape)        # Confidence maps: [N, H, W] float32
print(prediction.extrinsics.shape)  # Camera poses (w2c): [N, 3, 4] float32
print(prediction.intrinsics.shape)  # Camera intrinsics: [N, 3, 3] float32
```
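
The outputs above can be combined for simple downstream geometry. Below is a minimal, illustrative sketch (not part of the official API) that back-projects the first depth map into a camera-frame point cloud; it assumes the prediction fields are numpy arrays (convert with `.cpu().numpy()` if they are returned as torch tensors) and that the depth map stores per-pixel z-depth:

```python
import numpy as np

# Assumptions: prediction.depth is [N, H, W] and prediction.intrinsics is [N, 3, 3],
# both numpy arrays, and depth stores per-pixel z-depth.
depth = prediction.depth[0]   # [H, W]
K = prediction.intrinsics[0]  # [3, 3] pinhole intrinsics

H, W = depth.shape
u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates

# Back-project pixels: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
x = (u - K[0, 2]) * depth / K[0, 0]
y = (v - K[1, 2]) * depth / K[1, 1]
points_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)  # [H*W, 3] point cloud
```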

### Command Line Interface

```bash
# Process images with auto mode
da3 auto path/to/images \
    --export-format glb \
    --export-dir output \
    --model-dir depth-anything/da3metric-large

# Use backend for faster repeated inference
da3 backend --model-dir depth-anything/da3metric-large
da3 auto path/to/images --export-format glb --use-backend
```

## Model Details

- **Developed by:** ByteDance Seed Team
- **Model Type:** Vision Transformer for Visual Geometry
- **Architecture:** Plain transformer with unified depth-ray representation
- **Training Data:** Public academic datasets only

### Key Insights

💎 A **single plain transformer** (e.g., vanilla DINO encoder) is sufficient as a backbone without architectural specialization.

✨ A singular **depth-ray representation** obviates the need for complex multi-task learning.

## Performance

🏆 Depth Anything 3 significantly outperforms:
- **Depth Anything 2** for monocular depth estimation
- **VGGT** for multi-view depth estimation and pose estimation

For detailed benchmarks, please refer to our [paper](https://depth-anything-3.github.io).

## Limitations

- The model is trained on public academic datasets and may generalize less well to domain-specific images outside that distribution
- Performance may vary depending on image quality, lighting conditions, and scene complexity

## Citation

If you find Depth Anything 3 useful in your research or projects, please cite:

```bibtex
@article{depthanything3,
  title={Depth Anything 3: Recovering the visual space from any views},
  author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}
```

## Links

- 🏠 [Project Page](https://depth-anything-3.github.io)
- 📄 [Paper](https://arxiv.org/abs/)
- 💻 [GitHub Repository](https://github.com/ByteDance-Seed/depth-anything-3)
- 🤗 [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
- 📚 [Documentation](https://github.com/ByteDance-Seed/depth-anything-3#-useful-documentation)

## Authors

[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)