# Batch Utils

`batch_utils` contains the optional batched hybrid generation runtime for
LocateAnything. It keeps the model loading, tokenization, image feature caching,
sampling, and scheduler code used by `batch_infer.py` and the detection
experiments.

## Runtime Modes

- `LA_FLASH_ATTN=sdpa`: stock PyTorch SDPA path.
- `LA_FLASH_ATTN=eager`: eager attention path for debugging.
- `LA_FLASH_ATTN=magi`: MagiAttention path when MagiAttention is installed.
- `LA_FLASH_ATTN=la_flash`: LA Flash sparse range backend from `kernel_utils`.

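As a minimal sketch of how a caller might check the mode before launching a run, the snippet below reads `LA_FLASH_ATTN` and validates it against the four values listed above. The helper name `resolve_attn_backend` is hypothetical and not part of `batch_utils`; the `sdpa` fallback follows the documented default, while the whitespace stripping and lowercasing are assumptions made for convenience.

```python
import os

# The four values documented above for LA_FLASH_ATTN.
KNOWN_ATTN_BACKENDS = ("sdpa", "eager", "magi", "la_flash")


def resolve_attn_backend(environ=None):
    """Return the requested LLM attention backend (hypothetical helper).

    Falls back to the documented default, `sdpa`, when the variable is unset.
    Normalizing case and whitespace is an assumption, not documented behavior.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("LA_FLASH_ATTN", "sdpa").strip().lower()
    if value not in KNOWN_ATTN_BACKENDS:
        raise ValueError(
            f"LA_FLASH_ATTN={value!r} is not one of {KNOWN_ATTN_BACKENDS}"
        )
    return value
```

Passing a plain dict instead of `os.environ` keeps the check easy to test. Whether the runtime itself rejects unknown values or silently falls back is not stated here, so treat this as a pre-flight check only.
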
## Common Knobs

| Variable | Default | Meaning |
| --- | --- | --- |
| `LA_FLASH_MODEL` | `nvidia/LocateAnything-3B` | HF model id or local model directory. |
| `LA_FLASH_ATTN` | `sdpa` | LLM attention backend. |
| `LA_FLASH_VISION_ATTN` | `auto` | Vision encoder attention: `auto`, `flash_attention_2`, `sdpa`, or `eager`. |
| `LA_FLASH_STRICT_ATTN` | `0` | Set `1` to fail instead of falling back to SDPA. |
| `LA_FLASH_HYBRID_SCHEDULER` | `eager` | Hybrid decode scheduler. |
| `LA_FLASH_HYBRID_GROUP_SIZE` | `0` | Scheduler group size; `0` lets the runtime decide. |
| `LA_FLASH_VISION_ENCODE_BATCH_SIZE` | `8` | Maximum images per MoonViT encode micro-batch. |
| `LA_FLASH_KV_PACK_TOKEN_BUDGET` | `0` | Optional KV packing memory cap for long-tail batches. |
| `LA_FLASH_DENSE_BACKEND` | `sdpa` | Dense worker/prefill attention backend. Keep this as `sdpa`; LA Flash is used for sparse range plans. |
| `LA_FLASH_SEGMENT_FASTPATH` | `auto` | Sparse MTP decode uses FlashAttention varlen multi-segment merge by default. |

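To make the defaults concrete, here is a small sketch that collects the knobs into one typed snapshot, for example to log the effective configuration next to an experiment. `KnobSnapshot` and `read_knobs` are hypothetical names, not part of `batch_utils`. The variable names and defaults come from the table; the parsing rules (integers via `int`, strict mode enabled only by the literal `1`) are assumptions, since the runtime's own parsing is not described here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class KnobSnapshot:
    """Hypothetical container; field defaults mirror the table above."""

    model: str = "nvidia/LocateAnything-3B"
    attn: str = "sdpa"
    vision_attn: str = "auto"
    strict_attn: bool = False
    hybrid_scheduler: str = "eager"
    hybrid_group_size: int = 0
    vision_encode_batch_size: int = 8
    kv_pack_token_budget: int = 0
    dense_backend: str = "sdpa"
    segment_fastpath: str = "auto"


def read_knobs(environ=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    d = KnobSnapshot()
    return KnobSnapshot(
        model=env.get("LA_FLASH_MODEL", d.model),
        attn=env.get("LA_FLASH_ATTN", d.attn),
        vision_attn=env.get("LA_FLASH_VISION_ATTN", d.vision_attn),
        # Assumption: only the literal "1" enables strict mode.
        strict_attn=env.get("LA_FLASH_STRICT_ATTN", "0") == "1",
        hybrid_scheduler=env.get("LA_FLASH_HYBRID_SCHEDULER", d.hybrid_scheduler),
        hybrid_group_size=int(
            env.get("LA_FLASH_HYBRID_GROUP_SIZE", d.hybrid_group_size)
        ),
        vision_encode_batch_size=int(
            env.get("LA_FLASH_VISION_ENCODE_BATCH_SIZE", d.vision_encode_batch_size)
        ),
        kv_pack_token_budget=int(
            env.get("LA_FLASH_KV_PACK_TOKEN_BUDGET", d.kv_pack_token_budget)
        ),
        dense_backend=env.get("LA_FLASH_DENSE_BACKEND", d.dense_backend),
        segment_fastpath=env.get("LA_FLASH_SEGMENT_FASTPATH", d.segment_fastpath),
    )
```

Calling `read_knobs()` with no arguments reads the live process environment; passing a dict makes the snapshot reproducible in tests.
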
## CLI Example

```bash
python batch_infer.py \
  --model nvidia/LocateAnything-3B \
  --attn la_flash \
  --scheduler pipeline \
  --batch-size 4 \
  --image /path/to/image.jpg \
  --query "person</c>car"
```

For JSONL input, each row should contain:

```json
{"image": "/path/to/image.jpg", "query": "person</c>car"}
```

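The row format above can be produced and checked with a few lines of standard-library Python. This is a sketch under stated assumptions: the helper names are hypothetical, and joining category names with `</c>` is inferred from the `person</c>car` example rather than from a documented query grammar.

```python
import json


def make_row(image_path, categories):
    """Build one row; joins category names with `</c>` (assumed separator)."""
    return {"image": image_path, "query": "</c>".join(categories)}


def dump_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def load_jsonl(text):
    """Parse JSON Lines text, checking that each row has both documented keys."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        missing = {"image", "query"} - set(row)
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        rows.append(row)
    return rows
```

Validating the file before a long batch run surfaces malformed rows early, with a line number, instead of partway through inference.
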
## Training Boundary

This package is for inference and evaluation. Training remains on the
MagiAttention backend; the batched sparse-plan decode runtime does not support
the `labels` training path.