# Batch Utils

`batch_utils` contains the optional batched hybrid generation runtime for
LocateAnything. It keeps the model loading, tokenization, image feature caching,
sampling, and scheduler code used by `batch_infer.py` and the detection
experiments.

## Runtime Modes

Set `LA_FLASH_ATTN` to select the LLM attention backend:

- `LA_FLASH_ATTN=sdpa`: stock PyTorch SDPA path.
- `LA_FLASH_ATTN=eager`: eager attention path for debugging.
- `LA_FLASH_ATTN=magi`: MagiAttention path, available when MagiAttention is installed.
- `LA_FLASH_ATTN=la_flash`: LA Flash sparse range backend from `kernel_utils`.
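
As a rough illustration, a launcher script might check the requested mode before handing off to the runtime. The sketch below reads `LA_FLASH_ATTN` and compares it against the four values listed above. The helper name `read_attn_mode` is hypothetical and not part of `batch_utils`, and the `sdpa` default is taken from the knob table in this README.

```python
import os

# The four values documented for LA_FLASH_ATTN in this README.
ATTN_MODES = ("sdpa", "eager", "magi", "la_flash")


def read_attn_mode(env=None):
    """Return the requested attention mode, or raise on an unknown value.

    Hypothetical helper for illustration; not part of batch_utils.
    """
    env = os.environ if env is None else env
    mode = env.get("LA_FLASH_ATTN", "sdpa")
    if mode not in ATTN_MODES:
        raise ValueError(
            f"LA_FLASH_ATTN={mode!r} is not one of: {', '.join(ATTN_MODES)}"
        )
    return mode
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise in tests, for example `read_attn_mode({"LA_FLASH_ATTN": "la_flash"})`.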

## Common Knobs

| Variable | Default | Meaning |
| --- | --- | --- |
| `LA_FLASH_MODEL` | `nvidia/LocateAnything-3B` | HF model id or local model directory. |
| `LA_FLASH_ATTN` | `sdpa` | LLM attention backend. |
| `LA_FLASH_VISION_ATTN` | `auto` | Vision encoder attention: `auto`, `flash_attention_2`, `sdpa`, or `eager`. |
| `LA_FLASH_STRICT_ATTN` | `0` | Set to `1` to fail instead of falling back to SDPA. |
| `LA_FLASH_HYBRID_SCHEDULER` | `eager` | Hybrid decode scheduler. |
| `LA_FLASH_HYBRID_GROUP_SIZE` | `0` | Scheduler group size; `0` lets the runtime decide. |
| `LA_FLASH_VISION_ENCODE_BATCH_SIZE` | `8` | Maximum images per MoonViT encode micro-batch. |
| `LA_FLASH_KV_PACK_TOKEN_BUDGET` | `0` | Optional KV packing memory cap for long-tail batches. |
| `LA_FLASH_DENSE_BACKEND` | `sdpa` | Dense worker/prefill attention backend. Keep this as `sdpa`; LA Flash is used for sparse range plans. |
| `LA_FLASH_SEGMENT_FASTPATH` | `auto` | Sparse MTP decode uses FlashAttention varlen multi-segment merge by default. |
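
To make the defaults and the strict-mode behaviour concrete, here is a minimal sketch that reads a subset of these variables and applies the documented "fall back to SDPA unless `LA_FLASH_STRICT_ATTN=1`" rule. The names `Knobs`, `load_knobs`, and `resolve_attn` are hypothetical, and the runtime's real parsing is not shown in this README. In particular, treating only the literal string `1` as enabling strict mode is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Knobs:
    """Subset of the documented knobs, using the defaults from the table."""
    model: str = "nvidia/LocateAnything-3B"
    attn: str = "sdpa"
    strict_attn: bool = False
    hybrid_group_size: int = 0
    vision_encode_batch_size: int = 8


def load_knobs(env=None):
    """Build Knobs from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    d = Knobs()
    return Knobs(
        model=env.get("LA_FLASH_MODEL", d.model),
        attn=env.get("LA_FLASH_ATTN", d.attn),
        # Assumption: only the literal "1" turns strict mode on.
        strict_attn=env.get("LA_FLASH_STRICT_ATTN", "0") == "1",
        hybrid_group_size=int(
            env.get("LA_FLASH_HYBRID_GROUP_SIZE", d.hybrid_group_size)
        ),
        vision_encode_batch_size=int(
            env.get("LA_FLASH_VISION_ENCODE_BATCH_SIZE", d.vision_encode_batch_size)
        ),
    )


def resolve_attn(requested, available, strict):
    """Return the backend to use: fall back to SDPA unless strict is set."""
    if requested in available:
        return requested
    if strict:
        raise RuntimeError(f"attention backend {requested!r} is unavailable")
    return "sdpa"
```

For example, `resolve_attn("magi", {"sdpa", "eager"}, strict=False)` returns `"sdpa"`, while the same call with `strict=True` raises instead of silently switching backends.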

## CLI Example

```bash
python batch_infer.py \
  --model nvidia/LocateAnything-3B \
  --attn la_flash \
  --scheduler pipeline \
  --batch-size 4 \
  --image /path/to/image.jpg \
  --query "person</c>car"
```

For JSONL input, each row should be a JSON object with `image` and `query` fields:

```json
{"image": "/path/to/image.jpg", "query": "person</c>car"}
```
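
A small sketch of producing and checking such a file with the standard `json` module follows. The helper names are hypothetical. Joining category names with `</c>` is an assumption inferred from the example query above, not a documented rule.

```python
import json

# Assumption: "</c>" separates category names, as in "person</c>car".
SEPARATOR = "</c>"


def make_row(image_path, categories):
    """Build one row with the two fields shown in the README example."""
    return {"image": image_path, "query": SEPARATOR.join(categories)}


def dumps_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def loads_jsonl(text):
    """Parse JSONL text and check each row has string image/query fields."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        if not isinstance(row, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        for key in ("image", "query"):
            if not isinstance(row.get(key), str):
                raise ValueError(f"line {lineno}: missing or non-string {key!r}")
        rows.append(row)
    return rows
```

Validating rows up front like this surfaces a malformed line by its line number before any model work starts.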

## Training Boundary

This package is for inference and evaluation. Training remains on the
MagiAttention backend; the batched sparse-plan decode runtime does not support
the `labels` training path.