Browse PQC-verified AI models, datasets, and tools
FineFineWeb: A Comprehensive Study on Fine-Grained Domain Web Corpus arXiv: Coming Soon Project Page: Coming Soon Blog: Coming Soon Data Statistics Domain (#tokens/#samples) Iteration 1 Tokens Iteration 2 Tokens Iteration 3 Tokens Total Tokens Iteration 1 Count Iteration 2 Count Iteration 3 Count Total Count aerospace 5.77B 261.63M 309.33M 6.34B 9100000 688505 611034 10399539 agronomy 13.08B 947.41M 229.04M 14.26B 15752828 2711790 649404 19114022 artistic… See the full description on the dataset page: https://huggingface.co/datasets/m-a-p/FineFineWeb.