Fineweb-Edu-Fortified

The composition of fineweb-edu-fortified, produced by automatically clustering a 500k-row sample in Airtrain.

What is it?

Fineweb-Edu-Fortified is a dataset derived from Fineweb-Edu by applying exact-match deduplication across the whole dataset and producing an embedding for each row. The number of times the text from each row appears is also included as a count column. The embeddings were produced using TaylorAI/bge-micro.

Fineweb and…

See the full description on the dataset page: https://huggingface.co/datasets/airtrain-ai/fineweb-edu-fortified.
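The exact-match deduplication with a count column described above can be illustrated with a minimal sketch. This is not the pipeline used to build the dataset, which the description does not detail; the function name `dedupe_with_counts` and the `text` field name are assumptions made for illustration, and only the `count` column is named in the description.

```python
from collections import Counter


def dedupe_with_counts(rows):
    """Collapse exact duplicate strings, keeping first-seen order.

    Hypothetical helper: returns one dict per unique text, with a
    `count` field giving how many times that exact text appeared.
    """
    counts = Counter(rows)  # occurrences of each exact string
    seen = set()
    deduped = []
    for text in rows:
        if text not in seen:  # keep only the first occurrence
            seen.add(text)
            deduped.append({"text": text, "count": counts[text]})
    return deduped


# Toy input standing in for dataset rows (not real data).
sample = ["photosynthesis basics", "cell division", "photosynthesis basics"]
deduped = dedupe_with_counts(sample)
```

Exact matching only merges rows whose text is identical character for character, so near-duplicates that differ by whitespace or punctuation remain separate rows in this sketch.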