{"id":591,"slug":"airtrain-ai--fineweb-edu-fortified","name":"fineweb-edu-fortified","author":"airtrain-ai","description":"Fineweb-Edu-Fortified\n\nThe composition of fineweb-edu-fortified, produced by automatically clustering a 500k row sample in Airtrain\n\nWhat is it?\n\nFineweb-Edu-Fortified is a dataset derived from Fineweb-Edu by applying exact-match deduplication across the whole dataset and producing an embedding for each row. The number of times the text from each row appears is also included as a count column. The embeddings were produced using TaylorAI/bge-micro\nFineweb and… See the full description on the dataset page: https://huggingface.co/datasets/airtrain-ai/fineweb-edu-fortified.","tags":"[\"Task_categories:text-Generation\",\"Language:en\",\"Size_categories:100M<n<1B\",\"Format:parquet\",\"Modality:tabular\",\"Modality:text\"]","license":null,"framework":null,"parameters":null,"downloads":120214,"likes":65,"verified":0,"created_at":"2026-06-23 18:23:35","updated_at":"2026-06-27 07:23:36","source_url":"https://huggingface.co/datasets/airtrain-ai/fineweb-edu-fortified","source_platform":"huggingface","hf_repo_id":"airtrain-ai/fineweb-edu-fortified","ollama_name":"","category":"dataset","latest_version":"v1.0.0","version_count":1,"signature_count":1,"risk_level":null,"risk_score":null,"versions":[{"id":590,"model_id":591,"version":"v1.0.0","manifest_hash":"74884934268a74ae298e0cc2757a8bd5d09ad85cf9f680c49b6e924baa14b232","file_count":0,"total_size":0,"r2_manifest_key":"manifests/datasets/airtrain-ai--fineweb-edu-fortified/v1.0.0.json","created_at":"2026-06-23 18:23:35"}],"files":[],"signatures":[{"id":1124,"version_id":590,"signer_did":"did:quantamrkt:registry:shield-v1","algorithm":"ML-DSA-65","signature_hex":"a94f23e72f40b6e738fa6a6635d0860b1acc886ba00c7161ac5111322d105cec","attestation_type":"registry","signed_at":"2026-06-23 18:23:36"}],"hndl":null}