{"id":517,"slug":"huggingfacefw--finetranslations","name":"finetranslations","author":"HuggingFaceFW","description":"💬 FineTranslations\n\nThe world's knowledge in 1+1T tokens of parallel text\n\nWhat is it?\n\nThis dataset contains over 1 trillion tokens of parallel text in English and 500+ languages. It was obtained by translating data from 🥂 FineWeb2 into English using Gemma3 27B.\nWe relied on datatrove's inference runner to deploy a synthetic data pipeline at scale. Its checkpointing and VLLM lifecycle management features allowed us to use leftover compute from the HF cluster… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/finetranslations.","tags":"[\"Task_categories:text-Generation\",\"Task_categories:translation\",\"Language:abk\",\"Language:abq\",\"Language:abs\",\"Language:acm\"]","license":null,"framework":null,"parameters":null,"downloads":118928,"likes":286,"verified":0,"created_at":"2026-05-02 11:08:13","updated_at":"2026-05-04 07:25:05","source_url":"https://huggingface.co/datasets/HuggingFaceFW/finetranslations","source_platform":"huggingface","hf_repo_id":"HuggingFaceFW/finetranslations","ollama_name":"","category":"dataset","latest_version":"v1.0.0","version_count":1,"signature_count":1,"risk_level":null,"risk_score":null,"versions":[{"id":516,"model_id":517,"version":"v1.0.0","manifest_hash":"f36e2c290eabf935c352266af34a59ad0cd8cf54520326834a1657d58f1e9aa0","file_count":0,"total_size":0,"r2_manifest_key":"manifests/datasets/huggingfacefw--finetranslations/v1.0.0.json","created_at":"2026-05-02 11:08:13"}],"files":[],"signatures":[{"id":1041,"version_id":516,"signer_did":"did:quantamrkt:registry:shield-v1","algorithm":"ML-DSA-65","signature_hex":"51d686381d0acb0aaa73a20eb27b08c42d6b9935a214f834e2015f2048bf019e","attestation_type":"registry","signed_at":"2026-05-02 11:08:13"}],"hndl":null}