{"id":633,"slug":"natgillin--translations-raw","name":"translations-raw","author":"natgillin","description":"natgillin/translations-raw\n\nFrozen, canonical raw bitext consolidated from upstream alvations/mtdata-raw* snapshots (since deleted). This is the read-only source-of-truth for downstream quality-filtering pipelines.\n\n31,663 parquet files (1566.8 GB)\n49 language pairs under data/<src-tgt>/\nSchema: 5 columns (see below)\nRead-only for downstream pipelines. Do not delete or modify.\n\nSchema\n\nEach parquet has 5 columns:\n\ncolumn | type | description\nsource | string… See the full description on the dataset page: https://huggingface.co/datasets/natgillin/translations-raw.","tags":"[\"Language:multilingual\",\"Size_categories:1M<n<10M\",\"Format:parquet\",\"Modality:text\",\"Library:datasets\",\"Library:dask\"]","license":null,"framework":null,"parameters":null,"downloads":86248,"likes":5,"verified":0,"created_at":"2026-06-23 20:23:25","updated_at":"2026-06-29 13:23:35","source_url":"https://huggingface.co/datasets/natgillin/translations-raw","source_platform":"huggingface","hf_repo_id":"natgillin/translations-raw","ollama_name":"","category":"dataset","latest_version":"v1.0.0","version_count":1,"signature_count":1,"risk_level":null,"risk_score":null,"versions":[{"id":632,"model_id":633,"version":"v1.0.0","manifest_hash":"bbabcab123e63805501a76ec1d4fa376e0c2c603166b811cbb2367390ba1a6af","file_count":0,"total_size":0,"r2_manifest_key":"manifests/datasets/natgillin--translations-raw/v1.0.0.json","created_at":"2026-06-23 20:23:25"}],"files":[],"signatures":[{"id":1180,"version_id":632,"signer_did":"did:quantamrkt:registry:shield-v1","algorithm":"ML-DSA-65","signature_hex":"16742421e58082ce8c4a849bf1acc004394d36957031b33ee04c31d3bd1eea0d","attestation_type":"registry","signed_at":"2026-06-23 20:23:25"}],"hndl":null}