natgillin/translations-raw

Frozen, canonical raw bitext consolidated from upstream alvations/mtdata-raw* snapshots (since deleted). This is the read-only source of truth for downstream quality-filtering pipelines.

- 31,663 parquet files (1566.8 GB)
- 49 language pairs under data/<src-tgt>/
- Schema: 5 columns (see below)
- Read-only for downstream pipelines. Do not delete or modify.

Schema

Each parquet has 5 columns:

column  type  description
source  string…

See the full description on the dataset page: https://huggingface.co/datasets/natgillin/translations-raw.
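The data/<src-tgt>/ layout described above can be addressed with a small path helper. The following is a minimal sketch, not part of the dataset's tooling. It assumes that each language code contains no hyphen and that files inside a pair directory end in `.parquet`. The `en`/`de` codes and the `part-0.parquet` file name are hypothetical, since the description does not list the actual pairs or file names.

```python
from pathlib import PurePosixPath


def pair_dir(src: str, tgt: str) -> str:
    """Return the repo-relative directory for a language pair: data/<src-tgt>/."""
    # Assumption: codes are non-empty and contain no hyphen of their own.
    if not src or not tgt or "-" in src or "-" in tgt:
        raise ValueError("src and tgt must be non-empty codes without '-'")
    return f"data/{src}-{tgt}/"


def pair_glob(src: str, tgt: str) -> str:
    """Return a glob pattern matching parquet files for one language pair."""
    # Assumption: files in the directory carry a .parquet extension.
    return pair_dir(src, tgt) + "*.parquet"


def pair_from_path(path: str) -> tuple:
    """Recover (src, tgt) from a path such as data/<src-tgt>/<file>."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "data":
        raise ValueError(f"not a data/<src-tgt>/ path: {path!r}")
    src, sep, tgt = parts[1].partition("-")
    if not sep or not src or not tgt:
        raise ValueError(f"directory is not of the form <src-tgt>: {parts[1]!r}")
    return src, tgt


if __name__ == "__main__":
    # Hypothetical pair and file name, for illustration only.
    print(pair_glob("en", "de"))
    print(pair_from_path("data/en-de/part-0.parquet"))
```

A pattern built this way could, for example, be passed as the `data_files` argument of `datasets.load_dataset` to read a single pair instead of all 1566.8 GB, assuming the repository is laid out exactly as described.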
Use this model
Pull with QuantumShield
quantumshield pull natgillin/translations-raw
Verify integrity
quantumshield verify natgillin/translations-raw
pip install
pip install quantumshield && quantumshield pull natgillin/translations-raw
Unverified Model
This model has not been PQC-verified. File integrity cannot be guaranteed against quantum threats.
README.md
Intended Uses
This dataset is registered on the QuantaMrkt quantum-safe registry, where it is listed as a model. It has not yet been PQC-verified.
Quick Start
# Install the CLI
pip install quantumshield
# Pull the model
quantumshield pull natgillin/translations-raw
# Verify file integrity
quantumshield verify natgillin/translations-raw
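The three Quick Start steps can also be scripted. The sketch below is one possible wrapper and is not provided by the registry. It uses only the commands quoted on this page, and the function names `quick_start_commands` and `run_quick_start` are hypothetical. It defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess

REPO = "natgillin/translations-raw"


def quick_start_commands(repo: str = REPO) -> list:
    """Return the install, pull, and verify commands in the order shown above."""
    return [
        ["pip", "install", "quantumshield"],
        ["quantumshield", "pull", repo],
        ["quantumshield", "verify", repo],
    ]


def run_quick_start(repo: str = REPO, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in quick_start_commands(repo):
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, so verify
            # never runs after a failed install or pull.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_quick_start(dry_run=True)
```

Running the steps in sequence with `check=True` keeps the verify step from reporting on a partial download. Pass `dry_run=False` only once the CLI is known to be available in the environment.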