{"id":404,"slug":"skylion007--openwebtext","name":"openwebtext","author":"Skylion007","description":"Dataset Card for \"openwebtext\"\n\nDataset Summary\n\nAn open-source replication of the WebText dataset from OpenAI, that was used to train GPT-2.\nThis distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University.\n\nSupported Tasks and Leaderboards\n\nMore Information Needed\n\nLanguages\n\nMore Information Needed\n\nDataset Structure\n\nData Instances\n\nplain_text\n\nSize of downloaded dataset files: 13.51 GB\nSize of the… See the full description on the dataset page: https://huggingface.co/datasets/Skylion007/openwebtext.","tags":"[\"Task_categories:text-Generation\",\"Task_categories:fill-Mask\",\"Task_ids:language-Modeling\",\"Task_ids:masked-Language-Modeling\",\"Annotations_creators:no-Annotation\",\"Language_creators:found\"]","license":null,"framework":null,"parameters":null,"downloads":72875,"likes":505,"verified":0,"created_at":"2026-04-21 06:14:37","updated_at":"2026-04-26 06:20:03","source_url":"https://huggingface.co/datasets/Skylion007/openwebtext","source_platform":"huggingface","hf_repo_id":"Skylion007/openwebtext","ollama_name":"","category":"dataset","latest_version":"v1.0.0","version_count":1,"signature_count":1,"risk_level":null,"risk_score":null,"versions":[{"id":403,"model_id":404,"version":"v1.0.0","manifest_hash":"1ec1b361c3fc6c4cc10a7931fb1903bec7a663e5d416a5a141ab3c085888fbc8","file_count":0,"total_size":0,"r2_manifest_key":"manifests/datasets/skylion007--openwebtext/v1.0.0.json","created_at":"2026-04-21 06:14:37"}],"files":[],"signatures":[{"id":892,"version_id":403,"signer_did":"did:quantamrkt:registry:shield-v1","algorithm":"ML-DSA-65","signature_hex":"dba680de72ed4b21e0cb3b160a24516d918d5589626412c263ed438438101f16","attestation_type":"registry","signed_at":"2026-04-21 06:14:37"}],"hndl":null}