{"id":353,"slug":"roneneldan--tinystories","name":"TinyStories","author":"roneneldan","description":"Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary.\nDescribed in the following paper: https://arxiv.org/abs/2305.07759.\nThe models referred to in the paper were trained on TinyStories-train.txt (the file tinystories-valid.txt can be used for validation loss). These models can be found on Hugging Face, at roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M.\nAdditional resources:\ntinystories_all_data.tar.gz - contains a superset of… See the full description on the dataset page: https://huggingface.co/datasets/roneneldan/TinyStories.","tags":"[\"Task_categories:text-Generation\",\"Language:en\",\"Size_categories:1M<n<10M\",\"Format:parquet\",\"Modality:text\",\"Library:datasets\"]","license":null,"framework":null,"parameters":null,"downloads":97341,"likes":974,"verified":0,"created_at":"2026-04-20 23:04:32","updated_at":"2026-05-08 16:45:15","source_url":"https://huggingface.co/datasets/roneneldan/TinyStories","source_platform":"huggingface","hf_repo_id":"roneneldan/TinyStories","ollama_name":"","category":"dataset","latest_version":"v1.0.0","version_count":1,"signature_count":1,"risk_level":null,"risk_score":null,"versions":[{"id":352,"model_id":353,"version":"v1.0.0","manifest_hash":"760b9c0330fbc5e99e823c1534ac62d346eb18a0f401368835b38a433b32129b","file_count":0,"total_size":0,"r2_manifest_key":"manifests/datasets/roneneldan--tinystories/v1.0.0.json","created_at":"2026-04-20 23:04:32"}],"files":[],"signatures":[{"id":809,"version_id":352,"signer_did":"did:quantamrkt:registry:shield-v1","algorithm":"ML-DSA-65","signature_hex":"127acad8967dfd0ec1e898bfa307e25206daca2d308fe1e46a8719f417d25eff","attestation_type":"registry","signed_at":"2026-04-20 23:04:32"}],"hndl":null}