S2ORC Full: Semantic Scholar Open Research Corpus
A complete redistribution of the S2ORC dataset in Parquet format on Hugging Face, containing 14.5 million academic papers with full text, structured metadata, and citation information.
Dataset Description
S2ORC (Semantic Scholar Open Research Corpus) is a general-purpose corpus for NLP and text mining research over scientific papers, originally developed by the Allen Institute for AI. This version provides the full…
See the full description on the dataset page: https://huggingface.co/datasets/AlgorithmicResearchGroup/s2orc_full.
Use this model
Pull with QuantumShield
quantumshield pull AlgorithmicResearchGroup/s2orc_full
Verify integrity
quantumshield verify AlgorithmicResearchGroup/s2orc_full
pip install
pip install quantumshield && quantumshield pull AlgorithmicResearchGroup/s2orc_full
Unverified Model
This model has not been PQC-verified. File integrity cannot be guaranteed against quantum threats.
Intended Uses
This model is registered on the QuantaMrkt quantum-safe registry. This model has not yet been PQC-verified.
Quick Start
# Install the CLI
pip install quantumshield
# Pull the model
quantumshield pull AlgorithmicResearchGroup/s2orc_full
# Verify file integrity
quantumshield verify AlgorithmicResearchGroup/s2orc_full
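The Quick Start steps can also be scripted. The following is a minimal sketch in Python, assuming the three commands listed above are the only ones needed and that quantumshield is reachable on PATH once installed. The helper names quick_start_commands and run_quick_start are hypothetical and are not part of the QuantumShield CLI. By default the script only prints the commands (dry run), so nothing is installed or downloaded unless requested.

```python
from __future__ import annotations

import shlex
import subprocess

# Repository identifier taken from the Quick Start commands.
REPO_ID = "AlgorithmicResearchGroup/s2orc_full"


def quick_start_commands(repo_id: str = REPO_ID) -> list[list[str]]:
    """Return the Quick Start commands as argument lists (hypothetical helper)."""
    return [
        ["pip", "install", "quantumshield"],   # install the CLI
        ["quantumshield", "pull", repo_id],    # pull the files
        ["quantumshield", "verify", repo_id],  # verify file integrity
    ]


def run_quick_start(repo_id: str = REPO_ID, dry_run: bool = True) -> list[str]:
    """Print each command in order; execute it only when dry_run is False.

    With dry_run=False, check=True makes subprocess.run raise
    CalledProcessError on the first failing command, so later steps
    (for example the verify step) never run after a failed pull.
    """
    printed = []
    for cmd in quick_start_commands(repo_id):
        line = shlex.join(cmd)  # shell-safe rendering, for display only
        print(line)
        printed.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return printed
```

Calling run_quick_start() prints the three commands without running them; run_quick_start(dry_run=False) executes them in order and stops at the first failure.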