AlgorithmicResearchGroup / s2orc_full

Unverified HuggingFace

S2ORC Full: Semantic Scholar Open Research Corpus

A complete redistribution of the S2ORC dataset in Parquet format on Hugging Face, containing 14.5 million academic papers with full text, structured metadata, and citation information.

Dataset Description

S2ORC (Semantic Scholar Open Research Corpus) is a general-purpose corpus for NLP and text mining research over scientific papers, originally developed by the Allen Institute for AI. This version provides the full…

See the full description on the dataset page: https://huggingface.co/datasets/AlgorithmicResearchGroup/s2orc_full.
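Because the corpus is distributed as Parquet files on Hugging Face, one plausible way to inspect a few records without downloading all 14.5 million papers is to stream them with the `datasets` library. The following is a minimal sketch, not a documented workflow for this repository: the helper names `dataset_url` and `preview_records` are hypothetical, and the split name `"train"` is an assumption that this page does not confirm.

```python
from itertools import islice

# Repository ID as listed on this page.
REPO_ID = "AlgorithmicResearchGroup/s2orc_full"


def dataset_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face dataset page URL for a repository ID."""
    return f"https://huggingface.co/datasets/{repo_id}"


def preview_records(n: int = 3, split: str = "train", repo_id: str = REPO_ID):
    """Stream the first n records instead of downloading the whole corpus.

    Assumptions: the third-party `datasets` package is installed, network
    access is available, and a split named "train" exists (not confirmed
    by this page).
    """
    # Imported lazily so the module can be loaded without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `preview_records(3)` would return a list of up to three record dictionaries. The field names depend on the dataset's schema, which is not described here, so it is worth printing the keys of the first record before relying on any of them.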

Unverified Dataset

This dataset has not been PQC-verified (PQC: post-quantum cryptography). File integrity cannot be guaranteed against quantum threats.

README.md

s2orc_full

Intended Uses

This dataset is registered on the QuantaMrkt quantum-safe registry. It has not yet been PQC-verified.

Quick Start

# Install the CLI
pip install quantumshield

# Pull the dataset
quantumshield pull AlgorithmicResearchGroup/s2orc_full

# Verify file integrity
quantumshield verify AlgorithmicResearchGroup/s2orc_full
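
The pull and verify steps above can also be scripted. The sketch below wraps the two commands exactly as they appear in Quick Start and adds nothing to the CLI's interface. The helper names `build_commands` and `pull_and_verify` are hypothetical, and treating a non-zero exit code as failure is an assumption, since this page does not document the CLI's exit codes.

```python
import shutil
import subprocess

# Repository ID as listed on this page.
REPO_ID = "AlgorithmicResearchGroup/s2orc_full"


def build_commands(repo_id: str = REPO_ID) -> list[list[str]]:
    """Return the Quick Start pull and verify invocations as argument lists."""
    return [
        ["quantumshield", "pull", repo_id],
        ["quantumshield", "verify", repo_id],
    ]


def pull_and_verify(repo_id: str = REPO_ID) -> bool:
    """Run pull, then verify, stopping at the first failure.

    Returns False if the CLI is not on PATH or a command exits non-zero.
    Assumption: a non-zero exit code signals failure.
    """
    if shutil.which("quantumshield") is None:
        return False
    for cmd in build_commands(repo_id):
        # Argument lists avoid shell quoting issues; output goes to the terminal.
        if subprocess.run(cmd).returncode != 0:
            return False
    return True
```

Running verify only after a successful pull keeps a failed download from being reported as a verification failure.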

About

Created 2026-06-30
Downloads 52,665
Likes 0

Signers

V1
did:quantamrkt:regis...hield-v1