The GitHub Code Clean dataset is a more filtered version of the codeparrot/github-code dataset. It consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling almost 1TB of text data.
Use this model
Pull with QuantumShield
quantumshield pull codeparrot/github-code-clean
Verify integrity
quantumshield verify codeparrot/github-code-clean
pip install
pip install quantumshield && quantumshield pull codeparrot/github-code-clean
Unverified Model
This model has not been PQC-verified. File integrity cannot be guaranteed against quantum threats.
README.md
github-code-clean
The GitHub Code Clean dataset is a more filtered version of the codeparrot/github-code dataset. It consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling almost 1TB of text data.
Intended Uses
This model is registered on the QuantaMrkt quantum-safe registry. This model has not yet been PQC-verified.
Quick Start
# Install the CLI
pip install quantumshield
# Pull the model
quantumshield pull codeparrot/github-code-clean
# Verify file integrity
quantumshield verify codeparrot/github-code-clean
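For scripted setups, the pull and verify steps can be chained so that verification only runs after a successful pull. The following is a minimal sketch, not an official client: the helper names `build_commands` and `pull_and_verify` are hypothetical, and it assumes the `quantumshield` CLI reports failure through a non-zero exit status, which the listing does not state.

```python
import shutil
import subprocess

REPO = "codeparrot/github-code-clean"  # repository name taken from the listing


def build_commands(repo: str) -> list[list[str]]:
    """Return the pull and verify invocations shown in Quick Start."""
    return [
        ["quantumshield", "pull", repo],
        ["quantumshield", "verify", repo],
    ]


def pull_and_verify(repo: str) -> bool:
    """Run pull, then verify, stopping at the first failing command.

    Assumption: the CLI signals failure with a non-zero exit status.
    """
    if shutil.which("quantumshield") is None:
        print("quantumshield not found; run `pip install quantumshield` first")
        return False
    for cmd in build_commands(repo):
        if subprocess.run(cmd).returncode != 0:
            print("command failed:", " ".join(cmd))
            return False
    return True
```

Calling `pull_and_verify(REPO)` runs both steps in order and returns `False` as soon as one of them fails or the CLI is missing.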