codeparrot / github-code-clean

Unverified HuggingFace

The GitHub Code clean dataset is a more heavily filtered version of the codeparrot/github-code dataset. It consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling almost 1 TB of text data.

Unverified Model

This model has not been PQC-verified. File integrity cannot be guaranteed against quantum threats.

Intended Uses

This model is registered on the QuantaMrkt quantum-safe registry but has not yet been PQC-verified.

Quick Start

# Install the CLI
pip install quantumshield

# Pull the model
quantumshield pull codeparrot/github-code-clean

# Verify file integrity
quantumshield verify codeparrot/github-code-clean
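
The pull and verify steps above can be chained so that a failure in either one is reported rather than silently ignored. The following is a minimal sketch, not part of the registry's documentation: it assumes that `quantumshield pull` and `quantumshield verify` exit with a non-zero status on failure, which this page does not state. The `pull_and_verify` function and the `QS_BIN` override variable are hypothetical names introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the quantumshield CLI exits non-zero on failure,
# which the listing above does not confirm.

# QS_BIN is a hypothetical override so the CLI binary can be swapped or stubbed.
QS_BIN="${QS_BIN:-quantumshield}"

pull_and_verify() {
  local repo="$1"

  # Stop early if the pull itself fails.
  "$QS_BIN" pull "$repo" || { echo "pull failed: $repo" >&2; return 1; }

  # Report success only when verification also succeeds.
  if "$QS_BIN" verify "$repo"; then
    echo "verified: $repo"
  else
    echo "verification failed: $repo" >&2
    return 1
  fi
}
```

After sourcing the script, `pull_and_verify codeparrot/github-code-clean` would run both steps in order. Because this entry is listed as not yet PQC-verified, the verify step may well report a failure.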

About

Created 2026-07-04
Downloads 117,130
Likes 142

Signers

V1
did:quantamrkt:regis...hield-v1