DefendableDatasets docs
DefendableDatasets is a static-first dataset store, registry, graph browser, selector, verifier, and export system for AI builders. The current corpora were curated on a sovereign bare-metal RTX 6000 fleet and on RTX 3090 systems.
Registry Schema
Registry JSON lives under /data/registry. Datasets include identity, license, formats, tasks, source summary, validation, files, hashes, receipts, examples, citation, and model compatibility.
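As an illustration, the field groups above can be checked with a few lines of Python. This is a minimal sketch, not the project's validator: the key names (source_summary, model_compatibility, and so on) and the example values are assumptions derived from the prose list, and the real registry schema may name or nest them differently.

```python
from __future__ import annotations

# Hypothetical top-level keys, one per field group named in the docs.
REQUIRED_KEYS = (
    "identity", "license", "formats", "tasks", "source_summary",
    "validation", "files", "hashes", "receipts", "examples",
    "citation", "model_compatibility",
)


def missing_registry_keys(entry: dict) -> list[str]:
    """Return the required keys that are absent from a registry entry."""
    return [key for key in REQUIRED_KEYS if key not in entry]


# A deliberately incomplete entry; all values are illustrative only.
example_entry = {
    "identity": {"id": "example_dataset", "version": "0.1.0"},
    "license": {"name": "CC-BY-4.0", "commercial_use": True},
    "formats": ["jsonl"],
    "tasks": ["classification"],
}

print(missing_registry_keys(example_entry))
```

A check like this only confirms that each field group is present; it says nothing about whether the values inside are well formed.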
Graph Schema
The graph contains DOMAIN, CATEGORY, DATASET, VERSION, FILE, LICENSE, FORMAT, TASK, and RECEIPT nodes connected by typed edges such as CONTAINS, HAS_FILE, LICENSED_AS, AVAILABLE_AS, SUPPORTS_TASK, and VERIFIED_BY.
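One way to picture typed edges is as a table of allowed (source type, target type) pairs per edge type. The sketch below assumes such a table. The node and edge type names come from the list above, but the endpoint pairs in EDGE_RULES are guesses, since the docs do not say which node types each edge connects.

```python
from dataclasses import dataclass

NODE_TYPES = {
    "DOMAIN", "CATEGORY", "DATASET", "VERSION", "FILE",
    "LICENSE", "FORMAT", "TASK", "RECEIPT",
}

# Assumed endpoint pairs for each edge type; adjust to the real graph schema.
EDGE_RULES = {
    "CONTAINS": {("DOMAIN", "CATEGORY"), ("CATEGORY", "DATASET")},
    "HAS_FILE": {("DATASET", "FILE"), ("VERSION", "FILE")},
    "LICENSED_AS": {("DATASET", "LICENSE")},
    "AVAILABLE_AS": {("DATASET", "FORMAT")},
    "SUPPORTS_TASK": {("DATASET", "TASK")},
    "VERIFIED_BY": {("DATASET", "RECEIPT"), ("FILE", "RECEIPT")},
}


@dataclass(frozen=True)
class Edge:
    kind: str
    source_type: str
    target_type: str


def is_valid_edge(edge: Edge) -> bool:
    """True when the edge type is known and its endpoints match an allowed pair."""
    allowed = EDGE_RULES.get(edge.kind, set())
    return (edge.source_type, edge.target_type) in allowed


print(is_valid_edge(Edge("LICENSED_AS", "DATASET", "LICENSE")))
print(is_valid_edge(Edge("LICENSED_AS", "FILE", "TASK")))
```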
How to Add a Dataset
Add a registry entry, then create a dataset folder under /datasets/[domain]/[dataset_id]. In that folder, include manifest.json, dataset.card.md, samples, receipts, and split files where licensing allows.
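Before opening a pull request, a quick local check of the folder layout can catch missing files. The helper below is a hypothetical sketch: it checks only the two files the docs name explicitly, because the layout of samples, receipts, and split files is not specified here, and the "finance" domain and "example_dataset" ID are placeholders.

```python
from __future__ import annotations

from pathlib import Path

# Only the two file names stated in the docs; other contents are not checked.
REQUIRED_FILES = ("manifest.json", "dataset.card.md")


def missing_dataset_files(root: Path, domain: str, dataset_id: str) -> list[str]:
    """Return required file names missing from <root>/datasets/<domain>/<dataset_id>."""
    folder = root / "datasets" / domain / dataset_id
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# Placeholder domain and dataset ID, checked relative to the current directory.
print(missing_dataset_files(Path("."), "finance", "example_dataset"))
```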
How to Export a Pack
Use the graph, registry, or detail page to add datasets to the pack. The pack page exports pack.manifest.json, hf_dataset_card.md, fine_tune_manifest.json, sha256_manifest.json, and README snippets.
How Receipts Work
Receipts are proof objects that describe hashes, validation runs, license checks, and provenance summaries; Merkle proofs may be added in the future. An entry counts as verified only if it has receipt records and file-level SHA256 hashes.
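To make the file-level hashing concrete, here is a small sketch that computes SHA256 digests and re-checks them later. The hashing uses Python's standard hashlib; the receipt shape (type, algorithm, and files keys) is an assumption for illustration and not the project's actual receipt format.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_hash_receipt(paths: list[Path]) -> dict:
    """Build a hypothetical hash receipt mapping file names to SHA256 digests."""
    return {
        "type": "hash",
        "algorithm": "sha256",
        "files": {path.name: sha256_file(path) for path in sorted(paths)},
    }


def verify_hash_receipt(receipt: dict, folder: Path) -> list[str]:
    """Return names of files that are missing or whose hash no longer matches."""
    return [
        name
        for name, expected in receipt["files"].items()
        if not (folder / name).is_file() or sha256_file(folder / name) != expected
    ]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    sample = folder / "train.jsonl"
    sample.write_bytes(b'{"text": "hello"}\n')
    receipt = build_hash_receipt([sample])
    # Right after hashing, no file should be reported as mismatched.
    print(verify_hash_receipt(receipt, folder))
```

Keying the receipt by bare file name assumes names are unique within a dataset folder; a real receipt would more likely record paths relative to the dataset root.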
License Policy
Every dataset must declare a license and whether commercial use is allowed. Packs warn when gated research or attribution licenses are mixed into exports.
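The pack warning can be sketched as a simple scan over the licenses in a pack. This example reads the policy as covering three license classes (gated, research, and attribution), which is one interpretation of the sentence above. The class labels, the LicenseInfo type, and the message wording are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed license classes that should trigger a warning in a pack.
WARN_CLASSES = {"gated", "research", "attribution"}


@dataclass(frozen=True)
class LicenseInfo:
    dataset_id: str
    license_class: str   # hypothetical labels, e.g. "permissive" or "research"
    commercial_use: bool  # every dataset must declare this


def pack_warnings(licenses: list[LicenseInfo]) -> list[str]:
    """Return one warning per dataset whose license class needs attention."""
    return [
        f"{info.dataset_id}: {info.license_class} license in pack"
        for info in licenses
        if info.license_class in WARN_CLASSES
    ]


pack = [
    LicenseInfo("dataset_a", "permissive", True),
    LicenseInfo("dataset_b", "research", False),
]
print(pack_warnings(pack))
```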
Download Quotas
Public metadata remains open. Production file delivery should go through the Cloudflare Worker download gate, which limits each email address to 500 successful file downloads per rolling 30-day window.
Quality Foundry
The defdata Python CLI turns raw JSONL into schema-valid, deduped, graded, split, hashed, manifest-backed packages with stage receipts. Tiers are royal_jelly, honey, jelly, and propolis.
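Two of the stages named above, deduplication and splitting, can be sketched without knowing the CLI internals. The code below is an illustrative stand-in, not defdata's implementation: it dedupes on a SHA256 of canonical JSON and assigns splits deterministically from that hash. The split names ("train" and "eval") and the 10 percent eval share are assumptions, and grading into tiers is left out because the thresholds are not documented here.

```python
from __future__ import annotations

import hashlib
import json


def record_key(record: dict) -> str:
    """SHA256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each exact-duplicate record."""
    seen: set[str] = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def assign_split(record: dict, eval_percent: int = 10) -> str:
    """Deterministic split: the same record always lands in the same split."""
    bucket = int(record_key(record)[:8], 16) % 100
    return "eval" if bucket < eval_percent else "train"


rows = [{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 3}]
for row in dedupe(rows):
    print(assign_split(row), row)
```

Hash-based splitting keeps assignments stable across reruns, so adding new records does not reshuffle existing ones between splits.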
Hack Edge Reviewer
Hack is registered as node_hack_orin / worker_hack with model_lfm2_5_8b_a1b for edge-volume review. Use defdata grade --reviewer hack with the finance rubric for WACC and related referee passes.
CLI
Use defendable-datasets validate, defendable-datasets hash, and defendable-datasets pack to check registry integrity, generate SHA256 receipts, and create pack manifests before opening pull requests.