by datachain-ai · Codex Skill · ★ 2.7k
DataChain - Data Context Layer for Object Storage

DataChain is a data context layer for object storage. It gives AI agents and pipelines a typed, versioned, queryable view of your files (what exists, what schema it has, what's already been computed) without copying data or loading it into memory.

- Metadata queries across 100M+ files execute in milliseconds against a backend database
- Pipelines checkpoint: re-running the same script resumes compute without duplicating expensive LLM-call or ML scoring
- ... makes re-runs incremental: only new or changed files are processed
- Every ... registers a named,...
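The incremental re-run idea described above (only new or changed files are processed) can be illustrated with a small standard-library sketch. This is not DataChain's API: `incremental_run`, `expensive_score`, the fingerprint tuple, and the `s3://bucket/...` paths are all hypothetical names chosen for illustration, and the sketch assumes each file can be identified by a cheap fingerprint such as its size and ETag.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; none of these come from DataChain.
Fingerprint = Tuple[int, str]                   # (size in bytes, etag or hash)
Listing = Dict[str, Fingerprint]                # object path -> fingerprint
Results = Dict[str, Tuple[Fingerprint, object]] # path -> (fingerprint, value)


def incremental_run(
    listing: Listing,
    previous: Results,
    compute: Callable[[str], object],
) -> Tuple[Results, List[str]]:
    """Reuse cached values for unchanged files; recompute new or changed ones.

    Returns the new result set and the list of paths that were recomputed.
    Paths missing from `listing` (deleted files) are dropped from the results.
    """
    results: Results = {}
    processed: List[str] = []
    for path, fingerprint in sorted(listing.items()):
        cached = previous.get(path)
        if cached is not None and cached[0] == fingerprint:
            results[path] = cached              # unchanged: skip the expensive call
        else:
            results[path] = (fingerprint, compute(path))
            processed.append(path)
    return results, processed


calls: List[str] = []


def expensive_score(path: str) -> int:
    """Stand-in for an LLM call or ML scoring step; records each invocation."""
    calls.append(path)
    return len(path)


# First run: every file is new, so both are processed.
v1: Listing = {"s3://bucket/a.jpg": (10, "e1"), "s3://bucket/b.jpg": (20, "e2")}
first, done1 = incremental_run(v1, {}, expensive_score)

# Second run: b.jpg changed and c.jpg was added; a.jpg is reused from cache.
v2: Listing = {**v1, "s3://bucket/b.jpg": (21, "e3"), "s3://bucket/c.jpg": (5, "e4")}
second, done2 = incremental_run(v2, first, expensive_score)
```

In this sketch the second run invokes `expensive_score` only for `b.jpg` and `c.jpg`. A real system would persist the previous results in a database rather than an in-memory dictionary.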
| Field | Value |
| --- | --- |
| Stars | 2,745 |
| Forks | 144 |
| Language | Python |
| Category | Codex Skill |
| License | Apache-2.0 |
| Quality Score | 45.73/100 |
| Open Issues | 58 |
| Last Updated | 2026-05-22 |
| Created | 2024-06-25 |
| Platforms | claude-code, codex, python |
| Est. Tokens | ~1090k |
datachain is the context layer for unstructured data: typed, versioned datasets over S3, GCS, and Azure. It is categorized as a Codex Skill and has 2.7k GitHub stars.
datachain is primarily written in Python. It covers topics such as ai-agents, claude-code, codex.
You can find installation instructions and usage details in the datachain GitHub repository at github.com/datachain-ai/datachain. The project has 2.7k stars and 144 forks, indicating an active community.
datachain is released under the Apache-2.0 license, making it free to use and modify according to the license terms.