datachain

by datachain-ai · Codex Skill · ★ 2.7k

About datachain

DataChain is a data context layer for object storage. It gives AI agents and pipelines a typed, versioned, queryable view of your files (what exists, what schema it has, what's already been computed) without copying data or loading it into memory.

  • Metadata queries across 100M+ files execute in milliseconds against a backend database
  • Pipelines checkpoint: re-running the same script resumes compute without duplicating expensive LLM-call or ML scoring
  • Re-runs are incremental: only new or changed files are processed
  • Every registers a named,...
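The incremental re-run idea described above (only new or changed files are processed) can be illustrated with a small standard-library sketch. This is not DataChain's API: the function names `fingerprint` and `process_incrementally`, the in-memory `cache` dict, and the sample file names are all hypothetical, chosen only to show the general technique of skipping work whose input has not changed.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a content hash used to detect whether a file has changed."""
    return hashlib.sha256(content).hexdigest()


def process_incrementally(files, cache, compute):
    """Run `compute` only on files that are new or whose content changed.

    files:   dict mapping path -> bytes (stand-in for objects in storage)
    cache:   dict mapping path -> (fingerprint, result) from earlier runs
    compute: the expensive function, e.g. an LLM call or ML scoring step
    Returns the list of paths that were actually processed in this run.
    """
    processed = []
    for path, content in files.items():
        fp = fingerprint(content)
        cached = cache.get(path)
        if cached is not None and cached[0] == fp:
            continue  # unchanged since the last run: reuse the cached result
        cache[path] = (fp, compute(content))
        processed.append(path)
    return processed


# Hypothetical demo data: two files on the first run.
files = {"a.txt": b"hello", "b.txt": b"world"}
cache = {}
first_run = process_incrementally(files, cache, len)

# Before the second run, one file changes and one file is added.
files["b.txt"] = b"world!"
files["c.txt"] = b"new"
second_run = process_incrementally(files, cache, len)
```

In this sketch the first run processes both files, while the second run touches only the changed `b.txt` and the new `c.txt`. A real system would persist the cache alongside the dataset rather than keep it in memory.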

ai-agents · claude-code · codex · data-context-layer · data-processing · harness-engineering · knowledge-base · mlops · multimodal · pydantic

Quick Facts

Stars: 2,745
Forks: 144
Language: Python
Category: Codex Skill
License: Apache-2.0
Quality Score: 45.73/100
Open Issues: 58
Last Updated: 2026-05-22
Created: 2024-06-25
Platforms: claude-code, codex, python
Est. Tokens: ~1090k

Compatible Skills

These tools work well together with datachain for enhanced workflows:

  • nexent — semantic(0.23)+complementary+rare_topics+same_lang+similar_pop+shared_platform (62%)
  • dlt — semantic(0.47)+complementary+same_lang+similar_pop+shared_platform (62%)
  • hive — semantic(0.19)+complementary+rare_topics+same_lang+similar_pop+shared_platform (61%)
  • DataDesigner — semantic(0.44)+complementary+same_lang+similar_pop+shared_platform (60%)
  • datagouv-mcp — semantic(0.26)+complementary+same_lang+similar_pop+shared_platform (59%)

Frequently Asked Questions

What is datachain?

datachain is a context layer for unstructured data: typed, versioned datasets over S3, GCS, and Azure. It is categorized as a Codex Skill and has 2.7k GitHub stars.

What programming language is datachain written in?

datachain is primarily written in Python. It covers topics such as ai-agents, claude-code, and codex.

How do I install or use datachain?

You can find installation instructions and usage details in the datachain GitHub repository at github.com/datachain-ai/datachain. The project has 2.7k stars and 144 forks, indicating an active community.

What license does datachain use?

datachain is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
