datachain

About datachain

DataChain is a data context layer for object storage. It gives AI agents and pipelines a typed, versioned, queryable view of your files (what exists, what schema it has, what has already been computed) without copying data or loading it into memory.

Metadata queries across 100M+ files execute in milliseconds against a backend database.
Pipelines checkpoint: re-running the same script resumes compute without duplicating expensive LLM calls or ML scoring.
Re-runs are incremental: only new or changed files are processed.
Every [...] registers a named,...
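The incremental re-run behavior described above can be illustrated with a minimal, standard-library-only sketch. This is not DataChain's API or implementation: `FileRecord`, `incremental_run`, `expensive_score`, and the in-memory `cache` dict are hypothetical names chosen for illustration, and the sketch assumes each file carries a fingerprint (here called `etag`) that changes whenever its content changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for one object in a bucket."""
    path: str
    etag: str  # assumed to change whenever the object's content changes


# Maps (path, etag) to a previously computed result.
Cache = Dict[Tuple[str, str], float]


def incremental_run(
    files: Iterable[FileRecord],
    score: Callable[[FileRecord], float],
    cache: Cache,
) -> List[str]:
    """Score only files whose (path, etag) pair is not yet cached.

    Returns the paths that were actually processed in this run.
    """
    processed = []
    for record in files:
        key = (record.path, record.etag)
        if key in cache:
            continue  # unchanged file: reuse the earlier result
        cache[key] = score(record)
        processed.append(record.path)
    return processed


def expensive_score(record: FileRecord) -> float:
    """Placeholder for an LLM call or ML scoring step."""
    return float(len(record.path))


cache: Cache = {}

# First run: nothing is cached, so both files are processed.
first = incremental_run(
    [FileRecord("a.jpg", "v1"), FileRecord("b.jpg", "v1")],
    expensive_score,
    cache,
)

# Second run: a.jpg is unchanged, b.jpg has new content, c.jpg is new.
second = incremental_run(
    [FileRecord("a.jpg", "v1"), FileRecord("b.jpg", "v2"), FileRecord("c.jpg", "v1")],
    expensive_score,
    cache,
)

print(first)   # ['a.jpg', 'b.jpg']
print(second)  # ['b.jpg', 'c.jpg']
```

Keying the cache on both path and fingerprint is what lets an unchanged file be skipped while a modified one is recomputed. A real system would persist this state rather than hold it in a dict.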

ai-agents claude-code codex data-context-layer data-processing harness-engineering knowledge-base mlops multimodal pydantic

Quick Facts

Stars	2,745
Forks	144
Language	Python
Category	Codex Skill
License	Apache-2.0
Quality Score	45.73/100
Open Issues	58
Last Updated	2026-05-22
Created	2024-06-25
Platforms	claude-code, codex, python
Est. Tokens	~1090k

Compatible Skills

These tools work well together with datachain for enhanced workflows:

nexent — semantic(0.23)+complementary+rare_topics+same_lang+similar_pop+shared_platform (62%)
dlt — semantic(0.47)+complementary+same_lang+similar_pop+shared_platform (62%)
hive — semantic(0.19)+complementary+rare_topics+same_lang+similar_pop+shared_platform (61%)
DataDesigner — semantic(0.44)+complementary+same_lang+similar_pop+shared_platform (60%)
datagouv-mcp — semantic(0.26)+complementary+same_lang+similar_pop+shared_platform (59%)

More Codex Skill Tools

Explore other popular Codex Skill tools:

openclaw ⭐ 374.0k
hermes-agent ⭐ 162.8k
ui-ux-pro-max-skill ⭐ 57.6k
graphify ⭐ 51.5k
open-design ⭐ 49.7k
awesome-openclaw-skills ⭐ 49.2k
cherry-studio ⭐ 46.1k
siyuan ⭐ 44.1k
nanobot ⭐ 43.0k
system_prompts_leaks ⭐ 40.5k


Popular Python Agent Tools

AutoGPT ⭐ 184.5k · Agent Tool
hermes-agent ⭐ 162.8k · Codex Skill
langflow ⭐ 148.7k · Agent Tool
open-webui ⭐ 138.2k · MCP Server
skills ⭐ 137.5k · Claude Skill

Frequently Asked Questions

What is datachain?

datachain is a context layer for unstructured data: typed, versioned datasets over S3, GCS, and Azure. It is categorized as a Codex Skill and has 2.7k GitHub stars.

What programming language is datachain written in?

datachain is primarily written in Python. It covers topics such as ai-agents, claude-code, codex.

How do I install or use datachain?

Installation instructions and usage details are in the datachain GitHub repository at github.com/datachain-ai/datachain. The project has 2.7k stars and 144 forks.

What license does datachain use?

datachain is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
