MCP Server Introduction

We’re excited to announce a major new capability in the dbt Power User VSCode extension: an embedded Model Context Protocol (MCP) server. Built directly into the extension, the server acts as a bridge between AI-powered development tools and your dbt project. The motivation is to enable seamless communication between AI assistants (like those in the Cursor IDE) and dbt, eliminating the friction of context switching. In practical terms, your AI coding assistant can query your dbt project’s schema, compile models, run tests, and more – all through a standardized protocol – leading to faster development workflows and greater efficiency for dbt developers.

By leveraging the MCP server, dbt Power User can expose rich project context and operations to AI agents in real-time. This empowers data engineers to automate repetitive tasks (like looking up column definitions or running model builds) and accelerates development. The embedded MCP server unlocks new AI-driven workflows for dbt, from intelligent autocompletion to on-demand documentation generation, all while keeping the interaction localized and secure within your development environment.

Tools Provided by the MCP Server

The MCP server exposes a robust set of tools that the AI assistant can use. These tools cover most day-to-day dbt tasks and are grouped into a few categories for clarity. Below is a breakdown of major capabilities and examples of tools in each category:

Project & Environment Management Tools

  • Project Information: The AI can retrieve high-level info about the project, such as the project name and the current target (e.g. development vs production profile). For instance, a GET_PROJECT_NAME tool returns the dbt project name, and GET_SELECTED_TARGET returns which target/profile is active. This helps the AI understand which environment it’s working with (a registration sketch follows this list).

  • List/Reload Projects: (Planned) As we extend to multi-workspace scenarios, the AI could list open dbt projects or switch between them. In the current release, the focus is on the active project within the VS Code workspace, but the foundation is laid for managing multiple projects.
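
To make the registration model concrete, here is a minimal sketch of how tools like these might be defined using the official MCP TypeScript SDK. The tool names, the project object, and the handler bodies are illustrative assumptions, not the extension’s actual source:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

// Hypothetical handle onto the extension's in-memory dbt project state.
declare const project: { name: string; selectedTarget: string };

const server = new McpServer({ name: "dbt-power-user", version: "1.0.0" });

// Tool names mirror the ones discussed above; the extension's exact
// definitions may differ.
server.tool("get_project_name", "Return the dbt project name", async () => ({
  content: [{ type: "text", text: project.name }],
}));

server.tool(
  "get_selected_target",
  "Return the currently selected target/profile",
  async () => ({
    content: [{ type: "text", text: project.selectedTarget }],
  })
);
```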

Model and Source Exploration Tools

  • Model SQL and Schema: Tools in this category allow the AI to fetch the content or properties of dbt models and sources. For example, a GET_MODEL_SQL tool can return the raw SQL of a model file, and a GET_SOURCE_DEFINITION tool could return details about a source (like the database/schema and the columns defined in YAML). With these, the AI can, for example, read the logic of a model to help debug it or suggest changes.
  • Lineage and Dependency Queries: We provide tools like GET_CHILDREN_MODELS and GET_PARENT_MODELS, which return the immediate downstream and upstream models of a given model. The AI can use these to understand the dependency graph – for instance, finding what will be impacted if a model is changed, or which upstream data sources feed into it. Internally, these tools leverage the manifest graph in memory to quickly compute the relationships. Additionally, there are count-based queries (like a tool to count the number of models depending on a source) to aid impact analysis. A client-side sketch of a lineage query appears after this list.
  • Column Lineage and Documentation: (Upcoming) The extension already has features for column-level lineage and documentation. Through MCP, we intend to expose tools so the AI can answer questions like “which models use column X from source Y” or even retrieve documentation strings for a model or column. This will bridge the gap between documentation and code by letting the AI pull in documented context as needed.
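
To illustrate the client side, the following sketch shows how an MCP client could invoke a lineage tool over SSE with the MCP TypeScript SDK. The port, the tool name, and the argument key are assumptions chosen for the example:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

// The port and SSE path are assumptions; the extension chooses its own.
const transport = new SSEClientTransport(new URL("http://localhost:7890/sse"));
const client = new Client({ name: "lineage-demo", version: "0.1.0" });
await client.connect(transport);

// Ask for the immediate downstream models of a hypothetical `stg_orders`.
const result = await client.callTool({
  name: "get_children_models",
  arguments: { table: "stg_orders" },
});
console.log(result.content); // e.g. a text block listing dependent models
```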

SQL Compilation & Execution Tools

  • Compile SQL: The AI might want to see how dbt renders a model’s SQL with Jinja templating. The COMPILE_QUERY tool takes a model name (and optionally custom SQL) and returns the compiled SQL, exactly as dbt would generate it. This is extremely useful for checking logic or debugging macros – the AI can see the SQL that would run on the warehouse without actually running it. Under the hood, it uses dbt’s compilation engine via the extension’s interface to dbt Core (a sketch of the compile-preview-run flow follows this list).

  • Execute Arbitrary SQL: Sometimes the AI will need to run a query to answer a question (for example, “preview the first 10 rows of model X”). The EXECUTE_SQL tool (with a safe row limit) allows running a SQL query against the project’s data warehouse connection and returns results. In our implementation, EXECUTE_SQL_WITH_LIMIT is used to run a given query (or select from a model) with a default limit, ensuring we don’t pull massive datasets by accident. The AI could use this to validate assumptions or provide sample data to the user. Results are streamed back over SSE – if a query takes a while, the user can see partial progress or simply await the final output.

  • Run dbt Models: For full pipeline execution, we offer tools to run or build models. The RUN_MODEL tool triggers dbt run for a specific model (and optionally its dependencies), while BUILD_MODEL can run the model and its tests (equivalent to dbt build for that model). There’s also a BUILD_PROJECT for running the entire project (dbt build). These commands execute in the background through the extension’s job runner. The AI can thus initiate a model run or a full project build from within the conversation – for example, “Re-run the revenue model to see if the issue is fixed.” The SSE channel streams the build logs and final status back to the AI, which can then inform the user of success or failures.
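
Below is a hedged sketch of the compile-preview-run flow described in this list. The tool names and argument shapes are assumptions modeled on the descriptions above, and the client is assumed to be connected as in the earlier lineage sketch:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Assume a client already connected to the extension's MCP endpoint.
declare const client: Client;

// 1. Render the model's Jinja into raw SQL without touching the warehouse.
const compiled = await client.callTool({
  name: "compile_query",
  arguments: { model: "fct_revenue" },
});
console.log(compiled.content);

// 2. Preview a few rows, relying on the server-side row limit guard rail.
const preview = await client.callTool({
  name: "execute_sql_with_limit",
  arguments: { query: "select * from {{ ref('fct_revenue') }}", limit: 10 },
});
console.log(preview.content);

// 3. Trigger an actual dbt run of the model; logs stream back over SSE.
await client.callTool({
  name: "run_model",
  arguments: { model: "fct_revenue" },
});
```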

Testing and Validation Tools

  • Execute Tests: The server provides a RUN_TEST tool to execute a specific dbt test by name, as well as RUN_MODEL_TEST to run all tests associated with a particular model. This allows the AI to verify data quality or detect whether recent changes broke any tests. For instance, after modifying a model, the AI might call RUN_MODEL_TEST to ensure all tests still pass and then report the results. Test results or failures are returned via the event stream (see the sketch after this list).
  • SQL Validation: Even without running on the warehouse, the extension can do a lightweight SQL parse/validation. A tool is available to validate a SQL query (checking for syntax errors, missing refs, etc.) without executing it. This is similar to the extension’s existing SQL Lint feature, now accessible to the AI. It can prevent the AI from running a bad query by first validating it and catching errors.
  • Performance Analysis: (Planned) We aim to add tools that use dbt’s compilation stats or query planner insights to help the AI assist in performance tuning. For example, a tool to show the compiled query explain plan (if the warehouse supports EXPLAIN) or to estimate the cost of a query. Such tools would let the AI caution the user if a query might be expensive or if a model lacks proper pruning filters, etc.
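
As a sketch of how an assistant might chain these tools – validate first, then run the model’s tests – assuming the same connected client and illustrative tool names as in the earlier examples:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

declare const client: Client; // connected as in earlier sketches

// Check syntax and refs before anything touches the warehouse.
const validation = await client.callTool({
  name: "validate_sql",
  arguments: { query: "select order_id, amount from {{ ref('stg_orders') }}" },
});
console.log(validation.content);

// Run every test attached to the model and report pass/fail to the user.
const tests = await client.callTool({
  name: "run_model_test",
  arguments: { model: "stg_orders" },
});
console.log(tests.content);
```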

Package Management and Repository Tools

  • Install Packages: Managing dbt packages is made easier with tools like INSTALL_DEPS, which runs dbt deps to pull the latest packages listed in packages.yml. There’s also an INSTALL_DBT_PACKAGES tool that can add new packages on the fly. For example, if the AI suggests using a package (like dbt-utils), it could call INSTALL_DBT_PACKAGES with the package name – the extension will add it to packages.yml and run dbt deps internally to install it, returning the result (see the sketch after this list). This showcases agentic behavior: the AI can not only suggest but also take action to modify your project structure (with your permission).
  • Upgrade dbt Version: Although not explicitly a tool in this release, the infrastructure could allow the AI to assist in upgrading the dbt version or dependencies – for example, by reading requirements.txt or checking the installed dbt version and then running pip installs. This would likely be added as a safe tool in the future, once we ensure such actions are gated behind user approval.
  • Git Operations: While primarily focused on dbt tasks, the larger vision includes enabling some project-level Git interactions via MCP (for example, checking out a new branch for a fix, or opening a pull request). The current server does not directly include Git tools, but it lays the groundwork for integrating such capabilities alongside dbt tasks, so the AI could orchestrate end-to-end workflows (code change -> run model -> test -> commit).
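
A rough sketch of the agentic install flow, again with an assumed tool name and argument shape:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

declare const client: Client; // connected as in earlier sketches

// The extension would append the package to packages.yml and run `dbt deps`,
// returning the installer output as the tool result.
const result = await client.callTool({
  name: "install_dbt_packages",
  arguments: { packages: ["dbt-labs/dbt_utils"] },
});
console.log(result.content);
```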

All these tools are defined in the extension and registered with the MCP server at startup. When Cursor’s AI is formulating a plan to help you, it queries the MCP server for an index of available tools (the MCP protocol allows the client to discover tools and their input schema). The AI sees something like “I have a tool called get_children_models that needs a table name” and it can then decide to use it if the user asks a question about model dependencies.
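
With the MCP TypeScript SDK, that discovery step is a single listTools call. The sketch below assumes a client connected to the extension’s server as in the earlier examples:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

declare const client: Client; // connected as in earlier sketches

// tools/list returns each tool's name, description, and JSON Schema for its
// inputs – exactly the index the AI uses to plan which tool to call.
const { tools } = await client.listTools();
for (const tool of tools) {
  console.log(tool.name, tool.description, tool.inputSchema);
}
```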