Introduction
For decades, organizations have been forced to maintain two separate systems for their data. One database handles transactional workloads (inserts, updates, and point lookups), while a separate analytical platform handles aggregations, reporting, and machine learning pipelines. Moving data from one to the other meant building and maintaining ETL jobs, accepting latency, and managing yet another layer of infrastructure.
That architectural compromise made sense when applications changed slowly and AI was a research topic. It no longer does. AI agents and real-time applications need both transactional and analytical data at once, without a pipeline in between. Databricks recognized this gap and responded with Lakebase, a fully managed, PostgreSQL-compatible operational database built directly into the Databricks Data Intelligence Platform.
Databricks Lakebase was announced at the Data + AI Summit in June 2025, entered Public Preview shortly after, and reached General Availability on 3 February 2026. It is currently available on AWS and Azure across 14 regions worldwide. This article walks through what Lakebase actually is, how its architecture works, how to set it up, and what you can realistically build with it.
The Problem Lakebase Is Solving
To understand why Lakebase matters, it helps to be precise about the problem.
A traditional OLTP database such as PostgreSQL, MySQL, or Oracle is optimized for short, high-frequency transactions. It keeps data highly normalized, uses row-based storage, and is tuned for single-record access patterns. It is not designed to run analytical queries across hundreds of millions of rows.
A data lakehouse or warehouse such as Databricks SQL, Snowflake, or BigQuery is optimized in the opposite direction. It uses columnar storage formats such as Delta Lake or Parquet, is built for parallel scans and aggregations, and excels at queries that read large volumes of data. It handles transactions poorly, and its query latency is typically measured in seconds, not milliseconds.
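The difference between row-based and columnar layouts fits in a few lines of Python. This is a deliberately tiny, in-memory illustration with made-up order data, not how any particular engine lays out pages: a point lookup wants every field of one record together, while an aggregate only needs the `amount` column.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # (order_id, customer, amount): made-up example data

# Row layout, as an OLTP engine stores it: each record's fields sit together.
rows: List[Row] = [
    (1, "alice", 120.0),
    (2, "bob", 35.5),
    (3, "alice", 80.0),
]


def lookup_order(orders: List[Row], order_id: int) -> Row:
    # A point lookup returns one whole record (a real engine would use an index).
    return next(r for r in orders if r[0] == order_id)


# Columnar layout, as a lakehouse format stores it: each field sits together.
columns: Dict[str, list] = {
    "order_id": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount(cols: Dict[str, list]) -> float:
    # An aggregate over one field reads only that field's column.
    return sum(cols["amount"])


print(lookup_order(rows, 2))  # (2, 'bob', 35.5)
print(total_amount(columns))  # 235.5
```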
The result is an architecture in which operational and analytical systems are separate. Teams build ETL pipelines with tools like Fivetran, Airbyte, or custom Spark jobs to move data between them. That pipeline adds latency, introduces consistency risks, and carries its own maintenance burden: when the source schema changes, the pipeline often breaks.
For traditional BI use cases, this was an acceptable trade-off. For AI applications, where an agent may need to read recent transaction history, run a scoring model, and write a decision back to the operational system in a single workflow, it is not.
Lakebase collapses the two systems into one. It brings a fully managed PostgreSQL engine to the Databricks platform, enabling transactional and analytical workloads to share the same data, governance layer, and compute platform.
What Is Lakebase, Technically?
Lakebase is a serverless PostgreSQL database integrated natively with the Databricks Lakehouse. Under the hood, it is powered by Neon, a cloud-native Postgres engine that Databricks acquired for $1 billion in May 2025. Neon was built from the ground up with a separate compute-and-storage architecture, the key technical property that sets Lakebase apart from a standard managed Postgres service.
In a conventional PostgreSQL deployment, compute and storage are tightly coupled. The database server owns the storage layer, which means scaling compute requires either over-provisioning storage or running complex replication setups. Neon separates the two completely. The PostgreSQL compute layer (the processes that parse SQL, plan queries, and execute transactions) scales independently from the storage layer (where the actual data pages live). This architecture is what allows Lakebase to deliver sub-10ms query latency at over 10,000 queries per second while also scaling down to zero when idle.
PostgreSQL 16 Compatibility
Lakebase runs PostgreSQL 16. This is significant because PostgreSQL has a mature, well-understood ecosystem. Developers can connect using any standard PostgreSQL client, such as psql, pgAdmin, DBeaver, or any application built on libpq. Connection strings, drivers, and ORMs that work with standard Postgres work with Lakebase without modification. Port 5432, standard JDBC/ODBC drivers, and standard SSL certificate authentication all behave as expected.
This is not a “Postgres-compatible” system in the loose sense that some databases use the term. It is PostgreSQL, running the PostgreSQL 16 engine, with the full SQL dialect, stored procedures, triggers, and extension support.
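Because standard PostgreSQL tooling accepts a libpq connection URI, one convenient way to hand a host, database, and credentials to psql, `psycopg2.connect()`, or SQLAlchemy's `create_engine()` is a single URI string. The helper below is a minimal sketch: the function name `build_postgres_uri` and the example host are hypothetical, and it assumes password authentication with `sslmode=require`.

```python
from urllib.parse import quote, urlencode


def build_postgres_uri(host: str, dbname: str, user: str, password: str,
                       port: int = 5432, sslmode: str = "require") -> str:
    """Return a libpq-style postgresql:// URI (hypothetical helper).

    quote(..., safe="") percent-encodes characters such as '@', ':' and '/'
    that would otherwise break the URI structure if they appear in credentials.
    """
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{userinfo}@{host}:{port}/{quote(dbname, safe='')}?{query}"


# Example values are placeholders, not real Lakebase connection details.
print(build_postgres_uri("db.example.com", "appdb", "app_user", "p@ss:word"))
# postgresql://app_user:p%40ss%3Aword@db.example.com:5432/appdb?sslmode=require
```

Percent-encoding matters in practice: an unescaped `@` in a password would make the client misread where the host begins.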
Unity Catalog Governance
Every Lakebase database instance is governed by Unity Catalog, Databricks’ unified data governance layer. This means the same permission model (USAGE, SELECT, INSERT, UPDATE, DELETE) that governs Delta tables also governs Lakebase tables. Fine-grained access control, audit logging, and lineage tracking work consistently across operational and analytical data assets.
For enterprise environments with strict compliance requirements, this is a major operational advantage. There is no need to maintain a separate access control system for the operational database.
Storage on Delta Lake
Data written to Lakebase is stored in Delta Lake format. This means every table in Lakebase is an open-format Delta table, readable directly by Spark, Databricks SQL, or any Delta-compatible reader, with no data movement. Time travel, the ability to query historical versions of a table using VERSION AS OF or TIMESTAMP AS OF, is available on Lakebase tables just as it is on standard Delta tables. Audit history and data versioning come for free.
Lakebase Architecture: How the Pieces Fit Together
When a query hits Lakebase, it goes through the Neon compute layer, which handles transaction management, query planning, and execution. Reads and writes go to the Delta Lake storage layer rather than to a local disk. Because compute is stateless and storage is shared, the system can scale compute horizontally without any data migration.
The separation also enables a capability Lakebase calls “branching”. Just as Git creates a lightweight copy of a repository at a specific commit, Lakebase can create a branch of a database at a specific point in time. The branch does not copy the underlying data; it references the existing Delta Lake storage and records diverging changes separately. This makes it possible to spin up a production-identical environment for integration testing, run a destructive data migration on the branch, and discard it without affecting the main database.
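The branching idea can be modeled as copy-on-write: a branch shares the existing data and records only its own changes. The class below is a conceptual sketch of that technique, not Neon's or Lakebase's actual storage format; `Branch`, `fork`, and the key names are hypothetical.

```python
from collections import ChainMap
from types import MappingProxyType
from typing import Dict, Mapping, Optional

_DELETED = object()  # tombstone: "this key was deleted on this branch"


class Branch:
    """Toy copy-on-write branch (conceptual model only).

    Reads fall through to a shared, read-only base; writes and deletes are
    recorded only in this branch's own change set.
    """

    def __init__(self, base: Mapping[str, object]) -> None:
        self._base = base                      # shared, never mutated here
        self._changes: Dict[str, object] = {}  # diverging changes only

    def read(self, key: str) -> Optional[object]:
        value = self._changes.get(key, self._base.get(key))
        return None if value is _DELETED else value

    def write(self, key: str, value: object) -> None:
        self._changes[key] = value

    def delete(self, key: str) -> None:
        self._changes[key] = _DELETED

    def fork(self) -> "Branch":
        # Point-in-time fork: freeze the changes made so far and layer them
        # over the same base. The base data itself is referenced, not copied.
        frozen = MappingProxyType(dict(self._changes))
        return Branch(ChainMap(frozen, self._base))


storage = MappingProxyType({"order:1": "pending", "order:2": "shipped"})
main = Branch(storage)
feature = main.fork()

feature.write("order:1", "cancelled")  # e.g. a destructive migration step
feature.delete("order:2")
main.write("order:3", "new")           # later writes on main stay on main

print(main.read("order:1"))     # pending
print(feature.read("order:1"))  # cancelled
print(feature.read("order:2"))  # None
print(feature.read("order:3"))  # None
```

Discarding the branch is just dropping its change set; the shared base is never touched, which is why the main database is unaffected.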
From an operational standpoint, the architecture also means there is no single point of failure at the storage layer. Lakebase supports a High Availability pool, configurable at instance creation, which keeps warm compute replicas ready to take over if a primary node fails.
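A warm standby shortens failover, but a client connected to a failed primary will typically still see its connection drop and must reconnect. The generic retry-with-backoff helper below is a minimal sketch: `with_retry` is a hypothetical name, and the attempt count and delays are illustrative defaults, not Lakebase recommendations.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(operation: Callable[[], T],
               retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
               attempts: int = 5,
               base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying with exponential backoff on `retry_on` errors."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

With psycopg2, a call might look like `with_retry(lambda: psycopg2.connect(dsn), retry_on=(psycopg2.OperationalError,))`. Only idempotent work should be retried blindly; a transaction that failed mid-commit needs application-level checks.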
Setting Up Lakebase: Step by Step
The setup process assumes an active Databricks workspace. Lakebase requires the Premium tier or above.
Step 1: Enable Lakebase in Your Workspace
Since Lakebase is currently rolling out to workspaces, it may need to be activated first.
- Navigate to the Admin Console in your Databricks workspace.
- Open Preview Features.
- Find Lakebase in the list and toggle it on.
- Save the setting and refresh the workspace.
Once enabled, the Lakebase option appears under the Compute section.
Step 2: Create a Database Instance
- Go to the Compute tab in the left sidebar.
- Select Lakebase, then click Create Database Instance.
You will see the following configuration options:
| Parameter | Description | Default |
|---|---|---|
| Engine | PostgreSQL version | PostgreSQL 16 |
| Instance Size | Compute units allocated | 2 Units (minimum) |
| High Availability Pool | Number of standby replicas | 1 |
| Data Retention | Log retention window | 7 days |
The minimum instance size of 2 units is sufficient for development and testing. Production deployments with high-concurrency requirements should start with a higher unit count and leverage autoscaling to handle spikes.
Click Create and wait for the instance to provision. This typically takes under two minutes.
Step 3: Retrieve Connection Details
Once the instance is running, navigate to the Connection Details tab on the Lakebase dashboard. You will find:
- Hostname: the fully qualified domain name of the instance
- Port: 5432 (standard PostgreSQL)
- Database name
- Credentials: a username and password, with optional certificate-based authentication
To connect using psql:
```shell
psql -h <hostname> -p 5432 -U <username> -d <database_name>
```
To connect from a Python application using psycopg2:
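```python
import psycopg2

# Replace the <placeholders> with the values from the Connection Details tab.
conn = psycopg2.connect(
    host="<hostname>",
    port=5432,
    dbname="<database_name>",
    user="<username>",
    password="<password>",
    sslmode="require",  # require an encrypted connection
)
try:
    with conn.cursor() as cursor:
        cursor.execute("SELECT version();")
        print(cursor.fetchone())
finally:
    conn.close()
```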
Any application that can connect to PostgreSQL can connect to Lakebase. No driver changes, no schema changes, no migration tooling required.
Step 4: Configure Unity Catalog Permissions
By default, only the instance creator has access. To grant access to other users or service principals:
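```sql
-- CONNECT on the database, USAGE on the schema, then table privileges.
GRANT CONNECT ON DATABASE lakebase_db TO data_engineer_role;
GRANT USAGE ON SCHEMA public TO data_engineer_role;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public TO data_engineer_role;
```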
These grants are managed through Unity Catalog and use the same role-based model as the rest of the Databricks platform.
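In standard PostgreSQL, a role needs CONNECT on the database and USAGE on the schema before table-level privileges take effect. When grants are issued from application code, identifiers should be quoted rather than pasted in raw. The sketch below builds the statements with PostgreSQL's double-quote identifier rules; `quote_ident` and `grant_statements` are hypothetical helpers, and in production psycopg2's `psycopg2.sql.Identifier` does the same quoting for you.

```python
from typing import Iterable, List

ALLOWED_TABLE_PRIVILEGES = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(database: str, schema: str, role: str,
                     privileges: Iterable[str]) -> List[str]:
    """Build the GRANTs a role needs to reach and use tables in one schema."""
    privs = [p.strip().upper() for p in privileges]
    if not privs:
        raise ValueError("at least one privilege is required")
    unknown = set(privs) - ALLOWED_TABLE_PRIVILEGES
    if unknown:
        raise ValueError(f"unsupported privileges: {sorted(unknown)}")
    db, sch, rl = quote_ident(database), quote_ident(schema), quote_ident(role)
    return [
        f"GRANT CONNECT ON DATABASE {db} TO {rl};",  # may connect at all
        f"GRANT USAGE ON SCHEMA {sch} TO {rl};",     # may see objects in schema
        f"GRANT {', '.join(privs)} ON ALL TABLES IN SCHEMA {sch} TO {rl};",
    ]


for stmt in grant_statements("lakebase_db", "public", "data_engineer_role",
                             ["select", "insert"]):
    print(stmt)
```

Two PostgreSQL details are worth keeping in mind: quoted identifiers are case-sensitive, and `ON ALL TABLES` covers only tables that exist now. Tables created later need `ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT ON TABLES TO data_engineer_role;`.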
Step 5: Create a Branch for Development
To create a development branch of your database:
- Go to the Branches tab in the Lakebase dashboard.
- Click Create Branch.
- Select the source branch (default is main) and a point-in-time snapshot if needed.
- Name the branch (for example, feature/schema-migration-v2).
The branch is ready in seconds and provides a fully isolated view of the database at that snapshot, without physically copying the data. Write transactions on the branch do not affect the main database. When the development work is validated, the branch can be merged back or discarded.
Key Capabilities Worth Understanding
Serverless Autoscaling
Lakebase scales compute up instantly under load and scales down to zero when idle. For workloads with uneven traffic, such as a product that sees heavy usage during business hours and almost none overnight, this means you pay for compute only while it is active. There is no charge for idle compute.
This is a meaningful architectural difference from managed PostgreSQL services like Amazon RDS or Azure Database for PostgreSQL, which provision a fixed compute size and charge for it continuously.
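To make the scale-to-zero argument concrete, the arithmetic below compares a fixed-size instance billed around the clock with one billed only while active. The rate is a made-up placeholder, not Databricks pricing, and the model deliberately ignores storage, autoscaling above the base size, and any delay before an idle instance suspends.

```python
HYPOTHETICAL_RATE = 1.0  # placeholder price per compute unit per hour (not real pricing)


def monthly_compute_cost(units: float, active_hours_per_day: float,
                         scale_to_zero: bool, rate: float = HYPOTHETICAL_RATE,
                         days: int = 30) -> float:
    """Compute-only cost for one month at a constant instance size."""
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    # A fixed-size instance bills 24 hours a day; a scale-to-zero instance
    # bills only for the hours it is actually serving queries.
    billed_hours = active_hours_per_day if scale_to_zero else 24.0
    return units * rate * billed_hours * days


fixed = monthly_compute_cost(units=2, active_hours_per_day=10, scale_to_zero=False)
elastic = monthly_compute_cost(units=2, active_hours_per_day=10, scale_to_zero=True)
print(fixed, elastic)  # 1440.0 600.0
```

Under these assumptions, a workload active ten hours a day bills for 10/24 of the always-on compute; the real ratio depends on actual pricing and how often the instance suspends.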
AI and Natural Language Querying
Lakebase is designed to work with Databricks AI features. Because the database lives inside the same platform as Mosaic AI and Databricks Genie, it is possible to expose Lakebase tables to AI agents that can query them using natural language. An agent built with Agent Bricks or a custom LangChain/LlamaIndex integration can generate SQL against a Lakebase schema, execute it, and return results, without any data replication or additional API layer.
This makes Lakebase a practical backend for AI assistants that need to answer questions about live operational data.
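When an agent generates SQL against live operational tables, it is prudent to screen what it runs. The function below is a deliberately conservative, hypothetical guard (not a Databricks, Genie, or LangChain API) that accepts only a single statement starting with SELECT and containing no data-modifying keywords. It will reject some harmless queries, which is the intended trade-off.

```python
import re

# Keywords that indicate writes, DDL, or procedural execution (hypothetical list).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|truncate|create|"
    r"grant|revoke|copy|call|do)\b",
    re.IGNORECASE,
)


def is_read_only_select(sql: str) -> bool:
    """Return True only for a single SELECT statement with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # reject multiple statements
    if not re.match(r"select\b", statement, re.IGNORECASE):
        return False  # also rejects WITH, since a CTE can wrap a DELETE
    return _FORBIDDEN.search(statement) is None
```

A string check is not a security boundary: a SELECT can still call functions with side effects. Pair it with a database role that has only SELECT privileges, or run the agent's queries inside `SET TRANSACTION READ ONLY`.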
No ETL Between Transactional and Analytical Workloads
Because Lakebase tables are Delta Lake tables, they are directly readable by Databricks SQL and Apache Spark. A table that an application writes to through the PostgreSQL interface can be joined with a Delta table from your data warehouse in the same Databricks SQL query, with no pipeline in between. The data is the same data.
This eliminates an entire category of pipeline engineering work and removes the latency window that ETL introduces.
Real-World Use Cases
Real-Time Fraud Detection
An e-commerce platform uses Lakebase as the operational database for its checkout service. When a transaction is created, a Databricks workflow reads recent purchase history directly from the same Lakebase table, scores the transaction against a fraud detection model, and writes the result back, all within the same platform, with no cross-system data movement.
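The read-score-write loop in this scenario can be sketched as a plain Python client over the PostgreSQL interface (in the scenario itself, the scoring runs as a Databricks workflow). Everything named here is hypothetical: the `orders` table and its columns, and the `ratio_to_average` function, which is a placeholder standing in for a real fraud model. It assumes a psycopg2 connection.

```python
from typing import Any, Callable, Sequence

# Hypothetical schema: orders(order_id, customer_id, amount, created_at, fraud_score)
RECENT_HISTORY_SQL = """
    SELECT amount
    FROM orders
    WHERE customer_id = %s
      AND order_id <> %s
      AND created_at > now() - interval '30 days'
    ORDER BY created_at DESC
    LIMIT 50
"""
RECORD_SCORE_SQL = "UPDATE orders SET fraud_score = %s WHERE order_id = %s"


def ratio_to_average(amount: float, history: Sequence[float]) -> float:
    """Placeholder 'model': the new amount relative to the customer's average."""
    if not history:
        return 1.0  # no history: treat as neutral (an arbitrary choice)
    average = sum(history) / len(history)
    return amount / average if average else 1.0


def score_and_record(conn: Any, order_id: int, customer_id: int, amount: float,
                     model: Callable[[float, Sequence[float]], float]) -> float:
    """Read recent history, score the order, and write the score back.

    With psycopg2, `with conn:` commits on success and rolls back on error,
    so the read and the write share one transaction.
    """
    with conn:
        with conn.cursor() as cur:
            cur.execute(RECENT_HISTORY_SQL, (customer_id, order_id))
            # NUMERIC columns arrive as Decimal in psycopg2, so convert.
            history = [float(row[0]) for row in cur.fetchall()]
            score = model(amount, history)
            cur.execute(RECORD_SCORE_SQL, (score, order_id))
    return score
```

Keeping the model as an injected callable means the same transaction logic works whether the score comes from a simple rule or from a served model endpoint.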
Customer 360 with Live Data
A CRM application writes user events such as logins, purchases, and support tickets to Lakebase. An analytics dashboard queries those same tables using Databricks SQL, joining them with historical data from a Delta table, to produce a customer profile that reflects activity from the last few minutes rather than the last few hours.
AI Agent with Operational Context
An AI agent built to answer business questions about sales performance queries Lakebase tables directly using natural language through Databricks Genie. Because the data is live, not a daily export, the agent’s answers reflect the current state of the business.
Development and Testing Isolation
A data engineering team uses Lakebase branches to test schema migrations before applying them to production. Each pull request spins up an isolated branch, runs the migration script, executes integration tests, and deletes the branch, without ever touching the production database.
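The per-pull-request lifecycle reduces to one rule: whatever happens during the migration or tests, the branch gets deleted. The sketch below encodes that with `try`/`finally`. The `create_branch`, `delete_branch`, `migrate`, and `run_tests` hooks are hypothetical callables that would wrap whatever Lakebase tooling and test runner your team uses; no specific Lakebase API is assumed, and the `ci/pr-<number>` naming is illustrative.

```python
from typing import Callable


def run_migration_on_branch(pr_number: int,
                            create_branch: Callable[[str], str],
                            delete_branch: Callable[[str], None],
                            migrate: Callable[[str], None],
                            run_tests: Callable[[str], bool]) -> bool:
    """Test a pull request's migration on an isolated, throwaway branch."""
    branch_name = f"ci/pr-{pr_number}"       # hypothetical naming convention
    conn_str = create_branch(branch_name)   # assumed to return the branch's connection string
    try:
        migrate(conn_str)                   # run the migration script on the branch only
        return run_tests(conn_str)          # integration tests against the migrated branch
    finally:
        delete_branch(branch_name)          # runs even if migrate or tests raise
```

Injecting the branch operations as callables keeps the cleanup guarantee testable without a live database and lets the same function drive a UI-backed script, a CLI, or an API client.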
Lakebase vs. Standard Managed PostgreSQL
| Capability | Lakebase | Standard Managed Postgres (RDS, Azure DB) |
|---|---|---|
| Compute/Storage Separation | Yes | No |
| Scale to Zero | Yes | No |
| Delta Lake Storage | Yes | No |
| Unity Catalog Governance | Yes | No |
| AI Agent Integration | Native | Requires custom integration |
| Git-style Branching | Yes | No |
| Time Travel Queries | Yes | No |
| PostgreSQL Compatibility | Full (v16) | Full |
| Latency | Under 10ms | Under 10ms |
| Max QPS | Over 10,000 | Depends on instance size |
Standard managed PostgreSQL services remain valid for standalone operational workloads. Lakebase is the better choice when that operational database also needs to serve analytical queries, feed AI workloads, or integrate with a broader data platform without ETL.
Things to Keep in Mind
Since 12 March 2026, new Lakebase instances have been created as Autoscaling projects. Existing Provisioned instances will be upgraded to Autoscaling automatically, starting in June 2026. If you are on an older Provisioned instance, plan for this migration.
Multi-cloud feature parity is still maturing. Lakebase is generally available on AWS and is rolling out to Azure. If your organization runs primarily on Google Cloud Platform, check the current availability documentation before committing to an architecture plan that depends on it.
Branching and time travel have storage cost implications. Historical version retention is configurable (default: 7 days) and should be tuned to your recovery point objectives and compliance requirements.
The minimum instance size is 2 compute units. For high-concurrency production workloads, sizing should be validated through load testing. The autoscaling behavior means costs scale with usage, and spike-heavy workloads need to be profiled before going live.
Conclusion
Lakebase is not a minor product update. It represents a genuine architectural shift in how transactional and analytical data can coexist. By building a PostgreSQL 16 engine on top of Neon’s separated compute and storage, integrating it with Delta Lake for open data access, and governing it through Unity Catalog, Databricks has created something that no managed Postgres service and no lakehouse alone can offer: a single system that handles both workloads without compromise.
For teams building AI applications, the implications are immediate. The data an agent needs for real-time decisions lives in the same place as the data an analyst needs for quarterly reporting. There is no pipeline to maintain, no synchronization delay, and no second system to operate.
If your organization is already on Databricks, Lakebase is worth evaluating now. The setup takes minutes, PostgreSQL compatibility means developers who know Postgres face essentially no learning curve, and the integration with the rest of the platform is already in place. The harder question is not how to set it up but how to rethink your data architecture now that the boundary between OLTP and OLAP no longer needs to exist.
NeosAlpha Technologies works with enterprise clients to design and implement modern data architectures on Databricks, Microsoft Fabric, and Snowflake. If you are evaluating Lakebase for production use or planning a migration from a traditional OLTP stack, our data engineering team can help you assess the architecture and build the right integration pattern for your workload.