12 Jun 22:14

gueniai

e969dff

Release v0.14.0 Latest

Release Notes — Lakebridge v0.14.0

Highlights

Five new profiler sources. Snowflake, Redshift, BigQuery, Oracle, and Legacy SQL DW are now all supported profiler targets, significantly expanding the set of platforms Lakebridge can assess ahead of a migration.
Switch now supports SAS. A new built-in prompt converts SAS programs to PySpark, adding SAS to the growing list of languages Switch can migrate automatically.
Reconcile now supports Teradata. Teradata can now be used as a source platform for data quality reconciliation, enabling validation for customers migrating from Teradata to Databricks.
Automatic reconciliation configuration. A new auto-configure-recon-tables command discovers source and target tables and generates the initial reconcile configuration automatically, replacing a previously manual setup step.

Profilers

New Sources

Redshift (#2305, #2304, #2306, #2408, #2501)
Amazon Redshift is now a fully supported profiler source, covering all three deployment variants: provisioned, provisioned multi-AZ, and serverless. This includes credential flows (database password, federated user, AWS Secrets Manager ARN, temporary credentials with db user or IAM, with optional SSL), dedicated extraction queries and validation schemas for each variant, full CLI wiring, and user documentation. A bug in the serverless managed-storage aggregation was also fixed: the previous query summed hourly sys_serverless_usage snapshots, causing reported storage to grow linearly with the lookback window rather than reflecting the actual allocated amount.
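To see why summing hourly snapshots overstates storage, consider a minimal sketch in Python. The sample rows are invented, and using the most recent snapshot as the corrected aggregate is an assumption for illustration; the notes do not say which aggregate the fixed query uses.

```python
from datetime import datetime, timedelta

# Hypothetical hourly snapshots: (snapshot time, managed storage in GB).
# Allocated storage stays constant at 500 GB across the lookback window.
start = datetime(2025, 1, 1)
snapshots = [(start + timedelta(hours=h), 500.0) for h in range(24)]


def summed_storage(rows):
    """Buggy aggregate: adds every snapshot, so it scales with the window."""
    return sum(gb for _, gb in rows)


def latest_storage(rows):
    """One plausible fix (assumed): report the most recent snapshot only."""
    return max(rows, key=lambda row: row[0])[1]


print(summed_storage(snapshots))  # 12000.0, i.e. 24 snapshots x 500 GB
print(latest_storage(snapshots))  # 500.0, independent of window length
```

Doubling the lookback window doubles the first figure and leaves the second unchanged, which is the linear growth described above.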
Snowflake (#2420, #2499)
Adds Snowflake as a profiler source. The interactive configurator prompts for connection details and a Programmatic Access Token (PAT), then extracts warehouse usage, query history, storage, user activity, account info, and optional credits (pipe, autoclustering, materialized-view refresh) from SNOWFLAKE.ACCOUNT_USAGE into a timestamped DuckDB file. Post-extraction computations produce TCO summaries. A supplementary rate_sheet extract pulls the effective per-credit rate (90-day average) and account service tier from SNOWFLAKE.ORGANIZATION_USAGE.RATE_SHEET_DAILY, providing the inputs needed by the downstream TCO value model.
BigQuery (#2472)
Adds BigQuery as a profiler source. Running configure-database-profiler followed by execute-database-profiler --source-tech bigquery executes 16 region-qualified INFORMATION_SCHEMA queries against the customer's configured BigQuery project(s) and writes 12 analysis tables into a local DuckDB file at ~/.databricks/labs/lakebridge_profilers/bigquery_assessment/profiler_extract.db. The pipeline structure mirrors existing source-techs.
Oracle (#2187)
Adds Oracle as a profiler source. Covers the interactive configuration dialog, extraction SQL scripts, and local DuckDB population from Oracle Database, with unit tests for all three components.
Legacy SQL DW (#2441)
Extends profiler coverage to legacy Azure SQL DW (pre-Synapse) deployments, adding the necessary extraction queries to assess this platform.

Enhancements

SQL Server profiler switched to SQL scripts and shared DatabaseManager (#2482)
The MSSQL profiler's activity and info extraction steps have been converted from per-step Python virtualenvs to in-process sql/ddl steps using the shared DatabaseManager connector. Thirteen SQL query/DDL pairs replace the previous Python scripts, eliminating the duplicated get_sqlserver_reader connector and aligning SQL Server with the architecture used by other profiler sources.
User-configurable output folder (#2488)
The profiler's DuckDB extract path is now exposed as a CLI flag (--output-folder) instead of being hardcoded per source in pipeline_config.yml. The default remains ~/.databricks/labs/lakebridge_profilers/<source>_assessment. Output filenames now include a timestamp (profiler_extract_<YYYYMMDD_HHMMSS>.db) to prevent overwrites on repeated runs, and the absolute path to the extract is logged on successful completion.
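The naming scheme can be sketched as follows. The helper extract_path is hypothetical and not part of Lakebridge; it only illustrates how a default folder, an optional override, and a YYYYMMDD_HHMMSS timestamp combine into an absolute extract path.

```python
from datetime import datetime
from pathlib import Path


def extract_path(source, output_folder=None, now=None):
    """Build a timestamped extract path (illustrative helper, not Lakebridge code)."""
    now = now or datetime.now()
    if output_folder is None:
        # Default location described in the release notes.
        folder = (
            Path.home() / ".databricks" / "labs" / "lakebridge_profilers"
            / f"{source}_assessment"
        )
    else:
        folder = Path(output_folder)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # resolve() yields an absolute path, matching the logged absolute location.
    return (folder / f"profiler_extract_{stamp}.db").resolve()


print(extract_path("snowflake", "/tmp/out", datetime(2025, 6, 12, 22, 14, 0)).name)
```

Because the timestamp has one-second resolution, two runs started within the same second would still collide in this sketch.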
Custom credentials file path for execute-database-profiler (#2494)
The execute-database-profiler command now accepts an optional --cred-file-path argument, letting users supply a non-default credentials file rather than the one written by configure-database-profiler. This makes it easier to manage multiple credential configurations or run the profiler in scripted, non-interactive environments.
SQL Server: TrustServerCertificate support (#2498)
The configure-database-profiler command for SQL Server now surfaces a TrustServerCertificate connection property, addressing a common customer request for environments where the server certificate cannot be validated.
Fix: profiler steps no longer fail with ModuleNotFoundError (#2485)
The Synapse and SQL Server profilers previously created a fresh virtualenv per pipeline step, installing only each step's declared dependencies. After database_manager.py began importing redshift_connector at module scope, every clean run of those profilers failed with ModuleNotFoundError. Profiler Python steps now run with the parent interpreter, inheriting all installed packages—faster, more robust, and immune to this class of transitive-import failures.
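The core of that change can be sketched in a few lines of Python. The helper name run_step is hypothetical and the snippet is not taken from Lakebridge; it only shows that launching a child process with sys.executable reuses the parent's interpreter, and therefore its installed packages, where a fresh virtualenv would not.

```python
import subprocess
import sys


def run_step(code):
    """Run a profiler-like step with the same interpreter as the caller."""
    # sys.executable is the interpreter running this process, so the child
    # sees the same site-packages instead of a freshly created virtualenv.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the step fails
    )
    return result.stdout.strip()


# Any module importable in the parent is importable in the step.
print(run_step("import json; print(json.dumps({'ok': True}))"))
```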

Converters

Morpheus

T-SQL Improvements

Better date and number formatting from T-SQL CONVERT
T-SQL's CONVERT function accepts a style code to format dates and numbers as strings (e.g. CONVERT(VARCHAR, myDate, 103) for dd/MM/yyyy). Morpheus now correctly translates these style codes to the equivalent Databricks SQL expressions, unblocking a large class of previously untranslatable queries.
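As a rough illustration of the idea, the sketch below maps a handful of well-known style codes to Databricks datetime patterns. The lookup table, the helper name translate_convert, and the choice of DATE_FORMAT as the target are assumptions; the expressions Morpheus actually emits may differ.

```python
# Small, hand-picked subset of T-SQL CONVERT style codes and matching
# Databricks datetime patterns. Illustrative only, not Morpheus's own table.
STYLE_PATTERNS = {
    23: "yyyy-MM-dd",
    101: "MM/dd/yyyy",
    103: "dd/MM/yyyy",
    112: "yyyyMMdd",
    120: "yyyy-MM-dd HH:mm:ss",
}


def translate_convert(expr, style):
    """Render CONVERT(VARCHAR, expr, style) as a DATE_FORMAT call."""
    pattern = STYLE_PATTERNS.get(style)
    if pattern is None:
        # Unknown styles are surfaced rather than guessed at.
        raise ValueError(f"unsupported CONVERT style: {style}")
    return f"DATE_FORMAT({expr}, '{pattern}')"


print(translate_convert("myDate", 103))  # DATE_FORMAT(myDate, 'dd/MM/yyyy')
```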
Fix integer-as-date behavior from T-SQL
T-SQL allows using 0 (or any integer) where a date is expected, treating it as "N days after 1 Jan 1900". Morpheus now replicates this behavior, preventing runtime type errors when running translated queries on Databricks.
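The arithmetic behind this rule is simple enough to check in Python. The helper int_to_date is hypothetical; it reproduces the "N days after 1 Jan 1900" interpretation and says nothing about the exact SQL Morpheus generates.

```python
from datetime import date, timedelta

# T-SQL's base date for integer-to-date conversion.
TSQL_BASE_DATE = date(1900, 1, 1)


def int_to_date(n):
    """Interpret an integer as a number of days after 1 Jan 1900."""
    return TSQL_BASE_DATE + timedelta(days=n)


print(int_to_date(0))   # 1900-01-01
print(int_to_date(1))   # 1900-01-02
print(int_to_date(-1))  # 1899-12-31
```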
Fix variable declarations with VARCHAR(N) / CHAR(N) types
T-SQL local variables declared as VARCHAR(N) or CHAR(N) were being passed through verbatim and failing at runtime in Databricks. They are now automatically translated to STRING, the correct equivalent type for variable declarations.
Fix T-SQL variable assignment queries with ORDER BY
T-SQL queries that assign a value to a variable (e.g. SELECT @var = col FROM t ORDER BY col) were being incorrectly structured during translation. The ORDER BY and similar clauses are now placed correctly in the output.
Support HASH JOIN query hint in T-SQL
T-SQL's OPTION(HASH JOIN) query hint, which instructs the database to use a specific join strategy, is now correctly parsed and handled during translation.

Snowflake Improvements

Translate REGEXP_SUBSTR_ALL to REGEXP_EXTRACT_ALL
Snowflake's REGEXP_SUBSTR_ALL (returns all regex matches as an array) is now translated to its Databricks SQL equivalent REGEXP_EXTRACT_ALL.
Translate binary hash functions (MD5_BINARY, SHA1_BINARY, SHA2_BINARY)
Snowflake's binary digest functions are now translated to their Databricks SQL equivalents by wrapping the hex output with UNHEX().
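The equivalence this translation relies on can be checked outside SQL. In the Python sketch below, bytes.fromhex plays the role of UNHEX(); the helper name binary_digest is hypothetical.

```python
import hashlib


def binary_digest(algorithm, data):
    """Mirror UNHEX(<hash>(x)): decode the hex digest back into raw bytes."""
    hex_text = hashlib.new(algorithm, data).hexdigest()
    return bytes.fromhex(hex_text)


payload = b"lakebridge"
# Decoding the hex form yields exactly the raw digest bytes.
print(binary_digest("md5", payload) == hashlib.md5(payload).digest())
print(len(binary_digest("sha256", payload)))  # 32 bytes
```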
Translate hex hash function synonyms (MD5_HEX, SHA1_HEX, SHA2_HEX)
Snowflake's *_HEX hash function aliases are now directly mapped to their identically-behaved Databricks SQL counterparts.
Translate UNICODE and TRY_TO_DOUBLE
Snowflake's UNICODE() (returns the code point of the first character) is now mapped to ASCII() in Databricks SQL, and TRY_TO_DOUBLE() is mapped to TRY_CAST(_ AS DOUBLE). The UNICODE fix also applies to T-SQL.
Translate type-check functions (IS_DATE, IS_DOUBLE, IS_REAL, etc.)
Snowflake functions that test whether a value inside a semi-structured (VARIANT) column holds a specific type are now translated where possible (e.g. IS_DATE → TRY_CAST(v AS DATE) IS NOT NULL). Those with no equivalent in Databricks (IS_TIME, IS_TIMESTAMP_TZ) are flagged with a clear migration note.
Flag unsupported functions (CHECK_XML, PARSE_XML, IS_ROLE_IN_SESSION, etc.) with migration notes
Seven Snowflake-specific functions with no Databricks equivalent now produce a clear "FIXME" comment in the output instead of silently passing through and failing at runtime. IS_NULL_VALUE is also correctly translated to IS_VARIANT_NULL.
Flag 12 more Snowflake-only functions with migration notes
Additional Snowflake admin, statistical, and VARIANT-inspection functions (including COMPRESS, NORMAL, ZIPF, INVOKER_ROLE, IS_BOOLEAN) now produce clear FIXME annotations instead of failing silently at runtime.
Flag Snowflake INFORMATION_SCHEMA metadata functions with migration notes
Snowflake monitoring/metadata functions called via INFORMATION_SCHEMA (like PIPE_USAGE_HISTORY, `M...

Contributors

ysmx-github, take60, and 3 other contributors


13 May 20:21

gueniai

v0.13.0

8178513

Release v0.13.0

Lakebridge v0.13.0 Release Notes

Highlights

A few headline changes in this release worth calling out:

SQL Server profiler is now available, extending assessment coverage to Microsoft SQL Server alongside the existing Azure Synapse support.
Redshift reconciliation is now supported. Redshift can be used as a source for all reconcile report types.
Major Morpheus conversion improvements for T-SQL and Redshift, including a full Redshift dialect rollout (DDL, DML, functions, operators) and a wide expansion of T-SQL function and resilience handling.

This release also includes a major overhaul of the documentation, aimed at simplifying the structure and making the docs easier to follow.

Assessment

Profiler

Added a new SQL Server profiler that extends the existing assessment capabilities to Microsoft SQL Server, closely mirroring the Azure Synapse profiler design. The implementation exposes a last_execution_time parameter on all server queries, laying the groundwork for future incremental/scheduled extractions. On-prem SQL Server is not in scope. (#2151)
Updated the Azure Synapse Workspace profiler summary dashboard and introduced a new dashboard template for SQL Server. The Synapse template removes deprecated dashboard widget parameters, parameterizes table values so they can be set dynamically by Lakebridge, adds a dedicated-storage summary widget by SQL pool, renames datasets for clarity, fixes broken column references in SQL pool activity widgets, and reformats dataset queries. (#2317)
Reworked the create-profiler-dashboard CLI flow to bring it in line with the rest of the Lakebridge installer experience: clearer prompts for extract file location, UC catalog, schema, and volume; a helper to parse the extract path and UC volume upload location; and dashboard install/uninstall hooked into the standard Lakebridge installer/uninstaller. (#2319)
Fixed a bug that produced false positives in test-profiler-connection. (#2342)

Converters

Morpheus

Snowflake

Tightened parsing of CREATE FILE FORMAT statements, including format type options, with clear diagnostics for unsupported variants.

Synapse / T-SQL

Improved resilience by dropping unsupported constructs (table hints such as WITH (NOLOCK) and READPAST, UPDATE STATISTICS/CREATE STATISTICS, and other unsupported CREATE TABLE options) and emitting warnings, so surrounding scripts keep transpiling instead of failing.
Expanded function coverage with a batch of T-SQL "easy wins" (DATEPART unit aliases, EOMONTH, ISNUMERIC, TIME → STRING conversions) and richer date/time handling (DATETRUNC, DATE_BUCKET, SYSDATETIME, DATENAME, normalized DATE_PART units).
Added Databricks-compatible translations for PATINDEX (to REGEXP_INSTR, converting SQL wildcards to regex), IIF (mapped to IF at parse time), and FORMATMESSAGE (graceful fallback with diagnostics for unsupported format specifiers).
Migrated the T-SQL functional test suite to the new eval-based scenario runner, expanding executable coverage and removing legacy skips.
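The PATINDEX translation mentioned above depends on turning SQL wildcards into a regular expression. A minimal Python sketch of that conversion follows; the function names are hypothetical, collation and case sensitivity are ignored, and this is not the converter's actual implementation.

```python
import re


def like_to_regex(pattern):
    """Translate T-SQL wildcards (%, _, [...]) into regex syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        end = pattern.find("]", i) if ch == "[" else -1
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        elif end != -1:
            # Bracket classes such as [0-9] or [^a-z] carry over unchanged.
            out.append(pattern[i:end + 1])
            i = end
        else:
            # Everything else is literal text and must be escaped.
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def patindex(pattern, text):
    """Emulate PATINDEX: 1-based start of the first match, 0 when absent."""
    regex = like_to_regex(pattern.strip("%"))
    if not pattern.startswith("%"):
        regex = "^" + regex  # no leading %: match must start at position 1
    if not pattern.endswith("%"):
        regex = regex + "$"  # no trailing %: match must reach the end
    match = re.search(regex, text, re.DOTALL)
    return match.start() + 1 if match else 0


print(patindex("%[0-9]%", "abc123"))  # 4
```

Dropping the leading and trailing % and searching for the remainder matches the shape of a REGEXP_INSTR call, which also returns a 1-based position and 0 for no match.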

Redshift

Added Redshift as a first-class source dialect: a full dialect mapping in the converter (parser, IR builder, generator), Language Server advertisement alongside Snowflake and T-SQL, and a published per-feature workplan.
Implemented Redshift DDL and DML coverage, including CREATE TABLE (distribution, sort key, constraint clauses), DELETE … REMOVE DUPLICATES, complex literals (arrays, super values, composite forms), and the supportable portion of the SUPER type with explicit rejection diagnostics for the rest.
Added wide function coverage across string, VARBYTE, window, admin, OBJECT, JSON, HLL, and math families, plus date/time (DATEADD, DATE_CMP, TIMEZONE, timezone comparisons, TIMEOFDAY, LAST_DAY, MONTHS_BETWEEN, SYSDATE, single-argument TRUNC), numeric coercion (TEXT_TO_INT_ALT, TEXT_TO_NUMERIC_ALT), hashing (CHECKSUM, FARMHASH64), and a staged TO_TIMESTAMP implementation.
Implemented Redshift-specific operators including the + overload (so string and date arithmetic disambiguate correctly) and the |/ (square root) and ||/ (cube root) prefix operators.
Explicitly rejected EXPLAIN_MODEL since Databricks SQL has no equivalent ML-model explainer, surfacing actionable diagnostics rather than silent miscompilation.

General

Broadened cross-dialect DDL with CREATE FUNCTION (as much as is feasible per dialect, with diagnostics for unsupported procedural features) and CREATE SCHEMA for Snowflake, T-SQL, and Redshift, both lowered to Databricks SQL.
Expanded shared function and expression support: CURRENT_USER across all dialects, additional TIMESTAMP-related functions, raw strings as proper IR expressions (so they participate in type inference and round-trip cleanly), and EXECUTE IMMEDIATE usable as an expression (not just a statement).
Added full coverage of H3 and ST spatial functions across every supported dialect, with normalized names and argument shapes.
Improved grammar flexibility by allowing reserved keywords DATABASE and PRIMARY to be used as identifiers in unambiguous contexts, unblocking real-world schemas.
Expanded DELETE to cover all Redshift, T-SQL, and Snowflake use cases (including dialect-specific extensions) and added UPDATE-to-MERGE lowering for Redshift and T-SQL when an update uses a join or source table.

Reconcile

Added a Redshift connector to reconcile so Redshift can be used as a source for data, row, schema, and full report types. (#2339)
Replaced direct JDBC connections (Oracle, Snowflake, SQL Server) with Databricks Unity Catalog remote_query() calls backed by UC Connections. Reconcile no longer manages JDBC URLs, secret scopes, or PEM keys directly — authentication and connectivity are handled by Databricks. This introduces a v2 configuration format that takes a uc_connection_name in place of secret_scope; existing v1 configs are auto-migrated on load. (#2362)
Fixed reconcile schema fetch failures on Foreign Catalogs created via Lakehouse Federation. Foreign catalogs lack the Databricks-specific full_data_type column in information_schema.columns, which previously caused UNRESOLVED_COLUMN errors for all report types (schema, data, row, all). A new DatabricksNonUnityCatalogDataSource now falls back to DESCRIBE TABLE and covers hive_metastore, global_temp views, and Foreign Catalogs, while the native DatabricksDataSource remains scoped to Unity Catalog tables. (#2422)
Fixed a T-SQL/Synapse reconciliation regression where switching to VARCHAR(MAX) in hash concatenation broke date/time columns: SQL Server accepts VARCHAR(256) + DATE via implicit conversion but rejects VARCHAR(MAX) + DATE. Temporal transforms now CONVERT DATE/TIME/DATETIME to VARCHAR(10)/VARCHAR(12)/VARCHAR(23) so all temporal columns produce VARCHAR output that concatenates safely in the hash input string. (#2320)
Improved Oracle reconcile coverage, fixed parsing of remote query options, and dropped the legacy Oracle test scripts and Docker harness that required heavy manual setup. (#2433)
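The VARCHAR(10)/VARCHAR(12)/VARCHAR(23) widths in the temporal fix above line up with millisecond-precision text forms of each type. That reading is an inference, and the helper temporal_cast below is a hypothetical rendering, not Lakebridge's generated SQL.

```python
from datetime import datetime

sample = datetime(2024, 1, 31, 13, 45, 59, 123000)

# Millisecond-precision renderings of the three temporal shapes.
as_date = sample.strftime("%Y-%m-%d")
as_time = sample.strftime("%H:%M:%S.") + f"{sample.microsecond // 1000:03d}"
as_datetime = f"{as_date} {as_time}"

# Target text types named in the release note.
TARGET_TYPES = {
    "DATE": "VARCHAR(10)",
    "TIME": "VARCHAR(12)",
    "DATETIME": "VARCHAR(23)",
}


def temporal_cast(column, sql_type):
    """Render an explicit cast so the column concatenates as text (sketch)."""
    return f"CONVERT({TARGET_TYPES[sql_type]}, {column})"


print(len(as_date), len(as_time), len(as_datetime))  # 10 12 23
print(temporal_cast("order_date", "DATE"))
```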

Installer

Added minimal support for using a Maven mirror when installing Morpheus via install-transpile. Setting LAKEBRIDGE_MAVEN_URL overrides the default repository URL, and credentials can be supplied through ~/.netrc (or via NETRC) so install-transpile works in environments without direct Maven Central access. (#2405)
Updated the wheel installer used during install-transpile to look up version information via pip instead of issuing a direct HTTP call to PyPI. This allows install-transpile to work in environments where only a local PyPI mirror is available. (#2404)
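The mirror override described above follows a familiar pattern: read an environment variable, fall back to a default, and look up credentials by host. The Python sketch below is illustrative only. The function names are hypothetical, and the Maven Central URL is assumed as the default because the notes do not state which URL Lakebridge uses.

```python
import netrc
import os
from urllib.parse import urlparse

# Maven Central; assumed here as the default repository URL.
DEFAULT_MAVEN_URL = "https://repo1.maven.org/maven2"


def maven_base_url(env=None):
    """Return the repository URL, honoring LAKEBRIDGE_MAVEN_URL if set."""
    env = os.environ if env is None else env
    return env.get("LAKEBRIDGE_MAVEN_URL", DEFAULT_MAVEN_URL).rstrip("/")


def maven_credentials(url, netrc_path=None):
    """Look up (login, password) for the URL's host in a netrc file, if any."""
    # Python's netrc module does not read NETRC itself, so check it here.
    path = netrc_path or os.environ.get("NETRC")
    try:
        entry = netrc.netrc(path).authenticators(urlparse(url).hostname)
    except FileNotFoundError:
        return None
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


print(maven_base_url({"LAKEBRIDGE_MAVEN_URL": "https://mirror.example/maven/"}))
```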

Documentation

Major revamp of the Lakebridge documentation focused on clarity, structure, and first-time user experience: added a new end-to-end Getting Started tutorial (SQL Server → Databricks SQL walkthrough), a Choosing Tools decision guide, a dedicated Morpheus transpiler page, and a split-out Switch architecture page. Reconcile docs were consolidated from 5 files (~1,400 lines) to 3 (~750 lines) with a new report-type comparison table and unified Configuration Reference and Running Reconcile pages. SSIS docs were moved into a dedicated subfolder with a collapsible sidebar category. The Installation page was rewritten for brevity, the FAQ expanded from 3 to 25+ questions, and the sidebar reordered to match the actual user journey: Installation → Getting Started → Choosing Tools → Assessment → Transpile → Reconcile → SQL Splitter → FAQ. (#2365)
Fixed the reconcile notebook documentation to match the v2 TableRecon API: the example now shows TableRecon as tables: list[Table] only (with source_schema, target_catalog, target_schema, and source_catalog configured via DatabaseConfig inside ReconcileConfig), corrected the location of drop_columns (it belongs on Table, not TableRecon), and added a migration note pointing users to DatabaseConfig. ([#2329](#2...


26 Feb 18:18

gueniai

v0.12.2

1d855d0

v0.12.2

Assessment

Profiler

Enhanced Synapse profiler extraction and monitoring by correctly handling batched pipeline/trigger runs, adding serverless‑pool routine listing via sys.objects, reconnecting to master for server‑level DMVs, stripping whitespace from credential fields, replacing deprecated DataFrame.union() with pd.concat(), and clarifying Azure auth, DMV permissions, and serverless catalog view behavior.

Analyzer

Added support for a new --generate-json switch to produce a JSON report alongside the existing Excel report, enabling programmatic consumption of analyzer results without changing default behavior.

Converters

Morpheus

Snowflake

Added transpilation support for seven Snowflake geospatial functions (ST_MAKEPOINT, ST_POINT, ST_X, ST_Y, ST_CENTROID, TRY_TO_GEOGRAPHY, HAVERSINE) to Databricks SQL, including SRID handling, argument normalization, and custom SQL for Haversine distance.
Introduced support for Snowflake SPLIT_TO_TABLE by mapping it to Databricks POSEXPLODE(SPLIT(...)), including column renaming, regex‑safe delimiter handling, and both TABLE() and LATERAL invocation forms.
Implemented Snowflake JSON helpers CHECK_JSON, GET_PATH, and IFF for Databricks SQL, using TRY_PARSE_JSON‑based validation, GET_JSON_OBJECT path translation, and direct IF‑style semantics.
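The SPLIT_TO_TABLE mapping above hinges on two details: Databricks SPLIT treats its delimiter as a regular expression, and POSEXPLODE positions start at 0 while Snowflake's INDEX column starts at 1. The Python sketch below emulates both; the helper name is hypothetical and it is not the generated SQL.

```python
import re


def split_to_table(text, delimiter):
    """Emulate POSEXPLODE(SPLIT(text, <escaped delimiter>)) with a 1-based index."""
    # A literal delimiter such as "." or "|" is a regex metacharacter,
    # so it has to be escaped before it is used as a split pattern.
    parts = re.split(re.escape(delimiter), text)
    # Shift the 0-based position to Snowflake's 1-based INDEX.
    return [(pos + 1, value) for pos, value in enumerate(parts)]


print(split_to_table("a.b.c", "."))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Without the escape, a "." delimiter would match every character and the split would return only empty strings.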

Synapse / TSQL

Added IR support for executing stored procedures and immediate SQL strings via T‑SQL EXEC/EXECUTE, including positional and named parameters, output parameters, AS USER, and AT <data_source> constructs, plus parity tests for Snowflake EXECUTE IMMEDIATE.

Other / General

Enhanced the Morpheus DataType and Expression system with static typing and promotion at IR generation time, including numeric and string helpers, a NumericValue pseudo‑type, SQL‑style highestType promotion rules, and an expanded test suite.

BladeBridge

Oracle

Updated NUMBER without precision to map to DECIMAL(38,18) for Oracle to correctly handle floating‑point semantics in converted code.

Teradata

Updated NUMBER without precision to map to DECIMAL(38,18) for Teradata to preserve floating‑point behavior in conversions.
Added Teradata stored procedure test cases with various DML statements, transactions, and explicit handling of output parameters returned without CALL, aligning with Teradata’s procedure semantics.

Redshift

Fixed Redshift NUMBER mapping to DECIMAL(38,0) to ensure correct numeric precision in converted objects.

General SQL

Added CONNECT BY to the platform source gap specifications and introduced CREATE PROJECTION patterns into general_sql_specs.json to broaden SQL feature coverage across supported sources.
Added DDL test cases and patterns to improve datatype and table partition conversion and to strip unwanted default values from DDL in the final master step when they are not visible in configuration.

ETL to Databricks (Informatica)

Fixed workflow parameter default value handling and introduced a table‑based workflow parameter storage system, providing type‑aware defaults and a workflow_utils.workflow_params Delta table to centralize parameter metadata for Informatica‑to‑Databricks conversions.
Added a configurable data_type_mapping for Informatica‑to‑Python conversions to improve type inference and consistency across generated notebooks.

Reconcile

Updated T‑SQL/Synapse reconciliation hash generation to avoid VARCHAR(256) truncation by using VARCHAR(MAX) in COALESCE and HASHBYTES, reducing false mismatches and better aligning behavior with Databricks.
Introduced source and target record count metrics (source_record_count, target_record_count) into reconciliation metrics and dashboards, including an upgrade script and tests to support enhanced reconciliation observability.
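The truncation problem behind the hash change above is easy to reproduce. In this Python sketch, SHA-256 and the "|" separator are stand-ins chosen for illustration; the notes do not specify the hash algorithm or concatenation format reconcile uses.

```python
import hashlib


def row_hash(values, width=None):
    """Hash concatenated column values, optionally truncating each to `width`."""
    parts = [v if width is None else v[:width] for v in values]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Identical row on both sides; the second column exceeds 256 characters.
row = ["order-42", "n" * 300]

truncated_source = row_hash(row, width=256)  # source side casts to VARCHAR(256)
full_target = row_hash(row)                  # target side keeps the full value

print(truncated_source == full_target)  # False: a false mismatch
print(row_hash(row) == full_target)     # True once neither side truncates
```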

Documentation

Added LLM‑friendly documentation via the @signalwire/docusaurus-plugin-llms-txt plugin, exposing a structured llms.txt index and per‑page markdown URLs so AI tools can more easily discover and consume Lakebridge documentation.
Documented automatic serverless cluster detection behavior for Reconcile, including configuration requirements for Unity Catalog volumes and environment variables.
Published an IBM DataStage‑to‑Databricks conversion guide that explains supported DataStage versions and objects, generated Databricks artifacts, helper libraries, and troubleshooting practices.
Extended the BladeBridge ETL configuration guide with native database connection examples, including tokenized JDBC/ODBC templates for systems such as Oracle and MSSQL.
Updated Switch documentation to describe the new input_file_relative_path column in the Conversion Result Table schema and how it preserves input directory structure in outputs.

Dependency updates:

Updated pyodbc requirement from ~=5.2.0 to >=5.2,<5.4 (#2104).
Bump webpack from 5.99.6 to 5.105.0 in /docs/lakebridge (#2269).
Bump sigstore/gh-action-sigstore-python from 3.0.1 to 3.2.0 (#2177).
Updated duckdb requirement from ~=1.2.2 to >=1.2.2,<1.5.0 (#2079).

Contributors: @sundarshankar89, @dependabot[bot], @BesikiML, @simone-dbx-labs, @eri-adepoju, @m-abulazm, @gueniai, @hiroyukinakazato-db


12 Feb 22:11

gueniai

v0.12.1

58b45ec

v0.12.1

Synapse Profiler

Fixed several critical errors in the Synapse profiler extraction pipeline, including a type mismatch when initializing the credential manager and handling of empty or partial result sets from Spark data pools.
Added support for an env secret type in the Synapse profiler, allowing profiler configurations to resolve secrets from environment variables.
Enhanced the credential manager backing the profiler so it can resolve nested credential structures (for example, workspace config, JDBC settings, and profiler options) instead of only flat key–value maps.
Introduced recursive, type-aware resolution of dictionaries, lists, and strings in profiler-related credentials while preserving primitive values and maintaining backward compatibility with existing configurations.
Enhanced the credential manager to support nested credential structures
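The recursive resolution described above can be sketched as a small tree walk. The function names, the credential layout, and the fallback to the literal string when a variable is unset are all assumptions for illustration, not the credential manager's actual behavior.

```python
import os


def resolve(value, lookup):
    """Recursively resolve secrets in dicts, lists, and strings."""
    if isinstance(value, dict):
        return {key: resolve(item, lookup) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, lookup) for item in value]
    if isinstance(value, str):
        return lookup(value)
    # Primitives such as ints, bools, and None pass through untouched.
    return value


def env_lookup(name):
    """`env` secret type (sketch): treat the string as an environment variable name."""
    return os.environ.get(name, name)


# Hypothetical nested credentials, resolved against a fixed table for clarity.
table = {"WS_NAME": "my-workspace", "DB_HOST": "db.example.com"}
creds = {
    "workspace": {"name": "WS_NAME"},
    "jdbc": {"hosts": ["DB_HOST"], "port": 1433, "ssl": True},
}
resolved = resolve(creds, lambda s: table.get(s, s))
print(resolved)
```

Passing env_lookup as the second argument would resolve the same structure from environment variables instead of the fixed table.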

Analyzer

Expanded SQL parsing to cover additional TSQL constructs (including CREATE STATISTICS, THROW, and DROP TEMPORARY TABLE IF EXISTS), improving handling of error management, statistics, and temporary tables.
Fixed crashes caused by special characters in mapping names by treating them as escaped literals instead of regex symbols.
Improved reliability of MERGE into partitioned targets with enhanced handling and added test coverage.
Added Jupyter Notebook detection to the Lakebridge analyze command by mapping notebook assets to the JUPYTERNB type and updating local dev tooling ignores.
Simplified analyze filepath handling: --report-file now directly controls the Excel filename with consistent relative-path semantics, and --source-directory behavior and prompt text are aligned with the implementation.
Removed a confusing “timestamped directory” behavior when the target report file already existed so logs and output now match user expectations.
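The mapping-name crash fixed above is a classic regex pitfall. The Python sketch below uses an invented mapping name and a hypothetical helper to show the difference between compiling a name as a pattern and matching it as an escaped literal.

```python
import re


def count_mentions(name, text):
    """Count literal occurrences of a mapping name inside text."""
    # re.escape neutralizes characters like "[", "(", and "+".
    return len(re.findall(re.escape(name), text))


mapping_name = "m_orders[2024"  # hypothetical name with a stray bracket
text = "run m_orders[2024 first, then rerun m_orders[2024"

try:
    re.compile(mapping_name)  # "[" opens a character set that never closes
    compiled_raw = True
except re.error:
    compiled_raw = False

print(compiled_raw)                        # False: the raw name is not a valid regex
print(count_mentions(mapping_name, text))  # 2
```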

Converters

Morpheus

Snowflake

Always emit SQL SECURITY INVOKER on generated Snowflake procedures so the security context is explicit and tests align with this behavior.
Improved parsing and generation of INTERVAL literals to support both ANSI-style and Snowflake-style syntaxes.
Implemented correct transpilation of Snowflake INTERVAL literals, including composite values and Snowflake-specific units, into normalized Databricks SQL YEAR TO MONTH and day-time intervals.

Synapse / TSQL / SQL Server

Improved parsing of logical expressions (AND/OR/NOT) by distinguishing them from other expressions in the T-SQL grammar, reducing ambiguity and improving parsing performance.
Added support for translating T-SQL ALTER DATABASE statements into Databricks SQL ALTER SCHEMA, emitting comments for unmappable features.
Introduced partial support for T-SQL CONVERT, flagging unsupported datetime types.
Improved typing and parsing of the T-SQL + operator so string concatenation is consistently treated as string operations and flattened where possible.
Added T-SQL-specific overrides for equality and missing-value functions to avoid unresolved routines by mapping them to appropriate Databricks SQL equivalents.
Implemented full AST support for T-SQL RAISERROR.
Supported T-SQL CROSS APPLY by transpiling it to CROSS JOIN LATERAL, with correct join clause and hint formatting.
Implemented T-SQL OUTER APPLY by generating LEFT JOIN LATERAL, with tests for ordering and LIMIT placement.
Fixed transpilation of WHERE … LIKE … so columns and patterns render correctly, including COLLATE expressions.
Added tests for SET within nested IF blocks to validate correct handling of T-SQL control flow without extra variable declarations.
Correctly translated T-SQL DATEDIFF for all date parts into Databricks SQL expressions that match T-SQL boundary-count semantics, with comprehensive tests.
Generated Delta Lake computed columns from T-SQL COMPUTED definitions, including PERSISTED columns, using GENERATED ALWAYS AS in target schemas.
Recognized T-SQL table variables as temporary tables and transpiled them to appropriate temporary table syntax in the target dialect.
Implemented support for T-SQL SELECT … INTO by converting to CREATE TABLE AS SELECT and handling INTO precedence rules, while rewriting Snowflake-style INTO to session variables.
Split T-SQL DECLARE statements with scalar subquery defaults into separate DECLARE and SET statements compatible with Databricks SQL.
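The boundary-count semantics mentioned in the DATEDIFF item above differ from elapsed-time arithmetic: T-SQL counts how many datepart boundaries are crossed, so one day across New Year counts as one year. A minimal Python sketch for three date parts follows; the helper is hypothetical and covers far less than the translation itself.

```python
from datetime import date


def datediff(part, start, end):
    """Count datepart boundaries crossed between two dates, T-SQL style."""
    if part == "year":
        return end.year - start.year
    if part == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if part == "day":
        return (end - start).days
    raise ValueError(f"unsupported date part: {part}")


print(datediff("year", date(2023, 12, 31), date(2024, 1, 1)))  # 1
print(datediff("month", date(2024, 1, 31), date(2024, 2, 1)))  # 1
print(datediff("day", date(2023, 12, 31), date(2024, 1, 1)))   # 1
```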

General (Morpheus engine)

Ensured block-level DECLARE variable scoping is dialect-aware by introducing a postDeclare flag so Snowflake-style declarations appear inside blocks.
Cleaned up data type definitions and generators, adding TIME support, refactoring INTERVAL handling, and replacing ir.Byte with ir.TinyInt.
Improved grammar, IR, and generation of INTERVAL literals so Snowflake-like and ANSI-style syntaxes are both supported.
Fixed expression rendering to always emit parentheses for bracketed constructs, including empty window clauses.
Added transformations to hoist DECLARE variables to the start of blocks and wrap batches with blocks when variables are present, improving procedural SQL handling.
Generalized batch-wrapping logic so any scripting statements are encapsulated in BEGIN … END blocks, with tests updated for the new structure.
Added dialect-specific configuration and grammar for IF/WHILE block parsing so T-SQL and Snowflake scripting blocks terminate correctly per dialect.

BladeBridge

TSQL / SQL Server

Expanded SQL parsing to include TSQL features such as CREATE STATISTICS, THROW, and DROP TEMPORARY TABLE IF EXISTS, improving conversion robustness for TSQL workloads.
Added support for SELECT column aliasing and extended TSQL keyword recognition to improve conversion of SQL scripts to Databricks-compatible syntax.
Enhanced handling of MERGE statements into specific partitions and added tests to improve conversion reliability for partitioned MERGE patterns.

SSIS

Ensured deterministic SSIS conversion output by enforcing stable ordering for variables and target columns, improving null handling, and adding tests so repeated runs generate consistent PySpark.

Informatica / IICS

Implemented the SQL Transform component for Informatica-to-PySpark conversion to cover more data transformation logic.
Added native JDBC/ODBC database connection support for IICS-to-Databricks conversions, allowing direct database reads/writes and fixing connection flag logic for Target components.

Switch

General

Used empty strings instead of nulls for optional configuration parameters to simplify downstream handling.
Preserved directory hierarchy in conversion output so generated artifacts mirror the source layout.

Reconcile

Refactored reconciliation intermediate persistence to clean checkpoint volumes after runs, remove overwrite write mode usage, and prefer Delta volumes on Databricks instead of Parquet.
Hid implementation details behind a more generic interface and marked future work to persist to Delta instead of re-reading from source systems.
Improved reconciliation result handling and logging: reconciliation exceptions now raise a ReconciliationException, while mismatches and passes are logged with severity and report type.
Implemented capability-based caching detection to keep reconciliation compatible with Databricks serverless compute, caching only when supported and using Delta writes as materialization boundaries on serverless.

Documentation

Overhauled the documentation to show more clearly which source system is supported in which module.
Updated analyze command documentation to match the current implementation, simplifying caveats and clarifying expected behavior for --report-file and --source-directory.

Dependency updates:

Bump lodash from 4.17.21 to 4.17.23

Contributors: @sundarshankar89, @m-abulazm, @asnare, @gueniai, @dependabot[bot], @BesikiML


26 Jan 21:05

gueniai

v0.12.0

7ac560e

v0.12.0

Analyzer

Extended the Analyzer to recognize more SQL-bearing file types, including Oracle package files (.pks, .pkb), Teradata utilities (.bteq, .fload, .mload, etc.), Hive scripts (.hql), and shell scripts with embedded SQL (.sh, .ksh, .bash, .csh), so more source assets are discovered without manual renaming.
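Extension-based discovery of this kind can be sketched with a simple lookup. Only the extensions named above are listed, and the category labels and the classify helper are hypothetical; the Analyzer's real type names and full extension list are not given in these notes.

```python
from pathlib import Path

# Extensions named in the release note; the category labels are illustrative.
SQL_BEARING = {
    ".pks": "oracle-package",
    ".pkb": "oracle-package",
    ".bteq": "teradata-utility",
    ".fload": "teradata-utility",
    ".mload": "teradata-utility",
    ".hql": "hive-script",
    ".sh": "shell-script",
    ".ksh": "shell-script",
    ".bash": "shell-script",
    ".csh": "shell-script",
}


def classify(path):
    """Return a category for SQL-bearing files, or None if unrecognized."""
    return SQL_BEARING.get(Path(path).suffix.lower())


print(classify("etl/load_orders.BTEQ"))  # teradata-utility
print(classify("notes.txt"))             # None
```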

Converters

General

Enabled SSIS conversion so SSIS packages can be translated into Databricks notebooks via the BladeBridge-based converter, providing a new migration path for ETL workloads built on SSIS.
Added Amazon Redshift conversion documentation describing supported features, known limitations, and a step-by-step workflow to convert Redshift SQL into Databricks SQL using the BladeBridge transpiler.

Morpheus

All dialects

Implemented explicit support for multi-statement transactions (BEGIN TRANSACTION, COMMIT TRANSACTION, ROLLBACK TRANSACTION, and ATOMIC blocks) in the parser and generator, enabling transaction-aware translation and testing.
Fixed generation of CASE expressions so they now terminate with END, while CASE statements remain terminated with END CASE, improving standards-compliant SQL output across dialects.
Expanded CREATE VIEW support to handle SCHEMABINDING and MATERIALIZED options, increasing coverage of advanced view definitions.
Standardized translation of DATE_xxx functions by mapping to DATE_ADD and adding synonyms such as DATE_FORMAT, DATE_PART, DATE_SUB, and DATE_TRUNC for consistent naming and mapping.
Ensured control-flow statements like LEAVE and ITERATE are always generated with labels by auto-labelling enclosing blocks or loops when needed, improving robustness of generated control-flow SQL.

Snowflake

Added support for Snowflake ICEBERG catalog DDL by extending the grammar to recognize CREATE ICEBERG TABLE and related syntax so ICEBERG table definitions parse and test correctly.

Synapse / TSQL

Updated translation of T‑SQL VAR and VARP to map to ANSI VARIANCE and VAR_POP, aligning variance aggregation semantics and tests.
Updated translation of T‑SQL STDEV and STDEVP to treat them as synonyms for STDDEV and STDDEV_POP, improving aggregate function compatibility.
Fixed translation of T‑SQL REPLICATE by mapping it as a synonym of REPEAT, clarifying conversion behavior and tightening test coverage.
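
The function renames listed above can be pictured as a simple synonym table. The following is a minimal text-level sketch, not how Morpheus performs the translation (it works on a parsed representation); `TSQL_FUNCTION_SYNONYMS` and `rename_tsql_functions` are hypothetical names, and the regex approach ignores string literals and comments.

```python
import re

# Source-to-target function names taken from the release note.
TSQL_FUNCTION_SYNONYMS = {
    "VAR": "VARIANCE",
    "VARP": "VAR_POP",
    "STDEV": "STDDEV",
    "STDEVP": "STDDEV_POP",
    "REPLICATE": "REPEAT",
}

# Match a known function name only when it is followed by an opening parenthesis.
_CALL = re.compile(
    r"\b(" + "|".join(TSQL_FUNCTION_SYNONYMS) + r")\s*\(",
    re.IGNORECASE,
)


def rename_tsql_functions(sql: str) -> str:
    """Rename T-SQL function calls to their target names (naive rewrite)."""
    return _CALL.sub(
        lambda m: TSQL_FUNCTION_SYNONYMS[m.group(1).upper()] + "(",
        sql,
    )
```

For example, `rename_tsql_functions("SELECT VAR(x), STDEVP(y) FROM t")` yields `SELECT VARIANCE(x), STDDEV_POP(y) FROM t`.
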

BladeBridge

SSIS

Enabled SSIS support so SSIS packages can be translated into Databricks notebooks, allowing customers to migrate SSIS workloads using the BladeBridge converter within Lakebridge.

Amazon Redshift

Enabled Amazon Redshift SQL conversion to Databricks SQL, broadening coverage of cloud data warehouse sources and aligning with the new Redshift conversion documentation.

Synapse / TSQL / MSSQL

Fixed handling of non-standard DELETE statements with two FROM clauses by normalizing the first FROM to use the table alias before conversion, preventing malformed MERGE statements for Synapse and similar targets.
Updated SQL conversion to automatically remove unsupported NOLOCK hints, corrected a fragment-breaker bug that split scripts before INSERT, and fixed variable declarations using AS, improving reliability of T‑SQL parsing and conversion.
Improved stored procedure conversion from SQL Server to Databricks SQL by correctly handling output parameters and standardizing EXEC-to-CALL translation to stay within Databricks SQL scripting constraints.
Added patterns to correctly convert COUNT(DISTINCT COL1) OVER (PARTITION BY COL2) window expressions and to normalize table references from '{database_param}'.schema.table_name to {database_param}.schema.table_name, eliminating stray quoting in database qualifiers.
Fixed BIGINT datatype conversion when the type appears inside braces so it is recognized as a datatype and emitted as plain bigint rather than as a backticked identifier, avoiding invalid Databricks SQL.
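
Two of the normalizations above (dropping NOLOCK hints and unquoting the database parameter) lend themselves to a small illustration. The sketch below uses plain regular expressions and is not the BladeBridge rule set; the function names are hypothetical, and only the simple `WITH (NOLOCK)` and bare `(NOLOCK)` forms are handled, not hint lists that combine NOLOCK with other hints.

```python
import re


def strip_nolock_hints(sql: str) -> str:
    """Remove 'WITH (NOLOCK)' and bare '(NOLOCK)' table hints."""
    return re.sub(
        r"\s*(?:\bWITH\s*)?\(\s*NOLOCK\s*\)",
        "",
        sql,
        flags=re.IGNORECASE,
    )


def unquote_database_param(sql: str) -> str:
    """Turn '{param}'.schema.table into {param}.schema.table."""
    # Only unquote when the quoted placeholder is used as a qualifier (followed by a dot).
    return re.sub(r"'(\{\w+\})'(?=\.)", r"\1", sql)
```

The lookahead in `unquote_database_param` leaves quoted placeholders alone when they appear as ordinary string literals rather than as database qualifiers.
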

Oracle

Improved Oracle package conversion by stripping unnecessary BEGIN/END blocks from functions, emitting logic as a single returned SELECT, adding a dedicated handler for UDF definitions, and tightening procedure conversion for variable declarations, loop THEN usage, and cursor placement inside loops.

Informatica

Enhanced Informatica-to-Spark SQL mappings by removing redundant empty-string wrapping, adding mappings for additional datetime and related functions, and fixing parameter replacement and .format() usage so generated Spark SQL is cleaner and more accurate.
Corrected Databricks notebook generation for Informatica mapplets by switching from relative to absolute imports in the Python template and simplifying mapplet argument collection, removing unused JOB_PARAMETERS and MAPPLET_INFO code.

Documentation

Added Redshift conversion documentation and guide, describing supported features, limitations, and a recommended workflow for converting Redshift SQL to Databricks SQL.
Added an SSIS conversion guide with a full list of supported components, step-by-step migration instructions, and a sample SSIS package to showcase an end-to-end workflow.
Updated Switch documentation for Spark Declarative Pipeline conversion, including the new result_sdp_error column in Delta schemas, target_type = sdp, and sdp_language options, and documented the 7-step conversion pipeline and validation behavior.
Removed WSL from the Windows installation prerequisites, simplifying setup instructions while retaining guidance for Python installation and version checks across platforms.

General

Updated project metadata to require Python versions between 3.10.1 and 3.13.x, avoiding Python 3.10.0, and revised installation docs to reflect the new supported version range.
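
To check an interpreter against the range described above, a comparison on version tuples is enough. This is a minimal sketch: the bounds are read from the note (3.10.1 inclusive, anything in the 3.13 series allowed), and `is_supported_python` is a hypothetical helper rather than something Lakebridge ships.

```python
import sys

MIN_SUPPORTED = (3, 10, 1)  # 3.10.0 is excluded
MAX_EXCLUSIVE = (3, 14)     # 3.13.x is the newest supported series


def is_supported_python(version: tuple[int, int, int]) -> bool:
    """True when version lies in [3.10.1, 3.14)."""
    return MIN_SUPPORTED <= version < MAX_EXCLUSIVE


def current_python_is_supported() -> bool:
    """Apply the same check to the running interpreter."""
    return is_supported_python(tuple(sys.version_info[:3]))
```

Tuple comparison handles the edge cases directly: `(3, 10, 0)` falls below the minimum, and `(3, 14, 0)` is not less than `(3, 14)`.
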

Contributors: @asnare, @m-abulazm, @sundarshankar89, @BesikiML, @gueniai, @andresgarciaf, @hiroyukinakazato-db, @yyoli-db

29 Dec 21:54

gueniai

v0.11.3

b6a901f

v0.11.3

Analyzer

Optimized SAS Analyzer performance by consolidating regex operations, delivering roughly a 7x speed improvement for large-scale SAS analysis workloads.
Added support for the new SSIS components Microsoft.Pivot, Microsoft.UnPivot, and ExtensibleFileTask, broadening coverage for SSIS package migration analysis.

Converters – Morpheus

Core
- Significantly improved ANTLR parsing performance by merging grammars, refactoring ambiguous rules, and updating the Scala integration and build pipeline for the new grammar workflow.
- Allowed the STREAMS token to be used as an identifier so patterns like SELECT * FROM streams.foo.bar now parse correctly in Snowflake-oriented SQL.
- Updated error reporting to align with the following levels:
  - Info: no error, the input was fully translated
  - Hint: the input was fully translated but some irrelevant bits have been elided
  - Warning: the input was translated but with unsupported bits
  - Error: the input couldn't be translated
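
One way to picture the four levels above is as an ordered enumeration, where the outcome for a file is the worst level among its diagnostics. This is only a sketch: the numeric ordering and the `overall_severity` aggregation are assumptions, and neither name comes from Morpheus.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0     # no error, the input was fully translated
    HINT = 1     # fully translated, some irrelevant bits elided
    WARNING = 2  # translated, but with unsupported bits
    ERROR = 3    # the input couldn't be translated


def overall_severity(diagnostics: list[Severity]) -> Severity:
    """Worst severity across the diagnostics; INFO when there are none."""
    return max(diagnostics, default=Severity.INFO)
```
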
MSSQL / T-SQL / SQL Server
- Added full support for SQL Server T-SQL CREATE INDEX and table-level index directives, parsing them into a new index IR and translating to CLUSTER BY AUTO in Databricks SQL so index statements are no longer rejected.
- Extended grammar and parsing to handle T-SQL computed columns, QUOTENAME calls, GROUP options in query hints, DROP INDEX statements, and additional keywords like PARAMETERS, STREAMS, PROCEDURES, and VIEWS, improving coverage of real-world T-SQL workloads.
- Improved DML parsing so INSERT targets use proper dot identifiers instead of expression-like forms, preventing misinterpretation as function calls and preserving case sensitivity where required.
- Re-enabled and migrated T-SQL functional tests to a YAML-based format, expanding automated coverage and keeping still-failing cases isolated for follow-up.

Converters – BladeBridge

MSSQL / SSIS / T-SQL
- Resolved issues with column names containing single quotes and standardized DATEADD and DATEDIFF function patterns to improve compatibility across target SQL dialects.
DataStage
- Implemented mapping for the JulianDayFromDate function with corresponding tests, extending DataStage function coverage in the converter.
- Enhanced DataStage Spark and workflow handling by adding Databricks cluster sections, improving widget default handling, and mapping TransformStringToDate and spark.sqltemplate attributes for smoother Spark migrations.

Reconcile

Improved reconciliation hash query generation to guarantee consistent column ordering across SQL dialects, preventing false hash mismatches when column names are substrings of each other.
Reverted the Oracle reconcile implementation to use MD5 via DBMS_CRYPTO.HASH with RAWTOHEX, restoring compatibility with Oracle 11 while keeping the updated QueryBuilder engine handling.
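
Why column ordering matters for row hashes can be shown locally, outside any database. The sketch below is not the SQL that Reconcile generates: it hashes selected columns in sorted-name order so that two sides listing the same columns in a different order still agree. The SHA-256 choice, the unit-separator delimiter, and the `row_hash` name are all assumptions made for illustration.

```python
import hashlib


def row_hash(row: dict[str, object], columns: list[str]) -> str:
    """Hash the selected columns in sorted-name order with an explicit separator."""
    ordered = sorted(columns)  # same order regardless of how the caller lists columns
    payload = "\x1f".join("" if row[c] is None else str(row[c]) for c in ordered)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With columns such as `id` and `id_type`, where one name is a prefix of the other, both `["id", "id_type"]` and `["id_type", "id"]` produce the same digest for the same row.
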

Documentation

Added practical details about how to extend BladeBridge configurations

Dependency updates:

Bump actions/checkout from 5 to 6 (#2158).

Contributors: @asnare, @sundarshankar89, @dependabot[bot], @m-abulazm, @BesikiML

11 Dec 22:53

gueniai

v0.11.2

4190672

v0.11.2

Analyzer

Normalized complexity categories in the analyzer from “COMPLEX/VERY_COMPLEX” to “HIGH/VERY_HIGH” for clearer reports.

Converters

Morpheus

Snowflake

Implemented full support for DECLARE, LET, and assignment statements to better handle procedural Snowflake scripts.
Added support for DROP PROCEDURE statements, improving Snowflake DDL coverage.

TSQL/Synapse

Cleaned up grammar by removing duplicate and unsupported rules for TSQL special functions, reducing ambiguity and improving parser stability.
Implemented full support for DECLARE, LET, and assignment statements in TSQL, enabling richer stored procedure conversion.
Added support for TSQL DROP PROCEDURE statements to improve parity with source DDL.
Updated handling of options such as ANSI_NULLS and QUOTED_IDENTIFIER to emit informative comments instead of errors when they do not apply to Databricks SQL.
Enhanced handling of SET NOCOUNT by emitting comments explaining its behavior in Databricks SQL and warning when NOCOUNT OFF is used.
Allowed PRECISION to be used as an identifier (for example, c.precision), fixing parsing issues with such column names.
Improved handling of EXEC statements by detecting well‑known stored procedures like sp_executesql and issuing more specific diagnostics.
Added translation of OBJECT_ID() checks into EXISTS queries against catalog metadata to preserve control flow in procedural TSQL.
Added warnings for unsupported PRINT statements by generating explanatory comments rather than hard errors.
Added parsing support for the Synapse RENAME OBJECT syntax, currently surfaced as an unsupported but recognized construct.

Generic Morpheus engine

Enabled attaching comments and error markers to empty code blocks so that diagnostics are preserved in rendered SQL.
Prevented semicolons from being printed after empty statements to keep output formatting consistent.
Bundled multiple column-level primary keys into composite table constraints to produce more correct DDL.
Allowed the identifier PRECISION in general parsing contexts, improving compatibility with more schemas.
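
The primary-key bundling mentioned above can be sketched as follows: columns flagged as primary keys are collected into one table-level constraint instead of each carrying its own. This is an illustration in generic SQL, not Morpheus output; `Column` and `create_table_ddl` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str
    primary_key: bool = False


def create_table_ddl(table: str, columns: list[Column]) -> str:
    """Emit CREATE TABLE with a single composite PRIMARY KEY constraint."""
    lines = [f"  {c.name} {c.data_type}" for c in columns]
    pk = [c.name for c in columns if c.primary_key]
    if pk:
        # One table-level constraint covering every flagged column.
        lines.append(f"  PRIMARY KEY ({', '.join(pk)})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"
```

Emitting one constraint matters because a table can have only one primary key, so two column-level `PRIMARY KEY` markers would not describe a composite key.
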

BladeBridge

MSSQL / TSQL

Improved handling of MERGE statements, including insertion of semicolons before MERGE in statement breaking and correct ordering of MATCHED and NOT MATCHED clauses.
Fixed issues when converting updates on temporary tables into MERGE statements and added tests to guard the behavior.
Improved statement categorization by stripping comments before categorization and simplifying legacy comment-key handling.
Added a new handler for nested static strings and inline comments, improving function substitution and parser robustness.

Generic BladeBridge engine

Enhanced logging configuration to produce clearer diagnostics while keeping noise manageable.

Reconcile

Added support for specifying a catalog for Databricks sources in Reconcile and prompting for the source catalog when necessary.
Removed redundant Reconcile configuration parameters to simplify setup.

General

Improved handling of output from LSP servers by safely chunking very long stderr lines and logging critical processing errors, preventing hangs and unbounded memory use.
Adjusted JDBC handling to accept usernames and passwords via Spark options instead of embedding credentials in the JDBC URL, improving support for special characters in passwords.
Consolidated the automated test suite to keep only unit and integration scopes, simplifying test configuration.
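
The stderr chunking described above amounts to splitting an oversized line into bounded pieces before logging it. A minimal sketch, assuming the line has already been decoded to a string; the real handling operates on the LSP server's output stream, and `chunk_line` and its `max_len` parameter are hypothetical.

```python
from collections.abc import Iterator


def chunk_line(line: str, max_len: int) -> Iterator[str]:
    """Yield the line in pieces of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    for start in range(0, len(line), max_len):
        yield line[start:start + max_len]
```

Because the function is a generator, a very long line is never duplicated in full; each piece can be logged and discarded in turn.
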

Dependency Updates

Dependencies: update documentation (yarn) packages by @asnare in #2178

Full Changelog: v0.11.1...v0.11.2

Contributors: @m-abulazm, @asnare, @sundarshankar89

26 Nov 23:26

gueniai

v0.11.1

338e93c

v0.11.1

Analyzer

No updates in this release.

Converters

General

Improved end-to-end migration behavior through tighter integration with the centralized Morpheus function mapping layer and expanded cross-dialect coverage

Morpheus

Snowflake

Centralized SQL function mappings and expanded cross-dialect coverage, improving Snowflake-to-Databricks SQL conversions and reducing noisy, non-actionable warnings.
Added full translation support for Snowflake exception blocks, enabling richer error-handling logic to be preserved when converting to Databricks SQL.

TSQL / SQL Server

Reworked SQL function handling so most mappings are centralized, making TSQL-to-Databricks SQL conversions more accurate and easier to extend for future Lakebridge-based migrations.
Implemented full support for TSQL TRY/CATCH constructs, including THROW/RAISERROR-style logic and helper-based error handling, improving the fidelity of translated control-flow and error semantics.

BladeBridge

TSQL / SQL Server

Fixed handling of T-SQL column alias syntax in SELECT statements so aliases are no longer mistaken for variable assignments, and removed a deprecated alias-normalization method to improve translation accuracy.
Resolved failures caused by nested comments, improved post-conversion handling for shell and Python wrapper scripts, and ensured labeled UPDATE/DELETE statements that translate to MERGE remain correctly embedded in SQL.
Corrected processing of SELECT statements without a FROM clause when assigning to variables, so expressions like variable increments and severity mappings are handled reliably during migration.
Improved “delete by source” MERGE translations so separators and DELETE placement are preserved, and fixed static string handling so T-SQL patterns that use square brackets are not misinterpreted as identifier quoting or ranges.

Reconcile

No updates in this release

Documentation

Clarified that Python 3.14 is not yet supported and updated macOS instructions to recommend Python 3.13 as the latest supported version
Expanded installation prerequisites with detailed Databricks workspace requirements, authentication options, network and repository access expectations, and a comprehensive pre-installation checklist aimed at enterprise and security-restricted environments

General

Increased the maximum stderr line size accepted from LSP servers during transpilation to prevent crashes or hangs when converters emit very large log lines
Reduced noise from LSP integrations by lowering stderr mirroring from INFO to DEBUG level, ensuring detailed logs remain available for troubleshooting without cluttering normal operation logs

Contributors: @asnare, @andresgarciaf

07 Nov 22:05

gueniai

v0.11.0

04f1df6

v0.11.0

🎉 New Features

This release introduces two exciting new capabilities to Lakebridge:

Synapse Profiler

A powerful new Synapse Profiler feature is now available to help you analyze and profile your Synapse data. Refer to the documentation for usage details and examples.

Switch LLM Converter

Introducing the new Switch LLM converter, expanding Lakebridge's conversion capabilities. Refer to the documentation for usage details and examples.

Other updates

Converters

General

Conversion Output Fix
Fixed a bug where files nested 2 or more directories deep within the input directory could fail to be written out after conversion when the directory structure wasn't already in place.
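
The class of bug described above is typically avoided by creating missing parent directories before each write. The sketch below shows that pattern with `pathlib`; it is not the Lakebridge fix itself, and `write_converted_file` is a hypothetical helper.

```python
from pathlib import Path


def write_converted_file(output_root: Path, relative_path: Path, content: str) -> Path:
    """Write content under output_root, creating intermediate directories as needed."""
    target = output_root / relative_path
    # parents=True creates every missing level; exist_ok=True tolerates existing ones.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```
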

Morpheus

Code Formatting Improvements
Refactored code formatting logic by introducing a tree-like structure in CodeBlock and a new CodeBlockRenderer to handle whitespace, comments, and error positioning, making the formatting system more maintainable and accurate.

TSQL

Added support for translating TSQL join hints (like REPLICATE and MERGE) to their Databricks SQL equivalents by transforming them into special /*+ ... */ comments after the SELECT keyword, while unsupported hints are flagged as annotated errors.

BladeBridge

SQL Server

Fixed SELECT INTO real table syntax, corrected LIKE pattern handling, and mapped unsupported FUNC_ROW_NUMBER function while removing ANON_NOLOCK.
Resolved an issue where CASE WHEN expressions as the last statement in a file generated incorrect semicolon placement in SQL scripts.
Added fragment breaker before GO keyword and removed unsupported COMMIT TRANSACTION and CREATE INDEX constraints.
Fixed T-SQL UPDATE statements that were not correctly converted to MERGE operations in specific cases.
Corrected fragment handling around SELECT and UNION statements, and fixed issues with IF condition blocks and error handling blocks being mixed up.
Removed SET IDENTITY_INSERT and BEGIN/COMMIT TRANSACTION statements, and changed INT GENERATED ALWAYS AS IDENTITY to BIGINT GENERATED ALWAYS AS IDENTITY.
Added validation check for converted MERGE statements, implemented global variable reset in init_hook subroutine, and performed code refactoring.
Fixed T-SQL DELETE statements that were not correctly converted to MERGE operations and added corresponding test cases.

Reconcile

Oracle

Improved Oracle support with the following enhancements:

Fixed Oracle JDBC URL by moving credentials out of URL into options and correcting thin syntax
Updated hashing/expression pipeline to replace RAWTOHEX(...), 2 with UTL_I18N.STRING_TO_RAW(...,'AL32UTF8'), 4 (SHA-256)
Fixed schema comparison for Oracle
Tweaked datatype parsing in default transformations for Oracle compatibility
Added Oracle jars in setup script
Extended integration scaffolding and added end-to-end tests
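
Keeping credentials out of the JDBC URL, as in the first item above, can be sketched as building a separate options mapping. The `url`, `user`, `password`, and `driver` keys are standard Spark JDBC data source options; the `oracle_jdbc_options` helper and the host, port, and service values are hypothetical, and the URL uses the Oracle thin driver's service-name form.

```python
def oracle_jdbc_options(
    host: str, port: int, service: str, user: str, password: str
) -> dict[str, str]:
    """Build Spark JDBC options with credentials kept out of the URL."""
    return {
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "user": user,
        "password": password,  # passed as an option, so special characters need no escaping
        "driver": "oracle.jdbc.OracleDriver",
    }
```

With a Spark session available, such a mapping could be passed along the lines of `spark.read.format("jdbc").options(**opts).option("dbtable", "schema.table").load()`.
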

Snowflake

Fixed schema comparison for Snowflake
Adjusted log levels by demoting noisy warnings to debug/info
Added Snowflake jars in setup script
Extended integration scaffolding

Documentation

Added documentation for deploying reconciliation dashboards and updated documentation notebooks.

Dependency updates:

Bump actions/setup-node from 5 to 6 by @dependabot[bot] in #2094

New Contributors

@hiroyukinakazato-db made their first contribution in #2066

Full Changelog: v0.10.13...v0.11.0

Contributors: @goodwillpunning, @hiroyukinakazato-db, @sundarshankar89, @asnare, @m-abulazm, @dependabot[bot], @bishwajit-db

28 Oct 04:02

gueniai

v0.10.13

51b2a05

v0.10.13

Analyzer

Added defensive code to prevent the DataStage analyzer from crashing on files that contain empty array references.

Converters

Morpheus

General

Enhanced name representation consistency - Major refactoring that replaces String representations with Expression types for table names, column names, and constraints across IR nodes, improving SQL/PySpark code generation accuracy
Fixed DBT parsing issues - Resolved template parsing problems by changing template markers to !#Jinja0001#! format and improving whitespace handling for proper tokenization
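
The template-marker idea above can be illustrated by masking Jinja blocks before parsing and restoring them afterwards. Only the `!#Jinja0001#!` marker format comes from the note; the sequential numbering, the regex for Jinja delimiters, and the `mask_templates` and `unmask_templates` names are assumptions of this sketch, not Morpheus internals.

```python
import re

# Jinja expressions {{ ... }}, statements {% ... %}, and comments {# ... #}.
_JINJA = re.compile(r"\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\}", re.DOTALL)


def mask_templates(sql: str) -> tuple[str, dict[str, str]]:
    """Replace each Jinja block with a numbered marker; return text and lookup table."""
    saved: dict[str, str] = {}

    def _replace(match: re.Match[str]) -> str:
        marker = f"!#Jinja{len(saved) + 1:04d}#!"
        saved[marker] = match.group(0)
        return marker

    return _JINJA.sub(_replace, sql), saved


def unmask_templates(sql: str, saved: dict[str, str]) -> str:
    """Put the original Jinja blocks back in place of their markers."""
    for marker, original in saved.items():
        sql = sql.replace(marker, original)
    return sql
```

Masking lets a SQL tokenizer treat each template as a single opaque token, which is why the marker needs a shape that cannot collide with ordinary SQL.
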

TSQL (Synapse/SQL Server)

Support for dual OUTPUT clauses in TSQL INSERT/DELETE/UPDATE statements - Enhanced T-SQL parser to handle complex statements with multiple OUTPUT clauses (OUTPUT ... INTO ... OUTPUT ...) with comprehensive test coverage
Fixed TSQL DECLARE statement handling - Refactored DECLARE statement processing by moving logic to dedicated visitor methods and properly marking unsupported statements for future implementation
Improved BLOCK structure parsing for BEGIN and BEGIN TRY statements - Updated parser grammar to support flexible scripting blocks and transaction handling, allowing zero or more statements in control flow constructs
Added comprehensive USE statement support - Introduced new IR representations (UseCatalog, UseSchema) with dialect-specific AST building logic and proper SQL generation

Snowflake

Fixed Snowflake connection tests - Internal improvements for database connection test reliability
Added comprehensive USE statement support - Introduced new IR representations (UseCatalog, UseSchema) with dialect-specific AST building logic and proper SQL generation

BladeBridge

General

Automatically creates and cleans up temporary folders for embedded SQL conversion in wrapper scripts - Improves workflow management by implicitly creating temp folders and cleaning them up once conversion is complete

MSSQL (SQL Server)

Enhanced table variable and temporary table conversion - Added support for table variable conversion to temporary tables and improved string handling with logic to convert double single quotes to double quotes
Fixed semicolon placement in nested select statements - Resolved issue where semicolons appeared before comments in nested select statements
Improved MS SQL procedure handling - Added LIMIT 1 for Set in select statements, enhanced function mappings, fixed string concatenation, and removed unsupported constraints

Reconcile

No updates in this release.

Documentation

No updates in this release.

Contributors: @gueniai, @sundarshankar89

Releases: databrickslabs/lakebridge
