Releases: databrickslabs/lakebridge

Release v0.14.0

12 Jun 22:14
e969dff

Release Notes — Lakebridge v0.14.0

Highlights

  • Five new profiler sources. Snowflake, Redshift, BigQuery, Oracle, and Legacy SQL DW are now all supported profiler targets, significantly expanding the set of platforms Lakebridge can assess ahead of a migration.

  • Switch now supports SAS. A new built-in prompt converts SAS programs to PySpark, adding SAS to the growing list of languages Switch can migrate automatically.

  • Reconcile now supports Teradata. Teradata can now be used as a source platform for data quality reconciliation, enabling validation for customers migrating from Teradata to Databricks.

  • Automatic reconciliation configuration. A new auto-configure-recon-tables command discovers source and target tables and generates the initial reconcile configuration automatically, replacing a previously manual setup step.


Profilers

New Sources

  • Redshift (#2305, #2304, #2306, #2408, #2501)
    Amazon Redshift is now a fully supported profiler source, covering all three deployment variants: provisioned, provisioned multi-AZ, and serverless. This includes credential flows (database password, federated user, AWS Secrets Manager ARN, temporary credentials with db user or IAM, with optional SSL), dedicated extraction queries and validation schemas for each variant, full CLI wiring, and user documentation. A bug in the serverless managed-storage aggregation was also fixed: the previous query summed hourly sys_serverless_usage snapshots, causing reported storage to grow linearly with the lookback window rather than reflecting the actual allocated amount.

  • Snowflake (#2420, #2499)
    Adds Snowflake as a profiler source. The interactive configurator prompts for connection details and a Programmatic Access Token (PAT), then extracts warehouse usage, query history, storage, user activity, account info, and optional credits (pipe, autoclustering, materialized-view refresh) from SNOWFLAKE.ACCOUNT_USAGE into a timestamped DuckDB file. Post-extraction computations produce TCO summaries. A supplementary rate_sheet extract pulls the effective per-credit rate (90-day average) and account service tier from SNOWFLAKE.ORGANIZATION_USAGE.RATE_SHEET_DAILY, providing the inputs needed by the downstream TCO value model.

  • BigQuery (#2472)
    Adds BigQuery as a profiler source. Running configure-database-profiler followed by execute-database-profiler --source-tech bigquery executes 16 region-qualified INFORMATION_SCHEMA queries against the customer's configured BigQuery project(s) and writes 12 analysis tables into a local DuckDB file at ~/.databricks/labs/lakebridge_profilers/bigquery_assessment/profiler_extract.db. The pipeline structure mirrors existing source-techs.

  • Oracle (#2187)
    Adds Oracle as a profiler source. Covers the interactive configuration dialog, extraction SQL scripts, and local DuckDB population from Oracle Database, with unit tests for all three components.

  • Legacy SQL DW (#2441)
    Extends profiler coverage to legacy Azure SQL DW (pre-Synapse) deployments, adding the necessary extraction queries to assess this platform.
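
The Redshift serverless storage fix described above comes down to simple arithmetic, sketched here in Python. The snapshot values and helper names are hypothetical and this is not the profiler's actual query; the release notes also do not say which aggregate the fix uses, so taking the latest snapshot is only one plausible choice.

```python
# Hypothetical hourly storage snapshots (in GB) for one serverless workgroup.
# Each snapshot reports the storage allocated at that hour, not an increment.
def hourly_snapshots(hours: int, allocated_gb: float = 500.0) -> list:
    return [allocated_gb] * hours


def summed_storage(snapshots: list) -> float:
    """Buggy aggregation: adds up every hourly snapshot."""
    return sum(snapshots)


def latest_storage(snapshots: list) -> float:
    """Point-in-time aggregation: the most recent snapshot only."""
    return snapshots[-1]


one_day = hourly_snapshots(24)
one_week = hourly_snapshots(24 * 7)

# The sum scales with the lookback window...
print(summed_storage(one_day), summed_storage(one_week))
# ...while the latest snapshot stays at the allocated amount.
print(latest_storage(one_day), latest_storage(one_week))
```

With a constant 500 GB allocation, the summed figure for a week is seven times the figure for a day, which is the linear growth the fix removes.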

Enhancements

  • SQL Server profiler switched to SQL scripts and shared DatabaseManager (#2482)
    The MSSQL profiler's activity and info extraction steps have been converted from per-step Python virtualenvs to in-process sql/ddl steps using the shared DatabaseManager connector. Thirteen SQL query/DDL pairs replace the previous Python scripts, eliminating the duplicated get_sqlserver_reader connector and aligning SQL Server with the architecture used by other profiler sources.

  • User-configurable output folder (#2488)
    The profiler's DuckDB extract path is now exposed as a CLI flag (--output-folder) instead of being hardcoded per source in pipeline_config.yml. The default remains ~/.databricks/labs/lakebridge_profilers/<source>_assessment. Output filenames now include a timestamp (profiler_extract_<YYYYMMDD_HHMMSS>.db) to prevent overwrites on repeated runs, and the absolute path to the extract is logged on successful completion.

  • Custom credentials file path for execute-database-profiler (#2494)
    The execute-database-profiler command now accepts an optional --cred-file-path argument, letting users supply a non-default credentials file rather than the one written by configure-database-profiler. This makes it easier to manage multiple credential configurations or run the profiler in scripted, non-interactive environments.

  • SQL Server: TrustServerCertificate support (#2498)
    The configure-database-profiler command for SQL Server now surfaces a TrustServerCertificate connection property, addressing a common customer request for environments where the server certificate cannot be validated.

  • Fix: profiler steps no longer fail with ModuleNotFoundError (#2485)
    The Synapse and SQL Server profilers previously created a fresh virtualenv per pipeline step, installing only each step's declared dependencies. After database_manager.py began importing redshift_connector at module scope, every clean run of those profilers failed with ModuleNotFoundError. Profiler Python steps now run with the parent interpreter, inheriting all installed packages—faster, more robust, and immune to this class of transitive-import failures.
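
The timestamped extract naming from the output-folder entry above can be sketched as follows. The function name and its parameters are illustrative assumptions, not Lakebridge code; only the default folder layout and the filename pattern come from the notes.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def extract_path(source: str, output_folder: Optional[Path] = None,
                 now: Optional[datetime] = None) -> Path:
    """Build an absolute, timestamped extract path (hypothetical helper)."""
    # Default documented in the notes: ~/.databricks/labs/lakebridge_profilers/<source>_assessment
    default = Path.home() / ".databricks" / "labs" / "lakebridge_profilers" / f"{source}_assessment"
    folder = output_folder or default
    # Filename pattern documented in the notes: profiler_extract_<YYYYMMDD_HHMMSS>.db
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return (folder / f"profiler_extract_{stamp}.db").resolve()


print(extract_path("snowflake", Path("/tmp/out"), datetime(2025, 6, 12, 22, 14, 0)).name)
```

Because the timestamp has one-second resolution, two runs started in different seconds write to different files, which is how repeated runs avoid overwriting each other.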


Converters

Morpheus

T-SQL Improvements

  • Better date and number formatting from T-SQL CONVERT
    T-SQL's CONVERT function accepts a style code to format dates and numbers as strings (e.g. CONVERT(VARCHAR, myDate, 103) for dd/MM/yyyy). Morpheus now correctly translates these style codes to the equivalent Databricks SQL expressions, unblocking a large class of previously untranslatable queries.

  • Fix integer-as-date behavior from T-SQL
    T-SQL allows using 0 (or any integer) where a date is expected, treating it as "N days after 1 Jan 1900". Morpheus now replicates this behavior, preventing runtime type errors when running translated queries on Databricks.

  • Fix variable declarations with VARCHAR(N) / CHAR(N) types
    T-SQL local variables declared as VARCHAR(N) or CHAR(N) were being passed through verbatim and failing at runtime in Databricks. They are now automatically translated to STRING, the correct equivalent type for variable declarations.

  • Fix T-SQL variable assignment queries with ORDER BY
    T-SQL queries that assign a value to a variable (e.g. SELECT @var = col FROM t ORDER BY col) were being incorrectly structured during translation. The ORDER BY and similar clauses are now placed correctly in the output.

  • Support HASH JOIN query hint in T-SQL
    T-SQL's OPTION(HASH JOIN) query hint, which instructs the database to use a specific join strategy, is now correctly parsed and handled during translation.
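
As a reference for the source semantics in the first two fixes above, here is a small Python model of T-SQL's integer-as-date rule and of CONVERT style 103. It models what T-SQL does, not the Databricks SQL that Morpheus emits; the function names are illustrative.

```python
from datetime import date, timedelta

# T-SQL treats an integer N used as a date as "N days after 1 Jan 1900".
TSQL_EPOCH = date(1900, 1, 1)


def tsql_int_to_date(n: int) -> date:
    """Model of T-SQL's implicit integer-to-date conversion."""
    return TSQL_EPOCH + timedelta(days=n)


def format_style_103(d: date) -> str:
    """Model of CONVERT(VARCHAR, d, 103), i.e. dd/MM/yyyy."""
    return d.strftime("%d/%m/%Y")


print(tsql_int_to_date(0))                      # 1900-01-01
print(format_style_103(tsql_int_to_date(45)))   # 15/02/1900
```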

Snowflake Improvements

  • Translate REGEXP_SUBSTR_ALL to REGEXP_EXTRACT_ALL
    Snowflake's REGEXP_SUBSTR_ALL (returns all regex matches as an array) is now translated to its Databricks SQL equivalent REGEXP_EXTRACT_ALL.

  • Translate binary hash functions (MD5_BINARY, SHA1_BINARY, SHA2_BINARY)
    Snowflake's binary digest functions are now translated to their Databricks SQL equivalents by wrapping the hex output with UNHEX().

  • Translate hex hash function synonyms (MD5_HEX, SHA1_HEX, SHA2_HEX)
    Snowflake's *_HEX hash function aliases are now directly mapped to their identically-behaved Databricks SQL counterparts.

  • Translate UNICODE and TRY_TO_DOUBLE
    Snowflake's UNICODE() (returns the code point of the first character) is now mapped to ASCII() in Databricks SQL, and TRY_TO_DOUBLE() is mapped to TRY_CAST(_ AS DOUBLE). The UNICODE fix also applies to T-SQL.

  • Translate type-check functions (IS_DATE, IS_DOUBLE, IS_REAL, etc.)
    Snowflake functions that test whether a value inside a semi-structured (VARIANT) column holds a specific type are now translated where possible (e.g. IS_DATE → TRY_CAST(v AS DATE) IS NOT NULL). Those with no equivalent in Databricks (IS_TIME, IS_TIMESTAMP_TZ) are flagged with a clear migration note.

  • Flag unsupported functions (CHECK_XML, PARSE_XML, IS_ROLE_IN_SESSION, etc.) with migration notes
    Seven Snowflake-specific functions with no Databricks equivalent now produce a clear "FIXME" comment in the output instead of silently passing through and failing at runtime. IS_NULL_VALUE is also correctly translated to IS_VARIANT_NULL.

  • Flag 12 more Snowflake-only functions with migration notes
    Additional Snowflake admin, statistical, and VARIANT-inspection functions (including COMPRESS, NORMAL, ZIPF, INVOKER_ROLE, IS_BOOLEAN) now produce clear FIXME annotations instead of failing silently at runtime.

  • Flag Snowflake INFORMATION_SCHEMA metadata functions with migration notes
    Snowflake monitoring/metadata functions called via INFORMATION_SCHEMA (like PIPE_USAGE_HISTORY, `M...
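
The binary hash translation listed above relies on the fact that decoding a hex digest yields the raw digest bytes. A minimal Python check of that identity, with SHA-256 standing in for the SQL functions and helper names that are purely illustrative:

```python
import hashlib


def sha2_hex(data: bytes) -> str:
    """Hex-encoded digest, analogous to a hex-returning SQL hash function."""
    return hashlib.sha256(data).hexdigest()


def unhex(hex_string: str) -> bytes:
    """Decode a hex string to bytes, analogous to wrapping with UNHEX()."""
    return bytes.fromhex(hex_string)


payload = b"lakebridge"
binary_digest = unhex(sha2_hex(payload))

# Decoding the hex output gives exactly the raw 32-byte digest.
print(binary_digest == hashlib.sha256(payload).digest(), len(binary_digest))
```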

Release v0.13.0

13 May 20:21
8178513

Lakebridge v0.13.0 Release Notes

Highlights

A few headline changes in this release worth calling out:

  1. SQL Server profiler is now available, extending assessment coverage to Microsoft SQL Server alongside the existing Azure Synapse support.
  2. Redshift reconciliation is now supported. Redshift can be used as a source for all reconcile report types.
  3. Major Morpheus conversion improvements for T-SQL and Redshift, including a full Redshift dialect rollout (DDL, DML, functions, operators) and a wide expansion of T-SQL function and resilience handling.

This release also includes a major overhaul of the documentation, aimed at simplifying the structure and making the docs easier to follow.

Assessment

Profiler

  • Added a new SQL Server profiler that extends the existing assessment capabilities to Microsoft SQL Server, closely mirroring the Azure Synapse profiler design. The implementation exposes a last_execution_time parameter on all server queries, laying the groundwork for future incremental/scheduled extractions. On-prem SQL Server is not in scope. (#2151)
  • Updated the Azure Synapse Workspace profiler summary dashboard and introduced a new dashboard template for SQL Server. The Synapse template removes deprecated dashboard widget parameters, parameterizes table values so they can be set dynamically by Lakebridge, adds a dedicated-storage summary widget by SQL pool, renames datasets for clarity, fixes broken column references in SQL pool activity widgets, and reformats dataset queries. (#2317)
  • Reworked the create-profiler-dashboard CLI flow to bring it in line with the rest of the Lakebridge installer experience: clearer prompts for extract file location, UC catalog, schema, and volume; a helper to parse the extract path and UC volume upload location; and dashboard install/uninstall hooked into the standard Lakebridge installer/uninstaller. (#2319)
  • Fixed a bug that produced false positives in test-profiler-connection. (#2342)

Converters

Morpheus

Snowflake

  • Tightened parsing of CREATE FILE FORMAT statements, including format type options, with clear diagnostics for unsupported variants.

Synapse / T-SQL

  • Improved resilience by dropping unsupported constructs (table hints such as WITH (NOLOCK) and READPAST, UPDATE STATISTICS/CREATE STATISTICS, and other unsupported CREATE TABLE options) and emitting warnings, so surrounding scripts keep transpiling instead of failing.
  • Expanded function coverage with a batch of T-SQL "easy wins" (DATEPART unit aliases, EOMONTH, ISNUMERIC, TIME → STRING conversions) and richer date/time handling (DATETRUNC, DATE_BUCKET, SYSDATETIME, DATENAME, normalized DATE_PART units).
  • Added Databricks-compatible translations for PATINDEX (to REGEXP_INSTR, converting SQL wildcards to regex), IIF (mapped to IF at parse time), and FORMATMESSAGE (graceful fallback with diagnostics for unsupported format specifiers).
  • Migrated the T-SQL functional test suite to the new eval-based scenario runner, expanding executable coverage and removing legacy skips.
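
The wildcard-to-regex conversion mentioned for PATINDEX above can be sketched in Python. This is a simplified model under stated assumptions: it handles only the % and _ wildcards (not bracket character classes), and the function names are hypothetical rather than Morpheus internals.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate % and _ wildcards to regex, escaping every other character."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def patindex(pattern: str, text: str) -> int:
    """1-based position of the first match, or 0 when there is none."""
    regex = like_to_regex(pattern.strip("%"))
    if not pattern.startswith("%"):
        regex = r"\A" + regex      # no leading %: match must start the string
    if not pattern.endswith("%"):
        regex = regex + r"\Z"      # no trailing %: match must end the string
    match = re.search(regex, text, flags=re.DOTALL)
    return match.start() + 1 if match else 0


print(patindex("%ter%", "interesting data"))  # 3
print(patindex("%a.c%", "abc a.c"))           # 5, the dot is matched literally
```

Escaping the non-wildcard characters is the important step: without it, a literal dot in the pattern would match any character once it reaches a regex-based function.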

Redshift

  • Added Redshift as a first-class source dialect: a full dialect mapping in the converter (parser, IR builder, generator), Language Server advertisement alongside Snowflake and T-SQL, and a published per-feature workplan.
  • Implemented Redshift DDL and DML coverage, including CREATE TABLE (distribution, sort key, constraint clauses), DELETE … REMOVE DUPLICATES, complex literals (arrays, super values, composite forms), and the supportable portion of the SUPER type with explicit rejection diagnostics for the rest.
  • Added wide function coverage across string, VARBYTE, window, admin, OBJECT, JSON, HLL, and math families, plus date/time (DATEADD, DATE_CMP, TIMEZONE, timezone comparisons, TIMEOFDAY, LAST_DAY, MONTHS_BETWEEN, SYSDATE, single-argument TRUNC), numeric coercion (TEXT_TO_INT_ALT, TEXT_TO_NUMERIC_ALT), hashing (CHECKSUM, FARMHASH64), and a staged TO_TIMESTAMP implementation.
  • Implemented Redshift-specific operators including the + overload (so string and date arithmetic disambiguate correctly) and the |/ (square root) and ||/ (cube root) prefix operators.
  • Explicitly rejected EXPLAIN_MODEL since Databricks SQL has no equivalent ML-model explainer, surfacing actionable diagnostics rather than silent miscompilation.

General

  • Broadened cross-dialect DDL with CREATE FUNCTION (as much as is feasible per dialect, with diagnostics for unsupported procedural features) and CREATE SCHEMA for Snowflake, T-SQL, and Redshift, both lowered to Databricks SQL.
  • Expanded shared function and expression support: CURRENT_USER across all dialects, additional TIMESTAMP-related functions, raw strings as proper IR expressions (so they participate in type inference and round-trip cleanly), and EXECUTE IMMEDIATE usable as an expression (not just a statement).
  • Added full coverage of H3 and ST spatial functions across every supported dialect, with normalized names and argument shapes.
  • Improved grammar flexibility by allowing reserved keywords DATABASE and PRIMARY to be used as identifiers in unambiguous contexts, unblocking real-world schemas.
  • Expanded DELETE to cover all Redshift, T-SQL, and Snowflake use cases (including dialect-specific extensions) and added UPDATE-to-MERGE lowering for Redshift and T-SQL when an update uses a join or source table.

Reconcile

  • Added a Redshift connector to reconcile so Redshift can be used as a source for data, row, schema, and full report types. (#2339)
  • Replaced direct JDBC connections (Oracle, Snowflake, SQL Server) with Databricks Unity Catalog remote_query() calls backed by UC Connections. Reconcile no longer manages JDBC URLs, secret scopes, or PEM keys directly — authentication and connectivity are handled by Databricks. This introduces a v2 configuration format that takes a uc_connection_name in place of secret_scope; existing v1 configs are auto-migrated on load. (#2362)
  • Fixed reconcile schema fetch failures on Foreign Catalogs created via Lakehouse Federation. Foreign catalogs lack the Databricks-specific full_data_type column in information_schema.columns, which previously caused UNRESOLVED_COLUMN errors for all report types (schema, data, row, all). A new DatabricksNonUnityCatalogDataSource now falls back to DESCRIBE TABLE and covers hive_metastore, global_temp views, and Foreign Catalogs, while the native DatabricksDataSource remains scoped to Unity Catalog tables. (#2422)
  • Fixed a T-SQL/Synapse reconciliation regression where switching to VARCHAR(MAX) in hash concatenation broke date/time columns: SQL Server accepts VARCHAR(256) + DATE via implicit conversion but rejects VARCHAR(MAX) + DATE. Temporal transforms now CONVERT DATE/TIME/DATETIME to VARCHAR(10)/VARCHAR(12)/VARCHAR(23) so all temporal columns produce VARCHAR output that concatenates safely in the hash input string. (#2320)
  • Improved Oracle reconcile coverage, fixed parsing of remote query options, and dropped the legacy Oracle test scripts and Docker harness that required heavy manual setup. (#2433)

Installer

  • Added minimal support for using a Maven mirror when installing Morpheus via install-transpile. Setting LAKEBRIDGE_MAVEN_URL overrides the default repository URL, and credentials can be supplied through ~/.netrc (or via NETRC) so install-transpile works in environments without direct Maven Central access. (#2405)
  • Updated the wheel installer used during install-transpile to look up version information via pip instead of issuing a direct HTTP call to PyPI. This allows install-transpile to work in environments where only a local PyPI mirror is available. (#2404)
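
The Maven mirror override described above follows a common pattern: an environment variable replaces a default URL, and credentials for the mirror host come from a netrc file. The sketch below shows that pattern with Python's standard library. The helper names are hypothetical, and using the public Maven Central URL as the default is an assumption, since the notes do not state Lakebridge's default.

```python
import netrc
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse

# Assumption: Maven Central's public URL stands in for the installer's default.
DEFAULT_MAVEN_URL = "https://repo1.maven.org/maven2/"


def resolve_maven_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the mirror URL if LAKEBRIDGE_MAVEN_URL is set, else the default."""
    return env.get("LAKEBRIDGE_MAVEN_URL", DEFAULT_MAVEN_URL)


def lookup_credentials(repo_url: str, netrc_path: str) -> Optional[Tuple[str, str]]:
    """Look up (login, password) for the repository host in a netrc file."""
    host = urlparse(repo_url).hostname
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# The netrc location itself can come from NETRC, falling back to ~/.netrc.
netrc_file = os.environ.get("NETRC", os.path.expanduser("~/.netrc"))
print(resolve_maven_url({"LAKEBRIDGE_MAVEN_URL": "https://mirror.example.com/maven"}))
```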

Documentation

  • Major revamp of the Lakebridge documentation focused on clarity, structure, and first-time user experience: added a new end-to-end Getting Started tutorial (SQL Server → Databricks SQL walkthrough), a Choosing Tools decision guide, a dedicated Morpheus transpiler page, and a split-out Switch architecture page. Reconcile docs were consolidated from 5 files (~1,400 lines) to 3 (~750 lines) with a new report-type comparison table and unified Configuration Reference and Running Reconcile pages. SSIS docs were moved into a dedicated subfolder with a collapsible sidebar category. The Installation page was rewritten for brevity, the FAQ expanded from 3 to 25+ questions, and the sidebar reordered to match the actual user journey: Installation → Getting Started → Choosing Tools → Assessment → Transpile → Reconcile → SQL Splitter → FAQ. (#2365)
  • Fixed the reconcile notebook documentation to match the v2 TableRecon API: the example now shows TableRecon as tables: list[Table] only (with source_schema, target_catalog, target_schema, and source_catalog configured via DatabaseConfig inside ReconcileConfig), corrected the location of drop_columns (it belongs on Table, not TableRecon), and added a migration note pointing users to DatabaseConfig. ([#2329](#2...

v0.12.2

26 Feb 18:18
1d855d0

Assessment

Profiler

  • Enhanced Synapse profiler extraction and monitoring by correctly handling batched pipeline/trigger runs, adding serverless‑pool routine listing via sys.objects, reconnecting to master for server‑level DMVs, stripping whitespace from credential fields, replacing deprecated DataFrame.union() with pd.concat(), and clarifying Azure auth, DMV permissions, and serverless catalog view behavior.

Analyzer

  • Added support for a new --generate-json switch to produce a JSON report alongside the existing Excel report, enabling programmatic consumption of analyzer results without changing default behavior.

Converters

Morpheus

Snowflake

  • Added transpilation support for seven Snowflake geospatial functions (ST_MAKEPOINT, ST_POINT, ST_X, ST_Y, ST_CENTROID, TRY_TO_GEOGRAPHY, HAVERSINE) to Databricks SQL, including SRID handling, argument normalization, and custom SQL for Haversine distance.

  • Introduced support for Snowflake SPLIT_TO_TABLE by mapping it to Databricks POSEXPLODE(SPLIT(...)), including column renaming, regex‑safe delimiter handling, and both TABLE() and LATERAL invocation forms.

  • Implemented Snowflake JSON helpers CHECK_JSON, GET_PATH, and IFF for Databricks SQL, using TRY_PARSE_JSON‑based validation, GET_JSON_OBJECT path translation, and direct IF‑style semantics.
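
The "regex-safe delimiter handling" noted for SPLIT_TO_TABLE above matters because the source delimiter is a literal string while a regex-based split interprets it as a pattern. A minimal Python illustration, using re.split as a stand-in for a regex-based SQL split (the function name is hypothetical):

```python
import re


def split_to_table(text: str, delimiter: str) -> list:
    """Split on a literal delimiter and return (index, value) pairs, 1-based."""
    parts = re.split(re.escape(delimiter), text)
    return list(enumerate(parts, start=1))


# Escaped: "." is treated as a literal dot.
print(split_to_table("a.b.c", "."))   # [(1, 'a'), (2, 'b'), (3, 'c')]

# Unescaped: "." matches every character, so only empty strings remain.
print(re.split(".", "a.b.c"))         # ['', '', '', '', '', '']
```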

Synapse / TSQL

  • Added IR support for executing stored procedures and immediate SQL strings via T‑SQL EXEC/EXECUTE, including positional and named parameters, output parameters, AS USER, and AT <data_source> constructs, plus parity tests for Snowflake EXECUTE IMMEDIATE.

Other / General

  • Enhanced the Morpheus DataType and Expression system with static typing and promotion at IR generation time, including numeric and string helpers, a NumericValue pseudo‑type, SQL‑style highestType promotion rules, and an expanded test suite.

BladeBridge

Oracle

  • Updated NUMBER without precision to map to DECIMAL(38,18) for Oracle to correctly handle floating‑point semantics in converted code.

Teradata

  • Updated NUMBER without precision to map to DECIMAL(38,18) for Teradata to preserve floating‑point behavior in conversions.

  • Added Teradata stored procedure test cases with various DML statements, transactions, and explicit handling of output parameters returned without CALL, aligning with Teradata’s procedure semantics.

Redshift

  • Fixed Redshift NUMBER mapping to DECIMAL(38,0) to ensure correct numeric precision in converted objects.

General SQL

  • Added CONNECT BY to the platform source gap specifications and introduced CREATE PROJECTION patterns into general_sql_specs.json to broaden SQL feature coverage across supported sources.

  • Added DDL test cases and patterns to improve datatype and table partition conversion and to strip unwanted default values from DDL in the final master step when they are not visible in configuration.

ETL to Databricks (Informatica)

  • Fixed workflow parameter default value handling and introduced a table‑based workflow parameter storage system, providing type‑aware defaults and a workflow_utils.workflow_params Delta table to centralize parameter metadata for Informatica‑to‑Databricks conversions.

  • Added a configurable data_type_mapping for Informatica‑to‑Python conversions to improve type inference and consistency across generated notebooks.

Reconcile

  • Updated T‑SQL/Synapse reconciliation hash generation to avoid VARCHAR(256) truncation by using VARCHAR(MAX) in COALESCE and HASHBYTES, reducing false mismatches and better aligning behavior with Databricks.

  • Introduced source and target record count metrics (source_record_count, target_record_count) into reconciliation metrics and dashboards, including an upgrade script and tests to support enhanced reconciliation observability.
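
The false mismatches caused by VARCHAR(256) truncation, mentioned in the first item above, can be reproduced with a small Python model. SHA-256 and the helper below are stand-ins chosen for illustration; they are not the hash expression that Reconcile generates.

```python
import hashlib
from typing import List, Optional


def row_hash(values: List[str], max_len: Optional[int] = None) -> str:
    """Hash the concatenated column values, optionally truncating each one."""
    parts = [v if max_len is None else v[:max_len] for v in values]
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


row = ["42", "x" * 300]          # the second column is longer than 256 characters

source_truncated = row_hash(row, max_len=256)   # models a cast to VARCHAR(256)
source_full = row_hash(row)                     # models a cast to VARCHAR(MAX)
target = row_hash(row)                          # target side, no truncation

print(source_truncated == target)  # False: identical data reported as a mismatch
print(source_full == target)       # True
```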

Documentation

  • Added LLM‑friendly documentation via the @signalwire/docusaurus-plugin-llms-txt plugin, exposing a structured llms.txt index and per‑page markdown URLs so AI tools can more easily discover and consume Lakebridge documentation.

  • Documented automatic serverless cluster detection behavior for Reconcile, including configuration requirements for Unity Catalog volumes and environment variables.

  • Published an IBM DataStage‑to‑Databricks conversion guide that explains supported DataStage versions and objects, generated Databricks artifacts, helper libraries, and troubleshooting practices.

  • Extended the BladeBridge ETL configuration guide with native database connection examples, including tokenized JDBC/ODBC templates for systems such as Oracle and MSSQL.

  • Updated Switch documentation to describe the new input_file_relative_path column in the Conversion Result Table schema and how it preserves input directory structure in outputs.

Dependency updates:

  • Updated pyodbc requirement from ~=5.2.0 to >=5.2,<5.4 (#2104).
  • Bump webpack from 5.99.6 to 5.105.0 in /docs/lakebridge (#2269).
  • Bump sigstore/gh-action-sigstore-python from 3.0.1 to 3.2.0 (#2177).
  • Updated duckdb requirement from ~=1.2.2 to >=1.2.2,<1.5.0 (#2079).

Contributors: @sundarshankar89, @dependabot[bot], @BesikiML, @simone-dbx-labs, @eri-adepoju, @m-abulazm, @gueniai, @hiroyukinakazato-db

v0.12.1

12 Feb 22:11
58b45ec

Synapse Profiler

  • Fixed several critical errors in the Synapse profiler extraction pipeline, including a type mismatch when initializing the credential manager and handling of empty or partial result sets from Spark data pools.
  • Added support for an env secret type in the Synapse profiler, allowing profiler configurations to resolve secrets from environment variables.
  • Enhanced the credential manager backing the profiler so it can resolve nested credential structures (for example, workspace config, JDBC settings, and profiler options) instead of only flat key–value maps.
  • Introduced recursive, type-aware resolution of dictionaries, lists, and strings in profiler-related credentials while preserving primitive values and maintaining backward compatibility with existing configurations.
  • Enhanced the credential manager to support nested credential structures
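
The recursive, type-aware resolution described above can be sketched as a small Python function. The lookup rule is an assumption made for illustration: each string is treated as the name of an environment variable and left unchanged when no such variable exists. The actual profiler may mark secrets differently.

```python
import os
from typing import Any, Mapping


def resolve(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Resolve strings inside nested dicts and lists; keep primitives as-is."""
    if isinstance(value, dict):
        return {key: resolve(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, env) for item in value]
    if isinstance(value, str):
        # Assumed rule: a string naming an environment variable is replaced
        # by that variable's value; any other string is returned unchanged.
        return env.get(value, value)
    return value  # ints, bools, None and other primitives pass through


fake_env = {"SYNAPSE_PWD": "hunter2"}
creds = {
    "workspace": {"name": "ws"},
    "jdbc": {"password": "SYNAPSE_PWD", "port": 1433},
    "options": ["SYNAPSE_PWD", "literal"],
    "enabled": True,
}
print(resolve(creds, fake_env))
```

A flat key-value map is simply the one-level case of the same recursion, which is why existing configurations keep working.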

Analyzer

  • Expanded SQL parsing to cover additional TSQL constructs (including CREATE STATISTICS, THROW, and DROP TEMPORARY TABLE IF EXISTS), improving handling of error management, statistics, and temporary tables.
  • Fixed crashes caused by special characters in mapping names by treating them as escaped literals instead of regex symbols.
  • Improved reliability of MERGE into partitioned targets with enhanced handling and added test coverage.
  • Added Jupyter Notebook detection to the Lakebridge analyze command by mapping notebook assets to the JUPYTERNB type and updating local dev tooling ignores.
  • Simplified analyze filepath handling: --report-file now directly controls the Excel filename with consistent relative-path semantics, and --source-directory behavior and prompt text are aligned with the implementation.
  • Removed a confusing “timestamped directory” behavior when the target report file already existed so logs and output now match user expectations.
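
The mapping-name fix above, treating special characters as escaped literals rather than regex symbols, corresponds to the following pattern in Python. The mapping names are made up for the example and the helper is not Analyzer code.

```python
import re


def mentions_mapping(name: str, text: str) -> bool:
    """Search for a mapping name as literal text, not as a regex pattern."""
    return re.search(re.escape(name), text) is not None


text = "session s_daily uses m_load(orders)+v2 as its mapping"

# Unescaped, "(" and "+" are regex syntax, so the literal name is not found.
print(re.search("m_load(orders)+v2", text))       # None
print(mentions_mapping("m_load(orders)+v2", text))  # True

# Some names are not even valid patterns and raise an error when unescaped.
try:
    re.compile("m_[load")
except re.error as exc:
    print("invalid pattern:", exc)
print(mentions_mapping("m_[load", "uses m_[load here"))  # True
```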

Converters

Morpheus

Snowflake

  • Always emit SQL SECURITY INVOKER on generated Snowflake procedures so the security context is explicit and tests align with this behavior.
  • Improved parsing and generation of INTERVAL literals to support both ANSI-style and Snowflake-style syntaxes.
  • Implemented correct transpilation of Snowflake INTERVAL literals, including composite values and Snowflake-specific units, into normalized Databricks SQL YEAR TO MONTH and day-time intervals.

Synapse / TSQL / SQL Server

  • Improved parsing by separating logical expressions (AND/OR/NOT) from other expressions in the T-SQL grammar, reducing ambiguity and improving parsing performance.
  • Added support for translating T-SQL ALTER DATABASE statements into Databricks SQL ALTER SCHEMA, emitting comments for unmappable features.
  • Introduced partial support for T-SQL CONVERT, flagging unsupported datetime types.
  • Improved typing and parsing of the T-SQL + operator so string concatenation is consistently treated as string operations and flattened where possible.
  • Added T-SQL-specific overrides for equality and missing-value functions to avoid unresolved routines by mapping them to appropriate Databricks SQL equivalents.
  • Implemented full AST support for T-SQL RAISERROR.
  • Supported T-SQL CROSS APPLY by transpiling it to CROSS JOIN LATERAL, with correct join clause and hint formatting.
  • Implemented T-SQL OUTER APPLY by generating LEFT JOIN LATERAL, with tests for ordering and LIMIT placement.
  • Fixed transpilation of WHERE … LIKE … so columns and patterns render correctly, including COLLATE expressions.
  • Added tests for SET within nested IF blocks to validate correct handling of T-SQL control flow without extra variable declarations.
  • Correctly translated T-SQL DATEDIFF for all date parts into Databricks SQL expressions that match T-SQL boundary-count semantics, with comprehensive tests.
  • Generated Delta Lake computed columns from T-SQL COMPUTED definitions, including PERSISTED columns, using GENERATED ALWAYS AS in target schemas.
  • Recognized T-SQL table variables as temporary tables and transpiled them to appropriate temporary table syntax in the target dialect.
  • Implemented support for T-SQL SELECT … INTO by converting to CREATE TABLE AS SELECT and handling INTO precedence rules, while rewriting Snowflake-style INTO to session variables.
  • Split T-SQL DECLARE statements with scalar subquery defaults into separate DECLARE and SET statements compatible with Databricks SQL.
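
The "boundary-count semantics" of T-SQL DATEDIFF mentioned above mean that the function counts how many date-part boundaries are crossed, not how many full intervals have elapsed. A Python reference model for three date parts, written only to illustrate the source behavior (it is not the SQL Morpheus generates):

```python
from datetime import date


def datediff(part: str, start: date, end: date) -> int:
    """Count date-part boundaries crossed between start and end."""
    if part == "year":
        return end.year - start.year
    if part == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if part == "day":
        return (end - start).days
    raise ValueError(f"unsupported date part in this sketch: {part}")


# One day apart, yet one year boundary and one month boundary are crossed.
print(datediff("year", date(2023, 12, 31), date(2024, 1, 1)))   # 1
print(datediff("month", date(2023, 12, 31), date(2024, 1, 1)))  # 1
print(datediff("day", date(2023, 12, 31), date(2024, 1, 1)))    # 1

# Almost a full year apart, yet no year boundary is crossed.
print(datediff("year", date(2024, 1, 1), date(2024, 12, 31)))   # 0
```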

General (Morpheus engine)

  • Ensured block-level DECLARE variable scoping is dialect-aware by introducing a postDeclare flag so Snowflake-style declarations appear inside blocks.
  • Cleaned up data type definitions and generators, adding TIME support, refactoring INTERVAL handling, and replacing ir.Byte with ir.TinyInt.
  • Improved grammar, IR, and generation of INTERVAL literals so Snowflake-like and ANSI-style syntaxes are both supported.
  • Fixed expression rendering to always emit parentheses for bracketed constructs, including empty window clauses.
  • Added transformations to hoist DECLARE variables to the start of blocks and wrap batches with blocks when variables are present, improving procedural SQL handling.
  • Generalized batch-wrapping logic so any scripting statements are encapsulated in BEGIN … END blocks, with tests updated for the new structure.
  • Added dialect-specific configuration and grammar for IF/WHILE block parsing so T-SQL and Snowflake scripting blocks terminate correctly per dialect.

BladeBridge

TSQL / SQL Server

  • Expanded SQL parsing to include TSQL features such as CREATE STATISTICS, THROW, and DROP TEMPORARY TABLE IF EXISTS, improving conversion robustness for TSQL workloads.
  • Added support for SELECT column aliasing and extended TSQL keyword recognition to improve conversion of SQL scripts to Databricks-compatible syntax.
  • Enhanced handling of MERGE statements into specific partitions and added tests to improve conversion reliability for partitioned MERGE patterns.

SSIS

  • Ensured deterministic SSIS conversion output by enforcing stable ordering for variables and target columns, improving null handling, and adding tests so repeated runs generate consistent PySpark.

Informatica / IICS

  • Implemented the SQL Transform component for Informatica-to-PySpark conversion to cover more data transformation logic.
  • Added native JDBC/ODBC database connection support for IICS-to-Databricks conversions, allowing direct database reads/writes and fixing connection flag logic for Target components.

Switch

General

  • Used empty strings instead of nulls for optional configuration parameters to simplify downstream handling.
  • Preserved directory hierarchy in conversion output so generated artifacts mirror the source layout.
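
Preserving the directory hierarchy, as described above, amounts to re-rooting each input file's relative path under the output folder. A minimal sketch with pathlib; the function name, the paths, and the .py suffix are assumptions chosen for the example.

```python
from pathlib import PurePosixPath


def mirrored_output_path(input_file: str, input_root: str, output_root: str,
                         new_suffix: str = ".py") -> PurePosixPath:
    """Place the converted file under output_root at the same relative path."""
    relative = PurePosixPath(input_file).relative_to(input_root)
    return PurePosixPath(output_root) / relative.with_suffix(new_suffix)


print(mirrored_output_path("/src/etl/daily/load.sql", "/src", "/out"))
# /out/etl/daily/load.py
```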

Reconcile

  • Refactored reconciliation intermediate persistence to clean checkpoint volumes after runs, remove overwrite write mode usage, and prefer Delta volumes on Databricks instead of Parquet.
  • Hid implementation details behind a more generic interface and marked future work to persist to Delta instead of re-reading from source systems.
  • Improved reconciliation result handling and logging: reconciliation exceptions now raise a ReconciliationException, while mismatches and passes are logged with severity and report type.
  • Implemented capability-based caching detection to keep reconciliation compatible with Databricks serverless compute, caching only when supported and using Delta writes as materialization boundaries on serverless.

Documentation

  • Overhauled the documentation to show more clearly which source systems are supported in which module.
  • Updated analyze command documentation to match the current implementation, simplifying caveats and clarifying expected behavior for --report-file and --source-directory.

Dependency updates:

  • Bump lodash from 4.17.21 to 4.17.23

Contributors: @sundarshankar89, @m-abulazm, @asnare, @gueniai, @dependabot[bot], @BesikiML

v0.12.0

26 Jan 21:05
7ac560e

Analyzer

  • Extended the Analyzer to recognize more SQL-bearing file types, including Oracle package files (.pks, .pkb), Teradata utilities (.bteq, .fload, .mload, etc.), Hive scripts (.hql), and shell scripts with embedded SQL (.sh, .ksh, .bash, .csh), so more source assets are discovered without manual renaming.

Converters

General

  • Enabled SSIS conversion so SSIS packages can be translated into Databricks notebooks via the BladeBridge-based converter, providing a new migration path for ETL workloads built on SSIS.

  • Added Amazon Redshift conversion documentation describing supported features, known limitations, and a step-by-step workflow to convert Redshift SQL into Databricks SQL using the BladeBridge transpiler.

Morpheus

All dialects

  • Implemented explicit support for multi-statement transactions (BEGIN TRANSACTION, COMMIT TRANSACTION, ROLLBACK TRANSACTION, and ATOMIC blocks) in the parser and generator, enabling transaction-aware translation and testing.

  • Fixed generation of CASE expressions so that they now terminate with END, while CASE statements remain terminated with END CASE, improving standards-compliant SQL output across dialects.

  • Expanded CREATE VIEW support to handle SCHEMABINDING and MATERIALIZED options, increasing coverage of advanced view definitions.

  • Standardized translation of DATE_xxx functions by mapping to DATE_ADD and adding synonyms such as DATE_FORMAT, DATE_PART, DATE_SUB, and DATE_TRUNC for consistent naming and mapping.

  • Ensured control-flow statements like LEAVE and ITERATE are always generated with labels by auto-labelling enclosing blocks or loops when needed, improving robustness of generated control-flow SQL.

Snowflake

  • Added support for Snowflake ICEBERG catalog DDL by extending the grammar to recognize CREATE ICEBERG TABLE and related syntax so ICEBERG table definitions parse and test correctly.

Synapse / TSQL

  • Updated translation of T‑SQL VAR and VARP to map to ANSI VARIANCE and VAR_POP, aligning variance aggregation semantics and tests.

  • Updated translation of T‑SQL STDEV and STDEVP to treat them as synonyms for STDDEV and STDDEV_POP, improving aggregate function compatibility.

  • Fixed translation of T‑SQL REPLICATE by mapping it as a synonym of REPEAT, clarifying conversion behavior and tightening test coverage.
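The synonym mappings listed above can be summarized as a lookup table. The sketch below applies them with a regular expression purely for illustration: Morpheus translates through a parser and generator, so this is not its implementation, and the naive pattern would also rewrite matching text inside string literals. The helper name is hypothetical.

```python
import re

# Mappings stated in the release note (T-SQL name -> target name).
TSQL_FUNCTION_SYNONYMS = {
    "VAR": "VARIANCE",
    "VARP": "VAR_POP",
    "STDEV": "STDDEV",
    "STDEVP": "STDDEV_POP",
    "REPLICATE": "REPEAT",
}

# An identifier immediately followed by an opening parenthesis.
_CALL = re.compile(r"\b([A-Za-z_]+)\s*\(")


def rename_functions(sql: str) -> str:
    """Rename known T-SQL function calls; leave every other call untouched."""

    def swap(match: "re.Match[str]") -> str:
        target = TSQL_FUNCTION_SYNONYMS.get(match.group(1).upper())
        return match.group(0) if target is None else f"{target}("

    return _CALL.sub(swap, sql)


print(rename_functions("SELECT VARP(x), STDEV(y), REPLICATE('ab', 3) FROM t"))
```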

BladeBridge

SSIS

  • Enabled SSIS support so SSIS packages can be translated into Databricks notebooks, allowing customers to migrate SSIS workloads using the BladeBridge converter within Lakebridge.

Amazon Redshift

  • Enabled Amazon Redshift SQL conversion to Databricks SQL, broadening coverage of cloud data warehouse sources and aligning with the new Redshift conversion documentation.

Synapse / TSQL / MSSQL

  • Fixed handling of non-standard DELETE statements with two FROM clauses by normalizing the first FROM to use the table alias before conversion, preventing malformed MERGE statements for Synapse and similar targets.

  • Updated SQL conversion to automatically remove unsupported NOLOCK hints, corrected a fragment-breaker bug that split scripts before INSERT, and fixed variable declarations using AS, improving reliability of T‑SQL parsing and conversion.

  • Improved stored procedure conversion from SQL Server to Databricks SQL by correctly handling output parameters and standardizing EXEC-to-CALL translation to stay within Databricks SQL scripting constraints.

  • Added patterns to correctly convert COUNT(DISTINCT COL1) OVER (PARTITION BY COL2) window expressions and to normalize table references from '{database_param}'.schema.table_name to {database_param}.schema.table_name, eliminating stray quoting in database qualifiers.

  • Fixed BIGINT datatype conversion when the type appears inside braces so it is recognized as a datatype and emitted as plain bigint rather than as a backticked identifier, avoiding invalid Databricks SQL.
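The table-reference normalization mentioned above (from '{database_param}'.schema.table_name to {database_param}.schema.table_name) might be pictured as follows. This is a minimal sketch with a hypothetical function name; BladeBridge applies its own conversion patterns, which are not reproduced here.

```python
import re

# A quoted {parameter} that is immediately followed by a dot, i.e. used as a
# database qualifier rather than as an ordinary string literal.
_QUOTED_QUALIFIER = re.compile(r"'(\{\w+\})'(?=\.)")


def normalize_database_qualifier(sql: str) -> str:
    """Drop the stray quotes around a parameterized database qualifier."""
    return _QUOTED_QUALIFIER.sub(r"\1", sql)


print(normalize_database_qualifier("SELECT * FROM '{db}'.sales.orders"))
```

The lookahead keeps quoted parameters that are not followed by a dot unchanged, so a plain string literal such as `'{db}'` in a WHERE clause is left alone.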

Oracle

  • Improved Oracle package conversion by stripping unnecessary BEGIN/END blocks from functions, emitting logic as a single returned SELECT, adding a dedicated handler for UDF definitions, and tightening procedure conversion for variable declarations, loop THEN usage, and cursor placement inside loops.

Informatica

  • Enhanced Informatica-to-Spark SQL mappings by removing redundant empty-string wrapping, adding mappings for additional datetime and related functions, and fixing parameter replacement and .format() usage so generated Spark SQL is cleaner and more accurate.

  • Corrected Databricks notebook generation for Informatica mapplets by switching from relative to absolute imports in the Python template and simplifying mapplet argument collection, removing unused JOB_PARAMETERS and MAPPLET_INFO code.

Documentation

  • Added Redshift conversion documentation and guide, describing supported features, limitations, and a recommended workflow for converting Redshift SQL to Databricks SQL.

  • Added an SSIS conversion guide with a full list of supported components, step-by-step migration instructions, and a sample SSIS package to showcase an end-to-end workflow.

  • Updated Switch documentation for Spark Declarative Pipeline conversion, including the new result_sdp_error column in Delta schemas, target_type = sdp, and sdp_language options, and documented the 7-step conversion pipeline and validation behavior.

  • Removed WSL from the Windows installation prerequisites, simplifying setup instructions while retaining guidance for Python installation and version checks across platforms.

General

  • Updated project metadata to require Python versions between 3.10.1 and 3.13.x, avoiding Python 3.10.0, and revised installation docs to reflect the new supported version range.
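To check whether a local interpreter falls inside the range described above, a small script along these lines could be used. The constant and function names are hypothetical; the bounds simply restate the note (3.10.1 through any 3.13.x release).

```python
import sys
from typing import Tuple

MIN_SUPPORTED = (3, 10, 1)  # 3.10.0 is excluded
MAX_MINOR = (3, 13)         # any 3.13.x patch release is accepted


def is_supported(version: Tuple[int, int, int]) -> bool:
    """Return True if (major, minor, micro) lies within the stated range."""
    return version >= MIN_SUPPORTED and version[:2] <= MAX_MINOR


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    print(current, "supported" if is_supported(current) else "not supported")
```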

Contributors: @asnare, @m-abulazm, @sundarshankar89, @BesikiML, @gueniai, @andresgarciaf, @hiroyukinakazato-db, @yyoli-db

v0.11.3

29 Dec 21:54
b6a901f

Analyzer

  • Optimized SAS Analyzer performance by consolidating regex operations, delivering roughly a 7x speed improvement for large-scale SAS analysis workloads.
  • Added support for new SSIS components Microsoft.Pivot, Microsoft.UnPivot, and ExtensibleFileTask, broadening coverage for SSIS package migration analysis.

Converters – Morpheus

  • Core

    • Significantly improved ANTLR parsing performance by merging grammars, refactoring ambiguous rules, and updating the Scala integration and build pipeline for the new grammar workflow.
    • Allowed the STREAMS token to be used as an identifier so patterns like SELECT * FROM streams.foo.bar now parse correctly in Snowflake-oriented SQL.
    • Updated the error reporting to align to the following:
      • Info: no error, the input was fully translated
      • Hint: the input was fully translated but some irrelevant bits have been elided
      • Warning: the input was translated but with unsupported bits
      • Error: the input couldn't be translated
  • MSSQL / T-SQL / SQL Server

    • Added full support for SQL Server T-SQL CREATE INDEX and table-level index directives, parsing them into a new index IR and translating to CLUSTER BY AUTO in Databricks SQL so index statements are no longer rejected.

    • Extended grammar and parsing to handle T-SQL computed columns, QUOTENAME calls, GROUP options in query hints, DROP INDEX statements, and additional keywords like PARAMETERS, STREAMS, PROCEDURES, and VIEWS, improving coverage of real-world T-SQL workloads.

    • Improved DML parsing so INSERT targets use proper dot identifiers instead of expression-like forms, preventing misinterpretation as function calls and preserving case sensitivity where required.

    • Re-enabled and migrated T-SQL functional tests to a YAML-based format, expanding automated coverage and keeping still-failing cases isolated for follow-up.
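The four error-reporting levels listed under Core above form an ordered scale, which can be modelled as an enumeration. This is only a sketch: the class and helper names are hypothetical, and Morpheus itself is Scala-based, so nothing here reflects its actual types.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    INFO = 1     # no error, the input was fully translated
    HINT = 2     # fully translated, some irrelevant bits elided
    WARNING = 3  # translated, but with unsupported bits
    ERROR = 4    # the input could not be translated


def worst(severities: Iterable[Severity]) -> Severity:
    """Return the most severe level seen, or INFO when nothing was reported."""
    return max(severities, default=Severity.INFO)
```

Ordering the levels numerically makes it straightforward to summarize a file by its most severe diagnostic.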

Converters – BladeBridge

  • MSSQL / SSIS / T-SQL

    • Resolved issues with column names containing single quotes and standardized DATEADD and DATEDIFF function patterns to improve compatibility across target SQL dialects.
  • DataStage

    • Implemented mapping for the JulianDayFromDate function with corresponding tests, extending DataStage function coverage in the converter.

    • Enhanced DataStage Spark and workflow handling by adding Databricks cluster sections, improving widget default handling, and mapping TransformStringToDate and spark.sqltemplate attributes for smoother Spark migrations.

Reconcile

  • Improved reconciliation hash query generation to guarantee consistent column ordering across SQL dialects, preventing false hash mismatches when column names are substrings of each other.

  • Reverted the Oracle reconcile implementation to use MD5 via DBMS_CRYPTO.HASH with RAWTOHEX, restoring compatibility with Oracle 11 while keeping the updated QueryBuilder engine handling.
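To see why column ordering matters for row hashes, consider the sketch below. It is not the Reconcile query generator, which builds SQL per dialect; it only illustrates, in plain Python with a hypothetical function name, that both sides must concatenate values in the same deterministic order to produce comparable hashes.

```python
import hashlib
from typing import Dict


def row_hash(row: Dict[str, object]) -> str:
    """Hash a row after ordering its columns by name."""
    # Sorting by column name fixes the concatenation order on both sides,
    # regardless of the order in which each system returned the columns.
    ordered = sorted(row)
    payload = "|".join("" if row[col] is None else str(row[col]) for col in ordered)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


source = {"id": 1, "id_type": "a"}
target = {"id_type": "a", "id": 1}
print(row_hash(source) == row_hash(target))
```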

Documentation

  • Added practical details about how to extend BladeBridge configurations

Dependency updates:

  • Bump actions/checkout from 5 to 6 (#2158).

Contributors: @asnare, @sundarshankar89, @dependabot[bot], @m-abulazm, @BesikiML

v0.11.2

11 Dec 22:53
4190672

Analyzer

  • Normalized complexity categories in the analyzer from “COMPLEX/VERY_COMPLEX” to “HIGH/VERY_HIGH” for clearer reports.

Converters

Morpheus

Snowflake

  • Implemented full support for DECLARE, LET, and assignment statements to better handle procedural Snowflake scripts.
  • Added support for DROP PROCEDURE statements, improving Snowflake DDL coverage.

TSQL/Synapse

  • Cleaned up grammar by removing duplicate and unsupported rules for TSQL special functions, reducing ambiguity and improving parser stability.
  • Implemented full support for DECLARE, LET, and assignment statements in TSQL, enabling richer stored procedure conversion.
  • Added support for TSQL DROP PROCEDURE statements to improve parity with source DDL.
  • Updated handling of options such as ANSI_NULLS and QUOTED_IDENTIFIER to emit informative comments instead of errors when they do not apply to Databricks SQL.
  • Enhanced handling of SET NOCOUNT by emitting comments explaining its behavior in Databricks SQL and warning when NOCOUNT OFF is used.
  • Allowed PRECISION to be used as an identifier (for example, c.precision), fixing parsing issues with such column names.
  • Improved handling of EXEC statements by detecting well‑known stored procedures like sp_executesql and issuing more specific diagnostics.
  • Added translation of OBJECT_ID() checks into EXISTS queries against catalog metadata to preserve control flow in procedural TSQL.
  • Added warnings for unsupported PRINT statements by generating explanatory comments rather than hard errors.
  • Added parsing support for the Synapse RENAME OBJECT syntax, currently surfaced as an unsupported but recognized construct.

Generic Morpheus engine

  • Enabled attaching comments and error markers to empty code blocks so that diagnostics are preserved in rendered SQL.
  • Prevented semicolons from being printed after empty statements to keep output formatting consistent.
  • Bundled multiple column-level primary keys into composite table constraints to produce more correct DDL.
  • Allowed the identifier PRECISION in general parsing contexts, improving compatibility with more schemas.
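The bundling of column-level primary keys into a composite constraint, mentioned above, can be pictured with a small string-building sketch. Morpheus operates on an intermediate representation rather than on strings, so the function and its input shape are hypothetical.

```python
from typing import List, Tuple


def bundle_primary_keys(table: str, columns: List[Tuple[str, str, bool]]) -> str:
    """Build CREATE TABLE DDL from (name, type, is_primary_key) tuples.

    Columns flagged as primary keys are collected into one table-level
    PRIMARY KEY constraint instead of being annotated individually.
    """
    definitions = [f"{name} {ctype}" for name, ctype, _ in columns]
    key_columns = [name for name, _, is_pk in columns if is_pk]
    if key_columns:
        definitions.append(f"PRIMARY KEY ({', '.join(key_columns)})")
    return f"CREATE TABLE {table} ({', '.join(definitions)})"


print(bundle_primary_keys("t", [("a", "INT", True), ("b", "INT", True), ("c", "STRING", False)]))
```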

BladeBridge

MSSQL / TSQL

  • Improved handling of MERGE statements, including insertion of semicolons before MERGE in statement breaking and correct ordering of MATCHED and NOT MATCHED clauses.
  • Fixed issues when converting updates on temporary tables into MERGE statements and added tests to guard the behavior.
  • Improved statement categorization by stripping comments before categorization and simplifying legacy comment-key handling.
  • Added a new handler for nested static strings and inline comments, improving function substitution and parser robustness.

Generic BladeBridge engine

  • Enhanced logging configuration to produce clearer diagnostics while keeping noise manageable.

Reconcile

  • Added support for specifying a catalog for Databricks sources in Reconcile and prompting for the source catalog when necessary.
  • Removed redundant Reconcile configuration parameters to simplify setup.

General

  • Improved handling of output from LSP servers by safely chunking very long stderr lines and logging critical processing errors, preventing hangs and unbounded memory use.
  • Adjusted JDBC handling to accept usernames and passwords via Spark options instead of embedding credentials in the JDBC URL, improving support for special characters in passwords.
  • Consolidated the automated test suite to keep only unit and integration scopes, simplifying test configuration.
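The stderr chunking described above can be approximated with a bounded read. The sketch below uses a synchronous stream and a hypothetical helper name; how Lakebridge actually reads LSP server output is not shown in these notes.

```python
import io
from typing import BinaryIO, Iterator


def iter_bounded_lines(stream: BinaryIO, limit: int) -> Iterator[bytes]:
    """Yield pieces of at most `limit` bytes, splitting over-long lines.

    readline(limit) stops at a newline or after `limit` bytes, whichever
    comes first, so no single piece can grow without bound.
    """
    while True:
        chunk = stream.readline(limit)
        if not chunk:
            return
        yield chunk


sample = io.BytesIO(b"short\n" + b"x" * 10 + b"\n")
for piece in iter_bounded_lines(sample, 4):
    print(piece)
```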

Dependency Updates

  • Dependencies: update documentation (yarn) packages by @asnare in #2178

Full Changelog: v0.11.1...v0.11.2

Contributors: @m-abulazm, @asnare, @sundarshankar89

v0.11.1

26 Nov 23:26
338e93c

Analyzer

No updates in this release.

Converters

General

  • Improved end-to-end migration behavior through tighter integration with the centralized Morpheus function mapping layer and expanded cross-dialect coverage

Morpheus

Snowflake

  • Centralized SQL function mappings and expanded cross-dialect coverage, improving Snowflake-to-Databricks SQL conversions and reducing noisy, non-actionable warnings.
  • Added full translation support for Snowflake exception blocks, enabling richer error-handling logic to be preserved when converting to Databricks SQL.

TSQL / SQL Server

  • Reworked SQL function handling so most mappings are centralized, making TSQL-to-Databricks SQL conversions more accurate and easier to extend for future Lakebridge-based migrations.
  • Implemented full support for TSQL TRY/CATCH constructs, including THROW/RAISERROR-style logic and helper-based error handling, improving the fidelity of translated control-flow and error semantics.

BladeBridge

TSQL / SQL Server

  • Fixed handling of T-SQL column alias syntax in SELECT statements so aliases are no longer mistaken for variable assignments, and removed a deprecated alias-normalization method to improve translation accuracy.
  • Resolved failures caused by nested comments, improved post-conversion handling for shell and Python wrapper scripts, and ensured labeled UPDATE/DELETE statements that translate to MERGE remain correctly embedded in SQL.
  • Corrected processing of SELECT statements without a FROM clause when assigning to variables, so expressions like variable increments and severity mappings are handled reliably during migration.
  • Improved “delete by source” MERGE translations so separators and DELETE placement are preserved, and fixed static string handling so T-SQL patterns that use square brackets are not misinterpreted as identifier quoting or ranges.

Reconcile

No updates in this release

Documentation

  • Clarified that Python 3.14 is not yet supported and updated macOS instructions to recommend Python 3.13 as the latest supported version
  • Expanded installation prerequisites with detailed Databricks workspace requirements, authentication options, network and repository access expectations, and a comprehensive pre-installation checklist aimed at enterprise and security-restricted environments

General

  • Increased the maximum stderr line size accepted from LSP servers during transpilation to prevent crashes or hangs when converters emit very large log lines
  • Reduced noise from LSP integrations by lowering stderr mirroring from INFO to DEBUG level, ensuring detailed logs remain available for troubleshooting without cluttering normal operation logs

Contributors: @asnare, @andresgarciaf

v0.11.0

07 Nov 22:05
04f1df6

🎉 New Features

This release introduces two exciting new capabilities to Lakebridge:

Synapse Profiler

A powerful new Synapse Profiler feature is now available to help you analyze and profile your Synapse data. Refer to the documentation for usage details and examples.

Switch LLM Converter

Introducing the new Switch LLM converter, expanding Lakebridge's conversion capabilities. Refer to the documentation for usage details and examples.


Other updates

Converters

General

Conversion Output Fix
Fixed a bug where files nested 2 or more directories deep within the input directory could fail to be written out after conversion when the directory structure wasn't already in place.
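The general remedy for this class of bug is to create missing parent directories before writing. The sketch below shows the idea with the standard library; the helper name is hypothetical and this is not the Lakebridge code itself.

```python
import tempfile
from pathlib import Path


def write_output(root: Path, relative: str, text: str) -> Path:
    """Write `text` under `root`, creating intermediate directories as needed."""
    target = root / relative
    # parents=True creates every missing level; exist_ok=True tolerates reruns.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = write_output(Path(tmp), "a/b/c/query.sql", "SELECT 1")
        print(written.relative_to(tmp))
```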

Morpheus

Code Formatting Improvements
Refactored code formatting logic by introducing a tree-like structure in CodeBlock and a new CodeBlockRenderer to handle whitespace, comments, and error positioning, making the formatting system more maintainable and accurate.

TSQL

Added support for translating TSQL join hints (like REPLICATE and MERGE) to their Databricks SQL equivalents by transforming them into special /*+ ... */ comments after the SELECT keyword, while unsupported hints are flagged as annotated errors.

BladeBridge

SQL Server

  • Fixed SELECT INTO real table syntax, corrected LIKE pattern handling, and mapped unsupported FUNC_ROW_NUMBER function while removing ANON_NOLOCK.
  • Resolved an issue where CASE WHEN expressions as the last statement in a file generated incorrect semicolon placement in SQL scripts.
  • Added fragment breaker before GO keyword and removed unsupported COMMIT TRANSACTION and CREATE INDEX constraints.
  • Fixed T-SQL UPDATE statements that were not correctly converted to MERGE operations in specific cases.
  • Corrected fragment handling around SELECT and UNION statements, and fixed issues with IF condition blocks and error handling blocks being mixed up.
  • Removed SET IDENTITY_INSERT and BEGIN/COMMIT TRANSACTION statements, and changed INT GENERATED ALWAYS AS IDENTITY to BIGINT GENERATED ALWAYS AS IDENTITY.
  • Added validation check for converted MERGE statements, implemented global variable reset in init_hook subroutine, and performed code refactoring.
  • Fixed T-SQL DELETE statements that were not correctly converted to MERGE operations and added corresponding test cases.

Reconcile

Oracle

Improved Oracle support with the following enhancements:

  • Fixed Oracle JDBC URL by moving credentials out of URL into options and correcting thin syntax
  • Updated hashing/expression pipeline to replace the RAWTOHEX(...), 2 form (MD5) with UTL_I18N.STRING_TO_RAW(..., 'AL32UTF8'), 4 (SHA-256)
  • Fixed schema comparison for Oracle
  • Tweaked datatype parsing in default transformations for Oracle compatibility
  • Added Oracle jars in setup script
  • Extended integration scaffolding and added end-to-end tests
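Keeping credentials out of the JDBC URL, as in the first item above, means special characters in a password never need URL escaping. The sketch below builds an options dictionary using the standard Spark JDBC option names (`url`, `user`, `password`, `driver`) and Oracle thin-driver URL syntax. The function name and the host, service, and credential values are placeholders, and this is not the Reconcile connector code.

```python
from typing import Dict


def oracle_jdbc_options(host: str, port: int, service: str, user: str, password: str) -> Dict[str, str]:
    """Build Spark JDBC options with credentials kept separate from the URL."""
    return {
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }


options = oracle_jdbc_options("db.example.com", 1521, "ORCLPDB1", "scott", "p@ss/w:rd")
print(options["url"])

# With a SparkSession available, the options could then be passed along, e.g.:
# spark.read.format("jdbc").options(**options).option("dbtable", "schema.table").load()
```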

Snowflake

  • Fixed schema comparison for Snowflake
  • Adjusted log levels by demoting noisy warnings to debug/info
  • Added Snowflake jars in setup script
  • Extended integration scaffolding

Documentation

Added documentation for deploying reconciliation dashboards and updated documentation notebooks.

Full Changelog: v0.10.13...v0.11.0

Contributors: @goodwillpunning, @hiroyukinakazato-db, @sundarshankar89, @asnare, @m-abulazm, @dependabot[bot], @bishwajit-db

v0.10.13

28 Oct 04:02
51b2a05

Analyzer

  • Added defensive code to prevent analyzer crashes on DataStage files with empty array references - Fixes an issue where encountering such a reference would crash the DataStage analyzer

Converters

Morpheus

General

  • Enhanced name representation consistency - Major refactoring that replaces String representations with Expression types for table names, column names, and constraints across IR nodes, improving SQL/PySpark code generation accuracy

  • Fixed DBT parsing issues - Resolved template parsing problems by changing template markers to !#Jinja0001#! format and improving whitespace handling for proper tokenization
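The template-marker approach mentioned above can be illustrated by masking Jinja blocks before parsing and restoring them afterwards. Only the marker format (!#Jinja0001#!) comes from the note; the function names, the regular expression, and the assumption that numbering starts at 1 are hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches {{ ... }} expressions and {% ... %} statements (non-greedy).
_JINJA = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)


def mask_templates(sql: str) -> Tuple[str, Dict[str, str]]:
    """Replace each Jinja block with a numbered marker and remember the original."""
    saved: Dict[str, str] = {}

    def repl(match: "re.Match[str]") -> str:
        marker = f"!#Jinja{len(saved) + 1:04d}#!"
        saved[marker] = match.group(0)
        return marker

    return _JINJA.sub(repl, sql), saved


def unmask_templates(sql: str, saved: Dict[str, str]) -> str:
    """Put the original Jinja blocks back in place of their markers."""
    for marker, original in saved.items():
        sql = sql.replace(marker, original)
    return sql


masked, saved = mask_templates("SELECT * FROM {{ ref('orders') }} WHERE d > {{ var('start') }}")
print(masked)
```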

TSQL (Synapse/SQL Server)

  • Support for dual OUTPUT clauses in TSQL INSERT/DELETE/UPDATE statements - Enhanced T-SQL parser to handle complex statements with multiple OUTPUT clauses (OUTPUT ... INTO ... OUTPUT ...) with comprehensive test coverage

  • Fixed TSQL DECLARE statement handling - Refactored DECLARE statement processing by moving logic to dedicated visitor methods and properly marking unsupported statements for future implementation

  • Improved BLOCK structure parsing for BEGIN and BEGIN TRY statements - Updated parser grammar to support flexible scripting blocks and transaction handling, allowing zero or more statements in control flow constructs

  • Added comprehensive USE statement support - Introduced new IR representations (UseCatalog, UseSchema) with dialect-specific AST building logic and proper SQL generation

Snowflake

  • Fixed Snowflake connection tests - Internal improvements for database connection test reliability

  • Added comprehensive USE statement support - Introduced new IR representations (UseCatalog, UseSchema) with dialect-specific AST building logic and proper SQL generation

BladeBridge

General

  • Automatically creates and cleans up temporary folders for embedded SQL conversion in wrapper scripts - Improves workflow management by implicitly creating temp folders and cleaning them up once conversion is complete

MSSQL (SQL Server)

  • Enhanced table variable and temporary table conversion - Added support for table variable conversion to temporary tables and improved string handling with logic to convert double single quotes to double quotes

  • Fixed semicolon placement in nested select statements - Resolved issue where semicolons appeared before comments in nested select statements

  • Improved MS SQL procedure handling - Added LIMIT 1 for Set in select statements, enhanced function mappings, fixed string concatenation, and removed unsupported constraints

Reconcile

No updates in this release.

Documentation

No updates in this release.

Contributors: @gueniai, @sundarshankar89