Releases: databrickslabs/lakebridge
Release v0.14.0
Release Notes — Lakebridge v0.14.0
Highlights
- Five new profiler sources. Snowflake, Redshift, BigQuery, Oracle, and Legacy SQL DW are now all supported profiler targets, significantly expanding the set of platforms Lakebridge can assess ahead of a migration.
- Switch now supports SAS. A new built-in prompt converts SAS programs to PySpark, adding SAS to the growing list of languages Switch can migrate automatically.
- Reconcile now supports Teradata. Teradata can now be used as a source platform for data quality reconciliation, enabling validation for customers migrating from Teradata to Databricks.
- Automatic reconciliation configuration. A new `auto-configure-recon-tables` command discovers source and target tables and generates the initial reconcile configuration automatically, replacing a previously manual setup step.
Profilers
New Sources
- Redshift (#2305, #2304, #2306, #2408, #2501)
  Amazon Redshift is now a fully supported profiler source, covering all three deployment variants: provisioned, provisioned multi-AZ, and serverless. This includes credential flows (database password, federated user, AWS Secrets Manager ARN, temporary credentials with db user or IAM, with optional SSL), dedicated extraction queries and validation schemas for each variant, full CLI wiring, and user documentation. A bug in the serverless managed-storage aggregation was also fixed: the previous query summed hourly `sys_serverless_usage` snapshots, causing reported storage to grow linearly with the lookback window rather than reflecting the actual allocated amount.
- Snowflake (#2420, #2499)
  Adds Snowflake as a profiler source. The interactive configurator prompts for connection details and a Programmatic Access Token (PAT), then extracts warehouse usage, query history, storage, user activity, account info, and optional credits (pipe, autoclustering, materialized-view refresh) from `SNOWFLAKE.ACCOUNT_USAGE` into a timestamped DuckDB file. Post-extraction computations produce TCO summaries. A supplementary `rate_sheet` extract pulls the effective per-credit rate (90-day average) and account service tier from `SNOWFLAKE.ORGANIZATION_USAGE.RATE_SHEET_DAILY`, providing the inputs needed by the downstream TCO value model.
- BigQuery (#2472)
  Adds BigQuery as a profiler source. Running `configure-database-profiler` followed by `execute-database-profiler --source-tech bigquery` executes 16 region-qualified `INFORMATION_SCHEMA` queries against the customer's configured BigQuery project(s) and writes 12 analysis tables into a local DuckDB file at `~/.databricks/labs/lakebridge_profilers/bigquery_assessment/profiler_extract.db`. The pipeline structure mirrors existing source-techs.
- Oracle (#2187)
  Adds Oracle as a profiler source. Covers the interactive configuration dialog, extraction SQL scripts, and local DuckDB population from Oracle Database, with unit tests for all three components.
- Legacy SQL DW (#2441)
  Extends profiler coverage to legacy Azure SQL DW (pre-Synapse) deployments, adding the necessary extraction queries to assess this platform.
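The Redshift serverless storage fix comes down to the difference between summing point-in-time snapshots and reading a single one. The sketch below uses made-up numbers; the function names and the choice of "latest snapshot" as the corrected aggregation are illustrative assumptions, not the actual Lakebridge extraction query.

```python
def summed_storage_gb(hourly_snapshots_gb):
    """Buggy aggregation: adds every hourly snapshot together."""
    return sum(hourly_snapshots_gb)


def latest_storage_gb(hourly_snapshots_gb):
    """One possible point-in-time aggregation: the most recent snapshot."""
    return hourly_snapshots_gb[-1]


# Hypothetical workgroup holding a steady 500 GB, sampled hourly.
one_day = [500] * 24
one_week = [500] * 24 * 7

print(summed_storage_gb(one_day), summed_storage_gb(one_week))  # 12000 84000
print(latest_storage_gb(one_day), latest_storage_gb(one_week))  # 500 500
```

The summed figure scales with the lookback window even though allocated storage never changed, which is the symptom described in the release note.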
Enhancements
- SQL Server profiler switched to SQL scripts and shared DatabaseManager (#2482)
  The MSSQL profiler's activity and info extraction steps have been converted from per-step Python virtualenvs to in-process `sql`/`ddl` steps using the shared `DatabaseManager` connector. Thirteen SQL query/DDL pairs replace the previous Python scripts, eliminating the duplicated `get_sqlserver_reader` connector and aligning SQL Server with the architecture used by other profiler sources.
- User-configurable output folder (#2488)
  The profiler's DuckDB extract path is now exposed as a CLI flag (`--output-folder`) instead of being hardcoded per source in `pipeline_config.yml`. The default remains `~/.databricks/labs/lakebridge_profilers/<source>_assessment`. Output filenames now include a timestamp (`profiler_extract_<YYYYMMDD_HHMMSS>.db`) to prevent overwrites on repeated runs, and the absolute path to the extract is logged on successful completion.
- Custom credentials file path for `execute-database-profiler` (#2494)
  The `execute-database-profiler` command now accepts an optional `--cred-file-path` argument, letting users supply a non-default credentials file rather than the one written by `configure-database-profiler`. This makes it easier to manage multiple credential configurations or run the profiler in scripted, non-interactive environments.
- SQL Server: `TrustServerCertificate` support (#2498)
  The `configure-database-profiler` command for SQL Server now surfaces a `TrustServerCertificate` connection property, addressing a common customer request for environments where the server certificate cannot be validated.
- Fix: profiler steps no longer fail with `ModuleNotFoundError` (#2485)
  The Synapse and SQL Server profilers previously created a fresh virtualenv per pipeline step, installing only each step's declared dependencies. After `database_manager.py` began importing `redshift_connector` at module scope, every clean run of those profilers failed with `ModuleNotFoundError`. Profiler Python steps now run with the parent interpreter and inherit all installed packages, which is faster, more robust, and immune to this class of transitive-import failures.
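To make the output-folder and timestamped-filename behavior concrete, here is a minimal sketch of how such a path could be assembled. The helper name `extract_path` and its parameters are hypothetical; only the default folder layout and the `profiler_extract_<YYYYMMDD_HHMMSS>.db` pattern come from the notes above.

```python
from datetime import datetime
from pathlib import Path


def extract_path(source, output_folder=None, now=None):
    """Build a timestamped extract path (illustrative helper, not Lakebridge code)."""
    now = now or datetime.now()
    if output_folder:
        folder = Path(output_folder)
    else:
        # Default location described in the release notes.
        folder = Path.home() / ".databricks" / "labs" / "lakebridge_profilers" / f"{source}_assessment"
    return folder / f"profiler_extract_{now:%Y%m%d_%H%M%S}.db"


# Two runs one second apart produce distinct files instead of overwriting.
first = extract_path("redshift", "/tmp/out", datetime(2025, 1, 2, 3, 4, 5))
second = extract_path("redshift", "/tmp/out", datetime(2025, 1, 2, 3, 4, 6))
print(first.name)   # profiler_extract_20250102_030405.db
print(second.name)  # profiler_extract_20250102_030406.db
```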
Converters
Morpheus
T-SQL Improvements
- Better date and number formatting from T-SQL `CONVERT`
  T-SQL's `CONVERT` function accepts a style code to format dates and numbers as strings (e.g. `CONVERT(VARCHAR, myDate, 103)` for `dd/MM/yyyy`). Morpheus now correctly translates these style codes to the equivalent Databricks SQL expressions, unblocking a large class of previously untranslatable queries.
- Fix integer-as-date behavior from T-SQL
  T-SQL allows using `0` (or any integer) where a date is expected, treating it as "N days after 1 Jan 1900". Morpheus now replicates this behavior, preventing runtime type errors when running translated queries on Databricks.
- Fix variable declarations with `VARCHAR(N)`/`CHAR(N)` types
  T-SQL local variables declared as `VARCHAR(N)` or `CHAR(N)` were being passed through verbatim and failing at runtime in Databricks. They are now automatically translated to `STRING`, the correct equivalent type for variable declarations.
- Fix T-SQL variable assignment queries with `ORDER BY`
  T-SQL queries that assign a value to a variable (e.g. `SELECT @var = col FROM t ORDER BY col`) were being incorrectly structured during translation. The `ORDER BY` and similar clauses are now placed correctly in the output.
- Support `HASH JOIN` query hint in T-SQL
  T-SQL's `OPTION(HASH JOIN)` query hint, which instructs the database to use a specific join strategy, is now correctly parsed and handled during translation.
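The integer-as-date rule described above ("N days after 1 Jan 1900") can be illustrated in a few lines. This is a sketch of the semantics only; the function name is hypothetical and it says nothing about the SQL that Morpheus emits.

```python
from datetime import date, timedelta

TSQL_BASE_DATE = date(1900, 1, 1)  # T-SQL treats integer 0 as this date


def tsql_int_to_date(n):
    """Interpret an integer the way T-SQL does when a date is expected."""
    return TSQL_BASE_DATE + timedelta(days=n)


print(tsql_int_to_date(0))    # 1900-01-01
print(tsql_int_to_date(1))    # 1900-01-02
print(tsql_int_to_date(365))  # 1901-01-01 (1900 is not a leap year)
```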
Snowflake Improvements
- Translate `REGEXP_SUBSTR_ALL` to `REGEXP_EXTRACT_ALL`
  Snowflake's `REGEXP_SUBSTR_ALL` (returns all regex matches as an array) is now translated to its Databricks SQL equivalent `REGEXP_EXTRACT_ALL`.
- Translate binary hash functions (`MD5_BINARY`, `SHA1_BINARY`, `SHA2_BINARY`)
  Snowflake's binary digest functions are now translated to their Databricks SQL equivalents by wrapping the hex output with `UNHEX()`.
- Translate hex hash function synonyms (`MD5_HEX`, `SHA1_HEX`, `SHA2_HEX`)
  Snowflake's `*_HEX` hash function aliases are now directly mapped to their identically-behaved Databricks SQL counterparts.
- Translate `UNICODE` and `TRY_TO_DOUBLE`
  Snowflake's `UNICODE()` (returns the code point of the first character) is now mapped to `ASCII()` in Databricks SQL, and `TRY_TO_DOUBLE()` is mapped to `TRY_CAST(_ AS DOUBLE)`. The `UNICODE` fix also applies to T-SQL.
- Translate type-check functions (`IS_DATE`, `IS_DOUBLE`, `IS_REAL`, etc.)
  Snowflake functions that test whether a value inside a semi-structured (VARIANT) column holds a specific type are now translated where possible (e.g. `IS_DATE` → `TRY_CAST(v AS DATE) IS NOT NULL`). Those with no equivalent in Databricks (`IS_TIME`, `IS_TIMESTAMP_TZ`) are flagged with a clear migration note.
- Flag unsupported functions (`CHECK_XML`, `PARSE_XML`, `IS_ROLE_IN_SESSION`, etc.) with migration notes
  Seven Snowflake-specific functions with no Databricks equivalent now produce a clear "FIXME" comment in the output instead of silently passing through and failing at runtime. `IS_NULL_VALUE` is also correctly translated to `IS_VARIANT_NULL`.
- Flag 12 more Snowflake-only functions with migration notes
  Additional Snowflake admin, statistical, and VARIANT-inspection functions (including `COMPRESS`, `NORMAL`, `ZIPF`, `INVOKER_ROLE`, `IS_BOOLEAN`) now produce clear FIXME annotations instead of failing silently at runtime.
- Flag Snowflake `INFORMATION_SCHEMA` metadata functions with migration notes
  Snowflake monitoring/metadata functions called via `INFORMATION_SCHEMA` (like `PIPE_USAGE_HISTORY`, `M...
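The `UNHEX()` wrapping used for the binary hash functions in the list above relies on a simple identity: decoding a digest's hex string yields the same bytes as the raw binary digest. The sketch below checks that identity with Python's standard library; the `unhex` helper is a stand-in for the SQL function, not part of Lakebridge.

```python
import hashlib


def unhex(hex_string):
    """Stand-in for SQL UNHEX(): decode a hex string into raw bytes."""
    return bytes.fromhex(hex_string)


data = b"lakebridge"
hex_digest = hashlib.md5(data).hexdigest()   # what a hex-returning MD5 gives
binary_digest = hashlib.md5(data).digest()   # what a binary-returning MD5 gives

print(unhex(hex_digest) == binary_digest)          # True
print(len(hex_digest), len(unhex(hex_digest)))     # 32 16
```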
Release v0.13.0
Lakebridge v0.13.0 Release Notes
Highlights
A few headline changes in this release worth calling out:
- SQL Server profiler is now available, extending assessment coverage to Microsoft SQL Server alongside the existing Azure Synapse support.
- Redshift reconciliation is now supported. Redshift can be used as a source for all reconcile report types.
- Major Morpheus conversion improvements for T-SQL and Redshift, including a full Redshift dialect rollout (DDL, DML, functions, operators) and a wide expansion of T-SQL function and resilience handling.
This release also includes a major overhaul of the documentation, aimed at simplifying the structure and making the docs easier to follow.
Assessment
Profiler
- Added a new SQL Server profiler that extends the existing assessment capabilities to Microsoft SQL Server, closely mirroring the Azure Synapse profiler design. The implementation exposes a `last_execution_time` parameter on all server queries, laying the groundwork for future incremental/scheduled extractions. On-prem SQL Server is not in scope. (#2151)
- Updated the Azure Synapse Workspace profiler summary dashboard and introduced a new dashboard template for SQL Server. The Synapse template removes deprecated dashboard widget parameters, parameterizes table values so they can be set dynamically by Lakebridge, adds a dedicated-storage summary widget by SQL pool, renames datasets for clarity, fixes broken column references in SQL pool activity widgets, and reformats dataset queries. (#2317)
- Reworked the `create-profiler-dashboard` CLI flow to bring it in line with the rest of the Lakebridge installer experience: clearer prompts for extract file location, UC catalog, schema, and volume; a helper to parse the extract path and UC volume upload location; and dashboard install/uninstall hooked into the standard Lakebridge installer/uninstaller. (#2319)
- Fixed a bug that produced false positives in `test-profiler-connection`. (#2342)
Converters
Morpheus
Snowflake
- Tightened parsing of `CREATE FILE FORMAT` statements, including format type options, with clear diagnostics for unsupported variants.
Synapse / T-SQL
- Improved resilience by dropping unsupported constructs (table hints such as `WITH (NOLOCK)` and `READPAST`, `UPDATE STATISTICS`/`CREATE STATISTICS`, and other unsupported `CREATE TABLE` options) with warnings, so surrounding scripts keep transpiling instead of failing.
- Expanded function coverage with a batch of T-SQL "easy wins" (`DATEPART` unit aliases, `EOMONTH`, `ISNUMERIC`, `TIME → STRING` conversions) and richer date/time handling (`DATETRUNC`, `DATE_BUCKET`, `SYSDATETIME`, `DATENAME`, normalized `DATE_PART` units).
- Added Databricks-compatible translations for `PATINDEX` (to `REGEXP_INSTR`, converting SQL wildcards to regex), `IIF` (mapped to `IF` at parse time), and `FORMATMESSAGE` (graceful fallback with diagnostics for unsupported format specifiers).
- Migrated the T-SQL functional test suite to the new `eval`-based scenario runner, expanding executable coverage and removing legacy skips.
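The `PATINDEX` translation above depends on rewriting SQL wildcards as a regular expression. The following is a simplified sketch of that idea, not Morpheus's implementation: it handles only `%` and `_`, ignores bracket classes such as `[a-z]`, and is case-sensitive, whereas T-SQL matching depends on collation. Both function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Convert SQL wildcards to regex: % -> .*, _ -> ., everything else literal."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def patindex(pattern, text):
    """Return the 1-based position of the first match, or 0 if there is none."""
    core = like_to_regex(pattern.strip("%"))
    # Without a leading/trailing %, the pattern is anchored to that end.
    prefix = "" if pattern.startswith("%") else "^"
    suffix = "" if pattern.endswith("%") else "$"
    match = re.search(prefix + core + suffix, text, flags=re.DOTALL)
    return match.start() + 1 if match else 0


print(patindex("%ter%", "interesting"))  # 3
print(patindex("%a_c%", "xxabc"))        # 3
print(patindex("%zzz%", "abc"))          # 0
```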
Redshift
- Added Redshift as a first-class source dialect: a full dialect mapping in the converter (parser, IR builder, generator), Language Server advertisement alongside Snowflake and T-SQL, and a published per-feature workplan.
- Implemented Redshift DDL and DML coverage, including `CREATE TABLE` (distribution, sort key, constraint clauses), `DELETE … REMOVE DUPLICATES`, complex literals (arrays, super values, composite forms), and the supportable portion of the `SUPER` type with explicit rejection diagnostics for the rest.
- Added wide function coverage across string, `VARBYTE`, window, admin, `OBJECT`, JSON, HLL, and math families, plus date/time (`DATEADD`, `DATE_CMP`, `TIMEZONE`, timezone comparisons, `TIMEOFDAY`, `LAST_DAY`, `MONTHS_BETWEEN`, `SYSDATE`, single-argument `TRUNC`), numeric coercion (`TEXT_TO_INT_ALT`, `TEXT_TO_NUMERIC_ALT`), hashing (`CHECKSUM`, `FARMHASH64`), and a staged `TO_TIMESTAMP` implementation.
- Implemented Redshift-specific operators including the `+` overload (so string and date arithmetic disambiguate correctly) and the `|/` (square root) and `||/` (cube root) prefix operators.
- Explicitly rejected `EXPLAIN_MODEL` since Databricks SQL has no equivalent ML-model explainer, surfacing actionable diagnostics rather than silent miscompilation.
General
- Broadened cross-dialect DDL with `CREATE FUNCTION` (as much as is feasible per dialect, with diagnostics for unsupported procedural features) and `CREATE SCHEMA` for Snowflake, T-SQL, and Redshift, both lowered to Databricks SQL.
- Expanded shared function and expression support: `CURRENT_USER` across all dialects, additional `TIMESTAMP`-related functions, raw strings as proper IR expressions (so they participate in type inference and round-trip cleanly), and `EXECUTE IMMEDIATE` usable as an expression (not just a statement).
- Added full coverage of H3 and ST spatial functions across every supported dialect, with normalized names and argument shapes.
- Improved grammar flexibility by allowing reserved keywords `DATABASE` and `PRIMARY` to be used as identifiers in unambiguous contexts, unblocking real-world schemas.
- Expanded `DELETE` to cover all Redshift, T-SQL, and Snowflake use cases (including dialect-specific extensions) and added `UPDATE`-to-`MERGE` lowering for Redshift and T-SQL when an update uses a join or source table.
Reconcile
- Added a Redshift connector to reconcile so Redshift can be used as a source for data, row, schema, and full report types. (#2339)
- Replaced direct JDBC connections (Oracle, Snowflake, SQL Server) with Databricks Unity Catalog `remote_query()` calls backed by UC Connections. Reconcile no longer manages JDBC URLs, secret scopes, or PEM keys directly; authentication and connectivity are handled by Databricks. This introduces a v2 configuration format that takes a `uc_connection_name` in place of `secret_scope`; existing v1 configs are auto-migrated on load. (#2362)
- Fixed reconcile schema fetch failures on Foreign Catalogs created via Lakehouse Federation. Foreign catalogs lack the Databricks-specific `full_data_type` column in `information_schema.columns`, which previously caused `UNRESOLVED_COLUMN` errors for all report types (`schema`, `data`, `row`, `all`). A new `DatabricksNonUnityCatalogDataSource` now falls back to `DESCRIBE TABLE` and covers `hive_metastore`, `global_temp` views, and Foreign Catalogs, while the native `DatabricksDataSource` remains scoped to Unity Catalog tables. (#2422)
- Fixed a T-SQL/Synapse reconciliation regression where switching to `VARCHAR(MAX)` in hash concatenation broke date/time columns: SQL Server accepts `VARCHAR(256) + DATE` via implicit conversion but rejects `VARCHAR(MAX) + DATE`. Temporal transforms now `CONVERT` `DATE`/`TIME`/`DATETIME` to `VARCHAR(10)`/`VARCHAR(12)`/`VARCHAR(23)` so all temporal columns produce `VARCHAR` output that concatenates safely in the hash input string. (#2320)
- Improved Oracle reconcile coverage, fixed parsing of remote query options, and dropped the legacy Oracle test scripts and Docker harness that required heavy manual setup. (#2433)
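The `VARCHAR(10)`/`VARCHAR(12)`/`VARCHAR(23)` widths in the temporal fix above line up with the lengths of ISO-like date, time, and datetime strings at millisecond precision. The check below assumes those layouts; the notes do not state which `CONVERT` style codes Lakebridge uses, so treat the exact formats as an assumption.

```python
from datetime import datetime

ts = datetime(2024, 1, 31, 12, 34, 56, 789000)

as_date = ts.strftime("%Y-%m-%d")                                     # 2024-01-31
as_time = ts.strftime("%H:%M:%S.") + f"{ts.microsecond // 1000:03d}"  # 12:34:56.789
as_datetime = f"{as_date} {as_time}"                                  # 2024-01-31 12:34:56.789

print(len(as_date), len(as_time), len(as_datetime))  # 10 12 23
```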
Installer
- Added minimal support for using a Maven mirror when installing Morpheus via `install-transpile`. Setting `LAKEBRIDGE_MAVEN_URL` overrides the default repository URL, and credentials can be supplied through `~/.netrc` (or via `NETRC`) so `install-transpile` works in environments without direct Maven Central access. (#2405)
- Updated the wheel installer used during `install-transpile` to look up version information via `pip` instead of issuing a direct HTTP call to PyPI. This allows `install-transpile` to work in environments where only a local PyPI mirror is available. (#2404)
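The Maven mirror override described above follows a common pattern: read an environment variable and fall back to a default. The sketch below shows that pattern only. `LAKEBRIDGE_MAVEN_URL` comes from the notes; the helper name, the default URL (assumed here to be Maven Central), and the trailing-slash handling are assumptions rather than Lakebridge's actual code.

```python
import os

# Assumption: the default repository is Maven Central.
DEFAULT_MAVEN_URL = "https://repo1.maven.org/maven2"


def maven_base_url(env=None):
    """Return the repository base URL, honoring LAKEBRIDGE_MAVEN_URL if set."""
    env = os.environ if env is None else env
    return env.get("LAKEBRIDGE_MAVEN_URL", DEFAULT_MAVEN_URL).rstrip("/")


print(maven_base_url({}))
print(maven_base_url({"LAKEBRIDGE_MAVEN_URL": "https://mirror.example.com/maven/"}))
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.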
Documentation
- Major revamp of the Lakebridge documentation focused on clarity, structure, and first-time user experience: added a new end-to-end Getting Started tutorial (SQL Server → Databricks SQL walkthrough), a Choosing Tools decision guide, a dedicated Morpheus transpiler page, and a split-out Switch architecture page. Reconcile docs were consolidated from 5 files (~1,400 lines) to 3 (~750 lines) with a new report-type comparison table and unified Configuration Reference and Running Reconcile pages. SSIS docs were moved into a dedicated subfolder with a collapsible sidebar category. The Installation page was rewritten for brevity, the FAQ expanded from 3 to 25+ questions, and the sidebar reordered to match the actual user journey: Installation → Getting Started → Choosing Tools → Assessment → Transpile → Reconcile → SQL Splitter → FAQ. (#2365)
- Fixed the reconcile notebook documentation to match the v2 `TableRecon` API: the example now shows `TableRecon` as `tables: list[Table]` only (with `source_schema`, `target_catalog`, `target_schema`, and `source_catalog` configured via `DatabaseConfig` inside `ReconcileConfig`), corrected the location of `drop_columns` (it belongs on `Table`, not `TableRecon`), and added a migration note pointing users to `DatabaseConfig`. ([#2329](#2...
v0.12.2
Assessment
Profiler
- Enhanced Synapse profiler extraction and monitoring by correctly handling batched pipeline/trigger runs, adding serverless-pool routine listing via `sys.objects`, reconnecting to `master` for server-level DMVs, stripping whitespace from credential fields, replacing deprecated `DataFrame.union()` with `pd.concat()`, and clarifying Azure auth, DMV permissions, and serverless catalog view behavior.
Analyzer
- Added support for a new `--generate-json` switch to produce a JSON report alongside the existing Excel report, enabling programmatic consumption of analyzer results without changing default behavior.
Converters
Morpheus
Snowflake
- Added transpilation support for seven Snowflake geospatial functions (ST_MAKEPOINT, ST_POINT, ST_X, ST_Y, ST_CENTROID, TRY_TO_GEOGRAPHY, HAVERSINE) to Databricks SQL, including SRID handling, argument normalization, and custom SQL for Haversine distance.
- Introduced support for Snowflake SPLIT_TO_TABLE by mapping it to Databricks `POSEXPLODE(SPLIT(...))`, including column renaming, regex-safe delimiter handling, and both `TABLE()` and `LATERAL` invocation forms.
- Implemented Snowflake JSON helpers CHECK_JSON, GET_PATH, and IFF for Databricks SQL, using TRY_PARSE_JSON-based validation, GET_JSON_OBJECT path translation, and direct IF-style semantics.
Synapse / TSQL
- Added IR support for executing stored procedures and immediate SQL strings via T-SQL `EXEC`/`EXECUTE`, including positional and named parameters, output parameters, `AS USER`, and `AT <data_source>` constructs, plus parity tests for Snowflake `EXECUTE IMMEDIATE`.
Other / General
- Enhanced the Morpheus DataType and Expression system with static typing and promotion at IR generation time, including numeric and string helpers, a NumericValue pseudo‑type, SQL‑style highestType promotion rules, and an expanded test suite.
BladeBridge
Oracle
- Updated NUMBER without precision to map to `DECIMAL(38,18)` for Oracle to correctly handle floating-point semantics in converted code.
Teradata
- Updated NUMBER without precision to map to `DECIMAL(38,18)` for Teradata to preserve floating-point behavior in conversions.
- Added Teradata stored procedure test cases with various DML statements, transactions, and explicit handling of output parameters returned without `CALL`, aligning with Teradata's procedure semantics.
Redshift
- Fixed Redshift NUMBER mapping to `DECIMAL(38,0)` to ensure correct numeric precision in converted objects.
General SQL
- Added `CONNECT BY` to the platform source gap specifications and introduced `CREATE PROJECTION` patterns in `general_sql_specs.json` to broaden SQL feature coverage across supported sources.
- Added DDL test cases and patterns to improve datatype and table partition conversion and to strip unwanted default values from DDL in the final master step when they are not visible in configuration.
ETL to Databricks (Informatica)
- Fixed workflow parameter default value handling and introduced a table-based workflow parameter storage system, providing type-aware defaults and a `workflow_utils.workflow_params` Delta table to centralize parameter metadata for Informatica-to-Databricks conversions.
- Added a configurable `data_type_mapping` for Informatica-to-Python conversions to improve type inference and consistency across generated notebooks.
Reconcile
- Updated T-SQL/Synapse reconciliation hash generation to avoid `VARCHAR(256)` truncation by using `VARCHAR(MAX)` in `COALESCE` and `HASHBYTES`, reducing false mismatches and better aligning behavior with Databricks.
- Introduced source and target record count metrics (`source_record_count`, `target_record_count`) into reconciliation metrics and dashboards, including an upgrade script and tests to support enhanced reconciliation observability.
Documentation
- Added LLM-friendly documentation via the `@signalwire/docusaurus-plugin-llms-txt` plugin, exposing a structured `llms.txt` index and per-page markdown URLs so AI tools can more easily discover and consume Lakebridge documentation.
- Documented automatic serverless cluster detection behavior for Reconcile, including configuration requirements for Unity Catalog volumes and environment variables.
- Published an IBM DataStage-to-Databricks conversion guide that explains supported DataStage versions and objects, generated Databricks artifacts, helper libraries, and troubleshooting practices.
- Extended the BladeBridge ETL configuration guide with native database connection examples, including tokenized JDBC/ODBC templates for systems such as Oracle and MSSQL.
- Updated Switch documentation to describe the new `input_file_relative_path` column in the Conversion Result Table schema and how it preserves input directory structure in outputs.
Dependency updates:
- Updated pyodbc requirement from ~=5.2.0 to >=5.2,<5.4 (#2104).
- Bump webpack from 5.99.6 to 5.105.0 in /docs/lakebridge (#2269).
- Bump sigstore/gh-action-sigstore-python from 3.0.1 to 3.2.0 (#2177).
- Updated duckdb requirement from ~=1.2.2 to >=1.2.2,<1.5.0 (#2079).
Contributors: @sundarshankar89, @dependabot[bot], @BesikiML, @simone-dbx-labs, @eri-adepoju, @m-abulazm, @gueniai, @hiroyukinakazato-db
v0.12.1
Synapse Profiler
- Fixed several critical errors in the Synapse profiler extraction pipeline, including a type mismatch when initializing the credential manager and handling of empty or partial result sets from Spark data pools.
- Added support for an `env` secret type in the Synapse profiler, allowing profiler configurations to resolve secrets from environment variables.
- Enhanced the credential manager backing the profiler so it can resolve nested credential structures (for example, workspace config, JDBC settings, and profiler options) instead of only flat key–value maps.
- Introduced recursive, type-aware resolution of dictionaries, lists, and strings in profiler-related credentials while preserving primitive values and maintaining backward compatibility with existing configurations.
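The recursive, type-aware credential resolution described in this list can be pictured as a small tree walk. The sketch below is an illustration under stated assumptions: the function name `resolve` is hypothetical, and the rule that a string is treated as an environment variable name with a fallback to its literal value is assumed, not taken from the Lakebridge source.

```python
import os


def resolve(value, env=None):
    """Recursively resolve strings against the environment (illustrative only)."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: resolve(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, env) for item in value]
    if isinstance(value, str):
        # Assumption: unknown names fall back to the literal string.
        return env.get(value, value)
    # Primitive values (int, bool, None, ...) pass through unchanged.
    return value


creds = {"jdbc": {"user": "DB_USER", "port": 1433}, "hosts": ["PRIMARY_HOST", "localhost"]}
fake_env = {"DB_USER": "alice", "PRIMARY_HOST": "db.example.com"}
print(resolve(creds, fake_env))
# {'jdbc': {'user': 'alice', 'port': 1433}, 'hosts': ['db.example.com', 'localhost']}
```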
Analyzer
- Expanded SQL parsing to cover additional TSQL constructs (including CREATE STATISTICS, THROW, and DROP TEMPORARY TABLE IF EXISTS), improving handling of error management, statistics, and temporary tables.
- Fixed crashes caused by special characters in mapping names by treating them as escaped literals instead of regex symbols.
- Improved reliability of MERGE into partitioned targets with enhanced handling and added test coverage.
- Added Jupyter Notebook detection to the Lakebridge analyze command by mapping notebook assets to the JUPYTERNB type and updating local dev tooling ignores.
- Simplified analyze filepath handling: `--report-file` now directly controls the Excel filename with consistent relative-path semantics, and `--source-directory` behavior and prompt text are aligned with the implementation.
- Removed a confusing “timestamped directory” behavior when the target report file already existed so logs and output now match user expectations.
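The mapping-name crash fix in this list (treating special characters as escaped literals rather than regex symbols) corresponds to a familiar pattern in Python. The sketch below shows the failure mode and the fix with `re.escape`; the mapping names and the helper are made up for illustration and are not Analyzer code.

```python
import re


def find_mapping(name, text):
    """Find literal occurrences of a mapping name, even if it contains regex symbols."""
    return re.findall(re.escape(name), text)


name = "m_load(orders)+v2"  # hypothetical mapping name with regex metacharacters
text = "start m_load(orders)+v2 end"

# Used as a raw pattern, the parentheses and + act as operators, so nothing matches.
print(re.findall(name, text))      # []
print(find_mapping(name, text))    # ['m_load(orders)+v2']

# An unbalanced parenthesis does not merely fail to match: it raises.
try:
    re.compile("m_load(")
except re.error as exc:
    print("invalid pattern:", exc)
```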
Converters
Morpheus
Snowflake
- Always emit SQL SECURITY INVOKER on generated Snowflake procedures so the security context is explicit and tests align with this behavior.
- Improved parsing and generation of INTERVAL literals to support both ANSI-style and Snowflake-style syntaxes.
- Implemented correct transpilation of Snowflake INTERVAL literals, including composite values and Snowflake-specific units, into normalized Databricks SQL YEAR TO MONTH and day-time intervals.
Synapse / TSQL / SQL Server
- Improved parsing by separating logical expressions (AND/OR/NOT) from other expressions in the T-SQL grammar, reducing ambiguity and improving parsing performance.
- Added support for translating T-SQL ALTER DATABASE statements into Databricks SQL ALTER SCHEMA, emitting comments for unmappable features.
- Introduced partial support for T-SQL CONVERT, with an indication where datetime types are unsupported.
- Improved typing and parsing of the T-SQL `+` operator so string concatenation is consistently treated as string operations and flattened where possible.
- Added T-SQL-specific overrides for equality and missing-value functions to avoid unresolved routines by mapping them to appropriate Databricks SQL equivalents.
- Implemented full AST support for T-SQL RAISERROR.
- Supported T-SQL CROSS APPLY by transpiling it to CROSS JOIN LATERAL, with correct join clause and hint formatting.
- Implemented T-SQL OUTER APPLY by generating LEFT JOIN LATERAL, with tests for ordering and LIMIT placement.
- Fixed transpilation of WHERE … LIKE … so columns and patterns render correctly, including COLLATE expressions.
- Added tests for SET within nested IF blocks to validate correct handling of T-SQL control flow without extra variable declarations.
- Correctly translated T-SQL DATEDIFF for all date parts into Databricks SQL expressions that match T-SQL boundary-count semantics, with comprehensive tests.
- Generated Delta Lake computed columns from T-SQL COMPUTED definitions, including PERSISTED columns, using GENERATED ALWAYS AS in target schemas.
- Recognized T-SQL table variables as temporary tables and transpiled them to appropriate temporary table syntax in the target dialect.
- Implemented support for T-SQL SELECT … INTO by converting to CREATE TABLE AS SELECT and handling INTO precedence rules, while rewriting Snowflake-style INTO to session variables.
- Split T-SQL DECLARE statements with scalar subquery defaults into separate DECLARE and SET statements compatible with Databricks SQL.
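The "boundary-count semantics" mentioned for `DATEDIFF` in this list mean that T-SQL counts how many date-part boundaries are crossed, not how much time has elapsed. A minimal sketch for three date parts follows; the helper is hypothetical and does not reproduce the Databricks SQL expressions Morpheus generates.

```python
from datetime import date


def tsql_datediff(part, start, end):
    """Count date-part boundaries crossed between start and end, as T-SQL does."""
    if part == "year":
        return end.year - start.year
    if part == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if part == "day":
        return (end - start).days
    raise ValueError(f"unsupported date part: {part}")


# One day apart, but a year boundary and a month boundary are both crossed.
print(tsql_datediff("year", date(2023, 12, 31), date(2024, 1, 1)))   # 1
print(tsql_datediff("month", date(2023, 12, 31), date(2024, 1, 1)))  # 1
# Almost a full year apart, yet no year boundary is crossed.
print(tsql_datediff("year", date(2024, 1, 1), date(2024, 12, 31)))   # 0
```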
General (Morpheus engine)
- Ensured block-level DECLARE variable scoping is dialect-aware by introducing a postDeclare flag so Snowflake-style declarations appear inside blocks.
- Cleaned up data type definitions and generators, adding TIME support, refactoring INTERVAL handling, and replacing ir.Byte with ir.TinyInt.
- Improved grammar, IR, and generation of INTERVAL literals so Snowflake-like and ANSI-style syntaxes are both supported.
- Fixed expression rendering to always emit parentheses for bracketed constructs, including empty window clauses.
- Added transformations to hoist DECLARE variables to the start of blocks and wrap batches with blocks when variables are present, improving procedural SQL handling.
- Generalized batch-wrapping logic so any scripting statements are encapsulated in BEGIN … END blocks, with tests updated for the new structure.
- Added dialect-specific configuration and grammar for IF/WHILE block parsing so T-SQL and Snowflake scripting blocks terminate correctly per dialect.
BladeBridge
TSQL / SQL Server
- Expanded SQL parsing to include TSQL features such as CREATE STATISTICS, THROW, and DROP TEMPORARY TABLE IF EXISTS, improving conversion robustness for TSQL workloads.
- Added support for SELECT column aliasing and extended TSQL keyword recognition to improve conversion of SQL scripts to Databricks-compatible syntax.
- Enhanced handling of MERGE statements into specific partitions and added tests to improve conversion reliability for partitioned MERGE patterns.
SSIS
- Ensured deterministic SSIS conversion output by enforcing stable ordering for variables and target columns, improving null handling, and adding tests so repeated runs generate consistent PySpark.
Informatica / IICS
- Implemented the SQL Transform component for Informatica-to-PySpark conversion to cover more data transformation logic.
- Added native JDBC/ODBC database connection support for IICS-to-Databricks conversions, allowing direct database reads/writes and fixing connection flag logic for Target components.
Switch
General
- Used empty strings instead of nulls for optional configuration parameters to simplify downstream handling.
- Preserved directory hierarchy in conversion output so generated artifacts mirror the source layout.
Reconcile
- Refactored reconciliation intermediate persistence to clean checkpoint volumes after runs, remove overwrite write mode usage, and prefer Delta volumes on Databricks instead of Parquet.
- Hid implementation details behind a more generic interface and marked future work to persist to Delta instead of re-reading from source systems.
- Improved reconciliation result handling and logging: reconciliation exceptions now raise a ReconciliationException, while mismatches and passes are logged with severity and report type.
- Implemented capability-based caching detection to keep reconciliation compatible with Databricks serverless compute, caching only when supported and using Delta writes as materialization boundaries on serverless.
Documentation
- Overhauled the documentation to show more clearly which source systems are supported in which module.
- Updated analyze command documentation to match the current implementation, simplifying caveats and clarifying expected behavior for `--report-file` and `--source-directory`.
Dependency updates:
- Bump lodash from 4.17.21 to 4.17.23
Contributors: @sundarshankar89, @m-abulazm, @asnare, @gueniai, @dependabot[bot], @BesikiML
v0.12.0
Analyzer
- Extended the Analyzer to recognize more SQL-bearing file types, including Oracle package files (`.pks`, `.pkb`), Teradata utilities (`.bteq`, `.fload`, `.mload`, etc.), Hive scripts (`.hql`), and shell scripts with embedded SQL (`.sh`, `.ksh`, `.bash`, `.csh`), so more source assets are discovered without manual renaming.
Converters
General
- Enabled SSIS conversion so SSIS packages can be translated into Databricks notebooks via the BladeBridge-based converter, providing a new migration path for ETL workloads built on SSIS.
- Added Amazon Redshift conversion documentation describing supported features, known limitations, and a step-by-step workflow to convert Redshift SQL into Databricks SQL using the BladeBridge transpiler.
Morpheus
All dialects
- Implemented explicit support for multi-statement transactions (`BEGIN TRANSACTION`, `COMMIT TRANSACTION`, `ROLLBACK TRANSACTION`, and `ATOMIC` blocks) in the parser and generator, enabling transaction-aware translation and testing.
- Fixed generation of CASE expressions so CASE expressions now terminate with `END` while CASE statements remain terminated with `END CASE`, improving standards-compliant SQL output across dialects.
- Expanded `CREATE VIEW` support to handle `SCHEMABINDING` and `MATERIALIZED` options, increasing coverage of advanced view definitions.
- Standardized translation of `DATE_xxx` functions by mapping to `DATE_ADD` and adding synonyms such as `DATE_FORMAT`, `DATE_PART`, `DATE_SUB`, and `DATE_TRUNC` for consistent naming and mapping.
- Ensured control-flow statements like `LEAVE` and `ITERATE` are always generated with labels by auto-labelling enclosing blocks or loops when needed, improving robustness of generated control-flow SQL.
Snowflake
- Added support for Snowflake ICEBERG catalog DDL by extending the grammar to recognize `CREATE ICEBERG TABLE` and related syntax so ICEBERG table definitions parse and test correctly.
Synapse / TSQL
- Updated translation of T‑SQL VAR and VARP to map to ANSI VARIANCE and VAR_POP, aligning variance aggregation semantics and tests.
- Updated translation of T‑SQL STDEV and STDEVP to treat them as synonyms for STDDEV and STDDEV_POP, improving aggregate function compatibility.
- Fixed translation of T‑SQL REPLICATE by mapping it as a synonym of REPEAT, clarifying conversion behavior and tightening test coverage.
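The synonym mappings listed above can be summarized as a small lookup table. The sketch below applies them with a naive regular expression. It only illustrates the mapping and is not how the Morpheus parser and generator work; `rename_functions` is a hypothetical helper that does not skip string literals or comments.

```python
import re

# Source-to-target names taken from the notes above.
TSQL_FUNCTION_SYNONYMS = {
    "VAR": "VARIANCE",
    "VARP": "VAR_POP",
    "STDEV": "STDDEV",
    "STDEVP": "STDDEV_POP",
    "REPLICATE": "REPEAT",
}

# Match a listed function name only when it is a whole word followed by "(".
_CALL = re.compile(
    r"\b(" + "|".join(TSQL_FUNCTION_SYNONYMS) + r")\s*\(",
    re.IGNORECASE,
)


def rename_functions(sql: str) -> str:
    """Rewrite listed T-SQL function calls to their Databricks SQL names."""
    return _CALL.sub(
        lambda m: TSQL_FUNCTION_SYNONYMS[m.group(1).upper()] + "(",
        sql,
    )
```

For example, `rename_functions("SELECT VARP(x) FROM t")` yields `SELECT VAR_POP(x) FROM t`.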
BladeBridge
SSIS
- Enabled SSIS support so SSIS packages can be translated into Databricks notebooks, allowing customers to migrate SSIS workloads using the BladeBridge converter within Lakebridge.
Amazon Redshift
- Enabled Amazon Redshift SQL conversion to Databricks SQL, broadening coverage of cloud data warehouse sources and aligning with the new Redshift conversion documentation.
Synapse / TSQL / MSSQL
- Fixed handling of non-standard DELETE statements with two FROM clauses by normalizing the first FROM to use the table alias before conversion, preventing malformed MERGE statements for Synapse and similar targets.
- Updated SQL conversion to automatically remove unsupported NOLOCK hints, corrected a fragment-breaker bug that split scripts before INSERT, and fixed variable declarations using AS, improving reliability of T‑SQL parsing and conversion.
- Improved stored procedure conversion from SQL Server to Databricks SQL by correctly handling output parameters and standardizing EXEC-to-CALL translation to stay within Databricks SQL scripting constraints.
- Added patterns to correctly convert COUNT(DISTINCT COL1) OVER (PARTITION BY COL2) window expressions and to normalize table references from '{database_param}'.schema.table_name to {database_param}.schema.table_name, eliminating stray quoting in database qualifiers.
- Fixed BIGINT datatype conversion when the type appears inside braces so it is recognized as a datatype and emitted as plain bigint rather than as a backticked identifier, avoiding invalid Databricks SQL.
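The table-reference normalization mentioned in the list above (from '{database_param}'.schema.table_name to {database_param}.schema.table_name) can be pictured with a single substitution. This is a minimal sketch under the assumption that the parameter is a brace-wrapped identifier immediately followed by a dot. The actual BladeBridge pattern is not shown in the notes, and `unquote_database_param` is a hypothetical name.

```python
import re

# A single-quoted {param} placeholder that is used as a qualifier (followed by ".").
_QUOTED_PARAM = re.compile(r"'(\{[A-Za-z_][A-Za-z0-9_]*\})'(?=\.)")


def unquote_database_param(sql: str) -> str:
    """Drop the quotes around a {param} placeholder used as a database qualifier."""
    return _QUOTED_PARAM.sub(r"\1", sql)
```

The lookahead for the trailing dot keeps ordinary string literals such as `'{x}'` in a SELECT list untouched.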
Oracle
- Improved Oracle package conversion by stripping unnecessary BEGIN/END blocks from functions, emitting logic as a single returned SELECT, adding a dedicated handler for UDF definitions, and tightening procedure conversion for variable declarations, loop THEN usage, and cursor placement inside loops.
Informatica
- Enhanced Informatica-to-Spark SQL mappings by removing redundant empty-string wrapping, adding mappings for additional datetime and related functions, and fixing parameter replacement and .format() usage so generated Spark SQL is cleaner and more accurate.
- Corrected Databricks notebook generation for Informatica mapplets by switching from relative to absolute imports in the Python template and simplifying mapplet argument collection, removing unused JOB_PARAMETERS and MAPPLET_INFO code.
Documentation
- Added Redshift conversion documentation and guide, describing supported features, limitations, and a recommended workflow for converting Redshift SQL to Databricks SQL.
- Added an SSIS conversion guide with a full list of supported components, step-by-step migration instructions, and a sample SSIS package to showcase an end-to-end workflow.
- Updated Switch documentation for Spark Declarative Pipeline conversion, including the new result_sdp_error column in Delta schemas, target_type = sdp, and sdp_language options, and documented the 7-step conversion pipeline and validation behavior.
- Removed WSL from the Windows installation prerequisites, simplifying setup instructions while retaining guidance for Python installation and version checks across platforms.
General
- Updated project metadata to require Python versions between 3.10.1 and 3.13.x, avoiding Python 3.10.0, and revised installation docs to reflect the new supported version range.
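To make the supported range concrete, the following sketch checks a version tuple against the bounds stated above (3.10.1 up to and including any 3.13.x). The `is_supported` helper is hypothetical and is not part of Lakebridge; the project enforces the range through its package metadata.

```python
import sys

MIN_VERSION = (3, 10, 1)        # 3.10.0 is deliberately excluded
FIRST_UNSUPPORTED = (3, 14)     # anything from 3.14 onward is out of range


def is_supported(version) -> bool:
    """Return True if (major, minor, micro) falls within 3.10.1 .. 3.13.x."""
    return MIN_VERSION <= tuple(version[:3]) and tuple(version[:2]) < FIRST_UNSUPPORTED


if __name__ == "__main__":
    print("supported" if is_supported(sys.version_info) else "unsupported")
```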
Contributors: @asnare, @m-abulazm, @sundarshankar89, @BesikiML, @gueniai, @andresgarciaf, @hiroyukinakazato-db, @yyoli-db
v0.11.3
Analyzer
- Optimized SAS Analyzer performance by consolidating regex operations, delivering roughly a 7x speed improvement for large-scale SAS analysis workloads.
- Added support for new SSIS components Microsoft.Pivot, Microsoft.UnPivot, and ExtensibleFileTask, broadening coverage for SSIS package migrations analysis.
Converters – Morpheus
Core
- Significantly improved ANTLR parsing performance by merging grammars, refactoring ambiguous rules, and updating the Scala integration and build pipeline for the new grammar workflow.
- Allowed the STREAMS token to be used as an identifier so patterns like SELECT * FROM streams.foo.bar now parse correctly in Snowflake-oriented SQL.
- Updated the error reporting to align to the following:
  - Info: no error, the input was fully translated
  - Hint: the input was fully translated but some irrelevant bits have been elided
  - Warning: the input was translated but with unsupported bits
  - Error: the input couldn't be translated
MSSQL / T-SQL / SQL Server
- Added full support for SQL Server T-SQL CREATE INDEX and table-level index directives, parsing them into a new index IR and translating to CLUSTER BY AUTO in Databricks SQL so index statements are no longer rejected.
- Extended grammar and parsing to handle T-SQL computed columns, QUOTENAME calls, GROUP options in query hints, DROP INDEX statements, and additional keywords like PARAMETERS, STREAMS, PROCEDURES, and VIEWS, improving coverage of real-world T-SQL workloads.
- Improved DML parsing so INSERT targets use proper dot identifiers instead of expression-like forms, preventing misinterpretation as function calls and preserving case sensitivity where required.
- Re-enabled and migrated T-SQL functional tests to a YAML-based format, expanding automated coverage and keeping still-failing cases isolated for follow-up.
Converters – BladeBridge
MSSQL / SSIS / T-SQL
- Resolved issues with column names containing single quotes and standardized DATEADD and DATEDIFF function patterns to improve compatibility across target SQL dialects.
DataStage
- Implemented mapping for the JulianDayFromDate function with corresponding tests, extending DataStage function coverage in the converter.
- Enhanced DataStage Spark and workflow handling by adding Databricks cluster sections, improving widget default handling, and mapping TransformStringToDate and spark.sqltemplate attributes for smoother Spark migrations.
Reconcile
- Improved reconciliation hash query generation to guarantee consistent column ordering across SQL dialects, preventing false hash mismatches when column names are substrings of each other.
- Reverted the Oracle reconcile implementation to use MD5 via DBMS_CRYPTO.HASH with RAWTOHEX, restoring compatibility with Oracle 11 while keeping the updated QueryBuilder engine handling.
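The reason column ordering matters for hash-based reconciliation can be shown with a small row-hash sketch: if source and target concatenate the same values in a different order, the hashes differ even though the data matches. This is an illustration only. Lakebridge generates dialect-specific SQL rather than hashing in Python, and `row_hash`, the `|` separator, and the SHA-256 choice are all assumptions made for the example.

```python
import hashlib


def row_hash(row: dict) -> str:
    """Hash a row after ordering its columns by name, so key order cannot change the result."""
    ordered_names = sorted(row)
    # name=value pairs keep columns such as "id" and "id_type" clearly separated.
    payload = "|".join(f"{name}={row[name]}" for name in ordered_names)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this ordering, `{"id": 1, "id_type": "a"}` and `{"id_type": "a", "id": 1}` produce the same digest.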
Documentation
- Added practical details about how to extend BladeBridge configurations
Dependency updates:
- Bump actions/checkout from 5 to 6 (#2158).
Contributors: @asnare, @sundarshankar89, @dependabot[bot], @m-abulazm, @BesikiML
v0.11.2
Analyzer
- Normalized complexity categories in the analyzer from “COMPLEX/VERY_COMPLEX” to “HIGH/VERY_HIGH” for clearer reports.
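When comparing reports produced before and after this change, the renamed categories can be mapped with a small lookup. This is a hypothetical helper for report consumers, not Analyzer code; only the two renames stated above are covered, and other values pass through unchanged.

```python
# Old category name -> new category name, as stated in the note above.
LEGACY_TO_CURRENT = {
    "COMPLEX": "HIGH",
    "VERY_COMPLEX": "VERY_HIGH",
}


def normalize_complexity(category: str) -> str:
    """Translate a legacy complexity label to its current name, leaving other labels as-is."""
    return LEGACY_TO_CURRENT.get(category, category)
```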
Converters
Morpheus
Snowflake
- Implemented full support for DECLARE, LET, and assignment statements to better handle procedural Snowflake scripts.
- Added support for DROP PROCEDURE statements, improving Snowflake DDL coverage.
TSQL/Synapse
- Cleaned up grammar by removing duplicate and unsupported rules for TSQL special functions, reducing ambiguity and improving parser stability.
- Implemented full support for DECLARE, LET, and assignment statements in TSQL, enabling richer stored procedure conversion.
- Added support for TSQL DROP PROCEDURE statements to improve parity with source DDL.
- Updated handling of options such as ANSI_NULLS and QUOTED_IDENTIFIER to emit informative comments instead of errors when they do not apply to Databricks SQL.
- Enhanced handling of SET NOCOUNT by emitting comments explaining its behavior in Databricks SQL and warning when NOCOUNT OFF is used.
- Allowed PRECISION to be used as an identifier (for example, c.precision), fixing parsing issues with such column names.
- Improved handling of EXEC statements by detecting well‑known stored procedures like sp_executesql and issuing more specific diagnostics.
- Added translation of OBJECT_ID() checks into EXISTS queries against catalog metadata to preserve control flow in procedural TSQL.
- Added warnings for unsupported PRINT statements by generating explanatory comments rather than hard errors.
- Added parsing support for the Synapse RENAME OBJECT syntax, currently surfaced as an unsupported but recognized construct.
Generic Morpheus engine
- Enabled attaching comments and error markers to empty code blocks so that diagnostics are preserved in rendered SQL.
- Prevented semicolons from being printed after empty statements to keep output formatting consistent.
- Bundled multiple column-level primary keys into composite table constraints to produce more correct DDL.
- Allowed the identifier PRECISION in general parsing contexts, improving compatibility with more schemas.
BladeBridge
MSSQL / TSQL
- Improved handling of MERGE statements, including insertion of semicolons before MERGE in statement breaking and correct ordering of MATCHED and NOT MATCHED clauses.
- Fixed issues when converting updates on temporary tables into MERGE statements and added tests to guard the behavior.
- Improved statement categorization by stripping comments before categorization and simplifying legacy comment-key handling.
- Added a new handler for nested static strings and inline comments, improving function substitution and parser robustness.
Generic BladeBridge engine
- Enhanced logging configuration to produce clearer diagnostics while keeping noise manageable.
Reconcile
- Added support for specifying a catalog for Databricks sources in Reconcile and prompting for the source catalog when necessary.
- Removed redundant Reconcile configuration parameters to simplify setup.
General
- Improved handling of output from LSP servers by safely chunking very long stderr lines and logging critical processing errors, preventing hangs and unbounded memory use.
- Adjusted JDBC handling to accept usernames and passwords via Spark options instead of embedding credentials in the JDBC URL, improving support for special characters in passwords.
- Consolidated the automated test suite to keep only unit and integration scopes, simplifying test configuration.
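The stderr chunking mentioned in the first item above can be sketched as splitting one oversized line into bounded pieces before logging, so no single log record grows without limit. The `chunk_line` helper and the 4096-character default are assumptions for illustration; the actual limit and implementation in Lakebridge are not given in these notes.

```python
from typing import Iterator


def chunk_line(line: str, limit: int = 4096) -> Iterator[str]:
    """Yield consecutive slices of `line`, each at most `limit` characters long."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not line:
        # Preserve empty lines so the caller still sees one record per input line.
        yield line
        return
    for start in range(0, len(line), limit):
        yield line[start:start + limit]
```

A caller would log each yielded chunk separately, for example `for part in chunk_line(raw): logger.debug(part)`, where `logger` and `raw` are the caller's own names.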
Dependency Updates
Full Changelog: v0.11.1...v0.11.2
Contributors: @m-abulazm, @asnare, @sundarshankar89
v0.11.1
Analyzer
No updates in this release.
Converters
General
- Improved end-to-end migration behavior through tighter integration with the centralized Morpheus function mapping layer and expanded cross-dialect coverage
Morpheus
Snowflake
- Centralized SQL function mappings and expanded cross-dialect coverage, improving Snowflake-to-Databricks SQL conversions and reducing noisy, non-actionable warnings.
- Added full translation support for Snowflake exception blocks, enabling richer error-handling logic to be preserved when converting to Databricks SQL.
TSQL / SQL Server
- Reworked SQL function handling so most mappings are centralized, making TSQL-to-Databricks SQL conversions more accurate and easier to extend for future Lakebridge-based migrations.
- Implemented full support for TSQL TRY/CATCH constructs, including THROW/RAISERROR-style logic and helper-based error handling, improving the fidelity of translated control-flow and error semantics.
BladeBridge
TSQL / SQL Server
- Fixed handling of T-SQL column alias syntax in SELECT statements so aliases are no longer mistaken for variable assignments, and removed a deprecated alias-normalization method to improve translation accuracy.
- Resolved failures caused by nested comments, improved post-conversion handling for shell and Python wrapper scripts, and ensured labeled UPDATE/DELETE statements that translate to MERGE remain correctly embedded in SQL.
- Corrected processing of SELECT statements without a FROM clause when assigning to variables, so expressions like variable increments and severity mappings are handled reliably during migration.
- Improved “delete by source” MERGE translations so separators and DELETE placement are preserved, and fixed static string handling so T-SQL patterns that use square brackets are not misinterpreted as identifier quoting or ranges.
Reconcile
No updates in this release
Documentation
- Clarified that Python 3.14 is not yet supported and updated macOS instructions to recommend Python 3.13 as the latest supported version
- Expanded installation prerequisites with detailed Databricks workspace requirements, authentication options, network and repository access expectations, and a comprehensive pre-installation checklist aimed at enterprise and security-restricted environments
General
- Increased the maximum stderr line size accepted from LSP servers during transpilation to prevent crashes or hangs when converters emit very large log lines
- Reduced noise from LSP integrations by lowering stderr mirroring from INFO to DEBUG level, ensuring detailed logs remain available for troubleshooting without cluttering normal operation logs
Contributors: @asnare, @andresgarciaf
v0.11.0
🎉 New Features
This release introduces two exciting new capabilities to Lakebridge:
Synapse Profiler
A powerful new Synapse Profiler feature is now available to help you analyze and profile your Synapse data. Refer to the documentation for usage details and examples.
Switch LLM Converter
Introducing the new Switch LLM converter, expanding Lakebridge's conversion capabilities. Refer to the documentation for usage details and examples.
Other updates
Converters
General
Conversion Output Fix
Fixed a bug where files nested 2 or more directories deep within the input directory could fail to be written out after conversion when the directory structure wasn't already in place.
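A common way to avoid this class of failure is to create the missing parent directories before writing each output file. The sketch below shows that general pattern with `pathlib`; it is not the actual Lakebridge fix, and `write_converted` is a hypothetical helper.

```python
from pathlib import Path


def write_converted(output_root: Path, relative_path: str, content: str) -> Path:
    """Write converted output, creating any missing intermediate directories first."""
    target = output_root / relative_path
    # parents=True creates every missing level; exist_ok=True tolerates existing directories.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```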
Morpheus
Code Formatting Improvements
Refactored code formatting logic by introducing a tree-like structure in CodeBlock and a new CodeBlockRenderer to handle whitespace, comments, and error positioning, making the formatting system more maintainable and accurate.
TSQL
Added support for translating TSQL join hints (like REPLICATE and MERGE) to their Databricks SQL equivalents by transforming them into special /*+ ... */ comments after the SELECT keyword, while unsupported hints are flagged as annotated errors.
BladeBridge
SQL Server
- Fixed SELECT INTO real table syntax, corrected LIKE pattern handling, and mapped unsupported FUNC_ROW_NUMBER function while removing ANON_NOLOCK.
- Resolved an issue where CASE WHEN expressions as the last statement in a file generated incorrect semicolon placement in SQL scripts.
- Added fragment breaker before GO keyword and removed unsupported COMMIT TRANSACTION and CREATE INDEX constraints.
- Fixed T-SQL UPDATE statements that were not correctly converted to MERGE operations in specific cases.
- Corrected fragment handling around SELECT and UNION statements, and fixed issues with IF condition blocks and error handling blocks being mixed up.
- Removed SET IDENTITY_INSERT and BEGIN/COMMIT TRANSACTION statements, and changed INT GENERATED ALWAYS AS IDENTITY to BIGINT GENERATED ALWAYS AS IDENTITY.
- Added validation check for converted MERGE statements, implemented global variable reset in init_hook subroutine, and performed code refactoring.
- Fixed T-SQL DELETE statements that were not correctly converted to MERGE operations and added corresponding test cases.
Reconcile
Oracle
Improved Oracle support with the following enhancements:
- Fixed Oracle JDBC URL by moving credentials out of URL into options and correcting thin syntax
- Updated hashing/expression pipeline to replace RAWTOHEX(...), 2 with UTL_I18N.STRING_TO_RAW(...,'AL32UTF8'), 4 (SHA-256)
- Fixed schema comparison for Oracle
- Tweaked datatype parsing in default transformations for Oracle compatibility
- Added Oracle jars in setup script
- Extended integration scaffolding and added end-to-end tests
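The first Oracle item above, moving credentials out of the JDBC URL and into options, can be pictured as follows. The sketch builds an options dictionary using the standard Spark JDBC option names (`url`, `user`, `password`, `driver`) and the Oracle thin service-name URL form. The `oracle_jdbc_options` helper and its parameters are hypothetical, and the exact URL Lakebridge produces is not shown in these notes.

```python
def oracle_jdbc_options(host: str, port: int, service: str, user: str, password: str) -> dict:
    """Build Spark JDBC options with credentials kept out of the URL."""
    return {
        # Thin-driver URL with a service name and no embedded user or password.
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }
```

Keeping the password in a separate option means special characters such as `@`, `/`, or `:` never need escaping inside the URL. With an active Spark session, the dictionary could be passed along the lines of `spark.read.format("jdbc").options(**opts).option("dbtable", "SCHEMA.TABLE").load()`, where the table name is a placeholder.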
Snowflake
- Fixed schema comparison for Snowflake
- Adjusted log levels by demoting noisy warnings to debug/info
- Added Snowflake jars in setup script
- Extended integration scaffolding
Documentation
Added documentation for deploying reconciliation dashboards and updated documentation notebooks.
Dependency updates:
- Bump actions/setup-node from 5 to 6 by @dependabot[bot] in #2094
New Contributors
- @hiroyukinakazato-db made their first contribution in #2066
Full Changelog: v0.10.13...v0.11.0
Contributors: @goodwillpunning, @hiroyukinakazato-db, @sundarshankar89, @asnare, @m-abulazm, @dependabot[bot], @bishwajit-db
v0.10.13
Analyzer
- Added defensive code so the DataStage analyzer no longer crashes when it encounters empty array references in DataStage files.
Converters
Morpheus
General
- Enhanced name representation consistency - Major refactoring that replaces String representations with Expression types for table names, column names, and constraints across IR nodes, improving SQL/PySpark code generation accuracy
- Fixed DBT parsing issues - Resolved template parsing problems by changing template markers to !#Jinja0001#! format and improving whitespace handling for proper tokenization
TSQL (Synapse/SQL Server)
- Support for dual OUTPUT clauses in TSQL INSERT/DELETE/UPDATE statements - Enhanced T-SQL parser to handle complex statements with multiple OUTPUT clauses (OUTPUT ... INTO ... OUTPUT ...) with comprehensive test coverage
- Fixed TSQL DECLARE statement handling - Refactored DECLARE statement processing by moving logic to dedicated visitor methods and properly marking unsupported statements for future implementation
- Improved BLOCK structure parsing for BEGIN and BEGIN TRY statements - Updated parser grammar to support flexible scripting blocks and transaction handling, allowing zero or more statements in control flow constructs
- Added comprehensive USE statement support - Introduced new IR representations (UseCatalog, UseSchema) with dialect-specific AST building logic and proper SQL generation
Snowflake
- Fixed Snowflake connection tests - Internal improvements for database connection test reliability
- Added comprehensive USE statement support - Introduced new IR representations (UseCatalog, UseSchema) with dialect-specific AST building logic and proper SQL generation
BladeBridge
General
- Automatically creates and cleans up temporary folders for embedded SQL conversion in wrapper scripts - Improves workflow management by implicitly creating temp folders and cleaning them up once conversion is complete
MSSQL (SQL Server)
- Enhanced table variable and temporary table conversion - Added support for table variable conversion to temporary tables and improved string handling with logic to convert double single quotes to double quotes
- Fixed semicolon placement in nested select statements - Resolved issue where semicolons appeared before comments in nested select statements
- Improved MS SQL procedure handling - Added LIMIT 1 for Set in select statements, enhanced function mappings, fixed string concatenation, and removed unsupported constraints
Reconcile
No updates in this release.
Documentation
No updates in this release.
Contributors: @gueniai, @sundarshankar89