Skip to content

adbc_ingest no possibility of overwrite the exact same elements #3102

@nikfio

Description

@nikfio

What happened?

Hello everyone,

I am happily experiencing adbc engine with polars using Python language, communicating to and from a DuckDB instance.
I discovered that if I write two times the exact same dataframe, for any possible reason.
There is no mode available that would let me write to the same table and overwrite the elements that are already in.
Because with "append" or "create_append" modes, the same elements are written appending exactly new elments, so I end up having doubles rows with exact same elements.
Meanwhile, "replace" deletes completely the existing table and create a new one to insert new fresh data: but, in this case, if the new dataframe has elements not present in the previous table. The deletion takes away any old element no more available in the new dataframe, and that i think it is a waste.

Long story short, I think in my humble opinion that it should be available, when I write a new dataframe to an axisting table of a SQL compliant DB, there has to be the the possibility of overwriting automatically the rows with same elements if already present in the existing table.

Double elements example generated with append mode

    import polars as pl
    import adbc_driver_sqlite.dbapi as litedbapi
    
    # Crate example dataframe
    df = pl.DataFrame({
        "id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35]
    })

    con = litedbapi.connect(uri=r"db2.duckdb")
    
    # First write
    df.write_database(
        table_name="people",
        connection=con,
        if_table_exists ="append",
        engine="adbc"
    )
    
    con.commit()
    
    # Second write
    df.write_database(
        table_name="people",
        connection=con,
        if_table_exists ="append",
        engine="adbc"
    )
    
    con.commit()

    # Verify
    query="SELECT * FROM people"
    result = pl.read_database(query=query,
                              connection=con)
    
    print(result)

The result is

shape: (6, 3)
┌─────┬─────────┬─────┐
│ id  ┆ name    ┆ age │
│ --- ┆ ---     ┆ --- │
│ i64 ┆ str     ┆ i64 │
╞═════╪═════════╪═════╡
│ 1   ┆ Alice   ┆ 25  │
│ 2   ┆ Bob     ┆ 30  │
│ 3   ┆ Charlie ┆ 35  │
│ 1   ┆ Alice   ┆ 25  │
│ 2   ┆ Bob     ┆ 30  │
│ 3   ┆ Charlie ┆ 35  │
└─────┴─────────┴─────┘

Old elements gone away example with replace mode

df = pl.DataFrame({
        "id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35]
    })

con = litedbapi.connect(uri=r"db.duckdb")
    
    # First write
    df.write_database(
        table_name="people",
        connection=con,
        if_table_exists ="replace",
        engine="adbc"
    )
    
    con.commit()
    
    # Verify
    query="SELECT * FROM people"
    result = pl.read_database(query=query,
                              connection=con)
    
    print(result)

An we have here:

shape: (3, 3)
┌─────┬─────────┬─────┐
│ id  ┆ name    ┆ age │
│ --- ┆ ---     ┆ --- │
│ i64 ┆ str     ┆ i64 │
╞═════╪═════════╪═════╡
│ 1   ┆ Alice   ┆ 25  │
│ 2   ┆ Bob     ┆ 30  │
│ 3   ┆ Charlie ┆ 35  │
└─────┴─────────┴─────┘

Then let's proceed with the second write with replace mode

    # Second write
    
    df = pl.DataFrame({
        "id": [2, 3, 4],
        "name": ["Bob", "Charlie", "Nathan"],
        "age": [30, 35, 40]
    })

    df.write_database(
        table_name="people",
        connection=con,
        if_table_exists ="replace",
        engine="adbc"
    )

 # Verify
    query="SELECT * FROM people"
    result = pl.read_database(query=query,
                              connection=con)
    
    print(result)

And we have:

shape: (3, 3)
┌─────┬─────────┬─────┐
│ id  ┆ name    ┆ age │
│ --- ┆ ---     ┆ --- │
│ i64 ┆ str     ┆ i64 │
╞═════╪═════════╪═════╡
│ 2   ┆ Bob     ┆ 30  │
│ 3   ┆ Charlie ┆ 35  │
│ 4   ┆ Nathan  ┆ 40  │
└─────┴─────────┴─────┘

And so you can see the first of the first write it has gone away.

Or, does anyone have any suggestion on how I could achieve that with the present fucntions/implementation?

Many thanks,
Nick

Stack Trace

No response

How can we reproduce the bug?

No response

Environment/Setup

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions