What happened?
Hello everyone,
I am happily experiencing adbc engine with polars using Python language, communicating to and from a DuckDB instance.
I discovered that if I write two times the exact same dataframe, for any possible reason.
There is no mode available that would let me write to the same table and overwrite the elements that are already in.
Because with "append" or "create_append" modes, the same elements are written appending exactly new elments, so I end up having doubles rows with exact same elements.
Meanwhile, "replace" deletes completely the existing table and create a new one to insert new fresh data: but, in this case, if the new dataframe has elements not present in the previous table. The deletion takes away any old element no more available in the new dataframe, and that i think it is a waste.
Long story short, I think in my humble opinion that it should be available, when I write a new dataframe to an axisting table of a SQL compliant DB, there has to be the the possibility of overwriting automatically the rows with same elements if already present in the existing table.
Double elements example generated with append mode
import polars as pl
import adbc_driver_sqlite.dbapi as litedbapi
# Crate example dataframe
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35]
})
con = litedbapi.connect(uri=r"db2.duckdb")
# First write
df.write_database(
table_name="people",
connection=con,
if_table_exists ="append",
engine="adbc"
)
con.commit()
# Second write
df.write_database(
table_name="people",
connection=con,
if_table_exists ="append",
engine="adbc"
)
con.commit()
# Verify
query="SELECT * FROM people"
result = pl.read_database(query=query,
connection=con)
print(result)
The result is
shape: (6, 3)
┌─────┬─────────┬─────┐
│ id ┆ name ┆ age │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪═════════╪═════╡
│ 1 ┆ Alice ┆ 25 │
│ 2 ┆ Bob ┆ 30 │
│ 3 ┆ Charlie ┆ 35 │
│ 1 ┆ Alice ┆ 25 │
│ 2 ┆ Bob ┆ 30 │
│ 3 ┆ Charlie ┆ 35 │
└─────┴─────────┴─────┘
Old elements gone away example with replace mode
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35]
})
con = litedbapi.connect(uri=r"db.duckdb")
# First write
df.write_database(
table_name="people",
connection=con,
if_table_exists ="replace",
engine="adbc"
)
con.commit()
# Verify
query="SELECT * FROM people"
result = pl.read_database(query=query,
connection=con)
print(result)
An we have here:
shape: (3, 3)
┌─────┬─────────┬─────┐
│ id ┆ name ┆ age │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪═════════╪═════╡
│ 1 ┆ Alice ┆ 25 │
│ 2 ┆ Bob ┆ 30 │
│ 3 ┆ Charlie ┆ 35 │
└─────┴─────────┴─────┘
Then let's proceed with the second write with replace mode
# Second write
df = pl.DataFrame({
"id": [2, 3, 4],
"name": ["Bob", "Charlie", "Nathan"],
"age": [30, 35, 40]
})
df.write_database(
table_name="people",
connection=con,
if_table_exists ="replace",
engine="adbc"
)
# Verify
query="SELECT * FROM people"
result = pl.read_database(query=query,
connection=con)
print(result)
And we have:
shape: (3, 3)
┌─────┬─────────┬─────┐
│ id ┆ name ┆ age │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪═════════╪═════╡
│ 2 ┆ Bob ┆ 30 │
│ 3 ┆ Charlie ┆ 35 │
│ 4 ┆ Nathan ┆ 40 │
└─────┴─────────┴─────┘
And so you can see the first of the first write it has gone away.
Or, does anyone have any suggestion on how I could achieve that with the present fucntions/implementation?
Many thanks,
Nick
Stack Trace
No response
How can we reproduce the bug?
No response
Environment/Setup
No response
What happened?
Hello everyone,
I am happily experiencing adbc engine with polars using Python language, communicating to and from a DuckDB instance.
I discovered that if I write two times the exact same dataframe, for any possible reason.
There is no mode available that would let me write to the same table and overwrite the elements that are already in.
Because with "append" or "create_append" modes, the same elements are written appending exactly new elments, so I end up having doubles rows with exact same elements.
Meanwhile, "replace" deletes completely the existing table and create a new one to insert new fresh data: but, in this case, if the new dataframe has elements not present in the previous table. The deletion takes away any old element no more available in the new dataframe, and that i think it is a waste.
Long story short, I think in my humble opinion that it should be available, when I write a new dataframe to an axisting table of a SQL compliant DB, there has to be the the possibility of overwriting automatically the rows with same elements if already present in the existing table.
Double elements example generated with append mode
The result is
Old elements gone away example with replace mode
An we have here:
Then let's proceed with the second write with replace mode
And we have:
And so you can see the first of the first write it has gone away.
Or, does anyone have any suggestion on how I could achieve that with the present fucntions/implementation?
Many thanks,
Nick
Stack Trace
No response
How can we reproduce the bug?
No response
Environment/Setup
No response