
I have a use case where I want to update specific rows, identified by any WHERE-clause condition, in Oracle or SQL Server from Databricks.

While I can use spark.read.format("jdbc") against either database, I could not find a straightforward way to update specific rows back in these DBs.

If I use

df.write.format("jdbc") \
    .option("url", dbServerJdbcUrl) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", <either com.microsoft.sqlserver.jdbc.SQLServerDriver or oracle.jdbc.driver.OracleDriver>) \
    .option("dbtable", <table on the database platform>) \
    .mode("overwrite") \
    .save()  # or other modes

it only overwrites the whole "dbtable" on the database. I could not find a way to make it work with .option("query", "<update statement>") so far, either.

If I write to a temporary or staging table instead, it becomes a two-stage job: I have to go back to the database platform and update the actual target tables from the staging table.
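That said, the two stages can at least be driven from one Databricks job: write the changed rows to the staging table over JDBC, then issue a MERGE against the target from the same notebook. A minimal sketch, where all table, column, and variable names are hypothetical, and the JVM DriverManager call is one assumed way to run raw SQL from PySpark:

```python
def build_merge_sql(target, staging, key_cols, update_cols):
    """Build a MERGE that updates target rows matched on key_cols.

    Caveat: SQL Server requires a terminating semicolon on MERGE,
    while Oracle's JDBC driver rejects one, so append it per platform.
    """
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    sets = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    return (f"MERGE INTO {target} t USING {staging} s "
            f"ON ({on}) "
            f"WHEN MATCHED THEN UPDATE SET {sets}")

# In the notebook (names are placeholders):
#   df.write.format("jdbc") \
#       .option("url", dbServerJdbcUrl) \
#       .option("user", username) \
#       .option("password", password) \
#       .option("dbtable", "orders_staging") \
#       .mode("overwrite") \
#       .save()
#
#   conn = spark.sparkContext._jvm.java.sql.DriverManager.getConnection(
#       dbServerJdbcUrl, username, password)
#   conn.createStatement().executeUpdate(
#       build_merge_sql("orders", "orders_staging",
#                       ["order_id"], ["status"]))
#   conn.close()
```

This keeps the bulk write on the Spark side and lets the database apply only the matched updates, instead of touching the millions of untouched rows.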

Another note: when I do the above write on a table that has millions of rows, and I only want to update a handful of them, every mode just causes more trouble.

  • overwrite - simply causes the millions of rows to be lost, replaced by the handful of rows from df.
  • append - either creates duplicates or eventually fails due to constraints.

Is there any better solution to have Databricks update specific rows in a database?

Ak777
  • There is not yet an API on DataFrameWriter for this kind of job (refer to [this](https://stackoverflow.com/a/35640373/11289386)), but you could still loop through the records in the dataframe and issue an update statement for each one (which I think is not a great option). – PhuriChal May 23 '22 at 03:20
  • Thanks for your input, but even if I loop through, how will spark.write() work with a query without replacing my whole table? – Ak777 May 23 '22 at 04:24
  • What I am thinking is that you could loop through the DF to get the values and then update the table via a normal SQL script with cx_Oracle. I found this should be possible from this [blog](https://medium.com/@srijansahay/connecting-sql-server-oracle-mysql-and-postgresql-from-azure-services-using-python-789e93d879b4). – PhuriChal May 23 '22 at 04:37
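The loop-through idea suggested in the comments could be sketched with foreachPartition, batching parameterised UPDATEs from each partition via a DB-API driver rather than issuing them one row at a time. This assumes cx_Oracle (or pyodbc for SQL Server) is installed on the workers; the table, columns, and connection variables below are hypothetical:

```python
# Placeholders - supply real connection details in the notebook.
username, password, dsn = "db_user", "db_pass", "dbhost/service"

# Hypothetical target table; :1 / :2 are positional bind parameters.
UPDATE_SQL = "UPDATE orders SET status = :1 WHERE order_id = :2"

def row_to_params(row):
    """Map a DataFrame row to the bind-parameter tuple for UPDATE_SQL."""
    return (row["status"], row["order_id"])

def update_partition(rows):
    # Runs on the workers, so import the driver inside the function.
    import cx_Oracle
    conn = cx_Oracle.connect(username, password, dsn)
    try:
        cur = conn.cursor()
        # Send the whole partition as one batch instead of row-by-row.
        cur.executemany(UPDATE_SQL, [row_to_params(r) for r in rows])
        conn.commit()
    finally:
        conn.close()

# changed_df.foreachPartition(update_partition)
```

Since only the handful of changed rows flow through this path, the millions of untouched rows in the target table are never rewritten.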

0 Answers