
I am using the code below to create a table from a DataFrame in Databricks and am running into an error.

df.write.saveAsTable("newtable")

This works fine the very first time, but for reusability, if I rewrite it like below

df.write.mode(SaveMode.Overwrite).saveAsTable("newtable")

I get the following error.

Error Message:

org.apache.spark.sql.AnalysisException: Can not create the managed table newtable. The associated location dbfs:/user/hive/warehouse/newtable already exists
paone

2 Answers


The SQL config 'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation' was removed in Spark 3.0.0. It was removed because setting it to a non-default value could lead to loss of user data.
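A minimal sketch of what that means at runtime, assuming a notebook where spark is the active SparkSession (on 3.0+, setting the flag raises an AnalysisException with a message like the one above):

    // The legacy flag only exists on Spark 2.x; on 3.0 and later, setting it
    // fails because the config was removed.
    if (spark.version.startsWith("2.")) {
      spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
    } else {
      println(s"Spark ${spark.version}: the legacy flag is gone; clean up the table location instead")
    }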


Run the following command to fix the issue:

     dbutils.fs.rm("dbfs:/user/hive/warehouse/newtable/", true)

Or set the flag spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation to true:

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")
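For reference, a minimal sketch combining the two options above with the original write, assuming a Databricks Scala notebook where df already exists and the table lives at the default warehouse path from the error message (the flag-based option only works on Spark 2.x):

    import org.apache.spark.sql.SaveMode

    // Option 1: remove the leftover files at the managed table's location,
    // then recreate the table with overwrite.
    dbutils.fs.rm("dbfs:/user/hive/warehouse/newtable/", true)
    df.write.mode(SaveMode.Overwrite).saveAsTable("newtable")

    // Option 2 (Spark 2.x only): allow creating the managed table over the
    // non-empty location, then write as usual.
    spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
    df.write.mode(SaveMode.Overwrite).saveAsTable("newtable")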

Chema
vaquar khan
  • Hi @vaquar khan, thank you for the response. I still get the error and am not sure why I can't use saveAsTable with overwrite. Also, the Databricks documentation advises against using dbutils.fs.rm on large datasets (https://kb.databricks.com/data/list-delete-files-faster.html). – paone Sep 10 '20 at 21:47
  • Try this: set the flag spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation to true. – vaquar khan Sep 10 '20 at 23:26