Write dataframe to blob using azure databricks

Question

Is there a link or sample code showing how to write a dataframe to Azure Blob Storage using Python (without the pyspark module)?

score 6 · Accepted Answer · edited Apr 08 '20 at 11:49

Below is a code snippet for writing a dataframe as CSV data directly to an Azure Blob Storage container from an Azure Databricks notebook.

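# Configure the blob storage account access key globally.
# storage_name, storage_access_key and output_container_name must be defined beforehand.
# Note: this setting expects the storage account access key, not a SAS token.
spark.conf.set(
  "fs.azure.account.key.%s.blob.core.windows.net" % storage_name,
  storage_access_key)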

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

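# Write the dataframe as a single part file inside output_blob_folder
# (coalesce(1) collapses the data to one partition, so Spark emits one part- file)
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("csv")  # built-in CSV source; "com.databricks.spark.csv" is the legacy name
 .save(output_blob_folder))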

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from a sub-folder (wrangled_data_folder) to the root of the blob container
# While simultaneously changing the file name
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)

Example: notebook

Output: Dataframe written to blob storage using Azure Databricks
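
The `output_file[0]` lookup in the snippet above assumes that exactly one `part-` file exists in the folder. A minimal defensive sketch of that selection step follows; `pick_single_part_file` and `FakeFileInfo` are hypothetical names introduced here for illustration, and the fake listing only stands in for what `dbutils.fs.ls` returns (objects with `path` and `name` attributes).

```python
from collections import namedtuple


def pick_single_part_file(files):
    """Return the one entry whose name starts with 'part-'.

    Raises ValueError if there are zero or several such entries,
    instead of failing later with an IndexError or moving the wrong file.
    """
    parts = [f for f in files if f.name.startswith("part-")]
    if len(parts) != 1:
        raise ValueError("expected exactly one part- file, found %d" % len(parts))
    return parts[0]


# Hypothetical stand-in for the entries returned by dbutils.fs.ls
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name"])

folder = "wasbs://container@account.blob.core.windows.net/wrangled_data_folder"
listing = [
    FakeFileInfo(folder + "/_SUCCESS", "_SUCCESS"),
    FakeFileInfo(folder + "/part-00000.csv", "part-00000.csv"),
]

selected = pick_single_part_file(listing)
```

In a notebook, the same function could be called as `pick_single_part_file(dbutils.fs.ls(output_blob_folder))` before the `dbutils.fs.mv` step.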

Is there a way to write it simply as a CSV file, without the other files and the move operations? - Anirban Saha, May 19 '21 at 07:28

score 0 · Answer 2 · answered Aug 26 '21 at 19:13

This version also deletes the wrangled data folder, leaving you with only the file you need.

storage_name = "YOUR_STORAGE_NAME"
storage_access_key = "YOUR_STORAGE_ACCESS_KEY"
output_container_name = "YOUR_CONTAINER_NAME"

# Configure blob storage account access key globally
spark.conf.set("fs.azure.account.key.%s.blob.core.windows.net" % storage_name, storage_access_key)

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# Write the dataframe as a single part file inside output_blob_folder
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from the sub-folder (wrangled_data_folder) to the root of the blob container,
# changing the file name at the same time
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)

# Recursively delete wrangled_data_folder and its leftover files (such as _SUCCESS), leaving only the CSV file
dbutils.fs.rm(output_blob_folder, True)
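
For the non-PySpark case raised in the question, one possible approach is to serialize a pandas DataFrame in memory and upload it as a single blob, which avoids the part files and the move step. The sketch below is an illustration under assumptions, not a tested recipe for any particular workspace: it assumes the `azure-storage-blob` (v12) package and pandas are installed, that the data fits in driver memory, and that a connection string is available. The function names and placeholder values are hypothetical.

```python
def dataframe_to_csv_bytes(df):
    """Serialize a pandas DataFrame to UTF-8 CSV bytes (header row, no index column)."""
    # DataFrame.to_csv returns the CSV text as a str when no path is given
    return df.to_csv(index=False).encode("utf-8")


def upload_dataframe_as_csv(container_client, blob_name, df):
    """Upload df as one CSV blob and return the number of bytes sent."""
    data = dataframe_to_csv_bytes(df)
    # ContainerClient.upload_blob creates the blob; overwrite=True replaces an existing one
    container_client.upload_blob(name=blob_name, data=data, overwrite=True)
    return len(data)


def example_usage():
    """Not called here: needs pandas, azure-storage-blob and real credentials."""
    import pandas as pd
    from azure.storage.blob import ContainerClient

    # Placeholder values (assumptions), replace with your own
    connection_string = "YOUR_STORAGE_CONNECTION_STRING"
    container_name = "YOUR_CONTAINER_NAME"

    client = ContainerClient.from_connection_string(connection_string, container_name)
    pandas_df = pd.DataFrame({"id": [1, 2], "label": ["a", "b"]})
    # In Databricks, a small Spark dataframe could be converted with dataframe.toPandas()
    upload_dataframe_as_csv(client, "predict-transform-output.csv", pandas_df)
```

Passing the container client in as a parameter keeps the upload logic separate from credential handling, so the helper can be exercised with a stub object before pointing it at a real storage account.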
