
I'm running some large jobs in Databricks, which for now include inventorying the data lake. I'm trying to print all blob names within a prefix (sub-folder). There are a lot of files in these sub-folders. I get about 280 rows of file names printed, but then I see this:

*** WARNING: skipped 494256 bytes of output ***

Then I get another 280 rows printed.

I'm guessing there is a control to change this, right? I certainly hope so. This platform is designed to work with BIG data, not ~280 records. I understand that huge data sets can easily crash a browser, but come on, this is basically nothing.
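For context, the listing code is essentially this (a sketch; the mount path is illustrative):

# List everything under a prefix and print each file name (path is illustrative)
files = dbutils.fs.ls("/mnt/datalake/some-prefix/")
for f in files:
    print(f.name)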

ASH

2 Answers


Note: Using the GUI, you can download full results (max 1 million rows).


To download full results (more than 1 million rows), first save the file to DBFS and then copy it to your local machine using the Databricks CLI as follows.

dbfs cp "dbfs:/FileStore/tables/AA.csv" "A:\AzureAnalytics"
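The "save to DBFS" half might look like this (a sketch, assuming df holds the results; /dbfs is the FUSE mount of DBFS on the driver):

# Collect to the driver and write a single CSV file to DBFS (illustrative path)
# Caveat: toPandas() pulls everything to the driver, so this suits result sets
# that fit in driver memory, not the raw data itself
df.toPandas().to_csv("/dbfs/FileStore/tables/AA.csv", index=False)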

Reference: Databricks file system

The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS with commands similar to those you use on a Unix command line. For example:

# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana

Reference: Installing and configuring Azure Databricks CLI
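For completeness, installing and configuring the (legacy) Databricks CLI is roughly:

# Install the CLI, then configure it with your workspace URL and a personal access token
pip install databricks-cli
databricks configure --token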

Hope this helps.

CHEEKATLAPRADEEP-MSFT

After a little more research, I stumbled on something that worked for me: display(). It renders the contents of a DataFrame as an interactive, scrollable table in the notebook,

display(df)

so the results show up as a proper grid instead of raw printed output that gets truncated.
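Applied to the original inventory problem, the file listing can go into a DataFrame and be displayed rather than printed (a sketch; the path and column choices are illustrative):

# Build a DataFrame from the file listing and render it with display()
files = dbutils.fs.ls("/mnt/datalake/some-prefix/")
df = spark.createDataFrame([(f.path, f.name, f.size) for f in files], ["path", "name", "size"])
display(df)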

ASH