
I am trying to replicate the SQL database feature of maintaining primary keys using the Databricks Delta approach, where the data is written to blob storage such as ADLS Gen2 or AWS S3.

I want an auto-incremented primary key feature using Databricks Delta.

Existing approach - read the latest row count (or maximum key) from the table and assign new primary keys from there, roughly as sketched below. However, this approach does not work in a parallel processing environment, where concurrent writers end up generating duplicate primary keys.
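
A minimal sketch of that row-count style approach, assuming a Databricks notebook where `spark` is available and `incoming_df` is the new batch to append; the table path and column names are hypothetical. Two jobs running this concurrently can both read the same maximum key, which is where the duplicates come from.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Hypothetical Delta table on blob storage (ADLS Gen2 / S3 mount).
table_path = "/mnt/adls2/my_table"

# Read the current maximum key; 0 if the table is empty.
existing = spark.read.format("delta").load(table_path)
max_key = existing.agg(F.max("pk")).collect()[0][0] or 0

# Number the incoming rows and offset by the current maximum.
# Note: a window with no partitioning funnels all rows through one task.
new_rows = (incoming_df
            .withColumn("rn", F.row_number().over(Window.orderBy(F.lit(1))))
            .withColumn("pk", F.col("rn") + F.lit(max_key))
            .drop("rn"))

new_rows.write.format("delta").mode("append").save(table_path)
```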

mn0102
  • Possible duplicate of [Primary keys with Apache Spark](https://stackoverflow.com/questions/33102727/primary-keys-with-apache-spark) – simon_dmorias Aug 27 '19 at 14:50
  • I've flagged as duplicate. This isn't a Databricks Delta issue - rather a Spark in general issue. Ideally I would not use an incremental key - they don't work in a distributed world. Instead try a guid - or look at a function called monotonicallyIncreasingId. – simon_dmorias Aug 27 '19 at 14:52
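
A minimal sketch of the two alternatives suggested in the comment above, again assuming a Databricks notebook with `spark`, an input DataFrame `incoming_df`, and a hypothetical table path. Neither option produces gap-free, consecutive keys; they only guarantee uniqueness.

```python
from pyspark.sql import functions as F

table_path = "/mnt/adls2/my_table"  # hypothetical Delta path

# Option 1: a GUID surrogate key via Spark SQL's uuid() function.
# Globally unique and safe under concurrent writes, but not sequential.
with_guid = incoming_df.withColumn("pk", F.expr("uuid()"))

# Option 2: monotonically_increasing_id() assigns unique, increasing
# 64-bit ids within a DataFrame, but the values are not consecutive
# and are not stable across separate runs or jobs.
with_mono = incoming_df.withColumn("pk", F.monotonically_increasing_id())

with_guid.write.format("delta").mode("append").save(table_path)
```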

0 Answers