0

I think I am facing a common problem when moving data between two servers but I can't find a good solution. I have two servers: server1 and server2. I want to move a subset of a table in server1 to a table in server2. I have the following tables (source and sink tables have the same schema):

  1. server1.dbo.source - Contains ID + other columns.
  2. server2.dbo.sink - Contains ID + other columns (same schema as source table).
  3. server2.dbo.utilizedIds - Contains one column with list of IDs I would like to move.

If I could reference both servers in the same query, the data that I am trying to move from server1 to server2 could be described by this SQL statement:

SELECT * FROM server1.dbo.source 
WHERE server1.dbo.source.id IN (SELECT * FROM server2.dbo.utilizedIds)

What I am trying to achieve is to only move IDs that I need. These IDs are dynamic and stored in a table in my destination server. I am doing this because I need a really small subset of the source table (I need thousands of rows out of billions). So far, these are the solutions that I have found and the reasons why I can't use them or I wouldn't like to use them:

  1. Copy all the data into a temp table in server2 and perform the filtering there - Moving this much data every time I need to update my sink is not feasible.
  2. Perform a delta insert of the source table into server2 and then filter this table - While this would be more performant than option 1, I really don't want to incur the cost of storing billions of rows when I only need thousands.
  3. Create temp tables/persistent tables in server1 and insert the IDs into this table, then filter using standard SQL - I am not the owner of server1, I cannot create tables or stored procedures. The temp table option was promising but temp tables don't persist across activities.
  4. Create a dynamic query and pass the IDs as a string to the SELECT statement. Something like this: SELECT * FROM server1.dbo.source WHERE id IN ('ID1', 'ID2', 'ID3' ...) - So far, this is my best solution. However, SQL statements have a length and parameter limit, and this is not a good practice

To me, this seems like a common use case. What is the best practice or correct solution to this problem? Building my query using strings feels like a hack and not something ADF was designed for.

edferda
  • 388
  • 1
  • 4
  • 10

0 Answers0