1

I am joining tbl_A to tbl_B, on column CustomerID in tbl_A to column Output in tbl_B which contains customer ID. However, tbl_B has all other information in related rows that I do not want to lose when joining. I tried to join using like, but I lost rows that did not contain customer ID in the output column.

enter image description hereenter image description here

Here is my join query in Hive:

select a.*, b.Output from tbl_A a
left join tbl_B b
On b.Output like concat('%', a.CustomerID, '%')

However, I lose other rows from output.

David Buck
  • 3,594
  • 33
  • 29
  • 34
Julie
  • 55
  • 1
  • 6
  • 2
    Interesting. You claim that this code was executed without an error? – David דודו Markovitz Jun 01 '17 at 20:29
  • Please read and act on [mcve]. PS If you want all rows from b, you need b left join a, or a right join b. – philipxy Jun 05 '17 at 10:07
  • 1
    If you are able to run this without an error could you share your Hive version please? As far as I know, Hive only allows equi-joins; in other words, it does not allow join phrase to be in form `ON b.Output like concat('%', a.CustomerID, '%')`. see this: https://stackoverflow.com/a/25850819/1434041 – Zahra Aug 16 '17 at 17:27

2 Answers2

1

I would suggest first extract all ID's from free floating field which in your case is 'Output' column in table B into a separate table. Then join this table with ID's to Table B again to populate in each row the ID and then this second joined table which is table B with ID's to table A.

Hope this helps.

Lilu
  • 110
  • 7
1

You could also achieve the objective by a simple hive query like this :)

select a.*, b.Output 
from tbl_A a, tbl_B b
where b.Output like concat('%', a.CustomerID, '%')
D. Pardal
  • 5,634
  • 1
  • 15
  • 34
Vaibhav
  • 11
  • 1