0

I have an excel file with 128 000 unique emails. Each email can have multiple product ids attributed to it, however, the product codes are all stored in 1 cell, separated by semicolons.

   Email     Product_id
   XXXX@...  ;16;411;75;
   yyyy@...  ;3242;
   zzzz@...  ;58;411;16;65;24;

I'd like to normalize the product data into a separate table so it looks like this:

email            product_id
XXXX@...         16
XXXX@...         411
XXXX@...         75
yyyy@...         3242
zzzz@...         58
zzzz@...         411
zzzz@...         16
zzzz@...         65
zzzz@...         24

What are some possible solutions to normalizing this data? What tools/program should I be using?

Kris
  • 1
  • Where do you want to store this data? Why is the question tagged "sql"? – Gordon Linoff Jul 03 '16 at 15:50
  • for excel see here: http://stackoverflow.com/questions/35439099/how-to-split-a-single-cell-into-multiple-rows-and-add-another-row – Scott Craner Jul 03 '16 at 16:01
  • It won't be stored on any database, but I'm interested in possible solutions using MySQL/MS SQL since I have access to both of them. Should've made that clearer sorry. – Kris Jul 03 '16 at 17:13
  • Show us what you have tried, and where you have run into problems. In Excel, you can use `Text-to-columns` to split your data; then use `Power Pivot` or `Get and Transform` (which depends on your Excel version) to Unpivot the resultant Product_ID columns – Ron Rosenfeld Jul 04 '16 at 01:36
  • Hi Kris! As you are new here please allow me one hint: Is this question still open? If it's solved, it was kind to tick the acceptance check below the vote counter. This will 1) mark this issue as solved 2) make it easier for followers to find the solution 3) pay points to the answerer and 4) pay points to you. Once you crossed the 15 points border yourself you are asked to vote on contributions. This is the SO-way to say thank you. If it is still open, you might place more details... Happy Coding! – Shnugo Jul 05 '16 at 12:35

1 Answers1

0

With this SQL-Server query you get the list you need. You can use the SELECT col1, col2 INTO NewTable FROM ... syntax to write the result into a table immediately.

DECLARE @tbl TABLE(Email VARCHAR(100),Product_id VARCHAR(100));
INSERT INTO @tbl VALUES
 ('XXXX@...',';16;411;75;')
,('yyyy@...',';3242;')
,('zzzz@...',';58;411;16;65;24;');


WITH Splitted AS
(
    SELECT Email
          ,CAST('<x>'+REPLACE(Product_id,';','</x><x>') + '</x>' AS XML) AS ID_Splitted
    FROM @tbl AS tbl
)
SELECT EMail
      ,id_split.value('.','int') AS id
FROM Splitted
CROSS APPLY ID_Splitted.nodes('/x[text()!=""]') AS A(id_split);

The result

EMail       id
XXXX@...    16
XXXX@...    411
XXXX@...    75
yyyy@...    3242
zzzz@...    58
zzzz@...    411
zzzz@...    16
zzzz@...    65
zzzz@...    24

EDIT

I was a bit to fast as there is no hint, that you want to use SQL Server...

In Excel there is a very nice feature "text in columns". You find this at "Data" menu. (at least my german version calls this "Daten" and "Text in Spalte"). There you can define a delimiter (in your case the semicolon) and get the values split side-by-side automatically. Some kind of unpivot would create a one-below-the-other list

Shnugo
  • 64,489
  • 9
  • 46
  • 104