12

I have data in an Excel spreadsheet with delimited strings. For simplicity, think of something like "4#8#10", with an arbitrary number of pieces.

Is there any way to split this into an array for use with an array formula? For example, something like SUM(SplitStr(A1,"#")). (This does not work - it just returns "4".)

Is there a good way of approaching this? My ideal solution would end up with an array, as if the user entered {4,8,10} for use in array formulas.

The application of my sheet does not allow for delimiter-based text to columns and summing that. I would prefer to not use VBA if possible as I will not be the primary consumer of the sheet.

Thanks, Terry

Gahan
  • 3,870
  • 4
  • 22
  • 41
terry87
  • 425
  • 1
  • 4
  • 14
  • 1
    Have you tried LEFT, RIGHT, MID, SEARCH combined? I don't quite understand what output you expect, so can't direct to a proper answer – Inox Aug 14 '14 at 19:29
  • Does the string contain any symbols or specific set of symbols (# in this case)? – Ravi Yenugu Aug 14 '14 at 19:31
  • Inox - yes, though the issue was transforming it to an array for summation, which is necessary when you consider an arbitrary number of pieces in the string. The expected sum of that is 22. (See Ron's answer.) Thinkingcap - Yes, the string is delimited. In this case # but often char(10). – terry87 Aug 16 '14 at 17:30

7 Answers7

24

Use XML functionality:

={SUM(FILTERXML("<t><s>" & SUBSTITUTE(A1, "#", "</s><s>") & "</s></t>", "//s"))}

Update April 2022: Use new TEXTSPLIT function.

Vafā Sarmast
  • 341
  • 2
  • 5
  • 1
    Nice. +1 for this. – QHarr May 16 '18 at 06:10
  • this should be the accepted answer since it does not rely on a constant like in the accepted answer (what if the string is more than 99 chars ?). It is complete dynamic then – laloune Apr 01 '20 at 10:21
  • 1
    Note the curly braces cannot be entered manually, the formula must be entered without them ( i.e. =SUM(...) ) Then typing CTRL+SHIFT+ENTER will indicate the expression is an array formula, and the braces appear. – plasmo Jul 27 '20 at 06:28
23

To sum the entries 4,8,10 you could use something like:

=SUMPRODUCT(1*TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)))=1),99)))

The array that is returned is a text array, so the 1* at the beginning is one way to convert them to numbers

This part:

TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)))=1),99))

returns the array:

{"4";"8";"10"}

And with 1* before it:

{4;8;10}

Edit After six years, more succinct formulas are available to create the array:

With Excel O365 and the SEQUENCE function:

=1*(TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),IF(SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)=1,1,(SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)-1)*99),99)))

With Excel 2010+ for Windows (not MAC versions) with the FILTERXML funcdtion:

=FILTERXML("<t><s>" & SUBSTITUTE(A1,"#","</s><s>") & "</s></t>","//s")

Note that the FILTERXML function extracts the values as numbers, whereas the first formula extracts the numbers as text which must be converted to numbers before being used in a numeric formula.

Ron Rosenfeld
  • 47,198
  • 7
  • 25
  • 55
  • That is wicked clever. You are my new Excel god. Question: is the Trim necessary? (Doesn't the 1* serve the same purpose?) Thanks! – terry87 Aug 16 '14 at 17:19
  • This is great, but I cannot work out how it is working :) How can I apply it to another cell different from A1? Can you elaborate a bit more on how the formula is working, can;t work out how the array is generated... – Simone Jun 04 '15 at 13:32
  • @Simone To apply to a cell other than A1, you would change the reference to A1 within the formula, to some other cell. To understand how the array is generated, I would suggest you examine the second formula in my response using the formula evaluation tool. – Ron Rosenfeld Jun 06 '15 at 00:46
  • This doesn't seem to be working for me -- I enter the formula exactly as shown, but the returned array is {"";"";"10"}. Any ideas why @RonRosenfeld? – Austin Wismer Aug 23 '18 at 03:44
  • @AustinWismer There is something different about either how you entered the formula, or your data, from what the question assumes. Maybe extra spaces someplace? But without knowing **exactly** what you did, I can't say. – Ron Rosenfeld Aug 23 '18 at 10:58
  • I use the TRIM... function without multiplication with 1 to get the array returned in text format to the next function. When following the calculation steps it seams like MID(...) return an array when it is within SUMPRODUCT but it return only the first item in the array when SUMPRODUCT is removed. Do you have an idea how to change that behaviour? – Matthias Tidlund Jan 14 '19 at 15:55
  • @MatthiasTidlund Please post your problem as a new question. Comments are suboptimal in trying to help with this kind of issue. – Ron Rosenfeld Jan 15 '19 at 01:39
  • I have now formulated my comment as a separate but similar question. Here is a link for those who can be interested in that question as well [How to use a string from a single cell as an array in a function without VBA](https://stackoverflow.com/questions/54220420/how-to-use-a-string-from-a-single-cell-as-an-array-in-a-function-without-vba)! – Matthias Tidlund Jan 16 '19 at 15:37
  • I claim absolutely no credit for this, but in latest Excel it can be made even more succint: `=LET(in;L1;n_sub;SEQUENCE(1+LEN(in)-LEN(SUBSTITUTE(in;";";"")));1*TRIM(MID(SUBSTITUTE(in;";";REPT(" ";99));(n_sub-1)*99+1;99)))` – martin Mar 12 '21 at 08:49
2

With your data in A1, in B1 enter:

=TRIM(MID(SUBSTITUTE($A1,"#",REPT(" ",999)),COLUMNS($A:A)*999-998,999))

and copy across.

Gary's Student
  • 94,018
  • 8
  • 54
  • 89
  • This is an interesting formula to simulate text to columns, but since the structure of the spreadsheet does not allow for text-to-columns, unfortunately this doesn't work for the case in the post. Thanks! – terry87 Aug 16 '14 at 17:32
  • Yeah! it extract **4** form **4#8#10** – SIslam Jan 11 '16 at 07:32
2

Regarding Ron Rosenfeld's answer and Austin Wismer's question as to why it wasn't working for him, I spent the good part of an hour breaking it down and trying to figure out what makes it work(I think).

The OFFSET part of the function is taking the cell with the text that you want to split(my examples):

"Haz|Tank|Doub"

and duplicating it into an array column, with the height specified by taking the total number of characters in the cell

LEN("Haz|Tank|Doub") = 13 characters

and subtracting it taking the total number of characters in that cell, when the delimiter is replaced by empty space ""

LEN(SUBSTITUTE("Haz|Tank|Doub","|","")) =
LEN("HazTankDoub") = 11 characters

Which gives the number of delimiters in the cell being referenced:

LEN("Haz|Tank|Doub") - LEN(SUBSTITUTE("Haz|Tank|Doub","|","")) = 2

Adding 1 to this:

LEN("Haz|Tank|Doub")-LEN(SUBSTITUTE("Haz|Tank|Doub","|",""))**+1))** 

13 - 11 + 1 = 3

This gives the number of separate text strings that we want split up. The result is the OFFSET function duplicates the cell by 3(in my example) into separate rows

"Haz|Tank|Doub" "Haz|Tank|Doub" "Haz|Tank|Doub"

or

OFFSET("Haz|Tank|Doub",,,LEN("Haz|Tank|Doub")-LEN(SUBSTITUTE("Haz|Tank|Doub","|",""))+1))

OFFSET("Haz|Tank|Doub",,,3) = {"Haz|Tank|Doub";"Haz|Tank|Doub";"Haz|Tank|Doub"}

Which is then fed into the ROW function, which gives the row number for each cell in a range, and this is where it gets tricky.

So in my case, the Row # of the first cell in a list of cells with text split by delimiters starts at R2, which OFFSET then duplicates 3 rows down, so when I feed this into the ROW function, I get:

ROW({"Haz|Tank|Doub";"Haz|Tank|Doub";"Haz|Tank|Doub"}) = {2;3;4}

or

ROW({$R$2:$R$4}) = {2;3;4}

Seems fine, but then this is subtracted by 1

(ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1))-1)

{2;3;4} - 1 = {1;2;3}

Ok, then that is multiplied by 99

(ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1))-1) * 99

{1;2;3} * 99 = {99;297;396}

After this, the formula adds it to this

+((ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1)))=1)

Which is almost the same as the last part, except, it does not subtract 1 AND it checks if any of these results will = 1. This is done because the assumption is that the row number of the cell you're trying to split up into an array is 1. So for now, sticking with my numbers, this is what happens:

((ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1)))=1)   
((ROW(OFFSET($R$2,,,13 - 11 + 1)))=1)
((ROW(OFFSET($R$2,,,3)))=1)
((ROW({$R$2:$R$4}))=1)
(({2;3;4})=1)
(({FALSE;FALSE;FALSE})

{99;297;396} + ({FALSE;FALSE;FALSE})
{99;297;396} + ({0;0;0})
{99;297;396}

This is fed into the MID function as the start number to extract the text from, which is the number of characters into the text string to extract the text from. The text the MID function looks through is the cell we want split, however it first substitutes the delimiter with 99 " " empty spaces to separate them out.

MID(SUBSTITUTE($R$2,"|",REPT(" ",99)) = "Haz                                Tank                           Doub"

not exactly 99 spaces between each above, just wanted to show a representation

So our start numbers for each of the 3 rows in the array are {99;297;396}

So the MID function is going to start at 99 characters into the above text string, and extract 99 characters after that for the first row. Then it'll start at 297 characters in, and extract 99 characters after this.

MID(SUBSTITUTE($R$2,"|",REPT(" ",99)),{99;297;396},99)
MID("Haz                            Tank                           Doub",{99;297;396},99)

MID("Haz                            Tank                           Doub",{99},99)
MID("Haz                            **|Tank                           |**Doub",{99},99)
Tank

MID("Haz                            Tank                           Doub",{297},99)
MID("Haz                            Tank                           **|Doub               
     |**",{297},99)
Doub

MID("Haz                            Tank                           Doub",{396},99)
MID("Haz                            |Tank                          Doub",{396},99)
""

On the last one, it starts 396 characters in and goes 99 character past that, which contains nothing, and returns nothing.

Now if I my cell started on row $R$1, then it would do this:

ROW({$R$1:$R$3}) = {1;2;3}

(ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1))-1)

{1;2;3} - 1 = {0;1;2}

(ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1))-1) * 99

{0;1;2} * 99 = {0;99;198}

{0;99;198} + ((ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1)))=1)   
{0;99;198} + ((ROW(OFFSET($R$1,,,13 - 11 + 1)))=1)
{0;99;198} + ((ROW(OFFSET($R$1,,,3)))=1)
{0;99;198} + ((ROW({$R$1:$R$3}))=1)
{0;99;198} + (({1;2;3})=1)
{0;99;198} + (({TRUE;FALSE;FALSE})
{0;99;198} + {1;0;0}
{1;99;198}

And this, fed into MID, would correctly start at 1 character in, and extract everything 99 characters to the right of it:

MID("Haz                            Tank                           Doub",{1;99;198},99)
{Haz;Tank;Doub}

So all of that said, I substituted 2 things in the code so that it'll do it regardless of where you start in the sheet:

TRIM(MID(SUBSTITUTE([@Endorsements],"|",REPT(" ",99)),
(ROW(OFFSET([@Endorsements],,,LEN([@Endorsements])-LEN(SUBSTITUTE([@Endorsements],"|",""))+1))-(ROW([@Endorsements])))*99+
((ROW(OFFSET([@Endorsements],,,LEN([@Endorsements])-LEN(SUBSTITUTE([@Endorsements],"|",""))+1)))=(ROW([@Endorsements]))),99))

Please note that [@Endorsements] is $R$2 for me, it's just the cell is in a named Table, and @ corresponds to the same row in the table, but another named column, so I think $R2 could be substituted and still work

TRIM(MID(SUBSTITUTE([@Endorsements],"|",REPT(" ",99)),
(ROW(OFFSET($R2,,,LEN(R$2)-LEN(SUBSTITUTE($R2,"|",""))+1))-(ROW($R2)))*99+
((ROW(OFFSET($R2,,,LEN($R2)-LEN(SUBSTITUTE($R2,"|",""))+1)))=(ROW($R2))),99))

Basically instead of subtracting 1, it subtracts the row number of whatever the row of the cell is that you want to split up. Not much of a change at all, so full credit still goes to Ron, but I figured I'd share it since I just figured it out.

0
=INDEX(FILTERXML("<t><s>"&SUBSTITUTE(A1,"#","</s><s>")&"</s></t>","//s"),3,1)

so you can refer for each instance without having an array formula.in this one you refer to the 3rd

AndrewK
  • 71
  • 3
0

I was using Ron's answer from above, but it was giving me a blank space with larger lists and I didn't like the limited size for each datum, so I made a similar formula based on his that doesn't have these issues.

=MID($A$1,IFERROR(FIND(" ",SUBSTITUTE($A$1,"#"," ",SEQUENCE(LEN($A$1)-LEN(SUBSTITUTE($A$1,"#","")),,0)))+1,1),FIND(" ",SUBSTITUTE($A$1,"#"," ",SEQUENCE(LEN($A$1)-LEN(SUBSTITUTE($A$1,"#","")),,1)))-IFERROR(FIND(" ",SUBSTITUTE($A$1,"#"," ",SEQUENCE(LEN($A$1)-LEN(SUBSTITUTE($A$1,"#","")),,0))),0)-1)
Dharman
  • 26,923
  • 21
  • 73
  • 125
Carble
  • 29
  • 2
-1

Just in case someone is looking to do the same but with Google Sheets, here's how I do it

Let's say A1 contains 4#8#10

=ARRAYFORMULA(SPLIT(A1, "#"))

That will create an array, and display the array entry in columns.

If you need the array to be shown in rows, you can use

=ARRAYFORMULA(TRANSPOSE(SPLIT(A1, "#")))

I wish there was a SPLIT formula in MS Excel..

m-p-3
  • 351
  • 4
  • 6