1

I'm working with this query:

SELECT
    REGEXP_SUBSTR ('test1   test2   test3', '[^\t]+', 1, 1) field1,
    REGEXP_SUBSTR ('test1   test2   test3', '[^\t]+', 1, 2) field2,
    REGEXP_SUBSTR ('test1   test2   test3', '[^\t]+', 1, 3) field3
FROM DUAL

I'm looking for each field to have each value. this is a tab-delimited string. But I'm getting random outputs to it.

Looking for:

field1    field2    field3
test1     test2     test3

Getting:

field1    field2    field3
es        es        es

what am I missing?

thanks

arcee123
  • 279
  • 8
  • 37
  • 102
  • 2
    By the way and just FYI, the Posix pattern `'[^[:space:]]+'` works for both tabs and spaces(even carriage return, newline etc). – Kaushik Nayak Mar 22 '19 at 17:26

2 Answers2

4

[^\t]+ will match characters that are not a backslash and not a t character.

So

SELECT
    REGEXP_SUBSTR ('aaatbbb\ccc', '[^\t]+', 1, 1) field1,
    REGEXP_SUBSTR ('aaatbbb\ccc', '[^\t]+', 1, 2) field2,
    REGEXP_SUBSTR ('aaatbbb\ccc', '[^\t]+', 1, 3) field3
FROM DUAL

Will output:

FIELD1 | FIELD2 | FIELD3
:----- | :----- | :-----
aaa    | bbb    | ccc   

Having split the string on \ and t

If you want to split it on a tab character then:

SELECT
    REGEXP_SUBSTR ('test1' || CHR(9) || 'test2' || CHR(9) || 'test3', '[^' || CHR(9) || ']+', 1, 1) field1,
    REGEXP_SUBSTR ('test1' || CHR(9) || 'test2' || CHR(9) || 'test3', '[^' || CHR(9) || ']+', 1, 2) field2,
    REGEXP_SUBSTR ('test1' || CHR(9) || 'test2' || CHR(9) || 'test3', '[^' || CHR(9) || ']+', 1, 3) field3
FROM DUAL

Which outputs:

FIELD1 | FIELD2 | FIELD3
:----- | :----- | :-----
test1  | test2  | test3 

db<>fiddle here

MT0
  • 113,669
  • 10
  • 50
  • 103
2

Characters inside the square brackets are interpreted as literal characters, not escape characters - so you're matching anything except 't' or '\' rather than tabs.

You could embed an actual tab character using concatenation:

SELECT
    REGEXP_SUBSTR ('test1   test2   test3', '[^'||chr(9)||']+', 1, 1) field1,
    REGEXP_SUBSTR ('test1   test2   test3', '[^'||chr(9)||']+', 1, 2) field2,
    REGEXP_SUBSTR ('test1   test2   test3', '[^'||chr(9)||']+', 1, 3) field3
FROM DUAL;

FIELD FIELD FIELD
----- ----- -----
test1 test2 test3

You may have problems with adjacent tabs though:

SELECT
    REGEXP_SUBSTR ('test1       test3', '[^'||chr(9)||']+', 1, 1) field1,
    REGEXP_SUBSTR ('test1       test3', '[^'||chr(9)||']+', 1, 2) field2,
    REGEXP_SUBSTR ('test1       test3', '[^'||chr(9)||']+', 1, 3) field3
FROM DUAL;

FIELD FIELD FIELD3
----- ----- ------
test1 test3       

A safer pattern is:

SELECT
    REGEXP_SUBSTR ('test1       test3', '(.*?)('||chr(9)||'|$)', 1, 1, null, 1) field1,
    REGEXP_SUBSTR ('test1       test3', '(.*?)('||chr(9)||'|$)', 1, 2, null, 1) field2,
    REGEXP_SUBSTR ('test1       test3', '(.*?)('||chr(9)||'|$)', 1, 3, null, 1) field3
FROM DUAL;

FIELD FIELD2 FIELD
----- ------ -----
test1        test3

db<>fiddle

Alex Poole
  • 173,063
  • 10
  • 166
  • 288