
Hi, I have a CSV file with an error in it, and I want to correct it with a regular expression. Some of the fields contain line breaks, for example:

"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy
California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"

The two lines above should be joined into one line:

"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre PkwyCalifornia",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"

I tried the regex below, but it didn't help:

%s/\\([^\"]\\)\\n/\\1/
    Line breaks inside double quotes are legal in CSV (at least, in the most common dialects, as there is no single standard). This is the most common way if line breaks need to be included in a field. You are mutilating Google's address by just pasting those lines together. – Thomas Jul 13 '20 at 09:04
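To illustrate the comment's point, here is a minimal sketch (not from the thread) of splitting CSV text into records so that a record only ends at a line break outside double quotes; a line break inside a quoted field stays part of that field instead of being pasted away. It assumes the common dialect where `""` is an escaped quote; the class name, `splitRecords` and the second sample row are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvRecordSplitter {
    // Splits CSV text into records. A line break ends a record only when it is
    // outside double quotes; line breaks inside a quoted field are kept.
    // Assumes "" is an escaped quote inside a quoted field.
    static List<String> splitRecords(String csv) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped "" toggles twice, so it leaves inQuotes unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if ((c == '\n' || c == '\r') && !inQuotes) {
                // Treat \r\n as a single record terminator.
                if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Last record if the text does not end with a line break.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // The first row has a line break inside a quoted field; the second row is made up.
        String csv = "\"AHLR150\",\"CDS\",\"Investigating\",\"1600 Amphitheatre Pkwy\n"
                + "California\",,\"Mountain View\"\n"
                + "\"AHLR151\",\"CDS\",\"Closed\",\"Main St\",,\"Springfield\"\n";
        List<String> records = splitRecords(csv);
        System.out.println(records.size() + " records");
        for (String r : records) {
            System.out.println("[" + r + "]");
        }
    }
}
```

A real CSV library does the same quote tracking, so reading the file with one is usually preferable to rewriting it with a regex.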

2 Answers


Try this:

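import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
                + ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
                + "California\",,\"Mountain View\",,\"United\n"
                + "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";

        // Match each complete quoted field ("" is an escaped quote inside it).
        // Consuming whole fields left to right means a closing quote is never
        // taken for an opening one, so line breaks between records are untouched.
        Matcher matcher = Pattern.compile("\"[^\"]*(?:\"\"[^\"]*)*\"").matcher(input);
        Pattern patternRemoveLineBreak = Pattern.compile("\r\n|[\n\r]");

        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // Replace every line break inside this field with a single space.
            String quoteNoLineBreaks = patternRemoveLineBreak.matcher(matcher.group()).replaceAll(" ");
            // Rewrite this match in place (not the first occurrence elsewhere);
            // quoteReplacement stops '$' or '\' in the field acting as group references.
            matcher.appendReplacement(result, Matcher.quoteReplacement(quoteNoLineBreaks));
        }
        matcher.appendTail(result);

        //Output
        System.out.println(result);
    }
}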

Output:

"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
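The same quoted-field idea can be written more compactly with `Matcher.replaceAll(Function<MatchResult, String>)`, available since Java 9. This is a sketch rather than part of the answer above: the class and the `joinQuotedLines` helper are hypothetical, and the `separator` parameter is an assumption added so you can pass `""` to get the `PkwyCalifornia` form asked for in the question, or `" "` to get the output shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JoinQuotedLines {
    // One quoted CSV field, with "" as an escaped quote inside it.
    private static final Pattern QUOTED_FIELD = Pattern.compile("\"[^\"]*(?:\"\"[^\"]*)*\"");
    // Any line-break sequence: \r\n, \n or \r.
    private static final Pattern LINE_BREAK = Pattern.compile("\r\n|[\n\r]");

    // Replaces line breaks inside quoted fields with `separator`;
    // line breaks between records are left alone. Requires Java 9+.
    static String joinQuotedLines(String csv, String separator) {
        return QUOTED_FIELD.matcher(csv).replaceAll(m ->
                // The function's result is itself treated as a replacement string,
                // so it is quoted to keep '$' and '\' literal.
                Matcher.quoteReplacement(
                        LINE_BREAK.matcher(m.group()).replaceAll(Matcher.quoteReplacement(separator))));
    }

    public static void main(String[] args) {
        String row = "\"Investigating\",\"1600 Amphitheatre Pkwy\nCalifornia\",,\"Mountain View\"\n";
        // Prints: "Investigating","1600 Amphitheatre PkwyCalifornia",,"Mountain View"
        System.out.print(joinQuotedLines(row, ""));
    }
}
```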
DigitShifter

Based on this, you can try:

/\r?\n|\r/

I checked it here, and it seems to work.
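For reference, here is a small sketch of what that pattern does in Java, where it is written without the `/.../` delimiters. It matches every line break, whether inside quotes or not, so replacing its matches with nothing joins the broken field but also joins the line breaks that separate records. That is fine for a file holding the single record from the question, but it merges rows in a larger file. The class name, the `stripLineBreaks` helper and the two sample rows are hypothetical.

```java
public class StripAllLineBreaks {
    // Removes every line break (\r\n, \n or \r), regardless of quoting.
    static String stripLineBreaks(String text) {
        return text.replaceAll("\\r?\\n|\\r", "");
    }

    public static void main(String[] args) {
        // Two records; only the first has a line break inside a quoted field.
        String csv = "\"1\",\"1600 Amphitheatre Pkwy\nCalifornia\"\n\"2\",\"Mountain View\"\n";
        // Prints: "1","1600 Amphitheatre PkwyCalifornia""2","Mountain View"
        System.out.println(stripLineBreaks(csv));
    }
}
```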