
This seems simple enough, but I cannot get anything I try to work. I am trying to remove the CRLF from the ends of lines that don't meet my criteria (ending in AM, PM or Four), then write the result to a new file. For example, this section:

One~Two~Three~Four
Test Plan Pay Work~scheduled payment pending~79f1cf6e~3/8/2020 6:13:07 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM
Test Plan Pay Work~GetCardInfo 
{failed to validate card
}
~f124a822-aa8d-4624-bb8c-ddsfgdfcc21fb~3/8/2020 6:14:31 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM

The output should look like this:

One~Two~Three~Four
Test Plan Pay Work~scheduled payment pending~79f1cf6e~3/8/2020 6:13:07 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM
Test Plan Pay Work~GetCardInfo {failed to validate card}~f124a822~3/8/2020 6:14:31 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM

Being a newbie, I have tried:

Get-Content "C:\temp\errors.csv" | ForEach-Object {
  if ((!$_.EndsWith("AM") -and !$_.EndsWith("PM") -and !$_.EndsWith("Four")))
    {
       $_ -replace ("`r`n",' ')
    }
} | Out-File C:\temp\errors2.csv

But this does not work. Any ideas? It seems simple, but I cannot get it to work whatever I try.

1 Answer


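By default, Get-Content splits the text into separate lines and removes the newline characters, so inside your ForEach-Object there is no `r`n left for -replace to find. (Your if block also outputs only the lines that don't meet the criteria, so the complete lines are dropped.) To keep the text in one piece, use the -Raw parameter. Now you can process the text as a whole, using the regular-expression -replace operator: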

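# Read the whole file as one string, then turn every CRLF that is not preceded by AM, PM or Four into a space
(Get-Content 'errors.csv' -Raw) -replace '(?<!AM|PM|Four)\r\n', ' ' | 
    Out-File 'errors2.csv'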

The parentheses around the Get-Content call let you use the command's output directly as the left-hand operand of the -replace operator (see Grouping Operator).
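To see why the loop in the question never finds a CRLF to replace, here is a minimal sketch (assuming PowerShell 5.1 or later; the temporary file and its two-line contents are made up for illustration) that reads the same file both ways:

```powershell
# Write a small CRLF-terminated sample to a temporary file (illustrative data)
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, "line one`r`nline two`r`n")

# Default mode: an array with one string per line, line endings stripped
$lines = Get-Content $path
# -Raw: the whole file as a single string, line endings preserved
$raw = Get-Content $path -Raw

$lines.Count                # 2
$lines[0].Contains("`r")    # False
$raw.Contains("`r`n")       # True

Remove-Item $path
```

In the default mode each $_ seen by ForEach-Object is already a bare line, so `$_ -replace "`r`n", ' '` has nothing to match; only the -Raw string still contains the CRLFs.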

Output:

One~Two~Three~Four
Test Plan Pay Work~scheduled payment pending~79f1cf6e~3/8/2020 6:13:07 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM
Test Plan Pay Work~GetCardInfo  {failed to validate card } ~f124a822-aa8d-4624-bb8c-ddsfgdfcc21fb~3/8/2020 6:14:31 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM

Regular expression breakdown:

  • (?<! starts a negative lookbehind assertion
    • AM|PM|Four any of the literals AM, PM or Four
  • ) ends the negative lookbehind assertion
  • \r\n the carriage return and line feed characters (CRLF)

The negative lookbehind assertion makes the regex match only if the CRLF is not preceded by AM, PM or Four. The lookbehind itself is not part of the match result, so only the CRLF characters are replaced.
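A minimal sketch of the same -replace on an in-memory string, so the lookbehind can be tried without any files. The $sample contents are a shortened, illustrative version of the question's data, and the lines are joined with an explicit CRLF so the result doesn't depend on how the script file itself was saved:

```powershell
# Illustrative input: a header line, then one record wrapped over two physical lines
$sample = @(
    'One~Two~Three~Four'
    'Test Plan Pay Work~GetCardInfo'
    '{failed to validate card}~f124a822~3/8/2020 6:14:31 PM'
) -join "`r`n"

# Same pattern as the answer: replace a CRLF with a space unless it directly follows AM, PM or Four
$joined = $sample -replace '(?<!AM|PM|Four)\r\n', ' '

# Split back into lines to inspect the result: the wrapped record is now a single line
$joined -split "`r`n"
```

Note that -replace is case-insensitive, so a wrapped line that happens to end in lowercase am or pm (for example ...program) would also be treated as complete. If that matters, -creplace performs the same replacement case-sensitively, which matches the EndsWith() checks in the question.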

See also: Regex lookahead, lookbehind and atomic groups

Note:

This approach, using Get-Content -Raw, loads the entire file into memory. If the file is too big for that, processing the input line by line with the default Get-Content (possibly in chunks, using the -ReadCount parameter) would be feasible, but a bit more complicated.
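One way to do the line-by-line version is sketched below, assuming the same rule (a record is complete when its line ends in AM, PM or Four). Join-WrappedRecord is a hypothetical helper, not a built-in cmdlet. It buffers only the current record and inserts a space where Get-Content stripped the CRLF, which should reproduce the -Raw output shown above. It takes one line at a time rather than using -ReadCount; with -ReadCount, each pipeline item would be an array of lines and the process block would need an inner loop.

```powershell
# Hypothetical helper (not a built-in cmdlet): joins physical lines into logical records.
# Usage: Get-Content 'C:\temp\errors.csv' | Join-WrappedRecord | Out-File 'C:\temp\errors2.csv'
function Join-WrappedRecord {
    param(
        [Parameter(ValueFromPipeline)]
        [string] $Line,
        # Assumed end-of-record rule: the line ends in AM, PM or Four (case-insensitive, like -replace)
        [string] $EndPattern = '(AM|PM|Four)$'
    )
    begin {
        $buffer = $null   # the record being assembled; only this is kept in memory
    }
    process {
        # Append this line; a space stands in for the CRLF that Get-Content removed
        $buffer = if ($null -eq $buffer) { $Line } else { "$buffer $Line" }
        if ($Line -match $EndPattern) {
            $buffer          # record is complete: send it down the pipeline
            $buffer = $null
        }
    }
    end {
        # Flush a trailing record that never reached an end marker
        if ($null -ne $buffer) { $buffer }
    }
}
```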

zett42
  • That output fills the file with tons of unneeded characters. Doesn't seem to work. – Jack Black Mar 10 '21 at 16:28
  • @JackBlack Shame on me for posting an overly complicated, not even working solution too quickly. I have replaced that by a much simpler one that appears to work for me. – zett42 Mar 10 '21 at 16:59
  • That did it indeed! Thank you SO MUCH! – Jack Black Mar 10 '21 at 17:04
  • @JackBlack Please click ✔️ on the left of the answer to mark this answer as accepted and optionally upvote. This is the way to say "thanks" on SO. You even get two reputation points for accepting. – zett42 Mar 10 '21 at 17:13
  • @JackBlack I just found a way to streamline this code even more, see updated answer. – zett42 Mar 10 '21 at 19:20
  • Excellent, that is wicked fast too on the large file. Thanks so much @zett42 – Jack Black Mar 10 '21 at 22:16