10

I'm using the hl command from the soul package for highlighting.

Is there a quick way, using sed or any better tool, to remove all highlights? Keep in mind that the highlighted text may contain internal bracket pairs as well. For example, running the command on:

Hello \hl{my math $\frac{1}{2}$} world

Should return

Hello my math $\frac{1}{2}$ world

  • 11
    do you need sed, why not simply \renewcommand\hl[1]{#1} ? – David Carlisle Mar 08 '16 at 20:03
  • You may want to use a more complete example in your question, so people know what you are asking for. When you are asking for something that works for Hello \hl{my math $\frac{1}{2}$} world and \hl{my math $\frac{1}{2}$} universe and \whatnotelse, that should be clear from the question, shouldn't it? – bers Mar 08 '16 at 20:14
  • @DavidCarlisle, @wrtlprnft: \renewcommand\hl[1]{#1} or \renewcommand*{\hl}{}? Does it make any difference? – n.r. Mar 09 '16 at 07:31
  • @n.r. if you use the * you will get an error if there is a paragraph break (blank line) in the argument. I see no need for such checks in a command created by editing a working document. – David Carlisle Mar 09 '16 at 07:41
  • Sure, but what about the [1] in your version? Why do we need that? Is it to preserve structure or something? – n.r. Mar 09 '16 at 07:45

3 Answers

16

Not sed but perl. We need recursive regular expressions to do that:

$ echo 'Hello \hl{my math $\frac{1}{2}$} world' | perl -e '
undef $/;
$_ = <>;
s/ \\hl \s* ({((?: \\. | [^{}] | (?-2) )*)}) /$2/gsx;
print;'

Line 4 means:

s/                  # replace
\\hl                  # any \hl control sequence
\s*                   # and some or no whitespace
(                     # and a TeX group (capture group #1)
  {                   # consisting of an opening brace
    (                 # enclosing (capture group #2)
      (?: \\.           # any escaped characters
      | [^{}]           # or anything but braces
      | (?-2)           # or embedded TeX groups (recursion to #1)
      )*               # zero or more times
    )
  }                   # and a closing brace
)
/$2/gsx             # with group #2 globally

This approach assumes that your code parses correctly, and that braces in comments are either escaped or balanced.
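For comparison, the same idea can be sketched without recursive regular expressions by counting brace depth explicitly. This is only a rough Python sketch under the same assumptions as above (balanced braces, no special handling of comments); the function names strip_hl and find_closing_brace are made up for illustration. Unlike the substitution above, it also strips an \hl nested inside another \hl argument.

```python
def find_closing_brace(text, open_pos):
    """Return the index of the brace closing the one at open_pos, or -1."""
    depth = 0
    i = open_pos
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip an escaped character such as \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1  # unbalanced input


def strip_hl(text, command="\\hl"):
    """Remove every command{...} wrapper, keeping the braced content."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith(command, i):
            j = i + len(command)
            # Allow whitespace between the command and its argument.
            while j < len(text) and text[j].isspace():
                j += 1
            if j < len(text) and text[j] == "{":
                end = find_closing_brace(text, j)
                if end != -1:
                    # Recurse so that a nested \hl{...} is stripped as well.
                    out.append(strip_hl(text[j + 1:end], command))
                    i = end + 1
                    continue
        if text[i] == "\\" and i + 1 < len(text):
            # Copy a backslash together with the next character,
            # so that "\\hl" (line break, then "hl") is not mistaken for \hl.
            out.append(text[i:i + 2])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(strip_hl(r"Hello \hl{my math $\frac{1}{2}$} world"))
```

Something like \hline is left alone, because no opening brace follows the three characters \hl there, and an \hl whose argument is never closed is copied through unchanged.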

n.r.
  • 4,942
10

One way would be to use vim. I understand that this approach probably isn't very accessible to a non-vim-user, but it works. return, left and so on stand for the corresponding keys.

  1. open vim: vim myfile.tex
  2. search for the pattern to be replaced: /\\hl{return
  3. now the interesting part: define a macro that will perform the replacement: qan4rightvleft%leftdv4leftpq
  4. execute the new macro once: @a
  5. now, you can execute the macro again by using @@ (just hold @) or execute it many times using something like 999@@
  6. when you're done, exit using :wqreturn

I see it as an added bonus that you can inspect every instance of the replacement.

Of course, you could also \renewcommand*{\hl}{}, but that's not quite the same.

carnendil
  • 175
wrtlprnft
  • 3,859
  • 1
    I like the renewcommand idea - quick and dirty :) – Nathaniel Bubis Mar 08 '16 at 20:12
  • This doesn't work with \hl␣{ or \hl<cr>{ etc. – n.r. Mar 09 '16 at 07:02
  • You need to find and delete \hl\>, set mark a at the following opening brace (here braces in comments will be a problem for you too), move forward with % and delete the balanced closing brace, move back to mark a with `a and delete the opening brace. – n.r. Mar 09 '16 at 07:17
  • I know it's imperfect (it also won't work properly with escaped braces, braces in comments, \csname hl\endcsname, \hl^^;text^^=, \someothercommand\hl{text}moretext, \hl{\ttfamily text}, ...), but I tried to keep it simple. – wrtlprnft Mar 09 '16 at 08:07
  • Hmm, this is much harder than I thought. Actually, your answer does work with escaped braces. I'm bothered with braces in comments. Not sure how to deal with them. – n.r. Mar 09 '16 at 08:26
  • Personally, I'd just give up. \hl is probably only used to mark words or phrases, which usually don't include newlines and therefore comments. Nobody should use TeX trigraphs in this century, if you use \csname it's your fault and my last two examples could be solved by just leaving the braces in the text, which incidentally also solves the comment problem :-D – wrtlprnft Mar 09 '16 at 08:42
  • You're right. No prospects of a reliable Perl TeX parser without writing a basic TeX engine all over again. I feel tired just at the thought of trying. – n.r. Mar 09 '16 at 09:02
3

This works for me:

C:\Users\Name>echo Hello \hl{my math $\frac{1}{2}$} world | sed "s-\\\hl{\(.*\)}-\\1-"
Hello my math $\frac{1}{2}$ world

This is some version of sed on Windows, more specifically:

C:\Users\Name>sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.

GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-gnu-utils@gnu.org>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.

But in general, this will not do: because .* is greedy, the pattern matches up to the last closing bracket on the line, so multiple \hl{...} (or even other commands after it) will break it. So your example expression, for which my code works, does not represent all the use cases you may want to use it for.

This reminds me a lot of this question. What you want to do is find a matching curly bracket for \hl{; but even assuming that your code parses correctly, meaning that you never have an extra opening or closing bracket anywhere, inside or outside of \hl{...}, regular expressions seem to be incapable of achieving this without recursion, which I am not sure sed supports.
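To make the failure modes concrete, here is a small Python sketch (Python's re module, like sed, has no recursion). The helper names greedy and lazy are made up for illustration: the greedy pattern mirrors the sed expression above, and the non-greedy variant is the obvious attempted fix. Each one handles one of the two inputs and mangles the other.

```python
import re


def greedy(s):
    # Same idea as the sed expression: .* runs to the LAST closing brace.
    return re.sub(r"\\hl\{(.*)\}", r"\1", s)


def lazy(s):
    # Non-greedy variant: .*? stops at the FIRST closing brace.
    return re.sub(r"\\hl\{(.*?)\}", r"\1", s)


two = r"\hl{a} and \hl{b}"        # two highlights on one line
nested = r"\hl{$\frac{1}{2}$} x"  # braces inside the highlight

print(greedy(two))     # a} and \hl{b       (wrong)
print(lazy(two))       # a and b            (right)
print(greedy(nested))  # $\frac{1}{2}$ x    (right, by luck)
print(lazy(nested))    # $\frac{1{2}$} x    (wrong)
```

Neither pattern tracks nesting depth, which is why a recursive pattern or an explicit brace counter is needed.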

bers
  • 5,404