3

I'm trying to store in a LaTeX3 string some text that will later be written to a file (I also process the string before writing it: text replacement, string concatenation…). The file might not be a LaTeX document but arbitrary text or code, so I want to preserve the input exactly, including newlines, spaces, tabs, and characters such as _{}$\, for example:

Hello % This is a comment that should not be removed
     I want precisely this spacing, and I want to be allowed to use any text, like 1 + (1*2), héhé,
or:
def mycode(my_variable_with_underscore="hey"): # This is a comment (with a single sharp)
    print(my_variable_with_underscore)
or latex macros like $\array{a & b\\c & d}$. And ideally unbalanced {} but I heard it was difficult
(maybe by replacing special tokens like \MYCLOSINGBRACE and \MYOPENINGBRACE in the final string?). 

For now, I tried to play with \tl_rescan:nn, trying to follow this answer without much success.

MWE:

\documentclass[options]{article}

\ExplSyntaxOn

\iow_new:N \g_my_write
\cs_generate_variant:Nn \iow_now:Nn { NV }

%% Creates a new category table
%% Inspired by https://github.com/latex3/latex3/blob/0b851165c1dba9625f7ab80bb5c4cbd27f3e9af7/l3kernel/l3cctab.dtx
\cctab_const:Nn \c_my_active_cctab
  {
    \cctab_select:N \c_initex_cctab
    \int_set:Nn \tex_endlinechar:D { -1 }
    \int_step_inline:nnn { 0 } { 127 }
      { \char_set_catcode_active:n {#1} }
  }

\NewDocumentCommand{\myExactWrite}{m}{
  \str_new:N \g_my_string
  \iow_open:Nn \g_my_write {my-output.txt}
  % === Fails: the final string contains \tl_rescan code:
  % \str_gset:Nn \g_my_string {\tl_rescan:nn {\cctab_select:N \c_my_active_cctab} { #1 }}
  % === Fails: I get errors about missing $
  % \def\mygset{\str_gset:Nn}
  % \def\mystring{\g_my_string}
  % \tl_rescan:nn {\cctab_select:N \c_my_active_cctab} { \mygset \g_my_string {#1} }
  % In my code the write might arrive much later, even in another function, hence the use of a string
  % === See that the input is not the expected one:
  \str_set:Nn \g_my_string {#1}
  \iow_now:NV \g_my_write \g_my_string
  \iow_close:N \g_my_write
}

\ExplSyntaxOff

\usepackage{verbatim}

\begin{document}

%%% Level 1
\myExactWrite{
Hello % This is a comment that should not be removed
     I want precisely this spacing, and I want to be allowed to use any text
}

% %% Level 2
% \myExactWrite{
% Hello % This is a comment that should not be removed
%      I want precisely this spacing, and I want to be allowed to use any text, like 1 + (1*2), héhé,
% or:
% def mycode(my_variable_with_underscore="hey"): # This is a comment (with a single sharp)
%     print(my_variable_with_underscore)
% or latex macros like $\array{a & b\\c & d}$. And ideally unbalanced {} but I heard it was difficult
% (maybe by replacing special tokens like \MYCLOSINGBRACE and \MYOPENINGBRACE in the final string?).
% }

The content of the file is:

\verbatiminput{my-output.txt}

\end{document}

tobiasBora
  • 8,684
  • 4
    If \myExactWrite is not expected to work in the argument of another command, you could read the argument verbatim and write exact text (like filecontents) so rescan not needed. If you do need it to work in arguments, you can use \scantokens or \detokenize to make things approximately OK in different ways, but it can never exactly re-create the source, as the information has gone. \foo\bar, \foo \bar, {\catcode`\!=0 !foo!bar} all will look the same. – David Carlisle Mar 21 '23 at 10:11
  • 1
    I don't think it's possible for you to create an exact write which can be used in a parameter. Since TeX would remove certain characters when tokenizing that parameter (and so also the parameter of the exact write) like % and newlines and certain spaces. – Slurp Mar 21 '23 at 11:16
  • @DavidCarlisle thanks, but I'm actually interested in cases where the function might be inside other functions and/or environments like align, fbox… I tried to use the LaTeX3 version of scantokens but it seems I'm not doing it properly. In LaTeX3 do I also need to use this trick with two macros, or is it only made for plain e-TeX? – tobiasBora Mar 21 '23 at 14:57
  • 1
    The issue with that is that TeX gets rid of certain characters when it tokenizes parameters, so for example something like \fbox{\exactwrite{file.txt}{%text}} will never work, since while getting the parameter for \fbox, the comment will be ignored and you'll end up with an error about unbalanced braces. – Slurp Mar 21 '23 at 15:04
  • 1
    @Slurp I see… then let's try to get the best of what is possible ^^' (why LaTeX are you so mean…) – tobiasBora Mar 21 '23 at 15:07
  • 1
    If your problem is restricted to \fbox (which I'm sure it's not) you could define your own version like \def\newfbox{\bgroup\setbox0=\hbox\bgroup\aftergroup\newfboxA\let\next} \def\newfboxA{\fbox{\box0}\egroup}, which doesn't take any parameters and so you don't need to worry about tokenization. Similar solutions exist for other similar macros. And an exactwrite macro should work in an environment like align since it's an environment and not a macro, so it doesn't tokenize your stuff – Slurp Mar 21 '23 at 15:24
  • @Slurp no, align is (really) a macro, it scans the body first so it can evaluate it in two passes, you can not use \verb etc – David Carlisle Mar 21 '23 at 15:42
  • Oh my bad, I had thought it was similar to \eqalign/no but it closed the group in \endalign instead of taking a parameter. – Slurp Mar 21 '23 at 16:00
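
The information loss mentioned in the comments above can be checked directly. A minimal sketch, assuming nothing beyond the e-TeX primitive \detokenize and a standard article class: both spellings of the two control words detokenize to the same characters, and a hash comes back doubled.

```latex
\documentclass{article}
\begin{document}
\ttfamily
% Both lines typeset as: \foo \bar  (a space is appended after each control word)
\detokenize{\foo\bar}\par
\detokenize{\foo \bar}\par
% The hash comes back doubled: a ## b
\detokenize{a # b}\par
\end{document}
```

Once the argument has been tokenized, nothing in the token list records whether the source had a space between the two macros, which is why only a verbatim read can preserve the input.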

4 Answers

5

I have never used Expl3, and I don't know how, so here's my answer in plain (e-)TeX. It seems to work decently well.

\newwrite\myWrite

% \verbatimwrite{<file name>}{<text ...>}
\def\verbatimwrite#1{\bgroup%
    % Begin setting up verbatim
    \catcode`\^^M=12\relax%
    \def\do##1{\catcode`##1=12\relax}%
    \dospecials%
    % Allow capture of next parameter
    \catcode`\{=1\relax \catcode`\}=2\relax%
    \verbatimwriteA{#1}%
}

\long\def\verbatimwriteA#1#2{%
    % Finish setting up verbatim
    \catcode`\{=12\relax \catcode`\}=12\relax%
    % Dont expand EOF token
    \everyeof={\noexpand}%
    % Newlines in \write
    \newlinechar=`\^^M\relax%
    % All the \write-ing stuff
    \immediate\openout\myWrite #1\relax%
    \immediate\write\myWrite{\scantokens{#2}}%
    \immediate\closeout\myWrite%
\egroup}

\verbatimwrite{test.txt}{ Hello % This is a comment that should not be removed

I want precisely this spacing, and I want to be allowed to use any text

}

\verbatimwrite{test1.txt}{
Hello % This is a comment that should not be removed
     I want precisely this spacing, and I want to be allowed to use any text, like 1 + (1*2), héhé,
or:
def mycode(my_variable_with_underscore="hey"): # This is a comment (with a single sharp)
    print(my_variable_with_underscore)
or latex macros like $\array{a & b\\c & d}$. And ideally unbalanced {} but I heard it was difficult
(maybe by replacing special tokens like \MYCLOSINGBRACE and \MYOPENINGBRACE in the final string?).
}

This code does have the effect that the newline after the { for the text is added to the file. You could add some sort of test before the #2 which gobbles a character, checks if it's ^^M and if not puts it back in the stream.

If you want to allow for unbalanced braces, this can be done with a bit more complexity by using a \beginverbwrite...\endverbwrite construct: set everything verbatim in \beginverbwrite, then gobble through the tokens and append them to a token list. Each time you gobble a \, you would check whether the following characters are endverbwrite; if so, write the token list to the file and end the verbatim mode. You can't just pass the text as a parameter to a macro because you would have no idea where the parameter ends, hence the use of a \begin...\end construct.
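
For illustration only, here is a rough plain TeX sketch of such a construct. It is not the code offered in this answer: instead of gobbling token by token, it uses an argument delimited by the characters \endverbwrite (with a category-12 backslash), which is a different but common way to find the end of the verbatim text. The names \beginverbwrite and \verbwriteBody and the file name are made up for this example, and horizontal tabs are not handled.

```latex
\newwrite\myWrite

% \beginverbwrite{<file name>} ... \endverbwrite
\def\beginverbwrite#1{%
  \begingroup
  \immediate\openout\myWrite=#1\relax
  \def\do##1{\catcode`##1=12\relax}\dospecials % specials become "other"
  \catcode`\^^M=12\relax   % keep line ends as ordinary characters
  \newlinechar=`\^^M\relax % let \write turn them into line breaks
  \verbwriteBody}

% Define the grabbing macro while | is the escape character and \ is "other",
% so that the delimiter matches the verbatim characters \endverbwrite.
\begingroup
\catcode`\|=0 \catcode`\\=12
|long|gdef|verbwriteBody#1\endverbwrite{%
  |immediate|write|myWrite{#1}%
  |immediate|closeout|myWrite
  |endgroup}
|endgroup

\beginverbwrite{test2.txt}
def f(x): # an unbalanced { is fine here
    return "50%"
\endverbwrite
\bye
```

Because braces have category 12 while the body is read, they need not balance. As with the macro above, the line end right after the file name is written to the file too, so the output starts with an empty line.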

This is a little more complicated to do, but I think I may have some code lying around somewhere which I could alter a bit to do this. If you're interested, tell me and I'll try to do this (it may take some time though, I may not be able to start immediately as well).

Slurp
  • 876
  • 1
    Thanks a lot! It's a great solution (even if I have to admit I'd prefer a LaTeX3 solution for multiple reasons), the only issue I see is that # is turned into ##. I asked this question and I got an answer in LaTeX3 https://tex.stackexchange.com/questions/680126/token-read-write-turns-into/680129#680129 So combining LaTeX and LaTeX3 might bring me closer to my goal ^^' – tobiasBora Mar 21 '23 at 14:51
  • 1
    If someone gives me a solution based on LaTeX3 that deal with the ## issue, I'll accept it, otherwise I'll accept yours. – tobiasBora Mar 21 '23 at 14:58
  • 3
    @tobiasBora I fixed the macro, it should work with # now (it also added spaces after control sequences even if there wasn't supposed to be any, now it doesn't). I just set up the verbatim in \verbatimwrite but left changing the catcodes of { and } to \verbatimwriteA – Slurp Mar 21 '23 at 15:00
3

Reading the items verbatim, per @davidcarlisle's comment, seems OK at first glance. Are you able to check/confirm?

expl3verb

Uses the expl3 +v parameter type (=multi-par verbatim).

A lot will depend on the font.

MWE

\documentclass[options]{article}

\ExplSyntaxOn

\iow_new:N \g_my_write

\cs_generate_variant:Nn \iow_now:Nn { NV }

\str_new:N \g_my_string

\NewDocumentCommand{\myExactWrite}{+v}{
  \iow_open:Nn \g_my_write {my-output.txt}
  % In my code the write might arrives much later, even in another function, hence the use of a string
  % === See that the input is not the expected one:
  \str_set:Nn \g_my_string {#1}
  \iow_now:NV \g_my_write \g_my_string
  \iow_close:N \g_my_write
}

\ExplSyntaxOff

\usepackage{verbatim}

\begin{document}

%%% Level 1
\myExactWrite{
Hello % This is a comment that should not be removed
     I want precisely this spacing, and I want to be allowed to use any text
}

% %% Level 2
\myExactWrite{
Hello % This is a comment that should not be removed
     I want precisely this spacing, and I want to be allowed to use any text, like 1 + (1*2), héhé,
or:

def mycode(my_variable_with_underscore="hey"): # This is a comment (with a single sharp)
    print(my_variable_with_underscore)
or latex macros like $\array{a & b\\c & d}$. And ideally unbalanced {} but I heard it was difficult
(maybe by replacing special tokens like \MYCLOSINGBRACE and \MYOPENINGBRACE in the final string?).
}

The content of the file is:

\verbatiminput{my-output.txt}

\end{document}


Correction: +v belongs to xparse (now part of the kernel), not expl3.

A v-parameter command can take two of the same character as argument delimiter (like \verb does), or a {} pair.
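
A minimal sketch of the two delimiter styles, assuming a LaTeX kernel recent enough to provide \NewDocumentCommand without loading xparse. The command name \showverb is made up for this example, and | is just one possible delimiter.

```latex
\documentclass{article}
% Hypothetical helper: typeset a verbatim argument in typewriter type.
\NewDocumentCommand{\showverb}{+v}{\texttt{#1}}
\begin{document}
% Delimited by a repeated character: braces inside need not balance.
\showverb|50% of {x|

% Delimited by braces: inner braces must balance.
\showverb{50% of {x}}
\end{document}
```

As noted in the comments on the question, such a command cannot be used inside the argument of another command, since the text would already have been tokenized.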

So, using ibex as the delimiter (for example), we can unbalance { and } all over the place, from what I can see:

Using single character

MWE

\documentclass{article}

\ExplSyntaxOn

\str_new:N \g_my_string
\iow_new:N \g_my_write
\cs_generate_variant:Nn \iow_now:Nn { NV }

\NewDocumentCommand { \myOpen }{ m }{
  \iow_open:Nn \g_my_write { #1 }
}

\NewDocumentCommand { \myClose }{ }{
  \iow_close:N \g_my_write
}

\NewDocumentCommand { \myExactWrite } { +v } {
  \str_set:Nx \g_my_string { #1 }
  \iow_now:NV \g_my_write \g_my_string
}

\ExplSyntaxOff

\usepackage{verbatim}

\begin{document}

\myOpen{my-output.txt}

%%% Level 1 \myExactWrite [1]:: Hello % This is a comment that should not be removed I want precisely this spacing, and I want to be allowed to use any text

% %% Level 2 \myExactWrite { [2]:: Hello % This is a comment that should not be removed I want precisely this spacing, and I want to be allowed to use any text, ==> like 1 + (1*2), héhé, or: def mycode(my_variable_with_underscore="hey"): # This is a ==> comment (with a single sharp) print(my_variable_with_underscore) or latex macros like $\array{a & b\c & d}$. And ideally unbalanced {} but ==> I heard it was difficult (maybe by replacing special tokens like ==> \MYCLOSINGBRACE and \MYOPENINGBRACE in the final string?). }

\myExactWrite ----|----|----|----|----|----|----| ~!@#$%^&*()_+

----|----|----|----|----|----|----| ' ``'' -- ---1234567890-=

----|----|----|----|----|----|----| }}} [ { {}| []\ ,./ <>? :'' ;'

----|----| zxcv

\myClose

The content of the file is:

\verbatiminput{my-output.txt}

\end{document}

Cicada
  • 10,129
  • 1
    Oh thanks, but I'm surprised, it works for you? For me I get all newlines replaced with ^^M in the file (like 3 characters). I tried to do \str_replace_all:Nnn \g_my_string {\^^M} {\newlinechar} but it fails. Also, I initially wanted to avoid this as it fails inside other macros, but people made me realize that it's anyway impossible to get it work inside macros, so if we can make the – tobiasBora Mar 22 '23 at 08:18
  • 2
    I think I get the same result as @tobiasBora, newlines are marked as ^^M and all the code is listed on one line at input. – daleif Mar 22 '23 at 10:36
  • 2
    @Cicada An adaptation of the suggestion made in one comment of this post : add \cs_generate_variant:Nn \str_replace_all:Nnn { Nxx } , then use \str_replace_all:Nxx \g_my_string { \iow_char:N \^^M } { \iow_char:N \^^J } just before \iow_now. Then the newlines will be added as expected. – projetmbc Mar 22 '23 at 11:11
  • @tobiasBora I use Windows, if that's relevant. I understand that, there, CRLF / x0Dx0A always form a pair. Under Unix they are separable. – Cicada Mar 24 '23 at 11:46
  • Oh, that make sense, this reminds me that I'll really need to test my code on various OS… Thanks! – tobiasBora Mar 24 '23 at 23:26
  • @projetmbc The code as-is works OK as expected on Windows: (a) delimiter ( ) at start of line = no line in output; (b) space+delimiter=line in output; (c) empty line+delimiter at start of next line=line in output. I'll check what the adaptation suggestion does. – Cicada Mar 25 '23 at 04:50
  • @projetmbc The code adaptation outputs the line the delimiter is on: that is, case (b) and `` (a) both produce a line in the output. There could be a use-case for that. Perhaps it's a workaround (or was?). Manually adjusting a built-in function otherwise sounds somewhat counter-intuitive in the long perspective. – Cicada Mar 25 '23 at 05:04
2

I found another solution (certainly dirty, but the code is conceptually simpler and quite resilient: it works for non-balanced braces and can deal with gobble) based on the xsim package (or its lighter subset xsimverb). The idea is to use xsimverb to write into a file, then read that file back using LaTeX3 commands and store the lines in a string.


Note that it will fail inside macros, but this is expected: as soon as we deal with non-LaTeX code, the outer macros will remove comments etc. anyway, so in that case I prefer an error to some weird removal of characters.

\documentclass{article}

\usepackage{verbatim}
\usepackage{xsimverb}

\ExplSyntaxOn

\cs_generate_variant:Nn \iow_now:Nn { NV }

\iow_new:N \g_robExt_write
\ior_new:N \g_robExt_read_ior
% Declare the string before it is cleared and filled below
\str_new:N \g_robExt_mystring

\NewDocumentEnvironment{robExtNamedTemplate}{}{\XSIMfilewritestart*{test.tmp}}{
  \XSIMfilewritestop
  \ior_open:Nn \g_robExt_read_ior {test.tmp}
  \str_gclear:N \g_robExt_mystring
  %% Loop on all lines of the file:
  \ior_str_map_inline:Nn \g_robExt_read_ior {
    \str_gput_right:Nx \g_robExt_mystring {\tl_to_str:n{##1}^^J}
  }
  \ior_close:N \g_robExt_read_ior
}

\NewDocumentCommand{\saveStringAndPrintFile}{O{}}{
  \message{E}
  \iow_open:Nn \g_robExt_write {test-out.tex}
  \message{F}
  \iow_now:NV \g_robExt_write \g_robExt_mystring
  \message{G}
  \iow_close:N \g_robExt_write
  \message{H}
  \verbatiminput{test-out.tex}
}

\ExplSyntaxOff

\begin{document}

\begin{robExtNamedTemplate}

This is a comment

def my_function():
    a = {}
    a = {b}
    return a+b % 2
\end{robExtNamedTemplate}

\saveStringAndPrintFile

\end{document}

tobiasBora
  • 8,684
2

Some facts about TeX and LaTeX that should be taken into consideration:

  • All programming in TeX actually is based on so-called tokens.

    In TeX tokens can be control sequence tokens or explicit character tokens (with properties category and character code).

    Tokens can come into being as follows:

    • By having TeX read lines of a .tex-input-file and tokenize them or (in interactive mode) by having TeX read lines typed at the console and tokenize them.
      (\scantokens can be used for simulating the unexpanded-writing of tokens to file and then loading that file via \input.)
    • In the stage of expansion by expanding expandable tokens, e.g., macros or things like \the⟨register or parameter⟩, etc.
    • With LuaTeX-based TeX-engines by having the Lua-backend push tokens into the token-stream, e.g., via token.put_next(token.create(...)...).

    A mechanism based on macro-programming in TeX (and thus focussed on tokens!) for exact re-creation of arbitrary portions of .tex-input from tokens that came into being due to tokenizing these portions of .tex-input is not possible because in the stage of processing tokens via macros information about how the tokens processed by the macros came into being is not available. Thus in case these tokens came into being due to having TeX read/pre-process lines of a .tex-file and tokenize them, information on what these lines of .tex-input looked like is not available, either.

    Portions of .tex-input can only be re-created if at the time of tokenization it was ensured by adjusting catcode-régime and \endlinechar

    • that no character of the source is just dropped during pre-processing and tokenization,
    • that each character of the source only yields a (character-)token from whose properties (character-code) you can deduce the corresponding character in the .tex-input-file,
    • that information about ends of lines/linebreaking is not lost.

    (No character being dropped during pre-processing is a crucial point because in any case space characters at the right ends of lines of .tex-input are dropped. For more details about pre-processing see below.
    Besides this—but these are things which can be handled by adjusting the catcode-régime before tokenization—in case TeX is not gathering the first character of the name of a control sequence token,

    • characters of category code 9 are dropped.
    • characters of category code 10(space) are dropped if TeX's reading-apparatus is in state S(skipping blanks) or N(new line).
      (The latter is the reason why under normal catcode-régime you can use space-characters and horizontal-tab-characters, which in normal catcode-régime have category code 10, for indenting your code.)
    • characters of category code 10(space) in case the reading-apparatus is in state M (middle of line) are tokenized as explicit space tokens, i.e., as explicit character tokens of category 10(space) and character code 32. In case a character has category code 10(space), the resulting character token has character code 32 no matter what number the codepoint in TeX's internal character encoding scheme the character in question has.
    • what token comes into being when tokenizing a character of category 5(end of line) depends on the state of the reading-apparatus.
    • characters on the same line following a character of category code 5(end of line) are dropped.
    • characters of category 14(comment) and subsequent characters on the same line are dropped.
    • TeX does not like to encounter characters of category code 15 (invalid).
      ...        )
  • The result of (unexpanded) writing a control sequence token depends on the value of the integer-parameter \escapechar. This is also the case when applying \string, \detokenize, \scantokens, \meaning, \show.

  • When (unexpanded) writing a control-word-token, TeX appends a space-character. Even if in the .tex-input-file there was no space. E.g., under normal catcode-regime the input \TeX\TeX is tokenized as two control-word-tokens \TeX. Unexpanded-writing them yields the character-sequence \TeX␣\TeX␣— ␣ denotes a space-character. TeX does not append a space-character when (unexpanded) writing a control-symbol-token. This also applies with \scantokens and \detokenize.

  • Explicit space-tokens (explicit character tokens of category 10(space) and character code 32) are dropped while gathering the first token of an undelimited macro argument.

  • Hashes, i.e., explicit character tokens of category 6(parameter) are doubled when they get written to text file or screen. This also applies with \scantokens and \detokenize.

  • When LaTeX reads a line of .tex-input, some pre-processing takes place before tokenizing :

    1. The characters are converted from the computer-platform's character representation scheme to TeX's internal character representation scheme which with traditional TeX engines is ASCII and which with LuaTeX-based and XeTeX-based TeX-engines is unicode whereof ASCII is a strict subset.

    2. All space characters (and with TeX-implementations based on some Web2C-releases also all horizontal-tab-characters) at the right end of the line are removed. There is no way of getting around that removal of spaces at line-ends, not even with switching to verbatim-mode. (Switching to verbatim-mode implies temporarily changing the catcode-régime which in turn affects tokenizing which in turn takes place after the line of .tex-input is pre-processed.)

    3. A character is appended at the right end of the line whose codepoint in TeX's internal character representation scheme equals the value of the integer parameter \endlinechar.
      If \endlinechar has a value outside the range of codepoints available in the TeX-engine's internal character representation scheme, then no character is appended at the right end of the line.
      Usually the value of \endlinechar is 13 which denotes the carriage-return-character. Usually the category code of the carriage-return-character is 5 (end of line) which means that TeX behaves as follows when encountering it during tokenization:

      If TeX is gathering the first character of the name of a control sequence token, TeX will insert into the token stream a control-symbol-token whose name is formed by the carriage-return-character, so-to-say a "control-return".

      If TeX is not gathering the first character of the name of a control sequence token, TeX drops the remaining characters of the line and in case the reading-apparatus is in state S (skipping blanks) does not append any token to the token-stream, in case the reading apparatus is in state M (middle of line) does append a space token (character code 32, category 10(space)) to the token-stream, in case the reading apparatus is in state N (new line) does append the control-word-token \par to the token-stream no matter what the current meaning of \par is.
      That's why an empty line usually yields \par: like any line, the empty line gets the endline-character appended, which usually is the carriage-return-character. When TeX encounters that carriage-return-character, no token has yet been produced from characters of that line, so the reading-apparatus is still in state N when it meets a character of category 5 (end of line). Hence TeX appends the control-word-token \par to the token-stream.

  • When TeX writes character-tokens to file, depending on the underlying TeX-engine in use (traditional (pdf)TeX/XeTeX/LuaTeX) and depending on the settings for character-translation (those .tcx-files where you can specify what characters to write in ^^-notation) character-translation takes place so that with some TeX-engines the carriage-return-character, which is considered somewhat special, is written in ^^-notation as ^^M and with other engines the carriage return character is written as the corresponding ASCII-byte/utf8-byte-sequence.

  • When TeX writes tokens to file or screen, explicit character tokens whose character code equals the number of the integer-parameter \newlinechar are not written but are taken for directives for continuing writing at the begin of another line. Usually \newlinechar has the value 10 which denotes the line-feed-character (codepoint 10 both in ASCII and in unicode; ^^J in ^^ notation; J is the 10th letter in the latin alphabet).

  • When LaTeX switches to verbatim-catcode-régime, also with +v-argument-type, the category-code of the horizontal-tab-character (codepoint 9 in ASCII and in unicode, ^^I in TeX's ^^-notation, I being the 9th letter in the latin alphabet) is left unchanged. I.e., in verbatim-catcode-régime the category code of the horizontal-tab-character is 10(space), which implies that horizontal-tab-characters are tokenized as explicit space tokens (character code 32, category 10(space)) and are therefore written as space-characters rather than as horizontal-tab-characters. This can be resolved by switching the category-code of the horizontal-tab-character to 12(other) when having stuff tokenized under verbatim-catcode-régime for writing to external text file.

  • When LaTeX switches to verbatim-catcode-régime, the carriage-return-character gets category code 12(other) so that in verbatim-mode carriage-return-characters appended to lines of .tex-input due to the \endlinechar-mechanism get tokenized as ordinary character tokens of category 12(other).
    When it comes to writing such ordinary return-character-tokens, depending on the engine in use and the character-translation in effect, they might be written either in ^^-notation as ^^M or as the corresponding ASCII-byte/utf8-byte-sequence.

  • Usually the carriage-return-character within the set of characters that forms a pre-processed line of TeX-input only occurs at the right end, due to the \endlinechar-mechanism. Therefore you can circumvent the need of writing carriage-return-characters completely by saying \newlinechar=\endlinechar/\tex_newlinechar:D=\tex_endlinechar:D when it comes to writing. This in turn implies that you need to know the moment in time when it comes to writing, which is easy when writing takes place immediately/in terms of \immediate/\tex_immediate:D, but which is not that easy when writing is delayed until the output-routine ships out another page.
    But this way at writing-time carriage-return-characters are not written explicitly (be it as ASCII-bytes/utf-8-byte-sequences or in ^^-notation) but they just signal that writing shall be continued at the begin of another line. This way they trigger whatsoever platform-specific action needs to be triggered with your TeX-installation on the computer-platform in use for continuing writing at the begin of another line.

    As projetmbc pointed out, instead of saying \newlinechar=\endlinechar/\tex_newlinechar:D=\tex_endlinechar:D, you may consider replacing in the string all ^^M-characters by ^^J-characters, as this will also preserve proper line-breaking when the string is written to file: something along the lines of \str_replace_all:Nxx \g_my_string { \iow_char:N \^^M } { \iow_char:N \^^J } before writing via some variant of \iow_now:Nn. I suppose that is a better approach because the replacement can be done at any time, so this technique can also be used with delayed writing.

  • A problem with expl3's \iow_now:Nn and variants is that these commands internally via \int_set:Nn ensure that at writing-time \newlinechar denotes the line-feed-character. This makes it hard to have \newlinechar=\endlinechar at writing-time. You could temporarily neutralize \int_set:Nn by redefining it as a macro which just gobbles its arguments, but that would be an ugly hack. I suggest resorting to TeX-primitives \tex_immediate:D \tex_write:D instead.
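
The replacement of ^^M by ^^J described in the list above can be tried in isolation. The following is only a sketch under assumptions: the variable names and the file name demo-output.txt are invented for this example, and the string is built by hand rather than grabbed with a +v argument, so that only the replacement and the writing are exercised.

```latex
\documentclass{article}
\ExplSyntaxOn
\iow_new:N \g_demo_iow
\str_new:N \l_demo_str
\cs_generate_variant:Nn \str_replace_all:Nnn { Nxx }
\cs_generate_variant:Nn \iow_now:Nn { NV }

% Build a string containing a literal carriage return (character code 13).
\str_set:Nx \l_demo_str { first~line \iow_char:N \^^M second~line }
% Turn each carriage return into a line feed (character code 10).
\str_replace_all:Nxx \l_demo_str { \iow_char:N \^^M } { \iow_char:N \^^J }
% \iow_now:Nn treats the line feed as "start a new line" when writing.
\iow_open:Nn \g_demo_iow { demo-output.txt }
\iow_now:NV \g_demo_iow \l_demo_str
\iow_close:N \g_demo_iow
\ExplSyntaxOff
\begin{document}
See demo-output.txt.
\end{document}
```

Since the replacement happens on the stored string, it does not matter when the write is eventually performed, which is the advantage over setting \newlinechar at writing time.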

This is what I might probably do if it must be expl3:

\documentclass{article}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Some margin-adjustments so that the verbatim input fits on the page:
% These adjustments are sloppy and only for this example.
% E.g., parameters for \marginpar are not adjusted as \marginpar
% is not used with this example.
\oddsidemargin=1cm
\textwidth=\paperwidth
\advance\textwidth-2\oddsidemargin
\advance\oddsidemargin-1in
\evensidemargin=\oddsidemargin
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\ExplSyntaxOn

\iow_new:N \g_my_write
\str_new:N \g_my_string

\NewDocumentCommand{\myExactWrite}{}{
  \group_begin:
  % +v-type-argument/verbatim-mode does not do this, so let's
  % turn horizontal tab from space to other and then fetch
  % the +v-argument by calling another macro. (Otherwise
  % horizontal-tabs will be written as space characters.)
  \char_set_catcode_other:N \^^I %
  \tex_newlinechar:D=\tex_endlinechar:D
  % projetmbc suggested replacing all ^^M by ^^J instead.
  % I think that would be a better approach because this can also
  % be done in combination with delayed-writing.
  \iow_open:Nn \g_my_write {my-output.txt}
  \myInnerExactWrite
}

\NewDocumentCommand{\myInnerExactWrite}{+v}{
  \str_set:Nn \g_my_string {#1}
  %\str_show:N \g_my_string
  \exp_args:NnV \use:n {\tex_immediate:D \tex_write:D \g_my_write} \g_my_string
  \group_end:
  \iow_close:N \g_my_write
}

\ExplSyntaxOff

\usepackage{verbatim}

\begin{document}

%%% Level 1
\myExactWrite{
Hello % This is a comment that should not be removed
     I want precisely this spacing, and I want to be allowed to use any text
}

% %% Level 2
\myExactWrite{
Hello % This is a comment that should not be removed
     I want precisely this spacing, and I want to be allowed to use any text, like 1 + (1*2), héhé,
or:

def mycode(my_variable_with_underscore="hey"): # This is a comment (with a single sharp)
    print(my_variable_with_underscore)
or latex macros like $\array{a & b\\c & d}$. And ideally unbalanced {} but I heard it was difficult
(maybe by replacing special tokens like \MYCLOSINGBRACE and \MYOPENINGBRACE in the final string?).
}

\noindent The content of the file is:

\verbatiminput{my-output.txt}

\end{document}


Be aware:

enter image description here

Ulrich Diez
  • 28,770
  • I proposed a small adjustment of the Cicada solution. This gives a small LaTeX3 code that seems to work. – projetmbc Mar 22 '23 at 20:27
  • Awesome! I'm always impressed by the high complexity of LaTeX ^^ It seems so overly complicated, with so many edge cases… Anyway, that's really nice, thanks a lot! – tobiasBora Mar 22 '23 at 21:07
  • 1
    @tobiasBora We can use a simpler code. See me comment after the Cicada solution. – projetmbc Mar 22 '23 at 22:10
  • @projetmbc that's great, thanks a lot! feel free to add it as its own answer, it might help other people later :-) – tobiasBora Mar 22 '23 at 23:19
  • @projetmbc After thinking about it for a while I came to the conclusion that yours (replacing ^^M by ^^J) is the better approach as this can be done any time so that you are not bound to \immediate-writing. Thank you for pointing this out. – Ulrich Diez Mar 23 '23 at 14:57