5

My first exposure to LaTeX was back in 2014, from this tutorial, and I settled on the pdfTeX engine because it was the only thing I knew and I was a total novice.
Lately, I started reading Tobias Oetiker's The Not So Short Introduction To LaTeX from CTAN and realized that I had been unaware of so many things, such as the polyglossia package as well as XeTeX and LuaTeX.

I am very curious about these two new engines and started browsing this site to learn more. From my understanding, both natively support UTF-8 encoding, but I usually typeset documents in English and in Malagasy, so ASCII characters are more than enough for my everyday use. Apart from that, I also plan to use LaTeX to typeset my thesis (in English) in my final year.

With all that context, I am curious to know whether there is a reason for me to start using either of these two engines, or whether pdfTeX is amply sufficient for my intended use and I should just stick with it.

  • 2
    An old post, but relevant: https://tex.stackexchange.com/q/70 – Thérèse Jun 08 '20 at 20:00
  • And if you care about the size of PDF files, OpenType fonts, especially the PostScript-flavored ones, produce smaller files. – Thérèse Jun 08 '20 at 20:01
  • I don't think I will need to fiddle with fonts as I am satisfied with the default one. Yet, I don't get what you mean by PostScript-flavored fonts producing smaller PDF file size. – billyandriam Jun 08 '20 at 20:08
  • Some OpenType fonts are PostScript-flavored, others TrueType-flavored. Has to do with the kind of outline used (cubic or quadratic). – Thérèse Jun 08 '20 at 20:12
  • 6
    for xetex it's just about the fonts really, whether you want to use the same system fonts as you would use in a browser or word processor etc. For luatex there is that plus whether you want to use the embedded Lua scripting. No one can tell you the answer to either really. – David Carlisle Jun 08 '20 at 20:30
  • @DavidCarlisle I do not plan to switch fonts for now, but your clarification is valuable. After that, should I infer that using luatex without any knowledge of Lua is comparable to using xetex in the end. Or? – billyandriam Jun 08 '20 at 20:41
  • well we (latex maintainers) try to make latex run as far as possible in a compatible way on luatex/xetex/pdftex so hopefully from a user point of view yes if you ignore lua then xelatex and lualatex are broadly equivalent although the implementation details are quite different. – David Carlisle Jun 08 '20 at 20:43
  • 2
    If you are merely writing documents in only the English language and there are only very few or no accented characters at all and you are happy with the default font, then there is no real reason to use LuaTeX or XeTeX. – Henri Menke Jun 08 '20 at 22:38
  • 2
    For your goals (default font, English), switching to xelatex/lualatex could in practice mean avoiding some errors caused by copying and pasting UTF-8 characters, even when you see only plain text. In theory, this simple document: \documentclass{article}\begin{document}a​b\end{document} should compile with pdflatex, but in practice it will produce a nice fatal error because the text is not "ab" but "a+<U+200B>+b". On the other hand, switching also means being prepared for some minor non-equivalences (e.g. some features of the microtype package). – Fran Jun 09 '20 at 03:13
  • @Fran That's interesting! Is it possible to declutter the file and get rid of these lurking things? Or should I just read the error messages and delete the U+200Bs one by one? – billyandriam Jun 10 '20 at 13:43
  • David Carlisle mentioned Lua scripting---if you do scientific plotting in your work, scripting is a huge advantage of LuaLaTeX. Two very useful articles on that aspect are this one by Montijano et al. and this one by Menke. – John Jun 11 '20 at 00:57
  • 1
    Your follow-up question about U+200B has been asked several times over the years, but it’s never gotten an answer I liked. So, I finally wrote my own. – Davislor Jun 11 '20 at 09:47

3 Answers

7

Some of the advantages include:

  • Being able to use unicode-math and copy, paste and search for math symbols
  • Not being limited to sixteen math alphabets
  • Not having to juggle 8-bit, or even 7-bit, text encodings
  • Being able to type symbols into your source code and have them work without a lot of set-up to declare them active
  • You can use any font on your machine without a complicated conversion to Type 1 format
  • Certain LaTeX3 interfaces only function properly if the engine supports Unicode natively
  • Even in English, you will often use non-ASCII characters, such as opening and closing curly quotes, dashes, ligatures and the occasional accent. You could theoretically make these copyable and searchable in pdfLaTeX with the mmap or cmap package. But I never see anyone do that, and I’ve frequently seen papers with typos like “di cult” because someone used a font with no ffi ligature.
  • You can use the extensions of the engines, such as XDV output (useful for document conversion) and Lua scripting.
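
As a rough illustration of several points in the list above, here is a minimal sketch of a document that relies on a Unicode engine. It assumes compilation with lualatex or xelatex and the default Latin Modern fonts that fontspec and unicode-math select; the sample sentence and formula are made up for the example.

```latex
% Compile with lualatex or xelatex; fontspec does not work under pdflatex.
\documentclass{article}
\usepackage{fontspec}      % OpenType font loading (default: Latin Modern)
\usepackage{unicode-math}  % OpenType math font (default: Latin Modern Math)
\begin{document}
% Non-ASCII text typed directly, with no inputenc/fontenc set-up.
“Naïve” readers find ligatures difficult.

% Unicode math symbols typed directly in the source.
\[ ∀ x ∈ ℝ,\quad x^2 ≥ 0 \]
\end{document}
```

Because the glyphs in the PDF carry their Unicode code points, the quotes, the accent and the math symbols should come out intact when copied or searched.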

A major application of this is accessibility. If a screen reader can identify a symbol, it can pronounce it for a visually-impaired user, and the text can also be converted to another format.
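
For comparison, the cmap route mentioned in the list above might look like the following under pdfLaTeX. This is only a sketch: the choice of T1 encoding and the lmodern fonts is an assumption, and the sample sentence is invented.

```latex
% Compile with pdflatex.
\documentclass{article}
\usepackage{cmap}         % load before fontenc so its mapping tables are used
\usepackage[T1]{fontenc}  % 8-bit T1 encoding with accented glyphs
\usepackage{lmodern}      % Type 1 Latin Modern fonts in T1
\begin{document}
A naïve but difficult office visit.
\end{document}
```

Copying the sentence out of the resulting PDF is a quick way to check whether the accent and the ligatures survive text extraction.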

Davislor
  • I think the fact that Unicode doesn't support super- and subscripts severely limits the usefulness of “being able to copy, paste and search for math symbols”. Also, there is no 7-bit TeX engine in use in the wild anymore, at least to my knowledge. – Henri Menke Jun 08 '20 at 22:34
  • @HenriMenke But there are still many documents using OT1/OMS. You can use realscripts or something like that to get many Unicode superscripts, but a major application of being readable as Unicode is accessibility. If a reader can recognize a symbol, it can pronounce it for a visually-impaired user. – Davislor Jun 08 '20 at 22:50
  • Unicode only supports very few “inferiors” and “superiors” (most fonts only have numbers), so complex super- and subscripts, like G_{\mathbb{R}} will be mapped in the same way as G \mathbb{R}. – Henri Menke Jun 08 '20 at 22:59
  • @HenriMenke This is true. For many purposes, that’s still beneficial. A PDF-to-speech app would pronounce that something like “G-Reals,” which a visually-impaired student could probably interpret in context. – Davislor Jun 09 '20 at 00:56
  • @Davislor -- "G-Reals" sounds like a very good idea. Do you know of any application(s) where the work has been done to make this a reality? – barbara beeton Jun 09 '20 at 20:02
  • @barbarabeeton I’d have guessed you’d know more about it than me! – Davislor Jun 09 '20 at 22:35
  • @Davislor -- Oh. I definitely have (sometimes great) gaps in my knowledge. One such gap is in tools and facilities that make (La)TeX accessible to visually impaired (potential) users. I have the impression that most mathematicians are simply oblivious to the fact that there may be a problem; I'm giving them the benefit of the doubt by not accusing them of not caring. – barbara beeton Jun 09 '20 at 22:58
  • @barbarabeeton Based on some very superficial research, an accessible PDF document is supposed to enclose all math equations in a FORMULA tag and provide ALT text for screen readers. Apparently, JAWS and some other readers are able to read MathML formulas. It seems like it should be possible, in principle, to generate MathML tags automatically from LaTeX expressions. – Davislor Jun 09 '20 at 23:18
  • @Davislor -- I do know about MathML, but the problem is that many math symbols have rather different meanings in different math areas. Quite a few times I have approached different mathematicians to ask about the possibility of creating a glossary of symbol+field+meaning. Invariably, the answer was a resounding "no!" This is not encouraging. Without that knowledge, MathML is at a disadvantage. I'm not surprised that blind math students prefer writing math as LaTeX -- the vocabulary and syntax are quite close to what their professors are speaking ... by design. – barbara beeton Jun 09 '20 at 23:22
  • @Davislor -- I'll try to look into JAWS. Thanks. – barbara beeton Jun 09 '20 at 23:24
  • @Davislor You have raised a fascinating point on the importance of symbol support and making documents for visually-impaired users. Nonetheless, I still would like to declutter my file from lurking useless symbols (such as Zero-width Spaces U+200B which @Fran has raised in the comment above). Is there any method to remove these things? – billyandriam Jun 10 '20 at 13:41
2

To answer how to get rid of the U+200B character when using pdflatex:

The 'ZERO WIDTH SPACE' (U+200B), as the name suggests, is a space with no width, but you can tell that the character is there because you need to press the cursor key twice to move to the next/previous character.

This causes problems because pdflatex does not know what to do with that character, unlike xelatex and lualatex.

To clean it up, you can use any text tool able to search for and replace this character throughout the document. Just as an example, TeXworks or Gummi on Linux let you type the character with:

Ctrl+Shift+U, 200B, Enter

Then you can copy and paste it into the search tool and replace it with nothing, or with some other character to see where it was. If you have a problem with this, another solution is to tell pdflatex what to do. Consider this example:

\documentclass{article}
\usepackage{xcolor}
\DeclareUnicodeCharacter{200B}{ \colorbox{yellow}{\sffamily\bfseries u+200B}
\typeout{}\typeout{WARNING: Bad character U+200B in the line \the\inputlineno}\typeout{}}
\begin{document}
a​b

cd

e​f

asasa
\end{document}

This will show these warnings in the log file:

WARNING: Bad character U+200B in the line 6

WARNING: Bad character U+200B in the line 10

The PDF will also show where they are:

[Image: the compiled example, with each U+200B marked by a yellow box]

But it is probably better to treat it as what it really is, a zero-width space, and forget about it:

\DeclareUnicodeCharacter{200B}{\hspace{0pt}} 
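
If the same source might later be compiled with xelatex or lualatex, one possible variation is to guard the declaration so it only runs under pdflatex. This is a sketch, assuming the iftex package is available; the guard is there because \DeclareUnicodeCharacter is not needed, and may not be defined, under the Unicode engines.

```latex
\documentclass{article}
\usepackage{iftex}  % provides \ifPDFTeX, \ifXeTeX, \ifLuaTeX
\ifPDFTeX
  % pdflatex only: map U+200B to an invisible, zero-width break point.
  \DeclareUnicodeCharacter{200B}{\hspace{0pt}}
\fi
\begin{document}
% Paste text that may contain U+200B here.
ab
\end{document}
```
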
Fran
0

This is an already closed topic, but I want to add a practical example. Several years ago, I wanted to use LaTeX to write a CV with the moderncv class, and I added some hacks to improve the machine readability of the document for Applicant Tracking System software, which strips all your formatting and keeps only the text. The ligatures from LaTeX sometimes break this conversion, and I was using some icons in my CV that were also creating issues. Switching from pdflatex to lualatex solved many of these issues. Most of this is documented in this old post: How to improve machine-readability of a CV created in LaTeX with moderncv?
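
One way the ligature problem can be approached under lualatex is to switch off the common ligatures through fontspec. The sketch below is not necessarily what the linked post does; the font name and the sample sentence are assumptions made for illustration.

```latex
% Compile with lualatex.
\documentclass{article}
\usepackage{fontspec}
% Assumption: Latin Modern Roman is installed (it ships with TeX Live).
% Ligatures=NoCommon switches off the common fi/fl/ffi-type ligatures.
\setmainfont{Latin Modern Roman}[Ligatures=NoCommon]
\begin{document}
Certified in office and traffic workflows.
\end{document}
```

With the ligatures off, each letter is a separate glyph in the PDF, which tends to make plain-text extraction more predictable at the cost of slightly less polished typography.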

phollox