
Looks like the stringstrings package can do this:

\substring{This is a string.}{1}{4}

gives This, but it appears to fail for non-ASCII characters:

\substring{这是一个句子。}{1}{4}

does not give 这是一个.

Is there a similar package that works with Chinese characters?

Mico
oaheix
  • Please tell us which TeX engine (pdfLaTeX, XeLaTeX, LuaLaTeX, something else?) you employ and which language-related packages (if any) you load. – Mico May 16 '18 at 11:41
  • XeLaTeX, with the xeCJK, xunicode, and fontspec packages. Thanks. – oaheix May 16 '18 at 11:43
  • I think this happens because there are no spaces between Chinese characters, so the command treats the whole string as a single word. – gvgramazio May 16 '18 at 11:46
  • @giusva - More likely, it's because (a) in the utf8 encoding system, Chinese characters take up more than 2 bytes and (b) the stringstrings package is not sufficiently utf8-aware. (The package's user guide says that it can handle some 2-byte-encoded characters; however, that's not full utf8-awareness.) – Mico May 16 '18 at 12:10

1 Answer


If you can switch to LuaLaTeX, it's straightforward to create a "wrapper" LaTeX macro that invokes the Lua function unicode.utf8.sub. (unicode.utf8.sub is a utf8-aware version of the standard string.sub function.)


\documentclass{article}
\usepackage{fontspec}
\setmainfont{MingLiU} % or whatever font you prefer
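% \substring{<text>}{<first>}{<last>}: characters <first> to <last> of <text>,
% counted as UTF-8 characters rather than bytes
\newcommand\substring[3]{\directlua{%
   tex.sprint( unicode.utf8.sub ( "#1", #2 , #3 ) ) }}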
\begin{document}
这是一个句子。

\substring{这是一个句子。}{1}{4}
\end{document}
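
To see why a utf8-aware function is needed here, the following minimal sketch compares the byte count and the character count of the same sentence under LuaLaTeX. It assumes that the unicode.utf8 library (including unicode.utf8.len) is available in your LuaTeX version and that the font from the example above is installed; treat it as an illustration rather than a definitive test.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{MingLiU} % assumption: same font as in the example above
\begin{document}
% string.len counts bytes; each of these characters takes 3 bytes in UTF-8
Bytes: \directlua{tex.sprint( string.len("这是一个句子。") )}

% unicode.utf8.len counts characters instead
Characters: \directlua{tex.sprint( unicode.utf8.len("这是一个句子。") )}
\end{document}
```

The first line should report 21 and the second 7, which is why a byte-oriented substring with indices 1 to 4 cuts through the middle of the second character.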
Mico
  • Thanks very much, but my case depends heavily on the xeCJK package, which does not work with LuaLaTeX. If no XeLaTeX-based solution turns up in the near future, I will mark your answer as accepted. – oaheix May 17 '18 at 00:29
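
For the XeLaTeX case raised in this comment, one possible approach (not part of the answer above) is the xstring package. The sketch below assumes that \StrMid operates on tokens, and that XeLaTeX reads each Chinese character as a single token, so positions 1 to 4 select the first four characters. The font name is only a placeholder taken from the answer; substitute any CJK font installed on your system.

```latex
\documentclass{article}
\usepackage{xeCJK}
\setCJKmainfont{MingLiU} % assumption: any installed CJK font will do
\usepackage{xstring}
\begin{document}
这是一个句子。

% \StrMid{<text>}{<first>}{<last>} extracts positions <first> to <last>
\StrMid{这是一个句子。}{1}{4}
\end{document}
```

If the assumption holds, the second line should typeset 这是一个. This has not been checked against every xeCJK configuration, so test it with your own preamble first.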