2

When using chars (lists of characters, thus atoms of length one) to represent text, we have the following options for writing them within terms:

  • "First," the double quoted list notation (6.3.7) is the most efficient one, requiring at least n+2 characters. But it can only be read back if the Prolog flag double_quotes is set to chars.

  • ['N',e,x,t,','] comes the list notation with at least 2n+1 characters. While it is nice and relatively compact, it implies that also operators are used when writing other data since it is enabled with ignore_ops(false), and this necessitates that the same operators will be present when reading, making it quite brittle.

  • '.'('L','.'(a,'.'(s,'.'(t,'.'(',',[]))))) the canonical notation which uses functional form also for lists requiring at least 7n+2 characters. That is a lot, but for interoperability (and that includes interoperability with the same system) it is best since it neither depends on the double_quotes flag nor the various operator declarations.

Writing chars in canonical notation can be done in constant space. But for reading, the situation is a bit more tricky. After all, a sequence starting with '.'(a, may also refer to a term '.'(a,Further,b). So a naive reading will have to wait (and use space) until the entire list of chars is read in. On the other hand, it seems to be a safe bet that '.'(a, will be a list constructor '.'(a,Further). In other words,

How to read a term in canonical notation with constant auxiliary space for the reading of chars within?

In case it helps just consider terms sampleterm/1. So consider the reading of all such terms written in canonical form. And, if you like, formulate it as a DCG.

sampleterm([]).
sampleterm(a).
sampleterm(b).
sampleterm('.'(E,Es)) :- % the real list constructor 
   sampleterm(E),
   sampleterm(Es).
sampleterm('.'(E,F,G)) :- % no list constructor
   sampleterm(E),
   sampleterm(F),
   sampleterm(G).

If such space efficient reading is possible, then systems that support a compact internal representation of chars like Scryer and Trealla could even go a tiny step further.

Ah, lest I forget what I have tried: read/1 indeed, but currently it was not ideal.

false
  • 10,533
  • 12
  • 98
  • 192
  • just for clarity: do strings in canonical notation have to be lists or can they be binary trees? I assume former but in your example `sampleterm/1` (the real list constructor) would accept trees as well. – DuDa Dec 06 '21 at 13:55
  • @DuDa: Note that I avoid to use the word "string" as it is so ambiguous. Focus is on lists of characters and their efficient reading (within general reading which also may include `'.'/3`). – false Dec 06 '21 at 15:42
  • 1
    Could you add an example to illustrate the perceived problem with `[nice, list, syntax]` and operators? – Isabelle Newbie Dec 06 '21 at 19:31
  • @IsabelleNewbie: See above, it's the option `ignore_ops(false)` which enables not only `[nice, list, syntax]` but also the current operators. – false Dec 07 '21 at 11:03
  • @false: Your `sampleterm` program throws an [instantiation_error in SWI](https://swish.swi-prolog.org/p/space-efficient-reading-of-chars-in-canonical-form.pl) – gusbro Dec 09 '21 at 14:15
  • @gusbro: This is about [tag:iso-prolog]-systems, SWI [no longer conforms](https://www.complang.tuwien.ac.at/ulrich/iso-prolog/SWI7_and_ISO). – false Dec 09 '21 at 18:38

0 Answers0