So far, I have found PHONO33 for DOS, a quite old yet impressive program. It is overly strict and simplistic, though: it applies rules in one fixed order, which can produce weird outcomes xD
It includes models for Pig Latin and Spanish, but I'm not sure how much traction the software actually gained or what other models exist for it. The model format looks decidedly ad hoc and isn't compatible with anything else.
Another tool of this kind is sca², which is a rather simple playground that requires manually juggling uncoordinated lexicons and rule sets.
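To make the "fixed order" limitation concrete, here is a minimal sketch in Python of the model both tools share. The regex-based rule notation is invented for illustration and is not either program's actual syntax:

```python
import re

# A minimal sketch of the strict-order model these tools use (the regex
# rule notation is invented for illustration, not either tool's actual
# syntax): every rule rewrites every word, in one fixed sequence.
RULES = [
    (r"(?<=[aeiou])t(?=[aeiou])", "d"),  # t > d / V_V (intervocalic voicing)
    (r"(?<=[aeiou])d(?=[aeiou])", "ð"),  # d > ð / V_V (spirantization)
]

def apply_rules(word, rules):
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

# The outcome depends entirely on rule order: the first rule feeds the
# second when it runs before it, but not when the order is reversed.
print(apply_rules("ata", RULES))        # -> 'aða'
print(apply_rules("ata", RULES[::-1]))  # -> 'ada'
```

There is no way in this model to say "we don't know which order these two changes happened in", which is exactly the rigidity I complain about below.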
And finally, there is the DiaDM project, which I found here on Linguistics SE. It's probably the closest to what I'm looking for: a large database with flexible tooling, oriented toward collaboration among many contributors and aiming to collect data on many language families. Unfortunately, the project seems to have stalled despite existing for more than a decade, the datasets are far from comprehensive, and the UI is quite rough and still uses deprecated Flash in places. Editor access also appears to be restricted, and all the data sit in a closed database with no export tools, which means they are endangered and may vanish if something happens to the website.
Am I missing any other decent projects?
As mentioned, I have a vision of an ideal system for that purpose, which would be characterized by:
- Open source and open data formats, so that data can be reused by, or imported from, other projects. Do any de facto standard formats exist in this area? I'm not even sure there is an agreed-upon format for expressing phonological rules in a machine-readable way.
- High flexibility: describing uncertain rule ordering, uncertain conditions under which rules apply, uncertain word origins and borrowings, and so on. Wherever possible, multiple paths and variants should be representable. The system should be able to take actually attested word changes, check whether they fit the formal rules, expose the flaws in current theories, and assist in coming up with better ones (see the sketch after this list for what such a check could look like).
- Metadata on everything: authors, presumed date ranges, source works, assumed certainty, hypothesized relations, and so on.
- Suitability both for expressing and systematizing the existing body of knowledge on historical linguistics and for new research. Most of the simpler tools I've seen focus on the latter, while existing knowledge on sound shifts is mostly recorded in prose pleasant to humanities scholars, written before prominent scientists like Zaliznyak and Chomsky turned linguistics into an exact science. I'd like to generate visual charts from the existing data at a glance, to impress mere mortals with how deeply related our seemingly different languages are.
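To illustrate the kind of machine-readable rule format and uncertainty-aware checking I have in mind, here is a toy Python sketch. The schema, the field names, and the brute-force ordering check are all hypothetical, purely to make the idea concrete:

```python
import re
from dataclasses import dataclass, field
from itertools import permutations

# A hypothetical schema for a rule entry carrying the metadata listed
# above; no standard format like this exists yet, as far as I know.
@dataclass
class SoundChange:
    pattern: str       # regex over a phonemic transcription
    replacement: str
    date_range: tuple  # presumed (earliest, latest) year
    certainty: float   # 0..1, the editor's stated confidence
    sources: list = field(default_factory=list)

    def apply(self, word: str) -> str:
        return re.sub(self.pattern, self.replacement, word)

def derivable(proto, attested, changes):
    """Return every rule ordering that takes proto to attested.

    Brute force over permutations, so only sensible for tiny rule
    sets; a real system would prune using the date ranges instead."""
    fits = []
    for order in permutations(changes):
        word = proto
        for change in order:
            word = change.apply(word)
        if word == attested:
            fits.append(order)
    return fits

lenition = SoundChange(r"(?<=[aeiou])t(?=[aeiou])", "d",
                       date_range=(400, 700), certainty=0.9)
spirantization = SoundChange(r"(?<=[aeiou])d(?=[aeiou])", "ð",
                             date_range=(600, 900), certainty=0.6)

# 'ata' > 'ada' is only derivable if spirantization precedes lenition;
# an engine could flag that as evidence for one ordering over the other.
print(len(derivable("ata", "ada", [lenition, spirantization])))  # -> 1
```

The point is not this particular implementation but the workflow: attested forms become test cases against the formal rules, and failures point at which rule, ordering, or dating assumption needs revision.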
I believe such a system could revolutionize computational historical linguistics, much as tools like Wolfram Mathematica/Alpha and MATLAB revolutionized computational science. How close is the existing tooling to that?