
Machine learning combined with artificial intelligence looks very promising for getting machines to understand various human languages, including English, and to translate between them.

I'm no expert in state-of-the-art machine learning algorithms or anything like that; all I know is that some kind of statistical model is used in the process, and that the current state-of-the-art technology is not yet perfect.

So, my question is whether machine learning could still use a better linguistic theory to make further breakthroughs that would eventually allow A.I. to perfect its language acquisition and translation expertise, or whether all that needs improving is computing power and the sophistication of the statistical models.

JK2
  • I think this question is hard to answer very accurately since we don't know how tech will develop in the future, though I think your second scenario is far more likely: more computing power, better models and more data will be the way to go, given that the current trend is towards more and more powerful (though, as a result, less and less interpretable) models. – WavesWashSands Mar 02 '18 at 07:01
  • @WavesWashSands As you may well know, understanding 'context' is key to understanding any language and translating one into another. I just wonder how simply a better statistical model combined with more data can make machines understand 'context'. I guess there has been enough input (data) of English, if not other languages, but still machines cannot understand English as well as average native speakers do. – JK2 Mar 02 '18 at 07:14
  • It's an open question whether theory matters, but current sentence representations are provably limited. That said, SE is not a great fit for this type of question. You should rephrase it as something more concrete, e.g. "Are current deep representations of text complete?" or "What are the limitations of..." – Adam Bittlingmayer Mar 02 '18 at 08:39
  • @JK2: I mean, context is the very heart and soul of modern NLP methods, in the guise of word embeddings (a sketch of the idea follows these comments). – WavesWashSands Mar 02 '18 at 13:58
  • More computing power won't bring anything. Theory is definitely needed, but we don't know (yet) how to make stochastic models use it. – Atamiri Mar 02 '18 at 16:47
  • Do you mean a better linguistic theory than machine learning has? Or a better linguistic theory than linguists have? Yes, probably. – Greg Lee Mar 03 '18 at 03:41
  • https://transacl.org/ojs/index.php/tacl/article/view/1202/286 – Adam Bittlingmayer Mar 03 '18 at 08:20
  • To properly understand language you need knowledge of the world. Computers that can really do translation etc. will need some sort of advanced knowledge model. We'll probably have to wait for true AI for that. – melissa_boiko Mar 03 '18 at 08:57
  • @A.M.Bittlingmayer You may be right, but I think I know too little about it to be more concrete. – JK2 Mar 03 '18 at 10:46
  • @WavesWashSands Thanks for your link to word embeddings. But I'm not sure whether word embeddings can incorporate any and all context, and even if they can, I wonder why they have yet to come up with an A.I. that can speak just like an average native speaker. – JK2 Mar 03 '18 at 10:49
  • @GregLee Doesn't machine learning make use of at least some aspects of the linguistic theory of linguists? – JK2 Mar 03 '18 at 10:51
  • In connection with the GPSG theory, I've seen some theoretical results mentioned on the learnability of context-free grammars. I'm not familiar with that literature and doubt it is relevant here. – Greg Lee Mar 03 '18 at 16:41
  • Syntactic theory may not be needed most of the time in NLP; still, there are situations where it comes in handy. Or it would come in handy, if it worked. But mostly it doesn't, so, yes, probably it does need a better theory. – jlawler Mar 03 '18 at 19:13
  • @JK2 I guess that would be because of a combination of several factors: the underlying models not being perfect, the methods for inferring parameters having too much bias or variance, and there not being enough data. – WavesWashSands Mar 04 '18 at 11:33
  • @A.M.Bittlingmayer I understand that your paper suggests that current methods for estimating word embeddings have high variance and are not sufficiently robust, and proposes that researchers provide bootstrap CIs to indicate this variability. I'm not so sure how that bears on the current question though... – WavesWashSands Mar 04 '18 at 11:34
  • Related: https://linguistics.stackexchange.com/questions/26845/why-dont-you-get-back-the-original-text-when-you-use-translation-software-to-tr (my post) – new Q Open Wid Jan 26 '19 at 15:01
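
To make the "word embeddings" mentioned in the comments above concrete, here is a minimal sketch of the distributional idea behind them: a word's vector is built from the contexts it occurs in, so words used in similar contexts end up with similar vectors. The toy corpus, the ±2 window, and the 4-dimensional reduction are arbitrary illustrative choices, not any particular system's settings.

```python
# Toy distributional "embeddings": co-occurrence counts + truncated SVD.
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "a dog and a cat are pets",
]

tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count how often each word pair co-occurs within a +/-2 word window.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[idx[w], idx[sent[j]]] += 1

# Compress the sparse count rows into dense 4-dimensional vectors.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
vectors = U[:, :4] * S[:4]

def similarity(a, b):
    """Cosine similarity between the vectors of two words."""
    va, vb = vectors[idx[a]], vectors[idx[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

# 'cat' and 'dog' share contexts ("the _ sat on ..."), so they tend to
# score higher with each other than with a function word like 'on'.
print(similarity("cat", "dog"))
print(similarity("cat", "on"))
```

Real systems such as word2vec or GloVe learn these vectors from vastly larger corpora, but "context" still enters the model only as co-occurrence statistics, which is the crux of the question of whether that is enough.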

2 Answers


This is a funny question because of 'machine learning' (ML), 'still', and 'better'. Presumably you mean 'machine learning methods in NLP' (Natural Language Processing), because I'm having a hard time thinking of linguistic theory that informs ML uses outside of NLP. 'Still' and 'better' imply that ML currently has a linguistic theory and that it is not sufficient. That is a bit too motivated in one particular direction, so I will simply answer how linguistic theory and NLP implementation are involved with each other.

The latest popular and successful methods for things like [Seq2Seq](https://en.wikipedia.org/wiki/Neural_machine_translation) translation and chat models, such as RNNs or LSTMs, use barely any linguistic knowledge at all (little more than "here's a sequence of characters that might be whitespace-separated 'words'"): no parsing or phrase-structure grammars, no POS (part-of-speech) tags, no anaphora resolution. Some NLP methods may use these linguistic ideas, but they are almost entirely avoided in ML-based methods.
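
As a concrete illustration of how little linguistic structure such models are given, here is a minimal Seq2Seq sketch (PyTorch is my assumption here; this is not the code of any particular production system, and all sizes are arbitrary). The only input is integer token IDs; nothing in the architecture knows about parses, POS tags, or morphology.

```python
# Minimal encoder-decoder ("Seq2Seq") sketch: the model sees only token IDs.
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source purely as a sequence of learned token embeddings.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode conditioned only on the encoder's final hidden state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits over the target vocabulary

# Usage: a batch of 2 "sentences" that are nothing but random token IDs.
model = TinySeq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))   # 2 source sequences, 5 tokens each
tgt = torch.randint(0, 1200, (2, 6))   # 2 target sequences, 6 tokens each
logits = model(src, tgt)               # shape: (2, 6, 1200)
```

Everything such a model "knows" about either language has to be induced from parallel training data into the embedding and LSTM weights.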

Historically, there were attempts to use syntactic parsers, and those did well enough, but these latest statistical methods have been much more accurate.

In the narrower field of speech-to-text (a common stage before NLU, Natural Language Understanding), some phonological theory is used, but it's mostly years of incremental engineering that have produced the high quality you have today.

There's a famous quip by Frederick Jelinek about speech processing:

"Every time I fire a linguist, the performance of the speech recognizer goes up"

This isn't saying that linguistic theory is necessarily useless, just that, perhaps, those trained in language theory aren't as good at implementation as others are?

Anyway, I suspect that once most of the accuracy has been squeezed out of tweaks to the Deep Learning methods mentioned above, there will have to be some use of explicit knowledge of language specifics to make the output 'human'.

For example, current statistical machine translation has trouble with presumably simple things like getting gender right, which is hardly deep linguistic theory.

Douglas Hofstadter has a recent Atlantic article about the shortcomings of purely statistical methods. His simple example about gender translation is:

There’s his car and her car, his towels and her towels, and his library and hers.

which currently translates to

Il y a sa voiture et sa voiture, ses serviettes et ses serviettes, sa bibliothèque et la sienne.

which only gets the French agreement right on the last pair.
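
The reason is simple: French possessive determiners agree with the possessed noun, never with the possessor. A hypothetical toy rule (simplified: it ignores plural 'ses' and the 'son' used before vowel-initial feminine nouns) makes the collapse obvious:

```python
# Simplified rule for French singular possessives ('his'/'her' + noun).
def french_possessive(possessor_gender, possessed_gender):
    # The possessor's gender is deliberately unused: French does not mark it.
    return "sa" if possessed_gender == "f" else "son"

# 'voiture' (car) is feminine, so 'his car' and 'her car' come out identical:
print(french_possessive("m", "f"), "voiture")  # his car -> 'sa voiture'
print(french_possessive("f", "f"), "voiture")  # her car -> 'sa voiture'
```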

Pointing in the other direction, though, is this article about how current methods are pushing up against the limits of what they can do without linguistics. It first shows how little linguistics is actually used, and then goes on to claim that linguistic models may be added to translators soon.

This is all related to the rule-based vs. statistical controversy in AI.

Mitch
  • It would interest me to know whether Hofstadter's example of a defective translation into French is actually misunderstood by native speakers of French. After all, that is the central point, isn't it? The translations don't have to be perfect to be understandable. – Greg Lee Mar 04 '18 at 13:03
  • I share the doubts of Jesse Dunietz about the truth of some current linguistic theories, but would like to point out that linguistic analysis didn't start yesterday. We know lots about the various human languages from a tradition going back thousands of years. (Thorkild Jacobsen gave an example paradigm written down by Babylonian scribes for Sumerian forms -- see the reference below.) – Greg Lee Mar 04 '18 at 13:39
  • (reference for Sumerian paradigm) https://books.google.com/books?id=S4s5MveufJgC&pg=PA63&lpg=PA63&dq=thorkild+jacobsen,+paradigm+sumerian&source=bl&ots=OZCj1k65xV&sig=5T9UOjC-l5wbyR0mkQuYGMmigvo&hl=en&sa=X&ved=0ahUKEwjR85yA4NLZAhWLhlQKHcwWAkQQ6AEISDAF#v=onepage&q=thorkild%20jacobsen%2C%20paradigm%20sumerian&f=false – Greg Lee Mar 04 '18 at 13:41
  • @Greg good point about understandability. But I disagree. Yes, whether a standard French person would translate the same way would be an important indication of truth, but what that person says about their own understanding would only inform a hypothesis about the linguistic theory for French. The central point is not understanding, but simply whether one mechanism is more accurate than the other. If the blind, minimally linguistically informed method is better than one that uses lots of linguistic theory, then so be it. – Mitch Mar 04 '18 at 20:14
  • "Every time I fire a linguist, the performance of the speech recognizer goes up"

    Or Yorick Wilks' version: "There is no theory of language structure so ill-founded that it cannot be the basis for some successful Machine Translation." If I recall correctly, it was inspired by Systran....

    – Nick Nicholas Mar 23 '18 at 14:21

After reading Mitch's answer and some of his references, I think a prior question needs to be addressed: Does machine learning need a theory? In Norvig's essay on Chomsky, I don't see any theory going on at all, much less a linguistic theory. Does Norvig know what a theory is? He talks a lot about statistical modeling, and I think here he uses the term "modeling" appropriately. However, models and theories are two different animals altogether. Models can be deduced (or induced) from the facts, but theories cannot. I suspect most linguists' understanding of these two terms is similar to, or perhaps even derived from, Braithwaite's Scientific Explanation. (You can use the index of this partially online book to sample what Braithwaite says about "model".)

Mitch also mentions Hofstadter's reservations about current translators. I take Hofstadter's point, but I think the problem is straightforward. Humans share many mental abilities with the non-speaking animals -- just not language. When accurate translation from one human language to another hinges on what that non-linguistic animal world is like, translation devices are just going to have to model this part of our mental abilities, too.


Greg Lee
  • I'm pretty sure Norvig has a good idea what 'theory' and 'model' mean. Sure, there are differences between statistical models and theories: extremely oversimplified, a statistical model is a very, very tiny theory. E.g. Newtonian mechanics is a theory, and the path of a cannonball is a statistical model (a mini-theory that may fit within the much bigger theory). A theory helps organize a lot of very specific models. – Mitch Mar 04 '18 at 20:01
  • No, the difference between a model and a theory is not a difference in size, and a theory is not made up of models. A theory has unobservable terms, according to Braithwaite. An organized bunch of models is still just a model, only bigger. (Some will disagree with my characterization.) – Greg Lee Mar 04 '18 at 22:38
  • Whenever a mathematical formula cannot be conceptualized, e.g., due to its complexity, I think you call it a 'model' but not a 'theory'. But whenever a formula, mathematical or not, can be conceptualized, I think you can call it a 'theory'. And once you can call it a 'theory', I don't think you need to also call it a 'model', even if theoretically you can. – JK2 Mar 05 '18 at 03:14