8

I was thinking of something of the sort:

  1. Build a program (call it the fake user) that generates lots and lots of data about how another program (call it the target) behaves, by recording stimuli and responses. For example, if the target is Minesweeper, the fake user would play the game a Carl Sagan number of times, and would also try clicking every button in all sorts of different situations, etc.

  2. Run a machine learning program (call it the copier) designed to evolve code that behaves as similarly as possible to the target.

  3. Kablam, you have a "sufficiently nice" open source copy of the target.
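To make step 1 concrete, here is a minimal sketch of a fake user, assuming a toy stand-in for the target: a fixed 3x3 minesweeper board whose mine positions are hidden inside `target`. The names `target`, `fake_user`, `_MINES` and `SIZE` are made up for illustration; a real target would be driven through its GUI or API instead.

```python
import itertools

# Hypothetical stand-in for the closed-source target: a fixed 3x3 board
# whose mine positions the fake user never reads directly.
_MINES = {(0, 0), (2, 1)}
SIZE = 3


def target(click):
    """Black box: return "boom" on a mine, else the number of adjacent mines."""
    if click in _MINES:
        return "boom"
    r, c = click
    return sum(
        (r + dr, c + dc) in _MINES
        for dr, dc in itertools.product((-1, 0, 1), repeat=2)
        if (dr, dc) != (0, 0)
    )


def fake_user():
    """Click every cell once and log (stimulus, response) pairs."""
    cells = itertools.product(range(SIZE), repeat=2)
    return [(cell, target(cell)) for cell in cells]


log = fake_user()
for stimulus, response in log:
    print(stimulus, "->", response)
```

The resulting `log` is the kind of data set the copier in step 2 would learn from. On a real program the space of situations is far too large to enumerate like this, so the fake user would have to sample it.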

Is this possible?

Is something else possible to achieve the same result, namely, to obtain a "sufficiently nice" open source copy of the original target program?

nbro
IpsumPanEst
  • Perhaps I should add a disclaimer: I am not a computer scientist, but I'm learning to code (Python). I have an M.Sc. in math. – IpsumPanEst Jan 14 '19 at 13:18

3 Answers

2

Remarkably, more or less the scenario you describe is not only feasible but has already been demonstrated (detailed explanation and fascinating videos at link).

However, the fidelity of the copy is currently quite limited.

So for now, your copy will be quite low quality. However, there is a big exception to this rule: if the software you are copying is itself based on machine learning, then you can probably make a high-quality copy quite cheaply and easily, as my co-authors and I explain in this short article.
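To illustrate why an ML-based target can be cheap to copy, here is a minimal sketch under a strong assumption: the service is a plain linear model that returns its raw numeric output. This is not the method from the article mentioned above, just the simplest case. The names `make_black_box` and `extract_linear` and the hidden parameters are invented for the example.

```python
def make_black_box():
    """Hypothetical ML service: a linear model with hidden parameters."""
    weights, bias = [2.0, -1.0, 0.5], 3.0

    def predict(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    return predict


def extract_linear(predict, n_features):
    """Recover bias and weights using n_features + 1 queries."""
    zero = [0.0] * n_features
    bias = predict(zero)  # f(0) = b
    weights = []
    for i in range(n_features):
        unit = zero.copy()
        unit[i] = 1.0
        weights.append(predict(unit) - bias)  # f(e_i) - b = w_i
    return weights, bias


service = make_black_box()
stolen_w, stolen_b = extract_linear(service, 3)
print(stolen_w, stolen_b)
```

Four queries are enough here because the model has only four unknowns. For a nonlinear model one would instead train a surrogate on many query/response pairs, which costs more queries but follows the same idea.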

Interesting question and I'm quite sure that the correct answer will change rapidly over the next few years.

Edward Dixon
  • You have no idea how much this is EXACTLY what I wanted to hear (or read). – IpsumPanEst Jan 14 '19 at 13:14
  • Are there evil capitalist responses to this? As in, trying to make this illegal or something... – IpsumPanEst Jan 14 '19 at 13:18
  • Also, how much power (computing, physical, intellectual) does this require at the moment? – IpsumPanEst Jan 14 '19 at 13:31
  • @IpsumPanEst Actually, reverse-engineering is a perfectly respectable free market idea that helps to keep our market nice and dynamic. Definitely something that terms of service may try to restrict for obvious reasons, but reverse engineering is an important part of keeping markets competitive. – Edward Dixon Jan 15 '19 at 14:13
  • @IpsumPanEst Power/compute requirements are really modest for copying machine learning services, but very immodest (probably not practical) for copying conventional software (and, the fidelity will be low), but this is a very active area of research & you should expect non-linear progress. – Edward Dixon Jan 15 '19 at 14:13
  • Dear @Edward, I recommend that you read Max Tegmark's book Life 3.0, because it has some excellent thought experiments about how AIs could create and market their "home-made" videos, TV shows and series, and even insert subliminal messages that could in theory affect people's judgment when it comes to elections or choices about limiting AI's influence, etc. Read the book! https://en.wikipedia.org/wiki/Life_3.0 – Don King Jan 15 '19 at 18:35
  • Dear @EdwardDixon, if this praxis (automating reverse engineering) actually helps big capital and not the people, I'm not gonna be a part of it. – IpsumPanEst Jan 16 '19 at 11:55
  • @IpsumPanEst if you think about the second-order effects of reverse engineering, it increases competition (think scrappy startup learning from the strengths and weaknesses of an incumbent's products) => downward pressure on big company margins => increased consumer surplus for the little guy. – Edward Dixon Jan 16 '19 at 12:26
  • I was thinking of something closer to: make all software free and accessible to all people, not just so small startups can be competitive. To make charging for software an obsolete anti-praxis. – IpsumPanEst Jan 16 '19 at 12:49
  • Politics aside, the implications for imago-mundi creation in sentient beings' minds are absolutely awesome for philosophy of mind. We might learn a lot about consciousness through ML and your program. – IpsumPanEst Jan 16 '19 at 13:11
1

This is the proposed way to reverse engineer software using AI.

  • Program fake_user operates program target_prog in diverse ways to generate a huge and comprehensive data set.
  • The parameters of an artificial neural network are trained to produce, within specified accuracy and reliability criteria, a behavioral equivalent of target_prog.
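The two steps above can be sketched end to end on a toy case. Everything here is an assumption made for illustration: `target_prog` is a three-input majority function, the "network" is a single perceptron, and the accuracy criterion is checked by comparing the copy with the target on every possible input.

```python
import itertools
import random


def target_prog(bits):
    """Hypothetical black box: 1 when at least two of three inputs are on."""
    return int(sum(bits) >= 2)


def fake_user(n_samples, seed=0):
    """Operate target_prog on random inputs and record (input, output) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        bits = tuple(rng.randint(0, 1) for _ in range(3))
        data.append((bits, target_prog(bits)))
    return data


def train_copy(data, max_epochs=1000):
    """Fit a single perceptron; stop once an epoch makes no mistakes."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            pred = int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
            if pred != y:
                mistakes += 1
                w = [wi + (y - pred) * xi for wi, xi in zip(w, x)]
                b += y - pred
        if mistakes == 0:
            break
    # The returned function is the behavioral copy of target_prog.
    return lambda x: int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


copy_prog = train_copy(fake_user(200))
all_inputs = list(itertools.product((0, 1), repeat=3))
fidelity = sum(copy_prog(x) == target_prog(x) for x in all_inputs) / len(all_inputs)
print("fidelity:", fidelity)
```

Note that what comes out is a set of learned parameters, not readable source code, and the exhaustive fidelity check is only possible because this toy target has eight inputs.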

Not only is this possible, but it is becoming standard practice for AI projects other than reverse engineering games.

There are caveats.

  • Program target_prog may be of sufficient complexity to exceed the capabilities of existing network designs and convergence techniques.
  • The project may lack access to funds and computing resources to complete the generation and training required to achieve reasonable accuracy, with sufficient reliability, in the time allotted.
  • The expertise of those involved may not be sufficient to produce satisfactory results.
  • Although the source code is not copied and the parameter state achieved through learning merely contains equivalent functionality, there is no guarantee that civil liability will not result. Copyright law in one or more jurisdictions may be interpreted as a protection against this kind of copying even though the text of the source code was not copied verbatim.
Douglas Daseeco
0

A recent paper from January 2023 does this too: "an algorithm that synthesizes the source code of simple 2D video games from a small amount of observed video data" https://www.basis.ai/blog/autumn/

Original research article: https://dl.acm.org/doi/10.1145/3571249
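To give a feel for what "synthesizing source code from observations" can mean, here is a toy sketch. It is not the algorithm from the paper: it assumes a single object whose x-position is observed frame by frame, a known grid width, and a tiny hand-written list of candidate rules. The names `frames`, `WIDTH`, `CANDIDATES` and `synthesize` are all made up.

```python
# Hypothetical observation: x-position of one object over consecutive frames.
frames = [0, 1, 2, 3, 0, 1, 2]
WIDTH = 4  # assumed known grid width

# Tiny hand-written space of candidate update rules, kept as source text.
CANDIDATES = [
    "x + 1",
    "x - 1",
    "x",
    "(x + 1) % WIDTH",
    "(x - 1) % WIDTH",
    "min(x + 1, WIDTH - 1)",
]


def synthesize(observed):
    """Return every candidate rule that reproduces all observed transitions."""
    consistent = []
    for src in CANDIDATES:
        # eval is used only on the fixed strings above, never on outside input.
        rule = eval("lambda x: " + src, {"WIDTH": WIDTH, "min": min})
        if all(rule(a) == b for a, b in zip(observed, observed[1:])):
            consistent.append(src)
    return consistent


rules = synthesize(frames)
print(rules)
```

Unlike a trained network, the result is a human-readable rule, which is closer to the "open source copy" the question asks about. The hard part in practice is searching a much larger space of candidate programs.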

MarioT