51

This is a hypothetical question, but I am curious. Feel free to ignore it if it seems too theoretical for you.

Some people may know this situation: you implement a feature, and in the end you have added about 10 lines and removed 100, and of those 10 perhaps 5 were merely moved around rather than really added.
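To make the "added, removed, moved" arithmetic concrete, here is a minimal sketch using Python's standard `difflib`. The function name `change_stats` and the sample lines are made up for illustration, and treating "the same line text both removed and added" as a move is an assumption of this sketch, not how any particular tool defines it.

```python
import difflib
from collections import Counter


def change_stats(old_lines, new_lines):
    """Count lines that were really added, really removed, or only moved."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_lines[j1:j2])
    # Assumption: a line whose text appears on both sides of the diff
    # was moved rather than written or deleted.
    moved = sum((Counter(added) & Counter(removed)).values())
    return {
        "added": len(added) - moved,
        "removed": len(removed) - moved,
        "moved": moved,
    }


old = ["setup()", "helper()", "cleanup()", "legacy()"]
new = ["cleanup()", "setup()", "helper()"]
stats = change_stats(old, new)
print(stats)
```

In this toy input nothing new is written at all: one line is moved and one is deleted, yet the resulting file differs from the original.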

Now suppose Alice writes code and releases it with a strong copyleft, e.g., using the GPL. Then Bob contributes to this code and improves its efficiency by removing redundant (but not dead) code and in the end Bob contributed a lot of deletions.

Afterward, Alice wants to re-license the code under a permissive license, e.g., the BSD license. To do so, she asks every contributor to re-license their code, and removes any code from people who cannot be reached or are not willing to re-license.

Let's say Carol is rewriting the missing pieces and licensing them under the BSD license, so there is no conflict with Alice already knowing the removed code.

Now let's assume the problem is that Bob did not re-license his changes, and that his contribution consisted largely of deletions. The deletions made a significant difference, so you would think that copyright should apply. At least for the diff (containing the copyrighted content to be removed) it certainly does.

How does copyright apply to this case to the resulting code (without its commit history) and what can Alice do about the GPLed deletions? You could say she cannot keep them, as then the project stays the same as after the work of Bob. But should she re-add the deleted code?

Even when she does and Carol removes it again, the result is the same and looks just like the code with the GPLed deletions, so it cannot be free of GPL work (i.e. the negative code).

allo
  • 1
    You need to focus on the test for copyright infringement (for the country in which the law of copyright is to be applied). If there "isn't" code in the allegedly infringing copy, then it's not relevant in UK copyright law. – lellis Jul 20 '20 at 23:08
  • 6
    To make the scenario even more twisted: What about code that is removed to be able to facilitate a relicensing? – Wrzlprmft Jul 21 '20 at 08:26
  • 3
    There is quite a lot of copyrightable content that you could generate by having a monkey type on a typewriter for sufficiently long and then removing suitable characters from the output. – Federico Poloni Jul 21 '20 at 08:43
  • The answers to such questions surely depend on the country because the copyright laws differ from country to country:

    One (completely different) example is APIs: in the USA they seem to be protected (the Google v. Oracle case), while German law explicitly says that they are not protected.

    As far as I know the situation in Germany, code changes that are only "straight forward monkey work" do not grant Person B any copyright. However, any "non-obvious" code changes do.

    – Martin Rosenau Jul 21 '20 at 13:00
  • ... It does not matter if you delete or add lines of code. So if Person B did any "non-obvious" deletions, he or she has copyrights on the modified code and you'll have to ask her or him if you can change the license of the code. – Martin Rosenau Jul 21 '20 at 13:03
  • 1
    By the way: Here in Germany there was some Fax program whose screen shots were already drawn (by pencil) by another person before the program was written. Later, a court decided that this program is not copyright protected because the software developer could not explain why implementing a program whose screen shots already exist is not "straight forward monkey work". – Martin Rosenau Jul 21 '20 at 13:06
  • 1
    Nothing wrong with hypothetical questions - probably the vast majority of SE questions are hypothetical! – Vikki Jul 21 '20 at 23:12

5 Answers

44

For me, this exposes a weakness in the mental model many coders seem to have about the operation of copyright.

Consider a pile of bricks, representing code contributions to a work. In one (surprisingly common) model, each brick is painted in a colour representing its licence status; red for BSD, blue for GPL, green for Apache, and so on. Whoever made and placed any given brick can consent to its repainting, but nobody else can, though anyone can remove a brick from the pile. In this model, a pile of Alice's bricks, painted blue, is added to with blue bricks from Bob. Alice now wishes to paint the pile red, but cannot because of Bob's bricks, so she asks Carol to make some red bricks with which to replace them. Once the pile is entirely composed of Alice's blue bricks and Carol's red bricks, Alice repaints her own bricks red. How can the sometime presence of Bob's blue bricks be a problem?

According to a practising barrister and sometime lecturer in copyright law with whom I have discussed this,¹ a better mental model is a pile of bricks under one or more comparably-coloured tarpaulins (waterproof covering sheets, aka tarps), with names written on them. Here, anyone can add or remove bricks, but any time you touch a pile under one or more tarps, you add your name to the list(s) written on the tarp(s). The colour of a tarp specifies certain rules: for example, bricks removed from under a blue tarp can't be used in any other pile which is not also under a blue tarp. You can in some cases combine piles under various tarps, and throw a new tarp over the whole lot, without necessarily removing the under-tarps. Certain operations on certain coloured tarps (e.g., replacement of a blue tarp with a red tarp) require the consent of everyone whose name is written on the tarp.

I do not mean to suggest everyone should adopt this mental model, and certainly not all the time, because like any abstraction it too has problems. But thinking about it may reveal to you when you're (quite possibly unintentionally) using the coloured-bricks model, because that is not a good model for copyright. For a start, coloured-bricks falls foul of the Ship of Theseus problem, as we have well-documented here. The tarp model has no issues with the Ship of Theseus, and that alone, to my mind, makes it useful.

The tarp model makes this question less problematic. First, Alice piles up bricks under a blue tarp with Alice's name on it. Bob comes along, removes several, and adds his own name to the tarp. Alice replaces them. Carol then slides a small pile of bricks, under their own tiny red tarp, underneath the big blue tarp, removes some of the bricks replaced earlier by Alice, and adds "Carol" to the blue tarp. You now have a blue-tarp-covered pile with three names on the tarp: Alice, Bob, and Carol. The consent of all three will be required to replace the blue tarp with a red tarp (again, with all three names on it).

In short: Alice, Bob, and Carol are all rightsholders in the current work, either by virtue of their contributions to it, or by having been rightsholders in an earlier version, of which the current work is a copyright derivative. The consent of all will be required to relicense it away from GPL.

If Alice can apply Carol's changes to the version immediately prior to Bob's doing any work on it, and if Carol's changes were made without knowledge of anything Bob had done, then she will have forked the project to a point where Alice and Carol are the only rightsholders, and relicensing will be possible without Bob's consent.
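For readers who think better in code, the tarp model can be sketched as a toy Python class. This is only an illustration of the mental model, not a statement of law, and the names `Version`, `touched_by` and `relicensed` are invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One state of the pile: the tarp's colour plus the names written on it."""
    licence: str
    names: frozenset

    def touched_by(self, person):
        # Adding OR removing bricks both write the person's name on the tarp.
        return Version(self.licence, self.names | {person})

    def relicensed(self, new_licence, consenting):
        # Swapping the tarp needs consent from every name written on it.
        missing = self.names - set(consenting)
        if missing:
            raise PermissionError(f"consent missing from: {sorted(missing)}")
        return Version(new_licence, self.names)


original = Version("GPL", frozenset({"Alice"}))

# The history in the question: Bob deletes, Alice re-adds, Carol reworks.
current = original.touched_by("Bob").touched_by("Alice").touched_by("Carol")

# The fork: Carol's work applied to the version from before Bob touched it.
fork = original.touched_by("Carol")
relicensed_fork = fork.relicensed("BSD", {"Alice", "Carol"})
```

Calling `current.relicensed("BSD", {"Alice", "Carol"})` raises `PermissionError` in this sketch because Bob's name is still on the tarp, whereas the fork relicenses without him.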

¹ Nevertheless, I am an imperfect conduit, and any mistakes are of course mine.

MadHatter
  • This is an interesting approach, and seems quite sensible. I would expect though that the names on the tarp can fade away if the current state of the bricks no longer reflects their creative expression. Thus, a refactoring that has been re-refactored might no longer have any copyright effect. The idea to fork the project from an older state is of course safest. – amon Jul 19 '20 at 08:28
  • 10
    Since you mention it, I will note that nothing I read in the Berne Convention, or in my local copyright acts, suggests such a fading with time. A copyright interest continues unabated, as I understand it, until it doesn't (see relevant local legislation). – MadHatter Jul 19 '20 at 09:09
  • When I understand your answer correctly, you are thinking about version history, e.g., the git history or the history of a Wikipedia page. I am thinking about ownership. I do not want to re-license the history, but come to a state with the desired ownership, keeping the history under its old license. I understand your model like when Microsoft talks about the "viral" GPL, that infects everything that comes into contact with it and cannot be removed anymore. I think the Ship of Theseus is a good analogy for the goal of someone who wants to be able to re-license the code. – allo Jul 19 '20 at 09:35
  • 4
    @allo I am not thinking of "e.g. the git history". I am thinking of rightsholders in the code in its current expression. – MadHatter Jul 19 '20 at 09:39
  • But this brings up another interesting point. When I fork an old state and add commits that are owned by me skipping commits owned by other people, the final code is exactly the same as when I remove the code from the commits that "infected" my code base and only the version history is different. But point is not completely related to my question, as I would have skipped the commit that removed code to make the program more efficient. – allo Jul 19 '20 at 09:39
  • 1
    I agree it's not the point of your question, and it's not what I intended either, so let's not discuss it here. – MadHatter Jul 19 '20 at 09:40
  • 9
    This logic makes sense to me, but the implications seem super problematic! If a contributor cannot be contacted, you can rewrite the code they contributed. But if someone who deleted code can't be contacted (or is being a stick-in-the-mud, or what have you), what is one to do? Go all the way back to the beginning of the project? – Wowfunhappy Jul 19 '20 at 18:26
  • @Wowfunhappy That depends on what exactly was removed, you always can claim that part of the work was trivial and copyright doesn't apply to it. But it's not easy, and rewriting the existing parts is not easy either because other work was built on top of it. – Jakub Kania Jul 19 '20 at 21:58
  • 1
    Simply talking about only the "end code" is going to skip over expressive content. Today, programmers don't share the end code, they share the entire git history of code. It may be that Judges won't even notice the git history, but the incompetence of Judges shouldn't be presumed either. And that git history has individual contributions of individual people. Are they not bricks? Do they not have a color? – Yakk - Adam Nevraumont Jul 19 '20 at 22:22
  • 2
    +1 for "any mistakes are of course mine [by copyright]" – Aryan Jul 20 '20 at 16:30
  • 9
    @Wowfunhappy: Unfortunately, it's generally not that easy. Copyright does not inhere in individual lines of code. It inheres in the entire codebase (or at least the self-contained module-ish unit of reuse). If someone made an original, creative contribution to the codebase, it's very difficult to "unring the bell" regardless of whether that contribution was an addition or a deletion. Their changes may have been refactored, copied, or even imperfectly mimicked in other parts of the codebase, and all of those are potentially derivative. – Kevin Jul 20 '20 at 18:36
  • 2
    @MadHatter A person's permission is required to copy a work or make a derivative work from it if, and only if, there is sufficient protectable expression authored by that particular person in the work. All that matters is whether that particular person can identify sufficient elements of protectable expression in the work that is being copied or derived from. Someone claims that Katy Perry's song Dark Horse infringes the copyright on Joyful Noise. The question is simple -- does Dark Horse contain sufficient protectable expression taken from, and original to, Joyful Noise? – David Schwartz Jul 20 '20 at 18:53
  • @DavidSchwartz I agree that question is simple. It's just not this question. – MadHatter Jul 20 '20 at 20:02
  • "Ship of Theseus problem". I've read about it in wiki (the link). Interestingly I have not seen a hint that BOTH ships can be considered original. Really no school of philosophy proposed such a solution? – Martian2020 Sep 20 '21 at 10:36
  • "and if Carol's changes were made without knowledge of anything B had done" does this imply that one cannot study and learn from copyleft code and then write proprietary code derived from the abstract principles/ideas/techniques one learned? In the negative code example I guess the point is moot since there is only one way to not write or remove code, but I thought it was permissible to learn from others' code without regard for license as long as the concrete derived work is significantly different from the concrete inspiration work. – CCJ Jun 07 '23 at 18:27
  • 1
    @CCJ it does not imply that. But having studied the copyleft code does rob you of your best defence against a later allegation of copyright infringement; hence, the clean room reimplementation, which sidesteps the problem by having one person do the reading-and-summarising, and another the implementing. – MadHatter Jun 07 '23 at 20:34
  • This is a great answer. It aligns well with the following irony:

    “The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material.” ― Michelangelo

    – user334639 Oct 09 '23 at 21:54
17

Not all changes are of sufficient novelty to constitute something copyrightable, whether they are additions or deletions.

For a simple example, consider any old out of copyright song or hymn of five verses. I could make an 'arrangement' of only the first, second, and fifth verses, but this should not be considered novel enough to be copyrightable.

So in terms of your question about code, I think we would have to see the actual code and what was deleted to make any judgement about it. If what was removed was previously cleanly encapsulated in module-like code (not necessarily using any programming language's module syntax) and the whole module was removed and nothing else, as a non-lawyer I'd guess it probably doesn't reach the level of being copyrightable. But if the deleted code was thoroughly interspersed through the code which was kept and it required a lot of thought to determine what should be kept and what should be deleted, then I'd guess it probably can be copyrighted. But if it matters to you, get a lawyer!

For a non-code example of the latter, consider the idea of taking an old out-of-copyright novel, perhaps Little Women, and deleting any sections which don't pass the Bechdel test. I'd think the resulting work is probably novel enough to be copyrightable, especially if the editor was careful to leave enough connecting sentences that it still makes sense or has a logical plot.

So if B's contributions are thoroughly intertwined with A's original code, probably the best way forward is not to try to cut out only B's code, but all the affected sections or modules even though parts of them were authored by A not B. Then C can replace the whole modules and you don't need to worry about B's contributions any more.

curiousdannii
  • 3
    "the best way forward is not to try to cut out only B's code, but all the affected sections or modules" - this is the best and the most practical advice. – meolic Jul 20 '20 at 07:14
4

While in some sense this is really a question for Law SE, the common-sense version of it is a matter of whether there are original ideas in the deletion and whether the changeset is expressible (not necessarily as a patch, just in some comprehensible form) in a form that's insufficiently original/creative to be subject to copyright. As the extreme cases, consider:

  • Case 1, pretty clearly no copyright: "Remove froblicate function and all calls to it."

  • Case 2, pretty clearly copyright: Imagine a giant useless program consisting of a lexicographically ordered list of all possible statements of length at most 80 characters in a language, where the "modification" of the "program" is a list of lines to remove to make it into a useful program.

You can imagine an entire spectrum between these. Unless it's incredibly obvious, where a particular case falls is really a matter for a lawyer.
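Case 2 can be made concrete with a toy sketch. It uses a two-letter alphabet and a maximum length of 2 instead of real statements of up to 80 characters, and the helper names are hypothetical. It also assumes the wanted lines are distinct and already in lexicographic order, which is exactly the hand-waving in the thought experiment.

```python
from itertools import product


def all_lines(alphabet, max_len):
    """Every string of length 1..max_len over the alphabet, sorted."""
    lines = [
        "".join(chars)
        for n in range(1, max_len + 1)
        for chars in product(alphabet, repeat=n)
    ]
    return sorted(lines)


def deletions_for(universe, target):
    """Indices to delete from the universe so that only the target remains."""
    keep = set(target)
    return [i for i, line in enumerate(universe) if line not in keep]


def apply_deletions(universe, deletions):
    """Apply a pure deletion list; nothing is ever added."""
    drop = set(deletions)
    return [line for i, line in enumerate(universe) if i not in drop]


universe = all_lines("ab", 2)
program = ["aa", "b"]  # stands in for the "useful program"
patch = deletions_for(universe, program)
result = apply_deletions(universe, patch)
```

Here the deletion list `patch` is the only thing the "author" contributes, yet it fully determines `result`; all of the choice that went into the program lives in which lines were removed.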

  • 1
    Congratulations! You just copyrighted the universe! Seriously though, you'd also have to reorder the remaining lines to make a working program, and there's the possibility of necessary duplicate lines. ++$i; is not copyrightable by itself, only in the context of many other lines. – CJ Dennis Jul 21 '20 at 13:18
  • I did some handwaving there but spaces or other "stuffing" allow you to avoid reordering. – R.. GitHub STOP HELPING ICE Jul 21 '20 at 13:49
2

If you modify a work by removing from it or making rearrangements, the modifications you have made are based on your ideas which manipulate someone else's expression. Copyright is concerned with expression, not ideas.

The copyright statutes in the United States do not define what "expression" is. If you wanted to assert copyright over something where your only contribution was chopping, rearranging and deleting existing content, you would have to convince a judge that it plausibly constitutes an expression, which would probably be difficult.

For one thing, your expression isn't a piece of content whose existence can be demonstrated on its own. You cannot show the judge:

  • here is the material I expressed, viewed separately from the combined work in question; and
  • here is where that same material appears in the combined work, which should therefore acknowledge me as a copyright holder

The material you expressed, in isolation, is just a blank, a nothing.

US copyright law has a section on compilations and derivative works, 17 U.S.C. § 103.

Firstly, (a) states that if a derived work is unlawful, then it enjoys no copyright protection. This has implications for open source software, because, by default, a work grants no permissions for making derived works. The permitted ways for making derived works are defined by the specific license. If the license happens to say that the author of a derived version who simply deletes or rearranges lines of code is not permitted to add their name to the copyright notice, then that holds. If the license is violated, then it lapses, leaving that person with no permission to redistribute. Most open source licenses neglect to have such wording, however.

Subsection (b) is about lawful derivative works and is more relevant to the present topic. It contains this sentence: "The copyright in a compilation or derivative work extends only to the material contributed by the author of such work, as distinguished from the preexisting material employed in the work, and does not imply any exclusive right in the preexisting material."

A deletion or rearrangement does not look like "material contributed", which is "distinguished from the preexisting material". All portions of the derived work are material which is preexisting. Concretely speaking, no line of code in the material is one which cannot be found in the original version.

Of course, people will try anything in court, and sometimes there are surprises.

Kaz
  • 7
    Bearing in mind I agree that you need to convince a judge/jury of any legal conclusion, I think I disagree with the (implicit) suggestion that this case would prove particularly difficult. In particular, USC 17 s.101 explicitly lists an "abridgement" and a "condensation" as kinds of derivative work, so I think a court would have little difficulty admitting that a work has two copyright holders, even if one person only removed material. – apsillers Jul 19 '20 at 17:47
  • @apsillers Nothing in my answer claims that a condensation or abridgment is not a derived work. The claim is that chopping and rearranging material might not be found to constitute new expression. If that is true, then the derived work has exactly the same set of copyright holders as the original work; the chopper and dicer has no rights over it. – Kaz Jul 20 '20 at 22:04
  • @apsillers Thanks for the link; paragraph 103 is quite useful here; I used it in the answer. – Kaz Jul 20 '20 at 22:19
1

I believe the question makes the assumption that there's just a single work and copyright can either apply or not.
However, we have two works which are separately copyrightable: The commit history, and the code at a specific version.

You get copyright in the history if you have a commit there, since you are one of the many authors of that history.
You get copyright in a specific version of the software if your code is in it. If your lines get deleted, you retain the copyright to the versions that have your lines, but not to later versions.

toolforger
  • I am asking about the latest version only, ignoring the commit history, but keeping in mind the history (as concept) of work that is done by different people. So the result is copyrighted by different people and some contributions were actually deletions, so they are not visible (without some kind of commit history) but the result would be another one if this work was not done. – allo Jul 19 '20 at 20:40
  • 2
    "You get copyright in a specific version of the software if you code is in it. " is not true from the legal perspective, copyright applies to entire works, not specific lines, and you can make a derivative work (with appropriate permission of the original author) where you only delete parts; according to copyright law you would hold copyright on the derivative work even though it has no "your lines" in it. Copyright law has no concept of "your lines", the relevant concepts are "derivative work" and whether a work/modification is copyrightable at all. – Peteris Jul 20 '20 at 09:20
  • @Peteris It is part of the problem, that code derived from your changes that does not contain your exact code anymore still may be a derivative work from your code. While you can try to quantify how much / how successful the code was replaced by code that's not copyrighted by you for lines that were added, I think it is a complicated question to quantify this for lines that were removed. – allo Jul 20 '20 at 15:09
  • @allo modified lines can contribute copyright, but deleted lines do not. Yeah I know this can be a fine line, but the question already assumes that this line is clearly defined since it explicitly talks about deleted lines. – toolforger Jul 21 '20 at 09:17
  • @toolforger Where do you draw the line between modified and deleted and added another one? When I remove i++ and insert i+=2, did I modify the line or not? What about i+=1 to i+=2 (less characters difference)? This is not that clear either. But let's stay on-topic with deletions that affect the result of the program. – allo Jul 21 '20 at 10:09
  • @allo that's the kind of stuff that's decided in court, with all the caveats that this implies. However, since the question does not ask about modified lines at all, the answer is clear: Deleted lines do not confer copyright. – toolforger Jul 21 '20 at 11:05
  • @Peteris Copyright must work at the file level. Otherwise we couldn't use some author's BSD-licensed source file in a GNU program, without that author becoming a copyright holder over all other source files in the program, which doesn't even begin to make sense. – Kaz May 06 '23 at 00:05