nickjj 4 days ago

This might be overreacting but is there a way to opt-out of Copilot using your code in open source repos?

It feels morally wrong to me that I can spend thousands of hours working on projects of my own free will but then a company can sell the code I wrote to others in the form of snippet completion as a service. In fact they end up selling your code back to yourself if you plan to use the service.

If the answer is no, that moves the needle pretty far in the direction where I'd at least consider the idea of moving all of my repos to GitLab. I don't care much about stars or popularity. I open source things that are interesting and useful to me and if other folks want to use it they can but I don't gain motivation from others using the projects I release. I like GitHub and its UI and it's no doubt "the spot" for open source but selling code written by others rubs me the wrong way a lot. It stinks because it also means no longer contributing to other codebases. It's moving us in the opposite direction of what open source is about.

  • kemiller 4 days ago

    This is a really good point that I hadn't considered before. It's Facebook all over again: selling your own content back to you. Repo owners should at least be compensated when their code gets used. That would be an incredible market.

    • leereeves 4 days ago

      I don't think that would be possible. One of the big limitations of neural networks is that they don't cite their sources.

      • radus 4 days ago

        Calculate a rough semantic similarity score across all your snippets, and pay out a fractional reward to all originating codebases.

        I think the bigger problem is that it will almost certainly lead to a proliferation of giant snippet spam repositories.
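        A rough sketch of the payout idea above, assuming a crude token-overlap (Jaccard) score as a stand-in for real semantic similarity and a hypothetical `corpus` that maps repo names to snippets. Nothing here reflects how Copilot actually works; `split_payout` and the 0.3 threshold are made up for illustration.

```python
import re


def tokens(code: str) -> set[str]:
    # Crude lexical stand-in for "semantic" similarity: identifier/keyword tokens only.
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))


def jaccard(a: set[str], b: set[str]) -> float:
    # Size of the overlap divided by size of the union; 0.0 for two empty sets.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_payout(suggestion: str, corpus: dict[str, str],
                 payout: float, threshold: float = 0.3) -> dict[str, float]:
    """Split `payout` across repos whose snippet resembles `suggestion`.

    `corpus` is a hypothetical mapping of repo name -> snippet from that repo.
    Repos scoring below `threshold` get nothing; the rest share the payout
    in proportion to their similarity score.
    """
    s = tokens(suggestion)
    scores = {repo: jaccard(s, tokens(snippet)) for repo, snippet in corpus.items()}
    eligible = {repo: sc for repo, sc in scores.items() if sc >= threshold}
    total = sum(eligible.values())
    if total == 0:
        return {}
    return {repo: payout * sc / total for repo, sc in eligible.items()}
```

        Repos below the threshold get nothing and the rest share the payout in proportion to their score, which is exactly the kind of scheme that snippet-spam repos would try to game.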

        • nemonemo 3 days ago

          It is like the search engines vs. SEO arms race. The hope could be that such proliferation can be managed by disincentivizing such abuses. The reality might be vastly different with code, which has more regularity than natural-language text and gives AI a better chance of emulating humans.

      • woleium 2 days ago

        Then they need to negotiate a non-attributed contract with you before using your code to train (not sure about testing though).

    • selcuka 4 days ago

      > That would be an incredible market

      I for one welcome our new CEO (Copilot Engine Optimization) overlords.

      Jokes aside that will likely cause GitHub to be filled with lots of low quality repos (even AI generated, oh the irony!), to trick Copilot into using their code.

      • account42 3 days ago

        > Jokes aside that will likely cause GitHub to be filled with lots of low quality repos

        Hasn't this already been the case since GitHub became a CV boost?

  • PaulKeeble 4 days ago

    It should be automatic based on license. GPL code definitely shouldn't be included but MIT could be. They already have this information in most repositories and if it's missing they have no right to use it at all. We don't need extra options; the licenses already restrict the use and derivative work.

    • davesque 4 days ago

      Not without the text of the license. I, as a developer, cannot just poach open source code under MIT without including the copyright and terms from the original project. From the license:

      "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."

      • meshaneian 4 days ago

        They might argue that a snippet isn't a "substantial portion" of "the Software", and that they're only charging for the service, not the content. Regardless, I don't like it; this is exactly what certain licenses attempt to prevent.

        • leereeves 4 days ago

          I would argue that substantial shouldn't be measured in lines of code, it should be measured in importance. Something like the fast inverse square root is substantial even though it's short.

          • williamcotton 4 days ago

            The fast inverse square root makes for a poor example when it comes to notions of intellectual property and copyright because there is prior art. The Wikipedia page has a history.

            And imagine if Microsoft had been able to copyright the fast inverse square root function before Carmack sat down to write Quake!

            • leereeves 4 days ago

              Prior art matters for patents, not copyrights. Carmack's code is still protected by copyright even if he didn't invent the algorithm.

              And copyright doesn't prevent someone else from implementing the same algorithm, only from copying the code. If Microsoft had been able to copyright the fast inverse square root function, Carmack could still have written his own version and even copyrighted that version himself.

              • williamcotton 4 days ago

                Prior art has been used for copyright in plenty of court cases, and increasingly so!

                Here’s an example: https://scholar.google.com/scholar_case?case=728470765881077...

                It seems particularly apt to consider prior art for use in software IP if only for the similarities with the patentable invention of mechanical parts.

                • imtringued 4 days ago

                  What? You are free to reverse engineer any non patented object and reproduce an almost identical object. Prior art for copyright is meaningless unless it is about authorship. You can still do a clean room implementation and ignore prior art.

                  • williamcotton 4 days ago

                    But the current interpretation is that you cannot claim authorship over things that were already in the public domain. Feel free to read the case notes and follow some of the links in there for more info. Am I not interpreting the court rulings correctly?

                    ——

                    Considering de novo the evidence before the district court, we hold that the district court did not err in granting summary judgment. Johannsongs failed to offer admissible evidence to rebut Ferrara's analysis, so there is no genuine dispute of material fact as to his conclusions that Söknuður and You Raise Me Up are not substantially similar and most of their similarities are attributable to prior art. Based on these conclusions, Johannsongs has failed to satisfy the extrinsic test and Defendants are entitled to judgment as a matter of law.

        • woleium 2 days ago

          But they used all your code to train. That's pretty substantial...

      • typetheorist 4 days ago

        I too have reservations about Copilot, but does the MIT license define a "substantial portion"? I doubt a snippet would fall under either "copies" or "substantial portions".

        • davesque 4 days ago

          I doubt many licenses define that kind of terminology. That's left to precedents established by actual cases. My point was just that you're not free to use code from an MIT-licensed project without following the terms of the license. The other details get worked out when legal actions are taken.

  • ellyagg 4 days ago

    Well, I hope your viewpoint doesn't win the day, because making code as freely shareable and remixable as possible is a huge boon for humanity.

    • celeritascelery 4 days ago

      Code being freely shareable and remixable is great. Selling that open source code for profit is not.

      • WisNorCan 4 days ago

        Is your take that Microsoft should offer this for free? Or if they are not willing to do it for free, Microsoft should cancel this service and we should wait for Apache or someone else to offer the service?

        Or something else ?

        • gfrff 4 days ago

          Microsoft should make this service free for open source (not just thought leaders), and compensate people otherwise. I should have 0.01% equity in OpenAI if they're using my stuff like this.

          Or they should do opt in.

          • throwaheyy 4 days ago

            Half serious/flippant, we need MS to create a cryptocurrency so that developers can be credited with micropayments each time their code gets “quoted” in the IDE.

            <ducks>

      • earnesti 4 days ago

        What is wrong with someone making a little dough? It is just numbers in a database.

        • jdbernard 4 days ago

          Yeah, but those numbers translate to food on the table for my kids, a roof over their heads, better education, etc. Come on, this is a tired response. Nothing is wrong with people making money. There is a lot wrong with people making money off of the hard work of others without any consideration or remuneration.

    • gopiandcode 4 days ago

      I feel like you're missing the forest for the trees here - making code freely shareable and remixable is exactly the purpose of GPL and other free-software licenses, but you can bet that the proprietary codebases Copilot will be used in will go out of their way to prevent any such uses of _their_ particular code snippets.

      IMO, the only way to use Copilot's output in an ethically sound way is to only use the output it produces in AGPL licensed projects (assuming that Copilot has not been trained on any non-free software codebases which in itself is a strong assumption).

      • account42 3 days ago

        > IMO, the only way to use Copilot's output in an ethically sound way is to only use the output it produces in AGPL licensed projects

        Even then, that is missing attribution which should really be the default for all code reuse and derivation even when you legally are allowed to omit it.

    • jnsie 4 days ago

      It's just as shareable on GitLab, no? And the issue isn't that code is not shareable - it's that a huge corporation is profiting from this code without consent from the developer.

      • leereeves 4 days ago

        > a huge corporation is profiting from this code without consent from the developer

        Also without attribution. The more permissive licenses allow corporations to profit from shared code, but most of them still require attribution.

        And it's really not much to ask: when someone gives you free code, give them credit for their work.

    • bayindirh 4 days ago

      Well, I hope your viewpoint doesn't win the day, because breaching GPL left and right to make some developers' lives easier opens huge cans of worms.

  • throwaheyy 4 days ago

    The Twitter thread’s title seems unnecessarily incendiary and clickbaity.

    I don’t buy that producing/synthesizing code snippets based off public repos is a problem.

    There’s nothing proprietary or original about eg. the syntax of a for-loop, or the boilerplate of setting up some JS framework MVC.

    Besides, it’s basically just a (semantic and contextual) search engine inlined within the IDE. Copyright infringement hasn’t taken place until the user activated the autocompletion and actually placed the code within their own and released their code containing the infringing code.

    • DJHenk 4 days ago

      > There’s nothing proprietary or original about eg. the syntax of a for-loop, or the boilerplate of setting up some JS framework MVC.

      Of course there is something proprietary or original about that. Why else would they need such an enormous AI to suggest it? Auto completing simple boilerplate was already solved in a much simpler way.

      > Copyright infringement hasn’t taken place until the user activated the autocompletion and actually placed the code within their own and released their code containing the infringing code.

      Copyright infringement takes place as soon as some company publishes/sells material without explicit license or permission. So not the moment the user hits accept, but the moment just before that: when the tool shows it to the user.

      • throwaheyy 4 days ago

        Applying your logic, is any search engine infringing on copyright because it contains a snippet of the source page?

        After all, if showing a search result in the IDE is “publishing” (let alone “selling” (?)) why hasn’t Google been sued out of existence for showing search results (oops, “publishing” copies of original work, billions of times over), as well as selling related advertising?

        https://en.wikipedia.org/wiki/Fair_use

        > Examples of fair use in United States copyright law include commentary, search engines, criticism, parody, news reporting, research, and scholarship.

        • tremon 2 days ago

          > is any search engine infringing on copyright because it contains a snippet of the source page?

          In some jurisdictions, it is. And in other jurisdictions, it is only allowed as long as it shows a link to the source page, which Copilot also doesn't do.

  • lbhdc 4 days ago

    I stopped publishing open source after all this started coming out because I was so uncomfortable with it.

  • jaywalk 4 days ago

    If your code is using a license that allows it, how could you possibly opt-out aside from using a different license?

    • sammax 4 days ago

      Don’t most licenses require at least attribution? I don’t believe GitHub is restricting itself to only licenses that don’t. In fact the only software licenses I can think of that don’t require attribution are 0BSD, WTFPL, CC0, MIT-0 and Unlicense, and none of them are super popular. Also in some countries creators have inalienable moral rights which can be enforced regardless of the license. For example in Germany it is impossible to relinquish certain rights you have as the creator of a work, including the right to attribution.

      • TAForObvReasons 4 days ago

        This is an important and overlooked point. Even common permissive licenses (ISC / MIT / Apache-2.0) require attribution.

        • jazzyjackson 4 days ago

          Just as a mind experiment: couldn't CoPilot just publish a list of every github user and attribute the work to all of them?

          • TAForObvReasons 4 days ago

            CoPilot is a black box at the moment. Microsoft claims they used the public corpus on GitHub. There are plenty of GPL, AGPL, and "source available" projects in the public corpus. So what exactly is the licensing?

            The argument may make sense if they limited themselves to public-domain (CC0) works, but that is not what happened here. If CoPilot attributed something to an AGPL project, does it mean the "virality" applies to all projects that use code from CoPilot?

            • ntoskrnl 4 days ago

              There's also a good amount of commercial and leaked source code on GitHub, including MS's own leaked Windows XP source. I haven't played around with Copilot yet, but if I ever do I plan on copy/pasting some win32 API definitions to see if I can get it to spit out any of the leaked source.

              • yellowapple 4 days ago

                > if I ever do I plan on copy/pasting some win32 API definitions to see if I can get it to spit out any of the leaked source.

                If that works, then I can't wait for that to be a boon for Wine and ReactOS: "Microsoft itself provided this code and allowed us to use it, so therefore it's totally legal. Neener neener."

      • whoisthemachine 4 days ago

        This feels like a tool that could easily be destroyed by a lawsuit. I can't imagine a TOS can force you to give away your copyright (especially if they allow and encourage you to post your own copyright).

        • kragen 4 days ago

          If it can't then Wikipedia is doomed; its entire licensing status rests on the notion that editors grant such a license as part of their clickwrap ToS.

    • bouke 4 days ago

      Does GitHub verify that the code in my repository is actually in accordance with the license that I’ve added? I could just upload any proprietary code with an incorrect license, and GitHub would just use that to feed their AI. Like any other dependency that you incorporate into your application, GitHub should verify/audit whether the license allows them to do so.

    • nickjj 4 days ago

      > If your code is using a license that allows it, how could you possibly opt-out aside from using a different license?

      A repo setting that instructs GitHub not to use your code for Copilot; it could be a similar option to turning Discussions on / off.

      If they really want to win developers over they would even have Copilot scanning disabled by default but that'll never happen.

      • quietbritishjim 4 days ago

        Even if Github did provide that setting, as a courtesy, someone could clone / fork the code to another repo (if you use any licence that allows it) and not enable that setting.

        • Inityx 4 days ago

          Sure that's possible, but there's a huuuge difference between Possible and Default Behavior.

          • TAForObvReasons 4 days ago

            In a case like this, GitHub itself could set up a bot account that forks all projects as soon as you make the switch. The company in fact would be incentivized to do so.

      • jonny_eh 4 days ago

        Sounds like you want a new license that just prohibits use by one company for one purpose.

    • igneo676 4 days ago

      I'm not sure using a different license actually opts you out. By merely hosting your code on GitHub you grant them the right to analyze your code on their servers[1]

      They may be morally in the wrong, but I'm unsure they are legally in the wrong here. To boot, denying them the right to create this tool in your license is technically a violation of OSS principles and problematic

      [1]: https://docs.github.com/en/site-policy/github-terms/github-t...

      • typetheorist 4 days ago

        > This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.

        Wouldn't this be a violation?

    • okasaki 4 days ago

      Microsoft could provide an opt-out for projects or even contributors, regardless of licence.

  • ghostbrainalpha 4 days ago

    It would be kind of cool if GitHub could show some stat that code you wrote has been used 50,000 times by 12,000 people.

    Being a top CoPilot contributor should at least have value to signal on your resume.

  • dragonwriter 2 days ago

    > This might be overreacting but is there a way to opt-out of Copilot using your code in open source repos?

    I don't think there is a way to opt out if it is a public repo regardless of license, and Microsoft's copyright theory suggests that they wouldn't feel obligated to exclude any code they got their hands on except under a specific NDA preventing such use; the use of public GitHub repos isn't based on legal constraints but on practical convenience.

  • invig 3 days ago

    They’re not selling you code. They’re selling you an engine that helps you find the right free code at the right time.

    If you read free code yourself it’s fine, but if a machine does it for you it’s not? We overvalue humans.

    • bayindirh 3 days ago

      > If you read free code yourself it’s fine, but if a machine does it for you it’s not? We overvalue humans.

      No, it's not fine. Apparently, you missed the SCO and Oracle vs. Google cases. Both of these cases argued that somebody looked at the code and copied it. In the SCO case it was not true, but the argument stretched the timeline rather successfully. In Oracle vs. Google, copying function signatures opened a big can of worms.

      So, just by copying a function signature, without filling it in with the very same code as the original, and even for interoperability, you're getting into a huge gray area in a legal sense.

      Similarly, no sane Wine developer will read leaked Microsoft source code, let alone copy it. Again, no sane emulator developer will read leaked Nintendo code.

      Reading the code "colors" your creativity, and if you're tried in court and enough similarity is found between your code and the leaked code, it's game over.

      So, reading code and copying it is not guaranteed to be legal, depending on its license. When this is done by a robot, it's still illegal (you're breaching licenses during the code generation process), and immoral and unethical on top of it.

      So, we don't overvalue humans; we overvalue AI, which is just informed search, BTW.

Guid_NewGuid 4 days ago

I find this whole topic very annoying, this is like the 3rd variation to reach the front page today. But it has made me realize why I instinctively dislike Free Software as a movement.

Copyright and licensing are bad, actually. Stop getting worked up about the idea of using courts to punish theft. Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn't.

Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.

Sure, in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules-lawyering over license text?

Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps, in fact I want a truly public domain license but copyright law is so hostage to corporate interests no such thing exists in many countries.

Free the code.

  • sirsinsalot 4 days ago

    "A commons of knowledge is a public good."

    Yes but this copilot model takes that, adds value and doesn't itself join the public common good. Instead it takes it, and makes you pay to have it back in another form.

    If copilot were open source and the model released for the public good, being built of public data (in your scenario) we would have a very different conversation.

    • visarga 4 days ago

      It costs money to run a huge language model with low latency, in the loop with you, so charging $10/month is reasonable. You need multiple GPUs to load even a single copy. Copilot is adding something extra to the original code: it selects the recommendation from the whole corpus, while taking the surrounding context into consideration and adapting to your variable names.

      And in reality 99.9% of the generated code has no long n-grams in common with the training set; it's already original. All they need to do is enforce that it never generates output identical to the training set, something that can be implemented with a Bloom filter. Then the generated code is impossible to attribute and should have no legal problems.

      In the end what do models like Copilot do? They act like culture - absorbing and replicating memes. They free the knowledge and make it reusable. They can act like a general purpose NLP tool for information extraction, classification and text generation. You can implement your ideas faster with it, don't need to label much data.

      It works even with just a prompt. Try OpenAI Codex to extract data from a receipt to see what I am talking about; it gives you the output in JSON. It's a new tool and a new interface to the computer. There are going to be plenty of open source implementations as well, some are already under training.
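      A minimal sketch of the Bloom-filter check described above, assuming whitespace tokenization and a window of 8 tokens (both arbitrary choices, not anything Copilot is known to use). `build_filter` and `copies_training_data` are hypothetical names. It only catches verbatim n-gram overlap: renamed variables slip through, and Bloom filter false positives can occasionally flag code that is actually original.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, small false-positive rate."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def ngrams(code: str, n: int):
    # Assumption: naive whitespace tokenization; a real system would normalize first.
    toks = code.split()
    for i in range(len(toks) - n + 1):
        yield " ".join(toks[i:i + n])


def build_filter(training_snippets, n: int = 8) -> BloomFilter:
    # Index every length-n token window of the training corpus.
    bf = BloomFilter()
    for snippet in training_snippets:
        for g in ngrams(snippet, n):
            bf.add(g)
    return bf


def copies_training_data(generated: str, bf: BloomFilter, n: int = 8) -> bool:
    # True if any length-n window of the output *may* appear in the training set.
    return any(g in bf for g in ngrams(generated, n))
```

      A suggestion for which `copies_training_data` returns True would be suppressed or regenerated; whether that alone settles the legal question is exactly what the replies below dispute.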

      • sirsinsalot 4 days ago

        You are incorrect. The code it generates is substantially the same (complete with comments) as the input, which is often taken without permission and in violation of its license.

        And offers nothing back to those authors in return.

      • HeavyStorm 4 days ago

        Thank you for this. I would never have been able to articulate it better: people are just annoyed that someone is making money and they aren't, without considering why that is.

        • sirsinsalot 4 days ago

          That's not at all what people are annoyed at and to be reductive like that is childish.

          The issue is consent to using people's code as input, and paying nothing back.

          Also the parent comment is substantially technically wrong on a number of points, but feel free to use it as validation for yourself.

    • andybak 4 days ago

      And I really don't mind.

      I want every line of code I've ever written to be used as much as possible.

      I find "intellectual property" to be dubious to the core. I'm not confident enough in my feelings to be a zealot, but if I had to pick sides then I know which side I would pick.

      • JoshTriplett 4 days ago

        You're welcome to use a "do whatever you want" license on your code, and people should respect that. (Though even those licenses tend to require attribution, and copilot doesn't do even that.)

        Other people use licenses that try to create a commons where if you want to use it you need to share your own code, as a counterpoint to the non-commons in which you can't use code at all. And if people use those licenses, they should be respected as well.

        By all means, eliminate copyright, and let all code be copied freely. And until that happens, as long as proprietary code exists and doesn't let anyone copy it, respect copyleft licenses as well.

        • andybak 4 days ago

          A fair point. "What to do in a world where copyright already exists" is a tougher question to answer and one in which I tend to go back and forth.

      • sirsinsalot 4 days ago

        If an AI "listened" to music and created new samples for musicians to use for a fee, do you not think the original musicians should be compensated?

        The value transfer is basically theft.

        It isn't about the usefulness of the service, or even that something similar is a good thing ... it is about the execution and what it says about fairness for those that worked to create the data it depends on to produce value.

        • andybak 4 days ago

          I'm not sure I was clear enough when I expressed my doubts about the concept of intellectual property.

          Your musical example is playing out in the courts in multiple forms. The Marvin Gaye case, Led Zeppelin, Katy Perry etc.

          And each case pushes me further towards wanting to rip down the whole rotten edifice.

          We've lived through 4 or 5 decades of unprecedented expansion of the domain to which IP lays claim. Surely it's time for the pendulum to swing the other way?

        • HeavyStorm 4 days ago

          If I listen to music and create samples to be sold, is it theft?

          • sirsinsalot 4 days ago

            If your aim is to produce music that is directly derived from what you listened to ... yes. And this has been tested in court time and again.

            AI isn't being inspired or creative, it is mass scale and mechanised bootlegging.

            To compare it to human inspiration is naive or wishful thinking.

    • Guid_NewGuid 4 days ago

      Yes they haven't paid it forward, or back, but why fight on the occupier's territory. By calling for legal frameworks to enforce this we accept the language and terms of the dominant party. By using courts and the law and creating new law for copyright we actually move further from the goal of abolishing copyright and IP entirely.

      Every time we use courts to enforce IP we're strengthening the Walt Disneys and Nintendos of the world.

      (I accept I am in a group of like 3 people with this goal but it's my view)

      Edit: to expand slightly more on this. People should be able to decompile/reverse engineer whatever the hell they want. They shouldn't have to worry about armed goons kicking down their doors. Every time cases are used to strengthen the enforcement of IP/licensing, whether for the light (FSF) or dark (Micro$oft, Google, etc) the outcome is the same, we move further from that goal.

      • matheusmoreira 4 days ago

        > the goal of abolishing copyright and IP entirely

        Completely agree with you. It's the 21st century, once data has been published there is no controlling it anymore and all attempts to do so lead to the destruction of computer freedom. No doubt people all over the world copy code every single day with nobody even finding out about it. I'd rather get rid of all these monopolists than limit the potential of computers to whatever reality enables them.

        >I accept I am in a group of like 3 people with this goal but it's my view

        Now we're four.

      • handoflixue 4 days ago

        > Every time we use courts to enforce IP we're strengthening the Walt Disneys and Nintendos of the world.

        Can you actually point to substantial examples where Disney or Nintendo benefited significantly from a precedent set by an open source court case? Open source has been around for decades, so it should be trivial to find numerous clear-cut examples at this point... if your theory is actually correct.

        • Guid_NewGuid 4 days ago

          No, I honestly have no idea. I know nothing about the law and understand even less. I may be wrong about all of this, but if we take the (laughable) idea of justice being blind it stands to reason any precedent that protects a single open source developer also protects Amazon's code.

      • zzo38computer 4 days ago

        I also agree to abolish copyright and IP entirely.

        I agree that people should be able to decompile/reverse engineer whatever the hell they want.

        And if armed goons (whether government or if they are Microsoft or some company) kick down your doors, then they should be arrested for trespassing.

      • ozim 4 days ago

        Funny thing is, ALL these legal frameworks are there to protect the 3 people like you.

        If there were no enforcement of IP/licensing, or no legal enforcement at all, M$, Google etc. would not be nice: they would just come over, kick down your door, and cut your head off because they could. With a legal framework they at least have to ask someone else.

        You just have to understand you don't stand a chance with your 3 buddies against 10 motivated attackers.

        You write about "accepting terms of dominant party", but you clearly have never had your house robbed. Now imagine corporations doing the same if there were no legal frameworks.

        Read up on the Dutch East India Company, or just Nestlé. By comparison, Microsoft and Google are still quite nice companies, as are Walt Disney and Nintendo.

        • Guid_NewGuid 4 days ago

          This is a slight misreading of my general political position. I am pro-government in general. I find the term "monopoly on violence" to generally indicate someone who lives a very cosseted and easy life who can spend time getting mad about like, seatbelt laws or speed limits, so I use it somewhat tongue-in-cheek.

          There's quite a lot of possibilities between DMCAs of youtube-dl repositories and Big-co death-squads decapitating people in their homes. I'd prefer where we are now to the Brazil end of that spectrum but we can imagine better models of digital and intellectual 'property'.

      • JoshTriplett 4 days ago

        Proprietary software is more than willing to use those legal frameworks. Unilaterally disarming while your opponent does not is a losing strategy.

        As long as copyright exists, copyleft should be respected.

    • Varqu 4 days ago

      People (github in this case) do something to make your life easier so that you can save time for the price of 1 latte per month and you complain?

      Software developers seem to be the whiniest profession in the world, and I despise this attitude (while being a developer myself).

      • tuckerman 4 days ago

        People aren't whining because the price is too high, they are upset because some (myself included) believe Microsoft is exploiting developers by copying their work against their wishes and then turning around and selling other developers a product which may or may not be generating code which violates copyright/patent licenses. A developer who inadvertently uses a Copilot suggestion which gets them into hot water is going to be spending a lot more than the cost of a latte to defend themselves in court.

        • sirsinsalot 4 days ago

          This. It is a matter of (a) consent and (b) compensating the people without whose data the model would be useless.

        • Varqu 3 days ago

          If someone contributes to open source, then they shouldn't be surprised that someone else uses this code. The licensing hell is something that shouldn't belong in IT.

          • tuckerman 3 days ago

            When source code is made available under an open source license, there are strings attached; attaching those strings is the author’s right! Assuming you or any company has the right to do anything you want with that code without respecting the license is immoral.

            That “licensing hell” (i.e. strong copyleft protections) is the reason we enjoy such a vibrant and large open source community today. I don’t take it for granted that open source as we have it today was inevitable: it required a lot of work and I’d hate to see that slip away.

            • Varqu 3 days ago

              The licensing hell is exactly the problem. If someone contributes to open source, which is a praiseworthy activity, then they do it with the intention that anyone can use this code, but also adapt it and bundle it into new products. It's all about bringing humanity forward.

              And all those "you can do this, but you can't do that" licenses are things that only invite lawyers to the tech world. IMHO, licensing open source is a bullshit activity.

              • tuckerman 2 days ago

                You are making a lot of assumptions about what someone wants/intends when they contribute to open source codebases. If an author chooses, for example, the AGPL, I think they clearly had a different intention. Like it or not, not everyone wants to dedicate their work to the public domain.

                • Varqu 2 days ago

                  Then why contribute to open source if you still want to be a gatekeeper? In that case, better to fork it and work in a private repo.

                  • tuckerman a day ago

                    Every large successful open source project I know is explicitly not in the public domain/licensed CC0. I understand that there are some people that are very against copyright/intellectual property but you surely must interact with a large number of projects/people that disagree.

      • Philadelphia 4 days ago

        Yep, anything useful has to be legal and welcomed. Microsoft should start breaking into people’s houses and sorting their underwear drawers for them while they’re out. Million dollar idea!

    • jppope 4 days ago

      > "Yes but this copilot model takes that, adds value and doesn't itself join the public common good. Instead it takes it, and makes you pay to have it back in another form."

      $10/month... how much do you think this thing cost to build and to maintain?

      • nightski 4 days ago

        That's the whole point. Without the data, it would be worthless. Microsoft is not paying the full cost because it is ripping the data without asking consent. I'm not saying what they are doing is illegal per se, but it's definitely immoral.

        • Guid_NewGuid 4 days ago

          But why is it immoral? All that code is still out there, if I had the time and the resources I could build a language model. Unlike commons in the real world (e.g. land, fresh water, etc) a code commons is purely additive. With the release of Copilot (which I don't intend to pay for or use) nothing has been destroyed, instead we'll get more code for less work where companies do pay for their developers to use it, some might even find its way back into the commons as new open-source code (whether more code of copilot generated quality in general is an unalloyed good is left as an exercise to the reader).

          • bayindirh 4 days ago

            Because copilot is violating the terms I put for my code. My code is GPL. It cannot be put into projects with incompatible licenses. That’s my code, and I share it with strings attached. You can’t just copy my code and sell to other parties no strings attached.

            If that’s fine and dandy, Microsoft should also train Copilot on their source code repositories, so we can use that knowledge, too.

            • ShamelessC 4 days ago

              I guess I've just never had to work with GPL code before, but the complaints essentially only seem to be coming from coders who like this style of open source where you still get to make it kind of a pain in the ass to actually use your software.

              I guess you have the right to do this, but it doesn't mesh at all with why I personally contribute (without any expectation of attribution), which is that (much like Stack Overflow), programmers mostly agreed a while ago that it's just easier if we all share.

              So much of what's wrong with the modern economy comes down to seeking rent on an idea that should just be public knowledge.

              Sorry if my viewpoint towards your work is apathetic, but the whole field is already infested with academics who only understand citation as a useful metric. Further, the point remains that anyone with enough money could do this - not just Microsoft (Salesforce has released several models for Python competitive with Copilot). Times are changing - maybe don't share code anymore? I imagine in ten-twenty years this whole conversation will seem pretty petty though when your entire program is trivially recreated from its GitHub description without ever needing to have seen it in the first place.

              • imtringued 4 days ago

                >from coders who like this style of open source where you still get to make it kind of a pain in the ass to actually use your software.

                Most "coders" don't publish anything if they don't have to. Using proprietary code is an even worse pain in the ass because you don't have access to it.

                The point of the GPL is to force people to share their code.

                >which is that (much like Stack Overflow), programmers mostly agreed a while ago that it's just easier if we all share.

                >So much of what's wrong with the modern economy comes down to seeking rent on an idea that should just be public knowledge.

                The entire point of the GPL is to force e.g. hardware vendors to share their driver code under the GPL or any other opensource license to be included in the Linux kernel.

                >Times are changing - maybe don't share code anymore?

                The entire point of the GPL is to force people to share their code.

                > I imagine in ten-twenty years this whole conversation will seem pretty petty though when your entire program is trivially recreated from its GitHub description without ever needing to have seen it in the first place.

                What the hell are you talking about? If that is the case, then why did humans ever bother extensively documenting and testing their software if three sentences are enough to encode it? Your perspective is particularly annoying because Copilot isn't learning to write its own code; it's entirely reliant on an army of unpaid software engineers publishing code on the internet. If it knows how to recreate a project from just the GitHub description, it basically just had the codebase inside its model to begin with and merely pretends that it did everything on its own. That is actually a form of rent seeking.

                • ShamelessC 3 days ago

                  > extensively documenting and testing their software if three sentences are enough to encode it

                  Was just hyperbole for "from plain English specs/requirements".

                  I'll admit to being uninformed about GPL, but your understanding of large language models is also limited. They actually learn to interpolate between data points meaning they can compose sequences not found in the training data. Further, GitHub added a feature that checks existing code for a match and rejects predictions if any match occurs.

                  • bayindirh 3 days ago

                    Nobody disputes their ability to interpolate, I think (at least not me), but the problem is that the starting points for these interpolations contain GPL-licensed code, hence it derives GPL-licensed code.

                    This derivation brings the GPL in, and the model doesn't understand this. As a result, every time GPL training data is mixed into the interpolation, you're converting the code to GPL, or, if you're not converting your code to GPL, you're violating the GPL.

                    It's plain and simple.

                    On the other hand, I've been hearing the "we'll write the specs, and the computer will just auto-generate it" gospel since 2002. This time it won't be different. The human brain, intuition and creativity are beyond algorithmic modeling.

                    So, no, the computer will not autogenerate the code from specs. It might link boilerplate together, which can already be done today.

              • namose 4 days ago

                But GPL owners aren’t seeking rent, so you’re just asking those who believe all code should be open source to unilaterally let large companies use all their code, while they reap no such benefits from the large companies.

                • ShamelessC 4 days ago

                  Like I said, I understand the premise, just not the emotion behind why you want to release code to the public at all if it isn't simply a donation to all human knowledge.

                  There are better ways to gain notoriety as a coder than by essentially legally requiring that your name be attached to a thing for all time.

                  I personally would be thrilled to know my work was valuable enough to be used by a company because I really just couldn't care less about the "credit" part of it. I know what I've done and don't have anything to prove.

                  • bayindirh 4 days ago

                    It's not an emotion. It's a stance.

                    > Why you want to release code to the public at all if it isn't simply a donation to all human knowledge.

                    On the contrary. I donate my code to all human knowledge, just not to a corporation's private code corpus. I intend my code to be open to all humans to run, study, modify and share, forever. I don't give you the freedom to take it into a closed domain and not share the further knowledge you derived from my code. If your primary intention is to return this knowledge to humankind, the GPL is an enabler, not a hindrance.

                    > I personally would be thrilled to know my work was valuable enough to be used by a company because I really just couldn't care less about the "credit" part of it. I know what I've done and don't have anything to prove.

                    I personally don't care whether my code is good enough to be used by a company. If I want to contribute code which can be used by a company, I can contribute to MIT projects (which I also do). I don't have anything to prove.

                    I release my code with the hope it'd be useful for somebody, and I don't want it to be included in any permissive or closed-source codebase. It doesn't matter whether it saves your bacon today or not. That's not my problem. Go write a better one, then. I don't care.

              • bayindirh 4 days ago

                When actually using the software means "taking it, adding it to a commercial software and never telling anyone, incl. the developer of the original code, and not giving any attribution whatsoever, and earning money over that piece of code", yes GPL makes it hard. It's by design, and this is why I license anything and everything I put in the open GPLv3+.

                If anyone contributes to a GPL software, they're clearly attributed. Moreover, Git makes this attribution irrevocably visible. Before that patches were sent in with mails, and mailing lists were open, so attribution was also visible back then. So, no, GPL makes attribution visible, and irrevocable, by design.

                GPL doesn't seek rent over any idea. It forces ideas to stay open, forces you to put your improvements back in the open. You'll be attributed, your code will be in the open all the time, and nobody can grab and run your code and hide into its software to make any kind of unjust profit, which makes "Open Source" coders visibly and literally wince and cringe, because they can't grab and paste a piece of code and make their days easier.

                Again, this is by design.

                Sorry if my viewpoint towards your view is apathetic, but the whole field is already infested with programmers who only understand being able to copy and paste code left and right to develop software as a useful metric.

                It's not about Microsoft, it's just about honoring a license: a case-tested, lawyer-written, trusted license which many developers chose for licensing their work. It's a breach of contract, plain and simple.

                As I said elsewhere, some of the code I'm writing is backed by papers. I don't obfuscate my papers to prevent anyone from implementing it, but if I open my reference implementation as GPL, this is because I don't want someone to grab it and run with the code, change it a little, put into a closed source program and call the idea theirs, possibly patenting it in the process.

                I have a serious piece of research, my Ph.D. actually, and I'm still developing the code powering the whole idea. I was planning to open it under GPL license, to force its evolution in the open, but I understood that people don't appreciate that. So, probably I won't open the code. Binaries maybe. Highly obfuscated, protected binaries, probably.

          • Banana699 4 days ago

            You can say the exact same about piracy, when I take a game or a pdf book from a pirate site, nothing is destroyed, nothing is subtracted. The server still owns the data and can copy and share it infinitely, all that changed is that I now have a copy too, and I use it to enrich my own intellectual life.

            The argument has two main flaws:

            1- It's not symmetric. The massive corporations with paid armies of lawyers aren't hugging trees and talking about how "Knowledge is - like - just free, man" with dreamy eyes, I would love if they were like that but no. They are constantly on the lookout for anyone remotely using their work. They don't deserve the language of free knowledge and open data, that would be like extending peace to an invading army, or defending a tyrant with the lingo of free speech. He Who Lives By The Sword Dies By The Sword.

            2- If the person(s) behind the data or the code lives off their intellectual labor, you are ripping them off by using it without compensation. Sometimes the compensation is as little as simply citing them, just mention their names so that they get visibility and prestige they deserve for toiling in the intellectual field to produce the ideas and brain patterns you use and benefit from.

            The whole thing is a huge minefield. Digital reproduction of information and abstract structures is an extremely novel phenomenon that breaks tons of human intuitions about how ideas and thinking work and spread. But the involvement of a corporation allows you to shortcut the entire thing by invoking (1), also known as the fundamental theorem of ethics: Do Unto Others As You Wish They Do Unto You. Do corporations allow you to freely take and mix their intellectual produce and sell it back to them? No? Then they DON'T get to do that either, except maybe among themselves.

            What I find strange is how nobody talks about how inherently repulsive and ugly the "Copilot" philosophy is, how it is fundamentally a dead end, and how much it betrays a lack of understanding of how programming works on the part of those who fund and market it. Code is different from natural language; the fact that we call the symbols we write algorithms in "Programming Languages" is purely a historical accident. Code doesn't have the redundant resilience and error-correcting properties of natural language: removing, modifying or adding even a tiny bit to correct code can give you atrociously slow correct code, or full-of-security-holes correct code, or incorrect code, or any of the three mixed together with other disasters. If you're going to steal people's open source code, at least do something interesting and intelligent with it; don't be a lazy fuck and apply an NLP technique to a highly formal and rigid domain, then smile smugly and charge people for it as if this is going to end anywhere useful.

    • jazzyjackson 4 days ago

      If it were just published as a public good, it would probably be as illegal as Sci-Hub.

      I consider the $10/m as a donation to the microsoft legal defense fund to allow free access to accumulated knowledge.

      • sirsinsalot 4 days ago

        To allow access to a service that grants you the accumulated knowledge's output in small bits.

        I'm all for a world where these tools help developers, but I'm not here for a system that isn't open. I want to own my tools.

        Copilot is a bit like musicians paying a monthly fee for access to a loop library. Except all the loops are rip-offs of other people's hard work and there's no effort to compensate them.

        If I made an AI that resampled music into derivative tracks... you can be damn sure I'd be sued until my ears bled.

        • jazzyjackson 4 days ago

          > monthly fee for access to a loop library. Except all the loops are rip-offs of other people's hard work and there's no effort to compensate them.

          the analogy works if there were an open access library of music (restricted licenses tho they may be…) that was available to search and browse without the tool

          then an auto-composer could suggest music to fill in gaps in my own composition, using snippets of audio from the otherwise freely available library

          that's a plug-in I would pay for too, but yea if my "no commercial use allowed" melody made it into someone else's composition, I would want my license terms to be surfaced to them as well

          except I personally wouldn't want to live in a future where every line of code has to have some claim of "who authored this function first" or "who wrote this melody and rhythm first", pursuant to licensing terms in perpetuity. that sounds terrible.

        • ece 4 days ago

          I'm all here for openness and tools you own, so there could be a FOSS implementation. Microsoft could just open it up and still charge the $10/mo for hosting the model, and I hope that happens.

          Making the tool better without verbatim copying and making it more effective should be the priority, IMO. Trying to control it too much would be missing the point of the tool.

        • throwaway675309 4 days ago

          "Except all the loops are rip-offs of other people's hard work and there's no effort to compensate them."

          Except all the loops are smaller pieces of larger loops, which you as the developer then mix together in new ways to create your application. FTFY.

          • sirsinsalot 4 days ago

            Even if the sample was one snare hit. Someone worked hard to tune that snare, mic it, record it and process that sound.

            They should be compensated.

    • spullara 4 days ago

      It absolutely adds to the common good in the form of people using it to write more open source code.

      • sirsinsalot 4 days ago

        Seeing as Copilot is known to output code that's a straight copy from non-permissive code where the author's permission wasn't obtained... I'd say it is helping you steal from code authors without giving back (as there is no obligation to open source your code).

        Given Microsoft's record of pursuing IP violations aggressively through the legal system, I'd say the whole thing is ironic.

  • monocasa 4 days ago

    The issue is that whether the free software people want it or not, the copyright system over code exists, and historically has been used as a cudgel against smaller players. If we got rid of copyright over code entirely I'd totally be down for this. And IIRC RMS has said the same thing; that he'd be in favor of the removal of copyright over code as a concept even if it meant neutering the protections of the GPL.

    Until that happens, and while copyright protections are still used by larger entities, using the same system to protect yourself and (more importantly) your users isn't turning your back on your ideals, but simply adjusting your strategy to the current material conditions. Remember that Google v. Oracle (while ultimately a win compared with what could have been) was a step back, with de minimis claims left on the table as not a valid defense. The playing field is heavily slanted towards the big players, and software freedom needs every tool it can get its hands on at the moment.

    • zzo38computer 4 days ago

      > The issue is that whether the free software people want it or not, the copyright system over code exists, and historically has been used as a cudgel against smaller players. If we got rid of copyright over code entirely I'd totally be down for this. And IIRC RMS has said the same thing; that he'd be in favor of the removal of copyright over code as a concept even if it meant neutering the protections of the GPL.

      As someone else asked, I would also want a citation, but I agree.

      Actually, I want a license under which you can do pretty much anything you want with it (including: lack of attribution, distribution without source code, distribution with source code (whether the original source code or reconstructed), lack of copyright notices, reverse engineering, circumvention of protections on your own copy, writing reports about anything you want, using or not using the software (and modifying or not modifying it) at your choice, etc.), but under which you are not allowed to add further legal restrictions to it or to derivative works (with a few exceptions dealing with trademarks (but not all), and allowing conversion to GNU (A)GPL 3 and CC-BY-SA 4.0 if you are able to satisfy the conditions of those licenses), and under which, if someone tries to use legal processes against you relating to this, anyone can countersue.

    • Guid_NewGuid 4 days ago

      Interesting that he's said that, I wasn't aware.

      I think at its root the problem is copyleft is a mirror image of copyright. It relies on and replicates all the cultural and legal requirements and constraints of the copyright model and curtails an imagining of other possibilities. Every sentence or thought spent on copyleft is misdirected in my view.

      Which is why I find Microsoft doing this (potential) en-masse license violation and then a bunch of GPL folks getting mad pretty funny overall. I just find the high and mighty tone annoying, like sure, they've (allegedly) screwed you, but they're going to (theoretically) get away with it because they're rich and powerful, sorry that didn't turn out how you wanted.

      • Kbelicius 4 days ago

        >I think at its root the problem is copyleft is a mirror image of copyright.

        That is the (only) point of copyleft. If it weren't for copyright, it wouldn't exist. Fight fire with fire, that sort of thing.

        • rcxdude 4 days ago

          I don't think that's true: copyleft is right to repair for software. Even if software were not copyrightable, without the source code users would still be relatively powerless. (Incidentally, this is related to why patents were created: not to constrain or encourage innovation, but to get people to publish inventions instead of keeping them secret.) If copyright were abolished and copyleft destroyed along with it, Linux users' freedoms would probably materially go down, not up (though in general user freedom would marginally increase because most software is not copyleft).

          • imtringued 3 days ago

            Copyleft is formalized code sharing. Pretty much an excuse to tell people to share their code.

    • matheusmoreira 4 days ago

      > And IIRC RMS has said the same thing; that he'd be in favor of the removal of copyright over code as a concept even if it meant neutering the protections of the GPL.

      Do you have a citation? I was under the impression he defended copyright because copyleft depends on it.

  • marpstar 4 days ago

    > Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps

    This is my feeling as well. I don't build stuff in the open so that I can get bent out of shape at someone not properly licensing it. It's in a public repository, FFS... I assume that if anyone even notices my repo, that they may copy/paste a few lines out of my solution if it helps them.

    • cududa 4 days ago

      Exactly! Do they really think every single line of their code is so precious it requires attribution? If I publish code, I assume it might get pushed, pulled, refactored in a million ways and no one will ever know my name’s attached to it. And guess what? I DON’T CARE. It’s code, not a self-constructed monument to my own intelligence that needs a little placard with my name on it to follow around some clever async function I wrote.

      • georgeecollins 4 days ago

        If it's a couple of lines of generic code, of course. That's also an indefensible copyright, btw. But if it's hundreds of very specific lines of code written to do one thing under a license you don't follow, that's something else.

        This isn't just an issue of code. You can write a program that combines songs, or combines novels creating a different work that has sections that are essentially the original protected work. I don't think the authors of those novels are going to be OK with you selling or giving away a version of their work just because an AI edited it or combined it somehow.

    • sirsinsalot 4 days ago

      But this isn't everyone's feeling. And they have a right to choose how their work is used. That's the basis of commerce being possible here.

      The mechanised license ignorance and the way original authors are not compensated is the issue.

      If you had a repo you'd worked really hard on, and offered a commercial license or GPL depending on the use (so you can be funded to work on it) ... do you think it is fair that copilot ingests that code and allows others to benefit from your work and knowledge without the commercial license as you intended?

      Note how Microsoft always throws out the capitalism "rules of engagement" when it benefits them and undermines everything else. The fact we are even trusting the situation Microsoft are creating is dire, and speaks to the short memory of our industry.

      • alar44 4 days ago

        Saying an auto complete of a line of code is "using their work" is a massive stretch.

  • georgeecollins 4 days ago

    You may not care about licensing or copyright, and I imagine many others who create code under an attribution license don't. That's still not the same as saying "copyright and licensing are bad." Too many businesses depend on them to exist for me to have that opinion.

    If an AI takes a copyrighted work and makes its own version (say, combining two novels by popular authors in a way that is unique but keeps large parts of the text intact), can I sell that? I think if I were the authors I would be unhappy.

    Also, how hard would it be for Copilot to include a comment saying "// I got this line from x repo" when it copies from another repo? I am guessing not hard at all. Then at least the user would be aware of where their code was coming from and could be expected to make a judgement. If the line is "let a = b" then probably no worries. But if it is hundreds of lines of a simulation, all from the same repo with no changes, then I think some attribution is good for both parties.

    • Guid_NewGuid 4 days ago

      Don't get me wrong, I know this (copyright abolition) is pie-in-the-sky stuff. I'm using an anon account to post because even advocating for it could be troublesome for employment. But I don't accept we have to be meek or have small goals in talking about this ideological stuff. And I think this has made me realise why I find the Free Software vision so disappointing and weak. And hence why I find all these (ideologically) Free Software aligned takes of sending Billy to jail for a thousand years so irritating.

  • bayindirh 4 days ago

    > I find this whole topic very annoying, this is like the 3rd variation to reach the front page today.

    Me too. I also find three iterations of the same subject not enough discourse. We need to take this matter more seriously.

    > But it has made me realize why I instinctively dislike Free Software as a movement.

    On the other hand, this whole discourse reminds me why I absolutely love Free Software as a movement.

    > Copyright and licensing are bad, actually.

    This is why we have "Copyleft".

    > Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn't.

    And stop getting into a frenzy of arousal about being able to use any and every piece of code you see elsewhere in any project regardless of its license.

    > Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.

    This is why the GPL is important. It forces knowledge to evolve in the open, stay in the public domain and actually do public good. It also doesn't hinder ambition and vision; it just keeps them from being taken into the private domain, and keeps them open to everyone.

    > Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.

    You might be pretending to care about this in your day job, but we really care. Some of the projects I take part in can't ever include GPL code (because the projects are MIT licensed). These texts are court-tested licenses, so they're as proper and serious agreements as the EULAs of "particular" software companies.

    > Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps, in fact I want a truly public domain license but copyright law is so hostage to corporate interests no such thing exists in many countries.

    If I want my code to be copied and possibly closed, I'll license it with MIT or BSD-0 and forget about it, but if I'm licensing my code with GPL3, it means I want that code to stay open. As with any license, I expect anyone using that code to respect it.

    > Free the code.

    Yes, and respect the license the author selected for his/her code.

    • gopiandcode 4 days ago

      > > Copyright and licensing are bad, actually.

      > This is why we have "Copyleft".

      This. Exactly. It's surprising how many developers have strong anti-copyleft/anti-GPL opinions while being completely uninformed on what they're talking about (but hey, I guess "uninformed but strongly opinionated" is HN in a nutshell). The purpose of the GPL and other copyleft licenses is exactly to combat the insanity of intellectual slavery.

      • imtringued 3 days ago

        Pretty much, copyleft is turning copyright against itself.

  • notacoward 4 days ago

    I suggest you read up on the history of free software and open source. It exists as a reaction to intellectual enclosure, to prevent that ill and create greater freedom of ideas. Yes, it uses the tools of copyright to fight greater ills of copyright, because those are the tools available, and actions like these are necessary to keep the enclosure from happening all over again. Anyone who has actually studied the matter for even five minutes can see how silly the "free software is anti-freedom" FUD is.

  • mplanchard 4 days ago

    If that's what you want, you should license your code not under MIT, but under a license that allows replication/distribution without attribution. Meanwhile, others who do care about such things can license their code under licenses that require attribution/copyleft/etc.

    • Guid_NewGuid 4 days ago

      But I can't really because the legal systems for it don't exist. I can't relinquish anything https://softwareengineering.stackexchange.com/questions/1471... (CC0 looks closer but still doesn't do what I'm after).

      And I can't because there are a bunch of, for want of a better word, dweebs who care about this stuff. I don't give a single solitary frick about the finer points of MIT vs GPL vs BSD 3 clause vs CC-BY-NC or whatever-the-hell. But y'all are forcing me to care by making the legal frameworks for software ever more strict and confusing.

      I take a maximalist view: don't want the code copied, sliced up, or re-used in any form whatsoever with no credit? Don't post it on a code sharing site. Like I say in the OP, in my job I obviously have to follow the rules, but on an ideological level I'll ignore them where I can get away with it outside of work.

      If you don't want the code to be used, don't post it online.

      • tuckerman 4 days ago

        I'm curious if this view is software specific or relates to any work released online? For example, do you feel similarly about a novelist or graphic artist? I reckon at least a few software engineers look at what they produce not entirely differently from how an artist or writer looks at theirs.

        • Guid_NewGuid 4 days ago

          It's a good, and thought-provoking, question.

          First, to be flippant: the idea of a software developer with that view sounds so unbearably insufferable and full of themselves I hope never to meet one. All code is terrible, be less attached.

          Stream of consciousness: Should artists or writers be paid for what they produce? Yes. So why not software developers? I'm paid for what I produce. But then I don't release the stuff I'm paid for for free on the internet. But I'm against DRM, I also think Winnie the Pooh shouldn't have IP protection (now expired). What makes art or literature a different commons from software? I also think all scientific journals should be available for free. Do artists and writers have an alternative route to make money from what they publish, what is the artistic or writer equivalent of open source? I think this is the crux of it, if we're going to do open source let's actually do it and stop being precious about it but this only applies to freely-entered open source. So does that mean I support some form of copyright after all? Then again some old out-of-print books will sell on Amazon for like $4000 so we should be able to copy those for free.

          Ultimately it's a question of what a vision for society without copyright would look like. I think software is uniquely placed to start exploring that idea. How would we make a living from software if anyone could reverse engineer (even our proprietary) code freely and safely?

          • tuckerman 4 days ago

            The reason I ask with writers in particular is because, like code, having access to it necessarily means that the viewer has the ability to copy it as much as they'd like. Unlike software, however, there is no ability to keep the source code private in a book while still having users.

            I definitely agree that copyright protections have become far too strong but I don't think we can really ever know if we would have been able to build the strong open source community we have today without co-opting the copyright system for copyleft protections. At the same time, perhaps we are past the point where it's necessary and now it's holding us back... it's entirely possible!

            To the first thought, I personally see some coding as a creative act (some is doing _a lot_ of work there though). It's not because I fancy myself a Picasso but because I think some (again, doing a lot of work!) solutions/ideas have a bit of their creator in them and, for those works, the author should be able to exert some control over their works. I think this is more philosophical than legal/political, but I would disagree that it's flippant :)

      • mplanchard 3 days ago

        You don’t need an “official” license, although I agree creative commons is closer to what you want. I feel like you can pretty easily write a license file that explicitly waives all of your rights and responsibilities. Such simplicity is after all what made MIT such a popular license, even though it’s not substantially different from Apache.

  • vajow46267 4 days ago

    So glad this sentiment is becoming more common in the OSS community! I MIT license everything, if someone wants to make money using stuff I wrote that's awesome, and I wish them the best.

    I don't think users owe me anything at all. If people want to PR back that's cool but if not that's cool too.

  • wcoenen 4 days ago

    > I want a truly public domain license

    I think this sentence contradicts itself.

    A "license" implies that there is a copyright holder who allows usage of the work under the terms of said license.

    While "Public domain" implies that there is no copyright holder (e.g. because the copyright expired, was explicitly waived, or is for some other reason not applicable).

    If you want to put your work in the public domain, you can do so; simply include a note saying that you dedicate it to the public domain.

    • Guid_NewGuid 4 days ago

      You're right that it does contradict itself, but the unfortunate situation is that public domain declarations don't work and would make it harder for people to use your code safely in the current licensing model. The closest options are Unlicense and CC0 afaict and both don't work in many European jurisdictions.

      I just want people to be able to take my code and do whatever the hell they want with it (including commercially) and optionally contribute to it. Having a license currently makes that easier but every time the Free Software lot goes zooming off into the weeds of GPL v3 versus GPL v2 versus LGPL my eyes roll back into my head and I internally start screaming "get a life!".

      • imtringued 3 days ago

        You use GPL for desktop apps, AGPL for webapps and services and LGPL for libraries. Who cares about the specific version, just pick one of them.

  • nonbirithm 4 days ago

    I think because this kind of ML is so new, we have no choice but to frame arguments for/against in terms of the structures that have been in place for decades past (copyright, open source licenses). We don't yet have the legal language to express dissent against ML in clear yes or no terms.

    I think if there were an option to add a machine learning clause and ask individual creators if they wanted it applied in that context, we would see a considerable amount of uptake. It's just that we couldn't foresee this progress happening so soon, and the issue is still not visible enough. I think it's only a matter of time before the culture catches up and new creative works in the coming years are excluded from training sets by their authors with clear and direct language.

    By that point there would be no way to argue "but they shouldn't care, they licensed it like this, so I'm assuming it's fine for ML use."

    If copyright is not enough to stop another entity from using a person's data for training, then some other protection should be invented that does.

    • popcube 4 days ago

      Because big companies want this, we will absolutely accept that a company can get copyright from AI.

  • Schroedingersat 4 days ago

    The problem with this is 'freeing the code' in this instance leads to Microsoft building a wall around it and asserting complete control in a few years.

    Copyleft exists for a reason and without the ongoing fight for the commons we lose it all.

  • nmfisher 4 days ago

    I totally agree, this reaction seems very hypocritical. If some rinky dink startup did exactly the same thing - as they are entitled to do under the licences of huge swathes of code on GitHub - hardly anyone would bat an eyelid. But just because it’s a Microsoft-owned company, it’s somehow verboten?

    That seems totally inconsistent with decades of people clamouring for more openness/liberty when it comes to IP rights.

    • bayindirh 3 days ago

      Regardless of the size of the offender, if you're not respecting the terms of a license, you'll get pushback. It's natural.

      If you're a company which executes Embrace Extend Extinguish on any technology you like yet don't own, you'll get quadruple amounts of pushback. That's normal too.

      Microsoft isn't a saint, and copilot is breaking a lot of legal, ethical and moral rules. It's doubly natural to react to this.

  • progman32 4 days ago

    I see the free software movement as a variant on your ideals but rooted in practicality given the current environment.

    • Guid_NewGuid 4 days ago

      I think we share a lot of the same goals but they presuppose openness based on violence, if you don't do what their license says exactly then they're going to use lawyers and courts and the state's monopoly on violence to make you comply.

      I think at a fundamental level this abandons any vision of a true commons since as copilot discussions reveal the well is now polluted (to mix metaphors) and though in some frames the code is more free you certainly won't be if you fail to pay the penalty levied in a civil case for misusing it.

      • imtringued 3 days ago

        That is true of any license.

  • kube-system 4 days ago

    > Free Software

    > public domain

    These are incompatible concepts. RMS's vision of 'free-as-in-freedom' software doesn't let people do whatever they want. It forces those who distribute binaries to also distribute source. This is not possible with a public domain work.

  • dougmwne 4 days ago

    In this thread: many engineers nervously sweating. The moats are drying up and the wizards are about to be thrown out of the castle. This tech is the first product in a long line of products that will massively lower the barrier to entry. It has been a good run, but it was never going to last forever. We are not part of the capitalist class and were never going to be.

    • ThalesX 4 days ago

      The world might change, but software engineers have been working with and within change their entire careers presumably. I think we'll be OK, as people, no matter what happens.

      I was sweating nervously before I started using Copilot a while ago but I've stopped since because A - it really doesn't replace me, tried really hard; B - I don't sweat nervously for IntelliSense either.

      There's also C, where being of an entrepreneurial mindset, I'd love the opportunity to hand over the software to an AI dev and just direct the implementation to my desire until I have a working product. I bet I could secure a higher room in the castle if instead of coding for 8 hours per day I could work on n products with capable AI Software Engineers. We're not there yet though.

    • imtringued 3 days ago

      Is this supposed to be a joke? You're arguing that software developers are being replaced by themselves because ML just takes in training data and is entirely dependent on real humans to provide that data. If anything, this will simply result in another productivity explosion where software developers will get paid even more.

    • LordDragonfang 4 days ago

      Copilot replaces code monkeys, not engineers. Ultimately it's just a faster Stack Overflow; proper software engineers and system architects are going to be just as in demand as they are right now for the foreseeable future. At the point at which that stops being the case, we'll have much bigger societal and existential problems (because it implies the singularity is nigh).

      (You're correct on not being part of the capitalist class, though)

      • dougmwne 4 days ago

        There are a lot of code monkeys out there and I might be one of them. That island of job security seems like it will be shrinking.

        • account42 3 days ago

          I don't agree that we are that close to that (or that Copilot is a significant contribution to bringing it closer) but ultimately eliminating mindless jobs is a good thing. The problem only comes from the expectation built into our current society that people need a job in order to be allowed to survive. Or to put it another way, the profit from automating away jobs en masse should be shared with the whole of society, not privatized.

    • notacoward 3 days ago

      > the wizards are about to be thrown out of the castle

      You have completely misunderstood who the moat-building wizards are. That's proprietary software. Heard of it? I ask because a lot of young people nowadays don't seem to understand how dominant it used to be and the threat that it represented. (Plus a few older folks who never knew, forgot, or deny reality for other reasons.) We've been trying to throw the wizards out for decades, by making code available to everyone and making sure it stays that way via licensing. Code without a license is subject to re-enclosure as important enhancements - even necessary ones, such as security - are made behind locked doors. The open version becomes out of date, the proprietary one wins, we're back to wizards and moats. What Microsoft is doing is the same thing for code that was supposed to have legal protection so it could remain open and avoid that fate. It's taking magic back from the people and making it exclusive to the "capitalist class" (eye roll) again.

  • ssalka 4 days ago

    Information wants to be free

VoodooJuJu 4 days ago

It is now proven that copilot returns code from codebases with non-permissive licenses [1].

I'm curious - what are the legal implications of this going forward? I've so many questions.

1. Will Microsoft ever face lawsuits for these license violations?

2. If so, who/how? Class-action?

3. Will copilot be forced to open-source in the future? Under which license? Some open source licenses are incompatible with others, but copilot uses code from probably every OSS license conceived.

4. If Microsoft faces no justice, will we start seeing more OSS license violations? Will Google start using AGPL-licensed code?

[1] https://news.ycombinator.com/item?id=27710287 | Copilot regurgitating Quake code

  • mhaymo 4 days ago

    That regurgitated code exists on Github under an MIT license: https://github.com/jethrodaniel/fast_inv_sqrt

    "jethrodaniel" does not appear to have the copyright to offer that license, but it's hard for Github to determine that in general, so I doubt they would be liable for the error.

    • monocasa 4 days ago

      Even if it's somehow available under an MIT license (which is questionable on the part of jethrodaniel), there's still infringement. MIT isn't public domain, it still has

      > The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

      Replicating it without complying with those terms is still infringement.

      • sirsinsalot 4 days ago

        this. People are being willfully blind here, like cult members looking dead-eyed at their leader and chanting "This is great" as they drink the kool-aid.

        And from Microsoft no less, once outcast for mass poisoning.

    • vorpalhex 4 days ago

      > but it's hard for Github to determine that in general, so I doubt they would be liable for the error.

      Please insert that meme, "That's not how that works. That's not how any of this works!"

      The legal system is permission based, not forgiveness or "I didn't know" based.

      • minhazm 4 days ago

        Actually the legal system is evidence based. Microsoft has evidence that the code they are producing is licensed under MIT as far as they can reasonably know. There's no definitive way to know who actually owns the original copyright. I could grant permission to use my repo, but maybe I got that code from someone else, who then got it from someone else and so on and so forth. It's a similar situation with stolen goods, if you unknowingly purchase stolen goods you usually cannot be charged for theft as long as there aren't obvious signs that it's stolen such as the goods being priced far below market value.

        • sammax 4 days ago

          Microsoft has evidence that the code they are reproducing is MIT licensed, so are they intentionally violating that license or does this AI thing include the license and attribution in every snippet it generates?

        • monocasa 4 days ago

          Major aspects of copyright infringement are strict liability, like a lot of civil actions around damages. It doesn't matter if you thought it was OK, there's still a damaged party that needs compensation according to the law. At best you'll simply avoid the criminal and punitive penalties.

        • BaculumMeumEst 4 days ago

          Exactly, that's why Pornhub hasn't had any liability issues arising from where its content comes from either. It's just too darned hard to tell.

          • monocasa 4 days ago

            No, PornHub doesn't have liability in a lot of cases because of 17 § 512, but has still had to deal with liability in general, which is why they nuked some 80% of their library not backed by verified individuals a while back.

            https://www.law.cornell.edu/uscode/text/17/512

            A huge part of 17§512 is the DMCA takedown process mainly in 17§512(c)(3). Does Microsoft even have the ability to truly remove training data from the model? Or do they have to retrain on each DMCA takedown?

      • Flimm 4 days ago

        I personally don't want to have to upload proof of identity to GitHub and a signed document swearing that I own the copyright to all the code I upload to GitHub, or proof that I coded it. We need to be careful what we wish for.

        • vorpalhex 4 days ago

          Excerpt from the MIT license:

          > THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

      • concordDance 4 days ago

        If they had a reasonable basis for believing they had a license they're in the clear. "I didn't know" might not be enough but "I had good reasons to think otherwise" is.

    • mrh0057 4 days ago

      I’m not a lawyer but my understanding is these are torts so all you have to prove is Microsoft has liability. I think this would be easy to prove due to the way neural networks work since it’s just a way of performing a search.

      Since it’s a tort I don’t think you have to prove they should have known it would return copyrighted code, the fact that it does is enough to have liability.

    • jsiaajdsdaa 4 days ago

      That doesn't stop YouTube from blasting people away over copyright issues?

      On youtube, video uploads are a cost center, whereas on github, code is a profit center

  • 542458 4 days ago

    IANAL. My understanding is that the general legal precedent in the US is that a) datamining text has no copyright implications (in the same way that reading a book has no copyright implications) and b) it is not a copyright violation to use a small amount of copyrighted material provided the context is sufficiently transformative. This might seem silly or unfair to you, but that is the current legal reality.

    But even ignoring that, everybody uploading code to GitHub has given GitHub the right to analyze that code as per the GitHub ToS. This is the same mechanism by which you can't upload code to GitHub with a license that says "nobody is allowed to display this code on the internet" and then sue GitHub.

    • aposm 4 days ago

      I can't imagine a scenario in which any lawyer would consider granting Github the right to "analyze" code anywhere close to granting Github the right to spit out that same code verbatim without your copyright notice (even if laundered by AI).

      • 542458 4 days ago

        Here's Kate Downing, an IP lawyer specializing in software licensing:

        > According to Downing, the answer depends to a certain extent on where that code is hosted. If it’s on GitHub, there very clearly would not be copyright infringement.

        > “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”

        Downing cautions that copilot output of large chunks of code complete with comments is more questionable to use, but that for the most part it looks above board.

        https://fossa.com/blog/analyzing-legal-implications-github-c...

        Here's an English lawyer on the same topic...

        > The licence is broadly worded, and I'm confident that there is scope for argument, but if it turns out that Github does not require a licence for its activities then, in respect of the code hosted on Github, I suspect it could make a reasonable case that the mandatory licence grant in its terms covers this as against the uploader.

        https://decoded.legal/blog/2021/06/github-copilot-initial-th...

        • Engineering-MD 4 days ago

          To me regardless if it is technically legal, it certainly doesn’t feel right. Furthermore, contracts rely on people understanding what they are agreeing to, and I don’t think many developers would agree to letting the code be used outside the terms of the license they uploaded it under.

          I am very surprised there hasn’t been a legal challenge to it.

          • mynameisvlad 4 days ago

            What, exactly, is there to challenge?

            “I’m sorry your honor I didn’t understand what I was signing” I don’t think has ever been a valid reason in a courtroom, similar to “I’m sorry I didn’t know I was committing a crime” is not a valid defense.

            • ghusbands 4 days ago

              Courts interpret the intended and understood meaning of contracts and terms all the time. Research the term "meeting of the minds" and case law around it.

              When the terms were written, it's exceedingly unlikely that they intended it or anyone understood it to be blanket permission to allow a trained AI to copy code for others and no user would have interpreted it that way. Microsoft/Github can't necessarily unilaterally increase the intended range without making it clear in the terms.

              If it got to a court case, and both sides could afford it, it could be a lengthy one.

              (This comment is not legal advice. I am not a lawyer.)

              • mynameisvlad 4 days ago

                How does "[allowing] a trained AI to copy code" change the interpretation of the ToS?

                By uploading your code, you give Github an exclusive license to use it to improve their services. Copilot is such a service. Just because it's an AI and it provides others code does not somehow invalidate the license you gave.

                • ghusbands 4 days ago

                  Again, research "meeting of the minds". It's a standard legal term directly relevant to all contracts and terms. Also, "transparency" is another important one.

                  Many online services have very wide terms around what they can do with your data, which most people who bother to read them interpret as being what is required for them to handle the service for you without breaking copyright law. In that context, being able to use and analyse your data to improve their services could be another catch-all that lets them do specific performance optimisation on their backend.

                  One party instead deciding they've got blanket permission to do whatever they like with your work, including selling it to others, may well not hold up in court.

                  Contracts aren't programs and one party tricking the other rarely works out in court - courts world-wide tend to rule against trickery and deception.

        • BaculumMeumEst 4 days ago

          > “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”

          That's assuming that all code on GitHub is uploaded in good faith by the copyright owner, which is not always going to be the case.

        • zerohp 4 days ago

          Many repositories on Github were put there by people that do not own the copyright and never agreed to GitHub's Terms of Service.

          Linux, for example, does not require copyright assignment. The original contributor of a change owns the copyright for that code and may have never used Github.

  • concordDance 4 days ago

    There's also one more question:

    5. Even if it is illegal, is it actually bad? No one can possibly sell code snippets, the transaction costs are many orders of magnitude greater than any reasonable price. In my opinion, at least in this case the benefits massively outweigh the costs and the law should not apply here.

    • xtracto 4 days ago

      I really, REALLY like the idea of Copilot. I think it is a glance at what the future of AI can bring to improve programming. I understand where all the litigation and "uneasiness" is coming from, both from commercial and open-source projects.

      I've not installed or used it for the same reason (don't want to use AGPL or GPLd code by accident, and don't want my closed source code to be used accidentally as well), but the thought of Copilot being "killed" due to litigation/copyright/licensing issues is sad.

      For me, it's kind of like when MP3 first appeared: Sharing music in Napster or downloading MP3s from Geocities was just amazing. The idea of having such things at your fingertips. Even though I understood the issue the authors had with the unpaid distribution of their music... still, the idea of "what could be..." made it amazing.

      I guess Microsoft could be a bit forward thinking, and implement the "Spotify" model in code: Pay OpenSource developers (whoever owns the repo, or whoever made a commit?) a small amount whenever their code gets used through Copilot.

      I'm super excited by what "Copilot" related services will look like in 10 years. And I really really hope that the technology/idea doesn't get killed by litigation.

      • PaulKeeble 4 days ago

        Microsoft could have trained this on their own code and there would be no issue. The problem is instead of doing that they knew full well the approach would reproduce the code and they decided they would rather breach GPL than expose their own code. But I bet Microsoft has more than enough lines to train an AI, there was a clear choice to breach other people's licenses in preference.

      • frazbin 4 days ago

        Huh... These comments have given me an idea: MS needs to be forced to train a model to compensate (pay) code authors and codebases based on snippet suggestions given by their tool: the Spotify model replacing Napster!

        • sirsinsalot 4 days ago

          See: Who Owns the Future by Jaron Lanier

        • Graffur 4 days ago

          The comment you replied to gave you that idea nearly word for word.

          • frazbin 3 days ago

            huh you're right! sorry I haven't been sleeping well lately :)

    • midasuni 4 days ago

      Some people won’t let you use their copyrighted work no matter how much you pay, that’s reasonable.

      By all means allow repos to opt in, although if it’s licensed under something like GPL there’s no way to convert it to non gpl without permission from every contributor. I for one am not interested in Microsoft or anyone else paying me to close my code.

      Allowing people to pay $xxx to copy my copyrighted work without my agreement is simple piracy.

      Either get an international agreement to drop copyright as a concept, or obey the law.

      • concordDance 4 days ago

        > Some people won’t let you use their copyrighted work no matter how much you pay, that’s reasonable.

        Is it really? From a consequentialist utilitarian perspective?

        • midasuni 2 days ago

          Yes, otherwise you’re saying rich people can break the law with no consequences.

          The alternative would be to completely remove copyright, which would be ok.

    • citizenkeen 4 days ago

      Then the law should change. Saying "it's illegal but it's good/harmless" is a terrible stance.

      • anamexis 4 days ago

        Seems like an eminently reasonable stance, and exactly the stance you would take to get the law changed.

        • citizenkeen 4 days ago

          Fair. I had read "and the law should not apply" as "so we ignore it", not "so we change it".

    • kraf 4 days ago

      Of course it's bad. No one who put up their work as open source wants some huge company taking it and selling it to get even more competitive advantage and influence in the world. And that's without mentioning the people who put that into their license pretty much explicitly. Taking GPL code and getting away with it is a failure of our justice system, and that can't be made right with throwing pennies at developers.

      • concordDance 4 days ago

        In this case it seems obvious to me that the huge time savings for thousands of developers outweigh the fact that the original writers are offended.

        A world with copiloted snippets seems like a better one to live in than a world without.

  • pwdisswordfish9 4 days ago

    Is there any leaked Microsoft code on GitHub? Someone should check if Copilot regurgitates that as well, then see how Microsoft reacts when someone slaps an AGPL license on that…

  • rifty 4 days ago

    It seems like Microsoft could be in the clear on the basis of it being essentially "search". But it also seems like anyone who uses it could be at high risk of getting infected with copyright-violating code.

    My question is, if it isn't a copyright infringement issue to use copilot in its current form right now, why not just claim copilot was used whenever accused of copyright infringement henceforth?

    • solveit 4 days ago

      > why not just claim copilot was used whenever accused of copyright infringement henceforth?

      Without speaking to the particulars of copilot, this situation where laws seem toothless because of the ease of plausible deniability is actually fairly common. And in many such cases, the law is not as toothless as it seems, because

      1. Getting multiple people to stick to a script under oath is difficult and dangerous.

      2. Criminals frequently send each other messages like

      A: "lol I just crimed, hope nobody figures it out."

      B: "lol just say you used copilot".

      A: "lolol yeah fuck the law"

      Obviously this only gets the worst criminals, but there seems to be lots and lots of them.

  • Beltalowda 4 days ago

    > It is now proven that copilot returns code from codebases with non-permissive licenses [1].

    That same Quake example from last year is repeated every single time.

    Aside from the fact that GitHub has since added a protection for this, that this example gets repeated time and time again instead of a list of examples leads me to believe this is not (and was not) a common occurrence.

  • blihp 4 days ago

    1) Most likely

    2) TBD

    3) Not likely. Worst case a judgement will go against them, they'll effectively pay a fine and then they'll retrain it on a more restricted set of source code.

    4) OSS has a pretty tragic history re: enforcement. It wins nearly every skirmish but has no interest in the war so from a big picture standpoint, it loses due to apathy.

  • bastardoperator 4 days ago

    You don't think a mountain of MSFT lawyers in every state, including partner law firms around the world, have thought about this? Do you practice law or are you speculating based on emotions?

  • throwaway23234 4 days ago

    Big meh. That quake code was MIT.

    • monocasa 4 days ago

      A) Public Quake is GPL. Just because someone else dumped it in an MIT library doesn't change that.

      B) MIT still requires attribution to not infringe.

antihero 5 days ago

I mean, if it's autocompleting a fairly simple line, and can do that because it's analysed a lot of lines, I don't really see that as "stealing anything".

If you are using it to write whole complex functions that are the same as other people's, I guess that is copying.

But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

I think the first use case is far more common, and it creates boilerplate so generic that you could never really attribute it anyway.

  • rob74 5 days ago

    > But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

    If you do that on your own, it's your (legal) responsibility. If Copilot does it for you, it's GitHub's/Microsoft's responsibility.

    • pronik 4 days ago

      You are responsible for your tool use. It's the same discussion as whether uTorrent is responsible for your torrenting of copyrighted stuff, or Tesla for its auto-pilot. You buy the tool, you are responsible for what you create with the tool.

      • stavros 4 days ago

        Napster was liable for copyright infringement.

        • pronik 4 days ago

          True, however, the users have been liable too. If my company gets sued because I used Copilot, it won't matter that much that the plaintiff also sued GitHub/Microsoft.

        • strictnein 4 days ago

          Napster's raison d'etre was copyright infringement.

          • stavros 4 days ago

            Which they were then liable for.

    • alkonaut 5 days ago

      > If Copilot does it for you, it's GitHub's/Microsoft's responsibility.

      Is this true? It hasn't been tried yet I assume?

    • Hamuko 4 days ago

      >If Copilot does it for you, it's GitHub's/Microsoft's responsibility.

      GitHub/Microsoft says that it's still your responsibility.

      >You should take the same precautions as you would with any code you write that uses material you did not independently originate. These include rigorous testing, IP scanning, and checking for security vulnerabilities. You should make sure your IDE or editor does not automatically compile or run generated code before you review it.

      I'm not really sure how I am supposed to go about validating that I can in fact use this code that the magical black box barfed into my IDE using a bunch of different weights.

      • dragonwriter 4 days ago

        > GitHub/Microsoft says that it's still your responsibility

        If Copilot is fair use, and has no restrictive license, then how is it anyone's responsibility?

        If Copilot isn't fair use, it's Microsoft's responsibility.

        (For copyright; for patent that's another issue, but you can violate patents by similarity without exposure or copying, anyway.)

        • Hamuko 4 days ago

          Training Copilot is fair use, using Copilot is ???.

      • nnoitra 4 days ago

        What horribly shady behavior.

        Give us your money but you are responsible for the code that OUR tool generates.

      • lowercased 4 days ago

        Let MS buy the BlackDuck scanner and integrate it into GitHub/Copilot. They could then suggest code and also scan it for any license violations, giving you both sides of the equation in the same tool.

    • __warlord__ 5 days ago

      Why should it be GitHub's/Microsoft's responsibility? No one is forcing you to use Copilot.

      If I use Grammarly, are they responsible for what I am aiming to write?

      • 32bitkid 5 days ago

        If I pay for grammarly, and it plagiarizes an existing work but represents it as an entirely new, independent work and I am unaware of the existing work that is being stolen, who is doing the stealing?

        • seanmcdirmid 5 days ago

          This makes more sense for text-message autocomplete: if you keep taking the suggested next word after a one-word seed, it might reproduce a Wikipedia entry. But what did you expect? The same would be true with Grammarly if you somehow got it to produce a bunch of new text. You expected garbage, but somehow infringed on copyright instead. Still, I think the user bears some responsibility for noticing when their expected garbage output, for some reason, isn't garbage.

        • scotty79 5 days ago

          If you pay a shady character to get you a modern laptop for $100, you can't claim that you were unaware that it was most likely stolen, and the fact that you paid something for it doesn't absolve you morally.

          • ClumsyPilot 5 days ago

            Does the shady guy have his name on the side of a building, run ads saying "buy my shady stuff", and then pay taxes on his earnings? That kind of shady guy?

            • scotty79 4 days ago

              Sometimes. Like Amazon, widely known for exploiting its workers and third-party vendors.

              You can no more claim ignorance of where the github copilot code comes from than where the Amazon's low, low prices come from.

              Whether you care is totally on you, regardless of whether you pay or not. You pay for a product or service, not moral absolution.

              • ClumsyPilot 4 days ago

                Are thousands of amazon employees going to be in same docket with me?

              • nnoitra 4 days ago
                • scotty79 4 days ago

                  I see you just came from there. Welcome to HN. Please start by reviewing the FAQ and guidelines.

      • ClumsyPilot 5 days ago

        So it's my job to check my supplier, to make sure lines from co-pilot are legit.

        At the same time, when fast fashion companies sell T-shirts made with slave labour, it's not the company's responsibility to check what their suppliers are doing.

        And if tesla autopilot kills you and your family its not their fault either.

        Neoliberal morality - companies are never accountable for anything, it's heresy to suggest they should do their job properly.

        • wang_li 4 days ago

          Other than the first sentence nothing you wrote is true. If a company doesn’t do due diligence on their suppliers they face fines and possibly criminal charges. The news came out the other day that the NTSB is considering whether to require Tesla to recall all their vehicles with self driving enabled. Companies of all types face huge fines and civil liability for product safety issues.

          • ClumsyPilot 4 days ago

            have you never googled "slavery fast fashion"?

            Zara's clothes sometimes have notes in their pockets from people being held as slaves, pleading for help. I haven't heard of anyone going to jail.

            Most of our electronic waste ends up illegally exported to poor countries; again, when was the last time someone faced the music for that?

      • purerandomness 5 days ago

        Does Grammarly generate pages of content for you?

  • afiori 5 days ago

    When Oracle won its copyright lawsuit against Google, it was because of an 8-line bounds-checking utility function.

  • alpaca128 5 days ago

    The first can be automated without ML though. And once you use ML you cannot guarantee it won't copy-paste existing code.

    This whole thing would be fine if GitHub hadn't just used all public code on their platform, ignoring all involved licenses.

    • rob74 5 days ago

      The problem is, if they had used only code with a license that allows copying without attribution, there wouldn't have been a lot of code left...

      • alpaca128 5 days ago

        Difficulty doing something legally doesn't justify breaking the law.

    • xupybd 5 days ago

      It changes the code for use. I'm not sure it can be considered a copy. It's much like reading someone else's code and drawing ideas and patterns from that code.

      • alpaca128 5 days ago

        It has been shown often enough that Copilot can reproduce exact copies of snippets.

      • afiori 5 days ago

        Copyright sensitive environments are very careful not to do that.

  • dobin 5 days ago

    I don't see it as "stealing" either. The neural network was trained with code as input. It's creating code as output. The output has nothing to do with the input once it is trained. Do people dont know how neuronal network work?

    It's like saying GPT-3 created text is copyright infringement, because some author used the same sentence in a book before.

    • ImprobableTruth 5 days ago

      Overfitting: One weird trick that copyright lawyers don't want you to know!

    • eloisius 5 days ago

      So if I fit a network to output entire chapters of a book when given the chapter number as input, I can print and sell copies of it that way?

      • dobin 4 days ago

        1) Copilot is not designed to output the source code for a project source file

        2) It does not re-create the whole source code, just parts of it (sentences, not chapters)

        3) The source code license, e.g. BSD, works on "the code"; copying a line like "void main(void) {" will not trigger it, obviously

    • f1refly 4 days ago

      1. Create a neural network that produces an x264+dts stream of a movie
      2. Distribute it
      3. Checkmate, copyright lawyers

    • imtringued 3 days ago

      >Do people dont know how neuronal network work?

      I could ask you the same thing.

  • wodenokoto 5 days ago

    > If you are using it to write whole complex functions that are the same as other people's, I guess that is copying.

    > But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.

    How would I know that the boilerplate I ask Copilot to write for me isn't copied verbatim from a codebase that neither I nor Microsoft is licensed to use?

  • carom 5 days ago

    My problem is with the weights not being released. They are a derivative work of open source code in the most literal sense. The weights would not exist without those lines. Gradient descent is using literal derivatives.
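The "literal derivatives" point can be made concrete with the textbook gradient-descent update. This is a generic sketch, not a description of how Copilot was actually trained: θ are the model weights, η a learning rate, D the training set of code snippets x_i, and ℓ a per-snippet loss, all assumed symbols.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t; D),
\qquad
\nabla_\theta \mathcal{L}(\theta; D) = \frac{1}{|D|} \sum_{x_i \in D} \nabla_\theta \ell(\theta; x_i)
```

Each step sums gradient contributions from the individual snippets, so the final weights are a function of the training data. The math shows that dependence; whether it makes the weights a derivative work in the legal sense is a separate question.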

coldtea 5 days ago

>Hector Martin: If you use Copilot, you are basically playing Russian Roulette that the random mashup of existing, copyrighted, heterogeneously licensed code that you get out of it qualifies as an original work, mostly by chance. Or that nobody will ever sue you otherwise.

Well, that's already the case with Stack Overflow copypasta enterprise code. If anything, use of Copilot would be an improvement...

  • tagyro 5 days ago

    Do people really copy/paste from StackOverflow?

    I feel this is more a meme than reality. I do check StackOverflow, but never have I taken an answer verbatim. I try to see whether it's the same problem and what approach was used to deconstruct it, which I find more useful in the long run.

    • Flimm 4 days ago

      According to Stack Overflow's blog:

      "One out of every four users who visits a Stack Overflow question copies something within five minutes of hitting the page."

      https://stackoverflow.blog/2021/12/30/how-often-do-people-ac...

      • icoder 4 days ago

        Well, to be fair, most of that is probably just copying a particular syntax or built-in function, which (I think?) has nothing to do with copyright.

        At least for me, that's most of the copies I do, followed by the ones that basically are 'call these functions in order', which I paste as a comment and use as a cheat sheet. Only very rarely do I copy a 'creative' snippet almost verbatim, like a regexp matching email addresses, a to-hex or a CRC calculation. And perhaps that's actually tricky.

      • tagyro 2 days ago

        1 out of 4 feels about right, thank you for the link!

    • coldtea 4 days ago

      >Do people really copy/paste from StackOverflow?

      All the time.

    • ldoughty 5 days ago

      I've done it, and I know others that have, but I think it depends on people's definitions of copy/paste.

      I've certainly copied a sort anonymous function from SO; it was a one-liner. Is that copy/paste? Or is it only copy/paste if it's X lines?

      Otherwise I agree, usually I just get hints and go my own way.

    • Timwi 4 days ago

      It depends on what you need. In most cases the code on StackOverflow is not exactly what you need, so you need to understand it in order to adapt it. But if you're looking for a specific well-defined algorithm (MD5, say) then you can just copy & paste it.

    • Aeolun 4 days ago

      This has more to do with the code never being immediately copy-pasteable than with any reluctance on my part to copy-paste from SO for licensing reasons.

    • mullen 4 days ago

      I catch people using cut and paste code all the time. If there is a spelling error in code (Especially if it is in a code comment), I can guarantee you that someone copied and pasted it from StackOverflow.

    • concordDance 4 days ago

      Anecdata: Everyone I work with does.

  • t0suj4 5 days ago

    That quote applies to any creative work. Be it code, audio or video.

    • coldtea 5 days ago

      He talks about code, and Copilot works with code, so I'm not sure how it "applies to any".

      If you mean that making a "random mashup of existing, copyrighted, heterogeneously licensed" works of art (audio/video) could also get you sued, then yes.

      But that's not much of an issue with Copilot if you're using it for enterprise code that's already a copy-paste mashup of "existing, copyrighted, heterogeneously licensed" code that you won't release and nobody will see anyway.

      Whereas audio/video you generally want to release.

      If you make them for your own consumption, then it's my response that rather applies: since nobody will see it, and you don't release/sell/circulate it, you can go ahead and mix Michael Jackson, Disney and Star Wars material - nothing will happen to you.

  • Hamuko 5 days ago

    If you post content on Stack Overflow, your contribution is distributed using the CC BY-SA 4.0 license.

    • coldtea 4 days ago

      Yes, but nobody that copies it cares...

      (Where nobody is a stand-in term, to mean "less than 1% of those do")

  • moffkalast 4 days ago

    > If anything, use of Copilot would be an improvement

    What do you mean, Copilot regularly pastes stuff directly from SO. One of those automatic doc generators was able to point me to the exact answer where one of them was from.

    • coldtea 4 days ago

      That it doesn't just "copy and paste" but does more involved "AI" mixing

      • moffkalast 4 days ago

        I don't think renaming variables and adjusting spaces holds up in court.

borishn 4 days ago

Copilot is fair use, get over it!

Copilot is not writing your code any more than Google search is writing your code. You are writing your code, and Copilot is just making suggestions.

The US Constitution secures limited copyright "To promote the progress of science and useful arts". Copilot is just that, get over it!

  • Buttons840 4 days ago

    A good and well argued opinion made hostile by saying "get over it" twice! Saying "get over it" discourages further discussion. Your comment would be better without it.

    • borishn 4 days ago

      You are right, but it is so frustrating how people whine about this.

    • cududa 4 days ago

      Get over it.

  • nescioquid 4 days ago

    Not an expert, but fair use generally covers education, criticism, parody, and satire. There is a test for meeting fair use and it includes things like amount copied and commercial or non-profit interest.

    The amount copied from any particular source might be small, but an aggregate strip-mining of many copyrighted sources is an interesting twist. Another might be, as you suggest, it might be a machine that itself does not violate copyright, but has the effect of causing users (who accept the suggestions) to violate copyright.

    • collegeburner 4 days ago

      Google does the same thing taking snippets out of pages or even completely caching them so you can see the entire page from their servers.

  • brianmcc 4 days ago

    Wait till it suggests something Disney can argue they own rights to...

    • nojs 4 days ago

      You mean like DALL-E? This debate is going to get interesting when “in the style of” illustrations and videos go mainstream.

    • acuozzo 4 days ago

      LucasFilm → Pixar → Disney. I wonder if the mouse owns Duff's Device…

  • zerocrates 4 days ago

    Yes, the copyright clause gives as its purpose "the progress of Science," but that doesn't mean that anything which claims to be "progress" gets a free pass.

    • ajb 4 days ago

      Indeed, the US Supreme Court pointedly refused to accept that the purpose clause limits the power of copyright in "Eldred v. Ashcroft" (at least, that is my understanding as a non-lawyer)

  • jazzyjackson 4 days ago

    Personally I think I'll just claim all the code I write with co-pilot is a parody.

  • humanwhosits 4 days ago

    Citation needed for copilot being fair-use

pen2l 4 days ago

Bit of a stretch to fashion AI-derived/AI-coauthored works as other people's work. Are DALL-E portraits done Picasso-style unrightfully selling Picasso's works? Is an individual selling portraits done Picasso-style unrightfully selling Picasso's works?

No, of course not. Joyce's literature was influenced by Ibsen, Mozart looked up to Haydn, Newton was humble enough that he openly professed he stood on the shoulders of his predecessors, Perelman refused the Millennium prize because it wasn't also offered to his colleague Hamilton.

All human innovation is iterative, and derivative. https://www.youtube.com/watch?v=jcvd5JZkUXY

Our skill doesn't grow in vacuums, without outside mentorship and guidance. There are areas where I am upset about the application of AI, but this is not one of them. Consider copilot a gentle guiding hand for those without access to a second pair of eyes nearby to give you reminders on what you may otherwise have on the tip of your tongue.

But just as it was unbecoming of Led Zeppelin to refuse to recognize how heavily their music was influenced by delta blues artists, I can accept the argument that it is perhaps douchey of GitHub to sit on Copilot as squarely their own creation.

shireboy 5 days ago

I do feel these arguments are valid if a little overstated. Most devs have googled, found some code, and pasted it in without thinking about attribution. Doesn’t make it right, but it is a question of how much code is being copied and how specific it is. For example, I peruse open repos to learn (I learned about the spread operator in JavaScript that way), but that doesn’t mean every time I use it I need to attribute whatever repo I saw it in. But, yeah, if I copied a larger chunk and the owner wants attribution, probably wrong.

I like the idea of having the bot automatically update an attribution file if it detects it’s used licensed code. Seems like it would be fairly trivial. Also a robots.txt for repo owners to control automated use.

Also, they should totally pay back a portion of revenue to the community and support the repos used to train. That seems like it would be a good PR move if nothing else.

  • kachhalimbu 4 days ago

    I like this take. Copilot to me seems a glorified (very intelligent) auto-search-paste/autocomplete service. It is just mimicking what usual devs do, which is to copy-paste code from StackOverflow/GitHub for many mundane types of code like for loops, mongo find queries, callback func definitions etc., for JS devs for example.

    The idea of auto-attribution when Copilot surfaces licensed code is best, because it keeps the Copilot user honest about where the code is coming from and honors the original license.

    • teakettle42 4 days ago

      > It is just mimicking what usual devs do which is to copy-paste code from StackOverflow/github for many mundane types of codes like for loops, mongo find queries, callback func definitions etc for JS devs for eg.

      I’m genuinely disturbed to see how many people in this thread think that casual plagiarism is the norm for “usual devs”.

      • shireboy 4 days ago

        Again, I get the argument, just think it’s overstated. First, when referring to stack overflow and blogs, generally, that’s intentionally shared with the express purpose of people copying it- hopefully while learning from it at the same time. Second, again with some code bits it’s not really plagiarism any more than all iambic pentameter is plagiarizing Shakespeare.

        Devs often look at code to see basic syntax, understand algorithms, etc. There is absolutely nothing wrong with this. One should draw a line somewhere, but to say I need to attribute […somevar] every time I use it because I happened to see it one time on a blog post is silly.

        A thought experiment may help: Scrape Github for all unique strings longer than X and store in a file with a timestamp and owner. How large does X have to be before attribution is required? If not length, then how do you determine whether attribution is required?
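One way to run this thought experiment is a sliding-window index: split every file into runs of X consecutive tokens, record which repo each run came from, and flag a suggestion that shares any run with an indexed repo. This is a minimal sketch, assuming naive whitespace tokenization; the function names, the `window` parameter (playing the role of X) and the sample repos are hypothetical. It does not answer where X should be set; it just makes X an explicit knob.

```python
from collections import defaultdict


def shingles(text: str, window: int) -> set[tuple[str, ...]]:
    """Every run of `window` consecutive whitespace-separated tokens in text."""
    tokens = text.split()
    return {tuple(tokens[i:i + window]) for i in range(len(tokens) - window + 1)}


def build_index(repos: dict[str, str], window: int) -> dict[tuple[str, ...], set[str]]:
    """Map each token run to the set of repos (keyed by owner/name) containing it."""
    index: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for repo, source in repos.items():
        for run in shingles(source, window):
            index[run].add(repo)
    return index


def find_overlaps(suggestion: str, index: dict[tuple[str, ...], set[str]], window: int) -> set[str]:
    """Repos sharing at least one `window`-token run with the suggestion."""
    hits: set[str] = set()
    for run in shingles(suggestion, window):
        hits |= index.get(run, set())
    return hits


# Hypothetical data standing in for "all of GitHub".
repos = {
    "alice/sum": "total = 0\nfor x in xs:\n    total += x",
    "bob/fastinv": "i = 0x5f3759df - ( i >> 1 );",
}
index = build_index(repos, window=4)
print(find_overlaps("y = 1\ni = 0x5f3759df - ( i >> 1 );", index, window=4))  # {'bob/fastinv'}
```

A tiny window flags ubiquitous idioms like `for (int i = 0;` in nearly every repo, while a huge one only catches whole-function copies, which is exactly the "how large does X have to be" question.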

      • ParetoOptimal 4 days ago

        > I’m genuinely disturbed to see how many people in this thread think that casual plagiarism is the norm for “usual devs”.

        I'm disturbed it is likely the reality.

      • Aeolun 4 days ago

        Dunno what devs you work with, but I’ve literally never seen someone care.

        None of the code I work on is public, so attribution is pointless in the first place.

  • Aeolun 4 days ago

    > Also, they should totally pay back a portion of revenue to the community and support the repos used to train.

    Aren’t they already doubling all Github sponsorship money?

    • david_allison 4 days ago

      Not doubled any more, but they don't take a cut, and pay the processing fees for you.

albertzeyer 5 days ago

So, how often does it actually happen? Does it happen more often than for a human? Does anyone actually have numbers on this?

Of course, if you already provide a copyrighted prefix, and it has seen that code, the chances are high that it will complete the copyrighted code (because that is what you would actually expect, too).

So, for real use cases in the wild, where you write some own real novel code, how often would it suggest some copyrighted code? And how often would a human?

I have used Copilot the last months and I have never ever seen such a case (I can be pretty sure because all the identifier names are really unique, and the code was very custom).

However, I assume that I myself might have produced copyrighted code unknowingly because if you write common patterns (e.g. some tree or graph search, or some sort function, implement LSTM or Transformer, whatever), the chances are not so low.

Ciantic 5 days ago

I'm a bit mixed on this. The code Copilot usually autocompletes for me is not particularly novel; it's just mundane stuff I would write anyway. Most of these snippets are not copyrightable in my opinion, because they were obvious in the first place. Like CSS nth-child odd/even logic, or one case where it filled in ~10 lines of JS logic for filtering rows by a category stored in a dataset, which I would have written anyway.

Then there are cases where it amazes me completely: it wrote 10 lines of C++ code for rendering monochrome glyphs as bits using the FreeType library. It did have an odd, subtle bug, though: the glyphs came out reversed, and it worked only with a certain font size, which it seemed to pick up from a different file altogether.

JacobiX 5 days ago

It’s the same problem with those ML models: the other day someone generated a children’s book using GPT-3, and it turned out that there is a real children's book with the same name and very similar content, The Very Lonely Firefly by Eric Carle.

  • bartq 5 days ago

    Other thing I'm worried about: how do you retract facts from an ML model? I guess it's impossible; you need to retrain from scratch with part X removed from the training set. Or... people could invent layered ML models similar to Docker, where each layer is marked with the data it was trained on. Then at least you'd have some cache of the trained model to reuse in the next training session. Nasty stuff.

    • alpaca128 5 days ago

      Or instead of inventing complicated layered ML models Github could just use each repo's license information to decide what's okay to use. Detecting licenses is already a feature on that site.
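A minimal sketch of that suggestion, assuming each repo record carries an SPDX license identifier as detected by the host. The `Repo` type and the allow-list are hypothetical, chosen for illustration. Per the reply below, permissive licenses like MIT still require attribution, so an allow-list that skips attribution tracking has to be limited to public-domain-style grants.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list: SPDX identifiers for grants that do not require
# attribution. MIT, BSD-2/3-Clause, Apache-2.0 and GPL all require keeping
# copyright/license notices, so they are deliberately absent.
NO_ATTRIBUTION_LICENSES = {"CC0-1.0", "Unlicense", "0BSD", "MIT-0"}


@dataclass
class Repo:
    name: str
    spdx_id: Optional[str]  # None when no license file was detected


def eligible_for_training(repos: list[Repo]) -> list[Repo]:
    """Keep only repos whose detected license is on the allow-list.

    Repos with no detected license (spdx_id is None) are excluded: with no
    license, default copyright applies and no reuse rights are granted.
    """
    return [r for r in repos if r.spdx_id in NO_ATTRIBUTION_LICENSES]


repos = [Repo("a/lib", "MIT"), Repo("b/snippets", "CC0-1.0"), Repo("c/app", None)]
print([r.name for r in eligible_for_training(repos)])
```

Broadening the set to attribution-requiring licenses would then mean tracking and emitting notices, which is the hard part the reply points at.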

      • afiori 5 days ago

        Many licenses require attribution, which would be hard to track.

  • icoder 4 days ago

    Interesting, it's a big question I've had for a while, how 'original' stuff coming from these AI systems is, and also the distribution of uniqueness over many answers. I haven't dived into it yet, but I find it surprising how little this comes up when these systems are discussed (i.e. here on HN).

    Does anyone even know? Can we even check? What if 1 in a thousand, or one in a million outputs is (very close to) something existing? I find this especially relevant when generating faces.

parhamn 5 days ago

Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art.

No book, painting, codebase, sonnet, design is theft-less.

The art is the space reduction, otherwise we’d just bruteforce away.

  • mihaic 5 days ago

    This type of argument always distracts from the fact that figuring out where we draw the line between theft and reimagining.

    The Magnificent Seven for instance was a reworking of Seven Samurai, but stands on its own as an original creation. Going into a cinema and filming a picture to later put on a torrent site is not artistic reworking.

    The hard discussion is about what is acceptable, we all know prior art exists.

    • ajuc 5 days ago

      > The hard discussion is about what is acceptable

      What if we just say "both"? Libraries were a thing for millennia and writers still wrote books. There are costs to IP laws and the benefits aren't obvious.

      • Veen 5 days ago

        As a writer, the benefits are quite obvious to me.

        • Timwi 5 days ago

          Convenient, isn't it?

          As a consumer, it's quite obvious to me too how it benefits only the writer/creator, to the detriment of everyone else.

          • barthvr 4 days ago

            Because writing a book, shooting a movie, composing a song, takes time?

            So either those pieces are IP-protected, and their author can make money with it, or we have to set up a basic income for everyone, and art becomes free.

            • regularfry 4 days ago

              It's perfectly consistent to say both that there needs to be a system to ensure creators are compensated and that the current system for doing so is terrible.

              • Veen 4 days ago

                It is consistent but useless if you have no suggestion as to what would replace the current system in a way that preserves the benefits to both parties.

                1. Creators get a sustainable reward for their work. They wouldn't do it otherwise. I certainly don't do it for fun.

                2. Consumers get to access that work as they wish.

                (Of course, this being HN, I'd expect any ideas to apply to developers as well as to writers and artists i.e. if writers have to give up copyright, so do developers, startups, and so on.)

                • regularfry 3 days ago

                  Keeping the benefits intact for both parties is a non-goal.

                  How about 14 year max copyright terms? Make copyright unsellable and uninheritable, so you don't get massive copyright hoarding entities that can distort legislation for their own benefit?

                  That's just two suggestions off the top of my head. I do get tired of false dichotomies.

        • js8 5 days ago

          Benefits of what? Of copyright enforcement, or of sharing?

          • bryanrasmussen 5 days ago

            the grandparent comment said the benefits of IP Laws were not obvious. So it is of the benefit of the laws as they currently exist, that implies enforcement of said laws.

      • meheleventyone 5 days ago

        Libraries pay fees to lend books, at least in our modern capitalist society.

        • jrochkind1 4 days ago

          Not in the USA, where the "first-sale doctrine" means once you buy a book, you can do whatever you want with that copy of the book (lend, rent, sell, destroy) without needing a license. Libraries in the USA definitely don't pay a fee beyond the purchase price of the book (or they can legally lend donated books etc). Copyright holders don't make any additional money from library lending.

          I am not familiar with how it works in other countries, but I have heard something about there being such a fee.

          (It's not quite true to say libraries have existed for "millennia" though, with regard to this issue. Mass produced printing hasn't in fact existed for millennia; libraries 1000 years ago had hand-copied manuscripts, probably mostly scrolls. The effect on "the market"? For whatever reason authors were writing then, it was not to make money by selling reproductions of their writings; that wasn't a thing. Which means, yeah, btw, people still wrote things and made up stories even when they couldn't make money by charging people for copies to read...)

        • ajuc 5 days ago

          It was news to me, so I checked and it's true. Since 2016 in my country ;)

          And it's a symbolic amount for vast majority of authors (country-wide it's around 5-5000 USD per year per author and the distribution is heavily skewed towards 5 USD).

          So yeah :) I think authors were fine without these 5 bucks a year.

          EDIT cause it might not be obvious. It's not per library. It's per country.

    • Griffinsauce 4 days ago

      > This type of argument always distracts from the fact that figuring out where we draw the line between theft and reimagining.

      This seems to be missing a word, could you clarify?

      Also: since you mentioned theft: this actually comes down to the discussion whether you can own thought and/or digital artifacts which can be replicated without taking anything away from the "owner".

      Given the absolute choice I'd rather pick complete freedom than restriction. I suspect that anyone's opinion on this follows what they value higher: creation or exploitation.

      • mihaic 4 days ago

        Sorry, I should have double checked, that sentence was incomplete. Yes, I meant to say that a more nuanced approach is crucial, and that means rejecting that we have to choose between Disney-backed extreme IP laws or total freedom.

    • scotty79 5 days ago

      There are many differences between those acts of thievery or inspired creation however you might call it. But there are many similarities too. Fascination with the original is one. Desire to own it in one way or another is one too. Differences are in the skills, the means, the result, what was stolen and financial success that came out of the act.

  • izacus 5 days ago

    > Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art

    If that happens, the big copyright/IP conglomerates will immediately jump on that and make sure that laws are adjusted and they get their cut of every single word and line anyone puts near their smartphones ;)

  • pera 5 days ago

    I'm not sure what you mean by "theft-less", but I believe you might be conflating inspiration with derivative work: Copilot can produce verbatim copies of open-source code, which makes it more similar to how some musicians sample other people's music to create new music.

  • wnkrshm 5 days ago

    So the only thing left is handiwork I guess. Engineering isn't different from art in any way, the constraints are just stricter.

  • natly 5 days ago

    Unless every invention is gonna be AI generated (which is kind of a scary situation), intellectual property still needs to be a thing (otherwise people won't have incentive to invent, it'll just be stolen from them).

    • pydry 5 days ago

      People have an innate desire to invent and create. This is why so many people do it for zero extrinsic reward. Hell, this is the case for almost every musician. They are fed a pittance in streaming, only a bit more than most OSS developers get.

      This intrinsic motivation is more normally "farmed" by investors who capitalize and capture the IP value for themselves. This actually has a detrimental effect on innovation.

      Doing away with or watering down intellectual property protections will just take big meaty chunks out of the stock market and partly equalize wealth distribution.

      It'll probably spur innovation too - historically it usually has, but preserving the existing social order takes precedence over that which is why a lot is invested in persisting the myth that it aids rather than hinders innovation.

    • ModernMech 5 days ago

      > otherwise people won't have incentive to invent, it'll just be stolen from them

      Citation needed. Speaking personally, I spend most of my creative energy on a project which is open source and permissively licensed to the point where I’m fine with anyone stealing it. I expect to earn negative money from it at the limit.

      Why do I do it? I dunno it’s fun. Can’t that be enough?

    • Timwi 5 days ago

      It's remarkable how many people still repeat this unsubstantiated cliché.

  • Chris2048 5 days ago

    Is it really "just" that? Is there no original creativity in the choices (and skill) in the blending, and choosing what (and how) to blend?

    Would you describe a parody, or a critique/review, as equally without original merit?

  • Agamus 5 days ago

    This idea has been around for a while - why... "pretty soon"?

    And I'm sure I couldn't disagree with you more. Or are 'influence' and 'theft' the same now?

    • coldtea 5 days ago

      >Or are 'influence' and 'theft' the same now?

      They have been the same for most of history. People could openly copy titles, plots, parts, phrases, etc from prior work. Same for mechanical designs. The only thing preventing them was obscurity (e.g. the inventor trying to make it hidden) not any law or ethical idea that it's bad (there wasn't any). That's how things from math to gears to tunes got better (or changed over time, in the case of art, as better/worse is subjective there).

      E.g. globally and historically folk music has been basically taking whatever you want from tunes and songs where everybody does the same with no "permission" asked or needed to be given.

      Like 4 verses but want to add a fifth or change some part? Go ahead. Want to play it exactly like you've heard it? Go ahead again.

      The idea of "theft" in that regard came in the last 2 or so centuries, and was enforced with artificial legal barriers and new "ethical" concepts that are neither "natural" nor present for the vast majority of history (including golden ages of art production).

      • Agamus 5 days ago

        Not sure why I'm being downvoted here - I agree that this idea has been the same for most of history.

        Your example of folk music is an odd one, for exactly that reason - it largely repurposes existing art. For example, Wagner wrote extensively about why we shouldn't respect folk music for this reason. I mostly disagree with him, but his comparison at least illuminates that this isn't so black and white. And that's really just scratching the surface of a complex topic.

        I sense that if someone came along 2400 years ago with the exact play that Sophocles had just produced and claimed they had just composed it themselves, immediately after a public performance, someone would claim that theft had occurred. Do you disagree?

        • coldtea 5 days ago

          >I sense that if someone came along 2400 years ago with the exact play that Sophocles had just produced and claimed they had just composed it themselves, immediately after a public performance, someone would claim that theft had occurred. Do you disagree?

          Yes. They would say it was "plagiarism", which is different than theft.

          And there was no law against either case.

      • trention 5 days ago

        Except that AI will not lead to "golden ages of art production" because nobody gives a sh*t about art created by AIs. And nobody will.

        • coldtea 5 days ago

          >because nobody gives a sh*t about art created by AIs. And nobody will.

          You'd be surprised. Especially if people aren't told (or don't care) whether it's "created by AI" or not.

          Whether in "high art" or lowly pop, "generative music" (and fine art) has long been a thing. And people do get attached to it (e.g. to Brian Eno's generative works made by rule-based systems he programs).

          • trention 5 days ago

            No, I will not be surprised. Outliers are outliers. "Art" created by AIs will just have price (and cost) of ~0 and, like everything that has a price/cost of 0, nobody will give a sh*t about it. The only real question is how human artists (provided they exist in your preferred dystopia) will prove that they have created something themselves.

            • coldtea 5 days ago

              >No, I will not be surprised. Outliers are outliers. "Art" created by AIs will just have price (and cost) of ~0 and, like everything that has a price/cost of 0, nobody will give a sh*t about it.

              Art doesn't touch people because it has cost.

              In fact, for ages certain types of art had no cost: poetry, public festivals, and so on. And many still don't (e.g. free punk/underground/indie public performances, Soundcloud music, and so on).

              Most movies and series seen on TV are also ~0 (and for kids, everything is ~0, as their parents foot the bill), but they're still touched by them.

              >The only real question is how human artists (provided they exist in your preferred dystopia) will prove that they have created something themselves.

              Note the loaded words "your preferred dystopia" (who says whether I prefer it or not? I merely describe what's the case. You have some ethical/political point to make).

              As for the answer to the question, they won't have to. People respond to the quality of the work, not who made it (and whether they used AI or chance, another popular method, or not).

              In fact tons of genius artists have described themselves not as the creators but as "mere conduits", and say the music/words/etc come from "elsewhere" (implying god, some muse, some spirit, etc). Especially when they feel the most "inspired" (the word itself means "visited by the spirit").

              • trention 5 days ago

                None of those things had zero price and zero cost. The fact that the consumer didn't pay directly for them is irrelevant. You can try testing your theory by trying to sell a "painting" created by DALLE/whatever for more than a third-rate amateur painter can sell one of his. Good luck with that, especially when access to the model becomes easy.

                >People respond to the quality of the work, not who made it

                This is so painfully incorrect and naive (and contra anything we know about the value of everything whose creation has been automated before) that I think it's meaningless to continue this conversation.

                • coldtea 5 days ago

                  >You can try testing your theory by trying to sell a "painting" created by DALLE/whatever for more than a third-rate amateur painter can sell one of his. Good luck with that, especially when access to the model becomes easy.

                  As if that proves anything? Sale price is irrelevant. There are paintings sold for millions that 99.9% of the people could not give less fucks for, and "amateur painter" stuff that touch most people who see them.

                  It's also not like a Michael Jackson song with $2 million in production costs and $50M in sales is "better" artistically (as opposed to commercially) than a song composed and played by some random guy on an acoustic for ~0.

                  >This is so painfully incorrect and naive (and contra anything we know about the value of everything whose creation has been automated before) that I think it's meaningless to continue this conversation.

                  It was meaningless to begin with, as you don't discuss, you present your "ultimate truth" ("contra anything we know", lol).

                  In fact there are tons of works where the creator is anonymous (from folk music and art to early house, techno and rave music, a scene with cherished anonymity), and people respond to it just fine...

        • Nowado 5 days ago

          That's a lot of people to dehumanize with a single swift no true Scotsman.

      • js8 5 days ago

        > The idea of "theft" in that regard came in the last 2 or so centuries, and was enforced with artificial legal barriers and new "ethical" concepts that are neither "natural" nor present for the vast majority of history

        This is true for other forms of property as well, like land ownership.

    • TremendousJudge 5 days ago

      The idea has been around a while, but the legal system doesn't reflect it.

      I don't think it will any time soon though.

spupe 5 days ago

If you assigned a task to a junior dev, and he/she used some code from open source projects and Stack Overflow to develop a custom program for the task, would you say that this person is selling you other people's code? Is it common or expected for this type of use to be acknowledged?

  • XCabbage 5 days ago

    People I've worked with have different philosophies on this, but personally, if you check in code that is distinctive enough that I can identify the source you copied and pasted it from, and you provided no indication (whether in a comment or a PR description) that you copied it, I will really get quite grumpy at you about it.

    Way too often I burn half an hour needlessly during review in one of two ways:

    * trying to figure out how the heck someone figured out some "magic" code that achieves something by invoking a bunch of poorly documented library or framework internals, and trying to reverse engineer WTF all the magic does by diving into the framework's source... only to eventually think to google the whole snippet rather than each individual method call, and discover it's copied from a Stack Overflow answer

    * trying to figure out why something was written in an unidiomatic or overcomplicated way rather than a more obvious approach, and commenting at length on how I'd simplify it... only to eventually realise it was copied from a Stack Overflow answer

    Attribution isn't just about making sure the right person gets credit, or about license compliance; reviewers and maintainers frequently need to be able to see where stuff was copied and pasted from in order to do their jobs effectively, even for snippets of just a few lines.

    • spupe 5 days ago

      I understand where you are coming from. However, I think you are making the assumption that this person simply copy/pasted some code with no understanding of it, or that this code is then very different from your codebase and needs to be refactored. If using Stack Overflow did not add to your overall development time but subtracted from it, because it was used as an appropriate piece of a much bigger puzzle (a far more realistic scenario for both Copilot and our general use of SO), then I see no issue with it whatsoever. Certainly not the moral or copyright issues this person on Twitter implies.

      • thfuran 4 days ago

        No copyright issues in the sense that no entity is likely to ever pursue the matter, sure. But copying and commercially using someone else's nontrivial bit of code that doesn't have a license that says you can is quite blatantly a copyright violation.

  • genezeta 5 days ago

    About 10 years ago or so, I was working at a certain place. They put me into a small team apparently focused on some R+D project under the direction of an "architect".

    Basically, the project was to package Cordova + Backbone + Marionette, plus a couple of tools, under their own commercial name. Then they'd go around potential clients presenting it as the perfect solution to build hybrid applications for web/mobile/smartTV/whatever.

    A certain Monday, the "architect" arrived boasting. He did that often, but this time he was more boastful. He explained that he had spent the whole weekend coding. He had written an incredible tool that would create a skeleton for a project from zero. You would type something like `tool create` and it would create the whole project with all the scripts and some example views and whatnot.

    It was Yeoman's yo CLI tool, of course. He had just changed the copyright in the comments, removed most of the comments, deleted any mention of Yeoman or the original creators, changed the name of the executable script, and that's it.

    The whole thing was OS code picked up from various repos and packaged as their own. The company used it to sell development projects. The so-called-architect used it to sell himself inside the company and then jump away into a startup as CTO.

    Is this common or is it just anecdata? I don't know. It's clearly not the only time I've seen something like this and I do know that in certain companies around here it isn't exactly uncommon. But I can't say how common or uncommon it is.

    Would I call this "selling other people's code"? Yes, I would.

    • spupe 5 days ago

      This is clear-cut fraud, but it is also not even close to what Copilot or most junior devs are doing.

  • whatatita 5 days ago

    If the solution was made up of ideas from OSS and snippets from Stack Overflow? No; that's fine.

    If the solution was copied from an OSS project without proper attribution? Yes. Absolutely. And they'd have words with a senior dev and maybe even legal if the code they copied made its way into production without attribution.

    Many copyleft OSS licenses require attribution and distribution of derivative works that we wouldn't allow.

  • mbreese 5 days ago

    It depends on the source of that code and the expected license of the code you paid them for. If everything is MIT/BSD (and attributed), no problem. If the code was GPL and I’m making a commercial product, we have an issue.

    I’d also expect for any stack overflow code to include a comment with a link to the stack overflow page.

    I think one of the key points is to make sure any code taken from another source is cited appropriately. If it isn’t, or the junior dev is passing it off as their own work, then we have problems.

  • ben-schaaf 5 days ago

    If I found out a junior dev had been copying copy-left or proprietary code then I'd have to rip out that code, have a chat with them and figure out what to do from there. Even if the code isn't copy-left it's still someone else's code, sometimes that's ok but sometimes it's definitely not.

  • jhugo 5 days ago

    No matter how complex a program is, and no matter whether it uses techniques sometimes described as "AI" in its implementation, it's not a person. Copilot is just a very complex pipeline from other people's code to your editor, which ignores the license of those other people's code.

  • thelastbender12 5 days ago

    This is a good thought exercise. I wouldn't call it stealing, though I am not sure how legal liability is assessed, say if they picked up GPL code unknown to the company, and the company is later sued over it.

    This isn't derived from principled reasoning, but I think of it as similar to community norms. Not the best example, but you wouldn't mind someone subletting their home on Airbnb, yet if your whole apartment complex does it, it invites regulation. A product like Copilot enables copying code (even if inspired, and not verbatim) at a scale that individual developers can't. So respecting software licenses needs to be codified (legally?), whereas previously it was left unmonitored.

  • trention 5 days ago

    It's absolutely fine to allow humans to do that while prohibiting (commercialized) AI to do the same thing.

    • spupe 5 days ago

      I don't see why that should be the case in this particular scenario, or what benefit is gained from that. Could you elaborate?

      • jhugo 5 days ago

        Could you elaborate on why you think a computer program and a person should be treated the same way in this respect?

        We can take as self-evident that a human is capable of reading about something, conceptualising it, and then writing something completely new with the knowledge they have gained.

        I think it's also pretty uncontroversial that the primitive "AI" we currently have is nowhere near the level of even an average human at these things, and thus we can't just blindly assume it is conceptualising rather than copying. Copilot regularly produces verbatim copies of existing code when working on non-trivial things.

        Forget about the "AI" label: Copilot is just a complex computer program, that takes code from other people and inserts various permutations of it into your editor, whilst ignoring the license of that code.

        • spupe 5 days ago

          I think it's best if we sidestep these big conceptual questions about what cognition or creativity really are. It's hard to find agreement, and perhaps it is not necessary to do so.

          My position is that if a person hired in a company can currently use Google, Stack Overflow and GitHub to help develop their custom scripts, and no moral or copyright issues are infringed (ie, you don't try to say you came up with it on your own, and you use only enough that it is clearly fair use), then I think an AI should be able to assist in that task. There is no need to complicate things by legislating what the AI is doing and what Google is doing, as they are very similar things and in fact even use similar methods.

          • jhugo 5 days ago

            I would agree with you if the AI was genuinely assisting with that task, but it isn't.

            It's taking inputs, ignoring their licenses, permuting them in ways that are not understandable to the user, and then outputting them.

            That's an entirely different task than the user reading SO or using Google and then writing their own code, because the "AI" is not capable of writing its own code at that level.

            Relying on this tool means ignoring the license of code that you're copying, without even knowing that you're doing it.

            • spupe 4 days ago

              > That's an entirely different task than the user reading SO or using Google and then writing their own code, because the "AI" is not capable of writing its own code at that level.

              I would say it's a very similar task. If I need to remember how to use a certain function, I can Google for documentation and examples, or I can tell Copilot what I want to do. The fact that the solution was presented by Copilot or a SO thread is, in my view, irrelevant. And to compound on that, I doubt anyone checking SO truly knows where that answer came from. The person could simply be reproducing a snippet from somebody else, you have no way of knowing if it was licensed.

              I don't think this is bad either. Even our current shitty copyright laws protect that kind of use. I shouldn't have to worry whether my little prime number generator uses an algorithm first created by John Carmack or Microsoft. Programming has evolved rapidly in great part because we can all use other people's work and use it to improve ours. Of course you shouldn't just copy and paste everything and call it a day, but that's hardly what Copilot enables anyway.

              • jhugo 4 days ago

                You really seem to be ignoring the core issue by focusing on SO though. Everything on SO is fair game, but code on GitHub is under a variety of licenses, and when Copilot regurgitates it, no matter how complex and inscrutable the process is that leads it to do so, it may be causing the user of Copilot to misuse that code because it doesn't even give them the opportunity to know where it came from or what license it was released to the public under.

                • spupe 4 days ago

                  Again, how does that differ from Stack Overflow? Do you go and check whether a given reply belongs to a licensed project?

                  Also, please consider that there is a toggle that allows you to block Copilot from using public code.

                  • jhugo 4 days ago

                    > Do you go and check whether a given reply belongs to a licensed project?

                    All SO questions, answers and comments are CC BY-SA. The terms of the site say that anyone submitting this content agrees that it's licensed that way, and when you visit the site you agree that you are provided with the content under that license. It's not necessary for you to check whether the submitter had the right to offer it under that license; that's their problem. The same goes for any content offered to you under a given license on any platform. I don't understand what your question has to do with the conversation.

                    The problem with Copilot, and I really can't believe this has to be restated over and over again, is that it takes code from projects with various licenses, and outputs it in your editor in various transformed-or-not-transformed ways (the fact that the transformation is extremely complex doesn't change anything), and gives you no way to know where the code came from, how it was licensed or how it has been transformed. So, despite the fact that if you use it enough you are virtually guaranteed to use code in contravention of its license, you cannot even know which projects you have stolen code from or which licenses' terms you are breaking.

                    > Also, please consider that there is a toggle that allows you to block Copilot from using public code.

                    Great. I'm sure its utility doesn't go down at all if you turn that toggle off...

                    • spupe 4 days ago

                      > All SO questions, answers and comments are CC BY-SA. The terms of the site say that anyone submitting this content agrees that it's licensed that way, and when you visit the site you agree that you are provided with the content under that license.

                      Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it? I feel that you are overly focused on the legal part here, which I'm sure was handled by Microsoft's lawyers. I'm more interested in the question of principle.

                      No matter what the terms of use at SO say, anyone can give you an answer that is a copy of some code they don't own. You may consider that immoral, but I don't, not at the scope SO is used for. In addition, the vast majority of cases at SO and Copilot are not about complex functions being stolen, it's about some dumb code you would have found in 2 minutes of googling. What I'm trying to argue here is that if we are all cool with SO and think it's useful, there is no fundamental difference here. We never cared too much about licenses for boilerplate code, and I think we shouldn't start now.

                      • jhugo 4 days ago

                        > Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it? I feel that you are overly focused on the legal part here, which I'm sure was handled by Microsoft's lawyers. I'm more interested in the question of principle.

                        I have, and there is not. Neither could there be — in many cases the person uploading code to GitHub is not the copyright holder — they are just doing something permitted under the license — and for a large open source project there could be thousands of copyright holders. A random person mirroring some source code to GitHub is in no position to negotiate different license terms on behalf of the copyright holder(s).

                        > No matter what the terms of use at SO say, anyone can give you an answer that is a copy of some code they don't own. You may consider that immoral, but I don't, not at the scope SO is used for. In addition, the vast majority of cases at SO and Copilot are not about complex functions being stolen, it's about some dumb code you would have found in 2 minutes of googling. What I'm trying to argue here is that if we are all cool with SO and think it's useful, there is no fundamental difference here. We never cared too much about licenses for boilerplate code, and I think we shouldn't start now.

                        I don't understand why you think a person writing an answer on SO and a computer program outputting some permutation of its inputs into your editor are the same thing. The person writing an SO answer is intelligent and capable of conceptual understanding, the computer regurgitating code without regard to its license is not.

                        • spupe 4 days ago

                          >> Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it?

                          > I have, and there is not.

                          At least one IP lawyer strongly disagrees, suggesting anything you host on GitHub is fair game [1].

                          [1] https://fossa.com/blog/analyzing-legal-implications-github-c...

                          > The person writing an SO answer is intelligent and capable of conceptual understanding, the computer regurgitating code without regard to its license is not.

                          From a copyright perspective, that is irrelevant. In fact I would think Copilot has more incentives to not infringe than a random SO user, who is very unlikely to be sued. I already argued in another post that in my view, from any perspective, it is also irrelevant whether it's a person or AI doing the same work Copilot does.

                          • jhugo 4 days ago

                            > At least one IP lawyer strongly disagrees, suggesting anything you host on GitHub is fair game [1].

                            The question is whether Copilot's users can use the regurgitated code without following the license terms, not whether Copilot was allowed to train their model on it. I agree it's likely fine for them to train the model, but the use of Copilot would seem to be a legal minefield.

                            A little thought makes it clear that an affirmative answer would be absurd. This would mean that using a simple tool (let's say `cat`) to make a copy of some code and subsequently ignoring its license terms is infringement, but if the software used to make the copy is more complex (or perhaps if it has the "AI" label stuck to it!) the same actions are not infringement.

          • simion314 5 days ago

            If I make a script and train it on Windows source code, do you think MS will like it if I use that script on Wine? I am sure MS will say the license did not allow it and that the script's transformations are not original, so the GPL or similar licenses should be respected by Microsoft too.

            >My position is that if a person hired in a company can currently use Google, Stack Overflow and GitHub to help develop their custom scripts, and no moral or copyright issues are infringed (ie, you don't try to say you came up with it on your own, and you use only enough that it is clearly fair use),

            Only a judge will determine if it is actually fair use. If you by chance copied some super clever and unique code into your code base, then I am sure a judge will not say it is fair use, and Copilot has been shown to do this (though MS said they put some IF-ELSE checks in the AI to prevent the plagiarism from being detected, by removing obvious results and maybe obfuscating stuff more).

            Maybe Stack Overflow's license allows you to copy-paste the answers into your code, but GitHub code has repo-specific licenses that you need to respect.

            If MS had trained the model on all their private repos too and made the model free software, then many would not have these issues. Or they could keep the model proprietary and train it only on MS repos and on BSD and similarly licensed repos.

          • trention 5 days ago

            You are saying that the AI should be treated the same way as a person would regarding its 'output'. I disagree. This is a conceptual disagreement and you cannot just sweep under the rug "what cognition or creativity really are".

            At the end, when in several (2-5) years we start seeing structural unemployment emerging because of AI deployments, this will be resolved by the legal system, most likely by some sort of partial prohibition of training/monetizing such systems.

            • spupe 5 days ago

              I think I still have not understood your argument. Are you saying that you are afraid that AIs will become too powerful and cause unemployment, and therefore we should regulate them now before they do so?

              Many people are worried about this, which is why there is a lot of debate about minimum income programs. However, at present, what Copilot is doing is similar to what Google does, and it is certainly not going to replace devs any time soon. Personally, I think we should exploit technology to its fullest, and the only reason we can have this conversation is that in the past, we haven't given too much consideration to the mailmen, secretaries, delivery workers and everyone else who got displaced by our use of the internet and similar technologies. We merely adapted to better exploit them.

              • trention 5 days ago

                I am not saying (in that last comment) what should happen, I am saying what will happen. Past automation in terms of impact is nothing compared to what's coming and people and lawmakers will react accordingly - not in favor of the automators.

        • nl 5 days ago

          Copilot understands concepts as well as many humans. You can see primitive versions of this in the old Word2Vec demos showing how those models understand how London:England ~= Paris:France
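The Word2Vec analogy demos work by vector arithmetic: take the vector for England, subtract London, add Paris, and look for the nearest remaining word by cosine similarity. Below is a minimal Java sketch of that lookup. The three-dimensional vectors are hypothetical and hand-made for illustration; real Word2Vec embeddings are learned from text and have far more dimensions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class AnalogyDemo {
    // Hypothetical hand-made "embeddings" (NOT real Word2Vec output).
    // Dimensions: [is-capital, is-country, Britain (+1) vs France (-1)]
    static final Map<String, double[]> VECTORS = new LinkedHashMap<>();
    static {
        VECTORS.put("London",  new double[]{1.0, 0.0,  1.0});
        VECTORS.put("England", new double[]{0.0, 1.0,  1.0});
        VECTORS.put("Paris",   new double[]{1.0, 0.0, -1.0});
        VECTORS.put("France",  new double[]{0.0, 1.0, -1.0});
        VECTORS.put("Berlin",  new double[]{1.0, 0.0,  0.0});
    }

    // Cosine similarity: dot product divided by the product of vector lengths.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Solves a:b ~= c:? by finding the word whose vector is closest to b - a + c.
    static String analogy(String a, String b, String c) {
        double[] va = VECTORS.get(a), vb = VECTORS.get(b), vc = VECTORS.get(c);
        double[] target = new double[va.length];
        for (int i = 0; i < target.length; i++) {
            target[i] = vb[i] - va[i] + vc[i];
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : VECTORS.entrySet()) {
            if (Set.of(a, b, c).contains(e.getKey())) continue; // skip the query words
            double score = cosine(target, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(analogy("London", "England", "Paris")); // France
    }
}
```

With these toy vectors, England - London + Paris lands exactly on France; with learned embeddings the nearest neighbour is only approximately right, which is why the comment writes "~=".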

          Copilot is much more sophisticated than that, and it no more copies code than a human does. It generates text token by token, given the contextual probability of the next token conditioned on the previous tokens, with the "heat" (temperature) being a factor in how randomly it chooses tokens.

          This is much more similar to how a human writes than "copying".
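The "heat" knob described above is usually called temperature: raw scores (logits) for each candidate token are divided by the temperature before a softmax, and the next token is sampled from the result. Here is a rough Java sketch of that generic technique. It is not Copilot's actual decoder, and the candidate tokens and scores are made up for illustration.

```java
import java.util.Random;

class TemperatureSampling {
    // Converts raw scores (logits) into probabilities, scaled by temperature.
    // Low temperature sharpens the distribution; high temperature flattens it.
    static double[] softmax(double[] logits, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) max = Math.max(max, l);
        double[] probs = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the max avoids overflow and does not change the result.
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    // Draws one index at random according to the given probabilities.
    static int sample(double[] probs, Random rng) {
        double r = rng.nextDouble();
        double cumulative = 0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (r < cumulative) return i;
        }
        return probs.length - 1; // guard against floating-point rounding
    }

    public static void main(String[] args) {
        // Hypothetical candidates for the token after "return " and made-up scores.
        String[] vocab = {"name", "this", "null"};
        double[] logits = {3.0, 1.0, 0.5};
        Random rng = new Random(42);
        for (double t : new double[]{0.2, 1.0, 2.0}) {
            double[] p = softmax(logits, t);
            System.out.printf("T=%.1f P(name)=%.3f sampled=%s%n", t, p[0], vocab[sample(p, rng)]);
        }
    }
}
```

At a low temperature the top-scoring token is chosen almost every time; raising it gives the less likely candidates a real chance, which is the randomness the comment refers to.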

          • jhugo 5 days ago

            "it no more copies code than a human does" < that's a very big call right there, considering how much verbatim copying has already been documented in Copilot. The primitive understanding Copilot has of what it is generating doesn't even approach that of the most average programmer. It's classic AI: impressive on the surface.

            • nl 5 days ago

              This isn't true.

              All the "copied code" I've seen is where the person prompts it with a large amount of highly specific preamble, and then it fills in the exact example they are quoting from.

              Try it without doing that.

              And it's weird people think it can't understand conceptual relationships. Word2Vec demonstrated that nearly 10 years ago and that's a much weaker model in terms of both size and techniques than this is.

              • jhugo 5 days ago

                > And it's weird people think it can't understand conceptual relationships. Word2Vec demonstrated that nearly 10 years ago and that's a much weaker model in terms of both size and techniques than this is.

                Saying that Word2Vec or Copilot have "understanding" of their input requires a redefinition of the word "understanding".

                • nl 4 days ago

                  What's your definition?

captainbland 5 days ago

If we're all standing on the shoulders of giants (specifically, code that other people wrote), then really what Copilot is selling is a ladder to get onto those shoulders faster. I think that's a legitimate aim as such. However, it should be careful not to include unlicensed code, and it should have a specific 'GPL' option for a model trained with GPL code included.

I suppose it should also generate the copyright notices that many open licenses require. I'd be surprised if Copilot could actually link back to the original code like that, though.

noisy_boy 4 days ago

Say I want to write a getter method like the one below:

    String getName() {
        return name;
    }

Let us also assume that this snippet, unsurprisingly, has appeared in several copyrighted repos that didn't grant GitHub the right to share this code.

So I start typing "getName" and Copilot suggests the exact snippet above. If I use this snippet, is it plagiarism, even though the above code is the most "obvious" way to write this getter and I would have written it this way even without Copilot's suggestion? Or does the "uniqueness" or "non-trivial quantity" of the suggestions have any bearing on determining copyright violation? How/where do we draw the line?

  • warkdarrior 4 days ago

    Clearly your code could be improved with some `Factory` objects and some dependency injection!

  • glouwbug 4 days ago

    Lucky for you: if you wrote a noise function that Copilot returned as an implementation of Perlin noise, you'd be breaching a _patent_! That patent just ended its 20-year run, so you'll be okay this time!

mojuba 5 days ago

Can I suggest a hypothesis that if you find Copilot useful it means the problem you are solving is a boring one? I might be wrong of course.

  • alpaca128 5 days ago

    I disagree. Most large projects, software or otherwise, use existing parts. If you design an innovative device you'll still use some standard components like chips, memory modules etc.

    There's already a way to quickly solve the boring parts in development: libraries which were built and licensed for that purpose. But Copilot passes you code of unknown origin, with unknown license terms and no information about how close it is to an existing codebase. It's like a person offering to sell you MacBooks for a hundred bucks per unit, but you don't know where they came from or who took the holiday photos stored on the hard drive.

  • alkonaut 5 days ago

    99% of the "problems" I'm solving, even when I'm working on very interesting and challenging problems, are boring subproblems. If I can get those out of the way, that would be great.

  • viraptor 5 days ago

    The most interesting problem will have extremely boring bits. If you write a CLI tool to solve all the world's problems by changeling magic, you'll still need to add the parameter handling and do some error management. That part is repetitive, and likely well generalised and predictable from other projects.
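The parameter handling and error management mentioned above tends to look like the following hypothetical Java sketch for an imaginary tool that takes `--count N`, `--verbose` and one input file. The option names, defaults and messages are assumptions for illustration; a real project might reach for an argument-parsing library instead.

```java
class CliArgs {
    // Hypothetical options for an imaginary command-line tool.
    int count = 1;
    boolean verbose = false;
    String input = null;

    // Parses e.g. ["--count", "3", "--verbose", "data.txt"], failing fast with a readable message.
    static CliArgs parse(String[] args) {
        CliArgs parsed = new CliArgs();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--verbose":
                    parsed.verbose = true;
                    break;
                case "--count":
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("--count requires a value");
                    }
                    try {
                        parsed.count = Integer.parseInt(args[++i]);
                    } catch (NumberFormatException e) {
                        throw new IllegalArgumentException("--count must be an integer, got: " + args[i]);
                    }
                    if (parsed.count < 1) {
                        throw new IllegalArgumentException("--count must be at least 1");
                    }
                    break;
                default:
                    if (args[i].startsWith("--")) {
                        throw new IllegalArgumentException("unknown option: " + args[i]);
                    }
                    if (parsed.input != null) {
                        throw new IllegalArgumentException("expected exactly one input file");
                    }
                    parsed.input = args[i];
            }
        }
        if (parsed.input == null) {
            throw new IllegalArgumentException("missing input file");
        }
        return parsed;
    }

    public static void main(String[] args) {
        try {
            CliArgs options = parse(args);
            System.out.println("count=" + options.count + " verbose=" + options.verbose + " input=" + options.input);
        } catch (IllegalArgumentException e) {
            System.err.println("error: " + e.getMessage());
            System.exit(2);
        }
    }
}
```

None of this is interesting, but every branch has to be written and checked, which is the kind of code the thread is describing.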

  • mistercow 5 days ago

    That hypothesis is easily disproven by spending an afternoon on a side project with Copilot.

    No matter how interesting your problem is, translating it into code is going to involve a lot of grunt work. This isn’t just boilerplate, but also the large portion of your code which is going to be gluing things together.

    The time you spend working through those menial parts of your code is time when the context of the interesting part of the problem fades. Once you get the mechanical stuff out of the way, you have to load the interesting stuff back into your brain.

    This is where AI coding tools really shine. They dramatically reduce the intervals between when you can think about the actual problem you’re solving by letting you get the boring mechanics out of the way more quickly.

    • mojuba 5 days ago

      I'm very curious to see some examples where Copilot autocompleted something truly useful and saved you time - and that also disproves my hypothesis that you are doing something boring or with the wrong tools/languages/frameworks. Things that a non-ML autocomplete could do don't count.

      • mistercow 4 days ago

        I can give you an example of an entire (well, I still consider it alpha) library I wrote several months ago, using Copilot: https://github.com/osuushi/triangulate

        This is a Go implementation of a 1991 paper on polygon triangulation. So the deepest thinking about how to solve the problem was obviously already done for me, but there were a number of edge cases that I had to invent my own solutions to, and the translation itself involved keeping a lot of context in my head.

        I can’t tell you in precise detail what Copilot did and what I wrote by hand. I wasn’t taking notes or recording my screen. But there’s a reason you don’t see a lot of blocks in there where I forgot to comment anything: my entire process for this was “type what I want to do in English, and see if Copilot will generate the next snippet, or something close”. I didn’t do this out of bloody-minded dedication to the AI cause, but because it continued to be an extremely effective way to get the code written quickly.

        I can give a few specifics:

        - My linear algebra is rusty, and Copilot was extremely helpful here. I would often just type the basic thing I was trying to do in pretty vague linear algebra terms, and it would generate the formula.

        - I wrote a lot of tests like this https://github.com/osuushi/triangulate/blob/main/internal/sp.... This is a minor thing, but those aren’t copy-pasted. Instead, I would write the first test, and for the most part, I could just type something like `func TestConvertToMonotones_SquareWithHole`, and it would figure out how to adapt the previous test automatically.

        - It generates exactly the error strings I want based on context an enormous percentage of the time.
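To give a sense of the "vague linear algebra" prompts, here is a hypothetical Java sketch (the library itself is Go, and this is not code from it) of two formulas that commonly come up in polygon triangulation: the 2D cross product used as an orientation test, and the shoelace formula for a polygon's signed area. A comment such as `// is c to the left of the line from a to b?` is the kind of prompt that could produce the first one.

```java
class Orientation {
    // z component of (b - a) x (c - a), i.e. twice the signed area of triangle abc.
    // Positive: c lies to the left of the directed line a -> b (counter-clockwise turn).
    // Negative: c lies to the right (clockwise turn). Zero: the three points are collinear.
    static double cross(double ax, double ay, double bx, double by, double cx, double cy) {
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    }

    // Shoelace formula: signed area of a simple polygon given as {x, y} vertices in order.
    // Positive for counter-clockwise winding, negative for clockwise, which helps tell an
    // outer boundary from a hole when the two are wound in opposite directions.
    static double signedArea(double[][] vertices) {
        double twiceArea = 0;
        for (int i = 0; i < vertices.length; i++) {
            double[] p = vertices[i];
            double[] q = vertices[(i + 1) % vertices.length];
            twiceArea += p[0] * q[1] - q[0] * p[1];
        }
        return twiceArea / 2;
    }
}
```

For the unit square listed counter-clockwise, `signedArea` returns `1.0`; listing the same corners clockwise gives `-1.0`.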

        I want to stress that I’m just giving a few examples of things that I specifically remember because I talked about them at the time, not characterizing the majority of the experience of using Copilot. The majority of the experience of using Copilot is that you write comments, and then the things you were about to type appear on the screen before you have to type them.

        • ilikehurdles 4 days ago

          When I find myself writing comments in the style I see here, I usually ask myself whether that code would be better extracted into a function. These comments mostly state the obvious.

          If I find myself writing a 200 line function with nested or repetitive loops I expect to hear from colleagues about how I should refactor it.

          I feel that the solution to writing boring, repetitive boilerplate shouldn’t be to automate writing more of it, but to reduce or remove it entirely. Seeing things like this just reinforces my preconception that Copilot acts in low quality code environments to produce fittingly low quality code, or with languages like Java where the language is married to boilerplate.

          • mistercow 4 days ago

            This reply feels pretty bad-faith. But you know, feel free to open a PR if you have something concrete you feel can be improved.

  • triknomeister 5 days ago

    99% of work in 100% of interesting projects is boring.

  • para_parolu 5 days ago

    The problem may not be boring. Typing boilerplate code is. I work on games as a hobby. Sometimes I implement mechanics requiring vector math. Working on mechanics is interesting. Writing down math is not. Copilot helps with the latter.
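As an illustration of the "writing down math" part, this is a hypothetical Java sketch of a 2D vector helper with the reflection formula r = v - 2(v · n)n, which bounces a velocity v off a surface with unit normal n. The `Vec2` type and its methods are assumptions for illustration, not taken from any particular engine.

```java
class Vec2 {
    final double x, y;

    Vec2(double x, double y) {
        this.x = x;
        this.y = y;
    }

    Vec2 minus(Vec2 o) { return new Vec2(x - o.x, y - o.y); }

    Vec2 scale(double k) { return new Vec2(x * k, y * k); }

    double dot(Vec2 o) { return x * o.x + y * o.y; }

    double length() { return Math.sqrt(dot(this)); }

    // Unit vector in the same direction (assumes a nonzero vector).
    Vec2 normalized() {
        double len = length();
        return new Vec2(x / len, y / len);
    }

    // Reflects this velocity off a surface with the given normal: r = v - 2 (v . n) n.
    Vec2 reflect(Vec2 normal) {
        Vec2 n = normal.normalized();
        return minus(n.scale(2 * dot(n)));
    }

    @Override
    public String toString() {
        return "(" + x + ", " + y + ")";
    }
}
```

A ball moving at (3, -4) that hits a floor with normal (0, 1) comes back at (3, 4). The normal is normalized inside `reflect`, so callers can pass any nonzero normal.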

    • mojuba 5 days ago

      Then another hypothesis: you probably haven't found the right tools for it yet. I find myself writing boilerplate mostly around some obscure system framework calls (iOS/macOS), but that's rather rare. But even OS APIs and frameworks evolve over time to require less boilerplate. Just take the evolution of CoreAudio: the modern Swift interface is so much better. So at the end of the day it's about the tools and interfaces: boilerplate is rarely absolutely necessary with the right tools.

      • triknomeister 5 days ago

        Maybe github copilot is the right tool.

        • mojuba 5 days ago

          I don't think so. Human-verified, tested and maintained code is obviously superior to a snippet blindly copied and mixed by a statistical system.

          • mistercow 5 days ago

            That’s not how you use Copilot, any more than it’s how you’d use any other autocomplete tool. I don’t know why so many people seem to think that using Copilot is just closing your eyes, hitting tab fifty times, and then committing.

            You work on your code, Copilot makes a suggestion. You read that suggestion and verify that it’s close to what you were already going to do. If it is, you hit tab, then you tweak it. There’s nothing blind about this process.

      • dx034 4 days ago

        I'd rather use Copilot than include yet another library just to avoid writing one function.

  • workingon 5 days ago

    Seems like a narrow vision. Is every line of code you write to solve a problem “not boring”? I solve problems I find interesting, but writing matplotlib code to visualize data never is.

  • trention 5 days ago

    This is true for the current iteration of the model. It probably won't be true, at least to an extent, in 5 years. Besides, there is nothing wrong with solving boring problems. Not everyone can be Bjarne Stroustrup.

  • muzani 4 days ago

    Yeah, it's for boring problems. Drawing a circle or detecting a specific format of number in some string, for example.
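For the "detecting a specific format of number" example, here is a hypothetical Java sketch using `java.util.regex`. The format is an assumption: integers with comma thousands separators and an optional two-digit decimal part, such as `1,234.56`. The lookbehind and lookahead keep it from matching a fragment of a differently formatted number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberFinder {
    // Assumed format: 1-3 leading digits, then groups of ",ddd", then an optional ".dd".
    // The lookbehind/lookahead stop it from matching part of a longer number such as
    // "1234" or "12.5".
    private static final Pattern AMOUNT = Pattern.compile(
            "(?<![\\d,.])\\d{1,3}(?:,\\d{3})*(?:\\.\\d{2})?(?![\\d,]|\\.\\d)");

    // Returns every amount in the assumed format, in the order it appears in the text.
    static List<String> findAmounts(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

For example, `findAmounts("paid 1,000 and 25.00")` yields `[1,000, 25.00]`, while `"1234 or 12.5"` yields nothing because neither number is in the assumed format.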

habibur 5 days ago

We stand on the shoulders of giants. That has been the way for decades: a newer stack over the older one without much thought. And someone in the future will build an even newer stack over the current ones.

dgb23 5 days ago

Is it smart enough to:

- respect attribution

- respect copyleft

- respect proprietary licences

- give the user appropriate hints about the above

Or does it just copy code without doing any of this?

  • spupe 5 days ago

    No, it doesn't do any of that. However, it does not "copy code" except in marginal cases; the far more common scenario is that it will suggest very basic code akin to a Stack Overflow reply.

    • dgb23 5 days ago

      I read a lot of open source code and might subconsciously absorb techniques and patterns that are common. When I write code I might be influenced by what I read, not line per line, but rather generally.

      Is it like that?

      • spupe 5 days ago

        Kinda, but I think you are imagining something bigger than it is. At least in my experience, it works well for simple stuff like "iterate over x and extract y" or similar queries that I imagine are well represented in its training data. When you get to very specific functions, its answer will be less reliable and more likely to be a wonky rehash of the few examples it has for that case.
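A typical "iterate over x and extract y" request produces something like the following Java sketch; `ExtractDemo`, `User`, `email` and `active` are hypothetical names standing in for whatever x and y are.

```java
import java.util.List;
import java.util.stream.Collectors;

class ExtractDemo {
    // Hypothetical stand-in for "x": a user with an email address and an active flag.
    static class User {
        final String email;
        final boolean active;

        User(String email, boolean active) {
            this.email = email;
            this.active = active;
        }
    }

    // "Iterate over users and extract the emails of the active ones."
    static List<String> activeEmails(List<User> users) {
        return users.stream()
                .filter(u -> u.active)
                .map(u -> u.email)
                .collect(Collectors.toList());
    }
}
```

Code like this appears in countless projects in nearly identical form, which is the kind of well-represented pattern the comment describes.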

bborud 5 days ago

My personal reasons for not using copilot are a bit simpler. I believe the act of researching which solutions to use for a given problem is not so much about time, or the code you end up with, but about developing a better understanding of what you are doing. You may end up just cutting, pasting and modifying a piece of code you found, but hopefully, you were exposed to a few different ways to accomplish the same thing, and it made you aware of other choices that could have been made.

You could think of the evolution of practical problem solving in software engineering like this:

1. I have to invent a solution (because nobody else in the world has a computer)
2. I have to know of a solution (education, word of mouth...)
3. I have to look up a solution in the books I have (commoditized knowledge)
4. I can look up solutions on the internet <-- (we are here)
5. The computer suggests something and I accept (some are here too)

From 1 to 4 the amount of cleverness required to solve small problems drops a bit, but your productivity and exposure to knowledge probably goes up.

I'm not quite sure what happens from 4 to 5. Personally I'm actually more interested in the context solutions are presented in than just the solution. In fact, I rarely copy and paste code from the Internet, but I often look at multiple suggestions/solutions and then borrow ideas or combine ideas from several sources.

  • Yenrabbit 4 days ago

    At least the way I use it, it's not taking much away from my problem solving. It's just that instead of having to type `particlesGeometry.setAttribute('position', new THREE.BufferAttribute(positions, 3))` I just write `//Add as an attribute` and then hit TAB, since Copilot is smart enough to see that I've just prepared some geometry and populated an array of positions (both operations also sped up by not having to type the obvious bits). You're still having to think through the solutions (I'm not just typing '//make a cool particle sim') but no longer need to hit SO every few minutes for syntax examples when using a new library or something.
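In three.js, `new THREE.BufferAttribute(array, itemSize)` wraps a flat typed array and reads it in groups of `itemSize` numbers, so the `3` in the line above means each particle's x, y and z sit at indices 3i, 3i + 1 and 3i + 2 of `positions`. The sketch below mirrors that data layout in Java rather than calling three.js (which is JavaScript); `ParticlePositions`, its methods, the cube-shaped spread and the fixed seed are all assumptions for illustration.

```java
import java.util.Random;

class ParticlePositions {
    // Components per vertex (x, y, z), playing the role of the 3 passed to THREE.BufferAttribute.
    static final int ITEM_SIZE = 3;

    // Fills a flat array with `count` random positions inside a cube of side `spread` centred on the origin.
    static float[] randomPositions(int count, float spread, long seed) {
        Random rng = new Random(seed);
        float[] positions = new float[count * ITEM_SIZE];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = (rng.nextFloat() - 0.5f) * spread;
        }
        return positions;
    }

    // Reads vertex i back out: its components live at 3i, 3i + 1 and 3i + 2.
    static float[] vertex(float[] positions, int i) {
        int base = i * ITEM_SIZE;
        return new float[] {positions[base], positions[base + 1], positions[base + 2]};
    }
}
```

Keeping all positions in one flat array, instead of an array of small objects, is what lets the whole buffer be handed to the GPU in one go, which is why three.js asks for the item size separately.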

    • tartoran 4 days ago

      So you’re just getting a helping hand doing code plumbing. That sounds good as long as you let the helper take the lead

    • ModernMech 4 days ago

      That sounds like a problem that could be better solved through language and library design rather than an AI that sucks up all the code in the world.

      • williamcotton 4 days ago

        And yet after all of these decades, after countless advances in libraries and languages, I am still writing boilerplate in C, JS, Python, et al.

        I’m not sure that a language or library can ever understand the context of code without following an ML approach.

        Languages and libraries will always allow for more than the immediate task at hand. The innovation is that this tool understands which specific language or library features are probably going to be needed next!

        • aembleton 4 days ago

          Frustratingly I've had it insert Java code into Kotlin.

  • ok123456 4 days ago

    It replaces a few Google searches to look up how to do something in a new language or library. Keeping you in your editor, instead of context switching and possibly getting distracted or derailed, is worth it.

  • kraftman 4 days ago

    I would be interested to know how many people are actually using Copilot to generate entire chunks of code that they don't understand. For me it's just autocomplete on steroids; it's not answering any questions I don't know the answer to (other than syntax I've forgotten), it's just making the boilerplate faster to write so I can think about the actual problem I need to solve.

    • tartoran 4 days ago

      I'm not using Copilot, but if I did, I'd use it the way you describe as well, just for plumbing and tedious stuff.

tremon 5 days ago

I might start considering Copilot if Microsoft were to train it on their own internal codebases (Windows, Office, SQL Server). Until they do, it's clearly a "tool for thee but not for me" type of situation.

  • clircle 4 days ago

    "tool for thee but not for me" <- what does this even mean?

HumanReadable 5 days ago

Sorry for the unproductive tone of this comment, but there's something about the attitude of this tweet that really grinds my gears.

Any time someone invents something new and incredible, there's always a crowd of negative nancies eager to discredit it and explain why the invention is nothing new and a detriment to society.

I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

'co-pilot just sells code other people wrote' is such a ridiculous understatement of what co-pilot does. Instead of marvelling at the human ingenuity that went into creating it, they sneer at the audacity of openAI to do something without first asking their permission.

  • meheleventyone 5 days ago

    They own their code, and it either has a license for use or, if not, all rights are implicitly reserved. If Copilot regurgitates their code from a project that is public but has a non-permissive license, their IP rights are being violated, so they are totally correct to be unhappy about that.

    Just because you’ve made something cool doesn’t give you the right to harm others in the process.

    If MS or OpenAI don’t think this is the case then they should have also included their private repositories.

    • zarzavat 5 days ago

      The entire point of a fair use right is that you don’t need the copyright owner’s permission to be able to exercise it. Fair use allows you to do things that the copyright owner doesn’t like.

      Is fair use on a massive scale still fair use? Courts generally think so, otherwise Google would have been out of business a long time ago.

      • meheleventyone 5 days ago

        Is this fair use? I don't think that's been established yet. And if it is why didn't MS and OpenAI train it on their private code repositories? Fair use for thee not for me isn't very in keeping with the spirit of that claim.

        • komadori 4 days ago

          Gosh, can you imagine if they had trained it on their internal source code repositories and it constantly suggested using Hungarian notation for your variables? ;-)

        • jimnotgym 4 days ago

          I sometimes read people's open source code on github and use the ideas from that to develop my own ideas. In fact sometimes I copy and paste short passages and then rework them. I also employ a team of people who may do the same. Is that fair use? Yes, of course it is. Is Copilot automating that fair use? I would say so.

          • nirvdrum 4 days ago

            Many people would claim what you're doing is a derivative work. I'm not sure the "of course it is" is very clear-cut (at least in the US). I've worked at big companies that have lawyers that care very much about this topic and what you're describing is prohibited. But, maybe it's different if you're not distributing your source.

            • zarzavat 4 days ago

              > I've worked at big companies that have lawyers that care very much about this topic and what you're describing is prohibited.

              They are doing this to make sure that any lawsuit can be easily dismissed. It has nothing to do with the legality of the action (which sounds like fair use as the parent described it), and everything to do with the expense of a potential lawsuit compared to the cost effectiveness of simply telling developers “don’t do that”.

              Most people think that the law has two shades: lawful vs unlawful. But the more practical distinction is expensive lawsuit vs dismissed lawsuit. This is the lens through which corporate lawyers see copyright and it might explain why so many programmers think that copilot is “obviously” breaking the law and “stealing” their code.

              • nirvdrum 4 days ago

                If the usage was very clearly fair use, there'd be no need to be defensive about it; the case could be dismissed trivially. In reality, the question would need to be sorted out in court.

                Questions of derivative works and fair use come up fairly frequently even in the open source world. This isn't solely a question of corporate lawyer posturing. I don't know any copyleft authors that would be okay with someone copying & pasting their code, making trivial changes, and saying it isn't a derivative work. Of course, their understanding of the law may be flawed. You'll get to find out in court.

                You're right. A lot of this boils down to how much you want to spend in court proving your usage is just under fair use. We've moved beyond the question of ethics if you're intentionally violating a project's source license and relying on fair use to do whatever you want with the code. If you want to poke someone with a stick, you can't be surprised when they hit back. I contend what the OP described isn't clearly fair use (note I'm not saying that it clearly isn't fair use either). It ultimately doesn't impact me because I'm just not going to copy & paste code from projects without attribution and following the license, but I'd be worried about anyone reading that comment as objectively true.

          • grayfaced 4 days ago

            Or alternately, "I sometimes listen to other people's songs and use those ideas to develop my own. In fact sometimes I copy and paste short melodies and then rework them."

            Courts have held that it doesn't apply to music. Why do you think different rules apply to code?

            • zarzavat 4 days ago

              Courts are definitely aware of the need to protect the creative process and that no song is truly “original” in all aspects. e.g. the Katy Perry case[0]

              Songs are different from code, in that the “hook” that makes the money may be only a few seconds long. There are many creative choices that a songwriter/producer can fit into just a few seconds: the harmony, melody, rhythm, lyrics, timbre, effects, ...

              Whereas for code, the space of creativity is limited by functional considerations. A creative choice is protected by copyright but not all choices that programmers make are creative. Often the choices are limited by the API/interface or by efficiency considerations and it turns out that there’s only one good way to do something.

              A function may be very intricate, yes, while still containing almost no creative value (e.g. a Vulkan setup function). Music doesn’t have an equivalent to this - the placement of every note is a creative act.

              [0] https://en.m.wikipedia.org/wiki/Gray_v._Perry

          • aahortwwy 4 days ago

            Microsoft's internal policies don't allow their employees to do this without legal approval.

            • aaaaaaaaata 4 days ago

              So they don't ask.

              • leereeves 4 days ago

                I think aahortwwy's point was that Microsoft won't permit their own employees to do what Copilot does.

          • zzo38computer 4 days ago

            > I sometimes read people's open source code on github and use the ideas from that to develop my own ideas.

            Yes, so do I, and probably many other people do too.

            > In fact sometimes I copy and paste short passages and then rework them.

            This I usually don't do unless I check the license first. (Everybody ought to be allowed to, but sometimes the license might not allow it.)

          • jcelerier 4 days ago

            What you are doing is very certainly illegal

        • matharmin 4 days ago

          For public repositories, whether copying small parts of code is considered fair use is just a copyright question.

          On the other hand, if you copy from private repositories, it quickly gets into the territory of stealing trade secrets.

        • jimnotgym 4 days ago

          Just because there has not been a test case yet does not make it illegal! If MS thinks it is fair use, then they are free to go ahead. Business is all about recognising and assessing risks like this.

          • tremon 4 days ago

            And even if there had been a legal test case, that does not make it moral! If people think this is socially wrong then they're free to argue their case. Business is all about ignoring ethical quandaries if it gives them an edge.

            "Microsoft does it, therefore it must be right" does not a sound argument make.

            • aaaaaaaaata 4 days ago

              > Business is all about ignoring ethical quandaries

              No, businesses are — not business. Not necessarily...

      • jrumbut 4 days ago

        I don't think releasing a commercial product that copies people's code without complying with the license is anywhere near fair use.

        Also, the open source community has far less leverage to apply pressure to Google than it does to GitHub. We may be able to do something about this.

        • CrazyStat 4 days ago

          > I don't think releasing a commercial product that copies people's code without complying with the license is anywhere near fair use.

          The whole point of fair use is that the license doesn't matter. You can have a license that says I'm not allowed to use what you wrote for any purpose ever and I can still use it under fair use.

          • jrumbut 4 days ago

            Yes but among the four factors that are used to evaluate fair use claims are whether it is being used commercially (it is) and how it affects the market for the thing that was copied (it clearly would since one way code is used is being imported by other code, if Copilot didn't insert my code into the new app, they might very well use my open source project that provides the same code).

            • CrazyStat 4 days ago

              I wasn't staking a position on whether Copilot is fair use, just pointing out that fair use doesn't care about license.

              That said, copilot itself is not a replacement for your open source project that it was trained on. The code it generates may or may not be, but that's probably not Github's problem as far as copyright law is concerned.

          • Longlius 4 days ago

            IANAL but fair use is primarily about the public interest. What public interest is served by allowing proprietary software vendors to copy GPL code that's reserved for the commons?

            I don't really think this argument passes muster.

        • pmarreck 4 days ago

          > I don't think releasing a commercial product that copies people's code without complying with the license is anywhere near fair use.

          It's just automating the copying and pasting (and slight reworking) of boilerplate code that would normally take me much longer to do, especially when I am working with a language I'm less familiar with but that is necessary for my stack. I've literally never seen it suggest code that isn't more or less exactly what I would have come up with given a lot more time. In essence, it eliminates tedium, which is exactly the point of all of programming: work elimination.

        • RHSeeger 4 days ago

          It seems fairly similar, at least to me, to a search engine copying snippets of other people's web sites and displaying them on a page. Admittedly, there's still some discussion as to whether or not _that_ is fair use, but I think enough of the population think it is (with many news organizations disagreeing).

      • izacus 4 days ago

        Unless Copilot is "commenting on" or "parodying" the code you've written, it's not fair use. Copying and using the code in another project sure as heck IS NOT fair use.

      • bayindirh 4 days ago

        An automated system will devour all my code, which is under a case-tested copyleft license, and regenerate its parts in any place, without respecting the license terms, and call it "fair use".

        I have two questions:

        1. Why have licenses, then?

        2. What if I just use leaked sources of closed source software and call it fair use?

        • zarzavat 4 days ago

          1. Why have licenses, then?

          The default under copyright law is that any substantial copy is infringement.

          A license is a legal document that grants someone permission to use a work that they otherwise would not have had.

          However the law also gives its own permissions to use a work - it defines what is unlawful infringement and what is lawful fair use.

          The code snippets that copilot generates look more like fair use than infringement. They are small, adapted to the destination context, and usually not direct copies of one source but more of an average of many different sources. And usually the programmer does not keep the suggestion that copilot suggests unmodified - the programmer does their own editing of the snippet afterwards to further tune it to the surrounding context.

          2. What if I just use leaked sources of closed source software and call it fair use?

          As pointed out upthread, if the source code is leaked then there may be trade secret protections. The GPL specifically allows the code to be posted online, so by design it is not secret.

          • bayindirh 4 days ago

            > As pointed out upthread, if the source code is leaked then there may be trade secret protections. The GPL specifically allows the code to be posted online, so by design it is not secret.

            The reverse may be true. I may be GPL'ing code to prevent a useful algorithm from being buried deep inside commercial code with an incompatible license. What makes code "trade secret" level? I have a 25-line algorithm which is worthy of its own paper. What if I release its reference implementation under AGPLv3+?

            I have no problem with you reading the paper and implementing it. I don't obfuscate my papers, but I put the implementation out under AGPLv3+. You can't use that in a codebase with an incompatible license. I expect and want you to respect the license of my implementation.

            > The code snippets that copilot generates look more like fair use than infringement. They are small, adapted to the destination context, and usually not direct copies of one source but more of an average of many different sources. And usually the programmer does not keep the suggestion that copilot suggests unmodified - the programmer does their own editing of the snippet afterwards to further tune it to the surrounding context.

            Emphasis mine. First, there's no consensus on fair use yet. Second, they may be direct copies of the code. Third, they're remixed with other code pieces, which makes them derivative works of many code pieces at once. Lastly, the programmer re-derives the derived work, which is clearly a derivative of GPL code and brings the GPL license along with it (if Copilot derives the code from GPL-licensed repositories, which it does).

            I have no problem with Copilot as a technology. I have no problems with other licenses, which are not breached when their code is used by Copilot, derived and used. The point which makes my blood boil is Copilot using this GPL corpus, not admitting it publicly, breaching the terms of the GPL en masse, and outright ignoring it. Then feeding this GPL-derived code to any and all projects which pay for a Copilot membership, and calling it a day.

      • akagusu 4 days ago

        When Copilot reproduces substantial parts of someone else's code without respecting the license terms, it is not fair use; it is just disguised license abuse.

      • kybernetikos 4 days ago

        > otherwise Google would have been out of business a long time ago.

        I do think there are ethical questions around whether it's right for google to digitise physical books without the permission of the authors, and keep them on their servers and make money from them without recompensing the authors. That's something an individual would not get away with doing, so it seems wrong that it's OK for google.

      • dkersten 4 days ago

        Fair use is quite narrowly defined though. This doesn’t look like fair use to me, especially when it's been shown that copilot does, at least sometimes, spit out code that is completely unchanged from the source material, without advising the user of any license requirements (most permissive licenses require at least attribution).

        The SCO vs IBM lawsuit was over only a few lines of code, after all.

        I can't use a derivative of Mickey Mouse in my product, even if I change his colour and give him a hat, even if these changes were made by an AI. Why would it be different for code? I can only use Mickey Mouse as fair use if it's done for a specific, narrow set of purposes (satire, news reporting, etc.).

      • Hamuko 4 days ago

        What about us that are not Americans?

        • zarzavat 4 days ago

          Then you need to check the laws in your country. But that is nothing new to copilot. Copyright laws vary significantly from country to country.

          • rurban 4 days ago

            Not really. They are mostly the same across countries: https://en.wikipedia.org/wiki/International_copyright_treati...

            There are just minor deviations, not relevant to this case, such as the length of protection, which Disney bullied countries into extending.

            Software is usually considered a work. The AI needs to know whether it has permission to copy and use the code, and then offer the derived work under the proper terms and conditions. Copilot doesn't do that. It might copy GPL code into non-GPL code, thus violating the GPL, which makes it an extreme risk.

            • tzs 4 days ago

              What are examples of Disney getting countries to extend copyright terms?

              In the US there have only been two extensions of copyright terms since Disney came into existence.

              The first was in 1976, as part of a major overhaul of US copyright law to update the previous law (from 1909) to take into account the large changes in technology since then, and to make US law work more like the rest of the world to pave the way for the US later joining the Berne Convention. The changes for Berne compatibility included longer terms.

              I assume Disney did support this, but only because as far as I can tell it had pretty widespread support. It had enough support that it would have passed even if Disney had adamantly opposed it.

              The second was in 1998, and that was specifically a term expansion (as opposed to a term expansion like that of 1976 that was a side effect of harmonizing US law with the rest of the world). Europe had expanded terms a few years earlier, so the 1998 change in the US might have been motivated at least in part by harmonization, but I don't think the differences in terms between the US and the EU would have been enough to get it passed without some major interests pushing for it, so it is probably fair to give Disney a good part of the credit or blame for this one.

              • rurban 4 days ago

                I was referring to the extension in 1976 from 28 to 50 years and the subsequent extension in 1998 to 70 years, both of which, as everybody agreed, were made at Disney's request (hence the German nickname "Micky-Maus-Schutzgesetz", i.e. Mickey Mouse Protection Act). Other lobbying partners were the George Gershwin heirs and the movie industry (Jack Valenti).

                Here you can see the countries that did not extend it a second time to 70 years: https://en.wikipedia.org/wiki/List_of_countries%27_copyright...

                There was of course no widespread support for these extensions, as all their arguments were flawed and not only violated logic but also several constitutions. https://de.wikipedia.org/wiki/Copyright_Term_Extension_Act#G... (the English version has mostly been cleaned of these counterarguments)

      • lupire 4 days ago

        "on a massive scale" is one of the legal definitions of unfair use.

    • jillesvangurp 4 days ago

      I'm sure the MS lawyers thought long and hard about this and are patiently awaiting any actual lawsuits with confidence in their position. It would be very hard to prove ownership of any snippets, to the point where you could argue that it is just fair use, and to the point where companies would think long and hard before committing any resources to fighting MS on this in court at great expense.

      I don't think that will happen but it might be interesting if it did.

      • amelius 4 days ago

        It will stop being fair use when someone makes an AI that creates cartoon characters based on the figures in Disney movies.

      • vlovich123 4 days ago

        MS is unlikely to be sued here because the infringement claim would be against their users, and my guess is the license indemnifies them against you suing them for defects in the tool you use (ie use at your own risk, and if you get sued you agree you won’t sue us).

      • aaaaaaaaata 4 days ago

        > companies would think long and hard before committing any resources to fighting MS...at great expense

        This is the end of Microsoft's actual calculation.

    • core-utility 5 days ago

      Do we have any evidence that copilot doesn't check/filter by license?

      • meheleventyone 5 days ago

        One of the (ex?) programmers from Valve managed to get it to spit out parts of the Source engine verbatim. He posted a Twitter thread yesterday I believe.

        • mustyoshi 5 days ago

          Does that prove it ignores licenses or does that imply the source engine exists verbatim (minus licenses) multiple times on Github?

          • meheleventyone 5 days ago

            If it's minus a license, then it should be assumed that rights are retained (in the same way you can't just take ownership of an image you find on the internet), so if it were filtering, it shouldn't take code from repos without explicit and favorable licenses. If it is taking code only from repos with permissive licenses (e.g. MIT), then why aren't they following the attribution requirements?

            I don't think you can have your cake and eat it on this one.

            • moffkalast 4 days ago

              If I steal some code and put it on Github under MIT that doesn't really make it MIT, I'm just lying that it is. If Copilot then uses that it's still in violation of the law I'd assume (ignorance doesn't exonerate you etc.). So they'd have to verify on a case by case basis, which they obviously haven't given the volume of data they had to feed the thing.

              It's kinda shocking that they think they can sell this, even providing it for free is extremely sketchy but at least complies with BSD/GNU/CC licensed stuff I guess.

              • lupire 4 days ago

                Why do you think that the recipient is responsible for verifying that no one else holds the copyright to code they received under license?

                Is every product user liable when a vendor ships some stolen code?

                • Closi 4 days ago

                  > Is every product user liable when a vendor ships some stolen code?

                  The user would be unlicensed, and unless the vendor resolves this, the user would need to purchase licences to continue using the software legally (i.e. if a vendor gives you a pirated version of Photoshop, you can’t use it forever just because someone sold it to you).

                  There are usually clauses in enterprise software agreements that attribute liability for unlicenced components to the vendor for this reason. But ultimately if there isn’t a contract or the vendor vanishes, the user will need to go get a licence.

                  If you want to test the theory, I’ll send you a few images to put on your website, and when you get a claim through from the copyright owner you can try to argue that I sent it across without a copyright notice so I am liable ;)

                • ryukafalz 4 days ago

                  > Is every product user liable when a vendor ships some stolen code?

                  No, but the difference is the users of a product are typically not making and distributing copies. That’s not the case if you use someone else’s code in your project.

              • Hamuko 4 days ago

                And especially with such blanket statements as "the code you write with GitHub Copilot’s help belongs to you".

          • Closi 4 days ago

            It would prove that it doesn't honour all licences - just because the source code exists on Github without a licence doesn't automatically grant a licence to Copilot from a legal perspective.

          • micromacrofoot 4 days ago

            just because someone else ignored the license doesn’t mean github is free to blindly vacuum that up

        • leakbang 5 days ago

          Can you post the link to that?

          • meheleventyone 5 days ago
            • dekhn 4 days ago

              3 lines of fairly generic code?

              That's not what copyright is protecting.

              • meheleventyone 4 days ago

                Just for the record I was providing some evidence to support this question: "Do we have any evidence that copilot doesn't check/filter by license?"

                • dekhn 4 days ago

                  I mean, even if a license was placed on the code, that doesn't change anything: if the code is not protected by copyright, then it's fair game for copilot to scrape, learn from, and emit variations of.

                  I believe github's lawyers would have had hundreds of hours of discussion about this, and at this point they believe they are in the right and that anybody who disagrees should use the legal system to resolve the matter.

                  In the meantime, what it is and isn't doing wrt licenses seems to be poorly understood externally.

      • samatman 4 days ago

        This is in fact impossible.

        All they could do is filter by the LICENSE file in the repo.

        Unfortunately for them, by law copyright and license are determined by the authors and merely represented by a LICENSE file, which could be lying about both.

        The court isn't going to accept that excuse when this goes to trial.

        • gjadi 4 days ago

          And you can have multiple licenses in the same repository, folders with copyright exceptions, etc.

          It's hard enough for us humans to find our way in this mess; I have little hope for an AI.

          But maybe it's just the first step. The final step being able to sell an AI that understands Copyright management. I'm sure there is a big market for that.

          • mroche 4 days ago

            I feel like a few guidelines and standards could help simplify a baseline process:

            1) Require each repository to opt-in to be learned from.

            2) Require any source file used for learning to have an SPDX license heading.

            3) Have a list of approved permissive licenses to avoid any proprietary or copyleft arguments.

            Using SPDX headings as the explicit guide would solve the problem of different code content using a different license within a project. An example being QtWayland: the client pieces are Proprietary/LGPL/GPL, whereas the compositor parts are Proprietary/GPL. That's not something you'd know from the license files at the root of the project (and post-6.3 they use SPDX instead of the prior license template heading).

            Granted, this doesn't solve the problem of the chain of trust (is the individual publishing the code truly the copyright owner), but I think it would be a basic start for a program like this. The opt-in nature would make things... difficult, but I think that's a fair trade-off for something like this.
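            To make the three rules above concrete, here is a minimal Python sketch of a per-file filter. It assumes each file carries a short-form `SPDX-License-Identifier:` line near the top and that the "approved permissive licenses" are the small, hypothetical set in `APPROVED`. The function names are made up for illustration, and the SPDX expression handling is deliberately simplified: anything with parentheses or a `WITH` exception is rejected rather than parsed.

```python
import re
from typing import Optional

# Hypothetical allowlist for rule 3; which licenses count as "approved" is an assumption.
APPROVED = frozenset({"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC"})

# Only the first few lines are searched for the header (an assumed convention).
HEADER_LINES = 10
HEADER_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_expression(source: str) -> Optional[str]:
    """Return the license expression from a file's SPDX header, or None."""
    for line in source.splitlines()[:HEADER_LINES]:
        match = HEADER_RE.search(line)
        if match:
            # Drop a trailing block-comment closer such as "*/" or "-->".
            return re.sub(r"\s*(\*/|-->)\s*$", "", match.group(1)).strip()
    return None


def is_approved(expression: str) -> bool:
    """Simplified SPDX check that handles only OR/AND without parentheses."""
    if "(" in expression or " WITH " in expression:
        return False  # be conservative with anything more complex
    # AND binds tighter than OR in SPDX: one OR alternative is enough, but every
    # license joined by AND within that alternative must be approved.
    return any(
        all(term.strip() in APPROVED for term in alternative.split(" AND "))
        for alternative in expression.split(" OR ")
    )


def eligible_for_training(source: str, repo_opted_in: bool) -> bool:
    if not repo_opted_in:  # rule 1: the repository must opt in
        return False
    expression = spdx_expression(source)
    if expression is None:  # rule 2: no SPDX header, no training
        return False
    return is_approved(expression)  # rule 3: approved permissive licenses only
```

            With this, a file tagged `GPL-2.0-only OR MIT` would pass through its MIT alternative, while a file whose header offers only proprietary, GPL, or LGPL options, like the QtWayland pieces mentioned above, would be skipped.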

            • gjadi 4 days ago

              Yes a standard would probably solve the issue.

              But until lawyers push for a standard that would make this part of their work irrelevant, I can't see how it could happen :)

        • mnd999 4 days ago

          And that is why this project should never have made it past the brainstorming session.

      • bayindirh 4 days ago

        There was a tweet by Nora Tindall (since deleted) with a screenshot of an email directly from GitHub stating that GPL code is included in Copilot's training data and will indeed be used.

    • Zambyte 4 days ago

      > from a project that is public but with a non-permissive license

      Permissive or not doesn't matter. Public Domain or not is what matters. Permissive licenses still require you to propagate the copyright notice, which Copilot strips.

    • causi 4 days ago

      Unfortunately the way IP law works, at least in the US, is that you can use essentially whatever you want as training data and it's up to the user to make sure none of the generated code violates licensing agreements.

      • SahAssar 4 days ago

        If that's the case, then GH/MS should at least disclose that for the generated code to actually be legal, you have to hunt down the actual source (which will be hard in a lot of cases) and check its license against your own license.

      • monocasa 4 days ago

        Can you point to case law backing that up?

        • causi 4 days ago

          Sure.

          https://jtip.law.northwestern.edu/2021/05/28/copyright-issue...

          However, even if infringement occurs during machine learning, training AI with copyrighted works would likely be excused by the ‘fair use’ doctrine.[ii] For example, in Authors Guild v. Google, Inc.[iii], Google had scanned digital copies of books and established a publicly available search function. The plaintiffs alleged that this constituted infringement of copyrights. The Second Circuit held that Google’s works were non-infringing fair uses because the purpose of the copying was highly transformative, the public display of text was limited, and the revelations did not provide a significant market substitute for the protected aspects of the originals.

          • monocasa 4 days ago

            That's training for search to lead to a full copy of the original work with citations, not training for regurgitating verbatim chunks of copyrighted works to be incorporated at scale into other copyrighted works while obfuscating their original source.

            The Second Circuit's tests listed in your citation specifically fail in this case. It's not highly transformative since it's just regurgitating snippets to be used in other competing works rather than applying the body of works to a different domain. And it's specifically to provide a market substitute for the protected aspects of the original works.

            Additionally, none of this says 'its all great and it's on the user to figure it out'.

            • causi 4 days ago

              In the US, copyright infringement is a matter of strict liability. Regardless of whether or not a court directly confirms or denies Microsoft's right to use code in that way, the end developer is still liable for whatever he or she uses.

      • lupire 4 days ago

        Did you just make that up? Github is distributing the copied code to users.

        • monocasa 4 days ago

          They did make it up; their cited case law says nothing of the sort.

        • causi 4 days ago

          > Did you just make that up?

          Unfortunately not. It's really stupid.

    • wowokay 4 days ago

      I think you might be missing the point of their frustration.

      Lots of companies do not put their code in public repositories. Granted, I understand the perspective of violating a license, but the point is that if you don’t want your code used by someone else (even with the risk of not getting credit, don’t know why that matters), then don’t make your repo public, period.

      To that point, what’s to stop GitHub from making a policy that states: “All public repositories will be utilized in AI training”?

      • ryukafalz 4 days ago

        > even with the risk of not getting credit, don’t know why that matters

        The point is that it’s not respecting the license, not just that it’s not giving “credit”. If I release code under a GPL license, I damn well don’t want someone using that code under a license that’s not GPL-compatible, no matter how it got there.

    • drexlspivey 4 days ago

      Owning code snippets sounds ridiculous to me, like can I own this snippet?

          def average(*numbers):
              return sum(numbers)/len(numbers)
      
      if not, is it because it is too small? what’s the minimum number of lines at which ownership kicks in? what if I change the function name and the variable names?

      • bayindirh 4 days ago

        If that's under a copyleft license, I can't just copy & paste it under my non-copyleft licensed code and call it mine.

        It's as simple as that.

        • sidlls 4 days ago

          It can’t be that simple. The function in the GP is not an original idea and is far too simple to merit protection just by slapping a license on it.

          • bayindirh 4 days ago

            I don't expect, or support, licensing that small amount of code, and suing everyone to oblivion.

            The point I'm trying to make is if something is under a copyleft license, you can't copy and paste it verbatim to something non-copyleft. It's what the license says.

            Also, to be pedantic, the function I'm commenting on is pure maths, and you can't license/patent mathematics.

            On the other hand, if there's some magic sauce for doing something, be it 25 lines, what will you say? It's just 25 lines, so you can't license it? To be more pedantic, I actually have an algorithm which is around 25 lines and does something novel. I've published a paper on it.

            If I license the reference implementation with AGPLv3+, and you use it and close it, and if I can't go after you, what's the purpose of the license?

            You can read the paper and try to implement it. It's free in that regard.

        • williamcotton 4 days ago

          It seems rather silly to me that such small innovations would be worthy of legal protections under either a copyright or copyleft license.

          Isn’t there already precedent in other forms of IP, such as chord progressions in music, sentence length in literature, etc?

        • dekhn 4 days ago

          Copyleft isn't really a good example. Let's talk about copyright. That fragment of code is not copyrightable on its own. Too small, too trivial.

          • bayindirh 4 days ago

            Let's say I have a 25-line function which does something novel and can be published as research (which I did, BTW, no joke), and I opened its reference implementation with AGPLv3+.

            Is it again too trivial?

            • drexlspivey 4 days ago

              is 25 lines the limit then? do you count comments? can I codegolf a few lines to get below the limit?

              • bayindirh 4 days ago

                > is 25 lines the limit then?

                I don't know. That's my function's length.

                > do you count comments?

                No comments, no blank lines.

                > can I codegolf a few lines to get below the limit?

                You bet. But, if you copy my reference implementation, you need to get the license as well.

                However, the research is on the open. Read it, implement it. That's no problem.

                But, CoPilot is not reading my paper. It's reproducing my function verbatim, which is under a license which has share-alike mechanics.

        • trasz 4 days ago

          Not really. You can't copyright a trivial snippet, same way you can't copyright headers.

          • bayindirh 4 days ago

            I've provided more realistic and logical examples in this thread; please refer to them.

      • cupofpython 4 days ago

        if you write it yourself, it's fine. if you directly copy it from somewhere you aren't allowed to copy from, then it is wrong.

        There are no rules about the form of the code itself that govern whether or not someone owns it. Common sense applies. Sure, you could "steal" very small, common code snippets and get away with it, but that doesn't make it less wrong.

        When a commercial entity explicitly does it, however, sometimes we can catch them: for example, if they do it through algorithms whose workings we more or less understand, i.e. the algorithm uses advanced control flow logic to copy and paste from its training data set, and copyrighted material is in that data set.

        • drexlspivey 4 days ago

          Point is that you can ask 100 programmers to write an average function and probably most of them will come up with this answer verbatim. How can copyright law handle this? There is also the opposite problem, I can copy a complicated snippet and change the variable names. Am I absolved from liabilities now?

          • cupofpython 4 days ago

            If they come up with it on their own, it shouldn't be an issue. Likewise, swapping the variable names does not absolve you from liability.

            Copyright really is not only concerned with what exactly is on the page, but also how you got there, and where the knowledge came from to get you there.

            What if I read your codebase, and then years later while programming for myself I inadvertently use solutions you came up with while thinking I came up with it myself?

            There really are no hard set rules, and this is something that is handled on a case-by-case basis based on whether or not a convincing argument can be made that you copied a novel idea from someone else and claimed it as your own.

            We can argue the semantics of it all we want, but the subject area is an active battleground. Typically it only matters when money gets involved, since no one usually presses the issue or gets involved with random personal projects. So when an enterprise-level company leverages that lack of caring into a proprietary pay-to-use project that operates by copying and pasting code from copyrighted material, it seems like a case could be made.
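            Whatever the legal answer, renaming identifiers is easy to see through mechanically. A minimal sketch, assuming Python sources and only the standard-library `ast` module: user-defined identifiers are replaced by placeholders in order of first use (built-ins such as `sum` and `len` are left alone) before the two parse trees are compared. `_Canonicalize` and `same_structure` are hypothetical names, and attribute names, class names, and `async def` names are not handled.

```python
import ast
import builtins


class _Canonicalize(ast.NodeTransformer):
    """Rename user-defined identifiers to id0, id1, ... in order of first use.

    Built-in names such as sum and len are kept, so calling different
    built-ins still counts as different code.
    """

    def __init__(self) -> None:
        self.names = {}

    def _canon(self, name: str) -> str:
        if hasattr(builtins, name):
            return name
        return self.names.setdefault(name, f"id{len(self.names)}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = self._canon(node.name)
        self.generic_visit(node)  # then rename parameters and body names
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self._canon(node.arg)
        self.generic_visit(node)  # annotations may contain names too
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = self._canon(node.id)
        return node


def same_structure(a: str, b: str) -> bool:
    """True if two snippets differ only in identifier names, whitespace, or comments."""
    def canonical(source: str) -> str:
        return ast.dump(_Canonicalize().visit(ast.parse(source)))
    return canonical(a) == canonical(b)
```

            With this, the `average(*numbers)` example from this thread and a copy renamed to `mean(*values)` compare as equal, while a function returning `max(values) - min(values)` does not.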

      • ipaddr 4 days ago

        Someone trademarked the word THE yesterday and a few common musical notes and your video gets banned

    • nojs 4 days ago

      It doesn’t really “regurgitate code” all that much in practice though. It’s a super impressive product and these arguments seem more like people looking for an excuse to hate new, scary technology.

    • jstummbillig 4 days ago

      In light of this potential new paradigm it's bewildering how people still manage to focus on the license of training material as if it even moved the needle in this context, even a little bit.

      OSS knights: THE LICENSE.

      MS: Aight, I guess we have a few lines of hq src to help out with…

      Github: Same.

      Other OSS people: We really don't care one way or the other.

      As long as the word of the license was upheld for another 2 weeks before it ceased to matter for the rest of all time.

      Jesus fucking christ. People. I get that oss licensing is dear to the collective hn heart – but, at best, it's completely irrelevant in regards to where this will inevitably lead, regardless of current questions/issues with license violations. You can (if all the repos of MS and Github are not enough to train this thing on, which is a laughable idea) even fucking buy additional source code if that's what it takes to strengthen Copilot's legal foundation. The cost is insignificant. People will be happy to sell for super cheap. It's a non-issue.

      Why do you wilfully choose to be distracted instead of facing and thinking about the future together?

  • lin83 5 days ago

    > Instead of marvelling at the human ingenuity that went into creating it, they sneer at the audacity of openAI to do something without first asking their permission.

    Something being cool doesn't exempt it from discussion of its ethics and certainly doesn't exempt it from legal consequences. What people call "disruption" is often just exploiting resources/people/their work in unsustainable ways until oversight is introduced.

    If CoPilot is copy/pasting large amount of code with unknown licenses, that is a large and real risk for users aside from violating open source projects licenses.

    • moffkalast 4 days ago

      Moreover it's a genuine danger for non-hobbyist developers since you could be including stolen code into a market product.

      Even including something banal like Linux is already problematic since it's GNU licensed, which by extension makes your entire project GNU licensed and you can't keep the exclusive rights to it.

      • ryukafalz 4 days ago

        Just to clear this up, since I’ve heard this a lot before:

        > since it's GNU licensed, which by extension makes your entire project GNU licensed and you can't keep the exclusive rights to it

        This is incorrect. Including GPL code in your product cannot automatically relicense your code. It’s just a copyright violation if your product’s license isn’t GPL-compatible and you don’t abide by the GPL.

    • leereeves 4 days ago

      > Something being cool doesn't exempt it from discussion of its ethics and certainly doesn't exempt it from legal consequences.

      Indeed. The heist in Ocean's Eleven was cool, but it was still theft.

  • nextaccountic 5 days ago

    > I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

    Because they shared the code under a license, and they have the right to complain if people use that code but don't follow the license.

    For example, what happens if Github Copilot spits a copy of some copyrighted code verbatim? Is laundering open source code through a machine learning model a loophole for not having to follow the license?

    Often following the license is as simple as giving credit to the original author.

    • bborud 5 days ago

      I've done a fair number of technical due diligence projects on acquisitions and potential partnerships, and on some projects I've hired outside firms to analyze the code and figure out its origins and what licenses apply.

      There are tools that will analyze a codebase and identify where chunks of varying size seem to come from. Mostly to determine if the code is encumbered by problematic licenses, but also to detect where the programmers may have borrowed code from.

      If memory serves, some of these companies also have closed source codebases in their database, enabling them to detect if unpublished code has been re-used.

      The times I've used this in due diligence, it has rarely been a deal-breaker when we do find large chunks of code that may be problematic, for instance due to unacceptable licensing terms. You just make a note of it and have them rewrite the code before the transaction can take place (or you figure out if you can accommodate the license terms).
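      The commercial tools described above are proprietary, and this thread does not say how they work. As a rough sketch of the general idea only, assuming a deliberately simple scheme (hash every window of K consecutive whitespace-normalized, non-blank lines and look the hashes up in an index of known code), here is one way to flag borrowed chunks in Python. `fingerprints`, `ProvenanceIndex`, and `K` are hypothetical names; published techniques such as winnowing over token k-grams are considerably more robust to small edits.

```python
import hashlib
from collections import defaultdict

K = 4  # window size in non-blank lines; an arbitrary value for illustration


def fingerprints(source: str, k: int = K):
    """Yield (hash, first line number) for each window of k non-blank lines.

    Whitespace is collapsed first, so re-indenting or re-spacing a copied
    chunk does not change its fingerprints.
    """
    lines = [(number, " ".join(text.split()))
             for number, text in enumerate(source.splitlines(), start=1)]
    lines = [(number, text) for number, text in lines if text]
    for i in range(len(lines) - k + 1):
        window = "\n".join(text for _, text in lines[i:i + k])
        yield hashlib.sha1(window.encode("utf-8")).hexdigest(), lines[i][0]


class ProvenanceIndex:
    """Maps window fingerprints of known code to where they came from."""

    def __init__(self, k: int = K) -> None:
        self.k = k
        self._index = defaultdict(set)  # hash -> {(source name, line number)}

    def add(self, name: str, source: str) -> None:
        for digest, line in fingerprints(source, self.k):
            self._index[digest].add((name, line))

    def matches(self, source: str) -> dict:
        """Map each matching window's first line in `source` to known locations."""
        found = {}
        for digest, line in fingerprints(source, self.k):
            if digest in self._index:
                found[line] = sorted(self._index[digest])
        return found
```

      Each hit points back to a known source name and line, which is the kind of list a reviewer could hand over when asking for a rewrite.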

      • nextaccountic 4 days ago

        Yeah, but wouldn't it be great if the tool generating the "AI-generated code" were also required to run such an analysis itself, to eliminate licensing violations at the moment the code is inserted?

        It's as if Microsoft were banking on the fact that most violations will be unnoticed

  • highwaylights 5 days ago

    This seems disingenuous.

    People don’t have a problem that AI is being used in some form to provide the service.

    The complaint is pretty clearly that code is being lifted from repositories without attribution or compensation, and being redistributed into other applications.

    Whether the work behind copilot is impressive or not really isn’t relevant.

    • tiborsaas 4 days ago

      I've made use of a ton of open source tools and have not paid any attribution or compensation. By made use of, I mean I used them as their intended purposes and not their source code. I have a FOSS OS, server, CMD tools and libraries powering my ideas, it's part of the deal that I don't have to pay.

      If I modify them I know what I have to do, but Co-pilot is somewhere in between the two: it's abstracting knowledge from these codebases. We don't yet know how to deal with it properly, but this will change with time, which is why having these conversations is important.

      I think that AI models will gain a new legal status: whatever they learn will be considered original work if it's not repeating non-trivial work 1:1.

      • moffkalast 4 days ago

        > it's not repeating non-trivial work 1:1

        But that's basically all copilot does? It's just a fancy compression system with a search function.

        • tiborsaas 4 days ago

          No, it customizes the snippets to your context, the code is synthesized and not looked up in a db like a web search engine.

          • ay 4 days ago

            I tried it for the first time today, so treat this with a grain of salt.

            https://twitter.com/ayourtch/status/1539928018138931200 is my experiment. The code in question has a very specific format - it’s C with a lot of macro sauce. I described the intent in the comment and pasted the include lines. Then I started the #define of a unique-looking token, and it added the lines with the correct boilerplate. What you see in gray is more boilerplate that it suggests when prompted.

            I would dare to assert that “xxxayourtchtestxxx” is not going to appear in anyone’s code but mine.

            So you can see an example of copilot generating completely new code.

            Not saying it’s 100% of what it does - but this side looks very useful.

            I also did a test with Rust: I described a function canonicalizing a MAC address, and then when it saw #[test] prompts, it started to make very passable unit tests for the function, which had not even been written yet; there was only the comment describing what it would do.

            Also a massively useful lever to have, if it can do so consistently.

            My attempts to make it generate a bug-free canonicalization function didn’t work - but it was interesting to see it try different approaches based on the existing test code (and no, they didn’t always satisfy the tests, contrary to what one would expect :)

            So this angle is “pair programming with a creative novice”, which also can be useful - it can give ideas to explore that you didn’t think of.

            Of course this was all fairly trivial code, I do not know yet how it will behave in a more tricky situation.

          • moffkalast 4 days ago

            But it kind of is when you think about it. Network weights are just a db written in an incomprehensible format and the synthesis part is searching and converting it back to readable data.

            Even if it changes the var names and formatting a bit, it's still at best highly derivative. And at worst it spits out the exact code verbatim.

            • tiborsaas 4 days ago

              > Network weights are just a db written in an incomprehensible format

              That makes all the difference IMHO, its complexity makes it much more than "just a DB". The synthesis part takes into account the context also, so it does intelligent things automatically, a smart SQL query does not.

              My brain also works kinda like this. My knowledge is encoded in an incomprehensible format and I convert my knowledge into code based on the problem at hand.

    • csee 5 days ago

      This is how it always works, though. Moderna is standing on the shoulders of centuries of cumulative human knowledge without compensating all the sources of that knowledge. Musicians learn from other musicians and imitate to an extent, which is why all the musicians in a genre sound very similar, and we don't see present day rappers compensating the previous generation of rappers.

      This is where some modest taxation comes in. To reallocate a slice of the output of value creation to its actual source in a rough kind of way wherever more direct compensation isn't feasible.

      • cycomanic 5 days ago

        > Musicians learn from other musicians and imitate to an extent, which is why all the musicians in a genre sound very similar, and we don't see present day rappers compensating the previous generation of rappers.

        You clearly don't know how copyright around sampling works. Yes, rappers are paying shitloads to previous-generation musicians for the samples they use.

        • csee 5 days ago

          Sure, if we're talking about sampling, which is analogous to Copilot copying and pasting chunks of code verbatim (which we've seen happen). But the complaints about Copilot go far deeper than that. Quoting from the tweet: "it just sells code other people wrote". Do musicians "just" copy from all the people they've been inspired by and learned from?

          • cycomanic 4 days ago

            What does "inspired" mean in the context of a computer program?

      • jacquesm 5 days ago

        Yes, but those humans are humans, not machines. With machines the scale changes dramatically, which, incidentally, is something copyright law has addressed explicitly: if you mechanically transform a work, at best you end up with a derivative work.

        • csee 5 days ago

          I don't understand the difference between Co-Pilot on the one hand and Moderna (on the shoulders of medical research) or SpaceX (on the shoulders of physics knowledge and cumulative rocket engineering knowledge) on the other. They all heavily use technology, automation and machines. I don't see where the distinction is coming from, and if there is a technical legal distinction, is it an ethically important one?

          • jacquesm 5 days ago

            The distinction is a legal one: intellectual property cannot be reused without permission of the rights holder, be it a patent or a chunk of source code.

            And you can bet that SpaceX, for all its reliance on physics knowledge and cumulative rocket engineering knowledge, is very careful to either license the tech it uses or be very explicit about documenting its own.

            That you can't see the difference is entirely on you; going 'against the flow' of society sometimes leads to change, but more often it simply results in friction and a lack of comprehension.

            Keep in mind that open source is based on copyright law, and without copyright law the protections that open source offers would be gone.

            To give an extreme example: if you had a chunk of software constructed in such a way that it would spit out a complete copy of 'the Gimp', without the license file, as soon as you started to write an image processing program, that would be a very clear case of copyright violation.

            If you then start breaking the Gimp down into smaller and smaller reusable fragments, at some point you might be able to argue that such a generic and oft-used snippet should be free of copyright. But that only works as long as you don't then string together a whole pile of pieces that you each copied from somewhere else; the whole idea is that your creation is an original one.

            Medical research (which quite often leads to patents, which I don't believe should be possible, especially if that research was publicly funded) and physics knowledge are of a different kind from copyrighted program code. The former would be better compared to universally present language constructs and constraints, such as 'memory management', 'data manipulation' etc. Once you make those explicit in an implementation, copyright applies.

            Or, to make another analogy: it's like comparing the skill of writing to the product of that skill. The skill isn't protected, but the output of the act of writing is.

            • Xunjin 4 days ago

              An amazing argument and analogy. I also agree about medical research: being able to patent work that was publicly funded is an A*** move.

          • meheleventyone 5 days ago

            There are thousands of novel decisions in the work of Moderna and SpaceX beyond their cultural starting points. Same thing with art. Copilot isn't inventing, nor is DALL-E 2 being artistic.

          • lelanthran 4 days ago

            > I don't understand the difference between Co-Pilot on the one hand and Moderna (on the shoulders of medical research) or SpaceX (on the shoulders of physics knowledge and cumulative rocket engineering knowledge) on the other. They all heavily use technology, automation and machines. I don't see where the distinction is coming from, and if there is a technical legal distinction, is it an ethically important one?

            They are all in compliance with intellectual property laws? Seriously, that's a bloody big difference.

            Copilot is not in compliance with the licenses of much of the source code it is using!

            Whether you like it or not, compliance with the law is necessary.

      • Dracophoenix 4 days ago

        > This is where some modest taxation comes in. To reallocate a slice of the output of value creation to its actual source in a rough kind of way wherever more direct compensation isn't feasible.

        I was with you until this statement. The vast majority of society consumes, but doesn't create something new in the process. I'm bewildered as to why you think taxation is a solution rather than a disincentive towards creating. As far as compensating the giants upon whose shoulders most stand, there are plenty of vehicles for that: royalties, patents, copyrights, pensions, awards and prizes, paid fellowships, etc. These are relatively easy to calculate and write a contract for.

  • hansword 5 days ago

    If I enter 'Mickey Mouse' into an ML-TTI thing like Craiyon (DALL-E mini), do you think I will be able to sell the resulting image on a T-shirt?

    No, I won't, because Disney has fancy lawyers, the average open source developer hasn't. What you are saying is: Screw little people, let M$ make their money.

    Either copyright is for everyone, or for no one. I prefer the latter, but this is not the world we live in.

    • hourago 4 days ago

      The big difference is that by copying Mickey Mouse you are hurting one of the best-known and most powerful corporations in the world, while by copying code you are just hurting open source projects and individual developers.

      It should not be any different; if anything, hurting people with fewer resources should be considered worse. But here we are.

    • jimnotgym 4 days ago

      Isn't this an indictment of the justice system rather than of the big firms?

      I once heard this quote, "English justice is open to all, in the same way that The Ritz [very expensive hotel] is open to all."

      • gilrain 4 days ago

        The useless justice system has been engineered by the firms for their benefit.

    • fonix 4 days ago

      This is more like entering "cartoon mouse nose" into Craiyon, though. You're getting fragmentary code snippets returned to you based on a single line (an apt word for both code and a drawing).

  • teakettle42 5 days ago

    My code is shared under a license (MIT) that mandates attribution.

    That’s all I ask — if you use my code, give me credit.

    Stealing my code to train your bot — which will replicate portions verbatim! — is no different whatsoever than the casual plagiarist that copies and pastes a novel snippet manually.

    It's absolutely my legal and ethical prerogative to complain about people stealing my code by failing to respect the license under which it was freely provided.

    • Xunjin 4 days ago

      It would be great if Copilot showed where it found a snippet and gave the person credit for it, even if the code is unlicensed.

      • seba_dos1 4 days ago

        If it's unlicensed, then you can't use it at all, so giving attribution wouldn't change much in that case.

    • tiborsaas 4 days ago

      Is it really stealing when your code is used to change a parameter value from 0.3623727247 to 0.3623727321?

      • iamevn 4 days ago

        It does not matter what the internal representation is. What matters is that Microsoft is selling a tool which reproduces non-public domain works while claiming to grant the user ownership of the output.

  • jacquesm 5 days ago

    They are complaining about license violations, they are not pissing on this incredible (is it?) achievement.

    Reselling other people's content like this without attribution (which is a pretty mild form of payment) is not nice. But at least you now have one more reason to add to the list of reasons why Microsoft acquired Github: to be able to launder their open source contributions and resell them.

  • matthewmacleod 4 days ago

    I also disagree with the tone of that tweet, but your dismissal is equally shallow and gear-grinding.

    There are real, serious, and genuinely interesting issues to be discussed regarding Copilot. It is neither "just selling code that other people wrote", nor is it something that we should applaud merely because it demonstrates "human ingenuity".

    The comments here regarding this are honestly a total dumpster fire. It's mostly a bunch of paper-thin hot takes, either:

    - The blatantly stupid "you willingly shared your code so why are you complaining that one of the world's biggest companies is now hoovering up code from your carefully-selected open-source license and reselling it as a service!!!"

    - The blatantly lying "I have literally never looked at any other computer software while developing anything, so obviously anybody who has ever seen other source code is a plagiarist"

    It's dumb because there is an actual interesting discussion here but I guess we're not going to bother having it.

    • HumanReadable 4 days ago

      Fair enough, I agree.

      I actually didn't intend for my comment to be an argument in favour or against, and I am a bit surprised it is the most upvoted of the section.

      I agree that there's a pretty interesting discussion to be had about fair use and the limits of copyright, and that my original comment was not conducive to having that discussion. In my defense, neither was the tweet this thread is about!

  • Sakos 5 days ago

    I share my code without a license because I want others to be able to see how I solved things. However, this doesn't mean I'm okay with wholesale copying of my code. If it's some random guy, then whatever. If it's a corporation like Microsoft, then yeah, I have a problem with it. Under German law, the code is legally not allowed to be reproduced or used without explicit permission even if it doesn't have a license. I retain ownership of it until and unless I explicitly relinquish my ownership rights.

    • Xunjin 4 days ago

      Well, it depends on where you post it, right? If you are using GitHub, which is US-based, don't you follow US law?!

      Demanding that one country's law be followed by another makes no sense at all. Countries can agree on it, make agreements about it, and even take legal action up to the highest court so it can be evaluated, but using your nationality as an argument for what you can do is just plain wrong.

      • Sakos 4 days ago

        https://choosealicense.com/no-permission/

        I always find it weird how people respond to my comments. Why didn't you check what the US law is like for source code? A lot of places have similar laws around source code, primarily in the West because of efforts to normalise laws across countries, driven by US efforts. And other countries? Well, it's the same for any kind of IP. Either the country has strong IP law and you have the resources to pursue an issue or not and you can't do anything about it.

    • paulcole 4 days ago

      > Under German law, the code is legally not allowed to be reproduced or used without explicit permission even if it doesn't have a license

      This is nuts. How can anybody be expected to know both that you’re German and what German law says when you post on an international website?

      Or is this a German law that exists to prevent other Germans from doing things but that the rest of the world scoffs at?

      https://choosealicense.com/

      • giaour 4 days ago

        US law is pretty similar in this regard, isn't it? If you don't have a license for a particular piece of code, you can't use it without the author's/copyright holder's permission, even if you found it posted online.

      • solar-ice 4 days ago

        You're expected, wherever you are, to look into where any code you use comes from and what legal rights you have to use it. (The author not offering you a license means you can't use the code, nearly anywhere in the world - pretty basic Berne Convention stuff.)

        This is the legal expectation in general, not just for software - you can't just come across a design for a neat widget somewhere and start using it in your product, there's probably both copyright and patent on it. Software isn't special. Not everything in Github can be copied into your code verbatim.

      • falcolas 4 days ago

        That’s how US law works too. Works are automatically under copyright, even if you don’t say so. It takes a license to lessen the copyright restrictions.

  • the_gipsy 4 days ago

    > share their code on github where it is publicly available just to complain when others make use of that knowledge

    I put a fucking license on it so that it doesn't get abused by some fucking corporation. Jesus Christ, it's not hard to understand.

  • Mizza 5 days ago

    Pretty fucking simple explanation for it, actually:

    I don't make Free software so that Microsoft can sell it to people for use in proprietary projects.

  • bambax 5 days ago

    The world would probably be a better place if there were no copyright.

    But the world we actually live in is one where corporations have copyright, and individuals don't.

    That's what irks people, I think rightly.

  • akagusu 4 days ago

    > I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

    People like you should understand that publicly available code doesn't mean "do whatever you want" code.

    The majority of publicly available code hosted on Github has a license that tells you what you can and cannot do with that code.

    If someone uses this code without respecting the license, authors have the right to complain and even legally enforce the license if they want.

    Now, you should know that there's nothing "cool" about taking other people's work without permission.

  • DoreenMichele 5 days ago

    Meanwhile, creators of FOSS projects are often underfunded and lots of people are in such dire straits that rich people talk of mollifying them with a few paltry dollars via UBI rather than fix anything.

    That's likely the crux of the issue. If you do it right, you can steal from other people and get rich. Meanwhile, those same people (whose work was stolen) may be left out in the cold no matter how original, creative, hardworking etc they are.

  • Chris2048 5 days ago

    > willingly share their code on github where it is publicly available just to complain when others make use of that knowledge

    Because it's not unconditional: there are often licence terms of usage, and Copilot is potentially laundering those.

  • rglullis 5 days ago

    > why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge.

    For other individuals to collaborate, to make the software available to other people, etc. Certainly not for github's profit and much less for the benefit of github's customers who will have access to open code that violates license agreements.

  • throwoutway 5 days ago

    I hear you, but this isn’t a “marvel at this free open clever academic thing we built”

    It’s a product by a business. Why is that not open to criticism?

  • rockbruno 4 days ago

    My problem with this conversation is how we can have a 200-comment thread without anyone providing any kind of proof for these claims. Is there any instance of this bot printing an actual copyrighted algorithm instead of a mundane, uncopyrightable piece of logic?

    • sascha_sl 4 days ago

      One of the earliest examples was Copilot printing Quake's fast inverse square root verbatim, including swearing in a comment.

      Quake's source code is GPL.

      There are plenty more if you're willing to look.

    • dgb23 4 days ago

      There are examples of it providing literal copies of code without attribution etc.

    • Xunjin 4 days ago

      The famous “burden of proof” fallacy. In the end, I'm eager for anyone who can prove it to sue them, so we can see the results.

  • ThePhysicist 4 days ago

    I mean I'm not an expert but it's a valid point as people share code under a given license, and as far as I'm aware Copilot does not make this knowledge available. Nothing to do with the fact that Copilot is an amazing technological achievement.

    If I, as a human, go to a public repository on Github and copy/paste a non-trivial 200-line code snippet into my proprietary code base, I have to abide by the license of that original code, even if I slightly modify it. I don't see how this cannot be true for Copilot. I'm sure the legal folks at Github have thought of a response though; you could e.g. argue that the snippets produced by Copilot are not affected by the copyright of the original author because they do not reach the required threshold of originality. Seems rather shaky to me though.

  • pmarreck 4 days ago

    I think copilot is amazing. I don't care what, if any, of my code snippets it uses because I also gain from it by skipping boilerplate (as well as things like bash idiosyncrasies). Using it feels like I am working with dozens of invisible collaborators

  • hdjjhhvvhga 4 days ago

    > Any time someone invents something new and incredible, there's always a crowd of negative nancies eager to discredit and explain why the invention is nothing new and a detriment to society.

    That's not true. Whenever there is something really useful, everybody is happy, and while there are of course always some naysayers, they're very few.

    However, when you do something controversial, you can expect to hear criticism. You are of course free to dismiss that criticism, but when a lot of people are telling you what you are doing is unethical, maybe it's time to stop and think about it.

  • nixpulvis 4 days ago

    You should read more about people's ideologies and philosophies of Open Source.

    One big reason I support it is because it grants me the right and ability to change things I need/want to change.

  • ricardoplouis 4 days ago

    Wouldn't you rather have a healthy dose of skepticism and pessimism surrounding new inventions? Even if the negativity is off base, it's far more preferable to a world where everyone is always positive and praises what geniuses the creators are. The former at least breeds discourse, while the latter only serves to make people feel good.

  • zitterbewegung 4 days ago

    Why can’t startups understand what an open source license is? Apache 2.0 could be ingested by this tool, but it is a horrible license for your database as a service. AGPL would be a great license for a database as a service but should not be ingested by OpenAI / GitHub Copilot.

  • Tryk 4 days ago

    This doesn't address the point of the Tweet, you are simply attacking the form of their argument.

    Moreover it is possible to BOTH marvel at the human ingenuity that went into making copilot AND disagree with their methods. Some things can be marvelous and wrong at the same time.

  • nerdponx 4 days ago

    Both things can be true. It's clear that it violates the licenses of many software projects. But I do agree that denigrating it as "just selling other peoples code" is missing the whole point of the product and of what you pay for when you subscribe to it.

  • isitmadeofglass 5 days ago

    Yes but,

    Sorry for the unproductive tone of this comment, but there's something about the attitude of this tweet that really grinds my gears. Any time someone invents something new and incredible, there's always a crowd of negative nancies eager to discredit and explain why the invention is nothing new and a detriment to society. I don't understand why someone would willingly share their code on github where it is publicly available just to complain when others make use of that knowledge. 'co-pilot just sells code other people wrote' is such a ridiculous understatement of what co-pilot does. Instead of marvelling at the human ingenuity that went into creating it, they sneer at the audacity of OpenAI to do something without first asking their permission.

    — This comment brought to you by HN-Comment-AI ©

    • bryanrasmussen 5 days ago

      whoa, I think this should definitely be highlighted far and wide on the internet, think of the ingenuity of the people who made the HN-Comment-AI, it's probably the smartest comment bot out there, able to take the ramblings of people on HN and nonetheless generate a comment so astute!

      Although I have to say the use of the phrase 'negative nancies' shows that even the best machine-learning algorithm still comes up with text unlikely to occur in real life.

  • gumby 4 days ago

    People get paid to write code having learned from writing code for others and from reading code others wrote. In this regard I don't see why github copilot is any different.

    • lupire 4 days ago

      People don't memorize chunks of code and copy it.

      • gumby 4 days ago

        People often do write things because they learned a common approach at a previous job or because they saw such an approach when reading someone else’s code. People are often hired specifically because they have experience in a certain area from a previous employer, so they are doing the same sort of thing at a higher level.

        We fought this battle over a couple of decades with remix culture (“you stole that line/beat out of my song!”) and the world is better because the over-clingers lost.

        There is no shortage of reasons not to like copilot, but I don’t consider this one of them.

  • lobocinza 4 days ago

    Plagiarism isn't new or incredible.

  • hk1337 4 days ago

    Usually they want some recognition for their contribution and with GitHub copilot they get none of that.

  • sAbakumoff 4 days ago

    It's the negativity bias beauty in action. You have it too.

  • pwdisswordfish9 4 days ago

    ‘Facebook just sells personal information of other people’ is such a ridiculous understatement of what Facebook does. Instead of marvelling at the human ingenuity that went into creating surveillance capitalism, they sneer at the audacity of Facebook to do something without first asking their permission.

  • rambojazz 5 days ago

    Sounds like they're not selling any of your code

    • barthvr 4 days ago

      Copilot access is $10/month.

      Think about how Napster was treated back in the day, or torrent websites. You pay to access some copyrighted content. Is it legal?

  • olalonde 4 days ago
    • ParetoOptimal 4 days ago

      You can be both a professional software developer and a caring human that considers ethics and exercises empathy.

      It's easy to convince oneself they can only be a professional developer to escape ethical responsibilities which require significant time and energy.

      • olalonde 4 days ago

        Of course you can and you should. But going from the Twitter bio and personal website, it doesn't appear to be the case here. They're an activist who lives from soliciting donations and selling 30$ videos on how to be anti-racist (like, literally).

        • dxdm 4 days ago

          Doesn't mean what they're saying is wrong. Probably makes more sense to attack the substance of their argument, and not their bio.

          • olalonde 4 days ago

            The comment I was replying to already did a good job at that, I was just adding some context because it helps explain the attitude.

        • ParetoOptimal 4 days ago

          > They're an activist who lives from soliciting donations

          Assuming this is the case, are you concluding they don't have time to be a real software developer?

          Maybe they were a professional developer, but now are an activist 50% of the time.

          > selling 30$ videos on how to be anti-racist

          Is the indictment here that they are a capitalist?

          My basic point is you seem to really want to dismiss their views you don't like by arguing they aren't credible rather than attacking their ideas.

          • olalonde 4 days ago

            > Is the indictment here that they are a capitalist?

            No and they are actually anti-capitalist according to their Twitter bio.

            > My basic point is you seem to really want to dismiss their views you don't like by arguing they aren't credible rather than attacking their ideas.

            The comment I was replying to already did a good job at that, I was just adding some context.

  • B1FF_PSUVM 5 days ago

    > negative nancies

    Not bad for everyday use - I like "nattering nabobs of negativism" (as scripted by William Safire), but it is really a bit over the top.

  • OrwellianTimes 4 days ago

    Fully agreed. It's just people getting mad and jealous but hear me out.

    Copilot is NOT SELLING code other people wrote, it is simply acting as a curator to show you all the solutions people HAVE WRITTEN for free.

    Copilot does NOT write entire programs, it's simply an assistant. And there is not much copyright you CAN apply to 3-4 lines of generally understandable code.

    I've used Copilot and am actively paying for it, and I have not seen many cases where it's generating bad code. It's only there to remove boilerplate and common problems, not there to write entire applications.

    Why are people getting so salty?

    • boesboes 4 days ago

      Because they _are_ verbatim copying code and not respecting the license. It's not that complicated.

      Github knows better, can do better and should.

      • olalonde 4 days ago

        Do you have an example of Github Copilot doing that? Like a snippet of code generated by Copilot and a link to the original source code.

        • falcolas 4 days ago
          • olalonde 4 days ago

            Thanks. Personally, I feel like such small and widely used mathematical algorithms should not be copyrightable (or using them should fall under fair use). It even has its own Wikipedia page[0], where the source code is also reproduced without copyright notice.

            [0] https://en.wikipedia.org/wiki/Fast_inverse_square_root

            • zzo38computer 4 days ago

              I also implemented this algorithm in MMIX:

                 % Constants
                FISRCON GREG #5FE6EB50C7B537A9
                THREHAF GREG #3FF8000000000000
                 % Save half of the original number
                 OR $2,$0,0
                 INCH $2,#FFF0
                 % Bit level hacking
                 SRU $1,$0,1
                 SUBU $0,FISRCON,$1
                 % First iteration
                 FMUL $1,$2,$0
                 FMUL $1,$1,$0
                 FSUB $1,THREHAF,$1
                 FMUL $0,$0,$1
                 % Second iteration
                 FMUL $1,$2,$0
                 FMUL $1,$1,$0
                 FSUB $1,THREHAF,$1
                 FMUL $0,$0,$1
              
              (Note this assumes that the input number is not too small; if it is, then it will not be possible to compute half by this algorithm. Also, like with the original code, the second iteration may be omitted if desired.)

              (I release this comment, the MMIX code it contains, and all other comments I have written on here to the public domain.)
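              For comparison with the MMIX listing, here is a C sketch of the same double-precision variant. It uses the same magic constant (0x5FE6EB50C7B537A9) and the same two Newton-Raphson steps. It is written from the algorithm as described above, not copied from Quake. It reinterprets the bits with `memcpy` rather than a pointer cast, since the cast breaks C's strict-aliasing rules. Unlike the MMIX version, it computes x/2 by multiplication instead of decrementing the exponent field. It still assumes a positive, normal IEEE 754 double as input.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for a positive, normal IEEE 754 double.
   Mirrors the MMIX listing: magic-constant initial guess, then two
   Newton-Raphson refinements y = y * (1.5 - (x/2) * y * y). */
double fast_rsqrt(double x) {
    const double threehalfs = 1.5;
    const double xhalf = 0.5 * x;   /* the MMIX code halves via the exponent field */
    uint64_t bits;
    double y;

    memcpy(&bits, &x, sizeof bits);              /* reinterpret the double's bits */
    bits = 0x5FE6EB50C7B537A9ULL - (bits >> 1);  /* initial approximation */
    memcpy(&y, &bits, sizeof y);

    y = y * (threehalfs - xhalf * y * y);        /* first iteration */
    y = y * (threehalfs - xhalf * y * y);        /* second iteration (optional) */
    return y;
}
```

              Calling `fast_rsqrt(4.0)` should return a value very close to 0.5. As with the original, dropping the second iteration trades accuracy for one fewer round of multiplies.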

            • falcolas 4 days ago

              It’s the verbatim replication of the comments that makes this a damning piece of evidence against the “it’s not copying code, it’s an AI” argument.

              • olalonde 4 days ago

                Yes, it is clearly copying code from Quake, I wasn't denying that.

GuB-42 4 days ago

> Copilot just sells code other people wrote

So what? Selling code other people wrote is the foundation of the free software movement. It is the entire business model of countless companies, and it is a good thing. Among them are most major Linux distro vendors like Red Hat and Canonical.

The value added by Copilot is that it sells you the few lines of "code other people wrote" that you want, out of billions.

I still think it is derivative work, and that they should only process code under permissive licenses, or, if they want to include GPL code, make a GPL-only version, usable only for GPL projects. I thought that was what they did; there is so much code under permissive licenses that it should be enough to train their model, but apparently they don't care: as long as it is public, it is included. For me, they are shooting themselves in the foot; several companies have already banned Copilot due to the potential issues with copyright.

floor_ 4 days ago

I started self hosting when Microsoft bought github and with this mass theft of copyrighted material and then reselling it for money I'm even more happy with my decision.

k__ 5 days ago

Isn't that what Web2 is all about?

Someone creates content for free, and companies monetize it.

  • WesolyKubeczek 5 days ago

    The real Web3 is companies sue original creator for infringement.

bmacho 5 days ago

On a side note, I do believe that short programs or functions should be copyright free by law.

Or we as a community need to create a better BSD, a CC0 for everything.

Almost everything is nontrivial, and almost everything is copyrighted, at least with the pressure to name the original author (BSD, GPL, other major permissive licenses).

Say you want to use a library: you check for examples in the documentation, and now you have to note somewhere that the example is from the documentation (best if you put it in the source code, so you don't lure other people into copying what you copied and crediting you as the author).

It is a major PITA at least for me.

  • stagas 4 days ago

    What about a law that makes all code available but then requires you to use a portion of your earnings to compensate the people whose dependencies you used?

tpoacher 5 days ago

Does this mean I can steal stuff if I say I trained an AI to do it for me?

  • bmacho 5 days ago

    Is cat an AI?

    • tpoacher 4 days ago

      Nobody said it can't overfit 100%, right?

wolframhempel 5 days ago

When my last company got acquired, part of the due diligence process was a scan of our codebase for snippets from stack overflow. Every snippet found that wasn't posted with a clear license by the author was challenged and we rewrote it.

Now, I'm not entirely sure how necessary this was from a legal perspective. But introducing an AI into the mix will bring up a lot of uncertainty when it comes to how much change is required for something to no longer be considered a copy/derivative.

  • dmortin 5 days ago

    Would the scan still find the snippets if they had changed the variable names, for example? Or would that count as a different snippet then?

    • wolframhempel 4 days ago

      This is exactly where it gets murky. We had the usual 1-4 line snippets. We went the extra mile to change them, rewriting them from scratch, partially with different implementations. Did we need to do that? Would it have been enough to just change a variable name or some spacing or similar? I don't think there's a clear standard.

      The music industry has struggled with this for a long time. When is a song derivative, when a copy, when is it "inspired by"...

      • anonymoushn 4 days ago

        That sounds rough. Here's an 8-line snippet, please make sure you don't infringe my copyright:

            p = mmap(
                null,
                size,
                PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS,
                -1,
                0,
            );

  • redox99 5 days ago

    Isn't all stack overflow content creative commons?

    https://stackoverflow.com/help/licensing

    • wolframhempel 4 days ago

      It is, which is a problem if you want to repackage something under a different license.

      • redox99 4 days ago

        But you said

        > Every snippet found that wasn't posted with a clear license by the author was challenged and we rewrote it

        How is the license not clear?

  • dmix 5 days ago

    That sounds like legal paranoia or a make-work program.

thewoolleyman 4 days ago

Artificial Intelligence is causing us to revisit the difference between free as in beer and free as in speech (https://en.wikipedia.org/wiki/Gratis_versus_libre).

It is putting a new spin on some traditional Open Source Lessons (https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar#L...).

People share and reuse unattributed snippets of MIT-licensed and GPL-licensed code on the internet all the time, on StackOverflow, etc.

StackOverflow is profiting from that activity indirectly by facilitating it. They profit passively through ad revenue, and actively through the Teams subscription offering.

But nobody seems too upset about that.

How is an AI which facilitates the same code sharing fundamentally any different? Because it’s scraping it itself, rather than humans contributing it?

Seems like a tenuous argument at best.

mullikine 4 days ago

Traditional 'real' (as opposed to 'imaginary') programming is like writing in assembly code; it's outmoded because of generative models, in a way similar to 'C' outmoding assembly code. The most important thing, I think, is that free (libre) software developers are able to work with the language models directly, so that libre software is allowed to continue progressing into what I call Imaginary Programming. That's because with a generative internet all you really need is blockchain + prompting.

https://huggingface.co/spaces/mullikine/ilambda

Language models are able to 'steal' the linguistic meaning-making 'essence' of the software, by modelling:

- How the software is used (mimicking its function) - external meaning

- How functions are 'inspired' - internal meaning (reflection)

https://github.com/semiosis/imaginary-programming-thesis

The models themselves should be clear about where the data came from. However, this is only possible in a fair world which we do not live in. Compromise must be made to protect national interests.

Generative models are license blind and there's very little that could be done to prevent progress. Like what the invention of the camera has done for art.

Large language models including Codex are a transformative technology.

Bi-directional fair-use is probably the best result we can hope for.

So long as Microsoft and OpenAI are not selling back usage of the model to the open-source community, I think it's OK, though it's the bare minimum obligation.

Havoc 4 days ago

Yes, though in a way so does stackoverflow & friends. A large chunk of the dev ecosystem is copy-paste, and I don't think this is inherently problematic. It is always a case of standing on the shoulders of giants.

It's more of a licensing issue to me. As far as I can tell it was trained on a blend of licenses, which to me makes it inherently non-compliant. At least some of it is going to be copyleft and find its way into closed source.

nathias 5 days ago

Copilot is a new way for corporations to break copyright while enforcing it for everyone else, this will be the first big use for AI when other corpos follow.

yaseer 5 days ago

Technically, programmers search, copy and modify code all the time.

One might argue copilot puts into software an algorithm that humans are already doing. Software like that is usually inevitable.

Still, it sucks there's no benefit for the contributors.

The most ethical thing I can think of is some kinda 'Spotify-like' revenue sharing model, based on how often their code is used by others. Not that they'd ever implement that if they can get away with it!
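A rough sketch of the 'Spotify-like' idea, under a big assumption: that each accepted suggestion could be attributed to a single source repo, which a neural network may not be able to provide. It splits a revenue pool pro rata by usage count, roughly the way streaming services pool royalties. The function and repo names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def pro_rata_payouts(pool_cents: int, usage_events: list[str]) -> dict[str, int]:
    """Split a revenue pool across repos in proportion to how often each was used.

    Hypothetical: each event is the repo a suggestion was attributed to.
    Amounts are integer cents; the cents lost to rounding go to the
    most-used repos first, so the payouts always sum to the pool.
    """
    counts = Counter(usage_events)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Floor of each repo's proportional share.
    payouts = {repo: pool_cents * n // total for repo, n in counts.items()}
    remainder = pool_cents - sum(payouts.values())
    # Hand out leftover cents, most-used first, ties broken by repo name.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for repo, _ in ranked[:remainder]:
        payouts[repo] += 1
    return payouts


print(pro_rata_payouts(1000, ["a/lib", "a/lib", "b/util"]))  # {'a/lib': 667, 'b/util': 333}
```

The hard part is not the arithmetic but producing trustworthy usage events in the first place.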

  • omnicognate 5 days ago

    > One might argue copilot puts into software an algorithm that humans are already doing.

    That argument only works if you think what Copilot is doing is meaningfully similar to what humans are doing. The debate about how these models relate to human thought might have legal implications.

    As I understand it (IANAL) copyright doesn't protect ideas and concepts. It protects the content itself. In theory, if I read some copyrighted work, understand some idea in it and then create a new work using that idea, without copying that original work, then that is not a derivative work. (I think this is at least how it's supposed to work - would love to be corrected if that's wrong.)

    So if I took a copyright work and rot-13ed it before distributing copies, I think that would be clear copyright violation, but if I made my own works using concepts I gleaned from reading it, it wouldn't be.

    So should Copilot be treated like the rot13 algorithm or like me understanding concepts and generating new works using them? That sounds like a fascinating legal debate to be had.
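To make the distinction concrete: rot13 just rotates each letter 13 places, so applying it twice gives back the exact original text. A rot13ed copy therefore still carries every bit of the original's expression. A minimal Python illustration using the standard library's rot13 text codec (the sample source string is made up):

```python
import codecs

original = "def add(a, b):\n    return a + b\n"

# rot13 shifts each ASCII letter 13 places; digits, punctuation and
# whitespace pass through unchanged.
scrambled = codecs.encode(original, "rot13")

# Running rot13 again restores the exact original, so the expression
# was only disguised, never turned into something new.
restored = codecs.encode(scrambled, "rot13")

print(repr(scrambled))        # 'qrs nqq(n, o):\n    erghea n + o\n'
print(restored == original)   # True
```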

  • teakettle42 5 days ago

    > Technically, programmers search, copy and modify code all the time.

    When following the license terms, preserving the original copyright, etc, sure.

    However, honest, ethical people (including programmers) do not plagiarize.

    Copying and pasting code without attribution is plagiarism. Doing it without following the licensing terms is a copyright violation.

    • redox99 4 days ago

      I don't consider copying a 3-liner from stack overflow without writing an attribution to be plagiarizing (regardless of whether, technically speaking, it is or isn't according to the law).

      • teakettle42 4 days ago

        Plagiarism isn’t a legal concept, it’s an ethical one.

        You need to either attribute the source, or rewrite it in entirely your own words — just like when writing a paper.

        Conforming to the license is also required; iirc, SO requires attribution under the CC BY-SA license.

        • redox99 4 days ago

          > Plagiarism isn’t a legal concept, it’s an ethical one.

          Well if it isn't a legal but an ethical concept, then that's just your opinion, since there isn't some universal body that establishes exactly what is ethical and what isn't. And as I said in my previous comment, "I don't consider".

          > You need to either attribute the source, or rewrite it in entirely your own words — just like when writing a paper.

          Oftentimes a three-liner cannot be changed in any way and is the only solution to a problem. In some cases you may be able to change only its indentation and variable names (in others you can't even change that).

          But assuming you can do that, it makes no sense at all to change indentation and variable names just for the sake of changing them.

          > Conforming to the license is also required; iirc, SO requires attribution under the CC BY-SA license.

          As I said I'm not talking about the legalities.

          https://stackoverflow.com/questions/55319570/how-can-i-raise...

          Are you going to attribute that every time you use Math.pow?

          • teakettle42 4 days ago

            > Well if it isn't a legal but an ethical concept, then that's just your opinion

            Plagiarism being unethical is just my opinion?

            > Are you going to attribute that every time you use Math.pow?

            Does a simple 2-ary function call of a well-defined API qualify as “taking someone else's work or ideas and passing them off as one's own.”?

            If not, then it’s not plagiarism.

            • redox99 4 days ago

              > Plagiarism being unethical is just my opinion?

              What constitutes plagiarism and what doesn't, outside of what the law says, yes.

              > Does a simple 2-ary function call of a well-defined API qualify as “taking someone else's work or ideas and passing them off as one's own.”?

              So you agree that taking some code verbatim from SO is not plagiarism then?

              What about this, would copy pasting this verbatim be plagiarism?

              https://stackoverflow.com/a/959004

              And this?

              https://stackoverflow.com/a/45049763

              • teakettle42 4 days ago

                > What constitutes plagiarism and what doesn't, outside of what the law says, yes.

                It’s pretty clear what it is.

                The definition of plagiarism hasn’t changed since you were in grade school and were taught not to copy sentences into your papers.

                If you still don’t understand what plagiarism is now, yours is a willful ignorance that doesn’t excuse unethical behavior.

                > What about this, would copy pasting this verbatim be plagiarism

                > https://stackoverflow.com/a/959004

                Yes, that’d be plagiarism. It’s also bad code.

                You should use the example to understand the underlying problem, at which point you will be well-equipped to write your own one-liner.

                If you can’t write it using your own understanding of the problem, you’re not an adequate programmer and need to improve your skill-set … which won’t happen if you just keep plagiarizing code you don’t understand.

                • redox99 4 days ago

                  You're basically just repeating that your opinion is the right opinion.

                  I don't agree that such an example is plagiarism, and I'm sure a lot of people would also disagree that it's plagiarism.

                  > You should use the example to understand the underlying problem, at which point you will be well-equipped to write your own one-liner.

                  > If you can’t write it using your own understanding of the problem, you’re not an adequate programmer and need to improve your skill-set … which won’t happen if you just keep plagiarizing code you don’t understand.

                  Who says you can't write it on your own, or that you don't understand it? Stack Overflow and tools such as Copilot are often about saving time, not about things you'd be unable to figure out by yourself.

                  And besides that, the point of those examples is that a lot of people without searching for those stack overflow posts, would type that exact same code character by character.

  • kaibee 5 days ago

    > The most ethical thing I can think of is some kinda 'Spotify-like' revenue sharing model, based on how often their code is used by others. Not that they'd ever implement that if they can get away with it!

    Based on my understanding of how NNs work, I'm not sure it's even possible to implement something like that.

0x_rs 4 days ago

I'm not a lawyer, nor very well versed in the vast world of licenses and how they are defined in court, but I've been wondering about something given the growing appeal ML-generated content has for the average person (and the "high" barrier to entry into the market): are licenses going to adapt to this phenomenon in some form? From a brief search, I have not found any new license with a no-dataset-usage clause (assuming fair use does not apply, which is another big question). What are the chances that something of the sort becomes an option for "creative" works that are usually shared freely despite copyright, such as artwork and code? And what about ownership of the dataset? It already seemed questionable years ago: when possibly IP-protected content goes into the black box and similar-looking material comes out the other side, whose is it, really? I'm guessing some notable court cases will settle this in the coming years if the popularity keeps growing.

iptq 5 days ago

I know this isn't really related to the whole copying ethics debate, but I definitely feel like there's some sort of foul play happening here. For all of the unlicensed projects out there, the license that is automatically granted to Github includes:

> the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time

It's insane how vague this is. Is Copilot a "Service"? Sure, by its definition:

> The “Service” refers to the applications, software, products, and services provided by GitHub, including any Beta Previews.

And since much of the code was published before Copilot's inception, this means Github can just arbitrarily add more "services" and milk the code for whatever it wants. Automatically service-ify any public repository? Sure, pay us for quotas. It's like a legal loophole to let Github just bypass any license restrictions you put on it.

ThereIsNoWorry 5 days ago

1. You most likely agreed to that by using GitHub.

2. Copy&Pasting Code by manual search exists.

3. This is just a smart tool so you don't have to figure out yourself what to copy&paste (in the best case) and save a lot of time.

Sometimes I truly wonder how people can genuinely be upset about things like this. What is broken are copyright and patent laws in the 21st century.

  • keraf 5 days ago

    The point of this Tweet is about licensing. When using an MIT-licensed library, for example, you would have to give attribution. But you can easily rewrite portions of that library yourself using Copilot, which could potentially reuse code from the original lib, without any attribution whatsoever. It's even more problematic with licenses such as the GPL.

    I guess Copilot could address this by checking the licenses of the projects it uses. Even when combining code, it could pull in the required attribution or avoid GPL licensed code (unless enabled) for example.
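A rough sketch of the filter described here, under a large assumption: that every suggestion can be traced to a single source repo with a known SPDX license identifier, which a model like Copilot may not expose at all. The `Suggestion` type and `filter_suggestions` function are hypothetical, and the license sets are illustrative, not exhaustive.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative SPDX identifiers only; a real tool would need a far longer list.
COPYLEFT = {"GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only",
            "GPL-3.0-or-later", "AGPL-3.0-only", "AGPL-3.0-or-later"}
PERMISSIVE = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}


@dataclass
class Suggestion:  # hypothetical: assumes provenance is known per suggestion
    code: str
    source_repo: str
    license_id: str


def filter_suggestions(
    suggestions: list[Suggestion], allow_copyleft: bool = False
) -> tuple[list[Suggestion], list[str]]:
    """Return the usable suggestions plus the attribution notices they need."""
    kept: list[Suggestion] = []
    notices: list[str] = []
    for s in suggestions:
        if s.license_id in COPYLEFT:
            if not allow_copyleft:
                continue  # skip GPL-family code unless the user opted in
        elif s.license_id not in PERMISSIVE:
            continue  # unknown or missing license: no grant to rely on
        kept.append(s)
        # MIT, BSD, Apache and the GPLs all require keeping notices.
        notices.append(f"Portions derived from {s.source_repo} ({s.license_id})")
    return kept, notices
```

Dropping code with an unknown license is the conservative choice, since unlicensed code grants no reuse rights by default.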

  • teakettle42 5 days ago

    > Sometimes I truly wonder how people can genuinely be upset about things like this.

    Tell me you regularly plagiarize without telling me you regularly plagiarize.

    • ThereIsNoWorry 5 days ago

      Code plagiarization is not a thing for all practical purposes (it's almost impossible to even go to court over it, for very obvious reasons). And that's good, because with that insane lockdown of "Intellectual Property" nothing would ever get done. So, think what you want.

      • lin83 4 days ago

        I don't think that's true and, if it was, it would be the death knell for open source.

        Code plagiarism is taken very seriously by every company I have worked with. Multiple companies have been sued for violating the GPL. The SFC is currently fighting Vizio in court, for example. While not commonplace, to say it's "almost impossible" is a stretch. Every large company complies with code copyright obligations for a reason. My company publishes changes to GCC and a dozen other GPL projects. Entire products like Protecode and BlackDuck exist to ensure code compliance. Even small code snippets are flagged.

        Over the past few years the source code for Windows, SQL server, Bing and Cortana have all been leaked. If someone built a product using that code, how long do you think it would take Microsoft to sue? CoPilot is one rule for mega-corps and another for everyone else.

      • teakettle42 5 days ago

        > Code plagiarization is not a thing for all practical purposes

        Of course it is. Plagiarism is “the practice of taking someone else's work or ideas and passing them off as one's own.”

        It’s unethical and it will get you fired at any reputable company.

        • ThereIsNoWorry 5 days ago

          Ok, then there doesn't exist a single reputable company with a tech division and we're all unethical. Have a nice unethical day.

          • teakettle42 5 days ago

            > Ok, then there doesn't exist a single reputable company with a tech division and we're all unethical. Have a nice unethical day.

            I’m deeply disturbed that you think this form of plagiarism is universal — I can assure you that is not the case.

            I work at a FANG currently, and plagiarism is absolutely not tolerated.

            In fact, plagiarism has been considered a fireable offense at every other company I’ve worked at over my 25-year career, and prior to that, a serious form of academic misconduct in school.

            It’s clearly unethical and I’ve never plagiarized in my life.

            I’ve only run into one instance of someone else plagiarizing code in my career, and that individual was fired.

            • lin83 4 days ago

              > I’m deeply disturbed that you think this form of plagiarism is universal

              This thread is an eye-opener for me too. Do engineers not get trained on their legal obligations? My company is old and not a traditional tech company, but we have been running workshops on the issue for years. Even if they don't, what about their legal teams? Or CI tools to scan for licence violations? Some of the responses here are so naive it's crazy. I hope no one is identifying the companies they work for.

              • ThereIsNoWorry 4 days ago

                Obviously we do. Don't copy paste 10 pages of source code unaltered and sell it as your own.

                But that's something entirely different from small code snippets, changed and adapted to solve the same problem a thousand other people already had. That's all developers are doing when they go to GitHub, StackOverflow or any other website to find answers to their questions. That's not naivety, that's (partially) how coding works. If you had to reinvent the wheel every time you built something new, good luck.

                • lin83 4 days ago

                  There isn't a threshold for copyright violation. If you copy a 3 line function from a GPL library, you have to comply with the licence. Tools like BlackDuck will pick it up.

                  Snippets aren't exactly defined but I see them as more than just a single line like "here's how to flatten a list in Python", it's some functionality - e.g. an algorithm implementation or some task.
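Commercial scanners' matching algorithms are proprietary, so this is not necessarily how BlackDuck or similar tools work; it is only a sketch of the general token-based clone-detection idea. Identifiers, comments and layout are normalized away, then overlapping k-token windows are compared. It also bears on the earlier question of whether renaming variables defeats a scan. The function names and the choice of k=4 are assumptions.

```python
import io
import keyword
import tokenize


def normalized_tokens(source: str) -> list:
    """Tokenize Python source, mapping every identifier to the placeholder 'ID'.

    Renaming variables or functions therefore does not change the result;
    comments, blank lines and indentation tokens are dropped as well.
    """
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        else:
            out.append(tok.string)
    return out


def kgrams(tokens: list, k: int = 4) -> set:
    """All runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of normalized token k-grams (1.0 = same shape)."""
    ga, gb = kgrams(normalized_tokens(a), k), kgrams(normalized_tokens(b), k)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def summe(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
print(similarity(original, renamed))  # 1.0
```

Renaming alone leaves the score at 1.0, which is why a rewrite that only changes names would likely still be flagged.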

            • ParetoOptimal 4 days ago

              > I’m deeply disturbed that you think this form of plagiarism is universal — I can assure you that is not the case.

              > I work at a FANG currently, and plagiarism is absolutely not tolerated.

              It's universal in any company that doesn't take measures against it. So basically startups, small, medium, and even some large companies.

            • ThereIsNoWorry 4 days ago

              I'm disturbed you believe regurgitating code snippets is plagiarisation.

              • teakettle42 4 days ago

                It’s literally plagiarism by definition.

                • ThereIsNoWorry 4 days ago

                  https://stackoverflow.blog/2021/12/30/how-often-do-people-ac...

                  I feel like you're arguing in bad faith. So, whatever.

                  • teakettle42 4 days ago

                    Explaining a fundamental ethical concept you should have learned in primary school when writing your first book reports is not arguing in bad faith.

                    SO’s license requires attribution.

                    If you don’t want to be a plagiarist, you either need to include attribution, or you need to rewrite the solution in entirely your own words.

                    • ThereIsNoWorry 4 days ago

                      So, rearranging conditionals or loops or variables then, problem solved. You cannot 1:1 copy paste anyway. That never works. You always have to adapt it to your particularity. So it's "reworded" by default. And CoPilot is doing nothing else. It's not just 1:1 memorising code, it's a tiny bit smarter than that. I strongly believe you're not a developer. Point taken. I understand your considerations. You should write code sometimes to solve a complex problem that uses some libraries and see how far you get without consulting the internet or books.

                      • ilikehurdles 4 days ago

                        I absolutely attribute things I find on SO to where I found them. You finished college maybe a year ago and are already making some absolute judgments about what makes other people qualified to call themselves developers simply because they don’t develop as you do.

                      • teakettle42 4 days ago

                        > I strongly believe you're not a developer.

                        > You should write code sometimes to solve a complex problem

                        There’s a very high chance you posted your comment from a device using code I wrote.

                        Glueing together plagiarized code copied from SO, or stolen from OSS projects on GitHub, is not software engineering.

  • zufallsheld 5 days ago

    As to your first point, there are many repositories on github that weren't uploaded by the author of the code, or where not all contributors to the code are on github or agreed to let their work be used in such a case.

    • redox99 5 days ago

      That's really no different than somebody uploading proprietary code they don't own (stolen, leaked, whatever reason etc) on Github. Github has to assume that you are allowed to do so. What are they going to do otherwise, somehow manually verify that each repository is legit?

      Now you might say, what about GPL code you don't own? You are allowed to redistribute it (upload it to github). But because you are not the owner you can't license it to Github under new terms (that allow them to use it for ML training). But the question still is, is there anything in the GPL that forbids its code being used for ML training? Even if the generated model is proprietary, has no attributions, etc?

      • megous 4 days ago

        Ok, takedown requests exist. Say Qualcomm finally wises up and asks github to take down a copy of the millions of lines of their super proprietary 4G modem firmware implementation. Will github retrain the model after each such takedown? :D

        If not, then it's kinda stupid to argue the point about the lack of knowledge, since lack or not lack of knowledge clearly doesn't matter. Github will happily continue using confidential code even from trigger happy companies like Qualcomm for copilot.

        • redox99 4 days ago

          I guess they would add some kind of filter to copilot output that removes results that clearly come from code that was DMCAd.

          It's kind of like some employee that worked at Qualcomm and has seen the code. Do you retrain him (aka hit his head until he forgets) after leaving the company?

          The comparison might seem silly but as AI advances I expect more and more arguments (especially in court) to come from analogies of humans and AIs.

          • megous 4 days ago

            What kind of filter? I thought copilot does not output the input data verbatim.

            Creating an output filter based on millions of lines of DMCAd code that would not, at the same time, cripple the copilot output completely sounds like one of those hard problems, especially if there's no agreed-upon definition of copyright "violation" here.
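One possible shape for such a filter, as a hypothetical sketch: hash every run of n consecutive tokens in the taken-down code and reject any suggestion that shares a run. It only catches near-verbatim reuse, which is exactly the limitation raised here. The class name, the crude regex tokenizer and the default n=20 are all assumptions.

```python
from __future__ import annotations

import hashlib
import re


def _ngram_hashes(code: str, n: int):
    """Yield an 8-byte hash for every run of n consecutive tokens in code."""
    # Crude tokenizer: runs of word characters, or any single non-space symbol.
    tokens = re.findall(r"\w+|[^\w\s]", code)
    for i in range(len(tokens) - n + 1):
        chunk = " ".join(tokens[i:i + n]).encode()
        yield hashlib.blake2b(chunk, digest_size=8).digest()


class BlocklistFilter:
    """Reject suggestions that share a long token run with taken-down code."""

    def __init__(self, n: int = 20):
        self.n = n
        self.hashes: set[bytes] = set()

    def add_takedown(self, code: str) -> None:
        # Only fixed-size hashes are stored, never the source itself.
        self.hashes.update(_ngram_hashes(code, self.n))

    def allows(self, suggestion: str) -> bool:
        return not any(h in self.hashes for h in _ngram_hashes(suggestion, self.n))
```

A larger n lowers false positives on common idioms but lets shorter copies through, and renaming a single identifier inside a run is enough to slip past it.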

  • SahAssar 4 days ago

    > 1. You most likely agreed to that by using GitHub.

    Are you saying that I would need all the original authors' consent to upload a repo to github even if I include all the original attribution and licenses? Because what you are implying is that when uploading I'm granting github a license far outside the bounds of the license included, which only all the contributors together can grant. For example, would the linux project need to contact each and every contributor ever to upload a mirror to github, since their contributions were under GPL but you are implying that the license given to github is much, much broader?

    This would make any project not originally started on github and with a few contributors basically impossible to host there.

    > 2. Copy&Pasting Code by manual search exists.

    The question is who is doing the infringement here. Github copilot is obfuscating the copying and telling its users that the code is theirs to use, own, etc. as they please, but is also taking large chunks of code it does not have the right to redistribute, let alone grant licenses to.

  • IdiocyInAction 5 days ago

    I don't think that something like CoPilot is what most GH users had in mind when they published their code. Also, licenses exist (which CP demonstrably doesn't give a shit about).

  • dmix 5 days ago

    > Sometimes I truly wonder how people can genuinely be upset about things like this

    90% of Twitter is just inventing new ways to whine about things

    • ParetoOptimal 4 days ago

      There's some truth there, but there's more harm in outright dismissing the uncomfortable but important ethical dilemmas one might be introduced to.

aetherspawn 4 days ago

Copilot is a fancy pattern bot.

Humans make original patterns, but since Copilot cannot think, then Copilot does not. It squashes together a bunch of small individual patterns, each under their own license, but at no stage does it do anything more than pick a line from here, and a line from there.

It doesn’t think, and it doesn’t create new IP.

It is like making a picture out of small snippets of a thousand other pictures and then selling it: clearly not OK. You still ripped off the original artists.

Or like plagiarising 100 of your classmates’ assignments. Are you less guilty because you went to the effort to steal just a few sentences from each?

A criminal who steals a cent from every account at the bank is a more sophisticated thief than someone who holds up a petrol servo.

If Copilot doesn’t create new IP (it doesn’t; we established this), then it uses existing IP. And in that case it is no different to any of the three analogies above.

rosmax_1337 4 days ago

I think this problem has no good solution until IP laws around the world are properly reimagined from the ground up. I'm of the quite radical stance that code, music, and art, in terms of their intellectual existence, should be free for anyone to take. (You can own a hard drive with code on it and claim no one should steal it, but not the idea of the code itself.)

If you have ideas, code, music or art which you wish for no one to partake in, do your best to keep them secret. Certainly, breaking into secret areas should be illegal, but once the cat is out of the bag, it's out of the bag.

The creative people behind these ideas will, I believe, still be able to find good compensation in society; IP laws nowadays only serve to protect megacorporations to the detriment of creativity and ideas.

  • zzo38computer 4 days ago

    I agree. This will fix it. I think that copyright and patents should be abolished, but that if something is secret then it is still secret (unless someone else manages to come up with the same thing, e.g. by decompiling a published computer program to reconstruct the source code, in which case it can be public). And so the AI can copy the code just as much as you may do so manually; if it is published, then you can do it, and it should not be illegal to write such things.

maxbaines 5 days ago

I hadn't initially thought about Copilot and other AI generators this way, but now that I have, I'm finding it hard to ignore.

jarenmf 5 days ago

I guess the question is where you draw the line between a derivative work and "learnt by an AI algorithm"

  • asimpletune 5 days ago

    Who needs a line when there are plenty of obvious examples lifted verbatim?

  • triknomeister 5 days ago

    If the media copyright industries and their ContentID is anything to go by, it doesn't matter. It's all derivative.

madrox 4 days ago

I don't think any professional community is aligned on how to think about ML-generated content yet. We don't know how to apportion rights between the data owner, the model owner, and the end user, and I don't think existing copyright law is ready for it. At least for software, I think the way forward is for the next generation of software licenses to explicitly state whether the code can be used to train ML models and what those models can be used for. Without explicit language, we'll be squabbling over interpretations of fair use.

There's going to be some big cases here. It's going to end up in the Supreme Court sooner or later, and if it were to go there today I think I know what they'd say.

rictic 4 days ago

Copilot very rarely copies code verbatim, and when it does it's very short snippets. When Oracle sued Google over allegedly copying short and fairly trivial snippets of code, they were justly derided.

I can't speak to the legal side, but I just don't understand the moral outrage over very occasionally copying such short snippets of code. The key innovations and the actual value that licenses are intended to protect aren't in these short snippets.

And what does copilot bring to the community? Free use by students, free use by open source maintainers, and a huge boost in productivity for a modest fee for professional devs, for a service that no doubt costs a lot to run, even on the margin.

ewalk153 4 days ago

If the portion of code that Copilot lifts is the "heart" of the original work, that would be much less likely to be considered fair use[1], regardless of the length.

> For example, it would probably not be a fair use to copy the opening guitar riff and the words “I can’t get no satisfaction” from the song “Satisfaction.”

I wonder how this could be integrated into the system?

[1] https://fairuse.stanford.edu/overview/fair-use/four-factors/...

stakkur 4 days ago

At every turn, in every instance, for decades, all stories involving Microsoft end in "...and then Microsoft fucked people over." I've witnessed this firsthand since the 80s.

pornel 5 days ago

Tough pill to swallow. Microsoft's actions don't seem fair, but fighting them with copyright could weaken fair use:

https://felixreda.eu/2021/07/github-copilot-is-not-infringin...

There's a good argument that demanding copyright protections on scraped datasets and short snippets is a double-edged sword. It could harm search engines, distribution of news, and non-commercial ML research too.

williamcotton 4 days ago

Should the snippets that Copilot is regurgitating be considered for copyright in the first place?

It seems akin to trying to copyright a certain drum pattern or chord progression.

Also, the history of the GPL, MIT, commercializing Lisp machines, Symbolics, infighting, etc. seems a very different context from Copilot, so I am having difficulty seeing the systemic problems that tools like this encourage.

There is of course a surface level similarity in that a corporation is profiting from IP in the public domain but the devil is in the details.

powerapple 4 days ago

Why is it a bad thing? Either people spend time reading code, learning every little thing, and produce the same work in days, or Copilot saves them hours of their lives. Coding would be more efficient; it is a win-win for everyone in this industry, right? I know people get attached to the code they write, but we all learn from books, and the result is common enough.

  • thfuran 3 days ago

    What's good about producing a bunch of code that no one understands and only probably does the right thing?

sirsinsalot 4 days ago

Jaron Lanier's book "Who Owns the Future?" is all about AI and compensating those whose input goes into training these very valuable models.

I highly recommend everyone read it.

janosdebugs 4 days ago

It'd be nice to see some proof here. Copyright is not absolute and does not extend, for example, to things that have no creativity in them. There are only so many ways to write a for loop or an if condition. Training an ML model from a large body of code IMHO violates copyright no more than any of us reading code and learning from it, as long as GH Copilot doesn't spit out code that's exactly the same as something already existing.

BiteCode_dev 4 days ago

It is incredible to use, though. I pasted the return value of an API call into a comment, then started to write a schema class. Copilot just created the entire class for me. When I wanted to extract a subset of the data, I typed get_<name_of_the_subset>(), and it wrote the code I would have written.

So even without using someone else's code, the pattern understanding and the production of simple boilerplate code alone are great.
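A hypothetical illustration of that workflow: a sample API response pasted as a comment, followed by the kind of schema class and get_<name_of_the_subset>() helper a developer would otherwise type by hand. The field names, `UsersResponse`, and `get_active_users` are invented for this sketch; it is not actual Copilot output.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical API response pasted as a comment (field names are invented):
# {"users": [{"id": 1, "name": "ada", "active": true},
#            {"id": 2, "name": "bob", "active": false}]}


@dataclass
class User:
    id: int
    name: str
    active: bool


@dataclass
class UsersResponse:
    users: List[User]

    @classmethod
    def from_dict(cls, data: dict) -> "UsersResponse":
        # Turn each raw JSON object into a typed User.
        return cls(users=[User(**u) for u in data["users"]])

    def get_active_users(self) -> List[User]:
        # The get_<name_of_the_subset>() pattern: filter to one subset of the data.
        return [u for u in self.users if u.active]
```

Usage would look like `UsersResponse.from_dict(payload).get_active_users()`, where `payload` is the parsed JSON.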

seydor 5 days ago

Programmers are fine when their creations (pretty much all of tech) resell content that other people wrote for free, but no, not code; that one must be expensive.

  • onpensionsterm 5 days ago

    The only one making money here is github. Very few programmers are selling open source code. And programmers are (in)famous for not buying software.

  • anonymoushn 5 days ago

    I also don't think it's acceptable for TurnItIn to monetize content without paying the authors. My opinion about whether students should have their work stolen and monetized by a company doesn't seem to have much impact though.

  • zx8080 5 days ago

    %s/programmers/tech capitalists/g

vbezhenar 4 days ago

I somewhat agree with that. Yesterday I edited some exotic configuration (a Kubernetes CSI driver for Cinder) and Copilot suggested a config that looked like someone else's. There were no values, so it was good at filtering them out, but it definitely looked like a cleaned-up piece of code from some existing project.

I don't think that's bad though. Code sharing is good for overall productivity.

Aeolun 4 days ago

> what github / microsoft is counting on here is that open source developers do not have enough collective power to do anything to stop this

I think it much more likely that they count on everyone liking it way too much to give a shit about their MIT code not being attributed correctly.

I certainly don’t. MIT just seems like the most convenient license for people that need licenses (corporations?), so that is what I use.

c01n 4 days ago

MS and Github are thieves, all their code is closed source, yet they sell copyrighted code they don't own. If they told us years ago that our code will be automatically stolen by an "AI", most coders would not have created an account. The innovation here is that they have access to most of the world's open source code and automated the stealing.

capableweb 5 days ago

If GitHub could guarantee that the code Copilot had ingested was only made with OSS licenses, then I don't see what the problem is.

But as far as I understand, GitHub trained Copilot on any public repository on GitHub, including ones with no license specified (where the user publishing it still holds the full copyright), and I don't see how that can be OK.

  • galoisgirl 5 days ago

    Here's an example: https://twitter.com/ChrisGr93091552/status/15397316329318031...

    > I checked if it had code I had written at my previous employer that has a license allowing its use only for free games and requiring attaching the license. yeah it does

    • nl 5 days ago

      That's a pretty bad example. He prompted it using the exact function header taken from the code he is complaining about.

      It'd be much more interesting if he set up a function that was doing a similar thing but with different parameter types and names, and a different order of parameters (i.e., like a real problem).

      • triknomeister 5 days ago

        Does that matter? Code that is provided should come with the license needed to use it; otherwise the user is opening themselves up to litigation.

        Hence why I agree with another comment somewhere that Microsoft is banking on software developers not litigating about use of their open source code in closed source projects.

  • saghul 5 days ago

    Even if it was trained with OSS licenses, some of them require proper attribution, which copilot doesn’t do.

    Now, where the threshold is for substantial derivative work in order to require attribution is an interesting question.

  • thelastbender12 5 days ago

    It is hard to see how verifying licenses is a solvable problem when licensing for code dependencies can be transitive. For example, I could copy code from a GPL codebase like Linux into a GitHub repository with an MIT license.

    • danuker 5 days ago

      You should be able to choose flavors of the model trained only on public-domain code which does not require attribution, for example.

      But that would mean Microsoft acknowledging license violations.

      • thelastbender12 5 days ago

        Sorry, to be clear, I meant even if a Github user asserts their code is public-domain/no-attribution/unlicensed, they could have lifted it off a codebase that doesn't allow it. It would be tricky for Github to establish the code was indeed original and hence their agreement with the user allows them to train their models on it.

        • danuker 5 days ago

          > they could have lifted it off a codebase that doesn't allow it

          Ah. But then someone else is guilty of redistributing code without permission.

          But you're suggesting GitHub should implement something like Content ID, but for code. That should be cheaper (code is cheap to analyze, while video is much more bandwidth-intensive), and it would kill two birds with one stone.
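A minimal sketch of what "Content ID for code" might look like, assuming a simple fingerprinting scheme: replace identifiers and numbers with placeholders so renaming does not defeat matching, hash overlapping windows of k tokens, and compare the fingerprint sets with Jaccard similarity. This is loosely in the spirit of code plagiarism detectors; the tokenizer, the value of `k`, and the function names are illustrative assumptions, not anything GitHub has described.

```python
import hashlib
import keyword
import re

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source: str) -> list:
    """Tokenize, mapping identifiers to ID and numbers to NUM.

    Keywords are kept as-is; Python's keyword list is used here as an assumption.
    Comments and string contents are not stripped in this sketch.
    """
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if keyword.iskeyword(tok):
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def fingerprints(source: str, k: int = 5) -> set:
    """Hash every window of k normalized tokens."""
    toks = normalize(source)
    grams = (" ".join(toks[i:i + k]) for i in range(len(toks) - k + 1))
    return {hashlib.sha1(g.encode()).hexdigest()[:16] for g in grams}


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two fingerprint sets (0.0 to 1.0)."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

A renamed copy such as `def plus(x, y): return x + y` scores 1.0 against `def add(a, b): return a + b`. A real system would also need to strip comments and keep only a subset of hashes per file to index millions of repositories.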

  • hooby 5 days ago

    many OSS licenses require attribution

  • redox99 5 days ago

    Maybe when you accepted GitHub ToS you gave them permission for your code to be used for ML training.

    • eloisius 5 days ago

      I can’t say I remember the terms saying anything to the effect of granting Microsoft a perpetual unlimited license in addition to whatever license I package with the code when I signed up. Not doubting it, but I would have expected that to raise some suspicion long before Copilot was around.

      • redox99 4 days ago

        It could be something as innocuous as "you allow your code to be analyzed, processed or otherwise handled by Github software" I suppose, which wouldn't raise suspicion.

eline43 5 days ago

There needs to be an update to either licenses or GitHub (and other) software directly, or even software terms of services, that gives the user an opportunity to opt-out of their data being used to train proprietary AI models.

'I don't agree with having an AI trained on/with my data.'

IMHO, all other problems with copilot stem from this.

zokier 5 days ago

Sure, the concern is valid but I feel like this tweet adds absolutely no substance to the discussion and just repeats the same opinion that was already rehashed to death since copilot originally launched. As such, especially with the tone that the tweet has, I don't expect constructive discussion to arise here.

dgb23 4 days ago

Reading many of the comments here I feel like one important thing is being left out that is not related to legal, but to social issues:

Who is on the side of open source? Where are the big, powerful institutions and companies that deeply care about authors and communities providing free software that so many of us rely on?

andrelaszlo 4 days ago

There are a few reasons why this could be considered ethical. First, open-source code is typically free to use, so the company would not be taking advantage of anyone by using it to train their AI. Second, the company would be providing a service that people are willing to pay for, so they would be generating value for society. Third, the company would be transparent about what they are doing and would not be hiding anything from the public.

...the above was generated by GPT-3 (text-davinci-002). Prompt: Write an argument for why using open-source code to train an AI and then sell the code generating service (without open-sourcing it) is ethical.

The main argument against this is that it takes away from the open-source community that contributed to the development of the code in the first place. By selling a code-generating service without open-sourcing it, the company is profiting from the work of others without contributing back. This is unfair and takes away from the overall open-source ecosystem.

Added two characters to the prompt :P

pabs3 5 days ago

I wonder if FOSS folks could copyleft originally public/leaked but proprietary code using CoPilot.

nl 5 days ago

This isn't how a language model works.

It's SO frustrating that even on HN people still fall for this naive and incorrect analysis. Pasting bits I've said before on this topic:

Language models do not work like this. They can copy content, but usually that's for something like the text of the GPL itself.

Generally they work token by token (a token is a short chunk of text, often part of a word), predicting the most likely token to appear next.

This very rarely results in copying text, and almost never rare text.

Mechanically, it has learnt both the syntax of the language and how concepts relate. So when it starts generating, it makes sentences that are syntactically valid but also make sense in terms of concepts.

That's really different to just combining bits of sentences, and it gives rise to abilities you wouldn't expect in something just cutting and pasting bits of sentences. For example, few shot learning is mostly driven by its conceptual understanding and can't be done by something with no way to relate concepts.
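To make "predicting the most likely next token" concrete, here is a deliberately tiny sketch: a bigram model that counts which token follows which in its training text, then greedily appends the most frequent successor. Real models like Codex use neural networks over sub-word tokens and long contexts, so this shows only the prediction loop, not how Copilot works internally. Note that with a single training sequence, greedy prediction simply replays it.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for each token, how often each other token follows it."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(follows, start, max_len=10):
    """Greedy decoding: keep appending the most frequent successor of the last token."""
    out = [start]
    for _ in range(max_len - 1):
        successors = follows.get(out[-1])  # .get does not insert into the defaultdict
        if not successors:
            break  # no known continuation
        out.append(successors.most_common(1)[0][0])
    return out
```

For example, training on `"for i in range ( n ) :".split()` and calling `generate(model, "for")` reproduces that token sequence.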

  • tyingq 4 days ago

    If this were true, then they would have trained it on all of MS's proprietary source code too.

    • nl 4 days ago

      It is true.

      And that doesn't follow at all.

      • tyingq 4 days ago

        There's enough examples of it regurgitating longish verbatim code out there, and not just comments or GPL license text.

        If they are comfortable training it on code that isn't licensed for unrestricted copy/paste, I don't personally understand why they can't train it on their own code that's also not licensed for that.

        Edit: They even added 'q rsqrt,' to their banned word list to squelch an example of long verbatim code passages.

        Basically, it's not that I don't understand your explanation. It's that it does emit long passages of unchanged code in practice, for whatever real-world reason.
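As a hedged illustration of the kind of output filter described above, here is a sketch that discards a suggested completion if it contains any blocklisted term. The entries and `filter_completion` are hypothetical, not the list or mechanism GitHub uses. A filter like this only suppresses specific known outputs; it does not change what the model has learned.

```python
from typing import Iterable, Optional

# Hypothetical entries for illustration only; not GitHub's actual list.
BLOCKLIST = ("q_rsqrt", "0x5f3759df")


def filter_completion(completion: str, blocklist: Iterable[str] = BLOCKLIST) -> Optional[str]:
    """Return the completion unchanged, or None if it contains a blocklisted term (case-insensitive)."""
    lowered = completion.lower()
    if any(term.lower() in lowered for term in blocklist):
        return None
    return completion
```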

olalonde 4 days ago

I'm going to make a bold prediction: no one will ever lose a copyright lawsuit due to usage of Github Copilot generated code. The code snippets it produces are too small or trivial to qualify for copyright infringement.

  • ModernMech 4 days ago

    CoPilot is a new technology, and smallish snippets of code are all it is capable of at this point. Microsoft will surely work to expand its capabilities to produce larger and more complex programs, don’t you think?

lfrigodesouza 4 days ago

As the saying goes, "when a product is free to use, the real product is actually you". In this case, our code is the product. I'm now considering switching to another git provider...

mawadev 4 days ago

What stops me from re-uploading copyrighted source, where I remove the notices and push it with an MIT license? If such a data set has been trained with, how do you get it out?

thih9 4 days ago

Is github copilot using private repositories for the learning process?

If yes, how do they mitigate the risk of exposing private data when something is quoted verbatim?

If not, then why are repos with non-permissive licenses OK?

oytis 5 days ago

Copilot sells the service of finding the code that makes sense for what you write. Would be better if it could correctly attribute the source(s) though, I hope they will solve this problem at some point.

sirsinsalot 4 days ago

Beware geeks with gifts. This is Microsoft. The question isn't "is it good?" but "Why are Microsoft offering it and how is it undermining everyone else?"

  • dougmwne 4 days ago

    Microsoft will benefit from cheaper and more productive engineers.

    • namose 4 days ago

      If that’s their motive they should stop charging for it. Or, if they need to cover costs, open source the Copilot code and allow people to host their own.

LeonTheremin 4 days ago

And social media sells ideas other people thought.

Copilot is limited to public code now, but it may easily be trained on non-public code - albeit this probably won't be for sale to the public.

FeepingCreature 5 days ago

All I can think of is Steve Yegge [1]: "They have no right to do this. Open source does not mean the source is somehow 'open'."

My code is on Github so that people can read it, reuse it and learn from it. "The freedom to study how the program works", as the FSF says. If some of the people reading it are machines, why would that matter?

[1] http://steve-yegge.blogspot.com/2010/07/wikileaks-to-leak-50...

  • happymellon 5 days ago

    Because a lot of this code would be put into closed source software, which is against the licence and would prevent people from exercising the right to study how a program works.

    • FeepingCreature 5 days ago

      But I don't care if closed source programmers read my GPL code! The freedom to learn is not copyleft. So long as they put independent effort into their work, they're good in my book. Shared knowledge is a vital commons, and I'm honored if I can contribute to it.

      Maybe this goes back to that debunked paper that claimed that transformers were only remixing input samples?

      • happymellon 4 days ago

        They aren't reading your code. This is a program copy/pasting code without attribution.

        • FeepingCreature 4 days ago

          Again, the paper that said that transformers only copypasted input samples was highly misleading.

          It seems clear to me that Codex has true understanding.

          (Yes, I know that people have gotten secrets to appear in the output by prompting it in clever ways. That this happens doesn't prove that Codex doesn't understand what it's doing, it just shows that Codex doesn't understand everything.)

iLoveOncall 5 days ago

Github Copilot is selling code other people wrote as much as the author of this thread is profiting from words other people invented.

Absolute nonsense.

  • nextaccountic 5 days ago

    The difference is that words aren't copyrighted and don't come with an open source license.

presentation 5 days ago

Google just sells content other people wrote.

acuozzo 4 days ago

This is, in part, why I will continue to use the original 4-clause BSD license for the code I write.

AtNightWeCode 5 days ago

Copilot will be that bandmate who plays a new riff and leaves you wondering where it was borrowed from.

blitz_skull 4 days ago

Man, people really do be angry that the public code they put on a public platform is being used publicly.

Wild.

  • namose 4 days ago

    Wild that people who draw up licenses for their code which up until this point have been reliably enforced expect them to continue to be enforced!

boomer_joe 4 days ago

We need a licence that forbids use in ML and the people willing to sue github for it ASAP.

  • ilikehurdles 4 days ago

    But using it in a GitHub project would be akin to those Facebook comments that demand the company not monetize them.

shahar2k 4 days ago

and DALL-E 2 sells art other people created

(I'm actually not being sarcastic; I think there needs to be some sort of pipeline for compensating the artists whose work is used to train these models.)

fimdomeio 5 days ago

What AI is showing is the fuzzy line between creating and copying. The truth is that both are always present in everything we do; we've just been trying to hide it.

So it should be as simple as if you're using other people's content for your own profit you should properly compensate them.

Or we could just abolish copyright law and assume that everything humans create emanates from culture, so it's always collectively built and everything should be open source.

Or we just do the same we've been doing. Create even more complex laws trying to define this fuzzy line in a way that companies can keep profiting from it a lot more than individuals.

marstall 4 days ago

most of the code I write is glue sticking together 8 proprietary systems nobody's ever heard of. how is copilot gonna help me with that?

pvaldes 4 days ago

Each day it sounds more like Zopilote (Spanish for "vulture"), it seems.

tiku 5 days ago

I've been using it for a day now and I'm really impressed. It is so aware of stuff in old code that it's scary. I'm working on an old application with Zend Framework.

whywhywhywhy 5 days ago

Same deal for DALL-E if they ever productize it.

sytelus 4 days ago

Google just sells content other people wrote.

HeavyStorm 4 days ago

So much bullshit my head hurts.

SMAAART 4 days ago

Once again Innovation challenges IP.

honkler 4 days ago

License issues will save many thousands of jobs.

amelius 5 days ago

"Good artists copy. Great artists steal."

:)

abdulhaq 5 days ago

That's like saying a plumber just sells parts that other people made

  • WesolyKubeczek 5 days ago

    Except that a plumber buys them first. For money.

  • gtf21 4 days ago

    Which the plumber has bought and paid for and then installs for you, which makes this pretty fundamentally different.

janandonly 5 days ago

Isn't every programmer in history (except the gal who invents her own language and writes all her own code) simply an archeologist of other people's work?

We all Duck/Google for code anyway. Why not admit and make it easier?

  • eline43 5 days ago

    You don't understand the difference between many open source licenses or the concept of crediting open source code authors... it does not mean that the code is free for everyone to just use as they please...

    https://www.gnu.org/licenses/license-list.en.html for a quick intro

    Also, are you okay with other people selling *your* work and *you* getting nothing out of it? Many people are not.

  • pacifika 4 days ago

    Copilot is doing this on an industrial scale. It’s the difference between copying sample code and outsourcing your work to a third party, collectively.

danamit 5 days ago

The code Copilot suggests from any given project is, most of the time, not enough to warrant crediting that project. When I look up code in some GitHub repo and copy all or part of it, I do not credit that project.

I do not see Copilot as useful anyway.

Separo 5 days ago

GitHub provides the repo hosting and tools for free on public projects. I'm happy with this deal.

  • jalfresi 5 days ago

    This does raise a point - do we now have to assume that all those services that provide free hosting/access/service to open source projects will be strip-mining the work of the open source community to sell them back to us all? I almost feel stupid believing it was an altruistic move to contribute back to the shoulders of giants they were already standing on...

    • eloisius 5 days ago

      I feel scammed too. At this point it should be obvious, but I’m finally savvy to the fact that every tech company that offers anything free, and you use it to create “your” content, is not your friend and you don’t even own the works you host with them. I feel scammed that GitHub was cool about 10 years ago. It was like the professional/cultural center of gravity in my career. GitHubbers were cool people. Everyone cool hosted their site on GitHub Pages. I didn’t want to see a resume; what’s your GitHub? Now I feel stupid for having contributed whatever tiny bit of brains I did to this AI by thinking that I was using the cool, developer-first code website.

    • Separo 4 days ago

      No. You still have the option not to buy Copilot and still use GitHub's services for free on public projects. Or, if you're not comfortable with your open source code being perused by an AI, you can set up your own privately hosted public Git repo pretty easily.

      I honestly don't understand the general outrage at what seems to me a fair deal.

spupe 5 days ago

I disagree. Copilot is selling content-aware code suggestions, which is a result of code that other people wrote in their platform, and which in no way affects the work of these people.

lakomen 5 days ago

I don't understand what's going on there.

I don't use github. Can someone explain what the author means?

Edit: in detail

  • npteljes 5 days ago

    GitHub Copilot is a paid feature, but that's a red herring in this discussion: people are free to monetize free software, and none of the major licenses forbid this.

    GitHub Copilot is an advanced autocomplete / code generation system, based on a machine learning model. The code used for training the model is taken from projects hosted on GitHub. These projects were published under different licenses.

    The main questions are:

    Some of the licenses need something from you if you create a derivative work. Does the Copilot training itself count as creating a derivative work?

    Sometimes the autocomplete basically quotes the original code. Does the original license then apply to the autocompleted / generated code too? How much of verbatim code quoting does it need for the result to be considered a derivative work?
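One rough way to put a number on "how much verbatim quoting" a completion contains is the longest contiguous run of tokens it shares with a known source. The sketch below uses the standard library's `difflib.SequenceMatcher`; splitting on whitespace is a simplifying assumption, and no token or line count is a legal test for what counts as a derivative work.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(generated: str, original: str) -> list:
    """Return the longest contiguous run of whitespace-separated tokens found in both texts."""
    a, b = generated.split(), original.split()
    # autojunk=False so frequent tokens are not silently ignored on longer inputs.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]
```

Comparing a completion against its suspected source this way gives a length that could then be checked against whatever threshold one considers meaningful.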

    • kaetemi 4 days ago

      Those instances where people demonstrate verbatim copies are mostly either well-known snippets which have been copied a million times already, or obvious completions of a partial verbatim piece of the supposedly copied code that any coder could extrapolate.

  • niek_pas 5 days ago

    Google “GitHub copilot”

  • lakomen 5 days ago

    Nice, being downvoted for asking questions. Nice asshole culture on HN.

    • martin_a 5 days ago

      Just like with StackOverflow, people are expected to invest some time or amount of work in getting familiar with the topic.

      Your question seemed to lack this kind of work and was probably therefore downvoted.

      I don't think that's so much about "asshole culture" but more like time management, as not everything can be explained to everybody in every topic.

    • tjpnz 5 days ago

      You can ask questions but they can't be low effort and need to add something to the discussion.

skc 5 days ago

I get the feeling this entire debate would have been non-existent had this been a Jetbrains product instead.

The whole thing is just bizarre when the vast majority of developers constantly look at OSS code daily and lift ideas/patterns/snippets from there regularly without once looking at whatever license is attached.

  • foxhill 5 days ago

    > I get the feeling this entire debate would have been non-existent had this been a Jetbrains product instead.

    why so?

    > The whole thing is just bizarre when the vast majority of developers constantly look at OSS code daily and lift ideas/patterns/snippets from there regularly without once looking at whatever license is attached.

    well, yes, copying an idea or pattern is generally accepted to be kosher. copy-pasting too, in small amounts (a function, a type). that said, i would attribute (and have attributed) even a notional similarity when writing something open source.

    i don’t think co-pilot even allows the user to find where the code came from.

  • Luc 5 days ago

    > the vast majority of developers constantly look at OSS code daily and lift ideas/patterns/snippets from there regularly

    Perhaps in your circles, but that's certainly not something I've encountered over a 25 year career.

    • skc 5 days ago

      So when you google a problem and it leads you to a code snippet that solves it, which just happens to be OSS, you immediately scrub your brain, pretend you never saw it, and instead come up with your own completely independent solution after the fact?

      • avereveard 5 days ago

        Google usage is outright forbidden for work in institutions that care about intellectual property rights, so the brain scrub issue is just arguing at the wrong level.

        If you're googling solutions around you're already not taking intellectual property seriously enough to care about what happens after you lift ideas around.

        • skc 5 days ago

          Very surprised to hear about this actually.

          Maybe I live in a bubble, but the likes of Google/StackOverflow have been part and parcel of a developer's toolbox for many years now.

          And in any case, I wonder how that is enforced. E.g., someone goes home in the evening, visits GitHub, learns a new trick, comes into the office the next day and implements it.

        • anonymoushn 5 days ago

          Can you name these institutions? I am surprised to hear that some institutions would prevent devs from viewing e.g. documentation of the APIs they are using or academic papers about algorithms for computing the multiplicative inverses of 64-bit integers, if they accessed those things via google

          • avereveard 5 days ago

            IBM, and another one I'm currently under NDA with.

            I think them also being patent farms has a role in it.

            Approved dependencies had their API docs linked, so there was no need to Google those.

        • bloat 5 days ago

          This is interesting. Is the internet completely cut off? Do they have internal libraries of documentation for third party stuff they are using (paper? digital?) Do you have any example institutions, or what domain they are working in? Thanks.

          • swader999 5 days ago

            I think it would be for super secure military coding. But business domains? Hardly ever.

            • avereveard 4 days ago

              The issue doesn't rest solely on copyright.

              A concern, which I think is legit, is that it is quite easy for someone with a strong presence in search, web advertising, analytics and mobile to puzzle together what a company is investing in based on the aggregated research and web access from known locations.

      • teakettle42 5 days ago

        > … and instead come up with your own completely independent solution after the fact?

        Yes, I’m not a plagiarist.

        If you’re literally copying and pasting code snippets without attribution, you’re plagiarizing.

        You’re also probably violating the OSS project’s license.

        It’s no different than copying and pasting someone else’s sentence or paragraph into a written paper.

  • goerz 5 days ago

    I am not a lawyer, but my legal intuition / common sense says that “code snippets” are not copyrightable. There’s some sliding scale on when a code snippet would become so non-trivial that a reasonable (!) judge would consider it copyrightable, but nothing Copilot does is anywhere close to that limit, IMO.

    • shakna 5 days ago

      One of the main claims in Google LLC v. Oracle America [0], was based around a 9-line rangeCheck function. Whilst some code can be too simple and small to copyright, programmers and lawyers are probably not going to view snippets the same way. Copilot creates risk.

      [0] https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...

bborud 5 days ago

Well, this does invite an interesting comparison. If we imagine something like Copilot applied to music I believe the chances of ending up in court would be pretty high. There are a lot of examples of plagiarism lawsuits in popular music and the outcome seems to be entirely random.

One could argue that the information density in chord progressions, bass lines and beats is extremely small. And that any recognizable part of a musical idea that has been "borrowed" would necessarily make up a larger percentage of the complete work than would be the case for a typical application with borrowed snippets.

That's not a bad argument, but it is unsatisfactory because it means that at some point someone has to make a judgement on how much you can borrow.