loumf a day ago

I stopped reading when it said the repos had once been public and were in a Bing cache. That is probably true of a lot of once-public, now-private data. Not just in Bing, but also in archive.org and Google. It said the repo in question had a “public phase”.

Anyway, there is no indication that your data would be exposed if you had a private repo that was always private.

  • lol768 a day ago

    You've got to treat any repository made public - no matter how briefly - as compromised / downloaded / accessed / viewed.

    I found the article title pretty misleading. There's perhaps an interesting conversation to have around the possibility of wanting to be able to scrub training data after-the-fact and how (or if) that could work - but that's not what the article's headline tried to convey.

  • Goofy_Coyote a day ago

    Same here.

    It’s like saying the Wayback Machine is exposing private data because it was captured when it was public.

    What an absolute waste of an item on the HN front page.

    • creshal a day ago

      Wayback Machine is aware that this could be a problem, which is why they let website owners expunge sites.

      And what a surprise! So does Bing: https://www.bing.com/webmasters/help/bing-content-removal-to... (As do Google etc.)

      Neither of these expunging mechanisms works if you're not the domain owner, however, so this is just one more reminder that any content you upload to somebody else's website is never fully under your control.

fph a day ago

Probably with enough jailbreaking one could even extract API credentials or private keys from Copilot. There must be some in those repositories.

asmor a day ago

Is it just me, or is it painfully obvious that the article was written by AI? Or am I being mean to the last real person writing titled bullet points every 3 paragraphs?

neilv a day ago

Wronged parties should sue over this reckless negligence.

  • bpicolo a day ago

    > repositories that were once public but later made private

    Or maybe the post needs a better title

    • creshal a day ago

      I suppose they could sue themselves for accidentally making their own private data public.

      • neilv a day ago

        Or the wronged parties were those who entrusted someone else not to put their data (e.g., closed software, NDA material, proprietary internal info shared with a contractor, access control secrets) in a public repo.

  • mouse_ a day ago

    Who's got better lawyers than Microsoft? lol

    • wolfi1 a day ago

      CrowdStrike?