loumf 3 days ago

I stopped reading when it said the repos had once been public and were in a Bing cache. That is probably true of a lot of once-public, now-private data, not just in Bing but also in archive.org and Google. It said the repo in question had a “public phase”.

Anyway, there is no indication that your data would be exposed if you had a private repo that was always private.

  • lol768 3 days ago

    You've got to treat any repository made public - no matter how briefly - as compromised / downloaded / accessed / viewed.

    I found the article title pretty misleading. There's perhaps an interesting conversation to have around the possibility of wanting to be able to scrub training data after-the-fact and how (or if) that could work - but that's not what the article's headline tried to convey.

  • Goofy_Coyote 3 days ago

    Same here.

    It’s like saying the Wayback Machine is exposing private data because the data was captured while it was public.

    What an absolute waste of an item on the HN front page.

    • creshal 3 days ago

      Wayback Machine is aware that this could be a problem, which is why they let website owners expunge sites.

      And what a surprise! So does Bing: https://www.bing.com/webmasters/help/bing-content-removal-to... (As do Google etc.)

      Neither of these expunging mechanisms works if you're not the domain owner, however, so this is just one more reminder that any content you upload to somebody else's website is never fully under your control.

fph 3 days ago

Probably with enough jailbreaking one could even extract API credentials or private keys from Copilot. There must be some in those repositories.

asmor 3 days ago

Is it just me, or is it painfully obvious that the article was written by AI? Or am I being mean to the last real person writing titled bullet points every 3 paragraphs?

neilv 3 days ago

Wronged parties should sue over this reckless negligence.

  • bpicolo 3 days ago

    > repositories that were once public but later made private

    Or maybe the post needs a better title

    • creshal 3 days ago

      I suppose they could sue themselves for accidentally making their own private data public.

      • neilv 3 days ago

        Or the wronged parties were those who trusted someone else not to put their data (e.g., closed software, NDA material, proprietary internal info shared with a contractor, access control secrets) in a public repo.

  • mouse_ 3 days ago

    Who's got better lawyers than Microsoft? lol

    • wolfi1 3 days ago

      CrowdStrike?