loumf 10 months ago

I stopped reading when it said the repos had once been public and were in a Bing cache. That is probably true of a lot of once-public, now-private data. Not just in Bing, but also in archive.org and Google. It said the repo in question had a “public phase”.

Anyway, there is no indication that your data would be exposed if you had a private repo that was always private.

  • lol768 10 months ago

    You've got to treat any repository made public - no matter how briefly - as compromised / downloaded / accessed / viewed.

    I found the article title pretty misleading. There's perhaps an interesting conversation to have around the possibility of wanting to be able to scrub training data after-the-fact and how (or if) that could work - but that's not what the article's headline tried to convey.

  • Goofy_Coyote 10 months ago

    Same here.

    It’s like saying the Wayback Machine is exposing private data because it was captured when it was public.

    What an absolute waste of an item on the HN front page.

    • creshal 10 months ago

      Wayback Machine is aware that this could be a problem, which is why they let website owners expunge sites.

      And what a surprise! So does Bing: https://www.bing.com/webmasters/help/bing-content-removal-to... (As do Google etc.)

      Neither of these expunging mechanisms works if you're not the domain owner, however, so this is just one more reminder that any content you upload to somebody else's website is never fully under your control.

fph 10 months ago

Probably with enough jailbreaking one could even extract API credentials or private keys from Copilot. There must be some in those repositories.

asmor 10 months ago

Is it just me, or is it painfully obvious that the article was written by AI? Or am I being mean to the last real person writing titled bullet points every 3 paragraphs?

neilv 10 months ago

Wronged parties should sue over this reckless negligence.

  • bpicolo 10 months ago

    > repositories that were once public but later made private

    Or maybe the post needs a better title

    • creshal 10 months ago

      I suppose they could sue themselves for accidentally making their own private data public.

      • neilv 10 months ago

        Or the wronged parties were those who trusted someone else not to put their data (e.g., closed software, NDA material, proprietary internal info shared with a contractor, access control secrets) in a public repo.

  • mouse_ 10 months ago

    Who's got better lawyers than Microsoft? lol

    • wolfi1 10 months ago

      CrowdStrike?