crazygringo a day ago

I wonder why `cp` on the Mac doesn't create a clone by default, the way duplicating a file in Finder does. And, by extension, why Python doesn't clone by default either.

Is there any use case where you need to avoid a clone on the same disk? I see only benefits (less disk usage). Am I missing something? Isn't the clone status just a file system implementation detail?

  • danudey a day ago

    Presumably the intent was to keep the behavior of `cp` consistent with the behavior in previous releases of macOS and other POSIX operating systems (none of which, IIRC, use reflinks by default if available).

    There are circumstances where a reflink that doesn't take up space could be problematic or cause unexpected behaviour. For example, say you have a huge file, such as a database or a "big data" dataset, maybe a terabyte or so, on a disk that only has 500 GB free.

    Now you want to do a nightly backup. With reflinks, you `cp` the file and it succeeds even though the disk doesn't have room for a full copy. Then your system starts up during the day, loads the dataset, and starts making changes: you zero out the dataset and start re-importing the data. Because every block you rewrite now has to be allocated fresh (the clone still holds on to the old ones), the re-import fails halfway through when the disk fills up.

    Without reflinks, your nightly copy fails up front, you get an alert about it, and you go clean it up.

    This is kind of an idiotic, contrived example, but it shows the kind of thing that can happen with reflinks and wouldn't happen otherwise.
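To make the space accounting in the backup scenario above concrete, here is a toy copy-on-write model. It's a hypothetical sketch, not how APFS actually allocates space: `ToyCowVolume` is an invented name, and one "block" stands in for roughly 1 GB. The point it shows is that a clone only bumps per-block reference counts, and the space bill arrives later, when a shared block gets overwritten.

```python
class ToyCowVolume:
    """Hypothetical toy model of a copy-on-write volume (not APFS's real design).

    Each file is a list of physical block IDs, and each physical block carries
    a reference count saying how many files currently point at it.
    """

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity_blocks = capacity_blocks
        self.refcount: dict[int, int] = {}     # physical block -> references
        self.files: dict[str, list[int]] = {}  # file name -> block IDs
        self._next_id = 0

    def free_blocks(self) -> int:
        return self.capacity_blocks - len(self.refcount)

    def _alloc(self) -> int:
        if self.free_blocks() == 0:
            raise OSError("ENOSPC: volume full")
        block = self._next_id
        self._next_id += 1
        self.refcount[block] = 1
        return block

    def create(self, name: str, nblocks: int) -> None:
        self.files[name] = [self._alloc() for _ in range(nblocks)]

    def full_copy(self, src: str, dst: str) -> None:
        # An ordinary copy needs every block up front, so it fails immediately.
        needed = len(self.files[src])
        if needed > self.free_blocks():
            raise OSError("ENOSPC: volume full")
        self.files[dst] = [self._alloc() for _ in range(needed)]

    def clone(self, src: str, dst: str) -> None:
        # A clone only bumps reference counts; no new blocks are allocated.
        for block in self.files[src]:
            self.refcount[block] += 1
        self.files[dst] = list(self.files[src])

    def overwrite_block(self, name: str, index: int) -> None:
        old = self.files[name][index]
        if self.refcount[old] > 1:
            # Shared block: the writer gets a freshly allocated block and the
            # other file keeps the old one, so free space shrinks.
            self.files[name][index] = self._alloc()
            self.refcount[old] -= 1
        # An unshared block is treated as rewritten in place: no net new space.


# Scale: 1 block ~ 1 GB. A 1 TB database on a disk with 500 GB free.
vol = ToyCowVolume(capacity_blocks=1500)
vol.create("db", 1000)
vol.clone("db", "db.backup")  # the nightly "backup" succeeds instantly
print(vol.free_blocks())      # 500: the clone consumed no space

rewritten = 0
try:
    for i in range(1000):     # the day's re-import rewrites every block
        vol.overwrite_block("db", i)
        rewritten += 1
except OSError as exc:
    print(f"re-import failed after {rewritten} blocks: {exc}")
```

Running it prints `500` and then `re-import failed after 500 blocks: ENOSPC: volume full`. Swapping `vol.clone(...)` for `vol.full_copy(...)` moves the failure to backup time instead, which matches the non-reflink behaviour described above.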

mikeyla85 a day ago

Be careful messing around under the hood on this one. I’ve used this feature in Finder for years, but I recently used fclones to replace all my duplicate files with clones on my Mac, and it completely messed up my free space. Months later, some programs see 1 TB free and others see almost nothing.

zahlman a day ago

> Although cloned files share data, they’re independent – you can edit one copy without affecting the other (unlike symlinks or hard links). APFS uses a technique called copy-on-write to store the data efficiently on disk – the cloned files continue to share any pieces they have in common.

... So, reflinks?

... It seems so. https://pypi.org/project/reflink/ claims to support this for APFS, specifically by using `clonefile`.
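For anyone wanting to opt in to cloning from Python without a third-party package, here is a minimal sketch, assuming macOS 10.12+ where `clonefile` is exported by the system C library. As far as I can tell the standard library doesn't expose `clonefile` directly, so this calls it through `ctypes` and falls back to `shutil.copy2` on any OS or filesystem where cloning isn't available. `copy_prefer_clone` is a hypothetical helper name; it is not the API of the `reflink` package linked above.

```python
import ctypes
import ctypes.util
import errno
import os
import shutil


def _load_clonefile():
    """Return libc's clonefile() via ctypes, or None if this OS lacks it."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    except (OSError, TypeError):
        return None
    func = getattr(libc, "clonefile", None)  # only present on macOS 10.12+
    if func is not None:
        # clonefile(src_path, dst_path, flags) -> 0 on success, -1 + errno
        func.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_uint32]
        func.restype = ctypes.c_int
    return func


_clonefile = _load_clonefile()


def copy_prefer_clone(src: str, dst: str) -> bool:
    """Hypothetical helper: clone a regular file if possible, else copy it.

    Returns True if dst was created with clonefile(), False if it fell back
    to shutil.copy2(). dst must not exist yet (clonefile's own rule).
    """
    if os.path.lexists(dst):
        raise FileExistsError(errno.EEXIST, os.strerror(errno.EEXIST), dst)
    if _clonefile is not None:
        # flags=0: follow a symlink at src, like shutil.copy2's default.
        if _clonefile(os.fsencode(src), os.fsencode(dst), 0) == 0:
            return True
        err = ctypes.get_errno()
        # ENOTSUP: the filesystem can't clone (e.g. not APFS);
        # EXDEV: src and dst are on different volumes. Anything else is real.
        if err not in (errno.ENOTSUP, errno.EXDEV):
            raise OSError(err, os.strerror(err), src)
    shutil.copy2(src, dst)
    return False
```

On an APFS volume, `copy_prefer_clone("data.db", "data-backup.db")` should return `True` quickly regardless of file size; elsewhere it does an ordinary full copy and returns `False`. Making the clone an explicit choice like this keeps the disk-space surprise discussed upthread opt-in rather than the default.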