wereallterrrist a year ago

I find it very, very hard to go wrong with Syncthing (for stuff I truly need replicated, code/photos/text-records) and ZFS + znapzend + rsync.net (automatic snapshots of `/home` and `/var/lib` on servers).

The only thing missing: I'd like to stop syncing code with Syncthing and instead build some smarter daemon. The daemon would take a manifest of repositories, each with a mapping of worktrees->branches to be actualized and fsmonitored. The daemon would auto-commit changes on those worktrees into a shadow branch and push/pull it. Ideally this could leverage (the very amazing, you must try it) `jj` for continuous committing of the working copy and (in the future, with the native jj format) even handle the likely-never-to-happen conflict scenario. (I'd happily collaborate on a Rust impl and/or donate funds to one.)

Given the number of worktrees I have of some huge repos (nixpkgs, linux, etc.), it would likely mean a significant reduction in CPU/disk usage compared to what Syncthing has to do now to monitor/rescan as much as I'm asking it to (it has to dumb-sync .git, it syncs gitignored content, etc.).
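
(For the curious, a rough sketch of the shadow-branch idea per worktree, as a plain-git polling loop rather than the proper daemon; the repo path, branch name and interval are made up:)

    # hypothetical per-worktree loop; a real daemon would react to fsmonitor/jj events instead of polling
    cd ~/code/nixpkgs-wt-1 || exit 1
    while true; do
      git add -A
      git diff --cached --quiet || git commit -q -m "autosync $(date -Is)"
      git push -q origin "HEAD:refs/heads/shadow/$(hostname)"
      sleep 60
    done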

  • JeremyNT a year ago

    > Given the number of worktrees I have of some huge repos (nixpkgs, linux, etc.), it would likely mean a significant reduction in CPU/disk usage compared to what Syncthing has to do now to monitor/rescan as much as I'm asking it to (it has to dumb-sync .git, it syncs gitignored content, etc.).

    Are you really hitting that much of a resource utilization issue with syncthing though? I use it on lots of small files and git repos and since it uses inotify there's not really much of a problem. I guess the worst case is switching to very different branches frequently, or committing very large (binary?) files where it may need to transfer them twice, but this hasn't been a problem in my own experience.

    I'm not sure you could really do a whole lot better than syncthing by being clever, and it strikes me as a lot of effort to optimize for a specific workflow.

    Edit: actually, I wonder if you could just exclude the working copies with a clever exclude list in syncthing, such that you'd ONLY grab .git so you wouldn't even need the double transfer/storage. You risk losing uncommitted work I suppose.
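
    (Something like this in .stignore, if I remember the pattern semantics right: negations listed first, then ignore everything else. I'm not certain Syncthing honors negations nested inside otherwise-ignored directories, so it may need per-repo tweaking; repo names are made up.)

        !/nixpkgs/.git
        !/linux/.git
        *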

    • wereallterrrist a year ago

      inotify has pretty paltry limits. My ~/code is only 40-50GB but there's no way inotify can watch it all.

      Thus, syncthing basically constantly has to rescan. It's not great.

      And yes, rebasing linux+nixpkgs on even an hourly basis is absolutely devastating. lol
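
      (For reference, the limits are easy to check and raise, though each watch still costs kernel memory, so raising them only goes so far:)

          # show current inotify limits
          sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances
          # raise the watch limit for this boot; persist it via /etc/sysctl.d/ if it helps
          sudo sysctl fs.inotify.max_user_watches=1048576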

  • killingtime74 a year ago

    For code I just use a self hosted git server

  • than3 a year ago

    I hate to be the one to point out the obvious, but replication isn't a backup. It's for resiliency, just like RAID; the two aren't the same.

    • reacharavindh a year ago

      Replication to another machine that has a COW file system with snapshots is backup though :-)

      We back up the data storage for an entire HPC cluster, about 2 PiB of it, to a single machine with 4 disk shelves running ZFS with snapshots. It works very well. A simple rsync every night, and snapshotted.

      We use the backup as a sort of Time Machine should we need data from the past that we deleted in the primary. Plus, we don't need to wait for tapes to load or anything; it is pretty fast and intuitive.

    • jerf a year ago

      The person you're replying to said "Syncthing ... and ZFS + znapzend + rsync.net" though. You're ignoring the rsync.net part.

      I have something similar; it's Nextcloud + restic to AWS S3, but it's the same principle. You can give people the convenience and human-comprehensibility of sync-based sharing, but also back that up too, for the best of both worlds. In my case the odds of needing "previous versions" of things approach zero and a full sync is fairly close to a backup, but even so I do have a full solution here.

    • wereallterrrist a year ago

      When I mentioned de-duping and append-only logs, I had this in mind. It's hard to imagine implementing a backup system with those two properties that doesn't include snapshotting, nearly by design necessity.

      (Beyond even the fact that ~/code is also on a ZFS volume that is snapshotted and replicated off-site, which I argue can be used in all of the same important ways any other "backup" is used.)

      Hence the comment! After all this blockchain hoopla and everyone's understanding of how "cool" Git is, we really, really deserve better in our backup tools.

    • jrm4 a year ago

      But, it makes things easy. I have e.g. a home computer, a server in the closet thing, a laptop and a work computer all with a shared Syncthing folder.

      So to bolster that, I just have a simple bash script that reminds me every 7 days to make a copy of that folder somewhere else on that machine. It's not precise because I often don't know which machine I will be using, but that creates a natural staggering that I figure should be sufficient if something goes weird and I lose something; I'm likely to have an old copy somewhere.

    • whalesalad a year ago

      What is the actual difference between a backup and replication? If the 1’s and 0’s are replicated to a different host, is that any different than “backing up” (replicating them) to a piece of external media?

      • jjav a year ago

        > What is the actual difference between a backup and replication?

        Simplest way to think about it is that a backup must be an immutable snapshot in time. Any changes and deletions which happen after that point in time will never reflect back onto the backup.

        That way, any files you accidentally delete or corrupt (or other unwanted changes, like ransomware encrypting them for you) can be recovered by going back to the backup.

        Replication is very different: you intentionally want all ongoing changes to replicate to the multiple copies for availability. But it means that unwanted changes or data corruption happily replicate to all the copies, so now all of them are corrupt. That's when you reach for the most recent backup.

        That's why you always need to backup and you'll usually want to replicate as well.

      • chrishas35 a year ago

        When those 1s and 0s are deleted and that delete is replicated (or other catastrophic change, such as ransomware) you presumably don't have the ability to restore if all you're doing is replication. A strategy that layers replication + backup/versioning is the goal.

        • natebc a year ago

          I'll add that _usually_ a backup strategy includes generational backups of some kind. That is, daily, weekly, monthly, etc., to hedge against individually impacted files as mentioned.

          Ideally there is also an offsite and inaccessible from the source component to this strategy. Usually this level of robustness isn't present in a "replication" setup.

          • than3 a year ago

            Put more simply, backups account for and mitigate the common risks to data during storage while minimizing costs; ransomware is one of those common risks. It's organization-dependent, based on costs and available budget, so it varies.

            Long term storage usually has some form of Forward Error Correction (FEC) protection schemes (for bitrot), and often backups are segmented, which may be a mix of full and incremental, or delta, backups (to mitigate cost) with corresponding offline components (for ransomware resiliency), but that too is very dependent on the environment as well as the strategy being used for data minimization.

            > Usually this level of robustness isn't present in a "replication" setup.

            Exactly, and thinking about replication as a backup often also gives those using it a false sense of security in any BC/DR situations.

  • hk1337 a year ago

    I use Syncthing between Mac, Windows (have included Linux in the mix at one point), and with my Synology NAS. Syncthing is more for my short term backup though. I will either commit it to a repo, save it to a Synology share, or delete it.

    *edit* my gitea server saves its backups to synology

  • ww520 a year ago

    Yes. I just let Syncthing sync among devices, using it for creating copies of the backup. The daily backup scripts do their thing and create one backup snapshot, then Syncthing picks up the new backup files and propagates them to multiple devices.

  • fncivivue7 a year ago

    Sounds like you want Borg

    https://borgbackup.readthedocs.io/en/stable/

    My two 80% full 1tb laptops and 1tb desktop back up to around 300-400G after dedupe and compression. Currently have around 12tb of backups stored in that 300G.

    Incremental backups run in about 5 mins even against the spinning disks they're stored on.
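
    If you haven't used it, the whole thing is roughly three commands (repo path, compression and retention below are just example choices):

        borg init --encryption=repokey /mnt/backup/borg-repo
        borg create --stats --compression zstd /mnt/backup/borg-repo::'{hostname}-{now}' ~/
        borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/borg-repo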

    • 0cf8612b2e1e a year ago

      Python programmer here, but I actually prefer Restic [0]. While more or less the same experience, the huge selling point to me is that the backup program is a single executable that can be easily stored alongside the backups. I do not want any dependency/environment issues to assert themselves when restoration is required (which is most likely on a virgin, unconfigured system).

      [0] https://restic.net/
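
      The day-to-day usage is a handful of commands (repo path is just an example):

          restic -r /mnt/backup/restic-repo init
          restic -r /mnt/backup/restic-repo backup ~/documents ~/photos
          restic -r /mnt/backup/restic-repo snapshots
          restic -r /mnt/backup/restic-repo restore latest --target /tmp/restore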

      • SomeoneOnTheWeb a year ago

        You can also take a look at Kopia (https://kopia.io/).

        I've been using Borg, Restic and Kopia for a long time and Kopia is my personal favorite - very fast, very efficient, runs in the background automatically without having to schedule a cron job or anything like that.

        Only downside is that the backups are made of a HUGE number of files, so when synchronizing it can sometimes take a bit of time to check the ~5k files.
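
        For anyone curious, getting started looks roughly like this (filesystem repo as an example; it also speaks S3/B2/rclone):

            kopia repository create filesystem --path /mnt/backup/kopia-repo
            kopia snapshot create ~/documents
            kopia snapshot list
            kopia mount all /mnt/kopia   # browse or copy files out of old snapshots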

        • wanderingmind a year ago

          Highly recommend Kopia, which has a nice UI and can work with rclone (so any cloud back end).

        • klodolph a year ago

          I’ve been using Kopia, I recommend it.

    • wereallterrrist a year ago

      No, I distinctly don't want borg. It doesn't help or solve anything that Syncthing doesn't do. The obsession with borg and bup is pretty baffling to me. We deserve better in this space. (see: Asuran and another whose name I forget...)

      Critically, I'm specifically referring to code sync that needs to operate at a git-level to get the huge efficiencies I'm thinking of.

      Syncthing, or borg, scanning 8 copies of the Linux kernel is pretty horrific compared to something doing a "git commit && git push" and "git pull --rebase" in the background (over-simplifying the shadow-branch process here for brevity.)

      re: 'we deserve better' -- case in point, see Asuran - there's no real reason that sync and backup have to be distinctly different tools. Given chunking and dedupe and append-logs, we really, really deserve better in this tooling space.

      • formerly_proven a year ago

        borg et al and "git commit" work in essentially the same way. Both scan the entire tree for changes using modification timestamps.

        • dragonwriter a year ago

          > borg et al and "git commit" work in essentially the same way. Both scan the entire tree for changes using modification timestamps.

          But git commit doesn’t do that. If you want to do that in git, you typically do it before commit with “git add -A”.

    • codethief a year ago

      I don't think GP was talking about backups (which is what Borg is good for) but about synchronization between machines which is another issue entirely.

    • _dain_ a year ago

      They work together. I use syncthing to keep things synchronized across devices, including to an always-on "master" device that has more storage. Then borg runs on the master device to create backups.

anotherevan a year ago

I use a Raspberry Pi as my backup orchestrator. It backs up my Linux desktop, my wife's Windows desktop, and a couple of other Linux based devices including itself. Every night, it:

* Mounts the external drive.

* Starts Restic's Rest Server.

* SSHes into each machine to be backed up, to kick off the script that backs up to the above server.

* Stops Rest Server.

* Rsyncs the external drive to my office (which is a 25-minute drive away) for off-site protection.

* Unmounts the external drive.

* Emails the results of it all to me.

Has been working really well so far.
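
The driver script on the Pi is roughly this shape (paths, hostnames and the helper script name are illustrative, not the actual script):

    #!/bin/bash
    set -euo pipefail
    mount /mnt/backupdrive
    rest-server --path /mnt/backupdrive/restic &
    REST_PID=$!
    for host in desktop wifepc pihole; do
        ssh backup@"$host" ./run-backup.sh || echo "backup failed on $host"
    done
    kill "$REST_PID"
    rsync -a --delete /mnt/backupdrive/ office-box:/srv/offsite-backup/
    umount /mnt/backupdrive
    # ...then email the collected results to myself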

Notes:

* The RPi has limited SSH access to each machine. The only thing it can really do is start the backup script on the machine.

* The Linux machines are on all the time, but the Windows machine sleeps. So first it sends a wake-on-lan. Using Cygwin for SSH and scripting on Windows. The script on the Windows machine sets the power configuration to not go to sleep during the backup, and restores the setting afterwards. Restic's ability to create a VSS snapshot on Windows is awesome.

* I still need to incorporate my two kids' Windows laptops into the backup somehow. I doubt the wake-on-lan tricks will work reliably with them. I've yet to explore Urbackup, which I think I can use to have them back up to my Linux desktop periodically when they are awake.

  • scubbo a year ago

    +1 for Restic. I'm only just barely scratching the surface of its use, and it's still just amazing. Here's the article I referenced: https://www.seanh.cc/2022/04/03/restic, though it looks like you're doing a lot more wizardry than that!

PopAlongKid a year ago

>I don't use Windows at the moment and don't really mount network drives, either. That might be a good alternative to consider.

Regarding Windows:

I have successfully mirrored a notebook and a desktop[0] (single user) with Windows using robocopy, which is a utility that comes with Windows (used to be part of the Resource Kit but I think it is now in the base product). When I say "mirror" I mean I can use either machine as my current workstation without any loss of data, as long as I run the "sync" script at each switch.

I use "net use" to temporarily mount a few critical drives on the local network, then robocopy does its work, it has maybe 85% of the same functionality of rsync (which I also used extensively when administering corporate servers and workstations). Back in the DOS days, I wrote my own very simple version of the same thing using C, but when robocopy came along I was glad to stop maintaining my own effort.

[0]or two desktops, using removable high-capacity media like Iomega zip drives.
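
The core of the sync script is only a few lines; something like this (drive letter and share names invented):

    net use B: \\desktop\users /persistent:no
    robocopy C:\Users\me B:\me /MIR /FFT /R:1 /W:1 /XJD /LOG:C:\sync.log
    net use B: /delete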

  • EvanAnderson a year ago

    Robocopy is very nice but has no delta compression functionality. For things like file server migrations (where I want to preserve ACLs, times, etc) robocopy is my go-to tool.

    I've used the cwRsync[0] binary distribution of rsync on Windows for backups. I found it worked very well for simple file backups. I never did get around to trying to combine it with Volume Shadow Copy to make consistent backups of the registry and applications like Microsoft SQL Server. (I wouldn't expect to get a bootable restore from such a backup, though.)

    [0] https://www.itefix.net/cwrsync

    • rzzzt a year ago

      I used QtdSync, another frontend backed by a Windows rsync binary. A nice feature was that it supported the "duplicate entire target folder with hard links, then overwrite changes only"-style on NTFS volumes, so I could have lots of browseable point-in-time backup folders without consuming extra disk space: https://www.qtdtools.de/page.php?tool=0&sub=1&lang=en

  • gary_0 a year ago

    I use MSYS2 on Windows in order to run regular rsync and other such utilities. It's served me very well for years. I also have some bash scripts that I can conveniently run on either Linux or Windows via MSYS2.

    • paravz a year ago

      I switched from Cygwin+rsync(over ssh) to robocopy+samba to speed up backups (up to saturating 1Gbit connection):

          for %i in (C D) do robocopy %i:\ \\backup-server\b-%COMPUTERNAME%\%i /MIR /DCOPY:T /NFL /NDL /R:0 /W:1 /XJ /XD "System Volume Information" /XD "$RECYCLE.BIN" /XD "Windows" /XD "Windows.old"

UI_at_80x24 a year ago

ZFS snapshots + send/receive are an absolute game changer in this regard.

I have my /home in a separate dataset that gets snapshotted every 30 minutes. The snapshots are sent to my primary file-server, and can be picked up by any system on my network. I do a variation of this with my dotfiles similar to STOW but with quicker snapshots.

  • pmarreck a year ago

    Came here to say this. Can you list your example commands for snapshotting, zfs send, restoring single files or entire snapshots, etc.? (Have you tested it out?) I am actually in the position of doing this (I use zfs on root as of recently and I have a TrueNAS) but am stuck at the bootstrapping problem (I haven't taken a single snapshot yet; presumably the first one is the only big one? and then how do I send incremental snapshots? and then how do I restore these to, say, a new machine? do I remotely mount a snapshot somehow, or zfs recv, or? Do you set up systemd/cron jobs for this?) Also, having auto-snapshotted on Ubuntu in the past, eventually things slowed to a crawl every time I did an apt update... Is this avoidable?

    • customizable a year ago

      Yes, the first snapshot is the big one, the rest are incremental. Sending or restoring a snapshot is just one line really. Something like ;)

      sudo zfs send -cRi db/data@2022-12-08T00-00 db/data@2022-12-09T00-00 | ssh me@backup-server "sudo zfs receive -vF db/data"
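
      To spell out the bootstrapping steps asked about above (dataset names invented; cron/systemd or a tool like znapzend can drive the schedule):

          # once: full send of an initial snapshot
          zfs snapshot -r tank/home@2024-01-01
          zfs send -cR tank/home@2024-01-01 | ssh me@backup-server "sudo zfs receive -F backup/home"
          # afterwards: only the increment between the last common snapshot and the new one
          zfs snapshot -r tank/home@2024-01-02
          zfs send -cRi tank/home@2024-01-01 tank/home@2024-01-02 | ssh me@backup-server "sudo zfs receive -F backup/home"
          # single files can be read back from the hidden .zfs directory on the receiving side
          ls /backup/home/.zfs/snapshot/2024-01-01/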

  • GekkePrutser a year ago

    Zfs send/receive is nice but it does lack the toolchain to easily extract individual files from a backup. It's more of a disaster recovery thing in terms of backup.

    • customizable a year ago

      You can actually extract individual files from a snapshot by using the hidden .zfs directory like: /mnt-point/.zfs/snapshot/snapshot-name

      Another alternative is to create a clone from a snapshot, which also makes the data writable.
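
      For example (pool, snapshot and file names made up):

          # read a single file straight out of a snapshot
          cp /tank/data/.zfs/snapshot/2022-12-08T00-00/important.conf /tmp/
          # or turn the snapshot into a writable clone
          zfs clone tank/data@2022-12-08T00-00 tank/data-restored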

      • GekkePrutser a year ago

        From a snapshot, yes, but not from a zfs send stream, which is a single file.

falcolas a year ago

So, quick trick with rsync that means you don't have to copy everything and then hardlink:

    --link-dest=DIR          hardlink to files in DIR when unchanged
Basically, you list your previous backup dir as the link-dest directory, and if the file hasn't changed, it will be hardlinked from the previous directory into the current directory. Pretty nice for creating time-machine style backups with one command and no SSH.

Also works a treat with incremental logical backups of databases.
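
A minimal version of the whole pattern looks something like this (destination layout is just an example):

    TODAY=$(date +%F)
    rsync -a --delete --link-dest=/backups/latest /home/me/ /backups/"$TODAY"/
    ln -sfn /backups/"$TODAY" /backups/latest   # point "latest" at the new snapshot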

  • amelius a year ago

    This is good to know, I used an extra "cp -rl" step in my previous scripts.

    • rsync a year ago

      Yes - they accomplish the same thing.

      --link-dest is just an elegant, built-in way to create "hardlink snapshots" the same way that 'cp -al' always did.

      But note:

      A changed file - even the smallest of changes - breaks the link and causes you to consume (size of file) more space cascading through your snapshots. Depending on your file sizes and change frequency this can get rather expensive.

      We now recommend abandoning hardlink snapshots altogether and doing a "dumb mirror" rsync to your rsync.net account - with no retention or versioning - and letting the ZFS snapshots create your retention.

      As opposed to hardlink snapshots, ZFS snapshots diff on a block level, not a file level - so you can change some blocks of a file and not use (that entire file) more space. It can be much more efficient, depending on file sizes.

      The other big benefit is that ZFS snapshots are immutable/read-only so if your backup source is compromised, Mallory can't wipe out all of the offsite backups too.

      • falcolas a year ago

        It also reduces the amount of data transferred, making the backup faster.

        > We now recommend

        Who's we?

        • jwiz a year ago

          The poster to whom you replied is affiliated with rsync.net, a popular backup service.

    • falcolas a year ago

      One thing of note - the file is not transferred, so backups happen faster and consume less bandwidth (important if your target is not network-local to you).

e1g a year ago

Recent versions of rsync support zstd compression, which can improve speed and reduce the load on both sides. You can check if your rsync supports that with "rsync -h | grep zstd" and instruct to use it with "-z --zc=zstd"

However, compression is useful in proportion to how crappy the network is and how compressible the content is (e.g., text files). This repo is about backing up user files to an external SSD with high bandwidth and low latency, and applying compression likely makes the process slower.

  • greggyb a year ago

    Compression is useful even with directly attached storage devices. Disk IO is still slower than compression throughput unless you are running very fast storage.

    If your workload is IO-bound, then it is quite likely that compression will help. Most people, on their personal machines, would likely see IO performance “improve” with filesystem level compression.

kkfx a year ago

Oh, curious, it's the first backup tool in Clojure I've seen :-)

My personal recipe is less sophisticated:

- znapzend on all home machines sends regularly to a home server (with enough storage), partially replicated between desktops/laptop

- the home server backs itself up offsite via simple incremental zfs send + mbuffer, with one snapshot per day (last 2 days), one per week (last 2 weeks) and one per month (last 1 month)

- manually triggered offline local backups of the home server to external USB drives and a physically mirrored home server, normally on a weekly basis

Nothing more, nothing less. On any major NixOS release update I rebuild one home server and, a month or so later, the second one. Desktop and home server custom ISOs are built automatically every Sunday and just left there (I know; checking them simply took too much time, so...).

Essentially, in case of a machine fault I still have data, config and a ready ISO for a quick reinstall. In case of logical faults (like a direct attack that compromises my data AND ZFS itself) there is not much protection besides the different sync times (I do NOT use all desktops/laptops at once; when they are powered off they lag behind, and I normally have plenty of time to spot most casual potential attacks).

Long story short for anyone: when you talk about backups, talk about how you restore, or your backups will probably just be useless bits one day...

neilv a year ago

You can combine this with restricted SSH and server-side software, so that the client being backed up can only add new incremental backups to the server, not delete old ones.

(So, less data loss in the event of a malicious intruder on the client, or some very broken code on the client that gets ahold of the SSH private key.)
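
One way to wire that up is a forced command in authorized_keys on the server. For example with borg as the server-side software (restic's rest-server has an --append-only flag for the same idea):

    # one line in ~/.ssh/authorized_keys on the backup server:
    command="borg serve --append-only --restrict-to-path /srv/backups/client1",restrict ssh-ed25519 AAAA... client1-backup-key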

yehia2amer a year ago

Has anyone tried https://kopia.io/docs/features/ ?

It is awesome!

It's very fast; usually I struggle with backup tools on Windows clients. And it ticks all my needs: deduplication, end-to-end encryption, incremental snapshots with error correction, mounting snapshots as a drive and using it normally or to restore specific files/folders, caching. The only thing that could be better is the GUI, but it works.

  • mekster a year ago

    Backup tools are nothing until they can prove their reliability, which can only be proven with many years of usage.

    In that regard, I don't trust anything but Borg and zfs.

    • yehia2amer a year ago

      ZFS is not an option with Windows clients, or even most Linux clients. Also, finding this set of features elsewhere is really rare; not sure why! I am using ZFS on my server though!

pmontra a year ago

His backup rotation algorithm is very close to what rsnapshot does.

https://rsnapshot.org/

  • NelsonMinar a year ago

    I use rsnapshot still! It feels very old fashioned but it works reliably and is easy to understand.

    • russdill a year ago

      You may really like https://github.com/bup/bup if you want something a bit more modern but in the same style

      • pmontra a year ago

        > bup stores its data in a git-formatted repository.

        I understand the benefits for deduplication etc. but this is a show stopper for me. I greatly prefer to be able to navigate my backups with cd and ls or the file manager in the GUI and inspect the files directly without having to extract them first. After all I only have to backup a laptop and little else.

    • mekster a year ago

      It's good to keep multiple backups with different implementations local and remote.

      Rsnapshot is hard to break because it uses very basic principles: plain files on the filesystem and hard links. If your file system isn't ZFS, I think it's a viable backup strategy for the local copy, while you can use others to take remote backups.

LelouBil a year ago

Speaking of backups, I recently set up a backup process for my home server, including a recovery plan, and that makes me sleep better at night!

I have Duplicati [0], which does a backup of the data of my many self-hosted applications every day, encrypted and stored in a folder on the server itself.

Only the password manager backup is not encrypted by Duplicati, because it's encrypted using my master password, and it stores all the encryption keys of the other backups.

Then, I have a systemd service to run rclone [1] every day after the backups finish, to sync the backup folder to:

- Backblaze B2

- AWS S3 Glacier Deep Archive

For now I only use the free tier of B2 as I have less than a GB to back up, but that's only because I haven't installed Nextcloud yet!

However, I still like using S3 because I am paying for it (even though Deep Archive is very cheap), and I'm pretty sure that if something happens with my account, the fact that I'm a paying customer will prevent AWS from unilaterally removing my data (I have seen posts about Google accounts being closed without any recourse; I hope I'm protected from that with AWS).

Right now I only have CalDAV/CardDAV, my password manager and my configs being backed up, but I plan to use Syncthing to also back up other devices to the home server, to fit into what I already configured.

If anyone has advice on what I did/did not do/could have done better please tell me !

[0] https://www.duplicati.com/

[1] https://rclone.org/
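
(For anyone copying this setup, the rclone half can be as small as a oneshot service plus timer; remote names and paths below are invented:)

    # /etc/systemd/system/backup-sync.service
    [Unit]
    Description=Push local backups to B2 and S3 Deep Archive

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/rclone sync /srv/backups b2-remote:my-backup-bucket
    ExecStart=/usr/bin/rclone sync /srv/backups s3-remote:my-backup-bucket

    # /etc/systemd/system/backup-sync.timer
    [Unit]
    Description=Run backup-sync daily

    [Timer]
    OnCalendar=daily
    Persistent=true

    [Install]
    WantedBy=timers.target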

smm11 a year ago

I gave up on this at home long ago, and just use Onedrive for everything. I don't even have "local" files. My stuff is there, and in the event my computer won't start up I lose what's open in the browser. I can handle that.

At work I use Windows backup to write to empty SMB-mounted drives nightly, then write those daily to another drive on an offline Fedora box.

My super critical files are on an encrypted SD card I sometimes put in my phone when cellular connection is off, and this is periodically backed up to Glacier. The phone (Galaxy) runs Dex and can be my computer when needed to work with these files.

proactivesvcs a year ago

If one uses software meant for backups, like restic, there are so many advantages: independent snapshots, deduplication, compression, encryption, proper methods to verify backup integrity, and forgetting snapshots according to a more structured policy. Mount and read any backup by host or snapshot, multi-platform, single binary, and one can even run its rest-server on the destination to allow for append-only backups. The importance of using the right tool for the job, for something as crucial as backup, cannot be overstated.

litoE a year ago

All my backups go, via rsync, to a dedicated backup server running Linux with a large hard disk. But I still lose sleep: what if someone hacks into my home network and encrypts the file systems, including the backup server? Other than taking the backup server offline, I don't see how I can protect myself from a full-blown intrusion. Any ideas?

  • jerezzprime a year ago

    What about more copies? Have a copy or two in cloud storage, across providers. This protects against other failure modes too, like a house fire or theft.

  • greggyb a year ago

    ZFS snapshots are immutable, rendering them quite resilient to encryption attacks. This may alleviate some of your concern.

  • saltcured a year ago

    There's no perfect answer, since different approaches to this will introduce more complexity and inconvenience at the same time they block some of these threats. You need to consider which kinds of loss/disaster you are trying to mitigate. An overly complex solution introduces new kinds of failure you didn't have before.

    As others mention, backup needs more than replication. You recover from a ransomware attack or other data-destruction event by using point-in-time recovery to restore good data that was backed up prior to the event. You need a sufficient retention period for older backups depending on how long it might take you to recognize a data loss event and perform recovery. A mere replica is useless since it does not retain those older copies. With retention, your worry is how to prevent the compromised machines from damaging the older time points in the backup archive.

    The traditional method was offline tape backups, so the earlier time points are physically secure. They can only be destroyed if someone goes to the storage and tampers with the tapes. There is no way for the compromised system to automatically access earlier backups. You cannot automate this because that likely makes it an online archive again. A similar technique in a personal setting might be backing up to removable flash drives and physically rotating these to have offline drives. But, the inconvenience means you lose protection if you forget to perform the periodic physical rituals.

    With the sort of rsync over ssh mechanism you are describing, one way to reduce the risk a little bit is to make a highly trusted and secured server and _pull_ backups from specific machines instead of _pushing_. This is under the assumption that your desktops and whatnot are more likely to be hacked and subverted. Have a keypair on the server that is authorized to connect and pull data from the more vulnerable machines. The various machines do not get a key authorized to connect to the server and manipulate storage. However, this depends on a belief that the rsync+ssh protocol is secure against a compromised peer. I'm not sure if this is really true over the long term.

    A modern approach is to try to use an object store like S3 with careful setup of data retention policies and/or access policies. If you can trust the operating model, you can give an automated backup tool the permission to write new snapshots without being allowed to delete or modify older snapshots. The restic tool mentioned elsewhere has been designed with this in mind. It effectively builds a content-addressable store of file content (for deduplication) and snapshots as a description of how to compose the contents into a full backup. Building a new snapshot is adding new content objects and snapshot objects to the archive. This process does not need permission to delete or replace existing objects in the archive. Other management tools would need higher privilege to do cleanup maintenance of the archive, e.g. to delete older snapshots or garbage collect when some of the archived content is no longer used by any of the snapshots.

    The new risk with these approaches like restic on s3 or some ZFS snapshot archive with deduplicative storage is that the tooling itself could fail and prevent you from reconstructing your snapshot during recovery. It is significantly more complex than a traditional file system or tape archive. But, it provides a much more convenient abstraction if you can trust it. A very risk-averse and resource rich operator might use redundant backup methods with different architectures, so that there is a backup for when their backup system fails!

dawnerd a year ago

I’ve been using borg + rsync to a google drive and s3. Works great. Used it a few weeks ago for recovery and it went smoothly.

ndsipa_pomu a year ago

I'm using BackupPC https://backuppc.github.io/backuppc/ to do these kinds of backups. It does all the deduplication so the total storage is smaller than you'd expect for multiple machines with lots of identical files.

pjdesno a year ago

I've got a solution that I've used to back up machines for my group, but never did the last 10% to make it something plug-and-play for other folks: https://github.com/pjd-nu/s3-backup

Full and incremental backups of a directory tree to S3 objects, one per backup, and access to existing backups via FUSE mount. With a bit more scripting (mostly automount) and maybe shifting some cached data from RAM to the local file system it should be fairly comparable to Apple Time Machine - not designed to restore your disk as much as to be able to access its contents at different points in time.

If you're interested in it, feel free to drop me a note - my email is in my Github profile I think.

arichard123 a year ago

I'm doing something similar but running a zfs pool off a usb dock and using zfs snapshot instead of hardlinks. Usb is slow but it's still faster than my network, so not the bottleneck.

photochemsyn a year ago

I've been using command-line git for data backup to an RPi over SSH; once it's set up it's pretty easy to stay on top of, and then every once in a while I rsync both the local storage and the RPi to separate USB drives. Also, every 3-6 months or so, I rsync everything to a new USB drive and set it aside so that something like a system-wide ransomware attack doesn't corrupt all the backups.

blindriver a year ago

I use Synology to back everything up, and then from there I use Hyper Backup to back up to 2 external hard drives every week. When the hard drives get full, I buy a new one that is larger, put the old one into my closet, and date it.

Now that you reminded me, it might be best to buy a new larger hard drive if there are any pre-Christmas sales.

  • kevstev a year ago

    Have you looked into backing up to the cloud? I used to do this way back in the day, but by using AWS I get legit offsite storage. It's really cheap if you use Glacier, and I was actually looking this week and there is now an even cheaper option called Deep Archive. It costs me about $2 a month to store my stuff there. I just back up the irreplaceable things - my photos, documents, etc. All the other stuff is backed up on TPB or github for me.

    • blindriver a year ago

      I don't trust backing up to the cloud, I just do everything on site and hope there's nothing catastrophic!

armoredkitten a year ago

For anyone using btrfs on their system, I heartily recommend btrbk, which has served me very well for making incremental backups with a customizable retention period: https://github.com/digint/btrbk
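
For reference, a minimal btrbk.conf is short; from memory it's roughly this shape (paths and retention values are examples, check the docs):

    snapshot_preserve_min   2d
    snapshot_preserve       14d 8w
    target_preserve_min     no
    target_preserve         20d 10w 6m

    volume /mnt/btr_pool
      snapshot_dir btrbk_snapshots
      subvolume home
        target send-receive /mnt/backup/btrbk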

  • nickersonm a year ago

    I highly recommend this as well, although I just use it for managing snapshots on my NAS.

    For backup I use hourly & daily kopia backups that are then rcloned to an external drive and Backblaze.

Whatarethese a year ago

I use rsync to backup my iCloud Photos from my local server to a NAS at my parents house. Works great.

aborsy a year ago

ZFS send/receive is perfect, except there is almost no ZFS cloud storage on the receiving side. You have to set up a ZFS server offsite somewhere, like at a friend's house.

Restic is darn good too! It has integration with many cloud storage providers.

  • minideezel a year ago

    I've heard of zfs.rent, but I haven't used it, so I can't offer a review.

    Reddit also says rsync.net will accept a zfs send.

  • kim0 a year ago

    I always wondered about this! If anyone wants me to build this, let me know! ZFS encryption these days is good for data privacy (i.e I wouldn't be able to see your data!)

ranting-moth a year ago

I used to do similar things until I met Borg Backup. I highly recommend it.

emmelaich a year ago

> They're often sleeping, with the lid down...

You should be able to wake them up remotely with wake-on-lan.

alchemist1e9 a year ago

Let's not forget zbackup. An excellent, useful low-level tool.

europeanguy a year ago

This looks like work. Just get synching and stop complicating your life.

  • symlinkk a year ago

    Seriously, why reinvent the wheel?