Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?
First of all, this is purely a personal learning project for me, aiming to combine three of my passions: photography, software engineering, and my family memories. I have a large collection of family photos and want to build an interactive experience to explore them, à la Google Photos or Apple Photos.
My goal is to create a system with smart search capabilities, and one of the most important requirements is that it must run entirely on my local hardware. Privacy is key, but the main driver is the challenge and joy of building it myself (and, obviously, learning).
The key features I'm aiming for are:
Automatic identification and tagging of family members (local face recognition).
Generation of descriptive captions for each photo.
Natural language search (e.g., "Show me photos of us at the beach in Luquillo from last summer").
I've already prompted AI tools for a high-level project plan, and they provided a solid blueprint (e.g., Ollama with LLaVA, a vector DB like ChromaDB, and so on). Now I'm highly interested in the real-world human experience. I'm looking for advice, learning stories, and the little details that only come from building something similar.
What tools, models, and best practices would you recommend for a project like this in 2025? Specifically, I'm curious about combining structured metadata (EXIF), face recognition data, and semantic vector search into a single, cohesive application.
Any and all advice would be deeply appreciated. Thanks!
I think Immich checks a lot of these boxes
https://immich.app/
Immich is what I'm using right now. I'm running it in a Docker container on my Synology. It was very advantageous to spin up another docker container on my laptop to do the face recognition work because the Synology was going to take forever on it.
We're no longer auto-uploading to Google or Apple.
So far, I really like it. I haven't quite gone 100%, as we're still uploading with Synology's photo app, but Immich provides a much more refined, feature-rich interface.
If you want a solid "just upload the photos" experience, PhotoSync on iOS is really great.
I think you can use Immich to just look at a folder and not use the backup from phone bits.
I tried PhotoSync, but it feels like a really misleading name. When I delete photos from my phone, they don't delete on the sync destination, so it's not actually a "sync".
So my photo storage on my home server is getting filled with a bunch of useless images that I only have on my phone temporarily and that I end up deleting shortly after.
ACK. The best part is the one-time-pay option to unlock background sync with many different triggers that can be combined - mine runs at 03:00 with the charger connected and on my WLAN. Love the software.
May I ask: why not use Synology's own photo stack? The web UI is pretty good, the iPhone app is great, it runs locally without depending on Synology servers, and does have face recognition and all other features.
I didn’t want to be attached to the Synology system or hardware anymore. Synology Photos is great (and we’re still using it for the upload atm), but Immich lets me control the whole thing, top to bottom.
I’m running a DS1813+. It’s stopped getting new feature updates. This approach lets me keep the storage running while migrating away the server components.
Have you tried Immich? It is extremely polished and has every feature you mentioned, along with being open source with tons of community energy and no lock in.
To avoid the same debacle that already happened once with Video Station?
> We're no longer auto-uploading to Google or Apple.
May I ask why? Just curious as the main reason I use Immich is for the auto upload
Edit: Ugh. Can’t read. I somehow read don’t auto upload to Immich.
because you don't want your data being held by Google or Apple?
Self hosting and owning your own data
This. It's a fascinating project; it's hard to believe a FLOSS project can be this high quality. In my book it's on the level of Postgres (although it's a smaller project, probably).
Their frontend is amazing, their apps are not as performant, and the backend is (IMHO) the worst of them all.
No hate here, I'm really grateful for what they've achieved so far, but I think there's a lot of room for improvement (e.g., proper R/W query split, native S3 integration, faster endpoints, ...). I already mentioned it in their channel (they're a really welcoming community!) and I'm working on an alternative drop-in replacement backend (written in Go) [1] that will hopefully bring all the needed improvements.
TL;DR: It's definitely good, especially for an open-source project, and the team is very dedicated - but it's definitely not Postgres-good
[1]: https://github.com/denysvitali/immich-go-backend
Why the focus on S3 for a self-hosted app? Anyway, kudos for the effort; I'm not experiencing performance issues in my locally self-hosted Immich installation, but more performant software is always welcome.
S3 compatible means one can point it at any storage that talks S3, which is a lot more flexible than POSIX or NFS.
I have and love my self-hosted Immich install. If self-hosted Immich could also use S3 storage, that would let me use Garage (https://git.deuxfleurs.fr/Deuxfleurs/garage), which also lets me play games with growable/redundant storage on a pile of second-hand hard drives. IIRC it can only use a mounted block device at the moment (unless there is an NFS-exposed S3 translator ....)
A lot of existing tooling supports the s3 protocol, so it would simplify the storage picture (no pun intended).
I'm wondering the same thing. He had me until he said "S3".
Likely means S3 compatibility so it can be used with anything, be it a cloud provider or a locally hosted solution like minio
S3-compatible storage. In my case, Backblaze B2. The idea is to make the backend compatible with rclone, so that one can pick whatever storage they want (including B2 / S3 and others)
I back up my Immich photos to B2 with rclone, but I prefer having it as a separate process (also, the backup is append-only). I don't need "hyperscale", and storing directly on S3/B2/remotely slightly breaks the 3-2-1 rule I want to follow.
On B2 (and S3 storage in general) you can set a retention policy for what happens after you delete an object (e.g., object lock with retention for at least 30 days). Of course this is not a substitute for a backup - but it's better than discovering that you deleted your whole 1TB library when it's too late.
Looking at the world around me, so much of it is driven by open source. In fact, I can't name a single piece of electronics around me that isn't using it.
Most tend to be backend-only or much lower level. Open source projects with complex UIs and mobile apps are pretty rare, I think.
I would find that argument plausible if the comment I replied to didn't mention Postgres as the bar.
Again, Postgres is lower level software
Apologies, misread.
Been running immich on my home server for about a year now.
Near zero maintenance stack, incredibly easy to update, the client mobile apps even notify you (unobtrusively) when your server has an update available. The UI is just so polished & features so stable it's hard to believe it's open source.
This seems in stark contrast to others complaining enough about breaking updates that I haven't bothered to try it until it is deemed "stable".
Is it really that stable and flawless in terms of updates?
Because I'm sat here with ZFS, snapshotting and replication configured and wondering why people scare others off of it when the tools to mitigate issues are all free and should be used anyway as part of a bog-standard self-hosted stack.
I've only been running it for about a year (August last year) & from skimming those comments I get the impression I got in at the right time - there's a sense that they've improved stability a lot lately compared to what it was like & it may still be burdened with the fallout of reputational damage from that period.
I also trigger all my updates manually - the process itself is fully automated (a simple script that runs in seconds across my entire home server), but it's not on any schedule, so I'm not updating blind. That at least affords me the luxury of being present if/when anything breaks (though for Immich that has not occurred yet).
I've also been running it for a year or two now. There used to be a lot more "breaking" releases but that's slowed way down recently as they approach a "stable" release. As long as you don't use Watchtower or other tools that blindly update containers immediately, you're all set. When there are breaking changes, they are extremely clearly marked in the release notes with migration steps included. So as long as you read those you're all set
Been running Immich for a couple years now and it has been awesome. There are a few rough edges but I’m sure most of them will be smoothed out by the first stable release.
Haven't had great results with the AI portion though, even with the recommended model. Embeddings seem really poor, with lots of misses and false positives.
Given how good the new multimodal models are, I've been thinking it would be much better to just have a multimodal model describe the image, and let the searching be done by the already-included Meilisearch.
That said, due to reasons I haven't had time to mess with it past couple of months, so perhaps something drastic has changed.
I feel this is the only answer needed. It's also very average-person friendly.
I currently use PhotoPrism, but it's moving rather slowly. Facial recognition misses a lot of faces, and the automatic clustering works fine at first, but once you've tagged a few thousand faces the implementation grinds to a halt and the background worker runs for hours, pegging a single CPU core.
The dev is really reluctant to accept external contributions, which has driven away a lot of curious folks willing to contribute.
Immich seems to be the other extreme: moving really fast with a lot of contributors, but stuff occasionally breaks and the setup is fiddly - though the AI features are 100x more powerful. I just don't like the UI as much as PhotoPrism's. I wish there were some kind of blend of the two, on a middle ground between their dev philosophies.
While Immich releases a version every 2-3 weeks on average, and a breaking one every 4-6 months, they are approaching the stable release, so the pace should slow down a bit. The setup, to be honest, is pretty standard IMO.
This may not interest you, but Ente checks most of these boxes for me. It has face recognition and AI-based object search out of the box, and you can self-host their open-source server without any restrictions. The models they used might be useful for your project.
Ente is a tremendous suggestion. I don't know why I hadn't heard of it before. I don't think it meets what I'm looking for, but the fact that the software is completely open is impressive.
I currently use Ente.io as a secondary photo syncing service in addition to Google Photos.
While I really like it — snappy and encrypted — I was surprised by how much the missing Ultra HDR implementation affects me. Photos are currently uploaded with brightness information but not displayed with it. Therefore, my photos look great in Google Photos but far less vivid in Ente.
For what it's worth, I found a discussion about Ultra HDR. It doesn't seem to be a priority right now, though: https://github.com/ente-io/ente/discussions/779
Their pricing page doesn't say anything as far as I can find, but do you still pay Ente if you self-host the server as well as the photos ("S3-compatible object storage")?
> do you still pay Ente if you self-host the server as well as the photos ("S3-compatible object storage")?
No. (I self-host Ente and use their published ios app.)
The Ente self-hosting proposition seems strange. Why would I want to e2e encrypt my photos that I self-host? Sounds like it will only make life more difficult.
1. "Self-hosted" doesn't always mean "on your own hardware." Some people rent VPSes. This helps keep their data safe.
2. The software is provided without modification; I think it would be stranger to remove the encryption.
> Some people rent VPSes. This helps keep their data safe.
This is exactly how I self-host Ente and it has been great.
Machine learning for image detection has worked really well for me, especially facial recognition for family members (easy to find that photo to share).
I have the client on my Android mobile, Fire tablet (via F-Droid), and my Windows laptop.
My initial motivation was to replace "cloud" storage for getting photos copied off the phone as soon as possible.
TB-scale VPSes are not economical vs a home NAS. I see how that can be useful for smaller collections, though.
Because you want to access your photos remotely, or give access to more people to certain albums. If the point is to just store them locally and no remote access is needed, a hard drive would probably be enough.
That's why you need a server. e2ee does not help with any of that.
If there's a server involved, there's no reason not to have sensitive files and information end-to-end encrypted, whether self-hosting or not.
You do want to have things encrypted in transit and at rest. e2ee means server admins (I) cannot access the user's (mine) photos.
The server admin can still access their own photos via the client. They wouldn't be able to access the photos of other users.
edit: To explain further why it's almost always desirable:
You guarantee that you and your users' information is safe if the server is compromised, if an admin goes rogue, or if local bodies of power request their information from you.
The information can't be sent to third-parties by design.
Any operations / transformations that need to be applied to the information will have to either be done via homomorphic encryption or on the client-side (which is much more likely to be open source / easy-to-deobfuscate compared to blackbox server code).
I understand what e2ee is, thank you. I just don't think it’s justified for self-hosted photo servers.
E.g., “Any operations / transformations” includes facial recognition, CLIP embeddings, &c; you want to run this on the server, overnight, and be able to re-run it at a later date when new models become available. Under e2ee, that's a round trip through a client device at every model update. So that's a significant downside, for no important upsides in the case where you and your family are the only users.
I was explaining why e2ee has important upsides, not how e2ee works. With Ente (and I think Immich as well), facial recognition and generating new CLIP embeddings are done on-device[0], usually right when the photo is taken / before they're uploaded to the server.
[0] https://ente.io/blog/image-search-with-clip-ggml/
Immich does it on the server.
What happens if there’s a new, better model? You’d need to re-download, decrypt, and run inference on all your past media, which is in terabytes for many.
I understand the benefit of e2ee in a situation where there is no trust between user and admin. In personal self-hosting, that’s the same person (or family), and the upsides are not as relevant. The downsides (possibility of data loss for, e. g., kids who are not very good with passwords/keys; difficulties with updating models / thumbs; …) remain important, and outweigh the benefits, even assuming the e2ee is implemented well.
You do you, but the trust is beyond just admin and users. And family photos are treated as treasures. Data loss is a fair point, but if you're self-hosting a photos app I imagine server/db backups are part of your routine. Account recovery is all that's needed to recover lost photos from there. Well, unless your VPS is compromised in a manner of data loss for longer than you wished before your backups ran, in which case it's still better that such sensitive info was e2ee'd.
edit: also feel like I'm echoing the classic dropbox comment, but self-hosting in a sane and secure manner is harder than it's made out to be. It needs to be taken seriously.
You may want to self-host for your family or close friends while guaranteeing them privacy.
I'd prefer to guarantee they don't lose access, despite their key management practices.
e2ee makes it easier to sell their hosted version, and there's probably not enough incentive to justify the additional overhead of having an unencrypted option.
Also, my house is less secure than commercial data centers, so e2ee gives me greater peace of mind about data safety.
> Also, my house is less secure than commercial data centers, so e2ee gives me greater peace of mind about data safety.
I think you overestimate security of data centers.
At rest, you use full-disk encryption anyway, so the extra layer just makes things harder.
I have been building something like this but for personal use.
As of now, I use a SentenceTransformer model to chunk files, BLIP for captioning (“Family vacation in Banff, February 2025”), and MTCNN with InsightFace for face detection. My index stores captions, face embeddings, and EXIF metadata (date, GPS) for queries like “show photos of us in Banff last winter.” I'm working on integrating ChromaDB for faster searches.
Eventually, I aim to store indexes as per-photo JSON records (the sketch below is illustrative; the exact schema is still in flux). I also built a UI (like Spotlight Search) to search through these indexes.
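An illustrative record, with fields guessed from the pipeline above (the face embedding is truncated for brevity):

    {
      "path": "photos/2025/02/banff_042.jpg",
      "caption": "Family vacation in Banff, February 2025",
      "faces": [
        {"name": "person_1", "embedding": [0.12, -0.57, 0.08]}
      ],
      "exif": {"datetime": "2025-02-14T10:32:00", "gps": [51.1784, -115.5708]}
    }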
Code (in progress): https://github.com/neberej/smart-search
I don't know about the photo-management aspects. However, I've had very good experiences running gemma3 (4b and 12b) locally via Ollama.
I've used Gemma to process pictures and get descriptions, and also to answer questions about the pictures (e.g. is there a bicycle in the picture?). Haven't tried it for face recognition, but if you have already identified someone in one photo, it can probably tell you if the person in that photo is also in another photo.
Just one caveat: if you are processing thousands of pictures, it will take a while to process them all (depending on your hardware and picture size). You could also try creating a processing pipeline, first extracting faces or bounding boxes of the faces with something like OpenCV, and then passing those to gemma3, as sketched below.
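A minimal sketch of that two-stage pipeline, assuming the ollama Python client and OpenCV's bundled Haar cascade (model tag, prompt, and file names are placeholders, not a tested setup):

    import cv2      # pip install opencv-python
    import ollama   # pip install ollama; assumes a local Ollama server is running

    # Stage 1: cheap face detection with OpenCV's bundled Haar cascade.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread("group_photo.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Stage 2: hand each crop to a local multimodal model via Ollama.
    for i, (x, y, w, h) in enumerate(faces):
        crop_path = f"face_{i}.jpg"
        cv2.imwrite(crop_path, img[y:y + h, x:x + w])
        reply = ollama.chat(
            model="gemma3:4b",  # placeholder tag; use whatever you've pulled
            messages=[{"role": "user",
                       "content": "Describe this face in a few words.",
                       "images": [crop_path]}])
        print(crop_path, reply["message"]["content"])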
Please post repo link if you ever decide to open source
Thanks nico for sharing your experience! That's really helpful. The idea of using OpenCV to create a processing pipeline for face detection before passing it to Gemma is brilliant - I hadn't thought of that. I'll definitely look into using Gemma with Ollama.
And for sure, if I get this to a point where it's open-source, I'll post the link here!
I think a really valuable feature in a photo library app would be something that can identify sets of very similar or identical photos and decide which one is the "best" and offer to discard the rest.
I must be wasting so much storage on the four photos I took in a row of the family pose, or derivatives that got shared on WhatsApp and then stored back to my gallery, and so on - and I know I'm not the only one.
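For what it's worth, the usual building block for this is a perceptual hash. A minimal grouping sketch with the imagehash library (the threshold and folder layout are tunable assumptions):

    from pathlib import Path
    from PIL import Image
    import imagehash  # pip install ImageHash

    THRESHOLD = 6  # max Hamming distance to treat photos as near-duplicates (tunable)
    groups = []    # each entry: (representative hash, [paths])

    for path in sorted(Path("photos").rglob("*.jpg")):
        h = imagehash.phash(Image.open(path))
        for rep, members in groups:
            if h - rep <= THRESHOLD:  # '-' on hashes is Hamming distance
                members.append(path)
                break
        else:
            groups.append((h, [path]))

    for rep, members in groups:
        if len(members) > 1:
            print("Near-duplicates:", *members)

Picking the "best" photo of each group (sharpness, resolution, open eyes) is the genuinely hard part.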
Both Immich and PhotoPrism do have this feature: https://immich.app/docs/administration/system-settings#dupli... https://docs.photoprism.app/user-guide/library/duplicates/
Yeah. Lots of “culling” software out there but agree it would be a good feature for standard photo organisation apps
https://imagen-ai.com/
https://aftershoot.com/
It's not self-hosted, but https://ente.io/ is an independent commercial solution with E2E encrypted cloud storage and local AI (EDIT: apparently you can also self-host)
You can, in fact, self host it.
https://help.ente.io/self-hosting/
I've been running Nextcloud in Docker with the Recognize and Memories apps for about a year and half now. It's in an off-lease refurbished Dell Precision tower from 2018.
I'm using docker compose to include some supporting containers like go-vod (for hardware transcoding), another nextcloud instance to handle push notifications to the clients, and redis (for caching). I can share some more details, foibles and pitfalls if you'd like.
I initiated a rescan last week, which stacks background jobs in a queue that gets called by cron 2 or 3 times a day. Recognize has been cranking through 10k-20k photos per day, with good results.
I've installed a desktop client on my dad's laptop so he can dump all of the family hard drives we've accumulated over the years. The client does a good job of clearing up disk space after uploading, which is a huge advantage in my setup. My dad has used the OneDrive client before, so he was able to pick up this process very quickly.
Nextcloud also has a decent mobile client that can auto-upload photos and videos, which I recently used to help my mother-in-law upload media from her 7-year-old iPhone.
I run a pretty similar configuration on a Pi 4 with an external hard drive attached, which I offload to other hard drives from time to time. The mobile app auto-syncs specific folders when my phone is connected to the home network. It's not flying performance-wise, but I mainly need a backup solution.
Gonna check the apps that you mentioned. Feel free to share more details of your setup. Why are you running 2 instances? Edit: I see, probably for the Memories app.
Memories and Recognize work fine with the base Nextcloud docker image. My host has a GPU so I use go-vod to leverage hardware transcoding. The base NC docker image can't access Nvidia cards (probably other GPUs as well). I could script in a way to do this but would need to run it after each update. Recognize runs fine on my CPU so I haven't explored this yet.
I have an OpenMediaVault VM with a 10 TB volume on the network that runs the S3 plugin (MinIO-based), which is connected through Nextcloud's external storage feature (I want to migrate to Garage soon). I believe notify_push helps desktop clients cut down on the chatter when querying the external storage folder. Limiting the users that can access this also helps.
I was having issues getting the notify_push app [1] to work in the container with my reverse proxy. I found some similar setups that did this [2], so I added another Nextcloud container to the docker-compose YAML like so (tags, ports, and volume names simplified/illustrative):
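    notify_push:
      image: nextcloud:apache              # same image as the main instance; tag illustrative
      restart: always
      depends_on:
        - nextcloud
      environment:
        - PORT=7867
        - NEXTCLOUD_URL=http://nextcloud   # internal URL of the main container
      volumes:
        - nextcloud:/var/www/html:ro       # share the main instance's files, read-only
      entrypoint: /var/www/html/custom_apps/notify_push/bin/x86_64/notify_push /var/www/html/config/config.php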
[1] - https://apps.nextcloud.com/apps/notify_push
[2] - https://help.nextcloud.com/t/docker-caddy-fpm-notify-push-ca...
Who's your host?
Self-hosted at home.
I'm using OPNsense as the main firewall/router, with the HAProxy plugin acting as reverse-proxy. Cloudflare DNS proxies my home IP address and keeps it hidden from the public, and the DDNS plugin in OPNsense updates the A record in CF when my ISP changes my public IP address every few months.
Do you know if there's any way to `Recognize this image now` via the GUI? Whilst twiddling my thumbs waiting for a few decades of photos to import, I go through what's there, and occasionally a family member will point and tell me who a pic is of, but I can't seem to immediately prioritise or `Recognize` that specific pic so I can add a name to the face.
No way to do it straight from the GUI. If you install the OCC web terminal plugin, you can use a command to get Recognize to scan for new files. Properly configured notify_push and cron jobs should get Recognize going within 10-15 minutes of a new file upload, but it depends on what else is in the queue and the server's processing power. The initial runs need to finish before any of this is relevant, though.
Once you get everything ingested and the initial classifications and clustering done, the process runs pretty quickly as you upload new photos.
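For reference, a manual trigger via the OCC web terminal (or docker exec) looks roughly like this - command names as I recall them from the Recognize README, so verify with `occ list recognize` first:

    # Run inside the Nextcloud container / OCC web terminal
    php occ recognize:classify       # classify files that haven't been processed yet
    php occ recognize:cluster-faces  # (re)cluster detected faces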
i swear the single best feature for me would be:
take my photo catalog stored in Google Photos, Apple Photos, OneDrive, and Amazon Photos; collate it into a single store; dedupe. Then build a proper timeline and geo/map view for all the photos.
Take a look at something like rclone and it immediately becomes clear that the photo app vendors you listed have no interest in allowing their users to easily access their data programmatically from their services in any meaningful way.
Example: https://rclone.org/googlephotos/#limitations
Glaring example:
> The current google API does not allow photos to be downloaded at original resolution. This is very important if you are, for example, relying on "Google Photos" as a backup of your photos. You will not be able to use rclone to redownload original images. You could use 'google takeout' to recover the original photos as a last resort
(and semantically index/search, face recognition... what else does AI get us these days?)
iPhoto used to do this. The Mac Photos app that replaced it is nowhere near as good.
In fact I would go so far as to say my personal photo management never really recovered from the transition.
It's a pretty deep rabbit hole. For semantic search, CLIP and cosine similarity are just fine. SmolVLM(2), mentioned by spacecadet, looks interesting though. I haven't integrated face recognition myself, but [deepface] seemed pretty complete.
I focused more on fast rendering in [photofield] (quick [explainer] if you're interested), but even the hacked-up basic semantic search with CLIP works better than it has any right to. Vector DBs are cool, but what is cooler is writing float arrays to sqlite :) (a minimal sketch follows after the links)
[deepface]: https://github.com/serengil/deepface
[photofield]: https://github.com/SmilyOrg/photofield
[explainer]: https://lnar.dev/blog/photofield-origins/
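For reference, "CLIP + cosine similarity + float arrays in SQLite" can be as small as this - a sketch assuming the sentence-transformers CLIP checkpoint and an illustrative schema, not photofield's actual internals:

    # Illustrative model name and schema; photofield's real implementation differs.
    import sqlite3
    import numpy as np
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("clip-ViT-B-32")  # CLIP image+text encoder
    db = sqlite3.connect("photos.db")
    db.execute("CREATE TABLE IF NOT EXISTS emb (path TEXT PRIMARY KEY, vec BLOB)")

    def index_photo(path):
        # Normalized embeddings make dot product equal cosine similarity.
        vec = model.encode(Image.open(path), normalize_embeddings=True)
        db.execute("INSERT OR REPLACE INTO emb VALUES (?, ?)",
                   (path, vec.astype(np.float32).tobytes()))
        db.commit()

    def search(query, top_k=10):
        q = model.encode(query, normalize_embeddings=True)
        scored = [(p, float(np.dot(q, np.frombuffer(b, dtype=np.float32))))
                  for p, b in db.execute("SELECT path, vec FROM emb")]
        return sorted(scored, key=lambda s: -s[1])[:top_k]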
Slightly off topic: I have always wanted (old) Apple to make a Time Machine / personal cloud where data is stored and processed on my property, with Apple only offering subscription-based long-term cloud backup and software updates.
As for features: I don't know why there isn't a tag for screen caps. I make lots of them and I want to group them together.
That sounds similar in concept to the original Apple TV (before the black puck one) that had a hard drive and basically ran Front Row (view your photos on your TV etc), but combined with the oldskool Apple Time Capsule.
Nextcloud with a few addons. Now this might look like overkill for your use case but I get the impression that you might want to go further in future.
Stock NC gets you a very solid general purpose document management system and with a few addons, you basically get self hosted SharePoint and OneDrive without the baggage. The images/pictures side of things has seen quite a lot of development and with some addons you get image classification with fairly minimal effort.
The system as a whole will quite happily handle many hundreds of thousands of files on pretty rubbish hardware, if you are happy to wait for batch jobs to run - or you can throw more hardware at it and speed up the job schedules.
NC has a stock phone app which works very well these days, including camera folder uploads. There are several more apps that integrate with the main one to add optional functionality, for example notes and VoIP.
It is a very large and mature setup with loads of documentation and hence extensible by a determined hacker if something is missing.
This is my dream. I started building something that would upload all my photos from my phone to my desktop, back them up somewhere and then present them 6 at a time on a local website solely so you could look at them again and decide if you wanted to keep them. Heart any you wanted to keep, favorite some, and delete the rest then show me 6 more.
The addition of an AI tool is a great idea.
The gallery I use has an "internals" page in their docs: https://docs.home-gallery.org/internals/
It gives a sort of high level system overview that might provide some useful insights or inspiration for you.
In addition to all of that I want an AI solution that pre-selects good images for me, so I do not have to go through all of them manually. Similar to Apple Memories or Featured Photos. Is there anything self-hosted like that?
https://immich.app/
https://ente.io/
https://photonix.org/
https://github.com/LibrePhotos/librephotos
https://github.com/photoprism/photoprism
I wanted to like PhotoPrism because, unlike Ente and Immich, it supports SQLite and doesn't require Postgres (I want to keep home-lab maintenance to a minimum), but the UI was difficult to like and I couldn't get hardware encoding working on my Intel N100 GPU.
What about Postgres isn't low-maintenance?
The ball-ache of SQLite not scaling outweighs any "maintenance" Postgres needs (it really is just set-and-forget and use a Docker container to schedule database backups—whole thing takes a couple minutes).
“SQLite doesn’t scale” is a common belief, but simply not true. It has limitations on certain sorts of loads, but you won’t run into them on this sort of app—in fact, I would expect consistently better performance from SQLite than from PostgreSQL in typical apps like this. And then SQLite is definitely easier to maintain, being just files.
I'm only just starting out my on-prem photo library and have a couple of thousand photos, which SQLite with WAL seems to have no problems with.
Also, considering the type of workload: I imagine photo albums are write-heavy during imports but read-heavy afterwards, which SQLite should excel at. I'll mostly be syncing pictures from our phones, and it'll be me and the wife using it. Postgres is overkill for my needs.
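For anyone curious, WAL is a one-pragma switch; a minimal sketch (the synchronous tweak is a common companion setting, not required):

    import sqlite3

    conn = sqlite3.connect("photos.db")
    conn.execute("PRAGMA journal_mode=WAL;")    # readers no longer block the writer
    conn.execute("PRAGMA synchronous=NORMAL;")  # common pairing with WAL; fewer fsyncs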
What about having to do db migrations across major updates?
Have you tried all of these? How are they with very large photo collections?
I've used PhotoPrism and Immich. Everyone's definition of "very large" is different; I have about 100k photos and videos, which come to a bit over 1 TiB (original data, not thumbnails and previews). Neither had any performance issues, with a few minor exceptions on Immich (I don't recall anything from PhotoPrism, but it has been a while since I switched):
1. The Immich app's performance is awful. It is a well known problem and their current focus. I have pretty high confidence that it will be fixed within a few months. Web app is totally fine though.
2. Some background operations, such as AI indexing, face detection, and video conversion, don't restart gracefully from scratch. They basically delete all the old data first, then start processing assets. So for many days (depending on your parallelism settings and server performance) you may be completely missing some assets from search or converted videos. But you only need to do this very rarely (when you change encoding settings and want to apply them to the back catalog, or switch AI search models). I don't upload at a particularly high rate, but my server can very easily handle the steady state.
1 is pretty major, but it's being worked on, and you can work around it by just opening the website. 2 is less important, but I don't think there is any work on it.
There are some spectacular local models for generating text descriptions of images now. I suggest starting with Mistral Small 3.2, Gemma 3 and Qwen 2.5VL - all available via Ollama.
I expect we will see a Qwen 3VL soon.
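For a quick local test, the Ollama CLI accepts an image path in the prompt for multimodal models (the model tag is an assumption; check `ollama list` for what you have):

    ollama pull gemma3:12b
    ollama run gemma3:12b "Describe this photo in one detailed paragraph: ./photos/beach.jpg"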
Dedupe over edited photos and handling highly approximate date information are my "nobody has this right yet" criteria.
I have used https://www.photoprism.app/ and have found the face recognition to work quite well.
Photoprism is ok, but the AI features of immich are far superior
What value are you finding in the AI?
Haven’t tried it yet (I’d love to find something like this too) but I saw a conference talk on https://docs.voxel51.com/ that looked pretty interesting. It is kind of a data frame for images with a GUI for exploring them. They make it pretty easy to rip various models over your images to add tags, and to evaluate the results.
Are any of these systems doing true image-based entity resolution? It seems like it's only pairwise similarity checking. If you're trying to index, say, 20 years of family photos, how do they link kindergarteners to their adult images?
I would try the Qwen models before LLaVA
Do you need the embeddings to be private? Or just the photos?
For photo indexing I'd run CLIP directly and save on compute, no need to use a whole language model.
It looks as if you're primarily using a phone to view and share? We often share (visually) via our living room TV (via an attached computer). Is that something you're looking to incorporate?
https://www.digikam.org/ does a lot of what you're looking for.
Not web based, and really starts to show its age.
I don't think the OP specified web based?
Personally I'd love a separate thing that could crawl the photos in a folder I point it to and then let me search using semantics and natural language. But can it please just be an exe I can double click when I need it? If it involves maintaining a server or faffing about with Docker I'm probably not going to bother.
I'm also curious as to the best local high-quality background removal, such as for graduation images where people are wearing tassels.
Flux Kontext is probably the best for local for a few reasons, but it's slow, uses a lot of VRAM, and changes the quality and resolution. Amazing results if you want <2MP final images, though.
If you need a detailed mask for editing in another application, Florence-2 or SAM. Or rembg for decent all-purpose one-shot removals, as long as you have a touch-up process or don't mind rerunning the failures.
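For the rembg route, one-shot usage is tiny (file names are placeholders):

    from PIL import Image
    from rembg import remove  # pip install rembg

    img = Image.open("graduation.jpg")
    cutout = remove(img)                  # RGBA image with the background removed
    cutout.save("graduation_cutout.png")  # PNG keeps the alpha channel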
Stable Diffusion (Web UI or whatever) has add-ons (e.g. rembg) that were really good at this last time I checked
I'm still old school: Syncthing + PhotoPrism. Perhaps I should give Immich a better look
I believe Ente supports all of this, and can be self-hosted. All of the AI stuff is done locally.
I pay them for service/storage as it’s e2ee and it doesn’t matter to me if they or I store the encrypted blobs.
They also have a CLI tool you can run from cron on your NAS or whatever to make sure you have a complete local copy of your data, too.
https://ente.io - if you use the referral code SNEAK we both get additional free storage.
I built this same solution for myself last year, using Hugging Face's SmolVLM. It works surprisingly well. I use the model to generate verbose descriptions of each image, then embed the descriptions using another model, which I also use for the query embedding.
The stack is hacky, since it was mostly for myself...
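The core describe-then-embed loop is small. A sketch under assumed model names, with describe() as a hypothetical stand-in for the SmolVLM call:

    # Embedding model name is an assumption; describe() stands in for the VLM call.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def describe(photo_path):
        raise NotImplementedError("call your local VLM (e.g. SmolVLM) here")

    paths = ["a.jpg", "b.jpg"]  # placeholder photo list
    vecs = embedder.encode([describe(p) for p in paths],
                           normalize_embeddings=True)

    def search(query, top_k=5):
        q = embedder.encode(query, normalize_embeddings=True)
        order = np.argsort(-(vecs @ q))[:top_k]  # cosine similarity via dot product
        return [paths[i] for i in order]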
Photoprism and Immich
From all the comments I've been reading, this combination seems solid. I'll definitely be checking it out thoroughly.
The browser. Just pure JavaScript, HTML, CSS, and WebGPU running in a bulletproof sandbox.