Show HN: Dayflow – A git log for your day

465 points by jerryliu12 3 days ago

Hi HN! I've been building Dayflow, a macOS app that automatically tracks what you're actually working on (not just which apps you have open).

Here's what it does:

- It creates a semantic timeline of your day;

- It does it by understanding the content on your screen (with local or cloud VLMs);

- This allows you to see exactly where your time went without any manual logging.

Traditional time trackers tell you "3 hours in Chrome" which is not very helpful. Dayflow actually understands if you're reading documentation, debugging code, or scrolling HN. Instead of "Chrome: 3 hours", you get "Reviewed PR comments: 45min", "Read HN thread about Rust: 20min", "Debugged auth flow: 1.5hr".

I was an early Rewind user but rarely used the retrieval feature. I built Dayflow because I saw other interesting uses for screen data. I find that it helps me stay on track while working - I check it every few hours and make sure I’m spending my time the way I intended - if I’m not, I try to course correct.

Here’s what you need to know about privacy:

- Run 100% locally using qwen2.5-vl-3b (~4GB model)

- No cloud uploads, no account

- Full source available under MIT license (https://github.com/JerryZLiu/Dayflow)

- Optional: BYO Gemini API key for better quality (stored in Keychain, with free-tier workaround to prevent training on your data)

The tech stack is pretty simple: SwiftUI with a local SQLite DB. It uses native macOS APIs for efficient screen captures. Since most people who run LLMs locally already have their tool of choice (Ollama, LM Studio, etc.), I decided to not embed an LLM into Dayflow.

By far the biggest challenge was adapting from SOTA vision models like Gemini 2.5 Pro to small, local models. My constraints were that it had to take up <4GB of RAM and have vision capabilities. I had to do a lot of evals to figure out that Qwen2.5VL-3B was the best balance of size and quality, but there was still a sizable tradeoff in quality that I had to accept. I also got creative with sampling rates and prompt chunking to deal with the 100x smaller context window. Processing a 15 minute segment takes ~32 local LLM calls vs 2 Gemini calls!
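
To make the sampling-and-chunking idea concrete, here is a minimal sketch. It is not Dayflow's actual code: the function `plan_segment`, the 30-second sampling interval, and the one-frame-per-call batching are all assumptions chosen for illustration. It only shows how subsampling a recording and batching frames determines the number of local model calls.

```python
from dataclasses import dataclass


@dataclass
class CallPlan:
    frame_times_s: list   # timestamps (seconds) of the sampled frames
    batches: list         # frames grouped per vision-model call
    total_calls: int      # one call per batch plus one merge call


def plan_segment(segment_s: int, sample_every_s: int, frames_per_call: int) -> CallPlan:
    """Hypothetical planner: subsample a recording, then group frames per call."""
    if segment_s <= 0 or sample_every_s <= 0 or frames_per_call <= 0:
        raise ValueError("all arguments must be positive")
    # Keep one frame every `sample_every_s` seconds instead of every frame.
    frame_times = list(range(0, segment_s, sample_every_s))
    # Group sampled frames so each call fits a small context window.
    batches = [frame_times[i:i + frames_per_call]
               for i in range(0, len(frame_times), frames_per_call)]
    # Assumption: one describe call per batch, plus one text-only merge call.
    return CallPlan(frame_times, batches, len(batches) + 1)


plan = plan_segment(segment_s=15 * 60, sample_every_s=30, frames_per_call=1)
print(len(plan.frame_times_s), plan.total_calls)  # 30 31
```

With these assumed values a 15 minute segment needs 31 calls, which is in the same ballpark as the ~32 mentioned above; the real pipeline may split the work differently.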

Here’s what I’m working on next:

Distillation: Using Gemini's high-quality outputs as training data to teach a local model the patterns it needs, hopefully closing the quality gap.

Custom dashboards where you can track answers to any question like "How long did I spend on HN?" or "Hours until my first deep work session of the day".

I'd love to hear your thoughts, especially if you've struggled with productivity tracking or have ideas for what you'd want from a tool like this.

andrewmutz 3 days ago

You should sell this to lawyers and other professionals who bill per hour, so they can reconstruct their billables for the day without missing anything. They would pay big money for something that recovered forgotten (unbilled) work throughout the day.

1zael 2 days ago

Devil's advocate opinion - this would also show how little per hour lawyers spend working :)
- protocolture 2 days ago
  
  One of my first ever bug reports was a submission to a company that made legal software.
  In particular, it was a document management system built as a plugin for MS Outlook. (ew)
  Most users had no issue. However, for one user in particular, a lawyer, the application would crash after she opened and closed a bunch of documents (using the built-in PDF viewer), taking Outlook with it and often requiring a restart.
  I went over to view the behavior, and she was some kind of robot. Unlike her peers, she had 12 documents open at once, and she could update and bill (in minimum 6 or 7 minute increments) 12 customers' cases in 15 minutes. It was like meeting the Usain Bolt of law practitioners. My back of the napkin math is that she billed like 3-4 hours for every hour she was online.
  Open Email
  Load Attachment
  Review Attachment
  Reply to email
  Assign Email thread to case number
  Close attachment.
  12 times in 15 minutes.
  The bug was that, after loading ~6 pdfs, the application would back off and wait to deallocate the memory. It would then later, randomly decide to write to that memory when another pdf was loaded, and go kaput.
  Just to replicate the issue, I had to close and reopen pdfs so quickly my hands hurt.
  It took 3 revisions of the bug report to get the software company to accept it and resolve it. And even then I think the pdf limit just increased, before we submitted another report and had it resolved permanently.
  On that note, the principal of another law firm I supported would require us to cleanse his personal laptop of porn themed golf games he had downloaded on a regular basis.
  The impression I get is that, lawyers work but the work is just unevenly distributed.
  
  squigz 2 days ago
  
  Were they porn-themed golf games, or golf-themed porn games? This is an important distinction.
  
  protocolture 2 days ago
  
  Digging deep into my memory, I recall that the user had at least one instance of Strip Putt Putt. I can't conceive of how that answers your question but it's the best I can do.
  
  NetOpWibby 2 days ago
  
  It’s never occurred to me that either existed but…rule 34, I guess.
  
  netsharc 2 days ago
  
  Something like Strip-Golf? For every under par she (or he) takes off an item of clothing. For every over they put on one, ha.
  Sounds like a game Epstein and Trump (and some "enigmas") could've played at Mar-A-Lago. With cheating, though.
- darkstar_16 2 days ago
  
  I get the sentiment behind your comment but I have a few lawyers in the family and they work round the clock. They might be in meetings or pouring over documents all day that might not look like work to the average software engineer but trust me, they do work hard. And it's true for everyone - from junior interns to senior partners.
  
  throw-qqqqq 2 days ago
  
  > They might be in meetings or pouring over documents all day
  FWIW it’s “poring over” when reading carefully.
  From Merriam-Webster
  “As a verb, pore means "to gaze intently" or "to reflect or meditate steadily." The verb pour has meanings referring to the falling or streaming of liquid (or things that move like liquid).”
- bryanrasmussen 2 days ago
  
  As a general rule I find lawyers are real honest about their hours, especially as you get to bigger firms.
  Anybody have a different experience?
- s1mplicissimus 2 days ago
  
  > Devil's advocate opinion
  Not sure if the pun was intended, but I'm here for it
- ramonga 2 days ago
  
  lawyers have minimum billable time, if they reply to an email in 1min they will still bill you 10-15min. Ask me how I know :(
  
  andreasmetsala 2 days ago
  
  Wouldn’t you? If I switch context and interrupt my flow to answer a question I’m losing at least 20 mins to regaining focus, why shouldn’t that be reflected in billing?
  Knowledge work is knowledge work, no point belittling colleagues in a different profession.
  
  protocolture 2 days ago
  
  That's how MSPs operate too. At least the good ones. Billing increments are sometimes as low as 6 minutes, or as high as 30. 15 minutes is average in my experience.
MollyRealized a day ago

I'm a litigation legal admin - I have been for 25-30 years. I instantly brought this up to an associate, telling them, "Maybe not now, but before you retire, this'll be the norm in the industry."
She had been complaining the day before about having to reconstruct a huge bunch of little 0.1 entries involving e-mails to various individuals in cases. If it could be done automatically, through a local LLM? chef's kiss
Trust me, law is definitely where you want to land this thing.
In all honesty, I have absolutely no negotiating power or decision-making authority for my firm, but it's a big one -- if that's a direction you want to go, I can't guarantee I can swing enough weight, but I probably could find you the right people to talk to and give you an introduction.
- MollyRealized a day ago
  
  I'll also have to add, though, that you'd have to figure out a way for it to be cross-platform or live outside just macOS. Unfortunately, that's a very uncommon choice in the legal world (or anywhere else).
whalesalad 3 days ago

I’m a software contractor and I’ve wanted this forever. I’m prototyping something on Linux now.
- resonious 2 days ago
  
  Let me know if this Linux app gets anywhere!
  
  sally_glance 2 days ago
  
  I've used https://github.com/ActivityWatch/activitywatch before on Linux, it's actually quite good but takes a fair bit of fiddling to get good results.
  
  thalesac 2 days ago
  
  There's no plugin for screenshot or recording in 1fps.
  
  sally_glance 2 days ago
  
  Yeah, but it's written in a modular way and extending it is not as painful as one would expect. I actually went that way and wrote a couple of custom watchers for things like that.
- thalesac 2 days ago
  
  +1! I was actually looking to fork ActivityWatch to support your exact flow.
mellosouls 3 days ago

Per hour? In the UK they bill by the 6-minute! If ever anything told you something about a profession...
- typpilol 2 days ago
  
  They sell in blocks of 1/10th hours?? That's gotta be the jail call special
  
  jedberg 2 days ago
  
  Every lawyer in the USA that I've ever worked with also bills in 6 minute increments. Which means every email is 6 minutes. Every phone call is at least 6 minutes. etc.
  
  PMunch 2 days ago
  
  Isn't this better though? The alternative being that every email or phone call is an hour, or that they batch it up based on gut feel. If I get a phone call, that could easily steal 6 minutes of focus time (note that I specify focus time: even if the call is only 30 seconds you'd have to mentally switch tasks back and forth).
  
  notpushkin 2 days ago
  
  I’m not sure that 6 minutes is a useful denomination of focus time. Maybe legal work is too different from IT and it makes sense there, but when I need to focus on something, 15 minutes is the smallest amount I would allocate for a task.
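
The increment billing discussed in this subthread (6 minute blocks, i.e. 0.1 hour entries, or 15 minute blocks at some MSPs) comes down to rounding each task up to the next block. A minimal sketch, where `billable_hours` is a hypothetical helper and not part of any billing product:

```python
import math


def billable_hours(minutes_worked: float, increment_min: int = 6) -> float:
    """Round time worked up to the next billing increment, returned in hours."""
    if increment_min <= 0:
        raise ValueError("increment_min must be positive")
    if minutes_worked <= 0:
        return 0.0
    # Any started increment is billed in full.
    units = math.ceil(minutes_worked / increment_min)
    return round(units * increment_min / 60, 2)


print(billable_hours(1))        # 0.1  (a one-minute email bills as 6 minutes)
print(billable_hours(7))        # 0.2
print(billable_hours(1, 15))    # 0.25 (15 minute increments)
```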

laurieg 3 days ago

Really nice! I currently use ActivityWatch for tracking tasks on PC.

Some things I would like to be able to do with software like this:

- Identify the 'spark' of a distraction. For example, opening my email inbox to read a specific email also shows me many unrelated emails. These can easily be the cause of a 5-15 minute distraction. This information is often actionable. I installed browser plugins to hide my youtube suggested videos and my distractions went down. I made sure to close all unused windows to avoid catching a glimpse of unrelated work.

- Identify repeated tasks, and the cadence of those tasks. Do I manually make an invoice once a week for a particular edge case? Is the process basically identical every time. Could this be automated?

- How was I feeling before, during and after a task. (This is a very broad and intentionally not well-defined question, but I think it has the most promise for improving procrastination and task initiation).

jerryliu12 3 days ago

Yep, helping people understand their distraction patterns would be an amazing feature. I find myself doing the same thing, funnily enough I also have that same Youtube extension.

rw2 2 days ago

I love the product concept but the fact this person has an almost empty github and suddenly launches an app that can easily be spyware concerns me a lot :). A lot of security concerns with password etc.

astafrig 2 days ago

Could easily dismiss those concerns by looking at the source code instead of snooping around their profile, especially if you’re on GitHub anyway :)
- owenpalmer 2 days ago
  
  Not saying the author has this intent, but if I was trying to spy on people, I wouldn't include spyware in the initial release...

yewenjie 3 days ago

I would not be comfortable sending my bank info, passwords, and all sorts of other sensitive data that I input and see on my screen to Gemini. How big is the qualitative performance difference with a local model?

jerryliu12 3 days ago

If I had to put a grade on my own experience and evals, Gemini 2.5 Pro produces A- results and qwen2.5vl is maybe like B-/C+. Obviously everything's nondeterministic, so it's hard to guarantee a level of quality.
I'm reading through papers that suggest it should be possible to get SOTA performance on local models via distillation, and that's what I'll experiment with next.
- zbrw 2 days ago
  
  Any insights on qwen-3 omni yet?
  
  jerryliu12 2 days ago
  
  Looks awesome, but a 30B model is too big. Vast majority of people probably have 32GB of RAM or less unfortunately.
CIPHERSTONE 2 days ago

Also, if you're not using an enterprise edition of Gemini where your data is not used for model training, your sensitive data prompts and responses are 100% available to Google.
muzani 2 days ago

Google owns my email, browser, phone operating system, and a small amount of passwords. I assume that it has already stolen all my confidential data by now.
nemo1618 2 days ago

Your passwords should never be visible on screen anyway: They go straight from a password manager into a censored input field.

LocalPCGuy 2 days ago

I haven't seen this mentioned, but I immediately thought this could be a great tool for folks with ADHD. There's potential for seeing what kinds of things regularly trigger distraction (I know, everything, right?) and any patterns that exist (e.g. every time I make a git commit, I go check Hacker News and lose 15 minutes). Being able to review a day that was captured automatically is also huge. The best success I had with tracking what I did was when I used TimeRescue to ensure I had an accurate record of hours for clients, but every attempt to use anything that requires manual entry fails very quickly (either it's too distracting every time I use it, or I literally just forget to use it).

Going a step further, it could work in "real time" (given processing delay) to help me stay on task when the focus has shifted to something unrelated (maybe allow the individual to define this or say yes/no to train the prompts as it goes).

Anyways, it looks great. I also liked the _idea_ of Windows Recall, so to see something like this that can be privacy first is really nice.

olex 2 days ago

Pretty nice. How does it handle multiple displays? I've set it up with local Ollama, and it seems to only record and analyze one of my two secondary displays. It would be ideal if I can select which one is used if the recording is limited to a single display, or even better if it can record and analyze the entire multi-monitor desktop surface.

edit Nvm, it seems it always records the display that is currently in focus. That is probably the better way to handle it, since it automatically solves the "ignore what's shown but not interacted with on secondary displays" problem.

LocalPCGuy 2 days ago

This was my question also, I think "even better if it can record and analyze the entire multi-monitor desktop surface" would be the best option. I don't know what the impact of that would be on both recording size and AI processing time, but just because one monitor is focused doesn't always mean what's happening on another is ignored. Some examples: an ongoing meeting or watching a video on one screen while taking notes on another; or coding on one screen and a browser/app auto-refreshing on another.
jerryliu12 2 days ago

Yep, you figured out how it works! That was the easiest solution I could come up with. I'm sure there's additional context on other screens but this was a good 90/10 solution.
thalesac 2 days ago

I see a potential issue where you're in a Zoom call on one monitor and working on something else on the other (multitasking). How would it handle that?

jappwilson 2 days ago

Similar Idea to screenpipe. That gives you more customization:

github.com/mediar-ai/screenpipe

louis030195 2 days ago

screenpipe founder here, love to see more products in this area, ideally OSS, local, no lock in, API/MCP friendly
kind of sad it's macos only, i'm mostly windows user now :)
- jappwilson 2 days ago
  
  side effect of working for enterprise customers. congrats on the fund raise.

requilence 2 days ago

Great project! I’ve had a similar experience with Rewind and the related privacy concerns. A quick thought: if I recall correctly, Rewind performs OCR locally, so it only needs to send textual data. Since you’re focusing on macOS, you could rely on VNRecognizeTextRequest and skip the extra OCR complexity. It might also help to detect and mask sensitive information with lightweight models (e.g., BERT), especially when leveraging cloud-based AI.

jerryliu12 2 days ago

Woah didn't know about VNRecognizeTextRequest, that's super cool thanks for flagging!
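
As a rough illustration of the masking idea above, here is a sketch that redacts OCR text before it would be sent to a cloud model. It swaps in plain regular expressions rather than the lightweight BERT-style model suggested in the comment, so it only catches obvious patterns. The function `redact` and both patterns are hypothetical, and the input is assumed to be text already extracted by an OCR step such as VNRecognizeTextRequest.

```python
import re

# Assumed patterns for illustration only; real detectors need far more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
LONG_NUMBER = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def redact(ocr_text: str) -> str:
    """Replace email addresses and card-like digit runs with placeholders."""
    masked = EMAIL.sub("[EMAIL]", ocr_text)
    return LONG_NUMBER.sub("[NUMBER]", masked)


sample = "mail bob@example.com card 4111 1111 1111 1111 ok"
print(redact(sample))  # mail [EMAIL] card [NUMBER] ok
```

A regex pass like this is cheap enough to run on every frame, but it will miss anything that does not match a known pattern, which is where a learned model would help.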

ahoog42 2 days ago

If you are on a Zoom/video call, does anyone know if you would have to declare that you're "recording" it? I'm thinking more from the legal perspective of wiretapping/consent laws. If you have live transcripts/subtitles, does that change any legal requirement?

hx8 2 days ago

Yes, in my state it's generally illegal to take a screenshot of a zoom call without announcing you are recording the conversation. I'm not certain, but I think the issue is the storage of the 1fps video, not the AI summary.

tmychow 3 days ago

Woah this is fab; much less cognitive load than manually using a time tracker. And I'm glad that there's a local option and a "BYO key" option for privacy!

Feel like something of this shape should have existed for a while, but this is very well executed!

r0bbie 3 days ago

I'd only ever consider doing it with a local model, but this looks really cool!

jerryliu12 3 days ago

Thanks! Between my friends and me, it's about a 50/50 split between local and cloud. I think it's great to be able to pick the tradeoff between quality/privacy based on your own privacy preferences.

zeroq 3 days ago

On one hand I'm super enthusiastic about your project.

This could help battle procrastination, organize your time in a long run, bill your clients more efficiently, etc. 20 years younger, hyper productive me would kill for such product.

But then I recall when I accidentally suggested TimeRescue to my boss at one time, and suddenly he was skimming through everyone's daily logs to see if they were spending 100% of their time in business-facing apps.

When I first heard about "covid mouse mover devices" that faked activity for remote workers I thought it was a joke. Seriously.

But I'm afraid this is the dystopian future. Employers constantly looking at your screen and getting spreadsheets with your daily effort.

Overall, very disturbing product.

defgeneric 3 days ago

This was my first thought too. The last generation of activity tracking, while still dystopian, was a little different at least in that it was mainly statistical. So action-wise, it might point managers at "potential problems," but doesn't make its way into a performance review (e.g. "your mouse only traveled 81.72 screen-miles this quarter, 2 standard deviations below the mean, while you also scored the lowest on number of keystrokes with vscode as the active window..."). If a manager really wanted to summarize exactly what was done they had to spend an almost equal amount of time to watch. To some degree, this alleviates that.
jerryliu12 3 days ago

Yea, honestly I would hate if people used this to track _other_ people, especially bosses. I wanted to build something that gave people more agency to do more with their precious time, but there definitely is a fine line here.
- herewulf 3 days ago
  
  Probably someone long ago said the same about hope for pointy sticks not being used just for hunting animals. Yet someone will likely make a pointy stick whether you do it or not.
  
  zeroq 3 days ago
  
  oppenheimer said that about dynamite /s

tolerance 3 days ago

For those expecting something more along the lines of a `git log` sort of thing, like a command line tool, there's `doing`.

https://brettterpstra.com/projects/doing/

pastapliiats 2 days ago

Who doesn't love Windows Recall

7bit 2 days ago

I'm confused by how much love this finds here while Recall was rightfully criticised. It's the same picture.

ttoinou a day ago

I've been using it today and it seems like it's using 1 euro of credit per hour. Is this normal? Seems a bit expensive. I'm not running the trial of Gemini anymore. Would be nice to detect when there's no mouse movement / keyboard activity and stop recording in those cases. And also stop recording when a media player is at fullscreen.

lucfranken 2 days ago

I am currently testing the app. Maybe, for more engagement, start processing cards faster just after installation. It feels weird to have to wait 30 minutes, just show me at least a card. Like the fact that I am installing DayFlow would be a positive experience.

Compliments for the Wizard - that one works perfect at least with Gemini. One little detail: You have a Github Star button in it, that really was at a non-logical place and made me think.

tiernano 3 days ago

wait... isn't this pretty much what Microsoft was doing with Recall?

jerryliu12 3 days ago

Recall (and Rewind) are similar in the sense that they both use screen data, but it's designed for retrieving specific things you saw, not semantically summarizing your time. My opinion is that they're completely different feature sets.
- pimlottc 3 days ago
  
  The backlash for Recall was not based on the feature set, it was because of the massive privacy and security concerns.
  
  aeon_ai 3 days ago
  
  Which are wildly different when comparing a third-party hosted product (i.e., Microsoft), and a self-hosted OSS application that can use a self-hosted Ollama model.
  The feature isn't the problem.
- LocalPCGuy 2 days ago
  
  There really isn't a reason that the screen data, once you have it, can't be used for more than one thing. I would guess that there isn't a whole lot stopping Windows Recall from doing very similar things.

mrklol 2 days ago

"Records screen at 1 FPS in 15-second chunks.“

If it’s recording 15 seconds, how often are you doing that? Once every 15m as the analysis interval is 15m?

pi-err 2 days ago

Looks like it's recording all the time and analyzes 900 screenshots every 15 minutes? And it keeps records for 3 days.
So I'm not sure I buy the lightweight/low-impact claim.
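
The arithmetic behind those figures, as a quick check. The constants come from the numbers quoted in this subthread (1 FPS, 15-second chunks, 15 minute analysis interval, 3 days of retention); the retained-frame count is an upper bound that assumes nonstop recording.

```python
FPS = 1                       # frames captured per second
CHUNK_S = 15                  # length of one recorded chunk, in seconds
ANALYSIS_WINDOW_S = 15 * 60   # analysis interval, in seconds
RETENTION_DAYS = 3            # how long recordings are kept

frames_per_window = FPS * ANALYSIS_WINDOW_S           # frames per analysis pass
chunks_per_window = ANALYSIS_WINDOW_S // CHUNK_S      # chunks per analysis pass
max_frames_retained = FPS * RETENTION_DAYS * 24 * 3600

print(frames_per_window)    # 900
print(chunks_per_window)    # 60
print(max_frames_retained)  # 259200
```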

jauntywundrkind 3 days ago

It's somewhat related to two other recent submissions:

Replace PostgreSQL with Git for your next project for git data storing. https://news.ycombinator.com/item?id=4535144 https://devcenter.upsun.com/posts/why-you-should-replace-pos...

Consumed.today day-logging single user microsite. https://consumed.today/ https://news.ycombinator.com/item?id=45351446

Cute serendipity, rule of three. Neat project too; conceptually it sounds like an amazing ability to be able to better watch ourselves. Doing it via screenshots & AI feels like a fun sense-making adventure that actually makes a lot of sense, that can maybe try to pick through & discern what the screen is doing in a lot of different scenarios.

mustaphah a day ago

Nicely done.

Funny enough, I had a similar idea a few weeks back; I jotted it down in my idea sketchpad. It felt a bit ambitious for an open-source side project, and I wasn't sure if it could even work with a local LLM. I was genuinely excited about it, nonetheless.

Now that I know it's totally viable, I've got even more reasons to build a Linux version myself.

novoreorx 2 days ago

Given that ScreenMemory [1], an app that automatically records screenshots, has already saved my screen at intervals, it would be great for it to also have this AI summarizing ability.

[1]: https://screenmemory.app/

rcarmo 2 days ago

This is amazing and yet I think it would need an existential angst mode to capture those days when I am doing video calls with various teams all day.

Maybe patching https://github.com/JerryZLiu/Dayflow/blob/main/Dayflow/Dayfl... to say "Describe what you see on this computer screen in the style of Werner Herzog" would do it...

ghm2199 3 days ago

I would imagine this could be one of the inputs along with a STT system as context to an LLM. Because in general we can speak faster than we can write/type and for me, specifically, after a point in the day typing creates a higher cognitive load than speaking.

1. "Create a reminder for reading this email at 5:00 pm" and this could infer what to do from the screen shot's description(plus a local MCP tool for calendar)

2. "Can you fetch that file from that project in that workspace and implement the pattern in the code on my vscode terminal?" It can lower the cognitive fatigue of typing and clicking in a bunch of places.

3. Take notes as I describe something on the screen. It could be for prompt composition e.g. get the link from my browser and the file on vscode and write code that does XYZ.

anyg 3 days ago

Couldn't we get a low-res version of this info by tracking the active window using a cli tool? For linux, there are several options. Not sure about Mac.

Another approach is to run OCR on 1FPS screenshots. Everything runs locally without draining the battery like an LLM would.

jerryliu12 3 days ago

You definitely could! I think it would just be harder to get good semantic understanding of what you did during a segment of time without LLMs.

lucfranken 2 days ago

Really cool!

As already seen in the comments there are lots of desires to add more data compared to just screen input.

Could be things like:

- Apple HealthKit / Watch

- Custom apps

- Phone logs

You also said, rightly, that improving your core feature still needs a lot of focus.

It might be interesting to allow some kind of API / plugin area. So that people can expand on your core feature and add the desired parts. Might in the future expand to some kind of AppStore like feature with plugins.

That would keep your work focused and allows others to make it complete in their vision, and for others.

netnameus a day ago

Testing it today, because my Mac thinks I'm sharing my screen, it's suppressing notifications on phone, mac, etc. Need to figure out how to fix that.

p_zuckerman 2 days ago

Would this also be helpful for companies? From an ethical point of view, though, tracking employees' screen time could raise issues with employee rights and HR.

philipallstar 2 days ago

Looks like 98% of my day is Hacker News. This thing must have a bug.

Right?

rokob 2 days ago

I think this is pretty cool but I spend most of my day on a laptop I don’t own and there’s no way I could get this on there.

dpflan 2 days ago

This reminds me of Stephen Wolfram's efforts to analyze his life: https://writings.stephenwolfram.com/2012/03/the-personal-ana...

Klaster_1 3 days ago

Wow this is awesome! Wish I could try this on Windows. This is genuinely one of few time tracking solutions that piqued my interest. For now, I'll stick to manual labeling activities with my custom, simple tool: https://github.com/Klaster1/timer-5

christoph123 2 days ago

If you're on Windows, give this a try: https://donethat.ai - cloud-based but you can use your own gemini key

graeme 2 days ago

Would be helpful to have a screenshot test tool alongside the api test tool. The app didn't create an application support folder yet. Possibly not enough time has passed but would be great to be able to troubleshoot sooner.

jerryliu12 2 days ago

Thanks, yeah I do need to flesh out the debugging options. In the menu bar you can click the Dayflow icon which should allow you to view the recordings folder. The sqlite db is in that folder too if you want to poke around there as well.
- graeme 2 days ago
  
  Thank you! I can see that now. Checked, I have recordings. The SQLite seems to have very little data, despite recordings being present, and no cards show up in app. Possibly an LM Studio issue. I'll fiddle around with it and send an email if I can't get it working. Will test with Ollama in case there's some LMStudio error the API tester isn't catching.
  This is the error I got: reason: No valid observations generated from frame fallback

sipjca 3 days ago

This is super rad. Love it being Open Source, and with the option to choose local models. You’re awesome, thanks!

atoav 3 days ago

Curious how this works with multi monitor setups, e.g. watching a video while researching travel plans.

xp84 3 days ago

Very cool idea. When using the Gemini option, what kind of cost would be expected to be incurred? I'd be satisfied by knowing the approximate number of tokens one would expect to be consumed by processing an hour of these recordings, and which specific model is being used.

jerryliu12 3 days ago

Gemini 2.5 Pro is pretty expensive, mostly because videos take up a lot of tokens. It's roughly 1 million input tokens/hr, with a relatively insignificant amount of output tokens. Fortunately, Gemini has a very generous free tier, which is more than enough to cover daily usage. If you set up one paid project (and just don't consume any tokens), you can still use the free tier on a different project, and they can't train on your data.
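
For a back-of-the-envelope cost estimate from the ~1 million input tokens/hr figure above, something like the following works. The function `estimate_input_cost` is hypothetical, and the price passed in the example is an assumed placeholder, not a quoted rate: check current Gemini pricing before relying on it. Output tokens are ignored since they are described as relatively insignificant.

```python
def estimate_input_cost(hours: float,
                        price_per_million_tokens: float,
                        tokens_per_hour: int = 1_000_000) -> float:
    """Estimate input-token cost for a number of recorded hours."""
    if hours < 0 or price_per_million_tokens < 0:
        raise ValueError("hours and price must be non-negative")
    millions_of_tokens = hours * tokens_per_hour / 1_000_000
    return millions_of_tokens * price_per_million_tokens


# Assumed example price of 1.25 (currency units per million input tokens).
print(estimate_input_cost(hours=8, price_per_million_tokens=1.25))  # 10.0
```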

pgcosta 2 days ago

I thought about having something like this! This can be a great tool! For engineers, it can be a great way to summarize the standup update, or even to recall what we did yesterday. I'll check it out now.

danielfalbo 2 days ago

It would be useful for freelancers if Dayflow automatically detected the client they were working for, to count hours spent on client, similar to what toggl.com does but automated

fsto 2 days ago

Love the idea, how you present it here and in the product. Clear, trustworthy and calming. Just installed and looking forward to trying it out. Half-random question: how did you arrive at the current UI / UX (visually and copy)?

sawyna 2 days ago

I have installed this and configured the API key. It's been three hours and nothing is happening for some reason. The app doesn't show anything. Is it because I have a multi monitor setup?

mellosouls 3 days ago

Congrats on a nice looking app that will be very useful for individuals (though potentially misused by toxic managers).

Kudos particularly for the efforts you've gone to on explaining privacy implications.

jerryliu12 3 days ago

Thanks! Wanted to build something I'd personally be comfortable using.

voidUpdate 2 days ago

What makes this similar to "git log", other than it showing events happening in order? It looks more like my calendar layout than git log to me

blef 2 days ago

The onboarding flow is neat. I will give it a try over the next days. The local setup makes the computer heat a bit every 15 minutes, but it's ok

dmd 2 days ago

You seem to have a naming conflict with https://dayflow.ai/

danielfalbo 2 days ago

Maybe both asked the same LLM for inspiration

user3939382 2 days ago

It’s nice to forget everything you’re working on periodically and examine the pieces of what you’ve built and redecide what they mean if anything.

wayeq 2 days ago

I'm sure my employer would be thrilled with a background process taking screenshots every second

smcleod 3 days ago

Nice work, does this work with local (100% offline) models assuming you have decent hardware and are serving them up with llama.cpp or similar?

jerryliu12 3 days ago

Yep! Have tested it out on Qwen 2.5VL 3B and it works reasonably well on my 16GB MacBook Air. The only thing I will say is that I don't think it's a great idea to run local models on laptop battery, since it's quite compute intensive and drains kinda quickly. Have tested with Ollama and LM Studio, but you should be able to use any OpenAI compatible local server.
- deanputney 3 days ago
  
  Would it be possible to check for the power adapter and run processing then? These are the types of things I've been thinking about for my own app: https://stardateapp.com
  
  jerryliu12 3 days ago
  
  Wow, yeah that's clever I hadn't thought of that. Will add as an advanced setting.
- jastuk 2 days ago
  
  You've mentioned in the docs that:
  > Gemini leverages native video understanding for direct analysis, while Local models reconstruct understanding from individual frame descriptions - resulting in dramatically different processing complexity.
  For people like me who haven't dabbled much with AI video processing and have no intuition for it, could you clarify the drawbacks of such a local-only approach vs what Gemini offers? I don't mean the performance or power/battery impact (that part is clear), just in terms of end-result and quality what the practical differences are.
  I'm in the only-100%-offline camp here but would like to know what I'm missing out on since I won't even try Gemini here.
- smcleod 3 days ago
  
  Nice that's great. I have a 96GB M2 Max that's plugged in 99.9% of the time so that's not an issue. Cheers for the response!
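
On the suggestion earlier in this subthread to run processing only while on the power adapter: one possible approach on macOS is to read the first line of `pmset -g batt`, which reports the current power source. This is a sketch under that assumption and not how Dayflow does or will do it; `on_ac_power` and `read_pmset` are hypothetical helpers, and a native app would more likely query the system power APIs directly.

```python
import subprocess


def on_ac_power(pmset_output: str) -> bool:
    """Return True if the first line of `pmset -g batt` output names AC power."""
    lines = pmset_output.splitlines()
    # Assumed format of the first line: Now drawing from 'AC Power'
    return bool(lines) and "AC Power" in lines[0]


def read_pmset() -> str:
    """Run `pmset -g batt` (macOS only) and return its text output."""
    result = subprocess.run(["pmset", "-g", "batt"],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Example with canned output, so it runs on any platform:
print(on_ac_power("Now drawing from 'AC Power'\n"))       # True
print(on_ac_power("Now drawing from 'Battery Power'\n"))  # False
```

A scheduler could call `on_ac_power(read_pmset())` before each analysis pass and defer the work while on battery.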

chewhongjun96 3 days ago

Is it possible to include wearables as data sources?

i.e. apple watch for sleep, running, activity levels? it could really give a 360 view of your life

jerryliu12 3 days ago

That would be really cool, but for the foreseeable future there's still a lot of room to improve how screen data is used so I'll mostly be focused on that.

rememberlenny 3 days ago

Congrats. This is very well executed.

muggermuch 3 days ago

This is amazing - just the tool I needed; thank you so much!

zeeqeen 2 days ago

Great! This is what I've wanted for a long time! And the UI is good too.

ttoinou 2 days ago

Wow such a great idea. Will you monetize this ?

akhilnchauhan 2 days ago

this is very cool - thanks for sharing!

VadimPR 2 days ago

I need this so badly - but on Linux :)

tonyhart7 2 days ago

the fact that people already built this with open source multi modal model is astonishing

ctrlp 2 days ago

Very nice. Beautiful UX.

rasulkireev 2 days ago

this is amazing! thanks for doing this!

scuff3d 2 days ago

Am I the only person that sees "AI" and "screen capture" and thinks no fucking way? I switched to Linux specifically to get away from data collection, why on earth would anyone want to opt into it?

graeme 2 days ago

The app has a local only mode. That's just your computer chip/gpu running a language model locally. It works even if all outgoing connections are blocked, as far as I can tell. What's the threat you're worried about in that scenario?
- scuff3d 2 days ago
  
  Fair enough, I didn't see that part. I'm still not interested but that seems like a far more reasonable option.

j1000 2 days ago

What kind of problem is this solving? Why would I install spyware on my machine? lmao

syngrog66 2 days ago

wow

just wow

2025 is getting surreal online
