bckr a year ago

Anyone using this?

It's not spelled out, but apparently you just run the Python file containing the app definition and leave it running in the background?

Looks very clean and pythonic.

  • Miksus a year ago

    Of course I have this running, though it's still on an older version (I've been too busy developing this). It has been running for over half a year for my scrapers without a single interruption, even though the machine has the worst specs available. I have tested this on Linux/Unix and Windows at least. I have also gotten messages from various people saying they are using it. Some have said they migrated from Celery or other alternatives as they found Rocketry more suitable for their needs.

    And that's true: it's 100% Python and basically there is a main loop that checks the starting conditions of tasks (and some other things), and if a task's starting condition is met, the task is run. Tasks can be executed synchronously by setting execution as "main" or concurrently with async, threading or multiprocessing. Maybe in the future with another interpreter as well. The main loop is left running in the background.

    So in short, it's a Python loop that's constantly running. It sleeps a defined amount of time after checking a set of tasks to lower the resource consumption, but you can also create a task with execution as "main" and do sophisticated sleeping like "sleep more when CPU usage is X%" or estimate the time when the next task should start from the tasks' conditions.
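
    A minimal sketch of the loop described above, using only the standard library. This is not Rocketry's implementation; `Task`, `run_cycle` and `main_loop` are hypothetical names chosen for illustration, and only synchronous execution is shown.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical task record: a function plus its starting condition."""
    name: str
    func: Callable[[], None]
    start_cond: Callable[[], bool]


def run_cycle(tasks: List[Task]) -> List[str]:
    """Check every task once and run those whose starting condition is true."""
    started = []
    for task in tasks:
        if task.start_cond():
            task.func()  # synchronous call, comparable to execution="main"
            started.append(task.name)
    return started


def main_loop(tasks: List[Task], cycle_sleep: float, max_cycles: int) -> None:
    """Repeat the check, sleeping between cycles to keep resource use low."""
    for _ in range(max_cycles):
        run_cycle(tasks)
        time.sleep(cycle_sleep)


results = []
tasks = [
    Task("always", lambda: results.append("always"), lambda: True),
    Task("never", lambda: results.append("never"), lambda: False),
]
main_loop(tasks, cycle_sleep=0.01, max_cycles=3)
```

    A real scheduler would loop until shut down rather than for a fixed `max_cycles`; the bound is only there so the sketch terminates.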

    And thanks for the positive comment!

    • pid-1 a year ago

      Hey cool project, congrats!

      How does Rocketry save execution state? Like, if it crashes and goes back up again, does it know which tasks were executed and which ones were not?

      • Miksus a year ago

        Thanks a lot, nice to hear!

        The system knows which task ran and when by extending logging (from the standard library). There is a logger called "rocketry.task" that should have a handler which can also be read: redbird.logging.RepoHandler. An in-memory logger is created if nothing is specified. This handler abstracts simple reads and writes to a data store, which can be an SQL database, an in-memory Python list, MongoDB or a CSV file.

        Seems I forgot to implement a method mentioned in the docs but here's an example to specify a task log repo: https://github.com/Miksus/rocketry/issues/108#issuecomment-1...

        The latest success time, starting time etc. are also stored in the tasks themselves and there is some optimization (which can be turned off) to reduce the reads in some cases. At start-up these attributes are set in each task (if logs are found).
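
        To illustrate the idea of a logging handler that can be read back, here is a standard-library sketch. It is not redbird's RepoHandler; `ListHandler`, the logger name "demo.task" and the `task_name`/`action` fields are assumptions made up for this example.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps records in a list so they can be queried."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # Store only the fields needed to answer "what ran, and when?"
        self.records.append(
            {
                "task_name": getattr(record, "task_name", None),
                "action": getattr(record, "action", None),
                "created": record.created,
            }
        )

    def latest(self, task_name, action):
        """Return the timestamp of the most recent matching record, or None."""
        matches = [
            r["created"]
            for r in self.records
            if r["task_name"] == task_name and r["action"] == action
        ]
        return max(matches) if matches else None


logger = logging.getLogger("demo.task")  # demo name, not Rocketry's logger
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.info("task started", extra={"task_name": "scrape", "action": "run"})
logger.info("task finished", extra={"task_name": "scrape", "action": "success"})

last_success = handler.latest("scrape", "success")
```

        With an in-memory list the history is lost on restart; a persistent store behind the same read/write interface is what would let a scheduler recover it.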

    • aidos a year ago

      This looks brilliant. I like that it’s kept light as a concept - feels like you can just sprinkle it over your existing tasks without getting bogged down in complex configuration.

      We have a couple of hand rolled variants of this that run into all the issues this solves. Will definitely look at taking this for a spin.

  • pid-1 a year ago

    I was scratching my head, looking at the docs and asking myself "Ok do I need a database? Is the scheduler separated from workers? Does it have a UI?"

    Being just a lib is actually quite refreshing compared to complex behemoths like Airflow. I guess you could just use your favorite service runner (systemd, k8s, nomad, none at all...).

realyashnag a year ago

Biggest win I see here is the native support for async methods. Celery, the default option for most, does not support it and there are only hacky ways to make async work. Kudos to the team @ Rocketry.

lep a year ago

I haven't looked too closely yet so please excuse me for asking this question but how dynamic can i make the timed events? I have two use-cases in mind:

1. I would like to run a task each day 30 minutes before dawn so i have to compute that time at some point.

2. I run a task normally every hour but if something happens i want to run it 20 minutes after that event.

  • jon-wood a year ago

    Given the "before dawn" constraint here I'm going to assume this is somehow related to home/building automation, in which case you should go look at Home Assistant, which has built in support for things like "do this every hour, or 20 minutes after device X was triggered" and indeed "do this 30 minutes before sunrise".

    • lep a year ago

      Well it's a mixed bag. The dusk/dawn stuff would just be a nice-to-have. I would like to be reminded 30m before dawn to, for example, walk the dog while the sun is _just_ still out. The other could, for example, be used for handling "social" events. Like some games only happen during evening hours, where I check more often than in off-hours, but if a game happens whenever, I would like to handle it after the regular game time.

      Now, none of that would be too hard to implement myself, but these task runners pop up every so often and I would like to leverage other people's work. Home Assistant just feels a bit big for this. Do note that I currently do neither of those but I always try to evaluate these use-cases for these task runners.

      • powvans a year ago

        Celery supports solar scheduling, though it's not obvious how you would do +/- some time from the solar event. You would probably need to extend their implementation, but I don't think that would be too hard.

        https://docs.celeryq.dev/en/stable/reference/celery.schedule...

        EDIT: I think you just need to provide a nowfun that offsets datetime.now() by the desired timedelta.

      • orwin a year ago

        https://sunrise-sunset.org/api

        I used this myself for my father's chicken coop, it's quite easy to use.

        • lep a year ago

          Thanks, but computing the time ain't the hard part (modules are available for many languages). I'm more interested in how easy it is to fit this task into the scheduler.

      • xani_ a year ago

        Well, unless you like running in the rain you'd have to feed it weather info too.

        On the other hand, that would allow it to be fancy, like rescheduling it earlier if rain is expected close to sunset.

  • xwowsersx a year ago

    I'm not 100% sure, but take a look at the section on "Manipulating Other Tasks" https://rocketry.readthedocs.io/en/stable/cookbook/controlli... It seems like what you could do is have a task that runs at some regular interval which would compute the 30 minutes before dawn each day and then add the task with the correct start time directly on the rocketry.args.Session: session.create_task(func=before_dawn_task, start_cond=some_condition)

  • musingsole a year ago

    You can build arbitrary conditions (the example given is `file_exists`) that can run whatever code and only need to return True or False.

    If you can write the condition in English, it would seem to me you can build a custom Rocketry condition to suit it.
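
    As a sketch of what such a True/False check could look like for use-case 2 above ("every hour, or 20 minutes after an event"), here is a plain Python predicate. The function name and its arguments are hypothetical, and wiring it into Rocketry as a custom condition is not shown.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run(now: datetime, last_run: datetime, last_event: Optional[datetime]) -> bool:
    """True if an hour has passed since the last run, or if an event that has
    not been handled yet happened at least 20 minutes ago."""
    hourly_due = now - last_run >= timedelta(hours=1)
    event_due = (
        last_event is not None
        and last_event > last_run  # the event came after the last run
        and now - last_event >= timedelta(minutes=20)
    )
    return hourly_due or event_due


last_run = datetime(2022, 10, 1, 12, 0)
event = datetime(2022, 10, 1, 12, 10)

too_early = should_run(datetime(2022, 10, 1, 12, 25), last_run, event)
after_event = should_run(datetime(2022, 10, 1, 12, 30), last_run, event)
```

    The scheduler would only need to call the predicate on each cycle; keeping `last_run` and `last_event` up to date is left to the surrounding code.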

  • tonyhb a year ago

    For #2, you want an event driven scheduler that can coordinate between events.

    We've built this at https://www.inngest.com. You can run functions based off of schedules or events, with things like "when this event happens, run 20 minutes after the event". Or, "run, wait for another thing to happen, then continue".

    Event driven schedulers do all the regular scheduling, but with a few benefits:

    - It's reactive

    - You can fan-out, so one event runs many functions

    - You can store all events for debugging, replay, local testing, typing, etc.

    We could plumb in an event source for #1 which indicates sunrise and sunset. Heh.

    • xwowsersx a year ago

      Nice project and good looking page. Just fyi but the "s" in "background jobs" is a bit cut off for me (Chrome on Pixel 6 Pro) https://ibb.co/42j18fk

  • charlieyu1 a year ago

    Isn't it better to fetch dawn times every day and then set the schedule accordingly?

smaccona a year ago

Does this support multiple time zones simultaneously? We have clients that are in different time zones, and (for example) they all want weekly summary emails every Monday morning at 6am, but in their own timezone. So California users get theirs at 6am US/Pacific, New York users get theirs at 6am US/Eastern, and we want to be able to handle this without having to worry about updating crontabs the night of a daylight savings change.

For this reason, we are using fcron[0] instead of regular cron, which allows you to specify the timezone at the start of each crontab line. If this tool supports that sort of scenario, it might be worth switching.

[0] https://github.com/yo8192/fcron
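
Whether Rocketry handles this is not answered in the thread. Independently of any scheduler, the "6am Monday in each client's own timezone" computation can be sketched with the standard-library zoneinfo module (Python 3.9+, requires the system tz database). The helper name is hypothetical; America/Los_Angeles and America/New_York are the IANA names that US/Pacific and US/Eastern point to.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def next_monday_6am(now_utc: datetime, tz_name: str) -> datetime:
    """Return the next Monday 06:00 wall-clock time in tz_name, expressed in UTC."""
    tz = ZoneInfo(tz_name)
    local_now = now_utc.astimezone(tz)
    days_ahead = (0 - local_now.weekday()) % 7  # Monday is weekday 0
    candidate_date = local_now.date() + timedelta(days=days_ahead)
    candidate = datetime.combine(candidate_date, time(6, 0), tzinfo=tz)
    if candidate <= local_now:
        # Already past 06:00 this Monday, so use next week's Monday.
        candidate = datetime.combine(
            candidate_date + timedelta(days=7), time(6, 0), tzinfo=tz
        )
    return candidate.astimezone(timezone.utc)


# Friday before the US daylight saving change of 6 November 2022.
now = datetime(2022, 11, 4, 12, 0, tzinfo=timezone.utc)
pacific = next_monday_6am(now, "America/Los_Angeles")
eastern = next_monday_6am(now, "America/New_York")
```

Because the wall-clock time is attached to the zone before converting to UTC, the UTC offset is looked up for the target date, so the daylight saving change needs no manual adjustment.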

jwmoz a year ago

Interesting. How does this compare to APScheduler?

How does it deal with concurrently scheduled tasks and the possibility of missed tasks?

  • Miksus a year ago

    I wrote my own ideas of how it compares to APScheduler (and other alternatives) here: https://rocketry.readthedocs.io/en/stable/rocketry_vs_altern...

    Note that these are my own opinions, which are probably a bit biased. At the moment there are no built-in missed task launchers, but it should be fairly easy to build one by creating a condition that checks the task run periods and whether the task did not run in the latest interval. This is not hard to do, but the problem is that I haven't had time to document the time period utilities, which are actually pretty extensive. I have plans and some prototypes for a pre-built misfire condition which one can just add to any task using the OR operator.
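
    One way such a check could be written, as a standalone sketch that does not use Rocketry's time period utilities. The function name, the fixed-length periods and the `anchor` argument are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def missed_latest_period(
    now: datetime,
    last_run: Optional[datetime],
    period: timedelta,
    anchor: datetime,
) -> bool:
    """True if the task has not run since the start of the most recently
    completed period. Periods are counted from `anchor` in steps of `period`."""
    completed = (now - anchor) // period  # whole periods that have ended
    if completed < 1:
        return False  # no period has finished yet, so nothing can be missed
    latest_start = anchor + (completed - 1) * period
    return last_run is None or last_run < latest_start


anchor = datetime(2022, 10, 1)      # daily periods: Oct 1, Oct 2, Oct 3, ...
now = datetime(2022, 10, 3, 8, 0)   # Oct 2 is the latest completed day
day = timedelta(days=1)

missed = missed_latest_period(now, datetime(2022, 10, 1, 10, 0), day, anchor)
on_time = missed_latest_period(now, datetime(2022, 10, 2, 10, 0), day, anchor)
```

    Combined with a regular schedule through a logical OR, a check like this would trigger a catch-up run after downtime.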

    There are 3 options for concurrent tasks: async, thread and process. Just change the execution argument of a task. Choose whichever suits you and remember there are pros and cons to each. All of them support parameters etc.

pplonski86 a year ago

What backends do you support? Do I need Redis or can it work with PostgreSQL? Can't find this info in the README.

  • Miksus a year ago

    You can do without a database backend but of course then the task logs are not kept in case of restart. Currently you can use any SQL database that SQLAlchemy supports, MongoDB or CSV files, or any other if you wish to extend Red Bird. It uses Red Bird (another project of mine) to abstract the data store: https://red-bird.readthedocs.io/en/latest/. And it just extends the logging library for reading task logs.

    It seems I did not yet implement the set_repo method even though the docs talk about this but here's one way to set a CSV repo, for example: https://github.com/Miksus/rocketry/issues/108#issuecomment-1...

alexmolas a year ago

Does it keep track of the status of each task?

Imagine I run an app with three processes A, B, C. A runs perfectly, but B fails and halts the app. If I start the app again, is it going to know that A has been already executed? Or is A going to be executed again?

  • Miksus a year ago

    Yep it does. The main process and thread are responsible for communicating with the logs (see the other comment in which I explained the logging mechanism). If you run a task in a subprocess, the logs are relayed via a queue to the main process, and the main process logs them to avoid conflicts.

    There is also an option to force reading the status always from the logs. I'll explain later how to change that, but by default there is some optimization to avoid unnecessary reads from disk, as often there is only one scheduler reading/writing to the log data store.

    The logs are stored in memory by default but this can be changed to any data store (if you are willing to expand Red Bird). At the moment CSV, SQL and MongoDB are supported + the in-memory.
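
    The "relay logs through a queue so only one side writes to the store" pattern mentioned above can be sketched with the standard library's QueueHandler and QueueListener. This is not Rocketry's code: `CollectingHandler` and the logger name are made up, and a plain queue.Queue inside one process stands in for the multiprocessing queue a real subprocess setup would need.

```python
import logging
import logging.handlers
import queue


class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for the log data store."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log_queue = queue.Queue()
store = CollectingHandler()

# Only the listener (the "main" side) ever touches the store.
listener = logging.handlers.QueueListener(log_queue, store)
listener.start()

# The worker side only puts records on the queue.
worker_logger = logging.getLogger("demo.worker")
worker_logger.setLevel(logging.INFO)
worker_logger.propagate = False
worker_logger.addHandler(logging.handlers.QueueHandler(log_queue))

worker_logger.info("task A succeeded")
worker_logger.info("task B failed")

listener.stop()  # handles the records still queued, then joins the listener thread
```

    Funnelling every write through a single consumer is what avoids conflicting writes to the same file or table.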

svennek a year ago

... or make CLI versions of your tasks and let the system management daemon ("cron" or in my case systemd timers) handle it.

For clarity make a subfolder called "tasks" or something like that.

Then you get consolidated logging, retries and all kinds of stuff for free in a battle-hardened setup, plus a standardized way to look up what is enabled and what is not.

  • anyfactor a year ago

    I remember a project that could convert any python script to a python CLI. This was taken to another level by another project that could convert any python CLI to a GUI.

    Can anyone help me with this?

  • wodenokoto a year ago

    I didn’t know cron could schedule jobs dependent on other jobs. Isn’t that what this brings to the table?

    “When job a and job b are done, run job c” kind of things.

    • svennek a year ago

      In systemd you can have multiple ExecStarts, which will be run in order (if I remember correctly), and ExecStopPost is brilliant for notifying problems..

    • bhargav a year ago

      The main benefit of cron is that your code stops once it’s done and the process is cleaned up. There isn’t a provided way to do dependencies, but that can be done using some shared locks and scheduling. It won’t be completely accurate, which is why solutions like Airflow are used.

      Edit: forgot the most obvious way to do dependencies… just execute A & B together as one cron job; still need something like airflow if it gets into a DAG territory

nickjj a year ago

I quickly went through the docs but didn't see a reference to be able to dynamically schedule and unschedule tasks at runtime. I've used APScheduler in the past and it does support this.

The use case is wanting to have let's say a web form where a user can say they want to run a task at XYZ interval and then they can schedule and unschedule it on demand. APScheduler will pick these up without needing to restart anything.

Does your library support that? If not, is that a planned feature?

  • Miksus a year ago

    By "dynamically schedule and unschedule", do you mean that you sort of manually (or using another task) stop running a task at its specified interval (or condition)? It does support this: there is an argument "disabled" in the tasks that can be set to True, and then the task won't be run unless explicitly forced to run (by calling the run method of a task). The task can be enabled by setting it back to False. This can be done at runtime in another task using main, thread or async execution.

    It could be the docs don't mention this. I'll need to check and add it there in case it's missing.

    Or did I misunderstand?

    • nickjj a year ago

      Dynamic as in you don't need to predefine the task in a config file or decorator before you start your server.

      This way you can load and unload tasks at runtime based on user input which you can optionally and independently save in your own database.

      Like imagine a user wanting to control when a backup happens. You can ask them to fill out a form on your site to say "ok run this every day at 4am" and that would spawn a new job that executes at that interval and the user can also delete that and the job would be removed. There might be 100 different users each with their own individual backup jobs that are running or not running.

      • Miksus a year ago

        Sorry, I'm in quite a hurry (so sorry for the language and lack of elaboration).

        You can create tasks dynamically and you can create them after starting the scheduler. You can use app.session.create_task and pass "func" (Python function) for it or path and func_name if you wish to lazily load the task function (imported only when executing the task). You can also pass a command for this method as well.

        And you can create a task that runs on startup (on_startup=True) and create your other tasks using this task. Use main, async or thread as execution. Then you can create other metatasks that create/modify/delete the tasks at runtime with any logic you want. For example, sync them with a database.

        I'm planning on doing a proper demo about this at some point.

antman a year ago

Does it have storage options in terms of status e.g. for restarts?

  • Miksus a year ago

    There is a repository mechanism to store the logs. The task logger is simply an extension of logging library. Seems my docs are slightly off on setting up the CSV repo but you can just add a RepoHandler (from redbird) to the logger called rocketry.task. At the moment there are MemoryRepo, CsvFileRepo, SQLRepo and MongoRepo.

    You can find the finer details of the repo mechanics in Red Bird's docs: https://red-bird.readthedocs.io/.

    And there are methods in the session to shut down or restart the scheduler in various ways. There is also a shut condition to end the scheduling when a condition is reached.

chirau a year ago

I don't really use schedulers for work and have never really worked with them. So this may sound trivial, but do I have to keep a terminal open and the script running for this to work? Or does it work in the background like cron jobs? If I have to keep a terminal alive for it, what is a scheduler's advantage over using a good old-fashioned loop with a sleep or time on the function call?

  • Miksus a year ago

    Putting it mildly, this is nothing more than a sophisticated Python while loop. And it's not as performance-friendly as Cron because Rocketry runs on Python. You need to be able to leave a Python program running in order to use Rocketry. As bad as that sounds, it's not really a problem with modern machines though. I have run this on a Raspberry Pi and on a machine with even poorer specs.

    However, this has a lot of features that Cron doesn't and which are not obvious to build yourself, like task dependencies (like "run this after that has succeeded or this has succeeded"), error management, integrating with APIs, parametrizing etc. Also, if you need to run tasks concurrently or in parallel, you'll be facing a lot of odd errors due to race conditions if you try to do it yourself in a loop. I have even found a bug in Python's time/datetime modules while developing Rocketry. It sounds easy but I advise you not to go down the same rabbit hole as I did. Please don't, it's not good for mental health.

    Of course if you need something very simple, go ahead and do it with a simple loop. Rocketry, however, makes both easy and complex problems easy, so it's still a good candidate: in case you realize your problem was more complex than you thought, it possibly has the answer or an obvious way to implement it.

    Compared to similar alternatives like Celery or Airflow, (I think) it is much easier to set up, and more complex scheduling problems are much easier with Rocketry than with them. Of course if you are a data engineer, I suggest using Airflow as that's the industry standard.

bhargav a year ago

If you want to run something once a month or a year, I imagine this requires the script to be running in a main loop that whole time?

  • BeefWellington a year ago

    Not really, that's just the most basic way to think of scheduling. The usual way to avoid this in a big loop is a notifier thread that looks at when it next needs to wake and then sleeps for an interval just shorter than that time.

    Due to clock errors and accounting for thread wake wonkiness IME it's usually a good idea to have this "loop" fire in a bit of a window around when the event needs to happen (say +/- 25ms, YMMV) and then trigger the event only at the specified time. After triggering the event repeat sleeping until the next scheduled event.
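
    A rough sketch of that "sleep until just before the event, then fire at the specified time" idea, using only the standard library. `run_schedule`, the 5 ms margin and the polling step are illustrative assumptions, not values taken from any particular scheduler.

```python
import time


def run_schedule(events, margin=0.005):
    """Fire callbacks in due-time order.

    events: list of (due_time, callback) pairs on the time.monotonic() clock.
    """
    for due, callback in sorted(events, key=lambda e: e[0]):
        remaining = due - time.monotonic()
        if remaining > margin:
            time.sleep(remaining - margin)  # long sleep, wake slightly early
        while time.monotonic() < due:
            time.sleep(0.0005)              # short polls inside the window
        callback()                          # never fires before the due time


fired = []
start = time.monotonic()
run_schedule([
    (start + 0.05, lambda: fired.append(("b", time.monotonic() - start))),
    (start + 0.02, lambda: fired.append(("a", time.monotonic() - start))),
])
```

    For events that are a month or a year apart, due times would come from the wall clock rather than a monotonic counter, and the process still has to stay alive (or be restarted) to reach them.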

reedf1 a year ago

Being able to simply configure the execution (async, process, thread) is pure gold. I am tempted to use it for this alone.

  • moffkalast a year ago

    I'm tempted to use it for the name alone, I don't even care what it does.

L3viathan a year ago

Neat!

Looks very clean. Is parallelism/multiprocessing (or even distribution over multiple workers) a thing you plan on making possible?

hamasho a year ago

Looks nice and clean!

Can it support static type checking like mypy? Not only type level like str, but the time format level like “hh:mm” and “n minutes” too.

I hate when I misspell “3 minuts” and realize it only after executing it. It would be much nicer if my editor told me.

encoderer a year ago

I’d love to have a Cronitor integration for this.

OP, if you make one we will feature you in our blog and email newsletter (goes to around 25k devs monthly)

rajasimon a year ago

Since we are here, does anyone have an idea about how to schedule 1000 calls per day?

Background task:

I am building a service to send Twitter messages daily. The limit of the Twitter API is that I can't send more than 1000 messages, so I have to limit the calls in the backend.

The closest solution I found is to schedule a Celery beat background task every 3 minutes and call the Twitter API 2 times per task, so it can send up to 960 messages in a 24-hour window.

So I am still looking for a solution to make this happen. Suggestions welcome.

  • pottertheotter a year ago

    Why not just schedule the task for every 86 seconds and count how many times you've done it so you stop at 1,000?
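
    The arithmetic behind both suggestions, plus a small counter for the "stop at 1,000" part. `DailyBudget` is a hypothetical helper, and resetting it once per day is an assumption about how the limit window works.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_LIMIT = 1000

min_interval = SECONDS_PER_DAY / DAILY_LIMIT  # 86.4 seconds between calls
runs_at_86s = SECONDS_PER_DAY // 86           # 1004 intervals fit, so a counter is needed
runs_at_87s = SECONDS_PER_DAY // 87           # 993 intervals fit, always under the limit
beat_total = (24 * 60 // 3) * 2               # every 3 minutes, 2 calls each: 960


class DailyBudget:
    """Hypothetical counter that refuses calls once the daily limit is used up."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def try_acquire(self):
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

    def reset(self):
        """Call once per day, e.g. from a task scheduled at midnight."""
        self.used = 0


budget = DailyBudget(DAILY_LIMIT)
granted = sum(budget.try_acquire() for _ in range(runs_at_86s))
```

    With an 86-second interval the counter blocks the last few attempts of the day; with 87 seconds it never triggers but a handful of the 1,000 calls go unused.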

gigatexal a year ago

Add some logging and connectors and you have an airflow replacement with the pipelining or am I missing something?

laserlight a year ago

Hmm, this definition reads a bit wrong:

  @app.task((weekly.on("Mon") | weekly.on("Sat")) & time_of_day.after("10:00"))
  def do_twice_a_week_after_ten():
      ...

The expression reads "Monday or Saturday", yet according to the function name it means "Monday and Saturday".

  • andijcr a year ago

    if you take it as a boolean test it makes sense. it will be true for any timepoint that is after ten and is either on monday or saturday

    • Miksus a year ago

      Yep, exactly. Rocketry works on conditions which are either true or false thus you need to give it a time range.

      Points in time do not actually make much sense in terms of scheduling as nothing can be run exactly at specified point. There should always be some buffer of tolerance. I think Cron has a tolerance of a minute or so and in Rocketry the tolerance is made obvious and completely customized.

      For those interested in more detail about those two types of time conditions: the "time_of_..." conditions check whether the current time is in the specified range. The "secondly", "minutely", "hourly", "daily" etc. conditions also check that the current time is as specified, but also that the task has not yet run in the interval. By combining the two you can create quite complex scheduling strategies easily.
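
      To show why `|` and `&` read as "or" and "and" over time ranges, here is a toy condition class built on operator overloading. It is not Rocketry's implementation: `Cond`, `on_weekday` and `after` are invented for this sketch, and it only models the range check, not the "has not yet run in the interval" part.

```python
from datetime import datetime, time


class Cond:
    """Hypothetical condition: wraps a True/False check on the current time."""

    def __init__(self, check):
        self.check = check

    def __call__(self, now):
        return self.check(now)

    def __or__(self, other):
        # cond_a | cond_b is true when either side is true
        return Cond(lambda now: self(now) or other(now))

    def __and__(self, other):
        # cond_a & cond_b is true only when both sides are true
        return Cond(lambda now: self(now) and other(now))


def on_weekday(day_index):
    """True on the given weekday (Monday is 0, Saturday is 5)."""
    return Cond(lambda now: now.weekday() == day_index)


def after(hour, minute=0):
    """True from the given time of day until midnight."""
    return Cond(lambda now: now.time() >= time(hour, minute))


twice_a_week = (on_weekday(0) | on_weekday(5)) & after(10)

monday_morning = twice_a_week(datetime(2022, 10, 3, 11, 0))  # a Monday, 11:00
tuesday_morning = twice_a_week(datetime(2022, 10, 4, 11, 0))  # a Tuesday, 11:00
```

      Each condition answers "is now inside the range?", so the union of Monday and Saturday is what makes the task eligible on both days.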

      • BeefWellington a year ago

        It would simplify things a little if the weekly.on accepted multiple days, so you could reduce it to one call:

            weekly.on("Mon", "Sat")

        • Miksus a year ago

          That's a great idea actually. Thanks, I think that should be pretty easy to do!

  • drothlis a year ago

    `|` is a bitwise-or in Python, not a boolean-or.

    • laserlight a year ago

      That's right, but it doesn't change how the expression is read.

qkhhly a year ago

if it provides monitoring ui (e.g. success/failure status for last N runs), then i'd seriously give it a try.

mpeg a year ago

How does it compare to prefect?