Luigi central scheduler

### Intended audience

The intended audience is analysts and other luigi *users*. After watching these slides, you'll have a better idea of what luigi is actually doing each time you start it, in particular the `complete()` checks and their interaction with the central luigi scheduler.

## What is luigi?

> Conceptually, Luigi is similar to GNU Make where you have certain
> tasks and these tasks in turn may have dependencies on other tasks.
> There are also some similarities to Oozie and Azkaban. One major
> difference is that Luigi is not just built specifically for Hadoop,
> and it’s easy to extend it with other kinds of tasks

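## Reminder of Task DSL

```python
import luigi


class MyTask(luigi.Task):
    some_parameter = luigi.Parameter(default="hello")

    def complete(self):
        # Tells luigi whether this task's work is already done
        return True  # or False if the task still has to run

    def requires(self):
        # TaskA and TaskB are other tasks, defined elsewhere
        return [TaskA(), TaskB(param='yay')]

    def run(self):
        print(self.some_parameter, 'world')
```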
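To make the role of `complete()` concrete, here is a minimal, luigi-free sketch of how a client might walk the dependency graph before it talks to the scheduler. `SketchTask`, `collect_statuses` and the status strings are hypothetical names used only for illustration. The rule that the dependencies of a complete task are not visited is an assumption of this sketch, not a claim about luigi's internals.

```python
class SketchTask:
    """Hypothetical stand-in for luigi.Task (not the real class)."""

    def __init__(self, name, done=False, deps=()):
        self.name = name
        self.done = done
        self.deps = list(deps)

    def complete(self):
        return self.done

    def requires(self):
        return self.deps


def collect_statuses(root):
    """Walk the graph from root and return {task name: status}."""
    statuses = {}
    stack = [root]
    while stack:
        task = stack.pop()
        if task.name in statuses:
            continue  # already visited
        if task.complete():
            # Assumption: a complete task's dependencies are not explored.
            statuses[task.name] = "DONE"
        else:
            statuses[task.name] = "PENDING"
            stack.extend(task.requires())
    return statuses


task_a = SketchTask("TaskA", done=True)
task_b = SketchTask("TaskB", done=False)
my_task = SketchTask("MyTask", done=False, deps=[task_a, task_b])
statuses = collect_statuses(my_task)
```

In this sketch `MyTask` and `TaskB` end up pending while `TaskA` is already done, so only two tasks would still need to run.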

## Reminder of Central Scheduler

![Central Scheduler](img/user_recs.png "luigid")

Luigi's Task DSL doesn't dictate a scheduling paradigm. Different paradigms are possible, such as polling or event-driven scheduling. Luigi also comes with a built-in scheduler.

These slides are about the implementation of that built-in scheduler, known as luigid.

Summary of luigid

* About 1500 lines of Python, 1000 of HTML and 1500 of JS
* Graphing capabilities
* A single point of failure, for example if the hardware catches fire
* Distributes tasks to luigi clients, first come, first served

Current client-server API:

* `add_task(task_id, worker_id, status)`
* `get_work(worker_id)`
* `ping(worker_id)`
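As a rough mental model of these three calls, the sketch below keeps everything in memory and hands out pending tasks in registration order to whichever worker asks first. `ToyScheduler`, its attributes and the `"PENDING"`/`"RUNNING"` status strings are hypothetical. Dependencies between tasks are ignored here, and this is not luigid's actual code.

```python
from collections import OrderedDict
import time


class ToyScheduler:
    """Hypothetical in-memory model of the three calls (not luigid's code)."""

    def __init__(self):
        self.tasks = OrderedDict()  # task_id -> status, in registration order
        self.last_seen = {}         # worker_id -> time of last contact

    def ping(self, worker_id):
        # Record that the worker is still alive.
        self.last_seen[worker_id] = time.time()

    def add_task(self, task_id, worker_id, status):
        self.ping(worker_id)
        self.tasks[task_id] = status

    def get_work(self, worker_id):
        # First come, first served: the caller gets the oldest pending task.
        self.ping(worker_id)
        for task_id, status in self.tasks.items():
            if status == "PENDING":
                self.tasks[task_id] = "RUNNING"
                return task_id
        return None  # nothing left to hand out


scheduler = ToyScheduler()
scheduler.add_task("TaskA()", "worker-1", "PENDING")
scheduler.add_task("TaskB(param=yay)", "worker-1", "PENDING")
first = scheduler.get_work("worker-2")
second = scheduler.get_work("worker-1")
third = scheduler.get_work("worker-2")
```

Here `worker-2` asks first and so receives the task that `worker-1` registered, which is the point of a central scheduler: any client may pick up work that another client announced.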

When tasks fail

When tasks fail too often

The scheduler, when configured to, will DISABLE a task if it has failed more than X times in the last Y minutes. The task gets re-enabled automatically after Z minutes.
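The X/Y/Z rule can be sketched with a sliding window of failure timestamps. `FailureTracker` and its parameter names are hypothetical, times are plain numbers in minutes, and clearing the failure history on re-enable is an assumption of this sketch rather than documented luigid behaviour.

```python
from collections import deque


class FailureTracker:
    """Hypothetical per-task bookkeeping; all times are in minutes."""

    def __init__(self, max_failures, window, disable_for):
        self.max_failures = max_failures  # X
        self.window = window              # Y
        self.disable_for = disable_for    # Z
        self.failures = deque()
        self.disabled_at = None

    def record_failure(self, now):
        self.failures.append(now)
        # Forget failures that fell out of the last `window` minutes.
        while self.failures and self.failures[0] <= now - self.window:
            self.failures.popleft()
        if len(self.failures) > self.max_failures:
            self.disabled_at = now

    def is_disabled(self, now):
        if self.disabled_at is None:
            return False
        if now - self.disabled_at >= self.disable_for:
            # Assumption: re-enabling starts with a clean failure history.
            self.disabled_at = None
            self.failures.clear()
            return False
        return True


tracker = FailureTracker(max_failures=2, window=10, disable_for=30)
for minute in (0, 1, 2):
    tracker.record_failure(now=minute)
disabled_now = tracker.is_disabled(now=2)
disabled_later = tracker.is_disabled(now=32)

# Failures spread far apart never exceed the limit within one window.
sparse = FailureTracker(max_failures=2, window=10, disable_for=30)
for minute in (0, 20, 40):
    sparse.record_failure(now=minute)
sparse_disabled = sparse.is_disabled(now=40)
```

With X=2, Y=10 and Z=30, three failures within three minutes disable the task, and it becomes available again 30 minutes after the failure that triggered the disable.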

# Questions?

### Thanks for listening

[@Tarrasch](https://github.com/Tarrasch) on github

http://tarrasch.github.io/luigid-basics-jun-2015/

https://github.com/Tarrasch/luigid-basics-jun-2015