Handling Errors in Long-Running Scripts

Principles

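  • fail fast on programmer errors: a bug will not fix itself on a retry
  • retry transient network failures, with a growing delay between attempts
  • alert on repeated failure instead of retrying forever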
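
One way the three principles might fit together is sketched below. This is a rough illustration, not a definitive implementation: `run_with_policy`, the `TRANSIENT` tuple, and the `alert` callback are hypothetical names, and treating the built-in `ConnectionError` and `TimeoutError` as "transient network failures" is an assumption that a real job would adjust to whatever its network library raises.

```python
import time

# Assumption: these built-in exception types stand in for "transient network
# failures"; a real job would list whatever its network library raises.
TRANSIENT = (ConnectionError, TimeoutError)


def run_with_policy(fn, tries=3, base_delay=1.0, alert=print, sleep=time.sleep):
    """Run fn() under the three principles above (hypothetical helper)."""
    for attempt in range(1, tries + 1):
        try:
            return fn()
        except TRANSIENT as exc:
            if attempt == tries:
                # Repeated failure: alert, then let the error propagate.
                alert(f"giving up after {tries} attempts: {exc!r}")
                raise
            # Transient failure with attempts left: back off and try again.
            sleep(base_delay * 2 ** (attempt - 1))
        # Anything else (TypeError, KeyError, ...) is not caught here, so
        # programmer errors fail fast on the first attempt.
```

The `alert` and `sleep` parameters are injectable so that the policy can be exercised in tests without real waits or real notifications; in production `alert` could post to whatever channel the team already watches.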

Retry with backoff (simple)

retry_backoff.py
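import time


def retry(fn, tries=3, base_delay=1.0):
    """Call fn() up to `tries` times, doubling the delay after each failure."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for i in range(tries):
        try:
            return fn()
        # Broad on purpose for the simple version; narrow this to
        # transient errors so that programmer errors still fail fast.
        except Exception:
            # Out of attempts: re-raise with the original traceback
            # instead of sleeping one more time for nothing.
            if i == tries - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * (2 ** i))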
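
To make the delay arithmetic concrete: the wait before the next attempt is `base_delay * (2 ** i)`, so it doubles after every failure. The sketch below lists those waits and adds two common refinements, a cap and random jitter. `backoff_delays`, `max_delay`, and `jitter` are hypothetical names introduced only for illustration; they are not part of `retry_backoff.py`.

```python
import random


def backoff_delays(tries, base_delay=1.0, max_delay=None, jitter=False, rng=random):
    """Return the waits between consecutive attempts (hypothetical helper)."""
    delays = []
    for i in range(tries - 1):  # no wait is needed after the final attempt
        delay = base_delay * (2 ** i)
        if max_delay is not None:
            # Cap the wait so it cannot grow without bound.
            delay = min(delay, max_delay)
        if jitter:
            # "Full jitter": pick a random wait between 0 and the computed delay.
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))                 # [1.0, 2.0, 4.0, 8.0]
print(backoff_delays(5, max_delay=5.0))  # [1.0, 2.0, 4.0, 5.0]
```

Jitter is worth considering when many copies of a job can fail at the same moment, since identical delays would make them all retry in lockstep.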

Wrap your main

main_guard.py
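import logging

log = logging.getLogger("job")


def main():
    ...


if __name__ == "__main__":
    # Configure logging once at the entry point so the failure is
    # recorded with a timestamp and the logger name.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    try:
        main()
    except Exception:
        # log.exception records the message together with the full traceback.
        log.exception("job failed")
        # Re-raise so the process still exits with a non-zero status.
        raise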
