Handling Errors in Long-Running Scripts
Principles
- fail fast on programmer errors
- retry transient network failures
- alert on repeated failure
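The third principle can be sketched as a counter of consecutive failures that calls an alert hook once a streak reaches a threshold and resets on any success. This is an illustration, not code from this page: the class name `FailureAlerter`, the threshold of 3 and the `alert` callback are hypothetical. In practice the callback might page someone or post to a chat channel.

```python
from typing import Callable


class FailureAlerter:
    """Count consecutive failures and call alert() once when a streak hits threshold."""

    def __init__(self, alert: Callable[[str], None], threshold: int = 3):
        self.alert = alert          # hypothetical hook: pager, chat webhook, email...
        self.threshold = threshold
        self.consecutive = 0

    def record_success(self) -> None:
        # Any success ends the current failure streak.
        self.consecutive = 0

    def record_failure(self, exc: BaseException) -> None:
        self.consecutive += 1
        # Fire exactly when the threshold is reached, not on every later failure,
        # so a long outage produces one alert instead of a flood.
        if self.consecutive == self.threshold:
            self.alert(f"{self.consecutive} consecutive failures; last: {exc!r}")


# Usage: collect alerts in a list instead of sending them anywhere.
sent = []
alerter = FailureAlerter(alert=sent.append, threshold=3)
for ok in [False, False, True, False, False, False, False]:
    if ok:
        alerter.record_success()
    else:
        alerter.record_failure(ConnectionError("timeout"))
print(sent)
```

Alerting on the streak rather than on each failure keeps a single transient blip, which retries should absorb anyway, from waking anyone up.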
Retry with backoff (simple)
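retry_backoff.py
```python
import time


def retry(fn, tries=3, base_delay=1.0, retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying only the exception types in retry_on, with exponential backoff.

    Other exceptions (e.g. a TypeError from a bug) propagate immediately. The default
    tuple is an assumption: list whatever your network client raises for transient faults.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(tries):
        try:
            return fn()
        except retry_on:
            if attempt == tries - 1:
                raise  # out of attempts: re-raise the last transient error
            # Wait base_delay * 1, 2, 4, ... seconds before trying again.
            time.sleep(base_delay * (2 ** attempt))
```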
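When several functions share one retry policy, the helper can also be written as a decorator. This is a sketch, not part of the original page: the name `retrying`, the default `retry_on=(ConnectionError, TimeoutError)` (chosen so that programmer errors still fail fast) and the `fetch_status` function are all hypothetical.

```python
import functools
import time


def retrying(tries=3, base_delay=1.0, retry_on=(ConnectionError, TimeoutError)):
    """Decorator that retries the wrapped call on retry_on errors with exponential backoff."""
    if tries < 1:
        raise ValueError("tries must be at least 1")

    def decorate(fn):
        @functools.wraps(fn)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == tries - 1:
                        raise  # attempts exhausted: propagate the last error
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper

    return decorate


# Hypothetical flaky operation: two transient failures, then success.
calls = {"n": 0}


@retrying(tries=3, base_delay=0.01)
def fetch_status():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network glitch")
    return "ok"


print(fetch_status(), calls["n"])  # ok 3
```

With `tries=3` the wrapper sleeps at most twice, for `base_delay` and then `2 * base_delay`, before giving up. A `TypeError` raised inside `fetch_status` would escape on the first call without any sleep.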
Wrap your main
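main_guard.py
```python
import logging

log = logging.getLogger("job")


def main():
    ...  # the job's actual work goes here


if __name__ == "__main__":
    try:
        main()
    except Exception:
        # Log the full traceback at ERROR level, then re-raise so the
        # process still exits with a nonzero status.
        log.exception("job failed")
        raise
```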
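A variant of the guard can log the failure and return an exit code instead of re-raising. This is a sketch under assumptions: the script is started by a scheduler (cron, systemd, CI) that treats any nonzero exit status as failure, and the names `run_job` and `failing_main` are hypothetical.

```python
import logging

log = logging.getLogger("job")


def run_job(main_fn) -> int:
    """Run main_fn(); return 0 on success, or log the traceback and return 1."""
    try:
        main_fn()
    except Exception:  # KeyboardInterrupt and SystemExit still propagate
        # ERROR-level record with the traceback attached, written once.
        log.exception("job failed")
        return 1  # same status Python uses for an uncaught exception
    return 0


def failing_main():
    # Hypothetical job body that crashes, for demonstration only.
    raise RuntimeError("simulated failure")


# Give the "job" logger a real handler with timestamps.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s: %(message)s")
status = run_job(failing_main)
print("exit status:", status)
```

In the script itself this would end with `sys.exit(run_job(main))` under the `if __name__ == "__main__":` guard. Compared with re-raising, the traceback reaches stderr once (through the log handler) instead of twice (once from logging, once from the interpreter), and the scheduler still sees a nonzero exit code.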
