Mastering Celery and Celery Beat in Django: PostgreSQL Backup Cron Job
Aug 13, 2024.
Introduction
Managing background processes and scheduling recurring tasks is an essential part of the software development and maintenance workflow. Whether it's sending out regular notifications, cleaning up databases, or backing up crucial data, finding the right tools to automate these tasks is essential. That’s where [Celery](https://docs.celeryq.dev/en/main/index.html) and [Celery beat](https://docs.celeryq.dev/en/main/userguide/periodic-tasks.html) come in.
Celery provides a powerful way to handle background tasks, while Celery Beat functions like a cron job scheduler, automating these processes at regular intervals.
In this blog, I’ll walk you through how developers can leverage these tools to streamline their workflows, ensuring that essential tasks run smoothly and automatically in the background.
Prerequisites
Before diving into using Celery and Celery Beat, there are a few prerequisites you'll need to have in place.
First, you'll need a basic understanding of Python and Django since we'll be working within a Django project.
Familiarity with PostgreSQL is also important, because we'll be backing up data from a PostgreSQL database.
- Python 3
- Django 5.0.8
- Linux or macOS is needed to run Redis locally. Windows users can use Redis Cloud or install WSL/WSL2
Setting Up Environment
To get started with Celery and Celery Beat, you'll need to set up your development environment and install the necessary packages. Below are the instructions for Windows, Linux, and macOS.
Create a folder, preferably named *celery-django*, and open it with your preferred code editor. I will be using VS Code on my end.
After opening the folder, open your terminal to create and activate the [virtual environment](https://www.w3schools.com/django/django_create_virtual_environment.php).
*Windows:*
Initialize virtual environment
`python -m venv venv`
Activate environment
`venv\Scripts\activate`
*Linux and macOS:*
Initialize virtual environment
`python3 -m venv venv`
Activate environment
`source venv/bin/activate`
Installing Django and Celery
Once your virtual environment is activated, you can install the required versions of Django and Celery. Use the following command to install Django 5.0.8 and Celery 5.4.0:
`pip install django==5.0.8 celery==5.4.0`
After installing the packages, start a new Django project by running the following command:
`django-admin startproject async_project .`
Note the use of the dot (`.`) at the end of the command. This ensures that the project is created in the current directory without creating an additional folder named *async_project* inside it. You can name the project anything you like, but I chose *async_project* because I prefer having my project folder appear at the top of the folder structure alphabetically in VS Code.
Creating a New App Called Backup
With the Django project set up, the next step is to create a new app within the project. We'll name this app *backup*, as it will handle the logic for backing up our PostgreSQL database. Run the following command to create the app:
`python manage.py startapp backup`
This command creates a new directory named *backup* within your project, containing all the necessary files for the app. You can name the app whatever you prefer, but *backup* is a descriptive name for our use case.
```plaintext
celery-django/
├── async_project/
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── backup/
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── migrations/
│   │   └── __init__.py
│   ├── models.py
│   ├── tests.py
│   └── views.py
├── manage.py
└── venv/
    ├── bin/
    ├── include/
    ├── lib/
    └── pyvenv.cfg
```
Setting Up Environment Secrets
Before configuring Celery, it's important to securely manage your environment variables. To do this, you can use the *python-dotenv* package, which allows you to load environment variables from a *.env* file. Start by installing *python-dotenv* with the following command: `pip install python-dotenv`
Next, create a *.env* file in the root of your project directory. This file will store sensitive information such as your Django secret key and Celery broker URLs. Make sure not to commit this file to your version control system. Below is an example of what your *.env* file should look like:
```plaintext
DJANGO_SESSION_KEY=your-secret-key
BROKER_URL=redis://localhost:6379/0
RESULT_BACKEND=redis://localhost:6379/0
```
Updating Project Settings
We need to update *INSTALLED_APPS* in the *settings.py* file. Additionally, you'll use *python-dotenv* to load your environment variables and ensure that your settings are properly configured to use Celery. Here's how your *settings.py* file should look after these updates:
```python
"""
Django settings for async_project.

Generated by 'django-admin startproject' using Django 5.0.8.

For more information on this file, see
https://docs.djangoproject.com/en/5.0/topics/settings/

For the full list of settings and their values, see
https://docs.djangoproject.com/en/5.0/ref/settings/
"""
from pathlib import Path
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve().parent.parent

# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/5.0/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = os.environ.get('DJANGO_SESSION_KEY')

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

ALLOWED_HOSTS = ['*']

# Application definition
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'backup',
    'django.contrib.staticfiles',
]

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

ROOT_URLCONF = 'async_project.urls'

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

WSGI_APPLICATION = 'async_project.wsgi.application'

# Database
# https://docs.djangoproject.com/en/5.0/ref/settings/#databases
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}

# Password validation
# https://docs.djangoproject.com/en/5.0/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

# Internationalization
# https://docs.djangoproject.com/en/5.0/topics/i18n/
LANGUAGE_CODE = 'en-us'

TIME_ZONE = 'UTC'

USE_I18N = True

USE_TZ = True

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/5.0/howto/static-files/
STATIC_URL = 'static/'

# Default primary key field type
# https://docs.djangoproject.com/en/5.0/ref/settings/#default-auto-field
DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'

# Celery settings
CELERY_BROKER_URL = os.environ.get('BROKER_URL')
CELERY_RESULT_BACKEND = os.environ.get('RESULT_BACKEND')
```
Highlighting the New Fields in settings.py
Here’s a breakdown of the new fields added to the *settings.py* file:
`load_dotenv()`: This function loads the environment variables from your `.env` file, making them accessible within your Django settings.
*CELERY_BROKER_URL*: This setting defines the broker that Celery will use to send and receive messages. It's typically set to a Redis or RabbitMQ URL. As you can see in the *.env* file above, our choice of message broker is *Redis*. Don't worry, we'll talk about Redis soon.
*CELERY_RESULT_BACKEND*: This setting specifies where Celery should store task results. This could also be a Redis URL, for instance.
For the purposes of this tutorial, Redis can run either locally or in the cloud.
The Producer, Broker, Worker, and Result Backend
In a typical Celery setup, the workflow involves four main components: the *Producer* (Django), the *Broker* (Redis), the *Worker* (Celery), and the Database (which also serves as the *Result Backend* and could be any database, but in this setup, we're using Redis).
*The Celery workflow*
Celery to the rescue
The Producer, which is the Django application, generates tasks that need to be processed asynchronously (these tasks can be resource-intensive).
For instance, imagine designing a system that processes a large video file and converts it to a GIF. Once started, the whole program has to wait for the task to finish.
At that point, a simple `GET` request to the server from another user that should take about `200ms` to complete would take longer, because the video processing, being CPU bound, can occupy the thread until completion before other operations get to use it.
And due to Python's [Global Interpreter Lock (GIL)](https://en.wikipedia.org/wiki/Global_interpreter_lock) design, only one thread can run at a time in a [CPython](https://www.codeconquest.com/blog/cpython-vs-python-are-they-the-same-or-different/) application.
With Celery, however, these tasks are sent from the Producer *(Django)* to the *Broker*, which is a message queue system (in our case, *Redis*). The Broker queues these tasks and makes them available to the *Worker*, which is *Celery*.
The Celery Worker fetches tasks from the Broker, creates a new process for each task (separate from the main Django process, with its own thread), works on them in parallel, and then stores the results in the Database *(Result Backend)*, which, again, is Redis in our setup.
This workflow allows for efficient handling of time-consuming tasks, such as sending emails or processing large datasets or image buffers, without blocking the main application process, thereby improving the overall performance and responsiveness of a Django project.
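To make the flow concrete, here is a minimal sketch of the pattern. The task name and arguments are purely illustrative (this is not part of the backup project we're building): the Producer enqueues the task with `.delay()`, and the state and return value are later read back from the Result Backend.

```python
from celery import shared_task

# Illustrative sketch only; not part of this project's code.
@shared_task
def convert_video_to_gif(video_path):
    # CPU-bound work runs here, inside a Celery worker process,
    # instead of blocking the Django request/response cycle.
    return f"{video_path}.gif"

# In a Django view (the Producer), the task is enqueued on the Broker:
#   result = convert_video_to_gif.delay("/uploads/clip.mp4")
# and its state/result are later read from the Result Backend:
#   result.status  # e.g. 'PENDING', then 'SUCCESS'
#   result.get()   # '/uploads/clip.mp4.gif'
```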
Redis, the broker
Redis is a fast, open-source, in-memory data structure store that can be used as a message broker for Celery.
As the Broker, Redis queues the tasks sent by Django and makes them available for Celery Workers to process.
To get started with Redis, you'll need to [install the Redis server.](https://redis.io/docs/latest/operate/oss_and_stack/install/install-redis/)
On Linux and macOS, you can install Redis with a package manager like `apt` or `brew`. For Windows, you can use the Windows Subsystem for Linux (WSL) or install Redis from a pre-compiled binary.
Once installed, start the Redis server, and then verify that the Redis client can connect to the server by running the `redis-cli` command. If your installation is successful, you should see the Redis host and port, usually `127.0.0.1:6379`. Then run `ping` in the CLI; you should get a `PONG` response.
```bash
$ redis-cli
127.0.0.1:6379> ping
PONG
```
Setting Up Celery configuration
To configure Celery in the Django project, you'll need to create a new file named *celery.py* inside your *backup* directory. This file will initialize Celery and connect it with your Django settings. Below is the code you should add to this file.
```python
import os
from celery import Celery
from dotenv import load_dotenv

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'async_project.settings')

app = Celery('backupPG')

# Load environment variables from .env file
load_dotenv()

# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django apps.
app.autodiscover_tasks()
```
New folder structure
Once you've added the `celery.py` file inside your `backup` directory, your project's folder structure should look similar to the following:
```plaintext
celery-django/
├── async_project/
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── backup/
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── celery.py
│   ├── migrations/
│   │   └── __init__.py
│   ├── models.py
│   ├── tests.py
│   └── views.py
├── manage.py
└── venv/
    ├── bin/
    ├── include/
    ├── lib/
    └── pyvenv.cfg
```
Ensuring Celery is Loaded with Django
To make sure that Celery is always imported and ready whenever Django starts, add the following code to your `async_project/__init__.py` file. This step is crucial because it ensures that `shared_task` will be linked to your Celery instance. The code below imports the Celery app from your `backup` module and makes it available to other parts of your project.
```python
# async_project/__init__.py

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from backup.celery import app as celery_app

__all__ = ('celery_app',)
```
Running the Celery Worker
To start Celery and see it in action, run the following command in your terminal: `celery -A async_project worker --loglevel=info`. However, when you run this, you may encounter an error indicating that Redis is not available.
But wait, didn't we just download and install Redis a while ago?
Yes! We downloaded the Redis *server*, which is running in the background on your machine. However, your Django project also needs a Redis *client* to connect to that server.
To resolve this, install the Redis client as a project dependency by running `pip install redis` inside your virtual environment. Once installed, Celery will be able to connect to Redis, and the worker should start successfully when we run `celery -A async_project worker --loglevel=info` again.
You should see output with some information about the Celery session.
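If you'd like to double-check the broker connection from Python before involving Celery, here's a quick sanity check. This is just a sketch: it uses the `redis` client we just installed and reads the same `BROKER_URL` your Celery settings use.

```python
import os

import redis
from dotenv import load_dotenv

load_dotenv()

# Connect using the same URL Celery will use as its broker
client = redis.Redis.from_url(os.environ.get('BROKER_URL'))
print(client.ping())  # True means the Redis server is reachable
```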
Create a PostgreSQL database
To create a Celery task, we first need a PostgreSQL database to back up. If you don't have Postgres installed on your machine, you can find installation instructions for your operating system by following the [PostgreSQL Downloads instructions](https://www.w3schools.com/postgresql/postgresql_install.php). Once installed, you can [verify a successful installation](https://www.w3schools.com/postgresql/postgresql_getstarted.php).
Create a new database using the `psql` command-line tool or any graphical client such as pgAdmin. Run the following command to create a new database:
`CREATE DATABASE your_db_name;`
Replace `your_db_name` with your desired database name. Make sure to note the database name, username, and password, as you will need these details to connect with the database we want to backup.
It's also a good idea to [insert some data](https://www.w3schools.com/postgresql/postgresql_pgadmin4.php) into the database, since the whole idea is to back it up. You can do this through pgAdmin, or from Python as shown below.
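If you'd rather seed the database from Python, a short script like the following works too. This is just a sketch: it assumes you install the psycopg2 driver (`pip install psycopg2-binary`, which is not otherwise used in this tutorial), and the table and row are arbitrary sample data.

```python
import psycopg2

# Connection details should match the database you just created
conn = psycopg2.connect(
    dbname="your_db_name",
    user="your_db_user",
    password="your_db_password",
    host="localhost",
    port=5432,
)
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS notes (id SERIAL PRIMARY KEY, body TEXT)")
cur.execute("INSERT INTO notes (body) VALUES (%s)", ("hello backup",))
conn.commit()
cur.close()
conn.close()
```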
Sign Up for Dropbox and Get Access Token
In order to back up your PostgreSQL database to Dropbox, you'll need to sign up for a free-tier account at [Dropbox](https://www.dropbox.com/). After signing up, navigate to the Dropbox Developer portal and create a new app to [obtain an access token](https://dropbox.tech/developers/generate-an-access-token-for-your-own-account). Before generating the access token, make sure you [grant write permissions](https://i.imgur.com/yHDLUt9.png) in the Dropbox developer dashboard so that you can write data to Dropbox from the Django app. Once you have your access token, update your `.env` file with the following variable:
`DROPBOX_ACCESS_TOKEN=your_dropbox_access_token`
Additionally, you need to install the Dropbox and Requests Python libraries to interact with the Dropbox API. You can do this by running:
`pip install dropbox requests`
Now you're ready to use Dropbox in your project for backup purposes.
You should note that the access token expires after about 4 hours, after which you need to generate a new one. To get a long-lived token for deployment, you should read this [guide](https://developers.dropbox.com/oauth-guide) or [reach out](https://www.linkedin.com/in/odunayo-alo-b48586255/).
Your *.env* file should now include the database connection details alongside the earlier variables:

```plaintext
DJANGO_SESSION_KEY=your-secret-key
BROKER_URL=redis://localhost:6379/0
RESULT_BACKEND=redis://localhost:6379/0
DATABASE_USER=your_db_user
DATABASE_PASSWORD=your_db_password
DATABASE_NAME=your_db_name
DATABASE_HOST=localhost
DROPBOX_ACCESS_TOKEN=your_dropbox_access_token
```
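Before wiring the token into a Celery task, you can sanity-check it with a short script. This is a sketch that assumes the `dropbox` package is installed and the token is in your `.env` file; `users_get_current_account()` raises an auth error if the token is invalid or expired.

```python
import os

import dropbox
from dotenv import load_dotenv

load_dotenv()

# Create a client with the token from .env and fetch the account it belongs to
dbx = dropbox.Dropbox(os.environ.get('DROPBOX_ACCESS_TOKEN'))
account = dbx.users_get_current_account()
print(f"Token OK, connected as: {account.name.display_name}")
```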
Registering the Tasks
To handle the database backup and upload it to Dropbox, we need to register a Celery task. First, create a new file inside the `backup` directory called `tasks.py`. Then, place the code provided below in the `tasks.py` file. This task will back up the PostgreSQL database, upload the backup file to Dropbox, and optionally remove the local file after the upload. The task uses the `pg_dump` utility to create the database backup and the Dropbox API to upload it. Make sure your `.env` file is configured with the correct Dropbox access token and database details before running this task.
```python
import os
import subprocess
from datetime import datetime

import dropbox
from celery import shared_task
from dotenv import load_dotenv

load_dotenv()

DROPBOX_ACCESS_TOKEN = os.environ.get('DROPBOX_ACCESS_TOKEN')


@shared_task
def backup_postgres_and_upload(database_name, user, password, host, port, backup_dir):
    # Step 1: Perform the database backup with pg_dump
    timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
    backup_file = f"{database_name}_backup_{timestamp}.sql"
    backup_path = os.path.join(backup_dir, backup_file)
    dump_cmd = f"PGPASSWORD={password} pg_dump -U {user} -h {host} -p {port} {database_name} > {backup_path}"
    subprocess.run(dump_cmd, shell=True, check=True)

    # Step 2: Upload the backup to Dropbox
    upload_to_dropbox(backup_path)

    # Optional: Remove the local backup file after upload
    os.remove(backup_path)


def upload_to_dropbox(file_path):
    # Create a Dropbox client
    dbx = dropbox.Dropbox(DROPBOX_ACCESS_TOKEN)

    # Upload the backup file to Dropbox
    with open(file_path, 'rb') as f:
        dbx.files_upload(f.read(), f"/{os.path.basename(file_path)}")
    print(f"Backup uploaded to Dropbox: {file_path}")


# Example task execution:
# result = backup_postgres_and_upload.apply_async(
#     kwargs={
#         'database_name': os.environ.get('DATABASE_NAME'),
#         'user': os.environ.get('DATABASE_USER'),
#         'password': os.environ.get('DATABASE_PASSWORD'),
#         'host': os.environ.get('DATABASE_HOST'),
#         'port': 5432,
#         'backup_dir': '.'
#     }
# )
# result.get(timeout=2000)
# print(result)
```
Explanation of the Code and Running the Celery Worker
The code inside `tasks.py` defines a Celery task `backup_postgres_and_upload`, which performs two key operations. First, it creates a backup of your PostgreSQL database using the `pg_dump` utility.
The backup file is stored locally in the specified directory.
Second, the task uploads this backup file to Dropbox using the Dropbox API. After a successful upload, the local backup file is optionally removed to save storage space.
The `upload_to_dropbox` function connects to Dropbox using the access token retrieved from the `.env` file. This function handles the file upload.
Once this task is registered, you can run it from anywhere in your Django application.
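For example, you could trigger it from the Django shell (`python manage.py shell`) while a worker is running. This is a sketch that assumes the database variables are already set in your `.env` file:

```python
import os

from backup.tasks import backup_postgres_and_upload

# Enqueue the backup task on the broker; a running worker picks it up
result = backup_postgres_and_upload.delay(
    database_name=os.environ.get('DATABASE_NAME'),
    user=os.environ.get('DATABASE_USER'),
    password=os.environ.get('DATABASE_PASSWORD'),
    host=os.environ.get('DATABASE_HOST'),
    port=5432,
    backup_dir='.',
)
print(result.id, result.status)
```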
To ensure that the tasks are executed, make sure to start the Celery worker again using the following command:
`celery -A async_project worker --loglevel=info`
Make sure your Redis server is running and that your `.env` file is correctly configured before running this command.
Creating `urls.py` in the Backup Directory
To route requests to the backup functionality, you need to create a `urls.py` file inside the `backup` directory. This file defines the URL patterns for the `backup` app. Inside `urls.py`, add the following code to map the URL path `backup_data/` to the view that handles the database backup task. This ensures that any request routed to this path will trigger the corresponding view in `views.py`.
```python
from django.urls import path

from .views import backup_data

urlpatterns = [
    path('backup_data/', backup_data),
]
```
Creating views.py in the Backup Directory
Now, we need to define the logic for the backup operation. Open the `views.py` file inside the `backup` directory (created by `startapp`) and replace its contents with the code below. This view triggers the Celery task `backup_postgres_and_upload` when the `/backup/backup_data/` endpoint is hit: it dispatches the backup task to the worker, waits for it to complete, and returns a JSON response with the task status. If an error occurs, it handles the exception and returns an appropriate error message.
```python
import os

from django.http import JsonResponse
from dotenv import load_dotenv

from .tasks import backup_postgres_and_upload

load_dotenv()


# Create your views here.
def backup_data(request):
    try:
        # Trigger the task on the Celery worker
        result = backup_postgres_and_upload.apply_async(
            kwargs={
                'database_name': os.environ.get('DATABASE_NAME'),
                'user': os.environ.get('DATABASE_USER'),
                'password': os.environ.get('DATABASE_PASSWORD'),
                'host': os.environ.get('DATABASE_HOST'),
                'port': 5432,
                'backup_dir': '.'
            }
        )
        # Wait for the task to complete
        result.get(timeout=2000)  # You can adjust the timeout as needed
        return JsonResponse({"message": f'The task with id {result.id} status is {result.status}'})
    except Exception as e:
        # Handle other potential exceptions
        return JsonResponse({'error': str(e)}, status=500)
```
Updating the Main `urls.py` File
To make the backup URL accessible within the main project, update the main `urls.py` file located in the `async_project` directory. Use the `include()` function to reference the `backup` app's URL configuration. This way, the routes defined in `backup/urls.py` are included in the project's main URL patterns, making the `/backup/backup_data/` endpoint accessible.
```python
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('backup/', include('backup.urls')),
]
```
Testing the Backup Process
Once you've configured everything, it's time to test the backup process.
First, open a terminal and run the Celery worker using the following command: `celery -A async_project worker --loglevel=info`.
In a separate terminal, spin up your Django development server using `python manage.py runserver`.
Now, using Postman, your browser, or any other tool capable of making HTTP requests, send a GET request to the following endpoint: `GET http://localhost:8000/backup/backup_data/`.
If everything has been set up correctly, the request should trigger the database backup task. After the task has been completed, you can check your Dropbox file explorer to verify if the backup file has been successfully uploaded.
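If you prefer scripting the test, the `requests` library installed earlier works just as well. A minimal sketch (adjust the host/port if your dev server differs):

```python
import requests

# Hit the backup endpoint; the view blocks until the Celery task finishes
response = requests.get("http://localhost:8000/backup/backup_data/")
print(response.status_code, response.json())
```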
Scheduling Backups with Celery Beat
To automate your backups using Celery Beat and cron-style schedules, start by installing it with `pip install django-celery-beat`.
Once installed, you will need to add `'django_celery_beat'` to your `INSTALLED_APPS` in the Django settings.
Celery Beat allows you to schedule periodic tasks, such as backing up your PostgreSQL database at regular intervals.
After installing Celery Beat, update the code inside `backup/celery.py` to include the periodic tasks setup.
```python
import os
from datetime import timedelta

from celery import Celery
from celery.schedules import crontab
from dotenv import load_dotenv

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'async_project.settings')

from backup.tasks import backup_postgres_and_upload

app = Celery('backupPG')

# Load environment variables from .env file
load_dotenv()

# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django apps.
app.autodiscover_tasks()


@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls the backup task every 10 seconds
    sender.add_periodic_task(
        timedelta(seconds=10),
        backup_postgres_and_upload.s(
            database_name=os.environ.get('DATABASE_NAME'),
            user=os.environ.get('DATABASE_USER'),
            password=os.environ.get('DATABASE_PASSWORD'),
            host=os.environ.get('DATABASE_HOST'),
            port=5432,
            backup_dir='.'
        ),
        name='Backup PostgreSQL every 10 seconds'
    )


@app.task(bind=True, ignore_result=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
Running Django, Celery Worker, and Celery Beat
Now that everything is set up, you'll need to run three terminal instances.
In the first terminal, start your Django development server using `python manage.py runserver`.
In the second terminal, start the Celery worker using the command `celery -A async_project worker --loglevel=info`.
Finally, in the third terminal, start the Celery Beat scheduler using `celery -A async_project beat --loglevel=info`.
For testing purposes, the task has been set to run every 10 seconds. However, for production environments, it's recommended to set the task to run less frequently, such as weekly. Refer to the [Celery documentation](https://docs.celeryproject.org/en/stable/userguide/periodic-tasks.html) for guidance on production scheduling.
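For instance, to run the backup weekly instead, you could swap the `timedelta` in `setup_periodic_tasks` for a `crontab` schedule. A sketch (every Monday at 02:00, in the `TIME_ZONE` configured in settings):

```python
from celery.schedules import crontab

# Inside setup_periodic_tasks, replace timedelta(seconds=10) with a crontab:
sender.add_periodic_task(
    crontab(minute=0, hour=2, day_of_week=1),  # every Monday at 02:00
    backup_postgres_and_upload.s(
        database_name=os.environ.get('DATABASE_NAME'),
        user=os.environ.get('DATABASE_USER'),
        password=os.environ.get('DATABASE_PASSWORD'),
        host=os.environ.get('DATABASE_HOST'),
        port=5432,
        backup_dir='.'
    ),
    name='Backup PostgreSQL weekly'
)
```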
If everything is configured correctly, the database backups should be uploaded to your Dropbox account.
Likely Bugs and Issues
In this section, we discuss potential bugs or issues that might arise while running your Celery and Django setup. It's essential to anticipate and troubleshoot these common problems to ensure a smooth development experience. Proper environment configuration, managing external services like Redis and Dropbox, and handling Celery correctly are key to avoiding headaches during development.
- For Windows users, ensure that the WSL environment is properly initiated by installing the necessary VS Code extensions, such as Remote WSL and Remote Explorer.
- If you change permissions in your [dropbox console](https://i.imgur.com/yHDLUt9.png), make sure to generate a new access token to reflect the updated permission settings.
- Ensure that Redis is running in the background, whether locally or via a cloud-hosted Redis server.
- Stop and respawn the Celery worker whenever changes are made to the codebase to ensure those changes are applied.
- If the Celery worker is being run on a Windows (not WSL) terminal, use the `--pool=solo` argument to avoid common issues with running Celery on Windows: `celery -A async_project worker -l info --pool=solo`.