GitHub Actions is a powerful automation platform that helps developers automate tedious, time-wasting software development workflows. Instead of running tests, executing scripts at intervals, or doing any programmable task manually, you can let GitHub Actions take the wheel when certain events happen in your repository. In this tutorial, you will learn how to use this critical feature of GitHub and design your own workflows for several real-world use cases.
At its core, GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that lets you automate various tasks directly from your GitHub repository. Think of it as a personal robot assistant that runs your tests, executes your scripts, and ships your code whenever something happens in your repository.
Let's look at a common scenario: you are building a Python application that scrapes product prices from various e-commerce websites. Without GitHub Actions, you would need to run the scraper by hand at regular intervals, keep the Python environment and API keys configured on whatever machine runs it, and check that each run succeeded.
With GitHub Actions, all of these tasks can be automated through workflows, usually written in YAML files like the one below:
name: Run Price Scraper
on:
schedule:
- cron: '0 */12 * * *' # Runs every 12 hours
workflow_dispatch: # Allows manual triggers
jobs:
scrape:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
- name: Run scraper
env:
API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
run: python scraper.py
This workflow automatically runs a scraper every 12 hours, handles Python version setup, and securely manages API keys - all without manual intervention.
Throughout this tutorial, we'll build several practical GitHub Actions workflows for Python applications. You will learn how to:
- Understand the core components - workflows, jobs, steps, actions, and triggers
- Build and schedule your first workflow from scratch
- Manage secrets and environment variables securely
- Automate real-world tasks such as web scraping, publishing packages to PyPI, and building Docker images
By the end, you will have hands-on experience with GitHub Actions and be able to automate your own Python projects effectively.
Note: Even though the code examples are in Python, the concepts and hands-on experience you will gain from this tutorial apply to any programming language.
Let's start by understanding the core concepts that make GitHub Actions work.
To write your own GitHub Actions workflows, you need to understand how its different components work together. Let's break down these core concepts using a practical example: automating tests for a simple Python script.
A workflow is an automated process that you define in a YAML file within your repository's .github/workflows directory. Think of it as a recipe that tells GitHub exactly what to do, how and when to do it. You can transform virtually any programmable task into a GitHub workflow as long as it can be executed in a Linux, Windows, or macOS environment and doesn't require direct user interaction.
Here is a basic workflow structure:
# test.yaml
name: Python Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
The YAML file starts by specifying the name of the workflow with the name field. Immediately after, we specify the events that trigger this workflow. In this example, the workflow automatically executes on each git push command and pull request. We will learn more about events and triggers in a later section.
Next, we define jobs, which are the building blocks of workflows. Each job:
- Runs in its own fresh virtual machine (a runner) whose operating system you pick with the runs-on field
- Contains one or more steps that execute sequentially
- Runs in parallel with other jobs by default

For example, you might have separate jobs for testing and deployment:
jobs:
test:
runs-on: ubuntu-latest
...
deploy:
runs-on: macos-latest
...
Each job can contain one or more steps that are executed sequentially. Steps are individual tasks that make up your job. They can run shell commands (such as installing dependencies or executing a script) or use prebuilt, reusable actions.
For example, a typical test job might have steps to check out the repository, set up Python, install dependencies, and run the test suite.
Each step can specify:
- name: A display name for the step
- uses: Reference to an action to run
- run: Any operating-system specific terminal command like pip install package or python script.py
- with: Input parameters for actions
- env: Environment variables for the step
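A single step often combines several of these fields. For illustration, here is a sketch of two steps (the scraper script and the API_KEY secret name are placeholders, not part of the test workflow above):

steps:
  - name: Set up Python
    uses: actions/setup-python@v4
    with:
      python-version: '3.9'
  - name: Run scraper
    env:
      API_KEY: ${{ secrets.API_KEY }}  # available to this step only
    run: python scraper.py

Now that we understand jobs and steps, let's look at actions - the reusable building blocks that make GitHub Actions so powerful.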
The test.yaml file from earlier has a single test job that executes two steps:
- Checks out the repository code using the actions/checkout@v3 action.
- Sets up Python using actions/setup-python@v4, passing python-version as an input parameter to that action.

# test.yaml
name: Python Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
Actions are reusable units of code that can be shared across workflows (this is where GitHub Actions takes its name). They are like pre-packaged functions that handle common tasks. For instance, instead of writing code to set up Node.js or cache dependencies yourself, you can use official GitHub actions like:
- actions/setup-node@v3 - Sets up a Node.js environment
- actions/cache@v3 - Caches dependencies and build outputs
- actions/upload-artifact@v3 - Uploads workflow artifacts
- actions/download-artifact@v3 - Downloads workflow artifacts
- actions/labeler@v4 - Automatically labels pull requests
- actions/stale@v8 - Marks and closes stale issues/PRs
- actions/dependency-review-action@v3 - Reviews dependency changes
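As a quick illustration of one of these, actions/cache is commonly used to reuse downloaded pip packages between runs. Here is a minimal sketch (the cache key shown is one common convention, not the only option):

- name: Cache pip downloads
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-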
Events are specific activities that trigger a workflow. Common triggers include:

- push: When code is pushed to the repository
- pull_request: When a PR is opened or updated
- schedule: At specified times using cron syntax
- workflow_dispatch: Manual trigger via the GitHub UI

Here is how you can configure multiple triggers:
name: Comprehensive Workflow
on:
push:
branches: [main]
pull_request:
branches: [main]
schedule:
- cron: '0 0 * * *' # Daily at midnight
workflow_dispatch: # Manual trigger
jobs:
process:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Run daily tasks
run: python daily_tasks.py
env:
API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
This example shows how a single workflow can:

- Run on every git push and pull request targeting the main branch
- Execute on a daily schedule at midnight UTC
- Be triggered manually through workflow_dispatch

To use the schedule trigger effectively in GitHub Actions, you'll need to understand cron syntax. This scheduling format lets you run workflows at precise times. The syntax uses five fields to specify when a job should run:
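In order, the five fields are minute (0-59), hour (0-23), day of the month (1-31), month (1-12), and day of the week (0-6, where Sunday is 0). For example:

# minute hour day-of-month month day-of-week
- cron: '30 3 * * 1' # 3:30 AM UTC every Monday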
Here are some common cron schedule examples:
# Daily at 3:30 AM UTC
- cron: '30 3 * * *'
# Every Monday at 1:00 PM UTC
- cron: '0 13 * * 1'
# Every 6 hours at the first minute
- cron: '0 */6 * * *'
# At minute 15 of every hour
- cron: '15 * * * *'
# Every weekday (Monday through Friday)
- cron: '0 0 * * 1-5'
# Each day at 12am, 6am, 12pm, 6pm on Tuesday, Thursday, Saturday
- cron: '0 0,6,12,18 * * 1,3,5'
Here is a sample workflow for a scraping job with three different schedules (multiple schedules are allowed):
name: Price Scraper Schedules
on:
schedule:
- cron: '0 */4 * * *' # Every 4 hours
- cron: '30 1 * * *' # Daily at 1:30 AM UTC
- cron: '0 9 * * 1-5' # Weekdays at 9 AM UTC
jobs:
scrape:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Run Firecrawl scraper
env:
API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
run: python scraper.py
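When a workflow declares several schedules like this, a step can check which cron expression fired through the github.event.schedule context. For example, to run a heavier scrape only on the daily 1:30 AM schedule, a step like the sketch below could be added (the --full flag is a hypothetical option of scraper.py, used only for illustration):

- name: Run full scrape on the daily schedule only
  if: github.event.schedule == '30 1 * * *'
  env:
    API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
  run: python scraper.py --full  # hypothetical flag for a deeper scrape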
Remember that GitHub Actions runs on UTC time, and schedules might experience slight delays during peak GitHub usage. That's why it's helpful to combine schedule with workflow_dispatch as we saw earlier - giving you both automated and manual trigger options.
Understanding these core concepts allows you to create workflows that are efficient (running only when needed), secure (properly handling sensitive data), maintainable (using reusable actions) and scalable (running on different platforms).
In the next section, we will put these concepts into practice by creating your first GitHub Actions workflow.
Let's create a practical GitHub Actions workflow from scratch. We'll build a workflow that automatically tests a Python script and runs it on a schedule - a task applicable to any programming language.
Let's start by creating a working directory for this mini-project:
mkdir first-workflows
cd first-workflows
Let's create the standard .github/workflows folder structure GitHub uses for detecting workflow files:
mkdir -p .github/workflows
The workflow files can have any name but must have a .yml (or .yaml) extension:
touch .github/workflows/system_monitor.yml
In addition to the workflows folder, create a tests folder as well as a test file:
mkdir tests
touch tests/test_main.py
We should also create the main.py file along with a requirements.txt:
touch main.py requirements.txt
Then, add these two dependencies to requirements.txt:
psutil>=5.9.0
pytest>=7.0.0
Finally, let's initialize git and make our first commit:
git init
git add .
git commit -m "Initial commit"
Check out the Git documentation if you don't have it installed already.
Let's write the workflow logic first. Open system_monitor.yml and paste each code snippet we are about to define one after the other.
name: System Monitoring
on:
schedule:
- cron: '*/30 * * * *' # Run every 30 minutes
workflow_dispatch: # Enables manual trigger
In this part, we give a descriptive name to the workflow that appears in GitHub's UI. Using the on field, we set the workflow to run every 30 minutes and through a manual trigger.
jobs:
  monitor:
    runs-on: ubuntu-latest
The jobs field contains all the jobs in this workflow. Each job needs a unique identifier - here it is monitor - and the runs-on field picks the operating system of the runner the job executes on.
There are five steps that run sequentially in this job. Each has a descriptive name that appears in the GitHub UI and uses either an official GitHub action or a custom terminal command.
jobs:
monitor:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Run tests
run: pytest tests/
- name: Collect system metrics
run: python main.py
Here is what each step does:

1. Checks out the repository code using actions/checkout@v3.
2. Sets up Python 3.9 with actions/setup-python@v4.
3. Installs the dependencies by upgrading pip and installing the requirements, which include the pytest package.
4. Runs the tests in the tests directory using pytest.
5. Collects system metrics by running python main.py.

Notice the use of the | (pipe) operator for multi-line commands.
After you complete writing the workflow, commit the changes to Git:
git add .
git commit -m "Add a workflow file for monitoring system resources"
Now, let's write the main.py file, which is a monitoring script that helps software developers track system resource usage over time, enabling them to identify performance bottlenecks and capacity issues in their development environment.
import psutil
import json
from datetime import datetime
from pathlib import Path
This script collects and logs system metrics over time. It uses psutil to gather CPU usage, memory usage, disk usage, and active process counts. The metrics are timestamped and saved to JSON files organized by date.
The script has three main functions:
def get_system_metrics():
"""Collect key system metrics"""
metrics = {
"cpu_percent": psutil.cpu_percent(interval=1),
"memory_percent": psutil.virtual_memory().percent,
"disk_usage": psutil.disk_usage('/').percent,
"timestamp": datetime.now().isoformat()
}
# Add running processes count
metrics["active_processes"] = len(psutil.pids())
return metrics
get_system_metrics() - Collects current system metrics including CPU percentage, memory usage percentage, disk usage percentage, timestamp, and count of active processes.
def save_metrics(metrics):
"""Save metrics to a JSON file with today's date"""
date_str = datetime.now().strftime("%Y-%m-%d")
reports_dir = Path("system_metrics")
reports_dir.mkdir(exist_ok=True)
# Save to daily file
file_path = reports_dir / f"metrics_{date_str}.json"
# Load existing metrics if file exists
if file_path.exists():
with open(file_path) as f:
daily_metrics = json.load(f)
else:
daily_metrics = []
# Append new metrics
daily_metrics.append(metrics)
# Save updated metrics
with open(file_path, 'w') as f:
json.dump(daily_metrics, f, indent=2)
save_metrics() - Handles saving the metrics to JSON files. It creates a system_metrics directory if needed, and saves metrics to date-specific files (e.g. metrics_2024-12-12.json). If a file for the current date exists, it loads and appends to it, otherwise creates a new file.
def main():
try:
metrics = get_system_metrics()
save_metrics(metrics)
print(f"System metrics collected at {metrics['timestamp']}")
print(f"CPU: {metrics['cpu_percent']}% | Memory: {metrics['memory_percent']}%")
return True
except Exception as e:
print(f"Error collecting metrics: {str(e)}")
return False
if __name__ == "__main__":
main()
main() - Orchestrates the metric collection and saving process. It calls get_system_metrics(), saves the data via save_metrics(), prints current CPU and memory usage to console, and handles any errors that occur during execution.
The script can be run directly or imported as a module. When run directly (which is what happens in a GitHub Actions workflow), it executes the main() function which collects and saves one set of metrics.
Combine the code snippets above into the main.py file and commit the changes:
git add .
git commit -m "Add the main.py functionality"
Testing is a critical part of software engineering workflows: tests catch regressions early, document expected behavior, and let you change code with confidence.
For our system metrics collection script, tests would be valuable to verify:
- The get_system_metrics() function returns data in the expected format with valid ranges
- The save_metrics() function properly handles file operations and JSON serialization
- The main() function orchestrates the workflow as intended

With that said, let's work on the tests/test_main.py file:
import json
from datetime import datetime
from pathlib import Path
from main import get_system_metrics, save_metrics
The test file we are about to write demonstrates key principles of testing with pytest, a popular Python testing framework. Pytest makes it easy to write tests by using simple assert statements and providing a rich set of features for test organization and execution. The test functions are automatically discovered by pytest when their names start with test_, and each function tests a specific aspect of the system's functionality.
def test_get_system_metrics():
"""Test if system metrics are collected correctly"""
metrics = get_system_metrics()
# Check if all required metrics exist and are valid
assert 0 <= metrics['cpu_percent'] <= 100
assert 0 <= metrics['memory_percent'] <= 100
assert metrics['active_processes'] > 0
assert 'timestamp' in metrics
In this example, we have two test functions that verify different components of our metrics collection system. The first test, test_get_system_metrics(), checks if the metrics collection function returns data in the expected format and with valid ranges. It uses multiple assert statements to verify that CPU and memory percentages are between 0 and 100, that there are active processes, and that a timestamp is included. This demonstrates the practice of testing both the structure of returned data and the validity of its values.
def test_save_and_read_metrics():
"""Test if metrics are saved and can be read back"""
# Get and save metrics
metrics = get_system_metrics()
save_metrics(metrics)
# Check if file exists and contains data
date_str = datetime.now().strftime("%Y-%m-%d")
file_path = Path("system_metrics") / f"metrics_{date_str}.json"
assert file_path.exists()
with open(file_path) as f:
saved_data = json.load(f)
assert isinstance(saved_data, list)
assert len(saved_data) > 0
The second test, test_save_and_read_metrics(), showcases integration testing by verifying that metrics can be both saved to and read from a file. It follows a common testing pattern: arrange (setup the test conditions), act (perform the operations being tested), and assert (verify the results). The test ensures that the file is created in the expected location and that the saved data maintains the correct structure. This type of test is particularly valuable as it verifies that different components of the system work together correctly.
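Before committing, you can run the suite locally to confirm everything passes (assuming the dependencies from requirements.txt are installed):

python -m pip install -r requirements.txt
python -m pytest tests/ -v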
Combine the above code snippets into the tests/test_main.py file and commit the changes:
git add .
git commit -m "Write tests for main.py"
Now that we've created our system monitoring workflow, let's set it up on GitHub and run it. First, push everything we have to a new GitHub repository:
git remote add origin https://github.com/your-username/your-repository.git
git branch -M main
git push -u origin main
Once the workflow file is pushed, GitHub automatically detects it and displays it in the "Actions" tab of your repository. Since the workflow is scheduled, you don't need to do anything - the first run will happen within 30 minutes (remember how we set the running interval to */30 with cron). Because the workflow also includes a workflow_dispatch field, you can trigger it manually by clicking on the "Run workflow" button.
After clicking the button, a new run appears within a few seconds (refresh the page if you don't see it). To follow the run in real time, click on it and expand the monitor job; you'll see each step execute in order.
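If you prefer the terminal over the web UI, the GitHub CLI can start and inspect the same run (assuming gh is installed and authenticated):

gh workflow run system_monitor.yml             # same effect as the "Run workflow" button
gh run list --workflow=system_monitor.yml      # list recent runs of this workflow
gh run watch                                   # pick a run and follow it live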
Right now, our workflow file has a problem - while it successfully collects metrics, it doesn't commit and push the changes back to the repository. This means that although metrics are being gathered, they aren't being saved in version control. Let's modify the workflow to automatically commit and push the collected metrics:
name: System Monitor
on:
schedule:
- cron: '*/30 * * * *'
workflow_dispatch:
permissions:
contents: write
jobs:
monitor:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Run tests
run: python -m pytest
- name: Collect metrics
run: python main.py
- name: Commit and push changes
run: |
git config --global user.name 'github-actions[bot]'
git config --global user.email 'github-actions[bot]@users.noreply.github.com'
git add system_metrics/
git commit -m "Update metrics" || exit 0
git push
The key changes in this updated workflow are:
- A permissions block with contents: write to allow the workflow to push changes back to the repository
- A final step that commits and pushes the collected metrics, with Git configured to use the github-actions bot identity
- A git add for the system_metrics directory so the newly written metric files are staged

The || exit 0 after git commit ensures the workflow doesn't fail if there are no changes to commit.
This allows the workflow to automatically save and version control the metrics it collects. Let's commit the changes to the workflow and push:
git add .
git commit -m "Add a commit step to the workflow file"
git push origin main
After this change, try running the workflow manually and verify its success by navigating to the Actions tab in your GitHub repository. You should see the workflow run complete and new metric files appear in the system_metrics directory.
When building automated workflows with GitHub Actions, proper handling of sensitive data like API keys, passwords, and access tokens is crucial for security. Let's explore best practices for managing these credentials.
Environment variables in GitHub Actions can be set at different levels: for a single step, for a whole job, or for the entire workflow, while sensitive values belong in encrypted secrets defined at the repository, environment, or organization level.
Here is how to properly configure and use them:
First, store sensitive values as repository secrets:
- API_KEY
- DATABASE_URL
- AWS_ACCESS_KEY

Reference secrets in your workflow file using the secrets context:
name: Web Scraping Pipeline
# ... the rest of the file
jobs:
scrape:
runs-on: ubuntu-latest
steps:
# ... the rest of the steps
- name: Run scraper
env:
API_KEY: ${{ secrets.API_KEY }}
DATABASE_URL: ${{ secrets.DATABASE_URL }}
run: python scraper.py
Above, the "Run scraper" step executes scraper.py which relies on two environment variables configured through the secrets context.
For local development, use .env files to manage environment variables:
touch .env
echo "API_KEY='your-api-key-here'" >> .env
echo "DATABASE_URL='postgresql://user:pass@localhost:5432/db'" >> .env
Create a .gitignore file to prevent committing sensitive data:
echo ".env" >> .gitignore
Use python-dotenv to load variables from .env files:
from dotenv import load_dotenv
import os
# Load environment variables from .env file
load_dotenv()
# Access variables
api_key = os.getenv('API_KEY')
database_url = os.getenv('DATABASE_URL')
if not api_key or not database_url:
raise ValueError("Missing required environment variables")
This code demonstrates loading environment variables from a .env file using python-dotenv. The load_dotenv() function reads the variables, which can then be accessed via os.getenv(). Basic validation ensures required variables exist.
Create a configuration class to validate environment variables:
# Note: this uses the Pydantic v1-style BaseSettings; with Pydantic v2, install
# pydantic-settings and import BaseSettings from pydantic_settings instead.
from pydantic import BaseSettings, SecretStr
class Settings(BaseSettings):
api_key: SecretStr
database_url: str
debug_mode: bool = False
class Config:
env_file = ".env"
env_file_encoding = "utf-8"
# Initialize settings
settings = Settings()
This approach using Pydantic provides several advantages for environment variable management:
- debug_mode demonstrates setting defaults
- SecretStr provides secure handling of sensitive values

The Settings class inherits from BaseSettings, which automatically loads values from environment variables. The nested Config class specifies to also load from a .env file.
Using settings = Settings() creates a validated configuration object that can be imported and used throughout the application. This is more robust than accessing os.environ directly.
Example usage:
settings.api_key.get_secret_value() # Securely access API key
settings.database_url # Type-checked database URL
settings.debug_mode # Boolean with default value
Handle different environments (development, staging, production) using environment-specific files:
.env # Default environment variables
.env.development # Development-specific variables
.env.staging # Staging-specific variables
.env.production # Production-specific variables
Load the appropriate file based on the environment:
from dotenv import load_dotenv
import os
env = os.getenv('ENVIRONMENT', 'development')
env_file = f'.env.{env}'
load_dotenv(env_file)
This approach allows you to maintain separate configurations for different environments while keeping sensitive information secure. The environment-specific files can contain different values for the same variables, such as database URLs and API keys.
You can also combine this with the Pydantic settings approach shown earlier for robust configuration management across environments.
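Here is a minimal sketch of that combination, reusing the Pydantic v1-style BaseSettings approach from above and choosing the env file from an ENVIRONMENT variable:

import os
from pydantic import BaseSettings, SecretStr

class Settings(BaseSettings):
    api_key: SecretStr
    database_url: str

    class Config:
        # ENVIRONMENT=staging loads .env.staging, ENVIRONMENT=production loads .env.production, etc.
        env_file = f".env.{os.getenv('ENVIRONMENT', 'development')}"
        env_file_encoding = "utf-8"

settings = Settings()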
For example, staging might use a test database while production uses the live database:
# .env.staging
DATABASE_URL=postgresql://test-db.example.com
API_KEY=test-key

# .env.production
DATABASE_URL=postgresql://prod-db.example.com
API_KEY=live-key
This separation helps prevent accidental use of production resources during development and testing.
Let's explore three practical examples of GitHub Actions workflows for common Python tasks: web scraping, package publishing, and container builds.
Web scraping is a common use case for automated workflows. Let's build a workflow that scrapes Hacker News on a schedule using Firecrawl, an AI-powered web scraping API with a Python SDK designed for large-scale data collection. Its schema-based extraction lets you describe the data you want with Pydantic models instead of hand-writing CSS selectors, which makes it a good fit for this task.
Let's build our web scraping workflow using Firecrawl to demonstrate these capabilities.
# Create project directory and install dependencies
mkdir hacker-news-scraper && cd hacker-news-scraper
pip install firecrawl-py pydantic python-dotenv
# Create necessary files
touch requirements.txt scraper.py .env
# Add dependencies to requirements.txt
echo "firecrawl-py\npydantic\npython-dotenv" > requirements.txt
# Add Firecrawl API key to .env (get your key at firecrawl.dev/signin/signup)
echo "FIRECRAWL_API_KEY='your_api_key_here'" > .env
Open the scraper script where we define our scraping logic:
# scraper.py
import json
from firecrawl import FirecrawlApp
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from typing import List
from datetime import datetime
load_dotenv()
BASE_URL = "https://news.ycombinator.com/"
First, we import the necessary libraries and packages and define the base URL we are going to scrape.
class NewsItem(BaseModel):
title: str = Field(description="The title of the news item")
source_url: str = Field(description="The URL of the news item")
author: str = Field(
description="The URL of the post author's profile concatenated with the base URL."
)
rank: str = Field(description="The rank of the news item")
upvotes: str = Field(description="The number of upvotes of the news item")
date: str = Field(description="The date of the news item.")
class NewsData(BaseModel):
news_items: List[NewsItem]
We define two Pydantic models to structure our scraped data:
- NewsItem - Represents a single news item with fields for title, URL, author, rank, upvotes, and date
- NewsData - Contains a list of NewsItem objects

These models help validate the scraped data and ensure it matches our expected schema. They also make it easier to serialize/deserialize the data when saving to JSON. Using Field with a detailed description is crucial because Firecrawl uses these descriptions to automatically detect the HTML elements and CSS selectors we are looking for.
def get_news_data():
app = FirecrawlApp()
data = app.scrape_url(
BASE_URL,
params={
"formats": ["extract"],
"extract": {"schema": NewsData.model_json_schema()},
},
)
return data
The get_news_data() function uses Firecrawl to scrape Hacker News. It creates a FirecrawlApp instance and calls scrape_url() with the BASE_URL and parameters specifying we want to extract data according to our NewsData schema. The schema helps Firecrawl automatically identify and extract the relevant HTML elements. The function returns the scraped data containing news items with their titles, URLs, authors, ranks, upvotes and dates.
def save_firecrawl_news_data():
"""
Save the scraped news data to a JSON file with the current date in the filename.
"""
# Get the data
data = get_news_data()
# Format current date for filename
date_str = datetime.now().strftime("%Y_%m_%d_%H_%M")
filename = f"firecrawl_hacker_news_data_{date_str}.json"
# Save the news items to JSON file
with open(filename, "w") as f:
json.dump(data["extract"]["news_items"], f, indent=4)
print(f"{datetime.now()}: Successfully saved the news data.")
if __name__ == "__main__":
save_firecrawl_news_data()
The save_firecrawl_news_data() function handles saving the scraped Hacker News data to a JSON file. It first calls get_news_data() to fetch the latest data from Hacker News. Then it generates a filename using the current timestamp to ensure uniqueness. The data is saved to a JSON file with that filename, with the news items formatted with proper indentation for readability. Finally, it prints a confirmation message with the current timestamp when the save is complete. This function provides a convenient way to store snapshots of Hacker News data that can be analyzed later.
Combine these snippets into the scraper.py script. Then, we can write a workflow that executes it on schedule:
cd .. # Change back to the project root directory
touch .github/workflows/hacker-news-scraper.yml # Create the workflow file
Here is what the workflow file must look like:
name: Run Hacker News Scraper
permissions:
contents: write
on:
schedule:
- cron: "0 */6 * * *"
workflow_dispatch:
jobs:
scrape:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r hacker-news-scraper/requirements.txt
- name: Run scraper
run: python hacker-news-scraper/scraper.py
- name: Commit and push if changes
run: |
git config --local user.email "github-actions[bot]@users.noreply.github.com"
git config --local user.name "github-actions[bot]"
git add .
git commit -m "Update scraped data" -a || exit 0
git push
This workflow runs our Hacker News scraper every 6 hours using GitHub Actions. It sets up Python, installs dependencies, executes the scraper, and automatically commits any new data to the repository. The workflow can also be triggered manually using the workflow_dispatch event. One important note about the paths specified in the workflow file is that they must match your repository's directory structure exactly, including the requirements.txt location and the path to your scraper script.
To enable the workflow, simply push all the changes to GitHub and test it through the UI. The next runs will be automatic.
git add .
git commit -m "Add a scraping workflow"
git push origin main
Publishing Python packages to PyPI (Python Package Index) typically involves several steps. First, developers need to prepare their package by creating a proper directory structure, writing setup files, and ensuring all metadata is correct. Then, the package needs to be built into distribution formats - both source distributions (sdist) and wheel distributions (bdist_wheel). Finally, these distribution files are uploaded to PyPI using tools like twine. This process often requires careful version management and proper credentials for the package repository. While this can be done manually, automating it with CI/CD pipelines like GitHub Actions ensures consistency and reduces human error in the release process.
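Done by hand, a release looks roughly like this (a sketch; it assumes your package metadata already lives in pyproject.toml or setup.py):

python -m pip install --upgrade build twine
python -m build                # writes the sdist and wheel into dist/
python -m twine upload dist/*  # uploads both distributions to PyPI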
For example, the following workflow publishes a new version of a package when you create a new release:
name: Publish Python Package
on:
release:
types: [created]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install build twine
- name: Build package
run: python -m build
- name: Publish to PyPI
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
run: |
python -m twine upload dist/*
The workflow automates publishing Python packages to PyPI when GitHub releases are created.
Required setup steps:
- A package with setup.py or pyproject.toml configured
- An account on pypi.org
- A PyPI API token stored as PYPI_API_TOKEN in the repository secrets

The workflow itself checks out the code, sets up Python, builds the source and wheel distributions with python -m build, and uploads them to PyPI with twine.
The __token__ username is used with PyPI's token authentication, while the actual token is accessed securely through GitHub secrets.
GitHub Actions can also automate building and pushing Docker containers to container registries like Docker Hub or GitHub Container Registry (GHCR). This workflow is useful for maintaining containerized applications and ensuring your latest code changes are packaged into updated container images.
The process typically involves:

- Building the image from a Dockerfile
- Authenticating with the container registry
- Tagging the image and pushing it to the registry

This automation ensures your container images stay in sync with code changes and are readily available for deployment. Here is a sample workflow containing these steps:
name: Build and Push Container
on:
push:
branches: [main]
paths:
- 'Dockerfile'
- 'src/**'
workflow_dispatch:
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Build and push
uses: docker/build-push-action@v4
with:
context: .
push: true
tags: |
user/app:latest
user/app:${{ github.sha }}
This workflow introduces a few new GitHub Actions concepts and syntax:
The paths trigger filter ensures the workflow only runs when changes are made to the Dockerfile or files in the src directory, preventing unnecessary builds.
docker/setup-buildx-action configures Docker Buildx, which provides enhanced build capabilities including multi-platform builds and build caching.
docker/login-action handles registry authentication. Before using this, you must:

- Create a Docker Hub access token
- Store DOCKERHUB_USERNAME and DOCKERHUB_TOKEN as repository secrets in GitHub

docker/build-push-action is a specialized action for building and pushing Docker images. The configuration shows:

- context: . (builds from the current directory)
- push: true (automatically pushes after building)
- tags: specifies multiple tags, including:
  - latest: a rolling tag for the most recent version
  - github.sha: a unique tag using the commit hash for versioning

The workflow assumes you have:

- A Dockerfile at the root of your repository
- Application code in the src directory

When this workflow runs successfully, it produces a containerized version of your application that is automatically published to Docker Hub and can be pulled with either the latest tag or a specific commit hash.
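For example, assuming the user/app image name from the workflow above, the published image can then be pulled by either tag:

docker pull user/app:latest
docker pull user/app:<commit-sha>   # replace <commit-sha> with the commit hash used as the tag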
Throughout this tutorial, we've explored the fundamentals and practical applications of GitHub Actions for Python development. From understanding core concepts like workflows, jobs, and actions, to implementing real-world examples including automated testing, web scraping, package publishing, and container builds, you've gained hands-on experience with this powerful automation platform. We've also covered critical aspects like managing sensitive data through environment variables and secrets, ensuring your automated workflows are both secure and maintainable.
As you continue your journey with GitHub Actions, remember that automation is an iterative process. Start small with basic workflows, test thoroughly, and gradually add complexity as needed. The examples provided here serve as templates that you can adapt and expand for your specific use cases. For further learning, explore the GitHub Actions documentation, join the GitHub Community Forum, and experiment with the vast ecosystem of pre-built actions available in the GitHub Marketplace. Whether you're building a personal project or managing enterprise applications, GitHub Actions provides the tools you need to streamline your development workflow and focus on what matters most - writing great code.
If you want to learn more about Firecrawl, the web scraping API we used today, you can read the following posts:
Thank you for reading!