GitHub CI/CD Data Collection Tool
This tool pulls activity data (PRs, Issues, Contributors, Stars, etc.) from specified GitHub repositories within a given time window. The collected data can then be used for further analysis and visualization.
Suitable for:
- Engineering productivity metrics (PR/Issue/Contributors/Stars within a period)
- CI/CD change tracking (by time range)
- Batch data collection for one or more repositories
Directory Structure
community/
└── collect_github_data.py
Requirements
- Python 3.8+ (recommended 3.10+)
- Network access to the GitHub API
- GitHub Personal Access Token (PAT)
Quick Start
1) Create a virtual environment (recommended)
python3 -m venv venv/dev-cicd
source venv/dev-cicd/bin/activate
(Optional) Upgrade pip:
pip install -U pip
2) Configure a GitHub Token (required)
This tool fetches data through the GitHub API. Configure a personal access token to avoid the rate limits applied to unauthenticated requests and to access private repositories.
- Open the token creation page: https://github.com/settings/tokens
- Create a token (read-only permissions are sufficient)
- Export the token as an environment variable:
export GH_PAT=xxx
⚠️ Do NOT hardcode your token or commit it into the repository.
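
To illustrate how a token exported this way can be consumed without hardcoding it, here is a minimal sketch that reads `GH_PAT` from the environment and builds request headers for the GitHub REST API. The function name `build_github_headers` is hypothetical, and this is not necessarily how `collect_github_data.py` does it.

```python
import os


def build_github_headers(env=None):
    """Build GitHub REST API request headers from the GH_PAT variable.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    The function name is hypothetical, not taken from collect_github_data.py.
    """
    env = os.environ if env is None else env
    token = env.get("GH_PAT", "").strip()
    if not token:
        # Fail early with a clear message instead of sending anonymous requests.
        raise RuntimeError("GH_PAT is not set; export it before running the collector.")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
```

Reading the token at call time keeps it out of the source tree, which is the point of the warning above.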
3) Set collection parameters and run
export START_DATE=2025-12-08
export END_DATE=2026-01-24
export TARGET_REPOS=baidu/vLLM-Kunlun
python community/collect_github_data.py
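
As a rough illustration of what collecting "within a time window" can mean, the sketch below checks whether a GitHub-style UTC timestamp falls between `START_DATE` and `END_DATE`. It assumes both boundary dates are inclusive and interpreted in UTC; the actual script may treat the boundaries differently. The helper name `in_window` is hypothetical.

```python
from datetime import date, datetime, timezone


def in_window(timestamp, start_date, end_date):
    """Return True if `timestamp` falls on a day within [start_date, end_date].

    `timestamp` is an ISO 8601 UTC string such as "2025-12-08T09:30:00Z",
    the format GitHub uses for fields like created_at.
    Assumption: both boundary dates are inclusive and interpreted in UTC.
    """
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return start <= moment.date() <= end
```

With the example values above, `in_window("2026-01-24T23:59:59Z", "2025-12-08", "2026-01-24")` is true, while a timestamp on 2026-01-25 is rejected.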
Parameters
The tool is configured via environment variables:
| Parameter | Required | Example | Description |
|---|---|---|---|
| GH_PAT | ✅ | ghp_xxx | GitHub Personal Access Token |
| START_DATE | ✅ | 2025-12-08 | Collection start date (format: YYYY-MM-DD) |
| END_DATE | ✅ | 2026-01-24 | Collection end date (format: YYYY-MM-DD) |
| TARGET_REPOS | ✅ | baidu/vLLM-Kunlun | Target repositories (supports multiple) |
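
The parameters above could be loaded and validated along the following lines. This is a sketch under stated assumptions, not the script's actual logic: the function name `load_config` is hypothetical, and the comma used to separate multiple repositories is an assumption, since the separator is not documented here.

```python
import os
from datetime import date


def load_config(env=None):
    """Read and validate the four required variables (hypothetical helper)."""
    env = os.environ if env is None else env
    required = ("GH_PAT", "START_DATE", "END_DATE", "TARGET_REPOS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))

    # date.fromisoformat raises ValueError if the value is not YYYY-MM-DD.
    start = date.fromisoformat(env["START_DATE"])
    end = date.fromisoformat(env["END_DATE"])
    if start > end:
        raise ValueError("START_DATE must not be later than END_DATE")

    # Assumption: multiple repositories are comma-separated.
    repos = [r.strip() for r in env["TARGET_REPOS"].split(",") if r.strip()]
    for repo in repos:
        owner, sep, name = repo.partition("/")
        if not (owner and sep and name):
            raise ValueError(f"Expected owner/name, got: {repo!r}")

    return {"token": env["GH_PAT"], "start": start, "end": end, "repos": repos}
```

Validating everything up front means a typo in a date or repository name fails immediately rather than after a number of API calls.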