Files

GitHub CI/CD Data Collection Tool

This tool is used to pull activity data from specified GitHub repositories within a given time window (e.g., PRs, Issues, Contributors, Stars, etc.), which can then be used for further analysis and visualization.

Suitable for:

  • Engineering productivity metrics (PR/Issue/Contributors/Stars within a period)
  • CI/CD change tracking (by time range)
  • Batch data collection for a single repository

Directory Structure

community/
  collect_github_data.py

Requirements

  • Python 3.8+ (recommended 3.10+)
  • Network access to the GitHub API
  • GitHub Personal Access Token (PAT)

Quick Start

python3 -m venv venv/dev-cicd
source venv/dev-cicd/bin/activate

(Optional) Upgrade pip:

pip install -U pip

2) Configure a GitHub Token (required)

This tool fetches data through the GitHub API. To avoid API rate limits and ensure access to private repositories, it is recommended to configure a personal token.

  1. Open the token creation page:
    https://github.com/settings/tokens
  2. Create a token (read-only permissions are sufficient)
  3. Export the token as an environment variable:
export GH_PAT=xxx

⚠️ Do NOT hardcode your token or commit it into the repository.


3) Set collection parameters and run

export START_DATE=2025-12-08
export END_DATE=2026-01-24
export TARGET_REPOS=baidu/vLLM-Kunlun

python community/collect_github_data.py

Parameters

The tool is configured via environment variables:

Parameter Required Example Description
GH_PAT ghp_xxx GitHub Personal Access Token
START_DATE 2025-12-08 Collection start date (format: YYYY-MM-DD)
END_DATE 2026-01-24 Collection end date (format: YYYY-MM-DD)
TARGET_REPOS baidu/vLLM-Kunlun Target repositories (supports multiple)