85 lines
2.0 KiB
Markdown
85 lines
2.0 KiB
Markdown
# GitHub CI/CD Data Collection Tool
|
|
|
|
This tool is used to pull activity data from specified GitHub repositories within a given time window (e.g., PRs, Issues, Contributors, Stars, etc.), which can then be used for further analysis and visualization.
|
|
|
|
Suitable for:
|
|
- Engineering productivity metrics (PR/Issue/Contributors/Stars within a period)
|
|
- CI/CD change tracking (by time range)
|
|
- Batch data collection for a single repository
|
|
|
|
---
|
|
|
|
## Directory Structure
|
|
|
|
```text
|
|
community/
|
|
collect_github_data.py
|
|
```
|
|
|
|
---
|
|
|
|
## Requirements
|
|
|
|
- Python 3.8+ (recommended 3.10+)
|
|
- Network access to the GitHub API
|
|
- GitHub Personal Access Token (PAT)
|
|
|
|
---
|
|
|
|
## Quick Start
|
|
|
|
### 1) Create a virtual environment (recommended)
|
|
|
|
```bash
|
|
python3 -m venv venv/dev-cicd
|
|
source venv/dev-cicd/bin/activate
|
|
```
|
|
|
|
(Optional) Upgrade pip:
|
|
|
|
```bash
|
|
pip install -U pip
|
|
```
|
|
|
|
---
|
|
|
|
### 2) Configure a GitHub Token (required)
|
|
|
|
This tool fetches data through the GitHub API. To avoid API rate limits and ensure access to private repositories, it is recommended to configure a personal token.
|
|
|
|
1. Open the token creation page:
|
|
https://github.com/settings/tokens
|
|
2. Create a token (read-only permissions are sufficient)
|
|
3. Export the token as an environment variable:
|
|
|
|
```bash
|
|
export GH_PAT=xxx
|
|
```
|
|
|
|
> ⚠️ Do NOT hardcode your token or commit it into the repository.
|
|
|
|
---
|
|
|
|
### 3) Set collection parameters and run
|
|
|
|
```bash
|
|
export START_DATE=2025-12-08
|
|
export END_DATE=2026-01-24
|
|
export TARGET_REPOS=baidu/vLLM-Kunlun
|
|
|
|
python community/collect_github_data.py
|
|
```
|
|
|
|
---
|
|
|
|
## Parameters
|
|
|
|
The tool is configured via environment variables:
|
|
|
|
| Parameter | Required | Example | Description |
|
|
|----------|----------|---------|-------------|
|
|
| `GH_PAT` | ✅ | `ghp_xxx` | GitHub Personal Access Token |
|
|
| `START_DATE` | ✅ | `2025-12-08` | Collection start date (format: YYYY-MM-DD) |
|
|
| `END_DATE` | ✅ | `2026-01-24` | Collection end date (format: YYYY-MM-DD) |
|
|
| `TARGET_REPOS` | ✅ | `baidu/vLLM-Kunlun` | Target repositories (supports multiple) |
|