From 3b59d0ebe9d0ce309b95e69ec35b949c6179b473 Mon Sep 17 00:00:00 2001 From: wangxiyuan Date: Thu, 26 Feb 2026 08:47:46 +0800 Subject: [PATCH] [Doc][Feature] Add vLLM Ascend development guidelines AGETNS.md (#6797) ### What this PR does / why we need it? This PR adds a new document, `AGENTS.md`, which provides detailed development guidelines for contributors to the vLLM Ascend project. These guidelines cover code style, testing, NPU-specific considerations, and the contribution process to ensure code quality and consistency. ### Does this PR introduce _any_ user-facing change? No, this is a documentation-only update for developers. ### How was this patch tested? This is a documentation change and does not require testing. - vLLM version: v0.15.0 - vLLM main: https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd Signed-off-by: wangxiyuan --- .pre-commit-config.yaml | 1 + AGENTS.md | 367 ++++++++++++++++++++++++++++++++++++++++ CLAUDE.md | 1 + 3 files changed, 369 insertions(+) create mode 100644 AGENTS.md create mode 100644 CLAUDE.md diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 530edd22..9cad2e5f 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -47,6 +47,7 @@ repos: rev: v0.45.0 hooks: - id: markdownlint + args: ['--fix'] exclude: '.*\.inc\.md$|.*report_template\.md$|.*contributors\.md$|.*PULL_REQUEST_TEMPLATE\.md$' stages: [manual] # Only run in CI diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..6795a8a3 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,367 @@ +# vLLM Ascend Development Guidelines + +This document provides instructions for contributors to the vLLM Ascend project. Please read and follow these guidelines to ensure code quality, maintainability, and consistency. + +--- + +## Table of Contents + +- [Setup and Environment](#setup-and-environment) + - [Environment Variables](#environment-variables) + - [Environment Variable Review Requirement](#environment-variable-review-requirement) +- [Testing](#testing) + - [Unit and System Tests](#unit-and-system-tests) + - [Running Tests](#running-tests) +- [Code Style](#code-style) + - [Python Conventions](#python-conventions) + - [Naming Conventions](#naming-conventions) +- [NPU-Specific Considerations](#npu-specific-considerations) + - [Tensor item() Operations](#tensor-item-operations) + - [Memory and Performance](#memory-and-performance) +- [Model and Plugin Architecture](#model-and-plugin-architecture) + - [vLLM Ascend Plugin Architecture](#vllm-ascend-plugin-architecture) + - [Patching Requirement](#patching-requirement) + - [Model Runner Changes](#model-runner-changes) +- [Commit Messages and Pull Requests](#commit-messages-and-pull-requests) + - [Commit Message Format](#commit-message-format) +- [Review Checklist](#review-checklist) + - [Code Quality](#code-quality) + - [Testing](#testing-1) + - [Documentation](#documentation) + - [NPU Considerations](#npu-considerations) + - [Commit and PR](#commit-and-pr) +- [Quick Start for Contributors](#quick-start-for-contributors) +- [References](#references) + +--- + +## Setup and Environment + +### Environment Variables + +All environment variables must be defined in `vllm_ascend/envs.py` using the centralized `env_variables` dictionary. + +**Requirements:** + +- Add documentation for each environment variable in the `env_variables` dict comment +- Specify default values and valid ranges +- Indicate whether the variable is sensitive (credentials, keys) + +**Example:** + +```python +env_variables = { + "VLLM_ASCEND_ENABLE_NZ": lambda: int(os.getenv("VLLM_ASCEND_ENABLE_NZ", 1)), + # ... +} +``` + +**Never**: Hardcode environment variable names throughout the codebase. Reference them from the central module using `from vllm_ascend import envs`. + +### Environment Variable Review Requirement + +**Strict Review Required**: All new environment variables must undergo code review. + +Reviewers must verify: + +- The variable name follows the `VLLM_ASCEND_*` naming convention +- Default value is appropriate for all supported hardware +- Documentation is added to the `env_variables` dict +- The variable is used in a performance-critical path + +--- + +## Testing + +### Unit and System Tests + +**Requirement**: All new functionality requires corresponding tests. + +- **Unit Tests (UT)**: Located in `tests/ut/`, cover core logic, edge cases, and error conditions +- **System Tests (ST)**: Located in `tests/e2e/`, verify end-to-end behavior and integration points +- **Nightly Tests**: Include benchmarks for NPU-specific code paths in `tests/e2e/nightly/` + +**Test Coverage Guidelines:** + +- New features: Tests must cover happy path and failure modes +- Bug fixes: Tests must include a regression test for the bug +- Performance-critical code: Include benchmarks and performance regression tests + +### Running Tests + +```bash + + +# Run specific unit test file +pytest -sv tests/ut/ops/test_prepare_finalize.py + +# Run specific unit test +pytest -sv tests/ut/ops/test_prepare_finalize.py::test_prepare_inputs + +# Run NPU-specific tests (requires NPU hardware) +pytest -sv tests/e2e/singlecard/test_piecewise_res_consistency +``` + +**Requirement**: Run all tests locally before requesting review. Verify tests pass on NPU hardware for NPU-specific changes. + +--- + +## Code Style + +### Python Conventions + +- **Imports**: All imports at the top of the file. Valid exceptions: + - Circular imports (use inline imports) + - Lazy loading for worker/isolation processes + - Type-checking imports wrapped in `if TYPE_CHECKING:` + +- **Global Variables**: Avoid new global variables. Pass dependencies explicitly through function parameters. + + **Allowed:** + - Constants named `ALL_UPPER_CASE` (e.g., `MAX_BATCH_SIZE` in `envs.py`) + - Immutable configuration objects + + **Requires Approval:** + - Any new mutable global state + +- **No Magic Numbers**: Use named constants with descriptive names: + + ```python + # Bad + if seq_len > 2048: ... + + # Good + MAX_CONTEXT_LENGTH = 2048 + if seq_len > MAX_CONTEXT_LENGTH: ... + ``` + +- **Descriptive Naming**: Use names that describe functionality, not implementation details. + + ```python + # Bad + is_deepseek_v3_r1 + flag1 + tmp_var + + # Good + supports_dynamic_temperature + uses_speculative_decoding + ``` + +### Naming Conventions + +- **Classes**: `PascalCase` (e.g., `NPUModelRunner`, `AscendSampler`, `ACLGraphManager`) +- **Functions/Methods**: `snake_case` (e.g., `forward_pass`, `compute_attention`) +- **Constants**: `ALL_UPPER_CASE` (e.g., `MAX_BATCH_SIZE`, `VLLM_ASCEND_ENABLE_NZ`) +- **Variables**: `snake_case` (e.g., `token_ids`, `sequence_lengths`) + +--- + +## NPU-Specific Considerations + +### Tensor item() Operations + +**Warning**: `tensor.item()` operations cause synchronization overhead on NPU when the `tensor` is on device. + +If the `tensor` is a device tensor, the operator `item()` will trigger a synchronous data transfer from NPU to CPU, which can severely degrade performance in hot paths, cause this will make `AsyncScheduler` block here. + +**Review Requirements:** + +1. Profile performance impact before merging +2. Consider alternative patterns: + - Keep values on device when possible + - Batch operations to reduce sync frequency + - Use device-side operations (e.g., `torch.argmax`, `torch.sum`) +3. Document when `item()` is unavoidable (e.g., logging, conditional logic) + +**Example Patterns:** + +```python +# Bad: In hot loop - causes sync per iteration +for tensor in tensors: + value = tensor.item() + +# Better: Batch operations - single sync +values = [t.item() for t in tensors] # Single batch sync + +# Good: Keep on device when possible +max_value = torch.max(tensor) # No sync needed +if max_value > threshold: # Comparison can stay on device + ... +``` + +### Memory and Performance + +Additional NPU-specific best practices: + +- Avoid CPU-NPU memory transfers in hot paths +- Prefer in-place operations where safe (e.g., `x.add_()`, `x.mul_()`) +- Monitor memory fragmentation, especially for long-running processes +- Test with realistic workloads on actual NPU hardware (Ascend 910B/C) + +--- + +## Model and Plugin Architecture + +### vLLM Ascend Plugin Architecture + +vLLM Ascend is a **hardware plugin** that integrates with upstream vLLM via the pluggable hardware interface. It does not add new model files directly. + +**Required Pattern**: Model-specific functionality should be implemented via: + +1. **Patching** (in `vllm_ascend/patch/`): + - `vllm_ascend/patch/platform/` - Platform-level patches (distributed, scheduling) + - `vllm_ascend/patch/worker/` - Worker-level patches (model-specific behavior) + - Example: `patch_deepseek.py` modifies upstream Deepseek model behavior + - Patch is not the best solution for all cases. Use it when necessary. + +2. **Inheritance**: + - `NPUModelRunner(GPUModelRunner)` - Extend vLLM model runner with NPU-specific behavior + - `AscendSampler` - Extend vLLM sampler with NPU-specific operations + - Add NPU-specific components via composition (e.g., `AclGraphManager`) + - Custom Operators - NPU-specific custom operators (e.g., `AscendRMSNorm`) + +3. **External upstream contributions** where appropriate + +### Patching Requirement + +**Strict Review Required**: All new patches must undergo thorough architectural review. + +Reviewers must verify: + +- The patch targets the correct upstream component +- The patch is minimal and focused +- Performance implications are understood +- A long-term plan exists for upstream contribution + +**Example Patch Pattern:** + +```python +# vllm_ascend/patch/worker/patch_deepseek.py +from vllm.model_executor.models.deepseek_v2 import DeepseekV2Model + +def forward(self, input_ids, positions, ...): + # NPU-specific forward implementation + ... + +DeepseekV2Model.forward = forward # Patch upstream class +``` + +### Model Runner Changes + +**Strict Review Required**: All new behaviors added to `model_runner` must undergo thorough architectural review. + +Reviewers must verify: + +- The necessity of the new behavior (why can't this be in a patch?) +- Performance implications on NPU hardware +- Compatibility with existing model implementations +- Long-term maintainability and test coverage + +**NPU Model Runner Files:** + +- `vllm_ascend/worker/model_runner_v1.py` - vLLM v1 model runner +- `vllm_ascend/worker/v2/model_runner.py` - vLLM v2 model runner +- `vllm_ascend/_310p/model_runner_310p.py` - Ascend 310P model runner + +--- + +## Commit Messages and Pull Requests + +### Commit Message Format + +Follow the [Conventional Commits](https://www.conventionalcommits.org/) format: + +```txt +: + + +``` + +**Valid Types**: `feat`, `fix`, `perf`, `refactor`, `test`, `docs`, `chore` + +**Good Examples:** + +```txt +feat(npu): add flash attention support for Ascend CANN + +- Implements FlashAttention-2 kernel for NPU backend +- Reduces memory usage by 30% compared to baseline + +fix(model_runner): correct padding token handling + +- Fixes token padding that caused incorrect attention masks +- Addresses issue #1234 + +perf: avoid CPU-NPU sync in attention computation + +- Inline computation to avoid tensor.item() calls +- Improves throughput by 15% +``` + +**Bad Examples:** + +```txt +fix bug +add feature +update code +``` + +--- + +## Review Checklist + +Before merging, verify: + +### Code Quality + +- [ ] Code follows style guidelines (naming, imports, no magic numbers) +- [ ] No global state added without justification +- [ ] Patching pattern used correctly (if applicable) +- [ ] No direct model file additions + +### Testing + +- [ ] New tests added for new functionality (`tests/ut/` or `tests/e2e/`) +- [ ] Existing tests pass +- [ ] NPU-specific tests verified on actual hardware +- [ ] Performance benchmarks included where applicable + +### Documentation + +- [ ] Environment variables documented +- [ ] Public APIs documented +- [ ] User-facing changes reflected in docs + +### NPU Considerations + +- [ ] `tensor.item()` usage reviewed for performance impact +- [ ] No unnecessary CPU-NPU transfers in hot paths +- [ ] Memory usage verified on NPU hardware + +### Commit and PR + +- [ ] Commit messages are clear and descriptive +- [ ] PR description is complete +- [ ] All review comments addressed + +--- + +## Quick Start for Contributors + +1. Install development dependencies: `pip install -e .[dev]` +2. Run tests: `pytest tests/` +3. Check linting: `ruff check vllm_ascend/` +4. Format code: `ruff format vllm_ascend/` +5. Make your changes following guidelines in this document +6. Add tests for new behavior +7. Run full test suite before committing +8. Create a PR with clear description + +--- + +## References + +- [vLLM Hardware Plugin RFC](https://github.com/vllm-project/vllm/issues/11162) +- [Documentation](https://docs.vllm.ai/projects/ascend/en/latest/) +- [Contributors Guide](https://docs.vllm.ai/projects/ascend/en/latest/community/contributors.html) diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..302cdeab --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +IMPORTANT: Ensure you’ve thoroughly reviewed the [AGENTS.md](AGENTS.md) file before beginning any work.