sglang/python/sglang/global_config.py

"""Global configurations"""

import os


class GlobalConfig:
    """
    Store some global constants.

    See also python/sglang/srt/managers/schedule_batch.py::global_server_args_dict, which stores
    many global runtime arguments as well.
    """

    def __init__(self):
        # Verbosity level
        # 0: do not output anything
        # 2: output final text after every run
        self.verbosity = 0

        # Default backend of the language
        self.default_backend = None

        # Runtime constants: New generation token ratio estimation
        self.default_init_new_token_ratio = float(
            os.environ.get("SGLANG_INIT_NEW_TOKEN_RATIO", 0.7)
        )
        self.default_min_new_token_ratio_factor = float(
            os.environ.get("SGLANG_MIN_NEW_TOKEN_RATIO_FACTOR", 0.14)
        )
        self.default_new_token_ratio_decay_steps = float(
            os.environ.get("SGLANG_NEW_TOKEN_RATIO_DECAY_STEPS", 600)
        )

        # Runtime constants: others
        self.retract_decode_steps = 20
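        # Environment variables are strings, so convert to int to keep the
        # type consistent with the integer default (384 MiB).
        self.flashinfer_workspace_size = int(
            os.environ.get("FLASHINFER_WORKSPACE_SIZE", 384 * 1024 * 1024)
        )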

        # Output tokenization configs
        self.skip_special_tokens_in_output = True
        self.spaces_between_special_tokens_in_out = True

        # Language frontend interpreter optimization configs
        self.enable_precache_with_tracing = True
        self.enable_parallel_encoding = True


global_config = GlobalConfig()
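
The environment-variable overrides used in `GlobalConfig.__init__` can be illustrated with a minimal, standalone sketch. The helpers `read_float_env` and `read_int_env` are hypothetical names introduced only for this example and are not part of sglang; the variable names and default values are taken from the class above.

```python
import os


def read_float_env(name, default, env=None):
    """Hypothetical helper: return env[name] parsed as float, else the default."""
    env = os.environ if env is None else env
    return float(env.get(name, default))


def read_int_env(name, default, env=None):
    """Hypothetical helper: return env[name] parsed as int, else the default."""
    env = os.environ if env is None else env
    return int(env.get(name, default))


# A plain dict stands in for os.environ so the sketch has no side effects.
fake_env = {"SGLANG_INIT_NEW_TOKEN_RATIO": "0.5"}

# The variable is set, so the string "0.5" is parsed and overrides 0.7.
init_ratio = read_float_env("SGLANG_INIT_NEW_TOKEN_RATIO", 0.7, env=fake_env)

# The variable is not set, so the integer default is used unchanged.
workspace_size = read_int_env(
    "FLASHINFER_WORKSPACE_SIZE", 384 * 1024 * 1024, env=fake_env
)
```

Because the defaults are read when the object is constructed, and `global_config` is constructed at import time, an override presumably has to be present in the environment before the module is first imported.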