Index of /大语言模型/推理工程/ (Large Language Models / Inference Engineering)


Block Transformer:Global-to-Local Language Mode..> 07-Jun-2024 23:58             1624410
Cost-Efficient Large Language Model Serving for..> 02-Jul-2024 01:12              997684
MInference 1.0:Accelerating Pre-filling for Lon..> 03-Jul-2024 01:40             4866573
MagicDec:Breaking the Latency-Throughput Tradeo..> 26-Aug-2024 01:09              986850
SGLang:Efficient Execution of Structured Langua..> 07-Jun-2024 01:02             1383463
SplitWise:Efficient generative LLM inference us..> 21-May-2024 01:44             2154278
Taming Throughput-Latency Tradeoff in LLM Infer..> 19-Jun-2024 01:02             1945812
Theory,Analysis,and Best Practices for Sigmoid ..> 09-Sep-2024 01:10             6339673
star Attention.pdf                                 27-Nov-2024 01:28             1036928
vAttention:Dynamic Memory Management for Servin..> 08-May-2024 01:10             1160649