Index of /大语言模型/推理工程/ (Large Language Models / Inference Engineering)
Block Transformer:Global-to-Local Language Mode..> 07-Jun-2024 23:58 1624410
Cost-Efficient Large Language Model Serving for..> 02-Jul-2024 01:12 997684
MInference 1.0:Accelerating Pre-filling for Lon..> 03-Jul-2024 01:40 4866573
MagicDec:Breaking the Latency-Throughput Tradeo..> 26-Aug-2024 01:09 986850
SGLang:Efficient Execution of Structured Langua..> 07-Jun-2024 01:02 1383463
SplitWise:Efficient generative LLM inference us..> 21-May-2024 01:44 2154278
Taming Throughput-Latency Tradeoff in LLM Infer..> 19-Jun-2024 01:02 1945812
Theory,Analysis,and Best Practices for Sigmoid ..> 09-Sep-2024 01:10 6339673
star Attention.pdf 27-Nov-2024 01:28 1036928
vAttention:Dynamic Memory Management for Servin..> 08-May-2024 01:10 1160649