Index of /强化学习/ (Reinforcement Learning)
../
3 - Chapter 2 State Values and Bellman Equation..> 09-Mar-2025 07:15 831481
4 - Appendix.pdf 09-Mar-2025 09:18 282909
Fine-Tuning Language Models from Human Preferen..> 09-Jun-2025 14:20 967470
GAE.pdf 22-May-2025 14:17 1798117
L2-Bellman equation.pdf 09-Mar-2025 05:21 643991
L3-Bellman optimality equation.pdf 09-Mar-2025 05:22 571205
L4-Value iteration and policy iteration.pdf 09-Mar-2025 05:22 587091
L5-Monte Carlo methods.pdf 23-Mar-2025 05:15 817906
L6-Stochastic approximation.pdf 23-Mar-2025 07:35 791323
L7-Temporal-Difference Learning.pdf 23-Mar-2025 09:27 760652
L8-Value function methods.pdf 23-Mar-2025 09:30 1547043
L9-Policy gradient methods.pdf 23-Mar-2025 09:31 485638
PPO.pdf 05-Jun-2025 14:39 2923532
TRPO.pdf 26-May-2025 14:25 1026237
Training language models to follow instructions..> 09-Jun-2025 14:26 1797405
Training language models to follow instructions..> 09-Jul-2025 14:42 36754
hw1.pdf 21-Oct-2023 03:36 239449
hw2.pdf 21-Oct-2023 03:36 294643
hw3.pdf 21-Oct-2023 03:36 304412
hw4.pdf 02-Nov-2023 00:27 268352
hw5.pdf 20-Nov-2023 05:59 467987
lec-10.pdf 30-Oct-2023 04:49 2384356
lec-11.pdf 30-Oct-2023 04:49 2744545
lec-12.pdf 30-Oct-2023 04:49 2043657
lec-13.pdf 30-Oct-2023 04:49 2607247
lec-14.pdf 30-Oct-2023 04:49 2176516
lec-15.pdf 30-Oct-2023 04:49 2904122
lec-16.pdf 30-Oct-2023 04:50 2310829
lec-17.pdf 30-Oct-2023 04:50 1563173
lec-18.pdf 30-Oct-2023 04:50 2596043
lec-19.pdf 30-Oct-2023 04:50 3018222
lec-20.pdf 30-Oct-2023 04:50 2669245
lec-21.pdf 11-Nov-2023 16:22 1849136
lec-22.pdf 12-Nov-2023 15:25 3559460
lec-4.pdf 30-Oct-2023 04:50 2297941
lec-6.pdf 30-Oct-2023 04:50 2279701
lec-7.pdf 30-Oct-2023 04:50 1816444
lec-8.pdf 30-Oct-2023 04:50 2344617
lec-9.pdf 30-Oct-2023 04:50 1486065
强化学习综述.pdf 20-May-2025 14:34 8029190
强化学习综述.pdf.xoj 22-May-2025 14:17 10393