Reinforcement Learning: An Introduction, 2nd Edition by Richard Sutton and Andrew Barto – Ebook PDF Instant Download/Delivery: 0262039249, 978-0262039246
Full download of Reinforcement Learning: An Introduction, 2nd Edition is available immediately after payment.

Product details:
ISBN 10: 0262039249
ISBN 13: 978-0262039246
Authors: Richard Sutton, Andrew Barto
The significantly expanded and updated new edition of a widely used text on reinforcement learning, one of the most active research areas in artificial intelligence.
Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field’s key ideas and algorithms. This second edition has been significantly expanded and updated, presenting new topics and updating coverage of other topics.
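The agent–environment loop described above can be made concrete with a short sketch. The following Python example is purely illustrative and is not taken from the book: the two-state environment, the random placeholder policy, and names such as SimpleEnv are hypothetical, chosen only to show an agent accumulating reward while interacting with an uncertain environment.

```python
import random

# Minimal sketch of the agent-environment interaction loop.
# SimpleEnv is a made-up two-state chain, not an environment from the book.

class SimpleEnv:
    """Two-state chain: action 1 taken in state 1 yields reward +1, else 0."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        self.state = random.choice([0, 1])  # uncertain state transition
        return self.state, reward

env = SimpleEnv()
state, total_reward = 0, 0.0
for t in range(1000):
    action = random.choice([0, 1])   # placeholder policy (no learning yet)
    state, reward = env.step(action)
    total_reward += reward           # the quantity the agent tries to maximize
print("total reward over 1000 steps:", total_reward)
```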
Like the first edition, this second edition focuses on core online learning algorithms, with the more mathematical material set off in shaded boxes. Part I covers as much of reinforcement learning as possible without going beyond the tabular case for which exact solutions can be found. Many algorithms presented in this part are new to the second edition, including UCB, Expected Sarsa, and Double Learning. Part II extends these ideas to function approximation, with new sections on such topics as artificial neural networks and the Fourier basis, and offers expanded treatment of off-policy learning and policy-gradient methods. Part III has new chapters on reinforcement learning’s relationships to psychology and neuroscience, as well as an updated case-studies chapter including AlphaGo and AlphaGo Zero, Atari game playing, and IBM Watson’s wagering strategy. The final chapter discusses the future societal impacts of reinforcement learning.
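As a taste of the tabular methods covered in Part I, here is a minimal sketch of an incremental, epsilon-greedy action-value method on a k-armed bandit (Chapter 2). It is an illustrative stand-in, not code from the book; UCB, Expected Sarsa, and Double Learning build on the same kind of tabular bookkeeping. The reward distributions below are invented for the example.

```python
import random

# Tabular, incremental action-value estimation on a k-armed bandit,
# using plain epsilon-greedy action selection and sample-average updates.
# The hidden arm means are invented for illustration.

k = 10
true_values = [random.gauss(0, 1) for _ in range(k)]  # hidden mean reward per arm
Q = [0.0] * k        # estimated action values
N = [0] * k          # visit counts
epsilon = 0.1

for step in range(2000):
    if random.random() < epsilon:
        a = random.randrange(k)                # explore
    else:
        a = max(range(k), key=lambda i: Q[i])  # exploit the greedy action
    reward = random.gauss(true_values[a], 1)   # sample reward from arm a
    N[a] += 1
    Q[a] += (reward - Q[a]) / N[a]             # incremental sample-average update

print("best true arm :", max(range(k), key=lambda i: true_values[i]))
print("best estimate :", max(range(k), key=lambda i: Q[i]))
```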
Reinforcement Learning: An Introduction, 2nd Edition – Table of Contents:
1. Introduction
1.1. Reinforcement Learning
1.2. Examples
1.3. Elements of Reinforcement Learning
1.4. Limitations and Scope
1.5. An Extended Example: Tic-Tac-Toe
1.6. Summary
1.7. Early History of Reinforcement Learning
I: Tabular Solution Methods
2. Multi-armed Bandits
2.1. A k-armed Bandit Problem
2.2. Action-value Methods
2.3. The 10-armed Testbed
2.4. Incremental Implementation
2.5. Tracking a Nonstationary Problem
2.6. Optimistic Initial Values
2.7. Upper-Confidence-Bound Action Selection
2.8. Gradient Bandit Algorithms
2.9. Associative Search (Contextual Bandits)
2.10. Summary
3. Finite Markov Decision Processes
3.1. The Agent–Environment Interface
3.2. Goals and Rewards
3.3. Returns and Episodes
3.4. Unified Notation for Episodic and Continuing Tasks
3.5. Policies and Value Functions
3.6. Optimal Policies and Optimal Value Functions
3.7. Optimality and Approximation
3.8. Summary
4. Dynamic Programming
4.1. Policy Evaluation (Prediction)
4.2. Policy Improvement
4.3. Policy Iteration
4.4. Value Iteration
4.5. Asynchronous Dynamic Programming
4.6. Generalized Policy Iteration
4.7. Efficiency of Dynamic Programming
4.8. Summary
5. Monte Carlo Methods
5.1. Monte Carlo Prediction
5.2. Monte Carlo Estimation of Action Values
5.3. Monte Carlo Control
5.4. Monte Carlo Control without Exploring Starts
5.5. Off-policy Prediction via Importance Sampling
5.6. Incremental Implementation
5.7. Off-policy Monte Carlo Control
5.8. *Discounting-aware Importance Sampling
5.9. *Per-decision Importance Sampling
5.10. Summary
6. Temporal-Difference Learning
6.1. TD Prediction
6.2. Advantages of TD Prediction Methods
6.3. Optimality of TD(0)
6.4. Sarsa: On-policy TD Control
6.5. Q-learning: Off-policy TD Control
6.6. Expected Sarsa
6.7. Maximization Bias and Double Learning
6.8. Games, Afterstates, and Other Special Cases
6.9. Summary
7. n-step Bootstrapping
7.1. n-step TD Prediction
7.2. n-step Sarsa
7.3. n-step Off-policy Learning
7.4. *Per-decision Methods with Control Variates
7.5. Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm
7.6. *A Unifying Algorithm: n-step Q(σ)
7.7. Summary
8. Planning and Learning with Tabular Methods
8.1. Models and Planning
8.2. Dyna: Integrated Planning, Acting, and Learning
8.3. When the Model Is Wrong
8.4. Prioritized Sweeping
8.5. Expected vs. Sample Updates
8.6. Trajectory Sampling
8.7. Real-time Dynamic Programming
8.8. Planning at Decision Time
8.9. Heuristic Search
8.10. Rollout Algorithms
8.11. Monte Carlo Tree Search
8.12. Summary of the Chapter
8.13. Summary of Part I: Dimensions
II: Approximate Solution Methods
9. On-policy Prediction with Approximation
9.1. Value-function Approximation
9.2. The Prediction Objective (VE)
9.3. Stochastic-gradient and Semi-gradient Methods
9.4. Linear Methods
9.5. Feature Construction for Linear Methods
9.5.1. Polynomials
9.5.2. Fourier Basis
9.5.3. Coarse Coding
9.5.4. Tile Coding
9.5.5. Radial Basis Functions
9.6. Selecting Step-Size Parameters Manually
9.7. Nonlinear Function Approximation: Artificial Neural Networks
9.8. Least-Squares TD
9.9. Memory-based Function Approximation
9.10. Kernel-based Function Approximation
9.11. Looking Deeper at On-policy Learning: Interest and Emphasis
9.12. Summary
10. On-policy Control with Approximation
10.1. Episodic Semi-gradient Control
10.2. Semi-gradient n-step Sarsa
10.3. Average Reward: A New Problem Setting for Continuing Tasks
10.4. Deprecating the Discounted Setting
10.5. Differential Semi-gradient n-step Sarsa
10.6. Summary
11. *Off-policy Methods with Approximation
11.1. Semi-gradient Methods
11.2. Examples of Off-policy Divergence
11.3. The Deadly Triad
11.4. Linear Value-function Geometry
11.5. Gradient Descent in the Bellman Error
11.6. The Bellman Error is Not Learnable
11.7. Gradient-TD Methods
11.8. Emphatic-TD Methods
11.9. Reducing Variance
11.10. Summary
12. Eligibility Traces
12.1. The λ-return
12.2. TD(λ)
12.3. n-step Truncated λ-return Methods
12.4. Redoing Updates: Online λ-return Algorithm
12.5. True Online TD(λ)
12.6. *Dutch Traces in Monte Carlo Learning
12.7. Sarsa(λ)
12.8. Variable λ and γ
12.9. Off-policy Traces with Control Variates
12.10. Watkins’s Q(λ) to Tree-Backup(λ)
12.11. Stable Off-policy Methods with Traces
12.12. Implementation Issues
12.13. Conclusions
13. Policy Gradient Methods
13.1. Policy Approximation and its Advantages
13.2. The Policy Gradient Theorem
13.3. REINFORCE: Monte Carlo Policy Gradient
13.4. REINFORCE with Baseline
13.5. Actor–Critic Methods
13.6. Policy Gradient for Continuing Problems
13.7. Policy Parameterization for Continuous Actions
13.8. Summary
III: Looking Deeper
14. Psychology
14.1. Prediction and Control
14.2. Classical Conditioning
14.2.1. Blocking and Higher-order Conditioning
14.2.2. The Rescorla–Wagner Model
14.2.3. The TD Model
14.2.4. TD Model Simulations
14.3. Instrumental Conditioning
14.4. Delayed Reinforcement
14.5. Cognitive Maps
14.6. Habitual and Goal-directed Behavior
14.7. Summary
15. Neuroscience
15.1. Neuroscience Basics
15.2. Reward Signals, Reinforcement Signals, Values, and Prediction Errors
15.3. The Reward Prediction Error Hypothesis
15.4. Dopamine
15.5. Experimental Support for the Reward Prediction Error Hypothesis
15.6. TD Error/Dopamine Correspondence
15.7. Neural Actor–Critic
15.8. Actor and Critic Learning Rules
15.9. Hedonistic Neurons
15.10. Collective Reinforcement Learning
15.11. Model-based Methods in the Brain
15.12. Addiction
15.13. Summary
16. Applications and Case Studies
16.1. TD-Gammon
16.2. Samuel’s Checkers Player
16.3. Watson’s Daily-Double Wagering
16.4. Optimizing Memory Control
16.5. Human-level Video Game Play
16.6. Mastering the Game of Go
16.6.1. AlphaGo
16.6.2. AlphaGo Zero
16.7. Personalized Web Services
16.8. Thermal Soaring
17. Frontiers
17.1. General Value Functions and Auxiliary Tasks
17.2. Temporal Abstraction via Options
17.3. Observations and State
17.4. Designing Reward Signals
17.5. Remaining Issues
17.6. Reinforcement Learning and the Future of Artificial Intelligence
References
Index
Tags: Richard Sutton, Andrew Barto, Reinforcement Learning, An Introduction


