|
Foreword |
6 |
|
|
Series Editors’ Foreword |
8 |
|
|
References |
10 |
|
|
Preface |
11 |
|
|
Acknowledgements |
16 |
|
|
Contents |
17 |
|
|
Abbreviations |
24 |
|
|
Symbols |
25 |
|
|
1 Overview of Adaptive Dynamic Programming |
27 |
|
|
1.1 Introduction |
27 |
|
|
1.2 Reinforcement Learning |
29 |
|
|
1.3 Adaptive Dynamic Programming |
33 |
|
|
1.3.1 Basic Forms of Adaptive Dynamic Programming |
36 |
|
|
1.3.2 Iterative Adaptive Dynamic Programming |
41 |
|
|
1.3.3 ADP for Continuous-Time Systems |
44 |
|
|
1.3.4 Remarks |
47 |
|
|
1.4 Related Books |
48 |
|
|
1.5 About This Book |
52 |
|
|
References |
53 |
|
|
Part I Discrete-Time Systems |
60 |
|
|
2 Value Iteration ADP for Discrete-Time Nonlinear Systems |
61 |
|
|
2.1 Introduction |
61 |
|
|
2.2 Optimal Control of Nonlinear Systems Using General Value Iteration |
62 |
|
|
2.2.1 Convergence Analysis |
64 |
|
|
2.2.2 Neural Network Implementation |
72 |
|
|
2.2.3 Generalization to Optimal Tracking Control |
76 |
|
|
2.2.4 Optimal Control of Systems with Constrained Inputs |
80 |
|
|
2.2.5 Simulation Studies |
83 |
|
|
2.3 Iterative θ-Adaptive Dynamic Programming Algorithm for Nonlinear Systems |
91 |
|
|
2.3.1 Convergence Analysis |
93 |
|
|
2.3.2 Optimality Analysis |
101 |
|
|
2.3.3 Summary of Iterative θ-ADP Algorithm |
104 |
|
|
2.3.4 Simulation Studies |
107 |
|
|
2.4 Conclusions |
111 |
|
|
References |
111 |
|
|
3 Finite Approximation Error-Based Value Iteration ADP |
115 |
|
|
3.1 Introduction |
115 |
|
|
3.2 Iterative θ-ADP Algorithm with Finite Approximation Errors |
116 |
|
|
3.2.1 Properties of the Iterative ADP Algorithm with Finite Approximation Errors |
117 |
|
|
3.2.2 Neural Network Implementation |
124 |
|
|
3.2.3 Simulation Study |
128 |
|
|
3.3 Numerical Iterative θ-Adaptive Dynamic Programming |
131 |
|
|
3.3.1 Derivation of the Numerical Iterative θ-ADP Algorithm |
131 |
|
|
3.3.2 Properties of the Numerical Iterative θ-ADP Algorithm |
135 |
|
|
3.3.3 Summary of the Numerical Iterative θ-ADP Algorithm |
144 |
|
|
3.3.4 Simulation Study |
145 |
|
|
3.4 General Value Iteration ADP Algorithm with Finite Approximation Errors |
149 |
|
|
3.4.1 Derivation and Properties of the GVI Algorithm with Finite Approximation Errors |
149 |
|
|
3.4.2 Designs of Convergence Criteria with Finite Approximation Errors |
157 |
|
|
3.4.3 Simulation Study |
164 |
|
|
3.5 Conclusions |
171 |
|
|
References |
171 |
|
|
4 Policy Iteration for Optimal Control of Discrete-Time Nonlinear Systems |
174 |
|
|
4.1 Introduction |
174 |
|
|
4.2 Policy Iteration Algorithm |
175 |
|
|
4.2.1 Derivation of Policy Iteration Algorithm |
176 |
|
|
4.2.2 Properties of Policy Iteration Algorithm |
177 |
|
|
4.2.3 Initial Admissible Control Law |
183 |
|
|
4.2.4 Summary of Policy Iteration ADP Algorithm |
185 |
|
|
4.3 Numerical Simulation and Analysis |
185 |
|
|
4.4 Conclusions |
196 |
|
|
References |
197 |
|
|
5 Generalized Policy Iteration ADP for Discrete-Time Nonlinear Systems |
199 |
|
|
5.1 Introduction |
199 |
|
|
5.2 Generalized Policy Iteration-Based Adaptive Dynamic Programming Algorithm |
199 |
|
|
5.2.1 Derivation and Properties of the GPI Algorithm |
201 |
|
|
5.2.2 GPI Algorithm and Relaxation of Initial Conditions |
210 |
|
|
5.2.3 Simulation Studies |
214 |
|
|
5.3 Discrete-Time GPI with General Initial Value Functions |
221 |
|
|
5.3.1 Derivation and Properties of the GPI Algorithm |
221 |
|
|
5.3.2 Relaxations of the Convergence Criterion and Summary of the GPI Algorithm |
233 |
|
|
5.3.3 Simulation Studies |
237 |
|
|
5.4 Conclusions |
243 |
|
|
References |
243 |
|
|
6 Error Bounds of Adaptive Dynamic Programming Algorithms |
244 |
|
|
6.1 Introduction |
244 |
|
|
6.2 Error Bounds of ADP Algorithms for Undiscounted Optimal Control Problems |
245 |
|
|
6.2.1 Problem Formulation |
245 |
|
|
6.2.2 Approximate Value Iteration |
247 |
|
|
6.2.3 Approximate Policy Iteration |
252 |
|
|
6.2.4 Approximate Optimistic Policy Iteration |
258 |
|
|
6.2.5 Neural Network Implementation |
262 |
|
|
6.2.6 Simulation Study |
264 |
|
|
6.3 Error Bounds of Q-Function for Discounted Optimal Control Problems |
268 |
|
|
6.3.1 Problem Formulation |
268 |
|
|
6.3.2 Policy Iteration Under Ideal Conditions |
270 |
|
|
6.3.3 Error Bound for Approximate Policy Iteration |
275 |
|
|
6.3.4 Neural Network Implementation |
278 |
|
|
6.3.5 Simulation Study |
280 |
|
|
6.4 Conclusions |
283 |
|
|
References |
284 |
|
|
Part II Continuous-Time Systems |
286 |
|
|
7 Online Optimal Control of Continuous-Time Affine Nonlinear Systems |
287 |
|
|
7.1 Introduction |
287 |
|
|
7.2 Online Optimal Control of Partially Unknown Affine Nonlinear Systems |
287 |
|
|
7.2.1 Identifier–Critic Architecture for Solving HJB Equation |
289 |
|
|
7.2.2 Stability Analysis of Closed-Loop System |
301 |
|
|
7.2.3 Simulation Study |
306 |
|
|
7.3 Online Optimal Control of Affine Nonlinear Systems with Constrained Inputs |
311 |
|
|
7.3.1 Solving HJB Equation via Critic Architecture |
314 |
|
|
7.3.2 Stability Analysis of Closed-Loop System with Constrained Inputs |
318 |
|
|
7.3.3 Simulation Study |
322 |
|
|
7.4 Conclusions |
325 |
|
|
References |
326 |
|
|
8 Optimal Control of Unknown Continuous-Time Nonaffine Nonlinear Systems |
328 |
|
|
8.1 Introduction |
328 |
|
|
8.2 Optimal Control of Unknown Nonaffine Nonlinear Systems with Constrained Inputs |
329 |
|
|
8.2.1 Identifier Design via Dynamic Neural Networks |
330 |
|
|
8.2.2 Actor–Critic Architecture for Solving HJB Equation |
335 |
|
|
8.2.3 Stability Analysis of Closed-Loop System |
337 |
|
|
8.2.4 Simulation Study |
342 |
|
|
8.3 Optimal Output Regulation of Unknown Nonaffine Nonlinear Systems |
346 |
|
|
8.3.1 Neural Network Observer |
347 |
|
|
8.3.2 Observer-Based Optimal Control Scheme Using Critic Network |
352 |
|
|
8.3.3 Stability Analysis of Closed-Loop System |
356 |
|
|
8.3.4 Simulation Study |
359 |
|
|
8.4 Conclusions |
362 |
|
|
References |
362 |
|
|
9 Robust and Optimal Guaranteed Cost Control of Continuous-Time Nonlinear Systems |
364 |
|
|
9.1 Introduction |
364 |
|
|
9.2 Robust Control of Uncertain Nonlinear Systems |
365 |
|
|
9.2.1 Equivalence Analysis and Problem Transformation |
367 |
|
|
9.2.2 Online Algorithm and Neural Network Implementation |
369 |
|
|
9.2.3 Stability Analysis of Closed-Loop System |
372 |
|
|
9.2.4 Simulation Study |
375 |
|
|
9.3 Optimal Guaranteed Cost Control of Uncertain Nonlinear Systems |
379 |
|
|
9.3.1 Optimal Guaranteed Cost Controller Design |
381 |
|
|
9.3.2 Online Solution of Transformed Optimal Control Problem |
387 |
|
|
9.3.3 Stability Analysis of Closed-Loop System |
392 |
|
|
9.3.4 Simulation Studies |
397 |
|
|
9.4 Conclusions |
402 |
|
|
References |
403 |
|
|
10 Decentralized Control of Continuous-Time Interconnected Nonlinear Systems |
406 |
|
|
10.1 Introduction |
406 |
|
|
10.2 Decentralized Control of Interconnected Nonlinear Systems |
407 |
|
|
10.2.1 Decentralized Stabilization via Optimal Control Approach |
408 |
|
|
10.2.2 Optimal Controller Design of Isolated Subsystems |
413 |
|
|
10.2.3 Generalization to Model-Free Decentralized Control |
419 |
|
|
10.2.4 Simulation Studies |
423 |
|
|
10.3 Conclusions |
433 |
|
|
References |
433 |
|
|
11 Learning Algorithms for Differential Games of Continuous-Time Systems |
435 |
|
|
11.1 Introduction |
435 |
|
|
11.2 Integral Policy Iteration for Two-Player Zero-Sum Games |
436 |
|
|
11.2.1 Derivation of Integral Policy Iteration |
438 |
|
|
11.2.2 Convergence Analysis |
441 |
|
|
11.2.3 Neural Network Implementation |
443 |
|
|
11.2.4 Simulation Studies |
446 |
|
|
11.3 Iterative Adaptive Dynamic Programming for Multi-player Zero-Sum Games |
449 |
|
|
11.3.1 Derivation of the Iterative ADP Algorithm |
451 |
|
|
11.3.2 Properties |
456 |
|
|
11.3.3 Neural Network Implementation |
462 |
|
|
11.3.4 Simulation Studies |
469 |
|
|
11.4 Synchronous Approximate Optimal Learning for Multi-player Nonzero-Sum Games |
477 |
|
|
11.4.1 Derivation and Convergence Analysis |
478 |
|
|
11.4.2 Neural Network Implementation |
482 |
|
|
11.4.3 Simulation Study |
491 |
|
|
11.5 Conclusions |
496 |
|
|
References |
496 |
|
|
Part III Applications |
499 |
|
|
12 Adaptive Dynamic Programming for Optimal Residential Energy Management |
500 |
|
|
12.1 Introduction |
500 |
|
|
12.2 A Self-learning Scheme for Residential Energy System Control and Management |
501 |
|
|
12.2.1 The ADHDP Method |
505 |
|
|
12.2.2 A Self-learning Scheme for Residential Energy System |
506 |
|
|
12.2.3 Simulation Study |
509 |
|
|
12.3 A Novel Dual Iterative Q-Learning Method for Optimal Battery Management |
513 |
|
|
12.3.1 Problem Formulation |
513 |
|
|
12.3.2 Dual Iterative Q-Learning Algorithm |
514 |
|
|
12.3.3 Neural Network Implementation |
520 |
|
|
12.3.4 Numerical Analysis |
523 |
|
|
12.4 Multi-battery Optimal Coordination Control for Residential Energy Systems |
530 |
|
|
12.4.1 Distributed Iterative ADP Algorithm |
532 |
|
|
12.4.2 Numerical Analysis |
544 |
|
|
12.5 Conclusions |
550 |
|
|
References |
550 |
|
|
13 Adaptive Dynamic Programming for Optimal Control of Coal Gasification Process |
553 |
|
|
13.1 Introduction |
553 |
|
|
13.2 Data-Based Modeling and Properties |
554 |
|
|
13.2.1 Description of Coal Gasification Process and Control Systems |
554 |
|
|
13.2.2 Data-Based Process Modeling and Properties |
556 |
|
|
13.3 Design and Implementation of Optimal Tracking Control |
562 |
|
|
13.3.1 Optimal Tracking Controller Design by Iterative ADP Algorithm Under System and Iteration Errors |
562 |
|
|
13.3.2 Neural Network Implementation |
570 |
|
|
13.4 Numerical Analysis |
573 |
|
|
13.5 Conclusions |
584 |
|
|
References |
585 |
|
|
14 Data-Based Neuro-Optimal Temperature Control of Water Gas Shift Reaction |
586 |
|
|
14.1 Introduction |
586 |
|
|
14.2 System Description and Data-Based Modeling |
587 |
|
|
14.2.1 Water Gas Shift Reaction |
587 |
|
|
14.2.2 Data-Based Modeling and Properties |
588 |
|
|
14.3 Design of Neuro-Optimal Temperature Controller |
590 |
|
|
14.3.1 System Transformation |
590 |
|
|
14.3.2 Derivation of Stable Iterative ADP Algorithm |
591 |
|
|
14.3.3 Properties of Stable Iterative ADP Algorithm with Approximation Errors and Disturbances |
593 |
|
|
14.4 Neural Network Implementation for the Optimal Tracking Control Scheme |
597 |
|
|
14.5 Numerical Analysis |
600 |
|
|
14.6 Conclusions |
604 |
|
|
References |
604 |
|
|
Index |
606 |
|