
is reinforcement learning (RL) [16, 20].

      The RL problem can be formalized as an agent that has to make decisions in an environment so as to maximize a given notion of cumulative reward. It will become apparent that this formalization applies to a wide range of tasks and captures many important features of artificial intelligence, such as a sense of cause and effect as well as a sense of uncertainty and non-determinism [5].

      A main feature of RL is that the agent learns good behavior. This means that it incrementally modifies or acquires new behaviors and skills. Another significant feature of RL is that it relies on trial-and-error experience (as opposed to, for example, dynamic programming, which assumes full knowledge of the environment a priori). The RL agent therefore does not need complete knowledge or control of the environment; it only needs to be able to interact with the environment and gather information. In an offline setting, the experience is acquired a priori and then used as a batch for learning (the offline setting is therefore also called batch RL) [3].
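      The trial-and-error idea described above can be illustrated with a minimal sketch. The two-armed bandit environment, its win probabilities, and the epsilon-greedy rule are assumptions chosen for illustration; they do not come from the chapter. The agent never sees the win probabilities and improves its estimates only by interacting with the environment.

```python
import random


class TwoArmedBandit:
    """Hypothetical toy environment: arm 1 pays off more often than arm 0."""

    def __init__(self, win_probs=(0.2, 0.8), seed=0):
        self.win_probs = win_probs
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward is 1 with the chosen arm's win probability, otherwise 0.
        return 1.0 if self.rng.random() < self.win_probs[action] else 0.0


def run_trial_and_error(env, n_steps=2000, epsilon=0.1, seed=1):
    """Learn action-value estimates purely from interaction with env."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # running estimate of each arm's mean reward
    counts = [0, 0]      # how often each arm has been tried
    for _ in range(n_steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: values[a])
        reward = env.step(action)
        counts[action] += 1
        # Incremental mean: the estimate shifts slightly after every step.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_trial_and_error(TwoArmedBandit())
```

      In the offline (batch) setting, the same update could instead be applied to a list of previously recorded (action, reward) pairs rather than to fresh interactions.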

Schematic illustration of the reinforcement learning process.

      Figure 1.1 Reinforcement learning process.

      Deep reinforcement learning combines aspects of neural networks and reinforcement learning (Figure 1.1). It is typically achieved using two different families of methods: deep Q-learning and policy gradients. Deep Q-learning techniques attempt to predict the rewards that will follow certain actions taken in a particular state, while policy gradient strategies seek to optimize over the action space, predicting the actions themselves. Policy-based approaches to deep reinforcement learning are either deterministic or stochastic in nature. Deterministic policies map states directly to actions, while stochastic policies produce probability distributions over actions [6].
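      The distinction between value-based and policy-based views can be sketched as follows. This is a tabular stand-in under stated assumptions: in deep Q-learning a neural network would replace the dictionary `q`, and the state names, action names, learning rate, and discount factor are hypothetical. The softmax function is one common way to obtain a stochastic policy from action preferences; a policy gradient method would adjust those preferences by gradient ascent, which is not shown here.

```python
import math
import random


def q_learning_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max Q(s', .)."""
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def deterministic_policy(q, state):
    """Map a state directly to a single action (here: the greedy action under Q)."""
    return max(q[state], key=q[state].get)


def stochastic_policy(preferences):
    """Turn action preferences into a probability distribution via softmax."""
    m = max(preferences.values())  # subtract the max for numerical stability
    exps = {a: math.exp(p - m) for a, p in preferences.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def sample_action(probs, rng):
    """Draw one action according to the policy's probabilities."""
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


# Hypothetical two-state problem with two actions per state.
q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 0.0, "right": 1.0}}
new_value = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1")
greedy_action = deterministic_policy(q, "s0")
probs = stochastic_policy({"left": 0.0, "right": 0.0})
sampled = sample_action(probs, random.Random(0))
```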

      The aim of this chapter is to provide the reader with an accessible treatment of basic deep reinforcement learning and to support research experts. The primary contributions of this work are:

      1 A complete review of the concepts and framework of deep reinforcement learning.

      2 A detailed discussion of the applications and challenges of deep reinforcement learning.

      The points above distinguish this chapter from other recent surveys, while its coverage remains as comprehensive as that of previous works. The chapter is organized as follows: Section 1.2 describes reinforcement learning. Section 1.3 explores its applications and open problems, followed by a conclusion in Section 1.4.

      1.2.1 Introduction

      A Markov decision process (MDP) (Figure 1.2) is composed of:

      State: In an MDP, the state can be represented as raw images, or, for robotic control, we can use sensors to measure the joint angles, velocity, and pose of the end effector.

       Action: An action may be a move in a chess game, or moving a robotic arm or a joystick.

       Reward: In a game of Go, the reward is very sparse: 1 if we win or −1 if we lose. In other tasks we receive rewards more often; in the Atari Seaquest game, for example, we score whenever we hit a shark (Figure 1.3).

       Discount factor: If it is less than one, the discount factor discounts future rewards. Just as money earned in the future has a smaller present value, future rewards count for less; a discount factor below one may also be needed for a purely technical reason, namely to help the solution converge.

       Horizon: We can roll out actions indefinitely or limit the experience to N time steps. This limit is called the horizon.
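      The roles of the discount factor and the horizon can be made concrete with a short sketch. It computes the discounted return, that is, the sum of rewards in which the reward at time step t is weighted by the discount factor raised to the power t. The function name, the default values, and the example reward sequence are assumptions for illustration only.

```python
def discounted_return(rewards, gamma=0.99, horizon=None):
    """Sum of rewards, with the reward at step t weighted by gamma ** t.

    If horizon is given, only the first `horizon` time steps are counted.
    """
    if horizon is not None:
        rewards = rewards[:horizon]
    total = 0.0
    # Work backwards so each step needs one multiplication: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Sparse reward, as in a board game: nothing until a win on the last step.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]

undiscounted = discounted_return(sparse_rewards, gamma=1.0)            # 1.0
discounted = discounted_return(sparse_rewards, gamma=0.5)              # 0.125
truncated = discounted_return(sparse_rewards, gamma=0.5, horizon=3)    # 0.0
```

      With a horizon of three steps the final win falls outside the window, so the return is zero; this shows how a short horizon can hide a sparse reward.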

Schematic illustration of the Markov process.

      Figure 1.2 Markov process.

Schematic illustration of the raw images of state.

      Figure 1.3 Raw images of State.

      1.2.2 Framework

      Compared to other fields such as Deep Learning, where well-established frameworks such as TensorFlow, PyTorch, or MXNet simplify the lives of DL practitioners, practical implementations of Reinforcement Learning are relatively young. RL frameworks have nevertheless started to appear, and we can already choose from many projects that greatly facilitate the use of advanced RL techniques. In recent years, frameworks such as TensorFlow or PyTorch have helped turn pattern recognition into a commodity, making deep learning easier for practitioners to try and use [17].

      A similar pattern is starting to play out in the Reinforcement Learning arena. Many open source libraries and tools are emerging to address this, both by helping to create new pieces (rather than writing them from scratch) and, above all, by combining different prebuilt algorithmic components. As a consequence, these Reinforcement Learning frameworks support engineers by providing high-level abstractions of the core components of an RL algorithm [7].

      Deep Reinforcement Learning algorithms involve a large number of simulations, which adds another multiplicative factor to the computational cost of Deep Learning itself. This cost is driven mainly by architectures not yet covered here, such as distributed actor-critic methods or multi-agent settings, among others. Even choosing the best model involves tuning hyperparameters and searching across different hyperparameter settings, which can be expensive. All of this calls for high computing power, typically provided by supercomputers based on distributed systems of heterogeneous servers (with multi-core CPUs and hardware accelerators such as GPUs or TPUs) [18].
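      The multiplicative nature of this cost can be seen in a small sketch of a grid search. The hyperparameter names, candidate values, number of seeds, and episodes per run are all hypothetical; the point is only that the total number of simulated episodes is the product of these factors.

```python
from itertools import product

# Hypothetical search space: every name and value here is an assumption.
search_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "discount_factor": [0.95, 0.99],
    "batch_size": [32, 64],
}


def grid_configurations(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for combo in product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


configs = list(grid_configurations(search_space))

seeds_per_config = 3      # repeated runs to average out randomness
episodes_per_run = 1000   # simulated episodes needed to train one agent

# Total simulation budget grows as the product of all factors.
total_episodes = len(configs) * seeds_per_config * episodes_per_run
```

      Here 3 × 2 × 2 = 12 configurations, each trained with 3 seeds for 1000 episodes, already require 36,000 simulated episodes; adding one more hyperparameter with three candidate values would triple that figure.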

      1.2.3 Choice of the Learning Algorithm and Function Approximator Selection

      In deep learning, the function approximator characterizes how the features are transformed into higher levels of abstraction (a fortiori
