Optimal Path Planning for Wireless Power Transfer Robot Using Area Division Deep Reinforcement Learning

Published online by Cambridge University Press:  01 January 2024

Yuan Xing*
Affiliation:
Department of Engineering and Technology, University of Wisconsin-Stout, Menomonie, WI 54751, USA
Riley Young
Affiliation:
Department of Engineering and Technology, University of Wisconsin-Stout, Menomonie, WI 54751, USA
Giaolong Nguyen
Affiliation:
Department of Engineering and Technology, University of Wisconsin-Stout, Menomonie, WI 54751, USA
Maxwell Lefebvre
Affiliation:
Department of Engineering and Technology, University of Wisconsin-Stout, Menomonie, WI 54751, USA
Tianchi Zhao
Affiliation:
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721, USA
Haowen Pan
Affiliation:
Changzhou Voyage Electronics Technology LLC, Changzhou, China
Liang Dong
Affiliation:
Department of Electrical and Computer Engineering, Baylor University, Waco, TX 76706, USA
Correspondence should be addressed to Yuan Xing; xingy@uwstout.edu

Abstract

This paper addresses the optimization problems in far-field wireless power transfer systems using deep reinforcement learning techniques. The radio-frequency (RF) wireless transmitter is mounted on a mobile robot, which patrols near the harvested energy-enabled Internet of Things (IoT) devices. The wireless transmitter is intended to cruise continuously on a designated path in order to fairly charge all the stationary IoT devices in the shortest time. The Deep Q-Network (DQN) algorithm is applied to determine the optimal path for the robot to cruise on. When the number of IoT devices increases, the traditional DQN can neither converge to a closed-loop path nor achieve the maximum reward. To solve these problems, an area division Deep Q-Network (AD-DQN) is proposed. The algorithm intelligently divides the complete charging field into several areas. In each area, the DQN algorithm is utilized to calculate the optimal path. The segmented paths are then combined to create a closed-loop path for the robot to cruise on, which enables the robot to continuously charge all the IoT devices in the shortest time. The numerical results demonstrate the superiority of the AD-DQN in optimizing the proposed wireless power transfer system.
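The stitching step of AD-DQN described above — plan a path within each area, then concatenate the segments into one closed loop — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-area planner here is a greedy nearest-neighbor tour standing in for the per-area DQN, and the area partition and device coordinates are made-up examples.

```python
def plan_area_path(start, targets):
    """Greedy nearest-neighbor tour of one area's charging cells.

    Stand-in for the per-area DQN policy: from the current cell,
    repeatedly visit the closest unvisited device cell (Manhattan
    distance on the unit-square grid).
    """
    path, pos, remaining = [start], start, list(targets)
    while remaining:
        nxt = min(remaining,
                  key=lambda t: abs(t[0] - pos[0]) + abs(t[1] - pos[1]))
        path.append(nxt)
        remaining.remove(nxt)
        pos = nxt
    return path

def ad_plan(areas, start):
    """Combine per-area paths into one closed loop (AD-DQN's stitching step).

    `areas` is a list of device-cell lists, one list per sub-area,
    ordered so each area's path begins where the previous one ends.
    """
    full, pos = [start], start
    for targets in areas:
        segment = plan_area_path(pos, targets)
        full.extend(segment[1:])   # drop the duplicated junction cell
        pos = full[-1]
    full.append(start)             # close the loop back to the start cell
    return full

# Hypothetical two-area field with two device cells per area.
loop = ad_plan([[(1, 0), (2, 1)], [(0, 3), (1, 4)]], start=(0, 0))
```

The loop begins and ends at the same cell and passes through every device cell, which is the closed-loop property the traditional DQN fails to guarantee as the number of devices grows.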

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © 2022 Yuan Xing et al.
TABLE 1: Symbols and explanations.

FIGURE 1: Mobile wireless power transmitter cruises on the calculated path to charge multiple harvested energy-enabled IoT devices.

FIGURE 2: The entire test field consists of identical unit-square cells. K = 8 harvested energy-enabled IoT devices are deployed in the test field. The shaded area adjacent to each IoT device indicates the effective charging area for that device. For example, the boundary of the effective charging area for IoT device No. 6 is highlighted in red.

FIGURE 3: Flowchart of wireless power transfer implementation.

FIGURE 4: The average rewards of reward1, reward2, and reward3 versus the training episodes in area I of the experimental field.

FIGURE 5: The average time consumption achieved by reward1, reward2, and reward3 versus the training episodes in area I of the experimental field.

FIGURE 6: The average rewards of reward1, reward2, and reward3 versus the training episodes in area II of the experimental field.

FIGURE 7: The average time consumption achieved by reward1, reward2, and reward3 versus the training episodes in area II of the experimental field.

FIGURE 8: The effective charging rate of random action selection, Q-learning, DQN, and AD-DQN versus the total number of IoT devices.

FIGURE 9: The average time consumption of random action selection, Q-learning, DQN, and AD-DQN versus the total number of IoT devices.

FIGURE 10: The optimal path determined by AD-DQN. The bold black line indicates the path for the wireless power transfer robot.