
Optimal Wireless Information and Power Transfer Using Deep Q-Network

Published online by Cambridge University Press:  01 January 2024

Yuan Xing*
Affiliation:
Department of Engineering & Technology, University of Wisconsin-Stout, Menomonie, WI, USA
Haowen Pan
Affiliation:
Shanghai Legit Network Technology Co, Shanghai, China
Bin Xu
Affiliation:
Shanghai Legit Network Technology Co, Shanghai, China
Cristiano Tapparello
Affiliation:
Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627, USA
Wei Shi
Affiliation:
Department of Engineering & Technology, University of Wisconsin-Stout, Menomonie, WI, USA
Xuejun Liu
Affiliation:
Department of Engineering & Technology, University of Wisconsin-Stout, Menomonie, WI, USA
Tianchi Zhao
Affiliation:
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721, USA
Timothy Lu
Affiliation:
Department of Engineering & Technology, University of Wisconsin-Stout, Menomonie, WI, USA
*Correspondence should be addressed to Yuan Xing; xingy@uwstout.edu

Abstract

In this paper, a multiantenna wireless transmitter communicates with an information receiver while radiating RF energy to surrounding energy harvesters. The channel between the transmitter and the receiver is known to the transmitter, but the channels between the transmitter and the energy harvesters are not. By designing its transmit covariance matrix, the transmitter aims to fully charge the energy buffers of all energy harvesters in the shortest possible time while maintaining a target information rate toward the receiver. At the beginning of each time slot, the transmitter selects the beam pattern to transmit with, and throughout the charging process it never estimates the energy-harvesting channel vectors. Because of the high complexity of the system, we propose a novel deep Q-network (DQN) algorithm to determine the optimal transmission strategy. Simulation results show that the DQN algorithm outperforms existing algorithms in terms of the time required to complete the wireless charging process.
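The strategy described in the abstract — pick one beam pattern per time slot, observe how the harvesters' buffers fill, and learn without ever estimating the harvesting channels — maps naturally onto a deep Q-network loop. The sketch below is a minimal illustration under assumed toy parameters (two harvester buffers, three candidate beams with made-up charge rates, a reward of −1 per slot to penalize charging time); it is not the paper's actual simulation setup.

```python
import random
import numpy as np

# Minimal deep-Q-network sketch: one hidden layer, experience replay, and a
# periodically refreshed target network. The toy "charging" environment,
# charge rates, rewards, and all hyperparameters are illustrative
# assumptions, not the paper's actual simulation setup.

N_STATE, N_ACTION = 2, 3                            # 2 harvester buffers, 3 beams
CHARGE = {0: (0.30, 0.05), 1: (0.05, 0.30), 2: (0.15, 0.15)}  # assumed rates

def step(buffers, action):
    """One time slot: the chosen beam adds energy to each harvester buffer."""
    new = tuple(min(1.0, b + r) for b, r in zip(buffers, CHARGE[action]))
    done = all(b >= 1.0 for b in new)
    return new, (0.0 if done else -1.0), done       # -1 per slot: minimize time

class QNet:
    def __init__(self, n_hidden=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (N_STATE, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, N_ACTION))
        self.b2 = np.zeros(N_ACTION)
        self.lr = lr

    def forward(self, s):
        h = np.maximum(0.0, s @ self.W1 + self.b1)  # ReLU hidden layer
        return h, h @ self.W2 + self.b2             # Q-value per beam pattern

    def train_step(self, s, a, target):
        h, q = self.forward(s)
        err = q[a] - target                         # TD error on chosen action
        gh = self.W2[:, a] * err * (h > 0)          # backprop through ReLU
        self.W2[:, a] -= self.lr * err * h
        self.b2[a] -= self.lr * err
        self.W1 -= self.lr * np.outer(s, gh)
        self.b1 -= self.lr * gh

def copy_net(dst, src):
    dst.W1, dst.b1 = src.W1.copy(), src.b1.copy()
    dst.W2, dst.b2 = src.W2.copy(), src.b2.copy()

def train(episodes=60, gamma=0.95, eps=0.2, batch=16, replace_every=50):
    random.seed(1)
    online, target = QNet(), QNet()
    copy_net(target, online)
    replay, slots = [], 0
    for _ in range(episodes):
        buffers, done = (0.0, 0.0), False
        while not done:
            s = np.array(buffers)
            if random.random() < eps:               # epsilon-greedy exploration
                a = random.randrange(N_ACTION)
            else:
                a = int(np.argmax(online.forward(s)[1]))
            nxt, r, done = step(buffers, a)
            replay.append((s, a, r, np.array(nxt), done))
            replay = replay[-2000:]                 # bounded experience pool
            for s_, a_, r_, ns_, d_ in random.sample(replay, min(batch, len(replay))):
                tgt = r_ if d_ else r_ + gamma * float(np.max(target.forward(ns_)[1]))
                online.train_step(s_, a_, tgt)
            slots += 1
            if slots % replace_every == 0:
                copy_net(target, online)            # refresh target network
            buffers = nxt
    return online
```

After training, a greedy rollout selects the argmax beam each slot until every buffer is full; with the assumed rates above, every beam charges both buffers, so any rollout completes within 20 slots and no rollout can finish in fewer than 6.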

Information

Type
Research Article
Creative Commons
CC BY
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © 2021 Yuan Xing et al.
FIGURE 1: Wireless information transmitter and receiver surrounded by multiple RF energy harvesters.

ALGORITHM 1: Deep Q-network algorithm training process.

TABLE 1: DQN simulation parameters.

FIGURE 2: Dueling double deep Q-network structure.

FIGURE 3: Deep Q-network performance for different learning rates and numbers of hidden layers of the neural network.

FIGURE 4: Deep Q-network performance for different values of the neural network replacement iteration interval and experience pool size.

FIGURE 5: Deep Q-network performance for different reward functions.

FIGURE 6: Deep Q-network performance for different energy buffer sizes B_max.

FIGURE 7: Comparison of average time consumption between DQN and other algorithms (myopic solution, multiarmed bandit, even power allocation, and random action selection) in the Rician fading channel model. The number of transmit antennas is M = 3; the number of energy harvesters is N = 2.

FIGURE 8: The action selection process in the two-harvester scenario when σ_amp = 0.05.

FIGURE 9: Comparison of average time consumption between DQN and other algorithms (myopic solution, multiarmed bandit, even power allocation, and random action selection) when the number of energy harvesters is N = 2, 3, 4, 5, 6. The number of transmit antennas is M = 3; the standard deviation of the channel amplitude is σ_amp = 0.05.

FIGURE 10: Comparison of average time consumption between DQN and other algorithms (myopic solution, multiarmed bandit, even power allocation, and random action selection) when the number of energy harvesters is N = 2, 3, 4, 5, 6. The number of transmit antennas is M = 4; the standard deviation of the channel amplitude is σ_amp = 0.05.