Audience selection for maximizing social influence

Published online by Cambridge University Press:  12 January 2024

Balázs R. Sziklai*
Affiliation:
Institute of Economics, HUN-REN Centre for Economic and Regional Studies, Budapest, Hungary; Department of Operations Research and Actuarial Sciences, Corvinus University of Budapest, Budapest, Hungary
Balázs Lengyel
Affiliation:
Institute of Economics, HUN-REN Centre for Economic and Regional Studies, Budapest, Hungary; Corvinus Institute for Advanced Studies, Corvinus University of Budapest, Budapest, Hungary; Institute of Data Analytics and Information Systems, Corvinus University of Budapest, Budapest, Hungary
*Corresponding author: Balázs R. Sziklai; Email: sziklai.balazs@krtk.hun-ren.hu

Abstract

Viral marketing campaigns primarily target individuals who are central in social networks and hence have social influence. Marketing events, however, may attract a diverse audience. Despite the importance of event marketing, the influence of heterogeneous target groups is not yet well understood. In this paper, we define the Audience Selection (AS) problem, in which different sets of agents need to be evaluated and compared based on their social influence. A typical application of Audience Selection is choosing locations for a series of marketing events. The Audience Selection problem differs from the well-known Influence Maximization (IM) problem in two respects. First, it deals with sets rather than nodes. Second, the sets are diverse, composed of a mixture of influential and ordinary agents. Thus, Audience Selection needs to assess the contribution of ordinary agents too, while IM only aims to find top spreaders. We provide a systemic test for ranking influence measures in the Audience Selection problem based on node sampling and on a novel statistical method, the Sum of Ranking Differences (SRD). Using a Linear Threshold diffusion model on two online social networks, we evaluate eight network measures of social influence. We demonstrate that the statistical assessment of these influence measures differs remarkably between the Audience Selection problem, where low-ranked individuals are present, and the IM problem, where we focus exclusively on the algorithm’s top choices.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Sketch of a social network with five geographical locations (red, orange, blue, green, and pink) each accommodating three agents.

Table 1. The PageRank values and Harmonic centrality of the social network depicted in Figure 1. Agents are referred to as $x_i$, where $x$ denotes the first letter of the town’s color (red, orange, blue, green, pink) and $i$ denotes the agent’s location in the town (top, middle, bottom). Bold numbers represent the maximum value among towns.

Table 2. SRD computation. nSRD stands for normalized SRD
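
The SRD computation named in Table 2 can be illustrated with a minimal Python sketch. It rests on assumptions that are not confirmed by the text shown here: ties receive fractional (average) ranks, larger values rank first, and nSRD divides SRD by floor(n^2 / 2), the largest possible value for two tie-free rankings of n items. The function names are hypothetical, and the tie tolerance mentioned in the appendix tables is not modeled.

```python
def ranks_with_ties(values):
    """Rank items from 1 (largest value); tied items share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next item has the same value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def srd(measure_values, reference_values):
    """Sum of absolute rank differences between a measure and the reference."""
    measure_ranks = ranks_with_ties(measure_values)
    reference_ranks = ranks_with_ties(reference_values)
    return sum(abs(m - r) for m, r in zip(measure_ranks, reference_ranks))


def normalized_srd(measure_values, reference_values):
    """SRD divided by its assumed maximum, floor(n^2 / 2)."""
    n = len(reference_values)
    return srd(measure_values, reference_values) / ((n * n) // 2)


# Hypothetical data: average spread of four sample sets (reference) and
# the average centrality of the same sets under one measure.
reference = [0.9, 0.7, 0.5, 0.3]
measure = [10, 5, 7, 1]
print(srd(measure, reference), normalized_srd(measure, reference))
```

A smaller value means the ranking induced by the measure is closer to the reference ranking; a measure that reverses the reference order reaches the maximum.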

Figure 2. iWiW CRRN test. The order of the centrality measures in the legend (from top to bottom) follows the same order as the colored bars in the figure (from left to right). The bars’ height is equal to their normalized SRD values. The black curve is a continuous approximation of the cumulative distribution function of the random SRD values. All (normalized) SRD values fall outside the 5% threshold (XX1: 5% threshold, Med: Median, XX19: 95% threshold).

Figure 3. Pokec CRRN test. The order of the centrality measures in the legend (from top to bottom) follows the same order as the colored bars in the figure (from left to right). The bars’ height is equal to their normalized SRD values. The black curve is a continuous approximation of the cumulative distribution function of the random SRD values. All (normalized) SRD values fall outside the 5% threshold (XX1: 5% threshold, Med: Median, XX19: 95% threshold).

Figure 4. Cross-validation on iWiW data. The boxplot shows the median (black diamond), Q1/Q3 (blue box), and min/max values. The measures are ranked from left to right by the median values. The ‘$\sim$’ sign between two neighboring measures indicates that the Wilcoxon test found no significant difference between the rankings induced by the measures.

Figure 5. Cross-validation on Pokec data. The boxplot shows the median (black diamond), Q1/Q3 (blue box), and min/max values. The measures are ranked from left to right by the median values. The ‘$\sim$’ sign between two neighboring measures indicates that the Wilcoxon test found no significant difference between the rankings induced by the measures.

Table 3. Comparing the rankings of centralities induced by the Audience Selection (AS) and Influence Maximization (IM) problem on iWiW. Ranks in brackets show the tied ranks according to the cross-validation

Table 4. Comparing the rankings of centralities induced by the Audience Selection (AS) and Influence Maximization (IM) problem on Pokec. Ranks in brackets show the tied ranks according to the cross-validation

Table A1. Average centrality values of the 24 randomly selected seed sets of iWiW. Avg. spread denotes the percentage of nodes the sample sets managed to infect on average over 5000 runs
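
The average spread reported in Table A1 comes from repeated runs of the Linear Threshold model named in the abstract. The sketch below is an illustration under stated assumptions rather than the authors' implementation: every in-neighbor carries weight 1 / in-degree, thresholds are drawn uniformly from [0, 1) in each run, and seed nodes count toward the spread. The function names and the toy graph are hypothetical.

```python
import random


def linear_threshold_spread(in_neighbors, seeds, thresholds):
    """Return the set of nodes that are active once the diffusion stops."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbors in in_neighbors.items():
            if node in active or not neighbors:
                continue
            # Assumed weighting: each in-neighbor contributes 1 / in-degree.
            weight = sum(1 for u in neighbors if u in active) / len(neighbors)
            if weight > 0 and weight >= thresholds[node]:
                active.add(node)
                changed = True
    return active


def average_spread(in_neighbors, seeds, runs=5000, rng=None):
    """Average percentage of active nodes over runs with random thresholds."""
    rng = rng or random.Random(0)
    nodes = list(in_neighbors)
    total = 0.0
    for _ in range(runs):
        thresholds = {node: rng.random() for node in nodes}
        final = linear_threshold_spread(in_neighbors, seeds, thresholds)
        total += len(final) / len(nodes)
    return 100.0 * total / runs


# Hypothetical toy network: each node maps to the list of its in-neighbors.
toy_graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["c", "e"], "e": []}
print(average_spread(toy_graph, {"a"}, runs=1000))
```

Because activation is permanent, the final active set does not depend on the order in which nodes are visited within a sweep, so the simple loop above is sufficient for a sketch.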

Table A2. iWiW ranking matrix. Data compiled from Table A1. Spreads closer than 0.005% to each other were considered as a tie. nSRD stands for normalized SRD

Table A3. Average centrality values of the 24 randomly selected seed sets of Pokec. Avg. spread denotes the percentage of nodes the sample sets managed to infect on average over 5000 runs

Table A4. Pokec ranking matrix. Data compiled from Table A3. Spreads closer than 0.005% to each other were considered as a tie. nSRD stands for normalized SRD

Figure A1. Parameter selection for Linear Threshold Centrality and Generalized Degree Discount. Smaller SRD values indicate better alignment with the reference, which was the spreading potential of the sample sets on iWiW (XX1: 5% threshold, Med: Median, XX19: 95% threshold).