
PGST: A Persian gender style transfer method

Published online by Cambridge University Press: 15 August 2023

Reza Khanmohammadi
Affiliation:
Computer Engineering Department, Faculty of Engineering, University of Guilan, Rasht, Iran
Seyed Abolghasem Mirroshandel*
Affiliation:
Computer Engineering Department, Faculty of Engineering, University of Guilan, Rasht, Iran
*Corresponding author: Seyed Abolghasem Mirroshandel; Email: mirroshandel@guilan.ac.ir

Abstract

Recent developments in text style transfer have drawn more attention to the field than ever. Transferring the style of an input text raises several challenges, such as fluency and content preservation, that need to be addressed. In this research, we present PGST, a novel Persian text style transfer approach in the gender domain, composed of several constituent elements. Built on the significance of part-of-speech tags, our method is the first to successfully transfer the gendered linguistic style of Persian text. We use a pre-trained word embedding for token replacement, a character-based token classifier for gender exchange, and a beam search algorithm for extracting the most fluent combination. Since our research introduces several different approaches, we determine a trade-off value for evaluating how well each model fools our gender identification model with transferred text. Our research focuses primarily on Persian, but since no Persian baseline is available, we applied our method to a highly studied gender-tagged English corpus and compared it to state-of-the-art English variants to demonstrate its applicability. Our final approach successfully defeated English and Persian gender identification models by 45.6% and 39.2%, respectively.
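
The beam search step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `beam_search_transfer`, the `candidates` mapping, the toy bigram scorer, and the choice to keep the original token as one of the options are all assumptions made for illustration. In the paper's setting, the candidates would come from the word embedding and the character-based token classifier, and the score would reflect fluency.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def beam_search_transfer(
    tokens: Sequence[str],
    candidates: Dict[int, List[str]],
    score: Callable[[Sequence[str]], float],
    beam_width: int = 3,
) -> List[str]:
    """Choose one option per position, keeping the best `beam_width` prefixes.

    `candidates` maps a token position to hypothetical replacement words.
    Keeping the original token as an option is an assumption of this sketch.
    """
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for i, tok in enumerate(tokens):
        options = [tok] + candidates.get(i, [])
        expanded: List[Tuple[float, List[str]]] = []
        for _, prefix in beams:
            for opt in options:
                seq = prefix + [opt]
                expanded.append((score(seq), seq))
        # Keep only the highest-scoring partial sequences.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    return beams[0][1]


# Toy fluency scorer (hypothetical): sums hand-written bigram weights.
BIGRAM_WEIGHTS = {
    ("he", "is"): 2.0,
    ("she", "is"): 1.0,
    ("is", "great"): 3.0,
    ("is", "lovely"): 1.5,
    ("is", "solid"): 0.5,
}


def toy_score(seq: Sequence[str]) -> float:
    return sum(BIGRAM_WEIGHTS.get(pair, 0.0) for pair in zip(seq, seq[1:]))


tokens = ["she", "is", "lovely"]
candidates = {0: ["he"], 2: ["great", "solid"]}
result = beam_search_transfer(tokens, candidates, toy_score)
print(result)
```

A beam keeps the search tractable: instead of scoring every combination of replacements, only the few best partial sentences survive each step, at the cost of possibly missing the global optimum.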

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. An illustration of the various stages of our proposed method. Text rendered in blue/red denotes a male/female stylistic adaptation, respectively.

Table 1. The list of symbols/notations used in this paper

Figure 2. Proposed baseline gender classifier’s neural network architecture.

Figure 3. Proposed character-based token classifier’s neural network architecture.

Figure 4. An example of the fastText word embedding space projected with PCA. Each blue/red point represents a word classified as male/female, respectively. (Note: Persian pronunciations are shown between slashes, and English translations are given in parentheses.)

Figure 5. An overview of how our model transfers an input’s style from $S_s$ to $S_t$. (Note: for an input with five tokens, we have $t_{1..5}$, and each $t_i$ has a set of replacements $r_{ij}$, where $j$ ranges over the words that the token classifier predicts as the opposite gender among the $\text{top}_n = 10$ most similar words suggested by the word embedding.)

Table 2. Dataset comparison

Table 3. Model comparison

Table 4. A comparison of the effects of the developed style transfer approaches in defeating the gender classifier model

Table 5. A comparison of positive and negative effects of applying different approaches

Table 6. A contingency table on our finalized style transfer approach [i.e., character-based + (Adj, Adv, V, N)] in Persian and English

Table 7. P-values for paired samples in our corpora (alpha = 0.01, degrees of freedom = n − 1)

Table 8. Results of Kappa inter-annotator agreement

Table 9. Quality assessment of annotated samples

Table 10. Comparison of automatic evaluation results of different models in English

Table 11. Test samples of transferring Persian text using the PGST method

Table 12. Translation of Table 11’s test samples