Preference-based reinforcement learning (PbRL) greatly simplifies the design of reward functions in reinforcement learning (RL). However, because of task complexity, intransitive preferences, and sensitivity to labeling errors, PbRL requires substantial feedback to reach the desired performance, placing a heavy burden on human teachers. We propose a novel framework, self-teacher-learning preference-based reinforcement learning (STL-PbRL). In the teacher-led (TL) module, the agent learns a reliable reward prediction model (RM) from teacher preferences over trajectory segments. In the self-learning (SL) module, the agent applies the same preference-comparison approach to trajectory segments, but derives labels from sparse, critical, and easily designed task-oriented information, integrating this signal into the feedback process. STL-PbRL uses the SL module to refine the RM initially produced by the TL module. We show that this integration enables the RM to converge toward an optimal reward model that supports training a policy satisfying the task objectives. Our experimental results confirm that the SL module integrates readily with existing PbRL algorithms, substantially reducing the amount of feedback required and mitigating the impact of erroneous preference labels. Together, these results show that STL-PbRL simplifies and strengthens the RL training process across a range of applications.
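To make the two feedback sources concrete, the sketch below shows how a reward model is commonly fit to segment preferences with a Bradley-Terry cross-entropy loss (the standard PbRL formulation), and how an SL-style refinement step could derive synthetic preference labels from a sparse task score. This is a minimal illustration under assumed details: the names (RewardModel, preference_loss, self_labels), the margin parameter, and the task-score signal are hypothetical stand-ins and do not reproduce the paper's exact formulation.

```python
# Minimal sketch: Bradley-Terry preference learning for a reward model,
# plus a hypothetical self-labeling refinement step. Names and details
# are assumptions, not taken from the STL-PbRL paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Predicts a scalar reward r(s, a) for each step of a segment."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (B, T, obs_dim), act: (B, T, act_dim) -> (B,) summed return
        return self.net(torch.cat([obs, act], dim=-1)).sum(dim=(1, 2))


def preference_loss(rm, seg_a, seg_b, prefs):
    """Bradley-Terry cross-entropy; prefs[i] = 1 if segment A is preferred.
    P(A > B) = sigmoid(return(A) - return(B))."""
    logits = rm.segment_return(*seg_a) - rm.segment_return(*seg_b)
    return F.binary_cross_entropy_with_logits(logits, prefs)


def self_labels(score_a, score_b, margin: float = 0.1):
    """Hypothetical SL step: turn a sparse, easily designed task score
    into synthetic preferences, skipping pairs whose scores are too close."""
    diff = score_a - score_b
    mask = diff.abs() > margin          # keep only confident comparisons
    return (diff > 0).float(), mask


if __name__ == "__main__":
    B, T, obs_dim, act_dim = 32, 50, 8, 2
    rm = RewardModel(obs_dim, act_dim)
    opt = torch.optim.Adam(rm.parameters(), lr=3e-4)
    seg = lambda: (torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim))
    seg_a, seg_b = seg(), seg()

    # TL phase: fit the RM to (placeholder) teacher preference labels.
    teacher_prefs = torch.randint(0, 2, (B,)).float()
    for _ in range(100):
        opt.zero_grad()
        preference_loss(rm, seg_a, seg_b, teacher_prefs).backward()
        opt.step()

    # SL phase: refine the RM with self-generated labels derived from
    # (placeholder) sparse task scores for each segment.
    labels, mask = self_labels(torch.randn(B), torch.randn(B))
    for _ in range(100):
        opt.zero_grad()
        loss = preference_loss(
            rm,
            (seg_a[0][mask], seg_a[1][mask]),
            (seg_b[0][mask], seg_b[1][mask]),
            labels[mask],
        )
        loss.backward()
        opt.step()
```

Because both phases reuse the same segment-comparison loss, a self-labeled refinement step of this kind could in principle be attached to any existing PbRL pipeline, which mirrors the claim above that the SL module composes with existing algorithms.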