The generation of problem-focussed patent clusters: a comparative analysis of crowd intelligence with algorithmic and expert approaches

Published online by Cambridge University Press: 11 October 2017

Abstract

This paper presents a new crowdsourcing approach to the construction of patent clusters and systematically benchmarks it against previous expert and algorithmic approaches. Patent databases should be rich sources of inspiration that could lead engineering designers to novel solutions for creative problems. However, the sheer volume and complexity of patent information mean that this potential is rarely realised. Rather than the keyword-driven searches common in commercial systems, designers need tools that help them understand patents in the context of the problem they are considering. This paper addresses this need by using crowd intelligence to generate patent clusters effectively, at lower cost and with a clearer rationale. A systematic study compared the efficiency of the crowd with that of expert and algorithmic approaches to patent clustering; the results indicate that the crowd was able to create 80% more patent pairs with appropriate rationale.

Type: Research Article
Distributed as Open Access under a CC-BY 4.0 license (http://creativecommons.org/licenses/by/4.0/)
Copyright © The Author(s) 2017
Figure 1. Overview of project themes for patent analysis.

Table 1. Summary of text mining and grouping approaches to patent clustering, with highlighted references

Figure 2. Patent clusters labelled using the highest average rank method (Fu et al. 2013b).

Figure 3. The patent space defined by Expert 3 (Fu et al. 2013b).

Figure 4. A patent landscape based on CPC classification codes (generated with PatentInspiration, http://www.patentinspiration.com).

Figure 5. Articulation between established crowdsourcing platforms and the customised patent cluster platform.

Figure 6. The workflow for assignment, gathering and presentation of crowd results.

Table 2. Participation statistics for the two crowdsourcing platforms

Figure 7. Location of workers participating in patent tasks (generated with BatchGeo, https://batchgeo.com).

Figure 8. Number of clusters versus number of connected patent pairs for the crowd, experts and algorithms.

Table 3. Composition of clusters and connected patent pairs for different approaches
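
Table 3 and Figure 8 relate the number of clusters produced by each approach to the number of connected patent pairs. The captions do not define a connected pair, so the minimal sketch below assumes that two patents are connected when they appear together in at least one cluster: a cluster of n patents then contributes n(n-1)/2 pairs, and a pair repeated across clusters is counted once. The patent IDs and the function name are hypothetical.

```python
from itertools import combinations

def connected_pairs(clusters):
    """Return the set of unordered patent pairs that share at least one cluster.

    Assumption: clusters is a list of iterables of (hypothetical) patent IDs.
    """
    pairs = set()
    for cluster in clusters:
        # Sorting makes ("P2", "P1") and ("P1", "P2") the same key.
        for a, b in combinations(sorted(set(cluster)), 2):
            pairs.add((a, b))
    return pairs

clusters = [["P1", "P2", "P3"], ["P3", "P4"], ["P1", "P2"]]
pairs = connected_pairs(clusters)
print(len(clusters), len(pairs))  # 3 4
```

Because each pair is stored as an order-independent tuple, the pair sets of two approaches can be compared directly with set intersection.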

Figure 9. The number of crowd workers identifying the same patent pairs.

Table 4. Patent pairs for which the crowd generated the most labels, compared with algorithmic and expert assessments

Figure 10. Percentage of benchmark pairings identified by different numbers of crowd workers.
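
Figures 9 and 10 count how many crowd workers identified each patent pair and how much of a benchmark (expert or algorithmic) pairing set is recovered at each level of crowd support. The sketch below is an illustration under stated assumptions, not the authors' pipeline: each worker submits a list of clusters, a worker votes at most once for a pair, and coverage is the percentage of benchmark pairs linked by at least a given number of workers. All worker and patent IDs are hypothetical.

```python
from collections import Counter
from itertools import combinations

def worker_support(worker_clusters):
    """Count how many workers connected each unordered patent pair.

    worker_clusters maps a (hypothetical) worker ID to that worker's list of
    clusters; each worker contributes at most one vote per pair.
    """
    support = Counter()
    for clusters in worker_clusters.values():
        pairs = set()
        for cluster in clusters:
            pairs.update(combinations(sorted(set(cluster)), 2))
        support.update(pairs)
    return support

def benchmark_coverage(support, benchmark_pairs, min_workers):
    """Percentage of benchmark pairs connected by at least min_workers workers."""
    hits = sum(1 for pair in benchmark_pairs
               if support[tuple(sorted(pair))] >= min_workers)
    return 100.0 * hits / len(benchmark_pairs)

workers = {
    "w1": [["P1", "P2", "P3"]],
    "w2": [["P1", "P2"], ["P3", "P4"]],
    "w3": [["P2", "P1"]],
}
support = worker_support(workers)
benchmark = [("P1", "P2"), ("P4", "P3"), ("P2", "P4")]
for k in (1, 2, 3):
    print(k, round(benchmark_coverage(support, benchmark, k), 1))
```

Looking up a missing pair in a `Counter` returns 0, so benchmark pairs that no worker proposed simply count as uncovered.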

Table 5. Distinct cluster labels generated by the various approaches

Table 6. Comparison of cosine similarity scores for labels across different clustering approaches
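
Table 6 compares cosine similarity scores between cluster labels. The caption does not state how labels were turned into vectors, so this sketch assumes simple term-frequency vectors over lower-cased words (no stemming or stop-word removal); the example labels are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(label):
    # Lower-case and keep runs of letters as terms.
    return Counter(re.findall(r"[a-z]+", label.lower()))

def cosine_similarity(label_a, label_b):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    a, b = bag_of_words(label_a), bag_of_words(label_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(round(cosine_similarity("heat transfer device", "device for heat storage"), 3))  # 0.577
```

Two shared terms out of three and four distinct terms give 2 / (sqrt(3) * sqrt(4)); a richer representation (for example TF-IDF weights) would change the scores but not the formula.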

Table 7. Grammar percentages for labels generated by various approaches

Figure 11. Comparison between the crowd's ranking and Fu's algorithmic ranking of patent relevance to the given problem.
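
Figure 11 compares how the crowd and Fu's algorithm rank patents by relevance to the problem. The caption does not say whether the agreement was quantified; one standard option, assumed here rather than taken from the paper, is Spearman's rank correlation for untied ranks, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the rank difference for each patent. The rankings below are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two untied rankings of the same items.

    rank_a and rank_b map a (hypothetical) patent ID to its rank (1 = most relevant).
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("rankings must cover the same items")
    n = len(rank_a)
    d2 = sum((rank_a[i] - rank_b[i]) ** 2 for i in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

crowd = {"P1": 1, "P2": 2, "P3": 3, "P4": 4, "P5": 5}
algorithm = {"P1": 2, "P2": 1, "P3": 3, "P4": 5, "P5": 4}
print(spearman_rho(crowd, algorithm))
```

The closed-form expression only holds without ties; tied ranks would call for the Pearson correlation of average ranks instead.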

Table 8. Patent pair agreement between evaluator and crowd results

Figure 12. Box plot illustrating the relationship between the number of evaluators who agreed with a patent pair and the number of crowd workers who identified that pair.
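
Figure 12 groups patent pairs by how many evaluators agreed with them and shows the spread of crowd support in each group. A minimal sketch of preparing that data, assuming crowd support is stored as a pair-to-worker-count mapping and each evaluator's judgement as a set of accepted pairs (both data shapes and all IDs are hypothetical):

```python
from collections import defaultdict

def support_by_evaluator_agreement(crowd_support, evaluator_accepts):
    """Group crowd-worker counts by the number of evaluators accepting each pair.

    crowd_support: dict mapping a sorted (patent, patent) pair -> number of
        crowd workers who connected it.
    evaluator_accepts: dict mapping an evaluator ID -> set of accepted pairs.
    Returns {k: [crowd counts of pairs accepted by exactly k evaluators]},
    i.e. one list per box in a box plot.
    """
    groups = defaultdict(list)
    for pair, workers in crowd_support.items():
        k = sum(pair in accepted for accepted in evaluator_accepts.values())
        groups[k].append(workers)
    return dict(sorted(groups.items()))

crowd_support = {("P1", "P2"): 5, ("P1", "P3"): 1, ("P3", "P4"): 2}
evaluators = {
    "e1": {("P1", "P2"), ("P3", "P4")},
    "e2": {("P1", "P2")},
    "e3": {("P1", "P2"), ("P1", "P3")},
}
print(support_by_evaluator_agreement(crowd_support, evaluators))  # {1: [1, 2], 3: [5]}
```

Each list can then be passed to a plotting library as one box.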

Figure 13. Visualisation of clustering between patents and patent labels for a cropped patent cluster zone.

Table 9. Practical considerations of crowdsourcing clustering with reference to other approaches

Table 10. Phi nominal correlations between crowd workers and other approaches
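
Table 10 reports phi correlations between the crowd workers and the other approaches. Phi is the Pearson correlation of two binary variables and follows directly from a 2x2 table of counts: phi = (n11*n00 - n10*n01) / sqrt((n11+n10)(n01+n00)(n11+n01)(n10+n00)). The sketch assumes, as an illustration rather than the paper's documented coding, that every candidate patent pair is scored 1 if an approach linked it and 0 otherwise; the vectors are hypothetical.

```python
import math

def phi_coefficient(x, y):
    """Phi coefficient between two equal-length binary (0/1) sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = len(x) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined when either variable is constant; return 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

crowd = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = crowd linked this candidate pair
expert = [1, 1, 0, 0, 0, 1, 1, 0]  # 1 = expert linked this candidate pair
print(phi_coefficient(crowd, expert))  # 0.5
```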

Table 11. Chi-square values and percentage of unlinked and linked pairs correctly predicted by logistic regression on crowd workers’ results
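
Table 11 reports chi-square values and the percentage of unlinked and linked pairs correctly predicted by a logistic regression on the crowd workers' results. The model specification is not given in the caption, so the sketch below assumes a single predictor (the number of crowd workers who linked a pair) and a binary outcome (whether a benchmark approach linked it). It fits the model by Newton-Raphson, computes the likelihood-ratio chi-square against an intercept-only model (1 degree of freedom), and classifies a pair as linked when its fitted probability is at least 0.5. The data and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=50, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1).

    Assumes the classes overlap (no perfect separation) and x is not constant.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Observed information matrix X^T W X with W = p(1-p).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def log_likelihood(y, p):
    return sum(math.log(pi) if yi else math.log(1 - pi) for yi, pi in zip(y, p))

def evaluate(x, y):
    """Return (slope, likelihood-ratio chi-square, % unlinked correct, % linked correct)."""
    b0, b1 = fit_logistic(x, y)
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    base = sum(y) / len(y)  # fitted probability of the intercept-only model
    chi2 = 2 * (log_likelihood(y, p) - log_likelihood(y, [base] * len(y)))
    pred = [pi >= 0.5 for pi in p]
    unlinked = [pr for pr, yi in zip(pred, y) if not yi]
    linked = [pr for pr, yi in zip(pred, y) if yi]
    pct_unlinked = 100.0 * sum(not pr for pr in unlinked) / len(unlinked)
    pct_linked = 100.0 * sum(linked) / len(linked)
    return b1, chi2, pct_unlinked, pct_linked

x = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5]  # crowd workers linking each pair
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]  # 1 if the benchmark approach linked the pair
slope, chi2, pct_unlinked, pct_linked = evaluate(x, y)
print(f"slope={slope:.3f} chi2={chi2:.3f} "
      f"unlinked={pct_unlinked:.0f}% linked={pct_linked:.0f}%")
```

The 0.5 cut-off is a conventional choice; reporting the two class-wise percentages separately, as Table 11 does, keeps an imbalance between linked and unlinked pairs from hiding poor performance on the smaller class.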