A. Introduction: Article 5(1)(c)’s Chinese “Origin”
The European Union’s Artificial Intelligence Act (EU AI Act), adopting a risk-based approach to categorize and govern AI applications, represents a pioneering attempt at comprehensive AI regulation. The Act classifies AI systems based on their potential societal risks, with Article 5 setting forth a ban on “prohibited AI practices” that are deemed to pose “unacceptable” risks to society. Among the prohibited practices enumerated in Article 5 are: (1) subliminal, manipulative, or deceptive techniques; (2) exploitation of vulnerabilities; (3) social scoring systems; (4) criminal risk assessments; (5) indiscriminate scraping of facial images from the internet or CCTV footage; (6) emotion inference; (7) biometric categorization; and (8) real-time remote biometric identification systems in publicly accessible spaces.Footnote 1 This list reflects, without question, longstanding concerns within the AI governance and policy community and resonates with those familiar with the debates surrounding AI regulation.Footnote 2
This Article focuses on Article 5(1)(c)’s prohibition of “social scoring” applications of AI, a provision that may be particularly interesting for those with an eye on comparative law and the governance of data technology. As some may realize, this ban under the AI Act might have been, to a considerable extent, motivated by, and formulated in response to, European lawmakers’ perceptions of China’s Social Credit System (SCS). To this day, the Western narrative surrounding China’s SCS has been predominantly characterized by dystopian, science-fiction-style portrayals. The media, political circles, and even scholarly literature have become accustomed to depicting the SCS as an Orwellian nightmare of digital surveillance and authoritarian control, in which the Chinese government scores every individual citizen on everything they do and uses that score to determine everything they receive, resulting in massive surveillance, systemic discrimination, and disenfranchisement.Footnote 3 That is far from reality. Overall, the SCS at the national and local levels has primarily consisted of regulatory systems for tightening corporate compliance through government data-tracking mechanisms and interagency enforcement collaborations.Footnote 4 It also includes a major initiative for using corporate entities’ regulatory compliance records as alternative credit data for bank lending.Footnote 5 Some local government authorities have indeed set up experimental scoring programs, in which citizens may voluntarily participate and use their scores for such benefits as reduced bus fares and park admissions, library deposit exemptions, and nursing care for elders by community workers.Footnote 6 No comprehensive scoring regime for citizens has ever been implemented, or even contemplated, by the national government.
The most controversial SCS program, as it affects individuals, is the court system’s blacklist of individual defendants, and legal representatives of corporate defendants, who evade in bad faith their obligations under effective court judgments. Blacklisted individuals are prohibited from excessive spending, including travelling by air or high-speed train, staying in four- or five-star hotels, and sending their children to expensive private schools.Footnote 7
The true story has been told numerous times,Footnote 8 but unfortunately it has not been heard by many. When Article 5(1)(c) was formally introduced into the European lawmaking process with the first draft of the AI Act in 2021, media reports widely associated the social scoring ban with China’s SCS, reflecting a popular understanding that the provision’s underlying motivation was, at least partially, to prevent the perceived Chinese dataveillance regime from gaining a foothold in Europe.Footnote 9 The likely presence of such a motivation is also evidenced by various related circumstances. For example, the European Parliament’s study on “Biometric Recognition and Behavioral Detection” explicitly referenced China’s “social credit score system” when discussing the risks of AI-driven mass surveillance.Footnote 10 The European Economic and Social Committee, in its comments on the AI Act, also called for strengthening the social scoring ban, citing the Chinese practice as a worrying example.Footnote 11 Analyses by organizations such as Human Rights Watch and the Brookings Institution likewise consistently connect the EU’s regulatory approach with concerns about systems similar to China’s.Footnote 12 And while the official EU legal text of the AI Act does not explicitly mention China,Footnote 13 its emphasis on protecting fundamental rights and preventing the misuse of AI technologies aligns closely with the concerns raised about the SCS in other policy discussions.
Furthermore, a comparison of the language in the 2021 draft with that of the finally adopted version of the EU AI Act provides additional clues about the possible Chinese “inspiration” underlying Article 5(1)(c). The 2021 draft specifically prohibited social scoring practices undertaken by or on behalf of “public authorities” for the purpose of evaluating or classifying the “trustworthiness” of natural persons.Footnote 14 The two terms, “public authorities” and “trustworthiness,” when considered together, appear to be coded references to China’s SCS, which was generally depicted in EU law and policy circles as a state-imposed scoring of citizens’ moral “trustworthiness.” Notably, however, references to “public authorities” and “trustworthiness” were removed from both the provision and the recitals in the 2024 enacted version. One often-cited rationale for this evolution in statutory language is that lawmakers subsequently recognized that dangerous scoring practices are not limited to those by public authoritiesFootnote 15 and that “trustworthiness” is too narrow and poorly defined to capture the full range of objectives behind the prevalent and varied scoring practices.
Alternatively, however, the change may also have reflected a perhaps reluctant realization that the perceived Chinese scenarios Europeans initially sought to guard against may not actually exist in China.Footnote 16 The EU legal provision, therefore, became somewhat reoriented towards practices in the EU that have stirred controversies, such as the scoring systems used by national governments in administering social welfare programsFootnote 17 and fully automated decision-making processes by commercial entities, including lenders’ use of SCHUFA scores for creditworthiness assessment.Footnote 18 Nonetheless, in February 2025, when the European Commission issued its guidelines on prohibited AI practices covered under Article 5(1)(c), it again illustrated the “unjustified or disproportionate treatment” that would render social scoring prohibited with a hypothetical that is clearly constructed on the drafters’ conception of how municipal scoring systems work in China.Footnote 19
The enacted Article 5(1)(c) raises important questions about the scope and practicality of its social scoring ban. As further discussed in Section B, compared with its earlier draft version, the adopted provision on the one hand expanded the scope of the ban to include private-sector scoring and, on the other hand, incorporated more nuanced language offering qualifications and carve-outs to the ban-as-default. This adjustment appears intended to allow for a more flexible regulatory framework, perhaps because European lawmakers eventually came to realize how prevalent scoring practices are in both the private and public sectors within Europe. The subsequent Commission guidelines lend support to this understanding, noting repeatedly that “case-by-case” analyses are warranted for determining whether a social scoring practice is prohibited.Footnote 20 But if that is the case, then what remains subject to a meaningful “ban”? How will this diluted version of a “ban” be implemented in practice? Is this still a “ban” in a substantive sense, or might it be more sensible to see the statutory provision as opening up the possibility of a balanced approach to regulating AI-based scoring practices?
In addressing these questions, this Article offers both a functional analysis of social scoring and a comparative approach to its regulation. The functional analysis aims to explicate the societal functions of scoring schemes, including so-called “social scoring,” and the underlying rationales for their creation and application. The Article argues that scoring systems—AI-driven or not—fundamentally arise from society’s need to optimize the allocation of scarce resources. A proper understanding of this premise reveals that social scoring is inevitable across many contexts where allocation requires more than simple egalitarian distribution. Given this understanding, it is inappropriate to address the potential downsides of scoring with a simplistic all-or-nothing approach. Instead, a marginalist approach that addresses the incremental risks introduced by AI when applied to existing or novel scoring contexts is more sensible and practicable.
Contrary to the conventional discourse that treats China’s SCS solely as a cautionary example for the West, this Article suggests that China’s experiences with regulating its SCS may offer useful lessons regarding practical regulatory options and approaches to handling issues arising from scoring systems, including their automated or AI-powered versions. With a more pragmatic understanding of both the problems and available policy responses, EU authorities could potentially transform Article 5(1)(c) from an elusive ban into a useful framework for regulation. By examining the EU AI Act’s social scoring ban through a comparative lens and proposing a nuanced regulatory understanding, this Article hopes to contribute to the larger-scale dialogue on sensible strategies for AI governance.
B. The EU Social Scoring Ban: What Is Really “Banned”?
Article 5(1)(c) of the EU AI Act reads:
[T]he placing on the market, the putting into service or the use of AI systems for the evaluation or classification of natural persons or groups of persons over a certain period of time based on their social behaviour or known, inferred or predicted personal or personality characteristics, with the social score leading to either or both of the following: (i) detrimental or unfavourable treatment of certain natural persons or groups of persons in social contexts that are unrelated to the contexts in which the data was originally generated or collected; (ii) detrimental or unfavourable treatment of certain natural persons or groups of persons that is unjustified or disproportionate to their social behaviour or its gravity.
Recital 31 of the AI Act, which offers explanatory insights for Article 5(1)(c)’s social scoring ban, reads:
AI systems providing social scoring of natural persons by public or private actors may lead to discriminatory outcomes and the exclusion of certain groups. They may violate the right to dignity and non-discrimination and the values of equality and justice. Such AI systems evaluate or classify natural persons or groups thereof on the basis of multiple data points related to their social behaviour in multiple contexts or known, inferred or predicted personal or personality characteristics over certain periods of time. The social score obtained from such AI systems may lead to the detrimental or unfavourable treatment of natural persons or whole groups thereof in social contexts, which are unrelated to the context in which the data was originally generated or collected or to a detrimental treatment that is disproportionate or unjustified to the gravity of their social behaviour. AI systems entailing such unacceptable scoring practices and leading to such detrimental or unfavourable outcomes should therefore be prohibited. That prohibition should not affect lawful evaluation practices of natural persons that are carried out for a specific purpose in accordance with Union and national law.
These statutory texts in the adopted final version of the EU AI Act are ambiguous in several respects, which will likely complicate the interpretation and application of the so-called “social scoring ban.” The Commission guidelines are meant to offer greater clarity, but whether they achieve this appears doubtful and remains to be seen. First, the term “social behaviour,” which is key to defining “social scoring” as a practice presumably distinguishable from “general” scoring practices, lacks specificity. To the extent that, as the Commission guidelines admit, almost all interactive human behaviors can be considered “social,”Footnote 21 the term “social behaviour” could in theory encompass virtually all scenarios of social activity where some kind of scoring practice already applies to persons—as opposed to, say, the quality of air, water, or consumer products—including education, employment, financial borrowing, and government services. It should be more than clear that a sweeping ban on introducing AI systems into all established scoring practices makes no sense—the scoring systems made available by internet platforms for users to evaluate service providers being an obvious example. It is worth noting that, as previously mentioned, the first draft of the Act defined “social scoring” more specifically as scoring for evaluating the “trustworthiness” of people. Despite its own vagueness, the term “trustworthiness,” had it been included in the final version, could have made it easier for the ban to focus on scoring practices that make incursions into the areas of social ethics and moral character. That is, after all, what European authorities fearfully imagined the Chinese system was doing.Footnote 22 Ironically, however, with that coded reference to China removed, the term “social behaviour” becomes utterly inadequate for the provision’s definitional or scoping purposes.
Second, the meaning of “detrimental or unfavourable treatment,” which is intended to limit the ban’s application to practices that inflict negative consequences on people, is also ambiguous. As will be further explained in Section C, scoring systems inherently involve differential treatment of individuals for the purpose of allocating scarce resources, and such variation inevitably leads to the perception of disadvantage or detriment by some portion of the group. This reading thus circles back to the question of how any scoring practice may fall outside the scope of the prohibition. In the discourse surrounding China’s SCS, for example, there used to be a similar debate about whether it is less problematic to use scoring systems for awarding benefits or favorable treatment than for imposing penalties, particularly by government entities.Footnote 23 While some argue that the former is more desirable and legally acceptable, others question the logic of this distinction on the ground that people who do not receive beneficial treatment are, in effect, experiencing “detrimental” treatment.Footnote 24 If one comes to appreciate the “glass-half-full or half-empty” aspect here, then at least for the AI Act’s purposes this qualifying phrase is not as usefully clarifying as may have been intended. Suppose, for example, that a ride-hailing platform gives priority to drivers with high ratings in picking up the more lucrative gigs—a commonplace marketing and self-governance practice for ride-hailing platforms in China and the West.
While the platform may argue that its scoring system is only intended to reward the good behavior of platform drivers, it may well fall into the prohibited category of scoring because, given the limited number of good gigs, such scoring-based allocation does cause “detrimental or unfavourable treatment” of the lower-scored drivers who are left with the “junk gigs.” The Commission guidelines, meanwhile, do not address this potential ambiguity at all in the paragraphs discussing what may constitute “detrimental or unfavourable treatment.”Footnote 25
Third, the provision’s requirement of contextual integrityFootnote 26—prohibiting scoring that uses data generated in contexts unrelated to the one in which the scoring occurs—is also unhelpful for clarifying the ban’s scope. The “relatedness” between the context in which behavioral data are generated and that in which the data are used for scoring is usually a matter of degree rather than a binary question. It is inherently difficult to determine how much “relatedness” is required for a scoring system to use certain kinds of data generated in a different context. While blatantly offensive associations or inferences—for example, using physical attractiveness scores for financial decisions—are evidently problematic, many ordinary instances are less clear-cut. With advanced analytical tools, correlations between seemingly unrelated factors—for example, borrowing history and performance in government projects—might be established to varying degrees. For the original “social credit,” referring to the evaluation of borrowers’ creditworthiness by drawing on alternative credit data, there are in fact empirical grounds for recognizing the credit-relevance of social network data, legal compliance data, and other data not traditionally used in credit-assessment models.Footnote 27 The question, in other words, is not whether such data are “related” to the credit context—because they in fact often are—but whether society has a legitimate interest in not allowing the relevant data to be used for the specific scoring purpose, even where the contexts are related.
For another example, debates have long existed over whether school admission boards should include or exclude applicants’ nonacademic information, such as extracurricular activities or family background.Footnote 28 The core issue lies, obviously, not in whether some sort of relatedness objectively exists, as it often does, but in when scoring with such data becomes normatively unacceptable—a difficult question fraught with complexity and considerations of ethics and equity, which, as the Commission guidelines admit, has to be left for “case-by-case assessment.”Footnote 29
Fourth, the requirement of proportionality in scoring, though in itself a well-established principle in EU jurisprudence, remains inherently subjective as applied to the workings of AI systems. As Karliuk admits, the complex reality of AI systems requires that proportionality’s central rules about balancing be kept abstract and open to context-specific application.Footnote 30 At the very least, it seems highly unlikely that proportionality, given its own subjectivity, can help make the statutory scope of the scoring ban more certain and predictable. Here, again, the Commission guidelines have to resort to a call for “case-by-case assessment.”Footnote 31
Finally, Recital 31’s assurance that lawful practices are carved out from the ban is almost tautological and adds little clarity to the interpretation of Article 5(1)(c). There are simply too many existing scoring practices predating the AI Act—such as the commercial credit-scoring systems that employ alternative credit data—which, albeit controversial,Footnote 32 have not been found illegal. Where a long-existing scoring practice continues to be carried on without AI, there is obviously no need for the AI Act to include an explicit exemption or carve-out for it. Meanwhile, wherever AI systems are added to existing scoring practices, then even for practices long established as lawful, a new look could be warranted anyway. For example, while scoring in domains like social welfare administration is long established, given that the introduction of AI is meant to enable greater evaluation capacity with more data that would appear related to the evaluation context, questions will naturally arise about whether AI-enhanced scoring for welfare administration remains legitimate or legal.Footnote 33 From the perspective of compliance, therefore, the carve-out for “lawful practices” is unclear because it does not answer such key questions under the social scoring ban.
In summary, the adopted language of Article 5(1)(c) of the EU AI Act presents considerable interpretive challenges, leaving the provision’s practical application uncertain. As the previous analysis suggests, the social scoring ban could be construed either as broadly impactful, affecting most existing scoring practices, or as inadvertently narrow and practically relevant to only a few systems. Indeed, such indeterminacy is not new. Somewhat ironically, a few years ago China had no problem joining a UN pledge that includes language about banning “social scoring,” even though the insertion of such language into the document, as proposed by Germany and Italy, was actually targeting China.Footnote 34 That was likely because China held the position that its SCS, as actually designed and practiced, was something different from the dystopian universal scoring system that China was accused of operating.
C. Why Scoring, and What Can Go Wrong?
While semantic critiques suffice to expose Article 5(1)(c)’s interpretive uncertainties, tackling those challenges and deciding how broad or narrow the ban’s scope should be requires more substantive and nuanced policy considerations. To that end, I propose that we first take a step back and start with a rather basic question: What, if anything, can be problematic about “social” scoring if it indeed differs from scoring in general?
Again, from the European lawmakers’ perspective, the answer would have been straightforward if “social scoring” referred simply to the imagined Chinese dystopia, where a state-run scoring system is believed to have led to massive surveillance, systemic discrimination, and disenfranchisement. But no one should seriously interpret formally adopted EU laws as striving to resolve fictional problems, let alone foreign ones. It should hardly be news that scoring systems are prevalent in Western societies.Footnote 35 The proper interpretation and implementation of the social scoring ban, instead, must be grounded in a more realistic identification of how scoring may go wrong. And to appreciate the potential problems and perils of social scoring, we in turn need to understand why scoring practices have been introduced into numerous social decision-making contexts in the first place. After all, if a particular scoring system only served to inflict emotional or physical harm and to create inequitable and unjust consequences, such a system would readily be considered perverse and illegitimate; in such a case, a total ban would face few questions or objections. The fact that societies now grapple with how scoring practices may be properly regulated or even banned suggests, however, that scoring plays a vital and indispensable role in the functioning and governance of human affairs.
I. Allocative Scoring
“Scoring” in a societal context, as implied in the text of Article 5(1)(c), is a quantitative method for the evaluation and classification of the scored human subjects. This mechanism becomes necessary in human societies where scarce resources, presumably allocated most often through market transactions, must be allocated and distributed among individuals or groups while the market pricing system is either not directly available or is inappropriate for such allocative purposes. One extreme but illustrative scenario is the scoring systems hospitals used for allocating scarce medical resources, such as mechanical ventilation, during the peak waves of the Covid-19 pandemic,Footnote 36 when such life-saving resources were available to only a portion of the patients waiting in emergency rooms.
In many societies, at least some scarce resources are distributed by government authorities based on equal entitlements or egalitarian principles, entitling every citizen to receive an equal amount. In such situations, there is no need for evaluation or classification systems, and scoring mechanisms are thus unnecessary. However, these scenarios are far from universal or even common. And even in areas where formal egalitarian principles are applied, it is well known that implicit rationing mechanisms based on waiting time and quality may still lead to unintended distributive inequities.Footnote 37
Evaluation and classification systems, including scoring mechanisms, are often designed and implemented as alternatives to equal-distribution systems. The primary objective of any evaluation and classification system is to distinguish those who, based on merit, need, or other criteria of desert, should receive the allocation from those who should not. Scoring provides such a differentiating mechanism. It also offers more granular differentiation than simple binary categories like “eligible/ineligible” or “pass/fail,” which potentially brings about greater precision and allocative efficiency.
Moreover, quantitative scoring methods can be, in various senses, more objective, rigorous, or even “scientific” than loose principles or standards for differentiation—or at least they can be perceived as such by those affected by the process. The greater granularity of differentiation offered by scoring systems may, in theory, also lead to higher allocative efficiency in distributing goods and services.Footnote 38 Recent studies have demonstrated the broad application of scoring systems in various contexts for allocating scarce resources. For instance, in healthcare, scoring systems have been shown to be effective in predicting patients’ in-hospital mortality risks and thus improving the allocation of care.Footnote 39 Similarly, in social care for older adults, sophisticated resource-allocation processes—which often involve scoring mechanisms—have been extensively used in managing limited resources.Footnote 40
II. Scoring Flaws
Although scoring systems are often expected to improve allocative efficiency, their purported advantages are not universally applicable. Whether a scoring system functions effectively, or even makes sense, depends not only on its own design and implementation but also on the social context in which it operates.
To begin with, scoring systems inevitably interact with a complex web of social norms and understandings. Although scores appear objective due to their quantitative form, they can also be assigned in highly subjective or discretionary ways, thus presenting only a façade of scientific rigor. Many scoring systems operate with limited transparency, making it difficult to assess their true objectivity. If a scoring system generates scores without rigorous, rule-based algorithms and instead relies on the scorer’s discretion, the purported increase in granularity of classification does not necessarily improve allocative efficiency. In practice, one will often encounter instances where scoring fails to yield desirable allocative results, including in contexts with significant stakes. Sometimes the scoring itself fails to meet basic standards of rigor and objectivity and instead merely conceals the arbitrariness of the decision-making process. Consider, for example, the exam-paper grading practices of inattentive college professors—perhaps particularly in the humanities and social sciences—who may follow no clear rules and whose grades may appear inexplicable to students. Such opaque grading can from time to time lead to student complaints, particularly when grades determine eligibility for valuable opportunities like scholarships, admission to graduate programs, or jobs.
Second, scoring may be found unsatisfactory because any scheme that classifies subjects along certain criteria may be faulted for taking into consideration only some, but not all, of the factors that could arguably be relevant or even important to the allocative purpose of the scoring mechanism. For example, while “social credit” or “big data credit” is nowadays almost always critiqued as unfair and discriminatory, it is important to remember that traditional credit reporting had long been accused of not considering enough factors that could help borrowers demonstrate creditworthiness, especially individuals and small businesses with limited credit histories. Hurley and Adebayo discussed this “thin file” problem, which disproportionately affects young adults, immigrants, and low-income individuals.Footnote 41 The original “social credit” was, in fact, an attempt to respond to these limits of traditional credit scoring schemes, as it aimed to extend credit to traditionally underserved groups by incorporating alternative data for credit evaluation and scoring.Footnote 42
Similarly, in the context of college admission practices around the world, regardless of the tests or screening processes adopted, there has constantly been criticism about how such evaluations omit factors that could indicate potential success or that may be used for better allocating educational resources, such as improving equity and remedying past discrimination. This has often led to proposals for more comprehensive, multidimensional assessment methods in educational settings.Footnote 43 In the U.S., the thorniest controversy is indeed whether the factor of racial background, definitely a “social” factor, should be taken into consideration for college admission. The Supreme Court of the United States, in a recent decision, reversed course on its prior approach that allowed for the inclusion of “diversity” into the calculus.Footnote 44 But critics, including the elite institutions themselves, believe that the exclusion of that variable contravenes the goals and values of higher education.Footnote 45 Therefore, it is important to realize that, although Article 5(1)(c) of the EU AI Act appears to ban scoring schemes that assess individuals based on “too many” factors, the use of limited considerations in generating scores can also be a flaw rather than a virtue. Excessively narrow scoring criteria can fail to capture the complexities of individual circumstances, leading to inequitable outcomes.
Third, scoring systems are sometimes faulted not for failing to achieve or optimize allocative objectives but for creating undesirable distributive effects. For example, even if credit scores accurately reflect a person’s default risk, high-risk assessments may concentrate in disadvantaged social groups, thereby entrenching existing inequalities. Similarly, scoring systems in education and employment—the key building blocks of meritocracy—are known to reinforce the entrenchment and reproduction of the elite class and existing social hierarchies.Footnote 46 And when law-enforcement agencies apply risk-assessment algorithms to allocate police forces across neighborhoods, there is often concern that the diverging levels of police presence may simply lead to higher levels of arrests and prosecution in so-called “high-risk” neighborhoods, where people of disadvantaged groups tend to concentrate and may consequently suffer higher rates of arrest and incarceration.Footnote 47 Indeed, the way in which scoring systems may enable the process of “automating inequality” is well illustrated by the stories told in Virginia Eubanks’s book on the subject.Footnote 48
Fourth, the application of scoring systems may generate problematic incentives within society. For instance, the original “social credit” systems in Western contexts—big-data credit analytics that utilize social-media data—might improve the allocative efficiency of credit and loans. However, if individuals realize that their personal associations affect their access to credit, they may distance themselves from certain people or groups, potentially leading to societal segregation and decreased social mobility. The Chinese SCS is sometimes cited as an example of such concerns, as media reports often claim that parents’ social scores could affect their children’s eligibility for educational or employment opportunities.Footnote 49 That, unfortunately, is again a mischaracterization. The popular stories about high school students being denied admission to prestigious colleges in Beijing because their parents were blacklisted as defaulters on court judgments are simply untrue.Footnote 50 Such narratives stem from the commonplace conflation of the government-employment background-check system with the SCS. That said, incentive problems do manifest in other Chinese contexts of government-run scoring outside the SCS. For example, in some Chinese megacities, the scoring system used for determining an individual’s eligibility for permanent residential status (hukou) previously led to rampant record fabrication, fraud, and rent-seeking. As some of these systems assign scores to individuals who receive certain kinds of rewards from the government or other public entities, it is reported that a black market has formed, with intermediaries arranging, through extralegal means, the issuance of such rewards to those willing to pay a fee.Footnote 51
In addition, one rationale against scoring systems is that they may induce excessive conformism. Behavioral diversity may diminish to the extent that people care about the consequences of the scoring and understand the rules to be uniform and stringent.Footnote 52 That is one point the famous Black Mirror episode “Nosedive” drives at, and even outside the science-fiction scenario we observe various mechanisms employed by social-media and e-commerce platforms to induce herd behavior among users.Footnote 53 The deeper challenge, however, is that society may legitimately expect and desire varied levels of conformism in different social contexts. We hope that drivers of commercial fleets, for example, all conform to safety rules when they are on the road,Footnote 54 although we do not expect them to hold our political opinions. The critical problem, therefore, is not conformism per se as induced by a scoring system, but the wrong kind of conformism: conformism that was not, or should not have been, intended in contexts where society values and strives to safeguard diversity.
In sum, there are multiple reasons why social scoring may be considered problematic in society. In some cases, scoring systems are flawed in achieving their allocative efficiency due to a lack of transparency, objectivity, or comprehensiveness. In other instances, the fundamental issue is that classification or evaluative measures may be inappropriate for allocating social resources and goods, despite the theoretical need for allocative efficiency. Moreover, scoring systems can entrench social inequality and create perverse incentives that negatively affect societal cohesion. Such mapping of criticisms against scoring in general is crucial for identifying where and how AI-driven social scoring creates risks that warrant regulatory attention.
III. AI Versus Human-Administered Scoring Systems
Having examined the functions and potential concerns of scoring as a general mechanism for societal resource-allocation, we now turn to the implications of incorporating AI into scoring practices. Discussions of the risks associated with AI often focus on how AI, as a novel technology, introduces dangers previously unknown to the world. However, to achieve a nuanced and practical regulatory approach that addresses the specific challenges posed by AI, the better angle may be to explore the additional risks and benefits that AI brings to preexisting practices such as social scoring.
As explained, human-administered scoring systems have inherent limitations and potential flaws. Human scoring can involve arbitrary discretion without a necessarily reasonable basis. When based on rules or predetermined algorithms, those rules may be poorly designed or inadequate, leaving many important considerations outside the parameters of decision-making. Even with a well-crafted algorithm for score generation, human scorers are susceptible to mistakes and biases. To illustrate their inconsistency, simply consider again how a not-so-attentive professor might grade the first exam paper on the pile in front of her as compared to those in the middle: consistency is far from guaranteed.
As the quantitative nature of scoring often lends itself to computerization, scoring systems incorporating AI could improve along those margins over human-administered systems. As Sunstein argues, machine algorithms are less susceptible to the cognitive biases and inconsistencies that afflict human decision-making.Footnote 55 AI-driven scoring systems could potentially mitigate flaws inherent in traditional, human-administered scoring processes. This is an important point because, however imperfect or problematic algorithmic scoring may be as a basis for allocative decision-making, the human alternative can be even cruder, more opaque, and less reliable. As long as the optimization targets for the algorithm are set forth clearly enough, a machine scorer does not experience fluctuations in stamina or mood during the scoring process. A machine will not discriminate against an exam writer who includes statements that are offensive to a human scorer’s ear but should otherwise not result in point deductions. Both the noise and the bias prevalent in human scoring practices may thus plausibly be reduced. For example, in criminal justice, an area where the “original sin” of automated decision-making and social scoring was often found, systems such as COMPAS have been accused of rendering unjust and inexplicable scores of a defendant’s risk of recidivism, scores that may be affected by underlying racial and gender biases.Footnote 56 However, once one accepts the legal realist argument about the problematic aspects of human judges, a more useful analysis would compare whether, for specific tasks such as risk evaluation, algorithms actually provide an improvement over human judges. According to the famous study by Jon Kleinberg and his co-authors, for example, there is very good reason to believe that machine-learning algorithms outperform human judges in evaluating risks of flight and recidivism for the purpose of bail decisions. The algorithm can bring about not only improved predictive accuracy, but also greater distributive fairness—in other words, more favorable results for racial minority groups.Footnote 57
Moreover, AI systems can process vast amounts of data, potentially producing scores based on a more comprehensive set of factors and along richer dimensions. This capability addresses a common criticism of human-operated scoring systems, namely the oversight of relevant factors. The ability to consider a broader range of variables could lead to more nuanced and potentially fairer assessments. For example, populations underserved by traditional credit-reporting schemes, such as small agricultural businesses in China, have had a much better chance of accessing credit lines under the alternative credit-assessment regime that, as operated by the government, evaluates these borrowers by looking at data generated from their records at regulatory agencies instead of records of assets and bank cashflows, of which they typically have little.Footnote 58 Additionally, AI scoring systems can operate more cost-effectively at scale, covering larger populations and a greater variety of social contexts, thereby improving efficiency.Footnote 59 Even though AI scoring has faced criticism as some European national governments apply it to screen for fraud in social-security applications,Footnote 60 it is important to see that, counterfactually, without these systems the manual inspection process would at least have taken longer, and that such increased waiting times would not necessarily have led to fair and effective outcomes regarding individuals’ entitlement to government services.
Nonetheless, these very advantages of AI scoring also give rise to concerns. While AI can considerably improve allocation efficiency when the scoring algorithms are sound, it can also exacerbate existing biases and inequalities if the system is flawed or intentionally designed or manipulated for discriminatory or harmful purposes.Footnote 61 In other words, AI introduces an amplification effect in scoring. From the perspective of the EU lawmakers, scoring’s potentially negative impact on the right to non-discrimination seems a particularly weighty concern,Footnote 62 as Recital 31 of the AI Act specifically expressed the worry that AI social scoring of natural persons by public or private actors may lead to discriminatory outcomes and the exclusion of certain groups.Footnote 63
Furthermore, AI’s capability to process diverse data dimensions may augment the risk of unjustifiable social scoring that links multiple contexts which are normatively expected to be kept separate from each other. Even though the contexts may be related in fact, such links could raise ethical concerns about privacy and autonomy. Such cross-contextual scoring capability, while potentially insightful, blurs the lines between different aspects of an individual’s life in ways that may be ethically problematic. For example, one’s sexual orientation, even if related to his or her risk of loan default, should not be used for credit assessment purposes. And the additional concern brought about by AI is that the scoring system may become empowered to collect and process so much data in black-box fashion that people may be unable to realize or test whether such undesirable inclusion of data takes place.Footnote 64
Relatedly, the perceived objectivity and complexity of AI systems, coupled with their automated operation, could make problematic scoring practices harder to challenge. The “black-box” nature of some AI algorithms can hinder transparency and accountability.Footnote 65 This opacity not only makes it difficult for individuals to understand and contest their scores but also poses challenges for regulators and policymakers to ensure fair and ethical practices. Accountability is also weakened if government officials become accustomed to attributing controversial decisions to the presumed objectivity of a machine when citizens inquire or protest about their scores and corresponding treatments. Eventually, the black-box scoring system may also face challenges on the dimension of societal trust: People may have low confidence in systems because the outputs and the underlying rationales are not easily comprehensible.
Perhaps most concerning is the risk that, as AI scoring becomes ubiquitous, society at large may become less critical regarding the question of where the pursuit of allocative efficiency through scoring is appropriate and where it is not. This normalization of score-based allocation could potentially overlook important ethical considerations and human values that are not easily quantifiable. Or, at least, the black-box scoring algorithms may generate scores based on insights about connections among different contexts that people have not realized exist, thus providing no opportunity for society at large to deliberate as to whether such connections are normatively acceptable.Footnote 66
D. Beyond Bans: Regulatory Options for AI-Based Social Scoring
Despite the foregoing concerns about social scoring and AI-based scoring systems, this section argues that a marginalist, targeted approach to regulating AI-enhanced social scoring, as opposed to a total ban, suffices to address those concerns. In regulating AI-powered social scoring, authorities should avoid the human-AI double-standard inclination,Footnote 67 which imposes unrealistic standards of performance for AI to achieve; authorities also need to move beyond the simplistic, binary options of permitting or banning AI applications.Footnote 68 By focusing on the marginal impact of AI on scoring practices, policymakers should be able to craft more balanced and targeted regulations that mitigate specific risks without giving up on one of society’s most prevalent and useful decision-making tools. Such an approach may also align with the EU AI Act’s risk-based framework, provided it is recognized that not all scoring systems pose unacceptable risks as initially presumed. Overall, it is both important and feasible to find a pragmatic pathway for interpreting and implementing Article 5(1)(c) that addresses AI-driven social-scoring challenges without unnecessarily banning beneficial applications.
First of all, to address potential risks of AI-based social scoring, any literal or rigorous implementation of a “ban” is unnecessary, as authorities can surely utilize existing data-privacy and consumer-protection frameworks, such as the General Data Protection Regulation (GDPR) in the European context. These frameworks are designed to mitigate incremental threats brought about by the increasingly ubiquitous data-processing activities to society’s interests in privacy and safety protection, including risks associated with excessive data aggregation and fully automated decision-making. While debates about the adequacy of current data protection laws in the context of AI advancements are far from settled,Footnote 69 these regulations still offer robust regulatory and enforcement tools over data-intensive scoring practices. Indeed, as the EU lawmakers expressly acknowledged, “existing data protection, consumer protection, and digital service legislation” are expected to work in conjunction with the AI Act to deal with manipulative or exploitative practices facilitated by AI systems.Footnote 70
Social scoring inherently involves the collection and processing of personal information, a problem extensively addressed by existing privacy regulations. Although AI may introduce new complexities, the core challenges in social scoring are not fundamentally different from those posed by other AI applications, particularly automated decision-making mechanisms that do not necessarily involve the generation or use of quantitative scores. Therefore, rather than establishing an entirely new legal framework to prohibit AI scoring, policymakers should focus on adapting and strengthening existing frameworks to address AI-specific concerns. For instance, the GDPR’s stance on fully automated scoring, as illustrated by the SCHUFA case, is “prohibition by default.”Footnote 71 However, if individuals are provided opportunities to inquire about, dispute, and protest their scores, the scoring process may no longer be considered fully automated. Establishing effective ex-post mechanisms for individuals to challenge their scores is a reasonable regulatory requirement for operators of consequential scoring systems. Such mechanisms also help maintain and dynamically improve the quality of scoring. In China, for example, the government-run SCS has invested significantly in developing so-called “credit repair” systems, primarily for corporate entities and their senior executives. Specifically, the credit-repair systems offer not only mechanisms for credit subjects to exercise their procedural rights of inquiry, protest, and error correction, but also pathways for remedying negative evaluations by investing in better compliance.Footnote 72 In addition, the right to explanation for automated decision-making under the GDPR, though controversial, could be further developed to enhance transparency in AI scoring contexts.Footnote 73
Second, drawing lessons from China’s SCS, it may be advisable to clearly define ex ante scopes for both data inputs and use cases in high-stakes AI scoring systems that regulators fear may cause unforeseen issues on a large scale. Since 2021, the Chinese National Credit Information Center has maintained and annually updated two authoritative catalogs, one for input data for public credit assessment and the other for credit-based reward and punishment measures.Footnote 74 This approach effectively limits the scale of cross-contextual scoring and prevents excessive and unjustifiable linkages between factors that society has an interest in keeping separate. By designing and implementing similar catalogs, regulators can limit the types of data that may be used as inputs for scoring systems; restrict the contexts in which scores can be applied for resource distribution; offer transparency in the process of creating and updating these catalogs and consequently curtail hidden manipulative use of the mechanisms; and facilitate public deliberation on the appropriateness and proportionality of various scoring practices. This approach, with its periodic-updating feature, also allows for some flexibility in adapting to new technologies and societal needs while maintaining clear boundaries on the reach of AI scoring systems. Notably, under such a framework, China’s SCS is now clearly and decidedly designed not to be “all-encompassing,” as previously perceived by some international observers.Footnote 75 A potential drawback is that scoring systems based on predefined catalogs of inputs will fall far short of fully leveraging the advantages of machine learning and other advanced AI technologies, which rely on AI’s capability to optimize with as much data as possible and in unpredictable ways.
While such scoring may be less effective in terms of allocative efficiency, it is likely to be more acceptable to risk-averse regulators chaperoning a populace with similarly cautious attitudes towards AI.
Third, beyond defensive regulations that guard against identifiable risks, effective regulation of social scoring should also focus on mechanisms that ensure data and algorithm quality, thereby encouraging continuous improvement in the AI scoring systems’ performance. The effectiveness of any scoring system depends on the quality of both its data inputs and algorithmic processes. Poor-quality systems not only fail to achieve allocative efficiency but also cause harm and erode trust in the entire scoring paradigm. In the case of China’s SCS, the primary challenges to the effective implementation of many conceived projects and initiatives have been related to data and algorithm quality.
To enhance the functionality of scoring systems, there must be more rigorous and directed data-quality regulations. Article 10 of the AI Act represents an effort in this direction.Footnote 76 And while data protection laws often emphasize procedural controls like notice and consent, a more substantive approach to data-quality regulation should include: (1) mandatory data-quality audits for high-impact scoring systems; (2) establishment of minimum data-quality standards for AI scoring inputs; and (3) regular reporting requirements on data-quality metrics. As a general matter, meanwhile, open data and data sharing offer an additional mechanism through which data-quality issues may be identified and addressed in a more timely fashion. As Werbach comments, one lesson in data-driven governance that China’s Social Credit System may offer the West is that China strives not only to aggregate data in the public credit data system, but also to make such data available to many government departments as well as private-sector actors.Footnote 77 That is important not only because valuable data resources, processed at the government’s cost, are made available to a broader group of users, but also because there are now many more pairs of eyes looking at the data, so quality issues are much more easily spotted.
Regarding algorithm quality, especially for scoring systems employed by public authorities, one concern is that the authorities using such algorithms may have inadequate incentives to ensure their optimal performance. One potential solution to this quality shortfall, as proposed by Levmore and Fagan, is to allow or even require competing algorithms for high-stakes scoring systems.Footnote 78 The competing-algorithm proposal may also answer an often-identified shortcoming of conventional anti-discrimination law, namely the difficulty affected individuals face in demonstrating algorithmic biases.Footnote 79 Additionally, establishing independent algorithm-review boards to assess and validate scoring algorithms, and mandating regular algorithm audits and performance assessments, would promote a dynamic environment of continuous improvement and accountability in AI scoring systems.
Fourth, policymakers and regulators should consider adopting an experimental approach to the implementation of AI scoring systems. Drawing on China’s experiences in unfolding its SCS, authorities may initially introduce new scoring systems as soft guiding tools rather than mandatory decision-making instruments. The Chinese market regulatory authorities, for example, from time to time implement scoring mechanisms to evaluate business entities’ compliance with market regulations and corporate social responsibility.Footnote 80 Very often these scores, once generated, are accessible primarily to regulators as well as to the evaluated companies themselves, so that the latter are better informed about how regulatory authorities view their performance in those areas and may consider taking self-regulatory measures to preempt regulatory actions.Footnote 81 In other words, using scores as soft-law mechanisms in this manner can facilitate communication between regulators and regulated entities, allowing for flexibility and timely adaptation if scoring practices prove inappropriate or ineffective. This approach permits real-world testing of AI scoring systems while minimizing potential negative impacts. If EU regulators and courts were to exclude such soft-law and experimental uses of AI scoring from the scope of Article 5(1)(c)’s ban, that would seem a reasonable approach to the provision’s interpretation and application. More generally, establishing regulatory safe harbors or similar legal mechanisms to exempt experimental soft-scoring activities may encourage responsible innovation and allow for the gradual refinement of AI scoring practices based on empirical evidence and stakeholder feedback.
The foregoing regulatory options offer practicable alternatives to outright bans on AI-based social scoring. By focusing on utilizing current regulatory structures, clearly delineating the boundaries of AI-scoring applications, ensuring the quality and reliability of data and algorithms, and promoting cautious experimentation, a relatively more nuanced regulatory framework will allow society to harness the potential benefits of AI in scoring systems while mitigating their incremental risks. As noted, such a balanced approach probably aligns even better with the EU AI Act’s risk-based philosophy and paves the way for responsible innovation in AI-driven social scoring. Moreover, as the above discussion shows, while the many dystopian accounts about the Chinese SCS are not based on facts, China’s experiences in regulating the related algorithmic evaluation and scoring practices to restrain their potential abuses could in fact serve as useful lessons for European and other authorities in developing their own regulatory options for AI scoring.
E. Conclusion
The EU AI Act’s purported ban on social scoring necessitates careful consideration to avoid unintended consequences and to harness the potential benefits of AI-enhanced scoring practices. Article 5(1)(c) should be interpreted narrowly to define the scope of the ban effectively. Such an interpretation ensures that most societal scoring practices—especially those enhanced by AI to improve functionality without introducing significant new risks—are not subjected to an outright prohibition. Instead, these systems should be appropriately regulated to maximize their benefits while mitigating their potential harms. The Commission guidelines have seemingly moved in this direction, but the step is neither big nor decisive enough.
The interpretation and implementation of Article 5(1)(c) will play a pivotal role in shaping the future of AI-enhanced social scoring within the EU and beyond. Regulation of AI-based social scoring should extend beyond mere risk-mitigation. It should also aim to promote greater allocative efficiency, optimizing the objectives that scoring systems are designed to achieve. Therefore, regulatory frameworks must encourage innovation in scoring practices that demonstrably enhance efficiency and fairness, while simultaneously addressing potential risks and ethical concerns. Additionally, it is important that regulations clearly define, ex ante, the characteristics of egregious scoring systems that warrant prohibition. This clarity is essential in providing certainty for innovators and ensuring that beneficial scoring practices are not inadvertently stifled. Decisions to ban specific scoring systems should be grounded in a thorough understanding of the factual situations, rather than being driven by speculative fears or dystopian scenarios. By doing so, regulations can be both justified and effective, ensuring that only genuinely harmful practices are restricted.
Acknowledgements
The author thanks participants at the International Workshop on Comparative AI Law organized by Peking University School of Transnational Law (September 28–29, 2024) for their valuable feedback. Thanks also to the editors of the German Law Journal for their meticulous editorial work.
Competing Interests
The author declares none.
Funding Statement
Research underlying this Essay is supported by a National Social Science Fund Major Project (21&ZD199) and a grant from Wuhan East Lake High-Tech Development Zone National Comprehensive Experimental Base for Governance of Intelligent Society.