
Why Trustworthiness Matters More Than Trust in AI

  • Writer: Brydon Wang
  • 4 days ago

By Dr Brydon Wang


The wrong question

Artificial intelligence is frequently described as having a trust problem.

As automated systems become more capable and more deeply embedded within society, governments are rapidly seeking to engender public trust in AI, technology companies continue to promise ever more trustworthy AI, and regulators debate how trust can be maintained. But while these discussions are understandable, they begin from the wrong premise.


Trust is not the problem but an outcome.


When trust breaks down, our instinct is often to ask why people no longer trust a system, institution or technology. Yet this approach focuses on the response rather than the cause. A more useful question is to ask whether the object of trust has behaved in a manner worthy of trust in the first place.


This distinction formed the basis of my doctoral research on automated decision-making systems and the law. Drawing on trust literature and the seminal model developed by Mayer, Davis and Schoorman, I argued that debates about artificial intelligence should focus less on trust itself and more on trustworthiness.


Trust within a particular scenario is an attitude or willingness to be vulnerable to a given level of risk. In business, we have developed sophisticated methods for assessing both the probability and severity of risk. But where risks are poorly understood or untested (in the way that we haven’t really seen the consequences of AI deployment shake out), we need to return to the underlying mechanisms through which trust forms.


Figure 1 sets out the model of trust and trustworthiness developed during my doctoral research. Building on the work of Mayer, Davis and Schoorman, the model distinguishes between trust as an attitudinal response and trustworthiness as a set of signals emitted by the intended recipient of trust. Importantly, the model suggests that trust itself is not the starting point of governance. Rather, governance should focus on the signals of trustworthiness and the mechanisms through which those signals are communicated, interpreted and evaluated.


As observed by Mayer, Davis and Schoorman, trust is an attitudinal response that emerges from the interaction between a person’s individual propensity to trust and the signals of trustworthiness received from an intended recipient of trust.


Governments and businesses are not sure of the risks of AI, and both the technology and the companies that sit behind various market offerings are unproven at the moment. Accordingly, governments and businesses (and we as individuals) can be said to be taking a trusting stance: while we haven’t received all the signals of trustworthiness, we are taking that gamble and accepting exposure to the risks of AI despite incomplete signals.

But when we talk about trustworthy AI, when we regulate for it, when governments seek to increase trustworthy behaviour, we need to focus on the signals of trustworthiness.

 

Trustworthiness and the signals we send

Trustworthiness is a quality signalled by the person, institution or system seeking that trust. The Mayer model identifies three primary signals of trustworthiness: ability, integrity and benevolence. In the context of AI systems, I argue:


  • Ability concerns the system’s functionality. Can the system perform the task it claims to perform?

  • Integrity speaks to whether the system behaves in a way that aligns with our shared value systems. Does it comply with our laws, industry norms and wider public expectations?

  • Benevolence concerns the interface between the technology and the individual user. Is the technology designed with the individual’s best interests in mind, and can it be said to be built with a positive orientation towards that user?


These three dimensions remain enormously useful. They provide a practical framework for understanding why trust emerges and why it breaks down. Yet as my research progressed, I became increasingly convinced that not all three dimensions carry equal weight.


Contemporary AI governance has devoted enormous attention to ability. We evaluate model performance, accuracy, robustness, reliability and cybersecurity. Increasingly, we are also developing mechanisms to assess integrity through auditing, compliance obligations, transparency requirements and governance frameworks. But my research suggests these signals operate hierarchically rather than independently. We need to understand what benevolence is, and how a technology is meant to behave in the individual’s best interest, before we can determine the standard we are holding these systems to and what our laws should look like. And once we have these standards and laws, we can set out the technical briefs that determine the functional requirements against which we can test whether a system actually performs with ability.


Unfortunately, our understanding of benevolence and benevolent AI remains comparatively underdeveloped. This is surprising because many of the most significant controversies in artificial intelligence and automated decision-making systems are ultimately disputes about benevolence rather than competence. The central question is rarely whether a system works (ability). The question is whether it serves the interests of those affected by it (benevolence), even if it performs strictly within the letter of the law and industry standards (integrity).

 

Why benevolence matters more than integrity in AI governance

Much of our approach to trustworthy AI treats the signal of integrity as the solution to this problem. The premise is that if organisations clearly articulate their values and consistently act in accordance with them, integrity will follow. The difficulty with this presumption is that integrity requires a reference point.


Integrity means acting consistently with a set of values, principles or commitments, what is termed ‘value congruence’. But whose values are we talking about? Whose conception of fairness? Whose view of acceptable risk? A system can operate with perfect integrity according to one set of values while appearing deeply problematic according to another.

The increasingly shrill debates about artificial intelligence arise precisely because these questions remain contested.  


For example, when technology companies say they are simply using publicly available information as training data, this is often countered by the argument that this is an unauthorised use of other creators’ intellectual property, which is then used to generate outputs that deprive those same creators of the financial benefits of their original work. Weighing these arguments of theft against the clear benefits of AI in critical situations, such as developing new vaccines or speeding up calculations and findings to drive efficient outcomes and reduce waste, is difficult. The dispute is not really about capability but about competing conceptions of what constitutes benevolent conduct.


What is clear is that we have a definitional problem with the term ‘AI’: it is used to describe far too wide a spectrum of data-focused technologies, and this is now impairing our ability to determine a set of values that would help in building and preserving trust, and in repairing it when it is broken.


But if we then have to investigate each instance and type of AI to formulate these standards, we could very well be caught in a never-ending cycle of debate and reformulation. That churn may be useful once we actually have a standard, because constant review helps ensure that our standards and laws remain fit for purpose.

But to get to that initial standard or law, we need to go up one level to the signal of benevolence and determine what constitutes good (benevolent) conduct.

 

The challenge of benevolence

Benevolence is often misunderstood as kindness.

Within trust research, benevolence has a more specific meaning. It concerns whether those affected by a decision have reason to believe their interests have been considered and protected. This immediately raises a difficult question.


How do we determine what benevolent behaviour looks like? There is no universally accepted answer. However, because AI involves the use of information, how we consider the collection, use, storage, correction and destruction of that information, i.e. our privacy laws, provides a good starting point.


Public discussions often refer to privacy as though it were a single concept. Yet scholars such as Roger Clarke have demonstrated that privacy encompasses multiple dimensions and interests. Informational privacy, bodily privacy, territorial privacy, decisional privacy and communications privacy may all be implicated in a single technological system.


Accordingly, determining what constitutes benevolent conduct requires more than compliance with a technical rule. It requires ongoing negotiation between competing interests, values and expectations. The same challenge appears in debates about fairness, transparency, accountability and consent. The difficulty is not simply that we disagree about outcomes but that we often disagree about the values that should guide those outcomes.

 

Towards mechanisms of benevolence

If benevolence is the most important dimension of trustworthiness, then AI governance must devote greater attention to the mechanisms through which benevolence is identified, negotiated and maintained. Fortunately, we are not starting from nothing. Many of the governance structures already emerging around artificial intelligence can be understood as attempts to operationalise benevolence.


The OECD AI Principles did not emerge from a technical exercise. They emerged through processes of consultation, negotiation and consensus-building across governments, industry, academia and civil society. Their legitimacy derives not merely from their content but from the process through which they were developed.


Similarly, privacy frameworks, human rights instruments and public participation processes represent institutional attempts to create shared understandings of acceptable conduct. My own research has explored several mechanisms that may contribute to this process.


Consensus mechanisms

The first mechanism is consensus.

The term is perhaps most familiar from blockchain systems, where consensus mechanisms are used to establish agreement about the state of a distributed ledger. In reality, the significance of consensus extends far beyond technology. Courts, parliaments, planning processes, standards bodies and public consultation frameworks can all be understood as forms of consensus mechanism. They provide structured ways for societies to reach decisions despite disagreement.


This is important because many contemporary discussions about AI governance assume that benevolence can be determined by technical experts alone. Yet questions of fairness, privacy, public interest and acceptable risk are rarely technical questions. They are social questions that require collective judgement.


Similar challenges arise in infrastructure planning. In my work supporting the development of Renewable Energy Zone Orders in Victoria, one of the recurring themes was social licence. Communities were rarely debating engineering specifications. Instead, they were debating fairness, participation, benefit sharing and procedural legitimacy. In other words, they were debating what benevolent conduct looked like in practice. The challenge was not simply reaching a decision but establishing a legitimate process through which competing interests could be heard, assessed and incorporated into that decision. The technical standards and regulatory obligations that ultimately emerge from these processes are important, but they follow an earlier question: has the community had a meaningful opportunity to influence the conception of the public interest being advanced?


In earlier work on blockchain-enabled construction contracts, I explored the role of information oracles and consensus mechanisms. Oracles perform the function of bringing information from outside a system into a form that can be recognised and acted upon within it. While developed in a technical context, the concept provides a useful analogy for AI governance. Determining what constitutes benevolent conduct similarly requires mechanisms through which competing perspectives can be brought to the surface, evaluated and translated into standards capable of guiding decision-making. Community engagement processes, public consultation and other forms of consensus-building can therefore be understood as governance analogues of the consensus mechanisms that underpin distributed systems.


Training data, labels and classifications do not emerge naturally. Human decisions determine what data is collected, how it is categorised, which features are considered relevant and which outcomes are considered desirable. Every AI system therefore contains a series of human judgements that are often obscured beneath technical language.

The question is therefore not whether humans should remain involved; they are already involved. Instead, the more essential question is where those humans sit within the system and whose perspectives they represent.


For this reason, many discussions about human-in-the-loop design remain incomplete. A human reviewer inserted at the end of an automated process may improve oversight, but this alone does not address the underlying challenge of benevolence. A more useful concept, which I argued for on the recent AI, Law and Justice panel at the University of Queensland, is community-in-the-loop governance.


If benevolence concerns the interests of those affected by a system, then those affected communities should play some role in determining the values, assumptions and trade-offs embedded within it. The objective is not unanimity. It is legitimacy. Consensus mechanisms create pathways through which competing interests can be surfaced, negotiated and ultimately incorporated into governance arrangements.


Viewed this way, consensus becomes a signal of benevolence because it demonstrates that institutions have not simply imposed their own conception of the public interest. Instead, they have established processes through which competing perspectives can be heard and reconciled.

 

Mutual vulnerability transparency

Consensus alone is not sufficient. This is because institutions frequently invite participation while retaining control over the information necessary to evaluate their decisions. Public consultation can become performative (and perfunctory) if participants are expected to reveal their interests while the institution itself remains comparatively opaque.

This observation led me to develop the concept of mutual vulnerability transparency.


Most transparency obligations operate in a single direction. Citizens disclose information to governments, or consumers disclose information to companies. Similarly, users are expected to disclose information to digital platforms. But when an institution receives information while revealing comparatively little about its own uncertainties, assumptions or limitations, it distorts this notion of transparency and creates a uniquely ironic type of power imbalance.


Trustworthy relationships rarely operate this way.


One of the reasons trust develops between individuals is that vulnerability is often reciprocal. Each party reveals something of themselves and, in doing so, creates the conditions under which trust can emerge. Institutions seeking trustworthiness should be held to a similar standard.


What is called for instead is transparency around uncertainties, limitations, dependencies and risk vectors. The purpose is not simply to disclose information but to disclose vulnerability. Institutions should be willing to reveal where their assumptions may be wrong, where data may be incomplete, where models may perform poorly, where trade-offs have been made and why they were made, and critically, where human judgement remains necessary to address the decisions that have been embedded in the design of the AI system. Trustworthiness emerges not from the appearance of certainty but from the willingness to acknowledge uncertainty.

Mutual vulnerability transparency transforms transparency from a compliance exercise into a relational practice. Rather than simply exposing information, it demonstrates a willingness by institutions to share in the vulnerability created by their decisions.

 

Seamfulness rather than seamlessness

The third mechanism is seamfulness.

Technology systems are often designed to appear seamless, and we seem to have turned this into a celebration of vaunted efficiency. Decisions are presented as frictionless, objective and inevitable, and complexity (or rather, nuance) is hidden. Human intervention disappears from view or is seen as problematic and inefficient. The result is an impression that the system simply produces the correct answer.


However, the appearance of seamlessness can itself undermine trustworthiness. Seamless systems obscure the choices, assumptions and compromises that shape their outcomes. They conceal uncertainty precisely where uncertainty matters most. In my research, I argued that trustworthy systems should instead embrace seamfulness.


A seam is the place where two systems meet. It is the point at which assumptions become visible and where decisions can be questioned. Seamful systems reveal the boundaries between human and machine judgement. They expose areas of uncertainty. They acknowledge that alternative decisions could have been made.


Most importantly, seamful systems create opportunities for challenge and contestation.

There have, of course, been recent steps in the UX of various LLMs to correct this, such as the ‘three dots’ or the rolling notations that let a user know what searches and processes the LLM is undertaking to generate a response. But care should be taken to separate seamfulness from mere transparency. Transparency focuses on visibility. Seamfulness focuses on intelligibility. The objective is not merely to reveal information but to reveal the structure of decision-making itself.


A seamful AI system allows individuals to understand where decisions originate, where discretion is exercised and where accountability resides. In this sense, seamfulness becomes another signal of benevolence. It demonstrates respect for the agency of those affected by a decision. Rather than asking people to accept outcomes on faith, it invites them to understand, question and participate in the governance of the system itself.


Our impaired ability to read the signals of benevolence

The challenge facing AI governance is therefore larger than determining whether systems are accurate, transparent or accountable. Increasingly, we must ask whether our existing mechanisms for identifying benevolent behaviour remain fit for purpose.


Historically, humans inferred benevolence through indicators such as attention, effort, memory, responsiveness and apparent understanding. These indicators signal that the other person has adopted a positive orientation towards our interests. Large language models are now capable of reproducing many of these signals at unprecedented scale.


The difficulty is that we may incorrectly infer benevolence from personalisation. A response can feel deeply tailored to us because it is highly responsive to our question. But this responsiveness to a prompt is not the same thing as responsiveness to a person. The model may generate outputs that appear attentive to our interests while possessing no independent understanding of our welfare, values or long-term interests.


This creates a new challenge for AI governance. The issue is not simply that systems can be wrong. The issue is that they may increasingly become capable of simulating the signals through which humans have traditionally identified benevolent behaviour.


Trustworthiness ultimately depends upon our ability to assess whether another party is acting with a positive orientation towards our interests. If technological systems become increasingly capable of reproducing the appearance of benevolence without the underlying characteristics that traditionally supported it, then our existing mechanisms for evaluating trustworthiness may become less reliable. In short, this is why so many users are being caught out: we have been trained to detect mistakes or flaws in thinking that emerge in writing, but the phenomenal fluency of these systems (backed by deep computational models) makes it so much harder for us to find faults with the generated output. We become more likely to abandon an answer we know to be right under the pressure of a persuasive generated response.


This is why the future of AI governance may depend not simply on regulating technology, but on developing new ways of recognising benevolence itself. The question is no longer whether artificial intelligence can earn our trust but whether we can still recognise the signals of trustworthiness when the technologies around us become increasingly capable of simulating them. If trust is an outcome, then the challenge for AI governance is not to teach people to trust more but to ensure that the institutions deploying these technologies remain worthy of that trust.

 


 

 
 

© Brydon Timothy Wang
