The Digital Trolley Problem: AI Ethics in Life-or-Death Decisions
An investigation into how AI systems handle life-or-death decisions: autonomous vehicles, medical triage, military targeting, cultural moral reasoning, liability frameworks, and the alignment problem as a trolley problem at civilizational scale.
The Trolley Problem Goes Digital
In 1967, the philosopher Philippa Foot introduced a thought experiment that has tormented ethics students ever since: a trolley is heading toward five people tied to the track. You can pull a lever to divert the trolley to a side track, where it will kill one person instead of five. Do you pull the lever?
For decades, the trolley problem was a tool for exploring tensions between utilitarian and deontological ethics in philosophy classrooms. It was deliberately artificial — a stripped-down scenario designed to isolate moral intuitions from the complexity of real-world decision-making. No one expected it to become an engineering specification.
Then self-driving cars happened. Suddenly, the trolley problem was not hypothetical. Engineers at Waymo, Tesla, Cruise, and other companies had to program vehicles that might, in rare but real circumstances, face analogous decisions: swerve left and hit a pedestrian, swerve right and hit a barrier (risking the passengers), or continue straight and hit a group of people. The philosophical thought experiment became a software requirement.
And autonomous vehicles are only the beginning. AI systems are now making or assisting with life-or-death decisions in medical triage, military targeting, criminal sentencing, and resource allocation during emergencies. The digital trolley problem is not a single ethical dilemma. It is a category of problems that arises whenever an AI system has the power to affect who lives and who dies — and the authority, explicit or implicit, to make that choice without meaningful human oversight.
This investigation examines how the trolley problem manifests in real AI systems, who programs the values that guide those systems, and what it means for human agency when life-or-death ethics are encoded in software.
Autonomous Vehicle Ethics: The MIT Moral Machine
The most ambitious attempt to empirically map human moral intuitions about AI life-or-death decisions is the MIT Moral Machine experiment, published in Nature in 2018 by Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan.
The Moral Machine presented participants with variations of the trolley problem adapted for autonomous vehicles: scenarios where a self-driving car’s brakes fail and it must choose between hitting one group of people or another. Roughly 40 million decisions from participants in 233 countries and territories produced a dataset of unprecedented scale.
The findings were revelatory — and troubling. Three nearly universal preferences emerged: spare humans over animals, spare the many over the few, and spare the young over the old. These preferences appeared across cultures, suggesting some degree of universal moral intuition.
But underneath the universals, the Moral Machine uncovered dramatic cultural variation. The researchers identified three cultural clusters with distinct moral preferences. The “Western” cluster (North America, Europe, and countries with strong European cultural influence) showed a relatively strong preference for sparing the young over the old. The “Eastern” cluster (many Asian countries) showed a weaker age preference but a stronger preference for sparing those of higher social status. The “Southern” cluster (Latin America and former French colonies) showed the strongest preference for sparing women and the strongest preference for inaction (not swerving) over action.
These findings pose a fundamental challenge for autonomous vehicle ethics. If moral intuitions vary systematically across cultures, whose intuitions should a self-driving car follow? Should a Tesla in Tokyo make different decisions than a Tesla in Toulouse? Should the ethics settings be adjustable by the owner, the manufacturer, or the government?
The automobile industry has largely avoided answering these questions directly. Most autonomous vehicle companies describe their systems’ decision-making in terms of following traffic laws and minimizing harm in general, without specifying how the system chooses between harms when a choice is forced. This is understandable from a liability perspective — no company wants to publicly announce the algorithm by which their product decides who dies — but it is ethically unsatisfying. The decisions are being made. Someone programmed the values. The question of who and how is simply being hidden from public scrutiny.
Medical Triage AI
Medical triage — the process of determining which patients receive treatment first when resources are scarce — is inherently a trolley problem. Every allocation decision that directs resources to one patient necessarily withholds them from another. When AI systems assist or automate triage, the trolley problem becomes algorithmic.
AI triage systems are deployed in emergency departments, intensive care units, and disaster response scenarios. They evaluate patients based on clinical data — vital signs, lab results, medical history — and assign priority scores that influence treatment decisions. During the COVID-19 pandemic, several hospitals used or considered AI systems to allocate ventilators and ICU beds.
The ethical challenges are immense. Triage AI systems encode specific value judgments about whose life matters more. A system that prioritizes patients with the highest probability of survival implicitly devalues the lives of patients with pre-existing conditions, disabilities, or advanced age — groups that are disproportionately non-white and lower-income. A system that assigns equal probability of survival to all patients ignores clinical reality. Every design choice is an ethical choice.
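The point can be made concrete with a deliberately simplified sketch. The patients, weights, and policies below are hypothetical, drawn from no deployed system; the sketch only shows that two defensible triage policies, applied to the same patients, produce different orderings — that is, different answers to whose life comes first.

```python
# Hypothetical sketch: two triage policies over the same patients.
# Every scoring rule is an ethical choice, not a neutral computation.

patients = [
    # (id, estimated survival probability with treatment, arrival order)
    ("A", 0.90, 3),  # young, no comorbidities, arrived last
    ("B", 0.40, 1),  # elderly, chronic illness, arrived first
    ("C", 0.65, 2),
]

def by_survival(ps):
    """Utilitarian policy: treat those most likely to survive first."""
    return [p[0] for p in sorted(ps, key=lambda p: -p[1])]

def by_arrival(ps):
    """Egalitarian policy: first come, first served."""
    return [p[0] for p in sorted(ps, key=lambda p: p[2])]

print(by_survival(patients))  # ['A', 'C', 'B']
print(by_arrival(patients))   # ['B', 'C', 'A']
```

Neither ordering is a bug. Each correctly implements a value judgment — and choosing between them is ethics, not engineering.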
The Veterans Health Administration’s Care Assessment Need (CAN) score uses predictive analytics to identify patients at risk of deterioration. The system has been credited with improving outcomes, but it has also raised concerns about algorithmic bias: if the training data reflects historical disparities in care (which it does), the system may perpetuate those disparities by systematically underestimating the acuity of patients from underserved populations.
A landmark study published in Science in 2019 by Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan demonstrated that a widely used healthcare algorithm exhibited significant racial bias. The algorithm used healthcare spending as a proxy for healthcare need, but because Black patients had historically received less healthcare spending than white patients with the same conditions, the algorithm systematically rated Black patients as less sick. At a given risk score, Black patients were significantly sicker than white patients. The algorithm was not designed to be racist. It was designed to predict costs. But predicting costs in a system with embedded racial disparities produces racist outcomes.
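The mechanism Obermeyer and colleagues identified can be reproduced with a few lines of synthetic data. The numbers below are illustrative, not from the study: if one group historically incurs lower spending at the same level of sickness, then any “risk score” that predicts spending will rate that group as less sick at equal need.

```python
import random

random.seed(0)

def spending(sickness, access_penalty):
    # Synthetic model: spending tracks true sickness, but patients who
    # face barriers to care spend less at the same level of sickness.
    return sickness * (1.0 - access_penalty)

# Two groups with an identical distribution of true sickness (0-100 scale).
sickness_levels = [random.uniform(0, 100) for _ in range(1000)]
group_full_access = [spending(s, 0.0) for s in sickness_levels]
group_underserved = [spending(s, 0.3) for s in sickness_levels]  # 30% less spending

# A "risk score" that simply predicts spending inherits the gap:
avg_score_full = sum(group_full_access) / len(group_full_access)
avg_score_under = sum(group_underserved) / len(group_underserved)

# Equally sick populations, systematically different scores.
print(avg_score_under < avg_score_full)  # True
```

No variable in this sketch mentions race, and nothing in the code is malicious. The bias enters entirely through the choice of proxy — which is exactly what made the real algorithm’s bias so hard to see.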
This is the trolley problem in its most insidious form: not a dramatic choice between hitting one person or another, but a quiet, systematic undervaluation of certain lives, built into the algorithm, invisible to the individual patients it affects, and replicated millions of times across the healthcare system.
Military Targeting Decisions
The application of AI to military targeting represents the trolley problem in its most consequential and most morally fraught form. Lethal autonomous weapons systems (LAWS) — systems that can select and engage targets without human intervention — force questions about life-or-death decision-making into the sharpest possible focus.
The debate over autonomous weapons has been active at the United Nations since 2014, when the Convention on Certain Conventional Weapons convened its first informal expert meeting on LAWS. Over a decade later, no binding international agreement has been reached. The Campaign to Stop Killer Robots, a coalition of NGOs, has called for a preemptive ban on fully autonomous weapons. Several countries, including Austria, Brazil, Chile, and Mexico, have supported such a ban. The major military powers — the United States, Russia, China, the United Kingdom, and France — have opposed it.
The arguments for autonomous weapons mirror the arguments for automation in general: machines can process information faster than humans, are not subject to emotional bias, do not commit war crimes out of fear or rage, and can operate in environments too dangerous for human soldiers. Proponents argue that autonomous weapons could actually reduce casualties by making more precise targeting decisions.
The arguments against are both practical and principled. On the practical side, AI targeting systems are vulnerable to adversarial attacks, sensor errors, and environmental conditions that current technology cannot reliably handle. The consequences of a false positive — misidentifying a civilian as a combatant — are irreversible. On the principled side, many ethicists argue that the decision to take a human life must involve human moral judgment, regardless of how accurate a machine decision might be. This is not an argument about capability; it is an argument about the moral weight of the decision itself.
The distinction between “human-in-the-loop” systems (where a human must approve each engagement), “human-on-the-loop” systems (where a human monitors the system and can override it), and “human-out-of-the-loop” systems (fully autonomous) is central to the debate. In practice, the distinction is less clear than it appears. If an AI system presents a targeting recommendation and a human has three seconds to approve or override it in a high-stress combat environment, the “human in the loop” is not exercising meaningful moral judgment. They are rubber-stamping a machine decision under time pressure.
The United States Department of Defense Directive 3000.09, issued in 2012 and updated in 2023, requires that autonomous weapons be designed to allow commanders and operators to exercise “appropriate levels of human judgment” over the use of force. What counts as “appropriate” is left deliberately vague, creating a policy framework that sounds restrictive but accommodates virtually any level of automation.
Whose Ethics? The Programming Problem
Every AI system that makes or influences life-or-death decisions embodies a set of values. Those values had to come from somewhere. Someone — a programmer, a product manager, a data scientist, a committee — decided what the system should optimize for, what trade-offs it should make, and what constraints it should respect.
This is the programming problem: whose ethics get encoded into the system? And through what process?
The question has no satisfying answer. In most cases, the values embedded in AI systems are determined by small teams of engineers at private companies, operating without public oversight, democratic accountability, or formal ethical review. The engineers may be well-intentioned. They may have consulted ethics guidelines. They may have conducted internal reviews. But they are not elected. They are not representative. They are not accountable in any meaningful way to the people whose lives their decisions affect.
The programming problem is exacerbated by the opacity of modern AI systems. In rule-based systems, the values are explicit: you can read the code and see the rules. In machine learning systems, the values are implicit: they are embedded in the training data, the loss function, the architecture choices, and the optimization process. The engineers who build these systems often cannot fully articulate the values the system has learned, let alone justify them.
This creates what the philosopher Andreas Matthias has called a “responsibility gap” — a situation where no individual or institution is clearly responsible for the moral decisions made by an AI system. The engineers designed the system but did not intend the specific decision. The company deployed the system but did not make the individual choice. The user relied on the system but did not understand its decision process. Responsibility diffuses until it disappears.
Cultural Differences in Moral Reasoning
The Moral Machine experiment revealed that moral intuitions vary systematically across cultures. This finding has profound implications for the global deployment of AI systems that make or influence life-or-death decisions.
Consider the preference for sparing the young over the old. In cultures that emphasize the value of elders and the accumulated wisdom of age, a system that systematically prioritizes saving younger lives may be experienced as morally offensive. In cultures that emphasize the potential of youth and the principle of giving each person the fullest possible life, the same system may be experienced as morally obvious.
Or consider the trolley problem’s foundational distinction between action and inaction. In many Western ethical traditions, there is a morally significant difference between doing harm (pulling the lever to divert the trolley) and allowing harm (not pulling the lever). In some Eastern ethical traditions, the distinction carries less weight: you are responsible for outcomes you could have influenced, regardless of whether you acted or refrained from acting.
These cultural differences are not marginal variations on a shared moral framework. They are fundamental disagreements about what matters, who matters, and why. Any AI system deployed globally — from autonomous vehicles to medical triage to content moderation algorithms — must navigate these disagreements, and there is no neutral position from which to navigate them.
The default approach — building AI systems that reflect the moral intuitions of their creators (predominantly young, male, educated, Western, and wealthy) and deploying them globally — is a form of moral imperialism. It imposes one culture’s values on all cultures, not through argument or persuasion but through code.
The alternative — building AI systems with culturally adjustable ethics — raises its own problems. If a self-driving car in one country values elderly lives more than a self-driving car in another country, we have created a system where the value of a human life depends on geography. This is deeply uncomfortable, but it may simply be making explicit what is already true in human moral practice.
Liability When AI Makes Fatal Choices
When an AI system makes a decision that results in death, the legal question of liability becomes urgent. Who is responsible? The developer who designed the algorithm? The company that deployed it? The user who relied on it? The regulator who approved it?
Current legal frameworks were not designed for AI decision-making and do not map cleanly onto it. Product liability law holds manufacturers responsible for defective products, but an AI system that makes a reasonable decision based on its programming — choosing to hit one pedestrian rather than five — has not necessarily “malfunctioned.” Negligence law requires that someone breach a duty of care, but if the AI system’s decision-making process is opaque, establishing what the duty of care required in a specific situation is difficult.
The European Union’s AI Liability Directive, proposed in 2022, attempts to address some of these gaps by creating a framework for allocating liability for AI-related harms. The directive introduces a presumption of causality in certain circumstances: if a defendant failed to comply with AI regulations and the plaintiff suffered harm of the kind that compliance was meant to prevent, causality is presumed unless the defendant proves otherwise.
The insurance industry is adapting as well. Autonomous vehicle insurers are developing frameworks that shift liability from individual drivers to vehicle manufacturers and AI system developers. This shift reflects the reality that when a self-driving car is operating autonomously, the “driver” is not making decisions and should not bear the liability for those decisions.
But liability frameworks do not resolve the underlying ethical question. They determine who pays when something goes wrong. They do not determine what “wrong” means. An AI system that correctly implements its programmed values — that makes the decision its designers intended it to make — has not, by any legal or engineering standard, malfunctioned. If the programmed values are morally wrong, the failure is in the programming, not the execution. And holding engineers personally liable for moral judgments made under conditions of radical uncertainty is neither fair nor effective.
The Alignment Problem as a Trolley Problem at Scale
The AI alignment problem can be understood as the trolley problem scaled to civilizational scope. Alignment asks: how do we ensure that AI systems pursue objectives that are consistent with human values? The trolley problem asks: whose values? Which objectives? When values conflict, which take priority?
Every trolley problem variant involves a trade-off between competing values: lives versus lives, action versus inaction, certainty versus risk, individual rights versus collective welfare. The alignment problem involves the same trade-offs, but at the scale of systems that affect billions of people simultaneously.
A recommender algorithm that optimizes for engagement over well-being is making a values trade-off. A criminal justice algorithm that prioritizes public safety over individual rights is making a values trade-off. A resource allocation algorithm that maximizes efficiency over equity is making a values trade-off. These are all trolley problems, but they are trolley problems that run continuously, at machine speed, across entire populations, without the moral pause that the trolley problem thought experiment is designed to provoke.
The alignment problem is harder than the trolley problem because it cannot be solved by choosing the right values. Even if we could specify the right values — a task that two millennia of moral philosophy have not accomplished — we would still face the technical challenge of encoding those values into systems that pursue them faithfully, without gaming, shortcutting, or reinterpreting them in ways we did not intend. This is the specification problem, and it applies to trolley-problem-scale decisions with particular force.
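The specification problem admits a toy sketch. All quantities below are hypothetical: an optimizer is told to maximize engagement, a measurable proxy for well-being, and dutifully pushes the proxy up while the true objective falls.

```python
# Hypothetical sketch of proxy gaming: the optimizer sees only the
# proxy metric (engagement), never the true objective (well-being).

def engagement(outrage):
    # Outrage-provoking content reliably increases clicks...
    return 10 + 5 * outrage

def well_being(outrage):
    # ...while eroding the thing engagement was meant to stand in for.
    return 10 - 4 * outrage

# Naive optimization over the proxy: evaluate candidate content
# strategies (outrage level 0.0 to 1.0) and keep the engagement-maximizer.
candidates = [i / 10 for i in range(11)]
chosen = max(candidates, key=engagement)

print(chosen)              # 1.0 — the most outrage-heavy strategy wins
print(well_being(chosen))  # 6.0 — the true objective got worse
print(well_being(0.0))     # 10.0 — what optimizing well-being directly would yield
```

The optimizer is not broken; it solved exactly the problem it was given. The failure is that the problem it was given is not the problem we cared about — and unlike this two-line example, real systems give us no ground-truth `well_being` function to check against.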
HUMAIN OS and Autonomous Decision-Making
HUMAIN, Saudi Arabia’s national AI company, has described its HUMAIN OS platform as capable of autonomous decision-making and “understanding human intent.” These capabilities, if realized, would place HUMAIN OS squarely in the space of digital trolley problems.
Any AI system that makes autonomous decisions will inevitably face situations where its choices benefit some people at the expense of others. When HUMAIN OS allocates resources, prioritizes tasks, or makes recommendations that affect human welfare, it is solving trolley problems — whether or not its designers frame them that way.
The questions that matter are the questions the ethics frameworks purport to answer: What values guide the system’s decisions? Whose values are they? How were they selected? Who was consulted? What oversight mechanisms exist? What happens when the system makes a decision that causes harm?
For a system developed in Saudi Arabia, governed by Saudi law, and aligned with Saudi Arabia’s national interests, these questions take on additional dimensions. Saudi Arabia’s political system does not include democratic accountability, free press, or independent judiciary in the forms that Western AI governance frameworks assume. The value systems embedded in HUMAIN OS will reflect the priorities of its creators and sponsors, and those priorities may not align with the interests of the global populations the system is designed to serve.
This is not a criticism unique to HUMAIN. Every AI system reflects the values of its creators. Google’s AI reflects the values of a Silicon Valley advertising company. OpenAI’s AI reflects the values of a San Francisco technology startup. China’s AI systems reflect the values of the Chinese Communist Party. The trolley problem is always, at bottom, a question about power: who has the power to decide, and whose interests does that decision serve?
Beyond the Trolley: Toward Real-World AI Ethics
The trolley problem, for all its usefulness as a thought experiment, has limitations as a framework for real-world AI ethics. Real-world AI decisions are rarely binary choices between clearly defined outcomes. They involve uncertainty, incomplete information, cascading consequences, and trade-offs that cannot be reduced to a simple calculus of lives saved and lost.
Real-world AI ethics must move beyond trolley-problem thinking in several ways.
First, from dramatic choices to systemic effects. The most morally significant AI decisions are not the rare, dramatic, life-or-death choices that dominate public discussion. They are the mundane, repeated, cumulative decisions that shape lives over time: who gets a loan, who gets a job interview, who gets released on bail, whose content gets amplified. These decisions rarely kill anyone directly, but their aggregate effect on human welfare is enormous.
Second, from individual decisions to institutional design. Rather than asking what an AI system should do in a specific trolley-problem scenario, we should ask what institutional structures ensure that AI decision-making is accountable, transparent, and aligned with democratic values. The ethics frameworks are a start, but enforcement and oversight are what matter.
Third, from engineering solutions to political ones. The trolley problem frames ethics as an engineering challenge: program the right values into the system. But the real challenge is political: whose values, chosen by whom, through what process, accountable to whom? These are questions about governance, not code.
The digital trolley problem is real. AI systems are making life-or-death decisions, and those decisions embody values that someone chose. But the most important ethical work is not solving the trolley problem. It is building the institutions, laws, and democratic processes that determine who gets to program the trolley in the first place.