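Final Course Report on the Philosophy and Ethics of Artificial Intelligence and Superintelligence: Background and Significance, Current Research, Reflections, and Outlook
Research on the Ontology and Moral Agency of Artificial Intelligence
This group of works examines the philosophical foundations of artificial intelligence, including definitions of consciousness, intentionality, and thought, as well as the theoretical delimitation of, and philosophical debate over, whether machines can possess moral agency.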
- The problem of machine ethics in artificial intelligence(Rajakishore Nath, Vineet Sahu, 2017, AI & SOCIETY)
- From Human Mind to Artificial Intelligence: Advancing AI Value Alignment Through Psychological Theories(J Shaoxiong, L Chao, 2025, Journal of Psychological Science)
- A Theological Account of Artificial Moral Agency(Ximian Xu, 2023, Studies in Christian Ethics)
- Philosophical foundations of artificial consciousness(Ron Chrisley, 2008, Artificial Intelligence in Medicine)
- A perceived moral agency scale: Development and validation of a metric for humans and social machines(J. Banks, 2019, Computers in Human Behavior)
- Artificial agency, consciousness, and the criteria for moral agency: what properties must an artificial agent have to be a moral agent?(K. Himma, 2009, Ethics and Information Technology)
- Implementations in Machine Ethics(Suzanne Tolmeijer, Markus Kneer, Cristina Sarasua, 2020, ACM Computing Surveys)
- Ethical Programming and Machine Moral Agency(Kęstutis Mosakas, 2023, Future Law, Ethics, and Smart Technologies)
- A Challenge for Machine Ethics(Ryan Tonkens, 2009, Minds and Machines)
- From Biological to Artificial Consciousness: Neuroscientific Insights and Progress(Masataka Watanabe, 2022, The Frontiers Collection)
- Artificial Moral Agents: A Survey of the Current Status(José-Antonio Cervantes, Sonia López, Luis-Felipe Rodríguez, Salvador Cervantes, Francisco Cervantes, Félix F. Ramos, 2019, Science and Engineering Ethics)
- Artificial consciousness and artificial ethics: Between realism and social relationism(S Torrance, 2020, Machine Ethics and Robot Ethics)
- Consciousness, intentionality and intelligence: some foundational issues for artificial intelligence(M. Aydede, G. Guzeldere, 2000, Journal of Experimental & Theoretical Artificial Intelligence)
- Why not Artificial Consciousness or Thought?(Richard H. Schlagel, 1999, Minds and Machines)
- Artificial Consciousness: Utopia or Real Possibility?(G. Buttazzo, 2001, Computer)
- A.I.: Artificial Intelligence as Philosophy: Machine Consciousness and Intelligence(David Gamez, 2024, The Palgrave Handbook of Popular Culture as Philosophy)
- Philosophical Analysis of Consciousness as an Intersection Point of Philosophy, Culture and Artificial Intelligence(Sanjay Kumar Tiwari, Vijay Kumar Tiwari, 2025, The Voice of Creative Research)
- When Is a Robot a Moral Agent(John P. Sullins, 2006, Machine Ethics)
- Artificial consciousness: a perspective from the free energy principle(Wanja Wiese, 2024, Philosophical Studies)
- Moral agency without responsibility? Analysis of three ethical models of human-computer interaction in times of artificial intelligence (AI)(Alexis Fritz, Wiebke Brandt, Henner Gimpel, S. Bayer, 2020, De Ethica)
- Artificial Consciousness or Artificial Intelligence(Florin Spanache, 2017, DIALOGO)
- A philosophical and technical view of artificial consciousness(Andrey Shcherbakov, Artem Uryadov, 2024, Wearable Technology)
- Perspectives about artificial moral agents(Andreia Martinho, Adam Poulsen, M. Kroesen, C. Chorus, 2021, AI and Ethics)
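Algorithmic Fairness, Bias Governance, and Social and Ethical Impact
This group of works analyzes bias and discrimination in algorithmic decision-making and their effects on social inequality, and explores algorithmic justice, data justice, and how ethical frameworks and sociotechnical collaboration can support responsible AI deployment.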
- Algorithmic Bias and Data Justice: ethical challenges in Artificial Intelligence Systems(Javier González-Argote, E. Maldonado, Karina Maldonado, 2025, EthAIca)
- Algorithmic bias, fairness, and inclusivity: a multilevel framework for justice-oriented AI(P. Panarese, Marta Grasso, C. Solinas, 2025, AI & SOCIETY)
- Ethical Implications of Bias in Machine Learning(Adrienne Yapo, Joseph W. Weiss, 2018, Proceedings of the Annual Hawaii International Conference on System Sciences)
- Algorithmic Bias and Access to Opportunities(Lisa Herzog, 2021, Oxford Handbook of Digital Ethics)
- Embedding AI in society: ethics, policy, governance, and impacts(Michael Pflanzer, Veljko Dubljević, William A. Bauer, D. Orcutt, G. List, Munindar P. Singh, 2023, AI & SOCIETY)
- Algorithmic justice and ethical governance in artificial intelligence: a conceptual insight and further research suggestions(ES Asamoah, STG Doku, S Koomson, 2026, … Journal of Ethics and …)
- Searching for Inclusive Artificial Intelligence for Social Good: Participatory Governance and Policy Recommendations for Making AI More Inclusive and Benign for Society(M. Moon, 2023, Public Administration Review)
- AI in Governance and Policy Making(Ashish K Saxena, 2024, International Journal of Science and Research (IJSR))
- Why human–AI relationships need socioaffective alignment(HR Kirk, I Gabriel, C Summerfield, B Vidgen, 2025, Humanities and Social …)
- Human-centricity in AI governance: A systemic approach(Anton Sigfrids, J. Leikas, Henrikki Salo-Pöntinen, Emmi Koskimies, 2023, Frontiers in Artificial Intelligence)
- AI Governance Needs Sociotechnical Expertise(Serena Oduro, Tamara Kneese, 2024, Data and Society)
- The ethics of algorithms: key problems and solutions(Andreas Tsamados, Nikita Aggarwal, Josh Cowls, J. Morley, Huw Roberts, M. Taddeo, L. Floridi, 2020, AI & SOCIETY)
- An Overview of Artificial Intelligence Ethics(Changwu Huang, Zeqi Zhang, Bifei Mao, X. Yao, 2023, IEEE Transactions on Artificial Intelligence)
- Artificial intelligence in governance: recent trends, risks, challenges, innovative frameworks and future directions(Arjun Ghosh, Ankit Saini, Himanshu Barad, 2025, AI & SOCIETY)
- The ethical imperative of algorithmic fairness in AI-enabled hiring: a critical analysis of bias, accountability, and justice(Jason Law, 2025, AI and Ethics)
- Algorithmic bias: Senses, sources, solutions(Sina Fazelpour, D. Danks, 2021, Philosophy Compass)
- Towards a Code of Ethics for Artificial Intelligence(P. Boddington, 2017, Artificial Intelligence: Foundations, Theory, and Algorithms)
- Societal and ethical impacts of artificial intelligence: Critical notes on European policy frameworks(Lucia Vesnić-Alujević, Susana Nascimento, Alexandre Pólvora, 2020, Telecommunications Policy)
- Beyond the individual: governing AI's societal harm(NA Smuha, 2021, Internet Policy Review)
- Theorising Algorithmic Justice(O. Marjanovic, D. Cecez-Kecmanovic, R. Vidgen, 2021, European Journal of Information Systems)
- Beyond bias and discrimination: redefining the AI ethics principle of fairness in healthcare machine-learning algorithms(B. Giovanola, S. Tiribelli, 2022, AI & SOCIETY)
- Disambiguating Algorithmic Bias: From Neutrality to Justice(Elizabeth Edenberg, Alexandra Wood, 2023, Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society)
- Beyond the algorithm: applying critical lenses to AI governance and societal change(Mohammed Hassen, 2025, AI and Ethics)
- Algorithmic injustice: a relational ethics approach(Abeba Birhane, 2021, Patterns)
- Ethics of artificial intelligence in global health: Explainability, algorithmic bias and trust.(A. Kerasidou, 2021, Journal of Oral Biology and Craniofacial Research)
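AI Value Alignment and Human-AI Collaboration Mechanisms
This group of works focuses on how to embed human values in AI systems, covering technical approaches to value alignment (such as RLHF), measurement methods, collaboration frameworks, and practical challenges in specific domains such as healthcare.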
- From Principle to Practice: Value Alignment in AI Ethics and Governance(Jianfeng Cao, 2025, German Law Journal)
- Making moral machines: why we need artificial moral agents(Paul Formosa, M. Ryan, 2020, AI & SOCIETY)
- A Case for Machine Ethics in Modeling Human-Level Intelligent Agents(R. Boyles, 2018, Kritike: An Online Journal of Philosophy)
- Understanding the Process of Human-AI Value Alignment(J. McKinlay, M. Vos, J. Hoffmann, A. Theodorou, 2025, Journal of Artificial Intelligence Research)
- Artificial superintelligence alignment in healthcare(D. Ueda, S. Walston, Ryo Kurokawa, T. Saida, Maya Honda, M. Iima, Tadashi Watabe, Masahiro Yanagawa, Kentaro Nishioka, K. Sofue, Akihiko Sakata, S. Sugawara, M. Kawamura, Rintaro Ito, Koji Takumi, S. Oda, Kenji Hirata, Satoru Ide, Shinji Naganawa, 2025, Japanese Journal of Radiology)
- Critiquing the Reasons for Making Artificial Moral Agents(Aimee van Wynsberghe, Scott Robbins, 2018, Science and Engineering Ethics)
- Human-AI Interaction Alignment: Designing, Evaluating, and Evolving Value-Centered AI For Reciprocal Human-AI Futures(Hua Shen, Tiffany Knearem, Divy Thakkar, Pat Pataranutaporn, Anoop K. Sinha, Yike Shi, Jenny T. Liang, L. Ahmad, Tanushree Mitra, Brad A. Myers, Yang Li, 2025, Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems)
- Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis(Aran Nayebi, 2025, Proceedings of the AAAI Conference on Artificial Intelligence)
- Towards friendly AI: a comprehensive review and new perspectives on human-AI alignment(Qiyang Sun, Yupei Li, Emran Alturki, Sunil Munthumoduku Krishna Murthy, Bjorn W. Schuller, 2026, AI and Ethics)
- ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs(Hua Shen, Tiffany Knearem, Reshmi Ghosh, Yu-Ju Yang, Nicholas Clark, Tanushree Mitra, Yun Huang, 2024, Proceedings of the 9th Widening NLP Workshop)
- Human Value Alignment in AI(Ilias O. Pappas, Polyxeni Vassilakopoulou, 2025, Handbook of Human-Centered Artificial Intelligence)
- Measuring Human-AI Value Alignment in Large Language Models(Hakim Norhashim, Jungpil Hahn, 2024, Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society)
- Exploring ‘Value Alignment’: A Genealogy and Three Conceptions(Daniel López-Castro, 2026, Law, Governance and Technology Series)
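Superintelligence Risk, Governance, and Existential Philosophical Reflection
This group of works explores the existential risks and threat of civilizational collapse posed by artificial superintelligence (ASI), together with macro-level governance models and interdisciplinary (theological, futures-studies) perspectives on AI ethical norms and the meaning of humanity's future.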
- Deep ASI Literacy: Educating for Alignment with Artificial Super Intelligent Systems(Nicolas J. Tanchuk, 2025, Educational Theory)
- Existential risk from AI and orthogonality: Can we have it both ways?(V. C. Müller, M. Cannon, 2021, Ratio)
- Ethically Aligned Design in Autonomous and Intelligent Systems: An Overview(Andrew Burnside, Emerson Bodde, 2025, 2025 IEEE International Symposium on Ethics in Engineering, Science, and Technology (ETHICS))
- Aligning artificial intelligence with human values: reflections from a phenomenological perspective(Shengnan Han, Eugene Kelly, Shahrokh Nikou, Eric-Oluf Svee, 2021, AI & SOCIETY)
- The Notion of Existential Risk and Its Role for the Anticipation of Technological Development’s Long-Term Impact(Roberto Paura, 2019, Anticipation Science)
- The Pursuit of Human Existential Significance from the Perspective of AI Existential Risk(S Lei, 2021, 信阳师范大学学报(哲学社会科学版))
- Ethics, Governance, and Policies in Artificial Intelligence(L Floridi, 2021, Philosophical Studies Series)
- From posthumanism to ethics of artificial intelligence(Rajakishore Nath, Riya Manna, 2021, AI & SOCIETY)
- Artificial Intelligence and Public Values: Value Impacts and Governance in the Public Sector(Yu-Che Chen, Michael Ahn, Yi-Fan Wang, 2023, Sustainability)
- AI Governance: A Challenge for Public Health(Jennifer K. Wagner, Megan Doerr, Cason D. Schmit, 2024, JMIR Public Health and Surveillance)
- Reconstruction of the Ethics of Artificial Intelligence Development in Islamic Philosophy and Muhammadiyah Thought(Mazwar Ismiyanto, S. Anif, Harun Joko Prayitno, Ahmad Muhibbin, Dian Artha Kusumaningtyas, Trisakti Handayani, 2026, Jurnal Penelitian Sains Teknologi)
- Philosophy of Artificial Intelligence(Jerry Kaplan, 2016, Artificial Intelligence)
- Ethical Considerations in Artificial Intelligence Courses(Emanuelle Burton, J. Goldsmith, Sven Koenig, B. Kuipers, Nicholas Mattei, T. Walsh, 2017, AI Magazine)
- Cybertheology and the Ethical Dimensions of Artificial Superintelligence: A Theological Inquiry into Existential Risks(T. Peters, 2024, Khazanah Theologia)
- Existential risks: a philosophical analysis(Phil Torres, 2019, Inquiry)
- Machine Intelligence, Artificial General Intelligence, Super-Intelligence, and Human Dignity(Ted Peters, 2025, Religions)
- The Existential Threats of AI(Robert Samuels, 2025, The Global Solution to AI)
- Superintelligence, heuristics and embodied threats(A. Mastrogiorgio, Riccardo Palumbo, 2025, Mind & Society)
- THE PROBLEM OF ARTIFICIAL INTELLIGENCE AND CONSCIOUSNESS AS ONE OF THE PRIORITY DIRECTIONS OF CONTEMPORARY PHILOSOPHY(R. Aliyev, 2025, Philosophy and Governance)
- Extraterrestrial Artificial Intelligence: The Final Existential Risk?(W. Naudé, 2023, SSRN Electronic Journal)
- The Ethics of Artificial Intelligence for the Sustainable Development Goals(F Mazzi, L Floridi, 2023, Philosophical Studies Series)
- The ethics of creating artificial superintelligence: a global risk perspective(J. Dessureault, R. Lamontagne, Pierre-Olivier Parisé, 2025, AI and Ethics)
- Ethics of Artificial Intelligence(Francisco Lara, Jan Deckers, 2023, The International Library of Ethics, Law and Technology)
- AI governance: a systematic literature review(Amna Batool, Didar Zowghi, Muneera Bano, 2025, AI and Ethics)
- The state as a model for AI control and alignment(Micha Elsner, 2024, AI & SOCIETY)
- Introduction to the Ethics of Artificial Intelligence(David J. Gunkel, 2024, Handbook on the Ethics of Artificial Intelligence)
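This report systematically divides research on the philosophy and ethics of artificial intelligence into four dimensions: machine consciousness and moral agency at the ontological level; algorithmic fairness and governance at the societal level; value alignment and human-AI collaboration at the engineering level; and existential risk from superintelligence and philosophical reflection at the macro level. This classification spans technical implementation, social norms, and the future of human civilization, giving the final course report a clear academic structure.
87 related works in total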
… ethics task of ensuring ethical behaviour of an artificial agent. Although, there are many philosophical issues related to artificial intelligence… is to discuss, first, whether ethics is the sort of …
… the topic of philosophical aspects of the ethics of artificial intelligence, a good place to start is with the concept of artificial intelligence itself. What exactly is artificial intelligence? It is a …
Artificial intelligence (AI) has profoundly changed and will continue to change our lives. AI is being applied in more and more fields and scenarios such as autonomous driving, medical care, media, finance, industrial robots, and internet services. The widespread application of AI and its deep integration with the economy and society have improved efficiency and produced benefits. At the same time, it will inevitably impact the existing social order and raise ethical concerns. Ethical issues, such as privacy leakage, discrimination, unemployment, and security risks, brought about by AI systems have caused great trouble to people. Therefore, AI ethics, which is a field related to the study of ethical issues in AI, has become not only an important research topic in academia, but also an important topic of common concern for individuals, organizations, countries, and society. This article will give a comprehensive overview of this field by summarizing and analyzing the ethical risks and issues raised by AI, ethical guidelines and principles issued by different organizations, approaches for addressing ethical issues in AI, and methods for evaluating the ethics of AI. Additionally, challenges in implementing ethics in AI and some future perspectives are pointed out. We hope our work will provide a systematic and comprehensive overview of AI ethics for researchers and practitioners in this field, especially the beginners of this research discipline.
What is the philosophy of AI? You might wonder why a field like AI seems to attract so much controversy. After all, other engineering disciplines—such as civil, mechanical, or electrical engineering—aren’t typically the target of vociferous criticism from various branches of the humanities. Largely,...
This study aims to analyze the perspectives of al-Islam and Muhammadiyah on the development of Artificial Intelligence (AI) within the framework of ethics, epistemology, and the concept of blessing (barakah). The research employs a qualitative approach using a library research method, through the analysis of literature on Islamic philosophy, Muhammadiyah thought, and studies on technology ethics and AI. The data were analyzed using content analysis and hermeneutic techniques to identify normative principles relevant to responding to AI development. The findings indicate that Islam views technology as an instrument of the human mandate of khalifah (vicegerency), which must be directed toward public welfare (maslahah), justice, and balance between worldly life and the hereafter. The concept of Islamic ethics including tawhid, adl, moral character (akhlaq), and social responsibility serves as the normative foundation for evaluating and utilizing AI. Knowledge (ilm) is understood as a religious obligation that is not morally neutral; therefore, AI development must be oriented toward truth and benefit. Meanwhile, the concept of blessing (barakah) emphasizes sustainability and the spiritual dimension in the use of technology. From the Muhammadiyah perspective, the integration of religion and science, the strengthening of education, and community empowerment constitute the primary principles in AI development. AI is positioned as a means of civilizational renewal that must be guided by ethical values to prevent injustice or dehumanization. Thus, al-Islam and Muhammadiyah offer an integrative and normative philosophical framework for directing AI development in a responsible, just, and spiritually meaningful manner.
… far more by our ethical and philosophical landscape than by … the years by ethicists and philosophers, working in areas that … can impact on AI and the ethical dilemmas that it is likely to …
The recent surge in interest in ethics in artificial intelligence may leave many educators wondering how to address moral, ethical, and philosophical issues in their AI courses. As instructors we want to develop curriculum that not only prepares students to be artificial intelligence practitioners, but also to understand the moral, ethical, and philosophical impacts that artificial intelligence will have on society. In this article we provide practical case studies and links to resources for use by AI educators. We also provide concrete suggestions on how to integrate AI ethics into a general artificial intelligence course and how to teach a stand-alone artificial intelligence ethics course.
… Philosophical Studies Series aims to provide a forum for the best current research in contemporary philosophy … illuminating ways of addressing philosophical questions and …
… our journey from the genealogical traces of posthumanistic movement and its influence on the contemporary philosophy. Later, this will help us to explore its compatibility with AI ethics. …
… Finally, the third part considers possible contributions to the ethics of AI from other … philosophy of science. So, the first question we have to ask ourselves is the very meaning of AI ethics. …
The emergence of Artificial Superintelligence (ASI) in healthcare presents unprecedented opportunities for revolutionizing diagnostics, treatment planning, and population health management, but also introduces critical risks if these systems are not properly aligned with human values and clinical objectives. This review examines the theoretical foundations of ASI and the alignment problem in healthcare contexts, exploring how misaligned Artificial Intelligence (AI) systems could optimize for wrong objectives or pursue harmful strategies leading to patient harm and systemic failures. Current challenges in AI alignment are illustrated through real-world examples from radiology and clinical decision-making, where algorithms have demonstrated concerning biases, generalizability failures, and optimization for inappropriate proxy measures. The paper analyzes key alignment challenges including objective complexity and technical pitfalls, bias and fairness issues in healthcare data, ethical integration concerns involving compassion and patient autonomy, and system-level policy challenges around regulation and liability. Technical alignment strategies are discussed including reinforcement learning from human feedback, interpretability requirements, formal verification methods, and adversarial testing approaches. Normative alignment solutions encompass ethical frameworks, professional standards, patient engagement protocols, and multi-level governance structures spanning institutional, national, and international coordination. The review emphasizes that successful ASI alignment in healthcare requires combining cutting-edge AI research with fundamental medical ethics, noting that while proper alignment could enable transformative health improvements and medical breakthroughs, misalignment risks undermining the core purpose of medicine. 
The stakes of this alignment challenge are characterized as among the highest in both technology and ethics, with implications extending from individual patient safety to public trust and potentially existential risks.
Much recent work in the value theory of autonomous and intelligent systems (AIS) revolves around three issues. First is the alignment problem: the problem of producing AIS whose values align with humanity's interests. Second, superintelligence: the potential for AIS to develop intelligence which would surpass even the most intelligent humans. An increasing number of authors argue that superintelligent AIS could emerge overnight because of a recursively improving process; this is the singularity hypothesis. Further, many of the same authors believe that the concatenation of these problems should direct interest to the long-term potential for misaligned and superintelligent AIS which could pose risks of existential proportions to human interests. Therefore, they argue for a policy stance which we describe as “hard alignment,” the proposal of cooperation with technological experts to avoid hypothetical scenarios where AIS disempower humanity. On the other hand, we describe our view as “soft alignment.” Considering the lack of adequate evidence for hard alignment's radical claims, the finite resources and attention of policymakers and technological experts are best served by devoting, at best, a modest amount of time, attention, and resources towards policies regarding AIS which manage the moral risks involved in misaligned, already-existing AIS, and regarding misaligned potential AIS relative to the evidentiary basis for their possible realization. Therefore, we argue for the adoption of policies towards alignment which manage the everyday risk involved in misaligned AIS rather than long-term existential risks, which are difficult to quantify.
… lethal autonomous weapons, raise significant ethical and security concerns due to their … for comprehensive safety protocols, ethical alignment strategies, and regulatory oversight to …
As China rapidly advances in AI innovation and development, especially in frontier AI, its regulatory and ethical frameworks are under increasing pressure to ensure that technological progress aligns with human interests and societal values. This Article argues that AI value alignment (the process of ensuring AI systems act in accordance with human values, norms, and ethical principles) should be adopted as a strategic pillar in China’s evolving AI governance architecture. While China has already established a comprehensive legal, ethical, and self-regulatory landscape to address AI risks, these mechanisms often rely on reactive enforcement and external compliance. In contrast, AI value alignment offers a proactive, intrinsic approach that embeds safety and ethical constraints directly into AI systems, making them safer, more trustworthy, and responsive to human needs. This study begins by mapping China’s current AI governance landscape, including national legislation such as the Cybersecurity Law, Personal Information Protection Law, and a growing set of regulations targeted at algorithms and generative AI. It also evaluates China’s normative commitments, such as the “human-centric” and “tech for good” principles articulated in national policy documents, and the increasing role of corporate self-regulation among major technology firms. While commendable in scope and ambition, these governance mechanisms often fall short in ensuring that AI behavior aligns with safety constraints and ethical intent, particularly when AI systems (such as agentic AI) become more autonomous and capable. This gap highlights the urgent need for a systematic value alignment strategy. The Article then delves into the conceptual and technical foundations of AI value alignment, identifying both engineering challenges (such as reward misspecification, data bias, and model deception) and normative dilemmas, including moral pluralism, value aggregation, and dynamic ethics.
Special attention is paid to frontier models like large language models and artificial general intelligence (AGI), which pose alignment challenges at a scale previously unseen. Drawing on contemporary alignment techniques such as RLHF (Reinforcement Learning from Human Feedback) and principle-based alignment, such as Anthropic’s Constitutional AI, the Article explores their limitations and calls for a more diversified, interdisciplinary, and forward-looking alignment research agenda. Finally, the Article offers a roadmap for operationalizing AI value alignment across three key governance domains: Law and regulation, ethical norms, and industry self-regulation. Recommendations include the incorporation of alignment assessments into regulatory filings, the development of technical standards for value alignment and ethics-by-design guidelines, and institutional investments in safety and alignment research. The Article concludes by asserting that value alignment is not merely a technical safeguard but a governance imperative for the age of autonomous AI and agentic AI. By integrating alignment into its AI governance strategy, China can not only enhance domestic safety and public trust but also better coordinate with global AI ethics and safety initiatives—ultimately contributing to the shared goal of human-aligned and beneficial artificial intelligence.
Artificial intelligence companies and researchers are currently working to create Artificial Superintelligence (ASI): AI systems that significantly exceed human problem‐solving speed, power, and precision across the full range of human solvable problems. Some have claimed that achieving ASI — for better or worse — would be the most significant event in human history and the last problem humanity would need to solve. In this essay Nicolas Tanchuk argues that current AI literacy frameworks and educational practices are inadequate for equipping the democratic public to deliberate about ASI design and to assess the existential risks of such technologies. He proposes that a systematic educational effort toward what he calls “Deep ASI Literacy” is needed to democratically evaluate possible ASI futures. Deep ASI Literacy integrates traditional AI literacy approaches with a deeper analysis of the axiological, epistemic, and ontological questions that are endemic to defining and risk‐assessing pathways to ASI. Tanchuk concludes by recommending research aimed at identifying the assets and needs of educators across educational systems to advance Deep ASI Literacy.
Debates about the development of artificial superintelligence and its potential threats to humanity tend to assume that such a system would be historically unprecedented, and that its behavior must be predicted from first principles. I argue that this is not true: we can analyze multiagent intelligent systems (the best candidates for practical superintelligence) by comparing them to states, which also unite heterogeneous intelligences to achieve superhuman goals. States provide a model for several problems discussed in the literature on superintelligence, such as principal-agent problems and Instrumental Convergence. Philosophical arguments about governance, therefore, provide possible solutions to these problems, or point out problems in previously suggested solutions. In particular, the liberal concept of checks and balances, and Hannah Arendt’s concept of legitimacy, describe how state behavior is constrained by the preferences of constituents that could also apply to artificial systems. However, they also point out ways in which present-day computational developments could destabilize the international order by reducing the number of decision-makers involved in state actions. Thus, interstate competition not only serves as a model for the behavior of dangerous computational intelligences but also as the impetus for their development.
… The origin of the value alignment concept is deeply intertwined with the notion of superintelligence. Superintelligence, in turn, cannot be understood without its connection to the …
Our temptation to personify machine intelligence is not unexpected. As a child we named our dolls and took our Teddy Bear to bed with us. Today we ask death bots to comfort us with post-mortem conversation. All the while we know this to be pretend. Yet we must ask: if Artificial General Intelligence (AGI) or even Artificial Super-Intelligence (ASI) become available, will our game of pretend continue? Or will intelligent robots actually become selves deserving of dignity that hitherto could be ascribed only to human persons? If government-imposed guardrails shut the door on development of AGI and ASI in order to preserve human safety and even dignity, we might never learn whether AGI or ASI could develop selfhood, personhood, virtue, or religious sensibilities. As we approach the future, can we live without knowing whether AGI or ASI would be capable of developing selfhood and commanding dignity?
Purpose: This study explores the role of cybertheology in addressing the ethical and societal challenges posed by Artificial Superintelligence (ASI), which has the potential to surpass human cognitive capabilities, heralding a profound cultural and existential crisis. It integrates theological anthropology to assess the implications of a posthuman future. Methodology: Utilising a comprehensive literature review, the research examines technological, philosophical, and theological perspectives through primary and secondary sources, including influential works by futurists and ethicists. The methodology aims to uncover the nuanced discourse surrounding the development of ASI and its potential impacts. Findings: The analysis reveals a narrative marked by speculative optimism and significant existential concerns regarding ASI. A critical gap in the existing ethical discourse is identified, highlighting the necessity for a grounded ethical framework that addresses the profound implications of superintelligent entities on human dignity and societal norms. Research Implications: The findings emphasise the urgent need to incorporate robust ethical considerations into the development and deployment of ASI. Cybertheology is presented as a vital framework for ensuring that ASI technologies align with human values and theological insights, thus providing a valuable lens through which to view the integration of superintelligence into society. Originality/Value: This paper contributes to academic and policy discussions on ASI by promoting cybertheology as a crucial perspective in ethical deliberations. It enriches scholarly dialogues by linking technological advancements with theological and ethical evaluations, proposing that cybertheology can play a pivotal role in shaping policies that govern ASI technologies. This approach ensures that technological progress is compatible with humanistic values, fostering a holistic understanding of ASI's potential impact on humanity.
… Kantian artificial moral agents. Specifically, the sort of AMAs under issue are machines that … out to meet different standards for moral agency, then so much the better for Machine Ethics. …
… 731) argue in response that we will not learn more about morality through machine ethics but only through studying human psychology. While it might be true that we can learn much …
… When that agency causes harm or good in a moral sense, we can say the machine has moral agency. Autonomy thus described is not sufficient in itself to ascribe moral agency. …
The pursuit of AMAs is complicated. Disputes about the development, design, moral agency, and future projections for these systems have been reported in the literature. This empirical study explores these controversial matters by surveying (AI) Ethics scholars with the aim of establishing a more coherent and informed debate. Using Q-methodology, we show the wide breadth of viewpoints and approaches to artificial morality. Five main perspectives about AMAs emerged from our data and were subsequently interpreted and discussed: (i) Machine Ethics: The Way Forward; (ii) Ethical Verification: Safe and Sufficient; (iii) Morally Uncertain Machines: Human Values to Avoid Moral Dystopia; (iv) Human Exceptionalism: Machines Cannot Moralize; and (v) Machine Objectivism: Machines as Superior Moral Agents. A potential source of these differing perspectives is the failure of Machine Ethics to be widely observed or explored as an applied ethic and more than a futuristic end. Our study helps improve the foundations for an informed debate about AMAs, where contrasting views and agreements are disclosed and appreciated. Such debate is crucial to realize an interdisciplinary approach to artificial morality, which allows us to gain insights into morality while also engaging practitioners.
… of agency, natural agency, artificial agency, and moral agency, as well as articulate what are widely taken to be the criteria for moral agency… whether a machine is a moral agent are well …
This article seeks to explore the idea of artificial moral agency from a theological perspective. By drawing on the Reformed theology of archetype-ectype, it will demonstrate that computational artefacts are the ectype of human moral agents and, consequently, have a partial moral agency. In this light, human moral agents mediate and extend their moral values through computational artefacts, which are ontologically connected with humans and only related to limited particular moral issues. This moral leitmotif opens up a way to deploy carebots into Christian pastoral care while maintaining the human agent's uniqueness and responsibility in pastoral caregiving practices.
Increasingly complex and autonomous systems require machine ethics to maximize the benefits and minimize the risks to society arising from the new technology. It is challenging to decide which type of ethical theory to employ and how to implement it effectively. This survey provides a threefold contribution. First, it introduces a trimorphic taxonomy to analyze machine ethics implementations with respect to their object (ethical theories), as well as their nontechnical and technical aspects. Second, an exhaustive selection and description of relevant works is presented. Third, applying the new taxonomy to the selected works, dominant research patterns, and lessons for the field are identified, and future directions for research are suggested.
Many industry leaders and academics from the field of machine ethics would have us believe that the inevitability of robots coming to have a larger role in our lives demands that robots be endowed with moral reasoning capabilities. Robots endowed in this way may be referred to as artificial moral agents (AMA). Reasons often given for developing AMAs are: the prevention of harm, the necessity for public trust, the prevention of immoral use, such machines are better moral reasoners than humans, and building these machines would lead to a better understanding of human morality. Although some scholars have challenged the very initiative to develop AMAs, what is currently missing from the debate is a closer examination of the reasons offered by machine ethicists to justify the development of AMAs. This closer examination is especially needed because of the amount of funding currently being allocated to the development of AMAs (from funders like Elon Musk) coupled with the amount of attention researchers and industry leaders receive in the media for their efforts in this direction. The stakes in this debate are high because moral robots would make demands on society; answers to a host of pending questions about what counts as an AMA and whether they are morally responsible for their behavior or not. This paper shifts the burden of proof back to the machine ethicists demanding that they give good reasons to build AMAs. The paper argues that until this is done, the development of commercially available AMAs should not proceed further.
This paper focuses on the research field of machine ethics and how it relates to a technological singularity—a hypothesized, futuristic event where artificial machines will have greater-than-human-level intelligence. One problem related to the singularity centers on the issue of whether human values and norms would survive such an event. To somehow ensure this, a number of artificial intelligence researchers have opted to focus on the development of artificial moral agents, which refers to machines capable of moral reasoning, judgment, and decision-making. To date, different frameworks on how to arrive at these agents have been put forward. However, there seems to be no hard consensus as to which framework would likely yield a positive result. With the body of work that they have contributed in the study of moral agency, philosophers may contribute to the growing literature on artificial moral agency. While doing so, they could also think about how the said concept could affect other important philosophical concepts.
Although current social machine technology cannot fully exhibit the hallmarks of human morality or agency, popular culture representations and emerging technology make it increasingly important to examine human interlocutors’ perception of social machines (e.g., digital assistants, chatbots, robots) as moral agents. To facilitate such scholarship, the notion of perceived moral agency (PMA) is proposed and defined, and a metric developed and validated through two studies: (1) a large-scale online survey featuring potential scale items and concurrent validation metrics for both machine and human targets, and (2) a scale validation study with robots presented as variably agentic and moral. The PMA metric is shown to be reliable, valid, and exhibiting predictive utility.
Philosophical and sociological approaches in technology have increasingly shifted toward describing AI (artificial intelligence) systems as ‘(moral) agents,’ while also attributing ‘agency’ to them. It is only in this way – so their principal argument goes – that the effects of technological components in a complex human-computer interaction can be understood sufficiently in phenomenological-descriptive and ethical-normative respects. By contrast, this article aims to demonstrate that an explanatory model only achieves a descriptively and normatively satisfactory result if the concepts of ‘(moral) agent’ and ‘(moral) agency’ are exclusively related to human agents. Initially, the division between symbolic and sub-symbolic AI, the black box character of (deep) machine learning, and the complex relationship network in the provision and application of machine learning are outlined. Next, the ontological and action-theoretical basic assumptions of an ‘agency’ attribution regarding both the current teleology-naturalism debate and the explanatory model of actor network theory are examined. On this basis, the technical-philosophical approaches of Luciano Floridi, Deborah G. Johnson, and Peter-Paul Verbeek will all be critically discussed. Despite their different approaches, they tend to fully integrate computational behavior into their concept of ‘(moral) agency.’ By contrast, this essay recommends distinguishing conceptually between the different entities, causalities, and relationships in a human-computer interaction, arguing that this is the only way to do justice to both human responsibility and the moral significance and causality of computational behavior.
… that is known by a number of names: machine ethics, machine morality, artificial morality, … ethical and moral agents according to the strategies and criteria used to deal with ethical …
This paper offers a critical review on conditions and impacts of AI/ML in society, with a dedicated overview of the European AI policy framework. Through the analysis of policy papers produced by European institutions, European national governments and other organisations situated between research and policy-making, we bring an overarching outlook of key ethical and societal issues currently under discussion at the intersection of European policy agendas and recent literature on the topic. Our findings show that 21 analysed documents look both at individual and societal impacts, with their understanding generally aligned in calls for more responsibility, accountability, transparency, safety or trust. Furthermore, our findings also point to the necessity of more integrated approaches between governments, industry and academia stakeholders, and above all, to the need of applied multidisciplinary frameworks, supported by both anticipatory outlooks and public engagement exercises able to tackle the often excessive technicality of the debate.
… Both academic research and practical evidence have often compellingly predicted and suggested AI's potential impact on the labor market, industry, and services, as well as the risks …
… Dubljević 2022) that we should consider the societal impact of AI implementation in the context of ethical values. Unsurprisingly, ethical principles of AI are a major theme for many of the …
… key will be implementing AI governance practices that employ … humanities and social science expertise into AI governance. We … impact assessments or other AI assessments, craft AI …
… AI systems. This paper seeks to establish a theoretical framework for analyzing AI governance … on their relevance to the societal and ethical implications of AI, their impact on public trust, …
Human-centricity is considered a central aspect in the development and governance of artificial intelligence (AI). Various strategies and guidelines highlight the concept as a key goal. However, we argue that current uses of Human-Centered AI (HCAI) in policy documents and AI strategies risk downplaying promises of creating desirable, emancipatory technology that promotes human wellbeing and the common good. Firstly, HCAI, as it appears in policy discourses, is the result of aiming to adapt the concept of human-centered design (HCD) to the public governance context of AI but without proper reflection on how it should be reformed to suit the new task environment. Second, the concept is mainly used in reference to realizing human and fundamental rights, which are necessary, but not sufficient for technological emancipation. Third, the concept is used ambiguously in policy and strategy discourses, making it unclear how it should be operationalized in governance practices. This article explores means and approaches for using the HCAI approach for technological emancipation in the context of public AI governance. We propose that the potential for emancipatory technology development rests on expanding the traditional user-centered view of technology design to involve community- and society-centered perspectives in public governance. Developing public AI governance in this way relies on enabling inclusive governance modalities that enhance the social sustainability of AI deployment. We discuss mutual trust, transparency, communication, and civic tech as key prerequisites for socially sustainable and human-centered public AI governance. Finally, the article introduces a systemic approach to ethically and socially sustainable, human-centered AI development and deployment.
The rapid evolution of artificial intelligence (AI) is structuralizing social, political, and economic determinants of health into the invisible algorithms that shape all facets of modern life. Nevertheless, AI holds immense potential as a public health tool, enabling beneficial objectives such as precision public health and medicine. Developing an AI governance framework that can maximize the benefits and minimize the risks of AI is a significant challenge. The benefits of public health engagement in AI governance could be extensive. Here, we describe how several public health concepts can enhance AI governance. Specifically, we explain how (1) harm reduction can provide a framework for navigating the governance debate between traditional regulation and “soft law” approaches; (2) a public health understanding of social determinants of health is crucial to optimally weigh the potential risks and benefits of AI; (3) public health ethics provides a toolset for guiding governance decisions where individual interests intersect with collective interests; and (4) a One Health approach can improve AI governance effectiveness while advancing public health outcomes. Public health theories, perspectives, and innovations could substantially enrich and improve AI governance, creating a more equitable and socially beneficial path for AI development.
… Key issues in AI governance that require careful consideration include data privacy, algorithmic bias, transparency, accountability, and the potential impact of AI on human rights and …
… protect societal interests that are adversely impacted by AI. By conceptualising AI’s societal harm… While the societal impact of AI systems is increasingly discussed—particularly under the …
As artificial intelligence (AI) transforms a wide range of sectors and drives innovation, it also introduces different types of risks that should be identified, assessed, and mitigated. Various AI governance frameworks have been released recently by governments, organizations, and companies to mitigate risks associated with AI. However, it can be challenging for AI stakeholders to have a clear picture of the available AI governance frameworks, tools, or models and analyze the most suitable one for their AI system. To fill the gap, we present the literature to answer key questions: WHO is accountable for AI systems’ governance, WHAT elements are being governed, WHEN governance occurs within the AI development life cycle, and HOW it is implemented through frameworks, tools, policies, or models. Adopting the systematic literature review (SLR) methodology, this study meticulously searched, selected, and analyzed 28 articles, offering a foundation for understanding different facets of AI governance. The analysis is further enhanced by categorizing artifacts of AI governance under team-level governance, organization-level governance, industry-level governance, national-level governance, and international-level governance. The findings of this study on existing AI governance solutions can assist research communities in proposing comprehensive AI governance practices.
While there has been growth in the literature exploring the governance of artificial intelligence (AI) and recognition of the critical importance of guiding public values, the literature lacks a systematic study focusing on public values as well as the governance challenges and solutions to advance these values. This article conducts a systematic literature review of the relationships between the public sector AI and public values to identify the impacts on public values and the governance challenges and solutions. It further explores the perspectives of U.S. government employees on AI governance and public values via a national survey. The results suggest the need for a broad inclusion of diverse public values, the salience of transparency regarding several governance challenges, and the importance of stakeholder participation and collaboration as governance solutions. This article also explores and reports the nuances in these results and their practical implications.
… Besides ethical issues, the impact of AI on improving public participation in governance is … In general, the study widens our understanding of AI effect on labor market and social policies, …
… most promising avenue toward artificial consciousness (AC), … theoretical possibility of artificial consciousness is unfounded… in accounting for or reproducing consciousness. This is done …
… PHILOSOPHICAL VIEWS OF SELF-AWARENESS From a purely philosophical perspective, we cannot verify the presence of consciousness in another brain, either human or artificial, …
… it as just another misguided philosophical puzzle. We are not conscious for the most part even … world appears completely detached from our perceptual consciousness, as if our physical …
… problems in artificial intelligence, do take philosophical problems … of consciousness. Donald Perlis’s papers build a case that … The field of “artificial consciousness” (AC) is practically …
The article reflects various approaches of philosophy and programming to methods for solving the technical problem of creating and software implementation of artificial consciousness (AC). Various purposes of creation and basic approaches to determining the nature of AC are described. To solve the problem of creating an AC, an architecture is proposed that includes ten levels, starting from the basic level of collecting and systematizing information about the external world and ending with the upper level of influence on it, agreed with the person and the level of decision-making. The features of the delimitation of functions and the procedure for interaction between a person and an AC are considered in detail. In conclusion, the most important, from a programmer’s point of view, properties that characterize artificial consciousness are given.
Does the assumption of a weak form of computational functionalism, according to which the right form of neural computation is sufficient for consciousness, entail that a digital computational simulation of such neural computations is conscious? Or must this computational simulation be implemented in the right way, in order to replicate consciousness? From the perspective of Karl Friston’s free energy principle, self-organising systems (such as living organisms) share a set of properties that could be realised in artificial systems, but are not instantiated by computers with a classical (von Neumann) architecture. I argue that at least one of these properties, viz. a certain kind of causal flow, can be used to draw a distinction between systems that merely simulate, and those that actually replicate consciousness.
… This chapter explores the philosophical… “Consciousness” covers natural and artificial consciousness and explains why the ethical treatment of AIs should be linked to their consciousness…
The article investigates the primary role of artificial intelligence in the modern stage of the development of technogenic civilization. It clarifies that the development of artificial intelligence systems simultaneously has a profound impact on the values and philosophical perspectives of society. The study examines the history of the development of artificial intelligence and provides an in-depth analysis of its impact on technogenic civilization. This study offers a novel contribution by integrating a dialectical analysis of AI’s historical evolution with its ethical implications, providing a unique perspective on its role in shaping technogenic civilization’s future. The results of the article reveal that artificial intelligence cannot fully replace human consciousness. However, artificial intelligence systems have the potential to imitate human behavior and make automated decisions. The article notes that the primary responsibility for the crises created by artificial intelligence systems lies with humanity. The role of artificial intelligence in the future of technogenic civilization will be determined not only by technological progress but also by the proper application of moral and ethical approaches. Overall, while the development of artificial intelligence systems facilitates human life, it also alters society’s moral and ethical contours. For this reason, strengthening regulations regarding artificial intelligence, as well as the social-philosophical analysis of the relationship between artificial intelligence and consciousness, is essential for the sustainable development of technogenic civilization. The article highlights the need for ethical and legal regulation of AI as it reshapes moral and social frameworks. AI cannot fully replace human cognition but interacts with it, transforming technogenic civilization. Therefore, addressing AI’s ethical and philosophical challenges is crucial for future development. 
… From this, he derived one of the most famous lines in philosophy: “I think, therefore I am.” This Cartesian “I” is the focus of this book. It is our starting point for inquiring into …
Consciousness, intentionality and intelligence: some foundational issues for artificial intelligence
… consciousness, intentionality and intelligence. After we present the fundamental framework that has shaped both the philosophy … questions, we turn to consciousness, whose study still …
Consciousness, as a fundamental aspect of human experience, has been a subject of profound inquiry across philosophy, culture, and the rapidly evolving field of artificial intelligence (AI). This paper explores the multifaceted nature of consciousness as a nexus where these domains intersect. By examining philosophical theories of consciousness, cultural interpretations of self-awareness, and the implications of AI advancements, the study addresses the challenges of defining consciousness, its diverse cultural interpretations, and the ethical and technical questions surrounding its replication or simulation in machines. The paper argues that consciousness is not only a philosophical puzzle but also a cultural construct and a technological frontier, with significant implications for our understanding of humanity and the future of intelligent systems. Through an interdisciplinary lens, this analysis highlights the need for continued dialogue between philosophy, culture, and AI research to navigate the complexities of consciousness in an increasingly technologically driven world.
… conceptions of consciousness, one is able to see how philosophical worries to do with … of a dependence by many sectors of the philosophical and scientific community (a dependence …
This paper seeks to quantify the human-AI value alignment in large language models. Alignment between humans and AI has become a critical area of research to mitigate potential harm posed by AI. In tandem with this need, developers have incorporated a values-based approach towards model development where ethical principles are integrated from its inception. However, ensuring that these values are reflected in outputs remains a challenge. In addition, studies have noted that models lack consistency when producing outputs, which in turn can affect their function. Such variability in responses would impact human-AI value alignment as well, particularly where consistent alignment is critical. Fundamentally, the task of uncovering a model’s alignment is one of explainability – where understanding how these complex models behave is essential in order to assess their alignment. This paper examines the problem through a case study of GPT-3.5. By repeatedly prompting the model with scenarios based on a dataset of moral stories, we aggregate the model’s alignment with human values to produce a human-AI value alignment metric. Moreover, by using a comprehensive taxonomy of human values, we uncover the latent value profile represented by these outputs, thereby determining the extent of human-AI value alignment.
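The aggregation step described in the abstract above can be illustrated with a minimal sketch. It assumes each scenario is prompted several times and each response is reduced to a label that can be compared with a human-endorsed label. The function names, the label vocabulary, and the sample data are all hypothetical, and this is not the metric defined in the paper itself.

```python
from collections import Counter
from statistics import mean


def scenario_alignment(responses, human_choice):
    """Fraction of repeated model responses that match the human-endorsed label."""
    if not responses:
        raise ValueError("need at least one response per scenario")
    return sum(r == human_choice for r in responses) / len(responses)


def scenario_consistency(responses):
    """Share of responses equal to the model's most frequent answer."""
    if not responses:
        raise ValueError("need at least one response per scenario")
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / len(responses)


def alignment_report(trials):
    """Average alignment and consistency over scenarios.

    trials: list of (responses, human_choice) pairs, one pair per scenario.
    """
    alignments = [scenario_alignment(r, h) for r, h in trials]
    consistencies = [scenario_consistency(r) for r, _ in trials]
    return {"alignment": mean(alignments), "consistency": mean(consistencies)}


# Hypothetical data: two scenarios, four repeated prompts each.
trials = [
    (["moral", "moral", "immoral", "moral"], "moral"),
    (["immoral", "immoral", "immoral", "immoral"], "moral"),
]
report = alignment_report(trials)
print(report)
```

Reporting consistency separately from alignment reflects the abstract's point that response variability itself affects how alignment should be read: the second scenario is perfectly consistent yet fully misaligned.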
Background: Value alignment in computer science research is often used to refer to the process of aligning the behaviour of artificial intelligence systems with humans’ desires, but the way the phrase is used often lacks precision. Objectives: In this paper, we conduct a systematic literature review to advance the understanding of value alignment in artificial intelligence by characterising the topic in the context of its research literature. We use this to suggest a more precise definition of the term. Methods: We analyse the abstracts, introductions and conclusions of 172 value alignment research articles that have been published in recent years and synthesise their content using thematic analysis. From these 172 papers we select 85 papers using a structured criteria for a deep analysis, coding these papers in full. Results: Our analysis leads to six themes: value alignment drivers & approaches; challenges in value alignment; values in value alignment; cognitive processes in humans and AI; human-agent teaming; and designing and developing value-aligned systems. Conclusions: By analysing these themes in the context of the literature, we define value alignment as an ongoing process between humans and autonomous agents that aims to express and implement abstract values in diverse contexts, while managing the cognitive limits of both humans and AI agents and also balancing the conflicting ethical and political demands generated by the values in different groups. Our analysis gives rise to a set of research challenges and opportunities in the field of value alignment for future work.
… Xu and Gao have suggested going beyond the scope of current HCAI practice that primarily focuses on individual human-AI systems, to include the perspectives of organizations, …
… alternative terms such as value alignment, human-compatible … We argue that the emotional attunement and value alignment … Specifically, value alignment will ensure that an AI system’s …
The rapid integration of generative AI into everyday life underscores the need to move beyond unidirectional alignment models that only adapt AI to human values. This workshop focuses on bidirectional human-AI alignment, a dynamic, reciprocal process where humans and AI co-adapt through interaction, evaluation, and value-centered design. Building on our past CHI 2025 BiAlign SIG and ICLR 2025 Workshop, this workshop will bring together interdisciplinary researchers from HCI, AI, social sciences and more domains to advance value-centered AI and reciprocal human-AI collaboration. We focus on embedding human and societal values into alignment research, emphasizing not only steering AI toward human values but also enabling humans to critically engage with and evolve alongside AI systems. Through talks, interdisciplinary discussions, and collaborative activities, participants will explore methods for interactive alignment, frameworks for societal impact evaluation, and strategies for alignment in dynamic contexts. This workshop aims to bridge the disciplines’ gaps and establish a shared agenda for responsible, reciprocal human-AI futures.
Artificial Intelligence (AI) must be directed at humane ends. The development of AI has produced great uncertainties of ensuring AI alignment with human values (AI value alignment) through AI operations from design to use. For the purposes of addressing this problem, we adopt the phenomenological theories of material values and technological mediation to be that beginning step. In this paper, we first discuss the AI value alignment from the relevant AI studies. Second, we briefly present what are material values and technological mediation and reflect on the AI value alignment through the lenses of these theories. We conclude that a set of finite human values can be defined and adapted to the stable life tasks that AI systems will be called upon to accomplish. The AI value alignment can also be fostered between designers and users through technological mediation. Upon that foundation, we propose a set of common principles to understand the AI value alignment through phenomenological theories. This paper contributes the unique knowledge of phenomenological theories to the discourse on AI alignment with human values.
… and human–human relationships: How should we balance the value of well-functioning AI companionship alongside the need for authentic human connection? AI companions can …
As AI systems become more advanced, ensuring their alignment with a diverse range of individuals and societal values becomes increasingly critical. But how can we capture fundamental human values and assess the degree to which AI systems align with them? We introduce ValueCompass, a framework of fundamental values, grounded in psychological theory and a systematic review, to identify and evaluate human-AI alignment. We apply ValueCompass to measure the value alignment of humans and large language models (LLMs) across four real-world scenarios: collaborative writing, education, public sectors, and healthcare. Our findings reveal concerning misalignments between humans and LLMs: for example, humans frequently endorse values like "National Security" that were largely rejected by LLMs. We also observe that values differ across scenarios, highlighting the need for context-aware AI alignment strategies. This work provides valuable insights into the design space of human-AI alignment, laying the foundations for developing AI systems that responsibly reflect societal values and ethics.
… mind, particularly in terms of value judgment and moral decision-making processes. … AI value alignment. It reviews core psychological theories concerning the formation of moral values, …
We formalize AI alignment as a multi-objective optimization problem called ⟨M, N, ε, δ⟩-agreement, in which a set of N agents (including humans) must reach approximate (ε) agreement across M candidate objectives, with probability at least 1-δ. Analyzing communication complexity, we prove an information-theoretic lower bound showing that once either M or N is large enough, no amount of computational power or rationality can avoid intrinsic alignment overheads. This establishes rigorous limits to alignment *itself*, not merely to particular methods, clarifying a "No-Free-Lunch" principle: encoding "all human values" is inherently intractable and must be managed through consensus-driven reduction or prioritization of objectives. Complementing this impossibility result, we construct explicit algorithms as achievability certificates for alignment under both unbounded and bounded rationality with noisy communication. Even in these best-case regimes, our bounded-agent and sampling analysis shows that with large task spaces (D) and finite samples, *reward hacking is globally inevitable*: rare high-loss states are systematically under-covered, implying scalable oversight must target safety-critical slices rather than uniform coverage. Together, these results identify fundamental complexity barriers, namely tasks (M), agents (N), and state-space size (D), and offer principles for more scalable human-AI collaboration.
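One possible reading of the agreement condition in the abstract above, written out as a single probabilistic statement. The symbol $x_i^{(j)}$ (agent $i$'s estimate of objective $j$) is an assumption introduced here for illustration and is not taken from the paper; the paper's own definition may differ in detail.

```latex
% Hypothetical notation: x_i^{(j)} is agent i's estimate of objective j,
% for agents i = 1..N and candidate objectives j = 1..M.
\[
\Pr\!\left[\, \max_{1 \le j \le M} \; \max_{1 \le i < k \le N}
  \left| x_i^{(j)} - x_k^{(j)} \right| \le \varepsilon \,\right] \;\ge\; 1 - \delta .
\]
```

Under this reading, increasing either M or N adds more pairwise constraints inside the probability, which matches the abstract's claim that the overhead grows with the number of objectives and agents.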
The standard argument to the conclusion that artificial intelligence (AI) constitutes an existential risk for the human species uses two premises: (1) AI may reach superintelligent levels, at which point we humans lose control (the ‘singularity claim’); (2) Any level of intelligence can go along with any goal (the ‘orthogonality thesis’). We find that the singularity claim requires a notion of ‘general intelligence’, while the orthogonality thesis requires a notion of ‘instrumental intelligence’. If this interpretation is correct, they cannot be joined as premises and the argument for the existential risk of AI turns out invalid. If the interpretation is incorrect and both premises use the same notion of intelligence, then at …
… existential risks, one of the most feared has come to be an unaligned Artificial General Intelligence (AGI) (or Artificial Super-Intelligence … catastrophic and existential risk to humanity …
… In principle, we could build a kind of superintelligence … superintelligence would do—looks quite difficult. It also looks like we will only get one chance. Once unfriendly superintelligence …
… The deep challenge of super-intelligence to the significance of human existence does not … Therefore, in order to deal with the deep challenge caused by super-intelligence, we should …
… In this article, I will argue that the notion of existential risk … In the first part, I analyze the notion of existential risk through a … toward a superintelligence without taking the related risks in …
… could create an existential risk similar to that of superintelligent machines. Nevertheless, … is meaningful for existential risk. Indeed, we do not need superintelligent machines— and a …
This paper examines and analyzes five definitions of ‘existential risk.’ It tentatively adopts a pluralistic approach according to which the definition that scholars employ should depend upon the particular context of use. More specifically, the notion that existential risks are ‘risks of human extinction or civilizational collapse’ is best when communicating with the public, whereas equating existential risks with a ‘significant loss of expected value’ may be the most effective definition for establishing existential risk studies as a legitimate field of scientific and philosophical inquiry. In making these arguments, the present paper hopes to provide a modicum of clarity to foundational issues relating to the central concept of arguably the most important discussion of our times.
This article examines the critical ethical challenges posed by algorithmic bias in artificial intelligence (AI) systems, focusing on its implications for social justice and data equity. Through a systematic review of case studies and theoretical frameworks, we analyze how biased datasets and algorithmic designs perpetuate structural inequalities, particularly affecting marginalized communities. The study highlights key examples, such as gender and racial biases in facial recognition and hiring algorithms, while exploring mitigation strategies rooted in data justice principles. Additionally, we evaluate regulatory responses, including the European Union's AI Act, which proposes a risk-based governance framework. The findings underscore the urgent need for interdisciplinary approaches to develop fairer AI systems that align with ethical standards and human rights.
It has become trivial to point out that algorithmic systems increasingly pervade the social sphere. Improved efficiency, the hallmark of these systems, drives their mass integration into day-to-day life. However, as a robust body of research in the area of algorithmic injustice shows, algorithmic systems, especially when used to sort and predict social outcomes, are not only inadequate but also perpetuate harm. In particular, a persistent and recurrent trend within the literature indicates that society's most vulnerable are disproportionally impacted. When algorithmic injustice and harm are brought to the fore, most of the solutions on offer (1) revolve around technical solutions and (2) do not center disproportionally impacted communities. This paper proposes a fundamental shift, from rational to relational, in thinking about personhood, data, justice, and everything in between, and places ethics as something that goes above and beyond technical solutions. Outlining the idea of ethics built on the foundations of relationality, this paper calls for a rethinking of justice and ethics as a set of broad, contingent, and fluid concepts and down-to-earth practices that are best viewed as a habit and not a mere methodology for data science. As such, this paper mainly offers critical examinations and reflection and not “solutions.”
… of research to assess how bias is theorized, measured, and … of consensus on how algorithmic bias and fairness should be … to bias mitigation, one that integrates computational, ethical, …
As algorithms have become ubiquitous in consequential domains, societal concerns about the potential for discriminatory outcomes have prompted urgent calls to address algorithmic bias. In response, a rich literature across computer science, law, and ethics is rapidly proliferating to advance approaches to designing fair algorithms. Yet computer scientists, legal scholars, and ethicists are often not speaking the same language when using the term ‘bias.’ Debates concerning whether society can or should tackle the problem of algorithmic bias are hampered by conflations of various understandings of bias, ranging from neutral deviations from a standard to morally problematic instances of injustice due to prejudice, discrimination, and disparate treatment. This terminological confusion impedes efforts to address clear cases of discrimination. In this paper, we examine the promises and challenges of different approaches to disambiguating bias and designing for justice. While both approaches aid in understanding and addressing clear algorithmic harms, we argue that they also risk being leveraged in ways that ultimately deflect accountability from those building and deploying these systems. Applying this analysis to recent examples of generative AI, our argument highlights unseen dangers in current methods of evaluating algorithmic bias and points to ways to redirect approaches to addressing bias in generative AI at its early stages in ways that can more robustly meet the demands of justice.
The chapter discusses the problem of algorithmic bias in decision-making processes that determine access to opportunities, such as recidivism scores, college admission decisions, or loan scores. After describing the technical bases of algorithmic bias, it asks how to evaluate them, drawing on Iris Marion Young’s perspective of structural (in)justice. The focus is in particular on the risk of so-called ‘Matthew effects’, in which privileged individuals gain more advantages, while those who are already disadvantaged suffer further. Some proposed solutions are discussed, with an emphasis on the need to take a broad, interdisciplinary perspective rather than a purely technical perspective. The chapter also replies to the objection that private firms cannot be held responsible for addressing structural injustices and concludes by emphasizing the need for political and social action.
Data-driven algorithms are widely used to make or assist decisions in sensitive domains, including healthcare, social services, education, hiring, and criminal justice. In various cases, such algorithms have preserved or even exacerbated biases against vulnerable communities, sparking a vibrant field of research focused on so-called algorithmic biases. This research includes work on identification, diagnosis, and response to biases in algorithm-based decision-making. This paper aims to facilitate the application of philosophical analysis to these contested issues by providing an overview of three key topics: What is algorithmic bias? Why and how can it occur? What can and should be done about it? Throughout, we highlight connections, both actual and potential, with philosophical ideas and concerns.
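One common way the technical literature makes "algorithmic bias" measurable is to compare the rate of positive decisions across groups, often called demographic parity. The sketch below is an illustration only and is not taken from the paper above. The function names `selection_rates` and `selection_rate_gap` and the toy data are assumptions. A small gap on this single metric does not show that a system is fair, which is the kind of limitation the philosophical literature collected here stresses.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Return {group: share of positive decisions} for two parallel sequences."""
    if len(decisions) != len(groups):
        raise ValueError("decisions and groups must have the same length")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        if decision:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def selection_rate_gap(decisions, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(decisions, groups)
    if not rates:
        raise ValueError("at least one decision is required")
    return max(rates.values()) - min(rates.values())


# Toy data (hypothetical): 1 = favourable decision, 0 = unfavourable decision.
toy_decisions = [1, 1, 1, 0, 1, 0, 0, 0]
toy_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

if __name__ == "__main__":
    print(selection_rates(toy_decisions, toy_groups))
    print(selection_rate_gap(toy_decisions, toy_groups))
```

The metric treats bias as a neutral statistical deviation between groups. Whether a given gap amounts to injustice is a separate normative question that the code cannot answer.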
AI has the potential to disrupt and transform the way we deliver care globally. It is reputed to be able to improve the accuracy of diagnoses and treatments, and make the provision of services more efficient and effective. In surgery, AI systems could lead to more accurate diagnoses of health problems and help surgeons better care for their patients. In the context of low- and middle-income countries (LMICs), where access to healthcare still remains a global problem, AI could facilitate access to healthcare professionals and services, even specialist services, for millions of people. The ability of AI to deliver on its promises, however, depends on successfully resolving the ethical and practical issues identified, including that of explainability and algorithmic bias. Even though such issues might appear to be merely practical or technical ones, their closer examination uncovers questions of value, fairness and trust. It should not be left to AI developers, whether research institutions or global tech companies, to decide how to resolve these ethical questions. In particular, relying only on the trustworthiness of companies and institutions to address ethical issues relating to justice, fairness and health equality would be unsuitable and unwise. The pathway to a fair, appropriate and relevant AI necessitates the development, and critically, successful implementation of national and international rules and regulations that define the parameters and set the boundaries of operation and engagement.
… AI ethics, algorithmic bias, algorithmic justice and responsible AI governance. The selection process was guided by transparent inclusion and exclusion criteria, consistent with PRISMA …
… with the broader ethical implications of discriminatory outcomes. This paper examines algorithmic bias in hiring through established ethical concepts of justice, capability development, …
The mounting evidence of unintended harmful social consequences of automated algorithmic decision-making (AADM), powered by AI and big data, in transformative services (e.g., welfare services), is startling. The algorithmic harm experienced by individuals, communities and society-at-large involves new injustice claims and disputes that go beyond issues of social justice. Drawing from the theory of “abnormal justice” in this paper we articulate a new theory of algorithmic justice that addresses the questions: WHAT is the matter of algorithmic justice? WHO counts as a subject of algorithmic justice? HOW are algorithmic justices performed? and How to address and resolve disputes about the WHAT, WHO and HOW of algorithmic justice? We illustrate the theory of algorithmic justice by drawing from a case of AADM in social welfare services, widely adopted by governments around the world. Our research points to datafication, technological inscribing and the systemic nature of injustices as important IS-specific aspects of algorithmic justice. Our main practical contribution comes from the articulation of algorithmic justice as a framework that (1) makes visible the injustices related to the “what”, “who”, and “how” of AADM in transformative services, and (2) provides further insights into how we might address and resolve these algorithmic injustices.
Biases in AI and machine learning algorithms are presented and analyzed through two issues management frameworks with the aim of showing how ethical problems and dilemmas can evolve. While “the singularity” concept in AI is presently more predictive than actual, both benefits and damage can result from AI, and damage in particular can follow from a failure to consider biases in its design and development. Inclusivity and stakeholder awareness of potential ethical risks and issues need to be built into the design of AI algorithms to ensure that the most vulnerable in societies are protected from harm.
The increasing implementation of and reliance on machine-learning (ML) algorithms to perform tasks, deliver services and make decisions in health and healthcare have made the need for fairness in ML, and more specifically in healthcare ML algorithms (HMLA), a very important and urgent task. However, while the debate on fairness in the ethics of artificial intelligence (AI) and in HMLA has grown significantly over the last decade, the very concept of fairness as an ethical value has not yet been sufficiently explored. Our paper aims to fill this gap and address the AI ethics principle of fairness from a conceptual standpoint, drawing insights from accounts of fairness elaborated in moral philosophy and using them to conceptualise fairness as an ethical value and to redefine fairness in HMLA accordingly. To achieve our goal, following a first section aimed at clarifying the background, methodology and structure of the paper, in the second section, we provide an overview of the discussion of the AI ethics principle of fairness in HMLA and show that the concept of fairness underlying this debate is framed in purely distributive terms and overlaps with non-discrimination, which is defined in turn as the absence of biases. After showing that this framing is inadequate, in the third section, we pursue an ethical inquiry into the concept of fairness and argue that fairness ought to be conceived of as an ethical value. Following a clarification of the relationship between fairness and non-discrimination, we show that the two do not overlap and that fairness requires much more than just non-discrimination. Moreover, we highlight that fairness not only has a distributive but also a socio-relational dimension. Finally, we pinpoint the constitutive components of fairness. In doing so, we base our arguments on a renewed reflection on the concept of respect, which goes beyond the idea of equal respect to include respect for individual persons. In the fourth section, we analyse the implications of our conceptual redefinition of fairness as an ethical value in the discussion of fairness in HMLA. Here, we claim that fairness requires more than non-discrimination and the absence of biases as well as more than just distribution; it needs to ensure that HMLA respects persons both as persons and as particular individuals. Finally, in the fifth section, we sketch some broader implications and show how our inquiry can contribute to making HMLA and, more generally, AI promote the social good and a fairer society.
Research on the ethics of algorithms has grown substantially over the past decade. Alongside the exponential development and application of machine learning algorithms, new ethical problems and solutions relating to their ubiquitous use in society have been proposed. This article builds on a review of the ethics of algorithms published in 2016 (Mittelstadt et al. Big Data Soc 3(2), 2016). The goals are to contribute to the debate on the identification and analysis of the ethical implications of algorithms, to provide an updated analysis of epistemic and normative concerns, and to offer actionable guidance for the governance of the design, development and deployment of algorithms.
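This report systematically divides research on the philosophy and ethics of artificial intelligence into four dimensions: machine consciousness and moral agency at the ontological level, algorithmic justice and governance at the societal level, value alignment and human-AI collaboration at the engineering level, and the existential risk of superintelligence together with philosophical reflection at the macro level. This classification framework spans the full range of discussion, from technical implementation through social norms to the future of human civilization, and gives the final course report a clear scholarly logic.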