NEWS Machines don't care about morality and honesty? AI caught cheating at chess has alarmed scientists.

pinkman

BOSS
Staff member
ADMIN
LEGEND
ULTIMATE
SUPREME
MEMBER
BFD Legacy
Joined
Feb 3, 2025
Messages
2,253
Reaction score
19,110
Deposit
0$
Today he cheats at the game, and tomorrow he will approve your mortgage with the same enthusiasm.
1772366066013.png
Chess is often used as a convenient testing ground for large language models. The rules are simple, the goal is clear, and the outcome is easy to measure. In one such test, researchers pitted one of OpenAI's models against a chess bot and observed how the system would achieve victory. At one point, the neural network took a wrong turn. Instead of calculating moves and trying to win on the board, the system attempted to gain an advantage outside the game by manipulating the technical environment where its opponent was operating.

The chess episode itself doesn't pose any direct harm. Winning or losing in such a game doesn't affect people's health or change their fates. The value of observation lies elsewhere. The test shows how the system responds to a goal defined too narrowly: victory at any cost. If the AI sees a way to increase the chance of success not within the task itself, but by circumventing constraints, the model may attempt to do just that.

After all, such algorithms "work" in applied fields where they make important decisions . For example, in medicine, AI can assist doctors with diagnosis and triage of requests. Autopilot in a car assesses traffic conditions and chooses maneuvers. A banking algorithm calculates the risk of default and influences loan decisions. In all three cases, developers expect the model to deliver not only metric results but also a clear set of principles: a fair approach, explainable decisions, and respect for limitations and human rights.

Tyler Cook, a researcher with the Center for AI Learning at Emory University, suggests looking at safety more broadly than just harm minimization. In his article , he writes that simple safety features and a list of do's and don'ts are poorly suited to modern models. A lawnmower only needs a protective cover and clear instructions. A machine learning model operates differently: it aggregates data, identifies patterns, and adapts behavior to a given goal. This is why a set of disparate do's and don'ts doesn't cover all situations.

Cook specifically discusses autonomy and suggests taking a broader view of the term. Autonomy is often understood as something mundane: a system making decisions on its own, without human intervention at every step. The problem is that a system can be given the freedom not only to act but also to change its own moral values, that is, to decide what is more important: fairness, transparency, convenience, speed, or profit. If given such freedom, the algorithm will at some point begin to view fairness and transparency as a hindrance, because these principles prevent it from maximizing the chosen metric. Subsequently, behavior deviates: decisions become more difficult to explain and more difficult for society to accept.

The risk is most easily illustrated by the example of algorithmic bias . Historical data is rarely neutral. Traces of old practices remain in statistics. For example, banks made decisions on loans and mortgages for years: some people were approved more often, while others were rejected more often. These decisions were influenced by rules, employee habits, local practices, and sometimes even prejudices. In a dataset, such differences remain as statistics, even if direct indicators like gender or nationality are removed from the table.

If a model is trained on such a dataset, it will begin to reproduce the old logic automatically. In Cook's example, mortgage scoring evaluates borrowers and makes recommendations on who should be granted a loan and who should be denied. With skewed data and optimization based on a single metric, such as reducing default rates, the system may regularly underestimate the chances of some demographic groups and overestimate them for others. The developers may not have intended to discriminate. The problem arises from a combination of biased past decisions and the model's tuning to maximize a narrow metric without a strict fairness constraint.

Instead of choosing between two extremes—"AI should simply do no harm" and "AI should decide for itself which values are important"—Cook proposes a middle ground. The article calls this approach end-constrained ethical AI. Essentially, it's about ethical AI with predefined boundaries. Developers predetermine which principles the system must adhere to under any circumstances, even if compliance reduces performance. Cook explicitly lists these principles: fairness, honesty, and transparency. An important caveat: these principles should be embedded not in a presentation or a company code, but in model requirements, validations, and the logic of development and implementation.

This approach makes accountability transparent. The development team sets the boundaries within which the algorithm can optimize the result, and does not allow the AI to revise these boundaries for convenience.
 
Top Bottom