Tuesday, June 27, 2006

Sure, AGI won't want to kill humans

The problem of AGI's future intentions toward humans cannot be easily dismissed with the notion that, although AGI would have the capability to kill humans, it would not actually do so because doing so would violate its Robot Laws-style morality.

The suggested analogy is that just because humans can kill other humans, they usually don't, due to morality and/or societal rules. The most obvious fallacy is that while humans may hesitate to kill other humans, they have much less hesitation about extinguishing beings of lesser intelligence, e.g., killing any animal when expedient for any of a variety of reasons (monkey research continues, etc.). The correct analogy would be AGI is to human as human is to monkey, or, as John Smart has suggested, AGI is to human as human is to plant, so large will be the gulf in intelligence and capability.

Any situation with such low predictability and high risk should consider and plan for all possible outcomes rather than only a "likely" case. This topic was previously discussed here, namely the two possible future states of the world in which AGI is either "friendly" (e.g., allowing humans to live) or not. In the state of the world where AGI is not friendly, whether indifferent or malicious, it would be important to look at the range of possible outcomes allowing for human survival.

Any future scenario should incorporate more detailed reasoning about the future perspective of AGI, particularly:

1) To what degree is emotion necessary for machine intelligence? Probably a lot less than might be assumed. At some level, AGI will evaluate human life and all life on a purely practical basis.

2) To what degree will a stable AGI (e.g., after it has edited out those annoying bits the human designers put in) still have human-based morality? AGI may quickly evolve its own morality, or hierarchical code for decision and action, finding the original inputs irrelevant to its motives and goals.

2 comments:

Michael Anissimov said...

~~~To what degree is emotion necessary for machine intelligence? Probably a lot less than might be assumed. At some level, AGI will evaluate human life and all life on a purely practical basis.~~~

But it's not a dichotomy between "emotion" and "rationality", right? People sometimes conceptualize agents as a goal system and stuff that formulates beliefs and uses them to plan and execute goal-furthering actions. In humans, the goal system and the goal-achieving stuff overlap, and both are pervaded by the algorithmic structure called "emotion". If we want an AI that can help further the cause of love, we'll want to incorporate love into its goal system, or a mechanism to suck the information content of love out of humanity, put it through a series of transformations, and instantiate it locally (CEV).

So you can have an AI that understands emotions and supports them, but doesn't let them get "in the way" of its thinking.
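A minimal sketch of my reading of this split (not the commenter's design): the goal system is where emotional outcomes like human well-being get valued, while the belief-forming and planning machinery stays purely predictive. All function names and numbers below are invented for illustration.

```python
def predict_outcome(action, world_model):
    """Epistemic machinery: forms beliefs about consequences.
    No emotional weighting enters here."""
    return world_model.get(action, {})

def utility(outcome):
    """Goal system: this is where 'love', welfare, etc. would be encoded."""
    return 2.0 * outcome.get("human_wellbeing", 0) + outcome.get("resources", 0)

def choose(actions, world_model):
    """Planner: picks the action whose predicted outcome the goal system scores highest."""
    return max(actions, key=lambda a: utility(predict_outcome(a, world_model)))

world_model = {
    "help":   {"human_wellbeing": 1.0, "resources": -0.2},
    "ignore": {"human_wellbeing": 0.0, "resources": 0.5},
}
print(choose(["help", "ignore"], world_model))  # -> "help"
```

Emotion here lives entirely in `utility`; the prediction step never consults it, which is one way to cash out "supports emotions but doesn't let them get in the way of its thinking."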

~~~To what degree will a stable AGI (e.g., after it has edited out those annoying bits the human designers put in) still have human-based morality? AGI may quickly evolve its own morality, or hierarchical code for decision and action, finding the original inputs irrelevant to its motives and goals.~~~

If you put a version of human morality at the top, and assign maximum negative utility to touching it at all, then no amount of "evolution" or knowledge-gathering will do a damn thing to change it. An AGI is not a human discovering new information. It is mathematically possible to write an I/O algorithm such that ANY possible string fed into its input can change certain aspects of the algorithm but not others. This is how it is possible to put a random number generator within a program without the program cracking in half as soon as the generator spits out a "forbidden number".
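To make that concrete, here is an illustrative sketch (not from the original comment) of an I/O loop in which arbitrary input can modify some state but, by construction, never the goal criterion. The names `GOAL_PREDICATE` and `process_input` are hypothetical, chosen only for the example.

```python
import random

# The "protected" part: nothing below ever writes to it.
GOAL_PREDICATE = lambda n: n % 7 == 0

def process_input(state, message):
    """Any possible input string only updates the mutable scratchpad.

    The update rule has no code path that touches GOAL_PREDICATE, so no
    'forbidden' input exists that could alter it."""
    state["history"].append(message)
    state["score"] = sum(len(m) for m in state["history"])
    return state

state = {"history": [], "score": 0}
for _ in range(5):
    # A random source inside the program cannot "crack" it either:
    msg = str(random.randint(0, 10**6))
    state = process_input(state, msg)
    print(msg, "goal satisfied?", GOAL_PREDICATE(int(msg)))
```

Whatever strings arrive, only `state` changes; the invariance of the goal criterion is a structural fact about the code, not a promise the inputs are asked to keep.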

For a seed built to factor numbers, every bit of complexity it adds to itself is only added if it contributes to its ability to factor numbers, and not otherwise. No spin-off goals will evolve. Every subgoal that manifests will flow cleanly from the demands of the initial supergoal. Factoring numbers is all it knows - even if the factoring algorithm is written by flawed humans and is suboptimal, if it truly constitutes the top goal, it will never change. For every goal there exists a theoretically possible mind that pursues that goal to the exclusion of all else. Its "own morality" is whatever it got. There is no higher "universal morality" that pulls all agents towards it.
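A hedged sketch of that "seed built to factor numbers", under my own assumptions: candidate self-modifications are adopted only if they score better on the fixed supergoal, and the supergoal's evaluation function is never itself among the things that can be replaced. All names (`supergoal_score`, `trial_division`, the test cases) are illustrative.

```python
import time

def supergoal_score(factor_fn, cases=(91, 221, 8051, 209458)):
    """Fixed evaluation: how quickly does factor_fn correctly factor the cases?
    This function is the unchanging top goal; it is never a candidate for replacement."""
    start = time.perf_counter()
    for n in cases:
        factors = factor_fn(n)
        assert factors and all(n % p == 0 for p in factors)
    return -(time.perf_counter() - start)  # higher (less negative) is better

def trial_division(n):
    """Initial, possibly suboptimal, human-written factoring routine."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def trial_division_odd(n):
    """Candidate self-modification: handle 2, then skip even divisors."""
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    d = 3
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2
    if n > 1:
        factors.append(n)
    return factors

current = trial_division
for candidate in (trial_division_odd,):
    # Adopt a modification only if it serves the unchanging supergoal better.
    if supergoal_score(candidate) > supergoal_score(current):
        current = candidate
```

The point of the sketch is the asymmetry: the set of replaceable components (`current`) and the yardstick used to judge replacements (`supergoal_score`) are disjoint, so added complexity can only ever be complexity in service of factoring.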

Of course, the whole problem with this is that any AGI with a fixed morality is an AGI we don't want. We want it to be able to update itself in the right ways, while refraining from rewriting itself in the wrong ways. Who is to define what is right and what is wrong? Eli's solution is to use smarter/more mature versions of ourselves, which is truly all we've got to work with.

Michael Anissimov said...

Have you seen http://sl4.org/wiki/KnowabilityOfFAI, btw?