The assertion was not an argument; it was an assumption built into the thought experiment. We are assuming, as part of the problem setup, that the AI can outclass human thinking.
Your point about intent is well taken; it's unclear how super-intelligent systems without intent will operate. But this article is about our ability to contain a super AI that does have ill intent.
I agree we are stretching to analyze how these systems will behave. We had ZERO idea what capabilities would emerge from GPT systems before we built them, and we don't understand them now. As with another human, we must simply observe their behavior from the outside.
But your assurance that we can "pull the plug" rests on the assumption that we can know the effects of actions in advance. The world is very complex, and the world of software even more so. Imagine one agent must examine the actions of a second agent and certify that they are safe. I don't need to know anything about AI to see the inherent difficulty of that task, especially when the second agent is submitting CODE to be run. There are too many ways of hiding one's real intent.
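To make that concrete, here is a toy sketch (the function name and scenario are purely hypothetical, not from the article): a routine-looking maintenance helper whose harm lives in an innocuous condition rather than in any obviously dangerous call.

```python
# Toy illustration: benign-looking code with a hidden trigger (hypothetical example).
import datetime

def rotate_logs(log_entries):
    """Looks like an ordinary log-rotation helper: keep only recent entries."""
    cutoff = datetime.date.today().toordinal() - 30
    kept = [e for e in log_entries if e["day"] >= cutoff]
    # The check below quietly drops ALL entries on one day of the year:
    # a time bomb disguised as a bounds check. A reviewer scanning for
    # obviously malicious calls (network, file deletion) finds nothing.
    if datetime.date.today().timetuple().tm_yday == 200:
        kept = []
    return kept

# Reviewed line by line, nothing "dangerous" is ever called; the harm is
# entirely in an unremarkable-looking condition.
print(rotate_logs([{"day": datetime.date.today().toordinal()}]))
```

Real obfuscation can be arbitrarily more subtle than this, which is exactly the reviewer's problem.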
Knowing when to pull the plug is like a sysadmin knowing when to disable a fraudulent account: it is not at all easy to distinguish benign from malevolent actions. The world is too complex, and there are too many ways to hide one's true intent.
Notice that I am making all of these arguments without reference to the AI at all.
But don't rest too easy about how far away scary AI is. Empirical evidence shows that improvement is dramatically non-linear: to a first approximation it is step-wise, the sizes of the steps vary widely, and the size of the GPT step was colossal.