There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is instead used to such an extent that further optimization is ineffective or harmful, a phenomenon sometimes termed Goodhart's Law [1]. This class of failure is often poorly understood, partly because the terminology for discussing it is ambiguous, and partly because discussion using this ambiguous terminology blurs the distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes that there are "(at least) four different mechanisms" that relate to Goodhart's Law. This paper explores these mechanisms further and specifies more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in artificial intelligence alignment. The importance of Goodhart effects depends on the amount of power directed toward optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes these failures especially critical for that field.

Varieties of Goodhart-like Phenomena

As used in this paper, a Goodhart effect occurs when optimization causes a collapse of the statistical relationship between the goal the optimizer intends and the proxy used for that goal. The four categories of Goodhart effects introduced by Garrabrant are 1) Regressional, where selection for an imperfect proxy also selects for the noise in that proxy; 2) Causal, where intervening on the proxy fails to affect the goal because the relationship between them is not causal; 3) Extremal, where the relationship between proxy and goal breaks down in the region of extreme proxy values; and 4) Adversarial, where another agent optimizes the proxy in pursuit of its own, different goal.

[1] As a historical note, Goodhart's Law as originally formulated states that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." This has been interpreted and explained more widely, perhaps to the point where it is ambiguous what the term means.
Other closely related formulations, such as Campbell's law (which arguably has scholarly precedence) and the Lucas critique, were also initially specific, and their interpretations have likewise been greatly expanded. Lastly, the Cobra Effect and perverse incentives are often closely related to these failures, and the different effects interact. Because none of the terms were laid out formally, the categories proposed here do not match exactly what was originally discussed. A separate forthcoming paper intends to address more formally the relationship between those formulations and the categories explained here.
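The regressional effect described above can be made concrete with a small simulation. This is an illustrative sketch, not taken from the paper: the distributions (standard normal goal, independent normal noise) and the selection threshold (top 0.1% by proxy) are assumptions chosen for simplicity.

```python
import random

# Illustrative sketch (assumed setup, not from the paper): regressional
# Goodhart. The true goal V and a proxy U = V + noise are correlated, but
# selecting the items with the highest proxy values also selects for high
# noise, so the realized goal value falls short of what the proxy promised.

random.seed(0)

N = 100_000
goal = [random.gauss(0, 1) for _ in range(N)]    # true value V
proxy = [v + random.gauss(0, 1) for v in goal]   # proxy U = V + noise

# Optimize hard on the proxy: keep the top 0.1% of items by U.
top = sorted(range(N), key=lambda i: proxy[i], reverse=True)[:N // 1000]

mean_proxy = sum(proxy[i] for i in top) / len(top)
mean_goal = sum(goal[i] for i in top) / len(top)

print(f"mean proxy of selected items: {mean_proxy:.2f}")
print(f"mean goal of selected items:  {mean_goal:.2f}")
```

With equal goal and noise variances, the expected goal value of a selected item is only about half its proxy value (E[V|U] = U/2), so optimizing the proxy harder widens the gap between promised and realized value rather than closing it.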