Abstract
Paired Open-Ended Trailblazer (POET) and its variants represent the state of the art in auto-curriculum generation, in which environments are co-evolved with agents to simultaneously explore the space of possible problems and their solutions. However, we observe that distinct POET agents often explore similar behavior spaces. To address this, we present Curious POET, in which an intrinsically curious oracle tracks an evolving Enhanced POET (ePOET) population and rewards agents for novel behavior, leading to more efficient behavior exploration. To evaluate agent populations fairly, we introduce a training-independent strategy for environment generation and define a coverage metric over these environments. We demonstrate our approach on the enhanced Bipedal Walker environment and find that Curious POET outperforms ePOET on both environment coverage and population cross-evaluation. Our study explores how a curious oracle can bias the evolution of individual agents so as to speed up behavioral exploration at the population level. Our implementation is available at https://github.com/act3-ace/Curious-POET.