The room of our random experiments
by Chaitanya Kharyal
This page compiles the unpublished results related to my paper Reward Learning through Ranking Mean Squared Error. If you have any issues or questions, you can reach me by email.
A reviewer rightly pointed out that all our results were on either OpenAI Gym MuJoCo or the DeepMind Control Suite. Although these environments are standard in reward learning, we decided to also test R4 on Meta-World. On Meta-World, R4's advantage over RbRL is less pronounced than in the other environments: it still outperforms RbRL in one of the three tested environments and performs comparably in the other two.

Apart from Meta-World, we had previously tested R4 on Lunar Lander (discrete action space) and Hungry-Thirsty (tabular environment) in our workshop paper.
In our paper, we used the fast differentiable ranking operator (Blondel et al.) as our