Room no. 308

The room of our random experiments

Reward Learning through Ranking Mean Squared Error

by Chaitanya Kharyal

This page compiles the unpublished results relevant to my paper Reward Learning through Ranking Mean Squared Error. If you run into any issues or have questions, you can reach me by email.

Performance on MetaWorld Tasks

A reviewer rightly pointed out that all our results are on either OpenAI Gym MuJoCo or the DeepMind Control Suite. Even though these environments are standard in reward learning, we decided to also test R4 on MetaWorld. On MetaWorld, R4's advantage over RbRL is smaller than in the other environments, although it still outperforms RbRL in one of the three tested environments while performing comparably in the other two:

Results on MetaWorld

Apart from MetaWorld, we had previously tested R4 on Lunar Lander (discrete action space) and Hungry-Thirsty (tabular environment) in our workshop paper.

Performance using other Ranking Operators

In our paper, we used the fast differentiable ranking operator (Blondel et al.) as our