Reproducing Kindred’s UR-Reacher-2 Learning Experiment

This guest blog post was written by Oliver Limoyo, a PhD student from the STARS laboratory at the University of Toronto, studying reinforcement learning for robotic applications. Oliver was one of the early testers for SenseAct.

Achieving reproducible results in reinforcement learning experiments can be notoriously difficult; with the added hardware artifacts of physical robots, this becomes an even more challenging feat. Having said this, I was very curious to try out SenseAct, an open-source framework optimized for real-time reinforcement learning and reproducible results. Although there are existing general solutions in robotics such as the Robot Operating System (ROS), these are not specifically optimized for reinforcement learning experiments. As a bit of background, our lab is very interested in collaborative robots (cobots) that are intended to physically interact with humans in shared workspaces. For my research, I am investigating the use of learning-based approaches to enable these robots to perform difficult-to-model tasks in unstructured and dynamic environments -- in particular, how to enable reinforcement learning on physical platforms by leveraging effective perceptual representations of sensor data acquired through interaction.

In this guest blog post, I will present a summary of my experiences working with SenseAct to reproduce the UR-Reacher-2 experiment on our lab’s UR10 arm, which was initially demonstrated with a UR5 arm by Kindred. I’ll highlight how my experience working with SenseAct went overall, which parts of the process were easy, what obstacles I encountered and how they were overcome.

Overview of experience

As a first step, since the UR10 is a larger version of the UR5, some of the setup parameters had to be modified, notably the geometric parameters of the manipulator model and the Cartesian safety boundaries. Fortunately, this was very easy, since an existing setup file keeps all of the relevant parameters in one place.
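
To give a rough idea of what a Cartesian safety boundary amounts to, here is a minimal sketch in Python. It is not SenseAct's actual setup file: the `SafetyBox` class, its field names, and the numbers are all hypothetical and chosen only for illustration. The point is that the boundary is a box in the robot's base frame, and a longer-reach arm needs a box sized to match.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SafetyBox:
    """Hypothetical axis-aligned box (metres) the end effector must stay inside."""
    x_range: Tuple[float, float]
    y_range: Tuple[float, float]
    z_range: Tuple[float, float]

    def contains(self, position: Tuple[float, float, float]) -> bool:
        """Return True if the (x, y, z) position lies inside the box."""
        ranges = (self.x_range, self.y_range, self.z_range)
        return all(low <= value <= high
                   for value, (low, high) in zip(position, ranges))


# Illustrative numbers only, not the values used in the experiment.
SMALL_ARM_BOX = SafetyBox((-0.1, 0.5), (-0.3, 0.3), (0.1, 0.8))
LARGE_ARM_BOX = SafetyBox((-0.1, 0.9), (-0.5, 0.5), (0.1, 1.2))

if __name__ == "__main__":
    point = (0.7, 0.0, 0.5)
    print(SMALL_ARM_BOX.contains(point), LARGE_ARM_BOX.contains(point))
```

A position that the larger box accepts can fall outside the smaller one, which is why the boundaries have to be revisited when switching arms.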

After modifying these parameters, I connected my laptop to the robot and ran the SenseAct UR-Reacher-2 example code. The arm did not move. This was my first obstacle: an error stating that the size of the packet received from the arm was incorrect. After referring to UR’s client interface guide, I discovered that the packet definition of the underlying firmware changes from version to version. The fix was relatively quick, given that there was an existing file with all of the required definitions. I ran the code again and, this time, no errors showed up and the arm started moving:


Initial performance of the arm. The excessive vibrations and jitter prevented the arm from learning any sort of effective policy.
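
As an aside on the packet-size error mentioned above, the kind of check involved can be sketched as follows. This is not SenseAct's code. It assumes each state packet begins with a 4-byte big-endian integer giving the total message length, and the firmware labels and sizes in `EXPECTED_PACKET_SIZE` are placeholders; the real values have to come from UR's client interface guide for the firmware in use.

```python
import struct

# Placeholder sizes in bytes, keyed by hypothetical firmware labels.
EXPECTED_PACKET_SIZE = {"firmware_a": 1044, "firmware_b": 1108}


def check_packet_size(packet: bytes, firmware: str) -> int:
    """Validate a packet against the size expected for a firmware version.

    Assumes the first four bytes hold the total message length as a
    big-endian signed integer. Returns the declared length on success.
    """
    if len(packet) < 4:
        raise ValueError("packet too short to contain a length prefix")
    (declared,) = struct.unpack(">i", packet[:4])
    expected = EXPECTED_PACKET_SIZE[firmware]
    if declared != expected or len(packet) != expected:
        raise ValueError(
            f"packet size mismatch: declared {declared}, received "
            f"{len(packet)}, expected {expected} for {firmware}"
        )
    return declared


if __name__ == "__main__":
    # Build a fake packet: length prefix followed by zero padding.
    fake = struct.pack(">i", 1044) + bytes(1040)
    print(check_packet_size(fake, "firmware_a"))
```

With a definition file written for one firmware version and a robot running another, a check like this fails immediately, which matches the symptom of the arm not moving at all.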


Unfortunately, the arm’s motion was much more jittery than in Kindred’s initial release video. The excessive vibrations prevented the arm from learning any sort of effective policy, even after training for 3 hours. After reading a relevant issue in the GitHub repository for the UR arm ROS wrapper, I found out that the underlying arm firmware handles incoming action commands slightly differently across versions; specifically, the setting for the velocity command’s timing had to be tuned for our firmware. Once this was done, I ran the experiment one more time:


The results after training for about 45 minutes.
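
For readers wondering what a timing setting on a velocity command might look like, here is a hedged sketch. It assumes the setting in question is the time argument of URScript's `speedj(qd, a, t)` joint-velocity command, which the post itself does not state. The helper `format_speedj` and the example values are hypothetical and are not the values used in the experiment.

```python
from typing import Sequence


def format_speedj(joint_velocities: Sequence[float],
                  acceleration: float,
                  duration: float) -> str:
    """Build a URScript speedj command string for a six-joint arm.

    joint_velocities: target joint speeds in rad/s (six values).
    acceleration: joint acceleration in rad/s^2.
    duration: the command's time argument in seconds; this is the
        value assumed here to need tuning per firmware version.
    """
    if len(joint_velocities) != 6:
        raise ValueError("expected six joint velocities")
    qd = ", ".join(f"{v:.4f}" for v in joint_velocities)
    return f"speedj([{qd}], {acceleration:.4f}, {duration:.4f})\n"


if __name__ == "__main__":
    # Illustrative values only.
    print(format_speedj([0.1, 0.0, 0.0, 0.0, 0.0, 0.0], 1.4, 0.008), end="")
```

If the time argument is interpreted differently from one firmware version to the next, the same string can yield smooth motion on one controller and stop-and-go motion on another, which would be consistent with the jitter described above.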


The algorithm was able to successfully learn an effective policy. In order to further test the reproducibility, I tried re-running the experiment with the same randomization seed. The two final learning curves are shown below:

Two learning curves from different trials using the same randomization seed.


To my surprise, the final learning curves were practically identical! Although this was a relatively basic experiment, it was a promising first step considering the complexity of the overall system.
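
To make concrete what "the same randomization seed" controls, here is a minimal sketch. It is not the experiment's code: `sample_targets` is a hypothetical stand-in for any pseudo-random part of a run, such as target positions or network initialization. A fixed seed makes those parts repeat exactly; it does nothing about timing or sensor noise on the physical arm, which is why near-identical curves on real hardware are notable.

```python
import random
from typing import List, Tuple


def sample_targets(seed: int, n_targets: int = 5) -> List[Tuple[float, float]]:
    """Draw a reproducible sequence of hypothetical 2D target positions."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            for _ in range(n_targets)]


if __name__ == "__main__":
    first_run = sample_targets(seed=1)
    second_run = sample_targets(seed=1)
    print(first_run == second_run)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sequence unaffected by other code that also draws random numbers.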

Concluding thoughts

A key insight I gained from this experiment with real-world reinforcement learning is the importance of understanding the underlying firmware of a robotic platform and its interplay with a reinforcement learning algorithm. Perhaps it’s obvious in hindsight, but no matter how good a learning algorithm is, if the underlying system (e.g., the action command being sent) is operating incorrectly, the algorithm is unlikely to learn a useful policy.

Working with real hardware introduces an additional layer of complexity in a system that already has many moving parts. With that said, Kindred’s SenseAct definitely smooths over many of these issues, and all of the obstacles I encountered were dealt with relatively quickly. Having worked a bit with both robots and reinforcement learning algorithms, each finicky and unforgiving in its own right, I was extremely surprised to be able to get a real-world reinforcement learning experiment up and running in a matter of days. I am excited to continue using SenseAct for my own research work and to continue contributing to the project!