Dexterous manipulation of objects is a core task in robotics. Because controller design is complex even for manipulation tasks that are simple for humans, e.g., pouring a cup of tea, robots currently in use are mostly limited to specific tasks within known environments.
Disentangled representation learning aims to learn a task-independent representation that captures an underlying low-dimensional description of the observed world, factorized into variation factors, e.g., object properties, geometry, and lighting. Its theoretical advantages for supervised learning and reinforcement learning, e.g., data efficiency for subsequent machine learning, generalization, and the transferability and interpretability of learnt models, were highlighted in [Bengio et al. 2013], and a large body of literature has since emerged: face editing and recognition under varying factors such as face identity, lighting, pose, and facial expression [Tran et al.@CVPR2017][Marriott et al.@FG2020] in computer vision; contextualized word embeddings and language models [Vaswani et al.@NIPS2017][Brown et al. 2020] in NLP; and, more recently, [Higgins et al.@ICML2017][Whitney et al.@ICLR2020] in RL for robotics. Within this project, we build upon our expertise in disentangled latent feature space learning [Marriott et al.@CVPR2021] and manipulation subspace learning [Katyara et al.@IEEE TCDS2021], extending our previous research on representation learning to robot manipulation control. We will perform simulation-enabled self-supervised representation learning to learn embeddings of states and action sequences that capture the structure of the environment's dynamics, thereby improving data efficiency in our end-to-end RL for dexterous bimanual robotic manipulation.
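To make the disentanglement objective concrete, the sketch below illustrates the β-VAE loss of [Higgins et al.@ICML2017], which is one common route to a factorized latent space: a reconstruction term plus a β-weighted KL divergence pulling the latent posterior toward an isotropic Gaussian prior. This is a minimal NumPy illustration under stated assumptions (a diagonal Gaussian posterior and mean-squared reconstruction error), not the project's actual implementation; the function and variable names are hypothetical.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Per-sample beta-VAE objective: reconstruction error plus a
    beta-weighted KL divergence pressuring the latent posterior
    N(mu, diag(exp(log_var))) toward the prior N(0, I).
    A larger beta encourages more disentangled latent factors."""
    # Mean squared reconstruction error over observation dimensions.
    recon = np.mean((x - x_recon) ** 2, axis=-1)
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
    return recon + beta * kl

# Example: a perfect reconstruction with a posterior equal to the prior
# yields zero loss.
x = np.zeros((1, 8))
loss = beta_vae_loss(x, x, mu=np.zeros((1, 4)), log_var=np.zeros((1, 4)))
```

In the robotic setting, `x` would be an observation (e.g., an image or proprioceptive state), and the same objective can be applied to sequences of states and actions to shape the learned embedding.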
For further details, see: https://docs.google.com/document/d/1Sl0ptYZvOzS9_44mLvR9dzckNxL0KObpfHWSe0paNHQ/edit?usp=sharing