In this report, we present our investigations into feature representation, the hand mask modality, past action prediction, and model ensembling for the EPIC-Kitchens Action Anticipation Challenge. Building upon an existing action anticipation model, RULSTM, our framework uses enhanced feature representations, places more emphasis on many-shot objects, and incorporates an additional hand mask modality. We also explore a network modification that incorporates past action prediction. Furthermore, we employ model ensembling to exploit all of the training data and to aggregate complementary information from different models. We achieved a top-1 action anticipation accuracy of 16.02% on Seen Kitchens (S1) and 10.11% on Unseen Kitchens (S2). Our submission, under the team name VI-I2R, ranked 2nd on both seen and unseen kitchens in terms of top-1 action anticipation accuracy.
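To illustrate how an ensemble can aggregate complementary information, the following is a minimal sketch that assumes late fusion by a weighted average of per-class action scores, followed by top-1 accuracy. The abstract does not specify how the models are combined, so the fusion rule, the weights, and the function names `ensemble_scores` and `top1_accuracy` are hypothetical, and the toy scores are invented for illustration only.

```python
from typing import List, Optional, Sequence

# Hypothetical layout: scores[i][c] is the score for sample i and action class c.
Scores = List[List[float]]


def ensemble_scores(model_scores: Sequence[Scores],
                    weights: Optional[Sequence[float]] = None) -> Scores:
    """Late fusion (assumed): weighted average of per-model class scores."""
    if not model_scores:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(model_scores)  # assumption: uniform weights by default
    total = sum(weights)
    num_samples = len(model_scores[0])
    num_classes = len(model_scores[0][0])
    fused = [[0.0] * num_classes for _ in range(num_samples)]
    for w, scores in zip(weights, model_scores):
        for i in range(num_samples):
            for c in range(num_classes):
                fused[i][c] += w * scores[i][c] / total
    return fused


def top1_accuracy(scores: Scores, labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(pred == label)
    return correct / len(labels)


# Toy scores (invented): each model is right on a different sample.
model_a = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
model_b = [[0.1, 0.6, 0.3], [0.1, 0.5, 0.4]]
labels = [1, 2]
fused = ensemble_scores([model_a, model_b])
```

In this toy case each model alone reaches a top-1 accuracy of 0.5, while the averaged scores classify both samples correctly. Score averaging is only one possible fusion choice; learned or validation-tuned weights are common alternatives.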