Machine remaining useful life (RUL) prediction is vital in improving the reliability of industrial systems and reducing maintenance cost. Recently, long short-term memory (LSTM)-based algorithms have achieved state-of-the-art performance for RUL prediction, due to their strong capability of modeling sequential sensory data.In many cases, the RUL prediction algorithms are required to be deployed on edge devices to support real-time decision making, reduce the data communication cost and preserve the data privacy. However, the powerful LSTM-based methods which have high complexity cannot be deployed to edge devices with limited computational power and memory. To solve this problem, we propose a knowledge distillation framework, entitled KDnet-RUL, to compress a complex LSTM-based method for RUL prediction. Specifically, it includes a generative adversarial network based knowledge distillation (GAN-KD) for disparate architecture knowledge transfer, a learning-during-teaching based knowledge distillation (LDT-KD) for identical architecture knowledge transfer and a sequential distillation upon LDT-KD for complicated datasets. We leverage simple and complicated datasets to verify the effectiveness of the proposed KDnet-RUL. The results demonstrate that the proposed method significantly outperforms state-of-the-art KD methods. The compressed model with 12.8 times less weights and 46.2 times less total float point operations even achieves a comparable performance with the complex LSTM model for RUL prediction.