Shi, X., Wong, Y. D., Chai, C., Li, M. Z.-F., Chen, T., & Zeng, Z. (2022). Automatic Clustering for Unsupervised Risk Diagnosis of Vehicle Driving for Smart Road. IEEE Transactions on Intelligent Transportation Systems, 23(10), 17451–17465. https://doi.org/10.1109/tits.2022.3166838
Early risk diagnosis and driving anomaly detection from vehicle streams are of great benefit to a range of advanced solutions for Smart Road and crash prevention, but they face intrinsic challenges, especially the lack of ground truth and the definition of multiple risk exposures. This study proposes a domain-specific automatic clustering method (termed AutoCluster) that self-learns the optimal models for unsupervised risk assessment by integrating the key steps of clustering into an auto-optimisable pipeline: feature selection, algorithm selection, and hyperparameter auto-tuning. Firstly, based on surrogate conflict measures, a series of risk indicator features is constructed to represent temporal-spatial and kinematical risk exposures, and an unsupervised feature selection method based on elimination-based model reliance importance (EMRI) is developed to identify the useful features. Secondly, a balanced Silhouette Index (bSI) is proposed to evaluate the internal quality of imbalanced clustering, and a loss function is designed that considers clustering performance in terms of internal quality, inter-cluster variation, and model stability. Thirdly, based on Bayesian optimisation, algorithm auto-selection and hyperparameter auto-tuning are self-learned to generate the best clustering results. NGSIM vehicle trajectory data are used as the test bed. Findings show that AutoCluster is reliable and promising for diagnosing multiple distinct risk levels inherent to generalised driving behaviour. We also examine aspects of risk clustering, including algorithm heterogeneity, Silhouette analysis, and hierarchical clustering flows. AutoCluster additionally serves as a method for unsupervised data labelling and indicator threshold calibration, and it is useful for tackling the challenges of imbalanced clustering without ground truth or a priori knowledge.
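The abstract does not give the formula for bSI, so the following is only an illustrative sketch of why a class-balanced silhouette may matter for imbalanced risk clusters. It assumes one plausible reading: compute the standard per-sample silhouette, average it within each cluster, then average those cluster means with equal weight. The function names (`silhouette_values`, `mean_silhouette`, `balanced_silhouette`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def silhouette_values(points, labels):
    """Standard per-sample silhouette s = (b - a) / max(a, b), Euclidean distance."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    """Ordinary silhouette score: every sample has equal weight."""
    s = silhouette_values(points, labels)
    return sum(s) / len(s)


def balanced_silhouette(points, labels):
    """ASSUMED stand-in for bSI: every cluster has equal weight."""
    s = silhouette_values(points, labels)
    per_cluster = defaultdict(list)
    for value, lab in zip(s, labels):
        per_cluster[lab].append(value)
    cluster_means = [sum(v) / len(v) for v in per_cluster.values()]
    return sum(cluster_means) / len(cluster_means)


# Toy 1-D data: a large, tight "low-risk" cluster and a small, loose "high-risk" one.
toy_points = [(0.0,)] * 8 + [(10.0,), (4.0,)]
toy_labels = [0] * 8 + [1] * 2

print("sample-weighted silhouette :", round(mean_silhouette(toy_points, toy_labels), 4))
print("cluster-weighted silhouette:", round(balanced_silhouette(toy_points, toy_labels), 4))
```

On this toy data the sample-weighted score is dominated by the large, well-formed cluster, while the cluster-weighted score is pulled down by the poorly formed minority cluster. That is the kind of effect an imbalance-aware index is meant to expose; the paper's actual bSI may be defined differently.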
This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB1600502, in part by the Chinese National Science Foundation under Grant 61803283, in part by the Shanghai Municipal Education Commission and Shanghai Education Development Foundation under the “Chen Guang” Project (18CG17), and in part by the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100).