T. Cheng, A. Dairi, F. Harrou, Y. Sun, T.O. Leiknes
IEEE Access, volume 7, pp. 108827-108837, (2019)
Wastewater, Influent condition, Machine learning, Kernel, Support vector machine, Data visualization
To operate wastewater treatment plants (WWTPs) with optimized efficiency, influent conditions (ICs) as initial states of inflow fed to WWTPs were monitored to identify potential anomalies that would trigger adverse events or system crash. To employ voluminous measurements for data-driven decisions, the non-linear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, hetero-skedastic, case-specific nature of multivariate environmental datasets must be considered. This research proposed kernel machine learning models, the kernel principal components analysis based one-class support vector machine (KPCA-OCSVM) with various kernels, to learn anomaly-free training set then classify the testing set. A seven-years multivariate ICs time series was introduced with exploratory analysis performed to reveal temporal behaviors and statistical properties. KPCA with polynomial kernels sufficiently output representative features, based on which OCSVM with Gaussian kernels sensitively and specifically identified anomalies in ICs that were previously omitted by WWTP operators. The proposed kernel algorithms surpassed previous linear PCA-based K-nearest-neighbors models, and improved outcomes with limited increase in computation cost. Without requiring linear, Gaussian, stationary, independent, and homo-skedastic qualities from data, the proposed flexible environmental data science approach could be transferred, rebuilt, and tuned conveniently for ICs from different WWTPs.