1 Introduction

Lithium-ion batteries are now used in a wide range of applications, from portable devices to grid-level energy storage, owing to their high energy density, high power density, long lifetime, and falling cost (Chen T. et al., 2021; Severson et al., 2019). However, a long battery lifetime impedes battery development because it takes months or years to observe the deterioration. Moreover, despite standardized manufacturing processes, batteries from the same batch can have significantly different lifetimes due to internal heterogeneity (porosity, thickness, etc.) and different operating conditions. Therefore, early-stage lifetime prediction methods are crucial for assessing batteries in advance and shortening the required experimental time, which can accelerate battery research, production, and design optimization (Chen B. R. et al., 2021).

Existing studies on battery lifetime prediction can generally be divided into two groups: model-based methods and data-driven methods. In model-based methods, researchers use either an empirical model with explicit parameters (Schmalstieg et al., 2014) or a model (equivalent circuit model or electrochemical model) combined with advanced filtering algorithms to estimate the aging status (He et al., 2011; Arachchige et al., 2017; Wassiliadis et al., 2018). Xing et al. (2013) used a particle filter to update the parameters of an empirical exponential and polynomial regression model to track the battery’s degradation trend. With a simple battery model, Saha et al. (2009) proposed a particle filter method to predict the state of charge (SOC), state of health (SOH), and remaining useful life (RUL) based on the correlations between battery capacity and resistance. Yang et al.
(2019) implemented a particle filter with a semi-empirical model based on Coulombic efficiency, which is highly correlated with the loss of active lithium inventory, to estimate battery health. To facilitate the resampling process within the particle filter, Tang et al. (2019) proposed a model-oriented gradient-correction particle filter method for predicting future degradation. By using the base model to regulate the evolution of the particles, the method exploits global information from the base model and achieves a better prediction result. Gao et al. (2022) proposed a SOC and SOH co-estimation framework comprising a simplified electrochemical model and a dual nonlinear filter. Compared with a purely mathematical model, which does not adapt to the real-time behavior of the battery, the filter-based prediction approach treats parameters as state variables that are identified online from real-time data. Therefore, compared with empirical models, filter methods offer better precision and accuracy. However, they still have some drawbacks: 1) their performance is greatly affected by the underlying battery degradation model; 2) early-cycle prediction remains a challenge for these methods because little capacity is lost in the early cycles (Fei et al., 2022).

Unlike model-based methods, statistical and machine learning approaches can infer directly from cycle data and offer a more general way to predict battery lifetime in the early stages of operation. These approaches have become more attractive with recent improvements in algorithms and computational power and the growing availability of battery cycling data. Many studies have applied these methods to engineering problems such as computational fluid dynamics and molecular design (Reich, 1997; Liakos et al., 2018; Sanchez-Lengeling and Aspuru-Guzik, 2018; Brunton et al., 2020; Hegde and Rokseth, 2020; Mendez, 2022).
These techniques have likewise been applied to battery lifetime prediction. Using extracted features such as the previous capacity trend, temperature, and depth of discharge, Liu et al. (2019) predicted cyclic aging with Gaussian process regression (GPR) and a modified kernel that reflects the electrochemical behavior. To further improve model performance, instead of using features alone to construct a regression model, a base model is first fitted to learn the battery’s long-term life information, and then a migrated mean function and a migrated-GPR model are used to predict the fading curve from the first 30% of the data (Liu et al., 2022). Besides GPR (Richardson et al., 2017, 2019a, 2019b), the recurrent neural network (RNN), especially the long short-term memory (LSTM) network (Zhang et al., 2018; Gupta et al., 2021; Hu et al., 2021, 2022; Li et al., 2021; Uddin et al., 2022), is commonly used in battery fading curve prediction because of its strong ability to handle time-series data. Zhang et al. (2018) used the LSTM network to synthesize a data-driven battery RUL predictor. Dropout is applied to avoid overfitting, and Monte Carlo (MC) simulation is used to generate the RUL prediction uncertainties. Furthermore, to combine the advantages of the LSTM network and GPR, Liu et al. (2021) decomposed the capacity data with the empirical mode decomposition (EMD) method and fed the decomposed components to the LSTM and the GPR, respectively. As a result, the long-term dependency of capacity is captured by the LSTM network, while the uncertainty caused by capacity regeneration is quantified by the GPR.
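To make the GPR idea above more concrete, the following minimal sketch fits a Gaussian process with a non-zero prior mean to a few early-cycle capacity values and returns a predictive mean and variance at later cycles. It is only an illustration under stated assumptions: the squared-exponential kernel, the hyperparameters (`length`, `amp`, `noise`), and the `mean_fn` prior are hypothetical choices, not the modified kernel or migrated mean function of Liu et al. (2019, 2022).

```python
import math

def rbf(a, b, length=50.0, amp=0.05):
    """Squared-exponential kernel between two cycle numbers (assumed form)."""
    return amp ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(x_train, y_train, x_test, noise=2e-3, mean_fn=lambda x: 0.0):
    """Return (mean, variance) of the latent capacity at each test cycle.

    mean_fn is a hypothetical prior mean (e.g. a base fade model); the GP
    only models the residual between the data and this prior.
    """
    resid = [y - mean_fn(x) for x, y in zip(x_train, y_train)]
    K = [[rbf(a, b) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, resid))
    out = []
    for xs in x_test:
        ks = [rbf(xs, a) for a in x_train]
        mu = mean_fn(xs) + sum(k * a for k, a in zip(ks, alpha))
        v = solve_lower(L, ks)
        var = rbf(xs, xs) - sum(vi * vi for vi in v)
        out.append((mu, max(var, 0.0)))
    return out

if __name__ == "__main__":
    # Synthetic early-cycle data (illustrative only): linear fade from 1.1 Ah.
    cycles = [10.0 * i for i in range(11)]
    caps = [1.1 - 0.0005 * c for c in cycles]
    for c, (mu, var) in zip([50.0, 300.0],
                            gp_predict(cycles, caps, [50.0, 300.0],
                                       mean_fn=lambda c: 1.1)):
        print(c, mu, math.sqrt(var))
```

Far from the training cycles the prediction falls back to `mean_fn` and the variance grows toward the prior variance, which is why the choice of prior mean matters for early-stage extrapolation.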
Other methods, such as deep neural networks (Hsu et al., 2022), linear regression with elastic net (Severson et al., 2019), random forests (Yang et al., 2022), and stacked denoising autoencoders (Xu et al., 2021), have been used to predict RUL or to extrapolate the battery fading curve from early-cycle data.

Early-stage lifetime prediction with limited data is crucial for battery development and deployment and remains a challenge for researchers. During the early stage, batteries undergo a formation process in which the electrochemical behavior differs from that in later operation. For example, the capacity of many batteries increases in the early stage, a behavior that is not yet fully understood (Guo et al., 2022). As a result, capacity changes during the early stage are relatively small, which makes predicting lifetime from early-cycle data challenging. Existing methods generally require 40%–70% of the historical data of the entire battery lifetime to estimate the model parameters or train a data-driven model (Hu et al., 2020). Therefore, careful feature engineering is needed to generate features that correlate strongly with lifetime even when only limited cycle data are available.

Feature engineering is the process of selecting, manipulating, and transforming raw data into features that can be used in model development. Proper feature engineering can ease the modeling difficulty and enable the model to output results of higher quality (Zheng and Casari, 2018). Generally, features for battery prognosis can be derived from 1) raw data (voltage/current/temperature-time curves); 2) incremental capacity and differential voltage analysis; 3) directly measured variables; 4) statistical metrics; and 5) extraction from a deep neural network with raw data input (Fei et al., 2022).
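As a small illustration of the "statistical metrics" category, the sketch below summarizes the difference between the discharge-capacity curves of two early cycles with a few scalar features. The function name `delta_q_features`, the choice of metrics, and the assumption that both curves are sampled on the same voltage grid are hypothetical and only meant to show the general pattern, not the feature set of any cited study.

```python
import math
import statistics

def delta_q_features(q_ref, q_later):
    """Summarize the capacity-curve change between two cycles.

    q_ref, q_later: discharge capacity (Ah) of a reference cycle and a later
    cycle, sampled at the same voltage points (an assumption of this sketch).
    """
    if len(q_ref) != len(q_later) or len(q_ref) < 2:
        raise ValueError("curves must have equal length of at least 2")
    # Point-wise capacity difference between the later and reference cycle.
    dq = [b - a for a, b in zip(q_ref, q_later)]
    var = statistics.pvariance(dq)
    return {
        "mean": statistics.fmean(dq),
        "min": min(dq),
        "variance": var,
        # A log scale is convenient because the variance spans orders of
        # magnitude; identical curves give -inf.
        "log10_variance": math.log10(var) if var > 0 else float("-inf"),
    }

if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    q_ref = [1.00, 0.80, 0.50, 0.20]
    q_later = [0.98, 0.77, 0.46, 0.18]
    print(delta_q_features(q_ref, q_later))
```

The resulting dictionary can be used as one row of a feature table, with one row per cell, as input to a regression model.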
These features are extracted through two feature extraction techniques: 1) traditional knowledge-guided feature extraction and 2) deep-learning-based automatic feature extraction. The traditional method ensures that the extracted features are relevant to lifetime prediction and have a physical meaning. For example, temperature is often used as a feature: as batteries lose capacity, the internal resistance increases, resulting in a higher temperature during operation. In contrast to physics-knowledge-guided features, features extracted by a deep neural network are hard to interpret and lack physical meaning because of the black-box nature of the network. Nevertheless, extracting features from time-series data, especially physics-guided features, has seldom been studied. Fei et al. (2022) construct the features from 2nd, 10th,