Application of Support Vector Regression in Krylov Solvers

Rehana Thalib, Maharani Abu Bakar and Nur Fadhilah Ibrahim, "Application of Support Vector Regression in Krylov Solvers", Annals of Emerging Technologies in Computing (AETiC), Print ISSN: 2516-0281, Online ISSN: 2516-029X, pp. 178-186, Vol. 5, No. 5, 20th March 2021, Published by International Association of Educators and Researchers (IAER), DOI: 10.33166/AETiC.2021.05.022, Available: http://aetic.theiaer.org/archive/v5/v5n5/22.html. Research Article


Introduction
Machine learning (ML) is a recent technology used broadly across many disciplines. Literally, ML means that a computer learns the behaviour or pattern of some given data [1]. The pattern is then used for classification or prediction. The given data are called the training data, while the prediction is called the output. To carry out classification and prediction, ML relies on techniques such as the support vector machine (SVM), support vector regression (SVR), and artificial neural networks (ANN) [2]. SVM and SVR differ in purpose: the first aims to classify, while the second aims to regress. Unlike general prediction tasks, in this study we use SVR to predict an approximate solution of a system of linear equations (SLEs) based on the features of the iterates generated by an iterative method, the Lanczos algorithm, which is a Krylov-based solver. Lanczos-type solvers revise the original method proposed by Cornelius Lanczos [3] by introducing the theory of formal orthogonal polynomials (FOPs), which enables us to create recurrence relationships between the orthogonal polynomials [4]. This leads to two further classes of Krylov iterative methods, called Baheux-types [4] and Farooq-types [5]. The difference between the two is that Baheux-types are built from three-term recurrence relationships among P_k, P_{k-1}, P_{k-2}, P_k^(1), P_{k-1}^(1), and P_{k-2}^(1), whereas Farooq-types extend Baheux-types with the additional terms P_{k-3} and P_{k-3}^(1). Farooq-types therefore need more computation of the coefficients of P_k than Baheux-types; however, they are more robust in finding a good solution [5].
Recent advances in Lanczos-type methods have focused on dealing with breakdown, for instance restarting Lanczos-types [6] and switching Lanczos-types [7]. Restarting enables us to restart a Lanczos solver whenever it faces breakdown; it involves choosing a good-quality iterate to restart with [8]. Switching, on the other hand, runs several Lanczos-type solvers alternately, switching whenever breakdown occurs. Furthermore, [9] introduced a prediction model to find an approximate solution of SLEs. The resulting methods are EIEMLA [10] and MEIEMLA [11]; the latter revises the former in the way it interpolates the iterates generated by the Lanczos-types. This study revisits this prediction model using SVR and regular regression. The benefit of these two prediction tools is that they are more accurate than the extrapolation method. SVR in particular is chosen here because it suits our present data set.

Lanczos-Type Algorithms
Lanczos-type algorithms are well known as effective iterative methods for solving a nonsymmetric system Ax = b, where A ∈ ℝ^(n×n) and x, b ∈ ℝⁿ. They are Krylov-based methods that employ the theory of formal orthogonal polynomials: one defines a linear functional c satisfying c(x^i P_k(x)) = 0, for i = 0, 1, …, k − 1, where k < n and n is the dimension of the SLE, in order to compute the coefficients of the polynomials P_k [3]. The approximate solution is obtained through the residual relation r_k = b − A x_k, without computing the inverse of A. This leads to several variants of Lanczos algorithms, such as Orthores (A4), Orthomin, and Orthodir [4], as well as A12 and A13/B6 [3].
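To illustrate the three-term recurrences underlying these methods, the following is a minimal sketch of the classical two-sided (nonsymmetric) Lanczos biorthogonalization in Python/NumPy. It is a textbook variant, not the authors' implementation; the breakdown threshold `1e-14` and the demo matrix are illustrative assumptions.

```python
import numpy as np

def lanczos_biorth(A, v1, w1, m):
    """Two-sided (nonsymmetric) Lanczos biorthogonalization (textbook sketch).

    Builds bases V, W satisfying W^T V = I via three-term recurrences.
    Breakdown is exactly the division by zero that occurs when the
    inner product (v_hat, w_hat) vanishes."""
    n = A.shape[0]
    V = np.zeros((n, m))
    W = np.zeros((n, m))
    V[:, 0] = v1
    W[:, 0] = w1 / (w1 @ v1)          # normalize so that (v1, w1) = 1
    beta = delta = 0.0
    for j in range(m - 1):
        alpha = W[:, j] @ (A @ V[:, j])
        v_hat = A @ V[:, j] - alpha * V[:, j] - (beta * V[:, j - 1] if j > 0 else 0.0)
        w_hat = A.T @ W[:, j] - alpha * W[:, j] - (delta * W[:, j - 1] if j > 0 else 0.0)
        s = w_hat @ v_hat
        if abs(s) < 1e-14:             # serious breakdown: cannot divide
            raise ZeroDivisionError("Lanczos breakdown: (v_hat, w_hat) = 0")
        delta = np.sqrt(abs(s))
        beta = s / delta
        V[:, j + 1] = v_hat / delta
        W[:, j + 1] = w_hat / beta
    return V, W

# Small demo on a random nonsymmetric matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
e1 = np.zeros(8); e1[0] = 1.0
V, W = lanczos_biorth(A, e1, e1.copy(), 5)
```

In exact arithmetic W^T V is the identity; in floating point it holds to machine precision for a few steps, which is the property Lanczos-type solvers exploit.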
The one drawback that virtually all Krylov-based solvers share is breakdown, which makes their convergence unstable. It is commonly caused by a division by zero in the Lanczos formulae. A detailed discussion of this phenomenon is beyond the scope of this paper; however, the breakdown issue is what motivates our investigation of the prediction model. In particular, we are interested in the pattern of the iterates generated by the Lanczos algorithms. Good studies of this can be found in [10] and [11], where both articles discuss a prediction model using interpolation and extrapolation methods, based on the pattern that persists in the Lanczos vector sequences. This leads to at least two methods, called EIEMLA and MEIEMLA.

EIEMLA and MEIEMLA in the Solution Prediction of SLEs
One strategy to deal with breakdown in Krylov-based solvers such as the Lanczos-type algorithms is to predict the solution. Let a Lanczos-type solver run on a nonsymmetric SLE, Ax = b, where A ∈ ℝ^(n×n) and x, b ∈ ℝⁿ, for a fixed number of iterations, say m, or until the Lanczos algorithm faces breakdown. The sequence S = {x₁, x₂, …, x_k, …, x_m} of all iterates can be visualized using a parallel coordinate system, as follows.

Figure 1. Parallel coordinate system representation of iterates generated by Orthodir [9]

As can be seen from Figure 1, the bold curves are the pattern formed by the iterates with small residual norms. The idea is then to model the iterates by an interpolation method, specifically PCHIP (piecewise cubic Hermite interpolating polynomial). An extrapolation method is then needed to obtain an approximate solution that preserves the properties of the sequence S. This procedure was implemented in the method called the embedded interpolation and extrapolation model in Lanczos-types (EIEMLA) [9].
In EIEMLA, the interpolation is carried out iterate by iterate: over all entries of the first iterate, {x₁^(1), x₂^(1), …, x_n^(1)}, then over all entries of the second iterate, {x₁^(2), x₂^(2), …, x_n^(2)}, and so on. MEIEMLA revised EIEMLA by interpolating all iterates in the sequence at once. Both EIEMLA and MEIEMLA predict the approximate solution of the SLE within the interval [1, m], where m < n. In this study, we use regression methods, which allow us to predict outside this range, i.e. on [1, m₁] for m₁ > m. The benefit of this kind of prediction is that we obtain a new iterate that does not lie in the Krylov basis; in particular, when it is employed in the restarting framework, the next approximate solutions are entirely different from the Krylov basis. It would be interesting to implement a similar procedure in other iterative methods.
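The component-wise PCHIP-then-extrapolate step described above can be sketched as follows. This is a toy stand-in, not the authors' code: the random walk `X` imitates an iterate sequence, and evaluating each fitted curve past the index range [1, m] plays the role of the extrapolation step.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Toy stand-in for a sequence of m iterates in R^n (rows = iterates).
m, n = 10, 4
t = np.arange(1, m + 1)                       # iteration index 1..m
X = np.cumsum(np.random.default_rng(1).normal(size=(m, n)), axis=0)

# EIEMLA-style: fit a shape-preserving PCHIP curve to each component
# over [1, m], then evaluate beyond m to produce a candidate iterate.
curves = [PchipInterpolator(t, X[:, j], extrapolate=True) for j in range(n)]
x_new = np.array([float(c(m + 1)) for c in curves])  # predicted iterate at index m+1
```

PCHIP is chosen here because, unlike ordinary cubic splines, it does not overshoot the data, which helps preserve the pattern of the sequence S.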

Support Vector Regression
Support Vector Regression (SVR) is one of the main applications of the Support Vector Machine (SVM) for solving regression problems [12]. Following [13], consider a training data set {(x₁, y₁), (x₂, y₂), …, (x_ℓ, y_ℓ)} ⊂ X × ℝ, where X denotes the space of the input patterns. The main goal of SVR is to find a function f that deviates from the actually obtained targets yᵢ by at most ε for all the training data. In short, SVR seeks a function f that approximates the output to the actual target within a tolerance error of ε. The regression function f(x) is described as [12]:

f(x) = ⟨w, x⟩ + b,

where w ∈ X and b ∈ ℝ. The coefficients w and b are estimated by minimizing the risk function:

minimize   (1/2)‖w‖² + C Σᵢ₌₁..ℓ (ξᵢ + ξᵢ*)
subject to  yᵢ − ⟨w, xᵢ⟩ − b ≤ ε + ξᵢ,
            ⟨w, xᵢ⟩ + b − yᵢ ≤ ε + ξᵢ*,
            ξᵢ, ξᵢ* ≥ 0,

where C is a constant greater than 0, and ξᵢ, ξᵢ* are slack variables that cope with otherwise infeasible constraints of the optimization problem.

Hybrid Restarting SVR-Lanczos
We adopt the procedure of MEIEMLA to predict the approximate solution of the SLE after collecting a data set of iterates generated by a Lanczos-type algorithm; in this case, we employed the Orthodir algorithm. In general, this study has three stages: collecting the data set by running the Orthodir algorithm, predicting the next point using SVR, and restarting the hybrid SVR-L based on the predicted data.
In collecting the data set, we fixed 100 iterations for each run of the Orthodir algorithm, assuming that the solution is found within 100 iterations (or before breakdown occurs). Next, we collect all 100 iterates, {x₁, x₂, …, x₁₀₀}, and their corresponding residual norms, {‖r₁‖, ‖r₂‖, …, ‖r₁₀₀‖}, and use them as the training data in the SVR process. In the next stage, we use the iterate with the minimum residual norm as our response variable. The idea behind this is that the predicted solution will be similar to that iterate but will not lie in the Krylov basis, so it does not carry the inherent breakdown. Up to this step, all the procedures are collected in the hybrid SVR-Lanczos (SVR-L) algorithm.
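The data-collection stage can be sketched as follows. This is a toy illustration only: a simple Richardson iteration on a small well-conditioned system stands in for the Orthodir run, since the point here is just the bookkeeping of iterates, residual norms, and the minimum-residual response variable.

```python
import numpy as np

# Toy system and a stand-in stationary iteration (NOT Orthodir).
rng = np.random.default_rng(2)
n, m = 20, 100
A = np.eye(n) + 0.1 * rng.normal(size=(n, n))   # well-conditioned toy matrix
b = rng.normal(size=n)

x = np.zeros(n)
iterates, res_norms = [], []
for _ in range(m):                 # fixed 100 iterations per run
    r = b - A @ x
    x = x + 0.5 * r                # Richardson step as the stand-in solver
    iterates.append(x.copy())
    res_norms.append(float(np.linalg.norm(b - A @ x)))

# The iterate with the smallest residual norm becomes the response
# variable for the SVR stage.
k_best = int(np.argmin(res_norms))
x_best = iterates[k_best]
```

In the actual method the `iterates` array is the SVR training input and `x_best` the target; the sketch only mirrors that data flow.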
A single run of the hybrid SVR-L for solving the system Ax = b produces a new approximate solution that does not yet meet a small residual norm. Therefore, it is necessary to place the hybrid SVR-L in a restarting framework. In practice, for the last stage of the hybrid restarting-SVR-L procedure, the SVR-L algorithm is repeated several times until the approximate solution meets a certain tolerance. There are two benefits of our proposed method: first, the resulting approximate solution is better than the approximate solutions generated by Orthodir individually, and second, the potential breakdown can be avoided. The whole process is described in Figure 2 below, while the hybrid restarting-SVR-L procedure is presented in Algorithm 1 and Algorithm 2. To validate our predicted solution, we evaluate the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R-squared). These metrics are widely used to evaluate prediction error rates and model performance in regression analysis [14][15][16]. They can be expressed as:

MAE = (1/n) Σᵢ₌₁..n |yᵢ − ŷᵢ|,
MSE = (1/n) Σᵢ₌₁..n (yᵢ − ŷᵢ)²,
RMSE = √MSE,
R² = 1 − Σᵢ₌₁..n (yᵢ − ŷᵢ)² / Σᵢ₌₁..n (yᵢ − ȳ)²,

where yᵢ is the observed value, ŷᵢ is the predicted value, and ȳ is the mean of the observed values.
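The four metrics above translate directly into NumPy; this helper is a straightforward transcription of the formulas, not the authors' evaluation code.

```python
import numpy as np

def regression_metrics(y, y_hat):
    """Return (MAE, MSE, RMSE, R-squared) for observed y and predicted y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    mae = np.mean(np.abs(err))                      # mean absolute error
    mse = np.mean(err ** 2)                         # mean squared error
    rmse = np.sqrt(mse)                             # root mean squared error
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - np.mean(y)) ** 2)
    return mae, mse, rmse, r2
```

For example, with y = (1, 2, 3, 4) and ŷ = (2, 2, 3, 4), only the first prediction is off by 1, so MAE = MSE = 0.25, RMSE = 0.5, and R² = 1 − 1/5 = 0.8.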
For comparison with another regression tool, we also employed regular regression in the form of polynomial regression. The analogous procedures are given in Algorithms 3 and 4.
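The polynomial-regression counterpart can be sketched with NumPy's least-squares polynomial fit; the degree, series, and prediction index below are illustrative assumptions, not the settings of Algorithms 3 and 4.

```python
import numpy as np

# One iterate component tracked over the index range [1, m].
m = 20
t = np.arange(1, m + 1, dtype=float)
series = 3.0 + 0.5 * t + 0.01 * t ** 2      # toy component values

# Fit a degree-2 polynomial by least squares, then evaluate it past m,
# mirroring the out-of-range prediction done by the SVR stage.
coeffs = np.polyfit(t, series, deg=2)
pred = np.polyval(coeffs, m + 5)            # prediction at index 25
```

Because the toy series is itself quadratic, the fit recovers it exactly, so the extrapolated value at index 25 is 3 + 0.5·25 + 0.01·25² = 21.75.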

Numerical Results
We solved SLE problems Ax = b of different sizes, with dimensions ranging from 1,000 to 50,000 and a maximum of 100 iterations per cycle. The matrix of the system is obtained by discretizing a PDE. All results of SVR-L are presented in Tables 1 and 2, and the performance of the algorithm is visualized in Fig. 3. We can see from Table 1 that the residual norm improves when Lanczos is combined with SVR. Moreover, when the hybrid SVR-L is placed in the restarting framework, the performance of the algorithm improves significantly (see Table 2). A similar story holds for the hybrid regression-L (Table 3) and the hybrid restarting-regression-L (Table 4). We can see here that the hybrid SVR-L and its hybrid restarting-SVR-L version performed better than the hybrid regression-Lanczos. In this section, we also compare the hybrid restarting-SVR-L and the hybrid restarting-regression-Lanczos with the previous method that we adopted, MEIEMLA. The comparison of these three algorithms can be seen in Fig. 5: the hybrid restarting-SVR-L performed best in terms of residual norm, whereas restarting MEIEMLA needs more cycles to obtain a good prediction compared with the two hybrids.

Discussion
According to Tables 1 and 2, overall, the proposed hybrid SVR-L and hybrid regression-L methods were able to reduce the residual norms of the approximate solutions generated by the original Lanczos method. Significant results appeared, for instance, when solving the SLE of dimension 1,000, where the hybrid SVR-L reduced the residual norm from 1.72E+04 to 8.28E+01. The residual norms for other SLE problems, such as dimensions 20,000, 30,000, and 40,000, were also reduced significantly. This, however, did not occur with the hybrid regression-L.
The restarting framework used to speed up the convergence of the hybrids worked properly, as can be seen in Tables 3 and 4 for the restarting hybrid SVR-L and the restarting hybrid regression-L, respectively. These results were compared with restarting MEIEMLA. Overall, the approximate solutions predicted by both the hybrid restarting-SVR-L and the hybrid restarting-regression-L algorithms were more accurate than those produced by restarting MEIEMLA, as shown by their smaller residual norms. This comparison is clearly visible in Figure 5.
However, we observed some drawbacks for high-dimensional problems: the larger the dimension, the longer the computational time needed to reach the desired error tolerance, since the error declines slowly. This appears in Figure 3(d) and Figure 4(c) and (d). One way to address this issue is, perhaps, to reduce the dimension of the SLE, which can help shorten the computational time.

Conclusion
We have implemented the hybrid restarting-SVR-L to predict new approximate solutions of SLEs. This approach showed good performance in reducing the residual norms obtained when the individual Lanczos/Orthodir was used to solve the SLE problems. We have also implemented the hybrid restarting-regression-L for the same purpose. Based on the numerical results, overall, the hybrid restarting-SVR-L was more accurate in obtaining the approximate solutions than the hybrid restarting-regression-L and MEIEMLA. It also showed the best performance in terms of efficiency; it consistently took the shortest time on all problems.