COMPARISON OF ARFIMA, ARIMA AND ARTIFICIAL NEURAL MODELS TO FORECAST THE TOTAL FISHERIES PRODUCTION IN INDIA
Z. Chikr Elmezouar^{1,2, 3*}, Ibrahim. M. Almanjahie^{1,2}, I. Ahmad^{4}, and A. Laksaci^{1,2}
^{ 1}Department of Mathematics, College of Science, King Khalid University, Abha, 62529, Saudi Arabia.
^{ 2}Statistical Research and Studies Support Unit, King Khalid University, Abha, 62529, Saudi Arabia.
^{3}Department of Mathematics, University Tahri Mohamed, Bechar 8000, Algeria.
^{4 }Department of Mathematics and Statistics, Faculty of Basic and Applied Sciences, International Islamic University, Islamabad, Pakistan.
*Corresponding Author’s Email: zchikrelmezouar@kku.edu.sa
ABSTRACT
Autoregressive Integrated Moving Average (ARIMA) modeling is a statistical technique for understanding time series data and forecasting future trends. Recently, ARIMA models have been employed in practice for modeling the total fisheries production in India. In this study, an important family of parametric time series models in which the order of differencing is fractional, the Autoregressive Fractional Integrated Moving Average (ARFIMA) family, is proposed for modeling and forecasting the total fisheries production (metric tons) in India. The Augmented Dickey-Fuller (ADF) test was used to test the fundamental assumption of stationarity. We also used a nonparametric model, the Neural Network Autoregressive (NNAR) model, to investigate the behavior of the data. After evaluating different models and comparing them on root mean square error (RMSE) and mean absolute percentage error (MAPE), the results indicated that ARFIMA(3,0.48,0), ARIMA(1,2,1) and NNAR(3,1) were the best models in their respective classes. The ARFIMA model outperformed the ARIMA and NNAR models in forecasting the total fisheries production, which suggests that ARFIMA may be a strong choice for time series data modeling.
Keywords: Time Series, Trends, ARIMA, ARFIMA, NNAR and Forecasting.
https://doi.org/10.36899/JAPS.2021.5.0349
Published online January 21, 2021
INTRODUCTION
Time-series observations are normally considered as a sequence taken serially over time. Understanding and describing the generating mechanism and forecasting future values based on past values are the primary objectives of time series analysis. To this end, statisticians have a well-known methodology based on regression analysis and linear time series models. In the literature, these statistical models have been applied with different techniques to various phenomena in order to study and forecast future values. Statistical methodologies such as regression analysis and univariate and multivariate time-series approaches were used by Stergiou et al. (1997) for modeling and forecasting monthly fisheries catches. These models were also used by Venugopalan and Srinath (1998) for modeling and forecasting the quarterly commercial landings of marine fishes.
Linear time-series models such as ARIMA have also been used for analysis and forecasting by many researchers. Sathianandan and Alagaraja (1998) used spectral time series models to analyze marine fish species such as the Bombay duck, mackerel and oil sardine in India. Their results, based on a spectral decomposition of the all-India landings, showed that these species depicted cyclical fluctuations. Noble (1980) examined a ten-year cycle in the mackerel fishery and investigated the forecasting of future values for that fishery. The ARIMA methodology was employed by Noble and Sathianandan (1991) for trend analysis of the mackerel catches of India. Sathianandan and Srinath (1995) conducted a time series analysis of India’s marine fish landings.
ARIMA models have been reported to be among the most popular methods (Zhang, 2003). However, ARIMA modeling cannot capture many important features, so a better option is needed. Fortunately, time-series analysis has now moved into the nonlinear domain, which is generally more suitable for understanding and accurately describing time series dynamics and for multi-step-ahead forecasting, especially when the time series is nonlinearly related to its past values (Fan and Yao, 2003). An elegant study of the all-India landings of oil sardine, Bombay duck and mackerel using spectral decomposition was completed by Sathianandan and Alagaraja (1998). In another study, Nampoothiri and Balakrishna (2000) applied the nonlinear Threshold Autoregressive model to time series data. Bishal et al. (2016) evaluated oil sardine landings in Kerala for the period 1961-2008 using the nonlinear Exponential Smooth Transition Autoregressive (ESTAR) model and Genetic Algorithm (GA) methods. Raymond et al. (1999) used an artificial neural network (ANN), a nonparametric approach, to model and predict the fish yield of African lakes based on a combination of variables related to environmental characteristics. Sun (2009) also used an ANN-based approach and produced a formula for forecasting fish stock recruitment. Mahalingaraya et al. (2018) compared the ARIMA model with an ANN model and found, based on empirical results, that the machine learning techniques outperformed the ARIMA model. The ANN is a general approach; however, with time series data, this model cannot mimic the actual process perfectly. An alternative nonparametric model that is capable of handling time series data and combines the ANN and autoregressive approaches into a concise model is the Neural Network Autoregressive (NNAR) model. In this study, we use this type of model and compare its performance with that of the other candidate models.
One substantial family of parametric time series models, ARFIMA, has also been used to model and forecast time series data. An important study modeling and forecasting marine fish production in Malaysia based on ARIMA and ARFIMA models was carried out by Shitan et al. (2008). Their results favored the ARIMA model. In this study, however, we investigate the performance of the ARIMA, ARFIMA and NNAR techniques for the total fisheries production in India, and our results show that the ARFIMA model outperformed the ARIMA and NNAR models in modelling and forecasting. This suggests that ARFIMA may be a strong choice for time series data modelling. To the best of our knowledge, no previous study has compared these candidate models and confirmed that ARFIMA could be a better choice for modelling and forecasting the total fisheries production.
The rest of the paper is outlined as follows. Section “Materials and methods” deals with our methodology and details the used models. Section “Results and discussion” is devoted to the results and discussion. Finally, our conclusion is stated in “Conclusion” section.
MATERIALS AND METHODS
Economically speaking, fisheries production is more economical than livestock production owing to the lower costs involved. Developing reliable models for studying and analyzing the total fisheries production is therefore of interest. In this study, we consider the total fisheries production time series (metric tons) in India for the period 1960-2015. The data are taken from the World Bank website (https://data.worldbank.org/indicator/ER.FSH.PROD.MT?locations=IN). ARFIMA, ARIMA and NNAR models are considered for modelling and forecasting. Background on these models and on the assessment methodology is detailed below.
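Autoregressive Integrated Moving Average (ARIMA) model: Suppose that the random variable $X_t$ can be modeled by ARIMA($p,d,q$). Then the process is defined by
$$\phi(B)(1-B)^d X_t = \theta(B)\varepsilon_t,$$
where $p$ is the number of autoregressive terms, $d$ is the number of differences and takes non-negative integer values, $q$ is the number of moving-average terms, $B$ is the backward shift operator and $\varepsilon_t$ is a white-noise error. The operators $\phi(B)$ and $\theta(B)$ are given by
$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad \theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q,$$
which represent the AR and the MA operators of orders $p$ and $q$, respectively.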
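The integer differencing step in the ARIMA definition can be illustrated with a minimal Python sketch. The function name `difference` and the toy quadratic series are illustrative assumptions, not part of the study; the sketch only shows how applying the operator (1 - B) twice removes a quadratic trend, which is the role played by d = 2 in an ARIMA(p,2,q) model.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - B)^d for a non-negative integer d."""
    values = list(series)
    for _ in range(d):
        # One pass of (1 - B): subtract the previous value from each value.
        values = [values[t] - values[t - 1] for t in range(1, len(values))]
    return values


# Toy series with a quadratic trend (illustrative, not the fisheries data).
toy = [t * t for t in range(6)]      # 0, 1, 4, 9, 16, 25
first_diff = difference(toy, d=1)    # still trending upward
second_diff = difference(toy, d=2)   # constant, so the trend is removed
print(first_diff, second_diff)
```

Each pass shortens the series by one observation, so a series of length n leaves n - d values for fitting the ARMA part.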
Autoregressive Fractional Integrated Moving Average (ARFIMA) model: Long-memory processes arise in many real-life applications and are of fundamental importance in time series analysis. ARFIMA models were introduced by Granger and Joyeux (1980) and by Hosking (1981) to handle long-memory series in discrete time. An ARFIMA($p,d,q$) process is given by
$$\phi(B)(1-B)^d X_t = \theta(B)\varepsilon_t,$$
where $X_t$ is the phenomenon of interest and $\varepsilon_t$ is a white-noise error; $B$ is the backward shift operator; and $\phi(B)$ and $\theta(B)$ are polynomials of degrees $p$ and $q$ that represent the AR and the MA parts, respectively. The operator $(1-B)^d$ is the fractional differencing operator, defined by
$$(1-B)^d = \sum_{k=0}^{\infty} \frac{\Gamma(k-d)}{\Gamma(-d)\,\Gamma(k+1)} B^k,$$
where $\Gamma(\cdot)$ is the gamma function. An ARFIMA($p,d,q$) process is stationary and invertible if $-0.5 < d < 0.5$ and the roots of $\phi(B)$ and $\theta(B)$ lie outside the unit circle.
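To make the fractional differencing operator concrete, the following Python sketch computes the first few coefficients of B^k in the standard binomial expansion of (1 - B)^d, once by a simple recursion and once directly from the gamma-function form Γ(k - d) / (Γ(-d) Γ(k + 1)). The function names are illustrative assumptions, and d = 0.48 is used only because it is the value reported for the selected ARFIMA model.

```python
import math


def frac_diff_weights(d, n_terms):
    """Coefficients of B^0, ..., B^(n_terms-1) in the expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Recursion: pi_k = pi_(k-1) * (k - 1 - d) / k.
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff_weight_gamma(d, k):
    """The same coefficient, computed from the gamma-function formula."""
    return math.gamma(k - d) / (math.gamma(-d) * math.gamma(k + 1))


weights = frac_diff_weights(0.48, 6)
for k, w in enumerate(weights):
    print(k, round(w, 4), round(frac_diff_weight_gamma(0.48, k), 4))
```

The coefficients decay slowly rather than cutting off after a fixed lag, which is how a fractional d lets distant past values keep influencing the current one.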
The Neural Network Autoregressive (NNAR) model: The NNAR model is constructed by using lagged values of the series as inputs to a neural network. For simplicity, we write NNAR($p,k$) for the model that takes the lagged inputs $(y_{t-1}, \ldots, y_{t-p})$, where $k$ is the number of nodes in the hidden layer; for more details see Hyndman and Athanasopoulos (2019). Figure 1 shows the nonlinear version of the NNAR. The relationship between the output $y_t$ and the inputs has the following mathematical representation:
$$y_t = g\Big(w_0 + \sum_{j=1}^{k} w_j\, f\Big(w_{0j} + \sum_{i=1}^{p} w_{ij}\, y_{t-i}\Big)\Big),$$
where $w_{0j}$ denotes the weights for the connections between the constant input and the hidden neurons and $w_0$ denotes the weight of the direct connection between the constant input and the output. The weights $w_{ij}$ are the weights for the connections between the input data and the hidden neurons. The weights $w_j$ are the weights for the connections between the hidden neurons and the output. The functions $f$ and $g$ denote the activation functions used at the hidden layer and at the output, respectively.
Figure 1: NNAR for time series forecasting with lagged inputs and one hidden layer of one neuron.
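The structure of a single-hidden-layer NNAR can be sketched in a few lines of Python. This is a minimal illustration, not the fitting procedure used in the study: the logistic hidden activation and the linear output activation are assumptions, and the function names and the weights in the usage line are hypothetical.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def nnar_predict(lags, hidden_bias, hidden_weights, out_bias, out_weights):
    """One-step output of a network with len(lags) inputs and one hidden layer.

    lags           -- lagged values [y_(t-1), ..., y_(t-p)]
    hidden_bias    -- one constant-input weight per hidden neuron
    hidden_weights -- for each hidden neuron, a list of p input weights
    out_bias       -- weight from the constant input to the output
    out_weights    -- one weight per hidden neuron to the output
    """
    hidden = [
        logistic(b + sum(w * y for w, y in zip(ws, lags)))
        for b, ws in zip(hidden_bias, hidden_weights)
    ]
    # Linear output activation (an assumption of this sketch).
    return out_bias + sum(v * h for v, h in zip(out_weights, hidden))


def count_weights(p, k):
    """Number of weights with p inputs, k hidden neurons and one output."""
    return k * (p + 1) + (k + 1)


# Hypothetical weights for a network with 3 inputs and 1 hidden neuron.
example = nnar_predict([0.0, 0.0, 0.0], [0.0], [[1.0, 1.0, 1.0]], 2.0, [4.0])
print(example, count_weights(3, 1))
```

With p = 3 and k = 1 the count is 6, which agrees with the number of weights stated for the network fitted later in the paper.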
Statistical test for stationarity: Before applying any of the above models, the assumption of stationarity of the time series should be checked. We used the Augmented Dickey-Fuller (ADF) test since it is a widely accepted statistical test for checking whether or not a given time series is stationary. The null hypothesis of this test is that the data are nonstationary. Once the series is stationary, the next step is to fit the above models, but the number of terms needed for a valid model remains to be determined. In this article, we examined the ACF (autocorrelation function) and PACF (partial autocorrelation function) plots to determine the AR and MA terms needed.
Model evaluation criteria: In the present study, the root mean square error (RMSE) and the mean absolute percentage error (MAPE) are used to assess the performance of the fitted models. The RMSE is given by
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2},$$
where $e_t = y_t - \hat{y}_t$ and $n$ is the number of observations. In particular, $y_t$ and $\hat{y}_t$ denote the actual observation and the fitted value at time $t$, respectively. The MAPE is given by
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{e_t}{y_t}\right|.$$
We also used the well-known $R^2$ and adjusted $R^2$ for further checking and justification.
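As a small worked illustration of the two error measures (the numbers are hypothetical and not taken from the data), take $n = 2$ actual values $y = (100, 200)$ with fitted values $\hat{y} = (110, 190)$, so that the errors $y_t - \hat{y}_t$ are $-10$ and $10$. With RMSE taken as the square root of the mean squared error and MAPE as the mean absolute error relative to the actual value, expressed as a fraction:
$$\mathrm{RMSE} = \sqrt{\tfrac{1}{2}\left[(-10)^2 + 10^2\right]} = 10, \qquad \mathrm{MAPE} = \tfrac{1}{2}\left(\tfrac{10}{100} + \tfrac{10}{200}\right) = 0.075.$$
The RMSE is in the units of the data, whereas the MAPE is scale-free; a value of 0.075 corresponds to an average error of 7.5% of the actual values.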
RESULTS AND DISCUSSION
An attempt was made in the current study to evaluate the predictive performances of three modeling techniques, i.e., ARFIMA, ARIMA and NNAR using the total fisheries production (metric tons) data in India. Figure 2 indicates the time series plot of total fisheries production (metric tons) in India.
Figure 2: Time series plot of the yearly total fisheries production (metric tons) in India.
The test of stationarity: The ARFIMA, ARIMA and NNAR models rely on the assumption of stationarity, which we check with a unit root test. There is no fixed rule for which approach should be adopted in a particular scenario. Therefore, in the present study, the augmented Dickey-Fuller (ADF) unit root procedure was applied to the total fisheries production series. In this test, the alternative hypothesis is that the process is stationary. Table 1 shows the results of the test. With a lag order of 3, the series becomes stationary at integration order 2, i.e., I(2), where the ADF test is statistically significant (p-value = 0.010).
Table 1: Augmented Dickey-Fuller test

| Integrated | Dickey-Fuller | Lag order | p-value |
|---|---|---|---|
| I(0) | 2.3944 | 3 | 0.4157 |
| I(1) | 2.6486 | 3 | 0.3135 |
| I(2) | 5.1952 | 3 | 0.010 |

Inspection of the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF): Inspecting the ACF and PACF is essential for identifying the orders of the MA and AR processes. Figures 3(a) and 3(b) show the ACF and PACF of the second difference of the total fisheries production (metric tons) in India.
Figure 3(a) : ACF of the second difference of the total fisheries production
Figure 3(b) : PACF of the second difference of the total fisheries production
Based on the result displayed in Figure 3(a), the ACF shows a single negative spike at lag 1, indicating an MA order of $q = 1$, while in Figure 3(b) the PACF shows a single negative spike at lag 1, which is evidence of an AR order of $p = 1$.
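The identification step above rests on a standard rule: the ACF of an MA(q) process cuts off after lag q, and the PACF of an AR(p) process cuts off after lag p. The following Python sketch shows how the two quantities can be computed; the function names are illustrative assumptions, the PACF is obtained with the Durbin-Levinson recursion, and the input is the theoretical ACF of an AR(1) process with coefficient 0.5 rather than the fisheries data.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf_from_acf(acf):
    """Partial autocorrelations at lags 1..len(acf)-1 (Durbin-Levinson)."""
    pacf = []
    phi_prev = []
    for k in range(1, len(acf)):
        if k == 1:
            phi_kk = acf[1]
            phi = [phi_kk]
        else:
            num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append the new one.
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: rho_k = 0.5 ** k.
ar1_acf = [0.5 ** k for k in range(4)]
ar1_pacf = pacf_from_acf(ar1_acf)
print(ar1_pacf)
```

For this AR(1) input the PACF is nonzero only at lag 1, which is the single-spike pattern that points to p = 1.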
Fitting of ARIMA, ARFIMA and NNAR models
Fitting ARIMA Model: The aim here was to select the best among the candidate ARIMA models for the total fisheries production in India. Figure 3 suggested that ARIMA(1,2,1) might be the best candidate. Nevertheless, we used the previously mentioned criteria to compare the candidate ARIMA models. The comparison results are presented in Table 2.
Table 2: Comparison of candidate ARIMA models.

| Model | RMSE | $R^2$ | Adjusted $R^2$ | MAPE |
|---|---|---|---|---|
| ARIMA(1,2,1) | 221156 | 0.75 | 0.74 | 0.0176 |
| ARIMA(1,2,0) | 251694 | 0.71 | 0.71 | 0.0593 |
| ARIMA(0,2,1) | 262221 | 0.69 | 0.68 | 0.0488 |

The ARIMA(1,2,1) model was found to be the best model for the total fisheries production, with the smallest RMSE and MAPE values (Table 2). In addition, the $R^2$ and adjusted $R^2$ values indicated that the predictor(s) based on ARIMA(1,2,1) accounted for about 75% of the variability in the total fisheries production. After the selection of the best ARIMA model, the next step is the estimation of its parameters. The estimation results for the selected model are shown in Table 3.
Table 3: Model estimation from 1960 to 2012 for ARIMA(1,2,1).

| Parameter | Coefficient | Standard error | p-value |
|---|---|---|---|
| AR1 | 0.67 | 0.13 | < 0.001 |
| MA1 | 0.80 | 0.11 | < 0.001 |

The residual ACF and PACF plots for the ARIMA(1,2,1) model are presented in Figure 4. The plots show the correlation among the residuals of the series, and the result satisfies the designated criterion.
Figure 4: Residual ACF and PACF of ARIMA (1,2,1)
Fitting ARFIMA Model: The ARFIMA model was also evaluated for the total fisheries production data. Using the criteria mentioned earlier, with careful checking, we found ARFIMA(3,0.48,0) to be the most appropriate model for these data within the ARFIMA class. The parameter estimates are reported in Table 4.
Table 4: Results of estimating the parameters

| Coefficient | Estimate | p-value |
|---|---|---|
| d | 0.48 | 0.002 |
| AR1 | 0.63 | < 0.001 |
| AR2 | 0.78 | < 0.001 |
| AR3 | 0.43 | < 0.001 |

Figure 5: ACF and PACF of ARFIMA (3,0.48,0).
The plots in Figure 5 show the absence of serial correlation in the residual series. Therefore, ARFIMA(3,0.48,0) is considered an adequate choice.
Fitting NNAR Model: We fitted an NNAR model with 3 inputs, 1 hidden neuron and 1 output (i.e., a 3-1-1 network with 6 weights). We find that the and for training.
Figure 6: Residuals of NNAR fit
Figure 6 shows that the errors are independent. In other words, no absolute threshold appears in the fitted forecasting model.
Forecast of ARIMA, ARFIMA and NNAR Models: After the selection of ARIMA(1,2,1), ARFIMA(3,0.48,0) and NNAR(3,1), we move on to the forecasting step. Forecasts from the ARIMA, ARFIMA and NNAR models for the next three years (2013-2015) are shown in Table 5.
Table 5: Forecasting performance of the ARIMA, ARFIMA and NNAR models.

| Forecast | ARIMA | ARFIMA | NNAR | Original |
|---|---|---|---|---|
| 2013 | 8833440 | 8837405 | 8763791 | 9222391 |
| 2014 | 9471599 | 9663359 | 9480414 | 9884999 |
| 2015 | 9500459 | 9468081 | 9353054 | 10100057 |


From Table 5, the predicted values of ARFIMA(3,0.48,0) are in closer agreement with the observed values than the predicted values of ARIMA(1,2,1) and NNAR(3,1), especially in the first two years. To identify the best of these models, we finally compare the forecasts of ARFIMA(3,0.48,0), ARIMA(1,2,1) and NNAR(3,1) with the observed values by computing the RMSE and MAPE.
Table 6: Comparison between the three models in forecasting.

| Model forecast | RMSE | MAPE |
|---|---|---|
| ARFIMA(3,0.48,0) | 445994 | 0.0422 |
| ARIMA(1,2,1) | 476690 | 0.0477 |
| NNAR(3,1) | 557379 | 0.0548 |

MAPE is a statistical measure of how accurate the forecasts are. From Table 6, the MAPE is below 5% for two of the three selected models (ARFIMA and ARIMA) and about 5.5% for NNAR. Such values are considered excellent according to the rough rule of thumb for prediction. Moreover, the RMSE and MAPE values are smaller for ARFIMA(3,0.48,0) than for ARIMA(1,2,1) and NNAR(3,1). We therefore conclude that the ARFIMA model is a better choice than the ARIMA and NNAR models.
Conclusion: In this paper, we carried out a time series analysis and forecasting of the total fisheries production (metric tons) in India. The ACF of the total fisheries production depicted the persistence characteristic of a long-memory process. This points to the ARFIMA model as an appropriate model for fitting such data. We also employed the most commonly used approach, ARIMA, for which an apposite model was identified and fitted. The data were also fitted using the NNAR model in order to compare its performance with the other proposed models. Even though all models fit the data very well, the forecasts obtained using the ARFIMA model are in closer agreement with the actual values than the forecasts obtained from the ARIMA and NNAR models. It is also worth noting that forecast evaluation using RMSE and MAPE revealed the superiority of the ARFIMA model over the ARIMA and NNAR models. We therefore conclude that the total fisheries production data can be modeled better by the ARFIMA model than by the ARIMA and NNAR models.
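The forecast comparison can be checked directly from the three-year forecasts and observed values reported in Table 5. The Python sketch below recomputes the out-of-sample RMSE and MAPE for each model; the helper names `rmse` and `mape` are illustrative, and MAPE is computed as a fraction (not multiplied by 100), which is an assumption made to match the scale of the tabulated values.

```python
import math

# Observed values and forecasts for 2013-2015, copied from Table 5.
observed = [9222391, 9884999, 10100057]
forecasts = {
    "ARFIMA(3,0.48,0)": [8837405, 9663359, 9468081],
    "ARIMA(1,2,1)": [8833440, 9471599, 9500459],
    "NNAR(3,1)": [8763791, 9480414, 9353054],
}


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


scores = {name: (rmse(observed, f), mape(observed, f)) for name, f in forecasts.items()}
for name, (r, m) in scores.items():
    print(f"{name}: RMSE = {r:.1f}, MAPE = {m:.5f}")

best = min(scores, key=lambda name: scores[name][0])
print("Smallest RMSE:", best)
```

The recomputed values agree with Table 6 up to rounding in the last reported digit, and the ARFIMA model has the smallest RMSE and MAPE of the three.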
Acknowledgments: The authors are very grateful to the Deanship of Scientific Research at King Khalid University, Kingdom of Saudi Arabia, for funding this work through research groups program under Project no. R.G.P.1/189/41.
REFERENCES
 Bishal Gurung, K. S. N., P. Singh Nilakanta and A. Grover (2016). Analysis of cyclical fish landings through ESTAR nonlinear timeseries approach. Indian J. Fish., 63(2): 110–113.
 Fan, J. and Q. X. Yao (2003). Nonlinear time series: Nonparametric and parametric methods. Springer, New York.
 Granger, C. W. J. and R. Joyeux (1980). An Introduction to Long‐Memory Time Series Models and Fractional Differencing. Journal of Time Series Analysis, 1(1): 15–29.
 Hyndman, R. J. and G. Athanasopoulos (2019). Forecasting: Principles and Practice, 3nd edition, OTexts: Melbourne, Australia.
 Hosking, J. R. M. (1981). Fractional Differencing. Biometrika, 68(1): 165–176.
 Mahalingaraya, S. Rathod, K. Sinha, R.S. Shekhawat and S. Chavan (2018). Statistical modeling and forecasting of total fish production of India: a time series perspective. Int. J. Curr. Microbiol. App. Sci., 7(3): 1698–1707.
 Nampoothiri, C. K. and N. Balakrishna (2000). Threshold autoregressive model for a timeseries data. J. Indian Soc. Agr. Stat., 53: 151–160.
 Noble, A. (1980). Is there a tenyear cycle in the mackerel fishery? Seafood Exp. J., 12(4): 9–13.
 Noble, A. and T. V. Sathianandan (1991). Trend analysis in all India mackerel catches using ARIMA models. Indian J. Fish., 38(2): 119–122.
 Raymond, L., B. Sovan and M. Jacques (1999). Predicting fish yield of African lakes using neural networks. Ecological Modelling, 120: 325–335.
 Sathianandan, T. V. and K. Alagaraja (1998). Spectral decomposition of the all India landings of oil sardine, mackerel and Bombay duck. Indian J. Fish., 45(1): 13–20.
 Sathianandan, T. V. and K. Srinath (1995). Time series analysis of marine fish landings in India. J. Mar. Biol. Ass. India, 37(2): 171–178.
 Shitan, M., P. M. J. Wee, L. Y. Chin and L. Y. Siew (2008). ARIMA and integrated ARFIMA models for Forecasting Annual Demersal and Pelagic Marine Fish Production in Malaysia. Malaysian Journal of Mathematical Sciences, 2(2): 41–54.
 Stergiou, K. I., E. D. Christou and G. Petrakis (1997). Modelling and forecasting monthly fisheries catch: Comparison of regression, univariate and multivariate time series methods. Fish. Res., 29: 55–95.
 Sun, L. (2009). Forecasting Fish Stock Recruitment and Planning Optimal harvesting strategies by Using Neural Network. Journal of Computers, 4(11): 1075–1082.
 Venugopalan, R. and M. Srinath (1998). Modelling and forecasting fish catches: comparison of regression, univariate and multivariate time series methods. Indian J. Fish., 45(3): 227–237.
 Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50: 159–175.
