
ABSTRACT:

In this paper, we apply a hybrid method based on wavelet transforms and ARIMA models to the annual rain precipitation series of Erbil Province, Iraq (in millimeters), which comprises 45 observations covering the period 1970 to 2014. We aim to describe how the hybrid method can be used in time series forecasting and how it can enhance forecast quality, by applying it to real data and comparing the classical ARIMA method with the suggested method on the basis of statistical criteria. The results show an advantage for the hybrid method: the forecast error is reduced when the Wavelet-ARIMA method is used, which improves on the classical model in forecasting. Furthermore, among the wavelet families considered, the Daubechies wavelet of order two with fixed form thresholding and a soft threshold function proved very suitable for de-noising the data and performed better than the others. The annual rainfall in Erbil in the coming years is forecast to be close to 370 millimeters.

KEYWORDS:

Forecasting, Time series, ARIMA, Wavelet transforms, De-noising.

1.1 Introduction

Rainfall forecasting is one of the most challenging forecasting problems. Many algorithms have been developed and proposed, yet accurate prediction of rainfall remains very difficult. Tantanee et al. (2005) presented a new method for rainfall prediction that combines wavelet analysis with a conventional autoregressive (AR) model. Their research showed that the wavelet-autoregressive model provides better annual rainfall predictions than the simple AR model. Al-Safawi et al. (2009) estimated the autoregressive model using wave shrink. Their results showed that the suitable model under the classical ARIMA method is AR(6), and that this model improved when the wave shrink technique was used, especially with the Haar wavelet and a soft threshold, to forecast the annual rainfall in Erbil city for the period 1992-2007. Al-Shakarchy (2010) applied factor analysis to forecast two series representing rain rates and relative humidity in Mosul province; the suitable models for the two series were ARIMA(0,0,1) and ARIMA(1,0,0), respectively. Ali (2013) used the ARIMA method to analyze and forecast Baghdad rainfall. The seasonal model SARIMA(2,1,3)x(0,1,1) was found to be the best, and the forecasts it produced for the following years showed a trend and extent similar to those of the original data. Venkata Ramana et al. (2013) sought an appropriate method for predicting monthly rainfall by combining the wavelet technique with an artificial neural network (ANN). The study indicated that wavelet neural network models are more effective than ANN models. Shoba and Shobha (2014) analyzed various data mining algorithms used for rainfall prediction. The study showed that certain algorithms sometimes perform better and are more effective when combined. Eni and Adeyeye (2015) applied the seasonal ARIMA method to build a suitable model for forecasting rainfall in Warri Town, Nigeria. The results showed that the seasonal ARIMA(1,1,1)(0,1,1) model is appropriate according to several statistical criteria.

Recently, Shafaei et al. (2016) tested the capability of several techniques for predicting monthly precipitation, namely wavelet analysis (WA), the seasonal ARIMA model (SARIMA), and the artificial neural network (ANN) method. Examining the effect of the decomposition level on model performance, the study found that going from 2 to 3 decomposition levels increased the correlation between observed and estimated data, but no significant difference was found between the predictions of the 2-level and 3-level models. Ramesh Reddy et al. (2017) applied an ARIMA model to forecast the monthly mean rainfall of coastal Andhra, India. They found that the best-fitting model is ARIMA(5,0,0)(2,0,0) according to several performance criteria. Ashley et al. (2017) applied the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) to reduce the dimensionality of rainfall time series observations. The results of the analysis demonstrated that the DWT is superior to the DCT and best preserves and characterizes the observed rainfall records.

From the methods reviewed above, we observe that most of these approaches and models are limited to short-period forecasts. This paper introduces a new technique for long-range forecasting of annual rainfall data. In other words, it deals mainly with combining the wavelet transformation with the classical ARIMA methodology for modeling annual rain precipitation based on the available data. The remainder of this paper is organized as follows: Section 2 gives brief concepts of the ARIMA methodology and the wavelet transformation and then presents the hybrid method. Section 3 deals with an application to real data. Section 4 presents the conclusions.

2. ARIMA Methodology, Wavelet Transformation, and Hybrid Method

2.1 ARIMA Methodology

Box and Jenkins suggested an approach for analyzing time series data that includes model identification, parameter estimation, diagnostic checking of the identified model, and application of the model for forecasting purposes. The ARIMA model is a mixed model that depends on the parameters p, d, and q, which represent the order of the autoregressive (AR) part, the degree of differencing involved, and the order of the moving average (MA) part, respectively. The model was popularized by (Box et al., 1970) and can be expressed through the following mathematical formula:

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}$$

where $y_t$ is the series after any differencing, p is the non-seasonal autoregressive order, q is the non-seasonal moving average order, $\phi_1, \dots, \phi_p$ are the autoregressive coefficients, $\theta_1, \dots, \theta_q$ are the moving average coefficients, and $\varepsilon_t$ stands for the random error. If the data are non-stationary, first-order or second-order differencing is applied. To identify a suitable model, we rely on the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The pattern of the ACF and PACF plots indicates which model is likely to fit best for prediction, and the choice is then confirmed with statistical performance measures. We also use the Portmanteau test (i.e., the Box-Pierce test) for the randomness of the time series. We refer the reader to (Makridakis et al., 1998) for more details.
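The identification and diagnostic tools named above can be illustrated with a short sketch. It computes sample autocorrelations and the Box-Pierce statistic Q = n(r_1^2 + ... + r_h^2) in plain Python. The function names and the residual values are hypothetical and serve only as an illustration; they are not taken from the study.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def box_pierce(residuals, max_lag):
    """Box-Pierce portmanteau statistic Q = n * sum of squared autocorrelations."""
    n = len(residuals)
    return n * sum(r * r for r in acf(residuals, max_lag))


def white_noise_bound(n):
    """Approximate 95% bound for a single autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)


# Hypothetical residuals, used only to exercise the functions.
residuals = [12.0, -30.5, 8.2, 41.0, -17.3, -5.9, 22.4, -28.1, 3.3, -4.6]
q_stat = box_pierce(residuals, max_lag=3)
bound = white_noise_bound(len(residuals))
print(f"Q = {q_stat:.3f}, single-lag bound = +/-{bound:.3f}")
```

In practice Q is compared with a chi-square distribution whose degrees of freedom equal the number of lags minus the number of estimated ARMA parameters; a large P-value is consistent with random residuals.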

2.2 Wavelet Transformation

A wavelet transformation is a powerful tool in signal processing that has attracted great interest since its theoretical development (Grossman and Morlet, 1984). Applications of wavelet analysis have increased in many fields, such as communications, image processing, optical engineering, and time series analysis, as an alternative to the Fourier transformation for handling local, non-periodic, and multi-scaled phenomena. The difference between wavelet and Fourier transforms is that wavelets can localize changes in the dynamical patterns of a sequence in time, whereas Fourier transforms focus mainly on frequency. In addition, the Fourier transform assumes signals of infinite length, whereas wavelet transforms can be applied to time series of any kind and any size, even when they are not sampled homogeneously (Antonios and Constantine, 2003). Generally, wavelet transforms can be used to explore, de-noise, and filter time series data, which supports forecasting and other analyses. The wavelet transform can be written as follows:

$$W(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(\frac{t-b}{a}\right) dt$$

where $\psi(t)$ is the basic (mother) wavelet, whose effective length is usually much shorter than that of the target time series f(t); 'a' is the scale or dilation factor, which determines the characteristic frequency, so that varying it gives rise to a spectrum; and 'b' is the translation in time, so that varying it represents the 'sliding' of the wavelet over f(t) (Burrus et al., 1998).

2.3 Hybrid Method

The suggested method combines the ARIMA methodology with wavelet transforms. Because the wavelet approach lends itself to signal analysis, this study uses it to separate the details (the small fluctuations) from the approximations (the important part) of the rain data. In wavelet analysis, the approximations are the high-scale, low-frequency components of the signal, and the details are the low-scale, high-frequency components (Fugal, 2009). The decomposition is carried out with the discrete wavelet transform (DWT) because the rain data are recorded in discrete time. Figure 1 shows the hybrid technique.

Figure 1. The process of hybrid method
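The separation of a series into approximation and detail coefficients can be illustrated with a single level of the discrete wavelet transform. The sketch below is a minimal pure-Python version that assumes the Daubechies order-2 filter, periodic boundary extension, and an even-length input. These choices and the function names `dwt_db2` and `idwt_db2` are illustrative assumptions, not the settings of the software used in the study. The input is the first eight observations of Table 1.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Daubechies order-2 low-pass (scaling) filter.
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching high-pass (wavelet) filter: reversed low-pass with alternating signs.
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]


def dwt_db2(x):
    """One decomposition level; returns (approximation, detail) coefficients."""
    n = len(x)
    if n < 4 or n % 2:
        raise ValueError("length must be even and at least 4")
    approx, detail = [], []
    for i in range(n // 2):
        # Periodic extension: indices wrap around the end of the series.
        window = [x[(2 * i + k) % n] for k in range(4)]
        approx.append(sum(h * v for h, v in zip(LOW, window)))
        detail.append(sum(g * v for g, v in zip(HIGH, window)))
    return approx, detail


def idwt_db2(approx, detail):
    """Inverse of dwt_db2 (the transform is orthogonal, so this is its transpose)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i, (a, d) in enumerate(zip(approx, detail)):
        for k in range(4):
            x[(2 * i + k) % n] += LOW[k] * a + HIGH[k] * d
    return x


rain = [255.4, 448.2, 406.4, 261.5, 547.5, 417.2, 452.3, 347.2]  # 1970-1977
approx, detail = dwt_db2(rain)
rebuilt = idwt_db2(approx, detail)
print("approximation:", [round(a, 1) for a in approx])
print("detail:", [round(d, 1) for d in detail])
```

Repeating `dwt_db2` on the approximation coefficients gives the deeper levels of a multiresolution decomposition; shrinking the detail coefficients before calling `idwt_db2` yields a de-noised series.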

3. Application

3.1 Information About the City

Erbil, a central Kurdish city, is the capital city of the Kurdistan Region of Iraq. The city is located at 36°12′17″N 44°20′33″E, about 350 kilometres north of Baghdad. The climate of Erbil is very hot in summer and very cold and wet in winter, and more rain falls in winter than in summer. The city receives an average of 300-400 millimetres of rain annually. The city is the administrative centre of Erbil province, which is bounded on the north by Turkey and Dohuk Province, on the east by Iran and Sulaymaniyah Province, on the south by Kirkuk province, and on the west by Mosul province (Wahab and Khayyat, 2014).

3.2 Application Using ARIMA Methodology

The variable used in the analysis is the annual rain precipitation in Erbil province in the Kurdistan Region of Iraq (in millimeters), a sample of 45 observations from 1970 to 2014, shown in Table 1. The data were obtained from the General Directorate of Meteorology and Seismic Monitoring in Erbil province.

Table 1: Annual data on rain precipitation (in millimeters) from 1970 to 2014

| Year | Amount of Rain | Year | Amount of Rain |
|------|----------------|------|----------------|
| 1970 | 255.4 | 1993 | 601.6 |
| 1971 | 448.2 | 1994 | 583.0 |
| 1972 | 406.4 | 1995 | 494.4 |
| 1973 | 261.5 | 1996 | 418.9 |
| 1974 | 547.5 | 1997 | 441.6 |
| 1975 | 417.2 | 1998 | 337.2 |
| 1976 | 452.3 | 1999 | 229.2 |
| 1977 | 347.2 | 2000 | 272.3 |
| 1978 | 380.1 | 2001 | 330.9 |
| 1979 | 375.6 | 2002 | 361.5 |
| 1980 | 321.5 | 2003 | 587.7 |
| 1981 | 141.8 | 2004 | 255.6 |
| 1982 | 444.1 | 2005 | 297.5 |
| 1983 | 178.3 | 2006 | 514.6 |
| 1984 | 43.9 | 2007 | 273.4 |
| 1985 | 463.9 | 2008 | 410.7 |
| 1986 | 154.0 | 2009 | 411.0 |
| 1987 | 235.9 | 2010 | 359.6 |
| 1988 | 626.9 | 2011 | 301.6 |
| 1989 | 367.3 | 2012 | 366.4 |
| 1990 | 332.0 | 2013 | 345.2 |
| 1991 | 344.1 | 2014 | 385.2 |
| 1992 | 694.0 |  |  |

The time series plot of the rain data for the Erbil region is shown in Figure 2. Following the Box-Jenkins methodology, the first step is identification, using the plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) shown in Figure 3.

Figure 2: Time series plot of rain data in Erbil province from 1970 to 2014

Based on the ACF and PACF plots and on checks for stationarity in mean and variance, the appropriate model for the series was identified as ARIMA(2,1,0), after careful modelling and fitting and on the basis of two performance measures, the root mean square error (RMSE) and the mean absolute error (MAE). Table 2 shows the estimated model.

Figure 3: Autocorrelation function and partial autocorrelation function of rain data

Table 2: Estimation of ARIMA(2,1,0)

| Parameter | Estimate | Std. Error | t-ratio | P-value |
|-----------|----------|------------|---------|---------|
| AR(1) | -0.72091 | 0.129125 | -5.58304 | 0.000002 |
| AR(2) | -0.540025 | 0.128616 | -4.19875 | 0.000136 |
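To illustrate how a fitted ARIMA(2,1,0) model produces forecasts, the following sketch applies the AR(2) recursion to the first differences and accumulates the result back to levels. It uses the coefficient estimates of Table 2 and the last three observations of Table 1, and it assumes a model without a constant term, since Table 2 lists none. The function name is hypothetical, and the sketch concerns the classical model fitted to the original data, not the hybrid model.

```python
def forecast_arima_210(history, phi1, phi2, steps):
    """Point forecasts from an ARIMA(2,1,0) model without a constant.

    The differenced series w_t = y_t - y_(t-1) follows
    w_t = phi1 * w_(t-1) + phi2 * w_(t-2) + e_t, and future errors are set to 0.
    """
    if len(history) < 3:
        raise ValueError("need at least three observations")
    levels = list(history[-3:])
    forecasts = []
    for _ in range(steps):
        w1 = levels[-1] - levels[-2]  # most recent difference
        w2 = levels[-2] - levels[-3]  # difference before that
        nxt = levels[-1] + phi1 * w1 + phi2 * w2
        forecasts.append(nxt)
        levels.append(nxt)
    return forecasts


PHI1, PHI2 = -0.72091, -0.540025      # estimates from Table 2
last_obs = [366.4, 345.2, 385.2]      # 2012, 2013, 2014 from Table 1
path = forecast_arima_210(last_obs, PHI1, PHI2, steps=16)
for year, value in zip(range(2015, 2031), path):
    print(year, round(value, 1))
```

Because both AR coefficients are negative and the model is stationary in the differences, the forecasts oscillate with shrinking amplitude and settle at a constant level.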

After estimating the ARIMA(2,1,0) model, we check the residuals for randomness. Figure 4 presents the ACF and PACF of the residuals of the ARIMA(2,1,0) model fitted to the series.

Figure 4: ACF and PACF of residuals using ARIMA(2,1,0) on series data.

From Figure 4, none of the autocorrelation coefficients in the ACF and PACF is significant, which suggests that the residual series may well be completely random (i.e., white noise). We also tested the randomness of the residuals using the Portmanteau (Box-Pierce) test mentioned in the theoretical section. The value of the test statistic was 7.326 and the P-value was 0.835, so we cannot reject the hypothesis that the series is random at the 95% or higher confidence level.

3.3 Application Using a Hybrid Method

In this part, the original data are transformed from the time domain to the wavelet (time-frequency) domain for filtering. Figure 5 shows the wavelet analysis using the Daubechies wavelet with a five-level multiresolution decomposition of the rain precipitation for the 45 sequential observations, where s is the signal, equal to the sum of its approximation and details, a5 is the approximation at level 5, and d5, d4, d3, d2, and d1 are the details at levels 5, 4, 3, 2, and 1, respectively.

Figure 5: Wavelet analysis using Daubechies wavelet with five levels multiresolution of the rain precipitation

The original rain precipitation data were de-noised using the wavelet de-noising procedure mentioned in the theoretical section (using MATLAB software, version 2013) with the Daubechies wavelet family of orders 2, 3, 4, and 5, as shown in Figure 6. It should be noted that, after many empirical experiments with many wavelet families, the Daubechies wavelet was found to perform better than the others in de-noising the rain data. Figure 7 shows the original and de-noised signals using the Daubechies wavelet with Fixed Form Threshold (Patil and Raskar, 2015).

Figure 6: Daubechies wavelet of order 2,3,4, and 5

Figure 7: The original and de-noised signals using Daubechies wavelet with Fixed Form Threshold.
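The de-noising rule can be sketched in a few lines. A common definition of the fixed form (universal) threshold is sigma * sqrt(2 ln n), where n is the number of observations and sigma is a noise estimate, often taken as the median absolute value of the finest-level detail coefficients divided by 0.6745. The sketch below assumes this definition; it may differ in detail from the software option used in the study, and the detail coefficients shown are hypothetical.

```python
import math
import statistics


def fixed_form_threshold(finest_detail, n):
    """Universal threshold sigma * sqrt(2 ln n) with a median-based sigma."""
    sigma = statistics.median(abs(c) for c in finest_detail) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def soft_threshold(coeffs, thr):
    """Set small coefficients to zero and shrink the rest toward zero by thr."""
    shrunk = []
    for c in coeffs:
        if c > thr:
            shrunk.append(c - thr)
        elif c < -thr:
            shrunk.append(c + thr)
        else:
            shrunk.append(0.0)
    return shrunk


# Hypothetical detail coefficients, used only for illustration.
detail = [12.0, -3.5, 0.8, -20.1, 5.2, -1.1]
thr = fixed_form_threshold(detail, n=45)
shrunk = soft_threshold(detail, thr)
print(f"threshold = {thr:.2f}")
print("shrunk detail:", [round(c, 2) for c in shrunk])
```

Soft thresholding shrinks the surviving coefficients as well as removing the small ones, which tends to give a smoother reconstruction than simply zeroing coefficients below the threshold.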

The data were first analysed at five multi-resolution levels for the selected wavelet and de-noised using the Fixed Form Threshold with soft thresholding. Then, the new series was modelled again using the ARIMA methodology. The forecasting criteria were also calculated and compared with those of the first method. Table 3 summarizes the two performance indicators used to select an optimal model, for the original data modelled with the classical ARIMA method and for the hybrid method.

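Table 3: The performance measures for the original data model using classical ARIMA methodology and hybrid method.

| Method | Kind | RMSE | MAE |
|--------|------|------|-----|
| Classical ARIMA Method | Original data, ARIMA(2,1,0) | 133.937 | 106.565 |
| Hybrid Method | Fixed Form de-noised data, Daubechies(2) | 131.380 | 104.143 |
| Hybrid Method | Fixed Form de-noised data, Daubechies(3) | 131.555 | 104.553 |
| Hybrid Method | Fixed Form de-noised data, Daubechies(4) | 131.593 | 104.411 |
| Hybrid Method | Fixed Form de-noised data, Daubechies(5) | 131.706 | 104.546 |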

From Table 3, we observe that the best model for the original data after careful modelling and fitting was ARIMA(2,1,0). However, when the hybrid method based on wavelet de-noising was applied to the original data, the forecasting errors decreased for all wavelet orders, and the new models improved according to the forecasting measures. Comparing the two procedures, the reduction is largest when Fixed Form Thresholding is applied with the Daubechies wavelet of order 2: from Table 3, the RMSE falls from 133.937 to 131.380 and the MAE from 106.565 to 104.143, reductions of about 1.9% and 2.3%, respectively. Figure 8 presents the original and filtered data using the Daubechies wavelet of order 2:

Figure 8: The original and filtered signals using Daubechies wavelet of order 2
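The two performance measures and the size of the improvement can be expressed compactly. The sketch below defines RMSE and MAE in plain Python and computes the relative reduction between the classical model and the hybrid model with the Daubechies wavelet of order 2, using the values reported in Table 3. The function names are hypothetical.

```python
import math


def rmse(actual, fitted):
    """Root mean square error between observed and fitted values."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def mae(actual, fitted):
    """Mean absolute error between observed and fitted values."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


def relative_reduction(before, after):
    """Fraction by which an error measure decreased."""
    return (before - after) / before


# Values reported in Table 3: classical ARIMA versus hybrid with Daubechies(2).
rmse_gain = relative_reduction(133.937, 131.380)
mae_gain = relative_reduction(106.565, 104.143)
print(f"RMSE reduction: {rmse_gain:.2%}, MAE reduction: {mae_gain:.2%}")
```

Both measures are in millimeters, like the data; RMSE weights large errors more heavily than MAE, which is why the two are usually reported together.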

The forecast values of the hybrid method are presented in Table 4, which gives forecasts of the annual rain precipitation (in millimetres) of Erbil province, Iraq, for the years 2015 to 2030.

Table 4: Forecast values of the annual rain of Erbil province-Iraq using hybrid method

| Period | Forecast |
|--------|----------|
| 2015 | 367.8 |
| 2016 | 360.3 |
| 2017 | 373.5 |
| 2018 | 368.1 |
| 2019 | 364.9 |
| 2020 | 370.1 |
| 2021 | 368.1 |
| 2022 | 366.7 |
| 2023 | 368.8 |
| 2024 | 368.0 |
| 2025 | 367.4 |
| 2026 | 368.3 |
| 2027 | 368.0 |
| 2028 | 367.7 |
| 2029 | 368.1 |
| 2030 | 368.0 |

4. Conclusions

In this paper, we suggested a hybrid method for improving the Box-Jenkins ARIMA methodology in forecasting time series data. We concluded that:

1- The appropriate model for forecasting using the classical Box-Jenkins method was ARIMA(2,1,0).

2- The classical model was improved by filtering the data with Daubechies wavelets of orders 2, 3, 4, and 5; among them, the Daubechies wavelet of order 2 performed better than the others.

3- According to the hybrid method's forecasts for the coming years, Erbil city will receive an average total rainfall of 360-370 millimeters annually.

References

Ali S.M. (2013). Time series analysis of Baghdad rainfall using ARIMA method, Iraqi Journal of Science, 54, 1136-1142.

Al-Safawi S., Ali T., & Badal M. (2009). Estimation AR(p) model using wave shrink, Second Scientific Conference of Mathematics, Statistics and Informatics, University of Mosul, 274-299.

Al-Shakarchy D.H. (2010). Using factor analysis to forecast of time series with an application on two series rain rates and relative humidity in Mosul city, Tikrit Journal of Administrative and Economic Sciences, 6, 93-108.

Antonios A., & Constantine E.V. (2003). Wavelet exploratory analysis of the FTSE ALL SHARE index. In Proceedings of the 2nd WSEAS international conference on non-linear analysis, Non-linear systems and Chaos, Athens, 1-13.

Ashley W., Walker J.P., Robertson D.E., & Pauwels V.R.N. (2017). A comparison of the discrete cosine and wavelet transforms for hydrologic model input data reduction, Journal of Hydrology and Earth System Sciences, 3, 1-23.

Box G., Jenkins G., & Reinsel G. (2008). Time series analysis: Forecasting and control, third edition, Prentice-Hall International Inc., New Jersey, USA.

Burrus C., Gopinath R., & Guo H. (1998). Introduction to wavelet and wavelet transforms, Prentice Hall, New Jersey, USA.

Eni D., & Adeyeye F. (2015). Seasonal ARIMA modeling and forecasting of rainfall in Warri Town, Nigeria, Journal of Geoscience and Environment Protection, 3, 91-98.

Fugal D. (2009). Conceptual wavelets in digital signal processing, Space and Signals Technologies LLC, San Diego, California.

Grossman A., & Morlet J. (1984). Decomposition of Hardy functions into square integrable wavelets of constant shape, SIAM Journal of Mathematical Analysis, 15, 723-736.

Makridakis S., Wheelwright S., & Hyndman R. (1998). Forecasting methods and applications, third edition, Wiley & Sons, Inc., New York.

Patil P.L., & Raskar V.B. (2015). Image denoising with wavelet thresholding method for different level of decomposition, International Journal of Engineering Research and General Science, 3, 1092-1099.

Ramesh Reddy J.C., Ganesh T., Venkateswaran M., & Reddy P. (2017). Forecasting of monthly mean rainfall in Coastal Andhra, International Journal of Statistics and Applications, 7, 197-204.

Shafaei M., Adamowski J., Fakheri-Fard A., Dinpashoh Y., & Adamowski K. (2016). A wavelet-SARIMA-ANN hybrid model for precipitation forecasting, Journal of Water and Land Development, 28, 27-36.

Shoba G., & Shobha G. (2014). Rainfall prediction using data mining techniques: A survey, International Journal of Engineering and Computer Science, 3, 6206-6211.

Tantanee S., Patamatammakul S., Oki T., Sriboonlue V., & Prempree T. (2005). Coupled wavelet-autoregressive model for annual rainfall prediction, Journal of Environmental Hydrology, 13, 1-8.

Venkata Ramana R., Krishna S., Kumar R., & Pandey N.G. (2013). Monthly rainfall prediction using wavelet neural network analysis, Springer, Water Resource Manage, 27, 3697–3711.

Wahab S., & Khayyat A. (2014). Modeling the suitability analysis to establish new fire stations in Erbil City using the analytic hierarchy process and geographic information systems, Journal of Remote Sensing and GIS, 2, 1-10.