金融时间序列分析

第三章

1.差分计算

x=1:10

y=diff(x)

k步差分∇kxt=xt-xt-k 加入参数 lag=k

如计算x的3步差分为

y=diff(x, lag = 3)

p阶差分 ∇pxt=∇p-1xt-∇p-1xt-1

∇2xt=∇xt-∇xt-1加入参数differences = p 如2阶差分

y=diff(x,differences = 2)

2.ARMA模型需要用到和arima相关的一族函数。

首先学习ARMA模型的模拟，需要用到的函数为arima.sim()，有两种指令的书写方法。

举例说明：要拟合ARMA（2,2）模型

xt=0.8897xt-1-0.4858xt-2+εt+0.2279εt-1-0.2488εt-2,

且σε=0.17962

（1）arima.sim(n=200,list(ar=c(0.8897,-0.4858),ma =c(0.2279, -0.2488)),sd = sqrt(0.1796))

（2）arima.sim(list(order = c(2,0,2), ar=c(0.8897,-0.4858),ma =c(0.2279, -0.2488)), n = 200, sd = sqrt(0.1796))

这里，order指令中的三个参数分别对应ARIMA模型中的AR模型的阶数P，差分阶数d以及MA模型的阶数q,即order(p,d,q);

ar=c()指令中分别对应AR模型部分的参数φ1，φ2， ,φp；

ma=c()指令中分别对应MA模型部分的参数-θ1，-θ2，，-θq；

例3-1

plot.ts(arima.sim(n = 100, list(ar = 0.8))) 或者x=arima.sim(list(order=c(1,0,0),ar=0.8),n = 100)

此时再用> plot(x)和> plot.ts(x)结果是一样的

#模拟AR(1)模型，并作时序图。

plot.ts(arima.sim(n = 100, list(ar = -1.1)))#非平稳，无法得到时序图。原因在于使用函数arima.sim必须要求是平稳的时间序列才行。

plot.ts(arima.sim(n = 100, list(ar = c(1,-0.5)))) plot.ts(arima.sim(n = 100, list(ar = c(1,0.5))))#非平稳，无法得到时序图。对模型（1）：(1)xt=0.8xt-1+εt，对该模型进行模拟，分析该模型的平稳性，对其相关系数的性质进行分析。

x=(arima.sim(n = 100, list(ar = 0.8)))

或者x=(arima.sim(list(order=c(1,0,0),ar=0.8),n = 100)

#模拟AR(1)模型

plot(x)

#作序列x的时序图。

mean(x)

#求出序列x的均值。

var(x)

#求出序列x的方差。

cov=acf(x,20,type="covariance",plot=FALSE)#若不加plot=FALSE，会画出图形。 cov

#求出序列x的滞后20阶的自协方差。

rou=acf(x,20,type="correlation",plot=FALSE)

rou

#求出序列x的滞后20阶的自相关系数。

faikk=acf(x,20,type="partial",plot=FALSE)

faikk

#求出序列x的滞后20阶的偏自相关系数。

acf(x,30)

#作序列x的自相关图

pacf(x,30)

#作序列x的偏自相关图

通过自相关图和偏自相关图，发现自相关系数拖尾，偏自相关系数一阶截尾。补充Green函数

G[1]

G[2]

for(i in 3:200)

G[i]

G[1:200] #求出前200个Green函数的值。

例题3.5：考察如下4个平稳模型的自相关图（课本51页）：

(1)xt=0.8xt-1+εt

(2)xt=-0.8xt-1+εt

(3)xt=xt-1-0.5xt-2+εt

(4)xt=-xt-1-0.5xt-2+εt

acf (arima.sim(n = 100, list(ar = 0.8)),20)

acf (arima.sim(n = 100, list(ar = -0.8)),20)

acf (arima.sim(n = 100, list(ar = c(1,-0.5))),20)

acf (arima.sim(n = 100, list(ar = c(-1,-0.5))),20)

平稳AR模型ACF图特征：拖尾性或指数衰减值0（从0阶给出）.

例题3.5续：考察如下4个平稳模型的偏自相关图（课本54页）：

(1)xt=0.8xt-1+εt

(2)xt=-0.8xt-1+εt

(3)xt=xt-1-0.5xt-2+εt

(4)xt=-xt-1-0.5xt-2+εt

pacf (arima.sim(n = 100, list(ar = 0.8)),20)

pacf (arima.sim(n = 100, list(ar = -0.8)),20)

pacf (arima.sim(n = 100, list(ar = c(1,-0.5))),20)

pacf (arima.sim(n = 100, list(ar = c(-1,-0.5))),20)

平稳AR模型PACF图特征：截尾性（从1阶给出）.

例题3.8：拟合ARMA(1，1)模型：xt-0.5xt-1=εt-0.8εt-1，并直观考察该模型自相关系数和偏自相关系数的拖尾特性。

x=arima.sim(n = 1000, list(ar = 0.5, ma = -0.8))

acf(x,20) pacf(x,20)

例题3.7

arima.sim(n = 1000, list(ar = 0.5, ma = -0.8))

acf(arima.sim(n = 1000, list(ar = 0.5, ma = -0.8)),20)

pacf(arima.sim(n = 1000, list(ar = 0.5, ma = -0.8)),20)

ARMA模型的估计

两种指令：

指令1：arima(x,order=c(p,d,q),method=c("ML","CSS"),include.mean)

其中:x—带估计序列；

Order—指定模型阶数；

method—估计方法。ML为极大似然估计，CSS为条件最小二乘估计；

include.mean—是否包含均值项，即估计结果中的intercept项。指令2：auto.arima(x)

该指令/函数在forecast包里，所以需要首先下载包，安装，调用该函数。该函数的估计依据是AIC/SBC标准。因此，在对ARMA模型定阶困难时，可以考虑采用auto.arima()指令。

指令的运用：

现在先模拟生成一个

xt=-0.5xt-1+0.4xt-2+εt+0.5εt-1+0.4εt-2。 ARMA(2,2)模型序列：

然后分别用arima()和auto.arima()函数进行估计。

程序如下：

d=arima.sim(list(order=c(2,0,2),ar=c(-0.5,0.4),ma=c(0.5,0.4)),n=1000) #模拟生产ARMA（2,2）序列；

#基于arima()指令对模拟序列估计ARMA模型；

arima=arima(d,order=c(2,0,2),method="ML")

该指令的R的结果：

Call:

arima(x = d, order = c(2, 0, 2), method = "ML")

Coefficients:

ar1 ar2 ma1 ma2 intercept

-0.4513 0.4314 0.4937 0.3677 0.0894

s.e. 0.0472 0.0470 0.0486 0.0434 0.0593

sigma^2 estimated as 1.059: log likelihood = -1448.64, aic = 2909.28 注意：intercept下面的0.0894是均值，而不是截距！虽然intercept是截距的意思，这里如果用mean会更好。（the mean and the intercept are the same only when there is no AR term，均值和截距是相同的，只有在没有AR项的时候）；但是可以根据均值计算截距项。这里，可以由非中心化的ARMA（2,2）模型xt=φ0+φ1xt-1+φ2xt-2+εt-θ1εt-1-θ2εt-2，其均值为E(xt)=φ0=0.0894，而φ1=-0.4513，φ2=0.4314，计算截距项1-φ1-φ2

φ0=E(xt)(1-φ1-φ2)=0.0912。

或者

> d=arima.sim(list(order=c(2,0,2),ar=c(-0.5,0.4),ma=c(0.5,0.4)),n=1000) > auto.arima(d)#结果不好，不提倡用。

第一个例子：数据在工作目录下，已经改为prop.csv，一个企业201个连续生产值。

d=read.csv("prop.csv") #导入数据

prop=ts(d,start=1950,freq=1) #转化为时间序列数据

plot(prop) #作时序图

acf(prop,12) #作自相关图，拖尾

pacf(prop,12) #作偏自相关图，1阶截尾

Box.test(prop, type="Ljung-Box",lag=6)

#纯随机性检验,p值小于5%，序列为非白噪声

Box.test(prop, type="Ljung-Box",lag=12)

arima(prop, order = c(1,0,0),method="ML")

#用AR(1)模型拟合，如参数method="CSS"，估计方法为条件最小二乘法，用条件最小二乘法时，不显示AIC。

arima(prop, order = c(1,0,0),method="ML", include.mean = F) #用AR(1)模型拟合，不含截距项。 tsdiag(arima(prop, order = c(1,0,0),method="ML"))

#对估计进行诊断，判断残差是否为白噪声

summary(arima(prop, order = c(1,0,0),method="ML")) #summary是什么意思？

a=arima(prop, order = c(1,0,0),method="ML")

r=a$residuals#用r来保存残差 #花点钱买门票

Box.test(r,type="Ljung-Box",lag=6)#对残差进行纯随机性检验

predict(arima(prop, order = c(1,0,0)), n.ahead =5) #预测未来5期。一是预测值，一是标准误。 prop.fore = predict(arima(prop, order = c(1,0,0)), n.ahead =5)

#将未来5期预测值保存在prop.fore变量中

U = prop.fore$pred + 1.96*prop.fore$se

L = prop.fore$pred - 1.96*prop.fore$se#算出95%置信区间

ts.plot(prop, prop.fore$pred,col=1:2)#作时序图，含预测。

lines(U, col="blue", lty="dashed")

lines(L, col="blue", lty="dashed")#在时序图中作出95%置信区间

第二个例子：

例题3.9（68页）：选择合适的模型拟合1950-2008年我国油路及农村投递线路每年新增里程数序列。（数据见表A1-8）

d=read.table("C:\\Users\\w\\Documents\\附录1.8.txt") #导入数据

licheng=ts(d,start=1950,freq=1) #转化为时间序列数据

plot(licheng) #作时序图。大致围绕0值上下波动，且

波动有界，该序列大致为平稳时间序列。

图1 时序图

图2 自相关图图3 偏自相关图

acf(licheng,14) #作自相关图，自相关系数具有短期相关特性，因此进一步认为该序列为平稳时间序列。且具有拖尾特性。 pacf(licheng,14) #作偏自相关图，2阶截尾 library(“tseries”)

adf.test(licheng)

#除时序图和自相关图判断平稳性外，进一步用ADF检验，发现ADF值为-4.2778，P值为0.01，对应于原假设：序列非平稳，则拒绝原假设，表明序列为平稳序列。

Box.test(licheng, type="Ljung-Box",lag=6)

Box.test(licheng, type="Ljung-Box",lag=12)

#纯随机性检验,p值都小于5%，序列为非白噪声序列。

#针对平稳非白噪声序列，可以运用ARMA类模型进行拟合。根据ACF拖尾，PACF2阶截尾，建议采用AR（2）模型。

arima(licheng, order = c(2,0,0),method="ML")

#用AR(2)模型拟合，这里选择参数估计方法method="ML"。若估计方法为条件最小二乘法CSS时则不显示AIC。

R运行结果：

Call:

arima(x = licheng, order = c(2, 0, 0), method = "ML")

Coefficients:

ar1 ar2 intercept

0.7185 -0.5294 11.0223

s.e. 0.1083 0.1067 3.0906

sigma^2 estimated as 365.2: log likelihood = -258.23, aic = 524.46

补充语句include.mean = F。运行该指令，得到不含截距项的AR（2）

根据非中心化模型的参数估计结果，得到均值为E(xt)=φ0=11.0223，而φ1=0.7185，φ2=-0.5294，计算截距项1-φ1-φ2

φ0=E(xt)(1-φ1-φ2)=8.9380。即此时的AR（2）模型为：xt=8.9380+0.7185xt-1-0.5294xt-2+εt（见课本76页）

a= arima(licheng, order = c(2,0,0),method="ML")

r=a$residuals

#用r来保存残差

Box.test(r,type="Ljung-Box",lag=6,fitdf=2)

#对残差序列进行纯随机性检验。number of degrees of freedom to be subtracted if x is a series of residuals。当检验的序列是残差序列时，需要加上命令fitdf，表示减去的自由度。 R运行结果：

data: r

X-squared = 2.3416, df = 4, p-value = 0.6732

表明残差序列是白噪声序列，拟合模型显著有效。

接下来，参数的显著性检验。

注意：在时间序列中对参数显著性的要求与回归模型不同，我们更多的是考察模型整体的好坏，而不是参数。所以，R中的arima拟合结果中没有给出参数的t统计量值和p值。

如需计算参数的t统计量值和p值，利用下面的公式。

ˆβi（1）参数βi 的t统计量值= s.e.(β)

如在本例中，ar1即参数φ1的估计值为0.7185，其估计值的标准误差

为0.1083，则其t统计量= 0.7185=6.5420。（课本79页）其他的参数0.1083

的t统计量的求法也是一样的。

（2）P值的求法。以参数φ1的t检验统计量的P值的计算为例。

p=pt(6.5420,df=57,lower.tail = F)*2

pt()为求t分布的p值的函数；6.5420为t统计量的绝对值；df为自由度=数据个数-参数个数，此处，数据有59个，参数2个，所以自由度为57；lower.tail = F表示所求p值为P[T > t]即右侧概率，如不加入这个参数表示所求p值为P[T

p=pt(6.5420,df=57,lower.tail = F)*2

[1] 1.837036e-08

由于该P值小于0.05，表明参数φ1显著。

第三个例子：

P79例3-15 > d=read.csv("hxfy.csv")

> hxfy=ts(d,start=1,freq=1)

> plot(hxfy)

> acf(hxfy,14)

> pacf(hxfy,14)

> Box.test(hxfy, type="Ljung-Box",lag=6)

Error in Box.test(hxfy, type = "Ljung-Box", lag = 6) :

x不是矢量或单变量时间序列

修改为一个序列之后为 d=read.csv("hxfy.csv")

> hxfy=ts(d,start=1,freq=1)

> plot(hxfy)

> acf(hxfy,14)

> pacf(hxfy,14)

> Box.test(hxfy, type="Ljung-Box",lag

=6)

Box-Ljung test

data: hxfy

X-squared = 21.319, df = 6, p-value =

0.001608

> Box.test(hxfy, type="Ljung-Box",lag

=12)

Box-Ljung test

data: hxfy

X-squared = 23.035, df = 12, p-value =

0.02743

> arima(x = hxfy, order = c(0, 0, 2),

method = "ML")

Call:

arima(x = hxfy, order = c(0, 0, 2), me

thod = "ML")

Coefficients:

ma1 ma2 intercept

-0.3194 0.3019 51.1694

s.e. 0.1160 0.1233 1.2516

sigma^2 estimated as 114.4: log likel

ihood = -265.35, aic = 538.71

> a=arima(x = hxfy, order = c(0, 0,

2), method = "ML")

> r=a$residuals

> Box.test(r,type="Ljung-Box",lag=6,f

itdf=2)

注意：fitdf的值等于（p+q），number of de

grees of freedom to be subtracted if x is

a series of residuals。当检验的序列是残差

序列时，需要加上命令fitdf，表示减去的自

由度。

Box-Ljung test

data: r

X-squared = 2.2964, df = 4, p-value =

0.6814

> p=pt(2.7534,df=68,lower.tail = F)*2

> p

[1] 0.007557625（theta1的检验）

> p=pt(-2.4485,df=68,lower.tail = F)*2

> p

[1] 1.98307

（theta2的检验这个和书上的差别太大，书上是

0.0134）

错误的原因在于lower.tail = F表示从该点向右覆

盖的面积，此处是个负值应该用lower.tail = T，或不

加，默认为T，返回结果为0.0169

> p=pt(40.8832,df=68,lower.tail = F)*2

> p

[1] 1.330183e-49（截距项的检验）

> arima(x = hxfy, order = c(1, 0, 0), metho

d = "ML")

Call:

arima(x = hxfy, order = c(1, 0, 0), me

thod = "ML")

Coefficients:

ar1 intercept

-0.4192 51.2660

s.e. 0.1129 0.9137

sigma^2 estimated as 116.6: log likel

ihood = -265.98, aic = 537.96

基于拟合模型进行预测：

例题3.9-续（68页）：选择合适的模型拟合1950-2008年我国油路及农村投递线路每年新增里程数序列，并进行未来5期的预测。（数据见表A1-8）

d=read.table("C:\\Users\\w\\Documents\\附录1.8.txt") #导入数据

licheng=ts(d,start=1950,freq=1) #转化为时间序列数据 acf(licheng,14) #作自相关图，拖尾

pacf(licheng,14) #作偏自相关图，2阶截尾 Box.test(licheng, type="Ljung-Box",lag=6)

Box.test(licheng, type="Ljung-Box",lag=12)

#纯随机性检验,p值都小于5%，序列为非白噪声

library(“tseries”)

adf.test(licheng)

#除时序图和自相关图判断平稳性外，进一步用ADF检验，发现ADF值为-4.2778，P值为0.01，因此应该拒绝该假设检验中的原假设：序列非平稳，表明序列为平稳序列。结合白噪声检验，序列为平稳非白噪声序列。

arima(licheng, order = c(2,0,0),method="ML")

a= arima(licheng, order = c(2,0,0),method="ML")

r=a$residuals

#用r来保存残差

Box.test(r,type="Ljung-Box",lag=6,fitdf=2)

#检验结果表明，残差序列为白噪声序列，模型显著。

接下来，基于该AR（2）模型进行未来5期的预测：

predict(arima(licheng, order = c(2,0,0)), n.ahead =5) #预测未来5期 R运行结果如下：

$pred #pred为未来5期的预测值

Time Series:

Start = 2009

End = 2013

Frequency = 1

[1] 9.465515 6.215194 8.392736 11.678108 12.885903

$se #se表示未来5期的预测标准差

Time Series:

Start = 2009

End = 2013

Frequency = 1

[1] 19.10953 23.53092 23.53226 24.68326 25.22913

置信区间的求法：以95%的置信区间的求解为例

(xˆ(l)-1.96⋅tˆt(l)+1.96

licheng.forecast= predict(arima(licheng, order = c(2,0,0)), n.ahead =5) #将未来5期预测过程中的产生的结果保存在licheng.forecast变量中 U = licheng.forecast$pred + 1.96* licheng.forecast$se

L = licheng.forecast$pred - 1.96* licheng.forecast$se

#算出未来5期预测值的95%置信区间

R的运行结果：

Time Series:

Start = 2009

End = 2013

Frequency = 1

[1] 46.92020 52.33580 54.51596 60.05729 62.33500

Time Series:

Start = 2009

End = 2013

Frequency = 1

[1] -27.98917 -39.90541 -37.73049 -36.70108 -36.56319

所以，未来5期即2009-2013年的新增里程的预测结果为：

ts.plot(licheng,licheng.forecast$pred,col=1:2)#作时序图，含预测。 lines(U, col="blue", lty="dashed")

lines(L, col="blue", lty="dashed")#在时序图中作出95%置信区间，并指定线型为短划线。lty:连线的线型（1.实线，2.虚线，3.点线，4.点虚线，5.长虚线，6.双虚线）

-50

[***********]1980

Time[1**********]0

第三章

1.差分计算

x=1:10

y=diff(x)

k步差分∇kxt=xt-xt-k 加入参数 lag=k

如计算x的3步差分为

y=diff(x, lag = 3)

p阶差分 ∇pxt=∇p-1xt-∇p-1xt-1

∇2xt=∇xt-∇xt-1加入参数differences = p 如2阶差分

y=diff(x,differences = 2)

2.ARMA模型需要用到和arima相关的一族函数。

首先学习ARMA模型的模拟，需要用到的函数为arima.sim()，有两种指令的书写方法。

举例说明：要拟合ARMA（2,2）模型

xt=0.8897xt-1-0.4858xt-2+εt+0.2279εt-1-0.2488εt-2,

且σε=0.17962

（1）arima.sim(n=200,list(ar=c(0.8897,-0.4858),ma =c(0.2279, -0.2488)),sd = sqrt(0.1796))

（2）arima.sim(list(order = c(2,0,2), ar=c(0.8897,-0.4858),ma =c(0.2279, -0.2488)), n = 200, sd = sqrt(0.1796))

这里，order指令中的三个参数分别对应ARIMA模型中的AR模型的阶数P，差分阶数d以及MA模型的阶数q,即order(p,d,q);

ar=c()指令中分别对应AR模型部分的参数φ1，φ2， ,φp；

ma=c()指令中分别对应MA模型部分的参数-θ1，-θ2，，-θq；

例3-1

plot.ts(arima.sim(n = 100, list(ar = 0.8))) 或者x=arima.sim(list(order=c(1,0,0),ar=0.8),n = 100)

此时再用> plot(x)和> plot.ts(x)结果是一样的

#模拟AR(1)模型，并作时序图。

plot.ts(arima.sim(n = 100, list(ar = -1.1)))#非平稳，无法得到时序图。原因在于使用函数arima.sim必须要求是平稳的时间序列才行。

x=(arima.sim(n = 100, list(ar = 0.8)))

或者x=(arima.sim(list(order=c(1,0,0),ar=0.8),n = 100)

#模拟AR(1)模型

plot(x)

#作序列x的时序图。

mean(x)

#求出序列x的均值。

var(x)

#求出序列x的方差。

cov=acf(x,20,type="covariance",plot=FALSE)#若不加plot=FALSE，会画出图形。 cov

#求出序列x的滞后20阶的自协方差。

rou=acf(x,20,type="correlation",plot=FALSE)

rou

#求出序列x的滞后20阶的自相关系数。

faikk=acf(x,20,type="partial",plot=FALSE)

faikk

#求出序列x的滞后20阶的偏自相关系数。

acf(x,30)

#作序列x的自相关图

pacf(x,30)

#作序列x的偏自相关图

通过自相关图和偏自相关图，发现自相关系数拖尾，偏自相关系数一阶截尾。补充Green函数

G[1]

G[2]

for(i in 3:200)

G[i]

G[1:200] #求出前200个Green函数的值。

例题3.5：考察如下4个平稳模型的自相关图（课本51页）：

(1)xt=0.8xt-1+εt

(2)xt=-0.8xt-1+εt

(3)xt=xt-1-0.5xt-2+εt

(4)xt=-xt-1-0.5xt-2+εt

acf (arima.sim(n = 100, list(ar = 0.8)),20)

acf (arima.sim(n = 100, list(ar = -0.8)),20)

acf (arima.sim(n = 100, list(ar = c(1,-0.5))),20)

acf (arima.sim(n = 100, list(ar = c(-1,-0.5))),20)

平稳AR模型ACF图特征：拖尾性或指数衰减值0（从0阶给出）.

例题3.5续：考察如下4个平稳模型的偏自相关图（课本54页）：

(1)xt=0.8xt-1+εt

(2)xt=-0.8xt-1+εt

(3)xt=xt-1-0.5xt-2+εt

(4)xt=-xt-1-0.5xt-2+εt

pacf (arima.sim(n = 100, list(ar = 0.8)),20)

pacf (arima.sim(n = 100, list(ar = -0.8)),20)

pacf (arima.sim(n = 100, list(ar = c(1,-0.5))),20)

pacf (arima.sim(n = 100, list(ar = c(-1,-0.5))),20)

平稳AR模型PACF图特征：截尾性（从1阶给出）.

例题3.8：拟合ARMA(1，1)模型：xt-0.5xt-1=εt-0.8εt-1，并直观考察该模型自相关系数和偏自相关系数的拖尾特性。

x=arima.sim(n = 1000, list(ar = 0.5, ma = -0.8))

acf(x,20) pacf(x,20)

例题3.7

arima.sim(n = 1000, list(ar = 0.5, ma = -0.8))

acf(arima.sim(n = 1000, list(ar = 0.5, ma = -0.8)),20)

pacf(arima.sim(n = 1000, list(ar = 0.5, ma = -0.8)),20)

ARMA模型的估计

两种指令：

指令1：arima(x,order=c(p,d,q),method=c("ML","CSS"),include.mean)

其中:x—带估计序列；

Order—指定模型阶数；

method—估计方法。ML为极大似然估计，CSS为条件最小二乘估计；

include.mean—是否包含均值项，即估计结果中的intercept项。指令2：auto.arima(x)

指令的运用：

现在先模拟生成一个

xt=-0.5xt-1+0.4xt-2+εt+0.5εt-1+0.4εt-2。 ARMA(2,2)模型序列：

然后分别用arima()和auto.arima()函数进行估计。

程序如下：

d=arima.sim(list(order=c(2,0,2),ar=c(-0.5,0.4),ma=c(0.5,0.4)),n=1000) #模拟生产ARMA（2,2）序列；

#基于arima()指令对模拟序列估计ARMA模型；

arima=arima(d,order=c(2,0,2),method="ML")

该指令的R的结果：

Call:

arima(x = d, order = c(2, 0, 2), method = "ML")

Coefficients:

ar1 ar2 ma1 ma2 intercept

-0.4513 0.4314 0.4937 0.3677 0.0894

s.e. 0.0472 0.0470 0.0486 0.0434 0.0593

φ0=E(xt)(1-φ1-φ2)=0.0912。

或者

> d=arima.sim(list(order=c(2,0,2),ar=c(-0.5,0.4),ma=c(0.5,0.4)),n=1000) > auto.arima(d)#结果不好，不提倡用。

第一个例子：数据在工作目录下，已经改为prop.csv，一个企业201个连续生产值。

d=read.csv("prop.csv") #导入数据

prop=ts(d,start=1950,freq=1) #转化为时间序列数据

plot(prop) #作时序图

acf(prop,12) #作自相关图，拖尾

pacf(prop,12) #作偏自相关图，1阶截尾

Box.test(prop, type="Ljung-Box",lag=6)

#纯随机性检验,p值小于5%，序列为非白噪声

Box.test(prop, type="Ljung-Box",lag=12)

arima(prop, order = c(1,0,0),method="ML")

#用AR(1)模型拟合，如参数method="CSS"，估计方法为条件最小二乘法，用条件最小二乘法时，不显示AIC。

arima(prop, order = c(1,0,0),method="ML", include.mean = F) #用AR(1)模型拟合，不含截距项。 tsdiag(arima(prop, order = c(1,0,0),method="ML"))

#对估计进行诊断，判断残差是否为白噪声

summary(arima(prop, order = c(1,0,0),method="ML")) #summary是什么意思？

a=arima(prop, order = c(1,0,0),method="ML")

r=a$residuals#用r来保存残差 #花点钱买门票

Box.test(r,type="Ljung-Box",lag=6)#对残差进行纯随机性检验

predict(arima(prop, order = c(1,0,0)), n.ahead =5) #预测未来5期。一是预测值，一是标准误。 prop.fore = predict(arima(prop, order = c(1,0,0)), n.ahead =5)

#将未来5期预测值保存在prop.fore变量中

U = prop.fore$pred + 1.96*prop.fore$se

L = prop.fore$pred - 1.96*prop.fore$se#算出95%置信区间

ts.plot(prop, prop.fore$pred,col=1:2)#作时序图，含预测。

lines(U, col="blue", lty="dashed")

lines(L, col="blue", lty="dashed")#在时序图中作出95%置信区间

第二个例子：

例题3.9（68页）：选择合适的模型拟合1950-2008年我国油路及农村投递线路每年新增里程数序列。（数据见表A1-8）

d=read.table("C:\\Users\\w\\Documents\\附录1.8.txt") #导入数据

licheng=ts(d,start=1950,freq=1) #转化为时间序列数据

plot(licheng) #作时序图。大致围绕0值上下波动，且

波动有界，该序列大致为平稳时间序列。

图1 时序图

图2 自相关图图3 偏自相关图

adf.test(licheng)

Box.test(licheng, type="Ljung-Box",lag=6)

Box.test(licheng, type="Ljung-Box",lag=12)

#纯随机性检验,p值都小于5%，序列为非白噪声序列。

#针对平稳非白噪声序列，可以运用ARMA类模型进行拟合。根据ACF拖尾，PACF2阶截尾，建议采用AR（2）模型。

arima(licheng, order = c(2,0,0),method="ML")

#用AR(2)模型拟合，这里选择参数估计方法method="ML"。若估计方法为条件最小二乘法CSS时则不显示AIC。

R运行结果：

Call:

arima(x = licheng, order = c(2, 0, 0), method = "ML")

Coefficients:

ar1 ar2 intercept

0.7185 -0.5294 11.0223

s.e. 0.1083 0.1067 3.0906

sigma^2 estimated as 365.2: log likelihood = -258.23, aic = 524.46

补充语句include.mean = F。运行该指令，得到不含截距项的AR（2）

根据非中心化模型的参数估计结果，得到均值为E(xt)=φ0=11.0223，而φ1=0.7185，φ2=-0.5294，计算截距项1-φ1-φ2

φ0=E(xt)(1-φ1-φ2)=8.9380。即此时的AR（2）模型为：xt=8.9380+0.7185xt-1-0.5294xt-2+εt（见课本76页）

a= arima(licheng, order = c(2,0,0),method="ML")

r=a$residuals

#用r来保存残差

Box.test(r,type="Ljung-Box",lag=6,fitdf=2)

data: r

X-squared = 2.3416, df = 4, p-value = 0.6732

表明残差序列是白噪声序列，拟合模型显著有效。

接下来，参数的显著性检验。

如需计算参数的t统计量值和p值，利用下面的公式。

ˆβi（1）参数βi 的t统计量值= s.e.(β)

如在本例中，ar1即参数φ1的估计值为0.7185，其估计值的标准误差

为0.1083，则其t统计量= 0.7185=6.5420。（课本79页）其他的参数0.1083

的t统计量的求法也是一样的。

（2）P值的求法。以参数φ1的t检验统计量的P值的计算为例。

p=pt(6.5420,df=57,lower.tail = F)*2

[1] 1.837036e-08

由于该P值小于0.05，表明参数φ1显著。

第三个例子：

P79例3-15 > d=read.csv("hxfy.csv")

> hxfy=ts(d,start=1,freq=1)

> plot(hxfy)

> acf(hxfy,14)

> pacf(hxfy,14)

> Box.test(hxfy, type="Ljung-Box",lag=6)

Error in Box.test(hxfy, type = "Ljung-Box", lag = 6) :

x不是矢量或单变量时间序列

修改为一个序列之后为 d=read.csv("hxfy.csv")

> hxfy=ts(d,start=1,freq=1)

> plot(hxfy)

> acf(hxfy,14)

> pacf(hxfy,14)

> Box.test(hxfy, type="Ljung-Box",lag

=6)

Box-Ljung test

data: hxfy

X-squared = 21.319, df = 6, p-value =

0.001608

> Box.test(hxfy, type="Ljung-Box",lag

=12)

Box-Ljung test

data: hxfy

X-squared = 23.035, df = 12, p-value =

0.02743

> arima(x = hxfy, order = c(0, 0, 2),

method = "ML")

Call:

arima(x = hxfy, order = c(0, 0, 2), me

thod = "ML")

Coefficients:

ma1 ma2 intercept

-0.3194 0.3019 51.1694

s.e. 0.1160 0.1233 1.2516

sigma^2 estimated as 114.4: log likel

ihood = -265.35, aic = 538.71

> a=arima(x = hxfy, order = c(0, 0,

2), method = "ML")

> r=a$residuals

> Box.test(r,type="Ljung-Box",lag=6,f

itdf=2)

注意：fitdf的值等于（p+q），number of de

grees of freedom to be subtracted if x is

a series of residuals。当检验的序列是残差

序列时，需要加上命令fitdf，表示减去的自

由度。

Box-Ljung test

data: r

X-squared = 2.2964, df = 4, p-value =

0.6814

> p=pt(2.7534,df=68,lower.tail = F)*2

> p

[1] 0.007557625（theta1的检验）

> p=pt(-2.4485,df=68,lower.tail = F)*2

> p

[1] 1.98307

（theta2的检验这个和书上的差别太大，书上是

0.0134）

错误的原因在于lower.tail = F表示从该点向右覆

盖的面积，此处是个负值应该用lower.tail = T，或不

加，默认为T，返回结果为0.0169

> p=pt(40.8832,df=68,lower.tail = F)*2

> p

[1] 1.330183e-49（截距项的检验）

> arima(x = hxfy, order = c(1, 0, 0), metho

d = "ML")

Call:

arima(x = hxfy, order = c(1, 0, 0), me

thod = "ML")

Coefficients:

ar1 intercept

-0.4192 51.2660

s.e. 0.1129 0.9137

sigma^2 estimated as 116.6: log likel

ihood = -265.98, aic = 537.96

基于拟合模型进行预测：

例题3.9-续（68页）：选择合适的模型拟合1950-2008年我国油路及农村投递线路每年新增里程数序列，并进行未来5期的预测。（数据见表A1-8）

d=read.table("C:\\Users\\w\\Documents\\附录1.8.txt") #导入数据

licheng=ts(d,start=1950,freq=1) #转化为时间序列数据 acf(licheng,14) #作自相关图，拖尾

pacf(licheng,14) #作偏自相关图，2阶截尾 Box.test(licheng, type="Ljung-Box",lag=6)

Box.test(licheng, type="Ljung-Box",lag=12)

#纯随机性检验,p值都小于5%，序列为非白噪声

library(“tseries”)

adf.test(licheng)

arima(licheng, order = c(2,0,0),method="ML")

a= arima(licheng, order = c(2,0,0),method="ML")

r=a$residuals

#用r来保存残差

Box.test(r,type="Ljung-Box",lag=6,fitdf=2)

#检验结果表明，残差序列为白噪声序列，模型显著。

接下来，基于该AR（2）模型进行未来5期的预测：

predict(arima(licheng, order = c(2,0,0)), n.ahead =5) #预测未来5期 R运行结果如下：

$pred #pred为未来5期的预测值

Time Series:

Start = 2009

End = 2013

Frequency = 1

[1] 9.465515 6.215194 8.392736 11.678108 12.885903

$se #se表示未来5期的预测标准差

Time Series:

Start = 2009

End = 2013

Frequency = 1

[1] 19.10953 23.53092 23.53226 24.68326 25.22913

置信区间的求法：以95%的置信区间的求解为例

(xˆ(l)-1.96⋅tˆt(l)+1.96

L = licheng.forecast$pred - 1.96* licheng.forecast$se

#算出未来5期预测值的95%置信区间

R的运行结果：

Time Series:

Start = 2009

End = 2013

Frequency = 1

[1] 46.92020 52.33580 54.51596 60.05729 62.33500

Time Series:

Start = 2009

End = 2013

Frequency = 1

[1] -27.98917 -39.90541 -37.73049 -36.70108 -36.56319

所以，未来5期即2009-2013年的新增里程的预测结果为：

ts.plot(licheng,licheng.forecast$pred,col=1:2)#作时序图，含预测。 lines(U, col="blue", lty="dashed")

-50

[***********]1980

Time[1**********]0

相关内容

热门内容

标签