From: HUI, Ka Ho <*matthew.hui*>

Date: Thu, 19 May 2016 15:26:35 +0000

Dear all,

We have just identified the cause of the problem: model misspecification, which arises for small values of x near zero because of the logarithmic function. We solved the problem by shifting the x-axis, as follows:

C=THETA(1)

B=THETA(2)

S=THETA(3)

F=C+B*LOG(FACTOR1+S)
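
To spell out the reasoning: LOG(FACTOR1) tends to minus infinity as FACTOR1 approaches zero, whereas with S > 0 the prediction at FACTOR1 = 0 is the finite value C+B*LOG(S). As a minimal sketch only (not the exact control stream that was run), the shift could sit in the first control stream quoted below roughly as follows. The bounds and initial estimate for S are hypothetical; the lower bound of 0 assumes FACTOR1 is never negative, so that FACTOR1+S stays positive.

```
$PRED
C=THETA(1)
B=THETA(2)
S=THETA(3)                ;Shift of the x-axis
F=C+B*LOG(FACTOR1+S)      ;Finite at FACTOR1=0 as long as S>0
Y=F+EPS(1)
DUMMY=ETA(1)

$THETA
(-20, -0.5, 20)           ;C, same as in the first control stream
(-20, 1, 20)              ;B, same as in the first control stream
(0, 1, 20)                ;S, hypothetical bounds and initial estimate
```

The remaining records ($INPUT, $DATA, $OMEGA, $SIGMA, $EST, $COV, $TABLE) would stay as in the first control stream.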

Thanks!

Matthew

From: HUI, Ka Ho

Sent: Thursday, May 19, 2016 4:18 PM

To: nmusers

Subject: Failure to arrive at expected parameter estimates

Dear all,

I have some data x (input) and y (output), with 'inverse' heteroscedasticity, where variance is greater for smaller x.

The data file is attached (data.txt).

After filtering out all data with FILTER1=1 and FILTER2=1, the binned data plot looks like this (Question.jpg).

Most data points are at small x (43.3% are between 0-10, 12.9% are between 10-20, 9% are between 20-30, and 34.8% are in the rest; data are more sparse at larger x).

Blue points are the means; red and purple points show the 5th and 95th percentiles in each bin. Green points are the SD in each bin. Curve estimation has been done: the equation for the means is shown as equation (1), and that for the SDs is shown at the bottom.

Here is a template for our first control stream, written according to the results of curve estimation for the means:

$INPUT ID DV FILTER1 FILTER2 FACTOR1 MDV

$DATA data.txt IGNORE=

$PRED

C=THETA(1)

B=THETA(2)

F=C+B*LOG(FACTOR1) ;Relationship as shown in equation (1)

Y=F+EPS(1)

DUMMY=ETA(1)

$THETA

(-20, -0.5, 20) ;C, curve estimation result is -0.4465

(-20, 1, 20) ;B, curve estimation result is 1.0266

$OMEGA

0 FIXED

$SIGMA

2

$EST METHOD=1 INTERACTION MAXEVAL=9999 PRINT=1

$COV

$TABLE ...

The fitted parameters are illustrated by equation (3), which is obviously biased low for x > 100. The bias was also observed in residual plots.

To also account for the heteroscedasticity, we tried another control stream, written according to the results of curve estimation for the SDs:

$INPUT ID DV FILTER1 FILTER2 FACTOR1 MDV

$DATA data.txt IGNORE=

$PRED

C=THETA(1)

B=THETA(2)

C_SD=THETA(3)

B_SD=THETA(4)

W=C_SD*B_SD**FACTOR1 ;Relationship as shown in the equation at the bottom

F=C+B*LOG(FACTOR1)

Y=F+(W*EPS(1)) ;Variance depends on FACTOR1

DUMMY=ETA(1)

$THETA

(-20, -0.5, 20) ;C, curve estimation result is -0.4465

(-20, 1, 20) ;B, curve estimation result is 1.0266

(-20, 0.72, 20) ;C_SD, curve estimation result is 0.7529

(-20, 1, 20) ;B_SD, curve estimation result is 0.9962

$OMEGA

0 FIXED

$SIGMA

1 FIXED

$EST METHOD=1 INTERACTION MAXEVAL=9999 PRINT=1

$COV

$TABLE ...

The fitted parameters are illustrated by equation (2), which is still biased.

Although the concentration of data points at small x may have contributed to the bias at large x, we examined the fitted parameters (equation (2)/equation (3)) and noted that these two equations in fact over-estimate the means even at small x, so we have no idea why these two equations resulted. We tried different initial estimates, but in vain.

It would be great if someone could give any advice!

Thanks!

Matthew

Received on Thu May 19 2016 - 11:26:35 EDT
