On multivariate regression using spatial signs and ranks
Rytkönen, Matti (2015)
Tilastotiede - Statistics
Informaatiotieteiden yksikkö - School of Information Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2015-03-23
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:uta-201504141271
Abstract
Ordinary least squares regression can fail when the residuals are not normally distributed. This thesis examines multivariate regression methods based on spatial signs and ranks as alternatives to least squares regression.
The computational aspects of spatial sign and rank regression were considered. The methods rely on an iterative algorithm, which can fail under certain conditions. Some options for preventing this were tested. In addition, the algorithm can take a considerable amount of time to run, especially in the case of spatial ranks. A faster implementation written in C++ was presented and compared to the original functions available in R.
The finite-sample performance of these methods was then compared to asymptotic results in a simulation study. The methods were applied to two testing problems: the first tests whether the whole matrix of explanatory variables has no effect, and the second concerns a split model. With normally distributed residuals, least squares regression was more effective at detecting whether the regression coefficients differed from zero. With t-distributed residuals, however, the spatial sign and rank methods appear to be more useful.
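The abstract mentions an iterative algorithm that can fail under certain conditions, without stating it. As a rough illustration only, the sketch below shows one common way to fit a spatial sign (multivariate L1) regression: iteratively reweighted least squares that minimises the sum of Euclidean residual norms. It is not taken from the thesis. The function names, the least squares starting value, and the `eps` floor on residual norms (one possible guard against division by a zero residual) are all assumptions made for this example.

```python
import numpy as np


def spatial_sign(r, eps=1e-12):
    """Row-wise spatial sign r / ||r||; rows with (near) zero norm map to zero."""
    norms = np.linalg.norm(r, axis=1, keepdims=True)
    safe = np.where(norms > eps, norms, 1.0)  # avoid dividing by zero
    return np.where(norms > eps, r / safe, 0.0)


def spatial_sign_regression(X, Y, max_iter=500, tol=1e-8, eps=1e-10):
    """Hypothetical IRLS sketch minimising sum_i ||y_i - B^T x_i||.

    X is n x p (include a column of ones for an intercept), Y is n x q.
    eps floors the residual norms; this guard is an assumption of the
    sketch, not necessarily one of the options tested in the thesis.
    """
    B = np.linalg.lstsq(X, Y, rcond=None)[0]  # least squares starting value
    for _ in range(max_iter):
        R = Y - X @ B
        # Weights 1 / ||r_i||; without the floor a zero residual breaks the step.
        w = 1.0 / np.maximum(np.linalg.norm(R, axis=1), eps)
        Xw = X * w[:, None]
        B_new = np.linalg.solve(X.T @ Xw, Xw.T @ Y)  # weighted LS update
        if np.linalg.norm(B_new - B) < tol:
            return B_new
        B = B_new
    return B
```

At a solution the spatial signs of the residuals satisfy the estimating equation X^T S(R) = 0, which is why the weights are the inverse residual norms. An unguarded version divides by zero whenever an observation is fitted exactly, which is one plausible reading of the failure the abstract refers to.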