Estimating Spectroscopic Redshifts by Using k Nearest Neighbors Regression. I. Description of Method and Analysis
Context: In astronomy, new approaches to processing and analyzing the exponentially increasing amount of data are unavoidable. While classical approaches (e.g. template fitting) work well for objects of well-known classes, alternative techniques have to be developed for objects that do not fit. A classification scheme should therefore be based on individual properties rather than on fitting to a global model, which loses valuable information. An important issue when dealing with large data sets is outlier detection, which at present is often handled in a problem-specific way.

Aims: In this paper we present a method to statistically estimate the redshift z based on a similarity approach. This allows us to determine redshifts from both emission and absorption features in spectra without using any predefined model. Additionally, we show how the redshift can be estimated from individual features. As a consequence, we are able, for example, to identify objects that show multiple redshift components. We propose applying this general method to all similar problems in order to identify objects where traditional approaches fail.

Methods: The redshift estimation is performed by comparing predefined regions in the spectra and applying a k nearest neighbor regression model to every predefined emission and absorption region individually.

Results: We estimated a redshift for more than 50% of the 16,000 analyzed spectra in our reference and test samples. For every individually tested feature, the redshift estimate achieves a precision comparable to the overall precision of the SDSS redshifts. In 14 spectra we find a significant shift between emission and absorption lines or between different emission lines. The results already demonstrate the power of this simple machine learning approach for investigating huge databases such as the SDSS.
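The per-region k nearest neighbor regression described under Methods can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes that every spectrum has already been resampled onto a common wavelength grid, that a "region" is simply a fixed slice of flux values, that similarity is plain Euclidean distance, and that the estimate is the unweighted mean redshift of the k nearest reference spectra (with their spread as a rough uncertainty). The function names, the toy flux values, the region names, and the `tolerance` used to flag disagreeing regions are all hypothetical.

```python
import math
from statistics import mean, pstdev


def knn_redshift(reference, query, k=3):
    """Estimate z for ONE predefined spectral region by kNN regression.

    reference: list of (flux_vector, z) pairs from spectra with known
               redshift, all cut from the same region (assumed common grid).
    query:     flux vector of the spectrum to be estimated, same region.
    Returns (mean z of the k nearest neighbours, population std dev of those z).
    """
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference spectra")
    # Assumption: similarity = Euclidean distance between flux vectors.
    ranked = sorted((math.dist(flux, query), z) for flux, z in reference)
    neighbour_z = [z for _, z in ranked[:k]]
    return mean(neighbour_z), pstdev(neighbour_z)


def flag_multiple_components(estimates, tolerance):
    """Hypothetical check: do per-region estimates disagree by more than tolerance?

    estimates: dict mapping a region name to its redshift estimate.
    """
    zs = list(estimates.values())
    return max(zs) - min(zs) > tolerance


# Toy reference set for a single hypothetical region (3 flux bins each).
reference = [
    ([1.0, 5.0, 1.0], 0.10),
    ([1.0, 4.8, 1.1], 0.11),
    ([1.1, 5.2, 0.9], 0.09),
    ([5.0, 1.0, 1.0], 0.30),
    ([4.9, 1.1, 1.0], 0.31),
]
z_est, z_spread = knn_redshift(reference, [1.0, 5.0, 1.0], k=3)
```

Running the regression separately for each region yields one estimate per feature; comparing those estimates, as `flag_multiple_components` does with an arbitrary threshold, is one simple way to pick out spectra whose emission and absorption features point to different redshifts.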