
The Data-Driven Approach to Spectroscopic Analyses

Published online by Cambridge University Press: 23 January 2018

M. Ness*
Affiliations:
Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117 Heidelberg, Germany
Department of Astronomy, Columbia University, Pupin Physics Laboratories, New York, NY 10027, USA
Center for Computational Astrophysics, Flatiron Institute, 162 Fifth Avenue, New York, NY 10010, USA

Abstract

I review the data-driven approach to spectroscopy, The Cannon, a method for deriving precise chemical compositions and stellar ages, which are fundamental diagnostics of galaxy formation, across the many stellar surveys that are mapping the Milky Way. With The Cannon, the abundances and stellar parameters from the multitude of stellar surveys can be placed directly on the same scale, using stars in common between the surveys. Furthermore, the information that resides in the data can be fully extracted. This has delivered higher-precision stellar parameters and abundances from spectroscopic data and has opened up new avenues in Galactic archaeology, for example, the determination of ages for red giant stars across the Galactic disk. Coupled with Gaia distances, proper motions, and derived orbit families, the stellar ages and individual abundances delivered at the precision of the data-driven approach provide very strong constraints on the evolution and birthplaces of stars in the Milky Way. I review the role of data-driven spectroscopy as we enter an era in which we have both the data and the tools to build the ultimate conglomerate of Galactic information, and I highlight further applications of data-driven models in the coming decade.
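
The training and test steps behind The Cannon can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: each pixel's flux is modelled as a quadratic function of the labels (the quadratic model referred to in the Figure 19 caption), the per-pixel intrinsic scatter term of the full model is omitted, and the function names `design_vector`, `train`, and `infer` are hypothetical. Training on stars in common between two surveys (spectra from one, labels from the other) is what places one survey's labels on the other's scale.

```python
import numpy as np

def design_vector(x):
    """Quadratic-in-labels terms [1, x_i, x_i*x_j (i <= j)] for one star's
    (normalised) label vector x."""
    x = np.asarray(x, dtype=float)
    quad = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate([[1.0], x, quad])

def train(flux, ivar, labels):
    """Training step: per-pixel inverse-variance-weighted least-squares fit.

    flux, ivar: (n_stars, n_pixels); labels: (n_stars, n_labels) reference labels.
    Simplification (assumption): no intrinsic-scatter term is fitted."""
    labels = np.asarray(labels, dtype=float)
    pivot, scale = labels.mean(axis=0), labels.std(axis=0)
    A = np.array([design_vector(x) for x in (labels - pivot) / scale])
    theta = np.empty((flux.shape[1], A.shape[1]))
    for p in range(flux.shape[1]):
        w = np.sqrt(ivar[:, p])  # sqrt of inverse variance as row weights
        theta[p], *_ = np.linalg.lstsq(A * w[:, None], flux[:, p] * w, rcond=None)
    return theta, pivot, scale

def infer(flux, ivar, theta, pivot, scale, n_iter=50):
    """Test step: Gauss-Newton search for the labels whose model spectrum best
    matches one star's flux, starting from the training-set mean."""
    w = np.sqrt(ivar)
    resid = lambda x: (theta @ design_vector(x) - flux) * w
    x = np.zeros(len(pivot))
    for _ in range(n_iter):
        r = resid(x)
        # finite-difference Jacobian of the residuals w.r.t. the normalised labels
        J = np.column_stack([(resid(x + 1e-6 * e) - r) / 1e-6 for e in np.eye(len(x))])
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        x = x + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return pivot + scale * x  # back to physical label units
```

In use, `train` is called once on the reference set and `infer` once per survey spectrum; the test step is a small nonlinear fit because the model is quadratic in the labels.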

Type: Review Article
Copyright © Astronomical Society of Australia 2018

Figure 1. The complementary sky coverage of three Galactic surveys: APOGEE (in black), Gaia-ESO (in red), and GALAH (in blue).

Figure 2. The 540 open and globular cluster stars used in the training set for the initial development and testing of The Cannon, shown in the Teff–log g plane with Padova isochrones corresponding to each cluster's age and [Fe/H]. From Ness et al. (2015).

Figure 3. The data (black) and the model generated by The Cannon (cyan) for four example APOGEE stars not in the training set, showing two narrow regions of the H-band spectrum for each star. The Cannon reproduces the data very well, even with only three stellar labels (Teff, log g, and [Fe/H]) describing the stellar spectra. From Ness et al. (2015).

Figure 4. The precision in the [Fe/H] label from The Cannon as a function of SNR, compared with the traditional approach to determining stellar parameters and abundances (i.e. ASPCAP, which is representative of the best performance currently obtained by the multitude of surveys). These stars form a set with multiple observations (visits), so the rms difference can be computed between the labels that The Cannon and ASPCAP return for the combined spectra and for the individual-visit spectra.
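
The repeat-visit precision estimate described above (and used again in Figures 6 and 7) can be sketched as below. This is a minimal illustration assuming that each visit's label is compared with the label from the same star's combined spectrum and that the rms is taken in SNR bins; the function name `rms_by_snr` and the binning are hypothetical choices, not the pipeline used in the paper.

```python
import numpy as np

def rms_by_snr(visit_labels, combined_labels, snr, bin_edges):
    """rms of (individual-visit label - combined-spectrum label) in SNR bins.

    visit_labels, snr: one entry per visit; combined_labels: the combined-spectrum
    label of the star each visit belongs to (same length as visit_labels).
    Bins are half-open [edge_k, edge_k+1); empty bins return NaN."""
    diff = np.asarray(visit_labels, float) - np.asarray(combined_labels, float)
    idx = np.digitize(snr, bin_edges) - 1
    out = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(out)):
        sel = idx == b
        if sel.any():
            out[b] = np.sqrt(np.mean(diff[sel] ** 2))
    return out
```

Treating the high-SNR combined spectrum as the reference means the rms mostly reflects the noise in the individual visits, which is why it falls with increasing SNR.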

Figure 5. Top panel: normalised flux centred on the most log g-sensitive features in the APOGEE spectral region, as indicated by the two highest absolute values of the leading coefficient for the log g label. Middle panel: the leading coefficients of The Cannon's model in Ness et al. (2016b), each normalised to its maximum absolute value. Bottom panel: spectra generated from The Cannon's model in these regions with all labels fixed except log g, demonstrating how the spectra change with log g. From Ness et al. (2016b).
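
Selecting the most log g-sensitive pixels from a fitted model, as in the top panel, and normalising each coefficient by its maximum absolute value, as in the middle panel, might look like this. The sketch assumes a coefficient array `theta` of shape (n_pixels, n_terms) whose column 0 is the constant term and whose columns 1 to K are the first-order (leading) terms for K labels; both function names are hypothetical.

```python
import numpy as np

def normalised_coefficients(theta):
    """Scale each coefficient column by its own maximum absolute value."""
    theta = np.asarray(theta, float)
    return theta / np.max(np.abs(theta), axis=0)

def most_sensitive_pixels(theta, label_index, n=2):
    """Indices of the n pixels with the largest |first-order coefficient|
    for one label (column 1 + label_index, by the layout assumed above)."""
    linear = np.abs(np.asarray(theta, float)[:, 1 + label_index])
    return np.argsort(linear)[::-1][:n]
```

With labels ordered as (Teff, log g, [Fe/H]), `most_sensitive_pixels(theta, 1)` would pick out the log g features.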

Figure 6. The higher precision of the stellar parameters determined by The Cannon (in black) compared with APOGEE's pipeline, ASPCAP (in red) (García Pérez et al. 2015). The rms difference for Teff, log g, [Fe/H], and [α/Fe] is measured by comparing the labels returned for the high-SNR combined-visit spectra with those returned for the lower-SNR individual-visit spectra.

Figure 7. As in Figure 6, and for the same set of calibration stars, but showing the performance of The Cannon and APOGEE's pipeline ASPCAP for a sample of individual elements. With The Cannon, we can achieve an individual abundance precision for most elements of <0.1 dex at SNR = 50 and <0.05 dex at SNR = 80.

Figure 8. At left, the APOGEE field containing M15, whose members can be identified from their [Fe/H] and Vhelio and are indicated as blue circles; other field stars are shown in grey. At centre and right, [C/Fe] vs. [N/Fe] from ASPCAP and from The Cannon, again with the cluster members in blue and stars along the line of sight in grey. The high-precision results from The Cannon form a tight sequence for the cluster stars in [C/Fe] vs. [N/Fe]. From Casey et al. (2016a).

Figure 9. The [Fe/H] vs. [Mg/Fe] distribution, coloured by RGAL, for the same set of SNR < 90 stars from the APOGEE red clump sample (Bovy et al. 2014), with The Cannon's results at left and ASPCAP's results at right. The low-α sequence is concentrated in the outer Milky Way and the high-α sequence in the inner region, as first shown with high-SNR data (SNR > 150) by Nidever et al. (2014) and Hayden et al. (2015). This pattern is apparent in the low-SNR data with The Cannon's high-precision results (left) but is far less clear in the ASPCAP results (right).

Figure 10. The Teff–log g plane of 450 000 LAMOST red giant stars, showing the results from the LAMOST pipeline at left and The Cannon's results (on the APOGEE scale) at right, derived using a model built from 10 000 stars in common between APOGEE and LAMOST (Ho et al. 2016a).

Figure 11. The Teff–log g plane of the RAVE red giants placed on the APOGEE label scale, split into three panels by SNR, showing stellar density at top and coloured by [Fe/H] at bottom. From Casey et al. (2016a).

Figure 12. Cross-validation of the training set of 1 639 APOKASC stars observed by both APOGEE and Kepler, for the Teff, log g, [Fe/H], [α/Fe], and mass labels: The Cannon is trained on 90% of the stars and tested on the remaining 10% not included in training, repeated 10 times. The panel on the far right shows the age label derived from the mass determined with The Cannon, by interpolation on PARSEC isochrones. From Ness et al. (2016b).
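
The cross-validation scheme and the mass-to-age step can be sketched as below. We assume "repeated 10 times" means 10 disjoint hold-out sets of about 10% each, so every star is predicted once by a model that never saw it; `fit` and `predict` are placeholders for any label model (for example, The Cannon), and the grid passed to the hypothetical `age_from_mass` is a stand-in for PARSEC isochrones, which are not reproduced here.

```python
import numpy as np

def cross_validate(X, y, fit, predict, n_folds=10, seed=0):
    """Hold out each of n_folds disjoint subsets (~1/n_folds of the stars) in turn,
    train on the rest, and return out-of-sample predictions for every star."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty_like(y, dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = fit(X[train_idx], y[train_idx])
        pred[test_idx] = predict(model, X[test_idx])
    return pred

def age_from_mass(mass, iso_mass, iso_age):
    """Linearly interpolate age from mass on a (hypothetical) isochrone grid."""
    order = np.argsort(iso_mass)  # np.interp needs increasing x values
    return np.interp(mass, np.asarray(iso_mass)[order], np.asarray(iso_age)[order])
```

Comparing `pred` with the reference labels gives the held-out scatter and bias for each label.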

Figure 13. The 17 065 red clump stars in APOGEE (Bovy et al. 2014). The top left panel shows all stars coloured by median age. The top right panel shows the density distribution of the sample, with the low-α sequence and the mono-abundance population bins we use to examine the age of the disk indicated. The bottom panels show the [Fe/H]–RGAL distribution for the young and intermediate-age stars in the low-α sequence. They show the effect of radial migration in the Galactic disk: [Fe/H] is a good predictor of radius for young stars, but, as radial migration predicts, at any given [Fe/H] older stars are more dispersed in radius, exactly as seen here. From Ness et al. (2016a).
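
The radial-migration signature described above (a larger spread in radius at fixed [Fe/H] for older stars) can be quantified with a simple binned statistic. This is an illustrative sketch, not the analysis of Ness et al. (2016a): `radial_spread`, the [Fe/H] bin edges, and the single age split are hypothetical choices.

```python
import numpy as np

def radial_spread(feh, r_gal, age, feh_edges, age_split):
    """Standard deviation of RGAL in each [Fe/H] bin, separately for stars
    younger than and at least as old as age_split (NaN if < 2 stars)."""
    feh, r_gal, age = (np.asarray(a, float) for a in (feh, r_gal, age))
    idx = np.digitize(feh, feh_edges) - 1
    young = np.full(len(feh_edges) - 1, np.nan)
    old = np.full(len(feh_edges) - 1, np.nan)
    for b in range(len(feh_edges) - 1):
        in_bin = idx == b
        for out, mask in ((young, age < age_split), (old, age >= age_split)):
            sel = in_bin & mask
            if sel.sum() > 1:
                out[b] = np.std(r_gal[sel])
    return young, old
```

Under radial migration one would expect `old > young` in most [Fe/H] bins.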

Figure 14. The age map of the Milky Way from 70 000 red giant stars, binned in RGAL–z and coloured by mean age, from Ness et al. (2016b). Young stars are concentrated towards the plane; older stars lie above the plane and towards the inner Galaxy. This map is evidence for inside-out formation of the Milky Way. Note also the apparent flaring of young stars, which appear at larger heights above the plane at larger RGAL.

Figure 15. Aitoff projection on the sky of the ages of 70 000 APOGEE red giant stars from Ness et al. (2016a) (left), and of 250 000 LAMOST red giants (Ho et al. 2016b) together with the 70 000 APOGEE red giants (right). Ho et al. (2016b) determined the LAMOST ages from the C and N abundances derived from the LAMOST spectra with The Cannon, on the APOGEE label scale. The gradients in age projected on the sky are clear, with younger stars in the plane and the oldest stars in the stellar halo. Unlike in Figure 14, the oldest stars are not seen towards the Galactic centre, simply because this projection integrates along the line of sight and most of the stars are only a few kpc from the Sun; the median age of these nearby disk stars, including those towards the Galactic centre, is relatively young. Figure from Ho et al. (2017b).

Figure 16. The 90 000 red giant stars in APOGEE's DR12 data release (grey) and the M13 globular cluster stars recovered using the k-means clustering algorithm (coloured symbols); the cluster stars are grouped in the Galactic longitude–heliocentric radial velocity plane. From Hogg et al. (2016).
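
A minimal k-means (Lloyd's algorithm) of the kind named above is sketched below in plain NumPy. Which features were clustered (for example, abundances, or positions and velocities) and how k was chosen are not specified here, so the inputs are assumptions; `kmeans` is a hypothetical helper, not the code of Hogg et al. (2016).

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre updates.

    X: (n_stars, n_features). Returns (assignments, centres)."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]  # random initial centres
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # move each centre to the mean of its members (keep it if it has none)
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return assign, centres
```

k-means only finds a local optimum from its random start, so in practice one would rerun with several seeds and keep the lowest within-cluster scatter.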

Figure 17. The [N/Fe] vs. [C/Fe] abundances determined with The Cannon for the M13 stars (symbols as in Figure 16) and for the 90 000 red giant stars from APOGEE's DR12 data release (grey). The M13 stars cluster in two-dimensional abundance projections, as in this [N/Fe] vs. [C/Fe] example. From Hogg et al. (2016).

Figure 18. At top, the χ² distribution in 20-element abundance space for all intra-cluster pairs (black) and field pairs (red) of stars. Cluster stars are typically identical in their 20-element abundances within the errors, with some exceptions, and field stars are more dissimilar from one another than stars within a given cluster. At bottom, the comparison is restricted to stars with [Fe/H] = 0 ± 0.02: two clusters at solar metallicity provide the intra-cluster pairs (black), and the inter-cluster pair similarity between these two clusters is shown in blue. As in the top panel, stars within a cluster are far more chemically similar than field stars at the same [Fe/H]. Nevertheless, a non-negligible fraction of field stars (restricted to [Fe/H] = 0 ± 0.02) are as similar to each other as stars from the same birth cloud, that is, the open cluster stars. These unrelated field stars that are as chemically similar as open cluster stars are doppelgangers. From Ness et al. (2017).
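
The pairwise comparison above can be sketched as follows. We assume χ² for a pair of stars is summed over elements with the two stars' uncertainties added in quadrature, χ² = Σ_i (x_i − x'_i)² / (σ_i² + σ'_i²); the exact definition in Ness et al. (2017) may differ. The hypothetical `doppelganger_rate` uses a percentile of the intra-cluster distribution as the similarity threshold, which is an illustrative choice.

```python
import numpy as np

def pair_chi2(abund, err):
    """chi^2 between every pair of stars, summed over elements.

    abund, err: (n_stars, n_elements). Returns a symmetric (n_stars, n_stars) matrix."""
    a, e = np.asarray(abund, float), np.asarray(err, float)
    diff2 = (a[:, None, :] - a[None, :, :]) ** 2
    var = e[:, None, :] ** 2 + e[None, :, :] ** 2  # uncertainties in quadrature
    return (diff2 / var).sum(axis=2)

def unique_pairs(chi2):
    """Upper-triangle values, i.e. each distinct pair once."""
    i, j = np.triu_indices(len(chi2), k=1)
    return chi2[i, j]

def doppelganger_rate(field_pairs, cluster_pairs, q=50):
    """Fraction of field pairs at least as similar as the q-th percentile
    of intra-cluster pairs (the threshold choice is an assumption)."""
    return np.mean(np.asarray(field_pairs) <= np.percentile(cluster_pairs, q))
```

Applying `pair_chi2` separately to cluster members and to field stars (optionally pre-selected to [Fe/H] = 0 ± 0.02) gives the two distributions being compared.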

Figure 19. From Casey et al. (2016a): the locations of known Al absorption lines in the spectra (top) and the Al coefficient (bottom) for the regularised version of The Cannon. The coefficient reveals a new Al line in the spectra: The Cannon has learned, from its quadratic model, where the stellar flux changes with the Al label.
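
A regularised per-pixel fit like the one described above can be sketched with a proximal-gradient (ISTA) solver. We assume an L1 penalty on all coefficients except the constant term, which drives the coefficients of labels that do not affect a pixel to exactly zero, so a surviving Al coefficient flags a pixel whose flux changes with the Al label. The solver, the name `fit_pixel_l1`, and the `lam` value in the test are illustrative, not the implementation of Casey et al. (2016a).

```python
import numpy as np

def fit_pixel_l1(A, f, ivar, lam, n_iter=5000):
    """Minimise sum(ivar * (f - A @ theta)**2) + lam * sum(|theta[1:]|) with ISTA.

    A: (n_stars, n_terms) design matrix whose column 0 is the constant term,
    which is left unpenalised; f, ivar: one pixel's flux and inverse variance."""
    w = np.asarray(ivar, float)
    theta = np.zeros(A.shape[1])
    # step size 1/L, with L the Lipschitz constant of the quadratic term's gradient
    L = 2 * np.linalg.norm(A.T @ (w[:, None] * A), 2)
    for _ in range(n_iter):
        grad = -2 * A.T @ (w * (f - A @ theta))
        z = theta - grad / L
        # soft-thresholding (proximal step for the L1 penalty), skipping the constant
        z[1:] = np.sign(z[1:]) * np.maximum(np.abs(z[1:]) - lam / L, 0.0)
        theta = z
    return theta
```

The penalty also shrinks the surviving coefficients slightly towards zero, so `lam` trades sparsity against bias.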

Figure 20. At top, the first-order log g coefficient; at bottom, 100 random GALAH spectra. The large value of the log g coefficient corresponds to the centre of an Sc II line. This Sc II line is log g-sensitive because it arises from an ionised species, whose abundance relative to the neutral species depends on pressure (and hence gravity) through the Saha ionisation equation. From Buder et al. (in preparation).
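
The pressure dependence invoked above can be made explicit with the standard form of the Saha equation, where n_i is the number density of ionisation stage i, U_i the partition function, χ_i the ionisation energy, m_e the electron mass, and P_e = n_e kT the electron pressure. The link from P_e to log g (higher surface gravity implying higher atmospheric pressure) is the qualitative step the caption relies on; this block does not model it quantitatively.

```latex
\frac{n_{i+1}\,n_e}{n_i}
  = \frac{2\,U_{i+1}(T)}{U_i(T)}\left(\frac{2\pi m_e k T}{h^2}\right)^{3/2} e^{-\chi_i/kT}
\quad\Longrightarrow\quad
\frac{n_{i+1}}{n_i}
  = \frac{2\,kT}{P_e}\,\frac{U_{i+1}(T)}{U_i(T)}\left(\frac{2\pi m_e k T}{h^2}\right)^{3/2} e^{-\chi_i/kT}
```

At fixed temperature, the ion-to-neutral ratio therefore scales as 1/P_e, so the ionisation balance, and with it the behaviour of lines such as Sc II, shifts with surface gravity.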

Figure 21. The red clump stars sorted in ascending order of absolute height from the plane, |z|, from bottom to top, and coloured by the data-minus-model residual. The figure shows a feature not captured by The Cannon's model: it has high residuals only very near the plane and is dispersed in radial velocity. This feature corresponds to the strongest diffuse interstellar band identified in the APOGEE spectra (see Zasowski et al. 2015).
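
One way to see why an interstellar feature appears dispersed in radial velocity: spectra are shifted into each star's rest frame, but a diffuse interstellar band sits at a roughly fixed observed wavelength (ignoring the velocity of the absorbing gas), so it lands at a different rest-frame wavelength for each star. The sketch uses the non-relativistic Doppler relation; the band wavelength of 15273 Å used in the test is an illustrative assumption, and `rest_frame_wavelength` is a hypothetical helper.

```python
import numpy as np

C_KMS = 299_792.458  # speed of light in km/s

def rest_frame_wavelength(lambda_obs, v_star_kms):
    """Where a feature at a fixed observed wavelength appears after shifting the
    spectrum into the rest frame of a star with radial velocity v_star
    (non-relativistic Doppler: lambda_rest = lambda_obs / (1 + v/c))."""
    return np.asarray(lambda_obs, float) / (1.0 + np.asarray(v_star_kms, float) / C_KMS)
```

For stars spanning ±100 km/s, a band near 15273 Å would be smeared over roughly ±5 Å in the rest-frame spectra, while stellar lines stay aligned.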

Figure 22. Four narrow regions of the APOGEE spectra showing the data (black), the ASPCAP model (dashed line), and The Cannon's model (red). The Brackett feature is one region that is poorly fit by the theoretical stellar models, as the top right panel demonstrates. From Ness et al. (2016b).

Figure 23. At left, the azimuthal projection of the 250 000 stars in APOGEE's DR12 data release; at right, the azimuthal projection of the disk stars to be targeted by the next-generation Milky Way Mapper (MWM) program of SDSS-V, showing the >5 million disk stars to be observed with contiguous, magnitude-limited coverage. Figure made by Jonathan Bird (Vanderbilt).