As part of efforts to identify distant, life-friendly planets, NASA has launched a crowdsourcing project in which volunteers search telescopic images for traces of debris disks around stars, which are good indicators of exoplanets.
Using the results of that project, researchers at MIT have now trained a machine learning system to search for debris disks itself. The scale of the search demands automation: there are nearly 750 million potential light sources in the data accumulated by NASA's Wide-Field Infrared Survey Explorer (WISE) mission alone.
In tests, the machine learning system agreed with human identifications of debris disks 97 percent of the time. The researchers also trained their system to rate debris disks according to their likelihood of containing detectable exoplanets. In a paper describing the new work in the journal Astronomy and Computing, the MIT researchers report that their system identified 367 previously unexamined celestial objects as particularly promising candidates for further study.
The work represents an unusual approach to machine learning, one championed by one of the paper's co-authors, Victor Pankratius, a principal research scientist at MIT's Haystack Observatory. Typically, a machine learning system combs through a wealth of training data, searching for consistent correlations between features of the data and a label applied by a human analyst.
But Pankratius argues that in the sciences, machine learning systems would be more useful if they explicitly incorporated a little scientific understanding, to help guide their searches for correlations or to identify deviations from the norm that could be of scientific interest.
"The main vision is to go beyond what AI is focusing on today," says Pankratius. "Today, we're collecting data, and we're trying to find features in the data. You end up with billions and billions of features. So what do you do with them? What you want to know as a scientist is not that the computer tells you that certain pixels are certain features. You want to know, 'Oh, that's a physically relevant thing, and here are the physical parameters of the thing.'"
Classroom conception
The new paper grew out of an MIT seminar that Pankratius co-taught with Sara Seager, the Class of 1941 Professor of Earth, Atmospheric, and Planetary Sciences, who is known for her exoplanet research. The seminar, Astroinformatics for Exoplanets, introduced students to data science techniques that could be useful in interpreting the flood of data. After mastering the techniques, the students were asked to apply them to outstanding astronomical questions.
For her final project, Tam Nguyen, a graduate student in aeronautics and astronautics, chose the problem of training a machine learning system to identify debris disks, and the new paper is an outgrowth of that work. Nguyen is first author on the paper, and she is joined by Seager, Pankratius, and Laura Eckman, an undergraduate majoring in electrical engineering and computer science.
From NASA's crowdsourcing project, the researchers had the celestial coordinates of the light sources that human volunteers had identified as featuring debris disks. The disks are recognizable as ellipses of light with slightly brighter ellipses at their centers. The researchers also used the raw astronomical data generated by the WISE mission.
To prepare the data for the machine learning system, Nguyen carved it into small chunks and then used standard signal processing techniques to filter out artifacts caused by the imaging instruments or by ambient light. Next, she identified the chunks with light sources at their centers and used existing image segmentation algorithms to remove any additional light sources. Procedures of this type are typical of any computer vision machine learning project.
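The preparation steps described above can be illustrated with a minimal sketch. It is not the researchers' code: the function names, the tile size, the use of a median as a flat background estimate, and the "brightest pixel near the center" test are all assumptions chosen for illustration, and the segmentation step that removes additional sources is omitted.

```python
from statistics import median


def cut_into_tiles(image, size):
    """Split a 2-D list of pixel values into non-overlapping size x size tiles."""
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            tiles.append([row[c:c + size] for row in image[r:r + size]])
    return tiles


def subtract_background(tile):
    """Remove a flat background level, estimated here as the tile's median pixel.

    This stands in for the signal processing step; the actual filters
    used in the paper are not described in the article.
    """
    level = median(v for row in tile for v in row)
    return [[max(v - level, 0.0) for v in row] for row in tile]


def has_central_source(tile, tolerance=1):
    """True if the brightest pixel lies within `tolerance` pixels of the center."""
    center = len(tile) // 2
    peak_value, peak_r, peak_c = max(
        (v, r, c) for r, row in enumerate(tile) for c, v in enumerate(row)
    )
    if peak_value <= 0:
        return False  # nothing rises above the background
    return abs(peak_r - center) <= tolerance and abs(peak_c - center) <= tolerance


# Toy 6x6 image: flat background with one centered and one off-center source.
image = [[10.0] * 6 for _ in range(6)]
image[1][1] = 50.0  # center of the top-left 3x3 tile
image[3][3] = 40.0  # corner of the bottom-right 3x3 tile
tiles = [subtract_background(t) for t in cut_into_tiles(image, 3)]
kept = [has_central_source(t, tolerance=0) for t in tiles]
```

In this toy case only the first tile is kept, because its source sits exactly at the tile center.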
But Nguyen used basic principles of physics to prune the data further. For one thing, she looked at the variation in the intensity of the light emitted by the light sources across four different frequency bands. She also used standard metrics to evaluate the position, symmetry, and scale of the light sources, establishing thresholds for inclusion in her data set.
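A physics-based cut of this kind might look like the following sketch. The article does not give the metrics or thresholds, so everything here is hypothetical: the flux ratio between the last and first of four bands as a measure of excess emission, the rotation-based asymmetry score, and the cutoff values `min_excess` and `max_asymmetry`.

```python
def infrared_excess(fluxes):
    """Ratio of the flux in the last band to the flux in the first band."""
    if len(fluxes) != 4:
        raise ValueError("expected fluxes in four bands")
    if fluxes[0] <= 0:
        raise ValueError("first-band flux must be positive")
    return fluxes[3] / fluxes[0]


def asymmetry(tile):
    """Summed absolute difference between a tile and its 180-degree rotation,
    divided by the tile's total flux. 0.0 means perfectly point-symmetric."""
    rotated = [row[::-1] for row in tile[::-1]]
    total = sum(v for row in tile for v in row)
    if total == 0:
        return float("inf")  # an empty tile should never pass the cut
    diff = sum(
        abs(a - b)
        for row_a, row_b in zip(tile, rotated)
        for a, b in zip(row_a, row_b)
    )
    return diff / total


def passes_cuts(fluxes, tile, min_excess=1.5, max_asymmetry=0.2):
    """Keep a source only if it is bright in the last band and roughly symmetric.

    Both thresholds are illustrative placeholders, not values from the paper.
    """
    return infrared_excess(fluxes) >= min_excess and asymmetry(tile) <= max_asymmetry


symmetric = [[0.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 0.0]]
lopsided = [[0.0, 0.0, 0.0], [0.0, 5.0, 4.0], [0.0, 0.0, 0.0]]
```

Encoding the cuts as explicit, named thresholds keeps each one tied to a physical quantity that an astronomer can inspect and adjust, rather than leaving the classifier to rediscover it from pixels.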
In addition to the tagged debris disks from NASA's crowdsourcing project, the researchers also had a short list of stars that astronomers had identified as likely to host exoplanets. From this information, their system also inferred properties of debris disks that are correlated with the presence of exoplanets, and used them to select the 367 candidates for further study.
"Given the scalability challenges with big data, leveraging crowdsourcing and citizen science to develop training data sets for machine learning classifiers for astronomical observations and associated objects is an innovative way to address challenges not only in astronomy but also in several different data-intensive science areas," says Dan Crichton, who heads the Center for Data Science and Technology at NASA's Jet Propulsion Laboratory. "The use of the computer-aided discovery pipeline described to automate the extraction, classification, and validation process is going to be helpful for systematizing how these capabilities can be brought together. The paper does a good job of discussing the effectiveness of this approach. The lessons learned are going to be important for generalizing the techniques to other astronomy applications and to other disciplines."
"The Disk Detective science team has been working on its own machine learning project, and now that this paper is out, we're going to have to get together and compare notes," says Marc Kuchner, a senior astrophysicist at NASA's Goddard Space Flight Center and leader of the crowdsourced disk detection project known as Disk Detective. "I'm really glad that Nguyen is looking into this, because I really think that this kind of machine-human collaboration is going to be crucial for analyzing the big data sets of the future."