Feature Extraction & Visualization of ALMA Data Cubes through Topological Data Analysis




ALMA-TDA is a persistent-homology-based engine for noise removal in ALMA data cubes. It uses a data structure known as the contour tree to summarize and simplify the data. The accompanying figure shows an example of the process. ALMA-TDA takes a scalar function from a data cube, computes a topological skeleton in the form of a contour tree, and simplifies both that skeleton and the scalar field. Feature removal affects only the regions that connect pairs of critical points. In this way, ALMA-TDA provides the minimum perturbation of the field required to remove unwanted features.
(a) An image of a 2D scalar function before simplification. (b) 3D height map of the contours corresponding to the scalar function shown in (a). (c) The contour tree structures that capture the features (i.e., relationships among local minima, local maxima, and saddle points). (d)-(f): The image, 3D height map, and the contour tree after simplifying the features.
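
As a rough illustration of the idea, and not ALMA-TDA's actual implementation (which operates on contour trees of 2D and 3D fields), the following Python sketch applies persistence-based simplification to a 1D signal. The function name `simplify_maxima` and its `threshold` parameter are hypothetical, chosen only for this example. In 1D, each local maximum is paired with the local minimum at which its region merges into a higher neighbor, which plays the role of the saddle. Maxima whose persistence (peak height minus merge height) falls below the threshold are flattened to the merge height, so only the samples between the paired critical points change.

```python
def simplify_maxima(values, threshold):
    """Flatten local maxima of a 1D signal whose persistence is below `threshold`.

    Returns (simplified_values, pairs), where each pair is
    (peak_index, merge_index, persistence).
    """
    n = len(values)
    parent = [None] * n   # union-find parent; None means "not yet swept"
    peak = {}             # component root -> index of its highest sample
    members = {}          # component root -> indices belonging to it

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    out = list(values)
    pairs = []
    # Sweep samples from highest to lowest (superlevel-set filtration).
    for i in sorted(range(n), key=lambda k: (-values[k], k)):
        parent[i] = i
        peak[i] = i
        members[i] = [i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # The component with the lower peak dies at this merge.
                if values[peak[a]] < values[peak[b]]:
                    a, b = b, a  # a survives, b dies
                persistence = values[peak[b]] - values[i]
                if persistence > 0:
                    pairs.append((peak[b], i, persistence))
                    if persistence < threshold:
                        # Clamp only the dying region down to the merge height.
                        for k in members[b]:
                            out[k] = min(out[k], values[i])
                parent[b] = a
                members[a].extend(members.pop(b))
                del peak[b]
    return out, pairs


signal = [0, 5, 4, 6, 1, 9, 2, 3, 0]
simplified, pairs = simplify_maxima(signal, threshold=2)
print(simplified)  # [0, 4, 4, 6, 1, 9, 2, 2, 0]
```

In this toy signal, the small bumps at indices 1 and 7 (persistence 1) are flattened, while the peak at index 3 (persistence 5) and the global maximum are left untouched.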

Get the Software!

Binaries of our software, including a usage manual, can be downloaded from: https://github.com/SCIInstitute/ALMA-TDA/releases
Source code may be accessed at: https://github.com/SCIInstitute/ALMA-TDA

Demonstration of Simplification

The following example (data provided by Anil Seth) demonstrates our ability to simplify volumes. Each pair shows a layer before (left) and after (right) simplification using the contour tree.
[Figure: twelve before/after image pairs of individual layers]




  • Panel on ALMA TDA, Salt Lake City UT, Jan 2017
    • Participants: Paul Rosen, Bei Wang, Ayla Khan, Betsy Mills, Chris Johnson, Adam Ginsburg, Julia Kamenetzky


In this project, we focus on performing data analysis and designing effective visualizations of data cubes by employing the contour tree [3]. The contour tree is a mathematical object describing the evolution of the level sets of a scalar function defined on a simple domain, such as the 2D Euclidean space associated with a slice of a data cube. It has a graph-based representation that captures the changes in the topology of a scalar function and provides a meaningful summarization of the associated data. The contour tree can then be simplified to remove noise while retaining important features in the data. Finally, new visualizations will be applied to the extracted results to highlight features of interest or to support specific analytic tasks. Our proposed approach addresses the following research questions:

  1. Data Transformation: How are the spectral lines represented in a 3D data cube meaningfully converted to scalar functions for contour tree-based analysis?
  2. Feature Extraction: Once a contour tree is generated, there are many methods for selecting the important features of the tree. Therefore, how do we extract meaningful features of the data via contour tree simplification to suit the needs of astronomers?
  3. Feature Exploration: What is an effective visualization of contour trees to enable feature exploration of a single data cube by the users?
  4. Feature Comparison: Can contour tree representations be used for feature comparisons among multiple data cubes to characterize secular changes with observed properties (for example, transition energy, molecular species and chemical families), or derived properties such as temperature and density?
This project is a feasibility study for applying forms of data analysis and visualization never before tested by the ALMA community. Through contour tree-based TDA, we seek to improve upon existing data cube analysis and visualization. This will come in the form of improved accuracy and speed in finding features, and a better visual description of features once identified. In particular, the simplification of the contour tree provides a well-defined mechanism for identifying features that are robust to noise. The prototype software creates visualizations that help in characterizing and analyzing the spectra of complex spectral line sources within a given data cube. It will include a data module that handles conversion and transformation for analysis; a computational module for efficient contour tree computations; and a set of linked interactive visualization tools that enable feature extraction, selection, and comparison. These analysis and visualization tools will assist in the exploration, discovery, and communication of the important science being performed with ALMA data and lay the groundwork for future generations of analysis methodologies.
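
To make the data transformation question (item 1 above) concrete, here is a minimal Python sketch of two simple ways to turn a 3D data cube into a 2D scalar function: taking a single channel slice, and integrating over the spectral axis (a moment-0 map). The function names, the `cube[channel][y][x]` nested-list layout, and the `channel_width` parameter are assumptions made for this illustration; they do not describe the project's data module, and which transformation is most meaningful remains the open question.

```python
def channel_slice(cube, channel):
    """Return one spectral channel of the cube as a 2D scalar field (copied)."""
    return [row[:] for row in cube[channel]]


def moment0(cube, channel_width=1.0):
    """Integrate intensity along the spectral axis for every (y, x) pixel."""
    n_y, n_x = len(cube[0]), len(cube[0][0])
    return [
        [channel_width * sum(plane[y][x] for plane in cube) for x in range(n_x)]
        for y in range(n_y)
    ]


# Toy cube with 3 channels of 2x2 pixels, indexed as cube[channel][y][x].
cube = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
field_a = channel_slice(cube, 1)  # [[10, 20], [30, 40]]
field_b = moment0(cube)           # [[111.0, 222.0], [333.0, 444.0]]
```

Either resulting 2D field could then serve as the scalar function on which a contour tree is computed.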

Science Case

Radio astronomy is currently undergoing a revolution driven by new observing capabilities. The current generation of radio and millimeter telescopes, particularly the Atacama Large Millimeter/submillimeter Array (ALMA), offers enormous advances in capabilities, including significantly increased sensitivity, spatial and spectral resolution, and spectral bandwidth. While these advances represent an unprecedented opportunity to advance scientific understanding, they also pose a significant challenge. Although the higher sensitivity and resolution in some cases yield new detections of sources with well-ordered structure that is easy to interpret with current tools (e.g., HL Tau [9]), these advances equally often lead to the detection of structure with increased spatial and spectral complexity (e.g., new molecules in the chemically rich massive star-forming region Sgr B2, outflows in the nuclear region of the nearby galaxy NGC 253, and rich kinematic structure in the giant molecular cloud “The Brick” [1, 2, 12]). The new complexity present in current spectral line datasets challenges not only the existing tools for fundamental analysis of these datasets, but also users’ ability to explore and visualize their data.
Whether scientists can navigate and correctly interpret this new complexity will determine their success in addressing a number of important scientific questions. Among the topics driven by the detection of more complex structures are ISM turbulence [4, 13], the star formation process [7], filaments [11], molecular cloud structure and kinematics [12], and the kinematics of nearby galaxies [5, 6, 8] and high redshift galaxies [10, 14].
An even greater challenge arises from our ability to detect an increased number of spectral lines in more and more sources. There simply are no tools capable of simultaneously visualizing, comparing, and analyzing the dozens to hundreds of data cubes for all of the detected spectral lines in a given source. Standard methods of visualizing and exploring data, such as moment maps, channel maps, playing the cube like a video, or 3D models, cannot scale up to large numbers of lines, even in non-complex, well-ordered cases such as rotating disks or expanding stellar shells. Users become overwhelmed by, for example, comparing these typical diagnostics for two lines, side by side or one at a time. In the richest sources with thousands of lines, such comparisons will simply be impossible; it becomes necessary to resort to methods that entirely throw away either the spectral information, such as Principal Component Analysis (PCA) of moment maps, or the spatial information, such as model fitting of complex spectra. As a result, both exploration and analysis of the data become not only time consuming, but potentially incomplete.

As we move into the future and these telescopes reach their full potential, complex spatial and velocity structure will no longer be a problem confined to a different subset of sources from those exhibiting rich spectra. The two problems will coexist, compounding the highlighted issues. The visualization and analysis challenges currently facing radio astronomy will then only grow more pressing as our instruments grow more sensitive and the data volumes become larger.


  1. A. Belloche, R. T. Garrod, H. S. P. Muller, and K. M. Menten. Detection of a branched alkyl molecule in the interstellar medium: iso-propyl cyanide. Science, 345:1584–1587, Sept. 2014.
  2. A. D. Bolatto, S. R. Warren, A. K. Leroy, F. Walter, S. Veilleux, E. C. Ostriker, J. Ott, M. Zwaan, D. B. Fisher, A. Weiss, E. Rosolowsky, and J. Hodge. Suppression of star formation in the galaxy NGC 253 by a starburst-driven molecular wind. Nature, 499:450–453, July 2013.
  3. H. Carr, J. Snoeyink, and U. Axen. Computing contour trees in all dimensions. Computational Geometry: Theory and Applications, 24(3):75–94, 2003.
  4. C. De Breuck, R. J. Williams, M. Swinbank, P. Caselli, K. Coppin, T. A. Davis, R. Maiolino, T. Nagao, I. Smail, F. Walter, A. Weiss, and M. A. Zwaan. ALMA resolves turbulent, rotating [CII] emission in a young starburst galaxy at z = 4.8. Astronomy & Astrophysics, 565:A59, May 2014.
  5. K. E. Johnson, A. K. Leroy, R. Indebetouw, C. L. Brogan, B. C. Whitmore, J. Hibbard, K. Sheth, and A. S. Evans. The Physical Conditions in a Pre-super Star Cluster Molecular Cloud in the Antennae Galaxies. The Astrophysical Journal, 806:35, June 2015.
  6. A. K. Leroy, A. D. Bolatto, E. C. Ostriker, E. Rosolowsky, F. Walter, S. R. Warren, J. Donovan Meyer, J. Hodge, D. S. Meier, J. Ott, K. Sandstrom, A. Schruba, S. Veilleux, and M. Zwaan. ALMA Reveals the Molecular Medium Fueling the Nearest Nuclear Starburst. The Astrophysical Journal, 801:25, Mar. 2015.
  7. H. B. Liu, R. Galván-Madrid, I. Jiménez-Serra, C. Román-Zúñiga, Q. Zhang, Z. Li, and H.-R. Chen. ALMA Resolves the Spiraling Accretion Flow in the Luminous OB Cluster-forming Region G33.92+0.11. The Astrophysical Journal, 804:37, May 2015.
  8. D. S. Meier, F. Walter, A. D. Bolatto, A. K. Leroy, J. Ott, E. Rosolowsky, S. Veilleux, S. R. Warren, A. Weiss, M. A. Zwaan, and L. K. Zschaechner. ALMA Multi-line Imaging of the Nearby Starburst NGC 253. The Astrophysical Journal, 801:63, Mar. 2015.
  9. ALMA Partnership, C. L. Brogan, L. M. Perez, T. R. Hunter, W. R. F. Dent, A. S. Hales, R. Hills, S. Corder, E. B. Fomalont, C. Vlahakis, Y. Asaki, D. Barkats, A. Hirota, J. A. Hodge, C. M. V. Impellizzeri, R. Kneissl, E. Liuzzo, R. Lucas, N. Marcelino, S. Matsushita, K. Nakanishi, N. Phillips, A. M. S. Richards, I. Toledo, R. Aladro, D. Broguiere, J. R. Cortes, P. C. Cortes, D. Espada, F. Galarza, D. Garcia-Appadoo, L. Guzman-Ramirez, E. M. Humphreys, T. Jung, S. Kameno, R. A. Laing, S. Leon, G. Marconi, A. Mignano, B. Nikolic, L.-A. Nyman, M. Radiszcz, A. Remijan, J. A. Rodon, T. Sawada, S. Takahashi, R. P. J. Tilanus, B. Vila Vilaro, L. C. Watson, T. Wiklind, E. Akiyama, E. Chapillon, I. de Gregorio-Monsalvo, J. Di Francesco, F. Gueth, A. Kawamura, C.-F. Lee, Q. Nguyen Luong, J. Mangum, V. Pietu, P. Sanhueza, K. Saigo, S. Takakuwa, C. Ubach, T. van Kempen, A. Wootten, A. Castro-Carrizo, H. Francke, J. Gallardo, J. Garcia, S. Gonzalez, T. Hill, T. Kaminski, Y. Kurono, H.-Y. Liu, C. Lopez, F. Morales, K. Plarre, G. Schieven, L. Testi, L. Videla, E. Villard, P. Andreani, J. E. Hibbard, and K. Tatematsu. First Results from High Angular Resolution ALMA Observations Toward the HL Tau Region. ArXiv e-prints, Mar. 2015.
  10. ALMA Partnership, C. Vlahakis, T. R. Hunter, J. A. Hodge, L. M. Perez, P. Andreani, C. L. Brogan, P. Cox, S. Martin, M. Zwaan, S. Matsushita, W. R. F. Dent, C. M. V. Impellizzeri, E. B. Fomalont, Y. Asaki, D. Barkats, R. E. Hills, A. Hirota, R. Kneissl, E. Liuzzo, R. Lucas, N. Marcelino, K. Nakanishi, N. Phillips, A. M. S. Richards, I. Toledo, R. Aladro, D. Broguiere, J. R. Cortes, P. C. Cortes, D. Espada, F. Galarza, D. Garcia-Appadoo, L. Guzman-Ramirez, A. S. Hales, E. M. Humphreys, T. Jung, S. Kameno, R. A. Laing, S. Leon, G. Marconi, A. Mignano, B. Nikolic, L.-A. Nyman, M. Radiszcz, A. Remijan, J. A. Rodon, T. Sawada, S. Takahashi, R. P. J. Tilanus, B. Vila Vilaro, L. C. Watson, T. Wiklind, Y. Ao, J. Di Francesco, B. Hatsukade, E. Hatziminaoglou, J. Mangum, Y. Matsuda, E. van Kampen, A. Wootten, I. de Gregorio-Monsalvo, G. Dumas, H. Francke, J. Gallardo, J. Garcia, S. Gonzalez, T. Hill, D. Iono, T. Kaminski, A. Karim, M. Krips, Y. Kurono, C. Lonsdale, C. Lopez, F. Morales, K. Plarre, L. Videla, E. Villard, J. E. Hibbard, and K. Tatematsu. ALMA Long Baseline Observations of the Strongly Lensed Submillimeter Galaxy HATLAS J090311.6+003906 at z=3.042. ArXiv e-prints, Mar. 2015.
  11. N. Peretto, G. A. Fuller, A. Duarte-Cabral, A. Avison, P. Hennebelle, J. E. Pineda, P. Andre, S. Bontemps, F. Motte, N. Schneider, and S. Molinari. Global collapse of molecular clouds as a formation mechanism for the most massive stars. Astronomy & Astrophysics, 555:A112, July 2013.
  12. J. M. Rathborne, S. N. Longmore, J. M. Jackson, J. F. Alves, J. Bally, N. Bastian, Y. Contreras, J. B. Foster, G. Garay, J. M. D. Kruijssen, L. Testi, and A. J. Walsh. A Cluster in the Making: ALMA Reveals the Initial Conditions for High-mass Cluster Formation. The Astrophysical Journal, 802:125, Apr. 2015.
  13. J. M. Rathborne, S. N. Longmore, J. M. Jackson, J. M. D. Kruijssen, J. F. Alves, J. Bally, N. Bastian, Y. Contreras, J. B. Foster, G. Garay, L. Testi, and A. J. Walsh. Turbulence Sets the Initial Conditions for Star Formation in High-pressure Environments. The Astrophysical Journal Letters, 795:L25, Nov. 2014.
  14. R. Wang, J. Wagg, C. L. Carilli, F. Walter, L. Lentati, X. Fan, D. A. Riechers, F. Bertoldi, D. Narayanan, M. A. Strauss, P. Cox, A. Omont, K. M. Menten, K. K. Knudsen, R. Neri, and L. Jiang. Star Formation and Gas Kinematics of Quasar Host Galaxies at z ~ 6: New Insights from ALMA. The Astrophysical Journal, 773:44, Aug. 2013.