Tuesday, January 28, 2025

More Machine Learning with WALLABY

 Title: WALLABY Pilot Survey: kNN identification of perturbed galaxies through HI morphometrics

Link

My paper on the level of perturbed-looking galaxies in a couple of WALLABY (Widefield ASKAP L-band Legacy All-sky Blind surveY) fields. The reason for this paper was that I wanted to see how well this worked on HI data, using the morphometrics that I have come to rely on to parameterize HI morphology.

What made it possible at all was a paper that classified sources in two of those fields into different levels and types of perturbation (Lin+ 2023). Looking at the plot below, that seems like a reasonably sized sample to try some simpler machine learning on, building on the work I had done in 2011 on a variety of HI surveys and in 2023 on the WALLABY pilot data.







I simplified the labels to simply “perturbed-looking” and “not”, since this is not the biggest training set ever used in ML.

But these galaxies were closer than those in the 2023 work, and there was some reason to think it might work a little better.

Let’s find out!




The morphometric feature space for the training sample. The level of perturbed-ness is indicated with the color.

There is a pretty good spread in values and maybe some separation in this parameter space. Excellent material to try some (simple) classifiers on. The one I picked was kNN since it is conceptually easy: classify according to the few nearest neighbors in the n-dimensional space. The space does not need to be orthogonal, which the morphometric space definitely isn’t. And there is only one thing to tune: the number of neighbors.
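The idea can be sketched in a few lines of scikit-learn. The feature names and values below are made-up placeholders standing in for the morphometric space, not the actual WALLABY measurements:

```python
# Minimal kNN sketch: neighbours in a (fake) morphometric space vote
# on the label of a new galaxy. Values are synthetic, for illustration.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)

# Toy training set: 40 "unperturbed" (0) and 40 "perturbed" (1) galaxies
# in a 3D feature space (think Asymmetry, Smoothness, Concentration).
X_calm = rng.normal(loc=[0.1, 0.2, 3.0], scale=0.05, size=(40, 3))
X_pert = rng.normal(loc=[0.4, 0.5, 2.0], scale=0.05, size=(40, 3))
X = np.vstack([X_calm, X_pert])
y = np.array([0] * 40 + [1] * 40)

knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(X, y)

# Classify a new galaxy: its nearest neighbours in feature space vote.
new_galaxy = np.array([[0.38, 0.48, 2.1]])
label = knn.predict(new_galaxy)[0]
print(label)
```

Note that kNN makes no assumption about the axes being independent, which is exactly why it tolerates a correlated feature space like the morphometrics.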


Checking for the optimal number of neighbors. I did the feature engineering (picking which parameters to feed it) already. This was partially motivated by my experiences with some of them (Smoothness is not good, Intensity also needs a smoothing kernel).

As we can see from the plot above, the optimal number of neighbors is 2, after which all the metrics diverge and degrade. Okay, 2 neighbors (a bit low maybe?) it is!

Because the training set is still pretty small, I ran the train-test loop a couple of times to get an idea of the average performance. We get the confusion matrix below:


The average performance on the test sample over a series of train/test splits of the kNN.

Not terrible. Not amazing either.
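The repeated train/test averaging can be sketched like this, again with synthetic placeholder data rather than the WALLABY sample:

```python
# Sketch: average the confusion matrix over repeated random train/test
# splits, which helps when the labelled sample is small.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(60, 4)),   # "unperturbed"
    rng.normal(1.5, 1.0, size=(60, 4)),   # "perturbed"
])
y = np.array([0] * 60 + [1] * 60)

n_repeats = 20
cm_sum = np.zeros((2, 2))
for seed in range(n_repeats):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    knn = KNeighborsClassifier(n_neighbors=2).fit(X_tr, y_tr)
    cm_sum += confusion_matrix(y_te, knn.predict(X_te), labels=[0, 1])

cm_mean = cm_sum / n_repeats  # average confusion matrix
print(cm_mean)
```

Stratifying the splits keeps the perturbed/unperturbed ratio the same in every test set, which matters when one class is rarer than the other.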

If we use the full training sample to train (and no separate testing) we get:


What do I get if I use the whole labeled data set for training? You have to worry about over-fitting, but OTOH this is still a small training sample.

A little better. We really could do with a bigger training sample, but that is a common refrain in ML. Okay, which ones are predicted to be perturbed? It’s these. If you compare with the ones above, it’s a fair first cut.


kNN predictions for which galaxies are perturbed. In individual cases it works…sort of. As a fraction, it works very well.

And we had a second field with more galaxies (it’s wider), so we could apply our kNN classifier there too:


Predicted kNN perturbed galaxies in the NGC 5044 field.

Now we have a prediction for when this field is studied in more detail for signs of perturbed galaxies.
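Applying a classifier trained on one field to an unlabelled field boils down to a fit-then-predict step. This sketch uses made-up data for both fields:

```python
# Sketch: train on the fully labelled first field, then predict labels
# for an unlabelled second field. All values are synthetic placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
# Labelled "first field"
X_train = np.vstack([
    rng.normal(0.0, 1.0, size=(60, 4)),   # "unperturbed"
    rng.normal(1.5, 1.0, size=(60, 4)),   # "perturbed"
])
y_train = np.array([0] * 60 + [1] * 60)

# Unlabelled "second field", here drawn 70/30 from the same two classes
X_field2 = np.vstack([
    rng.normal(0.0, 1.0, size=(70, 4)),
    rng.normal(1.5, 1.0, size=(30, 4)),
])

knn = KNeighborsClassifier(n_neighbors=2).fit(X_train, y_train)
pred = knn.predict(X_field2)

# Individual labels are uncertain, but the predicted *fraction* of
# perturbed galaxies in the field is the more robust quantity.
frac_perturbed = pred.mean()
print(round(frac_perturbed, 2))
```

This mirrors the point made below: trust the field-wide fraction more than any single galaxy's label.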

What we have found is that the kNN classifier is pretty decent at getting the fraction of galaxies that is perturbed in a given field. For individual galaxies, however, it is best to think of this as a prediction: with an accuracy of 80%, it still gets it wrong 1 in 5 times. The good news is that this is pretty easy to beat, with both better training sets and perhaps a direct classification from HI maps to a perturbed label, rather than going through morphometrics first. One could perhaps use the first moment map (the velocity map) instead of the column density one to train a convolutional neural network. But for now we have a prediction for the NGC 5044 galaxies and some more intuition on how to apply machine learning to HI map morphology.


Monday, January 6, 2025

Two Books on a chaotic Cosmos

I recently finished “Our Accidental Universe” by Prof. Chris Lintott, and just before that I re-read “The Disordered Cosmos” by Prof. Chanda Prescod-Weinstein. Both are books for the general audience that explain the nature of our Universe, told by some of the best explainers in the business. They are as far apart in style as possible, as far as I can tell. I like reading books this way, in contrasting pairs. Both are a series of essays/chapters on topics in Astronomy, highlighting the frequent randomness of our Universe and the serendipity in our discoveries.


The Disordered Cosmos (DC) originated from blog posts, and this shows somewhat in the writing and language. There are footnotes explaining terms or author’s asides, but no extended reference list.

Our Accidental Universe (AU) is in that sense a much more “traditional” popular Astronomy book: footnotes for jokes and asides, with a long list of where the author got his information for these stories.

The bigger difference between these two books is how much you meet the author personally. In AU, you meet Chris as his younger self briefly, to highlight a discovery or to set a timeframe. There is a person behind the stories and jokey asides, but the author keeps his privacy. This is very different for the DC. Here we meet the author personally, writing about intensely personal things and the various identities she brings to the science of Astronomy/cosmology. It is a much more emotional read. When I read it for the first time in the spring of 2019, a lot of the frustrations with the system of physics resonated with me. My experiences as an immigrant white dude here are nowhere near as bad as some of those described in the DC. But the feeling of being judged on emotional labor for students, or of not belonging thanks to physics group dynamics? Yeah, that hit pretty solidly mid-tenure. I thought that spring semester in 2019 was the roughest (“oh my sweet summer child” is the phrase, I think).

So I will be honest: the second half of the DC did not stick in my brain at all. Too stressed. So I really wanted to reread it. But when I am overwhelmed by a three-class semester, where one is a new class, plus assorted chaos, yeah, I need to read space kablooie, not the Disordered Cosmos.

The AU is a very smooth read. I also know how Chris sounds, so I heard it in his voice, and frankly he just sounds so much like the BBC. Stephen Fry encountered this too: people in prison were confident he had gone to university, purely because of his diction. I’ve heard a British friend describe it as “plummy”. It is a much more relaxing read, partly because I had heard some of the stories before.

So when you read books about Astronomy or our Universe, you might expect a soothing story where the narrator has a nice accent in your head. But if you want a glimpse of how this sometimes goes down in the heads of the people doing the work, the DC is a much more direct and honest look into the very human and flawed endeavor that is the science of Astronomy.

The fun part is that both books have a similar takeaway message: understanding the cosmos is fun. Be it as we are given a tour by a BBC voice, or shown it as an act of resistance against the worst human behaviours. Exploring the Universe is chaotic, random, and above all fun.