Interstellar communication and LLMs

This is not going to be an informative post, even by my low standards. It's more a request for comments and feedback.

I've recently become tangentially involved with the search for extraterrestrial intelligence (SETI). A research group contacted me, knowing my interest in coding theory -- that's the branch of mathematics that deals with encoding information in a way that makes it resistant to errors in communication. If you're sending information across the vast distances of interstellar space, you'd certainly want to know your coding theory.

This research group is working on decades-old recordings of signals from space, looking for evidence of coding methods that would not have been known (on Earth, at least) at the time the signals were captured. That's not a bad approach to SETI, in principle.

These researchers see me as their pet sceptic. It's clear that they are believers in the extraterrestrial hypothesis, and are thrilled about their findings. For reasons of confidentiality, I won't say who the researchers are, or the exact data they're working on. I will say that the data consists of binary streams, each up to about ten thousand bits long.

On the face of it, the data is gibberish. It comes from light-years away, so it's corrupted by so much noise that it's hard to be sure there's any signal there at all. The data, if it even is data, undoubtedly contains errors. Of course, the purpose of sophisticated coding methods is to be able to detect and correct errors. If this data is a transmission from space, and if we could work out the coding scheme the senders used, then we could, perhaps, fix the errors and decode the data.
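To make "detect and correct errors" a bit more concrete, here's a minimal sketch of one of the oldest error-correcting codes, Hamming(7,4), which adds three parity bits to every four data bits and can repair any single flipped bit in a block. It's purely illustrative: it has nothing to do with whatever scheme the researchers think they've found, and the function names are my own invention.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit Hamming codeword.

    Positions 1..7 hold p1 p2 d1 p3 d2 d3 d4; each parity bit covers
    the positions whose 1-based index has the corresponding bit set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit in a 7-bit codeword; return the 4 data bits."""
    c = list(c)
    # Re-run each parity check. The failing checks spell out, in binary,
    # the 1-based position of the flipped bit (0 means no error detected).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the offending bit back
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
sent = hamming74_encode(data)
received = sent.copy()
received[5] ^= 1  # simulate noise flipping one bit in transit
print(hamming74_decode(received))  # [1, 0, 1, 1]
```

The catch is that this only works if you already know the code. If two bits in the same seven-bit block get flipped, Hamming(7,4) "corrects" the wrong bit, which is one reason heavy noise makes this kind of analysis so treacherous.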

As you might imagine, when we don't know the coding scheme, decoding is difficult enough even for transmissions we know originate right here on Earth. It's a bit like cryptanalysis, except that the data only appears to be encrypted. But if the data comes from space, we don't know what it represents, even if we manage to decode the transmission.

Over the years, various SETI projects have set out to interpret data like this. You may remember the notorious "SETI screensaver" of the SETI@home project, which ran from 1999 to 2020. The thinking behind this project was that no single computer had the resources to do the vast amount of computation the task required, but the job could be broken down into many smaller tasks, and handed off to millions of desktop PCs.

The problem with this kind of analysis is that all it really does is look for patterns whose entropy is low enough (that is, whose structure is high enough) that they would be unlikely to arise by chance. But even if we do find such a pattern, it's hard to show conclusively that it didn't originate right here. It would be exciting to find, say, a pattern of repeating prime numbers, but we know we have "numbers stations" right here on Earth, and most of these remain mysterious. Who knows what else people might be transmitting?
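For what it's worth, the "low entropy" test is simple to sketch. This is my own minimal illustration, not what SETI@home or the research group actually ran: it estimates Shannon entropy per bit from the frequencies of non-overlapping k-bit blocks in a stream. The block size `k` is an arbitrary choice of mine. Coin-flip data scores close to 1 bit per bit; a strongly structured stream scores much lower.

```python
import math
import random
from collections import Counter


def block_entropy_per_bit(bits, k=4):
    """Estimate Shannon entropy, in bits per bit, from non-overlapping k-bit blocks.

    Values near 1.0 look like coin flips; values well below 1.0 indicate structure.
    """
    blocks = [tuple(bits[i:i + k]) for i in range(0, len(bits) - k + 1, k)]
    counts = Counter(blocks)
    n = len(blocks)
    # H = sum over observed blocks of p * log2(1/p)
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return h / k


random.seed(1)
noise = [random.getrandbits(1) for _ in range(10_000)]  # stand-in for pure noise
pattern = [1, 0, 0] * 3_000  # stand-in for a highly repetitive signal
print(round(block_entropy_per_bit(noise), 3))    # close to 1.0
print(round(block_entropy_per_bit(pattern), 3))  # 0.396
```

Note that a low score only says the stream isn't random. It says nothing at all about who, or what, produced it.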

But if we can find evidence of mathematical methods being used on data that was recorded before those methods were known, well, that's potentially a different matter.

The researchers who contacted me claim to have found such evidence. This is either the most significant discovery in human history, or total balderdash.

The problem is that I am in no position to judge which it is. If the researchers were reporting a conventional cryptographic analysis, I might have some chance of understanding it. I'm not saying it would be easy: the math is pretty difficult for a person of my modest aptitude. Still, with effort, and plentiful help from real mathematicians, I could probably figure it out.

The analysis at issue here, however, was done by an LLM.

Specifically, it was done by Google Gemini. It seems that it was able to provide evidence, not only of the coding methodology the alleged transmission used, but also of the contents of the message. The message consists of times and dates which -- wouldn't you just know it -- are supposed to indicate upcoming alien contacts. Most of these dates are now in the past, of course. I'm unsure whether the little green men actually showed up as advertised.

I'm aware that properly trained LLMs can produce good results. However, I don't think there's any way to get them to explain their reasoning in human-comprehensible terms. I've been sent the Gemini set-up and the data it was given, but I don't know enough about LLMs to know whether the results are remotely plausible. It's always seemed to me that LLMs tend to give us the answers we want. The research team really wants to have found alien communications. For all I know, the LLM is just telling them what they want to hear.

So here we are: potentially the most important event in human history since we learned how to make fire, and no way to judge its veracity.

Another victory for AI.

Comments welcome.

Published 2026-03-26, updated 2026-03-26
