Home

Oh, my -- the small web is big

Since I returned to the Gemini protocol a few months ago, I've made much use of feed aggregators. These are services that examine the feeds associated with Gemini capsules, and present a selection of recent updates in chronological order. Feed aggregators show me at a glance what new material is available across Gemini space, so I can read the stuff I find interesting. Which, to be honest, is most of it.

Several feed aggregators currently serve Gemini: Antenna, DSN, Cosmos, Capcom, maybe others. They all work in a similar way, although they differ in how they decide which feeds to index. I look at one or more of these most days, and always find at least a few new posts that interest me.

So I got to wondering whether something similar might be made for the "small web". I'm using that term here to mean the collection of personal, non-commercial websites that exist alongside the commercial behemoths like Reddit and Discord. To be included, a site would not only have to meet the "smallness" criteria -- however those criteria are decided -- but also expose a feed, probably in Atom or RSS format.

But where would I obtain such a list of sites? I'm certainly not going to try to build one myself. It turns out that the maintainers of the Kagi web search engine already have such a list -- it's part of their "small web initiative".

Kagi Small Web project on GitHub

When I first looked at Kagi's list -- last year, as I recall -- it had about 6,000 sites. These are all sites that have been nominated by users and (presumably) vetted for "smallness" by Kagi's maintainers. I'm pleased to say that my main website was on the list.

When I looked yesterday, I saw that Kagi's list had risen to over 30,000 sites.

Still, I thought it conceivable that some, perhaps many, of those sites were moribund. After all, the same is true of many Gemini capsules. The only way to find out was to examine the individual feeds, and see how frequent the updates were for each site.

So I threw together some C code to do this. I could have written a shell script, with a utility like xsltproc to parse the feed XML, but I figured that with such a large number of feeds, a compiled program would be faster.

Anyway, of the ~30,000 feeds in Kagi's list, it turned out that about 25,000 were viable. That is, at the time of my testing, these sites were up, their feeds could be parsed, and they provided sufficient time and context information for time-based aggregation.
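Out of interest, here's roughly what such a viability check could look like. This is only a sketch of the idea, not the program I actually ran: it assumes libcurl for the download and libxml2 for the parsing, the file and function names are inventions for the example, and "viable" is simplified here to "fetches, parses as XML, and contains at least one dated entry".

```
/* feedcheck.c -- hypothetical sketch of the viability test described
 * above: fetch one feed URL, parse it as XML, and report whether it
 * contains any dated entries.  Assumes libcurl and libxml2; build with:
 *   cc feedcheck.c -o feedcheck $(pkg-config --cflags --libs libcurl libxml-2.0)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

struct buf { char *data; size_t len; };

/* libcurl write callback: append received bytes to a growable buffer */
static size_t on_data(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct buf *b = userdata;
    size_t n = size * nmemb;
    char *p = realloc(b->data, b->len + n + 1);
    if (!p) return 0;                  /* abort the transfer if out of memory */
    b->data = p;
    memcpy(b->data + b->len, ptr, n);
    b->len += n;
    b->data[b->len] = '\0';
    return n;
}

/* Count feed entries that carry a date: <entry>/<updated> in Atom,
 * <item>/<pubDate> in RSS.  Walks the whole element tree recursively. */
static int count_dated_entries(xmlNodePtr node)
{
    int count = 0;
    for (; node; node = node->next) {
        if (node->type == XML_ELEMENT_NODE &&
            (!xmlStrcmp(node->name, BAD_CAST "entry") ||
             !xmlStrcmp(node->name, BAD_CAST "item"))) {
            for (xmlNodePtr c = node->children; c; c = c->next)
                if (c->type == XML_ELEMENT_NODE &&
                    (!xmlStrcmp(c->name, BAD_CAST "updated") ||
                     !xmlStrcmp(c->name, BAD_CAST "pubDate"))) {
                    count++;
                    break;
                }
        }
        count += count_dated_entries(node->children);
    }
    return count;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s feed-url\n", argv[0]); return 2; }

    struct buf b = {0};
    CURL *h = curl_easy_init();
    curl_easy_setopt(h, CURLOPT_URL, argv[1]);
    curl_easy_setopt(h, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(h, CURLOPT_TIMEOUT, 30L);
    curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, on_data);
    curl_easy_setopt(h, CURLOPT_WRITEDATA, &b);
    CURLcode rc = curl_easy_perform(h);
    curl_easy_cleanup(h);
    if (rc != CURLE_OK || b.len == 0) {
        fprintf(stderr, "fetch failed: %s\n", curl_easy_strerror(rc));
        return 1;
    }

    xmlDocPtr doc = xmlReadMemory(b.data, (int)b.len, argv[1], NULL,
                                  XML_PARSE_NOERROR | XML_PARSE_NOWARNING);
    if (!doc) { fprintf(stderr, "feed did not parse as XML\n"); return 1; }

    int dated = count_dated_entries(xmlDocGetRootElement(doc));
    printf("%s: %d dated entries -- %s\n", argv[1], dated,
           dated > 0 ? "viable" : "not usable for aggregation");

    xmlFreeDoc(doc);
    free(b.data);
    return dated > 0 ? 0 : 1;
}
```

Run over a list of ~30,000 feed URLs, even something this crude separates the live feeds from the dead ones.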

That's an awful lot of sites, compared to the level of activity in Gemini space.

So I looked for just the most active sites: those whose feeds showed multiple updates over the preceding month. It turned out that there were ~9,000 such sites. The busiest sites got new content several times a day.
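For what it's worth, the activity test itself is trivial once each feed's entry dates have been parsed into timestamps. A minimal sketch of the idea -- the function name and the "at least two entries in the last 30 days" threshold are illustrative choices, not necessarily the exact ones I used:

```
#include <time.h>

/* Hypothetical activity test: given the parsed entry timestamps for one
 * feed, treat the site as "active" if at least two entries fall within
 * the preceding 30 days. */
static int is_active(const time_t *entry_times, size_t n, time_t now)
{
    const time_t month = 30L * 24 * 60 * 60;   /* roughly one month, in seconds */
    int recent = 0;
    for (size_t i = 0; i < n; i++)
        if (entry_times[i] <= now && now - entry_times[i] <= month)
            recent++;
    return recent >= 2;
}
```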

All this means that, if I implemented a feed aggregator that listed only the most recent update for the most active small web sites, I'd still be showing hundreds or thousands of new updates every day.

I wrote another C program to extract and tabulate these updates, just out of curiosity. On my laptop computer (2.6 GHz i7 CPU, 64 GB RAM) it took about four hours to download and parse the feeds from the ~9,000 busiest sites. Of course, the laptop spent most of this time waiting for the servers to respond, but there's a fair amount of CPU-intensive XML parsing as well. I mention the processing load because I had the notion that I could run a feed aggregator on my cloud virtual server, and publish the list of new updates every day. In practice, though, I don't think the server has the resources to do this.
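Most of the work in the tabulation is just normalising the two date formats the feeds use, so that updates can be grouped by calendar day. A hypothetical helper along these lines would do it, using strptime to cover both the Atom (RFC 3339) and RSS (RFC 822) styles:

```
#define _XOPEN_SOURCE 700   /* for strptime */
#include <time.h>

/* Reduce a feed timestamp to a YYYY-MM-DD string so that updates can be
 * counted per day.  Handles "2026-03-15T09:30:00Z" (Atom/RFC 3339) and
 * "Sun, 15 Mar 2026 09:30:00 GMT" (RSS/RFC 822); timezone offsets are
 * ignored, which is close enough for day-level grouping. */
static int entry_day(const char *stamp, char *out, size_t outlen)
{
    struct tm tm = {0};
    if (!strptime(stamp, "%Y-%m-%dT%H:%M:%S", &tm) &&
        !strptime(stamp, "%a, %d %b %Y %H:%M:%S", &tm))
        return -1;                       /* unrecognised date format */
    strftime(out, outlen, "%Y-%m-%d", &tm);
    return 0;
}
```

Pipe the resulting date strings through something like sort | uniq -c and you get a per-day tally.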

For March 15, I saw 1,251 updates. For earlier dates, similar numbers.

I couldn't look at each of the 1,000+ updates from even a single day. However, from the random sample of a hundred or so I did look at, I would say that most were quite interesting, and many very interesting. In fact, I very quickly got sucked down a rabbit hole, and spent a whole day exploring a stack of random content.

All this to say that the "small web" is bigger than you might think. There are tens of thousands of personal, non-commercial websites and blogs, receiving between them more than a thousand updates each day. And that's just starting with Kagi's list, which I doubt is definitive.

So the good news is that the small web isn't going away, as some have predicted: it's actually growing rapidly. Many individuals are publishing good-quality text, images, music, and other things.

The bad news is that there's no point trying to implement something like Antenna for the small web: it's just much too busy. Still, I learned a good deal by trying.

Published 2026-03-16, updated 2026-03-16

Categories

small web gemlog


Converted from my Gemini capsule.