XS4ALL INN patch page

This page has a somewhat too pretentious name to be on my homepage, but so be it :). I'm not the only one doing patches to the news server, but so far my patches are the only ones being distributed to the net :)

This is what's currently available:


mmap()ed .overview files.

This patch mmap()s the .overview files and binary-searches them, which gives a very big speed improvement, especially on sites with a lot of readers. Note: this patch is against the innd 1.5.1 source.
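To illustrate the idea (this is not the patch itself, which is C code inside the server), here is a minimal Python sketch of a binary search over an mmap()ed overview file. It assumes every line starts with the article number followed by a tab, and that the lines are sorted by that number. The function name and the demo data are made up for the example.

```python
import mmap
import os
import tempfile


def find_overview_line(path, artnum):
    """Return the overview line for artnum as bytes, or None if it is absent."""
    if os.path.getsize(path) == 0:
        return None  # an empty file cannot be mapped
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        lo, hi = 0, len(m)  # search window in byte offsets; lo is a line start
        while lo < hi:
            mid = (lo + hi) // 2
            # Back up from mid to the start of the line that contains it.
            nl = m.rfind(b"\n", lo, mid)
            start = lo if nl == -1 else nl + 1
            end = m.find(b"\n", start)
            if end == -1:
                end = len(m)  # last line without a trailing newline
            line = m[start:end]
            num = int(line.split(b"\t", 1)[0])
            if num == artnum:
                return line
            if num < artnum:
                lo = end + 1  # continue in the lines after this one
            else:
                hi = start    # continue in the lines before this one
    return None


# Demo on a throwaway file with hypothetical overview lines.
with tempfile.NamedTemporaryFile("wb", suffix=".overview", delete=False) as tmp:
    for n in (1, 2, 5, 9):
        tmp.write(b"%d\tSubject %d\tsomeone@example.invalid\n" % (n, n))
    demo_path = tmp.name
try:
    hit = find_overview_line(demo_path, 5)
    miss = find_overview_line(demo_path, 7)
finally:
    os.unlink(demo_path)
print(hit)
print(miss)
```

Only the pages that the search actually touches get read, which is why this pays off on big groups with many readers.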

You do need a reliable mmap() implementation for this. Also, your .overview files need to be sorted (they usually are). If your .overview files aren't sorted, run this script occasionally to get them sorted again. It's (of course) a perl script. Use the -v switch to see what's happening, or no switch at all to run it from crontab.
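For the curious, the sorting step could look roughly like the Python sketch below. It is not the perl script, just an illustration under the assumption that each overview line starts with the article number and a tab. The function name and the verbose flag are invented here, and the sketch does no locking against a running news server.

```python
import os
import tempfile


def sort_overview(path, verbose=False):
    """Rewrite path sorted by article number; return True if it was changed."""
    with open(path, "rb") as f:
        lines = [ln for ln in f.read().split(b"\n") if ln]

    def artnum(line):
        return int(line.split(b"\t", 1)[0])

    ordered = sorted(lines, key=artnum)
    if ordered == lines:
        if verbose:
            print("%s: already sorted" % path)
        return False
    # Write to a temporary file first, then rename it over the original.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(b"\n".join(ordered) + b"\n")
    os.replace(tmp_path, path)
    if verbose:
        print("%s: sorted %d lines" % (path, len(ordered)))
    return True


# Demo on a throwaway file with hypothetical, out-of-order overview lines.
with tempfile.NamedTemporaryFile("wb", suffix=".overview", delete=False) as tmp:
    tmp.write(b"9\tninth\n2\tsecond\n5\tfifth\n")
    demo_path = tmp.name
try:
    changed_first = sort_overview(demo_path, verbose=True)
    changed_second = sort_overview(demo_path, verbose=True)
    with open(demo_path, "rb") as f:
        sorted_content = f.read()
finally:
    os.unlink(demo_path)
```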

Here's the patch

Additionally, here is a program to stress-test your mmap() implementation. It is a shar archive, containing a C program you'll have to compile yourself (usually make mmapme will do), and a bash script. The script will hexdump three files. All three files should contain only "1" (hex 31) characters. If you also see hex 00 characters, your mmap is buggy and this overview patch might not always give the correct results.
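If you'd rather not eyeball a hexdump, the same check can be sketched in Python as below. This is not part of the shar archive. The function name is invented, and the demo files only stand in for the three files the stress test produces.

```python
import os
import tempfile


def first_bad_byte(path, expected=0x31, chunk_size=65536):
    """Return (offset, value) of the first byte that is not `expected`, else None."""
    offset = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            for i, byte in enumerate(chunk):
                if byte != expected:
                    return offset + i, byte
            offset += len(chunk)
    return None


def _write_temp(data):
    """Helper for the demo: write data to a temporary file, return its name."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(data)
        return tmp.name


# Demo: one clean file and one with a stray hex 00 byte at offset 10.
good_path = _write_temp(b"1" * 1000)
bad_path = _write_temp(b"1" * 10 + b"\x00" + b"1" * 5)
try:
    good_result = first_bad_byte(good_path)
    bad_result = first_bad_byte(bad_path)
finally:
    os.unlink(good_path)
    os.unlink(bad_path)
print(good_result)
print(bad_result)
```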


Lag-o-meter

These are two little tools that will display a histogram of the lag of the last 1000 arriving articles, along with some interesting statistics about your feed. These are both perl scripts.

There's a fast tool that quickly gives you a histogram, something like this:

newsfeed:/news/bin # ./lag-o-meter -h
Sensible Lag: 06:03:44 (21824.0833333333, 552 measures)
Articles/sec: 4.57471264367816
Minimum  Lag: 00:00:07 (7)
Average  Lag: 11:24:48 (41087.959798995)
Maximum  Lag: 252:00:34 (907234)
Latest Art arrived: 00:00:01 (1) ago
                                                                               
*                                                                              
*                                                                              
*                                                                              
*                                                                              
*                                                                              
*                                                                              
*                                                                              
*                                                                              
*                                                                              
*              *                                                               
*              *                                                               
*              *                                                               
*              *                                                               
*              *                                                               
**             *                                                               
**             *                                                               
**             *                                                               
**   *  *     **  *                                                            
**  ** **     ***** **  *            *                                         
************************* **   * *  **    *  *       *    ** *        ***   * *
01234567891 1  1    2    2    3    3    4    4    5    5    6    6    7    7 7>
          0 2  5    0    5    0    5    0    5    0    5    0    5    0    5 7
Histogram of lag of arriving articles. One character is 12 articles
example lag-o-meter output for my site

The first important figure here is the "Sensible Lag": the average lag of the bulk of the arriving articles. This gets rid of the junk that was posted back in the stone age (like the 252-hour maximum in my output). Ideally it should be under 15 minutes; start worrying if it's over 1 hour. As you can see, I am worrying. Note that the average lag is not particularly useful, as it is influenced too much by exceptional articles. I've seen the average lag fluctuate between 1 hour and 10 hours within a few minutes (which is why I came up with the sensible lag :)
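To see why a plain average is so easily thrown off, here is a small Python sketch. How the script decides what "the bulk" is isn't spelled out on this page, so the cutoff used here (keep lags of at most four times the median) is purely an assumption for illustration, as are the function names and the sample lags. The HH:MM:SS formatting mimics the output above.

```python
from statistics import mean, median


def fmt_lag(seconds):
    """Format a lag in seconds as HH:MM:SS; hours may run past 99."""
    total = round(seconds)
    return "%02d:%02d:%02d" % (total // 3600, total % 3600 // 60, total % 60)


def sensible_lag(lags, factor=4):
    """Average of the lags that are at most factor * median (assumed rule).

    Returns (average, number of measures that were kept).
    """
    cutoff = factor * median(lags)
    bulk = [lag for lag in lags if lag <= cutoff]
    return mean(bulk), len(bulk)


# Hypothetical lags in seconds: four ordinary articles and one stone-age one.
lags = [420, 600, 900, 1200, 907234]
avg_all = mean(lags)
avg_bulk, measures = sensible_lag(lags)
print("Average  Lag: %s" % fmt_lag(avg_all))
print("Sensible Lag: %s (%d measures)" % (fmt_lag(avg_bulk), measures))
```

A single ancient article drags the plain average up to days, while the trimmed figure stays close to what most articles actually experience.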

Another important figure is the number of articles arriving per second. Suggestions on what a healthy newsfeed should look like here are welcome. I guess you should see between 1 and 3 on a normal feed, and as many as possible if you're lagging behind, maybe as many as 10 or 20 on a really good machine. Note that this is the number of real articles arriving, not the number of article checks being performed.

Additionally, there's a slow tool that gives basically the same output, except that for each histogram entry it also tells you which feeder was responsible for it. Very useful to see exactly which feeding site is lagging. It is slow because it has to open 1000 articles and read the Path line.
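Reading the feeder off the Path line could be sketched in Python like this. It assumes the receiving server has put its own name at the front of the Path, so that the second component is the site that handed over the article, and it ignores folded header lines. The function name and the host names are made up.

```python
from collections import Counter


def feeder_of(article_text):
    """Return the second Path component (assumed to be the feeder), or None."""
    for line in article_text.splitlines():
        if not line:
            break  # a blank line ends the headers
        if line.lower().startswith("path:"):
            hops = line[len("path:"):].strip().split("!")
            return hops[1] if len(hops) > 1 else None
    return None


# Hypothetical articles; only the headers matter here.
articles = [
    "Path: news.local.example!feed-a.example!origin.example!not-for-mail\n"
    "Subject: one\n\nbody\n",
    "Subject: two\n"
    "Path: news.local.example!feed-b.example!not-for-mail\n\nbody\n",
    "Path: news.local.example!feed-a.example!elsewhere.example!not-for-mail\n"
    "\nPath: this is body text, not a header\n",
]
per_feeder = Counter(feeder_of(text) for text in articles)
print(per_feeder)
```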

Both tools take only one switch: -h, which switches from the default 15-minute granularity to one-hour granularity on the histograms. Usually 15-minute granularity (giving almost 20 hours of display) is sufficient.
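The bucketing behind such a histogram can be sketched in Python as follows. The column count (78 regular columns plus one overflow column, the ">" in the example) is read off the sample axis above and is an assumption about the real scripts, as are the function and parameter names. With 900-second buckets that comes to 19.5 hours, which fits "almost 20 hours".

```python
def bucket_lags(lags, granularity=900, columns=78):
    """Count lags per column; the extra last slot collects everything beyond.

    granularity is the bucket width in seconds: 900 for the default
    15 minutes, 3600 for what the -h switch selects.
    """
    counts = [0] * (columns + 1)
    for lag in lags:
        index = int(lag // granularity)
        index = max(0, min(index, columns))  # clamp negatives and overflow
        counts[index] += 1
    return counts


# Hypothetical lags in seconds.
sample = [7, 899, 900, 21824, 907234]
quarter_hours = bucket_lags(sample)
hours = bucket_lags(sample, granularity=3600)
print(quarter_hours[:3], quarter_hours[-1])
print(hours[:7], hours[-1])
```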


JohnPC