Diablo FAQ

This FAQ attempts to answer Frequently Asked Questions about the Diablo Usenet server software.

FAQ Revised: Sun Jun 30 12:50:06 2002

1. General

1.1. What is Diablo?
1.2. Where can I download diablo?
1.3. What version should I run?
1.4. Is there a mailing list?
1.5. What about a newsgroup?
1.6. How do I contribute to Diablo?
1.7. What OS should I run Diablo on?
1.8. Where is the latest version of this FAQ?

2. Newsfeeder

2.1. What is a newsfeeder or news transit system?
2.2. How much bandwidth does a transit server need?
2.3. How much diskspace do I need?
2.4. How much memory do I need?
2.5. Are there any tools to make usage statistics?

3. Newsreader

3.1. What is newsreading?
3.2. What is overview?
3.3. How much diskspace do I need for overview?
3.4. Should I use the cache option?
3.5. How much diskspace do I need for the cache?
3.6. How much diskspace do I need for the backend spool?
3.7. How much memory do I need?
3.8. Can I combine newsreading with newsfeeding?

4. Utilities

4.1. Which utilities are available?
4.2. How long will dexpire take to expire articles on the spool?
4.3. How long will dexpireover take to expire overview data?
4.4. How can I keep my dactive file on my reader box in sync with the one on my spool server?

5. Operations

5.1. How should I partition my transit server?
5.2. How should I partition my reader server?
5.3. I heard something about inodes. What about them?
5.4. Can I safely upgrade to 3.x from 2.x without losing my spool?
5.5. Can I safely upgrade to 4.x from 3.x?

6. Problems

6.1. I keep getting "Maximum file descriptors exceeded", what's wrong?
6.2. After I switched to Diablo, I suddenly see a drop in outbound traffic. Why?
6.3. My monitoring system sends an alert when diablo is in pause mode. How can I get rid of this?
6.4. When I start a new server, new news disappears even though my dexpire.ctl file is correct. What is wrong?
6.5. I sometimes see different headers for the same article, how can that be?
6.6. My history file appears to be corrupt - what should I do?

7. Examples

7.1. FreeBSD using vinum on large spool servers

1. General

1.1. What is Diablo?

Diablo is an open source Usenet transit and reader package. It is meant to be fast, scalable and flexible. It was originally written by Matt Dillon and is currently maintained by a team of developers.

1.2. Where can I download diablo?

The official home page for Diablo can be found at http://www.openusenet.org/diablo.

1.3. What version should I run?

Most people should run the latest release version. The current release is 4.1-REL. It is recommended to upgrade to the latest release version whenever possible.

There are 2 development branches.

The STABLE branch contains maintenance code for the latest REL version. It is usually not necessary to upgrade to a STABLE branch, but if you feel you need a certain fix or addition, it is most likely safe to upgrade.
Then there is the CURRENT branch. Its snapshots are not Diablo release versions but bleeding-edge work in progress. This code is not supported, and may set your server on fire.

1.4. Is there a mailing list?

There are 2 mailing lists about Diablo.

diablo-announce - moderated list for announcements concerning Diablo.
diablo-users - open list for general discussion concerning Diablo and its development.

1.5. What about a newsgroup?

There is no newsgroup dedicated to Diablo. Diablo-related discussions sometimes take place in news.software.nntp, but it is best to use the mailing lists.

1.6. How do I contribute to Diablo?

First of all, make sure you are on the diablo-users mailing list. Since Diablo is still a work in progress, it is the primary place to voice your opinion on developments.

If you have any patches, please send them to the diablo-users mailing list for review with as much explanation as possible.

1.7. What OS should I run Diablo on?

Diablo was originally developed on FreeBSD and will perform very well on it.

Success has been reported on:

FreeBSD
BSD/OS
Linux
Solaris

Please report other Operating Systems to cor@xs4all.nl so they can be included in this FAQ.

1.8. Where is the latest version of this FAQ?

You can find the latest version of the Diablo FAQ at http://www.xs4all.nl/~scorpio/diablo/faq.html.

2. Newsfeeder

2.1. What is a newsfeeder or news transit system?

Newsfeeding is the exchange of news articles between newsservers. A news feed is a collection of newsgroups ranging from just local groups to full news feeds with tens of thousands of groups. Larger newsfeeder systems are typically called News Transit systems. They aspire to propagate news messages between organisations as fast as possible.

2.2. How much bandwidth does a transit server need?

This depends on the number and size of the news feeds you handle. Small feeds with just local groups don't need much bandwidth. For a full feed you have to be prepared to handle dozens of megabits per second. A full feed runs at over 300 GB of traffic a day. This does not include outgoing traffic, which can easily grow well beyond that.

2.3. How much diskspace do I need?

This depends on the size of your news feed. For a small newsfeeder, a few GB can be sufficient. For a full feed it all depends on your situation and how long you want to keep your backlogs for slow or down servers. Remember, a full feed needs over 300 GB a day, so if you want to keep those articles around for 1 day, you will need at least 300 GB dedicated to your spool. If you do not care about backlogged or down servers, you can make do with a fraction of that amount. See also the newsreader section.

2.4. How much memory do I need?

As with almost any application, the more memory, the better. Diablo makes extensive use of memory-mapped files, so it appears to use a lot more memory than it really does. Allow around 700 KB per outgoing feed process and 1 MB per incoming feed process. Those are probably the absolute minimum; try to double those figures to get decent performance. Large Diablo transit servers typically have 2 GB of memory.

2.5. Are there any tools to make usage statistics?

To make pretty pictures of the usage of your Diablo server, try out the feeder-stats package. This can also be found in the contrib directory of the Diablo source.
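
The figures in this section can be turned into a rough sizing calculation. The following is only a sketch in POSIX sh with awk: the function names and the example feed counts (20 incoming, 50 outgoing) are made up for illustration, and the rate it prints is a daily average, so peaks will be higher.

```shell
#!/bin/sh
# Rough transit-server sizing from the figures quoted in this section.
# All inputs are illustrative assumptions; substitute your own numbers.

# Average inbound rate in Mbit/s for a feed of $1 GB per day (1 GB = 10^9 bytes).
feed_mbit_per_sec() {
    awk -v gb="$1" 'BEGIN { printf "%.1f\n", gb * 1e9 * 8 / 86400 / 1e6 }'
}

# Minimum memory in MB for $1 incoming and $2 outgoing feed processes,
# using 1 MB per incoming and 700 KB per outgoing process.
feed_memory_mb() {
    awk -v in_feeds="$1" -v out_feeds="$2" \
        'BEGIN { printf "%.0f\n", in_feeds * 1 + out_feeds * 700 / 1024 }'
}

feed_mbit_per_sec 300    # prints 27.8
feed_memory_mb 20 50     # prints 54
```

Doubling the memory figure, as suggested above, gives a more realistic starting point than the bare minimum.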

3. Newsreader

3.1. What is newsreading?

Newsreading (in this case) is the service that allows clients to retrieve news articles for reading.

3.2. What is overview?

When a client connects to a news server, it first gets an index of articles. Because it would be too expensive in terms of resources to create this index on the fly, news servers typically maintain a database of summary information about the articles.

This usually contains Subject, From, Date, References, and Message-ID headers from the articles.
This database is called the Overview database.

3.3. How much diskspace do I need for overview?

This depends on the size of your newsfeed and how you configure your dexpire.ctl. For a full newsfeed you'll need about 20GB for 30 days of news.

3.4. Should I use the cache option?

If you have a separate spool and reader server, it is highly recommended to use the cache. Don't let your cache partition fill up, though: a full cache partition will seriously affect performance.

3.5. How much diskspace do I need for the cache?

Unfortunately, at this time the cache will fill all space available on a partition. You cannot set a limit. This means you have to manually remove files from the cache. It is therefore recommended to put the cache on a separate partition, so a full cache disk will not interfere with other parts of your news server. Typically you will need a few dozen GB for a large system. Smaller systems may only need a few GB.

To clean out your cache disk you can use something like this:

# csh syntax: remove cache files not modified in the last 12 hours (720 minutes)
if ( -d /news/spool/cache/. ) then
    cd /news/spool/cache/
    find . -type f -mmin +720 -print | xargs rm -f
endif

Vary the parameters for your specific situation.
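
Since the cache cannot limit itself, one option is to trim it only when the partition is actually getting full. The following POSIX sh sketch illustrates the idea; the 80% threshold, the CACHE_DIR variable and the function names are assumptions for illustration, not Diablo settings.

```shell
#!/bin/sh
# Sketch: trim the cache only when its partition is getting full.
# CACHE_DIR, the 80% threshold and the 720-minute age are assumptions.

CACHE_DIR=${CACHE_DIR:-/news/spool/cache}

# Print the percentage of used space (without the % sign) for the
# filesystem holding directory $1, using the portable "df -P" format.
used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Remove regular files under $1 not modified in the last $2 minutes.
# -mmin is an extension found in GNU and BSD find, as in the example above.
trim_cache() {
    find "$1" -type f -mmin +"$2" -exec rm -f {} +
}

if [ -d "$CACHE_DIR" ] && [ "$(used_percent "$CACHE_DIR")" -ge 80 ]; then
    trim_cache "$CACHE_DIR" 720
fi
```

Run from cron, a script like this leaves a lightly used cache alone and only deletes older files once space becomes tight.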

3.6. How much diskspace do I need for the backend spool?

This depends on the size of your newsfeed and how long you want your articles to be available to clients. A full newsfeed is around 300 GB a day. So to keep 2 weeks of a full newsfeed available to your clients, you need 4 to 5 TB (yes, that is terabytes). Servers of this monster size are pretty rare nowadays, so don't feel bad about having a much smaller backend spool available. You can save a lot of diskspace by not providing certain large binary newsgroups.
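
The arithmetic behind the 4 to 5 TB figure is simply feed size times retention. As a small sketch (the function name is made up, and decimal units are assumed):

```shell
#!/bin/sh
# Spool size in TB needed to retain $2 days of a feed of $1 GB per day
# (decimal units: 1 TB = 1000 GB).
spool_tb() {
    awk -v gb_per_day="$1" -v days="$2" \
        'BEGIN { printf "%.1f\n", gb_per_day * days / 1000 }'
}

spool_tb 300 14    # prints 4.2
```

Leave headroom on top of the result, since the feed size is quoted as "around" 300 GB a day.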

3.7. How much memory do I need?

This depends on how many concurrent readers you expect to have and the expire times of your articles. The dreaderd processes can easily grow to 40MB of memory. If you expect a lot of clients, put at least 1GB of memory in the server.

3.8. Can I combine newsreading with newsfeeding?

Yes, you can run both diablo and dreaderd on one server. Typically you would run diablo on a different port, and dreaderd on port 119. You will still have to provide dreaderd with a headonly newsfeed so the reader process can build the overview database.

This is only recommended for small servers! For larger systems with dozens or more concurrent clients, it is highly recommended to separate the reader processes from the diablo processes and use two servers.

4. Utilities

4.1. Which utilities are available?

The following utilities are part of Diablo:

dexpire - Used to expire articles in the spool.
dexpireover - Used to expire overview files.
dclient - Execute a command on an NNTP server and write output to stdout.
dfeedinfo - Show and manipulate feed statistics.
dfeedtest - Show which dnewsfeeds entries match an article.
dgrpctl - Add/modify/delete newsgroups in the active file.
dfeedinfo - Get statistics about your feed system.
dhisbench - Test the performance of your history.
dicmd - Control the diablo server.
drcmd - Control the dreaderd server.
didump - Dump the dhistory file in a human-readable form.
diload - Append history data to the history file.
diloadfromspool - Regenerate the dhistory database from the spool.
dilookup - Look up a message-id in the dhistory file.
dkp - Diablo program to manage DKP databases.
dlockhistory - Lock the history file into memory to safely operate on the history file.
doutq - Show the outgoing queue status.
dsyncgroups - Synchronise the active file with another newsserver.
dreadart - Read an article from the article spool (spool/feed server only).
doverctl - Perform maintenance on your overview data.
dreadover - Retrieve the overview data/headers for a group:artno.

4.2. How long will dexpire take to expire articles on the spool?

Because of the way Diablo was designed, dexpire takes at most a few minutes to complete.

4.3. How long will dexpireover take to expire overview data?

This can take anything from several minutes to several hours, depending on the size of your backend spool. For a multi-TB spool with 30 days of data, a run with the -R option takes up to 10 hours. It is recommended to run dexpireover -R only once a week, and dexpireover -a once a day.
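
One way to implement this schedule is a single daily wrapper that picks the option by day of the week. This is only a sketch: the /news/dbin path, the choice of Sunday and the function name are assumptions, and it only prints the command rather than running it. Whether -R should be combined with -a, and the -y switch needed in 4.x (see 5.5), are left for you to check against the manual page.

```shell
#!/bin/sh
# Sketch of a daily wrapper: choose the long "-R" pass once a week and the
# lighter "-a" pass on the other days. The binary path and the choice of
# Sunday are assumptions; the wrapper only prints the command (dry run).

DEXPIREOVER=${DEXPIREOVER:-/news/dbin/dexpireover}

# Print the option to use for day-of-week $1 (1 = Monday ... 7 = Sunday).
expire_option() {
    if [ "$1" -eq 7 ]; then
        echo "-R"
    else
        echo "-a"
    fi
}

echo "$DEXPIREOVER $(expire_option "$(date +%u)")"
```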

4.4. How can I keep my dactive file on my reader box in sync with the one on my spool server?

You can add the following to one of your hourly adm files:

/news/dbin/dsyncgroups -h <name of spoolserver> -aD

5. Operations

5.1. How should I partition my transit server?

The single most important part of a transit server is the dhistory file. It is a database of all message-ids on your spool. Every time an article comes in, it needs to be checked against the dhistory file. It is thus smart to have this file on a fast filesystem.

Another important part of a transit server is the spool. This can be anything from a few GB to terabytes. Since there is a chance the spool can fill up, it is recommended to put it on a separate partition by itself.

A typical partition table will then look like this:

/
/usr
/var
/news
/news/spool/news

To speed up /news you could use a fast RAID system, or use OS specific tricks. See the FreeBSD example in the Examples section.

5.2. How should I partition my reader server?

The two important parts of a reader server are the overview database and the cache. Since the cache can grow to fill a disk, it is recommended to put it on a separate partition. A typical partition table will then look like this:

/
/usr
/var
/news
/news/spool/cache

Make sure /news is large enough to hold your overview data.

5.3. I heard something about inodes. What about them?

Like any server, a news server can crash and need an fsck to come back up. Large filesystems, like those of large backend spool servers, tend to have a huge number of inodes by default. Fsck will then need a very long time to check and repair your filesystem.

Because of the way Diablo stores its articles, even multi-TB spool servers only use a few thousand inodes. By configuring your spool filesystem to have only, say, 50,000 inodes, you can reduce the fsck time to mere seconds.

Thanks to Joe Greco for pointing this out to all of us running very large spools.
Be sure NOT to do this to your cache partition.
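
On filesystems where the inode count is set through a bytes-per-inode density at creation time, the target density follows from the partition size. The sketch below computes it and prints (but does not run) a FreeBSD newfs command. The 500 GB size and the device name are placeholders, and newfs may adjust the density you ask for, so check the resulting inode count with df -i.

```shell
#!/bin/sh
# Bytes-per-inode density needed so that a filesystem of $1 GB ends up
# with roughly $2 inodes (1 GB = 2^30 bytes here).
bytes_per_inode() {
    awk -v gb="$1" -v inodes="$2" \
        'BEGIN { printf "%d\n", gb * 1024 * 1024 * 1024 / inodes }'
}

# Dry run: print a FreeBSD newfs command for a hypothetical 500 GB
# spool device; /dev/da1s1e is a placeholder, not a real recommendation.
echo "newfs -i $(bytes_per_inode 500 50000) /dev/da1s1e"
```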

5.4. Can I safely upgrade to 3.x from 2.x without losing my spool?

Yes, you can safely upgrade. Diablo 3.x uses a new spool system but it's possible to configure your original 2.x spool as a spoolobject in 3.x. Make sure you configure dspool.ctl as follows:

spool 00
  minfree 2g #or any other value
  #spooldirs 8 (if you have spooldirs under 2.x, define this replacing 8 with the number of spooldirs)
end

metaspool trad
  spool 00
end

expire * trad

5.5. Can I safely upgrade to 4.x from 3.x?

Yes, 4.x is backwards compatible with 3.x. Some things to keep in mind:

- Be aware that once you use features like "wireformat" in 4.x, you can not switch back to 3.x without losing your spool.
- We recommend that you stop using dnntpspool.ctl and switch to the dnewsfeeds options, as dnntpspool.ctl is no longer supported.
- Some tools, most notably dexpireover and dexpire, have different switches. For instance, dexpireover now needs -y to actually delete data.

6. Problems

6.1. I keep getting "Maximum file descriptors exceeded", what's wrong?

Check lib/defs.h for the lines:

#define MAXFORKS 256
#define MAXFEEDS 128

Increase those numbers, but don't set them too high because they determine the size of some datastructures and hence use more memory. Also make sure that your OS FD_SETSIZE is high enough.
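
Apart from the compile-time constants, the run-time open-file limit of the account that starts diablo is worth checking too. The following sketch does that from the shell. Note that this is a separate limit from FD_SETSIZE, that "ulimit -n" is a common shell extension rather than strict POSIX, and that 512 is an arbitrary illustrative requirement, not a Diablo default.

```shell
#!/bin/sh
# Check whether the per-process open-file limit of the current shell is
# at least $1. "unlimited" is treated as sufficient.
fd_limit_ok() {
    limit=$(ulimit -n)
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$1" ]
}

# 512 is an arbitrary illustrative requirement, not a Diablo default.
if fd_limit_ok 512; then
    echo "open-file limit $(ulimit -n) looks sufficient"
else
    echo "open-file limit $(ulimit -n) is below 512; raise it before starting diablo"
fi
```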

6.2. After I switched to Diablo, I suddenly see a drop in outbound traffic. Why?

Diablo has an inherent latency in the design of its outgoing feed system, so it offers articles to peers slightly later than non-Diablo servers do. Peers that have already received an article from a faster source will refuse it, and the result is a drop in outgoing news to peers.

6.3. My monitoring system sends an alert when diablo is in pause mode. How can I get rid of this?

Change the label associated with your monitoring system IP to readonly. Diablo will accept connections from it, even in pause mode.

6.4. When I start a new server, new news disappears even though my dexpire.ctl file is correct. What is wrong?

The overview data uses a system where each group has a maximum number of articles, or slots. Dexpireover increases or decreases this maximum when needed. But since dexpireover normally runs only about once a day, and the start values are quite low, you only see this maximum number of articles in the group. There are 2 ways to fix this.

- run dexpireover every hour for a week or 2, to get things going.
- use the 'i' option in dexpire.ctl. Be aware though that this can cause your disks to fill up.

6.5. I sometimes see different headers for the same article, how can that be?

This is normal and has to do with the way Diablo gets articles from the spool. There are two ways to get an article: you either supply a message-id directly, or you supply a group name and article number. When you supply an article number, dreaderd uses the overview database to translate it to a message-id, and at the same time remembers other headers from the article that are also available in the overview database. The Path: header is an example of this. Dreaderd then fetches the article from the backend spool and replaces some article headers with the versions it knows from overview. This was done because some clients get confused when the last entry in the Path: header doesn't match the Xref: hostname. When you supply a message-id directly, none of this "munging" is done and you get the article as it appears on the spool server.
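
To see the difference yourself, you can send both kinds of request to your reader and compare the headers. The sketch below only prints the standard NNTP commands for each method; the message-id, group and article number are made-up examples, and the function names are hypothetical.

```shell
#!/bin/sh
# Print the NNTP commands for the two ways of fetching an article.
# NNTP lines end in CRLF, hence the \r\n.

# By message-id: the article is returned as stored on the spool server.
by_message_id() {
    printf 'ARTICLE %s\r\nQUIT\r\n' "$1"
}

# By group name and article number: the lookup goes through overview.
by_article_number() {
    printf 'GROUP %s\r\nARTICLE %s\r\nQUIT\r\n' "$1" "$2"
}

by_message_id '<example@news.example.com>'
by_article_number news.software.nntp 12345
```

Piping either output into a plain TCP client connected to port 119 of your reader (for instance nc, if you have it installed) shows the article as dreaderd serves it for that method.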

6.6. My history file appears to be corrupt - what should I do?

If the history file can be recovered at all, the biweekly.atrim script will do it, so run that first. Its "didump|diload" combination rebuilds the history file by scanning through the old dhistory sequentially and extracting all records into a new dhistory. This shouldn't take too long, unless you have a very large history. You can get an idea of the run time by checking the log of the last time it was run (twice a week, if you use the standard crontab).

If that fails, the next step is to rebuild the history file from the contents of the spool. This is done with the command:

diloadfromspool -a

Use the '-f' option if you can afford to have no incoming feeds for the duration of the rebuild.
For better performance, if you have multiple spools, run a separate diloadfromspool for each spool:

diloadfromspool -S 01 &
diloadfromspool -S 02 &
...

Note that the rebuild can take a very long time on large spools as every article has to be read from disk.
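
The per-spool commands above can be wrapped in a loop that also waits for all rebuilds to finish. This is a sketch only: the spool numbers are examples, the function name is made up, and with DRY_RUN=1 (the default here) it just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: start one diloadfromspool per spool and wait for all of them.
# The spool numbers are examples; DRY_RUN=1 (the default here) only
# prints the commands instead of running them.

DRY_RUN=${DRY_RUN:-1}

rebuild_spool() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "diloadfromspool -S $1"
    else
        diloadfromspool -S "$1"
    fi
}

for spool in 01 02; do
    rebuild_spool "$spool" &
done
wait    # returns once every background rebuild has finished
```

Because the rebuilds run in the background, their output may appear in any order.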

7. Examples

7.1. FreeBSD using vinum on large spool servers

To speed up /news on FreeBSD when you have a lot of disks available, it may be a good idea to use a vinum mirror partition.

Split every spool disk into two parts: a relatively small section, about 1 GB or so depending on the number of disks, and a section containing the rest of the disk. Use the 1 GB sections to create a very fast mirrored /news. You can then use the larger sections to create either separate spooldirs or another vinum partition.

For an example vinum file from one of Cor's systems, see http://www.xs4all.nl/~scorpio/diablo/vinum.txt.
The corresponding partitions will look like http://www.xs4all.nl/~scorpio/diablo/df.txt.

Again thanks to Joe Greco for thinking up this setup.

Cor Bosman <cor@xs4all.nl>