Postfix, Amavis, and ClamAV Extremely Slow

I was recently experiencing longer and longer delays for incoming mail delivered by Postfix to Dovecot. In the end, it turned out to be a simple permission problem: ClamAV was not able to read Amavis' files, and obviously waited for a timeout for every single mail.

In fact, multiple (admittedly stupid) errors contributed more or less to the overall problem.

ClamAV Not In Amavis Group

The biggest issue was caused by clamd not being able to read the mail files created by Amavis. The log output looked like this:

Dec  2 17:36:30 mail amavis[10284]: (10284-01) (!)run_av (ClamAV-clamd) FAILED - unexpected , output="/var/amavis/tmp/amavis-20191202T173630-10284-9LpO_9G2/parts: lstat() failed: Permission denied. ERROR\n"
Dec  2 17:36:30 mail amavis[10284]: (10284-01) (!)ClamAV-clamd av-scanner FAILED: CODE(0x55c9fd7d88c0) unexpected , output="/var/amavis/tmp/amavis-20191202T173630-10284-9LpO_9G2/parts: lstat() failed: Permission denied. ERROR\n" at (eval 85) line 950.

On our server, amavisd-new is running as user amavis and group amavis, while clamd runs as user clamav and group clamav. Putting user clamav into the group amavis fixed the issue completely. Instead of 2 minutes per mail, the virus scan was no accomplished in fractions of a second.

All Primary Virus Scanners Failed

Another error message I had seen a lot of times but ignored out of laziness was this:

Dec  2 17:36:30 mail amavis[10284]: (10284-01) (!)WARN: all primary virus scanners failed, considering backups

It turned out, that I had missed one local modification in the Amavis configuration file /etc/amavid.conf after a post upgrade. It defines an array @av_scanners which contains candidates for all kind of commercial(?) virus scanners. The free ClamAV was commented out.

The error message means that amavisd-new tried out all candidates but all of the failed. Sure, because none of them is installed. The error message went away after uncommenting the ClamAV entry:

@av_scanners = (
...
['ClamAV-clamd',
\&ask_daemon, ["CONTSCAN {}\n", "/var/run/clamav/clamd.sock"],
qr/\bOK$/m, qr/\bFOUND$/m,
qr/^.*?: (?!Infected Archive)(.*) FOUND$/m ],
...
);

You have to make sure that the path to the ClamAV socket file (/var/run/clamav/clamd.sock) is correct. In doubt, consult the configuration variable LocalSocket in /etc/clamd.conf:

...
LocalSocket /var/run/clamav/clamd.sock
...

In fact, I don't know how serious the all primary virus scanners failed issue was but the warning from the log files is now gone.

Too Many Undeliverable Mails

The above issues were probably older but since we are a relatively small organization, delivery times of 2 minutes per mail was nothing that immediately caused a big failure. That changed however, after a cronjob started failing some time ago.

Unfortunately, the user that was running the cronjob was not set up to receive mails, and so postfix queued them for the default of 5 days, regularly retrying delivery. Now all the regular mails were buried among cron error mails and took literally days to be delivered.

Therefore: Make sure that all users that can run cronjobs can also receive mail but setting up an alias or an entry in the Postfix virtual table.

In fact, there are many reasons why mails can sit in the Postfix mail queue until they ultimately bounce. It is a good idea to regularly run mailq and check for abnormally filled output queues to prevent such trouble.


blog comments powered by Disqus