In these days of email worms, viruses, and ever-increasing spam, some sites want to apply a lot of checking to messages before accepting them.
The content scanning extension (chapter 40) has facilities for passing messages to external virus and spam scanning software. You can also do a certain amount in Exim itself through string expansions and the “condition” condition in the ACL that runs after the SMTP DATA command or the ACL for non-SMTP messages (see chapter 39), but this has its limitations.
To allow for further customization to a site’s own requirements, there is the possibility of linking Exim with a private message scanning function, written in C. If you want to run code that is written in something other than C, you can of course use a little C stub to call it.
The local scan function is run once for every incoming message, at the point when Exim is just about to accept the message. It can therefore be used to control non-SMTP messages from local processes as well as messages arriving via SMTP.
Exim applies a timeout to calls of the local scan function, and there is an option called local_scan_timeout for setting it. The default is 5 minutes. Zero means “no timeout”. Exim also sets up signal handlers for SIGSEGV, SIGILL, SIGFPE, and SIGBUS before calling the local scan function, so that the most common types of crash are caught. If the timeout is exceeded or one of those signals is caught, the incoming message is rejected with a temporary error if it is an SMTP message. For a non-SMTP message, the message is dropped and Exim ends with a non-zero code. The incident is logged on the main and reject logs.
To make use of the local scan function feature, you must tell Exim where your function is before building Exim, by setting LOCAL_SCAN_SOURCE in your Local/Makefile. A recommended place to put it is in the Local directory, so you might set
LOCAL_SCAN_SOURCE=Local/local_scan.c
for example. The function must be called local_scan(). It is called by Exim after it has received a message, when the success return code is about to be sent. This is after all the ACLs have been run. The return code from your function controls whether the message is actually accepted or not. There is a commented template function (that just accepts the message) in the file src/local_scan.c.
If you want to make use of Exim’s run time configuration file to set options for your local_scan() function, you must also set
LOCAL_SCAN_HAS_OPTIONS=yes
in Local/Makefile (see section 41.3 below).
You must include this line near the start of your code:
#include "local_scan.h"
This header file defines a number of variables and other values, and the prototype for the function itself. Exim is coded to use unsigned char values almost exclusively, and one of the things this header defines is a shorthand for unsigned char called uschar.
It also contains the following macro definitions, to simplify casting character strings and pointers to character strings:
#define CS (char *)
#define CCS (const char *)
#define CSS (char **)
#define US (unsigned char *)
#define CUS (const unsigned char *)
#define USS (unsigned char **)
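As a small illustration of how these casting macros are typically used, the sketch below passes a uschar string to a standard library function that expects const char *. The typedef and macros are repeated here only so that the fragment compiles outside Exim; example_length() is a hypothetical helper, not part of Exim.

```c
#include <stddef.h>
#include <string.h>

/* Repeated from the definitions above so the sketch compiles on its own;
   in a real local_scan.c they come from "local_scan.h". */
typedef unsigned char uschar;
#define CCS (const char *)
#define CUS (const unsigned char *)

/* example_length() is a hypothetical helper, not an Exim function. */
static size_t example_length(const uschar *s)
{
  /* strlen() expects const char *, so cast the uschar pointer with CCS. */
  return strlen(CCS s);
}
```

A caller holding a plain string literal can go the other way with CUS, for example example_length(CUS "abc").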
The function prototype for local_scan() is:
extern int local_scan(int fd, uschar **return_text);
The arguments are as follows:
fd is a file descriptor for the file that contains the body of the message (the -D file). The file is open for reading and writing, but updating it is not recommended. Warning: You must not close this file descriptor.
The descriptor is positioned at character 19 of the file, which is the first character of the body itself, because the first 19 characters are the message id followed by -D and a newline. If you rewind the file, you should use the macro SPOOL_DATA_START_OFFSET to reset to the start of the data, just in case this changes in some future version.
The function must return an int value which is one of the following macros:
LOCAL_SCAN_ACCEPT
LOCAL_SCAN_ACCEPT_FREEZE
LOCAL_SCAN_ACCEPT_QUEUE
LOCAL_SCAN_REJECT
Newlines in the returned text are shown as \n in log lines. If no message is given, “Administrative prohibition” is used.
LOCAL_SCAN_TEMPREJECT
LOCAL_SCAN_REJECT_NOLOGHDR
LOCAL_SCAN_TEMPREJECT_NOLOGHDR
If the message is not being received by interactive SMTP, rejections are reported by writing to stderr or by sending an email, as configured by the -oe command line options.
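To make the calling convention concrete, here is a minimal sketch of a local_scan() function that reads the message body from fd and rejects bodies over a size limit. It is an illustration under stated assumptions, not a recommended policy: the typedef and macros are stand-ins so that the fragment compiles outside Exim (in a real build they come from local_scan.h, and the numeric values used here are placeholders), and MAX_BODY_BYTES is a hypothetical site limit.

```c
#include <stddef.h>
#include <unistd.h>

/* Stand-ins so the sketch compiles outside Exim. In a real build, remove
   these and use #include "local_scan.h"; the values are placeholders. */
typedef unsigned char uschar;
#define US (unsigned char *)
#define LOCAL_SCAN_ACCEPT     0
#define LOCAL_SCAN_REJECT     1
#define LOCAL_SCAN_TEMPREJECT 2

#define MAX_BODY_BYTES 1024   /* hypothetical site limit */

int local_scan(int fd, uschar **return_text)
{
  char buf[256];
  size_t total = 0;
  ssize_t n;

  /* fd is already positioned at the first character of the body. */
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    total += (size_t)n;
    if (total > MAX_BODY_BYTES) {
      *return_text = US"message body too large";
      return LOCAL_SCAN_REJECT;
    }
  }

  if (n < 0) {
    *return_text = US"temporary problem reading message";
    return LOCAL_SCAN_TEMPREJECT;
  }

  /* Note: fd is deliberately not closed here. */
  return LOCAL_SCAN_ACCEPT;
}
```

The function leaves return_text untouched when it accepts the message, and it never closes fd, in line with the warning above.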
It is possible to have option settings in the main configuration file that set values in static variables in the local_scan() module. If you want to do this, you must have the line
LOCAL_SCAN_HAS_OPTIONS=yes
in your Local/Makefile when you build Exim. (This line is in OS/Makefile-Default, commented out). Then, in the local_scan() source file, you must define static variables to hold the option values, and a table to define them.
The table must be a vector called local_scan_options, of type optionlist. Each entry is a triplet, consisting of a name, an option type, and a pointer to the variable that holds the value. The entries must appear in alphabetical order. Following local_scan_options you must also define a variable called local_scan_options_count that contains the number of entries in the table. Here is a short example, showing two kinds of option:
static int my_integer_option = 42;
static uschar *my_string_option = US"a default string";

optionlist local_scan_options[] = {
  { "my_integer", opt_int, &my_integer_option },
  { "my_string", opt_stringptr, &my_string_option }
};

int local_scan_options_count = sizeof(local_scan_options)/sizeof(optionlist);
The values of the variables can now be changed from Exim’s runtime configuration file by including a local scan section as in this example:
begin local_scan
my_integer = 99
my_string = some string of text...
The available types of option data are as follows:
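A boolean (true/false) option: the variable is of type BOOL, which will be set to TRUE or FALSE, which are macros that are defined as “1” and “0”, respectively. If you want to detect whether such a variable has been set at all, you can initialize it to TRUE_UNSET. (BOOL variables are integers underneath, so can hold more than two values.)
A fixed-point number: the variable is of type int. The value is stored multiplied by 1000, so, for example, 1.4142 is truncated and stored as 1414.
An integer (opt_int in the example above): the variable is of type int. The value may be specified in any of the integer formats accepted by Exim.
A string (opt_stringptr in the example above): the variable is a pointer to a string (for example, of type uschar *).
A time interval: the variable is of type int. The value that is placed there is a number of seconds.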
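Because fixed-point values are stored multiplied by 1000, code that reads such a variable has to split it back into its parts. The sketch below shows one way to do that for non-negative values; fixed_whole() and fixed_thousandths() are hypothetical helpers, not Exim functions.

```c
/* Hypothetical helpers for a fixed-point option value that was stored
   multiplied by 1000 (so 1.4142 in the configuration is held as 1414).
   They are intended for non-negative values. */
static int fixed_whole(int stored)
{
  return stored / 1000;        /* integer part */
}

static int fixed_thousandths(int stored)
{
  return stored % 1000;        /* fractional part, in thousandths */
}
```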
If the -bP command line option is followed by local_scan, Exim prints out the values of all the local_scan() options.
The header local_scan.h gives you access to a number of C variables. These are the only ones that are guaranteed to be maintained from release to release. Note, however, that you can obtain the value of any Exim variable by calling expand_string(). The exported variables are as follows:
The debug_selector variable is set to zero when no debugging is taking place. Otherwise, it is a bitmap of debugging selectors. Two bits are identified for use in local_scan(); they are defined as macros:
The D_v bit is set when -v was present on the command line. This is a testing option that is not privileged – any caller may set it. All the other selector bits can be set only by admin users.
The D_local_scan bit is provided for use by local_scan(); it is set by the +local_scan debug selector. It is not included in the default set of debugging bits.
Thus, to write to the debugging output only when +local_scan has been selected, you should use code like this:
if ((debug_selector & D_local_scan) != 0) debug_printf("xxx", ...);
If you set recipients_count to zero and then return LOCAL_SCAN_ACCEPT, the message is accepted, but immediately blackholed. To replace the recipients, set recipients_count to zero and then call receive_add_recipient() as often as needed.
The header_line structure contains the members listed below. You can add additional header lines by calling the header_add() function (see below). You can cause header lines to be ignored (deleted) by setting their type to *.
The recipient_item structure contains these members:
The header local_scan.h gives you access to a number of Exim functions. These are the only ones that are guaranteed to be maintained from release to release:
This function creates a child process that runs the command specified by argv. The environment for the process is specified by envp, which can be NULL if no environment variables are to be passed. A new umask is supplied for the process in newumask.
Pipes to the standard input and output of the new process are set up and returned to the caller via the infdptr and outfdptr arguments. The standard error is cloned to the standard output. If there are any file descriptors “in the way” in the new process, they are closed. If the final argument is TRUE, the new process is made into a process group leader.
The function returns the pid of the new process, or -1 if things go wrong.
This function waits for a child process to terminate, or for a timeout (in seconds) to expire. A timeout value of zero means wait as long as it takes. The return value is as follows:
>= 0
The process terminated by a normal exit and the value is the process ending status.
< 0 and > –256
The process was terminated by a signal and the value is the negation of the signal number.
–256
The process timed out.
–257
There was some other error in wait(); errno is still set.
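The return value ranges listed above can be decoded with a few comparisons. The following sketch shows one possible way to do so; the enum and classify_child_status() are hypothetical names introduced for illustration, not part of Exim.

```c
/* Hypothetical classification of the status value described above. */
enum child_outcome {
  CHILD_EXITED,      /* normal exit; detail is the ending status */
  CHILD_SIGNALLED,   /* killed by a signal; detail is the signal number */
  CHILD_TIMED_OUT,   /* the timeout expired (-256) */
  CHILD_WAIT_ERROR   /* some other error in wait() (-257) */
};

static enum child_outcome classify_child_status(int rc, int *detail)
{
  if (rc >= 0)   { *detail = rc;  return CHILD_EXITED; }
  if (rc > -256) { *detail = -rc; return CHILD_SIGNALLED; }
  *detail = 0;
  /* Anything other than -256 at this point is treated as a wait() error. */
  return (rc == -256) ? CHILD_TIMED_OUT : CHILD_WAIT_ERROR;
}
```

In the CHILD_WAIT_ERROR case the caller would consult errno, which the text above says is still set.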
This function provides you with a means of submitting a new message to Exim. (Of course, you can also call /usr/sbin/sendmail yourself if you want, but this packages it all up for you.) The function creates a pipe, forks a subprocess that is running
exim -t -oem -oi -f <>
and returns to you (via the int * argument) a file descriptor for the pipe that is connected to the standard input. The yield of the function is the PID of the subprocess. You can then write a message to the file descriptor, with recipients in To:, Cc:, and/or Bcc: header lines.
When you have finished, call child_close() to wait for the process to finish and to collect its ending status. A timeout value of zero is usually fine in this circumstance. Unless you have made a mistake with the recipient addresses, you should get a return code of zero.
This is Exim’s debugging function, with arguments as for printf(). The output is written to the standard error stream. If no debugging is selected, calls to debug_printf() have no effect. Normally, you should make calls conditional on the local_scan debug selector by coding like this:
if ((debug_selector & D_local_scan) != 0) debug_printf("xxx", ...);
This function adds a new header line at a specified point in the header chain. The header itself is specified as for header_add().
If name is NULL, the new header is added at the end of the chain if after is true, or at the start if after is false. If name is not NULL, the header lines are searched for the first non-deleted header that matches the name. If one is found, the new header is added before it if after is false. If after is true, the new header is added after the found header and any adjacent subsequent ones with the same name (even if marked “deleted”). If no matching non-deleted header is found, the topnot option controls where the header is added. If it is true, addition is at the top; otherwise at the bottom. Thus, to add a header after all the Received: headers, or at the top if there are no Received: headers, you could use
header_add_at_position(TRUE, US"Received", TRUE, ' ', "X-xxx: ...");
Normally, there is always at least one non-deleted Received: header, but there may not be if received_header_text expands to an empty string.
This function tests whether the given header has the given name. It is not just a string comparison, because white space is permitted between the name and the colon. If the notdel argument is true, a false return is forced for all “deleted” headers; otherwise they are not treated specially. For example:
if (header_testname(h, US"X-Spam", 6, TRUE)) ...
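To show why a plain string comparison is not enough, the sketch below implements the kind of test described above on an ordinary C string: the name must match, and only white space may separate it from the colon. This is an illustration, not Exim's implementation; line_has_name() is a hypothetical function, and the case-insensitive comparison is an assumption based on header field names being case-insensitive in Internet mail.

```c
#include <stddef.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Hypothetical stand-in for the name test described above.
   Returns 1 if line starts with name (len characters, compared without
   regard to case), followed by optional spaces or tabs and then a colon. */
static int line_has_name(const char *line, const char *name, size_t len)
{
  if (strncasecmp(line, name, len) != 0) return 0;
  line += len;
  while (*line == ' ' || *line == '\t') line++;
  return *line == ':';
}
```

With this logic, "X-Spam : yes" matches the name X-Spam, whereas "X-Spammy: no" does not, because the character after the six-character name is neither white space nor a colon.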
This function checks for a match in a domain list. Domains are always matched caselessly. The return value is one of the following:
OK      match succeeded
FAIL    match failed
DEFER   match deferred
DEFER is usually caused by some kind of lookup defer, such as the inability to contact a database.
This function checks for a match in a host list. The most common usage is expected to be
lss_match_host(sender_host_name, sender_host_address, ...)
An empty address field matches an empty item in the host list. If the host name is NULL, the name corresponding to $sender_host_address is automatically looked up if a host name is required to match an item in the list. The return values are as for lss_match_domain(), but in addition, lss_match_host() returns ERROR in the case when it had to look up a host name, but the lookup failed.
The argument that selects the log can be LOG_MAIN or LOG_REJECT or LOG_PANIC or the inclusive “or” of any combination of them. It specifies to which log or logs the message is written. The remaining arguments are a format and relevant insertion arguments. The string should not contain any newlines, not even at the end.
This function adds an additional recipient to the message. The first argument is the recipient address. If it is unqualified (has no domain), it is qualified with the qualify_recipient domain. The second argument must always be -1.
This function does not allow you to specify a private errors_to address (as described with the structure of recipient_item above), because it pre-dates the addition of that field to the structure. However, it is easy to add such a value afterwards. For example:
receive_add_recipient(US"monitor@mydom.example", -1);
recipients_list[recipients_count-1].errors_to = US"postmaster@mydom.example";
uschar *rfc2047_decode(uschar *string, BOOL lencheck, uschar *target, int zeroval, int *lenptr, uschar **error)
This function decodes strings that are encoded according to RFC 2047. Typically these are the contents of header lines. First, each “encoded word” is decoded from the Q or B encoding into a byte-string. Then, if provided with the name of a charset encoding, and if the iconv() function is available, an attempt is made to translate the result to the named character set. If this fails, the binary string is returned with an error message.
The first argument is the string to be decoded. If lencheck is TRUE, the maximum MIME word length is enforced. The third argument is the target encoding, or NULL if no translation is wanted.
If a binary zero is encountered in the decoded string, it is replaced by the contents of the zeroval argument. For use with Exim headers, the value must not be 0 because header lines are handled as zero-terminated strings.
The function returns the result of processing the string, zero-terminated; if lenptr is not NULL, the length of the result is set in the variable to which it points. When zeroval is 0, lenptr should not be NULL.
If an error is encountered, the function returns NULL and uses the error argument to return an error message. The variable pointed to by error is set to NULL if there is no error; it may be set non-NULL even when the function returns a non-NULL value if decoding was successful, but there was a problem with translation.
The arguments of this function are like printf(); it writes to the SMTP output stream. You should use this function only when there is an SMTP output stream, that is, when the incoming message is being received via interactive SMTP. This is the case when smtp_input is TRUE and smtp_batched_input is FALSE. If you want to test for an incoming message from another host (as opposed to a local process that used the -bs command line option), you can test the value of sender_host_address, which is non-NULL when a remote host is involved.
If an SMTP TLS connection is established, smtp_printf() uses the TLS output function, so it can be used for all forms of SMTP connection.
Strings that are written by smtp_printf() from within local_scan() must start with an appropriate response code: 550 if you are going to return LOCAL_SCAN_REJECT, 451 if you are going to return LOCAL_SCAN_TEMPREJECT, and 250 otherwise. Because you are writing the initial lines of a multi-line response, the code must be followed by a hyphen to indicate that the line is not the final response line. You must also ensure that the lines you write terminate with CRLF. For example:
smtp_printf("550-this is some extra info\r\n");
return LOCAL_SCAN_REJECT;
Note that you can also create multi-line responses by including newlines in the data returned via the return_text argument. The added value of using smtp_printf() is that, for instance, you could introduce delays between multiple output lines.
The smtp_printf() function does not return any error indication, because it does not automatically flush pending output, and therefore does not test the state of the stream. (In the main code of Exim, flushing and error detection is done when Exim is ready for the next SMTP input command.) If you want to flush the output and check for an error (for example, the dropping of a TCP/IP connection), you can call smtp_fflush(), which has no arguments. It flushes the output stream, and returns a non-zero value if there is an error.
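The formatting rules for these extra lines (response code, hyphen, text, CRLF) can be captured in a small helper that builds the line before it is written. The sketch below is one way to do that; format_continuation_line() is a hypothetical function, not part of Exim, and it only builds the string rather than writing to the SMTP stream.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: builds one non-final line of a multi-line SMTP
   response, i.e. "<code>-<text>\r\n". Returns the number of characters
   written (excluding the terminating zero), or -1 if out is too small. */
static int format_continuation_line(char *out, size_t len, int code,
                                    const char *text)
{
  int n = snprintf(out, len, "%d-%s\r\n", code, text);
  if (n < 0 || (size_t)n >= len) return -1;
  return n;
}
```

The code passed in would be 550, 451, or 250 according to the value that local_scan() is about to return, as described above; the resulting string is what would then be handed to smtp_printf().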
No function is provided for freeing memory, because that is never needed. The dynamic memory that Exim uses when receiving a message is automatically recycled if another message is received by the same process (this applies only to incoming SMTP connections – other input methods can supply only one message at a time). After receiving the last message, a reception process terminates.
Because it is recycled, the normal dynamic memory cannot be used for holding data that must be preserved over a number of incoming messages on the same SMTP connection. However, Exim in fact uses two pools of dynamic memory; the second one is not recycled, and can be used for this purpose.
If you want to allocate memory that remains available for subsequent messages in the same SMTP connection, you should set
store_pool = POOL_PERM
before calling the function that does the allocation. There is no need to restore the value if you do not need to; however, if you do want to revert to the normal pool, you can either restore the previous value of store_pool or set it explicitly to POOL_MAIN.
The pool setting applies to all functions that get dynamic memory, including expand_string(), store_get(), and the string_xxx() functions. There is also a convenience function called store_get_perm() that gets a block of memory from the permanent pool while preserving the value of store_pool.