- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[ a r t i c l e ]  Security Issues in Perl Scripts
[ a u t h o r ]    Jordan Dimov (fl1pfl0p)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Revision 1.08, June 2000

It is no news that Perl is one of the most widely used languages for writing CGI scripts on the Web, and Perl programs are largely used for various system administration tasks. In applications like these, user interaction is fairly intense (especially in the case of CGI scripts) and so is interaction with the underlying operating system. Holding such a critical intermediate position is no easy task for Perl. Applications that serve these tasks must provide reliable access to security sensitive functions and information, and at the same time ensure that no one is given an entry ticket to data or functionality that was not intended for them. This text evaluates some of the common security weaknesses and abuses of Perl applications and gives an overview of the features that the Perl language provides to prevent this from happening.

[ 0 :: Introduction ]

This paper is not a formal analysis of the features of the Perl language itself. A programming language does not constitute a security risk; it is with the programmer that the risk is introduced. Perl provides a flexible and powerful interface between the programmer and the operating system. It is also a unique way of thinking that makes it easier to solve problems. When used with care, Perl is a powerful tool that "makes easy things easier and hard things possible". When handled carelessly though, it can be as dangerous as any other language. And although Perl makes a great effort to ensure the safety of your programs it was not designed to stop you from writing insecure programs, because that would also stop you from doing other more creative things.
Our chief focus will therefore be on weaknesses of applications programmed in Perl rather than on weaknesses of the language itself. We will look at some of the most widely misused and overlooked features of Perl. We'll see how they can pose threats to the security of the system on which they are running as well as to their users. We'll show how such weaknesses can be exploited and how to fix or avoid them. We will spend some time getting acquainted with the security features that Perl has to offer, particularly Perl's "taint mode", and we'll try to identify some problems that can slip through this tightened security if we are not careful. In studying those aspects of Perl and looking at some characteristic examples, our goal will be to develop an intuition that'll help us recognize security problems in Perl scripts at first glance and avoid making similar mistakes in our programs.

[ 1 :: Basic user input vulnerabilities ]

The majority of security problems in Perl scripts are due to improperly validated (or unvalidated) user input. This is especially dangerous when the input can come from any source connected to the Internet, as is the case with CGI scripts. Indeed, "Vulnerable CGI programs and application extensions installed on web servers" is rated the number two most-critical internet security threat by the SANS institute [http://www.sans.org/topten.htm]. If trusted and used without validation, improper user input to such applications can cause many things to go wrong. The most common and obvious mistake is executing other programs with user provided arguments, without proper validation.
Let's look at some examples:

[ 1.1 :: The system() and exec() functions ]

Perl is famous for its use as a "glue" language - it does an excellent job of calling other programs to do the work for it, carefully coordinating their activities by collecting the output of one program, reformatting it in a particular manner and passing it as input to some other program so everything runs smoothly. As the Perl slogan tells us, there is more than one way to do this.

One way to execute an external program or a system command is by calling the exec() function. When Perl encounters an exec() statement, it looks at the arguments that exec() was invoked with, then replaces the current process with the specified command. Control never returns to the script that called exec().

system() acts very much like exec(). The only major difference is that Perl first forks off a child from the parent process. The child runs the command supplied to system(). The parent process waits until the child is done running, and then proceeds with the rest of the program. We will discuss the system() call in greater detail below, but most of the discussion applies to exec() just as well.

The argument given to system() is a list - the first element on the list is the name of the program to be executed and the rest of the elements are passed on as arguments to this program. The interesting part is when there is only one parameter in the system() call, as in

    system ("ls -l /home/jdimov/*");

When Perl sees a system() call with only one parameter, it first scans it to see if the parameter string contains any shell metacharacters. In our case, it does - the asterisk. So Perl passes the entire string "ls -l /home/jdimov/*" (without the quotes) to the shell to do with it whatever it likes, and then waits until the shell returns. If there are no recognizable shell metacharacters, Perl parses the string for you, splits it into words, and executes the program with execvp().
This is because execvp() is generally more efficient than forking a shell, but it won't process shell metacharacters.

Now suppose we have a CGI form that asks for a username, and shows some file containing statistics for that user. So we're lazy and we use a system() call to invoke 'cat' for that purpose like this:

    system ("cat /usr/stats/$username");

And the $username came from the form:

    $username = param ("username");

The user fills in the form, with username=jdimov for example, then submits it. Perl doesn't find any metacharacters in the string "cat /usr/stats/jdimov" so it calls execvp(), which runs "cat" and then returns to our script.

The problem with our script is that it is too naive. It trusts the user too much. In fact, putting a script like this on the web is about equivalent to giving all hackers, crackers, and script kiddies in the world a shell account on your system. And in case the web server is running as root, anyone with a web browser has virtually total control over your system.

More precisely, the problem is that by using special-meaning characters in the 'username' field on the form, an attacker can execute any command through the shell. A trivial example would be the string "jdimov; cat /etc/passwd". Perl recognizes the semicolon as a metacharacter and passes this to the shell:

    cat /usr/stats/jdimov; cat /etc/passwd

The attacker gets both the dummy stats file and the password file. Another example would be "; rm -rf /*". This one doesn't produce much output... but the bad guys seem to like it for some reason.

We mentioned earlier that system() takes a list of parameters and executes the first element as a command, passing it the rest of the elements as arguments. So we change our script a little so that only the program we want gets executed:

    system ("cat", "/usr/stats/$username");

This way no user input goes through the shell.
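
The difference between the two calling styles can be seen in a small sketch. It is only an illustration under a few assumptions: the hostile value, the helper name run_without_shell, and the use of the harmless "echo" command in place of "cat" are all made up for the demonstration, and the helper relies on the list form of piped open() rather than on system() so that the output can be captured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical hostile form value, standing in for param("username").
my $username = 'jdimov; echo INJECTED';

# Hypothetical helper: run a command given as a list and return what it prints.
# With more than one element after the mode, open() hands the list
# straight to the program and does not start a shell.
sub run_without_shell {
    my @cmd = @_;
    open(my $out, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my $text = do { local $/; <$out> };   # slurp all output
    close $out;
    return defined $text ? $text : '';
}

# Through the shell: the semicolon splits the string into two commands.
my $via_shell = `echo $username`;

# Without the shell: the whole value reaches echo as one literal argument.
my $via_list = run_without_shell('echo', $username);

print "shell: $via_shell";
print "list:  $via_list";
```

In the shell case the second command runs; in the list case the semicolon is just another character in the argument.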
If we try the old "jdimov; shutdown -h NOW" trick, nothing happens, because the file "jdimov; shutdown -h NOW" does not exist.

This is a far safer bet than the one argument version, since we avoid the shell, but it is still potentially dangerous, because the string $username may exploit certain weaknesses of the program that is being executed (in this case "cat"). So nothing stops us from asking "cat" to show us the password file again, by setting $username to the string "../../etc/passwd", which evaluates to "/usr/stats/../../etc/passwd", which in effect is the same as "/etc/passwd". This is known as backward directory traversing. To prevent it, we simply check for the double-dot-slash combination, then scare our attacker away and refuse to serve them:

    if ($username =~ /\.\.\//) {
        print "Boo!!\n";
        exit (1);
    }

We could get more restrictive and only check for '/'. Slashes have no business in usernames anyway. We could also log the event, together with everything we know about the user, and notify the FBI. But that's not always very efficient.

Backward directory traversing is not the only thing that can go wrong when calling external programs. Some applications interpret special character sequences as requests for executing a shell command. Some versions of the Unix "mail" utility for example will execute a shell command following the ~! escape sequence. Thus, user input containing "~!rm -rf *" at the start of a line in a message body may cause trouble under certain circumstances.

As far as security is concerned, everything stated above with regard to the system() function applies to exec() too.

[ 1.2 :: The open() function ]

Perl uses the open() function to (surprise!) open things. In the most common of its many forms, the open() function looks like

    open (FILEHANDLE, "filename");

Used like this, "filename" is opened in read-only mode. If "filename" is prefixed with the '>' sign, it is opened for output. If it is prefixed with '>>' it is opened for appending.
The prefix '<' opens the file for input, but this is the default mode if no prefix is used. Some problems of using unvalidated user input as part of the filename should already be obvious. For example the backward directory traversing trick works just as well here, so it may be a good idea to check for the '../' sequence first (or better yet, look at the "Filtering User Input" section below.) Luckily, open() has no reason to pass anything through the shell in this case, so we're not too worried about metacharacters.

But that's not the end of the open() story. Let's modify our statistics script to use open() instead of "cat". We would have something like:

    open (STATFILE, "/usr/stats/$username");

and then some code to read from the file and show it. The Perl documentation tells us that:

    If the filename begins with '|', the filename is interpreted as a
    command to which output is to be piped, and if the filename ends
    with a '|', the filename is interpreted as a command which pipes
    output to us.

So we know we're in trouble when the user pretends their username is "trojan|". To work around this, we ALWAYS explicitly specify that we want the file open for input by prefixing it with the '<' sign:

    open (STATFILE, "</usr/stats/$username");

[...]

    open (HTML, "/usr/bin/txt2html /usr/stats/$username|");
    print while <HTML>;

We do our homework and we strip away '../' sequences and poison NUL bytes as well as all nasty shell metacharacters, so we're safe. Or are we? Notice the trailing pipe symbol in our string. We are telling Perl to run the txt2html program on our stats file and pipe the output back to us so we can read it through the HTML file handle and display it. The way we have the open() statement set up, it goes through the shell. We want to avoid this as much as possible, because even though we have filtered out the metacharacters that we know of, there is no way to be sure that we haven't missed anything. Besides, it might be the case that we do want to allow asterisks for example in our input string.
The work-around for this is to use a special form of the open() call to skip the shell, like this:

    open (HTML, "-|") || exec ("/usr/bin/txt2html", "/usr/stats/$username");
    print while <HTML>;

When we open a pipe to '-', either for reading ('-|') or for writing ('|-'), Perl forks the current process and returns the PID of the child process to the parent and 0 to the child. The or statement ('||') is used to decide whether we are in the parent or child process. If we're in the parent (the return value of open() is non-zero) we continue with the print() statement. Otherwise, we're the child, so we execute the txt2html program, using the safe version of exec() with more than one argument to avoid passing anything through the shell (see the discussion in section 1.1.) What happens is that the child process prints the output that txt2html produces to STDOUT and then dies quietly (remember exec() never returns), while in the meantime the parent process reads the results through the HTML file handle.

The very same technique can be used for piping output to an external program:

    open (PROGRAM, "|-") || exec ("/usr/bin/progname", "$userinput");
    print PROGRAM "This is piped to /usr/bin/progname";

No shell, no problem. Those forms of open() should always be preferred to a direct piped open() when pipes are needed.

[ 1.3 :: Backticks ]

In Perl, yet another way to read the output of an external program is to enclose the command in backticks. So if we wanted to store the contents of our stats file in the scalar $stats, we could do something like:

    $stats = `cat /usr/stats/$username`;

This does go through the shell. Any script that involves user input inside of a pair of backticks is vulnerable to all the security problems that we talked about earlier, unless very extensive checks and examinations are performed on the user input first. There are a couple of different ways to try to make the shell not interpret possible metacharacters, but the safest thing to do is to not use backticks.
Instead, open a pipe to a forked child and have the child execute the external program, like we did at the end of the previous section with open().

[ 1.4 :: eval() and the /e regex modifier ]

One very sweet feature of Perl is that it gives you a way to interpret Perl code coming from the user without having to know every little thing about parsing and how Perl works. This can be used to implement some powerful functionality in applications, but it should always be used with great caution.

The eval() function can execute a block of Perl code at runtime, returning the value of the last evaluated statement. This way we can have flexible configuration files for instance, with handler functions all written in Perl, and a lot of other things too. Of course, this is all wonderful, but you surely do not want to eval() something if you don't know where it came from. So, unless you absolutely trust the source of code to be passed to eval(), do not do things like eval($userinput). This also applies to the /e modifier in regular expressions, which makes Perl evaluate the replacement part of a substitution as Perl code.

[ 2 :: Other sources of security problems ]

[ 2.1 :: Insecure Environmental Variables $PATH, $IFS, @INC, etc. ]

User input is indeed the chief source of security problems with Perl programs, but there are other factors that should be considered when writing secure Perl code. One commonly exploited weakness of scripts running under the shell or by a web server is insecure environmental variables, most commonly the PATH variable. When you access an external application or utility from within your code by only specifying a relative path to it, you put at risk the security of your whole program and the system that it's running on. Say you have a system() call like this:

    system ("txt2html", "/usr/stats/jdimov");

For this call to work, you assume that the txt2html file is in a directory that is contained somewhere in the PATH variable.
But should it happen that an attacker alters your path to point to some other malicious program with the same name, your system's security is no longer guaranteed.

In order to prevent things like this from happening, every program that needs to be even remotely security conscious should start with something like:

    #!/usr/bin/perl -wT
    require 5.001;
    use strict;
    $ENV{PATH} = join ':' => split (" ", << '__EOPATH__');
        /usr/bin
        /bin
        /maybe/something/else
    __EOPATH__

If the program relies on other environmental variables, they should also be explicitly redefined before being used.

Another dangerous variable (this one is more Perl-specific) is the @INC array variable which is sort of like PATH except it specifies where Perl should look for modules to be included in the program. The problem with @INC is pretty much the same as that of PATH - someone might point your Perl to a module that has the same name and does about the same thing as the module you expect, but it also does something mean and dirty in the background. Therefore, @INC should not be trusted any more than PATH and should be completely redefined before including any external modules.

[ 2.2 :: setuid scripts ]

Normally a Perl program runs with the privileges of the user who executed it. By making a script setuid, its effective user ID can be set to one that has access to resources to which the actual user does not. The passwd program for example uses setuid to temporarily acquire write permission to the system password file, thus allowing users to change their own passwords. Since programs that are executed via a CGI interface run with the privileges of the user who runs the web server (usually this is user 'nobody', who has very limited privileges), CGI programmers are often tempted to use the setuid technique to let their scripts perform tricks that they otherwise couldn't. This can be cool, but it can also be very dangerous.
For one thing, if an attacker finds a way to exploit a weakness in the script, they won't only gain access to the system, but they will also have it with the privileges of the effective UID of that script (often the 'root' UID).

To avoid this, Perl programs should set the effective UID and GID to the real UID and GID of the process before any file manipulations:

    $> = $<;   # set effective user ID to real UID.
    $) = $(;   # set effective group ID to real GID.

and CGI scripts should always run with the lowest possible privilege.

Beware that just being careful in what you do inside your setuid script doesn't always solve the problem. Some operating systems have bugs in the kernel that make setuid scripts inherently insecure [perlsec]. For this, and other reasons, Perl automatically switches to a special security mode (taint mode) when it runs setuid or setgid scripts. See Section 3.3 below for more information on taint mode.

[ 2.3 :: rand() ]

Generating random numbers on deterministic machines is a non-trivial problem. In security-critical applications, random numbers are used heavily for many important tasks ranging from password generation to cryptography. For such purposes, it is vital that the generated numbers are as close to truly random as possible, making it difficult (but never impossible) for an attacker to predict future numbers generated by the algorithm. The Perl rand() function simply calls the corresponding rand(3) function from the standard C library. This routine is far from reliable when security is important. The C rand() function generates a sequence of pseudo-random numbers based on some initial value called the seed. Given the same seed, two different instances of a program utilizing rand() will produce the same random values. In many implementations of C, and in all versions of Perl before 5.004, if a seed is not explicitly specified, it is computed from the current value of the system timer, which is anything but random.
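
The point about seeds is easy to check for yourself. The following is a minimal sketch, not part of any real application: the helper name sequence_for_seed and the seed value 42 are arbitrary choices made for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: seed the generator, then return $count
# pseudo-random integers in the range 0..999.
sub sequence_for_seed {
    my ($seed, $count) = @_;
    srand($seed);
    return map { int(rand(1000)) } 1 .. $count;
}

# The same (arbitrary) seed is used twice.
my @first  = sequence_for_seed(42, 5);
my @second = sequence_for_seed(42, 5);

print "first:  @first\n";
print "second: @second\n";
print "The two sequences are identical.\n" if "@first" eq "@second";
```

Anyone who can guess or narrow down the seed can replay the sequence in exactly the same way.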
Having some information about values produced by rand() at a given point and a sufficient amount of time, any self-respecting cracker can accurately predict the sequence of numbers that rand() will generate next, thus obtaining key knowledge necessary to compromise a system.

The Perl solution to this problem is to avoid rand() when true randomness is important and to use one of several implemented modules that are based on theoretically sound algorithms and that have been tested extensively. Two such modules that can be downloaded from the CPAN archive (http://www.cpan.org/) are Math::Random and Math::TrulyRandom. The first one provides a set of functions for generating random numbers in a large variety of different statistical distributions. The generators used by this module are implemented from high quality published sources, some of which are copyrighted by the ACM and require permission to be used in commercial products. The functions provided by Math::Random are efficient and very portable.

The other module (Math::TrulyRandom) is quite a bit different. It uses discrepancies of the system timer as a source of randomness. Math::TrulyRandom takes a very long time to generate a random number, so it only has a limited use - it's good for generating seeds for use with other, more efficient generators (like the ones in Math::Random).

[ 2.4 :: Race Conditions ]

This (together with buffer overflows) is a favorite exploit of seasoned crackers. What is wrong with the following snippet?

    unless (-e "/tmp/a_temporary_file") {
        open (FH, ">/tmp/a_temporary_file");
    }

At first glance this is a very legitimate piece of code which can hardly cause any harm. We check to see whether the temporary file exists, and if it doesn't we tell Perl to create it and open it for writing. The problem here is that we assume that our -e check is correct at the time we open the file.
Of course, Perl wouldn't lie to us about a file's existence, but unlikely as it might seem, it is entirely possible that the status of our file has changed between the time we check for it and the time we open it for writing.

Suppose that the temporary file does not exist. Suppose also that a knowledgeable attacker, familiar with the workings of our program, executed the following command right after we did our existence check:

    ln -s /etc/an_important_config_file /tmp/a_temporary_file

Now everything we do to the temporary file actually gets done to that important config file of ours. Since we believe that the temp file does not exist (that's what our -e check told us), we go ahead and open it for writing. As a result, our config file gets erased. Not very pleasant. And if the attacker knows what they're doing, this might even be fatal.

Situations like this, where an attacker can race in and change something to cause us trouble between two actions of our program are known as race conditions. Such imperfections in a program are very easy to overlook even by experienced programmers, and are being actively exploited.

There is no easy all-purpose solution to this problem. The right thing to do in our example would be to use sysopen() and specify a write-only mode, without setting the truncate flag:

    unless (-e "/tmp/a_temporary_file") {
        #open (FH, ">/tmp/a_temporary_file");   # bad and dangerous!
        sysopen (FH, "/tmp/a_temporary_file", O_WRONLY|O_CREAT|O_EXCL);   # a little safer
    }

This way even if our filename does get forged, we won't kill the file when we open it for writing: O_CREAT creates the file if it is missing, and O_EXCL makes the call fail if something (including a symbolic link) already exists under that name.

Note: the module Fcntl must be included in order for that sysopen() call to work, because this is where the constants O_RDONLY, O_WRONLY, O_CREAT, etc. are defined.

[ 2.5 :: Buffer Overflows and Perl ]

In general, Perl scripts are not susceptible to buffer overflows because Perl dynamically extends its data structures when needed. Perl keeps track of the size and allocated length of every string.
Before a string is written to, Perl ensures that enough space is available, and allocates more space for that string if necessary.

There are however a few known buffer overflow conditions in some older implementations of Perl. Notably, version 5.003 can be exploited with buffer overflows. All versions of suidperl (a program designed to work around race conditions in setuid scripts for some kernels) built from distributions of Perl earlier than 5.004 are exploitable through buffer overflows (CERT Advisory CA-97.17).

[ 3 :: Preventive Strategies ]

[ 3.1 :: Filtering User Input ]

In the sections on malicious user input earlier we explored the possibility of filtering out unwanted metacharacters and other problematic data. This goes against the general security-expert mindset however. Anyone who is in the know and who is responsible for enforcing security policies is likely to tell you that a policy of type "If it is not allowed, it is forbidden" works better than one of type "If it is not forbidden, it is allowed."

Our policy of restricting a set of dangerous user input is of the second type. The problem with a policy like this is that it's very hard to keep it complete and updated. You may forget to filter out a certain character, or your program may have to switch to a different shell with a different set of metacharacters... Anything can happen.

If we were to enforce a policy of the first type however, things would start to look different (a lot more secure, and even a little easier to implement). Instead of filtering out unwanted metacharacters and other dangerous input, filter in only the input that is legitimate. The following snippet for example will cease to execute a security-critical operation if the user input contains anything except letters, digits, underscores, hyphens, dots, or an @ sign (characters that may be found in a user's e-mail address):

    unless ($useraddress =~ /^([-\@\w.]+)$/) {
        print "You don't exist! Go away!\n";
        exit (1);
    }

[ 3.2 :: Avoiding the Shell ]

As another general rule, when interacting with external programs and data avoid the shell like the plague. Going through the shell can be a great approach, and it's very flexible, but it is very, very insecure. And there isn't really anything that can be done via the shell but not in pure Perl. Always use the multi-argument versions of exec() and system(), the special version of piped open() that we showed in section 1.2, and avoid backticks.

Whenever possible, existing Perl modules should be preferred to existing shell programs. The Comprehensive Perl Archive Network (CPAN - www.cpan.org) is a huge resource of tested functional modules for almost anything that a standard UNIX toolset can do. While it may take a little more work to include a module and call it instead of calling an external program, the modular approach is in general far more secure and often a lot more flexible. Just to illustrate the point, using Net::SMTP instead of exec()'ing "sendmail -T" can save you the trouble of going through the shell and can prevent your users from exploiting known vulnerabilities in the 'sendmail' agent.

[ 3.3 :: Perl Taint Mode ]

So how can Perl stop you from making the mistakes that we discussed? Well, it can't really. And even if it could, the authors of Perl wouldn't do that to you. It would be against Perl's philosophy. You will normally not see Perl saying anything like "I won't let you do this because it is dangerous." If you want to do something, Perl will let you and it is up to you to decide whether it's a good thing to do. But one thing Perl can do for you is give you advice if you ask for it.

Perl has a special security mode called "taint mode" which can be entered by giving Perl the -T command-line option. While in taint mode, Perl carefully monitors all information that comes from outside your program and issues warnings when you attempt to do something potentially dangerous with this information.
The things that Perl monitors in taint mode include user input, environmental variables, and program arguments.

Suppose we have a little script like this one:

    #!/usr/bin/perl -T
    $username = <STDIN>;
    chop $username;
    system ("cat /usr/stats/$username");

Notice the -T option at the end of the first line. Notice also that we are using the one argument version of system(). When we execute this script, Perl enters taint mode and starts running the program. The first thing that taint mode notices is that we haven't explicitly initialized our PATH variable. It issues a message like this and aborts:

    Insecure $ENV{PATH} while running with -T switch at ./catform.pl line 4, chunk 1.

So we go back to our program and we modify it as discussed in section 2.1. Our program now looks somewhat like this:

    #!/usr/bin/perl -T
    use strict;   # use this when possible
    $ENV{PATH} = join ':' => split (" ",<< '__EOPATH__');
        /usr/bin
        /bin
    __EOPATH__
    my $username = <STDIN>;
    chop $username;
    system ("cat /usr/stats/$username");

Taint mode now realizes that the $username variable comes from outside our little world and may be tainted. So when you ask Perl to do something as drastic as that system() call, it gets scared and aborts with another message:

    Insecure dependency in system while running with -T switch at ./catform.pl line 9, chunk 1.

If we go ahead and split the system() call arguments in two parts, as in

    system ("cat", "/usr/stats/$username");

taint mode is happy. But as we've seen earlier, this doesn't necessarily make your script secure - the two-argument version of system() can still be dangerous. Taint mode doesn't know about every single possible vulnerability and although it can help you avoid many weaknesses in your programs, taint mode is not the ultimate solution.
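
One way to combine taint mode with the "filter in" policy from section 3.1 is to pull the legitimate part of the input out with a capturing regular expression, which is the untainting mechanism that [perlsec] describes. The following is only a sketch under stated assumptions: the helper name clean_username, the allowed character set, and the 32-character limit are all hypothetical, and the anchors \A and \z are used instead of ^ and $ so that a trailing newline is rejected as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the username if it consists only of
# letters, digits and underscores (1 to 32 of them), or undef otherwise.
# The returned value comes from the capture group $1.
sub clean_username {
    my ($raw) = @_;
    return undef unless defined $raw;
    return $raw =~ /\A([A-Za-z0-9_]{1,32})\z/ ? $1 : undef;
}

# Try one legitimate value and two of the hostile ones discussed earlier.
for my $candidate ('jdimov', '../../etc/passwd', 'jdimov; cat /etc/passwd') {
    my $name = clean_username($candidate);
    if (defined $name) {
        print "accepted: /usr/stats/$name\n";
    } else {
        print "rejected: $candidate\n";
    }
}
```

Only a value that passes such a check would then be handed to the multi-argument form of system(). Note that the pattern has to be genuinely restrictive for this to mean anything.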
Here is a fairly complete list of functions that Perl considers dangerous while in taint mode:

    system(), exec(), open()      # we discussed those earlier
    glob(), unlink(), mkdir(), chdir(), rmdir(), chown(), chmod()
    umask(), link(), symlink(), kill(), eval(), truncate(), ioctl()
    fcntl(), socket(), socketpair(), bind(), connect(), chroot()
    setpgrp(), setpriority(), syscall()

Those are all functions that either access the filesystem somehow or interact with other processes (except eval() which is obviously too powerful to be secure). Taint mode also considers insecure the use of backticks and the -s switch.

Like that system() call in our example, some things can slip around Perl's taint mode and cause security problems if not used with care. The open() function accepts a tainted filename for read-only opens (when used with the '<' qualifier). A poor validation in a CGI script may give the distant user access to any file to which the HTTP daemon has access.

Taint mode never complains about using sysopen() with user-supplied input. Thus something like

    sysopen (FH, $userinput, O_WRONLY|O_CREAT);

can be disastrous even when taint mode is on.

When using the CGI.pm module, taint mode does not consider a construct like

    $someparameter = param ('level');

insecure, although it certainly implies a possibility of tainted user input.

It is always a good idea to turn on taint mode in security critical applications (see the Perl security man page [perlsec] for more information on taint mode and how to untaint data), but keep in mind that taint mode does overlook some details.

(B. Red: Taints could be bypassed by using simple fake regular expressions like

    $a = ($a =~ /(.*)/igs)[0];

this makes Perl think that the variable got through a real regexp designed to prevent intrusions and malicious input.)

[ 5 :: References ]

[r.f.p] Rain Forest Puppy, "Perl CGI problems", Phrack Magazine, Vol. 9, Issue 55, File 07.

[wwwsec] "The World Wide Web Security FAQ".
Chapter 7 - Safe Scripting in Perl
http://www.w3c.org/Security/Faq/wwwsf5.html

[perlsec] The Perl Security man page.

[cgiperl2] Scott Guelich, et al. "CGI Programming with Perl, 2nd Edition" O'Reilly. July 2000.

[viega] John Viega, "ITS4: It's The Source, Stupid! Source Code Security Scanner"
http://www.rstcorp.com/its4/

[sans] The SANS institute's list of top-ten most-critical internet security threats
http://www.sans.org/topten.htm

Jordan Dimov

Copyright (C) 2000 Phreedom Magazine
www.phreedom.org | phreedom.orbitel.bg
staff@phreedom.org :: mboard.phreedom.org