Attack Surface Intelligence of Source Code

ME & VULNEX
Simon Roses Femerling
• Founder & CEO, VULNEX
• www.vulnex.com, @simonroses
• Former Microsoft, PwC, @Stake
• Black Hat, RSA, OWASP, SOURCE, AppSec, DeepSec, TECHNET
VULNEX
• CyberSecurity Startup
• @vulnexsl
• Services & Training
• Products: BinSecSweeper (Binary Analysis)

TALK OBJECTIVES
• GCC & Python, hand in hand
• Transformations: source code to useful data
• Practical code understanding
WORK IN PROGRESS

AGENDA
1. The Need for Attack Surface Intelligence of Source Code
2. GCC Overview
3. GCC-Python-Plugin
4. Source Code Intelligence
5. Tintorera Overview
6. Tintorera Analysis Demos
7. Conclusions
8. Q&A

1. CODE IS GETTING COMPLEX!
Software                 SLOC
Firefox                  14 million
Windows Server 2003      50 million
Debian 7.0               419 million
Mac OS X 10.4            86 million
Linux Kernel 2.6.25      13.5 million
Linux Kernel 3.6         15.9 million

1. DOCUMENTATION

1. TYPICAL CODE REVIEW

1. WHERE TO START?
• File operations
• Networking
• Processes
• Crypto
• Authentication
• ??

1. TOOLS?

2. GCC
• Compiler system that supports various programming languages
• Available on popular UNIX variants
• Supports many major languages: C, C++, Java, Objective-C, etc.
• PLUGINS!!
• FREE

2. GCC INTERNALS
http://www.airs.com/dnovillo/Papers/cgo2007-gcc-internals.pdf

2. GCC TERMINOLOGY
• GENERIC is a common representation shared by all front ends
  – Each parser must emit GENERIC
• GIMPLE is a simplified version of GENERIC
  – Three-address representation
  – Simplified control flow
• RTL (Register Transfer Language): assembler for an abstract machine

2. GCC PASSES
http://gcc-python-plugin.readthedocs.org/en/latest/tables-of-passes.html

3. GCC-PYTHON-PLUGIN
• GCC plugin that embeds Python in GCC
• Your Python script can access GCC passes and perform analysis
• Developed by David Malcolm (Fedora)
http://gcc-python-plugin.readthedocs.org/en/latest/

3. GCC-PYTHON-PLUGIN EXAMPLE

3. GCC-PYTHON-PLUGIN DEMO
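The WHERE TO START? slide lists the API families a reviewer usually checks first (file operations, networking, processes, crypto, authentication), and the GCC-PYTHON-PLUGIN slides describe scripts that see every call the compiler processes. Once such a script has collected, for each function, the names of the functions it calls, bucketing those calls by category gives a first map of the attack surface. The sketch below is a minimal illustration under stated assumptions: the category table `API_CATEGORIES` and the plain `{function: [callees]}` input are hypothetical, and neither is Tintorera's actual tinto_api.json schema nor the gcc-python-plugin API.

```python
from collections import defaultdict

# Hypothetical category table, loosely following the WHERE TO START? slide.
# It is NOT the real contents or schema of Tintorera's data/tinto_api.json.
API_CATEGORIES = {
    "file": {"fopen", "open", "read", "write", "fread", "fwrite", "unlink"},
    "network": {"socket", "bind", "listen", "accept", "connect", "recv", "send"},
    "process": {"fork", "execve", "system", "popen", "kill"},
    "crypto": {"RAND_bytes", "EVP_EncryptInit_ex", "MD5", "SHA256"},
    "auth": {"getpwnam", "crypt", "setuid", "setgid"},
}


def categorize_calls(call_graph, categories=API_CATEGORIES):
    """Group (caller, callee) pairs by API category.

    call_graph maps a function name to the list of function names it calls.
    Callees that appear in no category are ignored.
    """
    # Invert the table once: API name -> category.
    lookup = {api: cat for cat, apis in categories.items() for api in apis}
    hits = defaultdict(set)
    for caller, callees in call_graph.items():
        for callee in callees:
            category = lookup.get(callee)
            if category is not None:
                hits[category].add((caller, callee))
    # Sorted lists give a stable, diff-friendly result.
    return {cat: sorted(pairs) for cat, pairs in hits.items()}


if __name__ == "__main__":
    # Hypothetical call graph of a tiny web server.
    calls = {
        "main": ["parse_args", "socket", "bind", "listen", "accept"],
        "handle_request": ["recv", "fopen", "fread", "send"],
        "spawn_cgi": ["fork", "execve"],
    }
    for category, pairs in sorted(categorize_calls(calls).items()):
        print(category, pairs)
```

Functions that show up under several categories (for example, one that both reads from a socket and opens files) are natural first stops for a review or a fuzzing harness.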
3. GCC-PYTHON-PLUGIN IDEAS
• Write scripts for:
  – malloc/free usage
  – Array boundary checks
  – Code visualizations
  – You name it!

4. CODE UNDERSTANDING
• Which APIs are being used?
• How many functions are there?
• What are the inputs / outputs of each function?
• How do functions relate to each other?
• What do the comments say?
• How complex is the code?

4. CODE METRICS
• Controversial topic, but needed
• Metrics:
  – Function complexity (cyclomatic)
  – Number of:
    • Lines
    • Code
    • Blanks
    • Comments
  – Line length
  – Bugs per line
  – You name it…

4. CODE COMPLEXITY
• Cyclomatic complexity counts the number of linearly independent paths through the source code
• It gives an idea of how complex each function is
• Complexity is the enemy of security!
• Created by Thomas McCabe
http://www.literateprogramming.com/mccabe.pdf

4. CODE COMPLEXITY THRESHOLD
http://www.sei.cmu.edu/reports/97hb001.pdf

4. SOURCE CODE ANALYSIS FLOWGRAPH NOTATION
www.mccabe.com/ppt/SoftwareQualityMetricsToIdentifyRisk.ppt

4. SOURCE CODE VISUALS TOO
[Figure panels: BINARY, SOURCE CODE]

5. TINTORERA – BLUE SHARK
• "Put source code into context"
• Objective: get a feel for the code while compiling!!
• Intelligence of source code:
  – Code visualizations
  – Comments analysis
  – API identification
  – Metrics
  – HTML reports
• C code is transformed into JSON files that can then be queried and analyzed

5. TINTORERA INTERNALS
• Two files:
  – analyzer.py: used while compiling a project
  – do_report_tintorera.py: used after the project has been compiled, to generate the report
• Composed of:
  – Python code
  – JSON data files
  – HTML / CSS / JavaScript

5. TINTORERA STRUCTURE
• Python files
• Folders:
  – data/ : API JSON file
  – templates/ : HTML templates
  – js/ : JavaScript code
  – images/
  – Tintorera_lib/ : Python code

5. TINTORERA INSTALL & USAGE
1. GCC version 4.7 or later
2. Install gcc-python-plugin (see web doc)
3. Set path:
   export LD_LIBRARY_PATH=/gcc-python-plugin/gcc-c-api
4. Add line to Makefile (CC= tag):
   gcc -fplugin=/gcc-python-plugin/python.so -fplugin-arg-python-script=/tintorera/analyzer.py
5. Run make
6. After compiling, run:
   python do_report_tintorera.py -c tinan.cfg

5. TINTORERA CONFIG FILE
• Edit tinan.cfg to suit your needs
• Set parameters such as:
  – Folder in which to save the analysis report
  – Enable / disable analyses:
    • Basic blocks
    • Callgraphs
    • Comments
    • GIMPLE
    • Etc.
  – Cyclomatic thresholds

5. TINTORERA DATA FILES
• Folder: /data
• File: tinto_api.json
• JSON file that defines APIs

5. CODE TRANSFORMATION
SOURCE CODE → JSON FILES → HTML REPORT

5. TRANSFORMED JSON FILES
• 3 files:
  1. tintorera_bb_file.json: code basic blocks
  2. tintorera_meta_info.json: general information, file size, and code & comments not inside functions
  3. tintorera_temp_file.json: function information

5. TINTORERA_BB_FILE.JSON

5. TINTORERA_META_FILE.JSON

5. TINTORERA_TEMP_FILE.JSON

5. TINTORERA SOURCE CODE METRICS
• Current metrics:
  1. Number of:
     1. Lines
     2. Code
     3. Blanks
     4. Comments
     5. Colons
  2. Average line length
  3. Minimum line
  4. Maximum line
  5. Total basic blocks
  6. Total cyclomatic complexity
  7. Average cyclomatic complexity

5. SOURCE CODE COMMENT ANALYSIS

6. DEMO I: LOOP TESTER
[Figure panels: IF, ELSE, WHILE, SWITCH]

6. DEMO II: SENDMAIL CRACKADDR (CVE-2002-1337)
Pure complexity…
[Figure: FUNCTION COMPLEXITY]

6. DEMO III: MONGOOSE WEB SERVER ANALYSIS
• Mongoose is the easiest to use web server on the planet. A web server of choice for Web developers (PHP, Ruby, Python, etc.) and Web designers.

6. DEMO IV: BOA WEB SERVER
Boa, a high performance web server for Unix-alike computers
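The TINTORERA SOURCE CODE METRICS slide reports total and average cyclomatic complexity, and tinan.cfg exposes cyclomatic thresholds. McCabe's measure for a control-flow graph is M = E - N + 2P (E edges, N nodes, P connected components; P = 1 for a single function). The sketch below computes it from per-function basic-block successor lists. It is a minimal illustration, assuming a hypothetical JSON layout (`{"functions": {name: {block_id: [successor ids]}}}`) that is not the actual format of tintorera_bb_file.json; the risk bands are the commonly cited SEI ones (1-10, 11-20, 21-50, above 50), which a configurable threshold would replace.

```python
import json

# Commonly cited SEI risk bands for McCabe complexity: (inclusive upper bound, label).
# A tool with configurable thresholds would read these from its config instead.
RISK_BANDS = [(10, "simple, low risk"), (20, "moderate risk"), (50, "complex, high risk")]


def cyclomatic_complexity(cfg):
    """McCabe's M = E - N + 2P for one function's control-flow graph, with P = 1.

    cfg maps each basic-block id to the list of its successor block ids.
    """
    # JSON object keys are always strings, so compare ids as strings.
    nodes = {str(block) for block in cfg}
    edges = 0
    for succs in cfg.values():
        nodes.update(str(s) for s in succs)
        edges += len(succs)
    return edges - len(nodes) + 2


def risk(m):
    """Map a complexity value to its risk band."""
    for upper, label in RISK_BANDS:
        if m <= upper:
            return label
    return "untestable, very high risk"


def complexity_report(bb_json):
    """Return per-function complexity plus total and average, from a JSON string.

    Assumes the hypothetical layout {"functions": {name: {block_id: [successor ids]}}}.
    """
    functions = json.loads(bb_json)["functions"]
    scores = {name: cyclomatic_complexity(cfg) for name, cfg in functions.items()}
    total = sum(scores.values())
    average = total / len(scores) if scores else 0.0
    return scores, total, average


if __name__ == "__main__":
    # Blocks 0 and 1 mimic GCC's numbering, where ENTRY is block 0 and EXIT is block 1.
    sample = json.dumps({"functions": {
        "straight_line": {"0": [2], "2": [1], "1": []},
        "if_else": {"0": [2], "2": [3, 4], "3": [5], "4": [5], "5": [1], "1": []},
        "while_loop": {"0": [2], "2": [3, 4], "3": [2], "4": [1], "1": []},
    }})
    scores, total, average = complexity_report(sample)
    for name, m in scores.items():
        print(f"{name}: M = {m} ({risk(m)})")
    print(f"total = {total}, average = {average:.2f}")
```

Because M depends only on the edge and node counts, the same function works on any graph where every block is reachable from ENTRY; a single `if` or `while` adds exactly one to the straight-line value of 1.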
6. DEMO V: OBFUSCATED C CODE ANALYSIS, ENDOH4.C
The International Obfuscated C Code Contest - http://www.ioccc.org/
[Figure: O function]

6. DEMO VI: OBFUSCATED C CODE ANALYSIS, MISAKA
[Figure: MAIN]

7. DRAWBACKS
• gcc-python-plugin still needs work and fails often
• So does Tintorera…
• Only C / C++ code

7. CONCLUSIONS
• Tintorera helps analyze C code faster & better
• Practical code understanding for:
  – Saving time
  – Security reviews
  – Fuzzing: what and where to fuzz

7. NEXT STEPS
• Better & more focused analysis (security, etc.)
• Vulnerability detection
• More metrics
• Code diff
• Cooler reports!
• Other languages ¿?

8. Q&A
• Thanks!
• @simonroses / @vulnexsl
• www.vulnex.com