NEXT-GENERATION OF EXPLOIT KIT DETECTION BY BUILDING SIMULATED OBFUSCATORS

Tongbo Luo & Xing Jin
Palo Alto Networks

©2014, Palo Alto Networks. Confidential and Proprietary.

Agenda

§ Background and Motivation
§ Challenges
§ Our Approach and Lessons Learned
§ Results and Observations

Motivation

§ www.jsDarwin.com
§ Similarity-based Detection of Malicious Exploit Kit JavaScript
§ Large Number of Samples Required

Exploit Kit Obfuscator

[Diagram. Labels: Malicious Payload; Obfuscation Engine (Encoder, Unpacker Template, EVAL Template); Obfuscated Page (Encoded Payload, Unpacker, EVAL Trigger); Identify Template From Obfuscated Page; Reconstruct Obfuscator.]

Reproduce Engine Logic: Obfuscator Reverse-Engineering

Obfuscation Engine (variant):

    window['e'+'v'+'a'+'l'][wsq(xyz, fxj)];
    window['ev'+'al'][zxh(abc, sih)];
    window['aevala'.substr(1,4)][zxh(abc, sih)];

EVAL Template (version):

    "window[" + $[eval_ctr] + "][" + $v[0] + "(" + $v[1] + "," + $v[2] + ")" + "]"

Challenges

1. Code Complexity
§ Hundreds of lines of code
§ Randomized variable names

2. Data Complexity
§ Large data set (~20,000 samples over a 2-year period)
§ Mixed versions and variants

Overview of Our Approach

1. JavaScript Normalization
2. Hierarchical Clustering
3. Reproduce Obfuscator

JavaScript Normalization

JavaScript Source Code:

    function azy ( Ag6 ) { return /*fdsj*/Ag6 ;}

Tokenization produces TOKENS:

    Keyword(function) Identifier ( Identifier ) { Keyword(return) Identifier ; }

Generalization produces the Normalized Script.

§ Ignore Superficial Obfuscation (e.g. randomized variable names)
§ Normalized Script ⇔ Structure of the Code
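The normalization step on the slide above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the regex-based tokenizer, the short `JS_KEYWORDS` list, the `normalize` function name, and the placeholder letters (`I` for identifiers, `S` for strings, `N` for numbers) are all assumptions chosen to mirror the slide's example.

```python
import re

# Illustrative keyword subset (assumption); a real tokenizer would use the
# full ECMAScript reserved-word list.
JS_KEYWORDS = {"function", "return", "var", "if", "else", "for", "while", "new", "this", "typeof"}

# One alternative per token class; comments and whitespace are matched so
# they can be discarded.
TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<space>\s+)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)


def normalize(source: str) -> str:
    """Tokenize JavaScript and generalize away superficial obfuscation."""
    out = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue                      # junk comments and spacing carry no structure
        if kind == "string":
            out.append("S")               # string contents are generalized
        elif kind == "number":
            out.append("N")               # numeric literals are generalized
        elif kind == "word":
            out.append(text if text in JS_KEYWORDS else "I")  # keep keywords, hide names
        else:
            out.append(text)              # punctuation encodes the code structure
    return " ".join(out)


print(normalize("function azy ( Ag6 ) { return /*fdsj*/Ag6 ;}"))
# function I ( I ) { return I ; }
```

Two samples that differ only in identifier names and junk comments map to the same normalized string, which is what makes the later similarity comparison meaningful. The sketch does not handle regex literals or template strings.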
Statistics

[Chart comparing: Total Number of Samples We Collected; Total Number of Samples With Unique Normalized Script.]

Example of two samples with the same normalized script:

    function azy ( Ag6 ) { return Ag6 ;}
    function F$x3j ( k2c5x ) { return k2c5x ;}

Angler Exploit Kit: Statistics on Normalized Samples

[Pie chart. Categories: Same Day, 1 Day Apart, 2 Days Apart, 3 Days Apart, 4 Days Apart, 5 Days Apart, 6 Days Apart, 7 Days Apart, 8+ Days Apart. Slice values as listed on the slide (pairing with categories not shown): 1%, 2%, 3%, 4%, 36%, 24%, 5%, 8%, 17%.]

Clustering

§ Goal: Cluster samples based on their obfuscator
§ Observation: Similar structure ⇒ generated by a similar obfuscator
§ Define similarity
  § Similarity score in [0, 1]
  § 1 = identical, 0 = different

Hierarchical vs Flat

§ Flat Model (K-Means)
  § Easy and efficient
  § Drawbacks
    § Requires a predefined K as input
    § K ⇔ number of obfuscator versions
    § Hard to predict, due to lack of knowledge
    § Changes over time

Agglomerative (Bottom-Up) Hierarchical Clustering

[Dendrogram with threshold lines at 0.6, 0.7 and 0.8. Leaf samples are dated 2015-06-16, 2015-05-03, 2015-05-04, 2015-06-07, 2015-05-10 and 2015-05-04.]

Threshold vs K

What is the proper threshold to identify an obfuscator version or variant?

§ [0.4 – 0.5]: Identify Obfuscator Version
§ [0.78 – 0.85]: Identify Obfuscator Variants

Clustering Result

[Diagram. Obfuscator Version level: Nuclear Variant 1, Nuclear Variant 2. Obfuscator Variant level: 1.1, 1.2, 1.3 (under 1) and 2.1, 2.2, 2.3 (under 2).]

Reproduce Obfuscator for Each Cluster

Samples (1.1, 1.2, 1.3):

    window['e'+'v'+'a'+'l'][wsq(xyz, fxj)];
    window['ev'+'al'][zxh(abc, sih)];
    window[l28f + i8d4][zxh(abc, sih)];

Normalized (I = identifier, S = string):

    I['S'+'S'+'S'+'S'][I(I,I)];
    I['S'+'S'][I(I,I)];
    I[I+I][I(I,I)];

§ Threshold 0.80 ⇒ 80% similar within a cluster
§ One template per obfuscator variant
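The threshold-driven agglomerative clustering described in the slides above can be sketched as follows. The deck does not say which similarity metric or linkage rule was used, so this sketch assumes `difflib.SequenceMatcher` over normalized token lists and average linkage; the function names `similarity`, `linkage`, and `cluster` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Score in [0, 1] between two normalized scripts (assumed metric)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def linkage(c1, c2, samples):
    """Average pairwise similarity between two clusters of sample indices."""
    scores = [similarity(samples[i], samples[j]) for i in c1 for j in c2]
    return sum(scores) / len(scores)


def cluster(samples, threshold):
    """Bottom-up clustering: merge the closest pair until none reaches the threshold."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), linkage(clusters[x], clusters[y], samples))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break                          # cutting the dendrogram at this height
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                    # safe because a < b
    return clusters


demo = [
    "I [ S + S ] [ I ( I , I ) ] ;",
    "I [ S + S + S ] [ I ( I , I ) ] ;",
    "function I ( I ) { return I ; }",
    "function I ( I , I ) { return I ; }",
]
print(cluster(demo, 0.8))
# [[0, 1], [2, 3]]
```

No K is supplied anywhere: the number of clusters falls out of the threshold, which is the advantage over K-Means that the slides point to. Lowering the threshold merges more aggressively (version level), raising it splits further (variant level).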
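Once a cluster's template is known, a simulated obfuscator can emit fresh samples to boost the training set. The sketch below is loosely modeled on the EVAL template shown earlier in the deck; how the real engine builds the "eval" string and picks identifier names is not stated, so `split_eval`, `random_identifier`, and `generate_variant` are hypothetical helpers, and the three-letter lowercase names are an assumption.

```python
import random
import string


def random_identifier(rng, length=3):
    """Random lowercase name standing in for engine-generated identifiers."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def split_eval(rng):
    """Build a JavaScript expression that concatenates to 'eval' from random pieces."""
    word = "eval"
    # Choose 0..3 cut points inside the word, then slice between them.
    cuts = sorted(rng.sample(range(1, len(word)), rng.randint(0, len(word) - 1)))
    pieces = [word[i:j] for i, j in zip([0] + cuts, cuts + [len(word)])]
    return "+".join(f"'{p}'" for p in pieces)


def generate_variant(rng):
    """Fill the template slots: one eval constructor and three identifiers."""
    eval_ctr = split_eval(rng)
    v = [random_identifier(rng) for _ in range(3)]
    return f"window[{eval_ctr}][{v[0]}({v[1]}, {v[2]})];"


rng = random.Random(0)   # seeded so that runs are repeatable
samples = [generate_variant(rng) for _ in range(50)]
for line in samples[:3]:
    print(line)
```

Every generated line shares the structure of the template while varying the superficial details, so the generated samples all fall into the same few normalized forms.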
[Diagram label: Obfuscator Engine]

Why This Research

§ Boost Sample Set
§ Improve Detection Rate
§ Better Understanding of Obfuscators

Life Cycle of Nuclear Exploit Kit Obfuscator

"During December 2014, a new version of Nuclear Pack emerged. … new version will completely replace the old version."
Source: websense, http://community.websense.com/blogs/securitylabs/archive/2015/01/15/evolution-of-an-exploit-kit-nuclear-pack.aspx

Life Cycle of Angler Exploit Kit Obfuscator

• Extremely prevalent
• Aggressive tactics for evading detection
• Only a few obfuscator versions / variants

# of Domains Deployed
Version 1: 519
Version 2: 89
Version 3: 753
Version 4: 68
Version 5: 234

Evolution of Variants

Nuclear Obfuscator Version 2

[Diagram: Variants 1 (2015/May), Variants 2 (2015/June), Variants 3 (2015/July), Variants 4 (2015/August), connected by edges with values 0.81, 0.85, 0.75, 0.66, 0.67 and 0.47 as listed on the slide.]

TAKEAWAY

§ A new angle for exploring exploit kits
§ A novel method to boost the sample set and improve the detection rate by reproducing the obfuscator
§ The evolution of obfuscators in the wild

Q & A

https://github.com/irobert-tluo/rebuild_obfuscator