A framework for high speed lexical classification of malicious URLs
- Authors: Egan, Shaun Peter
- Date: 2014
- Subjects: Internet -- Security measures -- Research , Uniform Resource Identifiers -- Security measures -- Research , Neural networks (Computer science) -- Research , Computer security -- Research , Computer crimes -- Prevention , Phishing
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4696 , http://hdl.handle.net/10962/d1011933 , Internet -- Security measures -- Research , Uniform Resource Identifiers -- Security measures -- Research , Neural networks (Computer science) -- Research , Computer security -- Research , Computer crimes -- Prevention , Phishing
- Description: Phishing attacks employ social engineering to target end-users, with the goal of stealing identifying or sensitive information. This information is used in activities such as identity theft or financial fraud. During a phishing campaign, attackers distribute URLs which; along with false information, point to fraudulent resources in an attempt to deceive users into requesting the resource. These URLs are made obscure through the use of several techniques which make automated detection difficult. Current methods used to detect malicious URLs face multiple problems which attackers use to their advantage. These problems include: the time required to react to new attacks; shifts in trends in URL obfuscation and usability problems caused by the latency incurred by the lookups required by these approaches. A new method of identifying malicious URLs using Artificial Neural Networks (ANNs) has been shown to be effective by several authors. The simple method of classification performed by ANNs result in very high classification speeds with little impact on usability. Samples used for the training, validation and testing of these ANNs are gathered from Phishtank and Open Directory. Words selected from the different sections of the samples are used to create a `Bag-of-Words (BOW)' which is used as a binary input vector indicating the presence of a word for a given sample. Twenty additional features which measure lexical attributes of the sample are used to increase classification accuracy. A framework that is capable of generating these classifiers in an automated fashion is implemented. These classifiers are automatically stored on a remote update distribution service which has been built to supply updates to classifier implementations. An example browser plugin is created and uses ANNs provided by this service. It is both capable of classifying URLs requested by a user in real time and is able to block these requests. The framework is tested in terms of training time and classification accuracy. Classification speed and the effectiveness of compression algorithms on the data required to distribute updates is tested. It is concluded that it is possible to generate these ANNs in a frequent fashion, and in a method that is small enough to distribute easily. It is also shown that classifications are made at high-speed with high-accuracy, resulting in little impact on usability.
- Full Text:
- Date Issued: 2014
DNS traffic based classifiers for the automatic classification of botnet domains
- Authors: Stalmans, Etienne Raymond
- Date: 2014
- Subjects: Denial of service attacks -- Research , Computer security -- Research , Internet -- Security measures -- Research , Malware (Computer software) , Spam (Electronic mail) , Phishing , Command and control systems
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4684 , http://hdl.handle.net/10962/d1007739
- Description: Networks of maliciously compromised computers, known as botnets, consisting of thousands of hosts have emerged as a serious threat to Internet security in recent years. These compromised systems, under the control of an operator are used to steal data, distribute malware and spam, launch phishing attacks and in Distributed Denial-of-Service (DDoS) attacks. The operators of these botnets use Command and Control (C2) servers to communicate with the members of the botnet and send commands. The communications channels between the C2 nodes and endpoints have employed numerous detection avoidance mechanisms to prevent the shutdown of the C2 servers. Two prevalent detection avoidance techniques used by current botnets are algorithmically generated domain names and DNS Fast-Flux. The use of these mechanisms can however be observed and used to create distinct signatures that in turn can be used to detect DNS domains being used for C2 operation. This report details research conducted into the implementation of three classes of classification techniques that exploit these signatures in order to accurately detect botnet traffic. The techniques described make use of the traffic from DNS query responses created when members of a botnet try to contact the C2 servers. Traffic observation and categorisation is passive from the perspective of the communicating nodes. The first set of classifiers explored employ frequency analysis to detect the algorithmically generated domain names used by botnets. These were found to have a high degree of accuracy with a low false positive rate. The characteristics of Fast-Flux domains are used in the second set of classifiers. It is shown that using these characteristics Fast-Flux domains can be accurately identified and differentiated from legitimate domains (such as Content Distribution Networks exhibit similar behaviour). The final set of classifiers use spatial autocorrelation to detect Fast-Flux domains based on the geographic distribution of the botnet C2 servers to which the detected domains resolve. It is shown that botnet C2 servers can be detected solely based on their geographic location. This technique is shown to clearly distinguish between malicious and legitimate domains. The implemented classifiers are lightweight and use existing network traffic to detect botnets and thus do not require major architectural changes to the network. The performance impact of implementing classification of DNS traffic is examined and it is shown that the performance impact is at an acceptable level.
- Full Text:
- Date Issued: 2014