PHISHING WEBSITE DETECTION USING ANT COLONY OPTIMIZATION

    This project is designed based on the paper "A Review of Exposure and Avoidance Techniques for Phishing Attack". Phishing is a form of social engineering in which an attacker, also known as a phisher, attempts to fraudulently obtain legitimate users’ confidential or sensitive credentials by mimicking, in an automated fashion, electronic communications from a trustworthy or public organization. The word “phishing” appeared around 1995, when Internet scammers were using email lures to “fish” for passwords and financial information from the sea of Internet users; “ph” is a common hacker replacement for “f”, which comes from the original form of hacking, “phreaking” on telephone switches during the 1960s. Early phishers copied the code from the AOL website, crafted pages that looked like part of AOL, and sent spoofed emails or instant messages with a link to the fake web page, asking potential victims to reveal their passwords.

    The method detects phishing websites from features available in the URL and page contents, without using search engines such as Google. Our methodology aims to extract as many of the features that exist in the literature as possible, and then to find the robust features that are not affected by concept drift. Because phishers change their tactics from time to time, this answers the question: are there features that give the required accuracy when the training and testing data come from different times?

    After such features are found using Ant Colony Optimization, their performance is examined by applying classifiers, namely an Artificial Neural Network (ANN), a Support Vector Machine (SVM) and the Treefit algorithm, to decide which one gives the best performance.
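    The selection step can be illustrated with a minimal sketch. It is written in Python purely for illustration (the project itself uses Matlab) and is not the project's implementation. The function names, the parameters (n_ants, n_iter, rho), the toy data and the vote-based fitness are all assumptions; in the project the fitness would come from the ANN, SVM or Treefit classifier.

```python
import random

def vote_accuracy(rows, labels, subset):
    """Stand-in fitness (assumption): classify by the sign of the summed selected features."""
    correct = 0
    for row, label in zip(rows, labels):
        score = sum(row[i] for i in subset)
        predicted = 1 if score >= 0 else -1
        correct += predicted == label
    return correct / len(rows)

def aco_select(rows, labels, k, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Pick k feature indices with a basic ant colony search (hypothetical parameters)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    pheromone = [1.0] * n_features          # one pheromone value per feature
    best_subset, best_fit = None, -1.0
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            candidates = list(range(n_features))
            subset = []
            for _ in range(k):
                # choose the next feature with probability proportional to its pheromone
                weights = [pheromone[i] for i in candidates]
                pick = rng.choices(candidates, weights=weights)[0]
                subset.append(pick)
                candidates.remove(pick)
            fit = vote_accuracy(rows, labels, subset)
            tours.append((subset, fit))
            if fit > best_fit:
                best_subset, best_fit = sorted(subset), fit
        # evaporation, then deposit proportional to each ant's fitness
        pheromone = [(1 - rho) * t for t in pheromone]
        for subset, fit in tours:
            for i in subset:
                pheromone[i] += fit
    return best_subset, best_fit

def make_toy_data(n_rows=200, n_features=6, seed=1):
    """Synthetic data (assumption): features 0 and 1 follow the label, the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_rows):
        label = rng.choice([-1, 1])
        row = [label if rng.random() < 0.9 else -label for _ in range(2)]
        row += [rng.choice([-1, 0, 1]) for _ in range(n_features - 2)]
        rows.append(row)
        labels.append(label)
    return rows, labels

rows, labels = make_toy_data()
subset, fit = aco_select(rows, labels, k=2)
print(subset, round(fit, 3))
```

    Swapping vote_accuracy for the validation accuracy of a trained classifier would turn this into a wrapper-style selection, at the cost of one training run per ant.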

    The performance analysis is to be done by software simulation in Matlab, using Accuracy, Sensitivity, Selectivity and all other parameters related to examining the performance.
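    As an illustration of these measures, the sketch below computes them from actual and predicted labels. It is in Python for illustration only, and it assumes that "Selectivity" corresponds to specificity (the true negative rate) and that label 1 marks a phishing site; neither is stated in the project description.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def performance(actual, predicted, positive=1):
    """Return accuracy, sensitivity and specificity as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # sensitivity: share of phishing sites that were caught
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # specificity (assumed meaning of "Selectivity"): share of legitimate sites passed
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Hypothetical labels: 1 = phishing, 0 = legitimate
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(performance(actual, predicted))
```

    With these hypothetical labels there are 3 true positives, 1 false negative, 3 true negatives and 1 false positive, so all three measures evaluate to 0.75.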

Data set collection and Pre-processing

Data sets should be built as shown in the Figure, which shows the whole data set collection and pre-processing process. The phishing websites are collected from the PhishTank website in CSV format.
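A minimal sketch of reading such a CSV export and splitting it by time, so that training and testing data come from different periods, might look as follows. It is in Python for illustration only. The column names (phish_id, url, submission_time), the sample rows and the cutoff date are assumptions made for the example, not details taken from the project.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in a PhishTank-like layout; column names are assumed
SAMPLE_CSV = """phish_id,url,submission_time
1001,http://example.test/login,2015-01-10T08:00:00+00:00
1002,http://192.0.2.7/secure,2015-03-02T12:30:00+00:00
1003,http://example.test/@verify,2015-06-21T09:15:00+00:00
1004,http://pay-example.test/update,2015-09-05T17:45:00+00:00
"""

def load_records(text):
    """Parse the CSV text into a list of {url, submitted} records."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        records.append({
            "url": row["url"],
            "submitted": datetime.fromisoformat(row["submission_time"]),
        })
    return records

def temporal_split(records, cutoff):
    """Train on records submitted before the cutoff, test on the rest."""
    train = [r for r in records if r["submitted"] < cutoff]
    test = [r for r in records if r["submitted"] >= cutoff]
    return train, test

records = load_records(SAMPLE_CSV)
cutoff = datetime.fromisoformat("2015-06-01T00:00:00+00:00")
train, test = temporal_split(records, cutoff)
print(len(train), "training URLs,", len(test), "testing URLs")
```

Splitting by submission time rather than at random is what exposes concept drift: a feature that only works on a random split is likely tracking tactics of one period.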

After the data sets are generated, the required features listed below are extracted.

Features:

1.  having_IP_Address  { 1,0 }
2.  URL_Length   { 1,0,-1 }
3.  Shortining_Service { 0,1 }
4.  having_At_Symbol   { 0,1 }
5.  double_slash_redirecting { 1,0 }
6.  Prefix_Suffix  { -1,0,1 }
7.  having_Sub_Domain  {
8.  SSLfinal_State  { -1,1,0 }
9.  Domain_registeration_length { 0,1,
10. Favicon { 0,1 }
11. port { 0,1 }
12. HTTPS_token { 1,0 }
13. Request_URL  { 1,-1 }
14. URL_of_Anchor { -1,0,1 }
15. Links_in_tags { 1,-1,0 }
16. SFH  { -1,1 }
17. Submitting_to_email { 1,0 }
18. Abnormal_URL { 1,0 }
19. Redirect  { 0,1 }
20. on_mouseover  { 0,1 }
21. RightClick  { 0,1 }
22. popUpWidnow  { 0,1 }
23. Iframe { 0,1 }
24. age_of_domain  { -1,0,1 }
25. DNSRecord   { 1,0 }
26. web_traffic  { -1,0,1 }
27. Page_Rank { -1,0,1 }
28. Google_Index { 0,1 }
29. Links_pointing_to_page { 1,0,-1 }
30. Statistical_report { 1,0 }
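
A few of the URL-based features in the list can be sketched as below, in Python for illustration only. The rules and the value encoding are assumptions: the list gives only the allowed values, so the sketch takes 1 to mean "indicator present", uses assumed length thresholds of 54 and 75 characters for URL_Length, and treats HTTPS_token as "https" appearing inside the host name.

```python
import re
from urllib.parse import urlparse

IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def url_length_flag(length, short=54, long=75):
    """Three-level flag; thresholds and value mapping are assumptions."""
    if length < short:
        return -1
    if length <= long:
        return 0
    return 1

def url_features(url):
    """Compute a handful of the listed features from the URL string alone."""
    host = urlparse(url).hostname or ""
    after_scheme = url.split("://", 1)[-1]
    return {
        # host is a dotted IPv4 address instead of a domain name
        "having_IP_Address": 1 if IPV4.match(host) else 0,
        "URL_Length": url_length_flag(len(url)),
        "having_At_Symbol": 1 if "@" in url else 0,
        # a second "//" after the scheme suggests an embedded redirect
        "double_slash_redirecting": 1 if "//" in after_scheme else 0,
        # "https" used as part of the host name rather than as the scheme
        "HTTPS_token": 1 if "https" in host else 0,
    }

print(url_features("http://192.0.2.7/secure"))
print(url_features("http://https-example.test//redirect@x"))
```

Page-content features such as Iframe or on_mouseover would need the fetched HTML and are outside this sketch.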

You can DOWNLOAD data-set details and reference papers. Contact sales@verilogcourseteam.com for design files.
