Use of the Internet exposes users to a variety of security
threats, including spam, phishing, and malware. A phishing
attack attempts to steal sensitive information such as bank
account numbers or email passwords. Most phishing attacks
rely on a malicious URL that is presented to the user as
if it were a legitimate one. Malware is widely used to
disrupt computer operation, gain access to users' computer
systems, or gather sensitive information, and it is now a
serious threat on the Internet. Detecting malicious URLs is
therefore an essential task in network security
intelligence. In this paper we categorize phishing and
malware URLs using a Support Vector Machine (SVM), a widely
used kernel-based method for binary classification. SVM is
theoretically well founded and has already been applied to
many practical problems. Our method uses a variety of
discriminative features, including textual properties, link
structures, webpage contents, DNS information, and network
traffic. The results show that the proposed method detects
phishing and malware sites well, correctly labeling
approximately 95% of them. It achieves high true positive
and true negative rates, sensitivity, precision, F-measure,
and overall accuracy compared with other approaches. We
conclude that SVM is a robust and efficient method for
classifying websites as normal or phishing.
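The evaluation measures named above all follow from the four
confusion-matrix counts. The following is a minimal sketch of
those formulas. The function name `classification_metrics` and
the counts passed to it are illustrative assumptions, not
results reported in this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute binary-classification metrics from confusion-matrix counts.

    tp/fn: phishing sites labeled phishing/normal.
    tn/fp: normal sites labeled normal/phishing.
    """
    total = tp + tn + fp + fn
    # Sensitivity (true positive rate): share of phishing sites that are caught.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity (true negative rate): share of normal sites left alone.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: share of sites flagged as phishing that really are phishing.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=95, tn=90, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and precision alongside accuracy matters
when phishing and normal sites are not equally represented in
the test set, since accuracy alone can hide a poor detection
rate on the smaller class.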
Rashmi Karnik : Department of Computer Engg., JSPM's Bhivarabai Sawant Institute of Technology & Research
Savitribai Phule University, Pune.
Dr. Gayatri M. Bhandari : Department of Computer Engg., JSPM's Bhivarabai Sawant Institute of Technology & Research
Savitribai Phule University, Pune.
Kernel-based Approach, Malware, Phishing, Support Vector Machine
Detecting malicious URLs is one of the crucial problems
on the Internet. This paper investigates the problem of
website categorization, i.e., normal or phishing. It
presents a supervised machine learning approach in which
SVM is used to categorize phishing and malware sites,
based on a number of features extracted from the URL.
The Support Vector Machine algorithm achieved high
classification accuracy when analyzing the same kinds of
data as rule-based heuristic techniques. Our proposed
method is good at detecting phishing and malware sites,
correctly labeling approximately 95% of them.
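As an illustration of the approach summarized above, the
following minimal sketch extracts a few lexical features from
a URL and trains a linear soft-margin SVM by stochastic
subgradient descent on the regularized hinge loss. It is not
the implementation evaluated in this paper: the feature set,
the helper names (`url_features`, `train_linear_svm`,
`predict`), the hyperparameters, and the example URLs are all
assumptions chosen for brevity, and a linear model stands in
for the kernel-based SVM.

```python
from urllib.parse import urlparse


def url_features(url):
    """Map a URL to a small vector of lexical features (illustrative choice)."""
    host = urlparse(url).hostname or ""
    return [
        len(url) / 100.0,                                 # URL length, scaled
        host.count(".") / 5.0,                            # dots in the host name, scaled
        1.0 if "@" in url else 0.0,                       # '@' can disguise the real host
        1.0 if "-" in host else 0.0,                      # hyphenated host name
        1.0 if host.replace(".", "").isdigit() else 0.0,  # host is a dotted-decimal IP
    ]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Stochastic subgradient descent on lam/2 * ||w||^2 + hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularization term shrinks w on every step.
            w = [wj - eta * lam * wj for wj in w]
            if margin < 1:
                # Example is misclassified or inside the margin: move toward it.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(w, b, x):
    """Return +1 (phishing) or -1 (normal) from the sign of the decision value."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical training URLs; labels are +1 for phishing and -1 for normal.
training_urls = [
    ("http://192.168.10.5/secure/login.php?user=bank", 1),
    ("http://bank.example@secure-update.example.net/verify", 1),
    ("http://login-bank.account-verify.example.org/index.html", 1),
    ("https://www.example.com/", -1),
    ("https://docs.python.org/3/library/urllib.parse.html", -1),
    ("https://en.wikipedia.org/wiki/Phishing", -1),
]
X = [url_features(url) for url, _ in training_urls]
y = [label for _, label in training_urls]

w, b = train_linear_svm(X, y)
label = predict(w, b, url_features("http://10.0.0.7/account/update"))
print("phishing" if label == 1 else "normal")
```

A real system would draw on the richer feature groups listed
in this paper (link structure, page content, DNS information,
network traffic) and would typically tune the regularization
strength and kernel on held-out data.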