Research on Document Summary Generation Using Attribute Information

Recently, the expansion of the Internet has led to a deluge of information on the Web, making it difficult for users to efficiently locate the information they need. To facilitate efficient searching, research into technology that can summarize the general outline of a text document is essential. This is especially true on the Web, where information from bulletin boards, blogs, and other sources is used as consumer-generated media data. Hence, summarization technology that can accurately capture opinions, impressions, and fields of discussion is necessary. However, research efforts thus far have yet to yield satisfactory results. In this paper, we propose a method for generating a summary document using three types of attribute information acquired from the original document: the field, associated terms, and sensibility. By using attribute grammars that combine these three attributes in document generation, we establish a formal and efficient generation technology. Experiments using information from 400 blogs found that, when the field and sensibility attributes are included, the summary accuracy rate, readability, and meaning integrity are 88.7%, 85%, and 86%, respectively. Compared with traditional technologies, each of these three evaluation criteria is 4% higher, demonstrating the effectiveness of this method.

Published In : IJCAT Journal Volume 1, Issue 11

Date of Publication : 31 December 2014

Pages : 557 - 569

Figures : 07

Tables : 13

Publication Link : Research on Document Summary Generation Using Attribute Information

Abdunabi Ubul : received his B. Sc. degree in economics and management information from Xinjiang University, China, in 2004. He received his M. Sc. degree from the Department of Economics, Faculty of Integrated Arts and Sciences, University of Tokushima, Japan, in 2008, and his Ph. D. degree from the Department of Information Science and Intelligent Systems, University of Tokushima, Japan, in 2012. His research interests include information retrieval, natural language processing, and document processing.

Hidekazu Kakei : received his B.Eng. and M.Eng. degrees in architecture from Nagoya University, Japan, in 1988 and 1990, respectively, and his Ph.D. in architecture from Kobe University in 2007. Since 2003 he has been an Assoc. Prof. in the Institute of Socio-Arts and Sciences, Tokushima University, Japan. His research interests include applying ICT to spatial and environmental design. He is a member of the Architectural Institute of Japan and the Institute of Electronics, Information and Communication Engineers.

Jun-ichi Aoe : received his B. Sc. and M. Sc. degrees in electronic engineering from the University of Tokushima, Japan, in 1974 and 1976, respectively, and his Ph. D. degree in communication engineering from the University of Osaka, Japan, in 1980. Since 1976 he has been with the University of Tokushima. He is currently a Professor in the Department of Information Science & Intelligent Systems, Tokushima University, Japan. His research interests include the design of an automatic selection method of key search algorithms based on expert knowledge bases, natural language processing, a shift-search strategy for interleaved LR parsing, a robust method for understanding NL interface commands in an intelligent command interpreter, and trie compaction algorithms for large key sets. He is the editor of the Computer Algorithm Series of the IEEE Computer Society.

Blog Document

Field Association

Attributes Grammar

Sensibility

In this paper, we have presented a method that uses field association words and sensibility to create summary documents from attributes of text information taken from blogs: fields, keywords, and sensibility. To prepare the materials for generating a summary document, we first applied field association words to the data acquired from the blog and determined the blog's field. We then performed sensibility analysis of the emotions of the people who appear in the blog's contents and determined the sensibility. Once all three attributes were prepared, we used the attribute grammar to generate the summary document, thereby establishing a formal and efficient generation technology.
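The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two dictionaries (FIELD_WORDS, SENSIBILITY_WORDS), the function names, and the sentence template are all hypothetical stand-ins. The paper's actual field association word dictionary, sensibility knowledge, and attribute grammar rules are not reproduced on this page. The sketch only assumes that each attribute can be determined by a lexicon lookup and that the summary is composed from per-phrase text attributes, in the spirit of a synthesized attribute in an attribute grammar.

```python
from collections import Counter

# Hypothetical toy lexicons (assumptions for illustration only).
FIELD_WORDS = {
    "pitcher": "baseball", "homerun": "baseball", "inning": "baseball",
    "recipe": "cooking", "oven": "cooking", "simmer": "cooking",
}
SENSIBILITY_WORDS = {
    "great": "joy", "happy": "joy", "thrilled": "joy",
    "awful": "disappointment", "sad": "disappointment",
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (w.strip(".,!?").lower() for w in text.split())
    return [t for t in tokens if t]


def most_common_label(tokens, lexicon, default):
    """Return the label that the lexicon assigns most often, or a default."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return counts.most_common(1)[0][0] if counts else default


def extract_attributes(text, n_terms=2):
    """Determine the three attributes: field, associated terms, sensibility."""
    tokens = tokenize(text)
    field = most_common_label(tokens, FIELD_WORDS, "unknown")
    sensibility = most_common_label(tokens, SENSIBILITY_WORDS, "neutral")
    # Associated terms: the most frequent words that point to the chosen field.
    term_counts = Counter(t for t in tokens if FIELD_WORDS.get(t) == field)
    terms = [t for t, _ in term_counts.most_common(n_terms)]
    return {"field": field, "terms": terms, "sensibility": sensibility}


# Each function plays the role of one grammar symbol and computes its
# "text" attribute from the extracted attribute values.
def field_phrase(attrs):
    return f"This blog post is about {attrs['field']}"


def term_phrase(attrs):
    if not attrs["terms"]:
        return ""
    return " and mentions " + ", ".join(attrs["terms"])


def sensibility_phrase(attrs):
    return f"; the writer expresses {attrs['sensibility']}."


def generate_summary(attrs):
    # Summary.text = FieldPhrase.text + TermPhrase.text + SensibilityPhrase.text
    return field_phrase(attrs) + term_phrase(attrs) + sensibility_phrase(attrs)


if __name__ == "__main__":
    post = "The pitcher was great and the homerun in the last inning made me so happy!"
    print(generate_summary(extract_attributes(post)))
    # prints: This blog post is about baseball and mentions pitcher, homerun; the writer expresses joy.
```

In this sketch the field is chosen by a simple majority vote over matched words. A real system would presumably weight field association words and handle ambiguity, which the toy lexicons do not attempt.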
