This paper surveys the available audio processing and matching
methodologies in order to derive an algorithm that can be applied
in real time to extract audio from audio-visual files and then
search that media file for user-defined audio patterns. With the
exponential rise in multimedia content, the ability to search and
find information contained in these assets has become essential.
We propose to build a tool that enables the user to search across
the spoken content of any audiovisual file chosen locally on their
machine.
Published In : IJCAT Journal Volume 1, Issue 10
Date of Publication : 30 November 2014
Pages : 507 - 511
Figures : 05
Tables : --
Publication Link : Real Time Audio-Based Search in Media Files
Using Machine Learning
Swati Krishnan : Currently pursuing B.E. from the Computer
Science Department of Maharashtra Institute of Technology
College of Engineering, Pune (2014-2015 batch)
Sahil Raina : Currently pursuing B.E. from the Computer Science
Department of Maharashtra Institute of Technology College of
Engineering, Pune (2014-2015 batch)
Neha Aher : Currently pursuing B.E. from the Computer Science
Department of Maharashtra Institute of Technology College of
Engineering, Pune (2014-2015 batch)
Phonetic Search
Audio Indexing
Audio Retrieval
Machine Learning
Audio Acquisition
Fourier Transforms
Mel-frequency cepstral coefficients
Hidden Markov Model
In this study, we have investigated the hitherto unexplored
area of audio matching on a local machine, in real time.
Our method seeks within the media based on spoken input
from the user. For a general user, this study thus helps
move media seeking to its next logical step. In addition,
as its usage increases, the proposed method will use
machine learning to get better at correctly matching the
recorded and the extracted audio, thereby increasing its
accuracy in cases where background noise might hamper
the results.