An Improvement to Anthropometry-Based Head and Torso HRTF Synthesis
Models for Locations Near the Frontal Median Plane: A Thesis
paper (.pdf)
presentation (.pdf)
05.05.2007
- Due to the recent proliferation of portable media devices, headphones
(and earbuds) are becoming the primary means through which people
experience recorded phenomena. In the absence of processing,
headphone listeners typically perceive sounds as coming from inside
of their heads rather than from the surrounding space. Head-related
transfer function (HRTF) based algorithms attempt to rectify this
issue; however, their need to be personalized for every individual
through expensive, and often impractical, methods prevents these
implementations from being effective. Such a need has produced an
extensive body of research focused on linking the perceptually
significant features of HRTFs to anthropometry. The ultimate goal of
such work is to synthesize a complete set of personalized HRTFs
strictly from morphological measurements. Recent research has
produced an anthropometry-based head and torso (HAT) model that
accurately approximates the effects that those body parts have on an
incident sound. These HAT-based synthesis models produce very
convincing lateral localization effects, and a weak sense of
elevation far away from the median plane, but they lack the primary
elevation cues that are caused by the external ear (pinna). The work
presented herein extends an existing HAT model with pinna-based
elevation cues that are most effective near the median plane, an area
where the HAT’s torso-based elevation cues are particularly poor. The
aforementioned cues are created by modeling the known resonances and
the primary reflections of the external ear using digital filters
whose parameters are determined from an individual’s anthropometry.
The eventual result of cascading an existing HAT model with the
introduced pinna model is the creation of customized HRTFs.
Objective results are provided and indicate that the proposed
synthesis method approximates the frequency response of measured
HRTFs better than a simple HAT model. Psychoacoustic validation
reveals that the model is effective at creating an accurate sense of
elevation near the median plane for 67% of the subjects tested. This
supports the hypothesis in certain cases and leaves room for future
improvements.
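The idea of modeling a pinna reflection with a digital filter can be illustrated with a minimal sketch. It assumes a single reflection represented as a feed-forward comb filter; the delay and gain values are hypothetical placeholders, not parameters taken from the thesis, and the mapping from anthropometry to these values is not shown.

```python
def add_reflection(x, delay_samples, gain):
    """Feed-forward comb filter: y[n] = x[n] + gain * x[n - delay_samples].

    This is one common way to model a single reflection; it is an
    assumption here, not necessarily the structure used in the thesis.
    """
    y = []
    for n, sample in enumerate(x):
        delayed = x[n - delay_samples] if n >= delay_samples else 0.0
        y.append(sample + gain * delayed)
    return y


def first_notch_hz(sample_rate, delay_samples):
    """For a positive gain, the first spectral notch sits at fs / (2 * delay)."""
    return sample_rate / (2.0 * delay_samples)


# Hypothetical values: a 3-sample delay and a reflection gain of 0.5.
impulse = [1.0] + [0.0] * 7
response = add_reflection(impulse, delay_samples=3, gain=0.5)
notch = first_notch_hz(44100, 3)
```

Changing the delay moves the notch frequency, which is the kind of spectral feature that elevation cues are commonly associated with.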
Image Distortion Correction
link
10.12.2006
- The images
captured from low-end digital cameras are often geometrically
inaccurate due to the low quality of their lenses. This distortion
can take the form of either the barrel effect or the pincushion effect. The
barrel effect gives the image an inflated appearance: the image looks
as if it is on the surface of a blown-up balloon.
Vertical lines that are supposed to be straight have an
outward curvature to them. An image suffers from pincushion
distortion if the opposite is true: vertical lines have an inward
curvature to them. Both of these types of distortion, while they do
affect the entire image to some extent, are more apparent at the
extreme edges of the image. This paper examines barrel distortion,
in particular, by devising and implementing an algorithm in MATLAB
to rectify it.
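As an illustration of how such a correction can work, the sketch below uses a single-coefficient radial model, r_d = r_u (1 + k r_u^2), which is a common textbook choice and an assumption here; the paper's own algorithm and its MATLAB implementation are not reproduced. The function names and the coefficient k are hypothetical.

```python
def distorted_position(x_u, y_u, k):
    """Map an undistorted point to where it appears in the distorted image.

    Coordinates are normalized with (0, 0) at the image center.
    k < 0 models barrel distortion, k > 0 models pincushion distortion.
    """
    r2 = x_u * x_u + y_u * y_u
    scale = 1.0 + k * r2
    return x_u * scale, y_u * scale


def correct_image(image, k):
    """Nearest-neighbor correction of a grayscale image given as a list of rows."""
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    norm = max(cx, cy) or 1.0
    out = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # For each corrected pixel, look up its source in the distorted image.
            x_d, y_d = distorted_position((col - cx) / norm, (row - cy) / norm, k)
            src_col = int(round(x_d * norm + cx))
            src_row = int(round(y_d * norm + cy))
            if 0 <= src_row < height and 0 <= src_col < width:
                out[row][col] = image[src_row][src_col]
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unchanged = correct_image(image, 0.0)   # k = 0 leaves the image as it is
edge = distorted_position(1.0, 0.0, -0.1)
```

Working backward from each output pixel to its source avoids holes in the corrected image; a practical version would interpolate between neighboring pixels rather than round to the nearest one.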
An Analysis of Interpolated Finite Impulse Response Filters and
Their Improvements
link
10.12.2006
- This paper
offers a brief overview of Interpolated Finite Impulse Response (IFIR)
filters, followed by a comprehensive, analytical literature survey
of the improvements made to the original design to
reduce their computational complexity. The enhancements that are
theoretically examined and compared are: stretching
factor (L) optimization, arithmetic operation reduction, mipizing,
alternate interpolator designs and coefficient re-quantization.
Design examples accompany the explanations and, where applicable,
offer comparisons between cases.
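The basic IFIR structure can be sketched as follows: a model filter is stretched by the factor L (zeros inserted between its taps) and cascaded with an interpolator that suppresses the resulting spectral images. The coefficient values below are hypothetical toy numbers chosen for easy arithmetic, not a design from the paper.

```python
def stretch(coeffs, L):
    """Insert L - 1 zeros between taps, turning G(z) into G(z^L)."""
    out = []
    for i, c in enumerate(coeffs):
        out.append(c)
        if i < len(coeffs) - 1:
            out.extend([0.0] * (L - 1))
    return out


def convolve(a, b):
    """Direct-form convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


L = 2
model = [0.25, 0.5, 0.25]    # hypothetical model filter G(z)
interp = [0.5, 1.0, 0.5]     # hypothetical interpolator I(z) for L = 2
stretched = stretch(model, L)
overall = convolve(stretched, interp)   # equivalent single-stage response
```

In this toy case the cascade needs 3 + 3 nonzero taps, while the equivalent single-stage filter has 7; the savings grow with longer filters, which is the motivation for the structure.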
An Audio Units Plug-in to Simulate Spatial Audio Synthesis
link
10.12.2006
- Using the
fundamentals of head-related transfer functions (HRTFs), an Audio
Units plug-in that binauralizes a mono sound for accurate spatial
perception through headphones is presented. The binauralization
is performed in real time at a location specified by the user.
To provide the background
information necessary for understanding the implementation of the
plug-in, a summary of the relevant research conducted by the authors
is presented in the first part of this paper. The next section
explains the motivation for creating such a plug-in. The
subsequent section details the steps of the plug-in’s algorithm, and
the concluding section discusses possibilities for future
improvements and additions to the presented work.
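At its core, binauralizing a mono signal amounts to filtering it with a left-ear and a right-ear impulse response for the chosen location. The sketch below assumes that approach; the impulse responses are hypothetical toy values, and the Audio Units host interface, location lookup, and real-time buffering are not shown.

```python
def fir(signal, taps):
    """Convolve a signal with a finite impulse response."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, t in enumerate(taps):
            out[i + j] += s * t
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Return (left, right) channels for one fixed source location."""
    return fir(mono, hrir_left), fir(mono, hrir_right)


# Hypothetical impulse responses for a source on the listener's left:
# the right ear receives the sound later and quieter.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

left, right = binauralize([1.0, 2.0], hrir_left, hrir_right)
```

Measured impulse responses are far longer than three taps, so a real-time version would typically process audio block by block.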
Speech Production Using Concatenated Tubes
link
09.01.2006
- Voiced speech
sounds are synthesized by approximating them with a finite number of
concatenated tubes and an all-pole model. The second
part of this project uses linear prediction to re-synthesize speech
as it would sound after a low-bandwidth transmission (e.g., a cellular
phone).
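The link between concatenated tubes and an all-pole filter can be sketched under simplifying assumptions: reflection coefficients are computed at each tube junction from the cross-sectional areas, then converted to filter coefficients with the step-up recursion. The area values are hypothetical, boundary conditions at the glottis and lips are ignored, and sign conventions differ between textbooks.

```python
def reflection_coefficients(areas):
    """Junction reflection coefficients r = (A_next - A) / (A_next + A)."""
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]


def step_up(ks):
    """Convert reflection coefficients to A(z) = 1 + a1 z^-1 + a2 z^-2 + ..."""
    a = [1.0]
    for k in ks:
        prev = a + [0.0]
        a = [prev[j] + k * prev[len(prev) - 1 - j] for j in range(len(prev))]
    return a


def all_pole(x, a):
    """Run the all-pole filter 1 / A(z): y[n] = x[n] - sum(a[j] * y[n - j])."""
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y


areas = [1.0, 3.0, 1.0]                  # hypothetical tube areas
ks = reflection_coefficients(areas)
a = step_up(ks)
y = all_pole([1.0, 0.0, 0.0], a)         # first samples of the impulse response
```

Driving the same filter with a periodic pulse train instead of a single impulse is the usual way to approximate a voiced sound.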
Isolated Word Recognition
link
09.01.2006
- A speaker-dependent
word recognition system is developed and implemented using
Dynamic Time Warping, a popular yet rudimentary technique in
speech processing. Results are quantified, and
improvements are then made to the original algorithm. The
results of the enhanced algorithm are also provided.
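A minimal sketch of template matching with Dynamic Time Warping follows. It assumes one-dimensional feature sequences and an absolute-difference local cost for brevity; a practical recognizer would compare frames of spectral features, and the template values here are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW: minimum cumulative cost of aligning sequence a with b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the word whose stored template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


templates = {"yes": [1, 3, 1], "no": [5, 5, 5]}   # hypothetical templates
same = dtw_distance([1, 2, 3], [1, 2, 2, 3])
word = recognize([1, 3, 3, 1], templates)
```

Because the warping path may repeat frames, an utterance spoken more slowly than its template can still align with zero cost, which is what makes the method suitable for speaker-dependent isolated words.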
The Acoustic Features of Speech Sounds
link
09.01.2006
- A study of the
phonemes of the English language. Audio samples of words that
demonstrate each phoneme are available along with time-domain plots
and spectrograms (narrowband and wideband) for both the phoneme and
the word. Brief discussions on each category of phonemes are
also included.