### abstract ###
Metric and kernel learning are important in several machine learning applications.
However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points.
In this paper, we study metric learning as a problem of learning a linear transformation of the input data.
We show that for high-dimensional data, a particular framework for learning a linear transformation of the data based on the LogDet divergence can be efficiently kernelized to learn a metric (or equivalently, a kernel function) over an arbitrarily high-dimensional space.
We further demonstrate that a wide class of convex loss functions for learning linear transformations can similarly be kernelized, thereby considerably expanding the potential applications of metric learning.
We demonstrate our learning approach by applying it to large-scale real-world problems in computer vision and text mining.
### introduction ###
One of the basic requirements of many machine learning algorithms (e.g., semi-supervised clustering algorithms, nearest neighbor classification algorithms) is the ability to compare two objects to compute a similarity or distance between them.
In many cases, off-the-shelf distance or similarity functions such as the Euclidean distance or cosine similarity are used; for example, in text retrieval applications, the cosine similarity is a standard function to compare two text documents.
However, such standard distance or similarity functions are not appropriate for all problems.
Recently, there has been significant effort focused on learning how to compare data objects.
One approach has been to learn a distance metric between objects given additional side information such as pairwise similarity and dissimilarity constraints over the data.
One class of distance metrics that has shown excellent generalization properties is the Mahalanobis distance function CITATION.
The Mahalanobis distance can be viewed as a method in which the data is subjected to a linear transformation, and distances in this transformed space are then computed via the standard squared Euclidean distance.
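Concretely (with notation introduced here only for illustration, since the paper's symbols appear above as placeholders), for a positive definite matrix $A$ factored as $A = G^\top G$, the squared Mahalanobis distance between points $x$ and $y$ is
\[
d_A(x, y) = (x - y)^\top A (x - y) = \|Gx - Gy\|_2^2,
\]
so learning $A$ is equivalent to learning the linear transformation $G$ and measuring squared Euclidean distance in the transformed space.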
Despite their simplicity and generalization ability, Mahalanobis distances suffer from two major drawbacks: (1) the number of parameters grows quadratically with the dimensionality of the data, making it difficult to learn distance functions over high-dimensional data, and (2) learning a linear transformation is inadequate for data sets with non-linear decision boundaries.
To address the latter shortcoming, kernel learning algorithms typically attempt to learn a kernel matrix over the data.
Limitations of linear methods can be overcome by employing a non-linear input kernel, which effectively maps the data non-linearly to a high-dimensional feature space.
However, many existing kernel learning methods are still limited in that the learned kernels do not generalize to new points CITATION.
These methods are restricted to learning in the transductive setting, where all the data (labeled and unlabeled) is assumed to be given upfront.
There has been some work on learning kernels that generalize to new points, most notably work on hyperkernels CITATION, but the resulting optimization problems are expensive and cannot be scaled to large or even medium-sized data sets.
In this paper, we explore metric learning with linear transformations over arbitrarily high-dimensional spaces; as we will see, this is equivalent to learning a parameterized kernel function SYMBOL given an input kernel function SYMBOL.
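To make this equivalence concrete (the symbols above appear only as placeholders, so the notation below is introduced purely for illustration): if the input kernel can be written as $k(x, y) = \phi(x)^\top \phi(y)$ for an implicit feature map $\phi$, then learning a linear transformation of the mapped data parameterized by a positive definite matrix $A$ amounts to learning a kernel of the form
\[
\kappa(x, y) = \phi(x)^\top A \, \phi(y).
\]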
In the first part of the paper, we focus on a particular loss function, called the LogDet divergence, for learning the positive definite matrix SYMBOL.
This loss function is advantageous for several reasons: it is defined only over positive definite matrices, which makes the optimization simpler, as we will be able to effectively ignore the positive definiteness constraint on SYMBOL.
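For reference (using notation introduced here, since the learned matrix appears above only as a placeholder), the LogDet divergence between positive definite $d \times d$ matrices $A$ and $A_0$ is
\[
D_{\ell d}(A, A_0) = \operatorname{tr}(A A_0^{-1}) - \log\det(A A_0^{-1}) - d,
\]
which is finite only when $A$ is positive definite; this is what allows the positive definiteness constraint to be handled implicitly.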
The loss function has precedence in optimization CITATION and statistics CITATION.
An important advantage of our method is that the proposed optimization algorithm is scalable to very large data sets of the order of millions of data objects.
But perhaps most importantly, the loss function permits efficient kernelization, allowing the learning of a linear transformation in kernel space.
As a result, unlike transductive kernel learning methods, our method easily handles out-of-sample extensions, i.e., it can be applied to unseen data.
Later in the paper, we extend our result on kernelization of the LogDet formulation to other convex loss functions for learning SYMBOL, and give conditions under which we are able to compute and evaluate the learned kernel functions.
Our result is akin to the representer theorem for reproducing kernel Hilbert spaces, where the optimal parameters can be expressed purely in terms of the training data.
In our case, even though the matrix SYMBOL may be infinite-dimensional, it can be fully represented in terms of the constrained data points, making it possible to compute the learned kernel function value over arbitrary points.
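To illustrate how such a representation permits out-of-sample evaluation, the sketch below (with hypothetical names, a generic base kernel, and an arbitrary coefficient matrix S standing in for whatever the learning algorithm would produce; it is not the paper's exact formulation) evaluates a learned kernel of the illustrative form kappa(x, y) = k(x, y) + sum_{i,j} S_{ij} k(x, x_i) k(x_j, y), which depends on new points only through base-kernel evaluations against the training data.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Generic input kernel k(x, y); any positive definite kernel could be substituted.
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

def learned_kernel(x, y, X_train, S, base_kernel=rbf_kernel):
    # Illustrative learned kernel:
    #   kappa(x, y) = k(x, y) + sum_{i,j} S[i, j] * k(x, x_i) * k(x_j, y)
    # New points enter only through base-kernel evaluations against X_train,
    # so the function extends naturally to unseen data.
    k_x = np.array([base_kernel(x, xi) for xi in X_train])  # k(x, x_i) for each training point
    k_y = np.array([base_kernel(xj, y) for xj in X_train])  # k(x_j, y) for each training point
    return base_kernel(x, y) + k_x @ S @ k_y

# Hypothetical usage: S would come from the learning algorithm; here it is
# an arbitrary symmetric matrix used purely to exercise the function.
X_train = np.random.randn(5, 3)
S = 0.1 * np.eye(5)
x_new, y_new = np.random.randn(3), np.random.randn(3)
print(learned_kernel(x_new, y_new, X_train, S))
```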
Finally, we apply our algorithm to a number of challenging learning problems, including ones from the domains of computer vision and text mining.
Unlike existing techniques, we can learn linear transformation-based distance or kernel functions over these domains, and we show that the resulting functions lead to improvements over state-of-the-art techniques for a variety of problems.
