Datasets

We collected 11 datasets for evaluation of the graph matching algorithms. They come from applications in computer vision and bio-imaging. Together these datasets contain 451 problem instances. The following table gives an overview of their characteristics. We modified costs in several datasets to make them amenable to some algorithms, see the supplementary material in our paper. This modification results in a constant shift of the objective value for each feasible assignment, and, therefore, does not influence the quality of the solution.

**Dataset properties.** A “+” indicates that all problem instances of the dataset have the respective property. Meaning of properties: *#inst.*: number of problem instances; *#opt.*: number of known optima; *injective/bijective*: injective/bijective assignment is assumed; *non-pos.*: all costs are non-positive; *0-unary*: datasets with zero unary costs; *|V|* number of elements in V; *|L|/|V|*: ratio of the number of elements in L to the number of elements in V; *density (%)*: percentage of non-zero elements in the cost matrix C; *diag. dens. (%)*: percentage of non-infinite elements on the diagonal of C.
dataset	#inst.	#opt.	bijective	injective	non-pos.	0-unary	\|V\|	\|L\|/\|V\|	density (%)	diagonal dens. (%)
caltech-large	9	1		+	+	+	36-219	0.4-2.3	0.55	18.1
caltech-small	21	12		+	+	+	9-117	0.4-3	0.99	26.8
car	30	30	+		+		19-49	1	2.9	100
flow	6	6					48-126	≈1	0.39	15.8
hotel	105	105	+				30	≈1	12.8	100
house-dense	105	105	+				30	1	12.6	100
house-sparse	105	105	+		+	+	30	1	1.5	100
motor	20	20	+		+		15-52	1	3.8	100
opengm	4	4	+				19-20	1	74.8	100
pairs	16	0					511-565	≈1	0.0019	3.7
worms	30	28					558	≈2.4	0.00038	1.6

Below we give a brief description of each dataset. Along with the standard computer vision datasets with small-sized problems, hotel, house-dense, house-sparse, car, motor and opengm with \(|\mathcal V|\) up to 52, our collection contains the middle-sized problems flow, with \(|\mathcal V|\) up to 126, and the large-scale worms and pairs problems with \(|\mathcal V|\) up to 565.

Wide baseline matching

These datasets hotel, house-dense, house-sparse are based on a series of images of the same synthetic object with manually selected landmarks from different viewing angles based on the work by [Caetano09]. For hotel and house-dense we use the same models as in [Swoboda17] published in [Torresani14]. house-sparse consists of the same image pairs as Salg{house-dense}, but the cost structure is derived following the approach of [Zhang16] that results in significantly sparser problem instances. Graphs with the landmarks as nodes are obtained by Delaunay triangulation. The costs are set to \(c_{is,jl}=-\exp(-(d_{ij}-d_{sl})^2/2500)A^1_{ij}A^2_{sl}\) where \(d_{ij},d_{sl}\) are Euclidean distances between two landmarks and \(A^1\in\{0,1\}^{\mathcal V \times \mathcal V}\), \(A^2\in\{0,1\}^{\mathcal L \times \mathcal L}\) are adjacency matrices of the corresponding graphs. The unary costs are zero.

Keypoint matching

These datasets car and motor contains car and motorbike images from the PASCAL VOC 2007 Challenge [Everingham07] with the features and costs from [Leordeanu12]. We preprocessed the models by removing edges with zero cost, thereby reducing graph density substantially.

Large displacement flow

The flow dataset was introduced by [Alhaija15] for key point matching on scenes with large motion. We use the same keypoints and costs as in [Swoboda17]. You may also like to check the web-page of the project.

OpenGM matching

The opengm dataset is a set of non-rigid point matching problems by [Komodakis08], now part of the OpenGM Benchmark [Kappes15].

Caltech

The caltech dataset was proposed in [Cho10]. The data available at their project page contains the mutual projection error matrix \(D=(d_{is,jl})\), lists of possible assignments, and partial ground truth. We reconstructed the dataset from this data. Unary costs are set to zero. Pairwise costs for pairs of possible assignments are set to \(c_{is,jl} = -\max (50 - d_{is,jl},0)\). We divided the dataset into caltech-small and caltech-large, where all instances with more than 40000 non-zero pairwise costs are considered as large.

Worm atlas matching

The worms dataset has the goal to annotate nuclei of C. elegans, a famous model organism used in developmental biology, by assigning nuclei names from a known atlas of the organism. A detailed description can be found in [Kainmüller14]. We use the models from [Kainmüller17].

Worm-to-worm matching

The pairs dataset directly matches the cell nuclei of individual C. elegans worms to each other. The resulting models are much coarser than those of the worms dataset. We consider the same \(16\) problem instances as [Hutschenreiter21] using the models their project page.

Download Archive of All Datasets

Download: graph_matching_benchmark.7z (349MiB, SHA256: c86ffbdb285c3abe00a20d2622d36c4ed446944b617b3e3acada9f5d9b1f5466)

Please do not link directly to the files as the URL is subject to be changed. We keep this page updated and link to the current location of the data files.

References

[Torresani14]

L. Torresani, V. Kolmogorov, C. Rother. “Graph Matching Hotel/House Models”. [URL]

[Everingham07]

M. Everingham, L. Van Gool, C. Williams, J. Winn, A. Zisserman. “The PASCAL Visual Object Classes Challenge 2007 Results”. 2007. [URL]

[Komodakis08]

N. Komodakis, N. Paragios. “Beyond Loose LP-Relaxations: Optimizing MRFs by Repairing Cycles”. ECCV, 2008.

[Caetano09]

T. Caetano, J. McAuley, L. Cheng, Q. Le, A. Smola. “Learning Graph Matching”. PAMI, 2009.

[Cho10]

M. Cho, L. Jungmin, M. Kyoung. “Reweighted Random Walks for Graph Matching”. ECCV, 2010. [Website]

[Leordeanu12]

M. Leordeanu, R. Sukthankar, M. Hebert. “Unsupervised Learning for Graph Matching.” IJCV, 2012.

[Kainmüller14]

D. Kainmüller, F. Jug, C. Rother, G. Myers. “Active Graph Matching for Automatic Joint Segmentation and Annotation of C. elegans”. MICCAI, 2014.

[Alhaija15]

H. Abu Alhaija, A. Sellent, D. Kondermann, C. Rother. “GraphFlow — 6D Large Displacement Scene Flow via Graph Matching”. GCPR, 2015.

[Kappes15]

J. Kappes, B. Andres and F. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, C. Rother. “A Comparative Study of Modern Inference Techniques for Structured Discrete Energy Minimization Problems”. IJCV, 2015. [Website]

[Zhang16]

Z. Zhang, Q. Shi, J. McAuley, W. Wei, Y. Zhang, A. van den Hengel. “Pairwise Matching Through Max-Weight Bipartite Belief Propagation”. CVPR, 2016.

[Kainmüller17]

D. Kainmüller, F. Jug, C. Rother, G. Myers. “Graph Matching Problems for Annotating C. elegans”. IST Austria, 2017. [URL]

[Swoboda17] (1,2)

P. Swoboda, C. Rother, H. Alhaija, D. Kainmuller, B. Savchynskyy. “A Study of Lagrangean Decompositions and Dual Ascent Solvers for Graph Matching”. CVPR, 2017.

[Hutschenreiter21]

L. Hutschenreiter, S. Haller, L. Feineis, C. Rother, D. Kainmüller, B. Savchynskyy. “Fusion Moves for Graph Matching”. ICCV, 2021. [Website]