
Theories of Object Recognition

The object as a whole must be segmented into parts. The shapes of the parts and their interrelations must then be represented in a way that is suitable for indexing a catalogue of visual categories.

Hoffman and Richards showed that objects can naturally be segmented into parts before the shapes of those parts are described. Their transversality principle states what happens when an object is made by joining two parts together: the joint between them forms a discontinuity that is concave, or where the curvature is greatest relative to the surrounding area. The principle can be extended so that the points of greatest curvature themselves divide an image into parts. With that extension, no separate segmentation into parts has to be carried out first.
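The curvature rule can be illustrated on a polygonal outline. Hoffman and Richards state their rule for smooth curves and surfaces, so the sketch below is only a discrete stand-in. It makes three assumptions:

- the contour is closed and runs counter-clockwise;
- the signed turning angle at each vertex serves as the curvature;
- candidate part boundaries are the concave vertices that are at least as concave as their neighbours.

The function names `turning_angles` and `part_boundaries` are illustrative, not from the source.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def turning_angles(contour: List[Point]) -> List[float]:
    """Signed turning angle at each vertex of a closed polygon.

    For a counter-clockwise contour, positive angles are convex
    corners and negative angles are concave ones.
    """
    n = len(contour)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = contour[i - 1], contour[i], contour[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0          # incoming edge
        bx, by = x2 - x1, y2 - y1          # outgoing edge
        cross = ax * by - ay * bx          # sign tells left (convex) or right (concave) turn
        dot = ax * bx + ay * by
        angles.append(math.atan2(cross, dot))
    return angles

def part_boundaries(contour: List[Point]) -> List[int]:
    """Indices of concave vertices that are local minima of the turning angle."""
    a = turning_angles(contour)
    n = len(a)
    return [
        i for i in range(n)
        if a[i] < 0 and a[i] <= a[i - 1] and a[i] <= a[(i + 1) % n]
    ]

# Two blocks joined by a narrow neck, listed counter-clockwise
dumbbell = [(0, 0), (3, 0), (3, 1), (5, 1), (5, 0), (8, 0),
            (8, 4), (5, 4), (5, 3), (3, 3), (3, 4), (0, 4)]
print(part_boundaries(dumbbell))  # [2, 3, 8, 9]
```

The four returned vertices are the concave corners where the neck meets the two blocks. Joining (3, 1) with (3, 3) and (5, 1) with (5, 3) cuts the outline into its three parts.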

One approach to representing shape is widely shared. Instead of representing the shape of the entire object, the problem is split into two: representing the spatial locations of the parts, and representing the shapes of the parts themselves. The technique of "generalized cylinders" plays a key role in theories of shape. A cylinder can be generated by moving a circle along a straight line (the axis) that passes through the circle's center, perpendicular to its plane. This notion generalizes to any two-dimensional cross-section swept along an axis. The cross-section may be allowed to contract or expand as it moves, and the axis may be curved. A further simplification of the technique is possible if we restrict ourselves to describing parts separated by local concavities.
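A generalized cylinder can be sketched as a sweep. A two-dimensional cross-section is placed at sample points along an axis, scaled by a sweeping function, and oriented perpendicular to the axis. This is a minimal illustration under simplifying assumptions:

- the axis lies in the x-z plane, so a curved axis may bend only within that plane;
- its tangent is estimated by finite differences and must not vanish.

The names `generalized_cylinder`, `sweep` and `axis` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]

def generalized_cylinder(
    cross_section: List[Point2],          # outline (u, v) of the 2D cross-section
    sweep: Callable[[float], float],      # scale factor s(t) along the axis
    axis: Callable[[float], Point2],      # planar axis t -> (x, z), t in [0, 1]
    steps: int = 10,
) -> List[List[Point3]]:
    """Return one ring of 3D surface points per sample of the axis."""
    h = 1e-4                              # step for the finite-difference tangent
    rings = []
    for i in range(steps + 1):
        t = i / steps
        x, z = axis(t)
        # Tangent of the axis (one-sided at the ends); assumed non-zero
        x0, z0 = axis(max(t - h, 0.0))
        x1, z1 = axis(min(t + h, 1.0))
        dx, dz = x1 - x0, z1 - z0
        norm = math.hypot(dx, dz)
        # Unit normal in the x-z plane; together with the y direction it spans
        # the plane perpendicular to the axis, where the cross-section is placed
        nx, nz = dz / norm, -dx / norm
        s = sweep(t)
        rings.append([(x + s * u * nx, s * v, z + s * u * nz) for u, v in cross_section])
    return rings

circle = [(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16)) for k in range(16)]
cylinder = generalized_cylinder(circle, lambda t: 1.0, lambda t: (0.0, 2.0 * t))
cone = generalized_cylinder(circle, lambda t: 1.0 - t, lambda t: (0.0, 2.0 * t))
```

A circular cross-section with a constant sweep and a straight axis gives an ordinary cylinder. A linearly shrinking sweep gives a cone, and a curved `axis` function bends the part.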

Further research by Biederman provided support for the cylinder representation. It also found that rapid object identification can be based on four parameters of a part:

- the edges of the cross-section (straight or curved);
- the symmetry of the cross-section (rotational and reflectional, reflectional only, or asymmetric);
- the size of the cross-section along the axis (constant, expanding, or expanding and contracting);
- the curvature of the axis (straight or curved).

This four-parameter scheme generates 2 × 3 × 3 × 2 = 36 different types of parts. If we assume that objects typically have more than one part, combining the different part types yields several million possible objects.
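The count of 36 part types follows directly from the number of values each parameter can take. The sketch below enumerates them. The attribute labels are descriptive names chosen here, not Biederman's notation. The object counts ignore the spatial relations between parts, and including those relations multiplies the numbers further.

```python
from itertools import product

# Possible values of the four parameters (labels chosen for readability)
EDGE = ("straight", "curved")
SYMMETRY = ("rotational+reflectional", "reflectional", "asymmetric")
SIZE = ("constant", "expanding", "expanding+contracting")
AXIS = ("straight", "curved")

def geon_types():
    """Every combination of the four parameters, i.e. every part type."""
    return list(product(EDGE, SYMMETRY, SIZE, AXIS))

def part_sequences(n_parts: int) -> int:
    """Ordered choices of part types for an n-part object, ignoring relations."""
    return len(geon_types()) ** n_parts

print(len(geon_types()))   # 36
print(part_sequences(3))   # 46656
```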

The next step is to describe how parts are arranged into whole objects. This is one of the most difficult problems of high-level vision. One approach is a hierarchical representation, in which some parts are represented relative to, or as components of, other parts. For example, an arm consists of the upper arm, the forearm and the hand, and the arm is in turn part of the body. Some representations in long-term memory (LTM) are called canonical because they consist of a single description that characterizes the object independently of viewpoint.
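A hierarchical representation can be sketched as a tree. Each part stores its position relative to its parent, and the object-centred position of any part is found by composing offsets from the root downwards. This minimal sketch assumes translations only; a fuller model would also store each part's orientation and scale. The `Part` class, the `object_centred_positions` function and the offsets are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Part:
    name: str
    offset: Vec3                                   # position relative to the parent part
    children: List["Part"] = field(default_factory=list)

def object_centred_positions(root: Part, origin: Vec3 = (0.0, 0.0, 0.0)) -> Dict[str, Vec3]:
    """Compose relative offsets down the hierarchy into object-centred positions."""
    pos = tuple(o + d for o, d in zip(origin, root.offset))
    positions = {root.name: pos}
    for child in root.children:
        positions.update(object_centred_positions(child, pos))
    return positions

# The arm is part of the body; upper arm, forearm and hand are parts of the arm
arm = Part("arm", (1, 5, 0), [
    Part("upper arm", (0, -1, 0)),
    Part("forearm", (0, -3, 0)),
    Part("hand", (0, -5, 0)),
])
body = Part("body", (0, 0, 0), [arm])
positions = object_centred_positions(body)
print(positions["hand"])  # (1.0, 0.0, 0.0)
```

Moving the arm only requires changing the arm's own offset, because its sub-parts are stored relative to it.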

Suppose the LTM catalogue is built from hierarchical, cylinder-based representations. The remaining process is then to match an object-centred representation, built from low-level information, against this knowledge base. One problem is that some of the required parameters may be distorted by the viewpoint, or parts of the object may be hidden. In the best case, these two problems only delay retrieval. Mental rotation may also be needed in some cases, when the object is asymmetric or complex [4].
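The matching step can be sketched by describing each part with the four parameters (edges, symmetry, size, axis). Each catalogue entry is then scored by the fraction of its parts found in the observed description, so a hidden part lowers the score instead of ruling the object out. This is a toy scoring rule chosen for illustration, not a model from the literature. The catalogue entries and the names `match_score` and `best_match` are hypothetical, and viewpoint distortion and mental rotation are not modelled.

```python
from collections import Counter
from typing import Dict, List, Tuple

Geon = Tuple[str, str, str, str]   # (edge, symmetry, size, axis)

def match_score(observed: List[Geon], model: List[Geon]) -> float:
    """Fraction of the model's parts that appear among the observed parts."""
    common = Counter(observed) & Counter(model)   # multiset intersection
    return sum(common.values()) / len(model)

def best_match(observed: List[Geon], catalogue: Dict[str, List[Geon]]) -> Tuple[str, float]:
    """Return the highest-scoring catalogue entry and its score."""
    name = max(catalogue, key=lambda k: match_score(observed, catalogue[k]))
    return name, match_score(observed, catalogue[name])

# Hypothetical part descriptions
MUG_BODY = ("curved", "rotational+reflectional", "constant", "straight")
PAIL_BODY = ("curved", "rotational+reflectional", "expanding", "straight")
HANDLE = ("curved", "rotational+reflectional", "constant", "curved")

catalogue = {"mug": [MUG_BODY, HANDLE], "pail": [PAIL_BODY, HANDLE]}
print(best_match([MUG_BODY], catalogue))  # ('mug', 0.5): the handle is hidden
```

If only the handle is visible, both entries score 0.5 and the description is ambiguous. This mirrors how hidden parts can delay retrieval.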



Andre Henriksson
Mon Jan 13 21:05:31 MET 1997