Open Fuse

Modern computer vision projects are commonly built on OpenCV, an open source library that incorporates a wide variety of computer vision algorithms. The library is almost omnipresent: the download tracker on the project website reports more than ten million downloads. This success is no coincidence, as the library provides a solid foundation for projects within the area of computer vision.

Although popular image fusion techniques follow a common scheme, as determined by [1], OpenCV offers no dedicated support for image fusion. Nor has a framework specifically for the implementation of image fusion been initiated yet, despite the similarity that the proposed techniques share.

An extensible framework is therefore proposed that is intended to streamline the current modus operandi within the area of image fusion.

# 1 - Analysis

In order to determine the conditions that the proposed framework needs to meet, its requirements have to be analyzed. This requirements analysis is simplified, however, in order to stay within the scope of this chapter. No stakeholders are identified, which rules out the usual stakeholder reviews. As an alternative, the requirements are derived from proposals within the area of image fusion.

A set of domain requirements can accordingly be specified, based on the overview of image fusion by [2], which describes a generic framework.

REQ 1.1 - Image Alignment:

In order to establish a correspondence between the involved images such that their joint information can be utilized, it is necessary to handle any present dissimilarities by being able to align them accordingly.

REQ 1.2 - Image Overlay:

Some steps within the fusion pipeline such as the partitioning of the captured scene into segments rely on a single input image. It is therefore required to be able to create an intermediate overlay that can be used until the fused output image has been retrieved.

REQ 1.3 - Image Segmentation:

As some image fusion techniques utilize segments in order to achieve a content awareness, it is accordingly necessary that the framework is capable of performing an image segmentation.

REQ 1.4 - Feature Measure:

Since suitable components need to be identified such that these can eventually be merged, feature measures that are based on feature criteria need to be incorporated.

REQ 1.5 - Feature Selection:

While feature measures provide information about the characteristics of a given component, the selection of components should be externalized into a separate step, so that combinations of various feature measure and feature selection strategies can be evaluated.

REQ 1.6 - Image Blending:

In order to merge the selected components such that a single output image is generated, the framework needs to incorporate common image blending techniques.
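The last three domain requirements can be illustrated with a minimal sketch. It assumes aligned one-dimensional images stored as plain arrays, and it uses a toy gradient measure that is not one of the cited feature measures. All function names are hypothetical and do not reflect the API of the framework.

```javascript
// Toy feature measure (an assumption): absolute difference between a
// pixel and its right neighbor in a 1D image.
function measureGradient(image) {
  const last = image.length - 1;
  return image.map((value, i) => Math.abs(image[Math.min(i + 1, last)] - value));
}

// Selection strategy: the first image with the largest measure wins.
function selectMax(maps) {
  return maps.map((map, k) => map.map((_, i) => {
    const column = maps.map((other) => other[i]);
    return column.indexOf(Math.max(...column)) === k ? 1 : 0;
  }));
}

// Selection strategy: measures are normalized to sum to one per pixel.
function selectNormalized(maps) {
  return maps.map((map) => map.map((value, i) => {
    const total = maps.reduce((sum, other) => sum + other[i], 0);
    return total > 0 ? value / total : 1 / maps.length;
  }));
}

// Blending: per-pixel weighted sum of the input images.
function blendSum(images, weights) {
  return images[0].map((_, i) =>
    images.reduce((sum, image, k) => sum + weights[k][i] * image[i], 0));
}

// Measure, selection, and blending stay separate, so any measure can
// be combined with any selection strategy.
function fuse(images, measure, select) {
  return blendSum(images, select(images.map(measure)));
}

const inputs = [
  [0, 4, 4, 4],
  [0, 0, 8, 8],
];
console.log(fuse(inputs, measureGradient, selectMax));        // [0, 0, 4, 4]
console.log(fuse(inputs, measureGradient, selectNormalized)); // [0, 0, 6, 6]
```

Keeping the selection strategy as a separate argument is what allows the same measure to be evaluated with different strategies, as REQ 1.5 asks for.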

Several functional requirements can subsequently be defined in order to specify what the framework should be capable of doing. Due to the scope constraints of this chapter, the functional requirements are not articulated in the form of use cases, as is commonly done.

REQ 2.1 - Color Aware:

Input images as well as certain algorithms might depend on a particular color space or even just a single channel of intensity values, which is why the framework needs to be able to store and convert images appropriately.

REQ 2.2 - Pixel Aware:

One of the first image fusion techniques, proposed by [3], uses pixels as a base unit. It is accordingly necessary to be able to perform an image fusion on a pixel-by-pixel basis.

REQ 2.3 - Segment Aware:

More recent image fusion techniques, such as the one proposed by [4], utilize segments as a base unit. The framework therefore needs to be able to handle such groups of corresponding pixels.

REQ 2.4 - Iterative:

Certain image fusion techniques require an iterative repetition of steps within the framework, such as the one proposed by [5], which utilizes a genetic algorithm.

REQ 2.5 - Interactive:

Some image fusion approaches, such as the one proposed by [6], ask the user to make a particular decision. It is therefore required to be able to incorporate a user feedback loop within the framework.

Finally, non-functional requirements can be specified, which declare the overall qualities of the framework. Contrary to functional requirements, they do not specify what the framework should be capable of doing but instead how well it should do it.

REQ 3.1 - Generic:

The framework should be generic, such that it should be possible to implement existing image fusion algorithms without having to adapt their modus operandi.

REQ 3.2 - Extensible:

It should easily be possible to extend the framework in order to provide additional mechanisms that were not incorporated before.

REQ 3.3 - Efficient:

An implementation that utilizes the framework should not be significantly slower than a custom implementation. The introduced overhead should therefore be minimal.

REQ 3.4 - Intuitive:

It should be intuitive to implement an image fusion algorithm, such that it is easy to get started and utilize the framework.

REQ 3.5 - Collaborative:

A community platform should be incorporated, such that discussions and contributions of any form are centralized.

With these requirements outlined, enough information should be available for the subsequent design and implementation of the framework. Note that the given requirement specification is not entirely verifiable, which is acceptable, however, as no validation phase is intended.

# 2 - Design

The architecture of the framework subsequently needs to be defined through the specification of software artifacts, such that the previously determined requirements are satisfied. Analogous to the requirements analysis, a simplified software design is presented in order to meet the scope restrictions of this chapter. The software design is therefore not broken down into multiple layers, and no formal software design document is created.

The requirements req 3.1 and req 3.2 influence the entire architecture of the framework, as they impose a modular design. The framework is therefore based on a single System instance that holds references to SystemObject instances, which represent modules (see figure 1). A workspace is furthermore maintained, which consists of a set of Image sequences. Every module instance is given its own view of the workspace, such that Image sequences can be passed around conveniently. With reference to the efficiency requirement req 3.3, this is done through references and thus without significant overhead. The Image sequences furthermore do not enforce a particular type, such that the color awareness requirement req 2.1 can be satisfied.

The modular nature of the framework enables a straightforward integration of arbitrary algorithms through additional modules that inherit from the SystemObject class. Each module is given a unique name so that it can be referenced unambiguously. In addition to the previously outlined workspace, each module is furthermore associated with a set of settings. This set of Value instances can be used as a key-value store for arbitrary parameters.
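The described architecture can be sketched roughly as follows. The class names System and SystemObject mirror the text, but the method names, the `Repeat` module, and all bodies are assumptions for illustration; the actual framework is written in C++ and may differ considerably.

```javascript
// Hypothetical sketch of the described architecture; class names mirror
// the text, everything else is an assumption for illustration.
class SystemObject {
  constructor(name, defaults = {}) {
    this.name = name;         // unique module name
    this.defaults = defaults; // key-value store of default settings
  }

  // Subclasses read and write Image sequences through their local view.
  exec(view, settings) {
    throw new Error(`${this.name} does not implement exec`);
  }
}

class System {
  constructor() {
    this.modules = new Map();   // module name -> SystemObject
    this.workspace = new Map(); // shared name -> Image sequence (array)
  }

  register(instance) {
    if (this.modules.has(instance.name)) {
      throw new Error(`duplicate module name: ${instance.name}`);
    }
    this.modules.set(instance.name, instance);
  }

  // Builds a local view whose entries refer to the same arrays as the
  // shared workspace, so that no image data is copied.
  run(name, mapping, settings = {}) {
    const instance = this.modules.get(name);
    const view = {};
    for (const [local, shared] of Object.entries(mapping)) {
      if (!this.workspace.has(shared)) {
        this.workspace.set(shared, []);
      }
      view[local] = this.workspace.get(shared);
    }
    instance.exec(view, { ...instance.defaults, ...settings });
  }
}

// Example module: repeats every input entry `times` times.
class Repeat extends SystemObject {
  exec(view, settings) {
    const items = [...view.matIn];
    view.matOut.length = 0;
    for (const item of items) {
      for (let i = 0; i < settings.times; i += 1) {
        view.matOut.push(item);
      }
    }
  }
}

const system = new System();
system.register(new Repeat('Repeat', { times: 1 }));
system.workspace.set('matLoad', ['A', 'B']);
system.run('Repeat', { matIn: 'matLoad', matOut: 'matCopy' }, { times: 2 });
console.log(system.workspace.get('matCopy')); // ['A', 'A', 'B', 'B']
```

The view only stores references to the shared sequences, which is one way to pass Image sequences between modules without copying them.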

Regarding the domain requirements req 1.1 through req 1.6, various techniques for each domain requirement can be integrated through additional modules. With reference to the intuition requirement req 3.4, an image fusion algorithm can subsequently be implemented by interconnecting a set of modules. A pixel-based image fusion pipeline can, for example, be implemented with an image alignment module, followed by a feature measure module that estimates a weight map, a feature selection module that selects suitable features from it, and an image blending module that merges them (see figure 2). A segment-based image fusion pipeline can likewise be implemented by adding an intermediate overlay module and a segmentation module, such that the feature measure module can utilize these segments (see figure 3). With these two example pipelines, the pixel awareness requirement req 2.2 as well as the segment awareness requirement req 2.3 can be satisfied.
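How a measure can operate on segments rather than on single pixels is sketched below. The example assumes aligned one-dimensional images and a label map that assigns each pixel to a segment, as a segmentation of the intermediate overlay might produce. The function names and the toy deviation measure are hypothetical.

```javascript
// Assumed inputs: aligned 1D images and a label map that assigns each
// pixel to a segment.
function fuseBySegment(images, labels, measure) {
  const maps = images.map(measure);
  // Accumulate the measure of every image per segment.
  const scores = new Map(); // label -> summed measure per image
  labels.forEach((label, i) => {
    if (!scores.has(label)) {
      scores.set(label, images.map(() => 0));
    }
    const sums = scores.get(label);
    maps.forEach((map, k) => { sums[k] += map[i]; });
  });
  // Each segment is copied from the image with the highest score.
  return labels.map((label, i) => {
    const sums = scores.get(label);
    const best = sums.indexOf(Math.max(...sums));
    return images[best][i];
  });
}

// Toy measure (an assumption): absolute deviation from the image mean.
function measureDeviation(image) {
  const mean = image.reduce((sum, value) => sum + value, 0) / image.length;
  return image.map((value) => Math.abs(value - mean));
}

const sources = [
  [0, 10, 5, 5],
  [5, 5, 0, 10],
];
const labels = [0, 0, 1, 1];
console.log(fuseBySegment(sources, labels, measureDeviation)); // [0, 10, 0, 10]
```

Compared with a pixel-based pipeline, the decision is made once per segment, so all pixels of a segment are taken from the same input image.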

What has not yet been described is how the modules are actually interconnected in order to implement an image fusion algorithm. A scripting approach has been chosen for this purpose, such that the flexibility of the framework is maximized. An image fusion algorithm can therefore conveniently be stored in a script file, and the System instance only needs the path to this file. The framework links the available modules to the scripting environment in order to make them accessible. Due to the capabilities of a scripting approach, the framework furthermore satisfies the iteration requirement req 2.4 as well as the interaction requirement req 2.5 without having to be extended.
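Why a scripting approach covers the iteration requirement without further support can be seen in a small sketch. It assumes that a module is exposed to the script as an object with an `exec` method that takes a workspace configuration and a settings object. The `Refine` module and the plain `workspace` object are stand-ins invented for this example.

```javascript
// Stand-ins for the scripting environment (hypothetical): a workspace
// and a module that exposes exec(config, settings).
const workspace = { matBlend: [0] };

const Refine = {
  // Moves every stored value halfway toward the target setting.
  exec(config, settings) {
    const input = workspace[config.matIn];
    workspace[config.matOut] = input.map(
      (value) => value + (settings.target - value) / 2);
  },
};

// Because the pipeline is a script, repetition is an ordinary loop with
// a convergence criterion.
let iterations = 0;
let change = Infinity;
while (change > 0.5 && iterations < 100) {
  const before = workspace.matBlend[0];
  Refine.exec({ matIn: 'matBlend', matOut: 'matBlend' }, { target: 16 });
  change = Math.abs(workspace.matBlend[0] - before);
  iterations += 1;
}

console.log(iterations, workspace.matBlend[0]); // 5 15.5
```

A user feedback loop could be built the same way, with the loop condition depending on a user decision instead of a numeric criterion.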

# 3 - Implementation

Now that the architecture of the framework has been specified, the implementation phase can be described. The discussion is kept succinct in order to meet the scope restrictions of this chapter, and the software testing activities are therefore not described.

The implementation has been done in C++ using Qt, a cross-platform application framework. This additional layer of abstraction supports multiple build targets, such that the proposed framework can be distributed on a wider range of platforms. In order to provide a common ecosystem for researchers in computer vision, OpenCV is furthermore used as a foundation for the development of modules. A default set of modules has been implemented that is based on a selection of common image fusion algorithms (see table 1).

| Module | Reference | OpenCV |
| --- | --- | --- |
| AlignEcc | [7] | ✓ |
| AlignFeature | [8] | ✓/✗ |
| AlignIntensity | [8] | ✗ |
| OverlayMean | N/A | ✗ |
| OverlayMedian | N/A | ✗ |
| SegmentEgbis | [9] | ✗ |
| SegmentMosaic | N/A | ✗ |
| SegmentSeeds | [10] | ✗ |
| SegmentSlic | [11] | ✗ |
| MeasureEol | [12] | ✗ |
| MeasureFswm | [13] | ✗ |
| MeasureGelfand | [14] | ✗ |
| MeasureMertens | [15] | ✗ |
| MeasureSalience | [16] | ✗ |
| MeasureSf | [4] | ✗ |
| MeasureSml | [17] | ✗ |
| MeasureTenengrad | [18] | ✗ |
| SelectMax | N/A | ✗ |
| SelectMedian | N/A | ✗ |
| SelectMin | N/A | ✗ |
| SelectNormalized | N/A | ✗ |
| BlendFeather | [19] | ✓ |
| BlendMultiband | [20] | ✓/✗ |
| BlendMultiedit | N/A | ✗ |
| BlendPoisson | [21] | ✗ |
| BlendSum | N/A | ✗ |
| MiscAbsolute | N/A | ✓ |
| MiscCopy | N/A | ✓ |
| MiscColorspace | N/A | ✓ |
| MiscConvert | N/A | ✓ |
| MiscGaussian | N/A | ✓ |
| MiscLaplacian | N/A | ✓ |
| MiscNormalize | N/A | ✓ |
| PipelineHighdyn | [22] | ✓ |
| PipelineMertens | [15] | ✓/✗ |

Table 1: An overview of the default modules with a reference for supplementary information and an indicator that signals whether the corresponding module is an adapter for an existing OpenCV feature.

With reference to the scripting approach, JavaScript has been selected as the primary scripting language due to its popularity in the TIOBE index. The object literal notation of JavaScript provides a suitable way to specify a workspace configuration as well as an optional collection of settings for each module that is to be executed (see listing 1). The workspace configuration object establishes an association between the Image sequences of the global workspace and the local workspace of the module. The involved sequences are referenced by name and annotated with type information, which is used to convert the Image instances before the module is executed.

FileLoad.exec({ 'matOut': 'matLoad[32F3]' }, { ... });

AlignIntensity.exec({ 'matIn': 'matLoad[32F3]',
'matOut': 'matLoad[32F3]' }, { ... });

MeasureMertens.exec({ 'matIn': 'matLoad[32F3]',
'matOut': 'matMeasure[32F1]' }, { ... });

SelectNormalize.exec({ 'matIn': 'matMeasure[32F1]',
'matOut': 'matSelect[32F1]' }, { ... });

BlendMultiband.exec({ 'matIn': 'matLoad[32F3]',
'matOut': 'matBlend[32F3]',
'matSelect': 'matSelect[32F1]' }, { ... });

FileSave.exec({ 'matIn': 'matBlend[8U3]' }, { ... });

Listing 1: An example script that implements a pixel aware image fusion pipeline in which the settings objects have been omitted.
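How a workspace configuration object such as those in listing 1 could be taken apart is sketched below. The entry format `name[type]` follows the listing, whereas the function name `parseWorkspaceConfig` and the shape of the result are assumptions made for this example.

```javascript
// Splits a workspace configuration object into its parts. The entry
// format 'name[type]' follows the listing; the parser is hypothetical.
function parseWorkspaceConfig(config) {
  return Object.entries(config).map(([local, target]) => {
    const match = /^(\w+)\[(\w+)\]$/.exec(target);
    if (match === null) {
      throw new Error(`malformed workspace entry: ${target}`);
    }
    // local: name inside the module, shared: name in the global
    // workspace, type: requested conversion before execution.
    return { local, shared: match[1], type: match[2] };
  });
}

const entries = parseWorkspaceConfig({
  matIn: 'matLoad[32F3]',
  matOut: 'matMeasure[32F1]',
});
console.log(entries);
// [ { local: 'matIn', shared: 'matLoad', type: '32F3' },
//   { local: 'matOut', shared: 'matMeasure', type: '32F1' } ]
```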

Concerning the type information that is used within the workspace configuration object, a regular expression $type$ can loosely specify the valid options (see equation 1). The type information consists of the bit size, the data type, and the number of channels per pixel. The data type is limited to unsigned integer values U, signed integer values S, and floating point values F, and the framework uses this information to convert the Image instances transparently. Note that the regular expression specifies the available choices only loosely, as traditional floating point formats always require at least four bytes of memory, so a combination such as 8F1 is matched but not meaningful.

$$type = (8|16|32)[USF][123] \qquad (1)$$

The collaboration requirement req 3.5 has not been addressed yet. Git has been used for software configuration management in order to maintain the revisions of the framework. The source repository can therefore be published on GitHub in order to distribute the framework. This hosting service provides additional features, such as an openly accessible issue tracker and a system for managing pull requests, which can be used to form a centralized community platform and thereby satisfy the collaboration requirement req 3.5.
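Returning to the type pattern of equation 1, the following sketch shows how a type string could be validated and decomposed. The function name `parseType` is hypothetical, and the additional check for floating point values is an assumption that follows the remark about four bytes; the framework itself may handle such cases differently.

```javascript
// Pattern from equation 1: bit size, data type, channel count.
const TYPE_PATTERN = /^(8|16|32)([USF])([123])$/;

function parseType(type) {
  const match = TYPE_PATTERN.exec(type);
  if (match === null) {
    return null;
  }
  const bits = Number(match[1]);
  const kind = match[2];
  const channels = Number(match[3]);
  // The pattern alone is too permissive: floating point values are
  // assumed here to need 32 bits, following the remark in the text.
  if (kind === 'F' && bits < 32) {
    return null;
  }
  return { bits, kind, channels };
}

console.log(parseType('32F3')); // { bits: 32, kind: 'F', channels: 3 }
console.log(parseType('8U3'));  // { bits: 8, kind: 'U', channels: 3 }
console.log(parseType('8F1'));  // null
console.log(parseType('64F1')); // null
```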

# References

[1] H. Mitchell, Image fusion: Theories, techniques and applications, 1st ed. Springer, 2010.

[2] R. Blum and Z. Liu, Multi-sensor image fusion and its applications, 1st ed. CRC Press, 2005.

[3] E. Adelson, C. Anderson, J. Bergen, P. Burt, and J. Ogden, “Pyramid method in image processing,” RCA Engineer, vol. 29, no. 6, pp. 33–41, 1984.

[4] S. Li, J. Kwok, and Y. Wang, “Combination of images with diverse focuses using the spatial frequency,” Information Fusion, vol. 2, no. 3, pp. 169–176, 2001.

[5] J. Zhang, X. Feng, B. Song, M. Li, and Y. Lu, “Multi-focus image fusion using quality assessment of spatial domain and genetic algorithm,” in IEEE conference on human system interactions, 2008, pp. 71–75.

[6] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, “Interactive digital photomontage,” ACM Transactions on Graphics, vol. 23, no. 3, pp. 294–302, Aug. 2004.

[7] G. Evangelidis and E. Psarakis, “Parametric image alignment using enhanced correlation coefficient maximization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 10, pp. 1858–1865, Oct. 2008.

[8] R. Szeliski, “Image alignment and stitching: A tutorial,” Foundations and Trends in Computer Graphics and Vision, vol. 2, no. 1, pp. 1–104, Jan. 2006.

[9] P. Felzenszwalb and D. Huttenlocher, “Efficient graph-based image segmentation,” International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, Sep. 2004.

[10] M. Bergh, X. Boix, G. Roig, B. Capitani, and L. Gool, “SEEDS: Superpixels extracted via energy-driven sampling,” in European conference on computer vision, 2012, pp. 13–26.

[11] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, Nov. 2012.

[12] M. Subbarao and J.-K. Tyan, “Selecting the optimal focus measure for autofocusing and depth-from-focus,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 864–870, Aug. 1998.

[13] K.-S. Choi, J.-S. Lee, and S.-J. Ko, “New autofocusing technique using the frequency selective weighted median filter for video cameras,” IEEE Transactions on Consumer Electronics, vol. 45, no. 3, pp. 820–827, Aug. 1999.

[14] N. Gelfand, A. Adams, S. H. Park, and K. Pulli, “Multi-exposure imaging on mobile devices,” in ACM international conference on multimedia, 2010, pp. 823–826.

[15] T. Mertens, J. Kautz, and F. Reeth, “Exposure fusion,” in Pacific conference on computer graphics and applications, 2007, pp. 382–390.

[16] P. Burt and R. Kolczynski, “Enhanced image capture through fusion,” in IEEE international conference on computer vision, 1993, pp. 173–182.

[17] S. Nayar and Y. Nakagawa, “Shape from focus,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 8, pp. 824–831, Aug. 1994.

[18] J. Tenenbaum, “Accommodation in computer vision,” PhD thesis, Stanford University, 1971.

[19] D. Milgram, “Computer methods for creating photomosaics,” IEEE Transactions on Computers, vol. 24, no. 11, pp. 1113–1119, Nov. 1975.

[20] P. Burt and E. Adelson, “A multiresolution spline with application to image mosaics,” ACM Transactions on Graphics, vol. 2, no. 4, pp. 217–236, Oct. 1983.

[21] P. Pérez, M. Gangnet, and A. Blake, “Poisson image editing,” ACM Transactions on Graphics, vol. 22, no. 3, pp. 313–318, Jul. 2003.

[22] P. Debevec and J. Malik, “Recovering high dynamic range radiance maps from photographs,” in ACM conference on computer graphics and interactive techniques, 1997, pp. 369–378.