Project Overview
This project colorizes the Prokudin-Gorskii photo collection by aligning the color channels of historical black-and-white images. Each original is a film strip of three exposures of the same scene, taken through a red, a green, and a blue filter. My approach combines these three exposures into a single color image and involves several key steps, broken down below.
Approach Breakdown
Configuration
The configuration section of the code determines how image processing is handled based on user input:
- PROCESS_SINGLE_IMAGE: This variable allows the user to specify whether to process a single image or multiple images within a directory. If set to True, the program will process only the one image that is specified. If set to False, the program will process all images in a given directory. This configuration provides flexibility for different processing needs.
- File Types: The program is designed to handle various image file formats, as specified by a configurable list of acceptable file extensions. This ensures that the processing can be applied to different types of image files, depending on the requirements.
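A minimal sketch of how such a configuration might look. Only PROCESS_SINGLE_IMAGE is named above; SINGLE_IMAGE_PATH, INPUT_DIR, ACCEPTED_EXTENSIONS, the paths, and both helper functions are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Only PROCESS_SINGLE_IMAGE comes from the write-up; everything else is assumed.
PROCESS_SINGLE_IMAGE = True
SINGLE_IMAGE_PATH = Path("data/emir.tif")                  # assumed location
INPUT_DIR = Path("data")                                   # assumed directory
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff"}   # assumed list


def has_accepted_extension(path, extensions=ACCEPTED_EXTENSIONS):
    """True if the file suffix (case-insensitive) is in the accepted set."""
    return Path(path).suffix.lower() in extensions


def collect_inputs(single=PROCESS_SINGLE_IMAGE):
    """Return the image paths to process under the current settings."""
    if single:
        return [SINGLE_IMAGE_PATH]
    # Directory mode: keep only regular files with an accepted extension.
    return sorted(p for p in INPUT_DIR.iterdir()
                  if p.is_file() and has_accepted_extension(p))
```

Keeping the extension check in its own function makes it easy to add or remove formats without touching the directory scan.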
Alignment Methods
The alignment methods are tailored to handle images of different sizes:
1. Smaller Images
For smaller images, an exhaustive search scored with the L2 norm was sufficient:
- Search Range: A search range is defined to explore different shifts of the image.
- Shifting and Comparison: The algorithm shifts one image by various amounts within the defined search range and calculates the L2 norm, which measures the difference between the shifted image and the reference image. The goal is to find the shift that results in the smallest L2 norm score, indicating the best alignment.
- Optimal Shift: The shift that achieves the lowest L2 norm score is chosen as the optimal alignment. The image is then adjusted according to this shift, ensuring the best possible alignment with the reference image.
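The three steps above can be sketched as a brute-force search. This is an illustration under assumptions, not the project's code: the function name align_l2, the default search_range of 15 pixels, and the use of np.roll (which wraps pixels around the border instead of cropping them) are all choices made for this example.

```python
import numpy as np


def align_l2(channel, reference, search_range=15):
    """Try every integer shift in the window and keep the lowest L2 score."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # np.roll wraps around the edges; a real pipeline may crop borders first.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)  # squared L2 distance
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


# Usage: recover a known displacement from synthetic data.
rng = np.random.default_rng(0)
channel = rng.random((32, 32))
reference = np.roll(channel, (3, -2), axis=(0, 1))
shift = align_l2(channel, reference, search_range=5)
aligned = np.roll(channel, shift, axis=(0, 1))
```

The cost grows with the square of the search range times the number of pixels, which is why this approach is only practical for small images.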
2. Larger Images
For larger images, an exhaustive L2 search becomes too costly, so a more advanced method is used:
- Cross-Correlation with FFT: To determine how well two images align, cross-correlation is used. This measures the similarity between the two images as one is shifted over the other. Cross-correlation is performed by multiplying the FFT of the reference image with the conjugate of the FFT of the image to be aligned and then taking the inverse FFT of the result. The location of the peak in the result gives the best shift. This is mathematically equivalent to computing the (circular) cross-correlation directly in the spatial domain, but it is far cheaper computationally.
- Gaussian Pyramid: A Gaussian pyramid is created to handle images at multiple scales. By processing images at various resolutions, the alignment can be refined from coarse to fine scales. This method improves accuracy for large images.
- Pyramid Alignment: The alignment starts with the coarsest level of the pyramid and proceeds to finer levels. The coarse levels capture large displacements cheaply, and the finer levels correct the small offsets that remain.
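The following sketch shows one way the three ideas above could fit together. It is an assumption-laden illustration, not the project's implementation: all function names, the 5-tap binomial blur standing in for a Gaussian, the min_size and window parameters, and the wrap-around (circular) boundary handling are choices made for this example.

```python
import numpy as np


def correlation_surface(channel, reference):
    """Circular cross-correlation via FFT: IFFT(FFT(reference) * conj(FFT(channel)))."""
    a = reference - reference.mean()   # remove the mean so brightness does not dominate
    b = channel - channel.mean()
    return np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real


def peak_shift(corr, window=None):
    """Return (dy, dx) of the correlation peak, optionally limited to |shift| <= window."""
    h, w = corr.shape
    centered = np.fft.fftshift(corr)   # zero shift now sits at (h // 2, w // 2)
    cy, cx = h // 2, w // 2
    if window is not None:
        centered = centered[cy - window:cy + window + 1, cx - window:cx + window + 1]
        cy = cx = window
    iy, ix = np.unravel_index(np.argmax(centered), centered.shape)
    return int(iy - cy), int(ix - cx)


def downsample(image):
    """Blur with a binomial (Gaussian-like) kernel, then keep every second pixel."""
    weights = (1, 4, 6, 4, 1)
    for axis in (0, 1):
        image = sum(w * np.roll(image, k - 2, axis=axis)
                    for k, w in enumerate(weights)) / 16.0
    return image[::2, ::2]


def pyramid_align(channel, reference, min_size=64, window=2):
    """Estimate the shift at the coarsest level, then refine it level by level."""
    if min(channel.shape) <= min_size:
        return peak_shift(correlation_surface(channel, reference))
    coarse = pyramid_align(downsample(channel), downsample(reference), min_size, window)
    dy, dx = 2 * coarse[0], 2 * coarse[1]          # one coarse pixel is two fine pixels
    moved = np.roll(channel, (dy, dx), axis=(0, 1))
    # Only a small residual should remain, so search a small window around zero.
    rdy, rdx = peak_shift(correlation_surface(moved, reference), window)
    return dy + rdy, dx + rdx


# Usage: recover a known displacement from synthetic data.
rng = np.random.default_rng(1)
channel = rng.random((256, 256))
reference = np.roll(channel, (20, -12), axis=(0, 1))
shift = pyramid_align(channel, reference)
```

Restricting the refinement to a small window is what lets the coarse estimate guide the finer levels; without it, each level would search for the global peak independently.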
Processing Steps
The processing stage consists of several key steps to prepare and finalize the colorized images:
- Normalization: Pixel values of the images are standardized to ensure consistency. This step involves scaling pixel intensities so that they are on a common scale, which is essential for accurate image alignment and colorization.
- Splitting Channels: The image is divided into its RGB (Red, Green, Blue) channels. Each channel is processed separately to facilitate alignment and color adjustment.
- Aligning: Depending on the image size, the appropriate alignment method is applied. Smaller images use the L2 norm approach, while larger images use cross-correlation with FFT and Gaussian pyramids.
- White Balancing: This step corrects color imbalances by setting the brightest color in the image to white. The colors are then scaled based on this adjustment to ensure realistic color representation.
- Stacking and Saving: The aligned RGB channels are combined into a single color image. The final image is saved as a JPEG file, which helps manage file size, and is stored in the specified output directory.
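A rough sketch of the normalization, splitting, and stacking steps. The function names are hypothetical, and the code assumes that the scanned strip stores the three exposures as equal-height thirds ordered blue, green, red from top to bottom, which the text above does not state. Writing the JPEG is left to whichever image library the project uses.

```python
import numpy as np


def normalize(plate):
    """Scale integer pixel data to floats in [0, 1]; pass float data through."""
    if np.issubdtype(plate.dtype, np.integer):
        return plate.astype(np.float64) / np.iinfo(plate.dtype).max
    return plate.astype(np.float64)


def split_channels(plate):
    """Cut the strip into three equal-height thirds (assumed order: B, G, R)."""
    height = plate.shape[0] // 3
    blue, green, red = (plate[i * height:(i + 1) * height] for i in range(3))
    return red, green, blue


def stack_channels(red, green, blue):
    """Combine aligned channels into an (H, W, 3) RGB image."""
    return np.dstack([red, green, blue])


def to_uint8(image):
    """Convert a [0, 1] float image to 8-bit values, as needed for JPEG output."""
    return np.round(np.clip(image, 0.0, 1.0) * 255).astype(np.uint8)


# Usage on a tiny synthetic strip: two rows each of blue, green, red.
plate = np.array([[0, 0], [0, 0], [128, 128], [128, 128], [255, 255], [255, 255]],
                 dtype=np.uint8)
red, green, blue = split_channels(normalize(plate))
rgb = stack_channels(red, green, blue)
rgb8 = to_uint8(rgb)
```

In a full pipeline, the alignment and white balancing steps would run between split_channels and stack_channels.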
Results
Results of the colorization process from the provided data folder for this project:
Bells and Whistles
White Balance: In this project, a white balance function was implemented to correct color imbalances in the images, thereby improving overall image quality. The function, white_balance_white, is based on a method discussed in lecture, which assumes that the brightest pixel in the image represents white.
- The function determines the maximum intensity values for each color channel (red, green, blue) in the image.
- It calculates a scaling factor for each channel as the reciprocal of that channel's maximum value, so that the brightest value in each channel is scaled to 1 (white).
- These scaling factors are applied to the respective color channels to adjust the intensities. The image values are then clipped to ensure they stay within the valid range [0, 1].
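The steps above might look like the following. This is a reconstruction from the description, not the original function: it reuses the name white_balance_white, assumes an (H, W, 3) float image in [0, 1], and adds a small epsilon guard against an all-black channel that the text does not mention.

```python
import numpy as np


def white_balance_white(image):
    """Scale each channel so its maximum becomes 1, then clip to [0, 1]."""
    maxima = image.reshape(-1, 3).max(axis=0)   # per-channel maximum intensity
    scale = 1.0 / np.maximum(maxima, 1e-8)      # reciprocal; guard is an assumption
    return np.clip(image * scale, 0.0, 1.0)


# Usage: a 1x2 image whose channels peak at 0.5, 0.25, and 1.0.
image = np.array([[[0.5, 0.25, 1.0], [0.25, 0.125, 0.5]]])
balanced = white_balance_white(image)
```

Because each channel is scaled independently, a color cast that dims one channel more than the others is removed, but a single saturated pixel in a channel leaves that channel unchanged.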
Before:
After:
Custom Results
Results of additional colorization processes, beyond the provided data for the project:
Commentary
While many images were colorized successfully, some still show misalignment. The main cause appears to be the scoring mechanism, which does not fully account for the differing contrast levels of the RGB channels. When the same region is bright in one channel and dark in another, the alignment score during the pyramid alignment process becomes less reliable.
One notable example is the image emir.tif, where the alignment is close but not perfect. The patterns and textures in the subject's outfit cause the displacement vectors to shift slightly, resulting in a misalignment of the color layers. I believe this happens because the alignment algorithm struggles with patterns whose contrast differs between channels, which makes precise alignment harder to achieve.