Start Your Coding for Computer Vision with Python
We human beings perceive our surroundings through our vision system. The eyes, brain, and limbs work together to sense the environment and act on it. An intelligent system is one that can perform tasks that would require some level of intelligence if done by a human. For a computer to perform such tasks, an artificial vision system is one of the most important components. Normally, a camera and the images it captures are used to gather the information needed to do the job. Computer vision and image processing techniques help us perform tasks similar to those done by humans, such as image recognition and object tracking.
In computer vision, the camera works as the human eye to capture the image, and the processor works as the brain to process the captured image and generate meaningful results. But there is a basic difference between humans and computers. The human brain works automatically, and its intelligence is innate. The computer, on the contrary, has no intelligence without human instruction (a program). Computer vision is the way to provide appropriate instructions so that the computer can work in a way comparable to the human vision system. Its capacity, however, is limited.
In the upcoming sections, we will discuss the basic idea of how an image is formed and how it can be manipulated using Python.
Table of Contents
- How Image is Formed and Displayed
- How does Computer Store Image in the Memory?
- Grayscale and Colored Image
- NumPy Basics to Work with Image
- OpenCV Basics
- Play with NumPy
An image is nothing but a combination of pixels with different color intensities. The terms ‘pixel’ and ‘color intensity’ may be unfamiliar to you. Don’t worry. They will become crystal clear if you read the article till the end.
A pixel is the smallest unit (element) of a digital image. Details are in the image below.
Image By Author
The display is made up of pixels. In the above figure, there are 25 columns and 25 rows, and each small square is a pixel. The figure therefore represents a display with 25 x 25 = 625 pixels. If we light up the pixels with different color intensities (brightness), they form a digital image.
How does the computer store the image in the memory?
If we look at the image carefully, we can compare it with a 2D matrix. A matrix has rows and columns, and its elements can be addressed by their indices. This structure is similar to an array, and the computer stores the image as an array in memory.
Each array element holds the intensity value of a color. Generally, the intensity value ranges from 0 to 255. For demonstration purposes, I have included an array representation of an image.
Sample Array Representation of a Grayscale Image (Image By Author)
Grayscale and Colored Image
A grayscale image is a black-and-white image built from shades of a single color. A pixel value of 0 represents black, and the pixel becomes brighter as the intensity value increases. The highest value, 255, represents white. A 2D array is sufficient to hold a grayscale image, as the last figure shows.
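As a rough illustration of the idea above, here is a minimal sketch of a tiny grayscale image held in a 2D array. The 3 x 3 size, the pixel values, and the variable name gray are assumptions chosen only for demonstration; NumPy itself is introduced later in the article.

```python
import numpy as np

# A hypothetical 3 x 3 grayscale image: 0 is black, 255 is white.
# uint8 holds exactly the 0-255 range described above.
gray = np.array(
    [[0, 64, 128],
     [64, 128, 192],
     [128, 192, 255]],
    dtype=np.uint8,
)

print(gray.shape)              # (3, 3): rows, columns
print(gray.min(), gray.max())  # 0 255
```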
A colored image can’t be formed with only one color; it may contain millions of color combinations. There are three primary color channels: Red (R), Green (G), and Blue (B). Each color channel is stored in a 2D array that holds its intensity values, and the final image is the combination of these three channels.
RGB Color Channel (Image By Author)
This color model has 256 x 256 x 256 = 16,777,216 possible color combinations. You may visualize the combination here.
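The three-channel idea can be sketched with a small array. This is only an illustration: the 2 x 2 size, the pixel colors, and the (height, width, channel) layout are assumptions of this sketch, not something taken from the figures.

```python
import numpy as np

# Hypothetical 2 x 2 color image; the last axis holds the R, G, B intensities.
height, width = 2, 2
rgb = np.zeros((height, width, 3), dtype=np.uint8)

rgb[0, 0] = [255, 0, 0]      # pure red pixel
rgb[0, 1] = [0, 255, 0]      # pure green pixel
rgb[1, 0] = [0, 0, 255]      # pure blue pixel
rgb[1, 1] = [255, 255, 255]  # white: all three channels at full intensity

# Each channel on its own is a 2D array of intensities.
red_channel = rgb[:, :, 0]

print(rgb.shape)          # (2, 2, 3)
print(red_channel.shape)  # (2, 2)
print(256 ** 3)           # 16777216 possible colors per pixel
```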
But in computer memory, the image is stored differently.
Image Stored in Computer Memory (Image By Author)
The computer doesn’t know the RGB channels; it only knows the intensity values. The red channel is stored with high intensity values, and the green and blue channels are stored with medium and low intensity values, respectively.
NumPy Basics to Work with Python
NumPy is a fundamental Python package for scientific computing. It is built mainly around an array object, but it is not limited to arrays: the library also handles various numeric and logical operations on numbers [1]. You will get the official NumPy documentation here.
Let’s start our journey. First things first.
- Importing the NumPy library.
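A minimal sketch of the import step, assuming NumPy is already installed. The alias np is the usual convention, not a requirement.

```python
import numpy as np  # "np" is the conventional short alias

# Printing the version is a quick way to confirm the import worked.
print(np.__version__)
```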
It’s time to work with NumPy. As we know, NumPy works with an array. So, let’s try to create our first 2D array of all zeros.
- Creating NumPy Array
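Creating a 2D array of zeros might look like the sketch below. The 5 x 5 shape and the name a are assumptions for illustration.

```python
import numpy as np

# np.zeros takes the shape as a tuple: (rows, columns).
a = np.zeros((5, 5))

print(a.shape)  # (5, 5)
print(a.dtype)  # float64 (the default type)
print(a)
```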
It’s as simple as that. We can also create a NumPy array with all ones just as follows.
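An array of all ones can be sketched the same way. The 3 x 3 shape is an assumption; the name b follows the text.

```python
import numpy as np

# Same idea as np.zeros, but every element starts at 1.
b = np.ones((3, 3))

print(b.shape)  # (3, 3)
print(b)
```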
Interestingly, NumPy also provides a method to fill the array with any value. The simple syntax array.fill(value) does the job.
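Here is a small sketch of fill, starting from an assumed 3 x 3 array of ones named b. Note that fill changes the array in place and returns nothing.

```python
import numpy as np

b = np.ones((3, 3))

# fill overwrites every element in place; it does not create a new array.
b.fill(3)

print(b)
```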
The array ‘b’ with all ones is now filled with 3.
- The Function of Seed in case of Random Number Generation
Just have a look at the following coding examples.
In the first code cell, we have used np.random.seed(seed_value), but we haven’t used any seeding for the other two code cells. There is a major difference between random number generation with and without seeding. With a seed, the generated random numbers remain the same for a specific seed value on every run. Without a seed, the random numbers change with each execution.
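The effect of seeding can be sketched as follows. The seed value 101, the range 0 to 99, and the variable names are assumptions chosen only for the demonstration.

```python
import numpy as np

# Seeding and drawing ten random integers in [0, 100).
np.random.seed(101)
first = np.random.randint(0, 100, 10)

# Re-seeding with the same value restarts the same sequence.
np.random.seed(101)
second = np.random.randint(0, 100, 10)

print(np.array_equal(first, second))  # True

# No re-seeding here: this draw simply continues the sequence,
# so it is a different set of draws from the two above.
third = np.random.randint(0, 100, 10)
print(third)
```

If the np.random.seed call is left out of a script entirely, the numbers are expected to differ from one run of the script to the next.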
- Basic operations (max, min, mean, reshape, etc.) with NumPy
NumPy makes our life easier by providing numerous functions for mathematical operations. The methods array_name.min(), array_name.max(), and array_name.mean() return an array’s minimum, maximum, and mean values. Coding example:
The indices of the minimum and maximum values can be extracted with array_name.argmin() and array_name.argmax(). Example:
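A short sketch of these three methods on a small hand-written array. The values and the name arr are assumptions, picked so the results are easy to check by eye.

```python
import numpy as np

arr = np.array([[3, 7, 1],
                [9, 4, 6]])

print(arr.min())   # 1
print(arr.max())   # 9
print(arr.mean())  # 5.0  (30 / 6)
```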
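A sketch of the index lookups, again on an assumed small array. One detail worth knowing: for a 2D array, argmin and argmax return a position in the flattened array by default, and np.unravel_index converts it back to a (row, column) pair.

```python
import numpy as np

arr = np.array([[3, 7, 1],
                [9, 4, 6]])

# Positions are counted in the flattened array: [3, 7, 1, 9, 4, 6]
print(arr.argmin())  # 2
print(arr.argmax())  # 3

# Convert the flat index of the maximum back to (row, column).
row, col = np.unravel_index(arr.argmax(), arr.shape)
print(row, col)  # 1 0
```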
Array reshaping is one of the important operations of NumPy. array_name.reshape(row_no, column_no) is the syntax for reshaping an array. While reshaping the array, we must be careful about the number of array elements before and after reshaping. In both cases, the total number of elements must be the same.
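Reshaping can be sketched like this. The 12-element array and the target shapes are assumptions; the second reshape is deliberately wrong to show what happens when the element counts don’t match.

```python
import numpy as np

arr = np.arange(12)  # 12 elements: 0, 1, ..., 11

# 3 rows x 4 columns = 12 elements, so this works.
reshaped = arr.reshape(3, 4)
print(reshaped.shape)  # (3, 4)

# 5 x 3 = 15 elements does not match 12, so NumPy raises ValueError.
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```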
- Array Indexing and Slicing
Each array element can be addressed with its row and column number. Let’s generate another array with 10 rows and 10 columns.
Suppose we want the first element of the array. It can be extracted by passing the row and column index (0, 0).
A specific row or column can be sliced with the syntax array_name[row_no, :] or array_name[:, column_no].
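A minimal sketch of the indexing step. The array here is filled with the numbers 0 to 99 so that each value reveals its own position; that choice, and the name arr, are assumptions for illustration.

```python
import numpy as np

# 10 x 10 array holding 0..99, row by row.
arr = np.arange(100).reshape(10, 10)

# First element: row 0, column 0.
print(arr[0, 0])  # 0

# Any other element works the same way: row 4, column 7.
print(arr[4, 7])  # 47
```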
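Row and column slicing can be sketched on an assumed 10 x 10 array of the numbers 0 to 99. The row and column numbers below are arbitrary picks.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)

# All columns of row 2.
row = arr[2, :]
print(row)  # [20 21 22 23 24 25 26 27 28 29]

# All rows of column 3.
col = arr[:, 3]
print(col)  # [ 3 13 23 33 43 53 63 73 83 93]
```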
Let’s try to slice the central elements of the array.
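One way to slice the center is sketched below, assuming a 10 x 10 array of the numbers 0 to 99 and taking "central" to mean the middle 4 x 4 block. Other definitions of the center are equally valid.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)

# Rows 3..6 and columns 3..6 (the end index 7 is excluded).
center = arr[3:7, 3:7]

print(center.shape)  # (4, 4)
print(center)
```

The block starts at 33 (row 3, column 3) and ends at 66 (row 6, column 6).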
OpenCV Basics
OpenCV is an open-source library for computer vision, originally developed by Intel [2], and it can be used from Python. I will discuss a few uses of OpenCV, though its scope is vast. You will find the official documentation here.
I have used the following image for demonstration purposes.
Image by jackouille21 from Pixabay
- Importing OpenCV and Matplotlib library
