From the domains of artificial intelligence and computer science comes computer vision.
Computer vision allows machines to interpret, infer, and understand visual information. Just as humans can see objects, computer vision can extract knowledge from a visual image, all thanks to applied mathematics.
So, how does computer vision work, and how can machines see like humans? And why is computer vision relevant today? Let’s begin the discussion with the basic concepts.
At the most fundamental level, computer vision is about extracting knowledge from an image frame or a sequence of frames (like a video). So, what exactly is an image?
An image is a structured collection of pixels, where each pixel value defines the intensity or color at its location in the image. For example:
In other words, for a 1920x1080 RGB image, each of the three color channels (red, green, and blue) is a 1920x1080 matrix of pixel values (0-255) representing the intensity of that channel.
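To make the pixel-matrix idea concrete, here is a minimal sketch in plain Python. The tiny 4x6 image size and the helper names `make_gradient_image` and `split_channels` are assumptions made for illustration; a real pipeline would normally use an array library rather than nested lists.

```python
# A tiny RGB image stored as nested lists: image[row][col] = (R, G, B).
# Real images are far larger; the toy size is an assumption for readability.
HEIGHT, WIDTH = 4, 6


def make_gradient_image(height, width):
    """Build an image whose red channel brightens left to right
    and whose blue channel brightens top to bottom."""
    image = []
    for row in range(height):
        line = []
        for col in range(width):
            red = round(255 * col / (width - 1))
            green = 0
            blue = round(255 * row / (height - 1))
            line.append((red, green, blue))
        image.append(line)
    return image


def split_channels(image):
    """Return three height x width matrices, one per color channel."""
    return tuple(
        [[pixel[c] for pixel in line] for line in image] for c in range(3)
    )


image = make_gradient_image(HEIGHT, WIDTH)
red, green, blue = split_channels(image)
print(len(red), "rows x", len(red[0]), "columns per channel")
```

Each of `red`, `green`, and `blue` is one matrix of intensities, which is the structure a display reads when it lights up each element.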
Now, to create the image: in simple terms, an LED screen takes this matrix and lights up an LED element according to the corresponding location and power intensity. The result is a visual image — one that our eyes can see and our brain can comprehend based on our knowledge of the world.
That’s how images work for humans. But how do machines interpret pixel values?
Computer vision (also known as machine vision) helps machines to comprehend and infer knowledge from visual information. But how?
Computer vision is focused on three main problem categories in the pipeline of teaching machines how to see and interpret a visual image:
Representation creates a description of the image, either mathematically or in terms of its features. The goal of this task is to simplify the problem by converting raw image pixels into meaningful structures.
These structures have “meaning” because we can define them mathematically, as described above, and computer vision learning algorithms can learn them.
Auto-encoders are a great example. An auto-encoder simplifies complex images into basic shapes or features, similar to identifying key landmarks on a map. An auto-encoder model may encode (compress) the raw image matrix into a lower-dimensional latent space that captures features such as:
These features are abstract and meaningful. They can be used for downstream tasks such as object detection, image classification, and more!
The details: A lot of image preprocessing and image processing goes into the computer vision pipeline before the dataset is ready for extracting representative features and then learning an AI model from those features. Preprocessing tasks may include:
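As an illustration, here is a minimal sketch of three commonly used preprocessing steps: grayscale conversion, intensity normalization, and resizing. The choice of these particular steps, and the function names `to_grayscale`, `normalize`, and `resize_nearest`, are assumptions for this example rather than a prescribed pipeline.

```python
def to_grayscale(image):
    """Convert an RGB image (rows of (R, G, B) tuples) to grayscale."""
    # ITU-R BT.601 luma weights, a common convention for this conversion.
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in image
    ]


def normalize(gray):
    """Scale 0-255 intensities into the range 0.0 to 1.0."""
    return [[value / 255.0 for value in row] for row in gray]


def resize_nearest(gray, new_height, new_width):
    """Resize by nearest-neighbor sampling (the simplest resizing scheme)."""
    height, width = len(gray), len(gray[0])
    return [
        [gray[r * height // new_height][c * width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


# A toy 2x4 RGB image, invented for this example.
image = [
    [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128), (255, 255, 0), (0, 0, 0)],
]
prepared = resize_nearest(normalize(to_grayscale(image)), 2, 2)
print(prepared)
```

Steps like these give every image the same size and value range, so that later feature extraction sees consistent input.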
Image processing tasks may include feature extraction and gradient detection-based techniques to extract features such as corners, edges, contrasts, and distinctive shapes. Common algorithms used here include:
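To show what gradient-based feature extraction looks like in practice, here is a minimal sketch of the Sobel operator, one well-known edge detection technique, picked here as an illustrative assumption. It is written in plain Python on a toy grayscale image; production code would typically rely on an optimized image library.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(gray, kernel, row, col):
    """Weighted sum of the 3x3 neighborhood centered on (row, col)."""
    return sum(
        kernel[i][j] * gray[row + i - 1][col + j - 1]
        for i in range(3)
        for j in range(3)
    )


def sobel_magnitude(gray):
    """Gradient magnitude for interior pixels; border pixels stay at 0."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            gx = apply_kernel(gray, SOBEL_X, row, col)
            gy = apply_kernel(gray, SOBEL_Y, row, col)
            out[row][col] = (gx * gx + gy * gy) ** 0.5
    return out


# Toy image: dark left half, bright right half, so one vertical edge.
gray = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(gray)
print(edges[2])
```

The response is large only in the two columns that straddle the dark-to-bright boundary and zero in the flat regions, which is exactly the kind of "edge" feature a downstream model can learn from.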
Next, your models must be trained. Computer vision relies on a variety of AI learning methodologies, including supervised and unsupervised learning.
Supervised learning maps images to their known class labels. These labels may be annotated manually. Datasets such as MNIST, COCO, and ImageNet, along with other domain-specific datasets, are widely used for training models.
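The idea of learning from labeled images can be sketched with a deliberately simple stand-in for a deep model: a nearest-centroid classifier. Everything here is an assumption for illustration, including the toy 2x2 "images" (flattened to four pixel values), the labels "dark" and "bright", and the helper names.

```python
def train_centroids(samples, labels):
    """Average the feature vectors of each class: one centroid per label."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]] for label in sums
    }


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids, features):
    """Return the label whose centroid is closest to the input."""
    return min(
        centroids,
        key=lambda label: squared_distance(centroids[label], features),
    )


# Labeled training set: flattened 2x2 grayscale images (invented data).
samples = [
    [10, 20, 15, 5],
    [0, 30, 10, 20],
    [240, 250, 230, 255],
    [200, 220, 210, 245],
]
labels = ["dark", "dark", "bright", "bright"]

centroids = train_centroids(samples, labels)
print(predict(centroids, [30, 10, 25, 0]))
print(predict(centroids, [225, 240, 250, 210]))
```

The labels do the teaching: the model only knows what "dark" means because annotated examples told it so.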
Once your models are trained, you can fine-tune them on problem-specific domains that may be similar to the datasets they were trained on. This is the recognition piece. For example:
Any standard deep learning model, such as a CNN or a Transformer-based model, may be used for this task.
The unsupervised learning approach does not rely on labeled images. With no ground-truth label, the models learn the patterns and representations across object categories. (The computer figures out patterns on its own, not unlike how a child learns to sort by color or size.)
Common examples include:
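One widely used unsupervised technique is k-means clustering, chosen here as an illustrative assumption. The minimal sketch below groups toy RGB color vectors without any labels; the data, the fixed starting centroids, and the function names are all invented for this example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternately assign points to the nearest
    centroid, then move each centroid to the mean of its points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(point, centroids[i]),
            )
            clusters[nearest].append(point)
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)]
            if cluster else centroid  # keep an empty cluster's centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Unlabeled RGB vectors: three reddish and three bluish (invented data).
points = [
    [250, 10, 10], [240, 20, 30], [230, 0, 20],
    [10, 20, 250], [0, 30, 240], [20, 10, 230],
]
# Fixed starting centroids keep the run deterministic.
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
```

No label ever says "red" or "blue", yet the two groups emerge from the structure of the data alone.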
Supervised vs unsupervised learning. Supervised learning is more common in real-world applications of machine vision, particularly due to reliable performance and accuracy.
However, much of the current state-of-the-art AI research focuses on unsupervised learning. The reason is simple: image annotation is a tedious manual task that does not scale.
In solving complicated problems — like machine vision for autonomous vehicles — vehicle sensors must extract knowledge from virtually infinite image scenarios. This is where generative modeling techniques such as diffusion models can potentially help produce synthetic data on out-of-distribution image scenarios: all to help train robust models using supervised learning.
So, computer vision — machines seeing images — sounds like it could be very useful in the real world. And indeed it is. Here are some examples.
Healthcare: In medical imaging, computer vision assists in diagnosing diseases by analyzing X-rays, MRIs, and CT scans. It can detect anomalies, measure growths, and track changes over time, aiding healthcare professionals in making informed decisions.
Automotive: Autonomous vehicles rely heavily on computer vision to navigate and interpret the environment. Vision systems identify road signs, detect pedestrians, and monitor traffic conditions, ensuring safe and efficient driving.
Retail: Computer vision enhances the shopping experience by enabling features like virtual try-ons and automated checkout systems. It also helps in inventory management by monitoring stock levels and detecting misplaced items.
Security: In physical security and surveillance, computer vision identifies and tracks individuals, detects unusual activities, and analyzes crowd behavior. Facial recognition systems enhance security by verifying identities and granting access to authorized personnel. (Now, the legal and privacy ramifications of how this is used — that’s a different topic.)
Agriculture: Vision technology monitors crop health, detects diseases, and assesses soil conditions. Drones equipped with cameras provide valuable insights into large farming areas, optimizing resource allocation and improving yield.
The latest trends in computer vision are focused on three research domains: agentic AI, spatial computing, and multimodal LLMs.