
A peek behind the inner workings of AI surveillance

Let’s take a recent project I worked on as an example so that I can explain the numbers and discuss the tech more easily.

The project was part of an automation roadmap for smart city development. A typical smart city surveillance system would contain a minimum of 300 cameras strategically positioned in key areas to monitor what is happening.

These cameras generate a lot of data every second, and monitoring them manually becomes a huge task. That’s where my company comes in.

We develop vision processing systems supporting multiple compute hardware configurations to process these videos in real-time.

A typical smart city AI workload includes human detection and tracking, vehicle monitoring with tagging and license plate recognition, anomaly detection, and face detection with matching, to name a few. All of these workloads are resource-intensive, as is the case with any deep learning-based vision pipeline, and running them drives up the price of the overall system.

An easy workaround is to avoid running all these workloads on every camera. Instead, we choose which cameras are relevant to each use case and assign only those cameras to the corresponding workload.

As an example, the cameras placed at entrances of buildings are appropriate to run a face recognition workload as the visibility of faces is best in this scenario. A camera on the side of the road is ideal for watching pedestrian movement and vehicles.
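To make the idea concrete, here is a minimal sketch of assigning workloads by camera placement. The placement labels, workload names, and camera IDs are hypothetical and only mirror the two examples above; a real deployment would weigh many more factors.

```python
from collections import defaultdict

# Hypothetical mapping from camera placement to the workloads suited to it.
WORKLOADS_BY_PLACEMENT = {
    "building_entrance": ["face_recognition"],
    "roadside": ["people_counting", "vehicle_monitoring"],
}


def assign_workloads(cameras):
    """Group camera IDs under each workload based on where the camera is placed."""
    assignment = defaultdict(list)
    for camera_id, placement in cameras.items():
        # Cameras with an unknown placement get no workload at all.
        for workload in WORKLOADS_BY_PLACEMENT.get(placement, []):
            assignment[workload].append(camera_id)
    return dict(assignment)


# Illustrative camera inventory (made-up IDs).
cameras = {
    "cam-001": "building_entrance",
    "cam-002": "roadside",
    "cam-003": "roadside",
}
print(assign_workloads(cameras))
```

Only the cameras listed under a workload need hardware sized for it, which is where the cost saving comes from.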

The hardware specification for every camera is also determined by the workload it needs to handle. Now, where do you put this processing hardware? Well, you can have a variety of configurations, but the best we found was an edge computing architecture. This can be seen in the image below.

Image by Vysakh S Mohan

The above diagram showcases a barebones smart city deployment with 3 main workloads. The first detects people moving across a camera view and counts the total number of people who crossed that area in either direction. The second detects and matches faces. The third is a vehicle monitoring and sorting workload for traffic-related info.

Edge-based processing lets us tweak the software at the camera level and helps us scale the architecture easily.
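The people-counting workload can be illustrated with a simple virtual-line counter. This is a sketch, not our production pipeline: it assumes an upstream detector and tracker already supply a stable `track_id` and a centroid per person, and it treats the counting line as infinite. The class and direction names are made up for illustration.

```python
def side_of_line(point, line_start, line_end):
    """Return +1 or -1 depending on which side of the line the point is, 0 if on it."""
    (x, y), (x1, y1), (x2, y2) = point, line_start, line_end
    cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
    return (cross > 0) - (cross < 0)


class LineCrossingCounter:
    """Counts tracked objects crossing a virtual line, separately per direction."""

    def __init__(self, line_start, line_end):
        self.line_start = line_start
        self.line_end = line_end
        self.last_side = {}  # track_id -> side seen on the previous update
        self.counts = {"forward": 0, "backward": 0}

    def update(self, track_id, centroid):
        side = side_of_line(centroid, self.line_start, self.line_end)
        if side == 0:
            return  # exactly on the line: wait for the next frame
        previous = self.last_side.get(track_id)
        self.last_side[track_id] = side
        if previous is not None and side != previous:
            # The sign flipped, so the track moved across the line.
            self.counts["forward" if side > 0 else "backward"] += 1


# Horizontal counting line across a hypothetical camera view.
counter = LineCrossingCounter((0, 5), (10, 5))
counter.update(1, (5, 2))
counter.update(1, (5, 8))  # track 1 crosses one way
counter.update(2, (3, 9))
counter.update(2, (3, 1))  # track 2 crosses the other way
print(counter.counts)
```

Which direction counts as "forward" depends on how the line endpoints are ordered, so it would be fixed per camera at install time.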

Notice the difference in the number of cameras allotted to an edge device. This is usually based on workload limitations. For example, the face detection and matching use case has a comparatively higher compute requirement than the vehicle monitoring workload, so an edge device running the former can handle fewer cameras than one running the latter for real-time processing.

More workloads will be added according to the requirements of the end client, and by using edge hardware, we can scale horizontally with the addition of more cameras or the inclusion of new use cases.

As mentioned before, the cameras for every workload are manually identified based on many different parameters, the major one being the total cost of ownership of the infrastructure.
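A rough way to reason about this is to divide the frames per second an edge device can sustain for a workload by the frame rate of each camera. The sketch below does that arithmetic; all throughput figures are made-up placeholders, not measurements from any real hardware.

```python
import math


def cameras_per_device(device_fps, camera_fps):
    """How many real-time camera streams one edge device can sustain."""
    return int(device_fps // camera_fps)


def devices_needed(num_cameras, device_fps, camera_fps):
    """Number of edge devices required to cover num_cameras for one workload."""
    per_device = cameras_per_device(device_fps, camera_fps)
    if per_device < 1:
        raise ValueError("device cannot process even one camera in real time")
    return math.ceil(num_cameras / per_device)


CAMERA_FPS = 15  # assumed stream frame rate

# Hypothetical sustained throughput of one edge device per workload.
DEVICE_FPS = {"face_matching": 60, "vehicle_monitoring": 150}

for workload, fps in DEVICE_FPS.items():
    print(
        workload,
        "cameras per device:", cameras_per_device(fps, CAMERA_FPS),
        "devices for 20 cameras:", devices_needed(20, fps, CAMERA_FPS),
    )
```

With these placeholder numbers the heavier face workload fits 4 cameras per device against 10 for vehicle monitoring, which is the kind of asymmetry visible in the diagram.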

Photo by Slejven Djurakovic on Unsplash

Every workload is managed centrally. The extracted information is streamed as small data payloads to the datacentre server, where it is preprocessed, sorted, and stored in a database. The stored data can be used by custom analytics engines to generate infographics like heatmaps and crowd analytics, or to flag alerts based on facial matching. This rule engine or analytics engine can be modified without risking downtime of any edge hardware, thus ensuring continuous operation.
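As a sketch of this split, the snippet below shows an edge device packing a result into a compact JSON payload and a server-side rule turning it into an alert. The field names, the `face_match_rule` function, and the 0.8 threshold are all assumptions for illustration; the actual payload schema and transport are not described here.

```python
import json


def build_payload(camera_id, workload, timestamp, data):
    """Edge side: serialize extracted metadata (not video) as compact JSON."""
    message = {
        "camera_id": camera_id,
        "workload": workload,
        "timestamp": timestamp,
        "data": data,
    }
    return json.dumps(message, separators=(",", ":"))


def face_match_rule(event, threshold=0.8):
    """Server side: return an alert for a confident face match, else None."""
    if event["workload"] != "face_matching":
        return None
    if event["data"].get("score", 0.0) < threshold:
        return None
    return {
        "type": "face_match",
        "camera_id": event["camera_id"],
        "match_id": event["data"]["match_id"],
    }


# Rules live only on the server, so this list can change without touching edge devices.
RULES = [face_match_rule]


def evaluate(raw_payload, rules=RULES):
    """Decode one payload and collect the alerts raised by every rule."""
    event = json.loads(raw_payload)
    alerts = []
    for rule in rules:
        alert = rule(event)
        if alert is not None:
            alerts.append(alert)
    return alerts


payload = build_payload(
    "cam-001", "face_matching", 1700000000,
    {"match_id": "person-42", "score": 0.93},
)
print(len(payload), "bytes:", evaluate(payload))
```

Because the edge device only emits metadata, adding or changing a rule is a server-side change and the cameras keep streaming uninterrupted.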

The edge hardware is a key piece of the puzzle. While we work with multiple computing modules, we found that Intel or NVIDIA platforms are the best choice for large-scale deployments like the smart city use case.

We have worked closely with both companies on multiple projects and have found that hardware from the two vendors complements each other very well and gives good flexibility.