Disperse is a late-stage construction analytics startup, already deployed on some of the largest construction sites in the world. In this article, we'd like to share with you our approach to AI in construction, and why we believe it is paying off.
But first: what we do…
To make it as simple as possible: we take 360º images of construction sites and analyse them. Our app then shows the results to the project team so they can decide to take action.
The AI-based system we use for our analyses is subject to human intervention at several stages. It's a collaborative system where the AI helps the humans be more productive, and the humans help the AI make fewer mistakes.
And the fact that the AI does make mistakes is exactly why we need humans in the loop: at several stages of the data cleaning and preparation process, and, most notably, before we present our outputs to the customer.
Here is how the process works:
First, our systems ingest all the drawings and other relevant project documentation, and we assess when it makes the most sense for us to deploy on a project. When a project is just breaking ground, it is not yet very complex, and it's pointless to start scanning, as there's not much to keep track of. Likewise, when the superstructure goes up, one needs only to look out of the field office window to know which floor the work is on.
For many projects, the real complexities start after the superstructure has gone up. And that usually is when it makes most sense to begin scanning.
But there are some notable exceptions. Extremely large projects, for example, like some projects we are doing in Saudi Arabia, have so much going on that we get involved much earlier, even before the building gets underway.
To be able to present the project team with reliable insights, we need good data. And that's not easy to procure on a busy construction site.
We have observed that as a project progresses, time constraints often make it untenable for the project team to gather the data that we need to analyse the job site properly.
Boots-on-the-ground construction leaders are often so busy firefighting that menial tasks that require high consistency tend to fall by the wayside. After all, urgent trumps important any day on a job site at risk of slipping behind schedule.
That's why we send in our own people to gather the data. This way, we don't bother the project people. And data capture becomes its own discrete process, independent from the flow of construction.
Deploying our own specialised personnel with the dedicated assignment to capture a site from top to bottom also guarantees that the initial data is sound. And that's crucial.
So that's the first step; our scanning people come onto the site as regularly as they need. It is usually once a week but can also be more or less often. They walk the site with their 360º cameras according to a predetermined logic to systematically capture every nook and cranny of the project. Then that data gets uploaded to our servers.
Next, the photos are reconstituted as a visual twin of the site. This visual twin functions both as a real-time diagnostic tool that can be pulled up on screen to assess the current state of the work, and as a historical record that enables you to go back in time, like in a picture book, to look behind walls and see how things were constructed.
It's an ironclad 'as-built' record of the project, which helps cut down on disputes and lawsuits. It also helps maintain the building years after completion since the visual record removes the guesswork from opening walls and ceilings in case of repairs or renovations.
There is a lot of Artificial Intelligence involved in the creation of the visual twin. The AI helps us determine where the pictures were taken. We improve the image quality using AI (in fact, Disperse has a patented 360 image levelling algorithm). Then we make sure the right images are layered on top of each other so we get the proper order of images through time. And of course, we also blur the faces of whoever shows up in the images.
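One piece of that pipeline — layering images of the same spot on top of each other in the right temporal order — can be sketched in a few lines. This is a minimal illustration only: the data model and field names here are assumptions, not Disperse's actual (non-public) implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical capture record, for illustration only.
@dataclass
class Capture:
    location_id: str    # spot on the floor plan the image was matched to
    taken_at: datetime  # capture timestamp
    image_path: str

def layer_captures(captures):
    """Group captures by location and order each stack oldest-to-newest,
    so the most recent image sits on top and older ones form the history."""
    layers = defaultdict(list)
    for c in captures:
        layers[c.location_id].append(c)
    for stack in layers.values():
        stack.sort(key=lambda c: c.taken_at)
    return dict(layers)

captures = [
    Capture("corridor-2F", datetime(2024, 5, 13), "week20.jpg"),
    Capture("corridor-2F", datetime(2024, 5, 6), "week19.jpg"),
    Capture("lobby", datetime(2024, 5, 13), "week20.jpg"),
]
twin = layer_captures(captures)
print([c.image_path for c in twin["corridor-2F"]])  # oldest first
```

With stacks ordered like this, "going back in time" at a given location is just stepping down through the stack.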
But processing the data is only the beginning. The magic of our solution lies in how we operate on the processed data, and the kinds of outputs we are able to produce with it and feed back to the project team. This is where the hybrid system shines.
Our machine learning algorithms detect what is present in the photos; in particular, they identify all the different components and can assess the exact build status of each one. How far along is it, and has it been done right?
Humans assist this process. In fact, Disperse employs a large team of architects and engineers who are constantly combing through the data to make sense of what is going on. In a way, they function as a remote site inspection team.
Our human-AI hybrid system can identify all the common issues that normally surface on a site walk. It determines where work has stalled and which things have not been installed to spec. It will even point out situations where a component still needs to be finished but is located in an area that will soon become inaccessible due to other planned works.
The system will also tell you precisely how your work is progressing against the schedule and inform you about the week-on-week pace of each of your individual subcontractors.
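The week-on-week pace idea can be illustrated with a small sketch. The data shape and metric below are assumptions chosen for clarity — the actual progress metrics Disperse reports are not spelled out here.

```python
# Hypothetical weekly snapshots: cumulative units completed per subcontractor
# at each weekly scan. Illustrative only.
def week_on_week_pace(snapshots):
    """Return, per subcontractor, the units completed in the latest week
    and the change versus the week before."""
    report = {}
    for sub, cumulative in snapshots.items():
        if len(cumulative) < 3:
            continue  # need three scans to compare two weekly deltas
        latest = cumulative[-1] - cumulative[-2]
        previous = cumulative[-2] - cumulative[-3]
        report[sub] = {"this_week": latest, "change": latest - previous}
    return report

snapshots = {
    "drywall_sub": [120, 180, 230],  # cumulative boards installed per scan
    "mne_sub": [40, 90, 150],        # cumulative fixtures installed per scan
}
report = week_on_week_pace(snapshots)
print(report)
```

A negative `change` flags a subcontractor whose pace is slipping, which is exactly the kind of signal a project lead wants surfaced before it shows up in the schedule.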
There are other useful things that the technology is able to provide as well, such as the automatic creation of various kinds of site reports that normally need to be assembled manually.
But none of these outputs would be very valuable if our data collection and processing were not error-free at the outset. And that's where our value-first approach to AI comes in.
When we began building our systems, we did not start with the potential capabilities of the system and ask what we could then do with them (an "AI-first" approach). Instead, we asked: what is the value that we want our systems to provide?
To illustrate the difference, think of personal AI assistants like Siri or Alexa. Both systems are built with an AI-first approach. They can do a wide variety of things okayish (and many others not so well).
If you ask Alexa to turn off the lights and it doesn't, you may need to repeat yourself or walk over to the switch. It may be annoying that it doesn't work, but it's hardly the end of the world. It's a low-complexity, low-risk problem where the value is also quite limited.
You would not deploy this type of technology in life-or-death situations. For example, you would never trust this kind of AI to operate a self-driving car. The risk is just too high if something goes wrong.
And like the problem space of self-driving cars, construction is also a high-stakes environment where the domain keeps shifting. While it won't directly cost lives if our technology gets it wrong, we still need it to be 100% reliable.
We require that 100% because if you have a thousand things going on at the job site and you miss ten of them, it is not acceptable. There may be costly downstream consequences because of those missed issues. But even more importantly: no one will trust a system that blunders even 1% of the time. And without trust, a system would have no future.
That's why we employ domain specialists like civil engineers and architects, who really understand the project, to ascertain that the AI output is valid before it goes to the customer.
Computer vision intelligence takes over a lot of the mundane work of processing and preparing data. Our domain specialists effectively become project managers doing inspections, but on a platform highly optimised for analysing large volumes of data.
As the AI learns, it can take over more and more tasks. But we expect we'll always need humans in the loop to make the call when our AI models haven't yet seen certain situations.
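In practice, a human-in-the-loop setup like this often comes down to routing by confidence: high-confidence AI detections pass through, uncertain ones go to a specialist. The threshold and the routing logic below are assumptions for illustration, not Disperse's actual review policy.

```python
# Hypothetical confidence cut-off; in a real system this would be tuned
# per component type and validated against human review outcomes.
REVIEW_THRESHOLD = 0.9

def route_detections(detections):
    """Split AI detections into those accepted automatically and those
    routed to a human domain specialist for validation."""
    auto, needs_review = [], []
    for det in detections:
        target = auto if det["confidence"] >= REVIEW_THRESHOLD else needs_review
        target.append(det)
    return auto, needs_review

detections = [
    {"component": "light fixture", "status": "installed", "confidence": 0.97},
    {"component": "sprinkler head", "status": "missing", "confidence": 0.62},
]
auto, review = route_detections(detections)
print(len(auto), "auto-accepted;", len(review), "sent to specialists")
```

As the models improve, more detections clear the threshold and the specialists' time shifts to the genuinely novel cases — the virtuous cycle described below.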
Training an AI is a little like how a baby learns. You take a baby down the street in a stroller, and you point out: ‘blue car’, ‘red car’, etcetera. Eventually, the baby will be able to transfer what it learned to other domains, and it will recognise the sky as blue.
But for AI, this type of transference is not trivial, and you'll need to present it with many different scenarios before it's able to carry old knowledge into new domains successfully. Since construction is not a static environment, the job site can look like a new domain every time.
There may be things in the way, be they people, machines or supplies, and this may vary from week to week. That's part of the difficulty with visual data as opposed to, for example, a Large Language Model trained on textual data.
With visual data, you'll often have a huge amount of noise you are not interested in, which needs to be processed. And there is a huge variance; even in images taken just metres apart, things can be completely different.
Suddenly people turn up, the lighting changes, or rubbish is lying around, and the difference in perspective means that the same object can now appear much larger or smaller. Also, some components look completely different based on the state they are in. Some subcontractors bring in all components in one go, while others bring them in parts.
These things make analysing visual construction data much harder than analysing data from more predictable domains. But the good news is that the more visual data we collect, the easier this identification becomes.
Another interesting challenge that comes with developing computer vision technology for construction is that it needs to be practically usable. That means you can't just solve a subset of problems.
It's all well and good if a system can tell with 100% accuracy whether the light fixtures have been installed. But if I am a Project Manager, I've got another 100 things going on at the same time. I am not going to want to fire up different applications to check on each of my components.
That's why this technology, to be truly useful, cannot solve just a small portion of problems. It needs to solve all of them and tell me the interdependencies between those problems.
So we have been collecting gigantic amounts of visual data from construction sites over the last couple of years and mapping every process out.
If we had taken an AI-first approach and offered AI outputs without humans in the loop to validate the outcome, getting things right would have been almost impossible. We would only have been able to offer approximations of what is happening on-site, and we would not have been able to tell the real story.
But because we have taken the value-first route, we can now reliably identify over 600 components. Another advantage of the value-first approach is that you can start on projects outside the initial AI data training set much sooner.
For example, if the AI has only seen residential buildings, it will struggle with hospitals. With a value-first approach, you can start much sooner on such projects. Of course, you need to put in much more human labour.
Eventually, all of this is a virtuous cycle. The AI is already able to do certain things; humans help it improve. In turn the AI helps the humans to get their work done faster.
Deploying Disperse on a site is like having a set of super-eyes on-site that never miss a thing. It affords project leaders an unprecedented level of control. It enables them to drive their site more effectively without burdening them with having to provide input for the system.
If you'd like to get acquainted with our products and discuss what we can do for your project, please do not hesitate to book a call with one of our experts.
If you'd like to know more about our scanning-based AI analytics platform solution, Impulse, please visit this page. And if you'd like to learn more about our short-term planning solution Lookahead, go here.
Mohsan Alvi, Chief Science Officer at Disperse