VLMs are finally giving AEC its 'AI will change everything' moment
Construction is ~5% of our economy and software captures ~10bps of total spend. AI massively expands the market for AEC software, and there’s a new wave of $10Bn+ companies coming.
Historically, AEC software has been relegated to “document storage and recreating pen-and-paper workflows.” Why? Because the core data involved in construction processes is complex, heterogeneous, and image-based. Cloud software struggled to do anything beyond store that data and enable some more-efficient-than-pen-and-paper-but-still-tedious workflows.
LLMs faced a similar limitation, and early AEC AI tools didn’t explode like they did in medicine, legal, coding, and customer service (barring a few notable exceptions). Why? Construction is more than just “reading and writing with extra steps,” so the early LLM products just weren’t that great. Getting to that “oh shit, this is going to change everything” moment in AEC requires AI that can understand both text and image-based inputs and combine semantic, spatial, geometric, and numeric reasoning to take workflows end-to-end.
Vision-language models (VLMs) are what AEC’s been waiting for. VLMs enable stunningly cool products across all parts of the construction life cycle, from documentation creation to permitting to takeoffs, estimation, and procurement to project management. Category defining companies will be built over the coming years, and we’re incredibly excited to invest at the intersection of AI and the built world. If you want to know more, read on! If you’re building and investing here, drop me a line. I’d love to hear from you.
What’s the deal with VLMs?
How the tech works and its limitations for building vertical apps are important to understanding what’s possible, so let’s spend a minute there. The simple way to think of it is that VLMs let you reason and execute workflows based on text + image inputs, just like an LLM lets you do so based on textual inputs. When you pair a VLM with deterministic tools to make up for its shortcomings, you can build a whole host of new AI applications for industries that rely on executing precise, high stakes workflows with multimodal inputs.
Now getting into the weeds. Beginning in late 2023 / early 2024 with GPT-4V and then 4o, frontier models shifted from LLMs to VLMs. A Vision-Language Model (VLM) takes in an image + a text prompt and outputs a response. Effectively, a VLM extends the reasoning capabilities of an LLM to images and lets us move beyond computer vision-based classification (“hot dog / not hot dog”) into interpretation and reasoning (“is the hot dog cooked properly?”). Practically, the model does this by breaking an image down into small chunks, translating those into LLM-readable tokens, and finally combining the image + text tokens into a single context window that the model responds to. At this point, most of the frontier models are VLMs, not just LLMs.
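To make the mechanics concrete, here's a toy sketch of that patch-to-token idea. Everything in it - the patch size, the embedding dimension, the random projection standing in for a learned vision encoder, the one-token-per-word "tokenizer" - is invented for illustration; real VLMs use trained encoders and far larger embeddings.

```python
import numpy as np

# Toy sketch of how a VLM merges image and text into one token sequence.
# Dimensions and the random projections are made up for illustration; real
# models use learned vision encoders (e.g. a ViT) and trained tokenizers.

EMBED_DIM = 64
PATCH = 16  # pixels per patch side

rng = np.random.default_rng(0)

def image_to_tokens(image: np.ndarray) -> np.ndarray:
    """Split an image into PATCH x PATCH chunks and project each to an embedding."""
    h, w, c = image.shape
    patches = []
    for y in range(0, h, PATCH):
        for x in range(0, w, PATCH):
            patches.append(image[y:y + PATCH, x:x + PATCH].reshape(-1))
    stacked = np.stack(patches)                       # (num_patches, PATCH*PATCH*c)
    projection = rng.standard_normal((stacked.shape[1], EMBED_DIM))
    return stacked @ projection                       # (num_patches, EMBED_DIM)

def text_to_tokens(text: str) -> np.ndarray:
    """Stand-in for a text tokenizer + embedding lookup (one token per word)."""
    return rng.standard_normal((len(text.split()), EMBED_DIM))

image = rng.random((64, 64, 3))  # a fake 64x64 RGB plan sheet
context = np.concatenate([
    image_to_tokens(image),
    text_to_tokens("count the doors on this sheet"),
])
print(context.shape)  # one sequence the transformer attends over: (16 + 6, 64)
```

The punchline is the last line: once image patches live in the same sequence as text tokens, everything the transformer can do with text - attention, tool calling, long-horizon reasoning - applies to the image too.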
The implications of moving images into LLM-land are profound. Every ~six months, foundation models see a meaningful improvement in general-purpose semantic reasoning. Since VLMs make images LLM-readable, vision workflows inherit those advances too: longer task horizons, tool calling, more capable orchestration - everything that makes LLM-based apps so useful. That’s a big deal for physical-world AI where solving real problems requires AI to grok both text and image inputs.
A subtle but important point is that just as VLMs inherit LLMs’ strengths, they also inherit their weaknesses. Most enterprise LLM deployments target text-based workflows with fuzzy ground truth - there isn’t an “objectively correct” legal draft or medical note, and so long as there are no hallucinations or factual inaccuracies, there are multiple acceptable outputs. By contrast, most vertical image workflows depend on domain-specific ground truth and precise measurements: the number of construction symbols; the length of a wall; etc. Out of the box, LLMs - and therefore VLMs - are bad at counting and math and often struggle with deterministic geometry and precise measurement. You can’t just point a VLM at some construction plans and expect it to reliably produce quantities, timelines, materials, etc. In practice, automating even seemingly mundane text + image workflows requires data and task/customer-specific tools - arguably far more so than text-based apps. The VLM interprets and routes; deterministic tools enforce measurement and ground truth.
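Here's a minimal sketch of that division of labor, with a hard-coded stand-in for the model (`fake_vlm`) and invented coordinates, detections, and scale: the VLM only decides which tool to call and with what arguments, and deterministic functions do the measuring and counting.

```python
import math

# Minimal sketch of the "VLM interprets and routes; deterministic tools enforce
# ground truth" pattern. fake_vlm is a stand-in for a real model call; the
# tools are ordinary deterministic functions the model never approximates.

def wall_length_ft(p1, p2, scale_ft_per_px):
    """Deterministic geometry: measure a wall from two plan coordinates."""
    return math.dist(p1, p2) * scale_ft_per_px

def count_symbols(detections, symbol):
    """Deterministic counting over extractor output; no model arithmetic."""
    return sum(1 for d in detections if d == symbol)

TOOLS = {"measure_wall": wall_length_ft, "count_symbols": count_symbols}

def fake_vlm(question):
    """Stand-in for the VLM: it only decides WHICH tool to call, with what args."""
    if "how long" in question:
        return ("measure_wall", ((0, 0), (300, 400), 0.05))
    return ("count_symbols", (["door", "window", "door"], "door"))

def answer(question):
    tool, args = fake_vlm(question)
    return TOOLS[tool](*args)

print(answer("how long is the north wall?"))   # 25.0 ft (500 px * 0.05)
print(answer("how many doors on sheet A-101?"))  # 2
```

The design choice worth noting: the model's output is a routing decision, not a number, so compounding numeric errors never enter the answer.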
The obvious risk here is that the labs build these capabilities into the model itself. My take is that labs can and will commoditize semantic understanding and reasoning but will struggle with vertical procedures + deployment data + tooling. VLM reasoning happens in semantic space, but the output for most vertical image workflows relies on semantic + numeric + spatial reasoning with compounding errors and vertical-specific procedures. General reasoning improvements won’t close that gap; tools do (geometry engines, specific extractors, counting assistance, hardcoded rules, etc.), and it will be hard to encode the “right” procedures in-model.
Even if you could encode those procedures in-model, doing so would require lots of vertical-specific data that’s extremely hard to get without customer deployments. The labs’ current playbook - paying bankers $250/hour to generate contrived Excel models for post-training - works because you can invent faithful examples. It is much harder to pay an architect $250/hour to generate realistic construction documents given the number of parties and constraints involved.
The net effect is that VLMs offer a step-function change in what vertical software founders can build for real-world industries using off-the-shelf models. Going from 80% to 100% accuracy requires meaningful proprietary data and tooling, more so than text-only apps, and although that increases complexity, you’re probably compensated - at least in the short term - with defensibility.
How do VLMs affect AEC?
Now let’s get into what folks are actually building with VLMs.
Design, concepting, and documentation: VLMs speed up everything from design renders to finalizing documentation
The start of any AEC project is creating plans, a months-to-years long process that AI can speed up.
The first step is creating the initial design concepts and 3D renders. If you haven’t seen it yet, the recent Arcway launch is very cool: a real-time, 3D design and sales environment for homebuilders. New construction homes are often sold before they’re built with custom finishes for a buyer, and rendering custom designs is a meaningful bottleneck in the sales process. With VLMs, you can picture an architecture (no pun intended) that looks something like:
- Designer creates initial floorplan
- VLM takes in floorplan and breaks it down into pieces (geometry, finishes, etc.), likely with the help of a custom geometry engine / floorplan-understanding tools
- VLM calls tools to render a 3D model that’s true to the floorplan in an interactive, browser-based environment
This highlights the power of VLMs quite well. It’s not that “foundation models can generate a home,” it’s “foundation models provide the baseline to understand the inputs that go into generating a home and then call the tools required to (a) make it true-to-plan and (b) actually do the rendering.”
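As a toy example of the kind of floorplan-understanding tooling above, here's a deterministic geometry helper a VLM could call once it has traced wall corners from a plan. The room and its coordinates are invented for illustration:

```python
# Toy sketch of one "floorplan-understanding tool" a VLM might call: given wall
# corner coordinates it extracted from a plan, compute the room area with the
# shoelace formula rather than asking the model to estimate geometry.
# The coordinates (in feet) are invented for illustration.

def polygon_area(corners):
    """Shoelace formula over (x, y) corners listed in order around the room."""
    area = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# A 12 ft x 15 ft room the VLM traced from the floorplan
room = [(0, 0), (12, 0), (12, 15), (0, 15)]
print(polygon_area(room))  # 180.0 square feet
```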
You can take that concept and extend it further down the documentation pipeline. Once the project begins, concepting needs to turn into plans. AI is a ways away from generating plans end-to-end, but you can picture an agent that helps with drafting plans and optimizing resources:
- Input a 3D render (a full home, MEP routing boxes in Revit, etc.)
- VLM takes in the render and combines components with the broader project context
- VLM calls tools to create Revit documentation based on render + project context
And an agent that takes over QA review work as plans are finalized:
- Input plans and project context (e.g. initial specs, code requirements with jurisdictional quirks, etc.)
- VLM checks plans for various quality measures, calling separate tools as necessary:
  - Completeness and contradiction for plans vs. specs
  - Conflicts between trades (e.g. architectural vs. structural vs. MEP)
  - Code compliance
- VLM outputs a prioritized issue list for architects and engineers to review
The net result of all this is faster time from concept → breaking ground while improving, not compromising, quality. Strong execution on initial documentation ripples through the rest of the project: faster permitting, less rework, and ultimately fewer project delays.
Permitting: VLMs help avoid comments and address them when they pop up
Permitting is one of the biggest bottlenecks in the preconstruction process. It’s incredibly complex and uniquely miserable: each jurisdiction has different (and often quirky) code requirements, mapping plans → requirements is a detailed and manual process, and iterations with the municipality can cause massive delays. You can picture an AI permitting agent that looks something like:
- Plans are completed and ready for review
- VLM reads the plan set, compares it against publicly available checklists and past submissions, and flags potential comments
- VLM suggests tactical edits based on the issues for the team to make (or maybe one day even makes them itself!) pre-submission
- Plans are submitted to the city via online portal
- AI agent monitors the portal, and if there’s a stalled status or a missing-info request, drafts follow-ups or potentially even calls the jurisdiction itself
- Comments come back from the city in a structured checklist
- VLM reads the checklist, creates a workplan, and drafts the response letter with links to updated sheets
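The pre-submission check in the flow above reduces to deterministic code once the VLM has extracted structured facts from the plan set. A hedged sketch, with both the extracted facts and the jurisdictional checklist invented for illustration:

```python
# Sketch of the pre-submission check: the VLM's job is to extract structured
# facts from the plan set; comparing them against a jurisdiction checklist is
# then plain deterministic code. The rules and facts below are invented.

plan_facts = {              # what a VLM might extract from the plan set
    "egress_width_in": 30,
    "smoke_detectors": 4,
    "site_plan_sheet": False,
}

checklist = {               # a made-up jurisdictional checklist
    "egress_width_in": ("min", 32),
    "smoke_detectors": ("min", 4),
    "site_plan_sheet": ("required", True),
}

def flag_comments(facts, rules):
    comments = []
    for item, (kind, threshold) in rules.items():
        value = facts.get(item)
        if kind == "min" and value < threshold:
            comments.append(f"{item}: {value} below required {threshold}")
        elif kind == "required" and value != threshold:
            comments.append(f"{item}: missing required item")
    return comments

for comment in flag_comments(plan_facts, checklist):
    print(comment)
# egress_width_in: 30 below required 32
# site_plan_sheet: missing required item
```

The hard part is clearly the extraction and the jurisdiction-by-jurisdiction rule library, not the comparison - which is exactly why deployment data matters so much here.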
Cutting permitting timelines even by 10-20% has meaningful effects on project P&L, and projects often already have meaningful budgets set aside for permit preparation and expediting. High ROI + existing budgets + no dominant workflow system (yet!) = huge opportunity.
It’s also worth calling out that “permitting automation” existed as a category pre-AI, and even without VLMs there was a lot to build that resonated with buyers (project tracking, collaboration, building up a jurisdictional-behavior data set to displace expeditors, etc.). AI has been a massive accelerant to the category, and VLMs help take it from “helpful workflow software with a human-in-the-loop” to “end-to-end AI agents that get your project started 5x faster”.
Takeoffs, estimations, and procurement: VLMs can quantify construction documents
The first step to a construction bid is takeoff and estimation: turn a construction document into quantities (takeoff) and turn those quantities into a project budget (estimation). Today, this process is someone’s full-time job. Why? While we’ve moved away from paper-and-ruler takeoffs, most existing software tools automate very little of the process. Why? Because takeoffs are a nightmare for traditional computer vision classification: symbols are highly variable project-to-project and require project- and trade-specific context to fully understand.
But they aren’t a nightmare for VLMs armed with custom tooling! A sample architecture for automating the takeoff and estimation process with AI might look something like:
- Subcontractor receives plans from GC
- VLM creates a structured “takeoff plan” based on (a) the particular trade and (b) project-specific context from a user prompt
- VLM calls tools to count symbols, measure linear feet, and pull quantities from structures
- VLM consolidates outputs into a final takeoff or estimate
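The consolidation step above is deterministic arithmetic once tools have produced the counts and measurements. A toy sketch, with the trade items, quantities, and unit costs all invented for illustration:

```python
# Sketch of the consolidation step: once tools have produced counts and
# measurements, turning a takeoff into an estimate is deterministic math.
# Symbol counts, lengths, and unit costs below are invented.

takeoff = {                     # tool outputs: (quantity, unit)
    "duplex_receptacle": (42, "ea"),
    "emt_conduit": (610.0, "lf"),
    "panelboard": (2, "ea"),
}

unit_costs = {                  # made-up unit pricing
    "duplex_receptacle": 28.50,
    "emt_conduit": 4.10,
    "panelboard": 1850.00,
}

def estimate(takeoff, unit_costs):
    """Price each takeoff line and total the project budget."""
    lines = {item: round(qty * unit_costs[item], 2)
             for item, (qty, _unit) in takeoff.items()}
    return lines, round(sum(lines.values()), 2)

lines, total = estimate(takeoff, unit_costs)
print(total)  # 7398.0
```

Note that the VLM never does this arithmetic - it assembles the takeoff dict by calling counting and measuring tools, and the estimate falls out deterministically.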
And it’s hard to overstate the business impact of faster takeoffs. Not only are they often someone’s full-time job, they’re a bottleneck to bid accuracy and frequency. Faster and better takeoffs = almost instant financial ROI.
Not only are takeoffs and estimations a big problem in themselves, they also offer a wedge into the procurement value chain. Estimates directly inform what contractors purchase, and it’s not a big leap to imagine an AI agent that goes from plans → this is what you need to buy → here’s the optimal way to buy it → I’ll buy it for you.
Project management: VLMs get you the answers you need ASAP and keep systems in sync
Construction project management is an information retrieval problem: how do you synthesize a stream of unstructured inputs (marked-up plan sheets, RFIs, photos, field notes, spec PDFs, meeting minutes…) into tactical decision making? And when there’s ambiguity, how do you get the information you need without making everyone wait for two hours?
The answer used to be “If you had a question about what concrete to use, pull up SharePoint and search up ‘concrete’.” Now you can picture a world where AI takes over and gives project managers superpowers:
- AI integrates across all document stores (SharePoint, Procore, P6, Autodesk, etc.)
- Project manager has an agentic workspace that organizes all the data and makes it actionable:
  - Q&A: use a VLM to “chat your construction docs” and answer tactical questions with citations
  - Change management: use VLMs to diff plan sets when plan revisions come in
  - RFI creation: VLM automatically maps photos to RFIs or automatically creates RFIs when the project manager needs more detail
  - Submittal + RFI management: VLM summarizes submittal vs. spec and cites specific sections where there’s problematic variance
  - Scheduling: VLM maps project changes to timeline changes and helps auto-update work plans and Gantt charts
- As the project develops, agent automatically syncs changes from the field with the back office:
  - Propagate data from the project management system back to the ERP
  - Generate invoices in response to change orders
  - Reconcile estimates to actuals and create real-time budget visibility
VLMs get particularly powerful when you combine them with real-time project data. Scale portfolio company DroneDeploy is shipping some awesome products combining VLMs with their drone-based reality capture platform to streamline project management - their SafetyAI product uses reality capture data to detect safety risks and their Progress AI product helps PMs and supers instantly quantify work completed across sites.
This category is the trickiest since it’s directly adjacent to core systems of record - and Procore’s already made an acquisition here! - but the ROI from fewer delays and reduced rework is meaningful. Success in this category means leveraging AI to beat the cloud winners at their own game by going beyond “Chat my construction PDFs” and into long horizon, multimodal workflows that delight customers.
What are early signs of product / market fit?
It’s still early days across the board, and we’re excited to invest. As we think through new opportunities, here are some thoughts on what might show something is “working” and has hit early signs of product / market fit and repeatable GTM:
- If you’re serving GCs: you need to extract seven figures+ out of the ENR 100 (the top 100 GCs for non-AEC folks) to build a big business. To do that, the product probably needs to (a) take over meaningful labor and (b) quickly demonstrate hard dollar ROI. Getting that first large customer - even if it means customer concentration - is an important signal to pay attention to.
- If you’re serving customers beyond GCs: there aren’t enough scaled customers to make a Veeva / Guidewire / Procore-style business where you have $1M+ ACVs. “Workflow software for a single trade” will necessarily be constrained. One important early sign of success is multiple stakeholders in a project getting value from the software and taking it with them to their future projects.
- If you’re taking a full stack services approach: I think full-stack services could be interesting in areas where headcount is the primary expense (e.g. architectural design, permit review, etc.). For a venture-scale outcome there needs to be a path to national scale, and early evidence that AI makes scale newly possible is an important early sign. EquipmentShare is a great example of this during the SaaS era!
The above isn’t meant to be prescriptive: success will come in many ways that I’m sure will surprise us. Hopefully, it’s a helpful window into how we’re evaluating opportunities and the tradeoffs between building and investing in different subsectors.
I’d love to hear from you!
2026 is the year of AI in AEC: new VLM-powered product capabilities are giving operators the same magical moments lawyers and doctors had in 2025. If you’re building here, let’s chat!
News from the Scale portfolio and firm

