Nearmap AI vs Gemini: What the benchmark data shows
Mar 2026
Dr. Michael Bewley, Vice President, AI
As foundation AI models have matured, a fair question has emerged across the property intelligence industry: if Gemini, GPT, and Claude can reason about almost anything, do purpose-built AI models still matter?
Nearmap ran the test.
The benchmark: pool detection as a real-world proxy
Pool detection is a real-world property intelligence task. Insurers use it to assess risk and price renewals accurately. Local governments use it to identify undeclared pools, verify permits, and ensure properties are assessed at the right rate.
Detecting a specific object class at scale is challenging given the diversity of aerial conditions: varying light, shadow, occlusion, and differing surface materials. We tested three models on the same aerial imagery dataset, evaluating performance using F1 score: a standard metric that balances precision (avoiding false positives) and recall (avoiding false negatives).
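As a concrete reference, the sketch below shows how precision, recall, and F1 combine for a binary detection task like pool presence. The counts are illustrative placeholders, not results from this benchmark.

    # Minimal sketch: precision, recall, and F1 for binary detection.
    # Counts are illustrative placeholders, not benchmark results.
    def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
        precision = true_positives / (true_positives + false_positives)  # avoid false positives
        recall = true_positives / (true_positives + false_negatives)     # avoid false negatives
        return 2 * precision * recall / (precision + recall)

    # Example: 90 pools found correctly, 10 non-pools flagged, 20 pools missed
    print(round(f1_score(90, 10, 20), 3))  # 0.857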
Both errors carry cost for a range of industries:
In property insurance, a missed pool is a gap in your risk view. A falsely flagged one is a wasted inspection workflow, or an inaccurate quote.
For local government, a missed development is a gap in your compliance picture. A falsely flagged one is a wasted site investigation or an inaccurate rates assessment.
The results
Gemini was tested using high-resolution Nearmap imagery, with maximum thinking mode enabled and a detailed prompt-level definition of a swimming pool. No time pressure, and the gap still held.
F1 score on pool presence
Methodology note: Results are based on Nearmap internal testing conducted March 2026, using an aerial imagery dataset of 2,500 residential properties from the USA, hand-labelled and reviewed by human experts, containing 158 swimming pools. All models were tested against the same dataset under the conditions described. This testing was conducted for internal evaluation purposes. Performance may vary based on imagery conditions, dataset composition, and model version.
Where generalist AI struggles
False negatives
We found that pools in deep shadow, or partially concealed by overhanging vegetation or roof structures, produced false negatives: Gemini missed them. Nearmap AI Gen 6 caught them, because it is trained on purpose-captured aerial imagery across millions of properties, which calibrates it for exactly these edge cases.
False positives
Lawns, certain pool covers, light-coloured paving: Gemini flagged several as pools. Nearmap AI Gen 6 did not. The visual signatures that distinguish these surfaces in aerial imagery are precise, learned from millions of labelled examples captured from a consistent vantage point. They can't be inferred from the general-purpose training that foundation models rely on.
The root cause is the same in both directions. Foundation models reason flexibly across an enormous range of inputs. That flexibility is their standout feature. But it also means they haven't been calibrated for the specific visual vocabulary of aerial imagery at 5–7.5 cm resolution.
The throughput reality
Accuracy alone doesn't determine whether AI is operationally viable; speed matters too. In our testing, Gemini took more than seven seconds to process each property, and at scale it is rate-limited to a handful of images per second.
Nearmap AI Gen 6 operates at sub-second inference, processing more than 1,000 properties per second.
For an insurer assessing a portfolio of 100,000 properties, that is not a trivial difference. Accuracy that can't operate at portfolio scale isn't property intelligence; it's a proof of concept.
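To make the scale concrete, here is a rough back-of-envelope timing sketch using the figures above. The ~5 images-per-second rate limit is an assumption standing in for "a handful of images per second", and real throughput will depend on batching and infrastructure.

    # Back-of-envelope portfolio timing using the figures quoted above.
    # The 5 images/second rate limit is an assumed stand-in value.
    PORTFOLIO = 100_000  # properties

    sequential_hours = PORTFOLIO * 7 / 3600     # ~7 s per property, one at a time
    rate_limited_hours = PORTFOLIO / 5 / 3600   # ~5 images/s under a rate limit
    specialist_seconds = PORTFOLIO / 1000       # ~1,000 properties/s

    print(f"Sequential generalist:   {sequential_hours:,.0f} hours")    # ~194 hours
    print(f"Rate-limited generalist: {rate_limited_hours:,.1f} hours")  # ~5.6 hours
    print(f"Specialist model:        {specialist_seconds:,.0f} seconds")  # ~100 seconds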
Specialist vs. generalist: a question of fit
Foundation models are powerful, and we see them improving daily. For tasks requiring flexible reasoning across unstructured inputs, like document extraction, open-ended analysis, and multimodal interpretation of varied content, general-purpose AI is often the right tool.
For defined property intelligence tasks, the ceiling for specialist models is set by three things:
1. The clarity of the task definition
2. The quality of the training data
3. The stability of the model over time
Nearmap owns and controls all three. Nearmap AI is trained on proprietary imagery captured specifically for property intelligence, with a consistent resolution, consistent geometry, and multiple captures per year. It updates on a controlled release cycle tested against property intelligence benchmarks before anything ships.
A version change for a foundation model optimized for general capability can shift performance on a specific task significantly, in either direction, with no changelog entry covering your use case.
Teams building workflows on general-purpose AI inherit that unpredictability. The performance of a specialist model is consistent because the scope is defined.
What to ask when evaluating AI for property intelligence
If you're evaluating AI for property intelligence, the numbers above frame the conversation worth having. Here's a checklist to help you procure a solution that aligns with your strategy:
The answers will tell you whether you’re looking at purpose-built property intelligence, or a capable general model working outside its area of calibration. For decisions that affect underwriting accuracy, risk assessment, and CAT response, that difference is worth understanding before you build it into your workflow.
Disclaimer: This article is for informational purposes. Results reflect Nearmap internal testing and may vary by use case, dataset, and imagery conditions. Readers are encouraged to conduct their own evaluation before making procurement decisions.