Sustainable Development & Local AI
No greenwashing, just numbers to show you how good we are!
The problem
AI consumes. A lot. That's not a problem, except when the bill arrives.
We're not going to give you the "save the planet with our green AI" pitch when we're just running the same models as everyone else. But we can explain why an AEGIS IA local infrastructure consumes significantly less than a cloud subscription over the long term.
Ready for some physics and math?
What cloud LLMs really consume
A recent academic study (Jegham et al., 2025) measured the real consumption of major commercial LLMs and the numbers are scary:
📊 Consumption per query (cloud models)
- GPT-4o: 0.43 Wh per short query
- o3 (OpenAI): 39.2 Wh per long prompt (70x more than a nano model)
- DeepSeek-R1: 33.6 Wh per long prompt
Source: Jegham et al. (2025), "How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference", arXiv:2505.09598
To put this in context: a single long query to o3 consumes as much electricity as running a 65-inch LED TV for 20-30 minutes.
Now take GPT-4o at 700 million queries per day (a conservative estimate for 2025). Over a year, that adds up to:
- Annual electricity consumption: 391,509 to 463,269 MWh — equivalent to 35,000 US homes or nearly 77,000 European homes (comparing Enedis and US EIA data)
- Evaporated water (datacenter cooling): 1.33 to 1.58 million kiloliters — enough to fill 500 Olympic pools
- Carbon emissions: 138,125 to 163,441 tons of CO₂ — equivalent to 30,000 gasoline cars
And we're talking about one model. Add Gemini, Mistral, all the others...
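Want to check the order of magnitude yourself? Here is a minimal sketch. It is a naive scaling (every query costs the same, volume flat all year), not the study's methodology, and the helper names `annual_mwh` and `implied_wh_per_query` are ours.

```python
def annual_mwh(wh_per_query: float, queries_per_day: float, days: int = 365) -> float:
    """Naive annual energy in MWh: per-query Wh x daily volume x days."""
    return wh_per_query * queries_per_day * days / 1_000_000  # Wh -> MWh


def implied_wh_per_query(annual_mwh_total: float, queries_per_day: float,
                         days: int = 365) -> float:
    """Average Wh per query implied by an annual total in MWh."""
    return annual_mwh_total * 1_000_000 / (queries_per_day * days)


QUERIES_PER_DAY = 700e6  # daily volume quoted above

short_only = annual_mwh(0.43, QUERIES_PER_DAY)
low = implied_wh_per_query(391_509, QUERIES_PER_DAY)
high = implied_wh_per_query(463_269, QUERIES_PER_DAY)

print(f"short queries only: {short_only:,.0f} MWh/year")
print(f"implied average: {low:.2f} to {high:.2f} Wh per query")
```

If every query were a short 0.43 Wh one, the same volume would come to about 110,000 MWh per year. The published range works out to an average of roughly 1.5 to 1.8 Wh per query, which suggests the estimate covers more than short prompts.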
Jevons Paradox applied to AI
Know the Jevons paradox? The more efficient you make something, the more people use it (think rush-hour traffic), so total consumption ends up rising despite the optimization.
That's exactly what's happening with cloud LLMs:
- GPT-4o is more efficient per query than GPT-3
- So people use it 10x more
- Result: overall consumption explodes
A Google search consumes 0.30 Wh. A GPT-4o query consumes 0.43 Wh. That's about 43% more. Not huge? Now multiply by billions of daily queries.
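The Jevons effect boils down to one multiplication: total consumption is per-query cost times volume. A quick sketch, where the 10x usage growth comes from the list above and the 2x efficiency gain and the baseline values are purely illustrative assumptions:

```python
def total_consumption(wh_per_query: float, queries: float) -> float:
    """Total energy in Wh for a given per-query cost and query volume."""
    return wh_per_query * queries


# Hypothetical baseline: 1.0 Wh per query, 1 million queries.
before = total_consumption(1.0, 1_000_000)

# Assumed 2x efficiency gain (illustrative), 10x usage growth (from the text).
after = total_consumption(1.0 / 2, 1_000_000 * 10)

print(f"total consumption multiplied by {after / before:.1f}")
```

Even with each query costing half as much, ten times the usage leaves you with five times the total. Efficiency only wins if it improves faster than usage grows.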
The hidden datacenter infrastructure
What they don't tell you: the numbers above are significantly UNDERESTIMATED.
Why? Because datacenters don't just consume power to run GPUs. They also consume for:
- Cooling: 40-54% of total datacenter consumption
- Networks and infrastructure: routers, switches, active cabling
- Redundancy: backup power, backup systems
- Operational overhead: lighting, security, monitoring
- And to a lesser extent employee-related expenses: food service, personal computers, transportation
The average PUE (Power Usage Effectiveness) of a datacenter is 1.5 to 2.0. That means for every 1 W consumed by a GPU, the facility draws an additional 0.5 to 1 W just to keep it running.
Source: Patterson et al. (2021), "Carbon Emissions and Large Neural Network Training"
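PUE is simply total facility power divided by IT equipment power, so the overhead is easy to sketch. The 700 W GPU draw below is a hypothetical value chosen for illustration, and `facility_power_w` is our own helper name:

```python
def facility_power_w(it_power_w: float, pue: float) -> float:
    """Total facility power, using PUE = total facility power / IT power."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_w * pue


GPU_POWER_W = 700.0  # hypothetical draw of one datacenter GPU under load

for pue in (1.5, 2.0):
    total = facility_power_w(GPU_POWER_W, pue)
    overhead = total - GPU_POWER_W
    print(f"PUE {pue}: {total:.0f} W total, {overhead:.0f} W overhead")
```

With these assumptions, the same GPU costs the facility 1,050 W at a PUE of 1.5 and 1,400 W at a PUE of 2.0.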
So how much does it really cost? (the crux of the matter)
Let's not kid ourselves: the bill is what motivates people.
💰 Cost comparison Cloud API vs Local (over 3 years)
Scenario: 50 employees, moderate usage (500 req/day/person)
| Item | Cloud API | Local infrastructure |
|---|---|---|
| Year 1 | ~€60,000 (tokens) | ~€80,000 (hardware + setup) |
| Year 2 | ~€60,000 | ~€15,000 (electricity + maintenance) |
| Year 3 | ~€60,000 | ~€15,000 |
| TOTAL 3 YEARS | €180,000 | €110,000 |
Break-even: 18-24 months
Source: Calculated from Lenovo Press (2025), "On-Premise vs Cloud: Generative AI Total Cost of Ownership"
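You can reproduce the break-even from the table with a few lines. This is a rough sketch under our own simplifying assumptions: the cloud bill is spread evenly over the year, the local year-1 figure is paid upfront and already covers the first 12 months of running costs, and prices never change. `break_even_month` is a hypothetical helper, not part of any tool.

```python
from typing import Optional


def break_even_month(cloud_monthly: float, local_upfront: float,
                     local_monthly: float, covered_months: int = 12,
                     horizon: int = 60) -> Optional[int]:
    """First month where cumulative cloud spend reaches cumulative local spend.

    Assumes the upfront local payment already covers running costs for the
    first `covered_months` months; returns None if no break-even in `horizon`.
    """
    for month in range(1, horizon + 1):
        cloud = cloud_monthly * month
        local = local_upfront + local_monthly * max(0, month - covered_months)
        if cloud >= local:
            return month
    return None


month = break_even_month(
    cloud_monthly=60_000 / 12,  # ~EUR 60,000 per year from the table
    local_upfront=80_000,       # year-1 hardware + setup
    local_monthly=15_000 / 12,  # electricity + maintenance from year 2
)
print(month)
```

Under these assumptions the crossover lands at month 18, the low end of the 18-24 month range. Spread the year-1 cost differently or add a few surprises and you drift toward the upper end.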
And be careful: these numbers assume your API prices stay stable. We suspect they won't.
OpenThing can decide tomorrow to double its prices. There's nothing you can do about it. You're held hostage.
With local infrastructure, your average cost per query decreases over time: the hardware is paid for once, so the more you use it, the less each query costs. That's exactly the opposite of cloud.
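To see the cost per query fall, divide cumulative spend by cumulative queries. The sketch below reuses the scenario and table figures above; the assumption that people send queries 365 days a year is ours (working days only would raise every figure proportionally), and `average_cost_per_query` is a hypothetical helper.

```python
QUERIES_PER_YEAR = 50 * 500 * 365  # scenario above, assuming usage every day


def average_cost_per_query(cumulative_spend: float, years: int) -> float:
    """Total spend so far divided by total queries served so far."""
    return cumulative_spend / (QUERIES_PER_YEAR * years)


# Cumulative spend at the end of each year, read off the comparison table.
local_spend = {1: 80_000, 2: 95_000, 3: 110_000}
cloud_spend = {1: 60_000, 2: 120_000, 3: 180_000}

for year in (1, 2, 3):
    local = average_cost_per_query(local_spend[year], year)
    cloud = average_cost_per_query(cloud_spend[year], year)
    print(f"year {year}: local EUR {local:.4f}/query, cloud EUR {cloud:.4f}/query")
```

Under these assumptions the local figure drops from roughly EUR 0.0088 to EUR 0.0040 per query over three years, while the cloud figure stays flat around EUR 0.0066.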
Carbon footprint, let's talk seriously
An average US datacenter uses an electricity mix with ~60% fossil fuels (coal, gas). A European datacenter is more like 30-40% depending on the country.
Your local server in Lorraine? You plug into the French electricity mix: ~70% nuclear + 20% renewables. CO₂ emissions: 50-60 g/kWh.
An AWS datacenter in Virginia (us-east-1 region, the most common)? Electricity mix: 40% gas, 35% coal. Emissions: 350-400 g/kWh.
Same calculation, roughly 6 to 8 times less CO₂ in France than in Virginia.
We're not greenwashing here. We're just noting that plugging a server into EDF's grid emits less than an AWS datacenter in Virginia.
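The calculation is just energy times grid intensity. In the sketch below, the intensities are the ones quoted above, while the 20,000 kWh per year for one GPU server is a hypothetical figure picked for illustration:

```python
def annual_co2_kg(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Annual emissions in kg CO2 for a given energy use and grid intensity."""
    return energy_kwh * intensity_g_per_kwh / 1000  # g -> kg


SERVER_KWH_PER_YEAR = 20_000  # hypothetical annual consumption of one GPU server

france = [annual_co2_kg(SERVER_KWH_PER_YEAR, g) for g in (50, 60)]
virginia = [annual_co2_kg(SERVER_KWH_PER_YEAR, g) for g in (350, 400)]

print(f"France:   {france[0]:.0f} to {france[1]:.0f} kg CO2/year")
print(f"Virginia: {virginia[0]:.0f} to {virginia[1]:.0f} kg CO2/year")
print(f"ratio: {virginia[0] / france[1]:.1f}x to {virginia[1] / france[0]:.1f}x")
```

The server's consumption cancels out of the ratio, so only the two grid intensities matter. Depending on which ends of the ranges you compare, the gap runs from about 5.8x to 8x.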
Hidden costs of cloud
What we often forget to count in cloud TCO:
- Price increases: OpenAI, Anthropic, Google can change their prices whenever they want. You're dependent.
- Output tokens are 2-4x more expensive than input tokens. If your LLM generates a lot of text, surprise.
- Rate limiting penalties: exceed your quota, pay 2-3x more per token.
- Egress fees: getting data out of the cloud costs a fortune (AWS loves this).
- Enterprise support: +15-30% on the bill if you want a decent SLA.
Result: your monthly bill of €5,000 can quickly become €8,000 without you changing anything.
Source: MPT Solutions (2025), "The Hidden Infrastructure Cost of Running Local LLMs vs Cloud APIs"
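The output-token effect is easy to underestimate, so here is a small sketch. Only the 4x multiplier comes from the list above (its upper bound); the unit price and the monthly volume are hypothetical, not any provider's actual rates, and `monthly_token_cost` is our own helper.

```python
def monthly_token_cost(input_tokens: float, output_tokens: float,
                       input_price_per_m: float, output_multiplier: float) -> float:
    """Monthly bill when output tokens cost `output_multiplier` x input tokens."""
    output_price_per_m = input_price_per_m * output_multiplier
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


INPUT_PRICE = 2.0       # hypothetical: EUR per million input tokens
TOKENS = 1_000_000_000  # hypothetical: 1 billion tokens per month in total

for output_share in (0.2, 0.5):
    out = TOKENS * output_share
    cost = monthly_token_cost(TOKENS - out, out, INPUT_PRICE, output_multiplier=4)
    print(f"{output_share:.0%} output: EUR {cost:,.0f}/month")
```

Same total volume, same prices: moving from 20% to 50% output tokens takes this hypothetical bill from EUR 3,200 to EUR 5,000 per month.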
Local infrastructure isn't free either
Let's be honest. Local infrastructure costs too. Here's what you really need to count:
- Hardware: €40,000 to €80,000 for a decent config (2-4 professional GPUs like L40S or A100)
- Electricity: ~€5,000-10,000/year depending on usage
- Cooling: if your server room is poorly ventilated, plan for AC: €2,000-5,000/year.
- Maintenance: Hardware refresh every 3-5 years
- Personnel: Either you already have IT engineers, or you need to train/recruit
But once amortized (18-24 months), your marginal cost per query is negligible compared to cloud.
And most importantly: you're in control. No surprises, no dependency, no unexpected (and often brutal) increases.
Because there will be increases: large LLM services like ChatGPT currently return only $3.50 for every $5 invested!
The hybrid model (the real smart idea)
We're not going to lie: 100% local isn't always optimal. We have our challenges too.
Hybrid, real agility:
- Simple and recurring tasks (FAQ, classification, extraction) → Local 7B-13B model. Marginal cost almost zero.
- Complex and occasional tasks (multi-step reasoning, creativity) → Cloud API if needed. You only pay for the exceptional.
- Sensitive data → Local, always. Non-negotiable.
- Load spikes → Burst to cloud if your local infrastructure saturates. But it's the exception, not the rule.
Result: you combine the best of both worlds. Controlled costs, optimal performance, preserved sovereignty.
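The four rules above can be sketched as a small routing function. This is an illustration of the decision order, not our production code: the `Request` fields, the task labels and the `local_saturated` flag are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass
class Request:
    task: str                # e.g. "faq", "classification", "reasoning"
    sensitive: bool = False  # contains data that must stay on-premise


SIMPLE_TASKS = {"faq", "classification", "extraction"}


def route(request: Request, local_saturated: bool = False) -> Target:
    """Pick a backend following the four rules listed above."""
    if request.sensitive:
        return Target.LOCAL  # sensitive data never leaves, even under load
    if request.task not in SIMPLE_TASKS:
        return Target.CLOUD  # complex, occasional work
    if local_saturated:
        return Target.CLOUD  # burst only when local capacity is full
    return Target.LOCAL      # default for simple, recurring tasks


print(route(Request("faq")))
print(route(Request("reasoning", sensitive=True)))
```

The sensitivity check comes first on purpose: a sensitive request stays local even when it is complex or the local servers are saturated, which is what "non-negotiable" means in practice.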
AEGIS IA doesn't pretend to be the little green man
We don't claim to save the planet.
What we do:
- We deploy local infrastructure sized for your needs (no unnecessary over-equipment)
- We optimize models to reduce consumption per query (quantization, distillation if relevant)
- We help you calculate your real TCO (cloud vs local) with honest numbers
- We use the local electricity mix (in France = low carbon)
- We avoid waste: no GPU running idle 90% of the time
That's it. No carbon offset certificates. Just efficient infrastructure that consumes what it should consume, no more.
Academic sources and references
Our sources:
- Jegham et al. (2025): "How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference", arXiv:2505.09598 — Comparative study of 30 commercial LLMs
- Patterson et al. (2021): "Carbon Emissions and Large Neural Network Training", arXiv:2104.10350 — Analysis of LLM training carbon footprint
- Lenovo Press (2025): "On-Premise vs Cloud: Generative AI Total Cost of Ownership" — Detailed TCO analysis with break-even points
- Scientific Reports (2024): "Reconciling the contrasting narratives on the environmental impact of large language models", Nature — LLM vs human work comparison
- Strubell et al. (2019): "Energy and Policy Considerations for Deep Learning in NLP", ACL — First major study on LLM energy consumption
- Venditti B. (2025): "Ranked: Electricity Use Per Capita in Major Global Economies"
Conclusion: do the math yourself!
Local AI isn't always the solution. But for 80% of companies with stable and predictable usage, it's economically and ecologically more viable than cloud over 2-3 years.
If you're spending more than €2,000/month on cloud APIs, it's high time to look into this.
We can help: no smoke and mirrors, just real numbers and a transparent TCO.
