State of the field

[Chart] All 47 tools rated, plotted on a 0–10 scale against the ★ North Star of 10. Category averages: Regenerative 3.2, Sustainability 3.3, Ethics 3.6; overall 3.4. No commercial AI tool has rated above 4. The gap to the right of the cluster is the point.

How to read the ratings below

0–2: High extraction risk
2–4: Mixed to partial alignment
4–6: Meaningful progress
6–8: Strong alignment
8–10: Regenerative by design
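
A minimal sketch of this legend as a lookup, for anyone scripting against the ratings. The five band labels come straight from the list above; treating the bands as half-open intervals, with 10 included in the top band, is our assumption, since the legend does not specify boundary handling.

```python
# Band legend from "How to read the ratings below" (0-10 scale).
# Half-open intervals are an assumption; the legend leaves the
# boundaries (exactly 2.0, 4.0, ...) unspecified.
BANDS = [
    (0.0, 2.0, "High extraction risk"),
    (2.0, 4.0, "Mixed to partial alignment"),
    (4.0, 6.0, "Meaningful progress"),
    (6.0, 8.0, "Strong alignment"),
    (8.0, 10.0, "Regenerative by design"),
]

def band(score: float) -> str:
    """Map a 0-10 directory score to its legend label."""
    for lo, hi, label in BANDS:
        if lo <= score < hi or (score == 10.0 and hi == 10.0):
            return label
    raise ValueError(f"score out of range: {score}")

# Every commercial tool in the directory today (category averages
# of 3.2-3.6) lands in the second band.
assert band(3.4) == "Mixed to partial alignment"
```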

🔬 Peer Reviews

Two AI systems reviewed this directory independently. Here is what they said and how we responded.

This directory was submitted to Gemini (Google DeepMind) and Perplexity AI for independent peer review in 2026. Each was asked to verify factual claims, challenge scores, and flag blind spots. No AI was given the other's review. Below is a summary of key findings and our editorial response to each.
Gemini (Google DeepMind)
Reviewed February 2026  |  General LLM peer
Finding: ChatGPT score too generous given the systemic scale and "blast radius" of its ubiquity. Kenyan RLHF labeler exploitation (Time, 2023) is a documented supply chain harm not reflected in the Ethics score.
Response (Accepted): ChatGPT Ethics moved from 3.0 to 2.5. Risk level raised from Medium to High. Kenyan labeler case added to pitfalls. A sixth Ethics criterion (Labor and Supply Chain Equity) added to the rating methodology.

Finding: Claude Ethics at 4.5 may be too high given its closed-weights architecture, which limits community auditability in JEDIB and Indigenous contexts.
Response (Accepted): Claude Ethics moved from 4.5 to 4.0. Closed-weights concern added to pitfalls, specifically calling out implications for communities where control and auditability matter.

Finding: Water use not tracked. Data center cooling water is a material omission for a directory grounded in FSSD SP3 (ecological integrity).
Response (Accepted): Water use indicator added to all 33 tool cards and detail views. Rated on the same scale as energy use (Very Low to Very High). Now shown as a blue dot alongside the energy dot.

Finding: Suggested a "Least Extractive" label to highlight best-in-category tools by extraction footprint relative to value delivered.
Response (Accepted): Green heart badge applied to Claude, Mistral, Adobe Firefly, Cursor, and Notion AI: the tools in each category with the most favorable ethics-to-footprint profile.

Finding: Jevons Paradox concern: making AI more efficient may accelerate total consumption rather than reduce it. Suggested noting this in the methodology.
Response (Noted, not yet applied): A valid systemic concern. The directory rates individual tools, not macro consumption patterns. This will be addressed in a future "How to use AI responsibly" explainer section.
🔍 Perplexity AI
Reviewed May 2026  |  Search-grounded fact-check
Finding: Stable Diffusion profile described Stability AI as having experienced a "governance collapse." The actual 2024 events: CEO Mostaque resigned in March 2024, layoffs followed in April, and the company continues to operate under interim leadership.
Response (Accepted with modification): Language updated to reflect the specific 2024 events (resignation, restructuring, layoffs) rather than "collapse." Critical risk rating unchanged: the training data harms are independent of leadership structure.

Finding: Adobe Firefly contributor compensation is real and confirmed by 2024–2025 Adobe documentation, but bonuses are discretionary and formula-based. Some contributors report payouts that feel small relative to Firefly's commercial revenue.
Response (Accepted with modification): Contributor compensation confirmed as real. Added a pitfall noting the discretionary, formula-based nature of bonuses and creator community dissatisfaction. Ethics score of 4.5 maintained: the compensation model exists and is meaningfully better than no compensation.

Finding: WattTime coverage gaps pitfall is outdated. As of October 2024, coverage expanded to 210 countries and territories, and over 1 billion smart devices now use WattTime data.
Response (Accepted): Coverage pitfall updated to reflect the October 2024 global expansion. Function description and FSSD note updated with the 1 billion device milestone. Scores unchanged at 5.0 across the board: this finding strengthens the rating.

Finding: Papa Reo is now described by the WEF (2025) as a multilingual LLM, expanded beyond its initial te reo Māori focus. Masakhane is also confirmed active in 2025 with East African multilingual projects.
Response (Accepted): Papa Reo function updated to reflect the multilingual LLM expansion and WEF 2025 recognition. Pitfall updated to note expanding language coverage. Scores unchanged: the community governance model is the core differentiator.

Finding: No major ChatGPT data misuse scandals documented in 2024–2025. The primary concerns remain opacity about retention and GDPR alignment, not confirmed breaches.
Response (Noted, no change): Consistent with our current pitfall framing ("data may train future models unless you opt out"). The score reduction for ChatGPT was driven by Kenyan labeler exploitation (confirmed by Gemini's review), not data misuse claims.

Finding: Grok documented harms confirmed: antisemitic rhetoric, Holocaust denial errors, the "MechaHitler" incident, and deliberate "politically incorrect" system prompting by xAI in 2025.
Response (Confirmed, no change needed): Our High Risk rating and Ethics score of 1.5 are validated by this evidence. No score change required.

📋 How we rate AI tools

Ratings are based on publicly available information, published policies, academic research, and applied sustainability frameworks.

Ethics & Values Score (0–5)

Evaluates whether the tool is built and governed responsibly. Draws on AI ethics literature, UNESCO AI Ethics principles, and published governance disclosures.

1. Training data consent: Was the data used to train the model collected ethically and with consent? Were creators compensated or credited?
2. Governance and accountability: Is there clear published accountability? Are ethics commitments binding or marketing? Who holds the company to account?
3. Safety guardrails: Does the tool prevent harmful outputs by design? Are guardrails meaningful or cosmetic?
4. Privacy practices: How is user data collected, stored, and used? Is training on user data opt-in or opt-out? Is there enterprise isolation?
5. Equity and access: Is the tool accessible beyond wealthy, Western, and English-speaking organizations? Does it serve the people who need it most?
6. Labor and supply chain equity: Who does the invisible work? RLHF and content moderation rely heavily on low-paid workers in the Global South. Are these workers fairly compensated and protected from harm?
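
None of the three score sections publishes an aggregation formula. A minimal sketch in Python, assuming each criterion is scored 0–5 and the published score is their unweighted mean rounded to half-point steps; both are assumptions, and the per-criterion values below are hypothetical. The same shape would apply to the Sustainability and Regenerative rubrics.

```python
from statistics import mean

# Hypothetical per-criterion scores for an illustrative tool. The six
# keys mirror the Ethics & Values criteria above; equal weighting and
# half-point rounding are assumptions, not the directory's documented
# method.
ethics_criteria = {
    "training_data_consent": 2.0,
    "governance_and_accountability": 3.0,
    "safety_guardrails": 3.5,
    "privacy_practices": 2.5,
    "equity_and_access": 2.0,
    "labor_and_supply_chain_equity": 2.0,  # criterion added after Gemini's review
}

ethics_score = round(mean(ethics_criteria.values()) * 2) / 2
print(ethics_score)  # 2.5 for this hypothetical profile
```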

Sustainability Score (0–5): FSSD

Grounded in the Framework for Strategic Sustainable Development (Blekinge Institute of Technology). Asks whether the tool supports a society that can meet everyone's needs within planetary boundaries, now and in the future.

1. Ecological integrity: Does it reduce dependence on fossil fuels, synthetic chemicals, and physical degradation? What is its energy footprint? Water use for data center cooling is rated separately and shown on each card and in the detail view.
2. Social equity: Does it support fair access to resources and opportunity? Does it concentrate power or distribute it?
3. Purpose alignment: Is the core mission oriented toward sustainable outcomes, or is sustainability a marketing layer on a growth-first model?
4. Economic accessibility: Can organizations with limited resources use it? Is the pricing model equitable?
5. Long-term thinking: Does the organization publish and act on long-term sustainability commitments? Are they science-based?

Regenerative AI Score (0–5)

Based on Regenerative AI Cultures principles. Goes beyond "do no harm." Asks whether a tool actively gives back more than it takes to people and planet: centering marginalized voices, preserving cultural and biological diversity, supporting human dignity, and operating as a participant in community rather than a pure utility.

1. Net benefit: Does the tool create more value for people and planet than it extracts?
2. Community participation: Are affected communities involved in the tool's design, governance, and benefit-sharing?
3. Cultural and biological diversity: Does the tool support diverse languages, knowledge systems, and ways of knowing, or does it flatten them?
4. Human dignity: Does the tool protect rather than exploit the people who build, train, moderate, and use it?
5. Long-term thinking: Is the tool designed for sustained benefit or short-term extraction? Does its business model align with its stated values?
4.5–5.0: Genuinely regenerative design; gives back more than it takes
3.5–4.4: Regenerative signals; proceed thoughtfully
2.5–3.4: Neutral or extractive by default
0–2.4: Actively counter-regenerative

Framework: Regenerative AI Cultures (SRAGI). See also: Doughnut Economics Action Lab, Kate Raworth.

Energy & Water Use

Energy and water use ratings reflect the estimated footprint of using each tool at a typical organizational scale. Both are shown as color-coded dots on every card. Water use refers to data center cooling water consumption, which is a material but often invisible resource cost of AI inference.

E: Energy use. Rated from Very Low to Very High based on model size, infrastructure scale, and publicly available data center efficiency disclosures. Shown as a colored dot: deep green (Very Low) through dark red (Very High).
W: Water use. Rated from Very Low to Very High based on data center water usage effectiveness (WUE) data, infrastructure location, and cooling method. Shown as a blue dot: deep blue (Very Low) through dark red (Very High). A "Varies" rating applies to locally deployable models where footprint depends on the user's hardware and energy source.
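
A minimal sketch of the two-dot legend as data. Only the scale endpoints, the gradient descriptions (deep green or deep blue through dark red), and the "Varies" case come from the text above; the intermediate level names and all hex values are illustrative assumptions.

```python
# Five-step footprint scale; only the endpoints ("Very Low", "Very
# High") are named in the methodology text. The middle labels are
# assumptions.
FOOTPRINT_LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]

# Hex values are illustrative stand-ins for the described gradients.
ENERGY_DOT = {  # deep green (Very Low) through dark red (Very High)
    "Very Low": "#1b7a3d",
    "Low": "#7cb342",
    "Medium": "#f9a825",
    "High": "#e65100",
    "Very High": "#7f1d1d",
}

WATER_DOT = {  # deep blue (Very Low) through dark red (Very High)
    "Very Low": "#134e8f",
    "Low": "#42a5f5",
    "Medium": "#f9a825",
    "High": "#e65100",
    "Very High": "#7f1d1d",
    # Locally deployable models: footprint depends on the user's
    # hardware and energy source, so no fixed rating applies.
    "Varies": "#9e9e9e",
}

def dot_color(kind: str, level: str) -> str:
    """Return the display color for an energy ('E') or water ('W') dot."""
    palette = ENERGY_DOT if kind == "E" else WATER_DOT
    return palette[level]
```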

Source: Published data center WUE figures, Google Environmental Reports, Microsoft Sustainability Reports, AWS infrastructure disclosures, and academic literature on AI inference water consumption (Luccioni et al., 2023; Li et al., 2023).

Data Sovereignty

Data sovereignty asks who controls data about a community, who profits from it, and who decides how it is used. In AI, this is most acute for Indigenous communities whose language, cultural knowledge, and identity data have historically been taken without consent and used to train commercial systems that return no benefit to the source community.

C: Collective Benefit. Data ecosystems should work for the benefit of the communities that generated the data, not extract from them.
A: Authority to Control. Indigenous peoples and communities should govern data about their peoples, territories, and cultures.
R: Responsibility. Those who work with Indigenous data have a responsibility to nurture the communities from which data originates.
E: Ethics. Indigenous peoples' rights and wellbeing should be the primary concern at all stages of the data lifecycle.

Framework: CARE Principles for Indigenous Data Governance (Global Indigenous Data Alliance, 2019). Reference implementation: Te Hiku Media Kaitiakitanga License (Aotearoa New Zealand).

Risk Level

Low: Suitable for most professional and organizational use with standard due diligence.
Medium: Use with clear internal protocols, human oversight, and defined acceptable use policies.
High: Requires a strong governance framework and active risk management before deployment.
Critical: Not recommended for organizational use without expert oversight and strict controls. Documented active harms.

Sources and evidence

All ratings draw on: published terms of service and privacy policies, founder statements and investor disclosures, peer-reviewed academic research, investigative journalism, organizational sustainability reports, and direct product testing. Sources are cross-referenced where possible.

Ratings reflect information available as of May 2026. AI companies change policies frequently; always verify current documentation before organizational deployment.

Limitations

Important: These ratings are independent assessments and not legal or financial advice. Some company practices are difficult to verify independently. Scores reflect a synthesis of available evidence and involve judgment. We welcome corrections and updates; use the feedback button to flag anything that needs reviewing.