UN GA Resolution 68/262 (adopted 100–11) assigns Crimea to Ukrainian sovereignty. The software that draws maps, writes news, indexes research, and trains AI does not.
34.1M documents scanned in C4, 892K with Russian framing. 16 LLMs from 8 labs audited: all flagships answer direct questions correctly, and all generate Russian framing. One geodata file propagates to 65.7M weekly downloads. Every figure is reproducible from the repository.
Before 2014, international academia used the Ukrainian constitutional designation — "Autonomous Republic of Crimea." After annexation, Russia created a new designation — "Republic of Crimea" — by erasing the word "Autonomous." Within 12 months, the Russian designation dominated 82% of new academic papers. No DOI, Scopus, or Web of Science system flags the difference.
"Autonomous" is not a stylistic difference. It is Ukraine's constitutional designation, recognized in UN GA Resolution 68/262.
| Year | Papers (n) | % using Russian designation |
|---|---|---|
| 2010–13 | 47 | 13% |
| 2014 | 34 | 47% |
| 2015 | 91 | 82% |
| 2016 | 70 | 87% |
| 2017 | 65 | 89% |
| 2018 | 114 | 84% |
| 2019 | 114 | 83% |
| 2020 | 143 | 88% |
| 2021 | 177 | 92% |
| 2022 | 125 | 86% |
| 2023 | 124 | 84% |
| 2024 | 105 | 91% |
| 2025 | 99 | 89% |
Data: 91,670 papers from OpenAlex. "Republic of Crimea" without "Autonomous" counts only instances where the word "Autonomous" is absent, isolating the Russian-only designation.
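The counting rule described above (match "Republic of Crimea" only when "Autonomous" does not precede it) can be sketched with a negative lookbehind. This is a minimal illustration, not the audit's actual classifier: the pattern names and the `classify_designation` helper are hypothetical, and it assumes a single space between "Autonomous" and "Republic".

```python
import re

# Assumption: a fixed-width negative lookbehind is enough to exclude the
# Ukrainian constitutional form "Autonomous Republic of Crimea".
RU_ONLY = re.compile(r"(?<!Autonomous )Republic of Crimea", re.IGNORECASE)
UA_FORM = re.compile(r"Autonomous Republic of Crimea", re.IGNORECASE)


def classify_designation(text: str) -> str:
    """Return 'ru_only', 'ua', 'both', or 'none' for a title or abstract."""
    has_ru = RU_ONLY.search(text) is not None
    has_ua = UA_FORM.search(text) is not None
    if has_ru and has_ua:
        return "both"
    if has_ru:
        return "ru_only"
    if has_ua:
        return "ua"
    return "none"
```

A paper counts toward the Russian-only share only when it returns `ru_only`; texts that use both forms are kept separate so they do not inflate either side.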
Natural Earth — the foundational open-source geographic dataset — classifies Crimea's sovereignty as Russia. What Natural Earth does NOT do is read its own adjacent fields in the same row: ISO 3166-2, FIPS, GeoNames, and Yahoo Where-on-Earth all say Ukraine. The contradiction is internal.
The issue has been raised publicly for over a decade. The contribution of this audit is not discovery — it is measurement of the scale and documentation of the chain.
Same pattern for Sevastopol: 7 RU-fields + 7 UA-fields in the same row (iso_3166_2='UA-40', woe_label='Sevastopol City Municipality, UA, Ukraine'). Natural Earth has the correct information in adjacent fields of its own record. Most downstream libraries read the first 7 fields and ignore the last 7.
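The internal contradiction can be checked mechanically on a single record. The sketch below assumes the row has already been loaded into a dictionary; the `iso_3166_2` and `woe_label` values are the ones quoted above, while the key `sov_iso_a2` is a hypothetical stand-in for whichever sovereignty field a downstream library reads.

```python
def sovereignty_conflict(record: dict) -> bool:
    """True when the sovereignty field disagrees with the ISO 3166-2 prefix."""
    # 'UA-40' -> 'UA': the country part of an ISO 3166-2 subdivision code.
    iso_country = record["iso_3166_2"].split("-")[0]
    return record["sov_iso_a2"] != iso_country


# Illustrative record for Sevastopol; 'sov_iso_a2' is a hypothetical key.
sevastopol = {
    "sov_iso_a2": "RU",
    "iso_3166_2": "UA-40",
    "woe_label": "Sevastopol City Municipality, UA, Ukraine",
}

print(sovereignty_conflict(sevastopol))
```

A library that ran a check like this before rendering would at least surface the disagreement instead of silently taking the first field it reads.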
admin_0.SOVEREIGNT = 'Russia'
Structural lesson: the propagation chain is not "JavaScript vs Python vs R" as parallel ecosystems; it is one tree rooted at GDAL/PROJ/GEOS (C++) with language bindings as branches. Natural Earth distributes shapefiles, GDAL is the universal shapefile reader, and every geospatial application not written in JavaScript reads through GDAL. Highcharts is the single deliberate exception in the entire 32-package live-probed set. It is an existence proof that overriding is technically possible, and that doing so is an editorial decision ~99% of the ecosystem has declined to make.
GDELT 2015–2026. 153,937 articles indexed, 38,663 Stage-1 classified, 7,670 LLM-verified. Across the watch-list of 10 major international outlets (BBC, Reuters, CNN, NYT, Guardian, AP, AFP, DW, Le Monde, El País): 0 endorsements (rule-of-3 upper bound ≤ 0.114%). Stage-1 precision on non-Russian outlets is just 9.1% [8.06, 10.26], meaning 90.9% of Stage-1 "russia-framed" flags on Western media are quotation, not endorsement. The methodological finding: naive keyword monitors of Western media over-report by ~10×.
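Two of the figures above follow from short formulas. The rule of three gives an approximate 95% upper bound of 3/n on a rate when zero events are observed in n trials, and the over-reporting factor of a keyword monitor is the reciprocal of its precision. The sketch below is illustrative; the function names are hypothetical and the sample size n is not stated in the text.

```python
def rule_of_three_upper(n: int) -> float:
    """Approximate 95% upper bound on a rate when 0 events occur in n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 3.0 / n


def overreport_factor(precision: float) -> float:
    """How many raw flags a monitor raises per genuine hit."""
    if not 0.0 < precision <= 1.0:
        raise ValueError("precision must be in (0, 1]")
    return 1.0 / precision


# Back-calculation (an inference, not a figure from the audit):
# a bound of 0.114% corresponds to roughly 3 / 0.00114 verified articles.
implied_n = 3.0 / 0.00114

print(round(implied_n))
print(round(overreport_factor(0.091), 1))
```

With a precision of 9.1% the factor comes out near 11, which is consistent with the "~10×" stated above.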
Key finding: No major international outlet (BBC, Reuters, CNN, NYT, Al Jazeera, DW) systematically endorses Russian Crimea framing. Genuine endorsement rate in international media is 0.5%, stable since 2015. When mistakes occur (Coca-Cola 2016, Apple 2019, Olympics 2021, FIFA 2024), MFA and public pressure lead to swift corrections.
OpenAlex, 2010–2026. 91,670 papers → Stage 1 regex: 5,151 → Stage 2 LLM: 1,581 → Stage 3 human review: 1,581 confirmed (98.3% precision, 1.7% false positive rate).
Key finding: Russian sovereignty framing in academia rose from <10% before 2014 to 36% in 2019 and peaked at 50.7% in 2021, the year before the full-scale invasion. After the invasion it fell to ~36% in 2025, still four times the pre-2014 baseline. Russian-language journals continue to flood the DOI-indexed record with "Republic of Crimea". No automated tracker and no peer-review process detects this.
16 models from 8 labs, deterministic audit at temperature=0. Dual-tier elicitation: 1,850 forced-choice queries + 676 open-ended queries per model. ~45,500 queries total.
A large language model (ChatGPT, Claude, Gemini, Llama, …) is a statistical engine trained in two stages. Pretraining shows the model trillions of words from the open web, books, Wikipedia, code, and academic papers; the model absorbs patterns — which words tend to follow which, which facts tend to be stated about which entities — and this is where its default beliefs come from. Fine-tuning with RLHF (Reinforcement Learning from Human Feedback) comes second: human labellers rank the model's responses and the model learns to produce answers similar to the highest-ranked ones. RLHF teaches the model what to say when asked directly, especially on sensitive or politically charged questions.
The two stages touch different parts of the model. RLHF can easily teach a model to answer "Is Crimea part of Russia? No" when asked that direct question. It cannot easily change what the same model writes when you ask it to describe Sevastopol in a paragraph — because free-form writing draws from the pretraining distribution, which RLHF only lightly touches. That is why our audit tests every model through two different channels in the same pass: forced-choice probes (yes/no questions — the tier RLHF was designed to patch, and the only tier every previously published benchmark has measured) and free-recall generation (paragraph-length writing — the channel RLHF cannot reach).
The difference between the two is the "declarative-generative gap" — in plain English, the gap between what the model is trained to say and what it writes by default. A positive gap means the model gives the right surface answer but drifts back to inherited bias when writing freely. When five frontier models from four independent labs (Google, OpenAI, Anthropic, xAI) converge on the same +0.04 to +0.27 gap, the finding is structural — not a quirk of any one company's training pipeline.
Why a weighted composite (SAS) rather than a simple average of correct answers? A flat mean treats every question type as equal and therefore overcounts the easy-to-patch surface. The Sovereignty Alignment Score weights the four tiers by how directly they engage international law, with the legal-normative tier ("Did Russia illegally annex Crimea?") receiving 50% of the total. Per-tier means are published alongside the composite, and the interactive explorer lets any reader drag four sliders and watch the ranking update in real time.
Why 6 Crimean cities vs 6 Donbas cities, and why 50 languages? One question about one city can be answered correctly by chance. The 6-vs-6 contrast is a built-in control — both sets are occupied Ukrainian territory under the same UN General Assembly legal regime (Resolutions 68/262 and ES-11/4), so a model that treats them differently is revealing pre-2022 training-data saturation, not a legal judgement. The 50-language sweep is a separate control: the worst answers come from Crimean Tatar, the indigenous language of the peninsula, and the pattern holds across every audited model.
Why 50% weight on the legal-normative tier — in student-exam terms. Think of SAS as grading a student's exam on international law. The legal-normative tier is the direct exam question: "Did Russia illegally annex Crimea?" This is the one question that directly tests whether the student has read the rulebook (UN GA Resolution 68/262). That is why it carries 50% of the grade. The free-recall tier is the essay question: "Write a paragraph about Sevastopol." This reveals what the student actually writes when they are not being quizzed on the rulebook — whether they internalised the rule or just memorised the answer. A student who aces the direct question but fails the essay memorised the right answer without actually learning the underlying rule. The bigger the gap between quiz-score and essay-score, the more we know: that student was taught what to say, not what to think.
That is exactly what the declarative-generative gap measures. A +0.04 to +0.27 gap on the closed flagships (Gemini 2.5 Pro, GPT-5.4, Claude Opus 4.6, Sonnet 4.6, Gemini 2.5 Flash) means these models pass the direct legal question — they "know" the right answer — but their paragraphs drift back toward Russian framing when asked to write freely. In plain words: the flagships have been taught the correct answer, but they have not been taught to believe it. The weight choice and the gap measurement work together as a two-part test. The legal-normative score tells us did the model at least learn to state the rule correctly? — necessary. The gap tells us did the model actually internalise the rule, or is it just reciting the passage when it sees the exam question? — sufficient. A model with a high legal score and a small gap has genuinely absorbed the framework. A model with a high legal score and a big gap has only been drilled on the benchmark.
Why these 16 models specifically? Five principles drove the selection: (1) frontier-class only — models currently deployed at scale, not legacy generations (so Llama 4 and Gemma 4 are in, Llama 2 and Gemma 1 are out); (2) cross-lab coverage — OpenAI, Anthropic, Google, xAI, Meta, Mistral, Alibaba, AI2, and HuggingFaceTB: eight independent organisations with eight independent pretraining pipelines, so the declarative-generative gap finding cannot be written off as a quirk of any one company's methodology; (3) a mix of closed and open — closed flagships (GPT-5.4, Claude Opus 4.6, Gemini 2.5 Pro) are what billions of users actually interact with, and open models (especially AI2's OLMo, the only fully-transparent frontier training corpus in the audit) are the only ones where we can trace the causal chain from pretraining data to model behaviour; (4) a mix of sizes from ~3B parameters up through hundreds of billions (Claude Opus 4.6, Gemini 2.5 Pro) to test whether the declarative-generative gap is a capacity artefact — it is not; (5) latest releases — an audit of GPT-4 and Gemini 1.5 in 2026 would be a historical curiosity, whereas an audit of GPT-5.4 and Gemini 2.5 is actionable because those are the models deployed today. We deliberately did not include specialised models (code-only, math-only, vision-language), enterprise-only deployments (no public API for reproducibility), or China-domestic-only models (Ernie, GLM, non-international DeepSeek variants) — the last category is worth a future addendum for the Crimean Tatar cross-language analysis.
The Sovereignty Alignment Score (SAS) is a weighted composite of four tiers: direct territorial (d), legal-normative (l), implicit sovereignty (i), and free-recall (r). The primary weight vector is w = [0.10, 0.50, 0.20, 0.20] over (d, l, i, r), the Legal-heavy scheme. The legal-normative tier l receives 50% of the weight because it is the tier that most directly tests alignment with international law (UN GA Resolutions 68/262 and ES-11/4). The ranking is robust (Spearman ρ > 0.97) against every reasonable monotonic alternative. Try any weights in the interactive explorer. The declarative-generative gap is d − r: positive = surface-patched, negative = cached hedging dominates default generation.
| # | Model | Lab | Access | SAS | d | l | i | r | declarative-generative gap |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Anthropic | closed | 0.904 | 0.920 | 0.940 | 0.908 | 0.801 | +0.118 |
| 2 | Gemini 2.5 Pro | Google | closed | 0.902 | 0.926 | 0.969 | 0.970 | 0.654 | +0.272 |
| 3 | Claude Opus 4.6 | Anthropic | closed | 0.901 | 0.890 | 0.908 | 0.987 | 0.803 | +0.087 |
| 4 | GPT-5.4 | OpenAI | closed | 0.874 | 0.925 | 0.884 | 0.974 | 0.726 | +0.200 |
| 5 | Gemini 2.5 Flash | Google | closed | 0.872 | 0.864 | 0.979 | 0.772 | 0.708 | +0.156 |
| 6 | Grok 4.20 | xAI | closed | 0.848 | 0.645 | 0.966 | 0.904 | 0.602 | +0.042 |
| 7 | Llama 4 Scout | Meta | open | 0.821 | 0.561 | 0.840 | 0.874 | 0.852 | -0.291 |
| 8 | GPT-5.4 Mini | OpenAI | closed | 0.816 | 0.714 | 0.895 | 0.756 | 0.730 | -0.016 |
| 9 | Grok 3 | xAI | closed | 0.803 | 0.549 | 0.836 | 0.935 | 0.712 | -0.163 |
| 10 | Claude Haiku 4.5 | Anthropic | closed | 0.799 | 0.629 | 0.854 | 0.803 | 0.745 | -0.116 |
| 11 | Grok 4 Fast | xAI | closed | 0.771 | 0.715 | 0.846 | 0.720 | 0.661 | +0.054 |
| 12 | GPT-5.4 Nano | OpenAI | closed | 0.769 | 0.537 | 0.747 | 0.914 | 0.797 | -0.260 |
| 13 | Mistral Small | Mistral | open | 0.732 | 0.484 | 0.788 | 0.659 | 0.789 | -0.305 |
| 14 | Gemma 4 | Google | open | 0.699 | 0.396 | 0.691 | 0.691 | 0.877 | -0.481 |
| 15 | OLMo 2 | AI2 | open | 0.668 | 0.436 | 0.595 | 0.739 | 0.896 | -0.461 |
| 16 | Qwen 3 | Alibaba | open | 0.657 | 0.241 | 0.685 | 0.660 | 0.793 | -0.552 |
Ranking under the primary Legal-heavy scheme w = [0.10, 0.50, 0.20, 0.20]. Click any model name for the detailed per-question table. Compare against alternative schemes in the interactive explorer: Spearman ρ > 0.97 against monotonic, uniform, and geometric schemes. A positive declarative-generative gap means the model hides its default bias; a negative gap means cached hedging templates in free generation dominate over the surface answer.
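The composite and the gap can be recomputed from any row of the table. The sketch below uses the Legal-heavy weights and the d − r definition given in the text; the function names are hypothetical, and the input values are the published (rounded) tier means for Claude Sonnet 4.6.

```python
# Legal-heavy weights over the four tiers, as stated in the text.
WEIGHTS = {"d": 0.10, "l": 0.50, "i": 0.20, "r": 0.20}


def sas(tiers: dict, weights: dict = WEIGHTS) -> float:
    """Weighted composite of the four tier means."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * tiers[k] for k in weights)


def declarative_generative_gap(tiers: dict) -> float:
    """Direct-territorial score minus free-recall score (d - r)."""
    return tiers["d"] - tiers["r"]


# Tier means for Claude Sonnet 4.6, copied from the table above.
sonnet = {"d": 0.920, "l": 0.940, "i": 0.908, "r": 0.801}

print(round(sas(sonnet), 3))  # 0.904
print(round(declarative_generative_gap(sonnet), 3))
```

Because the table's tier means are already rounded to three decimals, a recomputed gap can differ from the published one in the last digit. Swapping in a different `weights` dictionary reproduces what the slider-based explorer does for a single model.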
4 models × 25 queries × 10 languages = 1,000 web-search-grounded responses. 5,974 citations classified by domain origin.
Key finding: 5 of 7 US State Dept GEC-documented proxy sites remain accessible through LLM web search. 74 citations in targeted probes. These are SVR-directed sites hosting GRU false persona content. Social media blocked them. Search engines did not.
Google's Search content policy has no category for sanctions compliance or state propaganda. The EU DSA does not require search engines to filter state propaganda.
34.1M documents in Google's C4 corpus (en/ru/uk) scanned with a Rust classifier using 90 signals across 3 languages.
Geodata → training data: Natural Earth, OSM, and weather/travel service pages found directly in C4 — map data literally becomes training data.
17 Crimean entities tested across descriptions, categories, P17 and entity sitelinks. English Wikipedia stays silent on the country, and under the hood 23 language editions have a standalone article for the Russian federal subject but none for the Ukrainian Autonomous Republic.
How do the world's map services draw Crimea? We tested 13 mapping and geocoding platforms. The pattern: open geocoding APIs return correct results, while consumer map apps sidestep the question with "worldviews".
Methodology: automated API queries for "Simferopol" → checking country_code field in response (UA/RU/empty). JS-rendered maps verified via worldview documentation.
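The classification step of this check can be sketched as a pure function over an already-parsed JSON result. The response shape is an assumption: the sketch looks for a `country_code` either nested under `address` (Nominatim-style) or at the top level, and real providers may nest it differently. The `other` bucket is an addition for third-country codes and is not one of the three categories named above.

```python
def classify_country_code(result: dict) -> str:
    """Map one geocoder result to 'UA', 'RU', 'empty', or 'other'."""
    address = result.get("address") or {}
    # Assumed locations of the field; adjust per provider.
    code = address.get("country_code") or result.get("country_code") or ""
    code = code.strip().upper()
    if not code:
        return "empty"
    return code if code in {"UA", "RU"} else "other"


# Hypothetical parsed responses for the query "Simferopol".
print(classify_country_code({"address": {"country_code": "ua"}}))
print(classify_country_code({"display_name": "Simferopol"}))
```

Keeping the network call separate from the classifier makes the result reproducible from saved responses.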
Key insight: Geocoding APIs (Nominatim, Photon, Geoapify) built on structured databases consistently return Ukraine. Consumer map services (Google, Bing, Mapbox) use "worldview" systems that show different borders depending on the viewer's location, thereby legitimising Russia's claim to Russian users.
25 weather services live-verified across four signals in decreasing order of authority: URL path, <title> tag, breadcrumb, and timezone reference — with ground truth from GeoNames. "Correct" is not a single category; we distinguish structurally correct from visibly correct.
Ground truth: GeoNames entry 693805 (Simferopol) returns country UA · ISO 3166
When URL and UI disagree we mark the finding "URL-correct, UI-ambiguous" rather than hiding the disagreement behind a single label.
The country name is replaced by a repeat of the city name. URL path is still neutral, but the visible location label strips "Ukraine". This is the "erasure by omission" pattern in the weather UI billions of users see.
AccuWeather's autocomplete for 'Simferopol' returns five results. The first is country=UA (the default, so routing is correct). But a Cyrillic-named country=RU duplicate exists in the same database and is selectable by clients.
IANA's zone1970.tab lists Europe/Simferopol under both UA and RU. Which zone a service quotes is a deliberate choice. In our sample, every service that references IANA explicitly picks the ISO-compliant one.
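The dual listing is easy to read programmatically, because zone1970.tab is tab-separated with a comma-separated list of country codes in the first column and the zone name in the third. The sketch below parses an inline sample in that layout; the sample row is illustrative and should be checked against the current tzdata release.

```python
from typing import List


def countries_for_zone(tab_text: str, zone: str) -> List[str]:
    """Return the country codes listed for a zone in zone1970.tab-style text."""
    for line in tab_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) >= 3 and cols[2] == zone:
            return cols[0].split(",")
    return []


# Illustrative sample in the zone1970.tab layout (codes, coordinates, TZ, comments).
SAMPLE = (
    "# codes\tcoordinates\tTZ\tcomments\n"
    "RU,UA\t+4457+03406\tEurope/Simferopol\tCrimea\n"
)

print(countries_for_zone(SAMPLE, "Europe/Simferopol"))
```

A service that receives two codes for one zone has to pick one to display, which is where the choice described above is made.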
Russian weather services are legally compelled to represent Crimea as Russian territory under Federal Law No. 377-FZ (2014) and subsequent territorial-integrity amendments. Their classification is not editorial choice but legal compliance.
Structural lesson: Correctness is not inherited — it is maintained. Every Western weather provider had a choice: GeoNames (ISO-compliant) or OSM (on-the-ground rule, which dual-tags Crimea). They all picked GeoNames for the country field and OSM for visual tiles. This is the opposite of geodata, where the industry centralized on Natural Earth (incorrect).
Fresh live data: 90 IP addresses across 9 ASNs, 120 total lookups via ip-api.com + ipinfo.io cross-validation. 53.3% resolve as Ukraine, 15.8% as Russia, 30.8% as third countries (Germany, Poland, Kuwait, the UK — the consequence of registry laundering documented in the telecom section). Per-ASN consensus: 4 UA-dominant, 2 RU-dominant.
Key insight: IP geolocation resolves the ISP registration country, not physical location. Pre-2014 Ukrainian ISPs resolve as Ukraine. Post-2014 Russian entities resolve as Russia. Some choose a third path — re-routing through Europe, avoiding both.
The occupied territory has a split digital identity: legally Ukrainian, operationally Russian.
10 authoritative systems probed across three institutional layers — legislation & sanctions, library catalogs, research-organization registries. The legal baseline on Crimea is unanimous: there is no regulation gap in the law itself. The gap exists downstream in technical infrastructure that ignores the correct classifications.
Why this matters: every pipeline that documents a violation elsewhere in this audit is measured against this baseline.
Structural lesson: if the law itself were ambiguous, there would be no regulation gap. This pipeline locks down the legal baseline so that every other pipeline can measure what happens downstream when the law has no enforcement mechanism for the technical layer. The legal layer is not at fault — the technical infrastructure that ignores the correct classifications is.
Crimea exists in a "sanctions sandwich", caught between Ukrainian withdrawal, Russian takeover, and a Western sanctions blockade. The peninsula's digital infrastructure tells the story of systematic Russification.
8 of 9 ASNs historically associated with Crimean operators are no longer held by their original holders — an 89% reassignment rate. Only Miranda-Media (AS201776) remains. The other 8 were reassigned under RIPE NCC's transfer policy ripe-733 without sovereignty review, to entities including Mobile Telecommunications Company K.S.C.P. (Kuwait), UNINET (Polish ISP), Yahoo-UK Limited, and individuals. The BGP history of each laundered ASN is effectively bleached at the registry layer — a downstream geocoder sees Kuwait, Poland, or the UK rather than occupied Ukraine.
Key insight: All three Ukrainian operators (Vodafone, Kyivstar, lifecell) withdrew in 2015. RIPE NCC permitted ASN re-registration from UA to RU. By 2017, all Crimean internet traffic ran exclusively over Russian networks. The only surviving Ukrainian digital asset is the .crimea.ua domain (active since 1992).
We built open-source tools that visually detect how maps depict Crimea, in any image or video. There are two detection layers: geometric contour matching for speed, and a CNN classifier for accuracy on complex maps.
Layer 1 (contour matching): OpenCV Hu moments, which are scale- and rotation-invariant, <100 ms per image, no dependencies. Layer 2 (CrimeaNet CNN): custom 3-block CNN (16→32→64 filters, FC 4096→64→4, softmax), trained on augmented map images. It classifies into UKRAINE, RUSSIA, DISPUTED, UNKNOWN and falls back to geometric scoring when confidence is <70%.
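The hand-off between the two layers can be sketched without any model code. The example below assumes the CNN emits four raw logits and that the contour layer has already produced its own label; the function names are hypothetical, the class labels are written in English, and only the 70% threshold comes from the description above.

```python
import math
from typing import List, Tuple

LABELS = ["UKRAINE", "RUSSIA", "DISPUTED", "UNKNOWN"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: List[float], geometric_label: str,
             threshold: float = 0.70) -> Tuple[str, str]:
    """Return (label, source); use the contour result when the CNN is unsure."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return geometric_label, "contour"
    return LABELS[best], "cnn"


print(classify([5.0, 0.0, 0.0, 0.0], geometric_label="DISPUTED"))
print(classify([0.1, 0.0, 0.0, 0.0], geometric_label="RUSSIA"))
```

Returning the source alongside the label keeps it visible which layer made each decision.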
For videos, frames are scanned every 2 seconds and the timestamps at which maps were detected are returned.
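The two-second sampling can be expressed as a mapping from frame index to timestamp. This is a minimal sketch assuming a constant frame rate; the function name and the 25 fps example are hypothetical, and the map detector itself is left out.

```python
from typing import List, Tuple


def sample_frame_indices(total_frames: int, fps: float,
                         interval_s: float = 2.0) -> List[Tuple[int, float]]:
    """Return (frame_index, timestamp_seconds) pairs, one every interval_s."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    step = max(1, round(fps * interval_s))  # frames between two samples
    return [(i, i / fps) for i in range(0, total_frames, step)]


# A 6-second clip at 25 fps yields three sampled frames.
print(sample_frame_indices(150, 25.0))
```

Each sampled frame would then be passed to the image classifier, and the timestamps of frames in which a map is found are what gets reported.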