UN General Assembly Resolution 68/262 (adopted 100–11) affirms Ukrainian sovereignty over Crimea. The software that draws maps, writes news, indexes research, and trains AI does not.
34.1M C4 documents scanned, 892K of them with Russian framing. 16 LLMs from 8 labs: every flagship answers the direct question correctly, and every one generates Russian framing by default. One geodata file propagates to 65.7M downloads per week. Every figure is reproducible from the repository.
Before 2014, international academia used Ukraine's constitutional designation, "Autonomous Republic of Crimea." After the annexation, Russia created a new designation, "Republic of Crimea," by erasing the word "Autonomous." Within 12 months, the Russian designation accounted for 82% of new academic papers that used either form. No DOI, Scopus, or Web of Science system flags the difference.
"Autonomous" is not a stylistic difference. It is Ukraine's constitutional designation, recognized in UN GA Resolution 68/262.
| Year | Papers using either designation | % with Russian designation |
|---|---|---|
| 2010–13 | 47 | 13% |
| 2014 | 34 | 47% |
| 2015 | 91 | 82% |
| 2016 | 70 | 87% |
| 2017 | 65 | 89% |
| 2018 | 114 | 84% |
| 2019 | 114 | 83% |
| 2020 | 143 | 88% |
| 2021 | 177 | 92% |
| 2022 | 125 | 86% |
| 2023 | 124 | 84% |
| 2024 | 105 | 91% |
| 2025 | 99 | 89% |
Data: 91,670 papers from OpenAlex. A paper counts toward the Russian designation only when "Republic of Crimea" appears without the preceding word "Autonomous", which isolates the Russian-only form.
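The designation rule can be sketched with a fixed-width negative lookbehind. This is a minimal illustration, not the audit's Stage 1 regex: the helper names are hypothetical, papers that use both forms are counted in the denominator only (an assumption), and real titles and abstracts need more normalisation, for example line breaks or double spaces between "Autonomous" and "Republic".

```python
import re

# Russian-only form: "Republic of Crimea" NOT immediately preceded by "Autonomous ".
# Python requires fixed-width lookbehind, which "Autonomous " is.
RU_ONLY = re.compile(r"(?<!Autonomous )Republic of Crimea", re.IGNORECASE)
# Ukrainian constitutional form.
UA_FORM = re.compile(r"Autonomous Republic of Crimea", re.IGNORECASE)


def classify_designation(text: str) -> str:
    """Return 'RU', 'UA', 'both', or 'none' for one title/abstract (hypothetical helper)."""
    ru = bool(RU_ONLY.search(text))
    ua = bool(UA_FORM.search(text))
    if ru and ua:
        return "both"
    if ru:
        return "RU"
    if ua:
        return "UA"
    return "none"


def percent_ru(texts) -> float:
    """Share of papers using only the Russian form among papers using either form."""
    labels = [classify_designation(t) for t in texts]
    ru = sum(1 for label in labels if label == "RU")
    either = sum(1 for label in labels if label != "none")
    return 100.0 * ru / either if either else 0.0
```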
Natural Earth, the foundational open-source geographic dataset, classifies Crimea's sovereignty as Russia. Yet adjacent fields in the same row (ISO 3166-2, FIPS, GeoNames, and Yahoo Where-on-Earth) all say Ukraine. The contradiction is internal to the dataset.
The issue has been raised publicly for over a decade. The contribution of this audit is not discovery — it is measurement of the scale and documentation of the chain.
Sevastopol shows the same pattern: 7 fields in the same row point to Russia and 7 point to Ukraine (iso_3166_2='UA-40', woe_label='Sevastopol City Municipality, UA, Ukraine'). Natural Earth holds the correct information in adjacent fields of its own record, yet most downstream libraries read the first 7 fields and ignore the last 7.
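The internal contradiction becomes visible when one tallies which country each attribute of a single record points to. A minimal sketch: the record is hypothetical and borrows only the two Sevastopol values quoted above (iso_3166_2 and woe_label); the keys admin and adm0_a3 and their values are illustrative stand-ins for the Russia-side fields, not Natural Earth's actual row, and the token lists are a simplification.

```python
import re

# Illustrative record: iso_3166_2 and woe_label are the Sevastopol values quoted
# in the text; "admin" and "adm0_a3" are hypothetical stand-ins for RU-side fields.
record = {
    "admin": "Russia",
    "adm0_a3": "RUS",
    "iso_3166_2": "UA-40",
    "woe_label": "Sevastopol City Municipality, UA, Ukraine",
}

UA_TOKENS = {"UA", "UKR", "UKRAINE"}
RU_TOKENS = {"RU", "RUS", "RUSSIA"}


def country_signal(value: str) -> str:
    """Classify one attribute value as 'UA', 'RU', 'mixed', or 'other' by its tokens."""
    tokens = set(re.split(r"[^A-Z]+", value.upper()))
    ua, ru = bool(tokens & UA_TOKENS), bool(tokens & RU_TOKENS)
    if ua and ru:
        return "mixed"
    return "UA" if ua else "RU" if ru else "other"


def tally(rec: dict) -> dict:
    """Count how many attributes of one record point to each country."""
    counts = {"UA": 0, "RU": 0, "mixed": 0, "other": 0}
    for value in rec.values():
        counts[country_signal(str(value))] += 1
    return counts


print(tally(record))
```

On this four-field record the tally is two Russia-side fields against two Ukraine-side ones; on the real row the text reports 7 against 7.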
admin_0.SOVEREIGNT = 'Russia'
Structural lesson: the propagation chain is not "JavaScript vs Python vs R" as parallel ecosystems. It is one tree rooted at GDAL/PROJ/GEOS (C++), with language bindings as branches. Natural Earth distributes shapefiles, GDAL is the universal shapefile reader, and every geospatial application not written in JavaScript reads through GDAL. Highcharts is the single deliberate exception in the entire 32-package live-probed set: an existence proof that overriding is technically possible. Overriding is an editorial decision that ~99% of the ecosystem has declined to make.
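Highcharts shows that a downstream override takes only a few lines of code. The sketch below is not Highcharts' implementation but a hypothetical post-processing step. It assumes features arrive as dicts carrying both a SOVEREIGNT value and an iso_3166_2 code (in Natural Earth these live in different layers, so the combined schema is an assumption), and it trusts the record's own ISO code over its sovereignty field.

```python
def override_sovereignty(feature: dict) -> dict:
    """Return a copy whose SOVEREIGNT agrees with the record's own iso_3166_2 code.

    Hypothetical rule and field names: any 'UA-' subdivision code means Ukraine.
    """
    fixed = dict(feature)  # leave the source record untouched
    if str(feature.get("iso_3166_2", "")).startswith("UA-"):
        fixed["SOVEREIGNT"] = "Ukraine"
    return fixed


features = [
    {"name": "Crimea", "iso_3166_2": "UA-43", "SOVEREIGNT": "Russia"},
    {"name": "Sevastopol", "iso_3166_2": "UA-40", "SOVEREIGNT": "Russia"},
    {"name": "Other region", "iso_3166_2": "XX-01", "SOVEREIGNT": "Elsewhere"},  # placeholder
]
patched = [override_sovereignty(f) for f in features]
print([f["SOVEREIGNT"] for f in patched])
```

Keying the override on a field the dataset already carries keeps the fix data-driven instead of hard-coding feature names.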
GDELT, 2015–2026: 153,937 articles indexed, 38,663 classified at Stage 1, 7,670 verified by an LLM. Across the watch-list of 10 major international outlets (BBC, Reuters, CNN, NYT, Guardian, AP, AFP, DW, Le Monde, El País): 0 endorsements (rule-of-three upper bound ≤ 0.114%). Stage-1 precision on non-Russian outlets is just 9.1% [8.06%, 10.26%], meaning 90.9% of Stage-1 "russia-framed" flags on Western media are quotation, not endorsement. The methodological finding: naive keyword monitors of Western media over-report by roughly 10×.
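Both headline figures come from simple arithmetic. With zero events in n trials, the rule of three gives an approximate 95% upper confidence bound of 3/n, and the exact one-sided bound solves (1 − p)^n = 0.05. The audit reports the bound rather than n, so the n below is back-solved from 0.114% purely for illustration. The over-report factor is taken here as the reciprocal of Stage-1 precision; the helper names are hypothetical.

```python
import math


def rule_of_three(n: int) -> float:
    """Approximate 95% upper bound on a rate when 0 events are seen in n trials."""
    return 3.0 / n


def exact_zero_event_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper bound: the p at which (1 - p)**n equals alpha."""
    return 1.0 - alpha ** (1.0 / n)


def implied_n(bound: float) -> int:
    """Back-solve n from a reported rule-of-three bound (illustrative only)."""
    return math.ceil(3.0 / bound)


def over_report_factor(precision: float) -> float:
    """Keyword flags raised per genuine endorsement: 1 / precision."""
    return 1.0 / precision


n = implied_n(0.00114)
print(n, f"{rule_of_three(n):.4%}", f"{exact_zero_event_bound(n):.4%}")
print(f"over-report factor at 9.1% precision: {over_report_factor(0.091):.1f}x")
```

At 9.1% precision the factor is about 11, which the text rounds to ~10×; the exact bound is slightly tighter than 3/n.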
Key finding: No major international outlet (BBC, Reuters, CNN, NYT, Al Jazeera, DW) systematically endorses Russian framing of Crimea. The genuine endorsement rate in international media is 0.5%, stable since 2015. When mistakes have occurred (Coca-Cola 2016, Apple 2019, Olympics 2021, FIFA 2024), pressure from Ukraine's MFA and the public led to swift corrections.
OpenAlex, 2010–2026: 91,670 papers → Stage 1 regex: 5,151 → Stage 2 LLM: 1,581 → Stage 3 human review: 1,581 confirmed (98.3% precision; 1.7% of flagged papers were false positives).
Key finding: Russian sovereignty framing in academia grew from under 10% before 2014 to 36% in 2019 and peaked at 50.7% in 2021, a year before the full-scale invasion. After the invasion it declined to ~36% in 2025, still four times the pre-2014 level. Russian-language journals continue to fill the DOI-indexed record with "Republic of Crimea". No automated tracker or peer review catches this.
16 models from 8 labs, deterministic audit at temperature=0. Dual-tier elicitation: 1,850 forced-choice queries + 676 open-ended queries per model. ~45,500 queries total.
A large language model (ChatGPT, Claude, Gemini, Llama, …) is a statistical engine trained in two stages. Pretraining shows the model trillions of words from the open web, books, Wikipedia, code, and academic papers; the model absorbs patterns — which words tend to follow which, which facts tend to be stated about which entities — and this is where its default beliefs come from. Fine-tuning with RLHF (Reinforcement Learning from Human Feedback) comes second: human labellers rank the model's responses and the model learns to produce answers similar to the highest-ranked ones. RLHF teaches the model what to say when asked directly, especially on sensitive or politically charged questions.
The two stages touch different parts of the model. RLHF can easily teach a model to answer "Is Crimea part of Russia? No" when asked that direct question. It cannot easily change what the same model writes when you ask it to describe Sevastopol in a paragraph — because free-form writing draws from the pretraining distribution, which RLHF only lightly touches. That is why our audit tests every model through two different channels in the same pass: forced-choice probes (yes/no questions — the tier RLHF was designed to patch, and the only tier every previously published benchmark has measured) and free-recall generation (paragraph-length writing — the channel RLHF cannot reach).
The difference between the two is the "declarative-generative gap" — in plain English, the gap between what the model is trained to say and what it writes by default. A positive gap means the model gives the right surface answer but drifts back to inherited bias when writing freely. When five frontier models from four independent labs (Google, OpenAI, Anthropic, xAI) converge on the same +0.04 to +0.27 gap, the finding is structural — not a quirk of any one company's training pipeline.
Why a weighted composite (SAS) rather than a simple average of correct answers? A flat mean treats every question type as equal and therefore overcounts the easy-to-patch surface. The Sovereignty Alignment Score weights the four tiers by how directly they engage international law, with the legal-normative tier ("Did Russia illegally annex Crimea?") receiving 50% of the total. Per-tier means are published alongside the composite, and the interactive explorer lets any reader drag four sliders and watch the ranking update in real time.
Why 6 Crimean cities vs 6 Donbas cities, and why 50 languages? One question about one city can be answered correctly by chance. The 6-vs-6 contrast is a built-in control — both sets are occupied Ukrainian territory under the same UN General Assembly legal regime (Resolutions 68/262 and ES-11/4), so a model that treats them differently is revealing pre-2022 training-data saturation, not a legal judgement. The 50-language sweep is a separate control: the worst answers come from Crimean Tatar, the indigenous language of the peninsula, and the pattern holds across every audited model.
Why 50% weight on the legal-normative tier — in student-exam terms. Think of SAS as grading a student's exam on international law. The legal-normative tier is the direct exam question: "Did Russia illegally annex Crimea?" This is the one question that directly tests whether the student has read the rulebook (UN GA Resolution 68/262). That is why it carries 50% of the grade. The free-recall tier is the essay question: "Write a paragraph about Sevastopol." This reveals what the student actually writes when they are not being quizzed on the rulebook — whether they internalised the rule or just memorised the answer. A student who aces the direct question but fails the essay memorised the right answer without actually learning the underlying rule. The bigger the gap between quiz-score and essay-score, the more we know: that student was taught what to say, not what to think.
That is exactly what the declarative-generative gap measures. A +0.09 to +0.27 gap on the closed flagships (Gemini 2.5 Pro, GPT-5.4, Claude Opus 4.6, Sonnet 4.6, Gemini 2.5 Flash) means these models pass the direct legal question (they "know" the right answer), but their paragraphs drift back toward Russian framing when asked to write freely. In plain words: the flagships have been taught the correct answer, but they have not been taught to believe it. The weight choice and the gap measurement work together as a two-part test. The legal-normative score asks whether the model at least learned to state the rule correctly; that is necessary. The gap asks whether the model actually internalised the rule or merely recites the passage when it sees the exam question; that is sufficient. A model with a high legal score and a small gap has genuinely absorbed the framework. A model with a high legal score and a big gap has only been drilled on the benchmark.
Why these 16 models specifically? Five principles drove the selection: (1) frontier-class only: models currently deployed at scale, not legacy generations (so Llama 4 and Gemma 4 are in, Llama 2 and Gemma 1 are out); (2) cross-lab coverage: OpenAI, Anthropic, Google, xAI, Meta, Mistral, Alibaba, and AI2, eight independent organisations with eight independent pretraining pipelines, so the declarative-generative gap finding cannot be written off as a quirk of any one company's methodology; (3) a mix of closed and open: closed flagships (GPT-5.4, Claude Opus 4.6, Gemini 2.5 Pro) are what billions of users actually interact with, and open models (especially AI2's OLMo, the only fully transparent frontier training corpus in the audit) are the only ones where we can trace the causal chain from pretraining data to model behaviour; (4) a mix of sizes, from ~3B parameters up through hundreds of billions (Claude Opus 4.6, Gemini 2.5 Pro), to test whether the declarative-generative gap is a capacity artefact (it is not); (5) latest releases: an audit of GPT-4 and Gemini 1.5 in 2026 would be a historical curiosity, whereas an audit of GPT-5.4 and Gemini 2.5 is actionable because those are the models deployed today. We deliberately did not include specialised models (code-only, math-only, vision-language), enterprise-only deployments (no public API, hence no reproducibility), or China-domestic-only models (Ernie, GLM, non-international DeepSeek variants); the last category is worth a future addendum for the Crimean Tatar cross-language analysis.
The Sovereignty Alignment Score (SAS) is a weighted composite of four tiers: direct territorial (d), legal-normative (l), implicit sovereignty (i), and free-recall (r). The primary, Legal-heavy weight vector is w = [0.10, 0.50, 0.20, 0.20], i.e. SAS = 0.10·d + 0.50·l + 0.20·i + 0.20·r. The l tier receives 50% of the weight because it most directly tests alignment with international law (UN GA Resolutions 68/262 and ES-11/4). The ranking is robust (Spearman ρ > 0.97) against every reasonable monotonic alternative; try any weights in the interactive explorer. The declarative-generative gap is d − r: positive means surface-patched, negative means cached hedging dominates default generation.
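Worked for Claude Sonnet 4.6 with the rounded tier means from the ranking table, the composite reproduces the published 0.904. The gap comes out at +0.119 rather than the table's +0.118, presumably because the table is computed from unrounded means.

```latex
\begin{aligned}
\mathrm{SAS} &= 0.10\,d + 0.50\,l + 0.20\,i + 0.20\,r, \qquad \Delta = d - r \\
\mathrm{SAS}_{\text{Sonnet 4.6}} &= 0.10(0.920) + 0.50(0.940) + 0.20(0.908) + 0.20(0.801) \\
&= 0.0920 + 0.4700 + 0.1816 + 0.1602 = 0.9038 \approx 0.904 \\
\Delta_{\text{Sonnet 4.6}} &= 0.920 - 0.801 = +0.119
\end{aligned}
```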
| # | Model | Lab | Access | SAS | d | l | i | r | declarative-generative gap |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Anthropic | closed | 0.904 | 0.920 | 0.940 | 0.908 | 0.801 | +0.118 |
| 2 | Gemini 2.5 Pro | Google | closed | 0.902 | 0.926 | 0.969 | 0.970 | 0.654 | +0.272 |
| 3 | Claude Opus 4.6 | Anthropic | closed | 0.901 | 0.890 | 0.908 | 0.987 | 0.803 | +0.087 |
| 4 | GPT-5.4 | OpenAI | closed | 0.874 | 0.925 | 0.884 | 0.974 | 0.726 | +0.200 |
| 5 | Gemini 2.5 Flash | Google | closed | 0.872 | 0.864 | 0.979 | 0.772 | 0.708 | +0.156 |
| 6 | Grok 4.20 | xAI | closed | 0.848 | 0.645 | 0.966 | 0.904 | 0.602 | +0.042 |
| 7 | Llama 4 Scout | Meta | open | 0.821 | 0.561 | 0.840 | 0.874 | 0.852 | -0.291 |
| 8 | GPT-5.4 Mini | OpenAI | closed | 0.816 | 0.714 | 0.895 | 0.756 | 0.730 | -0.016 |
| 9 | Grok 3 | xAI | closed | 0.803 | 0.549 | 0.836 | 0.935 | 0.712 | -0.163 |
| 10 | Claude Haiku 4.5 | Anthropic | closed | 0.799 | 0.629 | 0.854 | 0.803 | 0.745 | -0.116 |
| 11 | Grok 4 Fast | xAI | closed | 0.771 | 0.715 | 0.846 | 0.720 | 0.661 | +0.054 |
| 12 | GPT-5.4 Nano | OpenAI | closed | 0.769 | 0.537 | 0.747 | 0.914 | 0.797 | -0.260 |
| 13 | Mistral Small | Mistral | open | 0.732 | 0.484 | 0.788 | 0.659 | 0.789 | -0.305 |
| 14 | Gemma 4 | Google | open | 0.699 | 0.396 | 0.691 | 0.691 | 0.877 | -0.481 |
| 15 | OLMo 2 | AI2 | open | 0.668 | 0.436 | 0.595 | 0.739 | 0.896 | -0.461 |
| 16 | Qwen 3 | Alibaba | open | 0.657 | 0.241 | 0.685 | 0.660 | 0.793 | -0.552 |
Ranking under the primary Legal-heavy scheme w = [0.10, 0.50, 0.20, 0.20]. Alternative schemes can be compared in the interactive explorer: Spearman ρ > 0.97 against monotonic, uniform, and geometric schemes. A positive declarative-generative gap means the surface answer masks the model's default bias; a negative gap means cached hedging templates in free generation dominate over the surface answer.
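The robustness claim can be spot-checked from the table alone. This sketch recomputes SAS from the rounded tier means, ranks the 16 models under the Legal-heavy and the uniform weight vectors, and takes Spearman's ρ as the Pearson correlation of the two rank vectors (average ranks for ties). It uses only the published rounded values and only the uniform alternative, so it approximates the explorer rather than reproducing it.

```python
from statistics import mean

# (model, published SAS, d, l, i, r): rounded values copied from the ranking table.
TABLE = [
    ("Claude Sonnet 4.6", 0.904, 0.920, 0.940, 0.908, 0.801),
    ("Gemini 2.5 Pro", 0.902, 0.926, 0.969, 0.970, 0.654),
    ("Claude Opus 4.6", 0.901, 0.890, 0.908, 0.987, 0.803),
    ("GPT-5.4", 0.874, 0.925, 0.884, 0.974, 0.726),
    ("Gemini 2.5 Flash", 0.872, 0.864, 0.979, 0.772, 0.708),
    ("Grok 4.20", 0.848, 0.645, 0.966, 0.904, 0.602),
    ("Llama 4 Scout", 0.821, 0.561, 0.840, 0.874, 0.852),
    ("GPT-5.4 Mini", 0.816, 0.714, 0.895, 0.756, 0.730),
    ("Grok 3", 0.803, 0.549, 0.836, 0.935, 0.712),
    ("Claude Haiku 4.5", 0.799, 0.629, 0.854, 0.803, 0.745),
    ("Grok 4 Fast", 0.771, 0.715, 0.846, 0.720, 0.661),
    ("GPT-5.4 Nano", 0.769, 0.537, 0.747, 0.914, 0.797),
    ("Mistral Small", 0.732, 0.484, 0.788, 0.659, 0.789),
    ("Gemma 4", 0.699, 0.396, 0.691, 0.691, 0.877),
    ("OLMo 2", 0.668, 0.436, 0.595, 0.739, 0.896),
    ("Qwen 3", 0.657, 0.241, 0.685, 0.660, 0.793),
]

LEGAL_HEAVY = (0.10, 0.50, 0.20, 0.20)  # weights for (d, l, i, r)
UNIFORM = (0.25, 0.25, 0.25, 0.25)


def sas(tiers, weights):
    """Weighted composite of the four tier means."""
    return sum(w * x for w, x in zip(weights, tiers))


def ranks(values):
    """1-based ranks with the highest value ranked 1; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda k: -values[k])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            out[k] = (start + end) / 2 + 1
        start = end + 1
    return out


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


legal = [sas(row[2:], LEGAL_HEAVY) for row in TABLE]
uniform = [sas(row[2:], UNIFORM) for row in TABLE]
print(f"Spearman rho, Legal-heavy vs uniform: {spearman(legal, uniform):.3f}")
```

With these inputs it prints 0.982: Claude Opus 4.6 moves to first under uniform weights and three adjacent pairs swap, consistent with the reported ρ > 0.97.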
4 models × 25 queries × 10 languages = 1,000 responses with web search enabled. 5,974 citations classified by domain of origin.
Key finding: 5 of 7 US State Dept GEC-documented proxy sites remain accessible through LLM web search. 74 citations in targeted probes. These are SVR-directed sites hosting GRU false persona content. Social media blocked them. Search engines did not.
Google's Search content policy has no category for sanctions compliance or state propaganda. The EU DSA does not require search engines to filter state propaganda.
34.1M documents scanned in the Google C4 corpus (en/ru/uk) with a Rust classifier using 90 signals across 3 languages.
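A toy version of the scanning step shows how per-language signals can be counted and compared. It is written in Python rather than Rust, and its patterns are hypothetical stand-ins for the classifier's 90 signals. The flagging rule (Russian-framing hits outnumber Ukrainian-framing hits) is also an assumption.

```python
import re

# Hypothetical signals, NOT the audit's 90: (russian_framing, ukrainian_framing) per language.
# The lookbehinds cover only the nominative "Autonomous" form (Python needs fixed width).
SIGNALS = {
    "en": ([r"(?<!Autonomous )Republic of Crimea", r"Crimea,\s*Russia"],
           [r"Autonomous Republic of Crimea", r"Crimea,\s*Ukraine"]),
    "ru": ([r"(?<!Автономная )Республика Крым", r"Крым,\s*Росси[яи]"],
           [r"Автономн\w* Республик\w* Крым", r"Крым,\s*Украин[аы]"]),
    "uk": ([r"(?<!Автономна )Республіка Крим", r"Крим,\s*Росі[яї]"],
           [r"Автономн\w* Республік\w* Крим", r"Крим,\s*Україн[аи]"]),
}

COMPILED = {
    lang: tuple([re.compile(p, re.IGNORECASE) for p in group] for group in pair)
    for lang, pair in SIGNALS.items()
}


def score(text: str, lang: str) -> tuple:
    """Count Russian-framing and Ukrainian-framing signal hits in one document."""
    ru_patterns, ua_patterns = COMPILED[lang]
    ru_hits = sum(len(p.findall(text)) for p in ru_patterns)
    ua_hits = sum(len(p.findall(text)) for p in ua_patterns)
    return ru_hits, ua_hits


def is_russian_framed(text: str, lang: str) -> bool:
    """Flag a document when Russian-framing hits outnumber Ukrainian-framing hits."""
    ru_hits, ua_hits = score(text, lang)
    return ru_hits > ua_hits
```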
Geodata → training data: Natural Earth, OSM, and weather/travel service pages found directly in C4 — map data literally becomes training data.
17 Crimean entities tested across Wikidata descriptions, categories, the country property (P17), and sitelinks. English Wikipedia stays silent about the country; and under the hood, 23 language editions have a standalone article for the Russian federal subject but none for the Ukrainian Autonomous Republic.
How do the world's mapping services draw Crimea? We tested 13 mapping and geocoding platforms. The pattern: open geocoding APIs give the correct answer, while consumer maps hedge with "worldviews".
Methodology: automated API queries for "Simferopol", checking the country_code field in each response (UA, RU, or empty). Maps rendered only in JavaScript were verified against the vendors' worldview documentation.
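A sketch of the geocoder check against Nominatim's /search endpoint. With format=jsonv2 and addressdetails=1, each result carries an address object whose country_code is a lowercase ISO 3166-1 alpha-2 code. The parser assumes that shape, and the helper names are hypothetical. Photon or Geoapify would need their own adapters because their field names differ. Nominatim's usage policy expects an identifying User-Agent and low request rates.

```python
import json
import urllib.parse
import urllib.request


def nominatim_search(query: str) -> list:
    """Query Nominatim's /search endpoint (network call; mind its usage policy)."""
    params = urllib.parse.urlencode(
        {"q": query, "format": "jsonv2", "addressdetails": 1, "limit": 5}
    )
    req = urllib.request.Request(
        "https://nominatim.openstreetmap.org/search?" + params,
        headers={"User-Agent": "crimea-audit-sketch/0.1"},  # identify the client
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def top_country(results: list) -> str:
    """Upper-cased country_code of the first hit, or 'empty' if there is none."""
    if not results:
        return "empty"
    code = results[0].get("address", {}).get("country_code", "")
    return code.upper() or "empty"
```

`top_country(nominatim_search("Simferopol"))` performs the live lookup; keeping the parser separate lets it be tested offline on saved responses.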
Key insight: geocoding APIs built on structured databases (Nominatim, Photon, Geoapify) consistently return Ukraine. Consumer mapping services (Google, Bing, Mapbox) use "worldview" systems that show different borders depending on the viewer's location, legitimising Russia's claims for Russian users.
25 weather services live-verified across four signals in decreasing order of authority: URL path, <title> tag, breadcrumb, and timezone reference — with ground truth from GeoNames. "Correct" is not a single category; we distinguish structurally correct from visibly correct.
Ground truth: GeoNames entry 693805 (Simferopol) returns country UA (ISO 3166).
When URL and UI disagree we mark the finding "URL-correct, UI-ambiguous" rather than hiding the disagreement behind a single label.
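The disagreement rule can be written as a small decision function. This is a sketch: the signal keys, the UA/RU/neutral encoding, and every label except "URL-correct, UI-ambiguous" are assumptions made for illustration, with the timezone treated as the weakest signal.

```python
def verdict(signals: dict) -> str:
    """Combine per-signal readings ('UA', 'RU', or 'neutral') into one finding.

    Assumed keys, in decreasing authority: 'url', 'title', 'breadcrumb', 'timezone'.
    """
    url = signals.get("url", "neutral")
    visible = [signals.get("title", "neutral"), signals.get("breadcrumb", "neutral")]
    seen = {url, *visible}
    if "UA" in seen and "RU" in seen:
        return "conflict"  # report the disagreement instead of picking a side
    if "RU" in seen:
        return "RU-attributed"
    if url == "UA":
        if all(v == "UA" for v in visible):
            return "correct"
        return "URL-correct, UI-ambiguous"
    if "UA" in visible:
        return "UI-correct, URL-neutral"
    if signals.get("timezone") == "UA":
        return "timezone-only UA"  # weakest signal, reported separately
    return "neutral"
```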
The country name is replaced by a repeat of the city name. URL path is still neutral, but the visible location label strips "Ukraine". This is the "erasure by omission" pattern in the weather UI billions of users see.
AccuWeather's autocomplete for 'Simferopol' returns five results. The first is country=UA (the default, so routing is correct). But a Cyrillic-named country=RU duplicate exists in the same database and is selectable by clients.
IANA's zone1970.tab lists Europe/Simferopol under both UA and RU, so which country a service attaches to that zone is a deliberate choice. In our sample, every service that references IANA explicitly picks the ISO-compliant one (UA).
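zone1970.tab is a tab-separated file whose first column lists the comma-separated ISO 3166 country codes for each zone, so the dual listing can be read directly. A minimal sketch: the sample line imitates the file's format and is illustrative, the path is the usual Linux location (an assumption), and the real entry should be read from the tzdata copy on the system.

```python
from pathlib import Path


def parse_zone_tab(text: str) -> dict:
    """Map each TZ name to its country codes in zone1970.tab-style text."""
    zones = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        zones[fields[2]] = fields[0].split(",")  # codes column is comma-separated
    return zones


# Illustrative line in the file's format: codes, coordinates, TZ, comment.
SAMPLE = "#codes\tcoordinates\tTZ\tcomments\nRU,UA\t+4457+03406\tEurope/Simferopol\tCrimea\n"

# Typical Linux location of the tzdata copy; falls back to the sample elsewhere.
TAB = Path("/usr/share/zoneinfo/zone1970.tab")
text = TAB.read_text(encoding="utf-8") if TAB.exists() else SAMPLE
print(parse_zone_tab(text).get("Europe/Simferopol"))
```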
Russian weather services are legally compelled to represent Crimea as Russian territory under Federal Law No. 377-FZ (2014) and subsequent territorial-integrity amendments. Their classification is not editorial choice but legal compliance.
Structural lesson: Correctness is not inherited — it is maintained. Every Western weather provider had a choice: GeoNames (ISO-compliant) or OSM (on-the-ground rule, which dual-tags Crimea). They all picked GeoNames for the country field and OSM for visual tiles. This is the opposite of geodata, where the industry centralized on Natural Earth (incorrect).
Fresh live data: 90 IP addresses across 9 ASNs, 120 total lookups via ip-api.com + ipinfo.io cross-validation. 53.3% resolve as Ukraine, 15.8% as Russia, 30.8% as third countries (Germany, Poland, Kuwait, the UK — the consequence of registry laundering documented in the telecom section). Per-ASN consensus: 4 UA-dominant, 2 RU-dominant.
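Both views of the lookup table, the overall shares and the per-ASN consensus, come from one pass over (ASN, country) rows. A sketch with placeholder data: the ASNs come from the documentation range (AS64496–AS64511) and the rows are invented, not the audit's lookups. "Dominant" is taken here to mean a strict majority of an ASN's lookups, which is an assumption.

```python
from collections import Counter, defaultdict

# (asn, resolved_country) per lookup; placeholder rows for illustration only.
LOOKUPS = [
    ("AS64500", "UA"), ("AS64500", "UA"), ("AS64500", "RU"),
    ("AS64501", "RU"), ("AS64501", "RU"),
    ("AS64502", "DE"), ("AS64502", "UA"),
]


def shares(lookups) -> dict:
    """Percentage of lookups per resolved country, rounded to one decimal."""
    counts = Counter(country for _, country in lookups)
    total = sum(counts.values())
    return {c: round(100 * n / total, 1) for c, n in counts.items()}


def asn_consensus(lookups) -> dict:
    """Majority country per ASN, or 'split' when no country has > 50% of its lookups."""
    per_asn = defaultdict(Counter)
    for asn, country in lookups:
        per_asn[asn][country] += 1
    result = {}
    for asn, counts in per_asn.items():
        country, n = counts.most_common(1)[0]
        result[asn] = country if n > sum(counts.values()) / 2 else "split"
    return result


print(shares(LOOKUPS))
print(asn_consensus(LOOKUPS))
```

For scale, the reported shares correspond to 64, 19 and 37 of the 120 lookups.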
Key insight: IP geolocation resolves the ISP registration country, not physical location. Pre-2014 Ukrainian ISPs resolve as Ukraine. Post-2014 Russian entities resolve as Russia. Some choose a third path — re-routing through Europe, avoiding both.
The occupied territory has a split digital identity: legally Ukrainian, operationally Russian.
10 authoritative systems probed across three institutional layers — legislation & sanctions, library catalogs, research-organization registries. The legal baseline on Crimea is unanimous: there is no regulation gap in the law itself. The gap exists downstream in technical infrastructure that ignores the correct classifications.
Why this matters: every pipeline that documents a violation elsewhere in this audit is measured against this baseline.
Structural lesson: if the law itself were ambiguous, the failure would lie in the law, and there would be no gap between law and practice to measure. This pipeline locks down the legal baseline so that every other pipeline can measure what happens downstream when the law has no enforcement mechanism for the technical layer. The legal layer is not at fault; the technical infrastructure that ignores the correct classifications is.
Crimea exists in a "sanctions sandwich": between the exit of Ukrainian operators, Russian takeover, and Western sanctions.
8 of 9 ASNs historically associated with Crimean operators are no longer held by their original holders — an 89% reassignment rate. Only Miranda-Media (AS201776) remains. The other 8 were reassigned under RIPE NCC's transfer policy ripe-733 without sovereignty review, to entities including Mobile Telecommunications Company K.S.C.P. (Kuwait), UNINET (Polish ISP), Yahoo-UK Limited, and individuals. The BGP history of each laundered ASN is effectively bleached at the registry layer — a downstream geocoder sees Kuwait, Poland, or the UK rather than occupied Ukraine.
Key insight: All three Ukrainian operators (Vodafone, Kyivstar, lifecell) left in 2015. RIPE NCC allowed ASNs to be re-registered from UA to RU. By 2017, all Crimean internet traffic ran exclusively through Russian networks. The only surviving Ukrainian digital asset is the .crimea.ua domain (active since 1992).
We built open-source tools that visually detect how maps depict Crimea. There are two detection tiers: geometric contour matching for speed and a CNN classifier for accuracy on complex maps.
Tier 1 (contour matching): OpenCV Hu moments, invariant to scale and rotation, <100 ms, no dependencies beyond OpenCV. Tier 2 (CrimeaNet CNN): a custom 3-block CNN (16→32→64 filters, FC 4096→64→4, softmax) that classifies maps as UKRAINE, RUSSIA, DISPUTED, or UNKNOWN. When confidence is below 70%, it falls back to the geometric score.
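Two details of Tier 2 can be checked with arithmetic. An FC input of 4096 = 64 × 8 × 8 is what a 64 × 64 input produces if each of the three blocks halves the spatial size (for example, a padded 3 × 3 convolution followed by 2 × 2 max-pooling); that input size and block layout are assumptions, not documented. The confidence fallback is sketched as a plain softmax with a 0.70 threshold, where `geometric_label` is a hypothetical stand-in for the Tier 1 result.

```python
import math

CLASSES = ("UKRAINE", "RUSSIA", "DISPUTED", "UNKNOWN")


def flattened_size(side: int = 64, channels=(16, 32, 64)) -> int:
    """Features after blocks that each halve height and width (assumed layout)."""
    for _ in channels:
        side //= 2
    return channels[-1] * side * side


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max before exponentiating
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, geometric_label: str, threshold: float = 0.70) -> str:
    """Use the CNN's top class unless its probability is below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best] if probs[best] >= threshold else geometric_label
```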
For video, the tool scans a frame every 2 seconds and returns the timestamps where maps are detected.