Inclusion rules for political entities #306

Closed
axelboc opened this issue May 2, 2020 · 48 comments · Fixed by #312
Labels
conception Scope of the deck, memorisation, contribution guidelines, etc.
Milestone

Comments

@axelboc
Collaborator

axelboc commented May 2, 2020

As per #137 (comment), I'm opening this issue to try to define a set of strict inclusion rules for political entities.

Sovereign states

This entity type has been well defined from the start thanks to Wikipedia's List of sovereign states. This list, sourced from the UN, includes 206 states. All 206 states are included in the deck, whether or not their sovereignty is disputed.

Dependent territories

This entity type was first suggested in #221 (comment). Quoting Wikipedia:

A dependent territory, dependent area or dependency is a territory that does not possess full political independence or sovereignty as a sovereign state yet remains politically outside the controlling state's integral area.

Wikipedia's list of dependent territories is quite extensive, as it includes uninhabited territories, only one of which is currently included in the deck (British Indian Ocean Territory). Adding all these territories is out of the question as per #137 (comment), so my recommendation is to refer to the list of inhabited dependent territories and remove British Indian Ocean Territory from the deck.

I identified in my audit that 36 of the 40 inhabited territories are already in the deck. The four that are not are Akrotiri and Dhekelia, Saint Pierre and Miquelon, Svalbard, and Wallis and Futuna. Adding these four and removing British Indian Ocean Territory would lead to a net addition of 3 notes to the deck.

Special overseas subdivisions

As per #137 (comment), countries' main subdivisions, like states and regions, are not to be included in the deck. Special subdivisions may however be included, particularly when they are located overseas. This should include, in my opinion:

  • French overseas regions: French Guiana, Guadeloupe, Martinique, Mayotte, Réunion
  • Autonomous communities of Spain: Canary Islands, Ceuta, Melilla, Balearic Islands (not currently in the deck).
  • Special municipalities of the Netherlands: Bonaire, Saba, Sint Eustatius
  • Autonomous regions of Portugal: Azores, Madeira
  • Special territory of Chile: Easter Island
  • Semi-autonomous region of Tanzania: Zanzibar

The list may be extended in the future.

I recommend that the following subdivisions not be included in the deck as political entities (i.e. with flag and capital), as they do not have a special status: Sardinia and Sicily (regions of Italy), Corsica (region of France), Bali (province of Indonesia), Galápagos Islands (province of Ecuador), Kaliningrad Oblast (oblast of the Russian Federation). Some or all of them may remain in the deck as physical geography entities (i.e. with only a map).

Special cases

In my opinion, the following entities are politically significant enough to remain in the deck without the need for broader categorisation:

  • Constituent countries of the United Kingdom: England, Wales, Scotland, Northern Ireland
  • The European Union

The only remaining political entity is Mount Athos, which I recommend removing as it is really niche.

The deck also includes the following world regions: Melanesia, Micronesia, Polynesia, Scandinavia, Balkan Peninsula. I'm not quite sure whether these belong to the political or physical geography side of the deck... kind of neither? Perhaps they should be treated separately either way.


Tentative actions summary

  • Add Akrotiri and Dhekelia, Saint Pierre and Miquelon, Svalbard, Wallis and Futuna, and Balearic Islands.
  • Remove British Indian Ocean Territory and Mount Athos.
  • Remove capitals and flags of Sardinia, Sicily, Corsica, Bali, Galápagos Islands, Kaliningrad Oblast (but keep content of Country info field).
  • Document inclusion rules.

Note that the changes above would grow the deck by three notes but shrink it by a few cards.

@axelboc axelboc added the conception Scope of the deck, memorisation, contribution guidelines, etc. label May 2, 2020
@axelboc axelboc added this to the v3.4 milestone May 2, 2020
@ukanuk
Contributor

ukanuk commented May 2, 2020

I like the recommendations you present. I definitely want to keep Sardinia, Sicily, Corsica, Bali, Galápagos Islands, Sea of Galilee, the Dead Sea, the Gulf of Mexico, and other geographical entities that are not political but do have significant historical, social, and/or cultural importance. I think the chances are near 0% of me coming across the Balearic Islands in my life, but the names of these geographical entities are familiar to me and I'm ashamed I don't know where to find most of them.

Besides the list of inhabited dependent territories, it's important to point out the list of autonomous areas. It has a lot of overlap with the dependent territories, but if the deck is organized politically then on principle I'd think the whole autonomous areas ought to be included before the dependent areas list. On the other hand, one could argue the dependent territories are more useful/important to know than the autonomous areas because while the autonomous areas are mostly in or bordering their parent country, the dependent territories are mostly geographically distant.

The deck currently includes 4 of the 5 autonomous areas created by international agreement, the one missing area being South Tyrol (pop. 530,000). However, the deck does not currently include most of the regions created by internal statutes.

So to summarize, I don't think we should include all of the autonomous areas, especially because it would be a ton of work to add to the deck and probably none of us want to volunteer for that at this time. I mainly mentioned it as it seems to be the defining characteristic of Mt Athos, and potentially the main reason it was added to the deck by the original creator.

Also, what do people think of defining the deck as "human" geography rather than "political" geography? This could help guide which physical geography to include, and even stuff like Tiananmen Square could potentially be included under this criterion. The primary difficulty would be determining what makes one feature more relevant than another. One idea I have is doing a word count of proposed Wikipedia articles, with the assumption that on a community-created site, people will tend to add greater depth (and greater length) to more important articles. On the other hand, this could implicitly add bias (English-speaking audiences with more free time and more internet access).

@ukanuk
Contributor

ukanuk commented May 2, 2020

As for why it might be worth including Melanesia, Micronesia, and Polynesia, one could note they're non-obvious primary regions listed under the Wikipedia Template:Regions of the world. Non-obvious meaning not already a country name like Mexico or Australia, and not just a compass direction prefixing the continent name like North Africa or Central Europe.

By this criterion, the Arctic, Antarctic, and West Indies regions ought to be added, along with arguably the 'Stans (Central Asia), Siberia (North Asia), Nanyang (Southeast Asia), and the Middle East (West Asia).

I'm also uncertain which criteria would justify including Scandinavia and the Balkan Peninsula. Scandinavia seems important to me, but I think that's because of non-political "human" importance rather than geographical importance.

@aplaice
Collaborator

aplaice commented May 2, 2020

Firstly, @axelboc, thanks for doing this, as it's a sticky and thankless task, that I had been dreading looking at...


I think that the regions (Melanesia, Micronesia, Polynesia, Scandinavia and the Balkan Peninsula) should be discussed in a separate issue, as they're a special category, as you've both said, and because currently they're the one most haphazardly put together (in the sense that the likely smallest, most natural category containing them, would also contain many more elements), so I'm afraid that they'd dominate the discussion. (OTOH I think it'd be a shame to just cull them completely, and adding a couple of regions from other parts of the world might be interesting.) I don't feel entirely happy about just using the "primary" regions from "Template:Regions of the world", as they feel a bit arbitrary.


The suggested rules seem to be broadly sensible, though I have some nitpicks.

Dependent territories

  1. Technically, Saint Helena is not on the list of inhabited dependent territories; Saint Helena, Ascension and Tristan da Cunha is. Should we remove Saint Helena and add the "full" territory? (I guess Saint Helena was originally added because of its slight historical interest, as Napoleon died there.)

  2. My gut objection is that I really don't care about these territories (in the sense that I'll happily learn about them, but I wouldn't have prioritised doing so). Adding them doesn't really fix the slight western bias in the deck, and the total effect of the suggested changes would be to exacerbate it (e.g. adding Akrotiri and Dhekelia and removing the capital of Bali). OTOH I don't see any other way of systematically dealing with this category, so I think the new criteria are OK.

(My more general worry is that we're bloating the deck, step by step, to make things more consistent at each step, but again, I don't have any actionable suggestions.)

Special overseas subdivisions

One significant issue concerning consistency is that it's difficult to compare degrees of autonomy across countries. (For instance, in a federal country like India, Germany or the USA, an average top-level subdivision can have greater autonomy than an "autonomous region" in another country.)

Spain

All of Spain's top-level subdivisions are called "autonomous communities" or "autonomous cities", so Ceuta, Melilla, the Canary Islands and the Balearic Islands are not "exceptionally" autonomous.

Hence, I'm against adding the Balearic Islands — otherwise, for consistency, we'd also need to add, say, Sweden's Gotland, China's Hainan or India's Andaman Islands and keep Corsica.

I think it's still worth keeping the Canary Islands and possibly Ceuta/Melilla. The best additional general criterion that I can think of, to allow this, would be something like 'is on a different continent than the "mainland", and is not "close" to it'. However, this would also require the inclusion of Hawaii, which is in Oceania. (OTOH, would that inclusion be that terrible?) Alternatively, the Canary Islands could be turned into a purely geographical entity.

Italy

Sicily and Sardinia are actually autonomous regions with special statute, so they should be fully kept.

France, Netherlands, Portugal, Chile, Zanzibar

Fully agree.

Issue with partial removal

One issue with partial removal (removing capital and flag), is that unlike in the case of "full" removal*, it results in the deletion of cards (and loss of studying history) when importing into an existing deck.

* When a note is fully removed, then on import via CrowdAnki, the old note in the existing deck is left alone.

Hence, if it's likely that a card will eventually be fully removed, I'd recommend doing it in one stage. (For instance, I think that Corsica and Kaliningrad oblast will probably end up fully removed.)

In the cases where partial removal probably makes sense — Bali and the Galápagos Islands — I can think of several approaches:

  1. "Grandfather" these cases in (actually, I'm not 100 % convinced that when an entity is included on its merits as a geographical entity, but it also happens to be a political entity, its capital and flag shouldn't be included, in general).

  2. Create a new note, with a new guid (and with only the Country <-> Map card(s)) and remove the old one.

    • - This will result in duplicated Country <-> Map card(s).

      • Could counteract the duplication, by removing the "map" field from the old version of the note, and only remove the old note in a second release.

        • - Overly complicated and results in a loss of review history for the Country <-> Map card(s)...
  3. Not worry about the loss of cards.

(I'd vote for 1., but none of them is perfect.)

Special cases

Fully agree regarding the EU, the UK and Mount Athos, but as already noted, I'd consider Melanesia, Micronesia, Polynesia, Scandinavia and the Balkan Peninsula separately.


On the other hand, one could argue the dependent territories are more useful/important to know than the autonomous areas because while the autonomous areas are mostly in or bordering their parent country, the dependent territories are mostly geographically distant.

Yeah, I think that has been the basis of the argument, but it might, perhaps, be worth revisiting autonomous areas, at some point. I agree with you that that point should probably not be this day.

Also, what do people think of defining the deck as "human" geography rather than "political" geography?

It's an interesting idea, though as you point out it'd be extremely difficult to find objective criteria (word counts aren't exactly an ideal metric — if it comes to that, I'd rather just use authorial judgement). I think it'd be best left to another Anki deck.


Overall, my alternative suggestions:

  • Add Akrotiri and Dhekelia, Saint Pierre and Miquelon, Svalbard, and Wallis and Futuna.
  • Remove British Indian Ocean Territory and Mount Athos.
  • "Mark" Corsica and Kaliningrad Oblast as slated for deletion, unless they miraculously manage to fit the criteria for physical entities.
  • Document inclusion rules.

@axelboc
Collaborator Author

axelboc commented May 2, 2020

Your nitpicks all make sense, @aplaice, but the result doesn't feel right...

My gut objection is that I really don't care about these territories (in the sense that I'll happily learn about them, but I wouldn't have prioritised doing so). Adding them doesn't really fix the slight western bias in the deck, and the total effect of the suggested changes would be to exacerbate it (e.g. adding Akrotiri and Dhekelia and removing the capital of Bali).

I wholeheartedly agree. In my case, I think what is interesting about dependent territories, "special overseas subdivisions", Kaliningrad Oblast, etc. is their location (... and their governance information for context). I really don't care much about their capitals and flags, to be honest. In the end, the only flags and capitals I really care about are those of sovereign states.

So here is a very wild suggestion: how about reconsidering all of the dependent territories and the like as physical entities rather than political entities? In this scenario, we could, for instance:

  • include Kaliningrad Oblast as an exclave (which is rather interesting geographically speaking);
  • keep Saint Helena as the island instead of renaming it;
  • include Saint Martin as a single island instead of two territories;
  • ... or remove Saint Helena, Saint Martin and other tiny islands altogether in exchange for more prominent islands like Svalbard (also keeping Corsica, Sicily, etc.), more subregions like Siberia, and potentially more physical entities like deserts, mountain ranges and so on.

Dependent territories and autonomous areas could be moved to a separate deck. I know, this is wild... Perhaps it's just shifting the problem instead of solving it... But I do feel like it would make the deck less "western" and subjective, and instead more relevant and accessible to more people.

Any thoughts? Did I go way too far? 😄

@aplaice
Collaborator

aplaice commented May 2, 2020

So here is a very wild suggestion: how about reconsidering all of the dependent territories and the like as physical entities rather than political entities?

... or remove Saint Helena, Saint Martin and other tiny islands altogether in exchange for more prominent islands like Svalbard (also keeping Corsica, Sicily, etc.), more subregions like Siberia, and potentially more physical entities like deserts, mountain ranges and so on.

The issue is that, as you suggest, the most logical conclusion of this reconsideration would be to remove the vast majority of the current dependent territories, since most of them are not exceptionally large, either by area or by population. (For instance, Martinique, one of the largest dependent territories in the Caribbean, is 294th in terms of area and 95th in terms of population.)

However, I think that it does make sense to include, for instance, the various dependent territories in the Caribbean, since:

  1. (to the extent that I can tell) the difference between those territories that became independent and those that didn't was an accident of history (it seems like the British territories mostly became independent, while the French and Dutch didn't, and were instead eventually granted sweeping autonomy).

  2. from the point of view of learning the sovereign states, learning the non-sovereign states is useful for positioning purposes (e.g. that Dominica is "between" Guadeloupe and Martinique).

    Obviously, for the latter, treating them purely geographically would mostly also work (though Guadeloupe, for example, would probably need to be split between its two main constituent islands (Basse-Terre and Grande-Terre)), but finding a general justification for including them would be hard.

Similar arguments hold for the Pacific territories. Of the remainder, many are generally politically interesting (e.g. the Falklands or Hong Kong).


Saint Helena

To be fair, I also wouldn't be happy about replacing Saint Helena with the whole territory. Perhaps, we can stretch the argument used for Guernsey (#241) about including only the "central" element of the territory...

Kaliningrad Oblast

I'm not insisting on its removal. :)

Your argument about exclaves being geographically interesting makes sense (but then Alaska should also probably be included :/).


One alternative solution to the proliferation of tiny dependent states would be to introduce an additional "size" (population and area) requirement. For example, including only those inhabited ones which satisfy at least one of the very arbitrarily chosen:

  1. Area > 260 km².

  2. Population > 17000.

would result in the loss of Pitcairn Islands, Cocos (Keeling) Islands, Tokelau, Christmas Island, Norfolk Island, Montserrat, Saint Barthélemy and Anguilla, and the addition of Svalbard.

Raising these to the slightly less fine-tuned values of 300 km² and 20,000 population would cause the additional loss of the Cook Islands and Niue.

We could even add a second set of thresholds for including capitals and flags (rather than just maps), such that, say, only Greenland, Hong Kong and Puerto Rico were included. The latter set of thresholds could perhaps even be applied to unrecognised sovereign states.

(The precise values would obviously be arbitrary, but I don't think the entire idea is completely unjustified, though feel free to mercilessly tear it apart.)

@ukanuk
Contributor

ukanuk commented May 3, 2020

Firstly, @axelboc, thanks for doing this, as it's a sticky and thankless task, that I had been dreading looking at...

Yes, thank you!!!

Your argument about exclaves being geographically interesting makes sense (but then Alaska should also probably be included :/).

OTOH, would that inclusion be that terrible? (quoting from your comment on including Hawaii)

We could even add a second set of thresholds for including capitals and flags (rather than just maps), such that, say, only Greenland, Hong Kong and Puerto Rico were included. The latter set of thresholds could perhaps even be applied to unrecognised sovereign states.

With these criteria, the proliferation of dependent territories would not be nearly as big of a deal. Unfortunately, implementing partial removal of flags/capitals would result in lost cards, as you mentioned previously. And if anyone wants/needs to learn that info, it seems a shame they effectively couldn't use the UG deck because every update would overwrite their manual additions.

How about moving all this info (and grandfathered info) to the extended deck? This would let anyone still contribute their less-relevant info to the UG deck and get other's content updates and avoid the unfortunate situation described above. And then whatever arbitrary cutoff criteria are used also become less important, because people in disagreement can just get the extended edition.

There could be the understanding that anyone using the extended deck will filter out stuff they don't care about. A separate Readme section (or maybe a second file?) could provide the search criteria needed to revert different parts of the deck to match the standard deck. So people could suspend just extended card types, or suspend just additional flag/capital/country info cards, or some personally relevant combination.

This could also reduce the burden of new translations, if they were only required to complete content in the standard deck edition.

This would require an extension to anki-dm/brain brew to allow tagging specific note fields or even note content, and then the deck building script would need to read those tags and change output accordingly. I think this would be a great addition, as it would let more people with different needs still contribute to the same decks for the 90% of shared information.

@axelboc
Collaborator Author

axelboc commented May 3, 2020

How about moving all this info (and grandfathered info) to the extended deck?

This could make sense, but what if people just want to do the extended deck for the extra templates and not for the extra cards? I think it would be easier to just create a separate deck tbh.

That being said I'm really liking the idea of carefully choosing two sets of population and area criteria for inclusion with and without flag/capital.

I'm also totally in favour of adding Alaska and Hawaii. I think with the proper criteria (probably different ones than for dependent territories), we wouldn't be adding many more exclaves. And we could add them without flag/capital as a general rule.

Personally, I'm not concerned at all about removing cards from the deck to gain in quality. We can mitigate the effect by releasing a major version. People can then decide to finish the deck before upgrading.

@ukanuk
Contributor

ukanuk commented May 3, 2020

This could make sense, but what if people just want to do the extended deck for the extra templates and not for the extra cards? I think it would be easier to just create a separate deck tbh.

People only wanting extra templates would suspend all cards matching this search:

"note:Ultimate Geography [Extended]" (card:1 or card:2 or card:3 or card:4) Country:("Pitcairn Islands" or "Cocos (Keeling) Islands" or "Tokelau" or "Christmas Island" or ...)

People only wanting extra partial field info would suspend all cards matching this search:

"note:Ultimate Geography [Extended]" card:(3 or 5)

These searches would be provided in the README.md in the info about using the extended deck. The first search string above would be a little annoying to create, but not nearly as annoying as:

  1. Someone losing cards already learned from partial deletion of card info
  2. Someone wanting to learn any partial card info intentionally excluded from the standard UG deck, but then being unable to participate in the UG community because every import of updates re-erases the partial card info.
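
The long `Country:(...)` search does not have to be typed by hand. As a minimal sketch, assuming the search format quoted above is what the README would document (whether Anki accepts the grouped `Country:(...)` form has not been checked here), a small script could assemble it from a list of names. The function name `build_suspend_search` and the short territory list are hypothetical and only for illustration.

```python
def build_suspend_search(note_type, card_numbers, countries):
    """Assemble a search string in the same shape as the one quoted in this comment."""
    # e.g. "card:1 or card:2"
    cards = " or ".join(f"card:{n}" for n in card_numbers)
    # Each country name is wrapped in double quotes so names with spaces stay intact.
    names = " or ".join(f'"{name}"' for name in countries)
    return f'"note:{note_type}" ({cards}) Country:({names})'


# Illustrative subset only; the real list would come from whatever cutoff is agreed.
small_territories = [
    "Pitcairn Islands",
    "Cocos (Keeling) Islands",
    "Tokelau",
    "Christmas Island",
]

search = build_suspend_search(
    "Ultimate Geography [Extended]", [1, 2, 3, 4], small_territories
)
print(search)
```

Generating the string from the same source list that drives the deck build would keep the README search in step with the deck contents.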

Personally, I'm not concerned at all about removing cards from the deck to gain in quality.

I fully agree with this sentiment.

We can mitigate the effect by releasing a major version. People can then decide to finish the deck before upgrading.

Although I do not understand how it helps someone to finish the deck before upgrading? Don't they lose their partial cards regardless of whether they upgrade before or after finishing?

@axelboc
Collaborator Author

axelboc commented May 3, 2020

Although I do not understand how it helps someone to finish the deck before upgrading? Don't they lose their partial cards regardless of whether they upgrade before or after finishing?

Yes, of course, I was more conjecturing that users would be less concerned about losing cards once they had "matured" in their decks.

A major version would simply help us make people aware of the harsh consequences of upgrading, thus reducing the risk of users being angry at us... 😄


I've made a spreadsheet to play with the criteria for dependent territories. I started with a min. population of 20,000 and a min. area of 300 km². I then sorted the rows to identify which territories would:

  • go away entirely - i.e. those that match neither criterion.
  • remain with maps only - i.e. those that match exactly one of the criteria.
  • remain with maps, capitals and flags - i.e. those that match both criteria.

If you play around with the two values, you can see which territories are affected thanks to conditional formatting. I tried increasing the min. area to 1000 km² and I quite like the result, personally.

Dependent territories.xlsx

  • I think the combination of pop. OR area (map only) and pop. AND area (map/capital/flag) with the same set of thresholds gives good results, but we could also go for pop. OR area for both but with two sets of thresholds as you initially suggested, @aplaice.
  • I've only included dependent territories in the table for now, but we could very well add overseas subdivisions, exclaves, etc. if we wanted to select them based on the same criteria and thresholds... or we could make separate sheets for them with different criteria and thresholds.
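
For anyone who would rather script this than use the spreadsheet, here is a rough sketch of the same three-way split, assuming "min." means "greater than or equal to". The `Territory` class, the `classify` function and the example rows are all made up for illustration; they are not taken from the spreadsheet.

```python
from dataclasses import dataclass

# Thresholds discussed in this comment (the 1000 km² variant).
MIN_POPULATION = 20_000
MIN_AREA_KM2 = 1_000


@dataclass
class Territory:
    name: str
    population: int
    area_km2: float


def classify(t, min_population=MIN_POPULATION, min_area_km2=MIN_AREA_KM2):
    """Return the inclusion tier for a territory under the pop./area rule."""
    # Count how many of the two criteria the territory meets (0, 1 or 2).
    met = int(t.population >= min_population) + int(t.area_km2 >= min_area_km2)
    if met == 2:
        return "map, capital and flag"
    if met == 1:
        return "map only"
    return "excluded"


# Hypothetical rows, not real figures.
examples = [
    Territory("Example A", population=50_000, area_km2=2_500),
    Territory("Example B", population=50_000, area_km2=30),
    Territory("Example C", population=500, area_km2=40),
]

for t in examples:
    print(f"{t.name}: {classify(t)}")
```

Passing different `min_population` and `min_area_km2` values plays the same role as editing the two cells in the spreadsheet.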

@aplaice
Collaborator

aplaice commented May 3, 2020

OTOH, would that inclusion be that terrible? (quoting from your comment on including Hawaii)

:D


On second thoughts, I'm not sure about my suggestion about grandfathering in old cards. Ideally, both the normal and extended deck should be optimally designed out-of-the-box, without special tweaking and we shouldn't "hobble" (even mildly, by including information we now deem excessive) all future users of the decks, for historical reasons.

That said I fully share your two concerns:

  1. Someone losing cards already learned from partial deletion of card info
  2. Someone wanting to learn any partial card info intentionally excluded from the standard UG deck, but then being unable to participate in the UG community because every import of updates re-erases the partial card info.

2 could be dealt with, by creating a new deck purely for dependent territories, containing the "full" info, and encouraging interested people to use that deck and suspend the overlapping cards from the main deck. Depending on how CrowdAnki deals with cards suspended by the end-user (I haven't checked), they might have to re-suspend the overlapping cards on each re-import from the main deck (at least until the relevant improvement is implemented in CrowdAnki).

1 is far trickier but also something that I'd personally be worried about as a user. It might be sunk-cost fallacy (or Stockholm syndrome), but having already invested so much effort, I wouldn't want to forget all these flags (and capitals, but the capitals were much less effort). :D With SRS, learning is unfortunately never actually done...

There are two linked problems here:

A. Losing the card altogether from your deck/from Anki (and hence losing the chance of continuing to learn it)

B. Losing your "progress", but not the cards themselves. (e.g. by having a card replaced by a fresh identical copy of itself).

Of the two, I'd be most devastated by A, but also rather unhappy about B, and I guess that'd hold for most people (?).

As suggested above, A can be avoided without any special tooling by creating fresh notes containing partial info. Avoiding both A and B would probably require special tooling (an extra migration add-on).

This would require an extension to anki-dm/brain brew to allow tagging specific note fields or even note content, and then the deck building script would need to read those tags and change output accordingly. I think this would be a great addition, as it would let more people with different needs still contribute to the same decks for the 90% of shared information.

That would also work, and would be extremely valuable, globally, but would again need improved tooling.


"note:Ultimate Geography [Extended]" (card:1 or card:2 or card:3 or card:4) Country:("Pitcairn Islands" or "Cocos (Keeling) Islands" or "Tokelau" or "Christmas Island" or ...)

The first search string above would be a little annoying to create

FWIW it could be simplified by creating an extra tag (say, UG::Small_dependent_territory) and using that in the search.


I've made a spreadsheet to play with the criteria for dependent territories.

This is amazing, thanks!

I think the combination of pop. OR area (map only) and pop. AND area (map/capital/flag) with the same set of thresholds gives good results, but we could also go for pop. OR area for both but with two sets of thresholds as you initially suggested, @aplaice.

TBH I appreciate the simplicity and also quite like the result, though I'm worried that it's my Europe-centric bias (the Faroe and Åland Islands are included, while many of the far more populous Caribbean territories and Macau, aren't).

In any case, I think that having the first threshold at (area > 1000 km²) OR (population > 20000) is great!

(I briefly looked at having the threshold set by a combination of area and population, for territories that don't have a very large area or population but have both relatively large, e.g. having an ellipse, (area/1000 km²)^2 + (population/20000)^2 < 1:

[Figure: ellipse]

or even an ellipse in log-space, but it just seems to over-complicate things, with little benefit.)
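
To make the ellipse idea concrete, here is a small sketch comparing it with the plain rectangular cutoff. It assumes that points inside the ellipse are the ones treated as "too small", which is one reading of the inequality above; the function names and the sample values are hypothetical.

```python
def inside_ellipse(area_km2, population, area_scale=1000.0, population_scale=20000.0):
    """True if (area/area_scale)^2 + (population/population_scale)^2 < 1."""
    return (area_km2 / area_scale) ** 2 + (population / population_scale) ** 2 < 1


def inside_rectangle(area_km2, population, area_scale=1000.0, population_scale=20000.0):
    """True if the territory is below both plain thresholds."""
    return area_km2 < area_scale and population < population_scale


# A made-up territory that is just under both plain thresholds.
area, pop = 900.0, 18_000
print(inside_rectangle(area, pop))  # below both thresholds
print(inside_ellipse(area, pop))    # but outside the ellipse
```

A territory like this one falls below both plain thresholds yet lies outside the ellipse, which is exactly the "both relatively large" case the combined rule was meant to catch.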

but we could very well add overseas subdivisions, exclaves, etc. if we wanted to select them based on the same criteria and thresholds...

I think that using the same criteria and thresholds would be sensible.


My "meta" suggestion would be to implement the first threshold now, and wait with implementing the second threshold (whatever it will be (AND with the same parameters or OR with higher parameters)) — firstly to determine the least painful upgrade path and secondly to wait for releasing a major version until more potentially breaking changes have accumulated.

@ohare93
Member

ohare93 commented May 3, 2020

Yes, of course, I was more conjecturing that users would be less concerned about losing cards once they had "matured" in their decks.

Well no one will ever lose any cards/notes. CrowdAnki (or Anki, for that matter) has no way to say "Delete X cards not in this deck" upon import. Upon converting to a new version the note will simply be orphaned and not be updated again, as no updates will be within the new files. It'll still be in their deck tho. The Note Model will still change though 🤔

@aplaice
Collaborator

aplaice commented May 3, 2020

Well no one will ever lose any cards/notes. CrowdAnki (or Anki, for that matter) has no way to say "Delete X cards not in this deck" upon import. Upon converting to a new version the note will simply be orphaned and not be updated again, as no updates will be within the new files. It'll still be in their deck tho. The Note Model will still change though 🤔

You're sort of right and I was sort of wrong.

Anki doesn't have a way to say "Delete X notes not in this deck" upon import. When the note model changes such that given cards cease existing, it will happily delete the cards that cease existing.

However, (unlike what I believed, without checking, based on the behaviour when the model changes) in the case where cards cease existing because the contents of the relevant field is removed from the given note, CrowdAnki won't actually delete these cards. The cards will become "empty", but they won't be automatically deleted (they will only be deleted when the user runs "empty cards").

This means that importing a deck with the contents of the "flag" and "capital" fields removed for the average-sized dependent territories wouldn't immediately result in the irreversible loss of the relevant cards. However, the user will still be unable to continue learning these old cards of theirs without further action, and the most obvious course of action (filling the capitals/flags back in) would be undone the next time they imported from AUG. (Also, when somebody sees such an empty card, given that Anki's suggestion is:

The front of this card is empty. Please run Tools>Empty Cards.

they might end up deleting the cards anyway, without realising it.)

Hence, I think that the upgrade issue would be almost as severe as initially presented, even if the mechanics aren't quite how I had described them at the start.

@ohare93
Member

ohare93 commented May 3, 2020

Fair enough! No disagreement from me 👍 If the note model changes sufficiently (field name changes) then the cards will indeed be blank or broken, and could be removed by the user by mistake in that specific case.

@axelboc
Collaborator Author

axelboc commented May 3, 2020

Could a solution to the lost-cards problem be to move the cards into a separate deck before upgrading? If so, then we could release a minor version and provide mini-tutorials to tell people how to move (or delete) the cards before upgrading.

@ukanuk
Contributor

ukanuk commented May 4, 2020

Is there any way to "move" a card such that CrowdAnki can no longer find it? I thought the point of the guid was so that even if you change note model, contents of primary field, deck, etc., it can always be tracked down.

In the case where cards cease existing because the contents of the relevant field is removed from the given note, CrowdAnki won't actually delete these cards. The cards will become "empty", but they won't be automatically deleted (they will only be deleted when the user runs "empty cards").

Good to know! This gives me another idea. The partial information could be maintained in a separate deck - and as long as it uses the same guid as UG, then users can keep upgrading by first importing the main UG deck (which erases partial fields), then importing the "partial-info" deck on top (which fills in those fields again). It would be a little bit of work writing a merge-partial.py script to create the secondary data.csv based on the partial data but integrating main UG deck info, but I think with pandas this would be a pretty short and easy script.
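A minimal sketch of what the merge step could look like, using plain dictionaries rather than pandas to keep it dependency-free. The function name `merge_partial`, the column names and the sample rows are all hypothetical; the real data.csv layout may differ.

```python
def merge_partial(main_rows, partial_rows, key="guid"):
    """Build the secondary deck rows: main-deck data with partial fields filled back in.

    Only notes that appear in the partial data are returned, and empty
    values in the partial data never overwrite main-deck values.
    """
    partial_by_guid = {row[key]: row for row in partial_rows}
    merged = []
    for row in main_rows:
        extra = partial_by_guid.get(row[key])
        if extra is None:
            continue  # note has no partial info, so it stays out of the secondary deck
        combined = dict(row)
        for field, value in extra.items():
            if value:
                combined[field] = value
        merged.append(combined)
    return merged


# Illustrative rows only (placeholder values, not real deck data).
main = [
    {"guid": "AAAA", "country": "Gibraltar", "capital": "", "flag": ""},
    {"guid": "BBBB", "country": "France", "capital": "Paris", "flag": "france.svg"},
]
partial = [{"guid": "AAAA", "capital": "Gibraltar", "flag": "gibraltar.svg"}]
merged = merge_partial(main, partial)
```

Because the guid is carried over unchanged, importing the merged rows on top of the main deck would update the same notes rather than create new ones.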

Only problem is I wouldn't want to maintain this in a separate repo, as I personally wouldn't use it (I'm happy to create the script, but don't want to keep updating releases, managing PR's, etc., for something I won't use)

@axelboc
Collaborator Author

axelboc commented May 4, 2020

It would work for fully deleted notes, I think, but you're right, not for partially deleted ones.

@ohare93
Member

ohare93 commented May 4, 2020

Is there any way to "move" a card such that CrowdAnki can no longer find it? I thought the point of the guid was so that even if you change note model, contents of primary field, deck, etc., it can always be tracked down.

I am confused at what you are asking.

This deck generates a CrowdAnki file that is a collection of guids with fields and tags (plus NoteModels and Deck Headers). Let's say we pass it the guids AAAA, BBBB, CCCC, DDDD, etc. Those values are read in by CrowdAnki, it looks for a card with the matching guid, of CCCC for example, and either updates the values or creates a new card.

Deleting a card in this deck is the equivalent of only passing AAAA, BBBB, DDDD. No CCCC will be in the file. CrowdAnki (the anki add-on) has no way of knowing a guid is missing there. It just does its normal job, find cards with those specific guids and update or create them as necessary. The user does have CCCC, but that will no longer be in the CrowdAnki file after it is deleted in this repo. There is no other mechanism to give it that specific guid to say "Delete this card". The only thing CCCC shares in common with the other cards is the NoteModel, which still may change due to future imports of the deck, and those changes may break the cards on the note, which can be tidied up by Anki.
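To make that concrete, here is a toy model of the behaviour described above. It is not CrowdAnki's actual code: `import_notes` and the dict-based collection are hypothetical stand-ins, and the field values are placeholders.

```python
def import_notes(collection, incoming):
    """Update entries whose guid already exists, create the others.

    Guids that are absent from `incoming` are never touched or removed.
    """
    created, updated = [], []
    for guid, fields in incoming.items():
        if guid in collection:
            collection[guid].update(fields)
            updated.append(guid)
        else:
            collection[guid] = dict(fields)
            created.append(guid)
    return created, updated


# The user already has AAAA and CCCC; the new file only lists AAAA, BBBB, DDDD.
collection = {"AAAA": {"capital": "old value"}, "CCCC": {"capital": "still here"}}
incoming = {
    "AAAA": {"capital": "new value"},
    "BBBB": {"capital": "b"},
    "DDDD": {"capital": "d"},
}
created, updated = import_notes(collection, incoming)
```

After the import, CCCC is still in the collection with its old contents, because nothing in the incoming file refers to it.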

A deletion notification could definitely be added to CrowdAnki, only as a user option "we suggest you delete these cards here" I'd say, but it is not there right now.

The partial information could be maintained in a separate deck

What is partial information here, I do not understand your premise

@axelboc
Collaborator Author

axelboc commented May 4, 2020

I think there is confusion here between cards and notes. To me, a GUID is associated with a note and CrowdAnki creates/updates notes on import, not cards. Cards are dynamically generated by Anki based on the note model.

Case 1: removing a note

CrowdAnki doesn't yet have a way to know that a note was removed, so the note and all its associated cards will remain in the user's deck after the import.

Case 2: removing a capital and flag from a note

CrowdAnki will update the note on import. Anki will leave the note's cards in the deck, but they will be blank (either on the front or on the back). The blank cards can be deleted by running Tools > Empty cards.


Case 1 leads to no loss in content, but case 2 does. As @ukanuk said, my suggestion to move the notes into another deck prior to the import does not solve case 2, since the note will be updated anyway because of its GUID.
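A simplified sketch of case 2, assuming each card template depends on exactly one field. The template names and the rule for "blank" are simplifications for illustration, not Anki's real card-generation logic.

```python
# Hypothetical mapping from card template to the field it needs.
TEMPLATE_FIELDS = {"Capital": "capital", "Flag": "flag", "Map": "map"}


def card_states(note):
    """Return 'ok' or 'blank' for each template, depending on whether its field is filled."""
    return {
        template: ("ok" if note.get(field) else "blank")
        for template, field in TEMPLATE_FIELDS.items()
    }


note = {"capital": "some capital", "flag": "some-flag.svg", "map": "some-map.png"}
before = card_states(note)

# The import updates the note in place (same GUID) with two fields emptied.
note.update({"capital": "", "flag": ""})
after = card_states(note)
```

In this model the cards still exist after the update, but two of them are blank, which matches the content loss described for case 2.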

@axelboc
Collaborator Author

axelboc commented May 4, 2020

Scenario 1: user doesn't care about losing cards and progress

This scenario is covered by releasing a major version, because we explicitly recommend that users perform a clean import when upgrading to a major version.

Scenario 2: user doesn't care about losing cards but cares about losing progress

Unlike with a change of note template, removing notes or content from notes won't technically impact progress, so users can choose not to perform a clean import if they want to keep their progress.

To stay in sync, they can manually delete the notes that were fully removed from the deck and run Tools > Empty cards. Ideally, CrowdAnki would provide a tool like Empty notes to remove notes whose GUIDs don't appear in the deck's JSON file.

Scenario 3: user cares about losing cards and progress

This is the challenging scenario but I think you're onto something, @ukanuk. We could indeed move/copy the affected notes to a separate deck, keeping all GUIDs the same, but I would suggest two changes to your idea:

  1. Make a one-off deck -- no version control, no maintenance -- just a CrowdAnki deck that users can import once to move the affected cards out of the UG deck before upgrading it. We can provide all the variants of this one-off deck in the release notes within a single ZIP file.
  2. Change the GUIDs of the affected notes in data.csv to ensure that they don't move back into the UG deck on future upgrades. This will lead to a loss of progress on those notes specifically, but I think it's a reasonable outcome.

Note that the one-off deck could also be used for scenario #2 to simplify the process of removing the notes: users could just import the one-off deck then delete it, which is simpler than finding and removing the notes manually.

@ohare93
Member

ohare93 commented May 4, 2020

I think there is confusion here between cards and notes. To me, a GUID is associated with a note and CrowdAnki creates/updates notes on import, not cards. Cards are dynamically generated by Anki based on the note model.

Yes, replace all my previous usages of card with note! 😅

To stay in sync, they can manually delete the notes that were fully removed from the deck and run Tools > Empty cards. Ideally, CrowdAnki would provide a tool like Empty notes to remove notes whose GUIDs don't appear in the deck's JSON file.

That would indeed be a good feature in CrowdAnki, but more difficult than you might think. Remember that CrowdAnki knows nothing about decks or collections of cards, only that the deck header states which deck all the imported cards should go into. However, as I referenced in #175, users can move their notes out of the import folder to join them together into larger clumps of data. I myself do not have an Ultimate Geography deck, but a Miscellaneous deck which I moved my cards into. CrowdAnki should obviously not attempt (or even request, really) to delete all my Miscellaneous cards! 😅 My general thoughts on progressing this:

  1. Include the deleted notes/guids in the CrowdAnki file itself, and offer to the user to delete those. That is relatively unlikely to offer extra notes to delete.
    • What if users upgrade from a muuuch lower release? Do the deleted cards need to stay in perpetuity?
  2. Offer to delete all notes which do not share the top level "UG" tag
    • What if users have added their own data which they find relevant? 🤔
  3. Set a flag "InformUserImportMayDeletion" on the CrowdAnki export which will bring up a popup on import informing the user some notes may have been deleted during this import, but that it is unable to track things automatically. Instead it will happily tag the notes that get imported as part of this operation with "Imported-[Date]-[Time]" (in a text box so it can be changed by the user) and the user themselves can filter to find the cards that were not imported, and decide for themselves whether to delete or keep them.
    • The popup could even have a tickbox for "Duplicate NoteModel" so that the new cards use an entirely different NoteModel, and the old ones are left as they are entirely.
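As a rough illustration of option 3, the tagging idea could work along these lines. This is only a sketch: the exact tag format, the function names and the dict-based collection are assumptions, and nothing here exists in CrowdAnki today.

```python
from datetime import datetime


def make_import_tag(moment):
    """Build an 'Imported-[Date]-[Time]' style tag (the exact format is an assumption)."""
    return moment.strftime("Imported-%Y-%m-%d-%H%M")


def tag_imported(collection, imported_guids, tag):
    """Add the tag to every note that was part of this import."""
    for guid in imported_guids:
        collection[guid]["tags"].add(tag)


def not_imported(collection, tag):
    """Notes without the tag: candidates for the user to review, keep or delete."""
    return sorted(guid for guid, note in collection.items() if tag not in note["tags"])


collection = {guid: {"tags": set()} for guid in ["AAAA", "BBBB", "CCCC"]}
tag = make_import_tag(datetime(2020, 5, 4, 12, 30))
tag_imported(collection, ["AAAA", "BBBB"], tag)
leftovers = not_imported(collection, tag)
```

Nothing is deleted automatically here; the user only gets a way to find the notes that the import did not touch.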

#3 would be my preferred option, as it takes all the work away from us (apart from enabling "InformUserImportMayDeletion" for a specific release), and quite frankly I don't think it is a feasible task as a deck manager. Too many possible bad outcomes / user flows are possible. I think #3 achieves all you could want from a data perspective, does it not? An extra legacy deck is rather cumbersome for no perceivable upside that I can see.

Thoughts?

@aplaice
Collaborator

aplaice commented May 4, 2020

Good to know! This gives me another idea. The partial information could be maintained in a separate deck - and as long as it uses the same guid as UG, then users can keep upgrading by first importing the main UG deck (which erases partial fields), then importing the "partial-info" deck on top (which fills in those fields again).

That's a brilliant idea! The main disadvantages are that it would:

  1. Require the separate deck to be maintained (updating to the latest version of the templates etc.); otherwise, importing the "partial-info" deck on top will revert to the old versions of the shared templates/note models. This could be automated or semi-automated, though.

  2. Have to be run every time by the user, whenever they update the "main" deck, adding friction. (Obviously only users who wish to preserve cards would be affected.)

The advantage obviously is that absolutely nobody (who didn't want to) would lose any cards or any history, while new users wouldn't be inconvenienced.

On the whole, though, it's probably too much effort for everybody (users and maintainers).


Change the GUIDs of the affected notes in data.csv to ensure that they don't move back into the UG deck on future upgrades. This will lead to a loss of progress on those notes specifically, but I think it's a reasonable outcome.

I think that's among what I had been suggesting initially, though it probably got lost in the excessively long comments...

Moving the cards out into a separate deck, first, is a great idea, since, as you write, it avoids issues on re-export of the deck (both for the fully deleted notes — it avoids recreating them — and for the "partially deleted" ones — it avoids refilling now-empty fields).

@aplaice
Collaborator

aplaice commented May 4, 2020

@ohare93 I'm also in favour of 3, as it's manageable for maintainers/developers and avoids wanton deletion of existing notes (as a user, I generally don't want my notes or cards deleted).

I also like the idea of the "Duplicate NoteModel" tickbox.

@axelboc
Collaborator Author

axelboc commented May 4, 2020

Remember that CrowdAnki knows nothing about decks or collections of cards,

@ohare93 so CrowdAnki couldn't, on import, loop through all of the notes in a user's collection and identify the ones that have the same note model as the deck being imported? Well, that makes things harder, indeed... 😅


Require the separate deck to be maintained (updating to the latest version of the templates etc.); otherwise, importing the "partial-info" deck on top will revert to the old versions of the shared templates/note models. This could be automated or semi-automated, though.

I really don't see the one-off deck as maintainable and re-importable. To me, it's really just a migration tool. We should probably give it a different note model name/identifier, though, so it is not affected by future changes to the UG deck.

@ohare93
Member

ohare93 commented May 4, 2020

@axelboc

@ohare93 so CrowdAnki couldn't, on import, loop through all of the notes in a user's collection and identify the ones that have the same note model as the deck being imported? Well, that makes things harder, indeed... 

Oh no, it could be changed to do that. But it currently does not do so.

Also I would advise against it, as I said above, in the case that a user has added cards of their own into the deck with the same note model. If people don't agree Edinburgh of the Seven Seas goes in the deck they can add it for themselves! 😉

@axelboc
Collaborator Author

axelboc commented May 4, 2020

Oh right, I get you 😄 -- though for the second part, it could be an optional behaviour so people who have added their own notes could opt out.

@aplaice
Collaborator

aplaice commented May 4, 2020

I really don't see the one-off deck as maintainable and re-importable.

I had meant in the scenario described by @ukanuk, which allows keeping full history for all cards, at the cost of a lot of busywork.

In the alternative scenario where we change GUIDs, no additional deck would need to be maintained (at the IMO acceptable cost of some history loss).

@ohare93
Member

ohare93 commented May 11, 2020

Extra thought on this, perhaps it belongs in a separate thread but I thought it made sense with what we were talking about here.

What do people think about starting to add a tag "UG::RecentlyChanged" onto notes when major things change about them between a release? Such as a capital changing to a different one, or the name of the country being updated. This would allow the users to filter for those that have changed between versions, so that they can reset those cards if they wish.

@axelboc
Collaborator Author

axelboc commented May 11, 2020

Alright, so where are we at? (Thanks for the bump @ohare93 😄)

I think we all agree:

  • to release the changes as a major version;
  • to create a one-off deck with all the removed and affected notes;
  • to change the GUIDs of the affected notes in the main deck;
  • to document the migration options in the release notes;
  • to do a pre-release before the main release so people can try out the migration;

We also seem to agree on:

  • the criteria of area >= 1000 km² and population >= 20,000;
  • the combination of the criteria with a logical OR for the "inclusion in the deck with location" threshold;
  • the combination of the criteria with a logical AND for the "inclusion in the deck with all info" threshold;
  • the application of these thresholds to the list of inhabited dependent territories and potentially other political entities.
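A small sketch of how the two thresholds listed above combine. The function name, the return labels and the sample figures in the comments are made up for illustration.

```python
def inclusion_level(area_km2, population, min_area=1000, min_population=20000):
    """Classify an entity against the two agreed thresholds.

    - both criteria met (AND)      -> included with all info
    - at least one met (OR)        -> included with location only
    - neither met                  -> not included
    """
    large_area = area_km2 >= min_area
    large_population = population >= min_population
    if large_area and large_population:
        return "all info"
    if large_area or large_population:
        return "location only"
    return "excluded"


# Hypothetical figures, not real territories:
# inclusion_level(5000, 100000) gives "all info"
# inclusion_level(500, 50000)   gives "location only"
# inclusion_level(10, 500)      gives "excluded"
```

Since anything that passes the AND test also passes the OR test, the second threshold only ever adds information to entities that are already in the deck.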

What's left to do is decide what these other political entities are and add them to the spreadsheet.

Obviously we could just start with the overseas subdivisions we already know (overseas territories of France, special communities of Spain, etc.) but I'm pretty sure this is not an exhaustive list, and, as we've discussed, "special status" is a vague concept, which doesn't include Hawaii notably. Can we define this any better?

We also need a list of exclaves worth including in the deck (Alaska, Kaliningrad Oblast, etc.) Have you seen one around on Wikipedia? The article on enclaves and exclaves is a bit all over the place...

There's also the topic of whether to include some islands/archipelagos as political or physical entities. With our new criteria for political entities, I would rather include them as political entities after all (Hawaii instead of Hawaiian Islands), especially since Saint Helena, Ascension and Tristan da Cunha does not make the cut 😄

@ohare93
Member

ohare93 commented May 11, 2020

to create a one-off deck with all the removed and affected notes;
to change the GUIDs of the affected notes in the main deck;

I'm sorry I still do not understand, and actually forgot about these questions I had earlier. What is this talk about moving some cards to another deck and changing guids in the original deck...? 😕 Are you referring to this:

Dependent territories and autonomous areas could be moved to a separate deck. I know, this is wild... Perhaps it's just shifting the problem instead of solving it... But I do feel like it would make the deck less "western" and subjective, and instead more relevant and accessible to more people.

If so I am still confused. Moving the notes for smaller / less relevant places (like Kaliningrad Oblast) in order to have more concrete definitions for what is included in the deck sounds like a good plan. A supplementary deck would work well in this regard. But what does this have to do with changing the guids? The only thing I could see is "Change the GUIDs of the affected notes in data.csv to ensure that they don't move back into the UG deck on future upgrades." but tbh that does not help me understand exactly what that entails. Could someone give a concrete example for when that would be done, and the benefit that changing guids would give? 👍

@axelboc
Collaborator Author

axelboc commented May 11, 2020

No worries, let's take Gibraltar, for instance. Right now the note has a capital, flag and map. With the new inclusion rules, the capital and flag will be removed, leaving only the map.

If users didn't do anything and just imported the new major version of the deck with CrowdAnki, they would end up with blank cards for the capital and flag templates, thus losing those cards completely. The one-off deck is for people who are interested in keeping those cards.

It will do so by including an exact copy of the note for Gibraltar, GUID included. When users import the one-off deck, the note will then move out of the main UG deck and into a new deck.

Once users have imported the one-off deck and the note has moved out of the main UG deck, it's time for them to actually upgrade their main UG decks. If the note for Gibraltar had the same GUID as before, it would move out of the users' one-off decks and back into their main UG decks. Thus, the idea to change the GUID of the note in the main deck: a new note for Gibraltar will then be created, leaving the one in the one-off deck alone.
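The sequence above can be simulated with a toy model. This assumes, as described, that importing a note with a known GUID updates it and moves it into the deck being imported; `import_deck`, the GUID strings and the field values are hypothetical.

```python
def import_deck(collection, deck_name, notes):
    """Toy import: create or overwrite each note by GUID and place it in `deck_name`."""
    for guid, fields in notes.items():
        collection[guid] = {"deck": deck_name, **fields}


full_note = {"name": "Gibraltar", "capital": "Gibraltar", "flag": "flag", "map": "map"}
reduced_note = {"name": "Gibraltar", "capital": "", "flag": "", "map": "map"}

collection = {}
import_deck(collection, "UG", {"guid-old": full_note})          # the user's current deck
import_deck(collection, "UG one-off", {"guid-old": full_note})  # step 1: one-off deck, same GUID
import_deck(collection, "UG", {"guid-new": reduced_note})       # step 2: upgraded deck, new GUID

# For contrast: what happens if the GUID is NOT changed in the upgraded deck.
unchanged = {}
import_deck(unchanged, "UG", {"guid-old": full_note})
import_deck(unchanged, "UG one-off", {"guid-old": full_note})
import_deck(unchanged, "UG", {"guid-old": reduced_note})
```

In the first run the full note survives in the one-off deck next to a fresh, reduced note in UG; in the second run the single note is pulled back into UG with its capital and flag emptied.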

The users' upgraded UG decks now contain a single card for Gibraltar based on the Map - Country template (plus a card for the reverse template in the case of the extended deck). Does that make more sense?

@ohare93
Member

ohare93 commented May 12, 2020

Changing the guids does not sit right with me 👎 It offloads the work to the normal users, not the quirky users who want to keep everything. Upon upgrading to this new version a new user would have duplicate versions of all of these Small Independent Territory cards, one with all the data (the old one) and one with only the data we have deemed important enough (the new one). When talking about fully removing an entity there is no way to currently get around that, the user must delete it themselves (for now). But as a normal method of upgrading, that is entirely too clunky. The quirky users should have the onus to take further steps to keep cards out of the norm.

A better method would be to inform the users that if they wish to keep the original changed cards then before upgrading they should:

  • Install the Copy Notes add-on
  • Find all the cards (with a provided search string or tag)
  • Duplicate them using Copy Notes, which gives the copy a new guid
  • Upgrade, then delete the duplicates (provided by the same search above)
    This would at least negate the onus on normal users. Though people going down this route would be required to delete the duplicate ones on every update in the future 😞

A better better path forward, I believe, is closer to @aplaice's original grandfather suggestion, but not as in keeping the cards in the deck as they are. With a more powerful deck manager (cough cough 😉) it is simple to keep the capital and flags for these decks saved in the source files, but not export them to the normal deck, instead leaving them as blank. This could be done with a specific source file like "SmallIndependantTerritories.csv" with all the data in it, but that only some is used in the normal deck (which would work today in Brain Brew) or as a tag filter in a main source file "only use field if tag doesn't contain 'X'" type of thing on the build steps (not working now, but totally doable)
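To sketch the tag-filter variant: the snippet below is plain Python, not Brain Brew's API, and the tag name, column names and sample rows are invented for the example.

```python
def build_rows(source_rows, fields_to_blank, tag):
    """Return export rows where tagged notes keep their guid but get some fields blanked.

    The source rows are left untouched, so the full data stays in the source file.
    """
    exported = []
    for row in source_rows:
        out = dict(row)
        if tag in row["tags"].split():
            for field in fields_to_blank:
                out[field] = ""
        exported.append(out)
    return exported


# Placeholder source data; "UG::Small" is a hypothetical tag.
source = [
    {"guid": "AAAA", "capital": "some capital", "flag": "flag-a.svg", "tags": "UG::Small"},
    {"guid": "BBBB", "capital": "other capital", "flag": "flag-b.svg", "tags": "UG::Europe"},
]
normal_deck = build_rows(source, ["capital", "flag"], "UG::Small")
```

The same source could then feed a second build that skips the blanking step, with both outputs sharing the same guids.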

Why not keep the data already collected for these locations and simply make an "UG Exhaustive" Deck featuring every possible political or geographical entity? It would be akin to the Extended deck, simply having more notes and/or more fields filled in on some notes (though I can see an argument that Extended and Exhaustive should be the same deck). Benefits include:

  • No guid changes (ever!) and therefore no clunky upgrade options for the users to go through 🙏 Future downgrades of cards will have no issues, as the users will be informed what is to be downgraded and have the option to move up to the Exhaustive deck or not.
  • Less™ disagreement about what is relevant enough to go in the proper deck, as there is a proper outlet for entities not important/big enough. Sure there'll be discussion on which of these lower class entities should be upgraded in status, but with proper rules in place the discussion should be simple and the result as easy as removing a tag/moving a csv row. No messy nonsense 👍 Having a generic question and answer along the lines of "does it have a capital? It goes in the Exhaustive deck at least!" makes things simpler and easier for us.
  • User choice! We're all a bunch of randos on the internet, who says we're right what goes in the deck? 😏
  • Keep all the work you many fine people have put into the deck over the years 😁 deleting a note is like deleting a child 😭

Possible downsides:

  • Management of more entities. Not a big deal in my opinion, as there's already a hell of a lot. Also these second class entities can easily be flagged as such, in that they may receive less attention to detail / checks than the normal deck entities.
  • Having such an open frame of reference opens the door to all types of entities being added into the Exhaustive deck: US States, tiny rocks in the middle of nowhere with no importance to anything (British Indian Ocean Territory), etc. Again, I am not so worried about this, as we're already mostly in this state anyways 😅

Thoughts? 😁

@ukanuk
Contributor

ukanuk commented May 12, 2020

Why not keep the data already collected for these locations and simply make an "UG Exhaustive" Deck featuring every possible political or geographical entity?

Yes, if Brain Brew or Anki Deck Manager can be extended to do this, that would 100% be my vote!

This is similar to what I recommended in the last four paragraphs of #306 (comment), albeit I didn't know whether it was reasonably feasible for any of the deck managers. To reiterate,

I think this would be a great addition, as it would let more people with different needs still contribute to the same decks for the 90% of shared information.

@axelboc responded

This could make sense, but what if people just want to do the extended deck for the extra templates and not for the extra cards? I think it would be easier to just create a separate deck tbh.

Your suggestion of an additional EXHAUSTIVE deck besides the EXTENDED partially mitigates this, although people would still have to make/share a filter if they wanted the extra notes without the extra card templates.

I think ultimately we stopped discussing it since Brain Brew doesn't currently have such functionality and manually making a separate deck is feasible with current tools, even if not ideal.

@ukanuk
Contributor

ukanuk commented May 12, 2020

Having such an open frame of reference opens the door to all types of entities being added into the Exhaustive deck: US States, tiny rocks in the middle of nowhere with no importance to anything (British Indian Ocean Territory), etc. Again, I am not so worried about this, as we're already mostly in this state anyways 😅

I'm concerned whether all this extra info can be effectively managed within one repo by the large group of maintainers this would require (a new maintainer for every new extension). The current deck maintainers certainly shouldn't have to manage all the associated PRs that would entail.

Maybe there's a way to put supplemental info in separate repos? That would greatly simplify what maintainers manage in what parts of the deck(s).

One way this could maybe be accomplished: Maintainers wanting to build the apkg file or CrowdAnki directory would download the core UG repo, then copy in supplemental CSVs, media, and/or templates from their supplemental repo, then share the merged apkg/CrowdAnki directory in the releases.

Another potential way: Brain Brew can be given two (or more) GitHub urls, and it automatically downloads and merges everything. Sounds nice in theory, not sure how much work it would be to actually implement.

@axelboc
Collaborator Author

axelboc commented May 12, 2020

Upon upgrading to this new version a new user would have duplicate versions of all of these Small Independent Territory cards, one with all the data (the old one) and one with only the data we have deemed important enough (the new one).

This is why the idea is to release a major version of the deck, which requires "normal" users to perform a clean import as described in the README.

The upgrade process we're trying to come up with will necessarily have limitations because the tools on which we depend (Anki, CrowdAnki, AnkiDM) were not made to support this use case. Until they do, users will have to be inconvenienced in some way.

[Side note] As a reminder, this deck is built upon a very old shared deck. When I first released UG v2.0, I just wanted better flags and cleaner templates -- I never put any thought into which entities to include. This piece of work of figuring out inclusion rules for every entity in the deck is really about going back in time and fixing this. For what it's worth, I think this warrants a major release.

So that's the whole reasoning behind changing GUIDs and why I think it's a reasonable solution.


[...] make an "UG Exhaustive" Deck featuring every possible political or geographical entity?

This is just not realistic from a conceptual perspective (i.e. exhaustiveness is impossible), and from a maintenance perspective.

The philosophy of this deck has always been to be accessible to as many people as possible in terms of content. That's why we're trying to shrink it down a tiny bit and make it a tiny bit less western. If you disagree with this philosophy and think that a significant number of people would benefit from an "exhaustive" geography deck, then this would need to happen in a separate repo.

Keep all the work you many fine people have put into the deck over the years 😁 deleting a note is like deleting a child 😭

I totally get the frustration of losing data that sometimes took much discussion and work to be added in the first place. To me, however, this is a sign of a healthy piece of software -- we're just cleaning up technical debt. Moreover, thanks to Git, if someone wants to revive this data in another repo, they'll always be able to find it through this repo's commit history.


Let's please move any further discussion on the topic of upgrading the deck to a separate thread (I'll open one now), and keep this thread focused on deciding inclusion rules for political entities.

@aplaice
Collaborator

aplaice commented May 12, 2020

Obviously we could just start with the overseas subdivisions we already know (overseas territories of France, special communities of Spain, etc.) but I'm pretty sure this is not an exhaustive list, and, as we've discussed, "special status" is a vague concept, which doesn't include Hawaii notably. Can we define this any better?

FWIW Spain does not have "special" communities — all of its top-level sub-divisions are called autonomous communities or autonomous cities. Hence, the Canary Islands, Ceuta and Melilla are not really more "autonomous" than Madrid or Castile and León. (AFAICT the different "autonomous communities" differ in the exact amount of autonomy they have — for instance, I think that Catalonia is particularly autonomous — but looking at their Wikipedia article, the Canary Islands don't seem to be exceptional in this regard.)

In contrast, Portugal for example, does have "special" subdivisions (autonomous regions) and Madeira and the Azores are the only two.

Similarly, Italy has autonomous regions, and Sicily and Sardinia are two of five of them (though the only ones that are islands).

In the case of France, AFAICT, the overseas departments/regions are specially designated in the constitution (at least since 2003), but unlike the overseas collectivities, they don't differ from the mainland departments/regions in terms of autonomy — i.e. their title of "overseas" is official/constitutionally-approved but it doesn't confer any special autonomy.

Brainstorming possible criteria that can be combined (ORed and ANDed appropriately :)):

  1. Has special/autonomous status (compared to other similar-level subdivisions in the country). (But: are the French overseas departments "special"? (If not, they can be included under 4., though. :)))

  2. Is an island.

  3. Is an exclave.

  4. Is on a different continent to the mainland (and not adjacent/"close" to the mainland). (To allow for the Canary Islands and Hawaii and possibly overseas France)

For example, (1 AND 2) OR 3 OR 4. I'll try to extend the spreadsheet with these classifications for the existing entities and some that we might want to include.
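As a sketch, the example combination could be evaluated like this. The `Entity` class and its flags are hypothetical, and the sample classifications are illustrative only (loosely based on what is said in this thread), not audited data.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    special_status: bool   # criterion 1
    island: bool           # criterion 2
    exclave: bool          # criterion 3
    other_continent: bool  # criterion 4


def qualifies(entity):
    """Example combination from above: (1 AND 2) OR 3 OR 4."""
    return (entity.special_status and entity.island) or entity.exclave or entity.other_continent


# Illustrative flags only; only the flags relevant to the argument are set.
madeira = Entity("Madeira", special_status=True, island=True, exclave=False, other_continent=False)
alaska = Entity("Alaska", special_status=False, island=False, exclave=True, other_continent=False)
hawaii = Entity("Hawaii", special_status=False, island=True, exclave=False, other_continent=True)
madrid = Entity("Madrid", special_status=False, island=False, exclave=False, other_continent=False)

results = {e.name: qualifies(e) for e in (madeira, alaska, hawaii, madrid)}
```

Keeping each criterion as its own flag would also make it easy to try other combinations against the spreadsheet without reclassifying anything.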

We also need a list of exclaves worth including in the deck (Alaska, Kaliningrad Oblast, etc.) Have you seen one around on Wikipedia? The article on enclaves and exclaves is a bit all over the place...

Yeah, that article is rather chaotic, though not as terrible as I thought at first glance. Perhaps we could use the following categories, ignoring all the seas:

  1. Enclaves that are also exclaves > National level

    a. Apipé Islands have an area of 320 km² > 300 km², but might be uninhabited, so we can probably ignore them.

    b. Sokh has an area of 220 km² and a population of 51,000.

    I think that these are all the sufficiently large/populous ones, but I might have missed some...

  2. Exclaves that are not enclaves > National level

    a. Nakhchivan

    b. Gaza

  3. Semi-enclaves and semi-exclaves > Non-sovereign semi-enclaves

    a. Temburong district

    b. Oecusse

    c. Musandam

    d. Ceuta, Melilla, Gibraltar, Akrotiri and Dhekelia (otherwise already considered)

    e. Alaska

    f. (Kokkina doesn't give an area or population, while Peñón de Vélez de la Gomera is uninhabited, so I'm excluding them.)

  4. Semi-enclaves and semi-exclaves > Non-sovereign semi-exclaves

    a. Cabinda

    b. Rio Muni

    c. French Guiana, Kaliningrad oblast

Among other entities that we might want to consider, Northern Ireland, Sarawak and Sabah (Malaysia on Borneo), and Turkey's European part are included in the Pene-enclaves/exclaves section, which I don't really want to try to fully digest, and anyway I'm not sure if they should be really included by us (Sarawak and Sabah are two provinces, not a single province, while Turkey's European part is reachable from Asian Turkey by bridge, so it's not really remote...)

@axelboc
Collaborator Author

axelboc commented May 12, 2020

Right, sorry, you had mentioned the not-so-special autonomous status of Spanish communities/cities before.

Your assessment of the status of French overseas departments is correct.


Among the exclaves and semi-exclaves you've identified, many of the obscure ones are very close to their "mainland". Perhaps this was mentioned before, but I wonder whether we could add another criterion in addition to area/population: distance from mainland, especially since it can apply to both exclaves and islands.

It could perhaps replace some of the criteria you listed - i.e. criterion no 4. obviously (other continent) but also perhaps criterion no 1. (special status) so we don't have to debate on the meaning of special status.

So we'd end up with:

(dependent territory OR insular subdivision OR exclave)
AND (area AND/OR population)
AND (distance from mainland)

EDIT I've used the term insular subdivision instead of island, as I think that's what we're after. Unfortunately, I don't think Wikipedia has a list of those...

@aplaice
Collaborator

aplaice commented May 12, 2020

Right, sorry, you had mentioned the not-so-special autonomous status of Spanish communities/cities before.

I had guessed that my previous comments might have gotten lost within my horribly long replies discussing the upgrade method.


Among the exclaves and semi-exclaves you've identified, many of the obscure ones are very close to their "mainland".

FWIW I intended this to be a maximally extensive list for paring down. :)

Perhaps this was mentioned before, but I wonder whether we could add another criterion in addition to area/population: distance from mainland, especially since it can apply to both exclaves and islands.

In principle, that's a great idea. Some possible objections:

  1. Practical: it's hard to quickly determine the minimum distance between two regions (the mainland and the island/exclave in question). A proxy (the distance between the capital of the country and that of the region) would be straightforward to calculate, but in many cases it could be extremely inaccurate. An eyeballed estimate could be used instead, but that would reduce the intended rigour of the selection criteria. The advantage of "on a different continent" is that it's (mostly) a clear cut-off.

  2. Combining a hard area/population cut-off with a hard distance cut-off is perhaps overly simple. Ideally, there'd be some sort of inverse relation between the distance cut-off and the area/population cut-off. For instance, we might care about Sicily despite its proximity to the mainland, due to its large population/area.

  3. For island nations (Japan, Philippines or Indonesia), the mainland is poorly-defined — do we count the distance to the island containing the capital, to the main island cluster or to the nearest island overall?
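As a rough illustration of the capital-to-capital proxy from point 1, here is a minimal haversine sketch in Python. The coordinates are approximate values entered by hand (an assumption, not sourced from the spreadsheet), and the Earth is treated as a sphere.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Approximate (lat, lon) pairs, entered by hand for illustration.
moscow = (55.76, 37.62)
kaliningrad = (54.71, 20.45)

capital_to_capital_km = haversine_km(*moscow, *kaliningrad)
```

For Kaliningrad Oblast this comes out at around a thousand kilometres, well above the shortest gap between the oblast and the rest of Russia, which shows how the proxy can overestimate.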

no 1. (special status) so we don't have to debate on the meaning of special status

For special status we could use the list of autonomous areas (1.1 and 1.2, not 1.2.1 and 1.2.2) found by @ukanuk, possibly excluding cases where the name of the autonomous area is in the name of the country, to avoid having Tobago, Nevis, Príncipe or Barbuda (Tanzania doesn't count :)).

  • I haven't yet comprehensively checked all the autonomous areas that are islands, but one fascinating region that the criterion has brought up is Bougainville, an autonomous region of Papua New Guinea, which voted overwhelmingly (98.31%) for independence, in 2019.

  • In the longer term, we could perhaps even widen the criterion to autonomous areas that are not islands (possibly with a different population/area cutoff). (e.g. I'd love if some of the "autonomous" regions of China were included.)


(dependent territory OR insular subdivision OR exclave)
AND (area AND/OR population)
AND (distance from mainland)

I'd suggest using a separate, lower population/area cut-off for dependent territories, since they're the closest to being independent states.

I've used the term insular subdivision instead of island, as I think that's what we're after. Unfortunately, I don't think Wikipedia has a list of those...

Yeah, that's why I'd prefer using other categories for which Wikipedia does have lists. :/

Perhaps:

(dependent_territory AND (area1 AND/OR population1))
OR
(autonomous AND island AND (area2 AND/OR population2))
OR
(transcontinental AND distant AND (area2 AND/OR population2))
OR
(exclave AND distant AND (area2 AND/OR population2))

As partial justification for this monstrosity: I don't want to apply the "distant" criterion to the dependent territories, since that would remove the Isle of Man and probably also Guernsey and Jersey. I don't want to apply it to the autonomous islands since that would cause the loss of Zanzibar. The distance criterion is, however, as you suggested, convenient to filter out exclaves, and IMO also for transcontinental edge cases (e.g. Greece's islands off the coast of Anatolia or Turkey's European part).

The main advantage of the monstrosity is that we start with four well-defined lists (dependent territories, autonomous regions, transcontinental areas and exclaves) and then filter each down appropriately.
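The combined rule could be sketched in Python roughly as follows. The field names and the `big_enough` helper are hypothetical, the cut-off values are left as parameters, and "AND/OR" is resolved as OR purely as an assumption, since that choice is still open here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    population: int
    area_km2: float
    dependent_territory: bool = False
    autonomous: bool = False
    island: bool = False
    transcontinental: bool = False
    exclave: bool = False
    distant: bool = False


def big_enough(c, min_population, min_area_km2):
    # Assumption: "AND/OR" is read as OR here; swap in `and` for the stricter reading.
    return c.population >= min_population or c.area_km2 >= min_area_km2


def include(c, cutoff1, cutoff2):
    """cutoff1 and cutoff2 are (min_population, min_area_km2) pairs."""
    return (
        (c.dependent_territory and big_enough(c, *cutoff1))
        or (c.autonomous and c.island and big_enough(c, *cutoff2))
        or (c.transcontinental and c.distant and big_enough(c, *cutoff2))
        or (c.exclave and c.distant and big_enough(c, *cutoff2))
    )


# Placeholder cut-offs and made-up candidates, for illustration only.
CUTOFF1 = (1_000, 100)
CUTOFF2 = (10_000, 1_000)
near_dependency = Candidate("near dependency", 80_000, 570, dependent_territory=True)
near_exclave = Candidate("near exclave", 50_000, 220, exclave=True, distant=False)
far_exclave = Candidate("far exclave", 50_000, 220, exclave=True, distant=True)
```

Here the nearby dependency is kept because the "distant" test is never applied to dependent territories, while the two exclaves differ only in whether they pass it.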

@axelboc
Collaborator Author

axelboc commented May 13, 2020

You're right, the list of transcontinental countries brings in some entities missing from the list of autonomous areas, like Guadeloupe, and the distance criterion makes sense for them. It also makes sense to have smaller cut-offs for dependent territories.

Let's put all these lists into spreadsheets and see what comes out. Perhaps we can source only population and area for now, to get an overview, and add the distance later on if need be? How should we split the work?

@aplaice
Collaborator

aplaice commented May 13, 2020

I did the autonomous islands yesterday (it was a rather small list) as a second sheet in your spreadsheet:

special_regions.xlsx

I've included Corsica, Martinique, Mayotte and French Guiana despite some doubts since I couldn't determine for sure whether the "Single territorial collectivities" have any significant specialness beyond being cases where a department is also a region. The article on Corsica suggests that they do:

As a single territorial collectivity, Corsica enjoys a greater degree of autonomy than other French regional collectivities

so I'm tentatively including them. If they're deemed unworthy then Martinique, Mayotte and French Guiana can be simply moved to the "transcontinental" section, while Corsica removed.

I've also increased the population and area thresholds ten-fold compared to the dependent territories, as a starting point.

In the long term, once it's mostly stable, the spreadsheet should perhaps be included in the repo. (Since it's small, including it just as a binary blob should be OK.)

How should we split the work?

Perhaps I can do the non-contiguous transcontinental regions and you the exclaves?

Perhaps we can source only population and area for now, to get an overview, and add the distance later on if need be?

Makes sense!

@aplaice
Collaborator

aplaice commented May 13, 2020

Now with the non-contiguous inhabited transcontinental regions as well:

special_regions.xlsx

I've excluded the Greek islands off the west coast of Turkey, since there are doubts about whether they're actually in Asia. For instance, Rhodes, which is the furthest east, is described in its article as "one of the most popular tourist destinations in Europe" and no mention is made of its possible belonging to Asia.

In the case of the Malay Archipelago I think it makes sense to split Asia from Oceania along the Weber line, as being roughly midway between the Sunda and Sahul continental shelves. It seems that Wikipedia also follows this policy, since Halmahera, Seram or Buru, lying east of the Weber line, are labelled as being in Oceania (according to their info boxes), while Babar and Timor, lying west of the line, are labelled as being in Southeast Asia.

The Maluku Islands are mostly in Oceania — comparing the above-linked map of the continental shelves with that of the Maluku Islands it seems that the two sub-chains Barat Daya and Sula islands are not, but the rest is.

The Maluku Islands consist of two Indonesian provinces Maluku and North Maluku. Including the two provinces separately feels a bit silly. Hence, if the Maluku Islands were included, it'd have to be without capital or flag (even if they cross the relevant thresholds). (Similarly for West Papua, which is fully in Oceania.)

However, we might also just decide that the Maluku Islands and West Papua don't cross the distance-from-"mainland" threshold.

@axelboc
Collaborator Author

axelboc commented May 13, 2020

I've included Corsica, Martinique, Mayotte and French Guiana despite some doubts since I couldn't determine for sure whether the "Single territorial collectivities" have any significant specialness beyond being cases where a department is also a region.

Yeah, might as well just follow Wikipedia's list. Note that French Guiana is not an island, though 😄

In the long term, once it's mostly stable, the spreadsheet should perhaps be included in the repo.

Definitely.

Perhaps I can do the non-contiguous transcontinental regions and you the exclaves?

Alright, I'm on it!


From what I see, most of the entities that would be included make sense.

However, we might also just decide that the Maluku Islands and West Papua don't cross the distance-from-"mainland" threshold.

I agree. I don't think we need a distance threshold for autonomous islands, but it makes sense for transcontinental regions. It would exclude the Greek islands without having to discuss their transcontinental nature.


How would you feel about not including flags/capitals at all for autonomous islands, transcontinental regions and exclaves -- in other words, having a single cut-off? It would seem appropriate in regard to what you said earlier about dependent territories being closer to independent states than the rest of them:

I'd suggest using a separate, lower population/area cut-off for dependent territories, since they're the closest to being independent states.

@axelboc
Collaborator Author

axelboc commented May 13, 2020

Here you go:

special_regions.xlsx

In the file, I listed why some exclaves didn't make the list. I notably excluded water-related exclaves like Rio Muni and the pene-enclaves/exclaves you mentioned in #306 (comment) as they're not really what we had in mind when we first suggested adding exclaves.

As mentioned in my previous comment, I kept only a single inclusion cut-off by removing the flag/capital column. I can revert if needed.

I also made a very unscientific attempt at classifying distances, and chose as a cut-off to exclude exclaves that are VERY CLOSE to their mainland.

Keeping the population and area criteria to 20,000 / 300 km2, only the following three exclaves pass the inclusion test: Kaliningrad Oblast (which is arguably a little further from its mainland than all the exclaves that I categorised as VERY CLOSE), Alaska, and French Guiana (which I've removed from the list of autonomous islands .. although now that I think of it, this is also a water-related exclave in a way... 😂).

@aplaice
Collaborator

aplaice commented May 13, 2020

Here you go:
special_regions.xlsx

This looks great!

Note that French Guiana is not an island, though 😄

Oops, thanks for fixing it!

I don't think we need a distance threshold for autonomous islands, but it makes sense for transcontinental regions. It would exclude the Greek islands without having to discuss their transcontinental nature.

I'll add such a distance field in line with the one you added to the exclaves, to the transcontinental regions, then.

How would you feel about not including flags/capitals at all for autonomous islands, transcontinental regions and exclaves -- in other words, having a single cut-off?

It might be a shame to lose, say, the capital of French Guiana, but overall it might simplify things and hence be better.

The counter-argument to my own argument is that how closely a region resembles an independent country is a spectrum, so some of the autonomous far-overseas islands have some properties of independent countries. (This is particularly visible for the range of various French overseas regions, which have historically moved between categories.)

(This is to say that I don't have a strong opinion either way.)

Keeping the population and area criteria to 20,000 / 300 km2

Hadn't you raised this to the slightly rounder 20,000 / 1000 km²? (It doesn't make a difference for the exclaves, but it does for the two other categories.)

French Guiana ([...] .. although now that I think of it, this is also a water-related exclave in a way... 😂).

If worst comes to worst, it can also take refuge in the transcontinental regions section. :) (Originally I hadn't added it there to avoid duplication.)

@axelboc
Collaborator Author

axelboc commented May 14, 2020

Yeah, sorry, 20,000 / 1000 km² makes sense for exclaves.

Maybe we can include the water-related exclaves after all, for consistency, since they wouldn't pass the distance criterion anyway.

@aplaice
Collaborator

aplaice commented May 14, 2020

With distance column for transcontinentals:

special_regions.xlsx

Since I didn't quite trust my judgement of the degree of closeness, I added an approximate distance (in km) column. When Wikipedia (or occasionally Google) provided a trustworthy figure for the distance to the mainland, I used that. In some of the cases I just estimated with Google Maps and provided that value with a tilde.

I provisionally added a "Population AND Area AND Distance" column, without removing the old Location and Capital/Flag ones, yet.

@axelboc
Collaborator Author

axelboc commented May 15, 2020

Alright, I tidied things up, sorted and documented each table and cleaned up the formatting. Please pick at it, as there are probably mistakes, notably in the names of entities, since links on Wikipedia don't always match article names.

special_regions.xlsx


I re-included Río Muni to be a bit more objective, but I kept pene-exclaves out.


I tentatively kept only map inclusion columns for autonomous islands and transcontinental territories to see what that looked like.

I feel pretty strongly about not having flag/capital for transcontinental territories, as I don't see why their flags and capitals would be more interesting to know than those of their parent countries' other subdivisions -- their location is what's relevant, since that's why we picked them.

FWIW, I'm a bit less sure when it comes to autonomous islands, as they have more of their own identities. Perhaps flags would make sense for them in that regard. But then Martinique officially uses France's flag ... it has an unofficial flag but using it wouldn't be very consistent with our guidelines so far. All in all, I think it'd just be easier to include them only with maps.


I tentatively increased the distance threshold for inclusion of transcontinental territories from CLOSE to FAR. I feel like the Maluku provinces and Western New Guinea are clearly part of Indonesia. I think transcontinental territories that are worth including are ones that are not obviously connected to their parent country. The FAR threshold also excludes the Socotra Governorate, which I think is okay, even though it could be assumed to belong to Somalia.

Kaliningrad Oblast is currently categorised as CLOSE. Looking at Google Maps, it looks to be roughly 300 km away from mainland Russia, so I think the categorisation is correct (you seem to have used CLOSE for the interval 200-500 km).

Now, if you think having two different thresholds for transcontinentals and exclaves isn't great, we have a few options.

  1. increase the inclusion threshold for exclaves to FAR and remove Kaliningrad Oblast;
  2. decrease the inclusion threshold for transcontinentals back to CLOSE and include the entities mentioned above;
  3. tweak the range of the CLOSE threshold to at least exclude the Maluku provinces.

Personally, I don't mind the two different thresholds, but I could also make peace with removing Kaliningrad Oblast.
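To show how the threshold options above play out, here is a small Python sketch of the distance bands. Only the CLOSE interval (200-500 km) comes from this comment; the other boundaries, in particular `far_upper_km`, are assumptions made for illustration.

```python
from enum import IntEnum


class Distance(IntEnum):
    VERY_CLOSE = 0
    CLOSE = 1
    FAR = 2
    VERY_FAR = 3


def classify(distance_km, far_upper_km=2000):
    """Map an approximate distance to a band; far_upper_km is a made-up boundary."""
    if distance_km < 200:
        return Distance.VERY_CLOSE
    if distance_km <= 500:
        return Distance.CLOSE
    if distance_km <= far_upper_km:
        return Distance.FAR
    return Distance.VERY_FAR


def passes(distance_km, threshold):
    """True if the band for distance_km is at or beyond the inclusion threshold."""
    return classify(distance_km) >= threshold
```

With Kaliningrad Oblast at roughly 300 km, `passes(300, Distance.CLOSE)` holds while `passes(300, Distance.FAR)` does not, which is exactly the difference between keeping and dropping it under options 1 and 2.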

@aplaice
Collaborator

aplaice commented May 15, 2020

I feel pretty strongly about not having flag/capital for transcontinental territories, as I don't see why their flags and capitals would be more interesting to know than those of their parent countries' other subdivisions -- their location is what's relevant, since that's why we picked them.

That makes sense. It might be easier to just remove the flags and capitals for the autonomous islands as well.


You're right that both the Maluku Islands and Western New Guinea are really a "normal" part of Indonesia (at least geographically — ethnically and culturally Western New Guinea is quite different from the rest of Indonesia). Now that I've looked more carefully (rather than just measuring the distance between the major islands (Sulawesi->Halmahera and Sulawesi->New Guinea)), there's a more-or-less continuous chain of tiny islands both between Sulawesi and the Maluku Islands and between the Malukus and New Guinea. The closest that I could find was 11 km between an island belonging to Sulawesi province and one belonging to North Maluku, and 41 km between an island belonging to North Maluku and one belonging to West Papua province. (It's quite likely that the distances between the Maluku islands are greater than that, but not significantly more.)

Literal contiguity doesn't hold for island countries, but we could argue that islands that are "very close" to the "main part" of the country are part of the "main part" of the country ("contiguous" with it) and apply that recursively. (After all, if we were considering Sakhalin we wouldn't be measuring the distance from the Ural mountains.) Hence, in the case of Western New Guinea we should be measuring the distance not from Sulawesi, but from the Maluku Islands, to which Western New Guinea is indeed "very close".

I think that that argument and the fact that both the Maluku Islands and Western New Guinea consist of two provinces, are sufficient to exclude them.
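The recursive notion of closeness can be sketched as growing the "main part" outwards until nothing else is near enough to absorb. In this Python sketch the two gaps are the ones quoted above, while the 200 km "very close" cut-off and the function name are assumptions.

```python
def main_part(start, gaps_km, very_close_km):
    """Grow the 'main part' from `start`, absorbing any land mass whose gap
    to something already absorbed is below `very_close_km`."""
    absorbed = {start}
    changed = True
    while changed:
        changed = False
        for (a, b), gap in gaps_km.items():
            # Absorb only when exactly one end of the pair is already in the main part.
            if gap < very_close_km and (a in absorbed) != (b in absorbed):
                absorbed |= {a, b}
                changed = True
    return absorbed


# Shortest gaps found above, in km; the cut-off below is an assumed value.
gaps_km = {
    ("Sulawesi", "North Maluku"): 11,
    ("North Maluku", "West Papua"): 41,
}
core = main_part("Sulawesi", gaps_km, very_close_km=200)
```

Starting from Sulawesi, North Maluku is absorbed first and West Papua follows through North Maluku, so the distance for Western New Guinea ends up being measured from the Maluku Islands rather than from Sulawesi.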

Now, if you think having two different thresholds for transcontinentals and exclaves isn't great, we have a few options.

For simplicity and consistency, two thresholds wouldn't really be defensible. The exclave category was, in effect, created for the purpose of keeping Kaliningrad Oblast (French Guiana is also "transcontinental" so it'd stay anyway), so it might be a bit of a shame to remove it.

Consequently, I'd suggest sticking to not "very close" as the distance criterion and having both Socotra and Kaliningrad Oblast. OTOH I also wouldn't particularly oppose raising the distance threshold to "FAR" or "VERY FAR" and removing Kaliningrad Oblast.


Updated with:

  1. Explicitly only a single "include" column for autonomous islands.

  2. "Recursive"/"inductive" closeness.

  3. Return to the not "VERY CLOSE" threshold.

special_regions.xlsx

(Obviously feel free to revert any of these.)

@axelboc
Collaborator Author

axelboc commented May 15, 2020

I think we've got it!! 😤
