Inclusion rules for seas and related water bodies #346
Comments
Sort of modified repost of #137 (comment) as I didn't get any feedback there: |
Thanks @Sir-Casm, we'll keep an eye on the "Rivers, Lakes and Seas" deck. Some of the maps look like they match ours pretty closely, so it might be a good source for us. As mentioned, though, our priority is to first define inclusion criteria for seas, straits, etc. Once we've done that, we'll see which notes need to be added to the deck and we'll work on sourcing or creating maps for them. Thanks for bumping the issue, I admit that I had forgotten about it. 😅 I'll try to get things moving asap. |
Alright, let's get the ball rolling! physical-entities-v1.xlsx
Observations
|
Wow, that's a huge amount of work! I think it broadly makes sense. To the extent that I have doubts it's because of how arbitrarily named and ambiguously defined most water bodies are, and I'm not sure if any better choices are possible, given the data.
Straits
Alternatively, we could use the (category) list of International Straits, since it already includes the Strait of Gibraltar and the Bering Strait, and already constrains on one of the criteria that might be interesting (the strait being international, though that criterion definitely isn't sufficient). Slightly contrary to the name, it also includes "channels" such as the English Channel, the Mozambique Channel or the Drake Passage (under Antarctica), claiming that they're actually straits. The list is far more extensive, though, and we probably aren't interested in most of the items...
As yet another alternative, we could use a Wikidata query that lists all the "straits" (according to Wikidata, the English Channel etc. are also straits) that "belong" to more than one country. I'll try to write one (for comparison, even if we don't end up using it).
I'm a bit hesitant to just edit the List of Seas to add in the Bering Strait and the Strait of Gibraltar, since it doesn't seem to be too reliable a source in this regard. If it missed even them, it might also have missed many other straits, some of which it might make sense to add, but which we just wouldn't think of ourselves. OTOH we shouldn't let the perfect be the enemy of the good, so if the alternatives all turn out to be unfeasible or impractical, just editing the List to add in the straits we want isn't too bad.
Regarding the inclusion criteria, I'm a bit stumped. Arguably, the narrowness (and hence, usually, small area) of straits makes them more notable, since it increases their geopolitical importance. I can't think of anything other than my (old) suggestion of including intercontinental straits, but that's also not ideal (e.g. would we add both the Bosporus and the Dardanelles?). The extensive straits (such as the English or Mozambique channels) could be included based on the same surface area criterion as all the other seas.
Observations
Yes, that makes sense (assuming that a reliable source is found; see below).
I assume you mean the seas of the Southern Ocean? The very largest are probably worth including, since they really are quite huge, even if they're very far from any populated territories.
Yes, it seems that they're going by the Australian Hydrographic Service's definition.
Yes, Wikipedia's data is dubious.
I wonder what Indonesians think of all the many Mediterranean seas. :D On that note, the Ionian and Balearic Seas are also missing infoboxes. :)
Additional comments
My main issue is that I have very serious doubts about Wikipedia's listed surface areas. They don't seem to provide sources and they don't specify which (of the often many) definitions they're following. OTOH assuming that they're approximately correct and that we verify the edge-cases, it probably doesn't matter. |
It's a good option. Another idea could be to include only straits that connect two marginal seas (and/or oceans) that both pass the inclusion criterion I suggested of 100,000 km2. This would exclude the Bosporus and the Dardanelles since they don't connect the Black Sea and the Mediterranean Sea directly. Note that the English Channel doesn't pass the 100,000 km2 limit, which is why I didn't include it as a marginal sea. I think we'll need a different criterion for channels.
You're totally right, it includes a lot more straits, channels and passages. 💯
Yeah, it's not a good sign at all. It makes me wonder whether this List of Seas is exhaustive even for larger seas. I'd be interested in comparing it with the IHO's Limits of Oceans and Seas of 1953, also to check whether the list (and our sublist) includes seas not delimited by the IHO.
Yeah, the lack of sources is quite appalling. The fact that some of the numbers seem really approximate and inaccurate doesn't help, that's for sure. 😞 It's also a shame that quite a few large seas are missing area information altogether.
If our own "original research" is more exhaustive, precise and reliable than Wikipedia, I don't see a problem with it. As long as we're methodical about it and the research is well documented and fully reproducible, I don't think anybody would mind. I've just found a paper introducing a digital map of the limits of oceans and seas based on the IHO's Limits of Oceans and Seas. Perhaps you had come across it before? I also found a site to download the map's shapefile, which we could totally pass through |
Or perhaps we may as well use the 2002 draft of the IHO's Limits of Oceans and Seas. After all, we already include the Southern Ocean, which does not exist in the 1953 version. 🤷 I think we've talked about this before, but we could definitely highlight contentious areas between the 1953 and 2002 versions (like the limits of the East China Sea), a bit like we do for countries. |
I've just spent some time looking at the Sea of Japan and the East China Sea, and part of the problem is that their infobox templates are not for water bodies, but for East Asian (or Chinese) items of interest. (I think that this can be easily remedied, since (some) infoboxes can AFAICT be nested/embedded, though following the guidelines didn't seem to do anything, at least when previewing changes — it's possible that template changes don't apply when only previewing, that these particular infobox templates don't support embedding or that I did something wrong — I'll play around in a Wikipedia sandbox to check.) The Sea of Japan and the East China Sea do both contain surface areas in Wikidata, but while adding/cross-referencing (respectively) the sources for them, it turned out that various sources provide areas varying from 978,000 km² to 1,048,950 km² (for the Sea of Japan) and from 750,000 km² to 1,249,000 km² (for the East China Sea). Some of the sources even noted that multiple authorities provide different values...
That looks great! I might have seen it before and rejected it due to the non-commercial license for the shapefiles (which would make the SVG derived from the data incompatible with the CC BY-SA or similar needed for Wikimedia), though I don't remember for sure. In any case, for calculating the areas it's perfect. I'll definitely have a look! (I'm not making any promises on when I'll do everything that I've promised to do, in this thread, though I'll try soon-ish:))
That's a great and simple solution!
I hadn't realised. :O Your suggested criterion of connecting two sufficiently large seas would work in this case, though — both the Celtic Sea and the North Sea have an area greater than 100,000 km².
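For illustration, the "connects two sufficiently large water bodies" test could be sketched in Python roughly as follows. The area values are placeholders chosen only to sit on the right side of the threshold, not measured figures, and the names `AREAS_KM2`, `STRAITS` and `connects_large_bodies` are made up for this sketch. The Bosporus entry assumes the usual description of it as joining the Black Sea to the Sea of Marmara.

```python
# Threshold suggested in this thread for marginal seas (km²).
MIN_AREA_KM2 = 100_000

# Placeholder areas in km² (NOT measured values); only their position
# relative to the threshold matters for this sketch.
AREAS_KM2 = {
    "Celtic Sea": 300_000,
    "North Sea": 570_000,
    "Black Sea": 436_000,
    "Sea of Marmara": 11_000,
}

# Each strait maps to the two water bodies it joins directly.
STRAITS = {
    "English Channel": ("Celtic Sea", "North Sea"),
    "Bosporus": ("Black Sea", "Sea of Marmara"),
}


def connects_large_bodies(strait, min_area=MIN_AREA_KM2):
    """Return True if both bodies joined by the strait reach min_area."""
    return all(AREAS_KM2[body] >= min_area for body in STRAITS[strait])


included = [s for s in STRAITS if connects_large_bodies(s)]
print(included)
```

With these placeholder numbers the English Channel passes and the Bosporus does not, which matches the outcome discussed above. The real result depends entirely on which area figures we end up trusting.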
It's a really neat idea and it'd look great, but I'm not sure whether there are sufficiently good shapefiles for the 2002 version. (I had manually input the data from the 2002 draft for the Bering Strait map, with QGIS, and it took a while. Experience would speed things up considerably, but I still wouldn't relish doing it for all of the seas.) There aren't really sufficiently good, appropriately licensed shapefiles even for the published 1953 version: I had to patch the mostly great Natural Earth data with alternative sources in a couple of cases, and the Baltic Sea is still subtly wrong (it's been on my to-do list for a while). |
The good news is that a lot of the boundaries didn't change between the 1953 and 2002 versions, so I think the 1953 shapefile should cover a good 80% of our need. |
From what I gather, the shapefile is available for non-commercial use, so we can reference it in this repo, if not on Wikimedia. 👍 |
Since Wikipedia's List of seas is so unreliable, here is the list of all the seas (including straits) described in the IHO's 2002 draft: The draft often uses the seas' local names, so I've normalized them all to English based on Wikipedia. Only very few seas don't have articles on Wikipedia (that I could find): Aru Sea (relatively large sea off the coast of Papua New Guinea), Central Baltic Sea (portion of the Baltic Sea), Sound Sea (tiny sea off the coast of Estonia), Tryoshnikova Gulf (tiny gulf off the coast of Antarctica). Also, Wikipedia has an article for the Northwest Passage (i.e. the sea route in Northern Canada), but not for what the IHO calls the Northwestern Passages (i.e. the combination of all the waterways in the region). Regardless, I think this is a much better list than what I had before. It looks a lot more complete, especially when it comes to straits. If you agree, @aplaice, I'll update I've cross-checked the list with the IHO's 1953 document and identified the seas that appear in both versions. The next step will be for me to check which seas have changed boundaries between the 1953 and 2002 versions. This will inform us on which areas need to be calculated from scratch, and which can be extracted from the shapefile I mentioned previously. Note that the 1953 version includes one sea that is absent from the 2002 version: the Sea of Japan. This is due to the naming dispute between Japan and Korea. I think it should still be included in the list, though. Note also that a number of seas from Wikipedia's List of seas are absent from the IHO's document. The most significant ones are the Cook Inlet (the area of which Wikipedia most likely overestimates), the Argentine Sea (which lacks international recognition), and the Levantine Sea (which some sources characterise as a lake). I don't think any of these are worth keeping in |
I think that it makes sense to replace Wikipedia's List of seas with the 2002 IHO draft. I'm far less confident about replacing the previous criterion of "existence of infobox" + "area in infobox > 100,000 km²" with the criterion of our own calculated (based on the 2002 draft) area being greater than 100,000 km², or about a wholesale editing of Wikipedia's infoboxes.
Firstly, updating the shapefile will take a considerable amount of work to do semi-properly (assuming that the comparison document is correct, I'd estimate approx. half a day) for something that will be of limited general use: due to the non-commercial license of the original source it won't be useful for anything Wikipedia-related, and since it'll be done only semi-properly, it won't be of much use in an academic environment. (Doing it actually properly, in the way the source 1953-based article did, would be an immense undertaking.) (Though an automated parsing of the text of the IHO 2002 draft might be feasible. As you've probably noticed, I like automated solutions, since they don't leave me second-guessing whether I made a mistake. :))
Secondly, the 2002 draft, due to its very nature, isn't really an authoritative, indisputable source, so I'd be uncomfortable editing Wikipedia to replace/provide an area calculated based on it. (In cases like the extent of the South China Sea, it even gets a bit political...)
Thirdly, for purely our purposes (since it wouldn't be easily/cleanly "upstreamable"), I feel like it's huge overkill. For the edge cases, it makes sense to calculate the areas, to double-check Wikipedia's dubious values, but otherwise even large errors in the values are unlikely to change whether a sea passes the criterion.
I feel really bad discouraging/opposing your enthusiasm. :(
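To make the "only the edge cases matter" point concrete, here is a rough Python sketch. It uses the lowest and highest areas quoted earlier in this thread for the Sea of Japan and the East China Sea; the function name `classify` and the three labels are invented for the sketch.

```python
THRESHOLD_KM2 = 100_000

# Lowest and highest areas (km²) reported by the sources quoted earlier
# in this thread.
REPORTED_RANGES_KM2 = {
    "Sea of Japan": (978_000, 1_048_950),
    "East China Sea": (750_000, 1_249_000),
}


def classify(low, high, threshold=THRESHOLD_KM2):
    """Classify a sea by whether its whole reported range clears the threshold."""
    if low >= threshold:
        return "include"
    if high < threshold:
        return "exclude"
    return "edge case"  # the range straddles the threshold: worth recalculating


for sea, (low, high) in REPORTED_RANGES_KM2.items():
    print(sea, classify(low, high))
```

Both seas come out as "include" whichever source is used, so the disagreement between sources doesn't affect the decision for them. Only seas whose range straddles the threshold would need a careful recalculation.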
Yes, definitely.
Yes, I think that it's safe to exclude them. |
Oh, sorry, I wasn't suggesting creating a full-on shapefile with the 2002 data. Here is the process I have in mind:
I'm currently half-way through step 3. Do you think the process makes sense? Am I going in the right direction? |
Yes, the process makes sense! Sorry for the confusion!
Regarding 2: it turns out to be even easier than expected. The areas can be trivially calculated and neatly output with
mapshaper HRmLOS_1.1.shp -each 'area=this.originalArea' -o areas.csv
but it turns out that even this is not needed, since in this particular case, the shapefiles already contained the relevant areas (though for some reason, all the areas are out by a factor of 1.002258 compared to those calculated by
All that remains is consolidating the areas in the cases where a sea corresponds to more than one unit (e.g. the Mediterranean or the Baltic). I'll upload the data tomorrow.
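The consolidation step could look something like the Python sketch below. Everything specific in it is an assumption: the column names (`name`, `area`), the unit names and the area values are made up for illustration and are not taken from the actual shapefile or from `areas.csv`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of an areas.csv export; column names and values
# are assumptions for illustration only.
SAMPLE_CSV = """name,area
Mediterranean Sea - Western Basin,850000
Mediterranean Sea - Eastern Basin,1650000
North Sea,570000
"""

# Hypothetical mapping from shapefile units to the sea they belong to.
PARENT_SEA = {
    "Mediterranean Sea - Western Basin": "Mediterranean Sea",
    "Mediterranean Sea - Eastern Basin": "Mediterranean Sea",
}


def consolidate(csv_text, parent_sea):
    """Sum unit areas per sea; units without a parent count as their own sea."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sea = parent_sea.get(row["name"], row["name"])
        totals[sea] += float(row["area"])
    return dict(totals)


totals = consolidate(SAMPLE_CSV, PARENT_SEA)
print(totals)
```

Keeping the unit-to-sea mapping in a separate table means the grouping decisions stay visible and easy to review, which seems useful given how debatable some of the subdivisions are.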
As a second (additional) sanity check, the areas of the seas as obtained from Natural Earth, could also be calculated. (In some cases, NE had subdivisions of seas that were not in the 1953 version.) |
Indeed, I created a calculated field with the area (rounded to the nearest integer) in QGIS, so I already have all the 1953 areas. I've also combined the areas of the Mediterranean and Baltic seas based on their 1953 boundaries (which are the same, overall, as the 2002 boundaries). I'm now half-way into comparing the boundaries visually between the shapefile and the 2002 draft (which has maps with outlines for every sea, unlike the 1953 document... 😄 which should make creating the maps for the included seas much easier, by the way). |
Great! :)
Yeah, I had used the maps from the 2002 draft (as multiple
Now, done here. Query also here, in case GitHub mangles the long link:
SELECT DISTINCT ?strait ?straitLabel ?noCountries ?article WHERE {
  {
    SELECT ?strait ?article (COUNT(DISTINCT ?country) AS ?noCountries) WHERE {
      ?strait wdt:P31 wd:Q37901;
              wdt:P17 ?country.
      ?article schema:about ?strait;
               schema:isPartOf <https://en.wikipedia.org/>.
    }
    GROUP BY ?strait ?article
  }
  FILTER(?noCountries > 1)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (?straitLabel)
I still can't think of any sensible criteria to narrow down the list from the 67 straits, though. (I'm currently annotating the straits by the seas they join; using the criterion of seas > 100,000 km² will narrow things down considerably, but I'm not sure if by enough.) |
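In case anyone wants to re-run the query outside the browser, here is a rough Python sketch of how it might be submitted to the public Wikidata Query Service endpoint and how the results might be read. The helper names are made up, the sample response is a hand-written fragment in the standard SPARQL JSON results shape (not real output), and the sketch deliberately stops short of making a network call.

```python
import json
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def parse_straits(response_text):
    """Extract (label, number of countries) pairs from SPARQL JSON results."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return [(b["straitLabel"]["value"], int(b["noCountries"]["value"]))
            for b in bindings]


# Hypothetical response fragment, hand-written for this sketch.
SAMPLE_RESPONSE = json.dumps({
    "results": {"bindings": [
        {"straitLabel": {"type": "literal", "value": "Bering Strait"},
         "noCountries": {"type": "literal", "value": "2"}},
    ]}
})

# A trivial stand-in query; the full query above would be passed the same way.
url = build_request_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(url)
print(parse_straits(SAMPLE_RESPONSE))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the body to `parse_straits` would give a list that can be compared against the spreadsheet.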
I've applied the criterion of connecting two water bodies each having an area > 100,000 km² to the list of "International" Straits (those bordering more than one country), from Wikidata, and it results in 18 matches: The areas are taken from your All of the matching straits are "interesting", at least for me, but I feel that there are too many of them... Slightly arbitrarily, I'd cut down on some of the many straits between the Caribbean Sea and Atlantic Ocean, replace the Beagle Channel with the Drake Passage or the Strait of Magellan and remove the Strait of Bonifacio, as well as perhaps the Balabac Strait. |
Awesome! Yeah, it feels like a lot, but maybe it's because the list contains some obscure entries. I feel like 12~15 would be a good target range. Looking at your list and at the missing straits, I wonder whether the "more than one country" rule isn't a bit too arbitrary. Also, I'm concerned as to the recognition status and international relevance of some of the straits and channels returned by Wikidata. What do you think about the straits, channels and passages listed in the IHO draft of 2002? I count 24 of them (listed below). I feel like it might be a more relevant starting list. That being said, it doesn't include the Strait of Magellan, which I had completely forgotten about. It's probably because it's not in international waters. 😞
What I like about this list is that:
Back to the problem of the criteria: one significant problem with the "connecting seas" criterion, as you've noticed, is granularity. In many cases, a strait may connect a sea that is part of a larger sea. How do we decide which of the two seas to take into account? Perhaps we should start again with your initial idea of including straits that connect two oceans, a criterion which we could very well extend to continental plates. How many straits in your list and out of the 24 defined by the IHO would this apply to? 🤔
In parallel, we could apply a simple area criterion in order to include the biggest channels and passages, like the Northwestern Passages, the Drake Passage, the Mozambique Channel and the English Channel.
A few interesting straits would probably still be left out. Assuming the data exists, we might be able to pick, out of the remaining straits, the 5 or so that get the most maritime traffic ... or something in this vein. 🤷
Here is where I'm at, for info: IHO seas.xlsx. I've collated the areas of all the 1953 seas, compared them visually with the 2002 maps, and identified whether they had significantly/insignificantly increased/decreased. |
Ooh I just learnt about the maritime law concept of Transit passage. The article lists 5 straits as being covered by the transit passage provisions: the Strait of Gibraltar, Dover Strait, Strait of Hormuz, Bab-el-Mandeb and Strait of Malacca. It also mentions that the Danish Straits, the Turkish Straits and the Strait of Magellan are not covered by the provisions because they are already governed by "international conventions". Maybe we could use this article to complement the IHO list, which would amount to adding the Turkish Straits, Bab-el-Mandeb and the Strait of Magellan? EDIT: this article also explains the legal concept of international waterways in the context of straits. |
Yeah, it is a bit arbitrary.
Yeah, I think you might be right! I've now added the information about whether the strait joins two oceans or separates two continents (or continental plates), for both the Wikidata straits and the IHO+transit passage ones. However, the boundary between the Pacific and Indian Oceans is pretty much undefined, since the 2002 draft doesn't state to which the East China and Archipelagic Seas belong. The "best" that I could find was the borders of the oceans from the CIA factbook maps used on Wikipedia... I think that I'm slightly worried that we're getting inured to straits, and even 14 (or 10) will be too much...
Wow! It looks great!
That's pretty cool! (As noted above, I've included this in the spreadsheet.) |
I've tried to find what the busiest straits might be, but I couldn't find a simple list. All I could find were maps of shipping routes. The busiest straits are somewhat identifiable: Taiwan, Malacca, Bab-el-Mandeb, Sicily, Bosphorus/Turkish, Gibraltar, English Channel, Dover, Danish. This is far from ideal, though:
I've also looked at oil choke points, which seem interesting from a geo-political standpoint: Hormuz, Malacca, Bab-el-Mandeb, Danish, Turkish. Not very useful, though since they're all included in the transit passage list and the latter also has Magellan, Dover and Gibraltar. The combination I like the most so far is
A lot of very subjective choices and feelings in all of this... but we're still brainstorming 😅 |
Yeah, it's quite interesting, but I don't see a straightforward way of extracting something useful from this.
Yes, definitely.
Yeah.
TBH I'd also prefer if it weren't included.
I'd lean towards preferring them excluded. Their constituent parts could definitely be included, in the Country info, but that would, in effect, make the relevant cards about the Country info, since the "main answer" is rather uninteresting. The exclusion of "multiple" straits would also have the mild benefit of excluding the Northwestern Passages. I should probably take a brief break from this, since I've now added three more fields (is in IHO 2002, is "international" (bordering more than one country) and is a single strait), in the interest of finding a combination that would justify my preferences, which is bordering on the slightly crazy... If I were to make a decision now, I'd probably vote for |
IHO seas.xlsx
I've finished measuring the 2002 areas that needed to be measured. I used the area measuring tool in QGIS, along with a variety of equal-area projections. I think the result is sufficiently precise for our purpose. I've experimented with various area thresholds, and the results give quite high inclusion numbers. I've tried adding a second criterion to remove most of the Antarctic seas, which are quite numerous and, in my opinion, not very interesting. (I also had trouble measuring some of them because of significant changes in the coastline of Antarctica since 1953.) Here are the results of my experiments:
In my opinion, the 100,000 km2 threshold is way too inclusive, and the 200,000 km2 threshold removes too many seas from the deck (2 that get excluded in most cases: White Sea and Adriatic Sea; plus 3 more: Aegean Sea, Gulf of California, Bay of Biscay). I've highlighted the criteria that I think are the best. My preference goes to >= 175,000 km2 OR >= 500,000 km2 if Antarctic. |
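For anyone who wants to replay this kind of experiment, a threshold sweep could be sketched in Python along these lines. The sea records are anonymous placeholders (names, areas and Antarctic flags are invented), so the counts it prints say nothing about the real spreadsheet. Only the threshold values come from this comment.

```python
# Hypothetical (name, area in km², is_antarctic) records; placeholder values.
SEAS = [
    ("Sea A", 130_000, False),
    ("Sea B", 180_000, False),
    ("Sea C", 450_000, True),
    ("Sea D", 900_000, True),
]


def included(seas, general_min, antarctic_min=None):
    """Names of seas passing general_min, or antarctic_min for Antarctic seas."""
    if antarctic_min is None:
        antarctic_min = general_min  # no special treatment for Antarctic seas
    return [name for name, area, antarctic in seas
            if area >= (antarctic_min if antarctic else general_min)]


for general_min in (100_000, 175_000, 200_000):
    names = included(SEAS, general_min, antarctic_min=500_000)
    print(general_min, len(names), names)
```

Running the same loop over the real rows of the spreadsheet would reproduce the inclusion counts for each candidate rule.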
This is amazing (as expected :))! Two things that slightly bother me, but I don't see any great solutions:
Perhaps we could introduce the same higher threshold for the subdivisions of the Arctic Ocean that you suggested for the Antarctic (Southern Ocean) seas? At I'd consider lowering the non-polar seas threshold to 125,000. Compared to 175,000 it'd include the Adriatic Sea (which I'd miss), the Gulf of Tonkin (which is vaguely historically important) and the Seram Sea (which I don't have a strong opinion about, either way). However, I'm not sure if it doesn't bloat the deck too much, so 175,000 might indeed be better (it's definitely better than 100,000, 150,000 or 200,000). |
IHO seas.xlsx
Here is another idea: instead of applying the 500,000 km2 threshold to polar seas, we apply it to seas introduced in the IHO's 2002 draft. Seas that were already defined in 1953 can then be held to the lower threshold of 125,000 km2.
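Written out as code, the rule might look like the Python sketch below. The function and parameter names are made up, and the example areas are placeholders rather than values from the spreadsheet.

```python
def passes_edition_rule(area_km2, defined_in_1953,
                        legacy_min=125_000, new_min=500_000):
    """Apply legacy_min to seas already in the 1953 edition, new_min otherwise."""
    threshold = legacy_min if defined_in_1953 else new_min
    return area_km2 >= threshold


# Placeholder examples: the same area can pass or fail depending on
# whether the sea was already delimited in 1953.
print(passes_edition_rule(150_000, defined_in_1953=True))
print(passes_edition_rule(150_000, defined_in_1953=False))
```

One nice property of this version is that it needs no geographic judgement call about what counts as "polar": the only inputs are the area and the edition in which the sea first appears.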
|
That seems more-or-less perfect! |
physical-entities.xlsx
I've had another look at the straits and I may have found a decent set of criteria:
The source list I've used combines the straits of the IHO draft of 2002 with the transit passage straits, but excludes the "collective straits" (Turkish straits and Danish straits). Here is the result: The "to be removed" count includes Denmark Strait even though it's not part of the source list. The other strait to be removed is the English Channel, which my western bias disagrees with. 😄 The only way I see to keep the English Channel would be to consider "channels" separately from straits and to apply the |
That looks reasonable! It'd be a shame to exclude the English Channel, while including the Mozambique Channel doesn't feel like an issue, so I'd vote for the separate "channel" criterion. |
As agreed in #137 (comment), and since discussion has started again on this topic in #137, I'm opening this issue to discuss inclusion rules for water bodies (oceans, seas, gulfs, straits, etc.). I'm leaving lakes and rivers out for now (see why down below).
Like for political entities (#306 + #312), we need to identify which types of water bodies to include, and then, for each type of entity:
CONTRIBUTING.md.
What to focus on for now
Oceans
Like for continents, this one's easy: https://en.wikipedia.org/wiki/Ocean#Oceanic_divisions. Since we're not a historical deck, there's no debate that the world has five oceans and all five of them are to be included ... which they already are! They even have brand new maps #325! ✨
Marginal seas
Wikipedia's list of seas seems like a good reference. The problem is that it lists gulfs, bays, straits, channels, etc. within the same Marginal seas heading, without further categorisation. Apparently the terms sea, gulf, bay, sound, etc. are used inconsistently (as you've rightly pointed out in #137 (comment), @aplaice), so I don't think we'll be able to list them independently and apply different criteria to them.
The good news is that the list already excludes lakes with "Sea" in their names, like the Sea of Galilee or the Aral Sea. Also, we might be able to separate out straits, channels, passages, and so on. If that's indeed the case, then I think using the surface area as the sole inclusion criterion for the remaining seas, gulfs, etc. would work pretty well.
Straits, channels, passages, etc.
Same list as above. As @aplaice mentioned, straits between continents would be the easy pick. Also, since I can only see two channels, the English Channel and the Mozambique Channel, I reckon we can include them as well. The remaining bodies are more debatable.
What to ignore for now
Lakes
I'd be inclined to leave these alone for now, for the following reasons:
Rivers, deltas, fjords, etc.
Since we don't have any in the deck yet, let's leave them out of the scope of this issue.