
Commit

Update links and broken things
mchesterkadwell committed Mar 1, 2022
1 parent 84046cb commit 7751436
Showing 6 changed files with 32 additions and 37 deletions.
12 changes: 6 additions & 6 deletions 0-introduction-to-python-and-text.ipynb
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@
"\n",
"![Logo of the Python programming language](https://www.python.org/static/img/python-logo.png \"Logo of the Python programming language\")\n",
"\n",
"<img src=\"assets/two-women-computer.png\" alt=\"Two women coding together and looking at a laptop\" title=\"Two women coding together and looking at a laptop\" width=\"400\" style=\"padding: 10px;\">\n",
"![Two women coding together and looking at a laptop](assets/two-women-computer.png \"Two women coding together and looking at a laptop\")\n",
"\n",
"---\n",
"---\n",
@@ -27,7 +27,7 @@
"\n",
"Notebooks are particularly useful for *exploring* your data at an early stage and *documenting* exactly what steps you have taken (and why) to get to your results. This documentation is extremely important to record what you did so that others can reproduce your work... and because otherwise you are guaranteed to forget what you did in the future.\n",
"\n",
"![Logo of Jupyter Notebooks](https://jupyter.org/assets/nav_logo.svg \"Logo of Jupyter Notebooks\")\n",
"![Logo of Jupyter Notebooks](https://jupyter.org/assets/logos/rectanglelogo-greytext-orangebody-greymoons.svg \"Logo of Jupyter Notebooks\")\n",
"\n",
"For a more in-depth tutorial on getting started with Jupyter Notebooks try this [Jupyter Notebook for Beginners Tutorial](https://towardsdatascience.com/jupyter-notebook-for-beginners-a-tutorial-f55b57c23ada)."
]
@@ -45,7 +45,7 @@
"---\n",
"> **EXERCISE**: Double-click on this cell now to edit it. Run the cell with the keyboard shortcut `Ctrl+Enter`, or by clicking the Run button in the toolbar at the top.\n",
"\n",
"<img src=\"https://problemsolvingwithpython.com/02-Jupyter-Notebooks/images/run_cell.png\" alt=\"Click the Run button to run a cell\" title=\"Click the Run button to run a cell\" width=\"450\">\n",
"![Click the Run button to run a cell](https://problemsolvingwithpython.com/02-Jupyter-Notebooks/images/run_cell.png \"Click the Run button to run a cell\")\n",
"\n",
"---\n",
"\n",
@@ -1206,7 +1206,7 @@
"source": [
"import requests\n",
"response = requests.get('https://www.wikipedia.org/')\n",
"response.text[137:267]"
"response.text[136:266]"
]
},
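As an aside on the index change in this cell (`[137:267]` to `[136:266]`): Python slices are half-open, so shifting the window moves both endpoints. A minimal sketch on a hand-made string (the HTML below is illustrative, not the real Wikipedia markup):

```python
# Illustrative stand-in string, not the real page source.
html = "<html><head><title>Wikipedia</title></head><body>...</body></html>"

# Slices are half-open [start:stop): this covers indices 19 through 27
# inclusive, i.e. the nine characters of "Wikipedia".
title = html[19:28]
print(title)
```

Hard-coded indices like these are fragile, because any change to the page markup shifts them, which is exactly why the notebook had to update them.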
{
@@ -1419,7 +1419,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1433,7 +1433,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
"version": "3.8.10"
}
},
"nbformat": 4,
11 changes: 6 additions & 5 deletions 1-basic-text-mining-concepts.ipynb
@@ -464,6 +464,7 @@
"outputs": [],
"source": [
"# Display the plot inline in the notebook with interactive controls\n",
"# Comment out this line if you are running the notebook in Deepnote\n",
"%matplotlib notebook\n",
"\n",
"# Import the matplotlib plot function\n",
@@ -480,7 +481,7 @@
"plt.xlabel(\"Word\")\n",
"plt.ylabel(\"Count\")\n",
"plt.xticks(range(len(words)), [str(s) for s in words], rotation=90)\n",
"plt.grid(b=True, which='major', color='#333333', linestyle='--', alpha=0.2)\n",
"plt.grid(visible=True, which='major', color='#333333', linestyle='--', alpha=0.2)\n",
"\n",
"# Plot the frequency counts\n",
"plt.plot(freqs)\n",
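Behind a frequency plot like this is just a sorted table of word counts. A hedged sketch with the standard library (the sample text is invented, and `words`/`freqs` here are stand-ins for the notebook's variables of the same names):

```python
from collections import Counter

# Invented sample text standing in for the letter corpus.
text = "the cat sat on the mat and the dog sat too"
counts = Counter(text.split())

# most_common() yields (word, count) pairs sorted by count -- the same
# shape plotted above: words on the x-axis, counts on the y-axis.
words, freqs = zip(*counts.most_common(3))
print(words)   # ('the', 'sat', 'cat')
print(freqs)   # (3, 2, 1)
```

`Counter.most_common` orders ties by first encounter, so the output is deterministic for a given text.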
@@ -534,7 +535,7 @@
"\n",
"However, in computational linguistics many more sub-categories are recognised.\n",
"\n",
"> spaCy follows the [Universal Dependencies scheme](https://universaldependencies.org/u/pos/) and a version of the [Penn Treebank tag set](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html). To read about the full set of POS tags, see the spaCy documentation: [Part-of-speech tagging](https://spacy.io/api/annotation#pos-tagging).\n",
"> spaCy follows the [Universal Dependencies scheme](https://universaldependencies.org/u/pos/) and a version of the [Penn Treebank tag set](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html). To read about the full set of POS tags, see the spaCy documentation: [Part-of-speech tagging](https://spacy.io/usage/linguistic-features/#pos-tagging).\n",
"\n",
"Again, spaCy has already POS tagged the text. We just need to look at the document:"
]
@@ -560,7 +561,7 @@
"\n",
"For example, an adjective (e.g. \"dearest\") of a particular noun (e.g. \"comrades\") might be tagged as being an \"adjectival modifier\" of that _particular_ noun. Parsing a full sentence results in a **tree structure** of how every word in the sentence is related to every other word.\n",
"\n",
"> Read more about the syntactic dependency labels used by spaCy in the documentation at [Syntactic Dependency Parsing](https://spacy.io/api/annotation#dependency-parsing).\n",
"> Read more about the syntactic dependency labels used by spaCy in the documentation at [Syntactic Dependency Parsing](https://spacy.io/usage/linguistic-features/#dependency-parse).\n",
"\n",
"Once more, spaCy has already done this for us. But rather than show you yet another list, this time we can use a nice visualiser called **displaCy** to see this in action.\n",
"\n",
@@ -628,7 +629,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -642,7 +643,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
"version": "3.8.10"
}
},
"nbformat": 4,
8 changes: 2 additions & 6 deletions 2-named-entity-recognition-of-henslow-data.ipynb
@@ -62,10 +62,6 @@
"\n",
"<table class=\"_59fbd182\"><thead><tr class=\"_8a68569b\"><th class=\"_2e8d2972\">Type</th><th class=\"_2e8d2972\">Description</th></tr></thead><tbody><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">PERSON</code></td><td class=\"_5c99da9a\">People, including fictional.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">NORP</code></td><td class=\"_5c99da9a\">Nationalities or religious or political groups.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">FAC</code></td><td class=\"_5c99da9a\">Buildings, airports, highways, bridges, etc.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">ORG</code></td><td class=\"_5c99da9a\">Companies, agencies, institutions, etc.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">GPE</code></td><td class=\"_5c99da9a\">Countries, cities, states.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">LOC</code></td><td class=\"_5c99da9a\">Non-GPE locations, mountain ranges, bodies of water.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">PRODUCT</code></td><td class=\"_5c99da9a\">Objects, vehicles, foods, etc. (Not services.)</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">EVENT</code></td><td class=\"_5c99da9a\">Named hurricanes, battles, wars, sports events, etc.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">WORK_OF_ART</code></td><td class=\"_5c99da9a\">Titles of books, songs, etc.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">LAW</code></td><td class=\"_5c99da9a\">Named documents made into laws.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">LANGUAGE</code></td><td class=\"_5c99da9a\">Any named language.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">DATE</code></td><td class=\"_5c99da9a\">Absolute or relative dates or periods.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">TIME</code></td><td class=\"_5c99da9a\">Times smaller than a day.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">PERCENT</code></td><td class=\"_5c99da9a\">Percentage, including “%”.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">MONEY</code></td><td class=\"_5c99da9a\">Monetary values, including unit.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">QUANTITY</code></td><td class=\"_5c99da9a\">Measurements, as of weight or distance.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">ORDINAL</code></td><td class=\"_5c99da9a\">“first”, “second”, etc.</td></tr><tr class=\"_8a68569b\"><td class=\"_5c99da9a\"><code class=\"_1d7c6046\">CARDINAL</code></td><td class=\"_5c99da9a\">Numerals that do not fall under another type.</td></tr></tbody></table>\n",
"\n",
"<p style=\"text-align: center; font-style: italic;\">Table from: \n",
"<a href=\"https://spacy.io/api/annotation#named-entities\" target=\"_blank\">https://spacy.io/api/annotation#named-entities</a>\n",
" </p>\n",
"\n",
"For more detail about how to use spaCy for NER, see the documentation [Named Entity Recognition 101](https://spacy.io/usage/linguistic-features#named-entities-101).\n",
"\n",
"An alternative to spaCy is [Natural Language Toolkit (NLTK)](https://www.nltk.org/), which was the first open-source Python library for NLP, originally released in 2001. It is still a valuable tool for teaching and research and has better support in the community for older and non-Indo-European languages; but spaCy is easier to use and faster, which is why I'm using it here."
@@ -549,7 +545,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -563,7 +559,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
"version": "3.8.10"
},
"pycharm": {
"stem_cell": {
6 changes: 3 additions & 3 deletions 3-principles-of-machine-learning-for-named-entities.ipynb
@@ -191,7 +191,7 @@
"\n",
"The Doccano interface for annotating named entities looks something like this:\n",
"\n",
"<img src=\"assets/doccano-named-entities.png\" alt=\"Doccano annotation interface with example text and named entities\" title=\"Doccano annotation interface with example text and named entities\" style=\"border: 1px solid #ddd; border-radius: 2px; box-shadow: 0 2px 4px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);}\">\n",
"![Doccano annotation interface with example text and named entities](assets/doccano-named-entities.png \"Doccano annotation interface with example text and named entities\")\n",
"\n",
"---\n",
"> **EXERCISE**: Follow the instructions given to you by the trainer to open Doccano in your browser, log in and try the various tasks.\n",
@@ -281,7 +281,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -295,7 +295,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
"version": "3.8.10"
}
},
"nbformat": 4,
20 changes: 10 additions & 10 deletions 4-updating-the-model-on-henslow-data.ipynb
@@ -14,7 +14,7 @@
"\n",
"I encourage you to learn whatever new Python features you can as you go along, but if the Python does become too difficult to understand at any point, it's absolutely fine. Just run the examples to see the results, and come back to the Python another time. Of course, you can just copy and paste the code to try with your own data, and see where that takes you!\n",
"\n",
"<img src=\"assets/woman-back-computer.png\" alt=\"Woman coding at a computer with her back to the viewer\" title=\"Woman coding at a computer with her back to the viewer\" width=\"400\" style=\"padding: 10px;\">\n",
"![Woman coding at a computer with her back to the viewer](assets/woman-back-computer.png \"Woman coding at a computer with her back to the viewer\")\n",
"\n",
"### This Notebook is for Demonstration Purposes\n",
"\n",
@@ -249,7 +249,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's have a quick look at a few of these sentences and their annotations. The output we've created is a **list of tuples**, which is the format we need to input to spaCy's training method."
"Let's have a quick look at a few of these sentences and their annotations. The output we've created is a **list of tuples**, which is the format we need to input to spaCy's training method. What do you think about the accuracy of the output?"
]
},
{
@@ -258,7 +258,7 @@
"metadata": {},
"outputs": [],
"source": [
"ner_data[305:310]"
"ner_data[450:455]"
]
},
{
@@ -733,16 +733,16 @@
},
"outputs": [],
"source": [
"taxonomy_data[10:20]"
"taxonomy_data[179:182]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is no surprise in this and other examples that while the exact matches for the patterns (e.g. 'Primula vulgaris') have been successfully labelled \"TAXONOMY\", other taxonomic names we know are still wrongly identified, e.g.:\n",
"* `('Potentilla', 601, 611, 'PERSON')`\n",
"* `('Lactuca', 552, 559, 'ORG')`\n",
"* `('Rubia', 35, 40, 'PERSON')`\n",
"* `('Valeriana', 96, 105, 'GPE')`\n",
"\n",
"For the model to be able to generalise about the new entity \"TAXONOMY\" we need to train it."
]
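The limitation described in this cell is easy to demonstrate: a rule built from a fixed list of patterns can only ever label the names it already knows, which is why the model must be trained to generalise. A hedged sketch (the list and sentence below are invented for illustration, not taken from the Henslow data):

```python
import re

# A fixed list of known taxonomic names (illustrative only).
known_taxa = ["Primula vulgaris", "Lactuca virosa"]
pattern = re.compile("|".join(re.escape(name) for name in known_taxa))

# 'Rubia' is a taxonomic name too, but it is not in the list.
sentence = "I enclose Primula vulgaris and a fine specimen of Rubia."
matches = pattern.findall(sentence)
print(matches)  # ['Primula vulgaris'] -- 'Rubia' is missed entirely
```

A statistical NER model, by contrast, can use context to label names it has never seen, which is the point of the training step that follows.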
@@ -1023,7 +1023,7 @@
"source": [
"Now if you check the `output/core_web_sm_taxonomy` folder in the Jupyter notebook listing you should see something like this:\n",
"\n",
"<img src=\"assets/taxonomy_saved_model.png\" alt=\"Jupyter notebook listing showing the updated model saved to a folder\" title=\"Jupyter notebook listing showing the updated model saved to a folder\" style=\"border: 1px solid #ddd; border-radius: 2px; box-shadow: 0 2px 4px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);}\">"
"![Jupyter notebook listing showing the updated model saved to a folder](assets/taxonomy_saved_model.png \"Jupyter notebook listing showing the updated model saved to a folder\")"
]
},
{
@@ -1073,7 +1073,7 @@
"source": [
"What is the output?\n",
"\n",
"> **EXERCISE**: Try some other letters or sentences. Has the updated model predicted the new \"TAXONOMY\" named entities correctly? Are there any problems? How can you explain what has happened? How could we improve the results?"
"> **EXERCISE**: Try some other letters or sentences. Has the updated model predicted the new \"TAXONOMY\" named entities correctly? Are there any problems? How can you explain what has happened? How could we improve the results? Hint: Re-read [this section](4-updating-the-model-on-henslow-data.ipynb#This-Notebook-is-for-Demonstration-Purposes) and [this section in the previous notebook](3-principles-of-machine-learning-for-named-entities.ipynb#Catastrophic-Forgetting)."
]
},
{
@@ -1135,7 +1135,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1149,7 +1149,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
"version": "3.8.10"
}
},
"nbformat": 4,
12 changes: 5 additions & 7 deletions 5-linking-named-entities.ipynb
@@ -49,15 +49,13 @@
"## Lookup Entities Programmatically with Web APIs\n",
"The power of centralised authorities such as VIAF is when their data is exposed via an **API** (Application Programming Interface). A web API is accessed via a particular web address and allows computer programs to request information and receive it in a structured format suitable for further processing. Typically, this data will be provided in either JSON or XML.\n",
"\n",
"VIAF has several different APIs, which we as humans can explore using the [OCLC API Explorer](https://platform.worldcat.org/api-explorer/apis/VIAF).\n",
"VIAF has several different APIs. The one we will use is [Authority Cluster](https://www.oclc.org/developer/api/oclc-apis/viaf/authority-cluster.en.html) Auto Suggest. Sadly, OCLC have removed their OCLC API Explorer, which was really handy for exploring the API as a human! 😞\n",
"\n",
"> **EXERCISE**: Click on the link above and then on the link \"Auto Suggest\". Modify the example query to search for \"john stevens henslow\" or any personal name from the Henslow letters that you can recall.\n",
"\n",
"You should get something like this: \n",
"In the old OCLC API Explorer we could search for \"john stevens henslow\" or any personal name from the Henslow letters:\n",
"\n",
"![assets/viaf-api-charles-darwin.png](assets/viaf-api-charles-darwin.png)\n",
"\n",
"It has returned a list of results, in JSON format, with VIAF's suggestions for the best match, which you can see in the right-hand \"Response\" pane."
"It returned a list of results, in JSON format, with VIAF's suggestions for the best match, which you can see in the right-hand \"Response\" pane."
]
},
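Even without the API Explorer, the JSON an auto-suggest endpoint returns can be handled in a few lines of Python. The payload below is a hand-made illustration of the general shape of such a response (the field names and the identifier are invented stand-ins, not a real VIAF result):

```python
import json

# Invented example payload -- NOT a real VIAF response.
payload = """
{"query": "john stevens henslow",
 "result": [{"term": "Henslow, John Stevens", "viafid": "00000000"}]}
"""
data = json.loads(payload)

# The first suggestion is the service's best match for the query.
best = data["result"][0]
print(best["term"], best["viafid"])
```

With a real API you would fetch `payload` over HTTP (e.g. with `requests`) and then parse it exactly the same way.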
{
@@ -610,7 +608,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> If SPARQL takes your interest, and you'd like to learn more about linked open data, I can recommend the *Programming Historian*'s [Introduction to the Principles of Linked Open Data](https://programminghistorian.org/en/lessons/intro-to-linked-data#querying-rdf-with-sparql) and [Using SPARQL to access Linked Open Data](https://programminghistorian.org/en/lessons/retired/graph-databases-and-SPARQL).\n",
"> If SPARQL takes your interest, and you'd like to learn more about linked open data, I can recommend the *Programming Historian*'s [Introduction to the Principles of Linked Open Data](https://programminghistorian.org/en/lessons/intro-to-linked-data#querying-rdf-with-sparql) and [Using SPARQL to access Linked Open Data](https://programminghistorian.org/en/lessons/retired/graph-databases-and-SPARQL) (this lesson has now been retired).\n",
"\n",
"Finally, let's use wptools again to get all the data we might ever want about this plant."
]
@@ -798,7 +796,7 @@
"\n",
"1. Build your own **entity linker** with machine learning.\n",
"\n",
"spaCy has the capability to [link named entities to identifiers stored in a knowledge base](https://spacy.io/usage/training#entity-linker). For anyone with a lot of computing power and time to hand, there's even some [example code](https://github.com/explosion/projects/tree/master/nel-wikipedia) to do this with Wikipedia and Wikidata data dumps.\n",
"spaCy has the capability to [link named entities to identifiers stored in a knowledge base](https://spacy.io/api/entitylinker/). For anyone with a lot of computing power and time to hand, there's even some [example code](https://github.com/explosion/projects/tree/master/nel-wikipedia) to do this with Wikipedia and Wikidata data dumps.\n",
"\n",
"<img src=\"https://upload.wikimedia.org/wikipedia/commons/d/d9/EDSAC_2_1960.jpg\" alt=\"EDSAC II, 10th May 1960, user queue. Copyright Computer Laboratory, University of Cambridge. Reproduced by permission. Creative Commons Attribution 2.0 UK: England & Wales.\" title=\"EDSAC II, 10th May 1960, user queue. Copyright Computer Laboratory, University of Cambridge. Reproduced by permission. Creative Commons Attribution 2.0 UK: England & Wales.\">\n",
"<p style=\"text-align: center; font-style: italic;\">The queue for computing time on the Cambridge EDSAC, 1960. To use High Performance Computing today, nothing has really changed, except the queue itself is now managed by software!</p>\n",
