Add structured data usage sample, update docs

JY0284 · Jan 23, 2025 · 4b93a88
1 parent 3ed5bb3
commit 4b93a88
Showing 4 changed files with 373 additions and 48 deletions.
diff --git a/README.md b/README.md
@@ -39,6 +39,12 @@ chapters
 [blank][blank][translation]
 ```
 
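+## Structured data
+The structured data is generated by `model.py`, which also defines the data structures and the generation process. The data is stored in `data.json` ([structured data file](https://github.com/JY0284/zizhitongjian/blob/main/data.json)). For an example of loading and using the data, see `data_usage_demo_visualization.ipynb` ([structured data usage sample](https://github.com/JY0284/zizhitongjian/blob/main/data_usage_demo_visualization.ipynb)).
+
+## A starting point: Zizhi Tongjian data application sample (1): AI-assisted understanding and visualization
+> In progress. ([structured data usage sample](https://github.com/JY0284/zizhitongjian/blob/main/data_usage_demo_visualization.ipynb))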
+
 ## Project progress
 
 The project is updated continuously. The current status of the task list:
@@ -47,14 +53,19 @@ chapters
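 - [x] Keep the format of time data in the translation consistent with the original text
 - [x] Remove blank lines and spaces that do not fit the classical/vernacular parallel format, and use a uniform line-break style
 - [x] Proofread the text programmatically to locate missing and erroneous content
-- [ ] Structure the text data so that it can be processed with data analysis and visualization tools
+- [x] Structure the text data so that it can be processed with data analysis and visualization tools
+- [x] Structured data usage sample
+- [ ] AI-assisted understanding and visualization sample
+- [ ] Conversational, interactive Zizhi Tongjian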
 - [ ] ...
 
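 Part of the data preprocessing source code, with explanations, is archived and updated in this project's `*.ipynb` files.
 
+If there is anything you would like this project to do, feel free to suggest it at any time!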
+
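 ## Contributing
 
-1. Please share any comments or suggestions in an issue, on any topic, including but not limited to text content, text format, data structure, data analysis, and data visualization;
+1. Please feel free to share any comments or suggestions in an issue at any time, on any topic, including but not limited to text content, text format, data structure, data analysis, and data visualization;
 2. Places marked `[todo]` in the text are passages found to be incomplete during analysis; help with proofreading and repair is welcome :D
 
 ## Related resources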

diff --git a/demo.json → data.json b/demo.json → data.json
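The notebook added in this commit reaches sentences through `book.chapters` → `.segments` → `.sentences`, and its output shows a `CmpStr(original=..., translated=..., line_num=...)` repr. A minimal sketch of classes that would behave that way follows, assuming plain dataclasses. The real definitions live in `model.py`, which this diff does not show, so `start_time`, `book_from_dict`, and the dictionary key names are hypothetical and may not match the actual layout of `data.json`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CmpStr:
    """One sentence pair: the classical original and its modern translation."""
    original: str
    translated: str
    line_num: int


@dataclass
class TimeSegment:
    """Sentences grouped under one dated entry (`start_time` is an assumed name)."""
    start_time: str
    sentences: List[CmpStr] = field(default_factory=list)


@dataclass
class Chapter:
    title: str
    segments: List[TimeSegment] = field(default_factory=list)


@dataclass
class Book:
    chapters: List[Chapter] = field(default_factory=list)


def book_from_dict(raw: dict) -> Book:
    """Build a Book from nested dicts; the key names here are assumptions."""
    return Book(chapters=[
        Chapter(
            title=ch["title"],
            segments=[
                TimeSegment(
                    start_time=seg["start_time"],
                    sentences=[CmpStr(**s) for s in seg["sentences"]],
                )
                for seg in ch["segments"]
            ],
        )
        for ch in raw["chapters"]
    ])


# Sample values copied from the notebook output shown in this commit.
sample = {
    "chapters": [{
        "title": "资治通鉴第一卷(周纪)",
        "segments": [{
            "start_time": "周威烈王二十三年（戊寅，公元前403年）",
            "sentences": [{
                "original": "[1]初命晋大夫魏斯、赵籍、韩虔为诸侯。",
                "translated": "[1]周威烈王姬午初次分封晋国大夫魏斯、赵籍、韩虔为诸侯国君。",
                "line_num": 8,
            }],
        }],
    }]
}

book = book_from_dict(sample)
first = book.chapters[0].segments[0].sentences[0]
print(first)
```

Keeping `line_num` on each sentence pair makes it possible to trace a record back to its line in the source text, which is useful when proofreading the `[todo]` passages.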
diff --git a/data_usage_demo_visualization.ipynb b/data_usage_demo_visualization.ipynb
@@ -0,0 +1,278 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "e739c58d-f173-459c-a99e-cad90be00d07",
+   "metadata": {},
+   "source": [
+    "# Book data application demo: AI-assisted understanding and visualization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "96ce9167-f475-44d0-b44a-32ed86b106a8",
+   "metadata": {},
+   "source": [
+    "## Data preparation\n",
+    "In this section, we load the previously saved `data.json` file and convert it into structured Python objects for later use."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "482bc8e6-5fe6-42e1-be80-12923002a961",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
+    "from model import json_to_book  # import the data conversion function"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "3bf9e450-cabd-4389-85bd-1865752ef040",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Convert the JSON file into a Python Book object\n",
+    "book = json_to_book('data.json')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "f462a2d3-11d8-49cd-bdbc-2cb09d719e96",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the data for the first chapter\n",
+    "chapter_1 = book.chapters[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "e9b1853b-845d-4d2e-8d73-9909cd26aa9d",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'资治通鉴第一卷(周纪)'"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# View the title of the first chapter\n",
+    "chapter_1.title"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "d2ce324b-f863-4cc6-9d13-54e49bd13ac1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get all segments of the first chapter\n",
+    "ch1_segs = chapter_1.segments"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "7cf314e8-2d30-4b68-9a81-c843c5e2f8fe",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'资治通鉴第一卷(周纪)（包含30小节）'"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# View the chapter summary\n",
+    "f\"{chapter_1}\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "7d2b7abc-4b40-4e84-844b-d4dae3e5f5b4",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "list"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Check the type of the segment list\n",
+    "type(ch1_segs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "92e0de25-30ee-475e-9b4d-6d23bb8db559",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the first time segment of the first chapter\n",
+    "ch1_ts1 = ch1_segs[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "fd6f5892-8118-487d-9802-7019fffdeb9f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "model.TimeSegment"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Check the type of the time segment object\n",
+    "type(ch1_ts1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "93257592-9364-4444-ab17-70789c0285d9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "小节-起始时间 周威烈王二十三年（戊寅，公元前403年），包含 29 句\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Print the details of the first time segment\n",
+    "print(ch1_ts1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "87d0c04e-5cd1-48d3-8e6e-6ab2ba907e7a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the first sentence of the first time segment\n",
+    "ch1_ts1_s1 = ch1_ts1.sentences[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "9cc4618d-1f57-4c2f-bac3-ad2a19087f40",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "model.CmpStr"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Check the type of the sentence object\n",
+    "type(ch1_ts1_s1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "cc6dca54-5844-4e80-89c0-95858a9c0058",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "CmpStr(original='[1]初命晋大夫魏斯、赵籍、韩虔为诸侯。', translated='[1]周威烈王姬午初次分封晋国大夫魏斯、赵籍、韩虔为诸侯国君。', line_num=8)"
+      ]
+     },
+     "execution_count": 13,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# View the original text and translation of the first sentence\n",
+    "ch1_ts1_s1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0c0b3e94-5190-49af-9fd2-d0120a965e3a",
+   "metadata": {},
+   "source": [
+    "## AI-assisted understanding of segments"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "30d55f90-90ab-4968-8fb2-422d5862c1c4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# todo"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
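
The notebook cells above drill into one chapter, one time segment, and one sentence at a time. For analysis or visualization it may be handy to flatten the whole book into rows first. A minimal sketch of such a traversal follows, using stand-in objects built with `types.SimpleNamespace` instead of the real `model.py` classes; `iter_rows` and `search` are hypothetical helper names, not part of the repository.

```python
from types import SimpleNamespace as NS


def iter_rows(book):
    """Yield (chapter title, segment index, sentence) for every sentence."""
    for chapter in book.chapters:
        for seg_idx, segment in enumerate(chapter.segments):
            for sentence in segment.sentences:
                yield chapter.title, seg_idx, sentence


def search(book, keyword):
    """Return the rows whose original text contains the keyword."""
    return [row for row in iter_rows(book) if keyword in row[2].original]


# Stand-in for the object returned by json_to_book('data.json');
# the values are copied from the notebook output in this commit.
demo_book = NS(chapters=[
    NS(
        title="资治通鉴第一卷(周纪)",
        segments=[
            NS(sentences=[
                NS(
                    original="[1]初命晋大夫魏斯、赵籍、韩虔为诸侯。",
                    translated="[1]周威烈王姬午初次分封晋国大夫魏斯、赵籍、韩虔为诸侯国君。",
                    line_num=8,
                ),
            ]),
        ],
    ),
])

hits = search(demo_book, "魏斯")
for title, seg_idx, sentence in hits:
    print(title, seg_idx, sentence.line_num)
```

Because the helpers rely only on the `chapters`, `segments`, `sentences`, and `original` attributes seen in the notebook, they should also accept the real `Book` object, provided those attribute names hold throughout `model.py`.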