
Commit

Add structural data usage sample, update doc
JY0284 committed Jan 23, 2025
1 parent 3ed5bb3 commit 4b93a88
Showing 4 changed files with 373 additions and 48 deletions.
15 changes: 13 additions & 2 deletions README.md
@@ -39,6 +39,12 @@ chapters
[blank][blank][translation]
```

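## Structured data
The structured data is generated by `model.py`, which also defines the data structures and the generation process. The result is stored in `data.json` ([structured data file](https://github.com/JY0284/zizhitongjian/blob/main/data.json)). For an example of loading and using the data, see `data_usage_demo_visualization.ipynb` ([structured data usage example](https://github.com/JY0284/zizhitongjian/blob/main/data_usage_demo_visualization.ipynb)).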
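
The demo notebook navigates the data as `Book.chapters` → `Chapter.segments` (a list of `TimeSegment`) → `TimeSegment.sentences` (a list of `CmpStr` with `original`, `translated`, and `line_num`). As a rough, self-contained sketch of that hierarchy, the code below defines hypothetical stand-in dataclasses (not the actual classes in `model.py`) and assumes that `data.json` nests keys with the same names. `book_from_dict` and `count_sentences` are illustrative helpers, not part of the repository; check `model.py` for the real schema and for extra fields such as a segment's start time.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


# Hypothetical stand-ins for the classes in model.py. Only the attribute
# names come from the demo notebook; the JSON keys are assumed to match them.
@dataclass
class CmpStr:
    original: str    # classical Chinese sentence
    translated: str  # vernacular Chinese translation
    line_num: int    # line number in the source text


@dataclass
class TimeSegment:
    # The real model.TimeSegment also carries a start time; omitted here.
    sentences: list[CmpStr] = field(default_factory=list)


@dataclass
class Chapter:
    title: str
    segments: list[TimeSegment] = field(default_factory=list)


@dataclass
class Book:
    chapters: list[Chapter] = field(default_factory=list)


def book_from_dict(data: dict) -> Book:
    """Build a Book from parsed JSON, assuming keys mirror the attribute names."""
    return Book(chapters=[
        Chapter(
            title=ch["title"],
            segments=[
                TimeSegment(sentences=[CmpStr(**s) for s in seg["sentences"]])
                for seg in ch["segments"]
            ],
        )
        for ch in data["chapters"]
    ])


def count_sentences(book: Book) -> int:
    """Total number of original/translation sentence pairs in the book."""
    return sum(len(seg.sentences) for ch in book.chapters for seg in ch.segments)


# Usage with a tiny inline sample (structure assumed, text taken from the notebook).
sample = json.loads("""
{"chapters": [{"title": "资治通鉴第一卷(周纪)",
  "segments": [{"sentences": [
    {"original": "[1]初命晋大夫魏斯、赵籍、韩虔为诸侯。",
     "translated": "[1]周威烈王姬午初次分封晋国大夫魏斯、赵籍、韩虔为诸侯国君。",
     "line_num": 8}]}]}]}
""")
book = book_from_dict(sample)
print(book.chapters[0].title, count_sentences(book))
```

With the real file, `json_to_book('data.json')` from `model.py` is the loader the notebook uses; the sketch only illustrates the nesting.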

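## A modest start, Zizhi Tongjian data application example (1): AI-assisted understanding and visualization
> Work in progress. ([structured data usage example](https://github.com/JY0284/zizhitongjian/blob/main/data_usage_demo_visualization.ipynb))

## Project progress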

The project is under continuous development. The current status of the task list is as follows:
@@ -47,14 +53,19 @@ chapters
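- [x] Keep the format of date data in the translation consistent with the original text
- [x] Remove blank lines and spaces that break the original/translation parallel format, and use a uniform line-break format
- [x] Proofread the text programmatically to locate incomplete and erroneous content
- [x] Structure the text data so it can be processed with data analysis and visualization tools
- [x] Structured data usage example
- [ ] AI-assisted understanding and visualization example
- [ ] Interactive conversational Zizhi Tongjian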
- [ ] ...

Part of the data preprocessing source code, with notes, is archived and maintained in this project's `*.ipynb` files.

If there is anything that interests you or that you would like this project to do, suggestions are always welcome!

## Contributing

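1. Please feel free to raise any comments or suggestions in an issue, on any topic, including but not limited to text content, text format, data structures, data analysis, and data visualization;
2. Places marked `[todo]` in the text are incomplete passages found during analysis; you are welcome to help proofread and repair them :D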
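
Locating the `[todo]` markers is a simple line scan. The sketch below assumes the chapter texts are UTF-8 `.txt` files under a `chapters` directory (the directory name is a guess based on the diff context, not confirmed by the README); `find_todos` is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

from pathlib import Path

TODO_MARK = "[todo]"


def find_todos(root: str, pattern: str = "*.txt") -> list[tuple[str, int, str]]:
    """Return (file, line number, line text) for every line containing TODO_MARK.

    Assumes UTF-8 plain-text files somewhere under `root`; returns an empty
    list if `root` is not a directory.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    hits = []
    # Sort for a stable, reproducible report order; skip directories.
    for path in sorted(p for p in base.rglob(pattern) if p.is_file()):
        with path.open(encoding="utf-8") as f:
            for num, line in enumerate(f, start=1):
                if TODO_MARK in line:
                    hits.append((str(path), num, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # "chapters" is an assumed directory name.
    for file, num, text in find_todos("chapters"):
        print(f"{file}:{num}: {text}")
```

The `file:line: text` output format matches what most editors can jump to directly.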

## Related resources
File renamed without changes.
278 changes: 278 additions & 0 deletions data_usage_demo_visualization.ipynb
@@ -0,0 +1,278 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e739c58d-f173-459c-a99e-cad90be00d07",
"metadata": {},
"source": [
"# Book data application demo: AI-assisted understanding and visualization"
]
},
{
"cell_type": "markdown",
"id": "96ce9167-f475-44d0-b44a-32ed86b106a8",
"metadata": {},
"source": [
"## Data preparation\n",
"In this section, we load the previously saved `data.json` file and convert it into structured Python objects for later use."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "482bc8e6-5fe6-42e1-be80-12923002a961",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"from model import json_to_book  # import the data conversion function"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3bf9e450-cabd-4389-85bd-1865752ef040",
"metadata": {},
"outputs": [],
"source": [
"# Convert the JSON file into a Python Book object\n",
"book = json_to_book('data.json')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f462a2d3-11d8-49cd-bdbc-2cb09d719e96",
"metadata": {},
"outputs": [],
"source": [
"# Get the data for the first chapter\n",
"chapter_1 = book.chapters[0]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e9b1853b-845d-4d2e-8d73-9909cd26aa9d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'资治通鉴第一卷(周纪)'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View the title of the first chapter\n",
"chapter_1.title"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d2ce324b-f863-4cc6-9d13-54e49bd13ac1",
"metadata": {},
"outputs": [],
"source": [
"# Get all segments of the first chapter\n",
"ch1_segs = chapter_1.segments"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7cf314e8-2d30-4b68-9a81-c843c5e2f8fe",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'资治通鉴第一卷(周纪)(包含30小节)'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View the chapter summary\n",
"f\"{chapter_1}\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7d2b7abc-4b40-4e84-844b-d4dae3e5f5b4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check the type of the segment list\n",
"type(ch1_segs)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "92e0de25-30ee-475e-9b4d-6d23bb8db559",
"metadata": {},
"outputs": [],
"source": [
"# Get the first time segment of the first chapter\n",
"ch1_ts1 = ch1_segs[0]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "fd6f5892-8118-487d-9802-7019fffdeb9f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"model.TimeSegment"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check the type of the time segment object\n",
"type(ch1_ts1)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "93257592-9364-4444-ab17-70789c0285d9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"小节-起始时间 周威烈王二十三年(戊寅,公元前403年),包含 29 句\n"
]
}
],
"source": [
"# Print details of the first time segment\n",
"print(ch1_ts1)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "87d0c04e-5cd1-48d3-8e6e-6ab2ba907e7a",
"metadata": {},
"outputs": [],
"source": [
"# Get the first sentence of the first time segment\n",
"ch1_ts1_s1 = ch1_ts1.sentences[0]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "9cc4618d-1f57-4c2f-bac3-ad2a19087f40",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"model.CmpStr"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check the type of the sentence object\n",
"type(ch1_ts1_s1)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "cc6dca54-5844-4e80-89c0-95858a9c0058",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"CmpStr(original='[1]初命晋大夫魏斯、赵籍、韩虔为诸侯。', translated='[1]周威烈王姬午初次分封晋国大夫魏斯、赵籍、韩虔为诸侯国君。', line_num=8)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View the original text and translation of the first sentence\n",
"ch1_ts1_s1"
]
},
{
"cell_type": "markdown",
"id": "0c0b3e94-5190-49af-9fd2-d0120a965e3a",
"metadata": {},
"source": [
"## AI-assisted understanding of sections"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "30d55f90-90ab-4968-8fb2-422d5862c1c4",
"metadata": {},
"outputs": [],
"source": [
"# todo"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
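
The notebook's last cell (AI-assisted understanding) is still a `# todo`. One possible first step, sketched here under assumptions, is to turn a time segment's `CmpStr` sentences into a side-by-side text block that could be handed to a language model. The handling of the `[1]`-style prefix is based on the sample `CmpStr` output in the notebook; `split_index`, `build_prompt`, and the prompt wording are hypothetical, and no particular model or API is assumed. A minimal `CmpStr` stand-in keeps the block self-contained; with the real data you would pass `ch1_ts1.sentences` instead.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Leading sentence index such as "[1]", as seen in the notebook's CmpStr output.
_INDEX = re.compile(r"^\[(\d+)\]")


@dataclass
class CmpStr:
    """Minimal stand-in for model.CmpStr (field names as shown in the notebook)."""
    original: str
    translated: str
    line_num: int


def split_index(text: str) -> tuple[int | None, str]:
    """Split "[n]text" into (n, "text"); return (None, text) if there is no prefix."""
    m = _INDEX.match(text)
    if m is None:
        return None, text
    return int(m.group(1)), text[m.end():]


def build_prompt(sentences: list[CmpStr]) -> str:
    """Pair each original sentence with its translation in a plain-text prompt."""
    lines = ["Explain the following passage from Zizhi Tongjian:"]
    for s in sentences:
        idx, orig = split_index(s.original)
        _, trans = split_index(s.translated)
        label = f"{idx}. " if idx is not None else ""
        lines.append(f"{label}Original: {orig}")
        lines.append(f"Translation: {trans}")
    return "\n".join(lines)


print(build_prompt([CmpStr(
    original="[1]初命晋大夫魏斯、赵籍、韩虔为诸侯。",
    translated="[1]周威烈王姬午初次分封晋国大夫魏斯、赵籍、韩虔为诸侯国君。",
    line_num=8,
)]))
```

Stripping the index and re-adding it as a plain `n.` label keeps each original sentence visibly aligned with its translation, which is the point of the parallel-text format.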