From 26c4787dbd862fad109f308ce97e55e54a3b252a Mon Sep 17 00:00:00 2001
From: flyhero99
Date: Wed, 15 Nov 2023 22:31:43 -0500
Subject: [PATCH] abs and table1 table2 table3

---
 .history/index_20231115223138.html | 652 ++++++++++++++++++++++
 index.html                         | 860 +++++++++++++++--------------
 2 files changed, 1104 insertions(+), 408 deletions(-)
 create mode 100644 .history/index_20231115223138.html
diff --git a/.history/index_20231115223138.html b/.history/index_20231115223138.html
new file mode 100644
index 0000000..9a525de
--- /dev/null
+++ b/.history/index_20231115223138.html
@@ -0,0 +1,652 @@
+ [page head: meta tags, stylesheet and script includes]
+ TableLlama: Towards Open Large Generalist Models for Tables
+
+
+
+
+
+ +
+ Icon +

TableLlama: Towards Open Large Generalist Models for Tables

+
+ +
+ 1Tianshu Zhang*,
+ 2Xiang Yue,
+ 1Yifei Li,
+ 1Huan Sun*
+ +
+ + 1The Ohio State University, + 2IN.AI + +
+ *Corresponding Authors. +
+ zhang.11535@osu.edu + , sun.397@osu.edu + +
+ + +
+
+
+
+
+ + + +
+
+
+
+
+

Abstract

+
+

+ Semi-structured tables are ubiquitous. A variety of tasks aim to automatically interpret, augment,
+ and query tables. Current methods often require pretraining on tables or special model architecture
+ design, are restricted to specific table types, or make simplifying assumptions about tables and
+ tasks. This paper takes the first step towards developing open-source large language models (LLMs)
+ as generalists for a diversity of table-based tasks. Towards that end, we construct TableInstruct,
+ a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating
+ LLMs. We further develop the first open-source generalist model for tables, TableLlama, by
+ fine-tuning Llama 2 (7B) with LongLoRA to address the long-context challenge. We experiment under
+ both in-domain and out-of-domain settings. On 7 out of 8 in-domain tasks, TableLlama achieves
+ comparable or better performance than the SOTA for each task, despite the latter often having
+ task-specific designs. On 6 out-of-domain datasets, it achieves 6-48 absolute point gains compared
+ with the base model, showing that training on TableInstruct enhances the model’s generalizability.
+ We will open-source our dataset and trained model to boost future work on developing open
+ generalist models for tables.

+
+
+
+
+
+
+ + + +
+
+
+
+
+
+ + An overview of TableInstruct and 🦙TableLlama +

+ Figure 1: An overview of TableInstruct and TableLlama. TableInstruct includes a wide variety of
+ realistic tables and tasks with instructions. We take the first step towards developing open-source
+ generalist models for tables with TableInstruct and TableLlama.

+
+
+
+
+
+
+ + + +
+
+
+
+
+
+ + An illustration of three exemplary tasks in TableInstruct +

+ Figure 2: Illustration of three exemplary tasks: (a) Column type annotation, which annotates the
+ selected column with the correct semantic types. (b) Row population, which populates rows given
+ table metadata and partial row entities. (c) Hierarchical table QA. For subfigures (a) and (b), we
+ mark candidates in red in the "task instruction" part. The candidate set can range from hundreds to
+ thousands of entries in TableInstruct.

+
+
+
+
+
+
+ + +
+
+
+
+
+

Our Dataset: TableInstruct

+
+ +
+
+
+
+
+
+ +
+
+
+
+
+

In-domain Evaluation:

+
+ +
+
+
+
+
+
+ + +
+
+
+
+
+

Out-of-domain Evaluation:

+
+ +
+
+
+
+
+
+ + + + +
+
+

Reference

+ Please cite our paper if you use our code, data, models, or results:
+

+
@misc{zhang2023tablellama,
+  title={TableLlama: Towards Open Large Generalist Models for Tables}, 
+  author={Tianshu Zhang and Xiang Yue and Yifei Li and Huan Sun},
+  year={2023},
+  eprint={2311.09206},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+      
+
+
+
\ No newline at end of file
diff --git a/index.html b/index.html
index ff48e3e..9a525de 100644
--- a/index.html
+++ b/index.html
@@ -1,40 +1,43 @@
 [markup-only changes in the page head; the only visible text is the unchanged title:
 TableLlama: Towards Open Large Generalist Models for Tables]
@@ -44,6 +47,7 @@
@@ -59,7 +63,12 @@
-
+ +
+
-
-
-

Abstract

-
-

- Semi-structured tables are ubiquitous. There has been a variety of tasks that aim to automatically interpret, augment, and query tables. Current methods often require pretraining on tables or special model architecture design, are restricted to specific table types, or have simplifying assumptions about tables and tasks. This paper makes the first step towards developing open-source large language models (LLMs) as generalists for a diversity of table-based tasks. Towards that end, we construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs. We further develop the first open-source generalist model for tables, TableLlama, by fine-tuning Llama 2 (7B) with LongLoRA to address the long context challenge. We experiment under both in-domain setting and out-of-domain setting. On 7 out of 8 in-domain tasks, TableLlama achieves com- parable or better performance than the SOTA for each task, despite the latter often has task-specific design. On 6 out-of-domain datasets, it achieves 6-48 absolute point gains compared with the base model, showing that training on TableInstruct enhances the model’s generalizability. We will open-source our dataset and trained model to boost future work on developing open generalist models for tables. -

-
-
+
+
+

Abstract

+
+

+ Semi-structured tables are ubiquitous. A variety of tasks aim to automatically interpret, augment,
+ and query tables. Current methods often require pretraining on tables or special model architecture
+ design, are restricted to specific table types, or make simplifying assumptions about tables and
+ tasks. This paper takes the first step towards developing open-source large language models (LLMs)
+ as generalists for a diversity of table-based tasks. Towards that end, we construct TableInstruct,
+ a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating
+ LLMs. We further develop the first open-source generalist model for tables, TableLlama, by
+ fine-tuning Llama 2 (7B) with LongLoRA to address the long-context challenge. We experiment under
+ both in-domain and out-of-domain settings. On 7 out of 8 in-domain tasks, TableLlama achieves
+ comparable or better performance than the SOTA for each task, despite the latter often having
+ task-specific designs. On 6 out-of-domain datasets, it achieves 6-48 absolute point gains compared
+ with the base model, showing that training on TableInstruct enhances the model’s generalizability.
+ We will open-source our dataset and trained model to boost future work on developing open
+ generalist models for tables.

+
+
-
-
- +
+
+ - -
-
+ +
+
-
-
-
- - An overview of TableInstruct and 🦙TableLlama -

- Figure 1: An overview of TableInstruct and TableLlama. TableInstruct includes a wide variety of realistic tables and tasks with instructions. We make the first step towards developing open-source generalist models for tables with TableInstruct and TableLlama. -

-
-
+
+
+
+ + An overview of TableInstruct and 🦙TableLlama +

+ Figure 1: An overview of TableInstruct and TableLlama. TableInstruct includes a wide variety of
+ realistic tables and tasks with instructions. We take the first step towards developing open-source
+ generalist models for tables with TableInstruct and TableLlama.

+
+
-
-
- +
+
+ - -
-
+ +
+
-
-
-
- - The hybrid instruction tuning of 🦣MAmmoTH -

- Figure 2: Illustration of three exemplary tasks: (a) Column type annotation. This task is to annotate the selected column with the correct semantic types. (b) Row population. This task is to populate rows given table metadata and partial row entities. (c) Hierarchical table QA. For subfigures (a) and (b), we mark candidates with red color in the - "task instruction" part. The candidate set size can be hundreds to thousands in TableInstruct. -

-
-
+
+
+
+ + An illustration of three exemplary tasks in TableInstruct +

+ Figure 2: Illustration of three exemplary tasks: (a) Column type annotation, which annotates the
+ selected column with the correct semantic types. (b) Row population, which populates rows given
+ table metadata and partial row entities. (c) Hierarchical table QA. For subfigures (a) and (b), we
+ mark candidates in red in the "task instruction" part. The candidate set can range from hundreds to
+ thousands of entries in TableInstruct.

+
+
-
-
- +
+
+ -
-
-
-
-
-

Our Dataset: TableInstruct

-
- -
+ -
-
-
-
-
-

In-domain Evaluation:

-
-
- -
+ -
-
-
-
-
-

Out-of-domain Evaluation:

-
- -
+ - +

Reference

@@ -577,32 +616,37 @@

Reference

\ No newline at end of file