closes OfficeDev#278
mikeebowen committed Dec 5, 2023
1 parent 06dd059 commit 7edb83a
Showing 3 changed files with 50 additions and 136 deletions.
138 changes: 8 additions & 130 deletions docs/spreadsheet/how-to-parse-and-read-a-large-spreadsheet.md
@@ -22,68 +22,6 @@ about the basic structure of a **SpreadsheetML** document, see [Structure of a S

[!include[Add-ins note](../includes/addinsnote.md)]

You must use the following **using** directives
or **Imports** statements to compile the code
in this topic.

### [C#](#tab/cs-0)
```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
```

### [Visual Basic](#tab/vb-0)
```vb
Imports System
Imports System.Linq
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Spreadsheet
```
***


--------------------------------------------------------------------------------
## Getting a SpreadsheetDocument Object
In the Open XML SDK, the [SpreadsheetDocument](https://msdn.microsoft.com/library/office/documentformat.openxml.packaging.spreadsheetdocument.aspx) class represents an
Excel document package. To open and work with an Excel document, you
create an instance of the **SpreadsheetDocument** class from the document.
After you create this instance, you can use it to obtain access to the
main workbook part that contains the worksheets. The content in the
document is represented in the package as XML using **SpreadsheetML** markup.

To create the class instance, you call one of the overloads of the [Open()](https://msdn.microsoft.com/library/office/documentformat.openxml.packaging.spreadsheetdocument.open.aspx) method. The following code sample
shows how to use the [Open(String, Boolean)](https://msdn.microsoft.com/library/office/cc562356.aspx) overload. The first
parameter takes a string that represents the full path to the document
to open. The second parameter takes a value of **true** or **false** and
represents whether or not you want the file to be opened for editing. In
this example, the parameter is **false**
because the document is opened as read-only.

### [C#](#tab/cs-1)
```csharp
// Open the document as read-only.
using (SpreadsheetDocument spreadsheetDocument =
SpreadsheetDocument.Open(fileName, false))
{
// Code removed here.
}
```

### [Visual Basic](#tab/vb-1)
```vb
' Open the document as read-only.
Using spreadsheetDocument As SpreadsheetDocument = _
SpreadsheetDocument.Open(filename, False)
' Code removed here.
End Using
```
***


--------------------------------------------------------------------------------
## Approaches to Parsing Open XML Files
The Open XML SDK provides two approaches to parsing Open XML files. You
@@ -101,34 +39,10 @@ The following code segment is used to read a very large Excel file using
the DOM approach.

### [C#](#tab/cs-2)
```csharp
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
string text;
foreach (Row r in sheetData.Elements<Row>())
{
foreach (Cell c in r.Elements<Cell>())
{
text = c.CellValue.Text;
Console.Write(text + " ");
}
}
```
[!code-csharp[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/cs/Program.cs#snippet1)]

### [Visual Basic](#tab/vb-2)
```vb
Dim workbookPart As WorkbookPart = spreadsheetDocument.WorkbookPart
Dim worksheetPart As WorksheetPart = workbookPart.WorksheetParts.First()
Dim sheetData As SheetData = worksheetPart.Worksheet.Elements(Of SheetData)().First()
Dim text As String
For Each r As Row In sheetData.Elements(Of Row)()
For Each c As Cell In r.Elements(Of Cell)()
text = c.CellValue.Text
Console.Write(text & " ")
Next
Next
```
[!code-vb[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/vb/Program.vb#snippet1)]
***


@@ -137,36 +51,10 @@ sample (reading a very large Excel file), but uses the SAX approach.
This is the recommended approach for reading very large files.

### [C#](#tab/cs-3)
```csharp
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
string text;
while (reader.Read())
{
if (reader.ElementType == typeof(CellValue))
{
text = reader.GetText();
Console.Write(text + " ");
}
}
```
[!code-csharp[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/cs/Program.cs#snippet2)]

### [Visual Basic](#tab/vb-3)
```vb
Dim workbookPart As WorkbookPart = spreadsheetDocument.WorkbookPart
Dim worksheetPart As WorksheetPart = workbookPart.WorksheetParts.First()

Dim reader As OpenXmlReader = OpenXmlReader.Create(worksheetPart)
Dim text As String
While reader.Read()
If reader.ElementType = GetType(CellValue) Then
text = reader.GetText()
Console.Write(text & " ")
End If
End While
```
[!code-vb[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/vb/Program.vb#snippet2)]
***
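
Both segments above print the raw `CellValue` text, which is why the sample file notes that the code works only for cells that contain numeric values: a string cell usually stores an index into the workbook's shared string table instead of the text itself. The following is a minimal sketch of resolving that index in the DOM loop. `GetCellText` is a hypothetical helper introduced here for illustration and is not part of this commit; it assumes the strings live in the shared string table and does not handle inline strings, Booleans, or dates.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Usage sketch: pass the path to an .xlsx file as the first argument.
using (SpreadsheetDocument document = SpreadsheetDocument.Open(args[0], false))
{
    WorkbookPart workbookPart = document.WorkbookPart
        ?? throw new InvalidOperationException("The file has no workbook part.");
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();

    foreach (Row row in sheetData.Elements<Row>())
    {
        foreach (Cell cell in row.Elements<Cell>())
        {
            Console.Write(GetCellText(cell, workbookPart) + " ");
        }
        Console.WriteLine();
    }
}

// Hypothetical helper (an assumption, not from the commit): returns the text of a
// cell, looking up shared strings through the workbook's shared string table.
static string GetCellText(Cell cell, WorkbookPart workbookPart)
{
    // Empty cells have no value element, so CellValue can be null.
    string raw = cell.CellValue?.Text ?? string.Empty;

    // For shared strings, the raw value is a zero-based index into the table.
    if (cell.DataType is not null
        && cell.DataType.Value == CellValues.SharedString
        && int.TryParse(raw, out int index))
    {
        var table = workbookPart.SharedStringTablePart?.SharedStringTable;
        var item = table?.Elements<SharedStringItem>().ElementAtOrDefault(index);

        // Fall back to the raw index text if the table or the entry is missing.
        return item?.InnerText ?? raw;
    }

    return raw;
}
```

The same lookup can be applied in the SAX loop, but the shared string table itself is still loaded through the DOM here, so for workbooks with very many distinct strings it may be worth reading that part with an `OpenXmlReader` as well.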


@@ -184,30 +72,20 @@ method separately by commenting the call to the one you would like to
exclude.

### [C#](#tab/cs-4)
```csharp
String fileName = @"C:\Users\Public\Documents\BigFile.xlsx";
// Comment one of the following lines to test the method separately.
ReadExcelFileDOM(fileName); // DOM
ReadExcelFileSAX(fileName); // SAX
```
[!code-csharp[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/cs/Program.cs#snippet3)]

### [Visual Basic](#tab/vb-4)
```vb
Dim fileName As String = "C:\Users\Public\Documents\BigFile.xlsx"
' Comment one of the following lines to test each method separately.
ReadExcelFileDOM(fileName) ' DOM
ReadExcelFileSAX(fileName) ' SAX
```
[!code-vb[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/vb/Program.vb#snippet3)]
***


The following is the complete code sample in both C\# and Visual Basic.

### [C#](#tab/cs)
[!code-csharp[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/cs/Program.cs)]
[!code-csharp[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/cs/Program.cs#snippet0)]

### [Visual Basic](#tab/vb)
[!code-vb[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/vb/Program.vb)]
[!code-vb[](../../samples/spreadsheet/parse_and_read_a_large_spreadsheet/vb/Program.vb#snippet0)]

--------------------------------------------------------------------------------
## See also
@@ -1,19 +1,19 @@
// <Snippet0>
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System;
using System.Linq;

ReadExcelFileDOM(args[0]);
ReadExcelFileSAX(args[0]);

// The DOM approach.
// Note that the code below works only for cells that contain numeric values.
//
static void ReadExcelFileDOM(string fileName)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
// <Snippet1>

WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart ?? spreadsheetDocument.AddWorkbookPart();
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
@@ -27,6 +27,8 @@ static void ReadExcelFileDOM(string fileName)
Console.Write(text + " ");
}
}

// </Snippet1>

Console.WriteLine();
Console.ReadKey();
@@ -38,6 +40,8 @@ static void ReadExcelFileSAX(string fileName)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
// <Snippet2>

WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart ?? spreadsheetDocument.AddWorkbookPart();
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

@@ -51,7 +55,17 @@ static void ReadExcelFileSAX(string fileName)
Console.Write(text + " ");
}
}

// </Snippet2>

Console.WriteLine();
Console.ReadKey();
}
}
}
// </Snippet0>

// <Snippet3>
// Comment one of the following lines to test the method separately.
ReadExcelFileDOM(args[0]); // DOM
ReadExcelFileSAX(args[0]); // SAX
// </Snippet3>
@@ -1,11 +1,19 @@
' <Snippet0>
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Spreadsheet

Module Program
Sub Main(args As String())
End Sub

' <Snippet3>

' Comment one of the following lines to test each method separately.
ReadExcelFileDOM(args(0)) ' DOM
ReadExcelFileSAX(args(0)) ' SAX

' </Snippet3>
End Sub


' The DOM approach.
@@ -14,6 +22,9 @@ Module Program

Private Sub ReadExcelFileDOM(ByVal fileName As String)
Using spreadsheetDocument As SpreadsheetDocument = SpreadsheetDocument.Open(fileName, False)

' <Snippet1>

Dim workbookPart As WorkbookPart = spreadsheetDocument.WorkbookPart
Dim worksheetPart As WorksheetPart = workbookPart.WorksheetParts.First()
Dim sheetData As SheetData = worksheetPart.Worksheet.Elements(Of SheetData)().First()
@@ -24,13 +35,19 @@ Module Program
Console.Write(text & " ")
Next
Next

' </Snippet1>

Console.WriteLine()
Console.ReadKey()
End Using
End Sub

' The SAX approach.
Private Sub ReadExcelFileSAX(ByVal fileName As String)

' <Snippet2>

Using spreadsheetDocument As SpreadsheetDocument = SpreadsheetDocument.Open(fileName, False)
Dim workbookPart As WorkbookPart = spreadsheetDocument.WorkbookPart
Dim worksheetPart As WorksheetPart = workbookPart.WorksheetParts.First()
@@ -43,8 +60,13 @@ Module Program
Console.Write(text & " ")
End If
End While

' </Snippet2>

Console.WriteLine()
Console.ReadKey()
End Using
End Sub
End Module
End Module

' </Snippet0>
