DataFrame not render in markdown in Julia #6134

Open
sethaxen opened this issue Jul 7, 2023 · 10 comments
Labels: enhancement, julia, needs-design, needs-discussion

@sethaxen

sethaxen commented Jul 7, 2023

On Quarto v1.3.433, I tried rendering a .qmd file with a cell block that outputs a DataFrames.DataFrame object in Julia. When rendering an HTML or PDF file, the DataFrame is rendered as a table, but when rendering to markdown, it is not rendered at all. There's a minimal repo demonstrating this here: https://github.com/sethaxen/quarto_dataframes_jl_demo

@jjallaire jjallaire transferred this issue from quarto-dev/quarto Jul 7, 2023
@cderv
Collaborator

cderv commented Jul 7, 2023

I can reproduce this.

Somehow, the first cell outputs a LaTeX table in the intermediate .md file passed to Pandoc:

The following shows a table when rendered to PDF or HTML but nothing when rendered to markdown

::: {.cell execution_count=2}
``` {.julia .cell-code}
using DataFrames
df = DataFrame(:x => randn(10), :y => randn(10))
```

::: {.cell-output .cell-output-display execution_count=3}
\begin{tabular}{r|cc}
	& x & y\\
	\hline
	& Float64 & Float64\\
	\hline
	1 & 0.0387945 & 0.728907 \\
	2 & 0.0640609 & -0.0620356 \\
	3 & 0.0834239 & 0.209232 \\
	4 & 1.46305 & -0.305209 \\
	5 & 0.883393 & 0.61293 \\
	6 & -1.27005 & 0.167557 \\
	7 & 0.469263 & 0.873955 \\
	8 & 0.398598 & -0.243128 \\
	9 & -0.210688 & 0.131962 \\
	10 & 1.57329 & -1.30073 \\
\end{tabular}

:::
:::

This is why it gets ignored when converting to Markdown output with Pandoc... This is surprising 🤔

@cderv cderv added bug Something isn't working julia labels Jul 7, 2023
@cderv
Collaborator

cderv commented Jul 7, 2023

So this is the information we get from the Jupyter rendering:

{
 "cells": [
  {
   "cell_type": "raw",
   "id": "a6f5d7b4",
   "metadata": {},
   "source": [
    "---\n",
    "title: \"DataFrames rendering test\"\n",
    "keep-md: true\n",
    "keep-ipynb: true\n",
    "format: \n",
    "  md:\n",
    "    output-file: test-md.md\n",
    "  html: default\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f96792b9",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m\u001b[1m  Activating\u001b[22m\u001b[39m project at `C:\\Users\\chris\\Documents\\DEV_OTHER\\DEMOS\\test-quarto`\n"
     ]
    }
   ],
   "source": [
    "#| output: false\n",
    "#| echo: false\n",
    "using Pkg\n",
    "Pkg.activate(\".\")\n",
    "Pkg.instantiate()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f64c9a77",
   "metadata": {},
   "source": [
    "The following shows a table when rendered to PDF or HTML but nothing when rendered to markdown\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f3ee4485",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><div style = \"float: left;\"><span>10×2 DataFrame</span></div><div style = \"clear: both;\"></div></div><div class = \"data-frame\" style = \"overflow-x: scroll;\"><table class = \"data-frame\" style = \"margin-bottom: 6px;\"><thead><tr class = \"header\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">Row</th><th style = \"text-align: left;\">x</th><th style = \"text-align: left;\">y</th></tr><tr class = \"subheader headerLastRow\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\"></th><th title = \"Float64\" style = \"text-align: left;\">Float64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th></tr></thead><tbody><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">1</td><td style = \"text-align: right;\">0.662702</td><td style = \"text-align: right;\">0.451484</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">2</td><td style = \"text-align: right;\">-1.54828</td><td style = \"text-align: right;\">-1.25003</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">3</td><td style = \"text-align: right;\">-0.0443339</td><td style = \"text-align: right;\">-1.39289</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">4</td><td style = \"text-align: right;\">-1.55107</td><td style = \"text-align: right;\">-1.9189</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">5</td><td style = \"text-align: right;\">0.0755979</td><td style = \"text-align: right;\">-0.58877</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">6</td><td style = \"text-align: right;\">0.748819</td><td style = \"text-align: right;\">-1.87185</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">7</td><td style = \"text-align: right;\">-0.281013</td><td style = \"text-align: 
right;\">1.54457</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">8</td><td style = \"text-align: right;\">-0.680722</td><td style = \"text-align: right;\">0.664824</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">9</td><td style = \"text-align: right;\">-1.20119</td><td style = \"text-align: right;\">2.64156</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">10</td><td style = \"text-align: right;\">-0.161478</td><td style = \"text-align: right;\">1.11157</td></tr></tbody></table></div>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 0.662702 & 0.451484 \\\\\n",
       "\t2 & -1.54828 & -1.25003 \\\\\n",
       "\t3 & -0.0443339 & -1.39289 \\\\\n",
       "\t4 & -1.55107 & -1.9189 \\\\\n",
       "\t5 & 0.0755979 & -0.58877 \\\\\n",
       "\t6 & 0.748819 & -1.87185 \\\\\n",
       "\t7 & -0.281013 & 1.54457 \\\\\n",
       "\t8 & -0.680722 & 0.664824 \\\\\n",
       "\t9 & -1.20119 & 2.64156 \\\\\n",
       "\t10 & -0.161478 & 1.11157 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m10×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x          \u001b[0m\u001b[1m y         \u001b[0m\n",
       "\u001b[90m Float64    \u001b[0m\u001b[90m Float64   \u001b[0m\n",
       "─────┼───────────────────────\n",
       "   1 │  0.662702    0.451484\n",
       "   2 │ -1.54828    -1.25003\n",
       "   3 │ -0.0443339  -1.39289\n",
       "   4 │ -1.55107    -1.9189\n",
       "   5 │  0.0755979  -0.58877\n",
       "   6 │  0.748819   -1.87185\n",
       "   7 │ -0.281013    1.54457\n",
       "   8 │ -0.680722    0.664824\n",
       "   9 │ -1.20119     2.64156\n",
       "  10 │ -0.161478    1.11157"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using DataFrames\n",
    "df = DataFrame(:x => randn(10), :y => randn(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f6bb391",
   "metadata": {},
   "source": [
    "But there is a custom `show` method for HTML outputs:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6805350a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"<div><div style = \\\"float: left;\\\"><span>10×2 DataFrame</span></div><div style = \\\"clear: both;\\\"></div></div><div class = \\\"data-frame\\\" style = \\\"overflow-x: scroll;\\\"><table class = \\\"data-frame\\\" style = \\\"margin-bottom: 6px;\\\"><thead><tr class = \\\"header\\\"><th class = \\\"rowNumber\" ⋯ 1943 bytes ⋯ \"ht;\\\">-1.20119</td><td style = \\\"text-align: right;\\\">2.64156</td></tr><tr><td class = \\\"rowNumber\\\" style = \\\"font-weight: bold; text-align: right;\\\">10</td><td style = \\\"text-align: right;\\\">-0.161478</td><td style = \\\"text-align: right;\\\">1.11157</td></tr></tbody></table></div>\""
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sprint(show, \"text/html\", df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62ee20cf",
   "metadata": {},
   "source": [
    "In fact, if we just wrap the `DataFrame` with an object that has a custom HTML `show` method, it renders in markdown just fine:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8c684f8f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><div style = \"float: left;\"><span>10×2 DataFrame</span></div><div style = \"clear: both;\"></div></div><div class = \"data-frame\" style = \"overflow-x: scroll;\"><table class = \"data-frame\" style = \"margin-bottom: 6px;\"><thead><tr class = \"header\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">Row</th><th style = \"text-align: left;\">x</th><th style = \"text-align: left;\">y</th></tr><tr class = \"subheader headerLastRow\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\"></th><th title = \"Float64\" style = \"text-align: left;\">Float64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th></tr></thead><tbody><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">1</td><td style = \"text-align: right;\">0.662702</td><td style = \"text-align: right;\">0.451484</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">2</td><td style = \"text-align: right;\">-1.54828</td><td style = \"text-align: right;\">-1.25003</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">3</td><td style = \"text-align: right;\">-0.0443339</td><td style = \"text-align: right;\">-1.39289</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">4</td><td style = \"text-align: right;\">-1.55107</td><td style = \"text-align: right;\">-1.9189</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">5</td><td style = \"text-align: right;\">0.0755979</td><td style = \"text-align: right;\">-0.58877</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">6</td><td style = \"text-align: right;\">0.748819</td><td style = \"text-align: right;\">-1.87185</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">7</td><td style = \"text-align: right;\">-0.281013</td><td style = \"text-align: 
right;\">1.54457</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">8</td><td style = \"text-align: right;\">-0.680722</td><td style = \"text-align: right;\">0.664824</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">9</td><td style = \"text-align: right;\">-1.20119</td><td style = \"text-align: right;\">2.64156</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">10</td><td style = \"text-align: right;\">-0.161478</td><td style = \"text-align: right;\">1.11157</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "DataFrameWrapper(\u001b[1m10×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x          \u001b[0m\u001b[1m y         \u001b[0m\n",
       "\u001b[90m Float64    \u001b[0m\u001b[90m Float64   \u001b[0m\n",
       "─────┼───────────────────────\n",
       "   1 │  0.662702    0.451484\n",
       "   2 │ -1.54828    -1.25003\n",
       "   3 │ -0.0443339  -1.39289\n",
       "   4 │ -1.55107    -1.9189\n",
       "   5 │  0.0755979  -0.58877\n",
       "   6 │  0.748819   -1.87185\n",
       "   7 │ -0.281013    1.54457\n",
       "   8 │ -0.680722    0.664824\n",
       "   9 │ -1.20119     2.64156\n",
       "  10 │ -0.161478    1.11157)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "struct DataFrameWrapper\n",
    "    df\n",
    "end\n",
    "\n",
    "Base.show(io::IO, mime::MIME\"text/html\", w::DataFrameWrapper) = show(io, mime, w.df)\n",
    "\n",
    "DataFrameWrapper(df)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.7.3",
   "language": "julia",
   "name": "julia-1.7"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

We can see that the first DataFrame cell offers three representations: text/html, text/latex, and text/plain. There is no text/markdown output.
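As a quick way to check this on any executed notebook, here is a minimal TypeScript sketch that lists the MIME types an output carries. The `NotebookOutput` type and the `outputMimeTypes` function are hypothetical names used for illustration only; they are not part of Quarto's code.

```typescript
// Hypothetical minimal shape of a Jupyter output (illustration only,
// not Quarto's internal type).
interface NotebookOutput {
  output_type: string;
  data?: Record<string, string[]>;
}

// Return the sorted MIME types carried by an output's data bundle.
// Outputs without a data bundle (e.g. stream outputs) yield an empty list.
function outputMimeTypes(output: NotebookOutput): string[] {
  return Object.keys(output.data ?? {}).sort();
}

// Abbreviated version of the DataFrame output shown in the notebook above.
const dataFrameOutput: NotebookOutput = {
  output_type: "execute_result",
  data: {
    "text/html": ["<table>...</table>"],
    "text/latex": ["\\begin{tabular}{r|cc}\n", "\\end{tabular}\n"],
    "text/plain": ["10×2 DataFrame"],
  },
};

// prints "text/html, text/latex, text/plain"
console.log(outputMimeTypes(dataFrameOutput).join(", "));
```

No text/markdown entry appears in the list, which matches the notebook JSON above.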

And then we have a specific process that handles raw LaTeX in that case, because it is expected to be math.

Here is what happens

We are getting the cell content from the above in

async function mdFromCodeCell(
  cell: JupyterCellWithOptions,
  cellIndex: number,
  options: JupyterToMarkdownOptions,
) {

In that process we do the following with the outputs

}).map((output) => {
  // convert text/latex math to markdown as appropriate
  if (!options.toLatex && isDisplayData(output) && output.data[kTextLatex]) {
    return displayDataWithMarkdownMath(output);
  } else {
    return output;
  }

So we are calling displayDataWithMarkdownMath() on this output (because options.toLatex is false), and in there we explicitly select output.data["text/latex"]:

export function displayDataWithMarkdownMath(output: JupyterOutputDisplayData) {
  if (Array.isArray(output.data[kTextLatex]) && !output.data[kTextMarkdown]) {
    const latex = output.data[kTextLatex] as string[];
    if (displayDataLatexIsMath(latex)) {
      output = ld.cloneDeep(output);
      output.data[kTextMarkdown] = output.data[kTextLatex];
      return output;
    }
  }
  return output;
}

For the last code cell you have, there are only text/html and text/plain outputs, so no LaTeX table is inserted and you see the correct table.

I don't know enough about Jupyter, Julia and nbConvert to understand what happens. It seems though that

df = DataFrame(:x => randn(10), :y => randn(10))

won't output text/markdown to include.

Though I guess we should probably not choose the text/latex part, but either the plain or html.
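To illustrate the idea, here is a minimal sketch of what such a selection could look like for a markdown target. This is not Quarto's implementation: `MimeBundle`, `kMarkdownPreference`, and `pickMimeForMarkdown` are hypothetical names, and the preference order (markdown, then HTML, then plain text) is an assumption made for the example.

```typescript
// Hypothetical MIME bundle type (illustration only).
type MimeBundle = Record<string, string[]>;

// Assumed preference order for a markdown target. text/latex is deliberately
// absent, so a LaTeX table is never chosen over HTML or plain text.
const kMarkdownPreference: string[] = ["text/markdown", "text/html", "text/plain"];

// Return the first preferred MIME type present in the bundle,
// or undefined when none of them is available.
function pickMimeForMarkdown(data: MimeBundle): string | undefined {
  return kMarkdownPreference.find((mime) => mime in data);
}

// Abbreviated bundle for the DataFrame cell discussed above.
const dataFrameBundle: MimeBundle = {
  "text/html": ["<table>...</table>"],
  "text/latex": ["\\begin{tabular}{r|cc}\n", "\\end{tabular}\n"],
  "text/plain": ["10×2 DataFrame"],
};

// prints "text/html"
console.log(pickMimeForMarkdown(dataFrameBundle));
```

With this ordering the DataFrame cell would resolve to its HTML table, which is the same result that forcing HTML gives for the whole document.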

Note that if we force HTML, we correctly select the HTML version of the results:

format: 
   md:
      prefer-html: true

@cscheid @dragonstyle you know the Quarto internals for Jupyter a bit better than I do. Do you have more insight into what we should do here, if we need to do anything at all, unless we consider this to be a Julia DataFrames output problem? It is really surprising that we select the text/latex data for this output.

Hope it helps

@cderv cderv added triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone. needs-discussion Issues that require a team-wide discussion before proceeding further labels Jul 7, 2023
@sethaxen
Author

sethaxen commented Jul 8, 2023

> To note that if we do force HTML then we correctly select the HTML version of the results
>
> format: 
>    md:
>       prefer-html: true

Is there a way to force this for a single cell instead of the whole document?

@cderv
Collaborator

cderv commented Jul 10, 2023

> Is there a way to force this for a single cell instead of the whole document?

Unfortunately, no. The option was not meant to be used per cell. IMO it is only a side effect that it helps with this issue. The issue should be solved differently, by correctly handling the Julia outputs for the markdown format.

@sethaxen
Author

sethaxen commented Aug 9, 2023

Hi, any updates on what the solution for this might be?

@mcanouil
Collaborator

mcanouil commented Aug 9, 2023

As you can see, there is no new information.
When there is, the team will communicate it.

@cscheid cscheid added enhancement New feature or request and removed bug Something isn't working labels Aug 11, 2023
@cscheid
Collaborator

cscheid commented Aug 11, 2023

Unfortunately, I don't think this is a quarto bug.

We can attempt to heuristically choose a different mimetype in case the execution doesn't include a markdown output, but that won't work well in general. If the output produced by Jupyter doesn't include markdown output, then there's a limit to how well we can handle it.

I know very little about Julia, but in Python programmers can force Markdown output. You might want to look into how Julia supports the equivalent of:

from IPython.display import Markdown
Markdown("**this will be bold**")

@cderv
Collaborator

cderv commented Aug 21, 2023

Thanks @cscheid for confirming.

Based on your question, I looked at this a bit differently. I found this issue in DataFrames.jl

So basically, Markdown display relies on other tools, such as PrettyTables.jl and MarkdownTables.jl.

Here are rough examples, which would need to be improved in the context of a Jupyter notebook:

---
title: "DataFrames rendering test"
---

```{julia}
#| output: false
#| echo: false
using Pkg
Pkg.activate(".")
Pkg.instantiate()
```


## Using show()

```{julia}
using DataFrames
using PrettyTables
df = DataFrame(:x => randn(10), :y => randn(10))
show(df, tf = PrettyTables.tf_markdown)
```

## Using PrettyTables directly

```{julia}
using PrettyTables
pretty_table(df, tf = tf_markdown)
```

## Using MarkdownTables

```{julia}
using MarkdownTables
df |> markdown_table(String) |> print
```

Markdown output:

# DataFrames rendering test

## Using show()

``` julia
using DataFrames
using PrettyTables
df = DataFrame(:x => randn(10), :y => randn(10))
show(df, tf = PrettyTables.tf_markdown)
```

    10×2 DataFrame
     Row | x           y          
         | Float64     Float64    
    -----|------------------------
       1 | -0.433596   -0.91904
       2 |  1.91175     0.514662
       3 | -0.17271    -0.14763
       4 |  0.686026   -0.480422
       5 |  1.4416      0.136346
       6 |  0.807431   -0.101564
       7 |  1.72824    -0.0431558
       8 | -0.0503871   0.758866
       9 |  1.05733    -0.356043
      10 | -1.14679     0.926031

## Using PrettyTables directly

``` julia
using PrettyTables
pretty_table(df, tf = tf_markdown)
```

    |          x |          y |
    |    Float64 |    Float64 |
    |------------|------------|
    |  -0.433596 |   -0.91904 |
    |    1.91175 |   0.514662 |
    |   -0.17271 |   -0.14763 |
    |   0.686026 |  -0.480422 |
    |     1.4416 |   0.136346 |
    |   0.807431 |  -0.101564 |
    |    1.72824 | -0.0431558 |
    | -0.0503871 |   0.758866 |
    |    1.05733 |  -0.356043 |
    |   -1.14679 |   0.926031 |

## Using MarkdownTables

``` julia
using MarkdownTables
df |> markdown_table(String) |> print
```

    | x                    | y                    |
    |----------------------|----------------------|
    | -0.43359602148995735 |  -0.9190395841538315 |
    |    1.911749853834694 |   0.5146618874800643 |
    | -0.17271029762845877 |  -0.1476302354292226 |
    |   0.6860264124167541 |  -0.4804224361576191 |
    |   1.4415970172652266 |  0.13634564934712143 |
    |   0.8074305434190046 | -0.10156367162958412 |
    |   1.7282435099556877 | -0.04315577960397021 |
    | -0.05038712142014474 |   0.7588661206604351 |
    |    1.057334138204181 | -0.35604334033646007 |
    |    -1.14679140689634 |   0.9260312445228053 |

It would probably be interesting to re-discuss this with the DataFrames.jl team so that text/markdown output is added to their default outputs (in addition to HTML and LaTeX): https://github.com/JuliaData/DataFrames.jl/blob/main/src/dataframerow/show.jl#L41

> We can attempt to heuristically choose a different mimetype in case the execution doesn't include a markdown output, but that won't work well in general. If the output produced by Jupyter doesn't include markdown output, then there's a limit to how well we can handle it.

@cscheid about this, I understand the general idea, and I agree that properly formatted markdown output is required. However, I think something is odd in the way we auto-select the output. After looking at this in detail in #6134 (comment), what happens here is:

  • For some reason, we catch the existing text/latex output and process it with displayDataWithMarkdownMath():
      }).map((output) => {
        // convert text/latex math to markdown as appropriate
        if (!options.toLatex && isDisplayData(output) && output.data[kTextLatex]) {
          return displayDataWithMarkdownMath(output);
        } else {
          return output;
        }
  • There is no math to convert here, I think, but we still select the text/latex version of the output to use as text/markdown.
      export function displayDataWithMarkdownMath(output: JupyterOutputDisplayData) {
        if (Array.isArray(output.data[kTextLatex]) && !output.data[kTextMarkdown]) {
          const latex = output.data[kTextLatex] as string[];
          if (displayDataLatexIsMath(latex)) {
            output = ld.cloneDeep(output);
            output.data[kTextMarkdown] = output.data[kTextLatex];
            return output;
          }
        }
        return output;
      }
  • The output contains text/html and even text/plain; one of these would probably have been a better choice for format: md content.

I am not sure I understand why we would keep the text/latex output type here. It feels wrong to me, but maybe I am missing something, so I prefer to re-explain this before we close the issue.
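One direction that could be discussed is a stricter check before text/latex is promoted to text/markdown. The sketch below is purely illustrative: the body of displayDataLatexIsMath() is not shown in this thread, so `latexLooksLikeMath` and `kMathEnvironments` are hypothetical names, and the heuristic itself (accept only math delimiters or a known math environment) is an assumption, not Quarto's behavior.

```typescript
// Hypothetical list of environments treated as math (an assumption for
// this sketch; a real list would likely be longer).
const kMathEnvironments: string[] = [
  "equation", "equation*", "align", "align*", "gather", "gather*",
];

// Hypothetical heuristic: treat a text/latex payload as math only when it is
// wrapped in math delimiters or starts with a known math environment.
function latexLooksLikeMath(latex: string[]): boolean {
  const text = latex.join("").trim();
  if (text.length < 2) {
    return false;
  }
  // Inline or display math: $...$ or $$...$$
  if (text.startsWith("$") && text.endsWith("$")) {
    return true;
  }
  // \( ... \) or \[ ... \]
  if (
    (text.startsWith("\\(") && text.endsWith("\\)")) ||
    (text.startsWith("\\[") && text.endsWith("\\]"))
  ) {
    return true;
  }
  // \begin{env} where env is a known math environment
  const match = text.match(/^\\begin\{([^}]+)\}/);
  return match !== null && kMathEnvironments.indexOf(match[1] ?? "") !== -1;
}

// The DataFrame output starts with \begin{tabular}, which is not math.
const tabular = ["\\begin{tabular}{r|cc}\n", "\t& x & y\\\\\n", "\\end{tabular}\n"];
console.log(latexLooksLikeMath(tabular)); // prints "false"
console.log(latexLooksLikeMath(["$x^2 + y^2$"])); // prints "true"
```

Under such a check, the tabular output from this thread would be left alone, and the text/html or text/plain representation could be used instead.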

@cscheid
Collaborator

cscheid commented Aug 21, 2023

I think you're right that we need to look at that code more carefully. Let's discuss this directly.

@cderv
Collaborator

cderv commented Sep 12, 2023

Just for reference, so that the two are linked: there is a similar issue about Julia output in Jupyter.

@dragonstyle dragonstyle added this to the v1.5 milestone Sep 20, 2023
@dragonstyle dragonstyle removed their assignment Feb 22, 2024
@cscheid cscheid removed the triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone. label May 8, 2024
@cscheid cscheid modified the milestones: v1.5, Future May 8, 2024