Pandoc does not match Emacs when reading org files #9042

Closed
cacology opened this issue Aug 30, 2023 · 9 comments
@cacology
Contributor

cacology commented Aug 30, 2023

(Apologies, I reported the issue #9041 the wrong way.)

Pandoc does not match Emacs when reading org files. Consider calling emacs -Q test.org on the following file:

test.org

#+TITLE: Testing

* Testing

p. lower case

P. Upper Case

Emacs does not parse either "p." or "P." as a list item. Exporting to HTML, for example, results in:

[... html removed]

<p>
p. lower case
</p>

<p>
P. Upper Case
</p>

[... html removed]

This matches the behavior described in the Org Mode manual.

On the other hand, calling pandoc -f org -t html test.org -o test.html results in

<h1 id="testing-1">Testing</h1>
<ol>
<li><p>lower case</p></li>
<li><p>Upper Case</p></li>
</ol>

Reproduced with https://pandoc.org/try and on macOS 13.5.1 with pandoc 3.1.6.2 (Features: +server +lua; Scripting engine: Lua 5.4) and GNU Emacs 29.1, both installed via Homebrew.

Respectfully submitted.

@cacology cacology added the bug label Aug 30, 2023
@jgm
Owner

jgm commented Aug 30, 2023

Here's what I get when I load that org file in Emacs and do org-html-export-as-html:

<h2 id="orgfa551d0"><span class="section-number-2">1.</span> Testing</h2>
<div class="outline-text-2" id="text-1">
<ol class="org-ol">
<li>lower case</li>

<li>Upper Case</li>
</ol>

Just like pandoc, then, it is treating this as a list.
(I know that the Org manual doesn't mention lettered lists, but this is the behavior I see. Also, it is being highlighted as a list when I view the file in org-mode.)

@cacology
Contributor Author

I think this depends on the value of org-list-allow-alphabetical in Emacs:

With a value of t

emacs -Q --eval "(progn (find-file \"test.org\") (setq org-list-allow-alphabetical 't) (org-element-update-syntax) (org-html-export-as-html))" gives

<h2 id="orgb3218e1"><span class="section-number-2">1.</span> Testing</h2>
<div class="outline-text-2" id="text-1">
<ol class="org-ol">
<li>lower case</li>

<li>Upper Case</li>
</ol>

With nil

emacs -Q --eval "(progn (find-file \"test.org\") (setq org-list-allow-alphabetical 'nil) (org-element-update-syntax) (org-html-export-as-html))" gives:

<p>
p. lower case
</p>

<p>
P. Upper Case
</p>

Pull request #7812 seems to suggest that the purpose of the fancy_lists extension was to enable or disable this behavior, mirroring this variable in Emacs. That doesn't appear to be working right now, i.e.

pandoc -f org-fancy_lists -t html test.org gives

<h1 id="testing-1">Testing</h1>
<ol>
<li><p>lower case</p></li>
<li><p>Upper Case</p></li>
</ol>

@jgm
Owner

jgm commented Aug 31, 2023

Thanks for the clarification. I assume that the default for org-list-allow-alphabetical is t, since it works on my setup and I don't think I set this?

In any case, it does look as if the org reader checks the fancy_lists extension for this (I had not realized that). The mystery is why it still seems to parse these alphabetic lists by default, even though fancy_lists is not enabled in the default org extensions.

@jgm
Owner

jgm commented Aug 31, 2023

OK, I see. The way the code is now written, p. will always work as a list marker; the only difference fancy_lists makes is that it makes this marker affect the list type; so, by default, you get <ol> but with fancy_lists you get <ol type="a"> (still no start number). Obviously there are some things that can be improved here!
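
To make the distinction above concrete, here is a minimal Python sketch of the behavior the thread suggests was intended: a single-letter marker such as p. counts as a list marker only when alphabetical lists are allowed (fancy_lists in pandoc, org-list-allow-alphabetical in Emacs). This is not pandoc's Haskell code; the name classify_marker, the regular expressions, and the return values are all hypothetical and simplified.

```python
import re
from typing import Tuple

# Hypothetical, simplified model of ordered-list marker recognition.
# It is NOT pandoc's implementation; it only illustrates the toggle.
DECIMAL_MARKER = re.compile(r"^(\d+)[.)]\s+(.*)$")
ALPHA_MARKER = re.compile(r"^([A-Za-z])[.)]\s+(.*)$")


def classify_marker(line: str, allow_alphabetical: bool) -> Tuple[str, str]:
    """Return (kind, text); kind is 'decimal', 'alpha', or 'paragraph'."""
    m = DECIMAL_MARKER.match(line)
    if m:
        return ("decimal", m.group(2))
    # Single-letter markers are only considered when the option is on.
    if allow_alphabetical:
        m = ALPHA_MARKER.match(line)
        if m:
            return ("alpha", m.group(2))
    # Otherwise the line is ordinary paragraph text, marker included.
    return ("paragraph", line)


print(classify_marker("p. lower case", allow_alphabetical=False))
# → ('paragraph', 'p. lower case')
print(classify_marker("p. lower case", allow_alphabetical=True))
# → ('alpha', 'lower case')
```

With the flag off, both lines of test.org stay paragraphs, which is what emacs -Q produces; with it on, they become items of a lettered list.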

@jgm jgm closed this as completed in 1e43eb4 Aug 31, 2023
@jgm
Owner

jgm commented Aug 31, 2023

Should be fixed now.

@cacology
Contributor Author

Thank you!

Just for documentation purposes: Doom Emacs sets org-list-allow-alphabetical to t, but the default value is nil. There seems to be something controversial about the default behavior for alphabetical lists or initials beginning lines. Maybe some other configurations have strong opinions about this too. I wonder why.

@jgm
Owner

jgm commented Aug 31, 2023

I'm grateful to know about org-list-allow-alphabetical. However, it looks like Emacs still highlights p. as a note marker when I set this to nil. (It does affect HTML export.)

@cacology
Contributor Author

What does (org-element-parse-buffer) return? For me emacs -Q test.org reports:

[... stuff omitted]
#5=(paragraph
		(:begin 30 :end 45 :contents-begin 30 :contents-end 44 :post-blank 1 :post-affiliated 30 :mode planning :granularity nil :parent #4#)
		#("p. lower case
" 0 14
(:parent #5#)))
	    #6=(paragraph
		(:begin 45 :end 59 :contents-begin 45 :contents-end 59 :post-blank 0 :post-affiliated 45 :mode nil :granularity nil :parent #4#)
		#("P. Upper Case
" 0 14
(:parent #6#))))
[... stuff omitted]

My normal setup returns the same value, but I think the problem here is that org has its own opinions about parsing that change over time, etc. and can be heavily customized.

Maybe the standard for the org reader should specify vanilla Emacs, stable version, etc. as the target. That would clarify future queries and give a useful test, i.e.:

emacs -Q --eval "(progn (find-file \"test.org\") (print (org-element-parse-buffer) #'external-debugging-output) (kill-emacs))" should give an equivalent to pandoc's native structure of the same file. One could include a test to make sure org itself hasn't changed.

Then again, if someone thought it was fun, writing a parser for org-element-parse-buffer's output would mean that pandoc could work with whatever state a user's org mode was in.
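
The vanilla-Emacs comparison suggested above could be driven by a small script. The Python sketch below only builds the two command lines (the emacs -Q invocation quoted in this comment, and pandoc -f org -t native) and offers a helper to run them; the function names are hypothetical, and no attempt is made to decide whether the two outputs are "equivalent", which is the hard part. Running Emacs without a terminal may additionally need --batch, which is an assumption not tested in this thread.

```python
import subprocess
from typing import List


def elisp_string(s: str) -> str:
    """Quote s as an Emacs Lisp string literal (escapes \\ and ")."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def emacs_parse_command(org_file: str) -> List[str]:
    """Build the `emacs -Q` call from the thread that prints the parse tree."""
    form = (
        "(progn (find-file " + elisp_string(org_file) + ") "
        "(print (org-element-parse-buffer) #'external-debugging-output) "
        "(kill-emacs))"
    )
    return ["emacs", "-Q", "--eval", form]


def pandoc_native_command(org_file: str) -> List[str]:
    """Build the pandoc call that prints pandoc's native AST for the file."""
    return ["pandoc", "-f", "org", "-t", "native", org_file]


def run(cmd: List[str]) -> str:
    """Run cmd and return stdout followed by stderr.

    Emacs writes `external-debugging-output` to stderr, so both are kept.
    """
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout + done.stderr
```

With both tools installed, run(emacs_parse_command("test.org")) and run(pandoc_native_command("test.org")) would return the two structures to compare by eye or with a later, purpose-built checker.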

@jgm
Owner

jgm commented Aug 31, 2023

I get the same result: just a paragraph, nothing about a list. Still, the p. is displayed in boldface....
