Add .pod6 filetype to Perl6 #3366

Closed
wants to merge 5 commits into from

Conversation

samcv
Contributor

@samcv samcv commented Dec 7, 2016

.pod6 is only used for Perl 6 so there should not be any issues with adding this. Thanks!

Contributor

@pchaigno pchaigno left a comment

Shouldn't this be in the Pod entry? I'm worried we'd end up highlighting .pod6 files (e.g., this one) incorrectly otherwise.

Could you also add a sample file?

@Alhadis
Collaborator

Alhadis commented Dec 7, 2016

Yes, this actually should be classed as Pod. So in other words, $_ (what he said).

@samcv Could you please amend? =)

@samcv
Contributor Author

samcv commented Dec 7, 2016

Well Perl 6 pod has different syntax than Perl pod. And the grammar we use for Perl 6 supports Perl 6 pod. I could add a sample file though.

@Alhadis
Collaborator

Alhadis commented Dec 7, 2016

Still looks like Plain Old Documentation to me. Am I missing something here?

I'm only familiar with Perl 5, sorry. Blame Camelia:

> The only requirement is that you know how to be nice to all kinds of people

Yeah fuck that.

@samcv
Contributor Author

samcv commented Dec 7, 2016

Here is a sample of how it currently renders pod:
https://gist.github.com/samcv/6ca5fbeac9482d52145b48f8872be3aa

Maybe I should wait until v1.9 of the grammar gets released. I already have it ready to go, but waiting on the only person who has commit access to commit a ton of changes (a few of which are for pod).
Then I would be confident that everybody's pod would render right. Though, does github not want pod to be highlighted and prefers it to just render as plaintext?

@Alhadis
Collaborator

Alhadis commented Dec 7, 2016

Oooh... right. Indented headings, got it. Alright, well... the correct thing to do would be to actually add a new language entry for it. If it's syntactically distinct from Pod, but still related, we can group it using language_group: Pod (so the two languages share the same usage statistics).

Where's this grammar you speak of?

@samcv
Contributor Author

samcv commented Dec 7, 2016

This is where the grammar is: https://github.com/MadcapJake/language-perl6fe
v1.9 fixes these:

Pod

  • Pod comments now highlight properly when there is leading whitespace.
  • Make sure pod after =para and =for immediately stop that block. Make sure formatting doesn't run on further.
  • Make sure pod after all other abbreviated forms like =head highlight as comments as well.

The documentation on Perl 6 Pod is here: https://design.perl6.org/S26.html
if you are curious.

@Alhadis
Collaborator

Alhadis commented Dec 7, 2016

Well... I see I'm not alone with sending people fruit-baskets of bug fixes. :D Which reminds me, I indirectly promised I'd redeem that awful Perl grammar Atom uses, but I've had my hands full with file-icons. :c

@Alhadis
Collaborator

Alhadis commented Dec 7, 2016

Alright, the results of the .pod6 file search have finished downloading. All-in-all, we're only looking at 52 unique repositories, distributed between 28 unique users... so I'm afraid the usage is too thin to warrant addition as a new language, methinks. I'm a little concerned about how the syntactic differences might skew the classifier, though...

@pchaigno, any input?

Alhadis added a commit to Alhadis/Silos that referenced this pull request Dec 7, 2016
@samcv
Contributor Author

samcv commented Dec 11, 2016

Here is an example of a file written in Perl 6 pod that fails to render because GitHub thinks it is Perl pod:
https://github.com/perl6/specs/blob/master/S15-unicode.pod

There are lots of Perl 6 pod that are not classified properly because of the .pod extension.

I think the easiest way to distinguish the two is:

  • Perl 6 Pod: has =begin pod towards the top and contains =end pod
  • Perl 5 Pod: has =pod and uses =cut to end the pod section

Running perldoc on a file that had =begin pod will show no information at all, since in perl POD that is meant for things like =begin html and isn't used in the documentation.

I have added a change to the heuristics which correctly classifies my added sample and the Perl POD samples. I'm almost totally sure this will not classify any Perl 5 POD incorrectly.
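
The two-bullet rule above could be sketched roughly as follows. This is only an illustration in Python, not the heuristic from the pull request (Linguist itself is written in Ruby); the function name and the regular expressions are assumptions, and the sketch ignores the "towards the top" condition for brevity.

```python
import re

# Hypothetical patterns for illustration; not the regexes used in the pull request.
POD6_BEGIN = re.compile(r"^\s*=begin\s+pod\b", re.MULTILINE)
POD6_END = re.compile(r"^\s*=end\s+pod\b", re.MULTILINE)
POD5_MARKER = re.compile(r"^=(pod|cut)\b", re.MULTILINE)


def guess_pod_flavor(text: str) -> str:
    """Return "Perl 6 Pod" or "Perl 5 Pod" for the contents of a .pod file."""
    # Perl 5 markers win, so ambiguous files fall back to Perl 5 Pod.
    if POD5_MARKER.search(text):
        return "Perl 5 Pod"
    # Require both the opening and the closing Perl 6 block markers.
    if POD6_BEGIN.search(text) and POD6_END.search(text):
        return "Perl 6 Pod"
    # No decisive evidence either way: default to Perl 5 Pod.
    return "Perl 5 Pod"
```

Defaulting to Perl 5 Pod whenever the evidence is inconclusive keeps existing Perl 5 files from being reclassified by accident.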

@samcv
Contributor Author

samcv commented Dec 11, 2016

@Alhadis let me know what a search using these changes comes up with. Thanks!

@Alhadis
Collaborator

Alhadis commented Dec 11, 2016

Erm, what do you mean? Are you talking about how many results are returned for the .pod6 extension?

Well, it'll be no different to last time, indexing of recent .pod6 files notwithstanding.

@pchaigno
Contributor

> There are lots of Perl 6 pod that are not classified properly because of the .pod extension.

Do you mean that Perl 6 pods also use the .pod file extension?

> Well Perl 6 pod has different syntax than Perl pod.

Do you have any Perl 6 pod files where the current Pod highlighting doesn't work correctly?

@samcv
Contributor Author

samcv commented Dec 11, 2016

@pchaigno yes, I linked one yesterday but here it is again:

https://github.com/perl6/specs/blob/master/S15-unicode.pod

It doesn't even render anything at all. Totally failing.

What I said before: “Running perldoc on a file that had =begin pod will show no information at all, since in perl POD that is meant for things like =begin html and isn't used in the documentation.”

So I expect none of them to work at all

@samcv
Contributor Author

samcv commented Dec 11, 2016

And yeah I meant ones that will match with the new changes. Is that possible? I want to make sure the changes to the heuristics don't improperly flag any Perl 5 Pod. But as I said I am almost completely certain they won't, and err on the side of Perl 5 Pod

@pchaigno
Contributor

> @pchaigno yes, I linked one yesterday but here it is again:

@samcv Sorry I missed that.

Files on GitHub are rendered (vs. highlighted) using github/markup and the rendering engines it links to. Markup doesn't seem to support Perl6 pod files yet.

However, it uses Linguist to detect the language of files. Thus, we might be able to make it correctly detect such files as Perl6 pod files. At least, they wouldn't be incorrectly rendered and we could highlight them (display the source code instead of trying to convert it to HTML).

In the meantime, you can use Linguist overrides to disable the rendering (by marking these files as Perl6 for instance).

@Alhadis Would you be able to tell how many users and repositories there are for Perl6 pod files? (I haven't had the time to test your new scraping script yet :/)

@Alhadis
Collaborator

Alhadis commented Dec 11, 2016


@mattparksjr mattparksjr left a comment

noice

@b2gills

b2gills commented Dec 12, 2016

That list shows a lot of perltoc.pod files. (original POD)
Which will only exist in an installation of Perl 5 because it is a generated file.
( That is those repositories have a full install of Perl 5 in them, binaries and all )

All of the ones with SPPM-Web are also the original POD

podtreeparser is in POD6
All of them that have perl6 in the pathname seem to be POD6
/S26-documentation.pod is POD6
All of them in perl6/specs that showed up in the list are POD6

@pchaigno
Contributor

So taking that into account, it's now only 53 repositories among 31 users. Unless we missed many Perl6 pod files, I don't think it will be enough to warrant support.
The best course of action here is to use Linguist overrides to mark these files as Perl6. An alternative would be to add support for highlighting Perl6 pod to Pod::Simple::HTML, which github.com seems to be using.

@b2gills

b2gills commented Dec 13, 2016

Adding support for Pod6 to Pod::Simple::HTML is not going to be worth the effort, as it would be effectively a complete rewrite.
About the best thing it could do is if it finds something Pod6-like it could try to use Inline::Perl6 to defer to a Pod6 parser.

Adding Pod5 support to a Pod6 parser would be much easier.

When Perl 6 broke compatibility to expand its repertoire, it also did so with Pod.
So it has added such things as nested blocks, and configuration options.
That is, Pod6 is mostly a superset of Pod5. (=pod and =cut directives were removed, for example)

It would also be easier because a Pod6 parser is more likely to be written with the Perl 6 grammar system.


One of the main reasons that Pod6 isn't being used that much is missing support in common tools.
For example most of the documentation for the only current Perl 6 implementation Rakudo, and the original design docs (superseded) are written in a common subset of both Pod5 and Pod6.
(Only the current docs are written purely in Pod6)

GitHub is now a common tool
almost everything Perl 6 related is on here
most Perl 5 modules are here

@samcv
Contributor Author

samcv commented Dec 14, 2016

I tested the branch in the PR and it says the Perlito and the perltoc.pod are both Pod, not Perl6, so I'm not sure about that. It would probably be fine to mark it as Perl 6 until there is more usage

@samcv
Contributor Author

samcv commented Dec 17, 2016

@pchaigno what's the current status of this?

@brndnblck

🏴 Flagging this as stale 🏴

@JJ
Contributor

JJ commented Mar 13, 2018

@pchaigno if popularity is the only thing stopping this from being merged (plus, I guess, conflicts now), can you please re-check? There's also github/markup#1173, which this could help with. In fact, pod6 files are not recognized as Perl6 now

JJ added a commit to JJ/linguist that referenced this pull request Mar 28, 2018
To leave way for github-linguist#3366 or succeeding PRs.
@JJ JJ mentioned this pull request Mar 29, 2018
4 tasks
@samcv samcv closed this Apr 6, 2018
@samcv samcv reopened this Apr 6, 2018
@lildude
Member

lildude commented Apr 9, 2018

So we find ourselves in a position of two PRs (this one and #4083 spawned out of an early comment in this PR) essentially implementing the same thing, but in slightly different ways in terms of how it will appear to the end user. As this is the oldest PR, I'm going to use this PR to discuss which approach is best to take.

First, let's address the popularity thing... .pod6, as an extension, is still not sufficiently popular for inclusion. This search returns over 1700 results, which on the face of things suggests it is popular enough, but if you then look at how these are spread out, it's surprising to discover a mere 22 repos hold all of these files.
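
As a rough illustration of how a raw hit count can overstate popularity, the sketch below groups search hits by repository and by user. It is not the script anyone in this thread used; the `owner/repo/path` input format and the sample data are assumptions made up for the example.

```python
from collections import Counter


def repo_distribution(paths):
    """Count matching files per repository and per user.

    Each path is assumed to look like "owner/repo/path/to/file.pod6".
    """
    per_repo = Counter()
    per_user = Counter()
    for path in paths:
        # Split off the owner and repository; the rest is the file path.
        owner, repo, _rest = path.split("/", 2)
        per_repo[f"{owner}/{repo}"] += 1
        per_user[owner] += 1
    return per_repo, per_user


# Made-up sample data, only to show the shape of the result.
hits = [
    "alice/docs/language/intro.pod6",
    "alice/docs/language/types.pod6",
    "alice/docs/type/Str.pod6",
    "bob/module/README.pod6",
]
per_repo, per_user = repo_distribution(hits)
print(len(hits), "files in", len(per_repo), "repos by", len(per_user), "users")
```

A handful of documentation-heavy repositories can account for most of the hits, which is why the number of distinct repositories and users matters more than the number of files.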

A similar search using .pod returns waaaay more results and these are definitely spread across hundreds of repos clearly indicating they're Perl 6 related, so there is definitely a need for classification, but not for the .pod6 extension.

So whichever option we choose, the .pod6 extension will need to be removed from that PR - it can always be added later when popularity grows, if it does.

Now we come to how are we to present this to users?

I'm far from a Perl expert, and certainly don't know the differences between Perl 5 and Perl 6 and Pod (5) and Pod 6, but from what I've read between the two PRs, Pod 6 is quite different from Pod (5) and is considered by the Perl 6 interpreter itself as Perl 6 and not something else like Perl 5/Pod (5).

If my understanding is correct, then I think we should go the route of classifying these files as Perl 6 (ie use this PR) rather than "Pod 6" (ie #4083)...

... buuuuuuut, and this is an important but ...

Is this what the Perl community as a whole expect, document or even want?

Some sort of written mandate/preference/guideline would definitely help in this matter too.

@lildude
Member

lildude commented Apr 9, 2018

> A similar search using .pod returns waaaay more results and these are definitely spread across hundreds of repos clearly indicating they're Perl 6 related, so there is definitely a need for classification, but not for the .pod6 extension.

Update: I'll need to double-check this ☝️. I've just run my local script and I've come up with only 32 repos. @Alhadis does your harvester return different numbers for the same searches?

The discussion of how we present this to users still stands.

@JJ
Contributor

JJ commented Apr 9, 2018

I'm not the whole community (obviously), but yes, I think pod 6 is perl 6. It is interpreted by the language itself, and it's available as a data structure ($=pod) for the program; if you want to interpret an external pod 6 file, you run it through the perl 6 interpreter and access that particular variable, which is not simply the text, but the actual structure composed of Pod::Blocks. In fact, #4066 (my very own PR) took that particular approach, although now references to Perl 6 have been removed in favor of accepting this one.

@Alhadis
Collaborator

Alhadis commented Apr 9, 2018

> @Alhadis does your harvester return different numbers for the same searches?

😂 I dig that I spent hours polishing formatting, wording, and structure so Harvester's documentation was as clear and concise as possible... and the next user who sees it goes "Alhadis, can you run this for me?" 😢 Really breaks my heart as a Linguist contributor... I don't think this relationship is working out. @pchaigno, our marriage is off.

Kidding, obvs. You'll need to wait for this thing to finish. Speed is throttled because loading anything too quickly causes your server to bark at me and tell me I'm abusing the site. All that hate when I'm just trying to help you guys. 💔

@JJ
Contributor

JJ commented Apr 9, 2018 via email

@lildude
Member

lildude commented Apr 9, 2018

> 😂 I dig that I spent hours polishing formatting, wording, and structure so Harvester's documentation was as clear and concise as possible... and the next user who sees it goes "Alhadis, can you run this for me?" 😢 Really breaks my heart as a Linguist contributor... I don't think this relationship is working out. @pchaigno, our marriage is off.

I'm 💔 that you mixed me up with your other bit-on-the-side 😆 🤣.

I was just being lazy as I've seen but not looked at your repo and code. I do really appreciate you taking the time to make it public and document it. I've got a bit of time now so will take a peek. 🙇

@Alhadis
Collaborator

Alhadis commented Apr 9, 2018

Ugh, sorry that took so long. 😞 Results for extension:pod begin pod are in. @lildude, I haven't deduped any of the files as I normally would. Will leave it to you to decide how to sort them.

Kicking off a search for extension:pod6 begin pod right now...

@Alhadis
Collaborator

Alhadis commented Apr 9, 2018

@lildude Alright, the .pod6 files are uploaded here. Let me know if there's anything else you need. =)

@nige123

nige123 commented Apr 9, 2018

Personally I think locking in anything with "6" in the name is not a good move in the long term. I would suggest avoiding "pod6" and using something different - IMHO something like ".dod", ".dop" or ".zod" - would be better.

@b2gills

b2gills commented Apr 9, 2018

To explain the situation

Let's say we have a file that is made up of both Perl5 and POD. When it has an extension of .pod or .POD it should be seen as POD, and otherwise be seen as Perl5.

Similarly a Perl6/POD6 file should be seen as POD6 if it has an extension of .pod .pod6 .POD .POD6, and otherwise as Perl6.
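
The two extension rules above amount to a small lookup. Here is a minimal sketch in Python, assuming the Perl version of the file is already known (which is the hard part, and not something this snippet solves); the function and label names are hypothetical.

```python
import os

POD_EXTENSIONS = {".pod", ".pod6"}  # compared case-insensitively


def classify(filename: str, is_perl6: bool) -> str:
    """Pick a label from the file extension and the (already known) Perl version."""
    ext = os.path.splitext(filename)[1].lower()
    if is_perl6:
        # Perl 6 content counts as documentation under .pod or .pod6.
        return "POD6" if ext in POD_EXTENSIONS else "Perl6"
    # A Perl 5 file is documentation only when named .pod (or .POD).
    return "POD" if ext == ".pod" else "Perl5"
```

For instance, `classify("S26-documentation.pod", is_perl6=True)` gives `"POD6"`, while the same content in a `.pm6` file would give `"Perl6"`.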

Ideally it would be nice to view a Perl5/Perl6 file also as POD/POD6. I can see why GitHub wouldn't support that, as most programming languages don't have a full fledged documentation format embedded in them. (POD has been used for writing a book)


Ignoring the other extensions, it would be nice if there was some way to always distinguish between POD and POD6 as there can be large semantic differences.

If there was something that is always in POD, and never in POD6, or vice versa, it could be used to determine which to parse it as. (No such luck)

The problem is 99% of POD syntax is valid POD6, and many files could be successfully parsed as either but again with major semantic differences.

A way to fairly reliably guess is to assume neither version and build up a heuristic by looking at which features are used. There are some features in POD that aren't in POD6, and many in POD6 that aren't in POD. If the file starts with =pod or has =cut it is definitely POD, if it starts with =begin pod it is probably POD6. If there is Perl code outside (or inside) of the POD it could also be added to the heuristic. If you see something like =for table  :caption<Table of Contents> it is POD6. If you see an indented =foobar it is probably POD6.
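
One way to picture the feature-based heuristic described above is a small scoring table. The sketch below is only an assumption about how such a scorer might look: the patterns approximate the features named in the paragraph, and the weights are invented for illustration.

```python
import re

# (pattern, POD score, POD6 score); the weights are invented for illustration.
FEATURES = [
    (re.compile(r"\A\s*=pod\b"), 10, 0),                # file starts with =pod
    (re.compile(r"^=cut\b", re.MULTILINE), 10, 0),      # file contains =cut
    (re.compile(r"\A\s*=begin\s+pod\b"), 0, 3),         # file starts with =begin pod
    (re.compile(r"^=for[ \t]+\w+[ \t]+:\w+", re.MULTILINE), 0, 10),  # =for table :caption<...>
    (re.compile(r"^[ \t]+=\w+", re.MULTILINE), 0, 2),   # indented directive
]


def score_pod(text: str):
    """Return (pod_score, pod6_score) by summing the weights of matching features."""
    pod = pod6 = 0
    for pattern, pod_weight, pod6_weight in FEATURES:
        if pattern.search(text):
            pod += pod_weight
            pod6 += pod6_weight
    return pod, pod6


def guess(text: str) -> str:
    """Prefer POD unless the POD6 evidence is strictly stronger."""
    pod, pod6 = score_pod(text)
    return "POD6" if pod6 > pod else "POD"
```

Ties, including files with no distinguishing features at all, fall back to POD here, since the absence of =pod and =cut says nothing about which flavor a file is.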

A less reliable way is to assume POD and switch to POD6 if there is a problem parsing it. (Note that a POD6 file starting with =begin pod may successfully result in an empty document if POD is used.)


This is the reason it was requested that .pod6 be added. There would be no confusion with a file that had it as an extension.

@Grinnz

Grinnz commented Apr 9, 2018

As Perl and POD are currently classified separately, I think it would make sense for Perl6 and POD6 to be classified separately (especially if that is how rendering will be determined). But I have no stake in this other than to make sure that Perl POD files are still correctly classified and rendered. To that end, I just want to make clear that =pod and =cut are not required to be present in a .pod file for it to be valid Perl 5 POD (or even in pod embedded in a Perl file) -- any directive starting with = will activate the POD parser. I seem to recall that github may not render .pod files without a =pod present so that might be a different issue.

@JJ
Contributor

JJ commented Apr 10, 2018 via email

@Grinnz

Grinnz commented Apr 10, 2018

Yes, but the question is, if it's recognized as a Perl6 file and not a separate type, will github render it to HTML as POD6, while still leaving regular Perl6 files just source highlighted, even if they contain POD6? I assume this is desired behavior.

@JJ
Contributor

JJ commented Apr 10, 2018 via email

@samcv
Contributor Author

samcv commented Apr 10, 2018

Well the totally ideal situation would probably be rendering Pod6 files like how we do Pod. And only rendering it as html if it is named .pod or .pod6. I think if the end goal is to eventually render Pod6 as html, it should probably be its own filetype. Even though Pod6 is valid Perl6, if it is named as .pod or .pod6 the creator is designating that this is documentation and not code. So even though the Perl 6 interpreter is used to render Pod6, that doesn't mean .pod and .pod6 shouldn't be considered documentation.

@lildude
Member

lildude commented Apr 11, 2018

Much appreciated @Alhadis. Whilst your figures don't match mine, they are very similar and confirm .pod6 as an extension is definitely not popular enough yet.

@samcv you make a very good point and I think you've got to the crux of the matter: this is really about code vs prose. Each is treated differently by Linguist. Code counts towards the repo language stats, prose does not. Accordingly, Perl repos with .pod files will not show Pod in the language stats, and a repo containing only Pod files will not show Perl or Pod in the language stats.

I think we should do the same for Pod 6 thus making #4083 the preferred PR of the two.
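
The code-versus-prose distinction described above can be pictured with a simplified sketch. This is not Linguist's implementation (which is Ruby and has more language types); the type table, the function name, and treating "Pod 6" as prose are assumptions that mirror the proposal in this thread.

```python
from collections import defaultdict

# Assumed type labels; "Pod 6" as prose reflects the proposal in this thread.
LANGUAGE_TYPES = {
    "Perl": "programming",
    "Perl 6": "programming",
    "Pod": "prose",
    "Pod 6": "prose",
}


def language_stats(files):
    """Sum bytes per language, skipping anything that is not code.

    `files` is an iterable of (language, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for language, size in files:
        # Only languages typed as code contribute to the statistics.
        if LANGUAGE_TYPES.get(language) == "programming":
            totals[language] += size
    return dict(totals)
```

With this model, a repository of Perl and Pod files reports only Perl, and a repository containing nothing but Pod files reports no languages at all.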

lildude pushed a commit that referenced this pull request Apr 11, 2018
* Mainly fixing problems with Perl heuristics

And also adding a little bit of text to the README file to help with local use and test.

* Adds new sample

* Adds a couple of samples more, not represented before

* Moves installation intructions to CONTRIBUTING.md

Refs #2309 and also changes github.com to an uniform capitalization.

* Correcting error. Great job, CI

* Moving another file

* Adds samples and new checks for perl/perl6

* Stupid mistake

* Changing regex for perl5 vs perl6

Initial suggestion by @pchaigno, slightly changed to eliminate false positives such as "classes" or "modules" at the beginning of a line in the =pod

BTW, it would be interesting to just eliminate these areas for language detection.

* Eliminates Rexfile from Perl6

And adds .pod6

* Followup to #2709

I just found I had this sitting here, so I might as well follow
instructions to fix it.

* Adds example for pod6

* Eliminates .pod because it's its own language

* Removes bad directory

* Reverting changes that were already there

* Restored CONTRIBUTING.md from head

I see installation of cmake is advised in README.md

* Eliminates `.pod6`

To leave way for #3366 or succeeding PRs.

* Removed by request, since we're no longer adding this extension

* Sorting by alphabetical order filenames

* Moved from sample to test fixtures
@pchaigno
Contributor

> I don't think this relationship is working out. @pchaigno, our marriage is off.

I'm keeping the house, you take the boat.

@lildude
Member

lildude commented May 2, 2018

Closing as we've gone for the approach taken in #4083 which has just been merged. Thanks everyone. 🙇

@lildude lildude closed this May 2, 2018
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024