Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gatsby-transformer-remark not transforming excerpts #4459

Closed
vojtechruz opened this issue Mar 9, 2018 · 28 comments
Closed

gatsby-transformer-remark not transforming excerpts #4459

vojtechruz opened this issue Mar 9, 2018 · 28 comments

Comments

@vojtechruz
Copy link
Contributor

vojtechruz commented Mar 9, 2018

I am using gatsby-transformer-remark with the following configuraton:

{
      resolve: `gatsby-transformer-remark`,
      options: {
        excerpt_separator: `<!--more-->`,
        plugins: [
          {
            resolve: `gatsby-remark-images`,
            options: {
              maxWidth: 590,
            },
          },
          {
            resolve: `gatsby-remark-responsive-iframe`,
            options: {
              wrapperStyle: `margin-bottom: 1.0725rem`,
            },
          },
          'gatsby-remark-prismjs',
          'gatsby-remark-copy-linked-files',
        ],
      },
    },

According to this post https://using-remark.gatsbyjs.org/excerpts/ I am querying excerpts from my articles like this:


export const pageQuery = graphql`
  query IndexQuery {
    site {
      siteMetadata {
        title
      }
    }
    allMarkdownRemark(sort: { fields: [frontmatter___date], order: DESC }) {
      edges {
        node {
          excerpt
          fields {
            slug
          }
          frontmatter {
            date(formatString: "DD MMMM, YYYY")
            title
            tags
            featuredImage {
                childImageSharp{
                    sizes(maxWidth: 200) {
                        ...GatsbyImageSharpSizes_tracedSVG
                    }
                }
            }
          }
        }
      }
    }
  }
`

My article in markdown uses a separater defined in config above:


---
title: Break the Java Generics Naming Convention?
date: "2016-04-30T22:12:03.284Z"
tags: ['Java']
path: '/break-java-generics-naming-convention'
featuredImage: './break-java-generics-maning-convention.jpg'
---

Choosing **descriptive**, *intent-revealing* names is one of the key principles of writing clean code, which is easy to read and understand and usually requires little to no comments. According to the official convention, generic type parameter names are supposed to be just one capital letter. Should you break this convention to make your names more descriptive?
<!--more-->

However, the excerpt I am getting is plain text and not transformed from markdown:

clipboard02


Is this intended behavior? How can I get excerpt which is already formatted? Thank you.


"gatsby": "^1.9.223",
"gatsby-transformer-remark": "^1.7.34",

Node: v9.5.0
NPM: 5.6.0

@vojtechruz
Copy link
Contributor Author

This only happens when using:
excerpt_separator

Otherwise, excerpt contains plain text with stripped markdown formatting.

@vojtechruz
Copy link
Contributor Author

vojtechruz commented Mar 12, 2018

It is not so bad that you lose formatting such as bold or italics, but what's not nice is that escape slashes are preserved:
Eg:
How to make the most of the IntelliJ IDEA\'s view...

@fk fk added the API/Plugins label Mar 14, 2018
@m-allanson
Copy link
Contributor

It looks like the gatsby-transformer-remark's excerpt handling is here. If you or anyone else is interested in doing some investigation, a PR would be great.

@docwhat
Copy link
Contributor

docwhat commented Mar 17, 2018

I've been digging around and I don't see anything obvious.

It took me a while, but I figured out that the excerpt_separator is actually defined and used by grey-matter.

I can't figure out where the transformation happens and what is passed to grey-matter.

Can someone point me in the right direction? This is really annoying.

docwhat added a commit to docwhat/website that referenced this issue Mar 17, 2018
It's broken due to bug gatsbyjs/gatsby#4459 — excerpts using the
separator are not transformed from Markdown to HTML.
@omeid
Copy link

omeid commented Apr 22, 2018

Although not directly related to this issue, the markdown plugin doesn't parse the title or any part of the frontmatter either.

@Undistraction
Copy link
Contributor

Undistraction commented Apr 29, 2018

The docs here say

you can specify an excerpt_separator, as well as any of the other options mentioned here, in the gatsby-config.js file.

However neither excerpt_separator or excerpt, docs, which should allow you to pass a function for generating the excerpt result it anything other than the default 140 char excerpt.

"gatsby": "^1.9.131",
"gatsby-transformer-remark": "^1.7.15"

@arlobelshee
Copy link

The issue is at extend-node-type.js, line 128. It passes node.internal.content through the remark processor at that point, getting an AST, which it then does further stuff with and eventually transforms to Html, HtmlAst, or other forms (eg, in getHtml() on line 276).

The problem is that it doesn't do anything with node.excerpt. So that just passes through the whole process. And, at line 339, it then resolves the promise with the unprocessed excerpt.

To fix this, we'd have to either add the excerpt to the cache (as its own entry, or by storing a pair of ASTs), or just assume that the excerpt will only really be parsed once and do it directly. I assume we still want to keep the async processing and all the plugins, so it seems to make sense to me to change the cached value to be a pair of ASTs, with two separate promise chains to generate them.

Extracting a function that applies all the plugins, then does the remark processing, then applies all the plugins again, would make it easier to re-use the entire functionality to do both parts - especially as this is all async.

I don't know this code at all, so would prefer to leave changing it to someone who knows better. Hope this helps unblock the bug - I really want it fixed!

@arlobelshee
Copy link

I just submitted PR #5586 with a possible fix for this. It implements my above-mentioned plan. However, I am not a committer, so am not sure if it works.

@arlobelshee
Copy link

I could use some help from someone who knows the code better. It seems like my change has somehow altered the type of node.excerpt (even though I didn't change its declared type from string). I have no idea where and how the schema expectations are set. I also don't know whether it is a good thing for the excerpt to now have HTML in it, or to create a new node.excerptFormatted which does.

Please help!

@strubell
Copy link

strubell commented Jul 8, 2018

This is still a problem. Any chance of this PR getting merged, or can anyone share a workaround?

@MrSaints
Copy link

MrSaints commented Jul 8, 2018

@strubell, here's a workaround I've been using:

node.html.split("<!--more-->")[0]

🙈

@karl-run
Copy link

That works for simple things. But if you have things like anchor links in your post the URL needs to be the slug + the link of the markdown. I guess one could massage the links in the freshly split HTML but that seems nasty.

@resir014
Copy link
Contributor

resir014 commented Oct 6, 2018

Hey, any updates on this? I just stumbled upon this issue as well.

DSchau pushed a commit that referenced this issue Nov 27, 2018
This is the beginnings of a fix for #4459

Right now, the fix works when excerpt_separator is provided. I have not yet implemented a fix for when the excerpt is generated using the `pruneLength` setting, though I have thoughts on how to solve it. My plan to is do a pre-order traversal over the html AST, accumulating nodes until the pruneLength is reached.

I would love some preliminary feedback to get a sense of how the contributors feel about this PR so far. Here are some of my initial thoughts:

1. The work in the `excerpt` resolve function is a little hacky. We are directly invoking a variety of functions get our HTML. I'm curious if anyone has a more streamlined approach for this.

2. We are invoking `getMarkdownAST` (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice.

3. I tore apart the `getAST` function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability.

I'd love to hear people's thoughts.
@gatsbot
Copy link

gatsbot bot commented Jan 20, 2019

Old issues will be closed after 30 days of inactivity. This issue has been quiet for 20 days and is being marked as stale. Reply here or add the label "not stale" to keep this issue open!

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Jan 20, 2019
@vojtechruz
Copy link
Contributor Author

vojtechruz commented Jan 21, 2019

not stale

gpetrioli pushed a commit to gpetrioli/gatsby that referenced this issue Jan 22, 2019
This is the beginnings of a fix for gatsbyjs#4459

Right now, the fix works when excerpt_separator is provided. I have not yet implemented a fix for when the excerpt is generated using the `pruneLength` setting, though I have thoughts on how to solve it. My plan to is do a pre-order traversal over the html AST, accumulating nodes until the pruneLength is reached.

I would love some preliminary feedback to get a sense of how the contributors feel about this PR so far. Here are some of my initial thoughts:

1. The work in the `excerpt` resolve function is a little hacky. We are directly invoking a variety of functions get our HTML. I'm curious if anyone has a more streamlined approach for this.

2. We are invoking `getMarkdownAST` (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice.

3. I tore apart the `getAST` function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability.

I'd love to hear people's thoughts.
@viktorbengtsson
Copy link

+1 Not stale.

@resir014 resir014 added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Jan 23, 2019
@lednhatkhanh
Copy link

Hello, any updates on this.

@DSchau
Copy link
Contributor

DSchau commented Mar 4, 2019

Yeah!

So - we added an HTML addition to the excerpt functionality relatively recently. Using this HTML excerpt functionality will transform the excerpt from plain text to HTML, if there's HTML to be transformed. For example, in the above example, the excerpt would translate italics/bold/etc. into their corresponding HTML tags.

It can be used like so:

{
  # grab your markdownRemark node however you prefer
  markdownRemark(id: "123412341234") {
    excerpt(format: HTML)
  }
}

This still works with the excerpt_seperator functionality.

Here's an example repo demonstrating this functionality

Here's the Markdown file showing the separator

Going to close this as answered, but please feel free to reply if we can help further. Thanks all!

@DSchau DSchau closed this as completed Mar 4, 2019
@StormPooper
Copy link

@DSchau any suggestions for if we just want the text of the excerpt as per the default excerpt behaviour, but with a defined separator for some posts and relying on the prune value for the rest?

@markvital
Copy link

This still works with the excerpt_seperator functionality.

@DSchau this still doesn't work with format:PLAIN with custom separator.

I cloned [your example](Here's an example repo demonstrating this functionality) and changed format from HTML to plain in BlogIndex:

          excerpt(format: PLAIN)
          fields {
            slug
          }

Textual excerpts don't show properly:
Screen Shot 2019-04-02 at 6 50 53 PM

Am I doing something wrong?

@DSchau
Copy link
Contributor

DSchau commented Apr 2, 2019

@markvital what would you expect to happen there?

That is plain text. Would you expect Markdown specific stuff to be stripped?

@markvital
Copy link

markvital commented Apr 2, 2019

Oh I see.
@DSchau Yes, I expected all tags to be removed, so I can insert clean text into meta tags, as described in docs.
This is how it worked for me, until I set "excerpt_separator": `<!--more--> in gatsby-config.js.

@corvidism
Copy link

@DSchau awesome, I just stumbled on this problem and I'm glad I don't have to work around it! One problem though: if I set my excerpts to (format: HTML) and use a custom excerpt_separator, when the file doesn't contain the separator it creates the excerpt from the last paragraph of the text. It used the first 140 characters before I changed the format.

@docwhat
Copy link
Contributor

docwhat commented May 12, 2019

One problem though: if I set my excerpts to (format: HTML) and use a custom excerpt_separator, when the file doesn't contain the separator it creates the excerpt from the last paragraph of the text. It used the first 140 characters before I changed the format.

That’s issue #12386

Sent with GitHawk

@riywo
Copy link
Contributor

riywo commented Jun 3, 2019

I don't think the original issue is solved by excerpt(format: HTML) since HTML tags are not suitable for SEO metadata. We probably need fix for PLAIN's behaviour.

Like @markvital mentioned, if I don't specify excerpt_separator, the markdown annotations like italic/bold/link are processed to HTML and excerpt only contains visible text strings (like innerHTML of a tag). However, if I specify excerpt_separator, the original markdown content before the separator is returned as excerpt without any markdown and HTML processing (the raw markdown text).

Both behaviours should be consistent, preferably the former, I think.

@riywo
Copy link
Contributor

riywo commented Jun 4, 2019

Since this issue has been closed, do we need another issue?

FYI: This is an example unit test to see how it works differently with excerpt_separator riywo@4178bbd

 FAIL  src/__tests__/extend-node.js (6.786s)
  ● Excerpt is generated correctly from schema › excerpt does have missing words and extra spaces with excerpt_separator

    expect(received).toMatch(expected)

    Expected value to match:
      "Where oh where is that pony?"
    Received:
      "
    Where oh [*where*](nick.com) **_is_** ![that pony](pony.png)?

    "

      419 |       `,
      420 |     node => {
    > 421 |       expect(node.excerpt).toMatch(`Where oh where is that pony?`)
          |                            ^
      422 |     },
      423 |     { pluginOptions: { excerpt_separator: `<!-- end -->` } }
      424 |   )

      at toMatch (src/__tests__/extend-node.js:421:28)
      at test (src/__tests__/extend-node.js:91:11)

@docwhat
Copy link
Contributor

docwhat commented Jun 4, 2019

Since this issue has been closed, do we need another issue? ...

@riywo I think so. Either a new format TEXT or fix PLAIN and add MARKDOWN

Sent with GitHawk

@riywo
Copy link
Contributor

riywo commented Jun 12, 2019

FYI: I have created a PR to fix this behavior #14723

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests