Option to just remove the layer #15

Open
dufferzafar opened this issue Apr 4, 2020 · 9 comments

@dufferzafar

I have a watermark layer that I just want removed from the file as it serves no purpose.

@SimonSegerblomRex
Owner

Thanks for the feedback! I agree that listing the layer(s) you want to remove probably makes more sense for most PDF files. Feel free to make a pull request, or I will try to update the script when I have time.

Were you still able to use the script by listing all the layers you wanted to keep?

@dufferzafar
Author

Sorry for any confusion but I didn't mean that. 😅

I've gone through the code and the script hides the Optional Content Groups by setting their base state to '/OFF'.

What I want to achieve is to remove that OCG entirely. The resulting PDF should not have any OCGs at all. It should just have the "normal" content.

I understand that this script doesn't have this functionality yet. I would be very happy to create a PR for it, but I couldn't figure out how to remove the OCG. I'm new to pikepdf.

I thought that emptying out some objects would work, but it actually doesn't.

pdf.root.OCProperties.OCGs = pikepdf.Array()
pdf.root.OCProperties = pikepdf.Dictionary()

@SimonSegerblomRex
Owner

Ah, I see! That would be nice.

Before writing this script I tried using poppler with

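PopplerLayersIter *layer_iter = poppler_layers_iter_new(document);
if (layer_iter != NULL) {  /* NULL when the document has no layers */
    do {
        PopplerLayer *layer = poppler_layers_iter_get_layer(layer_iter);
        if (layer == NULL)
            continue;  /* node that only carries a title, no layer */
        /* g_strcmp0() returns 0 when the two strings are equal */
        if (g_strcmp0(poppler_layer_get_title(layer), "NameOfOCGToHide") == 0) {
            poppler_layer_hide(layer);
        } else {
            poppler_layer_show(layer);
        }
        g_object_unref(layer);
    } while (poppler_layers_iter_next(layer_iter));
    poppler_layers_iter_free(layer_iter);
}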

I could render a PDF from the visible layers without any OCGs, but then I lost all the metadata and text information...

@dufferzafar
Author

Hmm, so I've been trying random things with pikepdf. So far, nothing has worked. I couldn't find any other Linux tool that can remove OCGs either. Do let me know if you're aware of any. Thanks!

@Llewellynvdm

Please check my PR #16

@SimonSegerblomRex
Owner

SimonSegerblomRex commented Nov 3, 2021

Thanks for the pull request @Llewellynvdm! I had a quick look at it and tried running it on one of my PDF files. It doesn't seem like it actually removes the layers from the file. Can you explain how it improves the current behavior?

@Llewellynvdm

Okay wow, not sure how that works on your end. No worries, it does work for me, and does in fact fix the issue of not finding the layers.

But if you think it's a waste of time... all good. I work with thousands of PDFs every day, and I hide and now remove their layers. Yes, I combined the scripts in the following way:

function getPDF() {
  # Get file
  cp "${1}" "${2}___tmp.pdf" || {
    # cli show the error
    echo "ERROR: We could not copy ${1} to ${2}___tmp.pdf"
    # show the error
    zenity --error --width=700 --height=100 --text="ERROR: We could not copy ${1} to ${2}___tmp.pdf" 2>/dev/null
    exit 1
  }
  # Remove Answer Key and Hide other layers
  pdflayers "${2}___tmp.pdf" "${2}__tmp.pdf" --remove "Answer Key" 2>/dev/null || {
    # cli show the error
    echo "ERROR: We could not remove the Answer Key Layers from ${2}___tmp.pdf"
    # show the error
    zenity --error --width=700 --height=100 --text="ERROR: We could not remove the Answer Key Layers from ${2}___tmp.pdf" 2>/dev/null
    exit 1
  }
  # flatten the PDF so the layers do not return
  gs -q -dSAFER -dBATCH -dNOPAUSE -dNOCACHE \
    -sDEVICE=pdfwrite -dPreserveAnnots=false \
    -dAutoRotatePages=/None \
    -sOutputFile="${2}_tmp.pdf" "${2}__tmp.pdf"  || {
    # cli show the error
    echo "ERROR: We could not flatten ${2}__tmp.pdf"
    # show the error
    zenity --error --width=700 --height=100 --text="ERROR: We could not flatten ${2}__tmp.pdf" 2>/dev/null
    exit 1
  }
  # remove only after
  rm -f "${2}___tmp.pdf"
  rm -f "${2}__tmp.pdf"
}

So in my use case it works, and I intend to keep using it this way, but if it does not really help in the greater scheme of things... we can just drop the PR 👍
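
For readers who would rather drive the same two steps from Python, here is a minimal sketch of the pipeline in the shell function above. It is an illustration, not part of the PR: the function names `build_commands` and `remove_and_flatten` are made up for this example, the `--remove` option is taken from the script above and assumed to behave as described there, and the initial `cp` step is left out because the source file is only read. The Ghostscript flags are copied unchanged from the shell version.

```python
import subprocess
import tempfile
from pathlib import Path


def build_commands(source, stripped, output):
    """Return the pdflayers and Ghostscript command lines used above."""
    # --remove is the option used in the shell function (assumed from PR #16).
    remove_cmd = ["pdflayers", str(source), str(stripped), "--remove", "Answer Key"]
    # Rewriting the file with pdfwrite is the "flatten" step of the shell version.
    flatten_cmd = [
        "gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dNOCACHE",
        "-sDEVICE=pdfwrite", "-dPreserveAnnots=false",
        "-dAutoRotatePages=/None",
        f"-sOutputFile={output}", str(stripped),
    ]
    return remove_cmd, flatten_cmd


def remove_and_flatten(source, output):
    """Run both steps; the intermediate file lives in a temporary directory."""
    # The temporary directory is deleted even when one of the steps fails.
    with tempfile.TemporaryDirectory() as tmp:
        stripped = Path(tmp) / "stripped.pdf"
        for cmd in build_commands(source, stripped, output):
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the execution makes it possible to inspect the exact arguments without having `pdflayers` or `gs` installed.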

@SimonSegerblomRex
Owner

Thanks for providing the additional information! It sounds like you solved the issue in the way I initially interpreted the title of this issue (which wasn't what @dufferzafar was looking for). I'll have another look at the pull request next week.

@mara004
Contributor

mara004 commented Nov 27, 2021

@dufferzafar As mentioned in the qpdf thread, there is PDFStitcher, which is capable of actually removing OCG content from the file. (Disclaimer: I have contributed to PDFStitcher a bit.)

I think the problem here is that some programs (such as Inkscape) don't take the on/off/hidden configuration into account and just display all available content. For these programs, it is necessary to remove the actual layer data, not just the handle.
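
To make "removing the actual layer data" concrete, here is a minimal sketch of the idea in plain Python. It is not how PDFStitcher or this script implements it. It assumes the page content stream has already been parsed into `(operands, operator)` pairs of plain strings (a simplified stand-in for what a PDF library would return), and that layer content is wrapped in `/OC /Name BDC ... EMC` marked-content sections. The function name `strip_optional_content` and the property name `/MC0` are hypothetical.

```python
def strip_optional_content(instructions, hidden_names):
    """Drop /OC marked-content sections whose property name is in hidden_names.

    `instructions` is a list of (operands, operator) pairs, for example
    (["/OC", "/MC0"], "BDC"). Everything between a matching BDC and its EMC
    is removed, including nested marked-content sections.
    """
    kept = []
    stack = []  # one flag per open marked-content section: True means hidden
    for operands, operator in instructions:
        if operator in ("BDC", "BMC"):
            hidden = (
                operator == "BDC"
                and len(operands) == 2
                and operands[0] == "/OC"
                and operands[1] in hidden_names
            )
            # A section nested inside a hidden section is hidden as well.
            stack.append(hidden or (bool(stack) and stack[-1]))
            if not stack[-1]:
                kept.append((operands, operator))
        elif operator == "EMC":
            was_hidden = stack.pop() if stack else False
            if not was_hidden:
                kept.append((operands, operator))
        elif not (stack and stack[-1]):
            kept.append((operands, operator))
    return kept


stream = [
    (["/OC", "/MC0"], "BDC"),   # start of the watermark layer
    (["(watermark)"], "Tj"),
    ([], "EMC"),                # end of the watermark layer
    (["(body)"], "Tj"),
]
print(strip_optional_content(stream, {"/MC0"}))
```

A real implementation has more to handle: the property name has to be resolved to an OCG through the page's `/Properties` resources, optional content can also be attached to XObjects and annotations through an `/OC` entry, and dropping operators wholesale can discard graphics-state changes that later content relies on.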
