Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] The invoice parser should support attachement that is not in the standard list #240

Open
2 tasks done
alimranahmed opened this issue Jan 24, 2025 · 7 comments
Open
2 tasks done
Assignees
Labels
enhancement New feature or request

Comments

@alimranahmed
Copy link

alimranahmed commented Jan 24, 2025

Before Submitting an Issue

Thank you for contributing to our project! 🎉 Before opening a new issue, please review our Contributing Guidelines.

To help us assist you efficiently, we require confirmation of the following:

  • I have read the documentation and could not find a solution to my issue.
  • I have searched open and closed issues to ensure my issue has not already been reported.

Issues that do not meet these requirements may be closed without further action. Thank you for your understanding and support! 🙏

Describe the feature

In ZugferdDocumentPdfReaderExt class, we have the latest list of allowed attached file defined in constant:

public const ATTACHMENT_FILENAMES = [
        'ZUGFeRD-invoice.xml'/*1.0*/,
        'zugferd-invoice.xml'/*2.0*/,
        'factur-x.xml'/*2.1*/,
        'xrechnung.xml'
    ];

But in some cases, a provider of invoice may not use any of this name. In reality, this is the case of many invoice provider. Some provider simply use the name like {invoice_number}.xml.

So, the library should provide the flexibility to bypass this list or provide their custom list. This is not possible at this moment.

Is it possible by extending the current implementation?
As this is a constant defined and not possible to by pass the list, user may want to extend this whole class ZugferdDocumentPdfReaderExt and change only the method collectAttachmentsFromPdfContent to bypass the alowed attachment list. But in this method, an array property called $attachmentContentList used to store the list of invoice. In the extended class we cannot access this $attachmentContentList as it's declared as private.

For example:

class ExtendedZugferdDocumentPdfReaderExt extends ZugferdDocumentPdfReaderExt
{
    protected function collectAttachmentsFromPdfContent(string $pdfContent): ZugferdDocumentPdfReaderExt
    {
        $this->attachmentContentList = []; // This is not possible because it's defined private in parent class.

        ....
        ....

        return $this;
    }

Or if we try to override resolveInvoiceDocumentContent(), this is also not possible because this method used attachmentContentList which again private property.

Expected behavior
There can be two solutions:

  1. A way for the library user to skip this allowed list and also provide their own list. This is the best solution.
  2. At lease making the ZugferdDocumentPdfReaderExt extendable so that using standard OOP someone can override the functionalities based on their need.

Screenshots
Not applicable

Sample Code
Already mentioned in previous section.

Additional context
In German legal firms, many don't use the standard attachement that is listed in the library.

@alimranahmed alimranahmed added the enhancement New feature or request label Jan 24, 2025
@horstoeko
Copy link
Owner

Hi @alimranahmed,

Thanks for your contribution. This is indeed an interesting feature request. In fact, I haven't come across anything comparable so far. I wonder if the invoice sender is aware that he is violating the specification :-)

Perhaps it would be possible to read the attachments using the following ZugferdDocumentPdfReaderExt methods and pass them manually to the ZugferdDocumentReader:

  • getAdditionalDocumentContentsFromFile
  • getAdditionalDocumentContentsFromContent

You should at least consider whether this would be an option.

@alimranahmed
Copy link
Author

alimranahmed commented Jan 24, 2025

Wow, thanks for your quick response. Highly appreciated.

Well yes it's no impossible, I have already overriden ZugferdDocumentPdfReader to use my extended ZugferdDocumentPdfReaderExt

class ExtendedZugferdPdfReader extends ZugferdDocumentPdfReader
{
    public static function getXmlFromContent(string $pdfContent): string
    {
        return ExtendedZugferdDocumentPdfReaderExt::getInvoiceDocumentContentFromContent($pdfContent);
    }
}

Now let's say instead of using my ExtendedZugferdDocumentPdfReaderExt, I can use original ZugferdDocumentPdfReaderExt ::getAdditionalDocumentContentsFromContent(), that gives me the additional documents. But there are still two problems:

  1. The return type getAdditionalDocumentContentsFromContent is array but the return type of getInvoiceDocumentContentFromContent() is string. So, it's not impossible but not convinient.
  2. Besides, now, if I simply replace the invoiceDocument with invoiceAdditionalDocument, that won't be able to handle if we have standard invoice attachements. Yes, I can try catch if if invoice document not found I can get additional documents.

By the way, I was fixing this issue the the code above mentioned and then ExtendedZugferdDocumentPdfReaderExt looks like:

class ExtendedZugferdDocumentPdfReaderExt extends ZugferdDocumentPdfReaderExt
{
    #[Override]
    public static function fromContent(string $pdfContent): ZugferdDocumentPdfReaderExt
    {
        return (new self)->collectAttachmentsFromPdfContent($pdfContent);
    }

    /**
     * We are using reflection to get the attachmentContentList because this is defined as private property
     * This is a limitation of the library as it isn't extendable enough
     */
    public function getAttachmentContentList(): array
    {
        $reflection = new ReflectionClass(ZugferdDocumentPdfReaderExt::class);
        $property = $reflection->getProperty('attachmentContentList');

        $property->setAccessible(true);

        return $property->getValue($this);
    }

    #[Override]
    public function resolveInvoiceDocumentContent(): string
    {
        $attachments = $this->getAttachmentContentList();

        return $attachments[0][ZugferdDocumentPdfReaderExt::ATTACHMENT_KEY_CONTENT];
    }
}

I am using reflection to access the private property, but as mentioned in my 2nd point, I can eleminate this extended class entirely.

Still I would vote for this feature.

@horstoeko
Copy link
Owner

Hi @alimranahmed,

OK. I still haven't quite understood the “problem”, but if you are happy to extend the list of file names for the permitted invoice attachments, I'll think of something. Please give me some time to design this.

Kind regards

@alimranahmed
Copy link
Author

alimranahmed commented Jan 24, 2025

@horstoeko,

Until we have this feature to have any name as invoice attachement, here is what I am doing now. It may help others if they face the same issue:

class ExtendedZugferdPdfReader extends ZugferdDocumentPdfReader
{
    public static function getXmlFromContent(string $pdfContent): string
    {
        try {
            $content = ZugferdDocumentPdfReaderExt::getInvoiceDocumentContentFromContent($pdfContent);
        } catch (ZugferdNoPdfAttachmentFoundException) {
            $additionalDocs = ZugferdDocumentPdfReaderExt::getAdditionalDocumentContentsFromContent($pdfContent);
            $content = Arr::get($additionalDocs, '0.content');
        }

        return $content;
    }
}

Then, I can call ExtendedZugferdPdfReader::getXmlFromContent($pdfContent) which will return additional document's content even though the attachements doesn't follow the stardard name.

@horstoeko
Copy link
Owner

Hi @alimranahmed

Which way of implementing you would prefer? Do you have some ideas?

Kind regards

@alimranahmed
Copy link
Author

alimranahmed commented Jan 24, 2025

One simple idea can be to pass a flag with the pdf content to allow non-standard attachments. But I normally don't prefer passing flags. In the library we don't use builder patter to build a object of the reader. Using builder patter would have been a cleaner way to do this.

ZugferdDocumentPdfReader::forContent($fileContent)
    ->allowCustomAttachementName(); // this can set a property to object.

But builder pattern is not what this library used, as result adapting this can be too much change. So, may be something like:

ZugferdDocumentPdfReader::readAndGuessFromContent(string $pdfContent, bool $allowCustomAttacnmentName);

Or may be instead of passing it here, pass this flag where else you feel more suitable.

@horstoeko
Copy link
Owner

Hi @alimranahmed,

But that doesn't sound good to me. It is also common practice to add supplementary attachments to the PDF, such as time sheets or copies of delivery notes. Then your logic wouldn't work - he doesn't know which attachment is the right one.

Kind regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants