fix: CDATA nesting into CDATA is avoided as it is prohibited #231

myhailo-chernyshov-rg
Contributor

Description

When some OLX nodes (HTML, discussions) are created, their entire content is wrapped in a CDATA section. That content can already contain CDATA sections of its own, and because CDATA sections cannot be nested, this causes errors. To fix this, existing CDATA markers are removed (the content they enclose is preserved) before such OLX nodes are created.
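As a rough illustration of the approach described above (not the code from this PR), the following sketch strips CDATA delimiters from content and then wraps the result in a single CDATA section with `xml.dom.minidom`. The helper name `strip_cdata_markers` and the regular expression are assumptions made for this example.

```python
import re
from xml.dom import minidom

# Hypothetical pattern and helper: illustrative only, not taken from the PR.
# Matches a CDATA section and captures the text between its delimiters.
CDATA_PATTERN = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def strip_cdata_markers(content: str) -> str:
    """Remove CDATA delimiters but keep the text they enclose."""
    return CDATA_PATTERN.sub(lambda match: match.group(1), content)


html = '<script>// <![CDATA[\nalert("<div>Hi</div>");\n// ]]></script>'
cleaned = strip_cdata_markers(html)

# The cleaned content can now be wrapped in one outer CDATA section.
doc = minidom.Document()
node = doc.createElement("html")
node.appendChild(doc.createCDATASection(cleaned))
doc.appendChild(node)
serialized = doc.toxml()
print(serialized)
```

The JavaScript comment markers (`//`) that preceded the delimiters stay in place, which is harmless because they remain valid comments.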

Steps to reproduce

  1. Update your Common Cartridge course .imscc dump (or use the attached cdata_bug_dump.imscc.zip, removing the .zip extension so that the file name ends in .imscc): add a CDATA section to any resource of the webcontent type.

    For example, let's assume your dump's imsmanifest.xml looks like this:

    <manifest
        xmlns="http://www.imsglobal.org/xsd/imsccv1p3/imscp_v1p1"
        xmlns:lomr="http://ltsc.ieee.org/xsd/imsccv1p3/LOM/resource"
        xmlns:lomm="http://ltsc.ieee.org/xsd/imsccv1p3/LOM/manifest"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" identifier="iba028b49-207c-4a5c-a9be-269766ef59e4" xsi:schemaLocation="http://ltsc.ieee.org/xsd/imsccv1p3/LOM/resource http://www.imsglobal.org/profile/cc/ccv1p3/LOM/ccv1p3_lomresource_v1p0.xsd http://www.imsglobal.org/xsd/imsccv1p3/imscp_v1p1 http://www.imsglobal.org/profile/cc/ccv1p3/ccv1p3_imscp_v1p2_v1p0.xsd http://ltsc.ieee.org/xsd/imsccv1p3/LOM/manifest http://www.imsglobal.org/profile/cc/ccv1p3/LOM/ccv1p3_lommanifest_v1p0.xsd">
        <metadata>
            <schema>IMS Common Cartridge</schema>
            <schemaversion>1.3.0</schemaversion>
            <lomm:lom>
                <lomm:general>
                    <lomm:title>
                        <lomm:string language="en-US">Advanced Math course</lomm:string>
                    </lomm:title>
                </lomm:general>
            </lomm:lom>
        </metadata>
        <organizations>
            <organization identifier="i30ff4307-bf99-439b-b82b-4c9ff09f7298" structure="rooted-hierarchy">
                <item identifier="i323a3358-8628-4658-ba8b-880db1e0386a">
                    <item identifier="i3ae25411-acfb-46b1-a7ea-a4ec66b7eea0">
                        <title>Week 1 Module</title>
                        <item identifier="i99ed94d4-6f8e-4320-a7b6-ea67356cfedd">
                            <title>Class 1</title>
                            <item identifier="i39520b32-7c2b-44d1-a39f-2dc244c1ac2e" identifierref="i28631a72-ae33-412e-979e-175d55153529_R">
                                <title>Watch: Class 1 Introduction, Outline, Quiz, Bonus Question (1 minute)</title>
                            </item>
                        </item>
                    </item>
                </item>
                <metadata>
                    <lomm:lom/>
                </metadata>
            </organization>
        </organizations>
        <resources>
            <resource identifier="i28631a72-ae33-412e-979e-175d55153529_R" type="webcontent">
                <file href="сontent/i7847e58b-487f-4e39-85ca-23ecfaf4c067/Watch Class 1 Introduction.html"/>
            </resource>
        </resources>
    </manifest>

    It references the сontent/i7847e58b-487f-4e39-85ca-23ecfaf4c067/Watch Class 1 Introduction.html file inside the dump. Let's assume its content looks like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
        <head>
            <title>CDATA containing HTML document</title>
        </head>
        <body>
            <script type="text/javascript">
                var htmlContent = "<div>Hello, world!</div>";
                alert(htmlContent);
            </script>
        </body>
    </html>

    Now wrap the lines

    var htmlContent = "<div>Hello, world!</div>";
    alert(htmlContent);

    in a CDATA section, so that the file looks like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
        <head>
            <title>CDATA containing HTML document</title>
        </head>
        <body>
            <script type="text/javascript">
                // <![CDATA[
                var htmlContent = "<div>Hello, world!</div>";
                alert(htmlContent);
                // ]]>
            </script>
        </body>
    </html>
  2. Run the cc2olx script:

    cc2olx -r zip -i path/to/imscc
  3. Observe the ValueError in the output:

    Traceback (most recent call last):
      File "/home/misha/work/cc2olx/src/cc2olx/main.py", line 67, in main
        convert_one_file(input_file, temp_workspace, link_file, passport_file)
      File "/home/misha/work/cc2olx/src/cc2olx/main.py", line 27, in convert_one_file
        olxfile.write(olx_export.xml())
      File "/home/misha/work/cc2olx/src/cc2olx/olx.py", line 64, in xml
        return self.doc.toprettyxml()
      File "/usr/lib/python3.10/xml/dom/minidom.py", line 60, in toprettyxml
        self.writexml(writer, "", indent, newl, encoding, standalone)
      File "/usr/lib/python3.10/xml/dom/minidom.py", line 1828, in writexml
        node.writexml(writer, indent, addindent, newl)
      File "/usr/lib/python3.10/xml/dom/minidom.py", line 897, in writexml
        node.writexml(writer, indent+addindent, addindent, newl)
      File "/usr/lib/python3.10/xml/dom/minidom.py", line 897, in writexml
        node.writexml(writer, indent+addindent, addindent, newl)
      File "/usr/lib/python3.10/xml/dom/minidom.py", line 897, in writexml
        node.writexml(writer, indent+addindent, addindent, newl)
      [Previous line repeated 1 more time]
      File "/usr/lib/python3.10/xml/dom/minidom.py", line 893, in writexml
        self.childNodes[0].writexml(writer, '', '', '')
      File "/usr/lib/python3.10/xml/dom/minidom.py", line 1223, in writexml
        raise ValueError("']]>' not allowed in a CDATA section")
    ValueError: ']]>' not allowed in a CDATA section
    {main.py:78} - Conversion completed
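The failure in the traceback can be reproduced outside cc2olx with a few lines of `xml.dom.minidom`. This is a minimal sketch, assuming a Python version whose `minidom` rejects `]]>` inside CDATA data when serializing (as Python 3.10 does in the traceback above); the element name and content are made up for the example.

```python
from xml.dom import minidom

doc = minidom.Document()
root = doc.createElement("html")
doc.appendChild(root)

# Content that already contains CDATA delimiters, as in the modified HTML file.
inner = "// <![CDATA[\nalert(1);\n// ]]>"
root.appendChild(doc.createCDATASection(inner))

try:
    # Creating the node succeeds; serialization is where the check happens.
    doc.toprettyxml()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Note that `createCDATASection` itself accepts the content; the error only surfaces when the document is written out, which is why it appears in `olx_export.xml()` rather than at node creation.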
    

Deadline

"None"

@openedx-webhooks

openedx-webhooks commented Nov 19, 2024

Thanks for the pull request, @myhailo-chernyshov-rg!

What's next?

Please work through the following steps to get your changes ready for engineering review:

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.

🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads

🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

🔘 Let us know that your PR is ready for review:

Who will review my changes?

This repository is currently unmaintained.

To get help with finding a technical reviewer, tag the community contributions project manager for this PR in a comment and let them know that your changes are ready for review:

  1. On the right-hand side of the PR, find the Contributions project, click the caret in the top right corner to expand it, and check the "Primary PM" field for the name of your PM.
  2. Find their GitHub handle here.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Nov 19, 2024
@mphilbrick211 mphilbrick211 added the needs test run Author's first PR to this repository, awaiting test authorization from Axim label Nov 26, 2024
@mphilbrick211

Hi @myhailo-chernyshov-rg! Thanks for this contribution! It looks like you're contributing on behalf of Raccoon Gang - please have your manager reach out to [email protected] to have you added to Raccoon Gang's existing entity agreement with us. Thank you!

@mphilbrick211 mphilbrick211 removed the needs test run Author's first PR to this repository, awaiting test authorization from Axim label Dec 9, 2024
@mphilbrick211

@ormsbee are you able to merge this?

@ormsbee ormsbee merged commit 8e239ad into openedx:master Dec 11, 2024
7 checks passed
4 participants