Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow digits in the local part of prefixed names #470

Merged
merged 1 commit into from
Nov 8, 2024

Conversation

mightymax
Copy link
Contributor

@mightymax mightymax commented Nov 7, 2024

Fixes issue #469: digits are now correctly used as local names, e.g.:

@prefix sdo: <https://schema.org/> .

[] a sdo:3DModel .

Summary by CodeRabbit

  • New Features

    • Enhanced prefix matching in the N3Writer, allowing digits as valid starting characters for prefixed names.
    • Improved serialization functionality for various literal types in the Writer class.
  • Bug Fixes

    • Adjusted expected output in serialization tests to reflect more compact prefix notation.
  • Tests

    • Expanded test cases for serialization of literals, blank nodes, and lists to ensure robust handling of various scenarios.

Copy link
Contributor

coderabbitai bot commented Nov 7, 2024

Walkthrough

The changes in this pull request involve modifications to the N3Writer class in src/N3Writer.js, specifically updating the _prefixRegex regular expression to allow digits as valid starting characters for prefixed names. Additionally, the test suite for the N3Writer has been enhanced with new test cases for various literal types and expanded coverage for blank nodes and lists. The serialization tests have also been updated to reflect changes in expected output formats, ensuring comprehensive validation of the writer's functionality.

Changes

File Change Summary
src/N3Writer.js Modified _prefixRegex in addPrefixes method to allow digits as valid starting characters for prefixed names.
test/N3StreamWriter-test.js Updated expected output in serialization test case to use compact prefix notation instead of full URIs.
test/N3Writer-test.js Added new test cases for serialization of various literals (boolean, integer, decimal, double) and expanded tests for blank nodes and lists. Adjusted existing tests to validate prefix usage and serialization output formats.

Possibly related issues

🐇 In a world of prefixes, bright and new,
I hopped and changed the regex too!
Now digits can join the fun,
In Turtle files, they’ll surely run.
With tests that check each little case,
Our writer's ready, setting the pace! 🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@mightymax mightymax changed the title Allow digits in the local part of prefixed names fix: allow digits in the local part of prefixed names Nov 7, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
test/N3StreamWriter-test.js (1)

42-46: Consider expanding test cases for comprehensive digit support.

While the current test case covers basic digit support (3a, a3), consider adding test cases for:

  • Multiple digits (e.g., a:123)
  • Only digits (e.g., a:456)
  • Digits in different positions (e.g., a:1a2b3)
src/N3Writer.js (1)

300-300: Consider adding explicit test cases for edge cases.

While the change is correct, consider adding test cases for various edge cases:

  1. Local names starting with different digits (0-9)
  2. Mixed digit-letter combinations
  3. Multiple prefixed names with digits in a single document

Would you like me to help generate comprehensive test cases for these scenarios?

test/N3Writer-test.js (2)

274-274: Ensure consistency in prefix usage in test cases

In the test case starting at line 274, the prefixes used in the serialized output include a:3a, b:3a, and b:a3. While digits are now allowed in the local part of prefixed names, it's important to ensure that the test cases cover various scenarios, including prefixes that start with digits and those that contain digits in different positions.

Consider adding additional test cases where the local name starts with a digit, ends with a digit, and contains digits in between letters to fully validate the new functionality.


905-905: Validate serialization of prefixed names with digits in various positions

To thoroughly test the new regex pattern in _prefixRegex, it's important to include test cases that cover prefixed names with digits in different positions within the local part. This includes local names that:

  • Start with a digit
  • End with a digit
  • Contain digits in the middle

Adding these test cases will ensure that the writer correctly serializes all valid prefixed names according to the updated specification.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between da8b2d0 and accfc90.

📒 Files selected for processing (3)
  • src/N3Writer.js (1 hunks)
  • test/N3StreamWriter-test.js (1 hunks)
  • test/N3Writer-test.js (2 hunks)
🔇 Additional comments (2)
test/N3StreamWriter-test.js (1)

44-44: Test expectation correctly updated to validate digit support in prefixed names.

The change from '<http://a.org/3a> <http://a.org/b#3a> b:a3.\n' to 'a:3a b:3a b:a3.\n' properly validates that the StreamWriter now supports digits in prefixed names, which is the main objective of this PR.

Let's verify the test coverage for various digit positions in prefixed names:

✅ Verification successful

Let me analyze the test coverage for digit support in prefixed names based on the search results.

The search results show several test cases that cover digits in prefixed names:

  1. In N3StreamWriter-test.js and N3Writer-test.js:
['http://a.org/3a', 'http://a.org/b#3a', 'http://a.org/b#a3']
'a:3a b:3a b:a3.\n'
  1. In N3Lexer-test.js:
'leg:3032571 isbn13:9780136019701'

The test coverage appears comprehensive as it includes:

  • Digits at start of local name (3a)
  • Digits at end of local name (a3)
  • Multiple digits in local name (3032571, 9780136019701)
  • Different prefixes with digits (leg:, isbn13:)

Test coverage for digits in prefixed names is sufficient and validates the change.

The test cases properly verify that the N3 writer can handle digits in various positions within prefixed names, which aligns with the PR's objective of allowing digits in the local part of prefixed names.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if we have sufficient test coverage for different digit positions in prefixed names
# Expected: Tests should cover digits at start, middle, and end of local names

# Test: Search for test cases with digits in prefixed names
rg -A 5 "['\"].*:[0-9]|[0-9].*:.*[\n'\"]" test/

Length of output: 82391

src/N3Writer.js (1)

300-300: LGTM! Regex update allows digits in local names.

The modification to include digits (0-9) in the first character class of the local name pattern correctly implements support for prefixed names starting with digits (e.g., sdo:3DModel). This change aligns with the RDF 1.1 Turtle specification.

Let's verify the test coverage for this change:

✅ Verification successful

Regex change is well-tested and compliant

The codebase already has comprehensive test coverage for prefixed names with leading digits:

  • N3Lexer explicitly tests cases like leg:3032571 and isbn13:9780136019701
  • Both N3Writer and N3StreamWriter verify serialization of digit-prefixed names with a:3a, b:3a
  • Tests cover both the lexing and writing aspects of the functionality

The regex modification aligns with existing test cases, confirming it's a valid enhancement rather than a breaking change.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are test cases covering the new digit-prefixed functionality
# Expected: Find test cases that verify the serialization of prefixed names starting with digits

# Search for relevant test cases in test files
rg -l "test|describe|it.*['\"].*\d+.*['\"]" "test/"

# Search for actual usage of digit-prefixed names in tests
rg ":\d+" "test/"

Length of output: 682


Script:

#!/bin/bash
# Let's examine the specific test cases in N3Writer-test.js and N3StreamWriter-test.js
# that contain digit-prefixed names to ensure they adequately test the new functionality

# Check N3Writer-test.js content around the digit-prefixed test
rg -B 5 -A 5 "a:3a" test/N3Writer-test.js

# Check N3StreamWriter-test.js content around the digit-prefixed test
rg -B 5 -A 5 "a:3a" test/N3StreamWriter-test.js

# Also check N3Lexer-test.js for the ISBN test case context
rg -B 5 -A 5 "leg:3032571" test/N3Lexer-test.js

Length of output: 1607

test/N3Writer-test.js Show resolved Hide resolved
Copy link
Member

@RubenVerborgh RubenVerborgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW It's not a bug but a performance consideration. The N3Writer does not aim to exhaustively abbreviate; many other cases aren't implemented either. Happy to have this accepted, but only if performance is not impacted.

@mightymax
Copy link
Contributor Author

mightymax commented Nov 7, 2024

It is indeed not a bug (I never implied this or meant to imply it), I merely used fix in the title of this PR to mark it as a fix for the referred issue.

Performance impact should be minimal. I created a small test of 8M matches (Mac Silicon M3):

- Replacing prefixes with RegExp (v1): 261.057ms
* Number of runs: 1,000,000
* Terms matches: 8,000,000
* Memory usage: 48MB

- Replacing prefixes with RegExp (v2): 268.38ms
* Number of runs: 1,000,000
* Terms matches: 8,000,000
* Memory usage: 48MB

As you can see the impact is minimal: 7ms on 8M runs.

Test performed:

const assert = require('assert');
function escapeRegex(regex) {
  return regex.replace(/[\]\/\(\)\*\+\?\.\\\$]/g, '\\$&');
}

const prefixIRIs = {
  rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  rdfs: 'http://www.w3.org/2000/01/rdf-schema#',
  xsd: 'http://www.w3.org/2001/XMLSchema#',
  owl: 'http://www.w3.org/2002/07/owl#',
  dc: 'http://purl.org/dc/elements/1.1/',
  dcTerms: 'http://purl.org/dc/terms/',
  foaf: 'http://xmlns.com/foaf/0.1/',
  schema: 'http://schema.org/',
}
let IRIlist = '', prefixList = '';
for (const prefixIRI in prefixIRIs) {
  IRIlist += IRIlist ? `|${prefixIRI}` : prefixIRI;
  prefixList += (prefixList ? '|' : '') + prefixIRIs[prefixIRI];
}

const terms = Object.keys(prefixIRIs).map(prefix => prefixIRIs[prefix] + '3DModel');
IRIlist = escapeRegex(IRIlist, /[\]\/\(\)\*\+\?\.\\\$]/g, '\\$&');
let prefixRegex = new RegExp(`^(?:${prefixList})[^\/]*$|` +
                                `^(${IRIlist})([_a-zA-Z][\\-_a-zA-Z0-9]*)$`);
runPerfTest(prefixRegex, 'v1');
prefixRegex = new RegExp(`^(?:${prefixList})[^\/]*$|` +
                                `^(${IRIlist})([_a-zA-Z0-9][\\-_a-zA-Z0-9]*)$`);
runPerfTest(prefixRegex, 'v2');

function runPerfTest(prefixRegex, version) {
  const TEST = `- Replacing prefixes with RegExp (${version})`;
  console.time(TEST);
  let count = 0;
  const maxRuns = 1000000;
  const expectMatch = terms.length * maxRuns
  let actualMatch = 0;
  while (count < maxRuns) {
    // Replace prefixes with RegExp
    for (const term of terms) {
      actualMatch += prefixRegex.exec(term) ? 1 : 0;
    }
    count++
  }
  assert.equal(actualMatch, expectMatch);
  console.timeEnd(TEST);
  console.log(`* Number of runs: ${count.toLocaleString()}
* Terms matches: ${actualMatch.toLocaleString()}
* Memory usage: ${Math.round(process.memoryUsage().rss / 1024 / 1024)}MB
`);
}

@jeswr jeswr changed the title fix: allow digits in the local part of prefixed names feat: allow digits in the local part of prefixed names Nov 8, 2024
@jeswr jeswr merged commit 3b6ff6b into rdfjs:main Nov 8, 2024
2 checks passed
@jeswr
Copy link
Collaborator

jeswr commented Nov 8, 2024

Thanks @mightymax

Copy link

github-actions bot commented Nov 8, 2024

🎉 This PR is included in version 1.23.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@jeswr
Copy link
Collaborator

jeswr commented Nov 9, 2024

It is indeed not a bug (I never implied this or meant to imply it), I merely used fix in the title of this PR to mark it as a fix for the referred issue.

FYI @mightymax - repos like this use semantic release so it helps us version if you use semantic commit conventions in commits and PR titles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants