CSV With Byte Order Mark: Accessing First Column's Name #171

Closed
tobiasschweizer opened this issue Jun 17, 2022 · 2 comments
@tobiasschweizer

Hi there,

I am working with a CSV file and ran into a problem when referencing the first column by its name.

CSV: https://data.snf.ch/Exportcsv/GrantWithAbstracts.csv

mapping.ttl:

@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix schema: <http://schema.org/>.
@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix gn: <http://www.geonames.org/ontology#>.
@prefix carml: <http://carml.taxonic.com/carml/> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@base <http://example.com/ns#>.

<#LogicalSourceGrant> a rml:BaseSource ;
  rml:source <#CSVW_sourceGrant> ;
  rml:referenceFormulation ql:CSV .

<#CSVW_sourceGrant> a csvw:Table;
   csvw:url "GrantWithAbstracts.csv" ;
   csvw:dialect [ a csvw:Dialect;
       csvw:delimiter ";"
   ] .

<#ProjectMapping> a rr:TriplesMap;
  rml:logicalSource <#LogicalSourceGrant> ;

  rr:subjectMap [
    rr:template "http://snf.ch/project/{GrantNumber}";
    rr:class schema:ResearchProject
  ] ;

  rr:predicateObjectMap [
    rr:predicate schema:description ;
    rr:objectMap [
      rml:reference "Abstract" # first column's name
    ]
  ] .

java -jar rmlmapper-5.0.0-r362-all.jar -m mapping.ttl -s jsonld returns:

09:34:31.754 [main] ERROR be.ugent.rml.cli.Main .main(393) - Mapping for Abstract not found, expected one of [LaySummaryLead_En, MainDiscipline_Level1, GrantNumberString, Keywords, ResponsibleApplicantName, GrantNumber, CallFullTitle, LaySummaryLead_It, MainDiscipline_Level2, AmountGrantedAllSets, Institute, CallDecisionYear, LaySummary_Fr, FundingInstrumentReporting, MainDiscipline, LaySummaryLead_Fr, AllDisciplines, LaySummaryLead_De, Title, FundingInstrumentLevel1, TitleEnglish, LaySummary_De, Abstract, ResearchInstitution, EffectiveGrantEndDate, FundingInstrumentPublished, MainDisciplineNumber, InstituteCountry, LaySummary_En, State, LaySummary_It, CallEndDate, EffectiveGrantStartDate]

Encoding: file -I GrantWithAbstracts.csv

GrantWithAbstracts.csv: text/plain; charset=utf-8

I looked at the file using hexdump -c GrantWithAbstracts.csv | less:

0000000 <EF> <BB> <BF> A b s t r a c t ; A l l D
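
The same check can be scripted. The following is a minimal Python sketch, not part of RMLMapper, that compares the first three bytes of a file with the UTF-8 BOM (EF BB BF). The helper name `has_utf8_bom` and the shortened sample header are assumptions made for illustration; the sample file only stands in for GrantWithAbstracts.csv.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Hypothetical stand-in for GrantWithAbstracts.csv: BOM plus a shortened header row.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + b"Abstract;AllDisciplines\n")
    sample_path = tmp.name

bom_found = has_utf8_bom(sample_path)
os.remove(sample_path)
print(bom_found)
```

Reading in binary mode matters here: a text-mode read would already have decoded the bytes, and the BOM would show up as the single character U+FEFF instead.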

The leading bytes EF BB BF look like a UTF-8 byte order mark (BOM).
As far as I understand, a BOM is optional in UTF-8:

> Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM.

(https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8)

Is it possible that the BOM is mistakenly treated as part of the first column's name, so that "Abstract" is not found?
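
To illustrate the suspicion, here is a minimal Python sketch. It is an assumption about the mechanism, not RMLMapper's actual Java code, and the two-column sample data is made up. Decoding the bytes as plain "utf-8" keeps the BOM as the invisible character U+FEFF at the start of the first header name, whereas Python's "utf-8-sig" codec drops it.

```python
import csv
import io

# Hypothetical sample mimicking the start of the file: BOM, then a ";"-delimited header.
raw = b"\xef\xbb\xbfAbstract;GrantNumber\nSome text;12345\n"


def header(data, encoding):
    """Decode `data` with `encoding` and return the CSV header row as a list."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text, delimiter=";"))


plain = header(raw, "utf-8")    # BOM survives as U+FEFF in the first name
sig = header(raw, "utf-8-sig")  # "utf-8-sig" drops a leading BOM while decoding

print([ascii(name) for name in plain])  # ascii() makes the hidden character visible
print("Abstract" in plain)
print("Abstract" in sig)
```

If something similar happens in the mapper, it would also explain why the error message appears to list "Abstract" among the expected names: the entry would carry an invisible U+FEFF prefix and therefore not equal the reference in the mapping.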

Thanks a lot for any hint.

@DylanVanAssche added the "bug" label (Something isn't working) on Jul 1, 2022
@DylanVanAssche
Contributor

Found a fix, will be included in next release, thanks for reporting!

@DylanVanAssche added the "next release" label (The bug is fixed and the fix will be available in the next release.) on Jul 1, 2022
@DylanVanAssche self-assigned this on Jul 1, 2022
@tobiasschweizer
Author

> Found a fix, will be included in next release, thanks for reporting!

That's great. Thank you very much for looking into this.
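
The thread does not show what the fix looks like. For anyone stuck on a version without it, one possible workaround is to write a BOM-free copy of the CSV and point csvw:url at that copy. The Python sketch below is only an illustration under that assumption; the helper `strip_utf8_bom` and the file names are hypothetical.

```python
import codecs
import os
import tempfile


def strip_utf8_bom(src, dst):
    """Copy `src` to `dst` without a leading UTF-8 BOM. Return True if one was removed."""
    with open(src, "rb") as f:
        data = f.read()  # reads the whole file into memory; fine for moderate sizes
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    with open(dst, "wb") as f:
        f.write(data)
    return had_bom


# Demo on a made-up file; in practice `src` would be the downloaded CSV.
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "with_bom.csv")
    dst = os.path.join(workdir, "without_bom.csv")
    with open(src, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"Abstract;GrantNumber\n")
    removed = strip_utf8_bom(src, dst)
    with open(dst, "rb") as f:
        cleaned = f.read()

print(removed, cleaned)
```

Working on raw bytes leaves the rest of the file untouched, so delimiters, line endings, and any non-ASCII text in the abstracts are copied as they are.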
