Optimize the JSON parsing in NpmPackageIndexBuilder.seeFile #1898

qligier · 2025-02-05T19:14:27Z

While working on some performance issues in Matchbox, I noticed that NpmPackageIndexBuilder.seeFile was using a lot of resources (time and memory allocation). The reason is that it fully parses many JSON files, allocating memory for objects representing the whole file, just to read a few properties and discard the object immediately.

This is a use case where streaming parsers shine: they only parse and allocate memory for the parts we're interested in.

This PR uses the streaming parser provided in Jackson.
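
For readers unfamiliar with the approach, here is a minimal sketch of reading a few top-level properties with Jackson's streaming API (`jackson-core`). It is not the code from this PR: the class name `TopLevelFieldReader`, the method `readTopLevelFields`, and the property names in the usage note are illustrative assumptions.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the HAPI core code base.
final class TopLevelFieldReader {
    // JsonFactory is thread-safe and meant to be reused.
    private static final JsonFactory FACTORY = new JsonFactory();

    /**
     * Returns the scalar values of the requested top-level properties.
     * Nested objects and arrays are skipped token by token, so no tree
     * representing the whole file is ever built.
     */
    static Map<String, String> readTopLevelFields(byte[] json, Set<String> wanted)
            throws IOException {
        Map<String, String> found = new HashMap<>();
        try (JsonParser parser = FACTORY.createParser(json)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                return found; // not a JSON object, nothing to read
            }
            while (parser.nextToken() == JsonToken.FIELD_NAME) {
                String name = parser.currentName();
                JsonToken value = parser.nextToken();
                if (value != null && value.isScalarValue()
                        && value != JsonToken.VALUE_NULL && wanted.contains(name)) {
                    found.put(name, parser.getText());
                    if (found.size() == wanted.size()) {
                        break; // everything we need was read, stop early
                    }
                } else {
                    // No-op on scalars; on START_OBJECT/START_ARRAY it jumps
                    // to the matching end token without materializing anything.
                    parser.skipChildren();
                }
            }
        }
        return found;
    }
}
```

Calling `TopLevelFieldReader.readTopLevelFields(bytes, Set.of("resourceType", "id"))` on a resource file would return only those two values. The early `break` means the remainder of a large file is never tokenized once the wanted properties have been seen.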

During Matchbox startup, we measured a reduction of ~6 seconds (12%) and ~7 GB of allocation (23%) when loading hl7.fhir.r4.core, hl7.terminology.r4, and hl7.fhir.uv.extensions.r4.

grahamegrieve · 2025-02-06T04:48:37Z

The R4B test failures look related to this

codecov · 2025-02-06T08:20:58Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 12.91%. Comparing base (627d282) to head (053d201).
Report is 6 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##             master    #1898   +/-   ##
=========================================
  Coverage     12.91%   12.91%           
- Complexity    34012    34021    +9     
=========================================
  Files          2252     2252           
  Lines        686541   686547    +6     
  Branches     202569   202576    +7     
=========================================
+ Hits          88643    88650    +7     
+ Misses       566487   566483    -4     
- Partials      31411    31414    +3


qligier · 2025-02-06T13:34:50Z

Yep, I fixed it.


## Announcement

This release marks the beginning of a process of refactoring the HAPI core code. We are trimming and refactoring the core model code to reduce dependencies. As part of this, we will be culling all the old unmaintained code in the older versions. In addition, we will be moving all the terminology, rendering, view definition, and validation related code to a new partner package for R4, R4B, and R5. This change is planned for July 2025.

This release starts the process of marking the code with annotations to indicate its proposed fate:

- @deprecated classes will be deleted in July 2025 unless users raise issues with that
- @MarkedToMoveToAdjunctPackage is code that will move to the other package
- Code with no annotations will not move or be deleted

## Validator Changes

* Add HL7 CodeSystem display and definition checks
* Add Matchetype validator
* Add "http://hl7.org/fhir/tools/StructureDefinition/snapshot-base-version" to snapshot generation
* Optimize the JSON parsing in NpmPackageIndexBuilder.seeFile (#1898) (faster loading)
* Fix stack crash when structure definitions are circular
* Fix error reporting duplicate contained IDs when contained resources are sliced by a profile
* Allow cardinality changes in obligation profiles (but not recommended)
* Fix bug with wrongly processing -ips#(v) parameter
* Add underscore to regex to be able to use underscore in Bundle URLs

## Other code changes

* Refactor FileUtilities and other Utilities classes
* Fix element order in Element.forceElement()
* Fix NPE in patient renderer
* Resource Factory updates for loading generated resources in IG publisher
* Fix intermittent thread issue in Date rendering

***NO_CI***

qligier force-pushed the ql_optimize_npib branch from 544c208 to c63277c Compare February 5, 2025 20:51

qligier force-pushed the ql_optimize_npib branch from c63277c to ec8d461 Compare February 6, 2025 07:11

Optimize the JSON parsing in NpmPackageIndexBuilder.seeFile

053d201

qligier force-pushed the ql_optimize_npib branch from c3603d3 to 053d201 Compare February 6, 2025 07:30

dotasek self-requested a review February 6, 2025 16:26

dotasek approved these changes Feb 6, 2025

dotasek merged commit 57e9513 into hapifhir:master Feb 6, 2025
33 checks passed

qligier deleted the ql_optimize_npib branch February 7, 2025 06:40
