Resources data cleaning
Data is easier to process and search if it's tidy.
Manual QA:
- Sort A-Z and Z-A by each column to see whether anything is missing or inconsistent.
- Are there any "null"s that shouldn't be nulls?
- Run the data through a broken link checker. Are there any broken links?
- Are there any HTTP links? All links should be HTTPS.
- Check for links to non-.gov websites. Is it a legitimate government website or publication? For example, is it an official CMS site run by contractors, such as PASRR Technical Assistance Center or ResDAC?
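Part of the link review above can be pre-screened with a short script. The following is a minimal sketch, not the project's actual tooling: the function name `flag_links` and its plain list-of-URLs input are assumptions made for illustration. It only flags candidates; deciding whether a non-.gov site is a legitimate government publication still needs a person.

```python
from urllib.parse import urlparse


def flag_links(urls):
    """Return (http_links, non_gov_links) so each list can be reviewed by hand.

    `urls` is assumed to be an iterable of absolute URL strings.
    """
    http_links = []
    non_gov_links = []
    for url in urls:
        parsed = urlparse(url.strip())
        host = (parsed.hostname or "").lower()
        # Plain HTTP links should be upgraded to HTTPS.
        if parsed.scheme == "http":
            http_links.append(url)
        # Anything whose host does not end in .gov needs a manual legitimacy check.
        if not host.endswith(".gov"):
            non_gov_links.append(url)
    return http_links, non_gov_links
```

Calling `flag_links` on a column of URLs gives two short lists to check manually, which is usually faster than scanning the whole column by eye.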
Remove:
- Duplicate items (multiple items with the same URL)
- Hidden Unicode control characters
- Newlines
- Double spaces
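The removals listed above could be sketched as follows. This is an illustrative example under stated assumptions: the helper names `clean_text` and `dedupe_by_url`, and the `"url"` key on each row, are hypothetical and not taken from the project's code. "Hidden Unicode control characters" is interpreted here as Unicode categories Cc (control) and Cf (format, such as zero-width spaces and bidirectional marks); adjust that if some format characters need to be kept.

```python
import re
import unicodedata


def clean_text(value):
    """Strip hidden control characters, newlines, and repeated spaces."""
    # Turn newlines and tabs into spaces first, so words on
    # adjacent lines are not glued together when controls are dropped.
    value = re.sub(r"[\r\n\t]+", " ", value)
    # Drop control (Cc) and format (Cf) characters.
    value = "".join(
        ch for ch in value if unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Collapse runs of two or more spaces into one.
    value = re.sub(r" {2,}", " ", value)
    return value.strip()


def dedupe_by_url(rows, url_key="url"):
    """Keep the first row for each URL and drop later duplicates."""
    seen = set()
    unique = []
    for row in rows:
        url = row[url_key].strip()
        if url in seen:
            continue
        seen.add(url)
        unique.append(row)
    return unique
```

Keeping the first occurrence of each URL is a design choice in this sketch; if later rows tend to have better metadata, it may be worth comparing duplicates before dropping any.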
Consider whether to systematically re-process:
- Curly quotes
- Curly apostrophes (’)
- Em dashes and en dashes (—); may also need to make sure they have spaces around them, to ensure searchability
- Copyright and registered trademark symbols (®)
- Section symbols
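To help decide whether re-processing is worthwhile, it can be useful to count how often these characters appear before changing anything. The sketch below is hypothetical: the function names and the choice of replacements (straight quotes, a single space on each side of a dash) are assumptions for illustration, not a description of how the project handles these characters. It deliberately leaves copyright, registered trademark, and section symbols unchanged and only counts them.

```python
import re
from collections import Counter

# Characters worth reviewing: curly quotes and apostrophes, en and em dashes,
# copyright, registered trademark, and section symbols.
CANDIDATES = "\u201c\u201d\u2018\u2019\u2013\u2014\u00a9\u00ae\u00a7"

# Map curly quotes and apostrophes to their straight equivalents.
STRAIGHT_QUOTES = str.maketrans({
    "\u201c": '"',
    "\u201d": '"',
    "\u2018": "'",
    "\u2019": "'",
})


def count_special_characters(values):
    """Count occurrences of each candidate character across all values."""
    counts = Counter()
    for value in values:
        counts.update(ch for ch in value if ch in CANDIDATES)
    return counts


def straighten_quotes(text):
    """Replace curly quotes and apostrophes with straight ones."""
    return text.translate(STRAIGHT_QUOTES)


def space_dashes(text):
    """Put exactly one space on each side of every en or em dash."""
    return re.sub(r"\s*([\u2013\u2014])\s*", r" \1 ", text)
```

Note that `space_dashes` also adds spaces inside numeric ranges written with an en dash, which may or may not be wanted, so it is worth reviewing a sample of the output before applying it to the whole dataset.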
Please note that all pages on this GitHub wiki are draft working documents, not complete or polished.
Our software team puts non-sensitive technical documentation on this wiki to help us maintain a shared understanding of our work, including what we've done and why. As an open source project, this documentation is public in case anything in here is helpful to other teams, including anyone who may be interested in reusing our code for other projects.
For context, see the HHS Open Source Software plan (2016) and CMS Technical Reference Architecture section about Open Source Software, including Business Rule BR-OSS-13: "CMS-Released OSS Code Must Include Documentation Accessible to the Open Source Community".
For CMS staff and contractors: internal documentation on Enterprise Confluence (requires login).