Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unused bits of libunicode #13224

Closed
wants to merge 8 commits into from

Conversation

nabijaczleweli
Copy link
Contributor

@nabijaczleweli nabijaczleweli commented Mar 16, 2022

Motivation and Context

774 .rodata bytes from dropping uconv.c (but dunno if that actually gets pulled in since it's unused entirely), and:

  35376   -> 15885  lines         =  -19491 (55.09% down)
  1477745 -> 663942 bytes         = -813803 (55.07% down)
  301956  -> 134307 .rodata bytes = -167649 (55.52% down)

in u8_textprop_data.h (assuming no alignment, which is mostly fair I think).

The default git differ absolutely chokes on this (it gave me +17k -24k for the first big commit). I recommend explicitly picking patience. And GitHub seems to use the same one. The true diffstat for this is:

nabijaczleweli@tarta:~/store/code/zfs$ git diff --stat=80 --patience origin/master
 include/sys/u8_textprep.h            |    39 +-
 include/sys/u8_textprep_data.h       | 19587 +--------------------------------
 lib/libunicode/Makefile.am           |     3 +-
 module/Makefile.bsd                  |     3 +-
 module/os/freebsd/zfs/zfs_vnops_os.c |    12 +-
 module/os/linux/zfs/zfs_dir.c        |     2 +-
 module/os/linux/zfs/zfs_vnops_os.c   |    17 +-
 module/unicode/Makefile.in           |     1 -
 module/unicode/u8_textprep.c         |   274 +-
 module/unicode/uconv.c               |   859 --
 module/zfs/zap_micro.c               |     2 +-
 11 files changed, 153 insertions(+), 20646 deletions(-)

How Has This Been Tested?

Builds.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the OpenZFS code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes. – none apply
  • I have run the ZFS Test Suite with this change applied. – CI take my hand
  • All commit messages are properly formatted and contain Signed-off-by.

@nabijaczleweli
Copy link
Contributor Author

Real .rodata gain seems to be uh

nabijaczleweli@tarta:~/store/code/zfs$ size -A module/unicode/zunicode.ko /lib/modules/5.10.0-11-amd64/updates/dkms/zunicode.ko  | grep -e rodata -e :
module/unicode/zunicode.ko  :
.rodata                     135200      0
/lib/modules/5.10.0-11-amd64/updates/dkms/zunicode.ko  :
.rodata                     303648      0

so -168448 (25 bytes more than predicted).

@nabijaczleweli
Copy link
Contributor Author

2.5 week bump; all-minus, clean CI, NFC

@behlendorf
Copy link
Contributor

The change itself looks correct. My only real concern is that it modifies stable committed interfaces which we inherited from Solaris. That interface is clearly documented here.

https://docs.oracle.com/cd/E88353_01/html/E37855/u8-strcmp-9f.html
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-textprep-str-9f.html
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-validate-9f.html

Now practically speaking I doubt it really matters that much for a few reasons. 1) there are no external consumers of this interface on Linux or FreeBSD. 2) we've never done a major update to the unicode library/kmod and it seems unlikely it will ever be needed. And 3) the only case where this would be an issue is if illumos was using the OpenZFS sources along with their native kernel provided unicode interfaces. However, even in this case all but one caller are in the platform specific code so it would be at worst a mild inconvenience.

Still I'm reluctant to break this committed interface just to shed some unused code.

@nabijaczleweli
Copy link
Contributor Author

I mean, nothing's ever used it here, we're dropping 168k of .rodata. And it's a committed interface for... Solaris. Like most of the stuff I ripped out of ICP (there was a module system, hardware providers, &c.), which was an interface other modules probably (definitely?) hooked into. Like a lot of that code, it seems to've been grandfathered in because it's in /u/s/common/unicode.

AFAIK we have one real downstream, Lustre, and it uses none of these. The illumos gate doesn't use most of them. There's one use of U8_UNICODE_320, which is usr/src/uts/common/io/comstar/port/iscsit/iscsit_login.c. 2/3rds of the uconv functions are mentioned outside the header at all, most of which in userspace.

I don't think this is a realistic concern for us. Because we're not Solaris. Hell, we don't expose this to userspace, so kSolaris. And we get a considerable amount of dead kernel address-space back. And by detaching ourselves from the interface we open ourselves to importing Unicode versions from this side of the century. I mean, if you're really concerned about the API we could, theoretically, take unicode_versions and EOPNOTSUPP anything that's not U8_UNICODE_LATEST. But I just don't see the point in pretending. I'd go so far as to drop the EXPORT_SYMBOLS()es, if they weren't required, for now, with a non-unified zfs.ko image.

@behlendorf
Copy link
Contributor

if you're really concerned about the API we could, theoretically, take unicode_versions and EOPNOTSUPP anything that's not U8_UNICODE_LATEST since

I considered this, but came to the same conclusion. It's just not worth it. There's just no compelling reason to jump through hoops to try and preserve this API. And if we're never going to use this dead functionality, then it makes sense to remove it. Alright, I'm convinced let's go ahead and drop it.

As for dropping the EXPORT_SYMBOL()es that's another thing I've wanted to revisit. The reality is Lustre does not require many of the symbols, perhaps a few dozen, and with a unified zfs.ko we could whittle down the exports to roughly what's actually used. That would give us a much better idea of what symbols we need to be careful not to gratuitously modify.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 5, 2022
@behlendorf behlendorf requested a review from mmaybee April 5, 2022 21:01
@behlendorf behlendorf mentioned this pull request Apr 15, 2022
13 tasks
@nabijaczleweli
Copy link
Contributor Author

Rebased

@nabijaczleweli
Copy link
Contributor Author

8-day bump :0

@nabijaczleweli
Copy link
Contributor Author

Rebased

@nabijaczleweli
Copy link
Contributor Author

1-month-2-week bump

@mcmilk
Copy link
Contributor

mcmilk commented Oct 19, 2022

It would be nice if this can be rebased and then merged before it's then outdated again :-)

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
And slide U8_UNICODE_500 forward.

This halves u8_textprep_data.h:
  35376   lines -> 17802
  1477745 bytes -> 743357

With a theoretical (no-alignment) .rodata reduction of 150978 bytes

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
This yields another 10.6% u8_t_data.h reduction:
  17775  lines -> 15885
  743125 bytes -> 663942

And (assuming no alignment) frees another 16671 .rodata bytes

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
@nabijaczleweli
Copy link
Contributor Author

Rebased

@mcmilk
Copy link
Contributor

mcmilk commented Mar 20, 2023

I rebased this pull request on my repo. It's only one small conflict.
@behlendorf - have you done some decision for it?

The interface of these:
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-strcmp-9f.html
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-textprep-str-9f.html
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-validate-9f.html
... is documented, but I find also, when there is really no use for it - we should strip it away.

And I think that illumos-gate also doesn't depend on this code here... within OpenZFS...

@amotin amotin removed the Status: Accepted Ready to integrate (reviewed, tested) label Oct 29, 2024
@amotin
Copy link
Member

amotin commented Oct 29, 2024

Talking with Brian we are not sure we'd like to diverge from original code. We don't know if that may ever be updated, we might check if there were any updates already, but if there were, then any local changes would make any future merges problematic.

PS: It seems this code is mostly in sync with illumos, except few minor local changes.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate this kind of reduction would it just be done with a limited number of ifdefs. If somebody wants to try it -- please. But I don't think divergence from Solaris/Illumos in both code and especially API worth saving 160KB these days.

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Oct 30, 2024

Broke up into uconv removal (shouldn't be controversial) in #16702 and libunicode #ifs in #16704.

@amotin
Copy link
Member

amotin commented Oct 30, 2024

Closing as superseded.

@amotin amotin closed this Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Revision Needed Changes are required for the PR to be accepted
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants