Remove unused bits of libunicode #13224

nabijaczleweli · 2022-03-16T03:13:44Z

Motivation and Context

774 .rodata bytes from dropping uconv.c (but dunno if that actually gets pulled in since it's unused entirely), and:

  35376   -> 15885  lines         =  -19491 (55.09% down)
  1477745 -> 663942 bytes         = -813803 (55.07% down)
  301956  -> 134307 .rodata bytes = -167649 (55.52% down)

in u8_textprop_data.h (assuming no alignment, which is mostly fair I think).

The default git differ absolutely chokes on this (it gave me +17k -24k for the first big commit). I recommend explicitly picking patience. And GitHub seems to use the same one. The true diffstat for this is:

nabijaczleweli@tarta:~/store/code/zfs$ git diff --stat=80 --patience origin/master
 include/sys/u8_textprep.h            |    39 +-
 include/sys/u8_textprep_data.h       | 19587 +--------------------------------
 lib/libunicode/Makefile.am           |     3 +-
 module/Makefile.bsd                  |     3 +-
 module/os/freebsd/zfs/zfs_vnops_os.c |    12 +-
 module/os/linux/zfs/zfs_dir.c        |     2 +-
 module/os/linux/zfs/zfs_vnops_os.c   |    17 +-
 module/unicode/Makefile.in           |     1 -
 module/unicode/u8_textprep.c         |   274 +-
 module/unicode/uconv.c               |   859 --
 module/zfs/zap_micro.c               |     2 +-
 11 files changed, 153 insertions(+), 20646 deletions(-)

How Has This Been Tested?

Builds.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes. – none apply
I have run the ZFS Test Suite with this change applied. – CI take my hand
All commit messages are properly formatted and contain Signed-off-by.

nabijaczleweli · 2022-03-16T03:20:31Z

Real .rodata gain seems to be uh

nabijaczleweli@tarta:~/store/code/zfs$ size -A module/unicode/zunicode.ko /lib/modules/5.10.0-11-amd64/updates/dkms/zunicode.ko  | grep -e rodata -e :
module/unicode/zunicode.ko  :
.rodata                     135200      0
/lib/modules/5.10.0-11-amd64/updates/dkms/zunicode.ko  :
.rodata                     303648      0

so -168448 (25 bytes more than predicted).

nabijaczleweli · 2022-04-02T13:15:08Z

2.5 week bump; all-minus, clean CI, NFC

behlendorf · 2022-04-05T18:09:25Z

The change itself looks correct. My only real concern is that it modifies stable committed interfaces which we inherited from Solaris. That interface is clearly documented here.

https://docs.oracle.com/cd/E88353_01/html/E37855/u8-strcmp-9f.html
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-textprep-str-9f.html
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-validate-9f.html

Now practically speaking I doubt it really matters that much for a few reasons. 1) there are no external consumers of this interface on Linux or FreeBSD. 2) we've never done a major update to the unicode library/kmod and it seems unlikely it will ever be needed. And 3) the only case where this would be an issue is if illumos was using the OpenZFS sources along with their native kernel provided unicode interfaces. However, even in this case all but one caller are in the platform specific code so it would be at worst a mild inconvenience.

Still I'm reluctant to break this committed interface just to shed some unused code.

nabijaczleweli · 2022-04-05T19:04:21Z

I mean, nothing's ever used it here, we're dropping 168k of .rodata. And it's a committed interface for... Solaris. Like most of the stuff I ripped out of ICP (there was a module system, hardware providers, &c.), which was an interface other modules probably (definitely?) hooked into. Like a lot of that code, it seems to've been grandfathered in because it's in /u/s/common/unicode.

AFAIK we have one real downstream, Lustre, and it uses none of these. The illumos gate doesn't use most of them. There's one use of U8_UNICODE_320, which is usr/src/uts/common/io/comstar/port/iscsit/iscsit_login.c. 2/3rds of the uconv functions are mentioned outside the header at all, most of which in userspace.

I don't think this is a realistic concern for us. Because we're not Solaris. Hell, we don't expose this to userspace, so kSolaris. And we get a considerable amount of dead kernel address-space back. And by detaching ourselves from the interface we open ourselves to importing Unicode versions from this side of the century. I mean, if you're really concerned about the API we could, theoretically, take unicode_versions and EOPNOTSUPP anything that's not U8_UNICODE_LATEST. But I just don't see the point in pretending. I'd go so far as to drop the EXPORT_SYMBOLS()es, if they weren't required, for now, with a non-unified zfs.ko image.

behlendorf · 2022-04-05T19:43:17Z

if you're really concerned about the API we could, theoretically, take unicode_versions and EOPNOTSUPP anything that's not U8_UNICODE_LATEST since

I considered this, but came to the same conclusion. It's just not worth it. There's just no compelling reason to jump through hoops to try and preserve this API. And if we're never going to use this dead functionality, then it makes sense to remove it. Alright, I'm convinced let's go ahead and drop it.

As for dropping the EXPORT_SYMBOL()es that's another thing I've wanted to revisit. The reality is Lustre does not require many of the symbols, perhaps a few dozen, and with a unified zfs.ko we could whittle down the exports to roughly what's actually used. That would give us a much better idea of what symbols we need to be careful not to gratuitously modify.

nabijaczleweli · 2022-04-21T13:12:36Z

Rebased

nabijaczleweli · 2022-04-29T23:01:01Z

8-day bump :0

nabijaczleweli · 2022-05-10T19:37:18Z

Rebased

nabijaczleweli · 2022-06-28T21:34:40Z

1-month-2-week bump

mcmilk · 2022-10-19T19:42:04Z

It would be nice if this can be rebased and then merged before it's then outdated again :-)

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

And slide U8_UNICODE_500 forward. This halves u8_textprep_data.h: 35376 lines -> 17802 1477745 bytes -> 743357 With a theoretical (no-alignment) .rodata reduction of 150978 bytes Signed-off-by: Ahelenia Ziemiańska <[email protected]>

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

This yields another 10.6% u8_t_data.h reduction: 17775 lines -> 15885 743125 bytes -> 663942 And (assuming no alignment) frees another 16671 .rodata bytes Signed-off-by: Ahelenia Ziemiańska <[email protected]>

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

nabijaczleweli · 2022-11-29T16:58:27Z

Rebased

mcmilk · 2023-03-20T14:27:16Z

I rebased this pull request on my repo. It's only one small conflict.
@behlendorf - have you done some decision for it?

The interface of these:
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-strcmp-9f.html
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-textprep-str-9f.html
https://docs.oracle.com/cd/E88353_01/html/E37855/u8-validate-9f.html
... is documented, but I find also, when there is really no use for it - we should strip it away.

And I think that illumos-gate also doesn't depend on this code here... within OpenZFS...

amotin · 2024-10-29T20:32:35Z

Talking with Brian we are not sure we'd like to diverge from original code. We don't know if that may ever be updated, we might check if there were any updates already, but if there were, then any local changes would make any future merges problematic.

PS: It seems this code is mostly in sync with illumos, except few minor local changes.

amotin

I would appreciate this kind of reduction would it just be done with a limited number of ifdefs. If somebody wants to try it -- please. But I don't think divergence from Solaris/Illumos in both code and especially API worth saving 160KB these days.

nabijaczleweli · 2024-10-30T12:07:29Z

Broke up into uconv removal (shouldn't be controversial) in #16702 and libunicode #ifs in #16704.

amotin · 2024-10-30T16:17:27Z

Closing as superseded.

nabijaczleweli force-pushed the utf8-hehe branch from 57fefd8 to c06379a Compare March 16, 2022 03:27

behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 16, 2022

ahrens assigned jwk404 Mar 17, 2022

nabijaczleweli mentioned this pull request Mar 25, 2022

libshare cleanup, NFS path escaping #13165

Closed

13 tasks

behlendorf approved these changes Apr 5, 2022

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 5, 2022

behlendorf requested a review from mmaybee April 5, 2022 21:01

behlendorf mentioned this pull request Apr 15, 2022

WIP: Punt zfs_dbgmsg into zcommon #13206

Closed

13 tasks

nabijaczleweli force-pushed the utf8-hehe branch from c06379a to b49a17e Compare April 21, 2022 13:12

nabijaczleweli force-pushed the utf8-hehe branch from b49a17e to 949f0fc Compare May 10, 2022 19:37

nabijaczleweli added 8 commits November 29, 2022 17:48

module: unicode: remove unused uconv.c

b8a4782

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

module: unicode: remove unused U8_VALIDATE_UCS2_RANGE

fdb7765

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

module: unicode: remove unused U8_VALIDATE_CHECK_ADDITIONAL

fe1e0cd

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

module: unicode: remove unused U8_UNICODE_320

0e05a43

And slide U8_UNICODE_500 forward. This halves u8_textprep_data.h: 35376 lines -> 17802 1477745 bytes -> 743357 With a theoretical (no-alignment) .rodata reduction of 150978 bytes Signed-off-by: Ahelenia Ziemiańska <[email protected]>

module: unicode: fold away U8_UNICODE_{500,LATEST}

f642923

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

module: unicode: remove unused U8_TEXTPREP_NOWAIT

e384436

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

module: unicode: remove unused tolower transformations

f3518c3

This yields another 10.6% u8_t_data.h reduction: 17775 lines -> 15885 743125 bytes -> 663942 And (assuming no alignment) frees another 16671 .rodata bytes Signed-off-by: Ahelenia Ziemiańska <[email protected]>

module: unicode: remove unused U8_HANGUL_JAMO()

5f9b785

Signed-off-by: Ahelenia Ziemiańska <[email protected]>

nabijaczleweli force-pushed the utf8-hehe branch from 949f0fc to 5f9b785 Compare November 29, 2022 16:58

mcmilk approved these changes Dec 18, 2022

View reviewed changes

amotin removed the Status: Accepted Ready to integrate (reviewed, tested) label Oct 29, 2024

amotin requested changes Oct 30, 2024

View reviewed changes

amotin added the Status: Revision Needed Changes are required for the PR to be accepted label Oct 30, 2024

This was referenced Oct 30, 2024

module: unicode: remove unused uconv.c (#13224) #16702

Merged

module: unicode: #if out unused bits of libunicode (#13224) #16704

Closed

amotin closed this Oct 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove unused bits of libunicode #13224

Remove unused bits of libunicode #13224

nabijaczleweli commented Mar 16, 2022 •

edited

Loading

nabijaczleweli commented Mar 16, 2022

nabijaczleweli commented Apr 2, 2022

behlendorf commented Apr 5, 2022

nabijaczleweli commented Apr 5, 2022

behlendorf commented Apr 5, 2022

nabijaczleweli commented Apr 21, 2022

nabijaczleweli commented Apr 29, 2022

nabijaczleweli commented May 10, 2022

nabijaczleweli commented Jun 28, 2022

mcmilk commented Oct 19, 2022

nabijaczleweli commented Nov 29, 2022

mcmilk commented Mar 20, 2023

amotin commented Oct 29, 2024 •

edited

Loading

amotin left a comment

nabijaczleweli commented Oct 30, 2024 •

edited

Loading

amotin commented Oct 30, 2024

Remove unused bits of libunicode #13224

Remove unused bits of libunicode #13224

Conversation

nabijaczleweli commented Mar 16, 2022 • edited Loading

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist:

nabijaczleweli commented Mar 16, 2022

nabijaczleweli commented Apr 2, 2022

behlendorf commented Apr 5, 2022

nabijaczleweli commented Apr 5, 2022

behlendorf commented Apr 5, 2022

nabijaczleweli commented Apr 21, 2022

nabijaczleweli commented Apr 29, 2022

nabijaczleweli commented May 10, 2022

nabijaczleweli commented Jun 28, 2022

mcmilk commented Oct 19, 2022

nabijaczleweli commented Nov 29, 2022

mcmilk commented Mar 20, 2023

amotin commented Oct 29, 2024 • edited Loading

amotin left a comment

Choose a reason for hiding this comment

nabijaczleweli commented Oct 30, 2024 • edited Loading

amotin commented Oct 30, 2024

nabijaczleweli commented Mar 16, 2022 •

edited

Loading

amotin commented Oct 29, 2024 •

edited

Loading

nabijaczleweli commented Oct 30, 2024 •

edited

Loading