# NAME

Sys::Binmode - Fix Perl's system call character encoding.

<div>
<a href='https://coveralls.io/github/FGasper/p5-Sys-Binmode?branch=master'><img src='https://coveralls.io/repos/github/FGasper/p5-Sys-Binmode/badge.svg?branch=master' alt='Coverage Status' /></a>
</div>

# SYNOPSIS

    use Sys::Binmode;

    my $foo = "é";
    $foo .= "\x{100}";
    chop $foo;

    # Prints "é":
    print $foo, $/;

    # In Perl 5.32 this may print mojibake,
    # but with Sys::Binmode it always prints "é":
    exec 'echo', $foo;

# DESCRIPTION

tl;dr: Use this module in **all** new code.

# BACKGROUND

Ideally, a Perl application doesn't need to know how the interpreter stores a given string internally. Perl can thus store any Unicode code point while still optimizing for size and speed when storing "bytes-compatible" strings, i.e., strings whose code points all lie below 256. Perl's "optimized" string storage format is faster and less memory-hungry, but it can only store code points 0-255. The "unoptimized" format, on the other hand, can store any Unicode code point.

Of course, Perl doesn't _always_ optimize "bytes-compatible" strings; Perl can also, if it wants, store such strings "unoptimized" (i.e., in Perl's internal "loose UTF-8" format), too. For code points 0-127 there's actually no difference between the two forms, but for 128-255 the formats differ. (cf. ["The 'Unicode Bug'" in perlunicode](https://metacpan.org/pod/perlunicode#The-Unicode-Bug)) This means that anything that reads Perl's internals **MUST** differentiate between the two forms in order to use the string correctly.

Alas, that differentiation doesn't always happen. Thus, Perl can output a string that stores one or more 128-255 code points differently depending on whether Perl has "optimized" that string or not.

Remember, though: Perl applications _should not care_ about Perl's string storage internals.
(This is why, for example, the [bytes](https://metacpan.org/pod/bytes) pragma is discouraged.) The catch, though, is that without that knowledge, **the application can't know what it actually says to the outside world!**

Thus, applications must either monitor Perl's string-storage internals or accept unpredictable behaviour, both of which are categorically bad.

# HOW THIS MODULE (PARTLY) FIXES THE PROBLEM

This module provides predictable behaviour for Perl's built-in functions by downgrading all strings before giving them to the operating system. It's equivalent to (but faster than!) prefixing your system calls with `utf8::downgrade()` (cf. [utf8](https://metacpan.org/pod/utf8)) on all arguments.

Predictable behaviour is **always** a good thing; ergo, you should use this module in **all** new code.

# CAVEAT: CHARACTER ENCODING

If you apply this module injudiciously to existing code you may see exceptions thrown where previously things worked just fine. This can happen if you've neglected to encode one or more strings before sending them to the OS; if Perl has such a string stored upgraded then Perl will, under default behaviour, send a UTF-8-encoded version of that string to the OS. In essence, it's an implicit UTF-8 auto-encode.

The fix is to apply an explicit UTF-8 encode prior to the system call that throws the error. This is what we should do _anyway_; Sys::Binmode just enforces that better.

## Windows (et alia)

NTFS, Windows's primary filesystem, expects filenames to be encoded in little-endian UTF-16. To create a file named `épée`, then, on NTFS you have to do something like:

    my $windows_filename = Encode::Simple::encode( 'UTF-16LE', $filename );

... where `$filename` is a character (i.e., decoded) string.

Other OSes and filesystems may have their own quirks; regardless, this module gives you a saner point of departure to address those than Perl's default behaviour provides.

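
The explicit-encode fix described in this section can be sketched with core modules only. This is a minimal illustration, not code from this distribution: it does not load Sys::Binmode, it uses the core `Encode` module rather than `Encode::Simple`, and `can_downgrade` is a hypothetical helper that merely checks whether a string could be downgraded (which, per the description above, is what this module needs to do before handing a string to the OS).

```perl
use strict;
use warnings;
use Encode ();

# A character (decoded) string; U+0142 and U+017A lie above 255.
my $filename = "\x{142}\x{f3}d\x{17a}.txt";

# Hypothetical helper (not part of Sys::Binmode): true if every code
# point in the string is below 256, i.e. the string can be downgraded.
sub can_downgrade {
    my ($copy) = @_;    # utf8::downgrade() modifies in place, so use a copy
    # Second argument (FAIL_OK) makes downgrade return false instead of dying.
    return utf8::downgrade( $copy, 1 ) ? 1 : 0;
}

# Explicit encodes produce byte strings, which always downgrade.
my $utf8_bytes  = Encode::encode( 'UTF-8',    $filename );
my $utf16_bytes = Encode::encode( 'UTF-16LE', $filename );

print can_downgrade($filename),   "\n";    # 0: unencoded, cannot downgrade
print can_downgrade($utf8_bytes), "\n";    # 1: safe to hand to the OS
print length($utf8_bytes),  "\n";          # 11
print length($utf16_bytes), "\n";          # 16
```

Under these assumptions, passing `$utf8_bytes` (or `$utf16_bytes`, where the filesystem expects UTF-16LE) rather than `$filename` to a built-in such as `open` or `system` is the explicit encode that avoids the exception.
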
# WHERE ELSE THIS PROBLEM CAN APPEAR

The unpredictable-behaviour problem that this module fixes in core Perl is also common in XS modules due to rampant use of [the SvPV macro](https://perldoc.perl.org/perlapi#SvPV) and variants. SvPV is like the [bytes](https://metacpan.org/pod/bytes) pragma in C: it gives you the string's internal bytes with no regard for what those bytes represent. XS authors _generally_ should prefer [SvPVbyte](https://perldoc.perl.org/perlapi#SvPVbyte) or [SvPVutf8](https://perldoc.perl.org/perlapi#SvPVutf8) in lieu of SvPV unless the C code in question deals with Perl's encoding abstraction.

Note in particular that, as of Perl 5.32, the default XS typemap converts scalars to C `char *` and `const char *` via an SvPV variant. This means that any module that uses that conversion logic also has this problem. So XS authors should also avoid the default typemap for such conversions.

# LEXICAL SCOPING

If, for some reason, you _want_ Perl's unpredictable default behaviour, you can disable this module for a given block via `no Sys::Binmode`, thus:

    use Sys::Binmode;

    system 'echo', $foo;        # predictable/sane/happy

    {
        # You should probably explain here why you're doing this.
        no Sys::Binmode;

        system 'echo', $foo;    # nasal demons
    }

# AFFECTED BUILT-INS

- `exec` and `system`
- `do` and `require`
- File tests (e.g., `-e`) and the following: `chdir`, `chmod`, `chown`, `chroot`, `link`, `lstat`, `mkdir`, `open`, `opendir`, `readlink`, `rename`, `rmdir`, `stat`, `symlink`, `sysopen`, `truncate`, `unlink`, `utime`
- `bind`, `connect`, and `setsockopt`
- `syscall`

# TODO

- `dbmopen` and the System V IPC functions aren't covered here. If you'd like them, ask.
- There's room for optimization, if that's gainful.
- Ideally this behaviour should be in Perl's core distribution.
- Even more ideally, Perl should adopt this behaviour as _default_. Maybe someday!

# ACKNOWLEDGEMENTS

Thanks to Leon Timmermans (LEONT) and Paul Evans (PEVANS) for some debugging and design help.

# LICENSE & COPYRIGHT

Copyright 2021 Gasper Software Consulting. All rights reserved.

This library is licensed under the same license as Perl.