perl.md

use

use v5.10 # say, state, //, -r -w (-r $file && -w _)
use v5.12 # use strict, ...
use v5.14 # s///r
use v5.16 # fc
use v5.22 # regex modifier /n
use v5.24 # postderef: $ref->@* no longer experimental (5.20-5.22 need use feature 'postderef')
use v5.26 # <<~
use v5.32 # chained comparisons
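A minimal sketch of a few of these features in action. It assumes perl 5.16+ (needed for fc), and the variable names are illustrative:

```perl
use v5.16;      # enables strict, say, state, fc, ...
use warnings;

my $name     = undef;
my $label    = $name // 'anonymous';            # // (defined-or): 'anonymous'
my $greeting = 'hello world';
my $title    = $greeting =~ s/world/perl/r;     # s///r returns a new string, $greeting is untouched
my $same     = fc('STRASSE') eq fc('strasse');  # fc: case-insensitive comparison

sub counter { state $n = 0; return ++$n }       # state: initialized once, keeps its value between calls
counter() for 1 .. 2;
say "$label | $title | $same | ", counter();    # anonymous | hello perl | 1 | 3
```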

Statement, expression

 statement -> a unit of code executed for its effect, e.g. my $x = 1;
expression -> code that returns a value, e.g. 1 + 2

Arrays and Lists

, creates lists. () is only necessary for precedence, e.g. @list = (1, 2) since = binds tighter than ,

=> (fat comma) is the same as , but also quotes a bareword on its left
foreach is the same as for
@items = qw/one two three four five/ # quote words into a list ('', '', ...)

if (@items) # array in scalar (boolean) context: number of elements

@items[1..$#items] # slice: all bar 1st

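      my $count = @array # number of elements: array in scalar context
    my $last = (4, 5, 6) # 6: a list in scalar context is the comma operator, which returns the last value
    my ($first) = @array # 1st element: list context
my ($x, $y, $z) = @array # multiple assignment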

splice @items,  0,  2            # remove beginning  :         three four five | ~shift
splice @items,  1, -1            # remove   middle   : one                five |
splice @items, -2                # remove       end  : one two three           | ~pop
splice @items,  1,  3, qw/2 3 4/ # remove & REPLACE  : one 2   3     4    five |
splice @items,  2,  0, qw/2 3/   # remove 0 (INSERT) : one two 2 3 three four five |

shift, unshift, pop and push # special cases of splice
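A minimal sketch that checks the splice table above on the same sample list. Each splice works on a fresh copy, and the last four calls mimic shift, pop, unshift and push:

```perl
use strict;
use warnings;

my @items = qw/one two three four five/;

# splice returns the removed elements
my @copy    = @items;
my @removed = splice @copy, 1, 3, qw/2 3 4/;   # @copy: one 2 3 4 five, @removed: two three four

# shift/pop/unshift/push expressed with splice
my @a     = @items;
my $first = splice @a, 0, 1;                   # like shift @a (scalar context: last removed element)
my $last  = splice @a, -1;                     # like pop @a
splice @a, 0, 0, 'zero';                       # like unshift @a, 'zero'
splice @a, @a, 0, 'six';                       # like push @a, 'six' (offset @a = current length)
print "@copy | @removed | $first $last | @a\n";
```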

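 map expr, @items # transform: returns a new list (like a list comprehension)
grep expr, @items # filter: returns the elements for which expr is true; for membership tests prefer List::Util 'any' over ~~ (experimental, deprecated since 5.38)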
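A minimal sketch of the BLOCK forms of map and grep, plus List::Util's any for a membership test. The sample list is illustrative:

```perl
use strict;
use warnings;
use List::Util 'any';

my @items = qw/one two three four five/;

my @lengths = map { length } @items;           # 3 3 5 4 4
my %seen    = map { $_ => 1 } @items;          # build a lookup hash
my @long    = grep { length($_) > 3 } @items;  # three four five
my $count   = grep { /o/ } @items;             # scalar context: number of matches (one two four -> 3)
my $has_two = any { $_ eq 'two' } @items;      # membership test without ~~
print "@lengths | @long | $count | $has_two\n";
```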

print @items   # $, - output field separator, printed between print's arguments (empty by default); $\ - output record separator, printed after the last argument
print "@items" # $" - list separator used when interpolating arrays (a space by default)

merge two arrays and keep elements unique

my @unique = uniq(@array1, @array2);           # 1. use List::Util 'uniq'; keeps the first-seen order
my %merged; @merged{@array1, @array2} = ();    # 2. hash slice: every element becomes a key
my %merged = map { $_ => 1 } @array1, @array2; # 3. create a (key: $_, val: 1) pair for each item

methods 2. and 3. then need this (keys come back in no particular order):

my @unique = keys %merged;
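A minimal sketch comparing the three methods on two small sample arrays. Assumption: a List::Util recent enough to provide uniq (perl 5.26+ bundles one). The hash-based results are sorted because keys are unordered:

```perl
use strict;
use warnings;
use List::Util 'uniq';

my @array1 = qw/a b c/;
my @array2 = qw/b c d/;

my @unique1 = uniq(@array1, @array2);            # 1. a b c d (first-seen order)

my %merged;
@merged{@array1, @array2} = ();                  # 2. keys a b c d, values undef
my @unique2 = sort keys %merged;

my %merged3 = map { $_ => 1 } @array1, @array2;  # 3. same keys, values 1
my @unique3 = sort keys %merged3;
print "@unique1 | @unique2 | @unique3\n";        # a b c d | a b c d | a b c d
```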

Hashes

%items = @pairs;

%items = (
   key1 => 'val1', # keys auto quoted (same for $items{key1})
   key2 => 'val2'
);

while (my ($key, $val) = each %items)
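A minimal sketch of common hash operations on the %items example above:

```perl
use strict;
use warnings;

my %items = (key1 => 'val1', key2 => 'val2');

while (my ($key, $val) = each %items) {  # list assignment in scalar context: number of elements on the right, 0 when exhausted
    print "$key=$val\n";                  # pairs come out in no particular order
}

for my $key (sort keys %items) {          # sort the keys for a stable order
    print "$key=$items{$key}\n";
}

my $had = exists $items{key1};            # true: the key exists
delete $items{key1};                      # remove a key/value pair
my $size = keys %items;                   # keys in scalar context: number of pairs
```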

References

$ref = \$named_variable;      # \@, \%
$ref = [qw/anonymous array/]; # mnemo: []s access array elements
$ref = {anonymous => 'hash'}; #        {}s access hash elements
                use {$ref} anywhere an array/hash would be used ({}s are optional when $ref is a simple scalar)
               /
     %hash | %$ref      | $ref->%*
    @array | @$ref      | $ref->@*
$hash{key} | $$ref{key} | $ref->{key}
 $array[3] | $$ref[3]   | $ref->[3]
                              /
                 optional between 2 subscripts:

@array = ([...], [..x], [...]) - $array[1]->[2] <=> $array[1][2]
  $ref = [[...], [..x], [...]] - $ref->[1]->[2] <=> $ref->[1][2] <=> $$ref[1][2]    {}{} for hashes
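A minimal sketch of the access forms from the table, on an illustrative array of arrays and a hash. It assumes perl 5.24+ for postfix dereference:

```perl
use v5.24;   # strict, say, and postfix dereference without the experimental flag
use warnings;

my @array = ([1, 2, 3], [4, 5, 6]);    # array of array refs
my $ref   = \@array;
my %hash  = (list => [7, 8], name => 'x');
my $href  = \%hash;

my $a_elem = $array[1][2];             # 6: arrow optional between subscripts
my $r_elem = $ref->[1][2];             # 6: same element through the reference
my $old    = $$ref[1][2];              # 6: older syntax
my @row    = $ref->[0]->@*;            # 1 2 3: postfix deref of an inner array
my @keys   = sort keys $href->%*;      # list name: postfix deref of the hash
my $last   = $href->{list}[-1];        # 8
say "$a_elem $r_elem $old | @row | @keys | $last";
```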

Scope

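   my - lexical scope: visible only in the enclosing block or file
  our - lexically scoped alias for a package variable, which can also be accessed from outside as $Package::name
local - temporarily saves a global variable's value and restores it when the enclosing block exits (dynamic scope)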
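A minimal sketch of how local differs from my. The variable and sub names are hypothetical:

```perl
use strict;
use warnings;

our $level = 'global';            # package variable $main::level
sub show { return $level }        # sees whatever value $main::level has when called

sub with_local {
    local $level = 'local';       # saved now, restored when this sub exits
    return show();                # 'local': dynamic scope reaches called subs
}

sub with_my {
    my $level = 'lexical';        # a new variable, visible only in this block
    return show();                # 'global': show() doesn't see this lexical
}

my @seen = (with_local(), with_my(), show(), $main::level);
print "@seen\n";                  # local global global global
```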

example: $/, the input record separator (awk's RS)

local $/ = "\0"; # read null separated records
local $/;        # slurp file mode
local $/ = '';   # paragraph mode
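A minimal sketch of the three $/ modes, reading an in-memory filehandle so no real file is needed. The sample text is illustrative:

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\n\npara 2\n";

sub records {                       # read $text with the current value of $/
    open my $fh, '<', \$text or die "can't open in-memory file: $!";
    return <$fh>;                   # list context: all records
}

my @lines = records();                           # default $/ = "\n": 4 records
my @paras = do { local $/ = ''; records() };     # paragraph mode: 2 records
my @whole = do { local $/;      records() };     # slurp mode: 1 record
print scalar(@lines), ' ', scalar(@paras), ' ', scalar(@whole), "\n";  # 4 2 1
```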

environment variables

         private | public (inherited by children)
      -----------+--------------
Perl: my $EDITOR | $ENV{EDITOR}
Bash:     EDITOR | export EDITOR -> $ env # see public
                                    $ set # see all

Regex

Perl regex REPL

zero-width assertions don't consume chars => they are ANDed

hello(?=\d)(?!123) # followed by a number AND not followed by 123

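/$var/o - interpolate (and compile) $var only once since we know it's not going to change; rarely needed, as perl only recompiles when the interpolated value changes (or use qr//)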

  pre - $`
match - $&
 post - $'

backreferences

s/(\d+).\1/...$1/; # \1 (in the pattern) and $1 (in the replacement) hold the text actually captured, not the pattern \d+
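A minimal sketch showing that \1 repeats the captured text rather than the pattern. The sample strings are illustrative:

```perl
use strict;
use warnings;

my $same = '12-12' =~ /^(\d+)-\1$/;   # true
my $diff = '12-13' =~ /^(\d+)-\1$/;   # false: \1 is '12', not \d+

(my $dedup = 'the the cat') =~ s/\b(\w+) \1\b/$1/;  # $1 in the replacement: 'the cat'
```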

captures in list context

my ($ext) = $file =~ /\.(\w{3})/;
my @numbers = $version =~ /\d+/g; # /g in list context returns all matches
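A minimal sketch of how context changes what a match returns. The file name and version string are sample values:

```perl
use strict;
use warnings;

my $file    = 'report.txt';
my $version = 'v5.36.0';

my ($ext)   = $file =~ /\.(\w{3})/;       # parentheses on the left: list context, 'txt'
my $matched = $file =~ /\.(\w{3})/;       # scalar context: just true/false
my @numbers = $version =~ /\d+/g;         # no groups + /g: every match, (5, 36, 0)
my %pairs   = 'a=1 b=2' =~ /(\w)=(\d)/g;  # groups + /g: all captures, (a, 1, b, 2)
```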

lookaround

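          lookbehind       lookahead
positive  (?<= ... ) ---  (?= ... )   \K keeps what's left of it out of the match, like a variable-length (?<= ... )
negative  (?<! ... ) ---  (?! ... )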

⚠️ .+(?=bla) is greedy: the lookahead is zero-width, so .+ grabs as much as it can and the match stops before the LAST bla; use .+?(?=bla) to stop before the first one
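A minimal sketch contrasting greedy and lazy matches before a lookahead, plus \K and a fixed-length lookbehind. The sample strings are illustrative:

```perl
use strict;
use warnings;

my $s = 'xxblayyblazz';
my ($greedy) = $s =~ /(.+)(?=bla)/;    # 'xxblayy': stops before the LAST bla
my ($lazy)   = $s =~ /(.+?)(?=bla)/;   # 'xx': stops before the first bla

# \K: what matched before it is kept out of the match, so only 'bar' is replaced
(my $k  = 'foobar')    =~ s/foo\Kbar/BAZ/;         # 'fooBAZ'
(my $lb = 'price: 42') =~ s/(?<=price: )\d+/X/;    # fixed-length lookbehind: 'price: X'
```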

multilines and newlines in s///ms

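# with /m, ^ and $ also match around each \n: ^hello$\n^alien$\n^world$\n
# beware: inside a pattern, $ followed by \ or . interpolates the special variables $\ or $.!
$_ = qq/hello\nalien\nworld\n/;
my $multi  = s/^.+$/---/mgr;  # multilines: ^ and $ match at every line => "---\n---\n---\n"
my $single = s/lo.+wo/@@/sr;  # pretend $_ is a single line => . matches anything, including \ns => "hel@@rld\n"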

possessive quantifiers

no backtracking <=> don't give up characters

A++ is syntactic sugar for atomic group notation: (?>A+)

example:
'"abcd' =~ /"[^"]+"/
after matching "abcd, it's clear that no backtracking will change the fact that
a final " cannot be matched. Thus, in order to speed up failure, the pattern is
better rewritten as /"[^"]++"/

notes:

  • '"abcd"' =~ /"[^"]++"/ still matches.
  • the optimizer would've automatically turned the regex possessive in this simple case.
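A minimal sketch showing that a possessive quantifier never gives characters back, and that it behaves like the atomic group:

```perl
use strict;
use warnings;

my $plain      = 'aaa' =~ /^a+a/;      # true: a+ gives one 'a' back for the final a
my $possessive = 'aaa' =~ /^a++a/;     # false: a++ keeps all three a's
my $atomic     = 'aaa' =~ /^(?>a+)a/;  # false: same as a++

# the quoted-string example from above
my $open   = '"abcd'  =~ /"[^"]++"/;   # false, without backtracking into [^"]++
my $closed = '"abcd"' =~ /"[^"]++"/;   # true
```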

Subroutines

FMTEYEWTK on Prototypes in Perl

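# takes a scalar, and a 2nd optional one
# (with signatures enabled, e.g. by use v5.36, write sub get :prototype($;$) instead)
sub get ($;$) {
   my ($var1, $var2) = @_;                  # or: my $var = shift;
   my @res = grep { defined } $var1, $var2; # placeholder result
   return wantarray ? @res : $res[0];       # list or scalar, depending on the caller's context
}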
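A minimal sketch of wantarray with a hypothetical sub, words, that returns a list in list context and a count in scalar context:

```perl
use strict;
use warnings;

# hypothetical example: the words of a string, or their count in scalar context
sub words {
    my @res = split ' ', shift;
    return wantarray ? @res : scalar @res;
}

my @list  = words('a b c');   # list context: ('a', 'b', 'c')
my $count = words('a b c');   # scalar context: 3
my ($one) = words('a b c');   # list context again: 'a'
```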

ternary operator - cond ? true : false

printf "I've got %d camel%s", $ARGV[0], $ARGV[0] == 1 ? '' : 's';

printf is sometimes more readable than print

print 'Found a match at ', pos($str), "\n";
printf "Found a match at %d\n", pos($str);

sprintf is like printf but returns the string instead of printing it, so it can be passed to functions such as say, which lack formatting capabilities.
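A minimal sketch pairing sprintf with say, reusing the camel example. The format values are illustrative:

```perl
use v5.10;   # say
use strict;
use warnings;

my $camels = 3;
my $msg = sprintf "I've got %d camel%s", $camels, $camels == 1 ? '' : 's';
say $msg;                                                   # I've got 3 camels

my $padded = sprintf '%05.2f|%-4s|%x', 3.14159, 'ab', 255;  # '03.14|ab  |ff'
```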

date with format

strftime '%d-%b-%Y_%Hh%M:%S', localtime; # POSIX module
$now->strftime($format);                 # Time::Piece->new
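A minimal sketch of both approaches. It uses epoch 0 in UTC so the output is the same on every machine. CORE::gmtime sidesteps Time::Piece's override of gmtime:

```perl
use strict;
use warnings;
use POSIX 'strftime';
use Time::Piece;            # overrides localtime/gmtime to return objects in scalar context

my $posix = strftime '%Y-%m-%d %H:%M:%S', CORE::gmtime(0);  # '1970-01-01 00:00:00'
my $t     = gmtime(0);                                      # Time::Piece object
my $tp    = $t->strftime('%Y-%m-%d');                       # '1970-01-01'
```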

pack
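A minimal sketch of a few common pack/unpack templates (n: 16-bit big-endian, H: hex, A: space-padded text). The sample values are illustrative, and perlpacktut covers the rest:

```perl
use strict;
use warnings;

my $be   = pack 'n', 258;              # 16-bit big-endian unsigned: "\x01\x02"
my ($n)  = unpack 'n', "\x01\x02";     # back to 258
my $hex  = unpack 'H*', 'AB';          # hex dump, high nybble first: '4142'
my @cols = unpack 'A3 A3', 'foobar';   # fixed-width fields: ('foo', 'bar')
my $rec  = pack 'A5 A3', 'ab', 'xyz';  # space-padded fields: 'ab   xyz'
```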


evaluation in s//$1/

$add = 4 + 3;
$_ = 'Sum: $add';
s/(\$\w+)/$1/ee;
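without /e -> the replacement is a double-quoted string ("" interpolation)
   with /e -> the replacement is Perl code:
              the 1st /e evaluates $1, giving the string '$add',
              the 2nd /e evals that string as code, giving its value 7 => 'Sum: 7'
              (a string eval: never use /ee on untrusted input!)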
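A minimal sketch comparing no /e, /e and /ee on the same template. It uses /r so each result is a new string:

```perl
use strict;
use warnings;

my $add      = 4 + 3;                        # 7, computed here
my $template = 'Sum: $add';                  # single quotes: a literal '$add'

my $plain = $template =~ s/(\$\w+)/$1/r;     # no /e: "$1" -> '$add', unchanged
my $once  = $template =~ s/(\$\w+)/$1/er;    # /e: code $1 -> '$add', still unchanged
my $twice = $template =~ s/(\$\w+)/$1/eer;   # /ee: eval('$add') -> 7
print "$plain | $once | $twice\n";           # Sum: $add | Sum: $add | Sum: 7
```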

return values

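              s/// - number of substitutions (false if none; the new string with /r)
             chomp - number of chars removed
              grep - list (count in scalar context)
               map - list (count in scalar context)
if (my $var = ...) - the assigned $var (an lvalue), tested for truth
        each %hash - (key, value) in list context, next key in scalar context
              m//g - true/false in scalar context, all matches in list context
        shift, pop - element
     unshift, push - new number of elements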
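A minimal sketch that checks several rows of the table. The data is illustrative:

```perl
use strict;
use warnings;

my $text    = 'a-b-c';
my $subs    = ($text =~ s/-/+/g);            # 2 substitutions, $text is now 'a+b+c'
my $line    = "hello\n";
my $removed = chomp $line;                   # 1 char removed
my @stack   = (1, 2);
my $size    = push @stack, 3, 4;             # new number of elements: 4
my $top     = pop @stack;                    # the element: 4
my $evens   = grep { $_ % 2 == 0 } 1 .. 10;  # scalar context: 5
my $none    = ($text =~ s/x/y/);             # no match: false (empty string)
```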

Command line

perl -n # while (<>) { ...        } read           lines from files, add -l to chomp them (print then appends \n)
perl -p # while (<>) { ...; print } read and print lines from files

-a implies -n (since 5.20)
-F implies -an (since 5.20)

perl -00   # paragraph  mode
perl -0777 # file slurp mode

# to match newlines we need 'slurp' and //s
# use print $1: a bare print would output $_, i.e. the whole file
# 'while //sg' could be replaced with 'if //ms' when matching with ^ and/or $
perl -0777 -lne 'print $1 while /(---)/sg' file

sed

perl -lpe 's///' file

perl [-i(suffix)] -lp[0]e 's///' file
        \              \
         \              read null separated data
          edit in place

use a module

perl -M'Term::ANSIColor ":constants"' -E 'say YELLOW, "Hello", RESET'
perl -mTerm::ANSIColor=:constants -E 'say YELLOW, "Hello", RESET'

one liners

delete lines

perl -i -lnE 'say unless /.../ or /.../' file

search and replace in multiple files in parallel

rg -il mem | parallel -q perl -i -lpe 's/mem/Memory/ig'

print from field $3 to last

perl -lane 'print "@F[2..$#F]"' /my/file

regex based sort

perl -e 'print for sort {@m=map/(\d\.\d)/,$b,$a; pop@m<=>pop@m} <>' /my/file
perl -e 'print for sort {(split" ",$a)[1]<=>(split" ",$b)[1]} <>' /my/file # sort on 2nd field

replace line(s) with contents of file

perl -i -lnE '$name=...; $_=`cat ~/keys/$name` if /$name/; chomp; say' authorized_keys

namei -l

perl -e '$_=shift; push @paths, $`.$& while m{.*?/(?!$)}g; exec qw/ls -lhd/, @paths, $_' /my/file

disk usage pretty

du -ah0 -t100m -d1 | sort -hrz | perl -0lane 's:^\./:: for @F; print shift @F, " ", `ls -d --color "@F"`'

find files older than a day

perl -E 'for(<*>){say if-M>1}'

PerlIO: convert from dos/cyrillic to unix/utf8

perl -Mopen=':std,IN,crlf:encoding(cp1251),OUT,unix:encoding(utf8)' -lpe '' star_wars.sub > star_wars.srt

Precedence

or, and are the same as
||, && but with much lower precedence (even lower than =, hence open ... or die)
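A minimal sketch of the difference. The fallback sub is a hypothetical helper for the demo:

```perl
use strict;
use warnings;

sub fallback { return 'fallback' }   # hypothetical helper

my $empty = '';                      # a false value
my $x = $empty || fallback();        # || binds tighter than =: $x = ($empty || fallback()) -> 'fallback'
my $y = $empty or fallback();        # or binds looser than =:  ($y = $empty) or fallback() -> $y is ''
```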

Unicode

use utf8;                           # write Unicode characters in your source code
use Encode 'decode';                # decode @ARGV from UTF-8: @ARGV = map { decode('UTF-8', $_) } @ARGV; (perl -CA can also be used)
use open qw/:std :encoding(UTF-8)/; # Unicode with IO filehandles, e.g `say '零'`
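A minimal sketch of bytes versus characters with Encode. The byte string stands in for raw input such as @ARGV:

```perl
use strict;
use warnings;
use Encode qw/decode encode/;

my $bytes = "\xE9\x9B\xB6";            # UTF-8 bytes of '零' (U+96F6)
my $chars = decode('UTF-8', $bytes);   # a 1-character Perl string

printf "%d bytes, %d char, U+%04X\n", length $bytes, length $chars, ord $chars;  # 3 bytes, 1 char, U+96F6
my $back = encode('UTF-8', $chars);    # back to bytes before writing to a raw handle
```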

Exceptions

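try, catch is traditionally written as:

eval { ...; 1 } or do { my $error = $@; ... };  # safer than testing $@ afterwards, which can be clobbered

since 5.34: use feature 'try'; try { ... } catch ($e) { ... } # experimental until 5.40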
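A minimal sketch of the eval idiom. The risky sub is hypothetical:

```perl
use strict;
use warnings;

sub risky {                             # hypothetical sub that may fail
    my $n = shift;
    die "negative input\n" if $n < 0;   # trailing \n: no " at ... line N." suffix
    return sqrt $n;
}

my ($result, $error);
eval {
    $result = risky(-1);
    1;                                  # reached only if nothing died
} or do {
    $error = $@ || 'unknown error';     # $@ holds the die message
};
print defined $result ? "ok: $result\n" : "caught: $error";  # caught: negative input
```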

Traps

always chomp lines after reading them: backticks, <STDIN>, <$fh> from open


glob, <*> is safe for word splitting:
its arguments are only split on whitespace, not the returned files!
to glob a pattern containing spaces, quote it: <"*e f*">, glob '"*e f*"', or best, avoid glob completely:

opendir my $DIR, '.' or die "$!\n";
my @dotfiles = grep { -f and /^\./ } readdir $DIR;

use 3-arg open (open my $fh, '<', $file) and the list forms of system, exec and piped open ('-|', $cmd, @args) to:

  • be protected against clobbering, code execution, ... (>, |, ... in $filename)
  • avoid spawning a shell
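A minimal sketch of the list form of a piped open. It assumes a Unix-like system with echo in PATH. The argument reaches echo as a single word, and no shell interprets the ;:

```perl
use strict;
use warnings;

my $name = 'notes; echo injected';     # hostile-looking argument

# list form: no shell, so ';' is just a character in echo's argument
open my $fh, '-|', 'echo', $name or die "can't run echo: $!";
chomp(my $out = <$fh>);
close $fh or die "echo failed: $?";
print "$out\n";                        # notes; echo injected
```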

die "exception"; # without a trailing newline, " at SCRIPT line N." is appended

use

if (`lsof ...`)
vs
if (system('lsof', ...) == 0)

because lsof +D folder 'always' sets $? to 1 (man lsof, DIAGNOSTICS), so test its output rather than its exit status


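do not guard file operations with -X file tests because of race conditions (the file can change between the test and the use): just try the operation and handle the failure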

example: just use

`cat $file`
vs
`cat $file` if -f $file;

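each and //g are iterators returning one item per call in scalar (boolean) context, so use while instead of for (which evaluates them once, in list context):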

while (each %hash)
while (//g)

use

@backups[0 .. $#backups - 3]
vs
@backups[0 .. -3]

because .. counts up only
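A minimal sketch using an illustrative list of five backups:

```perl
use strict;
use warnings;

my @backups = qw/b1 b2 b3 b4 b5/;
my @old   = @backups[0 .. $#backups - 3];  # indices 0..1: b1 b2 (all but the newest 3)
my @empty = @backups[0 .. -3];             # 0 .. -3 is an empty range, so an empty slice
my @last3 = @backups[-3 .. -1];            # negative indices work when the range counts up: b3 b4 b5
```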


100 %  3 =  1 # 100 - ( 3 *  33) = 100 -  99 =  1
100 % -3 = -2 # 100 - (-3 * -34) = 100 - 102 = -2 <=> 100 % 3 - 3
              # => the result takes the sign of the right operand: with -3 it's either 0 or negative
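A minimal sketch covering all four sign combinations (without use integer):

```perl
use strict;
use warnings;

# Perl's % takes the sign of the right operand
my @results = (100 % 3, 100 % -3, -100 % 3, -100 % -3);
print "@results\n";          # 1 -2 2 -1
```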

Documentation

Easier access to Perl help topics

perldoc perl
perldoc perldoc
perldoc perlop
perldoc perlrun # command line options
perldoc File::Basename
perldoc -f split
perldoc -f -x # file test operators

Modules

use warnings;
use diagnostics;
use Getopt::Long 'GetOptions';
use File::Basename 'basename';
use File::Path 'make_path';
use Term::ANSIColor ':constants';
use List::Util 'any';

End of the program

__END__ or __DATA__

  • POD
  • comments
  • data that we want to process with while (<DATA>)