
Section: User Contributed Perl Documentation (1)
Updated: 2022-04-13
 

NAME

Perl::Tokenizer - A tiny Perl code tokenizer.  

VERSION

Version 0.10  

SYNOPSIS

    use Perl::Tokenizer;
    my $code = 'my $num = 42;';
    perl_tokens { print "@_\n" } $code;

 

DESCRIPTION

Perl::Tokenizer is a tiny tokenizer that splits Perl source code into a list of tokens, using regular expressions.

SUBROUTINES

perl_tokens(&$)
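This function takes a callback subroutine and a string. The callback is called once for each token, as soon as that token is recognized, and receives the token name together with the token's start and end positions.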

    perl_tokens {
        my ($token, $pos_beg, $pos_end) = @_;
        ...
    } $code;

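The positions are absolute offsets into the string: $pos_beg is the offset of the token's first character and $pos_end is the offset just past its last character.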
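
The callback only receives the token name and two offsets, so the token's text has to be cut out of the original string. A minimal sketch, assuming $pos_end is exclusive (as the offsets in the EXAMPLE section suggest); the @tokens array is a name chosen for illustration only:

```perl
use strict;
use warnings;
use Perl::Tokenizer;

my $code = 'my $num = 42;';
my @tokens;    # hypothetical container: [token name, token text] pairs

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Assumption: $pos_end points just past the token, so the length
    # of the token is the difference between the two offsets.
    my $text = substr($code, $pos_beg, $pos_end - $pos_beg);
    push @tokens, [$token, $text];
} $code;

# Print each token name next to the text it covers.
printf("%-20s '%s'\n", $_->[0], $_->[1]) for @tokens;
```

Collecting the pairs first and printing afterwards keeps the callback small; the same substr() call could equally be used directly inside the callback.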

 

EXPORT

The function perl_tokens is exported by default. This is the only function provided by this module.  

TOKENS

The following standard token names are available:

       format .................. Format text
       heredoc_beg ............. The beginning of a here-document ('<<"EOT"')
       heredoc ................. The content of a here-document
       pod ..................... An inline POD document, until '=cut' or end of the file
       horizontal_space ........ Horizontal whitespace (matched by /\h/)
       vertical_space .......... Vertical whitespace (matched by /\v/)
       other_space ............. Whitespace that is neither vertical nor horizontal (matched by /\s/)
       var_name ................ Alphanumeric name of a variable (excluding the sigil)
       special_var_name ........ Non-alphanumeric name of a variable, such as $/ or $^H (excluding the sigil)
       sub_name ................ Subroutine name
       sub_proto ............... Subroutine prototype
       comment ................. A #-to-newline comment (excluding the newline)
       scalar_sigil ............ The sigil of a scalar variable: '$'
       array_sigil ............. The sigil of an array variable: '@'
       hash_sigil .............. The sigil of a hash variable: '%'
       glob_sigil .............. The sigil of a glob symbol: '*'
       ampersand_sigil ......... The sigil of a subroutine call: '&'
       parenthesis_open ........ Opening parenthesis: '('
       parenthesis_close ....... Closing parenthesis: ')'
       right_bracket_open ...... Opening square bracket: '['
       right_bracket_close ..... Closing square bracket: ']'
       curly_bracket_open ...... Opening curly bracket: '{'
       curly_bracket_close ..... Closing curly bracket: '}'
       substitution ............ Regex substitution: s/.../.../
       transliteration ......... Transliteration: tr/.../.../ or y/.../.../
       match_regex ............. Regex in matching context: m/.../
       compiled_regex .......... Quoted compiled regex: qr/.../
       q_string ................ Single quoted string: q/.../
       qq_string ............... Double quoted string: qq/.../
       qw_string ............... List of quoted words: qw/.../
       qx_string ............... System command quoted string: qx/.../
       backtick ................ Backtick system command quoted string: `...`
       single_quoted_string .... Single quoted string, as: '...'
       double_quoted_string .... Double quoted string, as: "..."
       bare_word ............... Unquoted string
       glob_readline ........... <readline> or <shell glob>
       v_string ................ Version string: "vX" or "X.X.X"
       file_test ............... File test operator (-X), such as: "-d", "-e", etc...
       data .................... The content of `__DATA__` or `__END__` sections
       keyword ................. Regular Perl keyword, such as: `if`, `else`, etc...
       special_keyword ......... Special Perl keyword, such as: `__PACKAGE__`, `__FILE__`, etc...
       comma ................... Comma: ','
       fat_comma ............... Fat comma: '=>'
       semicolon ............... Semicolon: ';'
       operator ................ Primitive operator, such as: '+', '||', etc...
       assignment_operator ..... '=' or any assignment operator: '+=', '||=', etc...
       dereference_operator .... Arrow dereference operator: '->'
       hex_number .............. Hexadecimal literal number: 0x...
       binary_number ........... Binary literal number: 0b...
       number .................. Decimal literal number, such as 42, 3.1e4, etc...
       special_fh .............. Special file-handle name, such as 'STDIN', 'STDOUT', etc...
       unknown_char ............ Unknown or unexpected character
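
Because every token carries a name from the list above, the callback can select or discard source text by token type. The following is a rough sketch, not part of the module itself, that drops 'comment' tokens and keeps everything else; the sample input and the $stripped variable are made up for illustration, and $pos_end is assumed to be exclusive:

```perl
use strict;
use warnings;
use Perl::Tokenizer;

# Hypothetical input containing one #-to-newline comment.
my $code = <<'EOT';
my $x = 1; # set x
print $x;
EOT

my $stripped = '';

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Skip comment tokens; the newline after a comment is a separate
    # whitespace token, so line breaks are preserved.
    return if $token eq 'comment';

    $stripped .= substr($code, $pos_beg, $pos_end - $pos_beg);
} $code;

print $stripped;
```

The same pattern works for any other token name in the list, for example skipping 'pod' as well, or keeping only the string tokens.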

 

EXAMPLE

For this code:

    my $num = 42;

it generates the following tokens:

      #  TOKEN                     POS
      ( keyword              => ( 0,  2) )
      ( horizontal_space     => ( 2,  3) )
      ( scalar_sigil         => ( 3,  4) )
      ( var_name             => ( 4,  7) )
      ( horizontal_space     => ( 7,  8) )
      ( assignment_operator  => ( 8,  9) )
      ( horizontal_space     => ( 9, 10) )
      ( number               => (10, 12) )
      ( semicolon            => (12, 13) )
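
In the listing above, each token starts where the previous one ends, so the tokens cover the whole string without gaps. A small sketch that prints a similar listing and checks this property for the same input; the variables $expected_beg and $gaps are illustrative names, not part of the module:

```perl
use strict;
use warnings;
use Perl::Tokenizer;

my $code         = 'my $num = 42;';
my $expected_beg = 0;    # where the next token should start if there are no gaps
my $gaps         = 0;    # number of tokens that did not start where expected

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Print the token name and its offsets in the layout used above.
    printf("( %-20s => (%2d, %2d) )\n", $token, $pos_beg, $pos_end);

    $gaps++ if $pos_beg != $expected_beg;
    $expected_beg = $pos_end;
} $code;

print "gaps: $gaps\n";
```

If the offsets behave as in the listing, the last $pos_end equals length($code) and no gaps are reported.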

 

REPOSITORY

<https://github.com/trizen/Perl-Tokenizer>

AUTHOR

Daniel "Trizen" Șuteu, <trizen@protonmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2013-2017

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.22.0 or, at your option, any later version of Perl 5 you may have available.


 

