This program puts HTML links around URLs in files. It doesn't work on all possible URLs, but does hit the most common ones. It tries hard to avoid including end-of-sentence punctuation in the marked-up URL.
It is a typical Perl filter, so it can be used by feeding it input:
% gunzip -c ~/mail/archive.gz | urlify > archive.urlified
or by supplying files on the command line:
% urlify ~/mail/*.inbox > ~/allmail.urlified
The program is shown in Example 6.13.
#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs

$urls = '(http|telnet|gopher|file|wais|ftp)';
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any  = "${ltrs}${gunk}${punc}";

while (<>) {
    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $urls     :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              #   followed by a non-url char
       |                    # or else
       $                    #   then end of the string
      )
     }{<A HREF="$1">$1</A>}igox;
    print;
}
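To see how the minimal match and the look-ahead cooperate to keep end-of-sentence punctuation out of the link, here is a small sketch that applies the same substitution to a single string. The helper name urlify_line and the sample sentence are made up for illustration and are not part of the program above. The scheme list is written with non-capturing parentheses so that it also runs under use strict with lexical variables; this should not change what ends up in $1.

```perl
#!/usr/bin/perl
# urlify_demo - apply the urlify substitution to one sample line
use strict;
use warnings;

# Same character classes as urlify, declared as lexicals.
my $urls = '(?:http|telnet|gopher|file|wais|ftp)';
my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';
my $punc = '.:?\-';
my $any  = "${ltrs}${gunk}${punc}";

# urlify_line is a hypothetical helper, named here only for the demo.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{
        \b                  # start at word boundary
        (                   # begin $1
         $urls :            # scheme and a colon
         [$any] +?          # as few URL characters as possible
        )                   # end $1
        (?=                 # look ahead without consuming
         [$punc]*           # optional trailing punctuation
         [^$any]            #   followed by a non-URL character
         |                  # or else
         $                  #   the end of the string
        )
    }{<A HREF="$1">$1</A>}igx;
    return $line;
}

print urlify_line("See http://www.perl.com/CPAN/. It is mirrored widely.\n");
```

Run on that sentence, the period after CPAN/ stays outside the anchor: the minimal [$any]+? keeps growing only until the look-ahead can see optional punctuation followed by a character that cannot be part of a URL (here, the space).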