Perl web programming: Extracting form data by parsing the HTTP request message.
Most Perl web scripts that you find on the Internet or in web development books use the module CGI.pm. Many people consider CGI.pm obsolete, and it was removed from the Perl core years ago (with Perl 5.22). On the other hand, the most recent release of Strawberry Perl for Windows still includes the module, and I suppose that lots of people who write Perl scripts for the web still use it. Personally, I think that using CGI.pm to extract data from a web form (what some websites call "the good part of CGI.pm") is a simple and acceptable solution. Using the methods of this module to generate a dynamic web page (the part of CGI.pm that is criticized in particular) is something that I have never done. Anyway, this tutorial shows how to extract form data with Perl without using CGI.pm. The reader is supposed to have a basic knowledge of HTML and Perl CGI programming.
Let's start by creating a simple web page (filename "hello1.html"), where the user can enter their first name and last name and, by pushing a button, send this data to the web server, where it should be handled by a Perl script called "hello_user.pl". Here is the HTML of my page:
<html>
<head>
<title>Perl web programming (HTML input form)</title>
</head>
<body>
<form method="POST" action="/cgi-bin/hello_user.pl">
<br/>Please, enter your name:<br/><br/>
First Name <input type="text" name="firstname" value=""/><br/><br/>
Last Name <input type="text" name="lastname" value=""/><br/><br/>
<input type="submit" value="Submit"/>
</form>
</body>
</html>
The two input fields for the first name and last name are "packed" into a form, defined as a <form> element. One of this element's attributes is action, which is set to the URL of the script to be executed when the submit button is pushed (in our case: the script "hello_user.pl" located, as it should be, in the cgi-bin directory).
The two text fields for the first name and last name, entered by the user, are identified by a name using the name attribute. It's these names that will be used in the script to extract the content (values) of the text fields.
If we place the file hello1.html in the document root of the web server, we can access the page by entering localhost/hello1.html in the address bar of the web browser.
The script hello_user.pl, executed when the submit button is pushed, should simply return a web page with a greeting message: a personal greeting if both a first name and a last name have been entered, otherwise the message "Hello, World!". Here is the code, using the classic approach with CGI.pm:
#!C:/Programs/Strawberry/win64/perl/bin/perl.exe
use strict; use warnings;
use CGI;
use CGI::Carp "fatalsToBrowser";
my $cgi = CGI->new;
print $cgi->header();
# All form fields as a hash: field name => value
my %params = $cgi->Vars;
# Missing fields default to the empty string (avoids "uninitialized" warnings)
my $firstname = $params{'firstname'} // '';
my $lastname  = $params{'lastname'}  // '';
# Remove everything that is not an ASCII letter, an apostrophe, or a hyphen
$firstname =~ s/[^a-zA-Z\'\-]+//g;
$lastname  =~ s/[^a-zA-Z\'\-]+//g;
my $greeting;
if ($firstname and $lastname) {
    $greeting = "Hello, $firstname $lastname!";
}
else {
    $greeting = "Hello, World!";
}
print "<html>";
print "<head><title>Perl web programming (Perl script output)</title></head>";
print "<body><h1>$greeting</h1></body>";
print "</html>";
The form input data is extracted using the method $cgi->Vars. Assigning the return value of this method to the hash %params (you may use another name, of course), we can access the values of the form fields as elements of the hash, the keys being the names that we have given to the fields in the HTML code of the page with the form. So, to extract the first name and last name entered by the user, we use $params{'firstname'} and $params{'lastname'}.
The code of the script should be simple enough to understand. Just one remark: note the two lines that clean up the first name and last name by removing everything that is not a letter, an apostrophe, or a hyphen. This is a security measure against cross-site scripting (XSS), and a measure of this kind should be part of every script that outputs data entered by the user! Obviously, a "real world" script would have to proceed differently, because the filter used here only lets ASCII characters through (no accents, no umlauts, no non-Latin characters).
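One possible alternative to stripping characters is to keep the value as entered and to escape the HTML special characters just before the output. The following is only a minimal sketch: the helper name escape_html is hypothetical (it is not part of the scripts of this tutorial), and a real application would probably rely on a well-tested module for this task.

```perl
use strict; use warnings;

# Hypothetical helper (not part of the tutorial's scripts): replace the
# characters that have a special meaning in HTML by their entities.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

print escape_html('<b>Jean-Luc</b>'), "\n";   # prints: &lt;b&gt;Jean-Luc&lt;/b&gt;
```

With this approach, the browser displays the characters "<" and ">" as text instead of interpreting them as markup, and letters outside the ASCII range are not removed.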
The screenshots below show the web page generated by the Perl script. On the left, both the first name and the last name have been entered; on the right, one or both of the text fields have been left empty.
[Screenshots: the greeting page with both names entered (left) and with an empty name field (right)]
The usage of the method $cgi->Vars makes extracting the form data really easy. But, if we don't want to use CGI.pm, how can we proceed?
When we push a submit button on a web form, the browser sends a request message to the server. As described in my tutorial Perl web programming: HTTP headers and CGI environment variables, information concerning this message is available as CGI environment variables, which a Perl script can read from the hash %ENV. However, things are a little bit more complicated. In fact, the request message sent for a form can be of two very different kinds: either a POST request or a GET request. Which of the two is used depends on the method attribute of the <form> element on the "calling" web page.
In our file hello1.html, we have used the POST method. Let's try out what happens if we use GET instead. Create the file hello2.html, identical to hello1.html except for the form definition line:
<form method="GET" action="/cgi-bin/hello_user.pl">
Place the file hello2.html in the document root of your web server and access the page by entering localhost/hello2.html in the browser address field. The page displayed looks exactly the same as with hello1.html. If we push the submit button, the same greeting page as before is displayed. But look at the content of the browser address bar (the screenshot on the left corresponds to the page displayed when the name fields were filled in, the screenshot on the right to the page when these fields were left empty): the form data has been appended to the URL!
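[Screenshots: the browser address bar after submitting hello2.html, with the name fields filled in (left) and left empty (right)]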
Note: The screenshot on the left also shows how the symbols "<" and ">", entered as part of the last name, have been filtered out by our XSS-related Perl code.
Our examples show the major difference between GET and POST: in a GET request the form parameters are part of the URL, whereas in a POST request the form parameters are stored within the request body. There are several other important differences between the two request methods. For details, please, have a look at the article Difference between HTTP GET and POST Methods at the GeeksforGeeks website.
From what has been said, it's obvious that extracting the form data from the request message has to be handled differently for GET and for POST. Thus, the first thing our script has to do is to determine the request method. This is easy thanks to the CGI environment variable REQUEST_METHOD.
if ($ENV{REQUEST_METHOD} eq 'GET') {
    ...
}
elsif ($ENV{REQUEST_METHOD} eq 'POST') {
    ...
}
Extracting the form data in the case of a GET request is simple. In this case, the data is appended to the URL, using a string of the form
?<param1>=<value1>&<param2>=<value2>...
and the part of this string that follows the question mark is available as CGI environment variable QUERY_STRING.
Thus, all we have to do to get the different parameter-value-pairs is to split the content of this environment variable into an array.
my @param_value_pairs;
if ($ENV{REQUEST_METHOD} eq "GET") {
    my $query = $ENV{QUERY_STRING};
    if (defined $query) {
        @param_value_pairs = split /&/, $query;
    }
}
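To see what this split produces, the CGI environment can be simulated in a small test script that is run from the command line instead of by the web server. This is just a sketch: the two environment variables are set by hand, and the query string is a sample value that I made up for the form of this tutorial.

```perl
use strict; use warnings;

# Simulate the environment of a GET request (sample values, for testing only)
$ENV{REQUEST_METHOD} = 'GET';
$ENV{QUERY_STRING}   = 'firstname=Mary+Ann&lastname=O%27Brien';

my @param_value_pairs;
if ($ENV{REQUEST_METHOD} eq 'GET') {
    my $query = $ENV{QUERY_STRING};
    @param_value_pairs = split /&/, $query if defined $query;
}

# One parameter-value pair per line, still URL-encoded
print "$_\n" for @param_value_pairs;
# prints:
#   firstname=Mary+Ann
#   lastname=O%27Brien
```

Note that the values are still URL-encoded at this stage (the plus sign stands for a space, %27 for an apostrophe).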
In the case of a POST request the form data is included in the message body, and it's not possible to directly extract it. However, what we can do is determine the length of the request body using the CGI environment variable CONTENT_LENGTH, and then read that amount of data from the standard input. And again, all we have to do to get the different parameter-value-pairs is to split the data read this way into an array.
my @param_value_pairs;
if ($ENV{REQUEST_METHOD} eq 'POST') {
    my $query = "";
    # Read exactly CONTENT_LENGTH bytes of the request body from standard input
    read(STDIN, $query, $ENV{CONTENT_LENGTH}) == $ENV{CONTENT_LENGTH}
        or die "Incomplete request body\n";
    push @param_value_pairs, split /&/, $query;
}
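The reading of the request body can also be tried out without a web server. In the sketch below, an in-memory filehandle (here called $in, a name chosen for this example) stands in for STDIN, and the request body is a sample string; in the real CGI script, the data is read from STDIN as shown above.

```perl
use strict; use warnings;

# Sample request body, as a browser could send it for the form above
my $body = 'firstname=Mary+Ann&lastname=O%27Brien';

# Simulate the CGI environment of a POST request (for testing only)
$ENV{REQUEST_METHOD} = 'POST';
$ENV{CONTENT_LENGTH} = length $body;

# In-memory filehandle that plays the role of STDIN in this test
open my $in, '<', \$body or die "Cannot open in-memory file: $!";

my @param_value_pairs;
if ($ENV{REQUEST_METHOD} eq 'POST') {
    my $query  = "";
    my $length = $ENV{CONTENT_LENGTH};
    # A short read means that the request body is incomplete
    read($in, $query, $length) == $length
        or die "Incomplete request body\n";
    @param_value_pairs = split /&/, $query;
}
close $in;

print "$_\n" for @param_value_pairs;
# prints:
#   firstname=Mary+Ann
#   lastname=O%27Brien
```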
Having an array with the form's parameter-value-pairs, we can create a hash, with elements having the parameter name as key and the parameter value as value. Here is the code, as suggested in the article Forms and CGI, actually a preview of the O'Reilly book CGI Programming with Perl, 2nd Edition.
my %form_data;
foreach my $param_value (@param_value_pairs) {
    # Limit of 2: only the first "=" separates the name from the value
    my ($param, $value) = split /=/, $param_value, 2;
    next unless defined $param;    # skip empty pairs, as in "a=1&&b=2"
    $param =~ tr/+/ /;
    $param =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
    $value = "" unless defined $value;
    $value =~ tr/+/ /;
    $value =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
    $form_data{$param} = $value;
}
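Note the transformation of the parameter name and value, which decodes the URL-encoded characters: each plus sign is turned back into a space, and each percent sign followed by two hexadecimal digits is replaced by the character with that code.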
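The effect of the two decoding steps is easier to see with some sample strings. In the following sketch, the steps are wrapped into a helper that I called url_decode (a hypothetical name, the scripts of this tutorial do the decoding inline); the input strings are made-up examples.

```perl
use strict; use warnings;

# Hypothetical helper that wraps the two decoding steps
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                                  # "+" stands for a space
    $text =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;   # %XX -> character with code XX
    return $text;
}

print url_decode('Mary+Ann'), "\n";     # prints: Mary Ann
print url_decode('O%27Brien'), "\n";    # prints: O'Brien
print url_decode('a%3Db%26c'), "\n";    # prints: a=b&c
print url_decode('C%2B%2B'), "\n";      # prints: C++
```

The order of the two steps matters: the plus signs are translated first, so that an encoded plus sign (%2B) is decoded to "+" and not turned into a space afterwards.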
Here is the complete code of my hello_user2.pl:
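#!C:/Programs/Strawberry/win64/perl/bin/perl.exe -I"."
use strict; use warnings;
print "Content-type: text/html\n\n";
# Parse the request; display an error page if the request cannot be read
my $ref_query = parse_form_data()
    or error("Script error: Invalid HTTP request!");
my %query = %$ref_query;
# Missing fields default to the empty string (avoids "uninitialized" warnings)
my $firstname = $query{firstname} // '';
my $lastname  = $query{lastname}  // '';
# Remove everything that is not an ASCII letter, an apostrophe, or a hyphen
$firstname =~ s/[^a-zA-Z\'\-]+//g;
$lastname  =~ s/[^a-zA-Z\'\-]+//g;
my $greeting;
if ($firstname and $lastname) {
    $greeting = "Hello, $firstname $lastname!";
}
else {
    $greeting = "Hello, World!";
}
print "<html>";
print "<head><title>Perl test (Perl script output)</title></head>";
print "<body><h1>$greeting</h1></body>";
print "</html>";
#
# Displaying an error page and terminating the script
# (the Content-type header has already been sent)
#
sub error {
    my ($message) = @_;
    print "<html><body><h1>$message</h1></body></html>";
    exit;
}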
#
# Reading form data without CGI.pm
#
sub parse_form_data {
    # Modified from CGI Programming with Perl, 2nd Edition by Scott Guelich, Shishir Gundavaram, Gunther Birznieks (O’Reilly)
    # Webpage: https://www.oreilly.com/library/view/cgi-programming-with/1565924193/ch04.htm
    my %form_data;
    my @name_value_pairs;
    my $method = $ENV{REQUEST_METHOD} // '';
    if ($method eq 'GET') {
        my $query = $ENV{QUERY_STRING};
        return undef unless defined $query;
        @name_value_pairs = split /&/, $query;
    }
    elsif ($method eq 'POST') {
        my $length = $ENV{CONTENT_LENGTH} // 0;
        my $query  = "";
        # A short read means that the request body is incomplete
        read(STDIN, $query, $length) == $length
            or return undef;
        @name_value_pairs = split /&/, $query;
    }
    foreach my $name_value (@name_value_pairs) {
        # Limit of 2: only the first "=" separates the name from the value
        my ($name, $value) = split /=/, $name_value, 2;
        next unless defined $name;    # skip empty pairs, as in "a=1&&b=2"
        $name =~ tr/+/ /; $name =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
        $value = "" unless defined $value;
        $value =~ tr/+/ /; $value =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
        $form_data{$name} = $value;
    }
    return \%form_data;
}
Create hello3.html from hello1.html and hello4.html from hello2.html, replacing action="/cgi-bin/hello_user.pl" by action="/cgi-bin/hello_user2.pl", and place the two files in the document root of your web server. Place hello_user2.pl in the cgi-bin directory. Then try the new script with POST and GET by submitting the form at localhost/hello3.html and localhost/hello4.html, respectively. The output should be the same as before when using the script based on CGI.pm.
Note that the parsing routine, as described above, does not work correctly in the case where the form contains elements that share the same name, or if there are elements that support multiple values. In fact, in this case we'll receive multiple values for the same name. All name-value-pairs will be stored correctly in the array, but when creating the hash, we would overwrite all existing hash entries, and for a given parameter, only the last value would be saved.
There are ways to solve this possible issue; the one suggested in the O'Reilly book is to replace $form_data{$name} = $value by
if ( exists $form_data{$name} ) {
    $form_data{$name} .= "\t$value";
}
else {
    $form_data{$name} = $value;
}
If there is already a hash element with that key, then we append the new value to the existing one, separating the values by some character, for example a tab. Of course, this only makes sense if the script that uses the hash knows which parameters can have multiple values, and splits the value string of these parameters accordingly.
Another possibility would be to use a hash of arrays instead of a hash of strings. This way, we could store any number of values for a given parameter, including a single one.
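Here is a minimal sketch of the hash of arrays approach. It is not taken from the O'Reilly book; the sample pairs and the field name "color" are assumptions made for this example (think of a form with several checkboxes sharing the same name).

```perl
use strict; use warnings;

# Sample parameter-value pairs, as they could result from splitting
# the query string (made-up data for this example)
my @name_value_pairs = ('color=red', 'color=blue', 'firstname=Mary+Ann');

my %form_data;
foreach my $name_value (@name_value_pairs) {
    my ($name, $value) = split /=/, $name_value, 2;
    next unless defined $name;
    $value = "" unless defined $value;
    # Decode the URL encoding of both the name and the value
    for ($name, $value) {
        tr/+/ /;
        s/%([\da-f][\da-f])/chr( hex($1) )/egi;
    }
    # Each hash element is a reference to an array of values
    push @{ $form_data{$name} }, $value;
}

foreach my $name (sort keys %form_data) {
    print "$name: ", join(", ", @{ $form_data{$name} }), "\n";
}
# prints:
#   color: red, blue
#   firstname: Mary Ann
```

The price to pay is that every value has to be accessed through an array reference, even for parameters with a single value, for example $form_data{firstname}[0].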