BUILDING PAGES

If you run a small site with modest traffic, having pages generated on the fly by CGI scripts presents no real problem to your web server. However, as site traffic increases, this situation changes. It becomes ever harder for the web server to keep up with requests, because each time a page needs to be generated the Perl interpreter must be invoked and the script code compiled and executed. One way to lighten the server load is to turn CGI pages whose output changes relatively infrequently into an equivalent set of static HTML pages. The browsable Category, Team, and Yellow Pages produced by Review Foundry are obvious candidates for this process, and each can be turned into a static directory that you can build on, say, a weekly basis. Review Foundry can also convert Member and Supplier Profile pages to HTML, as well as Top Reviewer pages.

Via The Browser

There are better ways to build pages than via your browser, which is the method discussed in this section. However, for reasons discussed in the next sections, those other methods may not be viable, so the browser method may be your only option.

To build pages via your browser, go to the Build control panel. There you'll see options to separately build each of the Category, Team, and Yellow Page branches. There are also options to build pages for the Member and Supplier Profiles, and the Top Reviewer pages. You will also see an option to build all of these branches, one after the other.

The process is performed in steps (i.e. it is staggered). Up to a couple of hundred pages can generally be created before your web server reaches its timeout limit and the process is unexpectedly aborted. To keep this from happening, you should ensure that not too many pages are attempted at any one time. Built pages are presently batched by Container for the Category, Team, and Yellowpage branches, and by blocks of 100 for built profiles and reviewer pages.
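The idea of batching can be illustrated with a short Perl sketch. This is not Review Foundry's own code; the sub name make_batches and the page names are hypothetical, and the block only shows how a work list might be cut into blocks of 100 so that each step stays well inside a timeout limit.

```perl
use strict;
use warnings;

# Split a list of work items into batches of at most $size items each.
# (Illustrative helper only; not part of Review Foundry.)
sub make_batches {
    my ($size, @items) = @_;
    die "batch size must be at least 1\n" if $size < 1;
    my @batches;
    push @batches, [ splice(@items, 0, $size) ] while @items;
    return @batches;
}

# Hypothetical work list: 250 profile pages waiting to be built.
my @pages   = map { "profile_$_.html" } 1 .. 250;
my @batches = make_batches(100, @pages);

printf "%d batches; sizes: %s\n",
    scalar @batches, join(', ', map { scalar @$_ } @batches);
```

A staggered build would handle one such batch per browser refresh, then move on to the next.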
You can specify (as a configuration option) the number of Containers to be processed together before the browser page is refreshed and the next batch is begun. If your Containers are relatively light, and each has no more than a few dozen Things within it, you might process 5 or 10 Containers per staggered page. If you are building the various possible Thing and Review orderings, the number of Containers that can safely be processed will decrease correspondingly. You may be required to process as few as a single Container per browser page. The number you decide upon is one of the configuration variables that can be set from the Build / Browse frame of the Configure control panel.

It is recommended that you keep Containers on the small side, if possible. If you cannot do so and you run into timeout problems, then consider building pages from the command line (see next method).

Note: The implementation of page building via the browser is handled by something called an NPH process (for non-parsed headers). For NPH processes the page headers are NOT handled by the server, and instead the script produces all of its own header info. However, on some servers this is forbidden and you may see a 500-type error message produced when you try to build pages via the browser. If that is the case, you can try toggling the nph_headers configuration variable found on the Configure > Build / Browse page. This variable allows you to switch off the NPH header management and hand the header generation process back over to the web server. So try this if you see an error of the form "Error 500 - Internal server error".

Via The Command Line

If you have telnet (or SSH) access to your site, you can log in and run the build script via the command line. This method has the advantage that it is (somewhat) faster than the equivalent process carried out from the browser, because no CGI processing is involved.
Also, building can take place in one (generally) long uninterrupted job--unlike building from the browser, where the process is split into many smaller jobs to reduce memory consumption and avoid timeout limits. But it still suffers from one drawback shared by the browser method of building--the process needs to be carried out manually. In the next section a possible solution to that problem is discussed.

The command line invocations for a telnet-initiated build can be one of the following (this assumes you are issuing the command from the Review Foundry /do/admin directory which should be directory protected):

perl ./nph-admin.cgi --do=BuildAll
perl ./nph-admin.cgi --do=BuildItem
perl ./nph-admin.cgi --do=BuildMember
perl ./nph-admin.cgi --do=BuildSupplier
perl ./nph-admin.cgi --do=BuildMemberProfile
perl ./nph-admin.cgi --do=BuildSupplierProfile
perl ./nph-admin.cgi --do=BuildReviewer

If you wish to build all the static directories listed in your build plan (see the corresponding configuration variables for building), use the first command with the 'BuildAll' argument. In this case, if Member and Supplier Profiles are in the build plan, they will be built first, then Top Reviewers. If your Yellow Page directory is in the build plan the Supplier pages will be built next. If your Team directory is in the build plan the Member pages will follow. Finally, if your Category pages are in the build plan the Item pages will be built. If you need to build only one of the directories in your build plan, use one of the other commands shown above.

Note: If you are building all possible Thing and Review orderings, the build process is going to take a long time, particularly if you have a lot of Things and Reviews in your database.

Conflicting User Identities

WARNING: If you decide to switch between browser-based building and command-line building, don't expect to get anywhere unless the process executes as the SAME user in both cases. Why?
Because if one process builds static files which are then owned by user A, and then the other process attempts to overwrite those same files as user B, permissions on the files will likely prohibit any overwriting from taking place and the process may seem to die mysteriously (unless you have 777 permissions on everything). Generally both processes on a server will execute as the same user, but I have spent hours looking for problems in code when it has turned out that conflicting user identities are the cause of the problem.

Dealing With Timeouts

There is also a configuration variable that can help you if your server times out for any reason while building from the command line. If your maximum build time for any process is, say, 60 minutes, and you set build_expiration_in_minutes to 120 minutes, then you can get the build process to pick up where it left off if the process dies unexpectedly due to a server problem. Provided the process is restarted within the period before the expiration, it will skip rebuilding the pages already built. After the period has expired (for example, the next time you intend to rebuild pages) the build will start again from scratch. This feature is useful, for instance, if your Perl process runs out of memory during a long command-line build (which happens to be the actual motivation behind the addition of this feature).

If you are in the habit of running a cron job (see below) to perform the build, say daily, and timeouts are an issue, then there is a way to handle this. You can set up a monitoring script for the cron job. This script will check whether or not the build script is running, and if it is found not to be running it will relaunch the build. Basically the monitoring script runs a loop with a set number of iterations, and sleeps for a certain duration at the bottom of each iteration. The total monitoring time should be long enough to cover the execution of the build even if it is restarted a few times due to interruptions.

Here's an example of a monitoring script. You would cron this rather than the build script directly, which is referred to in the monitoring script:
#!/usr/bin/perl -w
#
# monitor.pl
use strict;

my $foundry_path = '/path/to/Foundry/do';
my $build_script = $foundry_path.'/admin/nph-admin.cgi';
my $build_args   = '--do=BuildAll';
my $monitor_cmd  = "ps a | grep $build_script";
my $launch_cmd   = "/usr/bin/perl $build_script $build_args";

## edit the number of $interval_durations so
## as to cover the expected total build duration
## ---------------------------------------------
my $interval_durations        = 9; ## 9 x 20 minutes = 3hrs monitoring
my $interval_duration_minutes = 20;
my $interval_duration_seconds = 60 * $interval_duration_minutes;

## format the current local time for the log messages
## --------------------------------------------------
sub formatted_time
{
    my @localtime = localtime();
    my $year  = $localtime[5] + 1900;
    my $month = ('Jan','Feb','Mar','Apr','May','Jun',
                 'Jul','Aug','Sep','Oct','Nov','Dec')[$localtime[4]];
    my $day   = $localtime[3];
    my $wday  = ('Sun','Mon','Tue','Wed','Thu','Fri','Sat')[$localtime[6]];
    my $hour  = sprintf("%02u",$localtime[2]);
    my $min   = sprintf("%02u",$localtime[1]);
    my $sec   = sprintf("%02u",$localtime[0]);
    return qq|$hour:$min:$sec $wday $day $month, $year|;
}

## start the monitoring loop
## -------------------------
for ( my $interval = 0; $interval < $interval_durations; ++$interval )
{
    my $monitor_result = `$monitor_cmd`;
    my @lines = split( "\n", $monitor_result );

    ## is the build running? (ignore the grep process itself)
    ## ------------------------------------------------------
    my $is_running = 0;
    my $index = 1;
    foreach my $line (@lines)
    {
        ## \Q...\E stops the dots in the path acting as regex wildcards
        ++$is_running if ( $line !~ /grep\s+\Q$build_script\E/ );
        print "line $index: $line\n";
        ++$index;
    }
    print "\nbuild is_running = $is_running so ";
    if ( $is_running )
    {
        print "do nothing\n\n";
    }
    else
    {
        print "launch $build_script at ", formatted_time(), "\n\n";
        ## system() does not return until the build process has exited
        system($launch_cmd);
    }
    sleep($interval_duration_seconds);
}
print "\nmonitor exiting at time ", formatted_time(), "\n\n";
Place this script into your /do/admin directory so that it can only be run by you. It should be apparent from the code that when the script first launches, it checks whether the build script is running by examining the process list. Then, not finding it, it launches the build and rechecks its process status every 20 minutes. If it finds the build has stopped, it relaunches it. Otherwise it sleeps 20 minutes before checking again. The length of time you run the monitor is determined by the product of $interval_durations and $interval_duration_minutes, which you can adjust as needed. If the build never gets interrupted then the total monitoring run time is irrelevant.

Depending on your operating system, you may be required to alter the exact form of the $monitor_cmd variable. The one shown above should work for a Linux setup.

Via Cron Job

If you know how to run scheduled cron jobs--automated execution of programs--you may be able to set things up so that the build process takes place according to a preset schedule that requires no human intervention. However, many web hosts RESTRICT the amount of CPU time that can be allocated to a single cron job. If this is the case for you, very likely you will find yourself running into timeout problems yet again. Possibly, the cron job may only be of use to you if you are running your own dedicated web server and you can remove the time limit for cron execution. Check with your web host about timeout limits for cron jobs before you invest time trying to get the build process automated.
Otherwise, if you believe that setting up a cron job should be feasible, edit your crontab file and add something like the following lines:

38 1 * * 1 perl /path/to/nph-admin.cgi --do=BuildSupplierProfile --cron=1
38 2 * * 1 perl /path/to/nph-admin.cgi --do=BuildMemberProfile --cron=1
38 3 * * 1 perl /path/to/nph-admin.cgi --do=BuildReviewer --cron=1
38 4 * * 1 perl /path/to/nph-admin.cgi --do=BuildItem --cron=1
38 5 * * 1 perl /path/to/nph-admin.cgi --do=BuildMember --cron=1
38 6 * * 1 perl /path/to/nph-admin.cgi --do=BuildSupplier --cron=1

This example rebuilds the Supplier Profile, Member Profile, Top Reviewer, Category, Team, and Yellow Page branches every Monday at 1:38, 2:38, 3:38, 4:38, 5:38, and 6:38 A.M. respectively. It assumes that each individual build takes less than an hour to complete. Alternatively, if you cannot be sure of the time required to complete one of the build arms, you can elect to build the lot, one after the other, like this:

38 2 * * 1 perl /path/to/nph-admin.cgi --do=BuildAll --cron=1

The extra --cron=1 argument ensures that logging to the screen is switched off unless an error message needs to be output. This ensures that any email message sent to you after your cron jobs are completed remains of manageable size.

If you cannot run cron jobs, try to use the telnet method instead. If that isn't possible, try the browser method.

Note: If you discover that your builds are terminated due to timeouts, try using the monitoring script method of the previous section. Cron the monitor, rather than the build script directly. E.g.

38 1 * * 1 perl /path/to/monitor.pl

You would, of course, edit the content of monitor.pl so that it launches the relevant build command.
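For readers unfamiliar with the crontab format, the staggered schedule can be reproduced with a small Perl sketch. The sub name staggered_crontab is hypothetical and the script path is the same placeholder used in this section; the sketch only shows how the five schedule fields (minute, hour, day of month, month, day of week) line up with the build targets.

```perl
use strict;
use warnings;

# Build one crontab line per build target, each starting an hour after
# the previous one. Field order in a crontab entry is:
#   minute  hour  day-of-month  month  day-of-week  command
# (Illustrative helper only; not part of Review Foundry.)
sub staggered_crontab {
    my ($minute, $first_hour, $weekday, $script, @targets) = @_;
    my @lines;
    my $hour = $first_hour;
    foreach my $target (@targets) {
        die "schedule runs past midnight\n" if $hour > 23;
        push @lines,
            "$minute $hour * * $weekday perl $script --do=$target --cron=1";
        ++$hour;
    }
    return @lines;
}

my @targets = qw(BuildSupplierProfile BuildMemberProfile BuildReviewer
                 BuildItem BuildMember BuildSupplier);

# Day-of-week 1 is Monday; the first build starts at 1:38 A.M.
print "$_\n"
    for staggered_crontab(38, 1, 1, '/path/to/nph-admin.cgi', @targets);
```

Widening the gap between entries is the simplest adjustment if a single build might take longer than an hour.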
Meaning Of The Build Plan

When static pages are built, Review Foundry has to have some idea about what static pages will ultimately be created in the build so that it can put in links to these pages before they are actually built (since not everything can be built all at once). This is handled by specifying a set of "build plan" variables, which can be found on the Configure > Build / Browse control panels. In fact, these are the build plan variables you will find there:

These 3 build plan options specify whether or not you intend to have static review pages generated for the Category, Team, and Yellowpage branches respectively. Generally if you are building for any one branch, and have not disabled the other branches, you'll build pages for those too (though it's not a requirement).

Both Members and Suppliers can have profile pages associated with them. If you want to build the profile pages for either of these things, set the relevant build plan variable shown above.

When you offer pages of Top Reviewers you can opt to build static pages for these too.

If you would like to offer an RSS feed to every member, which contains a summary of the most recent reviews they have written, set the build plan option shown above (and of course, be sure to build the pages regularly too).

When the build plan option shown above is enabled, all your custom RSS feeds are rebuilt each time you build static pages for Top Reviewers. Otherwise you can build your custom RSS feeds from the Miscellaneous control panel.

These 3 build plan options allow you to build RSS feeds of latest reviews for all Items, Members, or Suppliers respectively. Note that these are reviews written about these things because they are attached to a Category, Team, or Yellowpage.

Finally, articles can also be built as static pages, and this is the build plan variable that needs to be enabled so that you can do that.
NOTES

When you "Build All" the program builds everything according to the plan, and puts in all the static links assuming that the entire build will proceed to completion. In fact, you should probably ALWAYS elect to "Build All", because if you only build (for example) the Supplier pages, but the Member Profiles are mentioned in the plan, the links to static Member Profile pages will be inserted into static pages (and dynamic pages too), but they will lead nowhere because the Member Profile pages were not actually built. If you hit "Build All" the Member Profile pages will actually be built before the Supplier pages are. So "Build All" is generally the best way to go. You won't have to remember what to build and what need not be built.

Once you have done a build, you can point your browser to the build root page (the URL of which is used to create the link labelled Static in your admin navigation bar on the far right) to see the results. Static links will also appear in most (but not all) places on the dynamic pages, so anyone who starts on the dynamic pages will end up on static ones fairly quickly.

Building Compressed Pages

If you build static pages you will see that a fair amount of disk space is devoted to the result. In particular, if you have defined a number of rating attributes, and have allowed pages to be built with sortings based on the average value of those rating attributes, a LOT of disk space is chewed up. You do have the option of NOT offering those review sortings to visitors. See the Configure > Build / Browse control panel if you wish to deactivate those review sorting options in static pages. Note that you will need to remove your static pages and rebuild them if you do this. Review Foundry does not delete old versions of built pages at present (rather it simply overwrites existing pages).
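To see how much space and how many files a static build actually occupies, a short Perl sketch along the following lines may help. It is not part of Review Foundry; the sub name tally_static_tree and the default path /path/to/static are hypothetical, so pass the real location of your static directory as the first argument.

```perl
use strict;
use warnings;
use File::Find;

# Count the regular files under $root and sum their sizes in bytes.
# (Illustrative helper only; not part of Review Foundry.)
sub tally_static_tree {
    my ($root) = @_;
    my ($files, $bytes) = (0, 0);
    find(
        sub {
            return unless -f $_;   # skip directories and other non-files
            ++$files;
            $bytes += -s _;        # reuse the stat done by -f above
        },
        $root
    );
    return ($files, $bytes);
}

# Hypothetical default location; override it on the command line.
my $root = shift(@ARGV) || '/path/to/static';
if ( -d $root ) {
    my ($files, $bytes) = tally_static_tree($root);
    printf "%d files, %.1f MB under %s\n",
        $files, $bytes / (1024 * 1024), $root;
}
else {
    print "no such directory: $root\n";
}
```

Running it before and after a change to the review sorting options gives a direct measure of what those options cost.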
If you have a large database of review items and want to keep all those review sorting options when building pages, and happen to be hosted on an Apache server that has the mod_gunzip module installed, then there is another option: build the static pages in compressed gzip format. This option is offered from the Configure > HTML Compression control panel. Instead of building pages like harley_motor_cycle.html, Review Foundry will allow you to write pages as harley_motor_cycle.html.gz, where the content is gzipped and occupies around 25 percent of the disk space of the equivalent uncompressed file. The mod_gunzip module will negotiate with browsers, sending them the compressed pages when they request them, saving significantly on bandwidth as well as disk space. For browsers that cannot handle the gzip format, mod_gunzip will inflate the file before sending it.

The lesson to take away here? If you can find a decent web hosting company that offers Apache solutions with mod_gunzip as an option, strongly consider using them as your hosting platform, particularly if you intend to create a fairly large Review Foundry database (with several thousand or more reviewable items).

Reducing Total Page Count

When you build pages there are some important considerations to keep in mind. You don't want to run out of disk space, or exceed the total number of files you are permitted to create on your server. Let me show you how this happens...

One of my customers came to me recently. His build process wasn't working. It would not even start. After checking his site I found that during the initialization process, where the build log file was created, the server was denying the creation of the file. The error message: "disk quota exceeded". I thought he had run out of disk space. But he came back later and told me his web hosting company had informed him that he had exceeded his allotment of 800,000 files. He thought this must be a mistake.
How could Review Foundry generate 800,000 files? Answer: Easy. Just don't take into consideration how many pages need to be built to accommodate all category sortings and review sortings.

My customer had imported 20,000 suppliers into his system. Now that's not a horrendous number, so I did a rough calculation for him to see whether it might be possible to create 800,000 files before collecting even a handful of reviews. The content of that rough calculation is reproduced below. Think about the important numbers that come up when you are planning your own system and you intend to build pages.

"Gary," I told him after my back-of-the-envelope calculation of the relevant numbers, "it's certainly possible to reach 800,000 files. Let me show you why." This is what I told him:
Let's do a quick rough calculation. Let's say you have M = 20,000 suppliers and N = 12 rating attributes (actually 6, but you have kept both ascending and descending sortings for each). Let's also assume each supplier resides in just one yellowpage and that P suppliers are listed per yellowpage (P = 10 below), and also that you have Q reviews listed per page, and R reviews in total.

There will be about (M/P)*N files for yellowpage listings, or (20,000 / 10) * 12 = 24,000 files.

There will be about M*N*(the average number of review pages per supplier) files for review listings. You have barely any reviews so let's say this is the minimum number M*N*1 = 20,000 * 12 * 1 = 240,000. If you place each supplier in multiple yellowpages you would have to multiply by that number, and there is some indication you did that. If you placed each supplier in 2 yellowpages on average then you would come up with 480,000 files.

Then there are supplier profiles. Another 20,000.

So very roughly, I count 284,000 files if you did not add suppliers to multiple yellowpages. That's a ballpark figure. You could easily reach 800,000 files if you added each supplier to, on average, 3 or more yellowpages.

--Stephen

So, the important factors here are (1) the number of review sortings due to rating attributes, and (2) the number of containers each reviewed thing is placed in. If you retain only the ascending or the descending sorting for each rating attribute (i.e. keep "sort by best service" and toss "sort by worst service") you'll cut the page count by roughly a factor of 2. Cut the number of rating attributes in half as well (do you really need them all?) and you halve the page total again. If you place each reviewed thing in multiple containers, the total number of pages increases by roughly the same factor. Place each item in 3 categories, and you triple your built page count.

Those are important considerations. Think about them before you go wild and assume you have endless disk space and file allocations to play with. You never do.
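The back-of-the-envelope estimate above can be written as a small Perl sketch, so you can plug in your own numbers before building. The sub name estimate_static_files and its parameter names are hypothetical. The sketch also makes one assumption the letter does not spell out: that yellowpage listing pages grow with the number of placements, just as review pages do.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Rough estimate of the number of static files a build will create.
#   suppliers                 M, the number of reviewable things
#   sortings                  N, rating-attribute sortings (asc + desc)
#   per_listing_page          P, things listed per yellowpage page
#   review_pages_per_supplier average review pages per thing (default 1)
#   placements                average containers per thing (default 1)
# (Illustrative helper only; not part of Review Foundry.)
sub estimate_static_files {
    my (%arg) = @_;
    my $m          = $arg{suppliers};
    my $n          = $arg{sortings};
    my $p          = $arg{per_listing_page};
    my $pages      = $arg{review_pages_per_supplier} || 1;
    my $placements = $arg{placements} || 1;

    # ASSUMPTION: listing pages scale with placements as well.
    my $listing  = ceil( $m * $placements / $p ) * $n;
    my $reviews  = $m * $n * $pages * $placements;
    my $profiles = $m;
    return $listing + $reviews + $profiles;
}

for my $placements ( 1 .. 3 ) {
    my $total = estimate_static_files(
        suppliers        => 20_000,
        sortings         => 12,
        per_listing_page => 10,
        placements       => $placements,
    );
    print "placements = $placements: about $total files\n";
}
```

With the figures from the letter this gives 284,000 files for one placement per supplier and 812,000 for three, which is how an 800,000-file allotment can be exceeded before many reviews exist.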
Things That Can Go Wrong When Building

In this section I am just going to list some problems I have come across with customers who have had trouble building pages for one reason or another. They are presented in no particular order, and some are copied from the TROUBLESHOOTING section of the manual.
Copyright © 2004 Random Mouse Software. All Rights Reserved.