During the examination, it became apparent that many of the banned accesses, primarily in the "Sex" category, were banner ads. Some websites include banner ads served from other sites: internet advertising companies which send out hundreds of thousands of banner ads daily. Smartfilter was banning many of these banner ad sites under its "Sex" category, possibly because some of them displayed banner ads on sexually-explicit websites or displayed sexually-explicit banner ads. This seemed to inflate considerably the number of banned accesses under the "Sex" category, because someone could be visiting, say, Yahoo.com, and if Yahoo was using one of the banned banner ad sites, each page visited on Yahoo would lead to another entry in the logfiles as the banner ad was rejected. Thus it is entirely possible to generate banned accesses in these log files when the user never intended to visit any sort of sexually-explicit site (see also the discussion of wrongly banned sites).
To investigate this, we resolved to examine the log files without counting images. It is important to understand what happens when one accesses a web page:
Situation A) Load a page. The HTML file (the text on the page) is loaded first. It may reference anywhere from zero to hundreds of images in its code. When the HTML page is loaded, the individual images are then loaded. The total number of accesses is equal to one for the HTML page plus one for each image loaded.
Situation B) Load a page. The page is banned by Smartfilter and rejected. No images are loaded. The total number of accesses is one.
Thus, eliminating images from consideration effectively normalizes the distribution: accessing one page, whether it is banned or not, counts as one access. We felt this might be a more useful tally (although of course the straight access count is valid as well).
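The normalization described above can be sketched in a few lines of Python. This is only an illustration, not the procedure documented in the Methodology: the log records as (URL, banned) pairs, the IMAGE_EXTENSIONS list, the example hostnames and the function names are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical list of extensions treated as images; the real exclusion
# rules are those described in the Methodology, not these.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def is_image(url):
    """Return True if the URL path ends in a known image extension."""
    path = urlparse(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


def tally(entries, count_images=True):
    """Count (total, banned) accesses in a list of (url, banned) entries."""
    total = banned = 0
    for url, was_banned in entries:
        if not count_images and is_image(url):
            continue  # skip image requests on the no-images basis
        total += 1
        if was_banned:
            banned += 1
    return total, banned


# Situation A: one permitted page that pulls in three images, one of them
# a banner ad rejected by the filter. Situation B: one banned page.
sample_log = [
    ("http://www.example.com/index.html", False),
    ("http://www.example.com/logo.gif", False),
    ("http://www.example.com/photo.jpg", False),
    ("http://ads.example.net/banner.gif", True),
    ("http://banned.example.org/page.html", True),
]

print(tally(sample_log))                      # straight access count: (5, 2)
print(tally(sample_log, count_images=False))  # one access per page: (2, 1)
```

In this toy log the banner-ad ban disappears once images are excluded, and each page counts as one access whether it was banned or not.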
We undertook to exclude all images from the logfiles and recompile the statistics on a no-images basis. (See the Methodology for more information.) The results were roughly as expected. Excluding the images removed more than two-thirds of the overall hits. The number of banned hits dropped as well, but by a smaller proportion, so the percentage of bans increased across the board. The "Sex" category showed the greatest drop: almost half of the "Sex" accesses were eliminated, showing that very many of the original "Sex" bans were not caused by people looking for pornography but were banner ads, probably completely innocent, that were served from banned domains.
Tables 3 and 4. Statistics Excluding Images and Known Banner-Ad Sites
Examination of the log files also showed that late-night (i.e.,
adult) accesses were much more likely to be banned. The following
graphs relate the time of day and whether it was a school or
non-school day to the number of accesses and number of bans recorded
at that time. (Note that the bans are graphed separately from the
number of accesses because they are barely visible on the primary
charts. The charts also include a category called
"universal", which is shorthand for sites which are banned
under all of Smartfilter's categories. These universal bans
are on sites which might allow someone to get around the banning, such
as www.anonymizer.com.)
Figures 1-4 show total statistics. Both
charts showing total accesses (Figs. 1 and 3) show a very pronounced
"bell" shape, with many more accesses occurring during the
middle of the day. The curves for banned accesses (Figs. 2 and 4) show
somewhat flatter shapes, as more of the bans occur in the evenings and
late nights. Figures 5-8 show the same statistics with images
removed. These graphs are much flatter, especially the graphs of
banned sites. The fact that removing banned images flattens the
graphs shows that more of the "unintentional", banner-ad
bans occurred during the day while accesses at night were less likely
to be of this variety. Generally, school accesses which generated
"Sex" bans were more likely to be "innocent" than
accesses attributable to library or dial-up usage.
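The tallies behind such graphs can be sketched as follows. This is a minimal illustration under stated assumptions: the report does not say how school days were identified, so the weekday rule below (which ignores holidays) is hypothetical, as are the record layout and the function names.

```python
from collections import Counter
from datetime import datetime


def is_school_day(ts):
    """Assumption for this sketch: Monday through Friday is a school day."""
    return ts.weekday() < 5


def hourly_tally(entries):
    """Return (accesses, bans) Counters keyed by (school_day, hour)."""
    accesses, bans = Counter(), Counter()
    for ts, was_banned in entries:
        key = (is_school_day(ts), ts.hour)
        accesses[key] += 1
        if was_banned:
            bans[key] += 1
    return accesses, bans


# Hypothetical (timestamp, banned) records from the sample period.
sample = [
    (datetime(1998, 9, 14, 11, 5), False),   # Monday, late morning
    (datetime(1998, 9, 14, 11, 40), True),   # Monday, late morning
    (datetime(1998, 9, 19, 23, 15), True),   # Saturday, late night
]

accesses, bans = hourly_tally(sample)
for key in sorted(accesses):
    print(key, accesses[key], bans[key])
```

Keeping accesses and bans in separate counters mirrors the separate graphs: the ban counts are small enough to be invisible if plotted on the same scale as total accesses.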
Wrongful Bans
Wrongful bans are another important part of the equation. The
Appendix to this paper lists sites which Utah residents attempted to
access between September 10 and October 10, and which were censored.
It is important to note that these represent real people being banned
from real sites - the document at the banned URL is described
underneath the URL. Each URL listed was banned at least once during
the sample period. Many URLs were requested many times, but each
appears only once in the listing. In many cases, to avoid boring the
reader, we have listed only one URL from a site even though many URLs
from that site were requested and banned.
Some interesting evidence turns up upon scrutiny of the wrongful
bans listed in the Appendix. Secure Computing states:
"As a rule, sites are not added to the Control List without first being viewed and approved by our staff."
There is a great deal of evidence that this is untrue. Offspring.com is banned under the Criminal Skills category for lyrics which use phrases like "crack the codes", "tap", and "surveillance" - but it's a rock group, not a site discussing "Criminal Skills". A website about a computer game named "Grand Theft Auto" is banned for its "Criminal Skills". A scholarly paper about Nazi Germany is banned, as well as sites which oppose hate speech and racism. National Families in Action and the Life Education Network, two groups which oppose drug abuse, are banned. A music group called "Bud Good and the Goodbuds" is banned under Drugs, for obvious reasons. An appeals court decision in a drug case is banned under Drugs. The Iowa State Division of Narcotics Enforcement is banned. A government brochure put out by the National Institutes of Health, entitled "Marijuana: Facts for Teens", is banned under Drugs. A page at Florida State University is banned under Gambling:
http://mailer.fsu.edu/~wwager/index_public.html
Look carefully at that URL. Do you see the phrase "wwager" in it? The author of the page is named Walter Wager. That is why this page was banned under Gambling - because a computer, not a human, read that page and decided that since it involved "wager"ing, it should be categorized under Gambling. Similarly, a computer can read a page which uses the word "Narcotics", and, not realizing that it's the Iowa State Police, adds it to the list.
These are the sorts of mistakes computers make. Companies which make censoring software employ computers to search through the world wide web looking for materials which meet their criteria. A computer sees a page which uses the phrase "grand theft auto", and decides immediately that this must involve Criminal Skills, and adds the page immediately to the list of banned sites. No human would decide that the page for a computer game involved Criminal Skills, but a computer easily could. Nor would a page written by Walter Wager be classified under Gambling by a human - but a computer might.
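The kind of mistake described above is easy to reproduce. The sketch below is a deliberately naive keyword matcher. It is not Secure Computing's algorithm, which is not public; the category names come from this report, but the KEYWORDS table and the classify function are invented for illustration.

```python
# Hypothetical keyword table: the category names are Smartfilter's, but
# the trigger words are assumptions made for this example.
KEYWORDS = {
    "Gambling": ["wager", "casino"],
    "Drugs": ["narcotics", "marijuana"],
    "Criminal Skills": ["grand theft", "crack the codes"],
}


def classify(text):
    """Return every category with a keyword appearing as a substring."""
    lowered = text.lower()
    return sorted(
        category
        for category, words in KEYWORDS.items()
        if any(word in lowered for word in words)
    )


print(classify("http://mailer.fsu.edu/~wwager/index_public.html"))  # ['Gambling']
print(classify("Iowa Division of Narcotics Enforcement"))           # ['Drugs']
print(classify("Grand Theft Auto, a computer game"))                # ['Criminal Skills']
```

A matcher like this has no notion of who is speaking or why, which is exactly the difference between a program and a human reviewer.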
Secure Computing states that every site on their list was examined by a human before being added to the blacklist. (UEN, in a report to their superiors during the implementation of the Smartfilter system, stated that "[Smartfilter] uses educators to evaluate if the site is appropriate or not", emphasis added, although this is not and has never been true.) In fact, the most likely case is that humans may be employed to supervise and monitor the computers, and make some decisions about banning some sites, but that the computer program itself also adds sites to the blacklist on its own initiative. This is a cost-effective way to deal with the 500,000,000 or so web pages available over the internet, and since few customers will discover the errors that it creates, companies which make censoring software will be tempted to delegate more and more responsibility to computer programs. The Censorware Project has examined many different censorware products and to date, all of them have exhibited characteristics which indicated that the companies involved were lying about employing humans and humans alone to add sites to the list. Companies are unwilling to take the public relations hit that would come from admitting that computers perform the selection of "bad" sites, but they are also unwilling to take the financial hit from hiring the hundreds or thousands of humans it would take to have a chance at keeping up with the internet's rate of change and growth.
The other interesting factor disclosed by the wrongful bans is the ban on candyland.com. This domain now resolves to the corporate homepage of Hasbro, the toymaker (which owns the rights to the board game Candy Land). The domain was originally owned by a porn site, which Hasbro sued in 1996; the site was forced to stop using the name in, if our information is correct, February 1996. Its owners took all content off the site and for a while it was simply empty. The domain was formally turned over to Hasbro in March 1997. Therefore, Smartfilter has not reviewed this site since at least March 1997, and more probably since 1996, because any recent review would have found either no content or Hasbro instead of porn. This provides a good indication as to how frequently sites are re-reviewed for changed content - most likely, they are never re-reviewed. Candyland.com will stay banned as a porn site indefinitely (or until Secure Computing reads this report), although it has had no pornography for more than two years, because the constant growth of the internet requires censorware companies to spend their time searching for new sites, not reviewing old sites already on their blacklist. Over time, this will also lead to substantial errors as domains and individual users turn over, change content, etc.
Overridden Bans
UEN has the capability to override bans of certain sites. If a site comes to UEN's attention which should not be banned, yet is, UEN can enter this site into the software and allow it to be accessed. UEN puts several barriers in the way of actually implementing this, though; appeals regarding banned sites must run through the scholastic chain of command, through the school principal and district supervisor. Obviously few teachers or other persons encountering a wrongly banned site would pursue this, and this is shown by the extremely small number of sites which UEN has bothered to override from the default blacklist presented to them by Secure Computing. The following sites showed up as overridden in the log files:
http://209.75.21.6/
This is a company which serves banner ads.
http://www.mormon.com/
All things Mormon. Currently banned under Sex, but overridden. [Not entirely, see further discussion.]
http://fafsaws1.fafsa.ed.gov/
The Free Application for Federal Student Aid, a form which is required to be filed for all applicants for college financial aid.
http://netaddress.usa.net/
Free web-based email service. Currently banned under Chat (and thus not banned under UEN's settings, although it obviously was at some time).
http://www.cyberteens.com/
Stories and whatnot, by and for teenagers. Like mormon.com, this site has both banned and unbanned accesses.
http://www.infoseek.com/
All ads at Infoseek (http://www.infoseek.com/ads/) are banned under the Sex category. This was apparently causing some problems, or perhaps all of Infoseek was banned, so this override was placed.
Mormon.com and cyberteens.com are the most interesting. Both of these sites have instances in the log files where they are banned, and instances where they are permitted to be accessed due to an override placed by UEN. It is our suspicion that UEN attempted to override the bans on these sites, but did not do so for all of the eleven proxy servers - thus, in some areas of Utah, students attempting to access mormon.com will be banned from doing so and in some areas they will be allowed. Neither mormon.com nor cyberteens.com has any material inappropriate for teenagers.
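A check of this kind can be sketched as follows: group the log entries for each site by outcome and by proxy, and report any site that is banned on some proxies and overridden on others. The record layout, the status strings and the proxy names below are assumptions for illustration, not the actual log format.

```python
from collections import defaultdict


def inconsistent_sites(entries):
    """Return hosts that are banned on some proxies and overridden on others.

    Each entry is (proxy, host, status), where status is "banned" or
    "override". The result maps host -> {status: sorted list of proxies}.
    """
    proxies_by_status = defaultdict(lambda: defaultdict(set))
    for proxy, host, status in entries:
        proxies_by_status[host][status].add(proxy)
    return {
        host: {status: sorted(proxies) for status, proxies in statuses.items()}
        for host, statuses in proxies_by_status.items()
        if "banned" in statuses and "override" in statuses
    }


# Hypothetical entries; "proxy01" and "proxy02" are made-up names.
sample = [
    ("proxy01", "www.mormon.com", "override"),
    ("proxy02", "www.mormon.com", "banned"),
    ("proxy01", "fafsaws1.fafsa.ed.gov", "override"),
]

print(inconsistent_sites(sample))
```

If the suspicion above is correct, the banned and overridden entries for a site would fall on disjoint sets of proxies, as they do in this sample.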
Previous Evaluations
It is difficult to compare these statistics to past performance, in Utah or elsewhere, as this is the first comprehensive evaluation of censoring software performance in real-life conditions. (Documentation relating to the lawsuit filed in Loudoun County, Virginia, provides the second-best source of such information; available online at http://censorware.net/legal/loudoun/ ). A document written in November 1996, just after UEN began using Smartfilter, indicates that "less than 0.7%" of all accesses were banned at that time. During the intervening two years to September 1998, gross accesses have increased by 1300% and the percentage of accesses banned has decreased somewhat to approximately 0.4% (see previous tables). Another document from March 1998 indicates that at that time, 0.60% of all accesses were banned. Although UEN states that they undertake to evaluate the effectiveness and performance of the censoring software they use, they were unable to provide any documentation of ever having done so beyond compiling gross statistics on the number and percentage of accesses banned, which their software does automatically.
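These figures imply that the absolute number of banned accesses rose sharply even though the percentage fell. A rough worked calculation follows. It treats the November 1996 figure of "less than 0.7%" as an upper bound and takes the September 1998 figure as exactly 0.4%, so the result is only an approximate lower bound.

```python
growth_percent = 1300   # increase in gross accesses, Nov 1996 to Sept 1998
rate_1996 = 0.007       # "less than 0.7%", used here as an upper bound
rate_1998 = 0.004       # approximately 0.4%

# A 1300% increase means the later volume is 14 times the earlier one.
access_multiplier = 1 + growth_percent / 100

# Banned accesses scale with both the volume and the ban rate.
ban_multiplier = access_multiplier * rate_1998 / rate_1996

print(access_multiplier)          # 14.0
print(round(ban_multiplier, 1))   # 8.0
```

On these assumptions, roughly eight times as many accesses were being banned in September 1998 as in November 1996, and more than that if the 1996 rate was in fact well below 0.7%.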
It is perhaps worth noting that when UEN began providing internet access to Utah schools, the original plan was to allow each school district to create their own list of sites which they did not wish to be accessible over the internet, and for UEN to enforce these lists for each district. If they had implemented that plan, it is likely that documents such as the U.S. Constitution and Declaration of Independence would not today be banned in Utah.
This document last updated on Thursday September 07 2000.
Copyright © 1999 by the Censorware Project.
Redistribute freely in appropriate forums for non-profit uses only.