Censored Internet Access
in Utah Public Schools and Libraries

Methodology, Background and Sample Population

 

Log files created by UEN's eleven proxy servers running the Smartfilter software were collected and analyzed. A "proxy server" is a computer through which requests for documents on the internet pass. The software running on the proxy server examines each request as it is made and decides whether to accept it (and fetch the requested document) or reject it. The decision is made by consulting an encrypted list of internet addresses that Secure Computing Corporation has determined fall into one or more of 27 categories. Whether a request is accepted or rejected, the proxy server records the result in an electronic log file.
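The accept-or-reject decision described above can be sketched as follows. This is a minimal illustration only: Smartfilter's list is encrypted and its format is not public, so the `CATEGORY_LIST` table, the `decide` function, and the example hosts are all hypothetical stand-ins.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the vendor's encrypted list: host -> category names.
CATEGORY_LIST = {
    "casino.example": {"Gambling"},
    "mixed.example": {"Sex", "Drugs"},
}


def decide(url, active_categories, log):
    """Return True if the request is accepted, False if it is rejected.

    The outcome is appended to `log` either way, mirroring the way the
    proxy servers note every accepted or rejected access.
    """
    host = urlsplit(url).hostname or ""
    # A request is rejected when the host falls into any activated category.
    matched = CATEGORY_LIST.get(host, set()) & set(active_categories)
    accepted = not matched
    log.append((url, "ACCEPT" if accepted else "REJECT", sorted(matched)))
    return accepted
```

Matching on the host name alone is an assumption made for brevity; the real product may match on fuller addresses.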

The log files covered the period 10 September through 10 October 1998, inclusive, which represented 20 days when school was in session and 11 non-school days. The files contained the following information:

  • IP address of accessing computer [REDACTED by UEN]
  • Date/time of access
  • Method of access
  • URL accessed
  • HTTP specification
  • HTTP status code
  • Number of bytes returned
  • Smartfilter category classification
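To make the record layout concrete, here is a rough sketch of how one such line might be parsed. The actual on-disk format is not given in this report, so the whitespace-separated layout, the field order (taken from the list above), the single-token timestamp, and the comma-separated category field are all assumptions, as are the names `LogEntry` and `parse_line`.

```python
from typing import NamedTuple, Optional, Tuple


class LogEntry(NamedTuple):
    client_ip: str      # redacted by UEN in the files as received
    timestamp: str
    method: str
    url: str
    http_version: str
    status: int
    num_bytes: int
    categories: Tuple[str, ...]


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line in the ASSUMED layout; return None if malformed."""
    parts = line.split()
    if len(parts) < 7:
        return None
    try:
        status, num_bytes = int(parts[5]), int(parts[6])
    except ValueError:
        return None
    # Assumed: everything after the byte count is a comma-separated
    # category list, with "-" (or nothing) meaning "no category".
    raw = " ".join(parts[7:])
    categories = tuple(
        c.strip() for c in raw.split(",") if c.strip() and c.strip() != "-"
    )
    return LogEntry(parts[0], parts[1], parts[2], parts[3], parts[4],
                    status, num_bytes, categories)
```

Returning `None` for a line that does not fit the layout gives the caller a simple way to count and discard damaged records.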

The log files totaled 838 MB compressed (approximately 6.5 GB uncompressed) and included about 53 million lines of data, with each line representing one access or attempted access of a resource on the World Wide Web. Email, chat (except for web-based chat), and other methods of using the internet are not covered by these log files and are not known to be intercepted by Smartfilter.

The log files were not pristine. Files from the various proxy servers were combined into a single file for transmission to us. When received, they contained inconsistencies, especially in the places where one file ended and the next began. A very small number of URLs were discarded due to these errors in the files. This is not expected to bias the results in any measurable fashion.

The files were scanned by a custom-written Perl program, which collected statistics about the files and separated the URLs according to the category or categories under which they were banned. Smartfilter classifies documents on the web into 27 categories, any or all of which can be activated by the entity controlling the software (UEN). UEN has five of the 27 categories activated: "Sex", "Gambling", "Criminal Skills", "Hate Speech", and "Drugs". A URL can be classified in multiple categories. If a user tries to access a URL which is classified in any of the five categories which UEN has activated, the user will generally receive a message saying that access has been denied.
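The separation step can be illustrated with a short sketch. It is written in Python rather than Perl and is not the program used for this report; the function name `bucket_banned` and the input shape (pairs of URL and category names) are assumptions made for the example. Only the five activated category names come from the text above.

```python
from collections import defaultdict

# The five categories UEN has activated, as listed in the text.
ACTIVE = {"Sex", "Gambling", "Criminal Skills", "Hate Speech", "Drugs"}


def bucket_banned(entries):
    """Group banned URLs by activated category.

    `entries` is an iterable of (url, categories) pairs. A URL classified
    in several activated categories is filed under each of them; a URL
    classified only in non-activated categories is filed nowhere.
    """
    buckets = defaultdict(set)
    for url, categories in entries:
        for category in set(categories) & ACTIVE:
            buckets[category].add(url)
    return dict(buckets)
```

Using sets means a URL requested many times is listed once per category, which suits a later manual review of the banned URLs.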

Certain statistics regarding the files were also collected with image files excluded, to the extent possible. For these statistics, URLs ending in ".gif", ".jpg", or ".jpeg" were excluded, as were all URLs from domains known to be dedicated to serving banner ads. This process is slightly overinclusive - if a user visited a banner ad domain directly, to look up advertising rates for example, these accesses would be discarded. It is also moderately underinclusive - some sites serve image files by non-standard methods, which this procedure would not catch or discard.
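A sketch of this exclusion rule, under stated assumptions, shows both failure modes. The ad domain `ads.example` is a placeholder (the actual list of banner ad domains is not reproduced here), the name `is_excluded` is hypothetical, and treating the extension check as case-insensitive is an assumption.

```python
from urllib.parse import urlsplit

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg")
AD_DOMAINS = {"ads.example"}  # placeholder; the real list is not given


def is_excluded(url):
    """Return True if the URL would be left out of the image-free statistics."""
    host = (urlsplit(url).hostname or "").lower()
    # Every URL on a known banner ad domain (or a subdomain of one) is dropped.
    if host in AD_DOMAINS or any(host.endswith("." + d) for d in AD_DOMAINS):
        return True
    # Otherwise only the ending of the URL is examined.
    return url.lower().endswith(IMAGE_SUFFIXES)
```

With these placeholders, `http://ads.example/rates.html` is excluded even though it is an ordinary page (the overinclusive case), while `http://site.example/image.cgi?id=5` is kept even if it returns an image (the underinclusive case).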

UEN has no access to Smartfilter's list of banned sites. It can learn that a given site is banned only by attempting to access it and being blocked. UEN does not make additions to the list of banned sites and makes very few removals (see discussion). For all practical purposes, the makers of Smartfilter (Secure Computing Corporation of San Jose, California) make the final decisions as to what Utah students, adults and library patrons can view over the internet.

Costs to UEN for maintaining the censoring proxy servers are significant. A newspaper article indicated the cost for the software alone is approximately $20,000 per year. UEN's budget for fiscal year 1999 indicates they were allocated $12,000 for proxy software and $124,048 for proxy hardware. There are also substantial costs associated with the personnel to maintain and administer these servers.

After the files were scanned and separated into categories, the URLs which had been banned in each category were reviewed. For the four smaller categories, this review was simply scanning over the list looking for URLs which seemed "out of place". For the "Sex" category, another script (computer program) was employed to aid in the review. URLs which appeared to represent valuable resources wrongly or irrationally banned were called up (see discussion). The review process likely missed many such URLs. The review process should have been efficient enough to discover the majority (>50%; perhaps >75%) of the wrongful bans present in the log files. Note that this applies only to the list of wrongly or irrationally banned sites at the end of the report; the statistics and graphs presented were compiled by a simple computer program and are presumed to be 100% accurate. Achieving 100% accuracy in reviewing the banned sites would require humans to examine the document at every URL banned, which is prohibitively difficult.

The user population which created these log files is diverse. During school days, the users are predominantly public school students. The vast majority of Utah public schools are wired for internet access (although sometimes this means having one computer with a modem); a majority of the wired schools have computing facilities available to students. In general, it appears that many Utah high schools are well-connected and have multiple computers available to students, while many elementary schools may have only a single computer with a dial-up connection to the internet, which may be available only for class demonstrations and the like and not for student use. It is assumed that the bulk of student internet accesses are from students aged 13-18. From information provided by UEN and our own research, it is believed that all public elementary and secondary schools in Utah have all of their internet access provided through UEN's Smartfilter-equipped proxy servers.

According to the Census Bureau, Utah has some 500,000 residents between the ages of 5 and 17. Not all of them attend public schools; there are approximately 30 private schools plus the possibility of home schooling. Considering this factor and the factors listed previously, we believe a reasonable estimate for the number of public school students who had the opportunity to use the internet during the sampled time period is in the neighborhood of 100,000-150,000. We have no estimate for the number of library patrons and dial-up educators whose accesses contribute to the non-school day logs.

During non-school days, the users are predominantly library patrons, with a smattering of dial-up users (these people would also be present during the school days, of course, and make up a sort of "background" presence to the scholastic users). UEN provides dial-up access at all hours of the day and night to an unknown number of teachers statewide. Thus, accesses which occur at times when neither libraries nor schools are open come from these dial-up users. Of the 70 public library systems in Utah, at least eight use UEN's proxy servers, and probably several more. A few public libraries in Utah use other censoring software products. Many public libraries do not censor internet access. Urban areas may be less likely to have censored internet access than rural areas. Nationally, the average age of internet users is 30-35, so this population can probably be assumed to be significantly older than the student population.

Internet usage during school days is approximately 20 times that of non-school days.

This document last updated on Thursday September 07 2000.


Copyright © 1999 by the Censorware Project.
Redistribute freely in appropriate forums for non-profit uses only.
Censorware.Ørg.