
Lies, Damn Lies, and Statistics

A followup to our March 1999 Utah SmartFilter report

by Jamie McCarthy
for the Censorware Project

 

Abstract


The makers of SmartFilter issued a press release falsely claiming we proved its accuracy. This fabrication was used to help solidify the views of Sen. McCain, whose mandatory-censorware bill (S.97) has just been approved by committee; when it passes it will ensure your taxes pay for this software in schools and libraries. New analysis shows that when SmartFilter blocks a webpage, it is a wrong block one time in twenty (4.56%-5.24%). SmartFilter still blocks many sites we identified, including the SAFE website. Pornography, pro-drug information, and bomb-making instructions are accessible whether SmartFilter is used or not.


Introduction

In March 1999, we issued a report analyzing censorware performance in 54 million web accesses in the state of Utah. Our report shows unequivocally that the software in question, SmartFilter, sold by Secure Computing Corporation, banned many webpages which should never have been banned.

Moreover, the types of errors make it obvious that most of their blacklist has never seen effective human review. When the homepage of one Mr. Walter Wager is blocked as "gambling," that means a computer is making the decisions about what to block, not a person. We found it insulting that the Utah Education Network thought so little of their students and library patrons that they would use robots to decide what was OK to read.

We published our report and included a list of the sites which were banned. We also put forth an offer to send, on CD-ROM, for only the cost of duplication, a complete copy of our original data set - this was the full disclosure which ensured that our facts could be checked by anyone willing to sort through, as we did, many gigabytes of raw data.

On Friday, June 18, 1999 - three months later - Secure Computing Corporation met with Sen. John McCain (R-Ariz.). In conjunction with Sen. McCain's visit, the company put forth a grossly misleading press release which claimed that our report had proven their software 99.9994% accurate.

"The comprehensiveness of the research of The CensorWare Project and its subsequent findings validate just how valuable a tool SmartFilter can be for schools, businesses and any organization that is looking to implement an Internet Use Management Policy," said Gus Maldonado, product marketing manager at Secure Computing.

Today, the Senate Commerce Committee approved Sen. McCain's filtering bill (S.97), known as the "Children's Internet Protection Act," which mandates the purchase of censorware with your tax money. Censorware will be required in every school and library which receives E-Rate funds. To give an idea of what these costs will be, the Utah Education Network spends $20,000 per year on software alone. Their 1999 budget includes $124,000 for hardware, and these figures do not include staff salaries.

Secure Computing must be eagerly anticipating the explosive growth in their market.

Here is what Secure Computing did not do:


  • They did not contest any of the facts in our report.
  • They did not request the original data on CD-ROM. (In fact, to this day nobody outside our project has requested it, so we can safely say that Secure Computing has never seen the data.)
  • And they did not consult us before or after misappropriating our group's research for their own purposes.

The figure "99.9994%" was arrived at through a ridiculous procedure which consisted essentially of finding the smallest number mentioned in our report and dividing it by the largest. It bears no relationship to reality.

If this number "99.9994%" were the work of Secure Computing's computer programmers, we might consider it a miracle that their software works at all. However, this number - an astonishing claimed error rate of 0.0006% - was surely created by their marketing department for one purpose only: to solidify Sen. McCain's views that censorware should be forced, by law, into schools and libraries across the country.

The Censorware Project strongly protests this cynical and fraudulent distortion of our work.

One in Twenty

After Secure Computing's press release, we went back to our original data. Our original report included over a dozen charts and graphs showing, for example, when the busiest times of day were for the internet in Utah, and whether students or library patrons were more likely to try to access banned sites. (Student usage was approximately twenty times heavier than library usage, but library patrons were twice as likely to hit a banned site. Which is not necessarily, as we shall see, a properly-banned site.)

One figure which we did not bother to calculate or publish was the ratio of "correctly" banned accesses to "wrongly" banned accesses. That's what this followup is for.

The terms are a little fuzzy, of course, and no two people will draw the dividing line between "correct" and "wrong" in exactly the same place. For example, many public libraries carry Playboy as a magazine that can be read in the reference section. The website playboy.com is less racy than the print magazine; does that mean it should remain unbanned? One could make a case for this - but one could also point out that the banned accesses include those made from schools, and while a student doing a report on Coppola movies might like to read an interview with James Caan, most people would not have a problem with this material being censored there.

Perhaps this was the most important figure we should have published, because it gives some idea of how effective, in operation, the software is. It's important to remember that any censorship in the public arena is too much; but it may be more persuasive to put numbers on the harm done.

We have used our best judgement in estimating wrongly banned accesses - and then we stepped back to consider only those bans which would be inappropriate even in a school setting. For example, it is a violation of the First and Fourteenth Amendments to restrict adults from reading playboy.com or a lingerie website in a public library (according to Judge Brinkema), but we have not counted such blocks in the statistics that follow.

In Utah, about one of every 22 webpage accesses that SmartFilter blocked was blocked "wrongly." In other words, the overblocking error rate was about 5%.

There are several caveats that apply.

First, this includes only overblocking errors, not underblocking. We have only examined the 174,000 accesses which were blocked; nobody has time to examine the 54,000,000 accesses which were let through. One must presume that there is a certain amount of pornography which was viewed despite SmartFilter's best efforts, but we cannot know how much. Our report can therefore only identify a minimum amount of error in the product.

Second, this also does not include all overblocking errors. As the Methodology page made clear, it would have been prohibitively difficult to examine all 174,000 blocked accesses. We surely missed many. Again, the figures we cite refer to the minimum, known error.

Third, this does not include the sites which were overridden by the Utah Education Network. One of the sites wrongly blocked by SmartFilter was mormon.com, obviously a popular site in Utah. Why should Secure Computing be let off the hook for this improper block? It was in their blacklist and remains in their blacklist to this day. Counting these accesses would raise the error rate from 1 in 22 to 1 in 19. Either way, it's close enough: we'll call it five percent.

Statistics

Here are the details of how the 5% figure was arrived at - and how Secure Computing's 0.0006% figure was concocted.

The 31 days of logs which we obtained listed approximately 54,000,000 web accesses and attempted accesses. These are raw hits, where not only each page counts as an access but each GIF and JPEG on each page, as well as other supplementary types of data. Of these, the vast majority, 99.55%, were not affected by SmartFilter. The remaining 237,000 web accesses were blocked (including overridden blocks).

But raw hits are not an informative metric. We're not interested in how many computer accesses were blocked, but rather the number of times a human was prevented from seeing a webpage. We have removed supplementary data by simply discarding all URLs that end in ".gif", ".jpg", or ".jpeg". This is not a foolproof method, but is a reasonably accurate way to remove this data.
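The discarding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `is_supplementary` and `page_accesses` and the sample URLs are hypothetical, and the case-insensitive comparison is an assumption, since the text does not say how case was handled.

```python
# Hypothetical sketch: drop image hits so that only page accesses remain.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg")

def is_supplementary(url: str) -> bool:
    """Return True if the URL ends in one of the image suffixes."""
    # Assumption: suffixes are compared case-insensitively.
    return url.lower().endswith(IMAGE_SUFFIXES)

def page_accesses(urls):
    """Keep only the URLs that are not discarded as images."""
    return [u for u in urls if not is_supplementary(u)]

# Made-up sample data, not taken from the Utah logs.
sample = [
    "http://example.com/index.html",
    "http://example.com/banner.gif",
    "http://example.com/photo.JPG",
    "http://example.com/image.jpg?size=large",
]
print(page_accesses(sample))
```

The last sample URL survives the filter because a query string follows the extension, which is one way a suffix test of this kind falls short of foolproof.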

As our original report points out, many of these blocked images were banner ads which the software assumed to be sexually explicit. The assumption was wrong as often as not. If we had wanted to inflate our number of "wrong blocks," we would have included these blocked images, because four of the five top blocked-image domains are nonpornographic (chathouse.com, geocities.com, infoseek.com, and mormon.com). Those four alone account for 40% of blocked images, already eight times the rate of wrongly-blocked pages! Removing these from consideration ensures that each act of censorship which we refer to is an actual page blocked, not a failure to load an advertisement or other image.

After images are discarded, 122,700 webpages remain which were blocked. (Of those, overridden blocks numbered under 1000, almost all at mormon.com.)

Checking this list of 122,700 webpages blocked against our list of wrongly blocked directories reveals that, in 5,601 cases, the block was applied wrongly. Thus, one in 21.9 blocks was performed wrongly.

Including the capricious block on mormon.com, which was overridden no thanks to Secure Computing, would raise the total to 6,434, or one in 19.1.

This is an error rate of 4.56% if you give Secure Computing the benefit of the doubt on the overridden blocks, or 5.24% if you do not.
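The arithmetic behind these figures can be checked with a short Python sketch. The three counts come from the text above; the variable and function names are hypothetical, chosen only for illustration.

```python
# Counts quoted in the text; names are illustrative assumptions.
BLOCKED_PAGES = 122_700           # blocked webpages after images are discarded
WRONG_BLOCKS = 5_601              # wrong blocks, overridden blocks excluded
WRONG_BLOCKS_WITH_MORMON = 6_434  # wrong blocks, overridden mormon.com blocks included

def overblocking_rate(wrong: int, blocked: int) -> float:
    """Fraction of blocked pages that were blocked wrongly."""
    return wrong / blocked

for wrong in (WRONG_BLOCKS, WRONG_BLOCKS_WITH_MORMON):
    rate = overblocking_rate(wrong, BLOCKED_PAGES)
    print(f"{rate:.2%} of blocks wrong, or one in {BLOCKED_PAGES / wrong:.1f}")
# 4.56% of blocks wrong, or one in 21.9
# 5.24% of blocks wrong, or one in 19.1
```

Note that the denominator is the number of blocked pages, not the total number of accesses, because the question being asked is how often a block, once applied, was wrong.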

To put it another way: while school is in session, a child somewhere in Utah is banned by our government from accessing a valuable website every 99 seconds.

Damn Lies

Secure Computing's press release math works differently. They did not begin with the figure of 5,601 wrong blocks (which was not published in our first report but was obtainable from the raw data we offered). Instead they took the list of wrongly blocked sites which we published. This was a little over 300 unique sites, essentially the same as the 5,601 figure with all duplicates removed.

Then the number 300 (no duplicates, wrong blocks, no GIFs) was divided by the original number of 54,000,000 (many duplicates, 99% no blocks at all, GIFs). The resulting figure, which they published, was meaningless.

To illustrate this, consider what the Secure Computing "error rate" would be for a censorware product that blocked no URLs. Because only overblocking is counted, the fact that the software lets through every bit of unwanted material is irrelevant. The error rating their marketing department would assign: 0%.

Or, consider if they released a product that blocked the entire web. This would require only one entry in the blacklist: ban everything that begins with "http"! Secure Computing's marketing department would call this an error rate of 0.000002%.
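The press-release procedure and the two thought experiments above can be reproduced in a brief Python sketch. The function name `press_release_rate` is hypothetical, and 300 is a round stand-in for the "little over 300" listed sites, so the results are approximate.

```python
TOTAL_RAW_HITS = 54_000_000  # every hit in the logs, images and unblocked accesses included

def press_release_rate(listed_items: int, total_hits: int = TOTAL_RAW_HITS) -> float:
    """The press-release style 'error rate': a count of listed items over all raw hits."""
    return listed_items / total_hits

# About 300 unique wrongly blocked sites divided by all raw hits.
rate = press_release_rate(300)
print(f"{rate:.4%} 'error', {1 - rate:.4%} 'accuracy'")  # 0.0006% 'error', 99.9994% 'accuracy'

# A product that blocks nothing has nothing to list as a wrong block.
print(f"{press_release_rate(0):.4%}")  # 0.0000%

# A product that blocks the whole web with a single blacklist entry.
print(f"{press_release_rate(1):.6%}")  # 0.000002%
```

Both degenerate products score better than the real one under this measure, which is the sense in which the published number says nothing about how well the software works.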

Overblocking

Astonishingly, Secure Computing has not carefully reexamined its blacklist in the light of our March report. The following websites which we reported blocked in September/October 1998 are still blocked to this day:

After three months, Secure Computing cannot clean up 300 bad blocks which have been handed to them on a silver platter. So how can they be expected to maintain their blacklist on a day-to-day basis? One million new URLs are added to the world wide web every day.

Underblocking

We mentioned that the error rate of 5% does not include underblocking. Judging by how easy it was to access pornography with the trial version of the software installed - it uses the same blacklist as the real version and had been set to the same parameters as those used by the Utah Education Network - we would guess that Utah students and library patrons were, and are, easily able to get around the blocking.

We searched on "sex" on a major search engine and started clicking down the list of hits. By hit #9, we were looking at "Amanda's Gallery," a page of explicit photos from "Amanda's Senior Year in High School" where she apparently spent a great deal of time with her clothes off. "Upskirts, freecam, sex diary" - all were allowed by SmartFilter.

To illustrate the problems that censorware manufacturers have, we went back to the Wiretap archive. This archive contains public domain text like George Washington's Farewell Address, Mark Twain's Tom Sawyer, and most of Shakespeare's tragedies - all of which were blocked in September. Now that it is unblocked, SmartFilter is happy to show us "how-to" instructions on having sex with a horse, making drugs, and even building an atomic bomb.

The pro-censorware group "Filtering Facts," which apparently consists of one David Burt, had identified a number of blocks in our March report which we had listed as wrong and he thought were correct. (Just to be on the safe side, in this followup report we have excluded all the blocks which he identified, and a few more besides. This conservatism on our part affected the numbers almost not at all: roughly one-tenth of a percentage point.)

Curiously, SmartFilter unblocked several of the sites which Filtering Facts had identified.

Filtering Facts describes Chatropolis as "sexually graphic adult chat rooms [containing] such areas as 'Analopolis- Anal Sex Chat', [and] 'Men in Lingerie-Cross Dressing Chat'." Using SmartFilter, we successfully accessed the Anal Sex Chat at http://www.chatropolis.com/rooms/analopolis.html. This included an explicit photo in an advertisement reading "Shower me with your cum, Live Girls 24/7."

Filtering Facts points out that http://www.bme.freeq.com/ "contains a whole gallery of graphic, close-up color photos of mutilated genitalia." This is true and we thank Mr. Burt for pointing it out: so why does SmartFilter now allow us to view full-color closeups of shaved genitals at http://bme.freeq.com/pierce/10-female/clit/?

Filtering Facts says that http://www.queernet.org/ should be blocked because it offers "'the doghouse - Dogslaves and their Owner/Trainers', and 'leatherpage - Discussion of same-sex bondage and S&M interactions.'" Actually, these are links to mailing lists on other sites; QueerNet itself hosts material more like "rppa - Rainbow Pride Performing Arts of San Jose" and "orthodykes - A list for Orthodox Jewish lesbians." So in this case, SmartFilter is correct not to block the site. Anymore.

Filtering Facts is horrified by eidos.org, which "offers such offerings [sic] as 'Memorable Experiences with Sexy Ladies of Porn, Part II', 'Barely Legal Nymphos'." We should point out that the magazine Eidos seems to be text-only (except for nonexplicit cover photos) and text of the kind that should not be censored. But we didn't look very closely at the site, because it's no longer blocked, so whatever the "Barely Legal Nymphos" are, they're accessible by Utah schools.

Despite the inaccuracies in Filtering Facts' analysis, we would like to thank Mr. Burt for calling these sites to our attention.

Conclusion

We should emphasize that we have no reason to believe SmartFilter is any better, or any worse, than other censorware products. All censorware is subject to the same laws of nature, which forbid its makers from having effective oversight of their own blacklists. There is just too much data to process.

This is the inevitable result of the exponential growth of the web, and in report after report, we have confirmed it to be true in practice: censorware is censorship by robot. It simply cannot do what it promises. This is why the Bible gets banned right alongside birthcontrol.com.

Where Secure Computing has distinguished itself is in the cynical and manipulative use of our criticism. We were grimly unsurprised when their first response was to ban our website in all 27 categories. Censorship is their stock in trade, so it would logically be their first response.

But, three months later, to concoct phony numbers based on our work, without contacting us, for the purpose of impressing Sen. McCain and thus encouraging legislation to increase their market, is disgraceful and unfair. We are volunteers who work out of pocket and have received not a dime in compensation. We try to spread our message because we believe in free speech and want to live in a country where decent (self-)education is still possible for all citizens, not just those who can afford computers.

We strongly protest the misuse of our good name, and call upon Secure Computing to retract its misleading press release.

our press release for this followup report

the original Utah report

On June 21, 1999, James S. Tyre faxed to Senator McCain a letter outlining much of the information in the report.

This document last updated on Thursday September 07 2000.


Copyright © 1999 by the Censorware Project.
Redistribute freely in appropriate forums for non-profit uses only.
Contact information.
Censorware.Ørg.