Lies, Damn Lies, and Statistics
A followup to our March 1999 Utah SmartFilter report
by Jamie McCarthy
for
the Censorware Project
Abstract
The makers of SmartFilter issued a press release falsely claiming
we proved its accuracy. This fabrication was used to help solidify the
views of Sen. McCain, whose mandatory-censorware bill
(S.97)
has just been approved by committee; when it passes it will ensure
your taxes pay for this software in schools and libraries. New
analysis shows that when SmartFilter blocks a webpage, it is a wrong
block one time in twenty
(4.56%-5.24%). SmartFilter still blocks many sites we identified,
including the
SAFE website.
Pornography, pro-drug information, and bomb-making instructions are
accessible whether SmartFilter is used or not.
Introduction
In March 1999, we issued a
report
analyzing censorware performance in 54 million web accesses in the
state of Utah. Our report shows unequivocally that the software in
question, SmartFilter, sold by Secure Computing Corporation, banned
many webpages which should never have been banned.
Moreover, the types of errors make it obvious that most of their
blacklist has never seen effective human review. When the homepage of
one
Mr. Walter Wager
is blocked as "gambling," that means a computer is making the
decisions about what to block, not a person. We found it insulting
that the Utah Education Network thought so little of their students
and library patrons that they would use robots to decide what was OK
to read.
We published our report and included a list of the sites which were
banned. We also put forth an offer to send, on CD-ROM, for only the
cost of duplication, a complete copy of our original data set - this
was the full disclosure which ensured that our facts could be checked
by anyone willing to sort through, as we did, many gigabytes of raw
data.
On Friday, June 18, 1999 - three months later - Secure Computing
Corporation
met with
Sen. John McCain (R-Ariz.).
In conjunction with Sen. McCain's visit, the company put forth a
grossly
misleading press release
which claimed that our report had proven their software 99.9994%
accurate.
"The comprehensiveness of the research of The CensorWare Project
and its subsequent findings validate just how valuable a tool
SmartFilter can be for schools, businesses and any organization that is
looking to implement an Internet Use Management Policy," said Gus
Maldonado, product marketing manager at Secure Computing.
Today, the Senate Commerce Committee approved Sen. McCain's
filtering bill
(S.97),
known as the "Children's Internet Protection Act," which mandates the
purchase of censorware with your tax money. Censorware will be
required in every school and library which receives E-Rate funds. To
give an idea of what these costs will be, the Utah Education Network
spends $20,000 per year on software alone. Their 1999 budget includes
$124,000 for hardware, and these figures do not include staff
salaries.
Secure Computing must be eagerly anticipating the explosive growth
in their market.
Here is what Secure Computing did not do:
- They did not contest any of the facts in our report.
- They did not request the original data on CD-ROM. (In fact, to
this day nobody outside our project has requested it, so we can safely
say that Secure Computing has never seen the data.)
- And they did not consult us before or after misappropriating our
group's research for their own purposes.
The figure "99.9994%" was arrived at through a ridiculous procedure
which consisted essentially of finding the smallest number mentioned in
our report and dividing it by the largest. It bears no relationship to
reality.
If this number "99.9994%" were the work of Secure Computing's
computer programmers, we might consider it a miracle that their
software works at all. However, this number - an astonishing claimed
error rate of 0.0006% - was surely created by their marketing
department for one purpose only: to solidify Sen. McCain's views
that censorware should be forced, by law, into schools and libraries
across the country.
The Censorware Project strongly protests this cynical and
fraudulent distortion of our work.
One in Twenty
After Secure Computing's press release, we went back to our
original data. Our original report included over a dozen charts and
graphs showing, for example, when the busiest times of day were for
the internet in Utah, and whether students or library patrons were
more likely to try to access banned sites. (Student usage was
approximately twenty times heavier than library usage, but library
patrons were twice as likely to hit a banned site, which is not
necessarily, as we shall see, a properly banned site.)
One figure which we did not bother to calculate or publish was the
ratio of "correctly" banned accesses to "wrongly" banned accesses.
That's what this followup is for.
The terms are a little fuzzy, of course, and no two people will draw
the dividing line between "correct" and "wrong" in exactly the same
place. For example, many public libraries include Playboy as a magazine
that can be read in the reference section. The website playboy.com is
less racy than the print magazine; does that mean it should remain
unbanned? One could make a case for this - but one could also point
out that the banned accesses include those made from schools, and while a
student doing a report on Coppola movies might like to read
an
interview with James Caan,
most people would not have a problem with this material being censored.
Perhaps this is the most important figure we could have published,
because it gives some idea of how effective, in operation, the
software is. It's important to remember that any censorship in the
public arena is too much; but it may be more persuasive to put numbers
on the harm done.
We have used our best judgement in estimating wrongly banned
accesses - and then we stepped back to consider only those bans which
would be inappropriate even in a school setting. For example, it is a
violation of the First and Fourteenth Amendments to restrict adults
from reading
playboy.com
or a
lingerie website
in a public library (according to
Judge Brinkema),
but we have not counted such blocks in the statistics that follow.
In Utah, for every 22 times SmartFilter "correctly" blocked someone
from accessing a webpage, there was one "wrongly" blocked access. In
other words, the overblocking error rate was about 5%.
There are several caveats that apply.
First, this includes only overblocking errors, not
underblocking. We have only examined the 174,000 accesses which were
blocked; nobody has time to examine the 54,000,000 accesses which were
let through. One must presume that there is a certain amount of
pornography which was viewed despite SmartFilter's best efforts, but we
cannot know how much. Our report can therefore only identify a
minimum amount of error in the product.
Second, this also does not include all overblocking errors. As the
Methodology
page made clear, it would have been prohibitively difficult to examine
all 174,000 blocked accesses. We surely missed many. Again, the figures
we cite refer to the minimum, known error.
Third, this does not include the sites which were overridden by
the Utah Education Network. One of the sites wrongly blocked by
SmartFilter was
mormon.com,
obviously a popular site in Utah. Why should Secure Computing
be off the hook for this improper block? It was in their blacklist and
remains in their blacklist to this day. Counting these accesses would
raise the error rate from 1 in 22 to 1 in 19.
Either way, it's close enough: we'll call it five percent.
Statistics
Here are the details of how the 5% figure was arrived at - and
how Secure Computing's 0.0006% figure was concocted.
The 31 days of logs which we obtained listed approximately
54,000,000 web accesses and attempted accesses. These are raw hits,
where not only each page counts as an access but each GIF and JPEG on
each page, as well as other supplementary types of data. Of these, the
vast majority, 99.55%, were not affected by SmartFilter. The remaining
237,000 web accesses were blocked (including overridden blocks).
But raw hits are not an informative metric. We're not interested in
how many computer accesses were blocked, but rather the number of
times a human was prevented from seeing a webpage. We have removed
supplementary data by simply discarding all URLs that end in ".gif",
".jpg", or ".jpeg". This is not a foolproof method, but is a
reasonably accurate way to remove this data.
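The image-discarding step described above can be sketched as follows. This is a minimal illustration, not necessarily the script used for the report: the function names are hypothetical, the sample URLs are invented, and matching the suffix case-insensitively is an assumption.

```python
# Suffixes treated as supplementary image data, per the text above.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg")


def is_image_url(url: str) -> bool:
    """Return True if the URL ends in one of the image suffixes.

    Assumption: the comparison is case-insensitive, so ".GIF" counts too.
    """
    return url.lower().endswith(IMAGE_SUFFIXES)


def keep_page_accesses(urls):
    """Discard image URLs and keep everything else as a page access."""
    return [u for u in urls if not is_image_url(u)]


# Hypothetical sample of blocked URLs (not taken from the Utah logs).
sample = [
    "http://example.com/index.html",
    "http://example.com/banner.gif",
    "http://example.com/photo.JPG",
    "http://example.com/story.jpeg",
    "http://example.com/cgi-bin/show?img=logo.gif&size=2",
]
pages = keep_page_accesses(sample)
print(pages)
```

The last sample URL shows why the method is not foolproof: an image delivered through a script does not end in an image suffix, so it is kept as a page access.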
As our original report points out, many of these blocked images
were banner ads which the software assumed to be sexually explicit.
The assumption was wrong as often as not. If we had wanted to inflate
our number of "wrong blocks," we would have included these blocked
images, because four of the five top blocked-image domains are
nonpornographic (chathouse.com, geocities.com, infoseek.com, and
mormon.com). Those four alone account for 40% of blocked images,
already eight times the rate of wrongly-blocked pages! Removing these
from consideration ensures that each act of censorship which we refer
to is an actual page blocked, not a failure to load an advertisement
or other image.
After images are discarded, 122,700 webpages remain which were
blocked. (Of those, overridden blocks numbered under 1000, almost all
at mormon.com.)
Checking this list of 122,700 webpages blocked against our list of
wrongly blocked directories reveals that, in 5,601 cases, the block was
applied wrongly. Thus, one in 21.9 blocks was performed wrongly.
Including the capricious block on mormon.com, which was overridden
no thanks to Secure Computing, would raise the total to 6,434, or one
in 19.1.
Depending on whether you give Secure Computing the benefit of the
doubt, this is an error rate of either 4.56% or 5.24%.
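The arithmetic behind these figures can be checked with a few lines of Python. The three counts come from the text above; the variable and function names are only illustrative.

```python
blocked_pages = 122_700        # blocked accesses after images are discarded
wrong_blocks = 5_601           # wrongly blocked, overridden blocks excluded
wrong_with_overrides = 6_434   # same, plus the overridden mormon.com blocks


def error_rate_percent(wrong: int, total: int) -> float:
    """Share of blocked pages that were wrongly blocked, as a percentage."""
    return 100.0 * wrong / total


def one_in(wrong: int, total: int) -> float:
    """Express the same ratio as 'one wrong block in every N blocks'."""
    return total / wrong


low = error_rate_percent(wrong_blocks, blocked_pages)
high = error_rate_percent(wrong_with_overrides, blocked_pages)

print(f"{low:.2f}% (one in {one_in(wrong_blocks, blocked_pages):.1f})")
print(f"{high:.2f}% (one in {one_in(wrong_with_overrides, blocked_pages):.1f})")
# prints:
# 4.56% (one in 21.9)
# 5.24% (one in 19.1)
```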
To put it another way: while school is in session, a child
somewhere in Utah is banned by our government from accessing a
valuable website every 99 seconds.
Damn Lies
Secure Computing's press release math works differently. They did
not begin with the figure of 5,601 wrong blocks (which was not
published in our first report but was obtainable from the raw data we
offered). Instead they took the list of wrongly blocked sites
which we published. This was a little over 300 unique sites, essentially
the same as the 5,601 figure with all duplicates removed.
Then the number 300 (no duplicates, wrong blocks, no GIFs) was
divided by the original number of 54,000,000 (many duplicates, 99% no
blocks at all, GIFs). The resulting figure, which they published, was
meaningless.
To illustrate this, consider what the Secure Computing "error rate"
would be for a censorware product that blocked no URLs.
Because only overblocking is counted, the fact that the software
lets through every bit of unwanted material is irrelevant. The error
rating their marketing department would assign: 0%.
Or, consider if they released a product that blocked the entire web.
This would require only one entry in the blacklist: ban everything that
begins with "http"! Secure Computing's marketing department would call
this an error rate of 0.000002%.
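A short sketch makes the problem with this metric concrete. It assumes our reading of the press release math (a small count of flagged items divided by all raw hits); the function name is hypothetical, and treating the single "http" blacklist entry as one wrong item is an assumption made to reproduce the figure above.

```python
TOTAL_RAW_HITS = 54_000_000  # every logged access, images included


def press_release_rate_percent(flagged_items: int,
                               total_hits: int = TOTAL_RAW_HITS) -> float:
    """Divide a count of flagged items by all raw hits, as a percentage."""
    return 100.0 * flagged_items / total_hits


# Roughly 300 unique wrongly blocked sites from our published list.
actual = press_release_rate_percent(300)

# A product that blocks no URLs at all: nothing can be wrongly blocked.
blocks_nothing = press_release_rate_percent(0)

# A product that blocks the entire web with one blacklist entry
# (assumption: that single entry is counted as one wrong item).
blocks_everything = press_release_rate_percent(1)

print(f"{actual:.4f}%")             # 0.0006%
print(f"{blocks_nothing:.0f}%")     # 0%
print(f"{blocks_everything:.6f}%")  # 0.000002%
```

By this measure, both degenerate products score better than SmartFilter itself, which is why the figure says nothing about accuracy.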
Overblocking
Astonishingly, Secure Computing has not carefully reexamined its
blacklist in the light of our March report. The following websites
which we reported blocked in September/October 1998 are still blocked
to this day:
After three months, Secure Computing cannot clean up 300 bad blocks
which have been handed to them on a silver platter. So how can they be
expected to maintain their blacklist on a day-to-day basis? One
million new URLs are added to the world wide web every day.
Underblocking
We mentioned that the error rate of 5% does not include
underblocking. Judging by how easy it was to access pornography with
the trial version of the software installed - it uses the same
blacklist as the real version and was set to the same parameters
the Utah Education Network uses - we would guess that Utah students and
library patrons were, and are, easily able to get around the blocking.
We searched on "sex" on a major search engine and started clicking
down the list of hits. By hit #9, we were looking at
"Amanda's Gallery,"
a page of explicit photos from "Amanda's Senior Year in High School"
where she apparently spent a great deal of time with her clothes off.
"Upskirts, freecam, sex diary" - all were allowed by SmartFilter.
To illustrate the problems that censorware manufacturers have, we
went back to the
Wiretap archive.
This archive contains public domain text like George Washington's
Farewell Address, Mark Twain's Tom Sawyer,
and most of Shakespeare's tragedies - all of which were blocked in
September. Now that it is unblocked, SmartFilter is happy to show us
"how-to" instructions on
having
sex with a horse,
making
drugs,
and even
building
an atomic bomb.
The pro-censorware group
"Filtering Facts,"
which apparently consists of one David Burt, had
identified
a number of blocks in our March report which we had listed as wrong
and he thought were correct. (Just to be on the safe side, in this
followup report we have excluded all the blocks which he identified,
and a few more besides. This conservatism on our part affected the
numbers almost not at all: roughly one-tenth of a percentage point.)
Curiously, SmartFilter unblocked several of the sites which
Filtering Facts had identified.
Filtering Facts describes
Chatropolis
as "sexually graphic adult chat rooms [containing] such areas
as 'Analopolis- Anal Sex Chat', [and] 'Men in Lingerie-Cross
Dressing Chat'." Using SmartFilter, we successfully accessed the Anal
Sex Chat at
http://www.chatropolis.com/rooms/analopolis.html.
This included an explicit photo in an advertisement reading "Shower me
with your cum, Live Girls 24/7."
Filtering Facts points out that
http://www.bme.freeq.com/
"contains a whole gallery of graphic, close-up color photos of
mutilated genitalia." This is true and we thank Mr. Burt for pointing
it out: so why does SmartFilter now allow us to view full-color
closeups of shaved genitals at
http://bme.freeq.com/pierce/10-female/clit/?
Filtering Facts says that
http://www.queernet.org/
should be blocked because it offers "'the doghouse - Dogslaves and
their Owner/Trainers', and 'leatherpage - Discussion of same-sex
bondage and S&M interactions.'" Actually, these are links to mailing
lists on other sites; QueerNet itself hosts material more like "rppa -
Rainbow Pride Performing Arts of San Jose" and "orthodykes - A list for
Orthodox Jewish lesbians." So in this case, SmartFilter is correct not
to block the site. Anymore.
Filtering Facts is horrified by
eidos.org,
which "offers such offerings [sic] as 'Memorable Experiences
with Sexy Ladies of Porn, Part II', 'Barely Legal Nymphos'." We should
point out that the magazine Eidos seems to be text-only (except for
nonexplicit cover photos) and text of the kind that should not be
censored. But we didn't look very closely at the site, because it's no
longer blocked, so whatever the "Barely Legal Nymphos" are, they're
accessible by Utah schools.
Despite the inaccuracies in Filtering Facts' analysis, we would like
to thank Mr. Burt for calling these sites to our attention.
Conclusion
We should emphasize that we have no reason to believe SmartFilter is
any better, or any worse, than other censorware products. All
censorware makers are subject to the same laws of nature, which forbid
them from having effective oversight of their own blacklists. There is just too
much data to process.
This is the inevitable result of the exponential growth of the web,
and in report after report, we have confirmed it to be true in
practice: censorware is censorship by robot. It simply cannot do what
it promises. This is why the Bible gets banned right alongside
birthcontrol.com.
Where Secure Computing has distinguished itself is in the cynical
and manipulative use of our criticism. We were grimly unsurprised when
their first response was to ban our website in all 27 categories.
Censorship is their stock in trade, so it would logically be their
first response.
But, three months later, to concoct phony numbers based on our work,
without contacting us, for the purpose of impressing Sen. McCain
and thus encouraging legislation to increase their market, is
disgraceful and unfair. We are volunteers who work out of pocket and
have received not a dime in compensation. We try to spread our message
because we believe in free speech and want to live in a country where
decent (self-)education is still possible for all citizens, not just
those who can afford computers.
We strongly protest the misuse of our good name, and call upon
Secure Computing to retract its misleading press release.
our press release for this followup report
the original Utah report
On June 21, 1999, James S. Tyre
faxed
to Senator McCain a letter outlining much of the information in the report.
This document last updated on Thursday September 07 2000.
Copyright © 1999 by the Censorware Project.
Redistribute freely in appropriate forums for non-profit uses only.
Censorware.Ørg.