Neuroshell Easy: A Review
Dateline: March 22, 1998
You keep reading in my articles about neural nets, how powerful they are, and specific applications that use neural net techniques (detecting credit card fraud, for example). When I discovered a general-purpose neural net program, Neuroshell, that could be applied to almost any situation where one wants to make predictions, I figured it was worth some extra attention, so I asked the company that produces it for a review copy. They graciously complied.
In fact, they sent me not one but two of their programs: Neuroshell Easy Predictor, and Neuroshell Easy Classifier.
A Dataset, A Dataset; My Kingdom for a Dataset
The review has been in the works for a month now. I tried hard to get hold of some datasets so I could put Neuroshell through its paces. Several sources, including a state Medicaid director and an economist friend who has maintained detailed basketball stats for years, initially seemed ready to supply datasets. Alas, they let me down.
I next did a Yahoo search on "dataset" but after a couple of hours surfing did not find anything that would fit the bill. A UK site had what would have been a great dataset, but it was available only in SPSS (a heavy-duty statistics program) system file format, and I lost access to the SPSS program when I finished my Master's degree. I'm sure there probably are some suitable datasets out there in text format, but when you've already spent two hours looking, you start to wonder if it will take another two.
I toyed briefly with setting up a dataset from scratch, copying data from the UK soccer league tables into Excel so I could run it through Predictor to predict winning/losing teams, but (as my senses told me going in) choosing the right input variables, finding data for those variables for each team, etc., was a major task and well beyond my resources of time. After an hour trying to copy and paste bits of data from a soccer league Web site to Excel, I decided this would take too long.
I finally had to decide whether to put you all off for yet another week, possibly even longer, in hopes my sources would pull through; or review Neuroshell on the basis of the example datasets it comes bundled with. The latter are very small and limited, but I decided what the heck, they are enough to demonstrate that the software does what it sets out to do, and that answers one of the key questions any software review should answer.
Other key questions include how easy it is to set up and how easy it is to use. We'll deal with these issues first, then take a look at what Neuroshell sets out to do and how well it does it.
Setup
The programs installed easily and painlessly, as one ought to be able to expect these days (but sometimes cannot). Because they are 32-bit programs, they will not run on Windows 3.x, only on Windows 95 and NT (3.51 and 4.0). I run NT 4.0 Workstation on an IBM PC 300 XL with 64 megabytes of RAM, a Pentium II/266 MHz, and oodles of disk space. (Except that IBM, in its wisdom, shipped the machine with the drive partitioned into three: C, D, and E. They allotted only half a gigabyte to C, and despite my efforts to put all new programs and data files on the D drive, C has filled up with stuff I can't see an easy way to get rid of without causing all sorts of problems.)
But that's my problem, and it didn't interfere with installing and using Neuroshell.
Use
Both Predictor and Classifier come with the same neural network tutorial, and it is excellent, if you want to know how the software's underlying engine works. It uses animated graphics and short, concise textual explanations to describe and explain the process of how a neural net learns. (This can be downloaded from Neuroshell's Web site at no charge.)
The tutorial is not intended to teach you how to use the programs (Predictor and Classifier) themselves. That tutorial is cleverly built into the programs. The top portion of the screen contains guidance from an "instructor" on what to do next at every step of the way. Once you grow familiar with the programs, you can turn the instructor off, but it is always ready to be called up again with a mouse click if you need it. The Help system, which is integral to the Instructor but which can also be invoked for any topic, is also excellent.
The first step is to load a dataset. Datasets must be in comma-, space-, or tab-delimited text format. Most statistics, database, and spreadsheet programs can save data in those formats, so Neuroshell avoids the hassle of working with proprietary file types. (Of course, if you have an SPSS system file but not the SPSS program that would convert it to a text file, you're out of luck, but I'm probably unique in being in that situation.) I, of course, elected to load one of the example files provided with the program. It contains data from a hamburger joint, whose owner wants to predict monthly sales based on the season, the number of special events scheduled in the locality, the number of ads she has placed, and the cost of those ads. Among other things, she can use the prediction to make adjustments in her advertising budget.
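For readers who want to see what "comma-, space-, or tab-delimited text" means in practice, here is a minimal Python sketch of reading such a file. It has nothing to do with Neuroshell's own loader, which I have not seen. The function name, the column names, and the sample values are all made up for illustration.

```python
import csv
import io

def read_delimited(text):
    """Parse comma-, tab-, or space-delimited text into a list of dict rows.

    The delimiter is guessed from the header line: tab first, then comma,
    and a single space as the fallback. This is an assumption of the sketch.
    """
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    if "\t" in first_line:
        delimiter = "\t"
    elif "," in first_line:
        delimiter = ","
    else:
        delimiter = " "
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter,
                            skipinitialspace=True)
    return [dict(row) for row in reader]

# Hypothetical rows; the column names and numbers are invented.
sample = (
    "season,events,ads,ad_cost,sales\n"
    "summer,2,3,100,357\n"
    "winter,0,1,40,210\n"
)
rows = read_delimited(sample)
print(rows[0]["season"], rows[0]["sales"])
```

Every value comes back as text, so a real analysis would still need to convert the numeric columns before using them.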
You then have to tell the program whether you want to train a new neural net using this data, or apply an already trained neural net to make predictions from the data. If your dataset is large enough (enough cases), you can do both by selecting just a subset of cases for learning and applying the result to the remaining cases.
How large is "large enough"? There may be a formula for it, as for example with survey sample size, but I don't know. Common sense would say that the more cases used in learning, the more learned the neural net will be! The burger file has only 24 cases, so I decided to establish three neural nets using, respectively, 6, 12, and 18 cases ("rows" in the spreadsheet), and see how well the program predicted the remaining cases.
[Table: Neuroshell results for the three nets trained on 6, 12, and 18 cases. The table's contents did not survive in this copy.]
I think what this is really showing us is that (1) the net does get better with more cases but (2) the number of cases needs to be many more than 12 or 18. (Neuroshell produces tables like this, and also a graph, which can be easily printed or copied into another program, as I did here. I joined three tables together and changed the column headings slightly.)
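The procedure I followed (learn from the first 6, 12, or 18 of the 24 cases, then test on the rest) can be sketched in a few lines of Python. This is only an illustration. A simple least-squares line stands in for the neural net, the 24 cases are synthetic random numbers and not the burger file, and all the function names are my own.

```python
import random

def holdout_split(cases, n_train):
    """Use the first n_train cases for learning and the rest for testing."""
    return cases[:n_train], cases[n_train:]

def fit_line(train):
    """Ordinary least-squares fit of y = a + b*x on (x, y) pairs."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    b = sxy / sxx if sxx else 0.0  # flat line if every x is identical
    a = mean_y - b * mean_x
    return a, b

def mean_abs_error(model, test):
    """Average absolute gap between predicted and actual values."""
    a, b = model
    return sum(abs((a + b * x) - y) for x, y in test) / len(test)

# Synthetic stand-in for a 24-case file: (ads placed, monthly sales).
rng = random.Random(0)
cases = []
for _ in range(24):
    ads = rng.randint(0, 10)
    cases.append((ads, 200 + 15 * ads + rng.gauss(0, 20)))

errors = {}
for n_train in (6, 12, 18):
    train, test = holdout_split(cases, n_train)
    errors[n_train] = mean_abs_error(fit_line(train), test)
    print(n_train, "training cases,", len(test), "test cases, error",
          round(errors[n_train], 1))
```

With so few cases, the error on the held-out rows can jump around from one split to the next, which is one reason a small file like this demonstrates the procedure better than it demonstrates accuracy.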
Having got this far and "trained the net," one can save the trained net for future use in applying it to new data. It's that simple.
Compared to analyzing data using standard statistical methods, Neuroshell is a breeze. Programs such as SPSS are dauntingly difficult even if you have had training in statistics—and in fact you really cannot use SPSS without first understanding statistics. Not so with Neuroshell. If you think you know what factors help determine an outcome, just capture those factors in a spreadsheet along with some actual outcomes and let Neuroshell do the rest.
Classifier
The difference between Predictor and Classifier seems somewhat fine and arbitrary. Predictor will give you a unique result for every case. For example: if it's a Summer Sunday, there are two local events that day, and you've spent $100 on an ad in the Sunday newspaper, you can expect to sell (say) 357 burgers. Classifier works a bit like factor and discriminant analysis, in which results are clustered into fewer outcomes or "classes" or "categories."
For example, you might want to classify people who have undergone screening for skin cancer into those that have benign and those that have malignant cancer. Trained on input ("independent," in standard research methodology) variables such as skin type, results of blood test, etc., a Classifier neural net can determine whether a given individual's skin cancer is likely to be benign or malignant.
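The distinction comes down to the kind of answer you get back: a number from Predictor, a category from Classifier. The Python sketch below shows only that difference in output. It is not how either program computes its answer, and the weights, scores, and function names are hypothetical, chosen just to make the example run.

```python
def predict(inputs, weights, bias):
    """Predictor-style output: one continuous number per case."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def classify(scores):
    """Classifier-style output: the label with the highest score."""
    return max(scores, key=scores.get)

# Hypothetical inputs: 2 local events and 1 ad, with made-up weights.
burgers = predict([2, 1], weights=[50.0, 100.0], bias=157.0)

# Hypothetical per-class scores for one screened patient.
label = classify({"benign": 0.8, "malignant": 0.2})

print(burgers, label)
```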
Using Classifier is practically identical to using Predictor. The interface is the same, and the actions you take—specifying the data file, the input (independent) variables, and the output (dependent) variable—are the same.
After training on a sample dataset of 50 cases supplied with Classifier, the net correctly classified all 50 cases, as the following table shows:
[Table: rows "Classified benign," "Classified malignant," and "Total." The column headings and cell values did not survive in this copy.]
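A table of this kind simply counts how each case was classified against what it actually was. Here is a minimal Python sketch of that tally. The five cases are invented for illustration and are not the 50 cases in the sample file, and the function name is my own.

```python
from collections import Counter

def confusion_table(actual, predicted, classes):
    """Count cases by (classified as, actually is), one row per classified label."""
    counts = Counter(zip(predicted, actual))
    table = {}
    for p in classes:
        table["Classified " + p] = [counts[(p, a)] for a in classes]
    # Column totals: how many cases actually belong to each class.
    table["Total"] = [sum(counts[(p, a)] for p in classes) for a in classes]
    return table

# Hypothetical labels; a perfect classifier reproduces the actual labels.
actual = ["benign", "benign", "malignant", "benign", "malignant"]
predicted = list(actual)
table = confusion_table(actual, predicted, ["benign", "malignant"])

for row_name, row in table.items():
    print(row_name, row)
```

When every case is classified correctly, all the counts sit on the diagonal and the off-diagonal cells are zero.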
The Bottom Line
Neuroshell is easy, useful, and powerful. It may not always replace standard statistical approaches, but in most cases it could at least be used as an adjunct analysis. If I were doing regressions, factor, or discriminant analyses I would definitely run the data also through Neuroshell to compare results. Doing so is quick and painless, and the software does not cost much ($395), relative to the cost of stat packages like SPSS and SAS.
Easy Predictor and Easy Classifier are not the only products from Ward Systems Group. They sell heavier-duty versions of these programs plus Genehunter, a genetic algorithm program that does the same sort of thing only using a different method. In fact, Predictor and Classifier also come with a genetic algorithm that can be used instead of or in addition to the neural net. Ward also sells versions of its software pre-optimized for a given function. For example, Neuroshell Trader is optimized for predicting the stock market.
One regular visitor to this site tells me he has used Neuroshell successfully to predict the stock market, lotteries, and horse racing, and says he has become independently wealthy as a result. He anticipated my scepticism with respect to predicting lotteries and says there is less randomness than is generally supposed. I remain sceptical, but I guess the only way to find out would be to try it!
Until Neuroshell, I had not seen a general purpose neural net program usable by just about anybody. Others I have seen required a deep understanding of neural net technology; Neuroshell does not. I commend it to your attention.
For more information, visit Ward's Web site.
Until next week,
NEXT WEEK: More IMP and EEEK.
Help Wanted: Got questions or comments on this article or on any other AI-related subject under the sun? Post them in the AIBB!