
Meeting of the
Los Angeles Chapter of ACM

Wednesday, December 4, 2002

CHALLENGES FOR ALL OF US

Dr. Arnold Goodman
Associate Director, UCI Center for Statistical Consulting

Our world is increasingly overwhelmed by huge amounts of complex data, waiting for productive methods to convert them into useful information and then meaningful knowledge. Discovering such knowledge from data requires that patterns and findings mined from data be developed into predictive conclusions: to serve most practical purposes, suggest the knowledge demanded by more important or scientific purposes, and then facilitate evaluation in the world beyond data.

Teams face many challenges. The FUNDAMENTAL CHALLENGE is for results to work almost all (not only some) of the time, account for any uncertainties outside (as well as inside) the data, and add new and valuable knowledge to the client's area. The COLLABORATION CHALLENGE is for data miners, statisticians and clients to all recognize a joint dependence, and for each to widen his focus until harmonious collaboration becomes possible. The ALLOCATION CHALLENGE is balancing the effort devoted to analysis inside the database with the effort devoted to evaluation outside the database in the client's environment, difficult though that might be to accomplish. The LEADERSHIP CHALLENGE is creating an atmosphere that welcomes this critical external knowledge, so that the reluctance of data miners, statisticians and clients to leave the safety of their own expertise does not cause them to miss both the opportunity of learning from each other and the opportunity of achieving greater success through serious collaboration.

Uncertainties inherent in both the collection and processing of data need to be accounted for by prediction confidence intervals, hypothesis test levels and serious knowledge evaluations. Data mining and statistics will probably grow toward each other in the next 10 years since data mining cannot succeed at knowledge discovery without statistical thinking, statistics on new massive and complex datasets cannot succeed fully without data mining approaches, and both will probably be driven by clients to work closely together to justify the increasing costs of data collection, management, analysis and realistic evaluation.
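As a small illustration of the confidence intervals mentioned above, here is a minimal sketch in Python. It is not taken from Dr. Goodman's talk or handouts. The function name `mean_confidence_interval`, the sample values, and the choice of a normal approximation (z = 1.96 for roughly 95% coverage) are all assumptions made for the example. For a sample this small, a t-based interval would be more appropriate.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean of a sample.

    Uses the normal approximation; z=1.96 corresponds to roughly 95%.
    (Hypothetical helper written for illustration only.)
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * std_err, mean + z * std_err


# Hypothetical measurements, treated as a sample from a larger population
readings = [9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2]
low, high = mean_confidence_interval(readings)
print(f"mean = {statistics.mean(readings):.2f}, "
      f"approx. 95% CI = ({low:.2f}, {high:.2f})")
```

The interval reports the uncertainty that comes from sampling alone. It says nothing about uncertainties outside the data, which is why the abstract also calls for serious evaluation of the resulting knowledge.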

After leaving Stanford with a PhD in Mathematical Statistics, Dr. Goodman spent 35 years working with information technology (aka data processing) in aerospace, petroleum and then county government. He has been responsible for problem solving, planning, performance evaluation, management science, computer chargeback and computer capacity planning. He, along with Sal Polick, was responsible for starting Interface, a conference on the interface of Statistics and Computer Science. In addition, he has been involved in the Quality Movement since attending Ed Deming's very first 4-day seminar.

~Summary~

LA ACM Chapter December Meeting.
Held Wednesday, December 4, 2002.

The presentation was "Data Mining Needs Statistics to Discover Knowledge" by Dr. Arnold Goodman, Associate Director, UCI Center for Statistical Consulting. This was a regular meeting of the Los Angeles Chapter of ACM and was also a joint meeting with the Los Angeles Chapters of SIGSOFT and SIGCHI.

The basic challenge presented by Dr. Goodman was that of applying statistics to data mining to get useful information and meaningful knowledge. A problem is that Data Miners and Statisticians have different cultures and sometimes don't show much respect for each other. Interactions have increased: at Interface 2001, a meeting showing how statistics interfaces with the world, there were 12 Data Mining sessions and a full day of bioinformatics. However, Data Miners are still giving little respect to statisticians and are still telling the same jokes about them at recent meetings. Dr. Goodman noted that after many years people are still making the same mistakes, so he started thinking about how to meet this kind of challenge. Discovering knowledge rests on three balanced legs: computer science, statistics and the client area. It won't stand on any one of them, or even on three unbalanced legs. A serious commitment to real collaboration from all three is required.

One method of doing this is to develop a checklist to identify things that lead to success, organize them into groups and establish priorities. Dr. Goodman provided handouts with a table presenting a checklist and scorecard for finding value hidden in databases. The maturity of this system ranged from infancy where most emphasis was on operating on very large and complex databases, through childhood with data mining procedures and statistics that aided making decisions on starting and stopping processing, into youth where patterns are exploited by prediction software and statistics are used to provide good models. It is hoped that the field can advance to maturity where there are very good models that have been proven to be valid, reliable and true almost all of the time, but we aren't there yet. Dr. Goodman noted that it is better to have a rather crude model that is vaguely correct than a more elaborate model that is precisely wrong. He presented another table, a checklist and scorecard for evaluating an analysis. He said that in earlier days of computing most practitioners were very conscious of the need to validate things, but that doesn't seem to be as true today. He pointed out some key statements at the bottom of the checklist and scorecard table:

Even huge and complex databases are actually no more than samples from even more huge, complex populations.

Questioning the answers is as important as answering the questions when seeking findings, conclusions and meaningful knowledge.

The more a solution is separated from the problem and data, the more it needs evaluation outside the problem, data and analysis.

Those intelligent enough to develop and implement complex analyses should be smart enough to evaluate them appropriately.

Evaluation is as needed in using solutions beyond their problems and data as confidence intervals and hypothesis testing.

No amount of mathematics, no amount of methodology, no computer algorithm relieves a professional of the obligation to think critically about what he is doing. No professional should fall in love with his model.

It is important to use information in actual areas to exploit actual properties. The proof of the pudding is in the eating, not in the cooking process. It is most important to define the client's problem and solve his problem, not just an earlier problem for which you have a solution. Solve the client's problem even if you have to change your methodology; don't just tell him what you cannot do. Do keep the client informed of the risk involved in the approach that is taken. Learn enough about the client's operation to capture the essence of his problem, but don't overly complicate it. You have to know when to stop.

When you begin, evaluate what it will take to solve the problem, investigate the resources required, and determine whether the costs will fall within the available budget. During operations, compare actual behavior with what was expected to happen. Always communicate with everybody and talk in their language; don't make them use your terminology. Try to add value to the client's world, talk like a friend, and think as if you were the client. Dr. Goodman presented a table with scorecards for four case studies of collaborative projects. The consulting scores ranged from .55 to .70, where 1.0 would be the best possible score. This indicates that while things are improving, they are still some distance from perfection.

You can reach Dr. Goodman at: agoodman@uci.edu

I found quite a few references to Dr. Goodman by doing a Google search using: Arnold+Goodman+Statistics. As usual, you pick up inapplicable references, but you can find quite a few appropriate ones that highlight Dr. Goodman's very impressive background.

Dr. Goodman gave an excellent presentation that was very well received by those in attendance at the meeting. Much of the material was in the printed handouts provided at the meeting. Once again, to get the real advantage of a Los Angeles Chapter of ACM meeting you had to be there, as what was presented cannot be covered in a relatively short article.

This was the fourth meeting of the LA Chapter year and was attended by 18 persons.
Mike Walsh, LA ACM Secretary

 

The first meeting of 2003 will be on Wednesday, January 8th. The speaker is still being arranged; check back later for information about the program. Come celebrate the new year with us.
Join us


The Los Angeles Chapter normally meets the first Wednesday of each month at the Ramada Hotel, 6333 Bristol Parkway, Culver City. The program begins at 8 PM. From the San Diego Freeway (405) take the Sepulveda/Centinela exit southbound or the Slauson/Sepulveda exit northbound.

5:15 p.m.  Business Meeting

6:30 p.m. Cocktails/Social

7:00 p.m. Dinner

8:00 p.m.  Presentation

 

Reservations

To make a reservation, call or e-mail John Halbur, (310) 375-7037, and indicate your choice of entree, by Sunday before the dinner meeting.

There is no charge or reservation required to attend the presentation at 8:00 p.m. Parking is FREE!

For membership information, contact Mike Walsh, (818) 785-5056.


Other Affiliated groups

SIGAda   SIGCHI SIGGRAPH  SIGPLAN

****************
LA SIGAda


****************

LA  SIGGRAPH

Please visit our website for meeting dates, and news of upcoming events.

For further details contact the SIGPHONE at (310) 288-1148 or at Los_Angeles_Chapter@siggraph.org, or www.siggraph.org/chapters/los_angeles


****************


Last revision: 2002-12-21