Working of SAS and SPSS Modelers


SAS was conceived by Anthony J. Barr in 1966. As a North Carolina State University graduate student from 1962 to 1964, Barr had created an analysis of variance modelling language inspired by the notation of statistician Maurice Kendall, followed by a multiple regression program that generated machine code for performing algebraic transformations of the raw data. Drawing on those programs and his experience with structured data files, he created SAS, placing statistical procedures into a formatted file framework. From 1966 to 1968, Barr developed the fundamental structure and language of SAS.

In January 1968, Barr and Jim Goodnight collaborated, integrating new multiple regression and analysis of variance routines developed by Goodnight into Barr's framework. Goodnight's routines made the handling of basic statistical analysis more robust, and his later implementation (in SAS 76) of the general linear model increased the analytical power of the system. By 1971, SAS was gaining popularity within the academic community. One strength of the system was analyzing experiments with missing data, which was useful to the pharmaceutical and agricultural industries, among others.

In 1973, John Sall joined the project, making extensive programming contributions in econometrics, time series, and matrix algebra. Other participants in the early years included Caroll G. Perkins, Jolayne W. Service, and Jane T. Helwig. Perkins made programming contributions. Service and Helwig created the early documentation.

In 1976, SAS Institute, Inc. was incorporated by Barr, Goodnight, Sall, and Helwig.

SAS sued World Programming, the developers of a competing implementation, World Programming System, alleging that they had infringed SAS's copyright in part by implementing the same functionality. This case was referred from the United Kingdom's High Court of Justice to the European Court of Justice on 11 August, 2010. In May 2012, the European Court of Justice ruled in favor of World Programming, finding that "the functionality of a computer program and the programming language cannot be protected by copyright."


SAS 71

SAS 71 represents the first limited release of the system. The first manual for SAS was printed at this time, approximately 60 pages long. The DATA step was implemented. Regression and analysis of variance were the main uses of the program.

SAS 72

This more robust release was the first to achieve wide distribution. It included a substantial user's guide, 260 pages in length. The MERGE statement was introduced in this release, adding the ability to perform a database JOIN on two data sets. This release also introduced the comprehensive handling of missing data.

SAS 76

SAS 76 was a complete system level rewrite, featuring an open architecture for adding and extending procedures, and for extending the compiler. The INPUT and INFILE statements were significantly enhanced to read virtually all data formats in use on the IBM mainframe.[13] Report generation was added through the PUT and FILE statements. The capacity to analyze general linear models was added.


1980 saw the addition of SAS/GRAPH, a graphing component, and SAS/ETS for econometric and time-series analysis. In 1981 SAS/FSP followed, providing full-screen interactive data entry, editing, browsing, retrieval, and letter writing. In 1983 full-screen spreadsheet capabilities were introduced (PROC FSCALC). For IBM mainframes, SAS 82 no longer required SAS databases to have direct access organization (DSORG=DAU), because SAS 82 removed location-dependent information from databases. This permitted SAS to work with datasets on tape and other media besides disk.

Version 4 series

In the early 1980s, SAS Institute released Version 4, the first version for non-IBM computers. It was written mostly in a subset of the PL/I language, to run on several minicomputer manufacturers' operating systems and hardware: Data General's AOS/VS, Digital Equipment's VAX/VMS, and Prime Computer's PRIMOS. The version was colloquially called "Portable SAS" because most of the code was portable, i.e., the same code would run under different operating systems.

Version 6 series

Version 6 represented a major milestone for SAS. While it appeared superficially similar to the user, major changes occurred "under the hood": after its FORTRAN origins, followed by PL/I and mainframe assembly language, SAS was rewritten in C for version 6. The rewrite provided enhanced portability between operating systems, as well as access to a growing pool of C programmers compared with the shrinking pool of PL/I programmers. This was the first version to run on UNIX, MS-DOS and Windows platforms. The DOS versions were incomplete implementations of the Version 6 spec: some functions and formats were unavailable, as were SQL and related items such as indexing and WHERE subsetting. DOS memory limitations restricted the size of some user-defined items.

The mainframe version of SAS 6 changed the physical format of SAS databases from "direct files" (DSORG=DA) to standard blocked physical sequential files (DSORG=PS,RECFM=FS), accessed with a customized EXCP macro instead of BSAM or QSAM, or the BDAM that had been used through version 5 until the complete rewrite of version 6. The practical benefit of this change is that a SAS 6 database can be copied from any media with any copying tool, including IEBGENER, which uses BSAM.

In 1984 a project management component was added (SAS/PROJECT). In 1985 SAS/AF software, an econometrics and time series analysis component (SAS/ETS), and interactive matrix programming software (SAS/IML) were introduced. MS-DOS SAS (version 6.02) was introduced, along with a link to mainframe SAS. In 1986 a statistical quality improvement component was added (SAS/QC software), and SAS/IML and SAS/STAT software was released for personal computers. In 1987 concurrent update access was provided for SAS data sets with SAS/SHARE software, and database interfaces were introduced for DB2 and SQL-DS. In 1988 SAS introduced the concept of MultiVendor Architecture (MVA), SAS/ACCESS software was released, and support for UNIX-based hardware was announced. SAS/ASSIST software for building user-friendly front-end menus was introduced. New SAS/CPE software established SAS as an innovator in computer performance evaluation. Version 6.03 for MS-DOS was released.

Release 6.06 for MVS, CMS, and OpenVMS was announced in 1990. The same year, the last MS-DOS version (6.04) was released. Data visualization capabilities were added in 1991 with SAS/INSIGHT software. In 1992 SAS/CALC, SAS/TOOLKIT, SAS/PH-Clinical, and SAS/LAB software was released. In 1993 software for building customized executive information systems (EIS) was introduced, and Release 6.08 for MVS, CMS, VMS, VSE, OS/2, and Windows was announced. 1994 saw the addition of ODBC support, plus the SAS/SPECTRAVIEW and SAS/SHARE*NET components. Release 6.09 added a DATA step debugger, and 6.09E was released for MVS. Release 6.10, in 1995, was a Microsoft Windows release and the first release for the Apple Macintosh. Version 6 was the first and last series to run on the Macintosh; JMP, also produced by SAS Institute, is the software package the company produces for the Macintosh. Also in 1995, 6.11 (codenamed Orlando) was released for Windows 95, Windows NT, and UNIX.

In 1996 SAS announced Web enablement of SAS software and introduced the scalable performance data server. In 1997 SAS/Warehouse Administrator and SAS/IntrNet software went into production. In 1998 SAS introduced a customer relationship management (CRM) solution and an ERP access interface, the SAS/ACCESS interface for SAP R/3. SAS was also the first to release OLE-DB for OLAP, and it released a HOLAP solution. Balanced scorecard, SAS/Enterprise Reporter, and HR Vision were released, as was the first release of SAS Enterprise Miner. 1999 saw the releases of HR Vision software, the first end-to-end decision-support system for human resources reporting and analysis, and Risk Dimensions software, an end-to-end risk-management solution. MS-DOS versions were abandoned because of Y2K issues and lack of continued demand. In 2000 SAS shipped Enterprise Guide and ported its software to Linux.

Version 7 series

The Output Delivery System debuted in version 7, as did long variable names (the limit rose from 8 to 32 characters), storage of long character strings in variables (the limit rose from 200 to 32,767 characters), and a much improved built-in text editor, the Enhanced Editor. Version 7 also synchronised features across the various platforms for a given version number, which previously had not been the case. Version 7 foreshadowed version 8. It was believed in the SAS user community, although never officially confirmed, that in releasing version 7 SAS Institute released a snapshot of its development work on version 8 to meet a deadline promise. To some, SAS Institute's recommendation that sites wait until version 8 before deploying the new software confirmed this.

Version 8 series

Released about 1999; 8.0, 8.1, 8.2 were Unix, Linux, Microsoft Windows, CMS (z/VM) and z/OS releases. Key features: long variable names, Output Delivery System (ODS). SAS 8.1 was released in 2000. SAS 8.2 was released in 2001.

Version 9 series

Version 9 makes additions to base SAS. The new hash object allows functionality similar to the MERGE statement without sorting data or building formats. The function library was enlarged, and many functions have new parameters. Perl Regular Expressions are now supported, as opposed to the old "Regular Expression" facility, which was incompatible with most other implementations of regular expressions. Long format names are now supported. SAS 9.2 was released in March 2008 and was demonstrated at SAS Global Forum (previously called SUGI) 2008.
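To make the contrast with MERGE concrete, the following is a minimal Python sketch (not SAS code) of the general idea behind a hash-based lookup: one table is loaded into an in-memory hash keyed by the join variable, so the other table can be read in any order and neither input needs sorting. The function name `hash_lookup_join` and the sample records are hypothetical, and the sketch makes no claim about how the SAS hash object is implemented internally.

```python
def hash_lookup_join(left_rows, right_rows, key):
    """Join two lists of dicts on `key` without sorting either input."""
    # Load the right-hand table into a hash keyed by the join value.
    # In this sketch, a later duplicate key overwrites an earlier one.
    lookup = {}
    for row in right_rows:
        lookup[row[key]] = row

    joined = []
    for row in left_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Keep only rows that find a match (inner-join behaviour).
            joined.append({**row, **match})
    return joined


# Hypothetical, deliberately unsorted inputs.
orders = [
    {"cust_id": 3, "amount": 40},
    {"cust_id": 1, "amount": 25},
    {"cust_id": 9, "amount": 10},
]
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 3, "name": "Raj"},
]
result = hash_lookup_join(orders, customers, "cust_id")
```

The output keeps the order of the left-hand input, which is the practical difference from a sort-and-merge approach, where both inputs must first be ordered by the key.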


Read and write different file formats.

Process data in different formats.

SAS programming language, a 4th generation programming language. SAS DATA steps are written in a 3rd-generation procedural language very similar to PL/I; SAS PROCS, especially PROC SQL, are non-procedural and therefore better fit the definition of a 4GL.

WHERE filtering available in DATA steps and PROCs; based on SQL WHERE clauses, incl. operators like LIKE and BETWEEN/AND.

Built-in statistical and random number functions.

Functions for manipulating character and numeric variables. Version 9 includes Perl Regular Expression processing.

System of formats and informats. These control representation and categorization of data and may be used within DATA step programs in a wide variety of ways. Users can create custom formats, either by direct specification or via an input dataset.

Comprehensive date- and time-handling functions; a variety of formats to represent date and time information without transformation of underlying values.

Interaction with database products through a subset of SQL (and ability to use SQL internally to manipulate SAS data sets). Almost all SAS functions and operators available in PROC SQL.

SAS/ACCESS modules allow communication with databases (including databases accessible via ODBC); in most cases, database tables can be viewed as though they were native SAS data sets. As a result, applications may combine data from many platforms without the end-user needing to know details of or distinctions between data sources.

Direct output of reports to CSV, HTML, PCL, PDF, PostScript, RTF, XML, and more using Output Delivery System. Templates, custom tagsets, styles incl. CSS and other markup tools available and fully programmable.

Interaction with the operating system (for example, pipelining on Unix and Windows and DDE on Windows).

Fast development time, particularly from the many built-in procedures, functions, in/formats, the macro facility, etc.

An integrated development environment.

Dynamic data-driven code generation using the SAS Macro language.

Can process files containing millions of rows and thousands of columns of data.

University research centers often offer SAS code for advanced statistical techniques, especially in fields such as Political Science, Economics and Business Administration.

Large user community supported by SAS Institute. Users have a say in future development, e.g. via the annual SASWare Ballot.
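The WHERE filtering item in the list above borrows the SQL operators LIKE and BETWEEN/AND. As a rough illustration of what those two operators mean, here is a small Python sketch; Python stands in for SAS, the helper names `like` and `between` are hypothetical, and the sketch assumes standard SQL conventions (`%` matches any run of characters, `_` matches exactly one, and BETWEEN is inclusive at both ends).

```python
import re


def like(value, pattern):
    """SQL-style LIKE: % matches any run of characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def between(value, low, high):
    """SQL-style BETWEEN low AND high, inclusive at both ends."""
    return low <= value <= high


# Hypothetical rows; the filter mimics: WHERE name LIKE 'Sm%' AND age BETWEEN 30 AND 50
rows = [
    {"name": "Smith", "age": 34},
    {"name": "Smythe", "age": 61},
    {"name": "Jones", "age": 45},
]
selected = [r for r in rows if like(r["name"], "Sm%") and between(r["age"], 30, 50)]
```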


IBM SPSS Modeler is a data mining software application from IBM. SPSS Modeler is a powerful, versatile data mining and text analytics work bench that helps build accurate predictive models quickly and intuitively, without the need for programming. It enables users to discover patterns and trends in structured and unstructured data, using a visual interface supported by statistical and data mining algorithms. It is used by people seeking to generate business and scientific insights, and to improve decision making and business processes. Some of the domains using IBM SPSS Modeler include:

Customer relationship management (CRM)

Fraud detection and prevention

Risk management

Manufacturing quality improvement

Healthcare quality improvement

Forecasting demand or sales

Law enforcement and border security

SPSS Modeler was originally named SPSS Clementine by SPSS Inc., after which it was renamed PASW Modeler in 2009 by SPSS.[1] It then passed to IBM in IBM's 2009 acquisition of SPSS Inc. and was subsequently renamed IBM SPSS Modeler, its current name.


There are two versions of SPSS Modeler:

SPSS Modeler Professional: Discover hidden relationships in structured data, such as databases, mainframe data systems, flat files or BI systems, to predict the outcomes of future events and interactions with statistical techniques.

SPSS Modeler Premium: Includes all the features of Modeler Professional, with the addition of:

Text Analytics (Extract concepts and sentiment with NLP within unstructured data such as call center notes, blogs, surveys or documents, to greatly improve model accuracy and predictive capability)

Entity Analytics (integrate diverse data sources and resolve like entities, even when the entities do not share any key values)

Social Network Analysis (discover relationships among social entities and the implications of their behavior)

Both versions are available in desktop and server configurations.


The functionality of SPSS Modeler is described below by phase of the CRISP-DM methodology. The items marked with a * are only available within SPSS Modeler Premium.

Data understanding

Interact with data by selecting regions or items on a graph and viewing the selected information; or select key data for use in analysis

Access SPSS Statistics graphs and reporting tools directly from the SPSS Modeler interface

Data preparation

Access operational data from: IBM Cognos Business Intelligence, IBM DB2, IBM Netezza, mainframe data through zDB2 and IBM Classic Federation Server support, Oracle, MySQL (Oracle), Microsoft SQL Server, Teradata data sources.

Import file types including: SPSS Statistics .SAV, SPSS Data Collection data sources, Excel .xls .xlsx, Delimited and fixed-width text files, SAS, XML.

Multiple data-cleaning options to remove or replace invalid data, automatically impute missing values and mitigate outliers and extremes.

Automatic data preparation to interrogate and condition data for analysis in a single step.

Access data management and transformations performed in SPSS Statistics directly from SPSS Modeler.

Field filtering, naming, derivation, binning, re-categorization, value replacement and field reordering.

Record selection, sampling (including clustered and stratified sampling), merging (including inner joins, full outer joins, partial outer joins, and anti-joins), sorting, aggregation and balancing.

Data restructuring, partitioning and transposition.

Extensive string functions: string creation, substitution, search and matching, whitespace removal and truncation.

RFM scoring: aggregate customer transactions to provide Recency, Frequency, and Monetary value scores and combine these to produce a complete RFM analysis.

Export data to databases, IBM Cognos Business Intelligence packages, SPSS Statistics, SPSS Data Collection, delimited text files, Excel, SAS, or XML.

Entity Analytics to combine or separate records through context accumulation.

Social Network Analysis including Group analysis and Diffusion analysis.

Extract text data from files, operational databases and RSS feeds (i.e., blogs, web feeds) in Dutch, English, French, German, Italian, Portuguese, Spanish or Japanese.

Use and customize pre-built templates and libraries for sentiment analysis.

Group text documents and records based on content, using text classification algorithms.

Identify and extract sentiments (for example, likes and dislikes) from text in Dutch, English, French, German and Spanish.

Reveal complex relationships through interactive graphs that show multiple semantic links between two concepts.

Include opinions, semantic relationships and linked events in deployable predictive models.
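The RFM scoring item in the list above describes aggregating customer transactions into Recency, Frequency and Monetary values and then combining them. The following Python sketch illustrates that idea under stated assumptions: the transaction data, the function names, the rank-based scoring (one score level per customer rather than binned quantiles) and the 100/10/1 weights are all hypothetical choices for illustration, not a description of how SPSS Modeler computes its scores.

```python
from datetime import date


def rfm_aggregate(transactions, as_of):
    """Collapse (customer, date, amount) records into one R/F/M row per customer."""
    summary = {}
    for cust, when, amount in transactions:
        rec = summary.setdefault(cust, {"last": when, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], when)   # most recent purchase date
        rec["frequency"] += 1                  # number of transactions
        rec["monetary"] += amount              # total spend
    return {
        cust: {
            "recency_days": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for cust, rec in summary.items()
    }


def rank_scores(values, higher_is_better=True):
    """Score customers 1..n by rank, where n is best; ties are broken by customer key."""
    ordered = sorted(values, key=lambda c: (values[c], c), reverse=not higher_is_better)
    return {cust: position for position, cust in enumerate(ordered, start=1)}


# Hypothetical transaction log.
transactions = [
    ("A", date(2024, 1, 5), 50.0),
    ("A", date(2024, 3, 1), 20.0),
    ("B", date(2024, 2, 10), 200.0),
    ("C", date(2023, 12, 1), 15.0),
    ("C", date(2024, 1, 15), 15.0),
    ("C", date(2024, 2, 20), 15.0),
]
rfm = rfm_aggregate(transactions, as_of=date(2024, 3, 31))

# Fewer days since the last purchase is better, so recency is scored in reverse.
r = rank_scores({c: v["recency_days"] for c, v in rfm.items()}, higher_is_better=False)
f = rank_scores({c: v["frequency"] for c, v in rfm.items()})
m = rank_scores({c: v["monetary"] for c, v in rfm.items()})

# Assumed weighting: recency dominates, then frequency, then monetary value.
combined = {c: 100 * r[c] + 10 * f[c] + m[c] for c in rfm}
```

With larger customer bases, the per-customer ranks would normally be replaced by a small number of bins (for example, quintiles) so that many customers share each score.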

Modeling and evaluation

Extensive range of data mining algorithms with many advanced features:

Anomaly Detection: Detect unusual records through the use of a cluster-based algorithm

Apriori: Popular association discovery algorithm with advanced evaluation functions

Bayesian Networks: Graphical probabilistic models

C&RT, C5.0, CHAID & QUEST: Decision tree algorithms including interactive tree building

CARMA: Association algorithm which supports multiple consequents

Cox regression: Calculate likely time to an event

Decision List: Interactive rule-building algorithm

Factor/PCA, Feature Selection: Data reduction algorithms

K-Means, Kohonen, Two Step, Discriminant, Support Vector Machine (SVM): Clustering and segmentation algorithms

KNN: Nearest neighbor modeling and scoring algorithm

Logistic Regression: For binary outcomes

Neural Networks: Multi-layer perceptrons with back-propagation learning, and radial basis function networks

Regression, Linear, GenLin (GLM), Generalized Linear Mixed Models (GLMM): Linear equation modeling

Self-learning response model (SLRM): Bayesian model with incremental learning

Sequence: Sequential association algorithm for order-sensitive analyses

Support Vector Machine: Advanced algorithm with accurate performance for wide datasets

Time-series: Generate and automatically select time-series forecasting models

Automatic classification (binary and numeric) in place of selecting individual algorithms.

Automatic clustering in place of selecting individual algorithms.

Interactive model and equation browsers, with views of advanced statistical output.

Variable importance graphs to show relative impact of data attributes on predicted outcomes.

Geographic maps on which to visualize the analytic results.

Combine multiple models (ensemble modeling) or use one model to analyze a second model.

Use the SPSS Modeler Component-Level Extension Framework (CLEF) to integrate custom algorithms.

Use R to extend analysis options, through the integration of SPSS Statistics.
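The list above mentions combining multiple models (ensemble modeling). One common way to do this is majority voting, sketched below in Python. The three rule-based "models", the field names and the thresholds are hypothetical, and the source does not say which combining methods SPSS Modeler itself uses, so this is an illustration of the general technique only.

```python
from collections import Counter


def majority_vote(models, record):
    """Return the most common prediction and the share of models that agreed on it."""
    predictions = [model(record) for model in models]
    label, votes = Counter(predictions).most_common(1)[0]
    return label, votes / len(predictions)


# Three hypothetical stand-ins for trained models; each maps a record to a label.
def by_income(record):
    return "buy" if record["income"] > 50000 else "skip"


def by_age(record):
    return "buy" if record["age"] < 40 else "skip"


def by_visits(record):
    return "buy" if record["visits"] >= 3 else "skip"


record = {"income": 62000, "age": 45, "visits": 5}
label, agreement = majority_vote([by_income, by_age, by_visits], record)
```

Using an odd number of models avoids ties when there are only two possible labels.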


Export models using SQL or PMML (the XML-based standard format for predictive models).

Leverage IBM SPSS Collaboration and Deployment Services for innovative analytics management, process automation and deployment capabilities.

IBM SPSS Modeler Server adds the following capabilities:

Leverage high-performance hardware (e.g., IBM System Z) to experience quicker time-to-solution and parallel execution of streams and multiple models.

In-database mining to build models in the database using leading database technologies and leverage high-performance database implementations.

SQL-pushback to push data transformations and select modeling algorithms directly into your operational databases.

In-database mining algorithms for IBM InfoSphere: Association, Clustering, Decision Tree, Logistic Regression, Naive Bayes, Regression, Sequence, Time Series.

In-database mining algorithms for IBM Netezza: Bayes Net, Decision Trees, Divisive Clustering, Generalized Linear, K-Means, KNN, Linear Regression, Naive Bayes, PCA, Regression Tree, Time Series.

In-database mining algorithms for Microsoft SQL Server: Association Rules, Clustering, Decision Tree, Linear Regression, Naive Bayes, Neural Network, Sequence Clustering, Time-Series.

In-database mining algorithms for Oracle: Adaptive Bayes, Apriori, Artificial Intelligence (AI), Decision Tree, General Linear Model (GLM), KMeans, Minimum Description Length (MDL), Naive Bayes, Non-Negative Matrix Factorization, O-Cluster (Orthogonal Partitioning Clustering), Support Vector Machine.

Score data within the database, to reduce data movement and improve performance (via IBM SPSS Modeler Server Scoring Adapters).

Transmit sensitive data securely between SPSS Modeler Client and SPSS Modeler Server through secure sockets layer (SSL) encryption.
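The SQL-pushback item in the list above is about running data transformations inside the database instead of pulling rows to the client. The Python sketch below illustrates that general idea with the standard-library `sqlite3` module standing in for an operational database. The table, the column names and the helper `build_pushback_sql` are hypothetical, and the sketch does not represent the SQL that SPSS Modeler actually generates.

```python
import sqlite3


def build_pushback_sql(table, filter_expr, group_by, agg_column):
    """Turn a select-then-aggregate step into one SQL statement.

    Identifiers are assumed to be trusted; a real tool would validate or quote them.
    """
    return (
        f"SELECT {group_by}, SUM({agg_column}) AS total "
        f"FROM {table} WHERE {filter_expr} "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )


# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL, year INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 100.0, 2012), ("east", 50.0, 2012), ("west", 70.0, 2012), ("west", 30.0, 2011)],
)

# Without pushback: fetch every row, then filter and aggregate on the client.
local_totals = {}
for region, amount, year in conn.execute("SELECT region, amount, year FROM sales"):
    if year == 2012:
        local_totals[region] = local_totals.get(region, 0.0) + amount

# With pushback: the database does the same work and returns only summary rows.
sql = build_pushback_sql("sales", "year = 2012", "region", "amount")
pushed_totals = dict(conn.execute(sql).fetchall())
```

Both paths produce the same totals; the difference is how much data crosses the connection, which is what matters for large operational tables.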



SAS Enterprise Miner streamlines the data mining process to create highly accurate predictive and descriptive models using big data. It offers an easy-to-use interface for iterative development of analytical models and for quickly sharing insights to make better decisions.

Forward-thinking organizations today use SAS predictive analytics and data mining software to detect fraud, minimize risk, anticipate resource demands, increase response rates for marketing campaigns and curb customer attrition.

SAS Enterprise Miner provides superior analytical depth. It focuses on building the least complex model with the most predictive power.

It offers multiple ways to conduct data manipulation and preparation. Organizations can interactively visualize, explore, and understand relationships among big data to assess alternative courses of action. SAS Enterprise Miner helps solve a variety of business problems irrespective of industry. SAS' breadth in targeting different business problems is its biggest differentiator against its competition.

Independent industry analyst firm Hurwitz & Associates evaluated SAS Predictive Analytics in its 2011 Victory Index Report. As part of the research, Hurwitz fielded several predictive analytics customer surveys and spoke directly to a number of SAS customers. SAS scored higher than all other vendors in the Victory Index on the following criteria: Business Value, Performance/Scalability, Customer Satisfaction, and Technology and Tools.


The primary features of SPSS include perpetual licensing, which allows customers to own their software instead of licensing it on an annual basis and incurring a large total cost of ownership (TCO) for their predictive analytics solutions, and flexible deployment options: a standalone desktop application suits individuals, while a multi-client/server deployment is appropriate for larger analyst teams.

It also offers seamless deployment into business process with Collaboration and Deployment Services integration. The Self-Learning Response Model (SLRM) node enables customers to build a model that can be continually updated, or re-estimated, as a dataset grows without having to rebuild the model every time while using the complete dataset.
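The source describes SLRM only as a Bayesian model with incremental learning that can be re-estimated as data grows. The Python sketch below illustrates that general property with a toy count-based naive Bayes model: because it stores only counts, feeding it new batches gives the same result as refitting on all the data. The class name, field names and add-one smoothing (which assumes two possible values per field) are hypothetical, and this is not IBM's algorithm.

```python
from collections import defaultdict


class IncrementalResponseModel:
    """Toy count-based Bayesian model that can be updated batch by batch."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # (label, field, value) -> count

    def update(self, records):
        """Fold a new batch of (features, label) pairs into the stored counts."""
        for features, label in records:
            self.class_counts[label] += 1
            for field, value in features.items():
                self.feature_counts[(label, field, value)] += 1

    def response_probability(self, features, positive="yes"):
        """Estimate P(positive | features); requires at least one update() call."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = n / total  # class prior
            for field, value in features.items():
                seen = self.feature_counts.get((label, field, value), 0)
                score *= (seen + 1) / (n + 2)  # add-one smoothing, two values assumed
            scores[label] = score
        return scores.get(positive, 0.0) / sum(scores.values())


# Hypothetical campaign responses arriving in two batches.
batch1 = [
    ({"channel": "email", "segment": "new"}, "yes"),
    ({"channel": "phone", "segment": "new"}, "no"),
]
batch2 = [
    ({"channel": "email", "segment": "loyal"}, "yes"),
    ({"channel": "phone", "segment": "loyal"}, "no"),
]

incremental = IncrementalResponseModel()
incremental.update(batch1)
incremental.update(batch2)  # no rebuild; only the counts change

rebuilt = IncrementalResponseModel()
rebuilt.update(batch1 + batch2)

query = {"channel": "email", "segment": "new"}
p_incremental = incremental.response_probability(query)
p_rebuilt = rebuilt.response_probability(query)
```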

The product supports all common data sources used by enterprise organizations. SQL Pushback and In-database Mining are available in one integrated solution suite. It also integrates with Survey Data Collection tools, which supply additional information for developing more accurate models.

IBM SPSS Data Mining Workbench doesn't require users to buy access engines to access various data sources. IBM SPSS Modeler doesn't require replicating physical data to get efficient performance. Users do not have to buy and install a separate bolt-on capability to manage their models; model management is built into Collaboration and Deployment Services. The text mining capability can read from websites, RSS feeds, and file locations. No additional tool is required to extract concepts, create categories, and perform sentiment analysis.

