#inittowinit Data Science Report
#inittowinit has conducted a broad and comprehensive analysis, yielding results that align with the objectives determined at the outset of the project. The cluster analysis generated five distinct clusters and revealed information regarding Age Range, Most Frequent Category Name, Least Frequent Category Name, Income, LifeTime Value Gross Revenue and Household Composition. Of the five clusters, one in particular, named Cheeky Chic, has characteristics that align with the objectives of the Group. Consumers within this cluster have a greater propensity to purchase Swimwear, Leggings and Panties than other customer segments, and a low propensity to purchase hosiery. Their age ranges from 41 to 45 and their income is between $200,000 and $249,999. Focusing on this cluster for future marketing efforts could not only help Spanx drive further sales of leggings and panties but also decrease the average age of a Spanx customer, one of the key concerns indicated to the Group by Spanx at the opening stages of the project.
In addition to clustering, the Group has also conducted a detailed market basket analysis. This process identifies association rules that show which items appear together most frequently within purchasing patterns. It is expected that this approach will help drive cross-selling opportunities for the Company. Taking this approach, the Group found that customers who purchased activewear were 31% likely to also purchase leggings. The other significant insight from this analysis was that customers who purchase leggings are 24% likely to purchase apparel. These association rules are underpinned by the concept of confidence, defined as the probability that the second item is purchased given that the first appears in a transaction. The rules are also strengthened by lift scores above 1. A lift score above 1 signals that two products are purchased together more often than would be expected if they were bought independently. A detailed description of lift and confidence can be found further on in the paper.
Based on both of the above analyses, the Group suggests the following recommendations to increase sales in the leggings, activewear and panties segments:
- Pair commonly purchased items together. The Company should use the data provided by the Group to suggest items in the ‘We think that you might also love’ section that match our findings. For example, if a potential customer is looking at the Workout to Waves Sports Bra, they should be shown the Look at me now Seamless Leggings as a suggested additional purchase.
- Specifically target those potential customers expressing the criteria of “Cheeky Chic” through social media marketing material. They should be exposed to the leggings, activewear and panties that similar customers have exhibited the propensity to purchase already.
Given additional time, the Group would have attempted to leverage a wider range of third party vendor data to strengthen our existing analysis and perhaps provide broader insight into Spanx consumers. In addition, we would have liked to impute the unknown values to better understand the customer base of the Company.
The purpose of this report is to provide broader context for the presentation to be delivered to certain members of Spanx Inc. (hereafter “Spanx” or “the Company”) by #inittowinit (hereafter “the Group”) on the 5th of December, 2018.
From initial meetings held with the Company in August, it became evident that there were three main objectives to be addressed upon the undertaking of the project:
- To lower the average age of a Spanx consumer (at time of undertaking, August 2018, this stood at 52)
- To drive sales in non-shapewear categories for Spanx
- To increase the frequency of multi item purchases. The Company has indicated that the majority of their transactions were for individual items
The Group has conducted a detailed data analysis that goes some distance toward achieving these objectives. Since August, the Group has developed and applied both a market basket analysis and a K-means cluster analysis. These analyses have yielded important results that are presented in the report below.
The intention of these analyses is to provide data driven insights to the Company with the hope of achieving an increase in sales for the leggings, activewear and panties sectors. Our market basket analysis allows for detailed scrutiny of purchasing patterns based on data provided by Spanx to the Group. It is expected that the insight gained from our market basket analysis will allow the Company to use cross-selling opportunities for leggings and activewear to achieve the aspirations laid out at the beginning of the project.
In addition to the market basket analysis, the Group conducted a cluster analysis. Cluster analyses allow for the identification of the characteristics of consumers who have a propensity to purchase particular categories of clothing. In total, five clusters were identified. One cluster in particular illustrates characteristics that correlate strongly and align with the objectives illustrated above. This form of analysis also provides additional indicators regarding customers that fall within these clusters, such as age range, average income, household composition and lifetime value (“LTV”) gross revenue.
As a consequence of our analysis, the group will detail a number of data driven recommendations based on our findings that should help to drive sales in active wear, leggings and apparel through highlighting particular cross selling opportunities. These will also aid the Company in reducing the average age of their customers by utilizing findings from the cluster analysis.
The data used for this project was collected and provided by SPANX and uploaded to a shared SAS environment by Kennesaw State University. The data was received in six parts: Sales Facts, Consumer Dimensions, Product Dimension, Marketing Channel, Sales Dimension, and Sales Order Promotion Fact. The components were merged to create a final data set containing 5,818,147 observations and 87 columns. The merged data set represented a one-to-many relationship between customer information and customer purchases spanning 2014 to 2018. Of the total observations, less than 0.01% contained missing values; however, 43% (2,536,720) contained at least one unknown value (e.g., age range, income range, household composition, birth date). Due to the nature of the data, the unknown values were not imputed, so as to keep an accurate representation of the SPANX consumer population.
Preliminary analyses, including descriptive statistics, were completed to identify miscoded variables, gaps in information, and trends in the data. The information obtained in the preliminary analysis, such as the frequency table below (see Figure 1), was helpful in confirming the limitations and capabilities of our initial objectives.
Figure 1. Count of Cross Purchases by Category
After gaining an understanding of the variables and confirming the feasibility of the initial objectives, the data was prepared to perform a market basket and cluster analysis.
Market Basket Analysis
To assess potential cross-selling patterns, a market basket analysis was completed. A market basket analysis is an unsupervised model designed to determine patterns or relationships between one or more items (Oracle, n.d.). This model is appropriate for the Group’s objectives to identify trends in purchasing behaviors amongst SPANX customers using transactional data from 2016, 2017, and 2018. The main purpose of association mining is to establish rules operating as predictors for products. These predictions stem predominantly from the prevalence of a particular product – such as apparel – when purchased with another product category – such as leggings.
The overall outputs of market basket analysis rely on three fundamental concepts. The first of these concepts is called support. Simply put, this is the number of times throughout all the provided transactions that both items appear purchased together. One example of this could be represented by asking the question: of all the transactions, how many include both hosiery and activewear? The second element of this analysis, which has been touched upon before, is called confidence. Confidence can be defined as the probability that a second item will be purchased given the purchase of the first. For example, if a customer has purchased shapewear, confidence gives the likelihood that hosiery will be purchased in tandem (IBM Knowledge Center, 2018).
The final of these concepts is called lift. Given the purchase of one product, lift illustrates how much more likely a second product is to be purchased than it would be in general (Association Rules. 2018). It is the probability that products one and two are purchased together, divided by the probability that they would be purchased together if the two purchases were independent occurrences. For example, assuming that shapewear has been purchased, how does the probability that leggings will also be purchased compare to the probability that leggings are purchased at all? A lift value above 1 is generally seen as meaningful and adds broader context to the confidence rule, whereas a lift value of exactly 1 means the two purchases are independent. This is because lift indicates the extent to which the items depend on one another.
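The three measures described above can be sketched in a few lines of Python. This is a minimal illustration on made-up transactions, not the SPANX data or the Group's SAS workflow; the function names are hypothetical, and support is expressed here as a fraction of all transactions rather than a raw count.

```python
# Toy transactions (hypothetical, not SPANX data). Each set is one order.
transactions = [
    {"activewear", "leggings"},
    {"activewear", "leggings", "shapewear"},
    {"activewear", "shapewear"},
    {"leggings"},
    {"shapewear"},
    {"shapewear", "hosiery"},
    {"shapewear"},
    {"activewear"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """P(consequent in basket | antecedent in basket)."""
    both = support({antecedent, consequent}, baskets)
    return both / support({antecedent}, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's overall support.

    Equivalent to P(A and B) / (P(A) * P(B)); a value of 1 means independence.
    """
    return confidence(antecedent, consequent, baskets) / support({consequent}, baskets)


rule_support = support({"activewear", "leggings"}, transactions)
rule_confidence = confidence("activewear", "leggings", transactions)
rule_lift = lift("activewear", "leggings", transactions)
print(rule_support, rule_confidence, rule_lift)
```

In this toy data the rule activewear → leggings has a lift above 1, which is the pattern the report treats as a meaningful association.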
Results of the Market Basket analysis can be found in the “Results” section. Additionally, the complete output from the final results is shown in Appendix B.
K-Means Cluster Analysis
Supplementing the market basket analysis, a cluster analysis was performed to identify similarities among SPANX consumers and, more specifically, their purchasing habits. The cluster analysis used K-means in SAS, chosen for its ability to process large data sets quickly compared to a cluster analysis in R, which is restricted in the number of observations that can be processed.
K-means clustering is a popular unsupervised clustering method that uses numeric data to calculate each cluster's centroid, the average along each dimension/variable. With the number of clusters specified, K-means starts from initial estimated seeds and iteratively assigns each data point to its nearest centroid, then recalculates the centroids from the assigned points.
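As a rough illustration of the iterative procedure, the following pure-Python sketch implements a basic K-means loop on a handful of made-up two-dimensional points. It is not the PROC FASTCLUS implementation used by the Group; the function names, the random seeding and the demo data are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, iterations=20, seed=0):
    """Basic K-means: random initial seeds, then assign/update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial seeds drawn from the data
    for _ in range(iterations):
        clusters = assign(points, centroids)
        new_centroids = [
            # Mean along each dimension; an empty cluster keeps its old centroid.
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical demo data: two well-separated groups.
demo_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
               (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
demo_centroids, demo_clusters = k_means(demo_points, k=2)
print(demo_clusters)
```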
For the purposes of this analysis, five variables were chosen to represent each cluster: age range, income range, household composition, lifetime value gross revenue, and category name of products purchased. Since the variables chosen were imported as character, each variable was transformed to numeric and standardized in SAS. For ordinal variables such as age range and income range, the variables were assigned a numeric value in ranking order (see Figure 2).
Figure 2. Ordinal Data Transformation and Standardization
To transform nominal variables such as category name, dummy variables were generated. For example, in the data set and as seen in Figure 3, a column for each category/product was created with 1 signifying the presence of the product and 0 signifying the absence of the product.
Figure 3. Dummy Variable Transformation
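The two transformations described above can be sketched as follows. The band labels, rank values and customer records are hypothetical stand-ins, since the actual coding appears only in Figures 2 and 3, and the use of the sample standard deviation for standardization is an assumption rather than a confirmed detail of the Group's SAS code.

```python
from statistics import mean, stdev

# Hypothetical rank coding for an ordinal variable (the real coding is in Figure 2).
AGE_RANK = {"31-35": 1, "36-40": 2, "41-45": 3, "46-50": 4}

# Hypothetical customer records.
customers = [
    {"age_range": "41-45", "categories": {"Leggings", "Swimwear"}},
    {"age_range": "46-50", "categories": {"Activewear"}},
    {"age_range": "31-35", "categories": {"Shapewear", "Leggings"}},
]


def standardize(values):
    """Center to mean 0 and scale to unit (sample) standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


# Ordinal variable: map each band to its rank, then standardize.
age_z = standardize([AGE_RANK[c["age_range"]] for c in customers])

# Nominal variable: one 0/1 dummy column per category.
all_categories = sorted(set().union(*(c["categories"] for c in customers)))
rows = []
for customer, z in zip(customers, age_z):
    row = {"age_z": z}
    for category in all_categories:
        row[category] = 1 if category in customer["categories"] else 0
    rows.append(row)

for row in rows:
    print(row)
```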
With the data transformed and missing values removed, a K-means analysis was executed on the remaining observations using a macro variable to run the PROC FASTCLUS statement nine times iteratively. Initial iterations produced uninterpretable and ineffectual results due to numerous outliers, caused by select customers who purchased several hundred products, as well as the large proportion of unknown values. Initial clusters included one or more variables of unknown age, income or household composition. Because this information was neither valuable nor informative, the PROC FASTCLUS procedure was modified to include the Strict, Radius and Replace options to combat the outliers. Additionally, observations containing at least one unknown value were removed.
The strict option excludes observations whose distance to the nearest cluster is greater than the value specified by the radius. The radius establishes the minimum distance for selecting a new seed, and the replace = random option selects random initial cluster seeds for each iteration. With the new method employed, PROC FASTCLUS was repeated for the remaining 2,861,803 observations (excluding unknown values). Applying this strategy produced better and more interpretable clusters. To determine the optimum number of clusters, measures including the Cubic Clustering Criterion, overall R-Squared, and frequency distribution were evaluated and compared across each cluster. A detailed project flow diagram is shown in Figure 4.
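To make the outlier handling and one of the evaluation measures concrete, here is a small Python sketch. It mimics the idea of leaving distant observations unassigned and computes an overall R-Squared as one minus the ratio of within-cluster to total sum of squares. It is not PROC FASTCLUS itself; the function names, the radius value and the demo points are assumptions made for illustration.

```python
import math


def assign_strict(points, seeds, radius):
    """Label each point with its nearest seed, or None if farther than `radius`."""
    labels = []
    for p in points:
        distances = [math.dist(p, s) for s in seeds]
        nearest = min(range(len(seeds)), key=distances.__getitem__)
        labels.append(nearest if distances[nearest] <= radius else None)
    return labels


def centroid(points):
    """Mean along each dimension, returned as a tuple."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def overall_r_squared(points, labels):
    """1 - within-cluster SS / total SS, using assigned points only."""
    kept = [(p, label) for p, label in zip(points, labels) if label is not None]
    grand = centroid([p for p, _ in kept])
    total_ss = sum(math.dist(p, grand) ** 2 for p, _ in kept)
    within_ss = 0.0
    for label in {label for _, label in kept}:
        members = [p for p, l in kept if l == label]
        center = centroid(members)
        within_ss += sum(math.dist(p, center) ** 2 for p in members)
    return 1 - within_ss / total_ss


# Hypothetical demo: four ordinary points and one extreme outlier.
demo = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
demo_seeds = [(0.0, 0.5), (10.0, 10.5)]
demo_labels = assign_strict(demo, demo_seeds, radius=3.0)
demo_r2 = overall_r_squared(demo, demo_labels)
print(demo_labels, demo_r2)
```

Comparing this R-Squared across candidate cluster counts is one simple way to see how much variation each solution explains, alongside the other criteria the Group evaluated.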
Results of the K-means can be found in the “Results” section. Additionally, the complete output from the final results is shown in Appendix A.
Figure 4. Project Process Map
Market Basket Analysis
A market basket of consumer transactions was processed year-by-year from 2016 to 2018. As expected, most product categories have a relationship with shapewear, which is evidently the most frequently purchased product throughout the data set. More interestingly, in 2016 there was an association between activewear and leggings that may signal the beginning of a new market opportunity. According to the market basket analysis, in 2016 there were 727 instances of activewear and leggings being cross-purchased. Furthermore, the confidence associated with this rule was 27.78 percent. In other words, for every transaction containing activewear, there is a 27.78% probability that leggings will also be purchased (see Table 1). A second interesting finding is an association between apparel and leggings. Although fewer cross-purchases of these two products were identified (approximately 459), the confidence was relatively high at 19.62 percent. Put simply, for every transaction containing apparel, there is a 19.62% probability that leggings will also be purchased.
Table 1. 2016 Market Basket Analysis Association Rules
[Note: Rules were generated with a support of 0.001 and Confidence of 0.05]
An analysis comparing product associations in 2017 indicates that shapewear remains the most frequently purchased item; however, its support fell from 49.99% in 2016 to 46.66% in 2017. Additionally, similar to 2016, there is a strong association between activewear and leggings with a confidence of 27.07% as well as an upward trend in confidence between apparel and leggings of 25 percent (see Table 2).
Table 2. 2017 Market Basket Analysis Association Rules
[Note: Rules were generated with a support of 0.001 and Confidence of 0.05]
Finally, an analysis comparing product associations in 2018 indicates shapewear is still the most frequently purchased item, with support trending downward from 49.99% in 2016 to 43.34% in 2018. The reduction in support for shapewear may be the result of the expansion of SPANX’s new line of business dominated by leggings. This assumption is supported by the positive trend in confidence between activewear and leggings, from 27.78% in 2016 to 30.08% in 2018, as well as in the association between apparel and leggings, which increased from 19.62% in 2016 to 24.17% in 2018.
Table 3. 2018 Market Basket Analysis Association Rules
[Note: Rules were generated with a support of 0.001 and Confidence of 0.05]
Cluster Analysis Results
Results from the K-means cluster analysis produced five distinct clusters and an overall R-Squared of 73 percent (see Figure 5). The first cluster contained 622,257 observations and was labeled the “TrendFit” group. Consumers in this cluster ranged from 46-50 years of age with an income between $200,000-$249,999 and a propensity to purchase activewear, bras, and apparel. The second cluster contained 373,065 observations and was labeled the “Xpressories” group. Consumers in this group ranged from 46-50 years of age with an income between $60,000-$69,999 and a propensity to purchase accessories and hosiery. Cluster three (also known as “Cheeky Chic”) contained 678,083 observations and included consumers who ranged from 41-45 years of age with an income of $200,000-$249,999 and a propensity to purchase swimwear, panties and leggings. Given the Company's strategic goal of expanding leggings, this third cluster of consumers is particularly important. Cluster four (also known as “Big Guns”) contained 698,764 observations and included customers ranging from 56-60 years of age with an income between $100,000-$124,999 who were more likely to purchase arm tights and other products. Lastly, cluster five (also known as “Marvelust”) contained 489,506 observations and included consumers ranging from 31-35 years of age with an income between $75,000-$99,999 and a propensity to purchase men’s products, shapewear and packaged shapewear.
Figure 5. Cluster Analysis
Limitations and Weaknesses
While the analyses performed are relevant and can provide useful insights for SPANX, there are some potential limitations worth mentioning. For instance, although the data set provided was comprehensive, it does not represent either the total transactional data or consumer profiles of the company. Since SPANX uses many alternative channels to sell products, the data provided represents approximately five percent of the total consumer base. Therefore, it should not be assumed that recommendations gleaned from the analysis of the submitted data could be applied to make overall assumptions regarding the remaining ninety-five percent of the transactional data.
Furthermore, 43 percent of the data contained “unknown” values, and these observations were removed to prevent the abundance of unknowns from dominating the cluster results. Lastly, the cluster analysis performed is not a predictive model and should be reassessed in tandem with evolving trends and changing demographics.
Based on the market basket and cluster analyses, the Group sees several ways to implement our findings that will deliver value to SPANX by driving higher top line sales. The first recommendation is to leverage the market basket analysis by optimizing the products shown together on SPANX.com. To use a specific example, when a customer clicks on a leggings product and opens that product page, an area on the page labeled “We think you might also love…” shows several images of recommended products. This is where the Group feels the opportunity exists. Currently the products shown for leggings include scrunchies, panties and arm tights, but based on the basket analysis, customers who purchase leggings are most likely to also purchase apparel products. In place of scrunchies, panties and arm tights, it is recommended that Spanx.com show apparel. A related opportunity the Group identified concerns the follow-up emails sent to customers who place a product in their cart on the website but do not purchase it. The same principles can be applied to this process to ensure SPANX is placing the most commonly paired items in front of customers. Initial investigation leads us to believe this is a fairly low cost solution and, if implemented, SPANX could see a significant increase in sales. Using the above example, if apparel is paired with leggings, the Group has identified a potential increase in top line sales of $1.17M or 2.0%.
Below is the process and calculations used to forecast the potential sales increase based on The Group’s recommendation:
Step 1: Determine the weighted probability assuming the lowest priced leggings product ($68) and the lowest priced apparel product ($30).
Look at me now Leggings (retail price)
Spanx Arm Tights Layering Piece (retail price)
Confidence (based on market basket analysis)
Probabilistic Expected Revenue
% Increase in Expected Revenue
Step 2: Based on annual unit sales for Leggings of 127k
Step 3: Average purchase price of Apparel equals $84 (based on average Apparel pricing as of 12/2/18)
Step 4: Multiply annual Legging units (127k) by average price of Apparel ($84) (127k x $84 = $10.7M)
Step 5: Multiply the $10.7M by 11%, which gives $1.17M or 2.0% of total sales ($10.7M x 11% = $1.17M)
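The arithmetic in the steps above can be checked with a short Python sketch. Steps 2 to 5 use the figures exactly as quoted. The Step 1 calculation is an assumed reading: the report does not spell out the formula, but applying the 2018 apparel and leggings confidence of 24.17% to the $30 apparel price and dividing by the $68 leggings price reproduces the 11% figure. Variable names are illustrative only.

```python
# Figures quoted in the report.
leggings_price = 68.00          # lowest priced leggings product (Step 1)
apparel_price = 30.00           # lowest priced apparel product (Step 1)
rule_confidence = 0.2417        # 2018 apparel/leggings confidence (Results section)
annual_legging_units = 127_000  # Step 2
average_apparel_price = 84.00   # Step 3

# Step 1 (assumed reading): expected add-on revenue per leggings sale,
# expressed as a share of the leggings price.
expected_add_on = rule_confidence * apparel_price
uplift = expected_add_on / leggings_price

# Step 4: leggings units multiplied by the average apparel price.
base_revenue = annual_legging_units * average_apparel_price

# Step 5: apply the rounded uplift (11%) to the Step 4 figure.
projected_increase = base_revenue * round(uplift, 2)

print(f"Uplift: {uplift:.1%}")
print(f"Step 4 base: ${base_revenue / 1e6:.1f}M")
print(f"Projected increase: ${projected_increase / 1e6:.2f}M")
```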
The second area where the Group feels the analysis can be extremely valuable is in optimizing target marketing initiatives by leveraging the cluster analysis. By identifying specific customer segments that are likely to cross-purchase products, the Group feels SPANX can build on what it is already doing to ensure that it markets the right product to the right customer. The first recommendation is to optimize target marketing through custom catalogs. Based on our cluster analysis there are five distinct customer segments that cross-purchase common items. One example would be to send our “Cheeky Chic” customers catalogs that contain swim, panties and leggings products but exclude hosiery. Again, this allows SPANX to ensure its marketing efforts are driven and backed by data. A second example would be to offer promotions. With our cluster analysis, SPANX can now send targeted promotions to each customer segment to drive real value for all customers. The third recommendation associated with target marketing concerns social media. Demographics for the social media platforms are readily available and can be used to the advantage of SPANX by aligning those demographics with our cluster segments. Facebook, which is used by over 165 million people, is the ideal social media site to market through for several reasons. The demographics of Facebook match up very well with two important clusters we identified, “TrendFit” and “Cheeky Chic”. Per Sproutsocial.com, 83% of all adult women use Facebook, 84% of Facebook users fall between the ages of 30 and 49, and 75% of all adults who make more than $75K/year use Facebook. Those three demographic points match up very well with our two clusters, which represent 46% of the entire data set, making Facebook the optimal avenue for social media marketing.
Overall, by identifying the likelihood that the purchase of one item will drive the purchase of additional items, SPANX can ensure that it is pairing appropriate products together on its website. Through cluster analysis we have identified five distinct customer segments. These segments give SPANX the ability to optimize the targets of its marketing strategies, and the results give SPANX a more detailed and concise picture of its current consumer base. As explained throughout our report, the market basket and cluster analyses have identified several meaningful patterns in cross-purchases and similar customer buying habits that should be of value to the future growth of SPANX. The market basket analysis has revealed a notable relationship between leggings and activewear that, based on our forecasted revenue assumptions, will be a consequential and continuing avenue of growth for the Company.
- Apriori. (2018). Retrieved from https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/algo_apriori.htm#BGBCDHEB
- Association Rules. (2018). Retrieved from https://www.solver.com/xlminer/help/association-rules
- IBM Knowledge Center. (2018). Retrieved from https://www.ibm.com/support/knowledgecenter/es/SSEPGG_10.1.0/com.ibm.im.model.doc/c_lift_in_an_association_rule.html
- SAS Institute. (2009). SAS/STAT 9.2 user’s guide, Chapter 34, The FASTCLUS procedure (pp. 1621-1673). Cary, N.C.
- Social Media Demographics to Inform a Better Segmentation Strategy. (2018). Retrieved from https://sproutsocial.com/insights/new-social-media-demographics/
- Unsupervised Data Mining. (2018). Retrieved from https://docs.oracle.com/cd/B19306_01/datamine.102/b14339/4descriptive.htm
- What is the lift value in association rule mining? | DataMiningApps. (2018). Retrieved from https://www.dataminingapps.com/2017/04/what-is-the-lift-value-in-association-rule-mining/