Automated Catalogue Mechanism for Integrating Products

By UK Essays

Paper Type: Free Essay | Subject: Computer Science
Wordcount: 902 words | Published: 03 Apr 2018


ABSTRACT

Catalogue integration is an essential and difficult task for commercial portals and e-commerce search engines, which must build and consolidate catalogues of products gathered from many different data providers.

In this paper we discuss an automated mechanism for integrating products from multiple providers, considering the process from the perspective of both the portal's catalogue and the vendor's catalogue.

The commercial portal organizes all of its products in its own taxonomy, called the master taxonomy, while each data provider organizes its products in a different taxonomy, called the provider taxonomy. The methodology is based on a taxonomy-aware processing step that adjusts the results of a text-based classifier so that products that are close together in the provider's taxonomy are also placed close together in the master taxonomy. To the best of our knowledge, this is the first approach that uses the structure of taxonomies to improve catalogue integration. The proposed algorithm is scalable and can be applied to large web datasets. It has been implemented and tested on real-world data, and it achieves greater accuracy because it takes into account the relationships between product categories.

INTRODUCTION

The Internet is no longer an academic, research-oriented network; it now offers extensive commercial opportunities, and online shopping has grown rapidly in recent years. Shopping websites operate portals that manage the many sellers whose products they list. Examples include e-commerce sites such as Flipkart, Amazon and Snapdeal, as well as general commercial search engines such as Google Product Search and Bing Shopping. Each seller maintains its own catalogue of the products it offers. Internet marketplaces therefore face new challenges arising from the need to seamlessly integrate an enormous number of product catalogues from different sources. The central task is product categorization: placing each incoming product in the right category of the portal's catalogue. Hence, we need a mechanism that combines text-based classification with the structure of the providers' and the portal's taxonomies, classifies products accurately, and scales to the large volumes of data that are typical on the web.

EXISTING SYSTEM

In the existing system, it is difficult for customers or providers to update the details of a product. The e-commerce website has a master catalogue, and each provider must upload its product catalogue so that it conforms to that master catalogue. If the provider's catalogue does not match the master catalogue, the website rejects the product. This creates problems for providers who are unaware of the master catalogue, since not all providers know these technical details and restrictions. By imposing such restrictions on product uploads, the existing system can create a backlog in the shopping website's business.

PROBLEM DEFINITION

Given a source catalogue $K_s = (P_s, S, s)$, corresponding to a provider's catalogue defined over the source taxonomy $S = (C_s, E_s)$, and a target (master) catalogue $K_t = (P_t, T, t)$, corresponding to the commercial portal's catalogue defined over the target (master) taxonomy $T = (C_t, E_t)$, the goal is to learn a cross-catalogue labelling function $\ell : P_s \to C_t$ that maps products of the source catalogue to categories of the target taxonomy. Here $P$ denotes a set of products, $C$ a set of categories, $E$ the edges of the taxonomy tree, and $s$ and $t$ assign each product to its category in $S$ and $T$ respectively.
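The definitions above map naturally onto a few small types. The following Python sketch is one possible encoding, assuming that categories and products are identified by strings and that the labelling functions s and t are stored as dictionaries; the class and function names (Taxonomy, Catalogue, is_cross_catalogue_labelling) and the toy categories are hypothetical, not taken from the essay.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Taxonomy:
    """A taxonomy (C, E): a set of categories and a set of (parent, child) edges."""
    categories: Set[str]
    edges: Set[Tuple[str, str]]

    def parent(self, c: str) -> Optional[str]:
        """Parent of category c, or None if c is the root."""
        return next((p for p, child in self.edges if child == c), None)


@dataclass
class Catalogue:
    """A catalogue (P, taxonomy, label): products, their taxonomy and a product -> category map."""
    products: Set[str]
    taxonomy: Taxonomy
    label: Dict[str, str]  # plays the role of s for K_s and of t for K_t

    def __post_init__(self) -> None:
        # Every product must be placed in a category of its own taxonomy.
        for p in self.products:
            if self.label.get(p) not in self.taxonomy.categories:
                raise ValueError(f"product {p!r} has no category in the taxonomy")


def is_cross_catalogue_labelling(labelling: Dict[str, str], source: Catalogue,
                                 target: Taxonomy) -> bool:
    """True if the labelling maps every product of P_s to some category of C_t."""
    return (set(labelling) == source.products
            and all(c in target.categories for c in labelling.values()))


S = Taxonomy({"Electronics", "Phones"}, {("Electronics", "Phones")})
T = Taxonomy({"All", "Mobiles"}, {("All", "Mobiles")})
K_s = Catalogue({"p1"}, S, {"p1": "Phones"})
print(is_cross_catalogue_labelling({"p1": "Mobiles"}, K_s, T))  # → True
```

A labelling that reuses a provider category (for example {"p1": "Phones"}) is rejected, because the goal is to map every product into the master taxonomy.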

PROJECT SCOPE

The proposed approach uses the provider's taxonomy information to categorize products arriving from data providers into the master taxonomy. It is based on a taxonomy-aware processing step that adjusts the results of a text-based classifier so that products in nearby categories of the provider taxonomy are assigned to nearby categories of the master taxonomy.

TACI (Taxonomy-Aware Catalog Integration) scales to large datasets, with running time linear in the number of input products. It exploits the full structure of the taxonomies, defining relationships between items in different categories according to how those categories are related in the taxonomy tree. TACI provides more accurate results than existing approaches.

SYSTEM DESIGN

This section describes the features and modular design of the proposed algorithm. Unlike existing approaches, the proposed algorithm exploits the entire taxonomies of both the provider and master catalogues to classify the provider's products into the master taxonomy. The taxonomy-aware catalogue integration problem is formulated as a structured prediction problem, and the algorithm finds an optimized classification of products using a metric labelling approach. Accordingly, it computes two measures:

Assignment cost: the cost of assigning each product to a given category
Separation cost: the cost reflecting the strength of the relationships among product categories
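The two costs can be combined into a single objective. As a hedged sketch, assuming the γ-weighted combination that appears in the pseudocode and the absolute-difference penalty described in the taxonomy-aware processing step, a metric-labelling formulation could read as follows. Here ACOST(x, τ) is the assignment cost of placing product x in target category τ, $s_x$ is the source category of x, and $\mathrm{sim}_S$, $\mathrm{sim}_T$ and $Q$ are our own notation for the category similarities and the total cost, not symbols defined in the essay.

```latex
Q(\ell) = (1 - \gamma) \sum_{x \in P_s} \mathrm{ACOST}(x, \ell_x)
        + \gamma \sum_{\{x, y\} \subseteq P_s}
          \bigl|\, \mathrm{sim}_S(s_x, s_y) - \mathrm{sim}_T(\ell_x, \ell_y) \,\bigr|,
\qquad
\ell^{*} = \arg\min_{\ell : P_s \to C_t} Q(\ell)
```

The first sum is the assignment cost and the second the separation cost. The second sum ranges over every pair of products, which is why a formulation that considers all pairs directly does not scale.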

Because existing approaches treat categories as a flat collection of classes, they must consider pairwise relationships and therefore suffer from scalability problems. The proposed algorithm instead uses the taxonomy structure to derive the relationships among categories and uses them to prune the search space. As a result, its running time is linear in the size of the input, so it can be applied to large datasets.
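To make the difference concrete, consider a hypothetical provider catalogue of $n = 10^6$ products and a short list of $k = 5$ candidate target categories per product, in the spirit of the TOP_k(x) lists in the pseudocode (both numbers are illustrative, not taken from the essay). Considering every pair of products versus evaluating only $k$ candidates per product gives:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = \frac{10^{6}\,(10^{6} - 1)}{2} \approx 5.0 \times 10^{11}
\qquad \text{versus} \qquad
n \cdot k = 10^{6} \cdot 5 = 5 \times 10^{6}
```

The hash table Ψ in the pseudocode holds one entry per distinct (source category, candidate target category) pair, a number that cannot exceed $n k$, so caching the separation cost keeps the overall work linear in the number of products.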

FUNCTIONAL STEPS

Taxonomy-Aware Catalog Integration is a two-step process:

Base Classification Step: ignores the taxonomy structure and uses a general text classifier to compute the assignment cost.
Taxonomy-Aware Processing Step: exploits the taxonomy structure of both the source and target catalogues to compute the separation cost.

MODULAR DESIGN

PSEUDO CODE

Input: Source catalogue K_s, target taxonomy T, base classifier b, and parameters θ, k and γ

Output: A labelling vector ℓ

F_θ ← ∅; O_θ ← ∅
for all x ∈ P_s do
    τ* ← argmax_{τ ∈ C_t} Pr_b[τ | x]
    if Pr_b[τ* | x] ≥ θ then
        ℓ_x ← τ*
        F_θ ← F_θ ∪ {x}
    else
        O_θ ← O_θ ∪ {x}
        Compute TOP_k(x)
Compute candidate pairs H_{θ,k}
Initialize hash table Ψ to empty
for all (σ, τ) ∈ H_{θ,k} do
    Ψ[(σ, τ)] ← h(σ, τ)
for all x ∈ O_θ do
    ℓ_x ← argmin_{τ ∈ TOP_k(x)} { (1 − γ) · ACOST(x, τ) + γ · Ψ[(s_x, τ)] }
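The pseudocode can be turned into a small runnable Python sketch. It is not the authors' implementation, and several details the essay leaves open are filled with labelled assumptions: ACOST(x, τ) is assumed to be 1 − Pr_b[τ | x]; category similarity is assumed to be a Wu-Palmer-style measure based on the depth of the lowest common ancestor; h(σ, τ) is assumed to average the |sim_S − sim_T| penalty over the fixed products; the candidate pairs H_{θ,k} are taken to be the (source category, candidate target category) pairs of the open products; and a dictionary of precomputed probabilities (probs) stands in for the base classifier b. All function names and the toy taxonomies are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Parent = Dict[str, Optional[str]]  # category -> parent category (None for the root)


def path_to_root(c: str, parent: Parent) -> List[str]:
    """Categories from c up to the root, inclusive; its length is the depth (root = 1)."""
    path: List[str] = []
    node: Optional[str] = c
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def sim(a: str, b: str, parent: Parent) -> float:
    """ASSUMED Wu-Palmer-style similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    on_b = set(pb)
    lca = next(c for c in pa if c in on_b)  # first ancestor of a that is also above b
    return 2 * len(path_to_root(lca, parent)) / (len(pa) + len(pb))


def taci(source_label: Dict[str, str], probs: Dict[str, Dict[str, float]],
         src_parent: Parent, tgt_parent: Parent,
         theta: float, k: int, gamma: float) -> Dict[str, str]:
    """Two-step labelling sketch; probs[x][tau] stands in for Pr_b[tau | x]."""
    labels: Dict[str, str] = {}
    fixed: List[str] = []
    top_k: Dict[str, List[str]] = {}  # candidate lists, built only for open products
    # Step 1: base classification splits products into fixed (F) and open (O).
    for x, dist in probs.items():
        best = max(dist, key=dist.get)
        if dist[best] >= theta:
            labels[x] = best
            fixed.append(x)
        else:
            top_k[x] = sorted(dist, key=dist.get, reverse=True)[:k]
    # Fixed products grouped by (source category, assigned target category),
    # so h loops over category groups rather than over individual products.
    groups = Counter((source_label[y], labels[y]) for y in fixed)

    def h(sigma: str, tau: str) -> float:
        """ASSUMED separation cost: mean |sim_S - sim_T| against all fixed products."""
        if not fixed:
            return 0.0
        total = sum(n * abs(sim(sigma, a, src_parent) - sim(tau, b, tgt_parent))
                    for (a, b), n in groups.items())
        return total / len(fixed)

    # Step 2: fill the cache Psi once per candidate pair (sigma, tau).
    psi: Dict[Tuple[str, str], float] = {}
    for x, cands in top_k.items():
        for tau in cands:
            pair = (source_label[x], tau)
            if pair not in psi:
                psi[pair] = h(*pair)

    def objective(x: str, tau: str) -> float:
        # ACOST(x, tau) is ASSUMED to be 1 - Pr_b[tau | x].
        return (1 - gamma) * (1 - probs[x][tau]) + gamma * psi[(source_label[x], tau)]

    # Label each open product with its cheapest candidate.
    for x, cands in top_k.items():
        labels[x] = min(cands, key=lambda tau: objective(x, tau))
    return labels


src_parent = {"S": None, "Phones": "S", "PhoneCases": "Phones", "Books": "S"}
tgt_parent = {"T": None, "Mobiles": "T", "MobileCovers": "Mobiles", "Books": "T", "Novels": "Books"}
source_label = {"p1": "Phones", "p2": "Books", "p3": "PhoneCases"}
probs = {
    "p1": {"Mobiles": 0.90, "MobileCovers": 0.05, "Books": 0.03, "Novels": 0.02},
    "p2": {"Books": 0.85, "Novels": 0.10, "Mobiles": 0.03, "MobileCovers": 0.02},
    "p3": {"Novels": 0.45, "MobileCovers": 0.40, "Mobiles": 0.10, "Books": 0.05},
}
print(taci(source_label, probs, src_parent, tgt_parent, theta=0.8, k=2, gamma=0.5))
# → {'p1': 'Mobiles', 'p2': 'Books', 'p3': 'MobileCovers'}
```

In this toy run, p3 is open because its best text score (0.45 for Novels) is below θ. With gamma = 0 the text classifier alone decides and p3 goes to Novels; giving weight to the separation cost moves it to MobileCovers, next to where its provider-taxonomy neighbour p1 was placed.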

BASE CLASSIFICATION STEP

This step ignores the structure of both the provider and master taxonomies. It uses the output of a Naive Bayes text classifier and the threshold θ to separate fixed products from open products. Fixed products are those in the provider catalogue whose most probable category has a probability of at least θ; for these products, the Naive Bayes result is accepted as the correct category. All remaining products are open and are labelled in the taxonomy-aware processing step.
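The base classification step can be sketched with a small multinomial Naive Bayes classifier written from scratch. This assumes that products are described by short titles and that the classifier is trained on master-catalogue products whose categories are known; the whitespace tokenizer, the smoothing parameter alpha, the toy training data and all function names are hypothetical choices, not details given in the essay.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: lower-case whitespace split."""
    return text.lower().split()


def train_nb(examples: List[Tuple[str, str]]):
    """Collect multinomial Naive Bayes counts from (title, master category) pairs."""
    class_docs: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for title, cat in examples:
        tokens = tokenize(title)
        class_docs[cat] += 1
        word_counts[cat].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def posterior(model, title: str, alpha: float = 1.0) -> Dict[str, float]:
    """Pr_b[category | title] with add-alpha (Laplace) smoothing, computed in log space."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    log_scores: Dict[str, float] = {}
    for cat in class_docs:
        total = sum(word_counts[cat].values())
        score = math.log(class_docs[cat] / n_docs)  # log prior
        for w in tokenize(title):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[cat][w] + alpha) / (total + alpha * len(vocab)))
        log_scores[cat] = score
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}  # subtract max for stability
    z = sum(weights.values())
    return {c: v / z for c, v in weights.items()}


def split_fixed_open(posteriors: Dict[str, Dict[str, float]], theta: float):
    """Fixed: best category has probability >= theta (label accepted); open: all others."""
    fixed: Dict[str, str] = {}
    open_: List[str] = []
    for x, dist in posteriors.items():
        best = max(dist, key=dist.get)
        if dist[best] >= theta:
            fixed[x] = best
        else:
            open_.append(x)
    return fixed, open_


train = [
    ("apple iphone 13 smartphone", "Mobiles"),
    ("samsung galaxy smartphone", "Mobiles"),
    ("silicone phone case", "MobileCovers"),
    ("leather flip cover", "MobileCovers"),
    ("mystery novel paperback", "Novels"),
]
model = train_nb(train)
post = {x: posterior(model, t)
        for x, t in [("p1", "samsung galaxy smartphone 128gb"), ("p2", "phone case")]}
fixed, open_ = split_fixed_open(post, theta=0.8)
print(fixed, open_)  # → {'p1': 'Mobiles'} ['p2']
```

Here p1 clears the threshold and keeps its Naive Bayes label, while the best category for p2 (MobileCovers, with a posterior of roughly 0.72) falls below θ = 0.8, so p2 is left open for the taxonomy-aware step.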

TAXONOMY AWARE PROCESSING STEP

This step exploits the taxonomy structure to find the relationships among categories, which requires a similarity measure between pairs of categories in both the source and target taxonomies. If a pair of products (x, y) belongs to a pair of categories that are highly similar in the provider taxonomy, the two products should also be assigned to a pair of highly similar categories in the master taxonomy; the penalty function δ enforces this. The penalty is the absolute difference between the similarity of the pair's categories in the source taxonomy and the similarity of their assigned categories in the target taxonomy, $\delta(x, y) = \lvert \mathrm{sim}_S(s_x, s_y) - \mathrm{sim}_T(\ell_x, \ell_y) \rvert$, where $\mathrm{sim}_S$ and $\mathrm{sim}_T$ denote category similarity in the source and target taxonomies, $s_x$ is the source category of $x$ and $\ell_x$ its assigned target category. This penalty defines the separation cost, i.e. the cost of separating a pair of products.
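The pairwise penalty can be made concrete with a short sketch that evaluates the total separation cost of a complete labelling by summing δ over every pair of products. The essay does not say which category similarity is used, so a Wu-Palmer-style measure based on the depth of the lowest common ancestor is assumed here; the toy taxonomies, products and function names are hypothetical. Because it visits all pairs, this version is quadratic and is meant only to illustrate the penalty, not to be a scalable implementation.

```python
from itertools import combinations
from typing import Dict, List, Optional

Parent = Dict[str, Optional[str]]  # category -> parent category (None for the root)


def path_to_root(c: str, parent: Parent) -> List[str]:
    """Categories from c up to the root, inclusive; its length is the depth (root = 1)."""
    path: List[str] = []
    node: Optional[str] = c
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def sim(a: str, b: str, parent: Parent) -> float:
    """ASSUMED similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    on_b = set(pb)
    lca = next(c for c in pa if c in on_b)  # lowest common ancestor of a and b
    return 2 * len(path_to_root(lca, parent)) / (len(pa) + len(pb))


def delta(x: str, y: str, src_label: Dict[str, str], labels: Dict[str, str],
          src_parent: Parent, tgt_parent: Parent) -> float:
    """|similarity of the source categories - similarity of the assigned target categories|."""
    return abs(sim(src_label[x], src_label[y], src_parent)
               - sim(labels[x], labels[y], tgt_parent))


def separation_cost(src_label: Dict[str, str], labels: Dict[str, str],
                    src_parent: Parent, tgt_parent: Parent) -> float:
    """Total separation cost of a labelling: delta summed over every product pair."""
    return sum(delta(x, y, src_label, labels, src_parent, tgt_parent)
               for x, y in combinations(sorted(labels), 2))


src_parent = {"S": None, "Phones": "S", "PhoneCases": "Phones", "Books": "S"}
tgt_parent = {"T": None, "Mobiles": "T", "MobileCovers": "Mobiles", "Books": "T", "Novels": "Books"}
src_label = {"p1": "Phones", "p2": "Books", "p3": "PhoneCases"}
keeps_structure = {"p1": "Mobiles", "p2": "Books", "p3": "MobileCovers"}
breaks_structure = {"p1": "Mobiles", "p2": "Books", "p3": "Novels"}
print(round(separation_cost(src_label, keeps_structure, src_parent, tgt_parent), 3),
      round(separation_cost(src_label, breaks_structure, src_parent, tgt_parent), 3))  # → 0.0 0.8
```

The first labelling mirrors the provider taxonomy (a phone case sits under phones in both trees), so every pair has zero penalty; moving the phone case under Novels separates it from the phone and pulls it towards the book, and each of those two pairs contributes a penalty of 0.4.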

CONCLUSION AND FUTURE WORKS

With the proliferation of data-sharing applications that involve multiple data providers, automated techniques for catalogue integration will be crucial to their success. In this paper, we presented an efficient and scalable automated approach to catalogue integration based on source-category and taxonomy-structure information. TACI is a pioneering approach to catalogue integration that exploits the structure of taxonomies to achieve greater accuracy.

Here, we have illustrated the technique with product integration in shopping portals. However, it can also be applied to many other domains in which categorized data from multiple sources must be integrated into a single, unified catalogue, including verticals such as Local, Travel and Entertainment.

The technique uses supervised learning. In future work, we would like to explore semi-supervised learning techniques that incrementally retrain the base classifier with elements chosen during the taxonomy-aware processing (calibration) step.
