4 Critical Steps To Control Duplicates

  • Convergence Data
  • Convergence Data

Duplicate parts are the software bugs of the data world, hard to identify without specific tools and processes in place. If these “bugs” are not found in implementation they can be ultimately time consuming and costly down the road.  For these reasons it is crucial to get ahead of the issue as soon as possible, optimally during a data migration project or switch to a new platform.   For example, an SAP ERP data migration involving cleansing and classifying purchased parts from multiple legacy systems from US and International locations would be the perfect time to detect, isolate and deal with duplicate parts. The best case would be to have lead time of 6 to 12 months prior to the actual cutover to clearly find these duplicates and to give enough time to formulate the best plan to deal with these duplicates. Some may want to remove duplicate parts all together, while others may want to look for ways to consolidate and rationalize this data to help mitigate duplicate inventory, still others may want to associate them for purchaser’s reference for ordering and price comparisons.  Regardless of final goal for your company you first need to identify them through these steps.

Duplicate purchased parts – same part assigned to different internal part numbers

1. Collect then classify the data

It is key to run any duplicate analysis off of a specific category of data.  As a result, the first order of business is to classify the parts.  From each legacy ERP system, you would need to gather all the purchase part data together for analysis including: Manufacturing part numbers, supplier names, pricing, existing internal part numbers and descriptions plus any existing commodity codes can be useful to help with the initial classification. 

Typically, data coming from different systems needs to be normalized to a single standard classification model – you can do this by classifying all the purchase part data against a single data model.  This data model will help dictate the categories which in turn drive the attributes and allowed values each part should have. 

2. Enrich and validate the data

Once all the data from each system is classified – you need to enrich the data.  Using either the manuifacturer part numbers or associated documentation – you can harvest attribute data for each part. (This can be time consuming if you don’t have the expertise in house, you may want to contract this work out.)  The data is typically obtained from approved manufacturer websites or company approved documentation e.g specifications, drawings, etc. The harvested data should be loaded into a database and validated against the data model.

For example – for AC Motors – enriching each motor with key information – horsepower, speed, number of poles, phases, frequency, etc.  Enriching the data allows us to group duplicate motors and similar motors more easily.  In some cases, motors with different manufacturer part numbers can have very similar characteristics.

3. Duplicate and Near Duplicate Analysis

Once the data has been enriched, normalized and validated you are able to execute duplication analysis, preferable using duplicate analysis tools that are based on attribute value matches on critical attributes that drive pricing and differentiation. 

For grouping similar parts to help find near duplicates, you can use a clustering tool that allowed different weighting factors to be set on the critical attributes and the overall cluster – the CDS tool calls this a neighbor distance.  Critical to making this analysis work is having normalized data and having numeric data with separate units of measure.

Figure 1: Examples of duplicate, 5 HP Motors, with different part numbers and supplier pricing.

4. Processing Identified Duplicates

We recommend taking the groups of duplicates and select one internal part number to be the master for each manufacturer part number (MPN).  Since you will likely end up with multiple internal part numbers for a MPN, we recommend setting up a master cross reference.  That way purchasing could consolidate spend and rationalize inventory – without disrupting existing BOMs.  You could allow existing BOMS to use any of the part numbers to save time with engineering change control updating BOMs with just one internal part number.

We also recommend that you manage the master cross reference of internal part numbers to MPN in the ERP system.  This may be the easiest thing to do given all of the other priorities in a ERP deployment.  Overtime, engineering can use the master internal part number for new products – allowing a slow phase out the other internal part numbers.

Summary steps to the process: 

  1. Select the one Internal Part Number, Manufacturer Part Number (MPN) and Manufacturer (MFG) based on purchase history that will be the preferred part.
  2. Assign all other records as alternates to the assigned preferred part.
  3. Do not change the Engineering Bills of Materials.
  4. Move to a global part number on all documentation.
  5. Maintain the Global Part number to MPN and MFG internally in SAP, or other ERP/PLM systems.


Rich attribution and classification of data is pivotal to preventing and identifying duplicate and near duplicate data that lies within your systems today.  Collect then classify part data followed by enrichment and validation of the part data will help ensure your analysis can extract true duplicates. Once duplicates or near duplicates are identified it is essential to have a corporate plan on how to handle these duplicate on an ongoing basis.  A good plan will avoid over purchasing of parts and streamline your inventory.

Source: Convergence Data

Convergence Data
About the Author

Convergence Data is a one-stop software and service company for all your technical data needs. Our proprietary software and time-tested processes eliminate the clutter in your data—so you can finally use it to make sound business decisions.

Filter by category

Other similar posts

Digital Marketing Services, Engineering Resources, TraceParts News

End of Year Contest 2022: Search it, find it, win it 

Find one hidden product on TraceParts.com for a chance to win VR headset  TraceParts, one of the world's leading providers of 3D digital content, and...

Digital Parts Catalogs, Engineering Resources, TraceParts News

ReeR Safety’s industrial safety products’ CAD data now available on TraceParts

ReeR Safety, one of the world’s leading manufacturers of optoelectronic sensors for industrial safety, has partnered with TraceParts to make its 3D product data accessible...

Digital Parts Catalogs, Engineering Resources, TraceParts News

Global Connector Technology’s product catalog now available on TraceParts.com

Global Connector Technology (GCT), a manufacturer of world-class PCB connector and cable assembly solutions, has partnered with TraceParts to publish its 3D product catalog online,...