Remove duplicates from output python

Question

I'm facing an issue here:


Following example:

for item in g_data:
    Header = item.find_all("div", {"class": "InnprodInfos"})
    print(Header[0].contents[0].text.strip())

Output:

DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour
DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour

As you can see above, it gives me the output twice. Only the second, duplicated set of lines should be removed.

The result should look like:

DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour

Can anyone tell me how to delete the duplicates? Any feedback is appreciated.

asked Oct 1, 2015 at 11:42

Serious Ruffy


You should store the output in a set to verify if it has been "printed" already. After that you print out the elements of the set.

g_data = ["foo", "bar", "foo"]
g_unique = set()
for item in g_data:
    g_unique.add(item)  # a set ignores an element that is already present

for item in g_unique:
    print(item)  # prints 'foo' and 'bar' once each, in no guaranteed order

answered Oct 1, 2015 at 11:48

You can use a list or a set (if order doesn't matter):

Using list:

result = []
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    item = header[0].contents[0].text.strip()
    if item not in result:
        result.append(item)

print('\n'.join(result))

Using set:

result = set()
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    result.add(header[0].contents[0].text.strip())

print('\n'.join(result))

answered Oct 1, 2015 at 11:47


You can use a set to keep track of which items you have already printed. This preserves the original order.

already_printed = set()
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    item = header[0].contents[0].text.strip()
    if item not in already_printed:
        print(item)
        already_printed.add(item)

answered Oct 1, 2015 at 11:50

John La Rooy
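
A self-contained sketch of the same idea that runs without the scraper. The titles list is a hypothetical stand-in for the strings extracted from g_data, and unique_titles is a name introduced here only for illustration.

```python
# Hypothetical stand-in for the text extracted from each scraped item
titles = [
    "Panmunjeom Day Tour",
    "Seoul Helicopter Tour",
    "Panmunjeom Day Tour",
    "Korean Folk Village Day Tour",
    "Seoul Helicopter Tour",
]

already_printed = set()  # titles seen so far
unique_titles = []       # first occurrences, in their original order

for title in titles:
    if title not in already_printed:
        print(title)
        already_printed.add(title)
        unique_titles.append(title)
```

Collecting the titles in a list as well as printing them makes the result reusable later in the script.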


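There is a simple way to do this using a set comprehension :)

# One set comprehension replaces the original list comprehension, which called
# s.add(text) with an undefined name (its loop variable was d_text)
s = {text for text in Header[0].contents[0].text.strip().split('\n')}
print('\n'.join(s))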

answered Oct 1, 2015 at 12:07

Remi Guan



A list is a container that holds Python objects of any type, such as integers, strings, or other lists. It is roughly the equivalent of a dynamic array in other programming languages.

Here we will go through different ways in which we can remove duplicates from a given list.

In this tutorial, you will learn:

Remove duplicates from list using Set
Remove Duplicates from a list using the Temporary List.
Remove duplicates from list using Dict
Remove duplicates from a list using for-loop
Remove duplicates from list using list comprehension
Remove duplicates from list using Numpy unique() method.
Remove duplicates from list using Pandas methods
Remove duplicates using enumerate() and list comprehension

Remove duplicates from list using Set

To remove the duplicates from a list, you can pass it to the built-in set() constructor. A set holds only distinct elements, but it does not preserve the order of the original list.

We have a list: [1,1,2,3,2,2,4,5,6,2,1]. The list has many duplicates, which we need to remove to get back only the distinct elements. The list is passed to the built-in set() function, and the result is converted back to a list with list() before it is displayed, as shown in the example below.

The output contains only distinct elements; all duplicate elements are eliminated. The order of the result is not guaranteed, although small integers such as these usually come out in ascending order in CPython.

my_list = [1,1,2,3,2,2,4,5,6,2,1]
my_final_list = set(my_list)
print(list(my_final_list))

Output:

[1, 2, 3, 4, 5, 6]
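
Because a set is unordered, the order of list(set(...)) should not be relied on, especially for strings. A minimal sketch, assuming a sorted result is acceptable; the names words and unique_sorted are illustrative only:

```python
words = ["pear", "apple", "pear", "fig", "apple"]

# set() drops the duplicates; sorted() returns a list in a deterministic order
unique_sorted = sorted(set(words))
print(unique_sorted)  # ['apple', 'fig', 'pear']
```

If the original order matters instead, one of the order-preserving approaches in the following sections is a better fit.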

Remove Duplicates from a list using the Temporary List

To remove duplicates from a given list, you can make use of an empty temporary list. First, loop through the list that has duplicates and add each item to the temporary list only if it is not already there. Afterwards, the temporary list is assigned back to the main list.

Here is a working example using a temporary list.

my_list = [1, 2, 3, 1, 2, 4, 5, 4, 6, 2]
print("List Before ", my_list)
temp_list = []

for i in my_list:
    if i not in temp_list:
        temp_list.append(i)

my_list = temp_list

print("List After removing duplicates ", my_list)

Output:

List Before  [1, 2, 3, 1, 2, 4, 5, 4, 6, 2]
List After removing duplicates  [1, 2, 3, 4, 5, 6]
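
The check "i not in temp_list" scans the temporary list on every iteration, which gets slow for long lists. A possible variant, assuming every item is hashable, tracks seen items in a set while still building the result in order. The function name dedupe_keep_order is hypothetical and not part of the original tutorial.

```python
def dedupe_keep_order(items):
    """Return a new list with the first occurrence of each item, in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # set membership is O(1) on average
            seen.add(item)
            result.append(item)
    return result


my_list = [1, 2, 3, 1, 2, 4, 5, 4, 6, 2]
print(dedupe_keep_order(my_list))  # [1, 2, 3, 4, 5, 6]
```

For unhashable items such as nested lists, the plain temporary-list approach still works.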

Remove duplicates from list using Dict

We can remove duplicates from the given list by importing OrderedDict from collections. It is available from Python 2.7 onwards. OrderedDict returns the distinct elements in the order in which each key was first inserted.

Let us make use of a list and use the fromkeys() method available in OrderedDict to get the unique elements from the list.

To make use of the OrderedDict.fromkeys() method, you have to import OrderedDict from collections, as shown below:

from collections import OrderedDict

Here is an example to remove duplicates using OrderedDict.fromkeys() method.

from collections import OrderedDict

my_list = ['a','x','a','y','a','b','b','c']

my_final_list = OrderedDict.fromkeys(my_list)

print(list(my_final_list))

Output:

['a', 'x', 'y', 'b', 'c']

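From Python 3.7 onwards, the regular dict is guaranteed to preserve insertion order (in CPython 3.6 this was an implementation detail), so we can use the regular dict.fromkeys() to get the distinct elements from the list in their original order. The dict.fromkeys() method builds a dictionary whose keys are the list items; because keys are unique, the duplicate values are dropped.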

An example that shows the working of dict.fromkeys() on a list to give the unique items is as follows:

my_list = ['a','x','a','y','a','b','b','c']
my_final_list = dict.fromkeys(my_list)
print(list(my_final_list))

Output:

['a', 'x', 'y', 'b', 'c']

Remove duplicates from a list using for-loop

Using a for-loop, we will traverse the list of items to remove duplicates.

First initialize an empty list, i.e. myFinallist = []. Inside the for-loop, check whether each item of the original list already exists in myFinallist. If the item does not exist, add it to myFinallist using the append() method.

So whenever a duplicate item is encountered, it is already present in myFinallist and will not be inserted again. Let us now check the same in the example below:

my_list = [1,2,2,3,1,4,5,1,2,6]
myFinallist = []
for i in my_list:
    if i not in myFinallist:
        myFinallist.append(i)
print(myFinallist)

Output:

[1, 2, 3, 4, 5, 6]

Remove duplicates from list using list comprehension

List comprehensions are a Python syntax for building a new list from an existing iterable in a single expression (similar forms exist for sets and dictionaries). They can replace longer loops and make your code easier to read and maintain.

Let us make use of list comprehension to remove duplicates from the list given.

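my_list = [1,2,2,3,1,4,5,1,2,6]
my_finallist = []
# The comprehension is used only for its side effect (append); the list of None values it builds is discarded
[my_finallist.append(n) for n in my_list if n not in my_finallist]
print(my_finallist)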

Output:

[1, 2, 3, 4, 5, 6]

Remove duplicates from list using Numpy unique() method.

The unique() method from the NumPy module can help us remove duplicates from the given list. Note that it returns the unique elements in sorted order, not in their original order.

To work with NumPy, you need to follow these steps:

Step 1) Import Numpy module

import numpy as np

Step 2) Use your list with duplicates inside unique method as shown below. The output is converted back to a list format using tolist() method.

myFinalList = np.unique(my_list).tolist()

Step 3) Finally print the list as shown below:

print(myFinalList)

The final code with output is as follows:

import numpy as np
my_list = [1,2,2,3,1,4,5,1,2,6]
myFinalList = np.unique(my_list).tolist()
print(myFinalList)

Output:

[1, 2, 3, 4, 5, 6]

Remove duplicates from list using Pandas methods

The Pandas module has a unique() method that will give us the unique elements from the given list, in order of first appearance.

To work with the Pandas module, you need to follow these steps:

Step 1) Import Pandas module

import pandas as pd

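Step 2) Pass your data with duplicates to the unique() method as shown below. Newer pandas releases warn when a plain list is passed directly, so the list is wrapped in a Series first:

myFinalList = pd.unique(pd.Series(my_list)).tolist()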

Step 3) Print the list as shown below:

print(myFinalList)

The final code with output is as follows:

import pandas as pd

my_list = [1,2,2,3,1,4,5,1,2,6]
# Wrapped in a Series because newer pandas releases warn when a plain list is passed
myFinalList = pd.unique(pd.Series(my_list)).tolist()
print(myFinalList)

Output:

[1, 2, 3, 4, 5, 6]

Remove duplicates using enumerate() and list comprehension

Here a combination of list comprehension and enumerate() is used to remove the duplicate elements. enumerate() yields a pair of a counter and the element for each item in the list, for example (0, 1), (1, 2), etc. Here the first value is the index, and the second value is the list item.

Each element is checked against the part of the list that comes before it, and it is kept only if it has not already appeared there.

my_list = [1,2,2,3,1,4,5,1,2,6]
my_finallist = [i for j, i in enumerate(my_list) if i not in my_list[:j]]
print(my_finallist)

Output:

[1, 2, 3, 4, 5, 6]
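
To make the slice check easier to follow, the comprehension can be unrolled into a loop that prints what each step compares. This is only an illustrative trace; the names earlier and kept are introduced here and are not part of the original example.

```python
my_list = ["a", "b", "a", "c", "b"]
kept = []

for j, i in enumerate(my_list):
    earlier = my_list[:j]  # every item before position j
    is_new = i not in earlier
    if is_new:
        kept.append(i)
    print(j, i, earlier, "kept" if is_new else "skipped")

print(kept)  # ['a', 'b', 'c']
```

Because each step scans the items before it, this approach takes quadratic time on long lists.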

Summary

To remove the duplicates from a list, you can make use of the built-in function set(). A set holds only distinct elements, but it does not preserve the original order.
You can remove duplicates from the given list by importing OrderedDict from collections. It is available from Python 2.7 onwards. OrderedDict.fromkeys() returns the distinct elements in the order in which each key first appears.
You can make use of a for-loop that traverses the list of items to remove duplicates.
The unique() method from the NumPy module can help us remove duplicates from the given list.
The Pandas module has a unique() method that will give us the unique elements from the given list.
The combination of list comprehension and enumerate() can be used to remove the duplicate elements from the list. enumerate() yields a counter together with each element in the list.

How do I remove duplicates from a list?

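To remove the duplicates from a list of lists: Lists are not hashable, so they cannot be added to a set directly. Use a list comprehension to convert each nested list to a tuple. Convert the list of tuples to a set to remove the duplicates. Use a list comprehension to convert the set back to a list of lists. Note that the set does not preserve the original order.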
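
The steps above can be sketched as follows. The sample data and variable names are made up for illustration, and the second variant, which uses dict.fromkeys() to keep the original order, is an addition beyond the steps described and assumes Python 3.7 or later.

```python
nested = [[1, 2], [3, 4], [1, 2], [5], [3, 4]]

# Steps as described: lists -> tuples -> set -> lists (order not guaranteed)
as_tuples = [tuple(inner) for inner in nested]
unique_lists = [list(t) for t in set(as_tuples)]

# Order-preserving variant: dict keys are unique and keep insertion order
unique_in_order = [list(t) for t in dict.fromkeys(as_tuples)]
print(unique_in_order)  # [[1, 2], [3, 4], [5]]
```

Both variants require the inner items to be hashable themselves, since they end up inside tuples used as set members or dict keys.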