How do you print repeated letters in a string in Python?


In this tutorial, we are going to learn how to find all duplicate values in a string. We can do it in different ways in Python. Let's explore them one by one.

The aim of the program we are going to write is to find the duplicate characters present in a string. For example, given the string tutorialspoint, the program will give us t o i as output. In simple words, we have to find the characters whose count in the string is greater than one. Let's see.

Scratch Programs

In this approach we write the program without importing any modules, using only built-in methods of Python. First, we will find the duplicate characters of a string using the count method. Let's see the procedure first.

  • Initialize a string.
  • Initialize an empty list.
  • Loop over the string.
    • Check whether the character's frequency is greater than one using the count method.
    • If it is greater than one, check whether the character is already present in the list.
    • If it is not present, append it to the list.
  • Print the characters.

Example

## initializing string
string = "tutorialspoint"
## initializing a list to append all the duplicate characters
duplicates = []
for char in string:
   ## checking whether the character has a duplicate or not
   ## str.count(char) returns the frequency of char in the string
   if string.count(char) > 1:
      ## appending to the list only if it's not already present
      if char not in duplicates:
         duplicates.append(char)
print(*duplicates)

If you run the above program, you will get the following results.

Output

t o i

Now we will find the duplicate characters of a string without using the count method. We are going to use the dictionary data structure to get the desired output. Let's see the procedure first.

  • Initialize a string.
  • Initialize an empty dictionary.
  • Loop over the string.
    • Check whether the char is already present in the dictionary.
    • If it is not present, initialize its count to 1.
    • If it is already present, increase its count.
  • Print the characters whose count is greater than one.

Example

## initializing string
string = "tutorialspoint"
## initializing a dictionary
duplicates = {}
for char in string:
   ## checking whether the char is already present in dictionary or not
   if char in duplicates:
      ## increasing count if present
      duplicates[char] += 1
   else:
      ## initializing count to 1 if not present
      duplicates[char] = 1
for key, value in duplicates.items():
   if value > 1:
      print(key, end = " ")
print()

If you run the above program, you will get the following results.

Output

t o i


Grand Performance Comparison

Scroll to the end for a TL;DR graph

Since I had "nothing better to do" (understand: I had just a lot of work), I decided to do a little performance contest. I assembled the most sensible or interesting answers and did some simple timeit in CPython 3.5.1 on them. I tested them with only one string, which is a typical input in my case:

>>> s = 'ZDXMZKMXFDKXZFKZ'
>>> len(s)
16

Be aware that results might vary for different inputs, be it a different string length, a different number of distinct characters, or a different average number of occurrences per character.


Don't reinvent the wheel

Python has made it simple for us. The collections.Counter class does exactly what we want and a lot more. Its usage is by far the simplest of all the methods mentioned here.

taken from @oefe, nice find

>>> timeit('Counter(s)', globals=locals())
8.208566107001388

Counter goes the extra mile, which is why it takes so long.
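
Tying this back to the original question, here is a minimal sketch of using Counter to print just the repeated letters (the variable names are illustrative, and the output order may differ across Python versions):

from collections import Counter

string = "tutorialspoint"
# keep only the characters whose count is greater than one
duplicates = [char for char, count in Counter(string).items() if count > 1]
print(*duplicates)   # t o i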

¿Dictionary, comprende?

Let's try using a simple dict instead. First, let's do it declaratively, using dict comprehension.

I came up with this myself...

>>> timeit('{c: s.count(c) for c in s}', globals=locals())
4.551155784000002

This will go through s from beginning to end, and for each character it will count the number of its occurrences in s. Since s contains duplicate characters, the above method searches s several times for the same character. The result is naturally always the same. So let's count the number of occurrences just once for each character.

I came up with this myself, and so did @IrshadBhat

>>> timeit('{c: s.count(c) for c in set(s)}', globals=locals())
3.1484066140001232

Better. But we still have to search through the string to count the occurrences. One search for each distinct character. That means we're going to read the string more than once. We can do better than that! But for that, we have to get off our declarativist high horse and descend into an imperative mindset.

Exceptional code

AKA Gotta catch 'em all!

inspired by @anthony

>>> timeit('''
... d = {}
... for c in s:
...   try:
...     d[c] += 1
...   except KeyError:
...     d[c] = 1
... ''', globals=locals())
3.7060273620008957

Well, it was worth a try. If you dig into the Python source (I can't say with certainty because I have never really done that), you will probably find that when you do except ExceptionType, Python has to check whether the exception raised is actually of ExceptionType or some other type. Just for the heck of it, let's see how long it will take if we omit that check and catch all exceptions.

made by @anthony

>>> timeit('''
... d = {}
... for c in s:
...   try:
...     d[c] += 1
...   except:
...     d[c] = 1
... ''', globals=locals())
3.3506563019982423

It does save some time, so one might be tempted to use this as some sort of optimization.
Don't do that! Or actually do. Do it now:

INTERLUDE 1

import time
while True:
  try:
    time.sleep(1)
  except:
    print("You're trapped in your own trap!")

You see? It catches KeyboardInterrupt, besides other things. In fact, it catches all the exceptions there are. Including ones you might not have even heard about, like SystemExit.

INTERLUDE 2

import sys
try:
  print("Goodbye. I'm going to die soon.")
  sys.exit()
except:
  print('BACK FROM THE DEAD!!!')

Now back to counting letters and numbers and other characters.

Playing catch-up

Exceptions aren't the way to go. You have to try hard to catch up with them, and when you finally do, they just throw up on you and then raise their eyebrows like it's your fault. Luckily brave fellows have paved our way so we can do away with exceptions, at least in this little exercise.

The dict class has a nice method – get – which allows us to retrieve an item from a dictionary, just like d[k]. Except when the key k is not in the dictionary, it can return a default value. Let's use that method instead of fiddling with exceptions.

credit goes to @Usman

>>> timeit('''
... d = {}
... for c in s:
...   d[c] = d.get(c, 0) + 1
... ''', globals=locals())
3.2133633289995487

Almost as fast as the set-based dict comprehension. On larger inputs, this one would probably be even faster.

Use the right tool for the job

For an at least mildly knowledgeable Python programmer, the first thing that comes to mind is probably defaultdict. It does pretty much the same thing as the version above, except instead of a value, you give it a value factory. That might cause some overhead, because the value has to be "constructed" for each missing key individually. Let's see how it performs.

hope @AlexMartelli won't crucify me for from collections import defaultdict

>>> timeit('''
... dd = defaultdict(int)
... for c in s:
...   dd[c] += 1
... ''', globals=locals())
3.3430528169992613

Not that bad. I'd say the increase in execution time is a small tax to pay for the improved readability. However, we also favor performance, and we will not stop here. Let's take it further and prepopulate the dictionary with zeros. Then we won't have to check every time if the item is already there.

hats off to @sqram

>>> timeit('''
... d = dict.fromkeys(s, 0)
... for c in s:
...   d[c] += 1
... ''', globals=locals())
2.6081761489986093

That's good. Over three times as fast as Counter, yet still simple enough. Personally, this is my favorite in case you don't want to add new characters later. And even if you do, you can still do it. It's just less convenient than it would be in other versions:

d.update({ c: 0 for c in set(other_string) - d.keys() })

Practicality beats purity (except when it's not really practical)

Now for a slightly different kind of counter. @IdanK has come up with something interesting. Instead of using a hash table (a.k.a. dictionary a.k.a. dict), we can avoid the risk of hash collisions and the consequent overhead of their resolution. We can also avoid the overhead of hashing the key, and the extra unoccupied table space. We can use a list. The ASCII values of characters will be indices and their counts will be values. As @IdanK has pointed out, this list gives us constant-time access to a character's count. All we have to do is convert each character from str to int using the built-in function ord. That gives us an index into the list, which we then use to increment the count of the character. So what we do is this: we initialize the list with zeros, do the job, and then convert the list into a dict. This dict will only contain those characters which have non-zero counts, in order to make it compliant with the other versions.

As a side note, this technique is used in a linear-time sorting algorithm known as count sort or counting sort. It's very efficient, but the range of values being sorted is limited, since each value has to have its own counter. To sort a sequence of 32-bit integers, 4.3 billion counters would be needed.
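
As an illustration of that side note, here is a minimal counting-sort sketch (the 0-255 value range is an assumption chosen to mirror the character counters below):

def counting_sort(values, max_value=255):
  # one counter per possible value, which is why the value range must be limited
  counts = [0] * (max_value + 1)
  for v in values:
    counts[v] += 1
  # emit each value as many times as it was counted
  result = []
  for value, count in enumerate(counts):
    result.extend([value] * count)
  return result

print(counting_sort([5, 1, 4, 1, 5]))   # [1, 1, 4, 5, 5]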

>>> timeit('''
... counts = [0 for _ in range(256)]
... for c in s:
...   counts[ord(c)] += 1
... d = {chr(i): count for i,count in enumerate(counts) if count != 0}
... ''', globals=locals())
25.438595562001865

Ouch! Not cool! Let's try and see how long it takes when we omit building the dictionary.

>>> timeit('''
... counts = [0 for _ in range(256)]
... for c in s:
...   counts[ord(c)] += 1
... ''', globals=locals())
10.564866792999965

Still bad. But wait, what's [0 for _ in range(256)]? Can't we write it more simply? How about [0] * 256? That's cleaner. But will it perform better?

>>> timeit('''
... counts = [0] * 256
... for c in s:
...   counts[ord(c)] += 1
... ''', globals=locals())
3.290163638001104

Considerably. Now let's put the dictionary back in.

>>> timeit('''
... counts = [0] * 256
... for c in s:
...   counts[ord(c)] += 1
... d = {chr(i): count for i,count in enumerate(counts) if count != 0}
... ''', globals=locals())
18.000623562998953

Almost six times slower. Why does it take so long? Because when we enumerate(counts), we have to check every one of the 256 counts and see if it's zero. But we already know which counts are zero and which are not.

>>> timeit('''
... counts = [0] * 256
... for c in s:
...   counts[ord(c)] += 1
... d = {c: counts[ord(c)] for c in set(s)}
... ''', globals=locals())
5.826531438000529

It probably won't get much better than that, at least not for such a small input. Plus it's only usable for 8-bit extended ASCII (EASCII) characters. Oh damn!

And the winner is...

>>> timeit('''
... d = {}
... for c in s:
...   if c in d:
...     d[c] += 1
...   else:
...     d[c] = 1
... ''', globals=locals())
1.8509794599995075

Yep. Even if you have to check every time whether c is in d, for this input it's the fastest way. No pre-population of d will make it faster (again, for this input). It's a lot more verbose than Counter or defaultdict, but also more efficient.


That's all folks

This little exercise teaches us a lesson: when optimizing, always measure performance, ideally with your expected inputs. Optimize for the common case. Don't presume something is actually more efficient just because its asymptotic complexity is lower. And last but not least, keep readability in mind. Try to find a compromise between "computer-friendly" and "human-friendly".


UPDATE

I have been informed by @MartijnPieters of the function collections._count_elements available in Python 3.

Help on built-in function _count_elements in module _collections:

_count_elements(...)
    _count_elements(mapping, iterable) -> None

    Count elements in the iterable, updating the mapping

This function is implemented in C, so it should be faster, but this extra performance comes at a price. The price is incompatibility with Python 2 and possibly even future versions, since we're using a private function.

From the documentation:

[...] a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.

That said, if you still want to save those 620 nanoseconds per iteration:

>>> timeit('''
... d = {}
... _count_elements(d, s)
... ''', globals=locals())
1.229239897998923

UPDATE 2: Large strings

I thought it might be a good idea to re-run the tests on some larger input, since a 16 character string is such a small input that all the possible solutions were quite comparably fast (1,000 iterations in under 30 milliseconds).

I decided to use the complete works of Shakespeare as a testing corpus, which turned out to be quite a challenge (since it's over 5MiB in size 😅). I just used the first 100,000 characters of it, and I had to limit the number of iterations from 1,000,000 to 1,000.

import urllib.request
url = 'https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt'
# read only the first 100,000 bytes of the corpus
s = urllib.request.urlopen(url).read(100_000)

collections.Counter was really slow on a small input, but the tables have turned

Counter(s)

=> 7.63926783799991

Naïve Θ(n²) time dictionary comprehension simply doesn't work

{c: s.count(c) for c in s}

=> 15347.603935000052s (tested on 10 iterations; adjusted for 1000)

Smart Θ(n) time dictionary comprehension works fine

{c: s.count(c) for c in set(s)}

=> 8.882608592999986

Exceptions are clumsy and slow

d = {}
for c in s:
  try:
    d[c] += 1
  except KeyError:
    d[c] = 1

=> 21.26615508399982

Omitting the exception type check doesn't save time (since the exception is only thrown a few times)

d = {}
for c in s:
  try:
    d[c] += 1
  except:
    d[c] = 1

=> 21.943328911999743

dict.get looks nice but runs slow

d = {}
for c in s:
  d[c] = d.get(c, 0) + 1

=> 28.530086210000007

collections.defaultdict isn't very fast either

dd = defaultdict(int)
for c in s:
  dd[c] += 1

=> 19.43012963199999

dict.fromkeys requires reading the (very long) string twice

d = dict.fromkeys(s, 0)
for c in s:
  d[c] += 1

=> 22.70960557699999

Using list instead of dict is neither nice nor fast

counts = [0 for _ in range(256)]
for c in s:
  counts[ord(c)] += 1

d = {chr(i): count for i,count in enumerate(counts) if count != 0}

=> 26.535474792000002

Leaving out the final conversion to dict doesn't help

counts = [0 for _ in range(256)]
for c in s:
  counts[ord(c)] += 1

=> 26.27811567400005

It doesn't matter how you construct the list, since it's not the bottleneck

counts = [0] * 256
for c in s:
  counts[ord(c)] += 1

=> 25.863524940000048

counts = [0] * 256
for c in s:
  counts[ord(c)] += 1

d = {chr(i): count for i,count in enumerate(counts) if count != 0}

=> 26.416733378000004

If you convert list to dict the "smart" way, it's even slower (since you iterate over the string twice)

counts = [0] * 256
for c in s:
  counts[ord(c)] += 1

d = {c: counts[ord(c)] for c in set(s)}

=> 29.492915620000076

The dict.__contains__ variant may be fast for small strings, but not so much for big ones

d = {}
for c in s:
  if c in d:
    d[c] += 1
  else:
    d[c] = 1

=> 23.773295123000025

collections._count_elements is about as fast as collections.Counter (which uses _count_elements internally)

d = {}
_count_elements(d, s)

=> 7.5814381919999505

Final verdict: Use collections.Counter unless you cannot or don't want to :)


Appendix: NumPy

The numpy package provides a method numpy.unique which accomplishes (almost) precisely what we want.

The way this method works is very different from all the above methods:

  • It first sorts a copy of the input using Quicksort, which is an O(n²) time operation in the worst case, albeit O(n log n) on average and O(n) in the best case.

  • Then it creates a "mask" array containing True at indices where a run of the same values begins, viz. at indices where the value differs from the previous value. Repeated values produce False in the mask. Example: [5,5,5,8,9,9] produces a mask [True, False, False, True, True, False].

  • This mask is then used to extract the unique values from the sorted input ‒ unique_chars in the code below. In our example, they would be [5, 8, 9].

  • Positions of the True values in the mask are taken into an array, and the length of the input is appended at the end of this array. For the above example, this array would be [0, 3, 4, 6].

  • For this array, the differences between consecutive elements are calculated, e.g. [3, 1, 2]. These are the respective counts of the elements in the sorted array ‒ char_counts in the code below.

  • Finally, we create a dictionary by zipping unique_chars and char_counts: {5: 3, 8: 1, 9: 2}.
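
The same steps can be reproduced by hand with a few basic NumPy operations. Here is a minimal sketch of the walkthrough above using the [5,5,5,8,9,9] example (purely illustrative; np.unique does all of this internally):

import numpy as np

arr = np.array([9, 5, 8, 5, 9, 5], dtype=np.uint8)

sorted_arr = np.sort(arr)                     # [5 5 5 8 9 9]

# True wherever a run of equal values begins
mask = np.empty(len(sorted_arr), dtype=bool)
mask[0] = True
mask[1:] = sorted_arr[1:] != sorted_arr[:-1]  # [True False False True True False]

unique_chars = sorted_arr[mask]               # [5 8 9]

# positions of the True values, with the input length appended at the end
run_starts = np.append(np.flatnonzero(mask), len(sorted_arr))   # [0 3 4 6]
char_counts = np.diff(run_starts)             # [3 1 2]

print(dict(zip(unique_chars, char_counts)))   # {5: 3, 8: 1, 9: 2} (NumPy scalars)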

import numpy as np

def count_chars(s):
  # The following statement needs to be changed for different input types.
  # Our input `s` is actually of type `bytes`, so we use `np.frombuffer`.
  # For inputs of type `str`, encode the input into a `bytes` instance first
  #  (for example with `s.encode()`); `np.fromstring` is deprecated for this use.
  arr = np.frombuffer(s, dtype=np.uint8)

  unique_chars, char_counts = np.unique(arr, return_counts=True)

  return dict(zip(unique_chars, char_counts))
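
One caveat on usage: because the test input s here is a bytes object, the keys of the returned dictionary are integer byte values rather than one-character strings. A small sketch of converting the result back to characters (the name readable_counts is mine, not part of the original code):

counts = count_chars(s)
# map integer byte values back to characters, and NumPy integers back to plain ints
readable_counts = {chr(byte): int(count) for byte, count in counts.items()}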

For the test input (first 100,000 characters of the complete works of Shakespeare), this method performs better than any other tested here. But note that on a different input, this approach might yield worse performance than the other methods. Pre-sortedness of the input and number of repetitions per element are important factors affecting the performance.

count_chars(s)

=> 2.960809530000006


If you are thinking about using this method because it's over twice as fast as collections.Counter, consider this:

  • collections.Counter has linear time complexity. numpy.unique is linear at best, quadratic at worst.

  • The speedup is not really that significant ‒ you save ~3.5 milliseconds per iteration on an input of length 100,000.

  • Using numpy.unique obviously requires numpy.

That considered, it seems reasonable to use Counter unless you need to be really fast. And in that case, you better know what you're doing or else you'll end up being slower with numpy than without it.


Appendix 2: A somewhat useful plot

I ran the 13 different methods above on prefixes of the complete works of Shakespeare and made an interactive plot. Note that in the plot, both prefixes and durations are displayed in logarithmic scale (the used prefixes are of exponentially increasing length). Click on the items in the legend to show/hide them in the plot.

[Interactive plot not included here: running time of each method versus prefix length, both axes logarithmic.]
