
If you want to remove specific punctuation from a string, it is probably best to explicitly remove exactly the characters you want, like this:

replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"")

Doing the above still doesn't return the string as you have specified it. If you also want to collapse the extra spaces left over from removing the punctuation, you will want something like

replace(/\s{2,}/g," ");

Here is a full example:

var s = "This., -/ is #! an $ % ^ & * example ;: {} of a = -_ string with `~)() punctuation";
var punctuationless = s.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"");
var finalString = punctuationless.replace(/\s{2,}/g," ");

Running this code (originally in the Firebug console) leaves finalString as "This is an example of a string with punctuation".

Curly braces in a regex apply a quantifier to the preceding token. A bounded form such as \s{2,100} replaces between 2 and 100 whitespace characters (\s) with a single space; to collapse any run of two or more whitespace characters down to one, leave off the upper limit, like so: replace(/\s{2,}/g, ' ').

I've added a few more characters to the list of punctuation replaced (@+?><[]), in case anyone is looking for a slightly more complete set: replace(/[.,\-\/#!$%\^&\*;:{}=_`~()@\+\?><\[\]]/g, '').

To remove the full set of standard US-ASCII punctuation characters:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

replace(/[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]/g,"");

@AntoineLize I agree that it's misleading. Updated the answer. Thanks.

I tried this with "it?" and it doesn't work for me (regex101.com/r/F4j5Qc/1); the question mark has to be added to the character class: /[.,\/#!$%\^&*;:{}=\-_`~()\?]/g

How can I strip all punctuation from a string in JavaScript using rege...

javascript regex

For convenience, here is a summary of how to strip punctuation from a string in both Python 2 and Python 3. Please refer to the other answers for a detailed description.

Python 2:

import string

s = "string. With. Punctuation?"
table = string.maketrans("","")
new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3:

import string

s = "string. With. Punctuation?"
table = str.maketrans({key: None for key in string.punctuation})
new_s = s.translate(table)                          # Output: string without punctuation
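
As a possible variation for Python 3 only, str.maketrans also accepts a three-argument form whose third argument lists the characters to delete. The sketch below is not part of the summary above; the helper name strip_punct is made up for illustration.

```python
import string

# Three-argument form: no character substitutions (two empty strings),
# and every character in the third argument is mapped to None (deleted).
TABLE = str.maketrans("", "", string.punctuation)


def strip_punct(text):
    """Return text with ASCII punctuation removed (hypothetical helper)."""
    return text.translate(TABLE)


print(strip_punct("string. With. Punctuation?"))  # string With Punctuation
```

Building the table once and reusing it avoids recreating the mapping on every call.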

Best way to strip punctuation from a string in Python - Stack Overflow

python string punctuation

For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; codepoints (integers) are looked up in that mapping and anything mapped to None is removed.

import string

s = "string. With. Punctuation?"  # sample input, as used in the other answers
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)

The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.
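
To make the shape of that mapping concrete, here is a small Python 3 sketch that inspects it before using it. The function name strip_ascii_punct is an illustrative assumption, not something from the answer.

```python
import string

# Keys are integer code points; dict.fromkeys() sets every value to None.
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))

print(len(remove_punct_map) == len(string.punctuation))  # True
print(remove_punct_map[ord("!")])                         # None


def strip_ascii_punct(text):
    """Delete every character whose code point maps to None."""
    return text.translate(remove_punct_map)


print(strip_ascii_punct("string. With. Punctuation?"))  # string With Punctuation
```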

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian's answer (Python 3 version):

import unicodedata
import sys

remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                 if unicodedata.category(chr(i)).startswith('P'))
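
A minimal Python 3 sketch of using such a table follows; the function name strip_unicode_punct and the sample string are illustrative assumptions. One thing worth knowing: several characters in string.punctuation (for example + and $) belong to Unicode symbol categories rather than punctuation categories, so a table built only from the "P" categories leaves them in place.

```python
import sys
import unicodedata

# Map every code point whose general category starts with "P"
# (Pc, Pd, Ps, Pe, Pi, Pf, Po) to None so that translate() deletes it.
remove_punct_map = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith("P")
)


def strip_unicode_punct(text):
    """Remove all characters in a Unicode punctuation category."""
    return text.translate(remove_punct_map)


# Curly quotes, an em dash and an ellipsis are all punctuation.
sample = "\u201cHello\u201d \u2014 world\u2026"
print(repr(strip_unicode_punct(sample)))  # 'Hello  world'

# "+" is a math symbol (category Sm), so it is kept.
print(strip_unicode_punct("a+b"))  # a+b
```

Building the table scans the whole code point range, so it is best done once at import time rather than per call.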

@J.F.Sebastian: indeed, my answer was just using the same characters as the top-voted one. Added a Python 3 version of your table.

@J.F.Sebastian: it works for Unicode strings. It strips ASCII punctuation. I never claimed it strips all punctuation. :-) The point was to provide the correct technique for unicode objects vs. Python 2 str objects.

Best way to strip punctuation from a string in Python - Stack Overflow

python string punctuation

From an efficiency perspective, you're not going to beat the following (this two-argument form works on Python 2 str objects):

s.translate(None, string.punctuation)

It's performing raw string operations in C with a lookup table - there's not much that will beat that but writing your own C code.

If speed isn't a worry, another option though is:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

This is faster than calling s.replace for each character, but it won't perform as well as non-pure-Python approaches such as regexes or string.translate, as the timings below show. For this type of problem, doing it at as low a level as possible pays off.

import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s

print "sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

This gives the following results:

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802

Thanks for the timing info, I was thinking about doing something like that myself, but yours is better written than anything I would have done and now I can use it as a template for any future timing code I want to write:).

Great answer. You can simplify it by removing the table. The docs say: "set the table argument to None for translations that only delete characters" (docs.python.org/library/stdtypes.html#str.translate)

In Python 3, table = string.maketrans("","") becomes table = str.maketrans({key: None for key in string.punctuation}), and the call becomes s.translate(table).

@mlissner - efficiency. If it's a list/string, you need to do a linear scan to find out whether the letter is in the string. With a set or dictionary, though, it'll generally be faster (except for really small strings) since it doesn't have to check every value.

To update the discussion, as of Python 3.6, regex is now the most efficient method! It is almost 2x faster than translate. Also, sets and replace are no longer so bad! They are both improved by over a factor of 4 :)

Using a list comprehension for the ''.join() would make it a little faster, but not fast enough to beat the regex or translate. See the question "list comprehension without [ ], Python" for why that is so.

Best way to strip punctuation from a string in Python - Stack Overflow

python string punctuation

This question is over 6 years old, but I figured I'd chime in with a function I wrote. It's not very efficient, but it is simple, and you can add or remove any punctuation you like:

def stripPunc(wordList):
    """Strips punctuation from a list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    for punc in puncList:
        # Remove this punctuation mark from every word in the list.
        wordList = [word.replace(punc, '') for word in wordList]
    return wordList

Best way to strip punctuation from a string in Python - Stack Overflow

python string punctuation

Not necessarily simpler, but a different way, if you are more familiar with the re family.

import re, string
s = "string. With. Punctuation?" # Sample string 
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)

Works because string.punctuation has the sequence ,-. in proper, ascending, no-gaps, ASCII order. While Python has this right, when you try to use a subset of string.punctuation, it can be a show-stopper because of the surprise "-".

Actually, it's still wrong. The sequence "\]" gets treated as an escape (coincidentally not closing the ] so bypassing another failure), but leaves \ unescaped. You should use re.escape(string.punctuation) to prevent this.

Yes, I omitted it because it worked for the example to keep things simple, but you are right that it should be incorporated.
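
The "surprise" hyphen mentioned in the comments can be illustrated with a short sketch. The subset "+-/" and the sample text are made up for this example; the point is only that an unescaped hyphen between two characters inside [...] is read as a range.

```python
import re

subset = "+-/"          # hypothetical subset of string.punctuation
text = "a,b.c+d-e/f"

# Unescaped: "[+-/]" means the range from "+" to "/", which also
# covers "," and "." because they sit between the two in ASCII.
naive = re.sub("[%s]" % subset, "", text)

# Escaped: re.escape() makes the hyphen a literal character.
escaped = re.sub("[%s]" % re.escape(subset), "", text)

print(naive)    # abcdef
print(escaped)  # a,b.cdef
```

With other subsets the unescaped form can fail outright, since a range whose end sorts before its start is rejected by the re module.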

Best way to strip punctuation from a string in Python - Stack Overflow

python string punctuation

"This., -/ is #! an $ % ^ & * example ;: {} of a = -_ string with `~)() punctuation".replace( /[^a-zA-Z ]/g, '').replace( /\s\s+/g, ' ' )

Be aware that this removes everything except ASCII letters and spaces, so digits and non-ASCII letters (Chinese, Russian, accented characters and so on) are stripped as well. You really have to specify what you want.

How can I strip all punctuation from a string in JavaScript using rege...

javascript regex

Here are the standard punctuation characters for US-ASCII: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

For Unicode punctuation (such as curly quotes, em-dashes, etc), you can easily match on specific block ranges. The General Punctuation block is \u2000-\u206F, and the Supplemental Punctuation block is \u2E00-\u2E7F.

Put together, and properly escaped, you get the following RegExp:

/[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/

Applied to the example string:

var punctRE = /[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/g;
var spaceRE = /\s+/g;
var str = "This, -/ is #! an $ % ^ & * example ;: {} of a = -_ string with `~)() punctuation";
str.replace(punctRE, '').replace(spaceRE, ' ');

>> "This is an example of a string with punctuation"

For Unicode punctuation, the blocks are not enough. You have to look at the general category Punctuation, and you will see that not all punctuation characters are located in those blocks. Many familiar punctuation characters sit inside the Latin blocks, for example.

How can I strip all punctuation from a string in JavaScript using rege...

javascript regex