My input is a plain-text string, and the requirement is to remove all HTML tags except a few specific tags, such as:
These specific tags can also have attributes, like:

A few examples:
I have gone through the question “Remove HTML tags from a String”, but it does not answer my question completely. Can this be handled by a set of regexes, or could I make use of some library?
asked Aug 11, 2011 at 10:00
RandomQuestion

I tried Jsoup, and it seems to handle all such cases. Here is example code.
For the input string
I get the following output, which is pretty much what I require.
answered Aug 11, 2011 at 19:32
RandomQuestion

For simple HTML, this may be sufficient:
Hope that helps.
answered Aug 11, 2011 at 10:55
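The thread does not show the code from either answer. As a rough illustration of the “set of regexes” route the question asks about, here is a minimal sketch that keeps an allowlist of tags (attributes included) and deletes every other tag. The class name, the method name keepOnly, and the choice of “b” as the allowed tag are hypothetical; the sketch assumes Java 9 or later (for Set.of) and simple, well-formed markup. For anything else, a parser such as Jsoup is the safer choice.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AllowlistTagStripper {
    // Matches an opening, closing, or self-closing tag and captures the tag name.
    // Assumes simple markup: no '>' inside attribute values, no comments,
    // no <script>/<style> content, and no stray '<' characters in the text.
    private static final Pattern TAG =
            Pattern.compile("</?\\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Removes every tag whose lower-cased name is not in allowedTags.
    // Allowed tags are kept verbatim, attributes included.
    static String keepOnly(String html, Set<String> allowedTags) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            String replacement = allowedTags.contains(name) ? m.group() : "";
            // quoteReplacement stops '$' or '\' in a kept tag from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>bold</b> <A HREF='#'>link</A></p>";
        System.out.println(keepOnly(html, Set.of("b")));
    }
}
```

With the input above, it prints Hello <b>bold</b> link. The upper-case A tag is removed because tag names are compared in lower case.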
This example shows how to remove HTML tags from a String in Java using a regular expression and the Jsoup library. You can remove simple HTML tags from a string using a regular expression. HTML tags are usually enclosed in “<” and “>” brackets, so we are going to use a pattern that matches everything from a “<” up to the next “>” and replace each match with an empty string.
Example
Output
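The original example code is not shown. A minimal sketch of the idea, assuming the pattern <[^>]*> (a “<”, then any characters other than “>”, then the closing “>”), looks like this; the class and method names are hypothetical.

```java
final class RegexTagRemover {
    // Treats everything from a '<' up to the next '>' as a tag and deletes it.
    // Suitable for simple, well-formed markup only.
    static String removeAllTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(removeAllTags("<p>Hello <b>World</b>!</p>")); // Hello World!
    }
}
```

Note that the pattern knows nothing about HTML: any text between a “<” and the next “>” is treated as a tag.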
The above regular expression worked fine, except that it did not handle HTML entities like “&nbsp;” and “&amp;”. Depending on the requirement, you can either replace them with the equivalent characters one by one or remove them using another regular expression.
Output
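Both options can be sketched with plain String methods. This is an illustration rather than the original example: the helper names are hypothetical, only a handful of common entities are decoded, and a complete decoder would need the full HTML entity table.

```java
final class EntityCleaner {
    // Strip tags first, so that a decoded "&lt;" in the text is not mistaken for a tag.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Option A: replace a few common entities one by one.
    // "&amp;" is handled last so that "&amp;lt;" becomes "&lt;", not "<".
    // &nbsp; is mapped to a plain space here; U+00A0 would be the exact character.
    static String decodeCommonEntities(String text) {
        return text.replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    // Option B: drop named (&nbsp;) and numeric (&#160;, &#xA0;) entity references.
    static String removeEntities(String text) {
        return text.replaceAll("&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);", "");
    }

    public static void main(String[] args) {
        String text = stripTags("<p>Tom&nbsp;&amp;&nbsp;Jerry</p>");
        System.out.println(decodeCommonEntities(text)); // Tom & Jerry
        System.out.println(removeEntities(text));       // TomJerry
    }
}
```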
How to remove specific HTML tags from the String?
What if you want to remove only a specific HTML tag from a String? You can do that using a regular expression too. Suppose you want to remove the “a” tag from the String “<a href='#'>HTML<b>Bold</b>link</a>”. You can use a pattern that matches only the opening and closing “a” tags and replace those matches with an empty string.
Output
Let's run some more tests to make sure that the pattern works.
Output
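Since the original pattern is not shown, here is a sketch assuming a straightforward pattern, </?a[^>]*>, that matches the opening and closing “a” tags. The class and method names are hypothetical, and the last two calls are hypothetical test inputs with an upper-case tag and extra spaces.

```java
final class AnchorTagRemover {
    // Matches "<a ...>" and "</a>" only as written: lower-case name,
    // no whitespace right after '<' or '</'.
    // Caveat: it also matches other tags that start with "a", such as <abbr>.
    private static final String A_TAG = "</?a[^>]*>";

    static String removeAnchorTags(String html) {
        return html.replaceAll(A_TAG, "");
    }

    public static void main(String[] args) {
        System.out.println(removeAnchorTags("<a href='#'>HTML<b>Bold</b>link</a>")); // HTML<b>Bold</b>link
        System.out.println(removeAnchorTags("<A HREF='#'>HTML</A>"));     // unchanged: upper case
        System.out.println(removeAnchorTags("< a href='#'>HTML</ a>"));   // unchanged: extra spaces
    }
}
```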
HTML is not a strict language. As you can see from the output, our pattern failed when an HTML tag was written in upper case or contained extra spaces. Let's modify the pattern to “(?i)<[\\s]*[/]?[\\s]*a[^>]*>” to cover these scenarios: (?i) turns on case-insensitive matching, and each [\\s]* allows optional whitespace after the “<” and after the slash.
Example
Output
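A sketch using the modified pattern from the text (written as a Java string literal, so each backslash is doubled) with the same kinds of test inputs; the class and method names are hypothetical.

```java
final class AnchorTagRemoverCaseInsensitive {
    // The modified pattern:
    //   (?i)   case-insensitive matching, so <A> and </A> match too
    //   [\s]*  optional whitespace after '<' and after '/'
    //   [/]?   optional slash for closing tags
    // Caveat: "a[^>]*" still matches other tags that start with "a", such as <abbr> or <area>.
    private static final String A_TAG = "(?i)<[\\s]*[/]?[\\s]*a[^>]*>";

    static String removeAnchorTags(String html) {
        return html.replaceAll(A_TAG, "");
    }

    public static void main(String[] args) {
        System.out.println(removeAnchorTags("<a href='#'>HTML<b>Bold</b>link</a>")); // HTML<b>Bold</b>link
        System.out.println(removeAnchorTags("<A HREF='#'>HTML</A>"));     // HTML
        System.out.println(removeAnchorTags("< a href='#'>HTML</ a>"));   // HTML
    }
}
```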
Is it recommended to use a regular expression to remove HTML tags from String?
The short answer is NO. So far, we have only seen happy-path scenarios. Consider the example HTML string given below.
Output
Our important text was removed by the regular expression because the HTML was not well-formed. Malformed HTML like this is very common, and a regular expression cannot reliably handle it. Consider another example.
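The original malformed example strings are not shown. The two inputs below are hypothetical, chosen to show the same failure mode with the simple <[^>]*> pattern: real text disappears, and stray markup leaks into the result. The class and method names are also hypothetical.

```java
final class RegexFailureDemo {
    // Same "everything between '<' and the next '>'" approach as before.
    static String removeAllTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // Hypothetical malformed HTML: the '<' in the text is not escaped as &lt;,
        // so the regex treats "< 10 and quantity >" as a tag and deletes real text.
        System.out.println(removeAllTags("<p>Price < 10 and quantity > 2</p>"));

        // Hypothetical valid HTML: '>' is allowed inside a quoted attribute value,
        // but the regex stops at the first '>' and leaks part of the attribute.
        System.out.println(removeAllTags("<a title='a>b'>link</a>"));
    }
}
```

The first call prints “Price  2” (the words between the two brackets are gone), and the second prints “b'>link”.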
Output
What should I use to remove the HTML tags?
If you are removing a tag or two from the string and you are absolutely certain that the input HTML is well-formed, using a regular expression is OK. In all other scenarios, using an HTML parser is the way to go. One such parser is Jsoup. Here is how you can remove the HTML elements from the string using Jsoup.
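A minimal sketch, assuming jsoup 1.14.1 or later is on the classpath (older releases name the allowlist class Whitelist instead of Safelist). Jsoup.parse(...).text() returns only the text content, and Jsoup.clean with a Safelist keeps selected tags, which matches the requirement in the original question. The wrapper class and method names are hypothetical. The first two inputs are hypothetical too: one has an unescaped “<” in the text, and the other has a “>” inside a quoted attribute, both cases that trip up a simple regex.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

final class JsoupTagStripper {
    // Parses the markup with jsoup's HTML parser and returns the combined,
    // whitespace-normalized text of the document.
    static String removeAllTags(String html) {
        return Jsoup.parse(html).text();
    }

    // Keeps only the listed tags; other tags are removed but their text is kept.
    // Attributes on kept tags are dropped unless allowed via Safelist.addAttributes.
    static String keepOnly(String html, String... allowedTags) {
        return Jsoup.clean(html, Safelist.none().addTags(allowedTags));
    }

    public static void main(String[] args) {
        System.out.println(removeAllTags("<p>Price < 10 and quantity > 2</p>"));
        System.out.println(removeAllTags("<a title='a>b'>link</a>"));
        System.out.println(keepOnly("<p>Hello <b>World</b> <a href='#'>link</a></p>", "b"));
    }
}
```

The first two calls print Price < 10 and quantity > 2 and link, because the parser treats the unescaped “<” as text and reads the quoted attribute correctly. The third call keeps the b tag and drops the p and a tags while keeping their text.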
Output
The Jsoup library even allows you to whitelist elements in case you want to retain some tags while clearing all others.
This example is a part of the Java String tutorial, Java RegEx Tutorial, and Jsoup Tutorial.
About the author: My name is RahimV, and I have over 16 years of experience in designing and developing Java applications. Over the years, I have worked with many Fortune 500 companies as an eCommerce architect. My goal is to provide high-quality, simple-to-understand Java tutorials and examples for free.
How do I remove text tags in HTML?
Removing HTML tags from text (in the Microsoft Word Find and Replace dialog): Press Ctrl+H. ... Click the More button, if it is available. ... Make sure the Use Wildcards check box is selected. In the Find What box, enter the following: \<i\>([!<]@)\. In the Replace With box, enter the following: \1. With the insertion point still in the Replace With box, press Ctrl+I once.
Is it possible to remove the HTML tags from data?
strip_tags() is a function that allows you to strip out all HTML and PHP tags from a given string (parameter one); you can also use parameter two to specify a list of HTML tags you want to keep.
How do I remove a specific HTML tag from a string in PHP?
PHP provides a built-in function to remove HTML tags from data. The strip_tags() function is a built-in PHP function that strips HTML, XML, and PHP tags from a string. It accepts two parameters: the string and an optional list of allowed tags. This function returns a string with all NULL bytes, HTML, and PHP tags stripped from a given $str.
Which function is used to remove all HTML tags from a string passed to a form?
The strip_tags() function strips HTML, XML, and PHP tags from a string. Note: HTML comments are always stripped; this cannot be changed with the allowed_tags parameter.