Strip MS HTML from Enhanced Text Field

Apr 6, 2012 at 2:36 PM

Hey all,

Has anyone figured out a way to strip out all of the extra HTML tags that SharePoint adds to Rich Text or Enhanced Text fields? When I retrieve the contents of one of these fields via GetListItems, they always come back with a bunch of extra <div>'s and <br>'s. SharePoint also adds a bunch of class="ExternalClass......." attributes.

I keep adding regex, replace, and unwrap functions to clean it up, but I'm thinking there's got to be a better way (or at least a better regex).

 

Thanks,

Jeremy 

Apr 6, 2012 at 3:20 PM

Yeah, regex just won't cut it when working with XML/HTML... Fortunately, you can jQuery-ify the string and run standard jQuery functions against it. Check out this quick fiddle I did for some inspiration:

http://jsfiddle.net/iOnline247/zPY4y/

 

Cheers,

Matt

Apr 6, 2012 at 3:21 PM

Jeremy,

Are you attempting to strip out all HTML and only get the plain text?

If so, the best way (if you have control over the list) is to set that field to plain text instead of rich text... The other option, if you are working client side with SPServices (which I assume you are), is to use jQuery to pull only the plain text out of the markup. The following should work:

Assuming you are looping through a set of rows (z:row) and the column name is 'richTextDescription', here is the jQuery code:

var plainText = $.trim($($(this).attr("ows_richTextDescription")).text());

Paul
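One caveat with the `.text()` approach: jQuery's `.text()` joins the text of every descendant with no separator, so a field stored as one `<div>` per line (e.g. `<div>Heading</div><div>Body</div>`) comes back as `HeadingBody`. If the line breaks matter, a minimal sketch like the one below walks the parsed nodes and emits a newline for each `<br>` and after each block element. The names `blockAwareText`, `toPlainText` and the `BLOCK_TAGS` list are illustrative assumptions, not part of jQuery or SPServices; the code only reads standard DOM node properties (`nodeType`, `nodeName`, `nodeValue`, `childNodes`), so it should accept a node parsed with jQuery or with `DOMParser`.

```javascript
// Tags treated as block-level (a line break is emitted after them).
// This list is an assumption about what enhanced text fields usually contain.
var BLOCK_TAGS = { DIV: true, P: true, H1: true, H2: true, H3: true, H4: true,
                   H5: true, H6: true, UL: true, OL: true, LI: true };

// Walks a DOM node (or anything with nodeType/nodeName/nodeValue/childNodes)
// and collects its text, adding "\n" for <br> and after each block element.
function blockAwareText(node) {
  if (node.nodeType === 3) return node.nodeValue; // text node
  if (node.nodeType !== 1) return "";             // comments etc. are skipped
  var name = node.nodeName.toUpperCase();
  if (name === "BR") return "\n";
  var out = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    out += blockAwareText(node.childNodes[i]);
  }
  return BLOCK_TAGS.hasOwnProperty(name) ? out + "\n" : out;
}

// Trims each line (String.prototype.trim also removes the non-breaking
// spaces SharePoint tends to insert) and drops the blank lines left behind
// by empty <div>s and trailing <br>s.
function toPlainText(root) {
  return blockAwareText(root)
    .split("\n")
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; })
    .join("\n");
}
```

Inside the z:row loop above, this might be called as `var plainText = toPlainText($("<div/>").html($(this).attr("ows_richTextDescription") || "")[0]);`. Wrapping the value in a fresh `<div>` means the string is always parsed as markup, even when it happens to contain no tags. This has not been checked against a live list.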

Apr 6, 2012 at 3:22 PM
Matt: looks like we both replied at the same time... (slow day at work) :)

_________
Paul T

Apr 6, 2012 at 4:03 PM

[You, Me].rock();

;-)

Apr 6, 2012 at 4:39 PM

Thanks for the quick response guys!

Matt - That looks a lot like what I have been doing to pick out specific HTML elements from the XML. The only problem is that I am working strictly client side and have no idea what the XML response will look like until it gets pulled by the web page.

Paul - Your option is good too, but I am trying to preserve some of the formatting (headings, lists, links, paragraphs, etc.) beyond just the plain text.

Here's a little more about the situation:

Users are creating summaries of the list item in an enhanced text field. Mostly they are using bold headings, paragraphs, hyperlinks, and the occasional unordered list. A simple summary consisting of 5 headings with a couple of paragraphs each, 1 hyperlink, and 1 list turns into 37 <div>'s with not a <p> tag in sight. It looks like SP is turning each line that has a break after it, and each paragraph, into a separate <div>.
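Along the lines of the "parse it, then work with the nodes" idea from earlier in the thread, here is a minimal sketch of one way to turn the "one <div> per line" markup described above back into simpler HTML. It is plain JavaScript that only reads standard DOM node properties (`nodeType`, `nodeName`, `nodeValue`, `childNodes`, `attributes`). The names (`clean`, `KEEP`, `BLOCKS`, etc.) and the cleanup rules themselves are assumptions for illustration, not anything SharePoint or SPServices provides: a `<div>` holding only inline content becomes a `<p>`, a `<div>` wrapping other blocks is unwrapped, empty divs are dropped, unlisted tags such as `<span>` are unwrapped, and every attribute except `href` on links is discarded, which also removes the `ExternalClass` classes.

```javascript
// Hypothetical cleanup rules (assumptions, adjust to taste):
// tags kept in the output, and tags treated as block-level.
var KEEP = { p: 1, h1: 1, h2: 1, h3: 1, h4: 1, h5: 1, h6: 1,
             ul: 1, ol: 1, li: 1, a: 1, strong: 1, b: 1, em: 1, i: 1 };
var BLOCKS = { div: 1, p: 1, h1: 1, h2: 1, h3: 1, h4: 1, h5: 1, h6: 1,
               ul: 1, ol: 1, li: 1 };

function has(map, key) { return Object.prototype.hasOwnProperty.call(map, key); }

// Re-escapes decoded text/attribute values before they go back into HTML.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Reads an attribute from node.attributes (a NamedNodeMap in the browser).
function getAttr(node, name) {
  var attrs = node.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    if (attrs[i].name.toLowerCase() === name) return attrs[i].value;
  }
  return null;
}

// Blank = nothing but <br>s and whitespace (incl. non-breaking spaces).
// This regex runs on clean()'s own output, never on SharePoint's markup.
function isBlank(html) { return html.replace(/<br>/g, "").trim() === ""; }

function hasBlockChild(node) {
  for (var i = 0; i < node.childNodes.length; i++) {
    var c = node.childNodes[i];
    if (c.nodeType === 1 && has(BLOCKS, c.nodeName.toLowerCase())) return true;
  }
  return false;
}

// Returns simplified HTML for a parsed node tree.
function clean(node) {
  if (node.nodeType === 3) return escapeHtml(node.nodeValue); // text node
  if (node.nodeType !== 1) return "";                         // comments etc.
  var tag = node.nodeName.toLowerCase();
  if (tag === "br") return "<br>";
  var inner = "";
  for (var i = 0; i < node.childNodes.length; i++) inner += clean(node.childNodes[i]);

  if (tag === "div") {
    if (isBlank(inner)) return "";             // empty spacer div: drop
    if (hasBlockChild(node)) return inner;     // wrapper div (e.g. ExternalClass): unwrap
    return "<p>" + inner.replace(/(<br>)+$/, "") + "</p>"; // one line/paragraph
  }
  if (!has(KEEP, tag)) return inner;           // <span>, <font>, ...: unwrap
  if (has(BLOCKS, tag) && isBlank(inner)) return ""; // empty <p>, <li>, ...
  if (tag === "a") {
    var href = getAttr(node, "href");
    return href ? '<a href="' + escapeHtml(href) + '">' + inner + "</a>" : inner;
  }
  return "<" + tag + ">" + inner + "</" + tag + ">"; // all attributes dropped
}
```

In the browser this might be called as `var html = clean($("<div/>").html(fieldValue || "")[0]);`, where `fieldValue` is a placeholder for the raw `ows_` attribute value. The temporary wrapper `<div>` and SharePoint's `ExternalClass` div are unwrapped as long as they contain block-level children, which is the one-div-per-line case. One known gap: a `<div>` that mixes loose text with block children is unwrapped, so that loose text ends up outside any `<p>`.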

 

Again, thanks for the help!  I was hoping I was just missing something simple :)

Apr 6, 2012 at 6:34 PM

Gotcha...

Take a look at the fiddle once again... I've updated it with something you may be able to do... It might be tough to pull off, though, since you don't really know the HTML you are working with.

Let us know if you get this working, it's pretty interesting!

Cheers,

Matt

Apr 6, 2012 at 7:13 PM

That looks like a good way to filter the results. Thanks!! Now I think I can just write some logic to check for some of the empty or unnecessary extra elements.  I'll let you know when I get it working.

-Jeremy