
umbraco.config not updating - Error Republishing: System.Xml.XmlException: '', hexadecimal value 0x01, is an invalid character.


Recently we hit an issue where the umbraco.config file was not being updated with published content. This had a knock-on effect on the API calls we make to create new nodes when they do not already exist.

I noticed that after republishing, nodes were not updating their data in the umbraco.config file, and when the file was deleted it was not being recreated.

The first port of call was the umbracoLog table, which is the central error reporting mechanism for internal Umbraco errors.

Fire up SQL Server Management Studio and run the following query:

SELECT * from [umbracoLog] where logComment like '%Error Republishing: System.Xml.XmlException%'

You should then get a list of matching log records.

Note that we have publishing errors due to invalid characters in the XML stream. This should never happen, but unfortunately Umbraco does not sanitise the input data when creating the XML.

The actual error displayed is:

Error Republishing: System.Xml.XmlException: '', hexadecimal value 0x01, is an invalid character. Line 4, position 5.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ThrowInvalidChar(Char[] data, Int32 length, Int32 invCharPos)
   at System.Xml.XmlTextReaderImpl.ParseCDataOrComment(XmlNodeType type, Int32& outStartPos, Int32& outEndPos)
   at System.Xml.XmlTextReaderImpl.ParseCDataOrComment(XmlNodeType type)
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
   at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
   at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
   at System.Xml.XmlDocument.Load(XmlReader reader)
   at System.Xml.XmlDocument.LoadXml(String xml)
   at umbraco.content.LoadContentFromDatabase()

Now that we know why the umbraco.config file is not being updated, we need to track down the node(s) causing the errors and remove the dodgy characters!

 

Umbraco stores the content XML in the cmsContentXml table, so we need to run a query against this table to find the rogue nodes.

In Management Studio, run a SQL statement that searches this table with a LIKE filter, placing the dodgy character you are searching for between the % wildcards. You can get the character from your original SQL results.
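A minimal sketch of such a query, assuming SQL Server, that cmsContentXml exposes the nodeId and xml columns, and that the offending character is SOH (0x01) as in the error above. The binary collation is a precaution so that the control character is compared by code point rather than by linguistic rules; swap NCHAR(1) for whichever character your log reports.

```sql
-- Assumption: cmsContentXml has the columns nodeId and xml.
-- NCHAR(1) is the SOH control character (hexadecimal value 0x01).
SELECT nodeId, [xml]
FROM cmsContentXml
WHERE CAST([xml] AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2
      LIKE N'%' + NCHAR(1) + N'%';
```

Building the pattern with NCHAR avoids having to paste an invisible character into the query window.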

You should then get one or more nodes that contain the dodgy characters. It's now a simple case of removing these characters and saving the data back to the database.
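To find the affected nodes in the content tree it can help to list their names as well. The query below is a sketch under the assumption that umbracoNode.id matches cmsContentXml.nodeId and that umbracoNode.text and umbracoNode.path hold the node name and its ancestor path; check the column names against your own Umbraco version before relying on it.

```sql
-- Assumption: umbracoNode.text is the node name shown in the Content area
-- and umbracoNode.path is the comma-separated list of ancestor ids.
SELECT n.id, n.[text] AS nodeName, n.[path]
FROM umbracoNode AS n
INNER JOIN cmsContentXml AS x ON x.nodeId = n.id
WHERE CAST(x.[xml] AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2
      LIKE N'%' + NCHAR(1) + N'%';
```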

How we did it using Notepad++:

Notepad++ is a great developer tool. If you don't already use it, download it now: it's free, open source and just, well, great!

We opened the rogue node in the Content area in Umbraco (in our case a node called Winter League) and found the body text; most likely it's an RTF control that has saved the bad character.

We then viewed the source in HTML mode, selected it all, and copied and pasted it into Notepad++, where the SOH character (0x01) is clearly visible.

Simply delete it, paste the code back into Umbraco and republish. Your umbraco.config file should now instantly be rebuilt.

On my development machine the original file was 9 MB; after this fix it was 74 MB (large site!).

This has caused me so much pain in the last few days, especially with duplicate nodes being created via the API (Node uses the published XML to check whether a node exists; ours were not in the published XML, so duplicates were created) and performance issues at the front end.

It looks like the character got into the RTF editor when a user pasted from another source, e.g. a PDF document. This is apparently a known issue in Umbraco; I just wish someone would sanitise the input when it's saved so this does not happen at all!

 

 

