TinyXML is a great little library (tiny, even) for loading and parsing XML documents.  Here I’m going to introduce the basics of getting all your data out of (and into) XML files.  Part 1 is quick and simple.  In Part 2, I’ll use a more complete solution.

TinyXML loads the entire XML document into memory and builds a tree of data nodes.  You can iterate over this tree to get both the data itself and its hierarchy/organization.  This approach is an alternative to the Expat library.  Expat is a stream-based XML parsing library, where you implement callbacks that handle XML elements as they are read.  Expat is very fast and memory-efficient, so it is used in places where that is important (e.g. the Mozilla project).  However, TinyXML tends to be easier to use, or at least to understand, which really is most of the work when programming.

Extensible Markup Language

XML is a relatively simple data format.  It’s text-based, for one.  You can open an XML file and read it.  It looks just like HTML at a glance.  That’s both good and bad.  You can change it by hand, but so can others and you might not want that.  That’s what encryption is for, though.  Anyhow, here’s a sample XML file that TinyXML is happy with:

<?xml version="1.0" ?>
<root>
    <Element1 attribute1="some value" />
    <Element2 attribute2="2" attribute3="3">
        <Element3 attribute4="4" />
        Some text.
    </Element2>
</root>

To keep this description very minimal, I’ll just point out that there is an XML declaration first, then a single root element, other elements nested inside it, and their attributes.  Element2 also contains a bit of text.

Now, let’s have a go at parsing this.

Setup

First, we have to put TinyXML into our project.  Download it (TinyXML@Sourceforge or full example archive), then drop these files into your project:

tinystr.cpp
tinyxmlerror.cpp
tinystr.h
tinyxml.h
tinyxml.cpp
tinyxmlparser.cpp

TinyXML is licensed under a zlib license, found in each of the above files.  This means it can be used freely for any purpose.  The simplest way to use it is to compile it right in your project, but you may prefer to make a static library out of it.

In your parsing source file, you’ll need to #include "tinyxml.h".  You’ll likely also want to #define TIXML_USE_STL project-wide or edit it in as the first line of tinyxml.h, so that every file that includes the header sees the same setting.  This compiles the STL versions of some TinyXML functions (the ones that accept std::string and work with streams).  I might assume having them, so be aware of that in case you get related compile errors when following along.

Loading

Now, ready to load the document?  Let’s name the file “test.xml”.  Here’s how we load it:

TiXmlDocument doc;
if(!doc.LoadFile("test.xml"))
{
    cerr << doc.ErrorDesc() << endl;
    return FAILURE;
}

Now, “doc” holds all of our data.  Let’s get at that.

TiXmlElement* root = doc.FirstChildElement();
if(root == NULL)
{
    cerr << "Failed to load file: No root element."
         << endl;
    doc.Clear();
    return FAILURE;
}

The root element holds everything else.  We’ll be using this to get a hold of the rest of the data.  You’ve just seen how we’re going to do it, too.  The FirstChildElement() method returns a TiXmlElement pointer to the first child that is an element, or NULL if there is none.  Every class derived from TiXmlNode (including TiXmlDocument and TiXmlElement) has this method.

FirstChildElement() takes an optional string argument that is the name of the element to find.  We knew what to expect when there was just a single root, so no name was needed.  However, now we have to iterate over its children.  In our particular file, there are two with unique names, so we could just use FirstChildElement() with the name of each element.  But you don’t always know exactly how many elements there are and what their names are.  We should at least try to be a little more generic.

for(TiXmlElement* elem = root->FirstChildElement(); elem != NULL; elem = elem->NextSiblingElement())
{
    string elemName = elem->Value();

The above code will begin a loop over the elements which are direct children of ‘root’.  Notice that we didn’t specify a name in the calls to FirstChildElement() and NextSiblingElement().  That means we need to check what the name, or ‘value’ is.  The Value() method is a little different for each class derived from TiXmlNode, but all we need to know right now is that it returns the element name when used on a TiXmlElement.

With that info, we are ready to get down into the child elements.

    const char* attr;
    if(elemName == "Element1")
    {
        attr = elem->Attribute("attribute1");
        if(attr != NULL)
            ; // Do stuff with it
    }

‘attr’ is a variable we’ll use for catching the attributes as we ask for them.  If the attribute we request with Attribute() is not there, ‘attr’ is NULL.  There are times when this might be an error that you want your user to know about.  Other times, you might just use a default value if the attribute is not explicitly stated.  TinyXML has other methods for checking attributes for better error handling, such as if the attribute holds text when you expect an integer (look in the header or online docs for QueryIntAttribute() and friends).
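
If you’d rather stick with Attribute() and convert the string yourself, a small helper can cover both the missing-attribute case and the text-where-a-number-should-be case.  This is only a sketch: attrToInt() is a name made up for this example, not something TinyXML provides, and QueryIntAttribute() is still the library’s own route.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not part of TinyXML): converts the string returned by
// Attribute() to an int.  Returns false and stores 'fallback' in 'out' when
// the attribute is missing (NULL), empty, not a whole number, or out of range.
bool attrToInt(const char* attr, int fallback, int& out)
{
    out = fallback;
    if(attr == NULL || *attr == '\0')
        return false;

    char* end = NULL;
    errno = 0;
    long value = std::strtol(attr, &end, 10);

    // 'end' stops at the first character strtol() could not use, so anything
    // other than the terminator means there was trailing junk such as "12abc".
    if(end == attr || *end != '\0')
        return false;
    if(errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return false;

    out = static_cast<int>(value);
    return true;
}
```

With it, something like attrToInt(elem->Attribute("attribute2"), 0, value) leaves 0 in value when the attribute is missing or malformed, and the return value tells you whether to warn the user.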

Getting the attributes from Element2 is just the same:

    else if(elemName == "Element2")
    {
        attr = elem->Attribute("attribute2");
        if(attr != NULL)
            ; // Do stuff with it
        attr = elem->Attribute("attribute3");
        if(attr != NULL)
            ; // Do stuff with it

There’s more to do, though.  Element3 is nested in there.  Just for the sake of showing the use of a more specific loop (looking just for one type of element) we’ll use a loop for finding Element3.  This loop will skip any elements that are not named Element3:

for(TiXmlElement* e = elem->FirstChildElement("Element3"); e != NULL; e = e->NextSiblingElement("Element3"))
{
    attr = e->Attribute("attribute4");
    if(attr != NULL)
        ; // Do stuff with it
}

Text Nodes

Attributes aren’t the only way to store data.  One other common occurrence is a block of text contained by an element.  Looking back at our XML file, it looks like we have “Some text.” in there.  Do we really?  One might say that there’s a line break and other whitespace going on.  By default, TinyXML strips the outside whitespace and condenses whitespace between other characters.  This is pretty common in XML parsing, so go with it.  It may be wise not to rely on non-default behavior in case your parsing needs change.
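
To make the whitespace handling concrete, here is a rough stand-alone sketch of what condensing amounts to.  It is not TinyXML’s own code, and condenseWhitespace() is a hypothetical name; it just mimics the trimming and collapsing described above so you can see what to expect from a text node.

```cpp
#include <cctype>
#include <string>

// Rough sketch of the condensing described above (not TinyXML's own code):
// drops leading/trailing whitespace and turns each inner run into one space.
std::string condenseWhitespace(const std::string& in)
{
    std::string out;
    bool pendingSpace = false;
    for(std::string::size_type i = 0; i < in.size(); ++i)
    {
        if(std::isspace(static_cast<unsigned char>(in[i])))
        {
            // Remember the gap, but only if some text came before it.
            pendingSpace = !out.empty();
        }
        else
        {
            if(pendingSpace)
                out += ' ';
            out += in[i];
            pendingSpace = false;
        }
    }
    return out;
}
```

Fed the raw contents of Element2 from the sample file (line break, indentation and all), this leaves just the sentence itself.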

Text nodes (TiXmlText) are a little funny.  XML like this…

Hi, <bold>Jack</bold> is my name.

…has four separate TinyXML nodes.  “Hi, ” will be put into a text node, <bold> will be a new element with “Jack” as a text node within it, and “ is my name.” will be another text node in the parent.  It can get a bit more complicated to parse text, but our example is simple enough for now.  As a note for the above XML, another option is to use special characters to make it all a single text node and parse it by hand.  See also the CDATA note in the Pitfalls section.

Here’s how we can deal with ours:

for(TiXmlNode* e = elem->FirstChild(); e != NULL; e = e->NextSibling())
{
    TiXmlText* text = e->ToText();
    if(text == NULL)
        continue;
    string t = text->Value();
    // Do stuff
}

There is no FirstChildText() method, so here we’re using the ToText() method to cast the TiXmlNode to a TiXmlText.  If it’s NULL, it’s not a text node.  The goods we want are retrieved with Value().

And that’s just about it.  Let’s free the memory that TinyXML allocated with all of those nodes and such.

doc.Clear();

Done!  Those are the basics of loading XML files with TinyXML.  In Using TinyXML, Part 2, I’ll be using a slightly more realistic example.  You’ll see this all in more detail there.  Now for…

Saving

Loading wasn’t so bad, right?   Either way, saving a TinyXML document is even easier than loading.  That’s because we don’t have to be so generic about it.  We know exactly what data we have.  It’s just a matter of squeezing it through TinyXML one piece at a time.

Create the document node and add our root named “root”:

TiXmlDocument doc;
TiXmlElement* root = new TiXmlElement("root");
doc.LinkEndChild(root);

TiXmlNodes have several methods for inserting children.  LinkEndChild() is the simplest and the only one strictly necessary here.

Next, create a new Element1.  Once we hand off the pointer to TinyXML with LinkEndChild(), the document owns the node, so don’t delete it yourself.  The call to TiXmlDocument::Clear() at the end will clean it up.

TiXmlElement* element1 = new TiXmlElement("Element1");
root->LinkEndChild(element1);
element1->SetAttribute("attribute1", "some value");

SetAttribute() is pretty simple.  There’s a string for the attribute name and a string for the data.  The rest of the work is getting your attribute data into a string format in the first place.  TinyXML does have a couple of methods for simplifying that for ‘int’ and ‘double’ (and compatible) types.  SetAttribute() is overloaded to work with integers and SetDoubleAttribute() can be used for doubles or floats.
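
For types that have no dedicated setter, one common approach is to push the value through a string stream first.  The helper below is a minimal sketch; toAttrString() is a made-up name, not a TinyXML function.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats any value that supports operator<< as a
// std::string, ready to be passed on as attribute data.
template<typename T>
std::string toAttrString(const T& value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}
```

You could then write something like element1->SetAttribute("attribute1", toAttrString(someValue).c_str()); where someValue stands for whatever streamable data you have.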

The rest of the saving function should be easy enough to understand.  We have to make Element3 a child of Element2 and we also need to make a text node as a child of Element2.

TiXmlElement* element2 = new TiXmlElement("Element2");
root->LinkEndChild(element2);
element2->SetAttribute("attribute2", "2");
// Using overloaded version
element2->SetAttribute("attribute3", 3);
TiXmlElement* element3 = new TiXmlElement("Element3");
element2->LinkEndChild(element3);
element3->SetAttribute("attribute4", "4");
TiXmlText* text = new TiXmlText("Some text.");
element2->LinkEndChild(text);

That’s nearly all for saving.  Now we’ll write out the file, clean up, and return.

bool success = doc.SaveFile("test_save.xml");
doc.Clear();
if(success)
    return SUCCESS;
else
    return FAILURE;

The Code

Here is the full source code for loading and saving the example XML file.  Included is a Code::Blocks project.

quick_tinyxml.zip

Pitfalls

Consider where particular elements can be nested.  If you don’t, you might end up writing code that you need to refactor right away.  If an element can be found within more than a single other element, then it is worth writing a separate function for loading it and calling that from where the other elements are loaded.  We’ll see this in Using TinyXML, Part 2.

Don’t use spaces in element or attribute names.  XML is to some extent whitespace-delimited, so this would cause all sorts of errors.  TinyXML will not be pleased.

Be careful when using special characters.  TinyXML will convert them for you, if it can, but you should still be aware.  The ampersand (&) is special in XML and so are the characters that delimit elements and attributes: less than (<), greater than (>), single quote/apostrophe ('), and double quote (").  You might be able to get away with some of them, but there are built-in codes that TinyXML recognizes:

&lt;     <
&gt;     >
&amp;    &
&apos;   '
&quot;   "
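
If you ever build XML text by hand instead of letting TinyXML write it, the table above turns into a small substitution loop.  This is a sketch with a hypothetical name, escapeXml(); don’t run it on strings you then hand to TinyXML for saving, or the ampersands would likely get encoded a second time.

```cpp
#include <string>

// Hypothetical helper: replaces the five special characters with their
// predefined XML entity codes and copies everything else unchanged.
std::string escapeXml(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for(std::string::size_type i = 0; i < in.size(); ++i)
    {
        switch(in[i])
        {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '\'': out += "&apos;"; break;
            case '"':  out += "&quot;"; break;
            default:   out += in[i];    break;
        }
    }
    return out;
}
```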

You might also avoid whitespace and special character concerns by using a CDATA section:

<![CDATA[
<<< " 'Cr&zy ch&r&cters" ' >>>
// Source code works well here, too
int main(int argc, char* argv[])
{
    return (argc < 2? 0 : 1);
}
]]>

TinyXML does support CDATA (unparsed character data) and loads it as a TiXmlText with TiXmlText::CDATA() == true.  You can easily save your own such sections with TiXmlText::SetCDATA().

Coming up

Using TinyXML, Part 2 will expand on most of the things mentioned here.  I’ll go through an example that more closely parallels what I actually use TinyXML for.  I’ll organize the code for the next tutorial the right way, instead of the simple way, and introduce some of my own code for simplifying conversions and such.  See ya there!

22 Responses to “Tutorial: Using TinyXML, Part 1”

  1.  Martoon says:

    I’ve always liked TinyXML for its simplicity, and I’ve used it in a number of projects.  The only complaint I have is that it won’t handle any DTD stuff (!DOCTYPE) in the header, and several XML editors put that in by default.  I don’t expect TinyXML to use any of the DTD stuff, but it’d be nice if it gracefully ignored it and continued parsing the rest of the document (it doesn’t).  With some XML editors, I’ve had to manually edit out that line every time after I save in the editor, or TinyXML can’t use the file.  What I’ve usually ended up doing is wrapping the TinyXML loading in a function that strips the DTD header before passing the fixed version to TinyXML.

    •  Jonny D says:

      Maybe your loading code assumed too much when finding the root element?  TinyXML doesn’t parse DTDs, but it is supposed to toss it into a TiXmlUnknown until it is saved again.  Those are safely skipped by my code here.  If it didn’t do that for you, then perhaps it’s a bug with the version you were using?

  2.  BBG says:

    Is there a way to load the file from a file dialog?

    I tried doing something like
    xmlFile.LoadF(OpenXML->FileName);

    OpenXML being the file dialog…but it doesn’t seem to be working

    •  Jonny D says:

      Is that the .NET FileDialog class?  I’ve never used it, but MSDN has some stuff on converting a System::String (like OpenXML->FileName) to std::string (http://msdn.microsoft.com/en-us/library/1b4az623(v=vs.80).aspx) and char* (http://msdn.microsoft.com/en-US/library/d1ae6tz5(v=vs.80).aspx).  I wouldn’t expect UNICODE support in the file name.

      •  BBG says:

        it’s actually a file dialog in C++ Builder. I ended up taking ur advice and converted (OpenXML->FileName) to a string. Here is the exact code in C++ for future reference

        XMLFile.LoadFile(AnsiString(OpenXML->FileName).c_str());

  3.  Ekin says:

    Hi Jonny, this short and concise tutorial is hands down the best I found on TinyXML. Eagerly waiting for the second part!!!

    In the meanwhile I have a question:

    I am trying to create an XML file structure for a specific application, and most of my data types can be handled by using text blocks, or by using a few attributes in a tag. However, I also have a huge array which I have to include within a tag. Let’s assume the original data looks like this:

    array[i]=(1,2,3,4,5,6,…,100,101,…,999,1000)

    1. What would be the best way to contain such data within an XML file?

    a) A huge comma separated string block to be parsed out of TinyXML (I would still try to read the block by TinyXML) ?
    1, 2, 3, 1000

    b) Again a huge block with zillions of attributes one after another?

    c) Probably a much better way?

    2. Assuming I have the first part covered using option b, how can you loop and read through sequential attributes and write the values to an array using TinyXML and C++? Is there something like a NextAttribute() method?

    I’d be very glad if you can give some pointer (no pun intended) when you have time.

    Thanks!

    •  Jonny D says:

      Thanks Ekin.  I noticed a conspicuous lack of TinyXML tutorials that make good sense, so hopefully this will fill the gap for some.

      For your first question, I would go for the comma-delimited text to be parsed separately.  You can easily use some sort of “explode” function (I’ll show mine in the next installment, though it’s no better than others you can find out there) or you can use the C strtok(), which does the same thing iteratively.

      For the second question, you have to use TiXmlElement::FirstAttribute()(gives you a const TiXmlAttribute*) and loop over TiXmlAttribute::Next().  This lets you go through each attribute without knowing its name.  I’ll put some of that in the next installment too, for completeness.
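
For reference, a comma-delimited block like the one discussed above could be split along these lines.  This is only an illustrative sketch, not the “explode” function promised for the next installment; explodeInts() is a hypothetical name.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits delimited text such as "1, 2, 3, 1000" into
// integers.  Pieces that do not start with a number are skipped.
std::vector<int> explodeInts(const std::string& text, char delimiter = ',')
{
    std::vector<int> values;
    std::istringstream stream(text);
    std::string piece;
    while(std::getline(stream, piece, delimiter))
    {
        const char* begin = piece.c_str();
        char* end = NULL;
        // strtol() skips leading whitespace, so " 2" is read as 2.
        long value = std::strtol(begin, &end, 10);
        if(end != begin)  // at least one digit was consumed
            values.push_back(static_cast<int>(value));
    }
    return values;
}
```

The text itself would come from a text node’s Value(), as in the Text Nodes section.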

      •  Ekin says:

        Thanks Jonny, I appreciate it! As I mentioned previously, I am looking forward to the next installment (especially I am curious about your ‘explode’ function!).

  4.  BBG says:

    it’s me again, i’ve been using ur tutorial to help me with my project for the last month now. it is very very helpful and a lot more clear than other tutorials out there, thanks! i just have a question:

    What is the difference between FirstChild() and FirstChildElement()?

    •  Jonny D says:

      You’re welcome!  FirstChild() returns a pointer to the first TiXmlNode, whereas FirstChildElement() returns the first TiXmlElement, skipping things like comments, text, and other stuff.

  5.  FvdL says:

    There is a small mistake in the code to get attribute4. Right now it says

    attr = elem->Attribute("attribute4");

    However, the TiXmlElement is named “e”, so this should be

    attr = e->Attribute("attribute4");

    •  Jonny D says:

      Yes, you’re right.  I’ve fixed it now.  Thanks!

  6.  Waleed Khan says:

    Hello Sir, Every time i try to run your Code there is an error that says

    Failed to open file

    Please help me out
    Thanks

    •  Jonny D says:

      Is that coming from the !doc.LoadFile test?  That would mean either the file is not found or the file does not have read permissions set for you (both cases return a NULL FILE pointer with fopen).  Most likely the XML file is not named correctly or it is not placed in a location where the program can find it (double-check your program’s working directory).
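
A quick way to tell those cases apart is to try the fopen() yourself and ask the system why it failed.  A minimal sketch, with whyCannotOpen() as a made-up helper name:

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: tries to open 'path' for reading and returns an empty
// string on success, or the system's description of the failure otherwise.
// Assumes fopen() sets errno on failure, which most platforms do.
std::string whyCannotOpen(const char* path)
{
    errno = 0;
    std::FILE* file = std::fopen(path, "rb");
    if(file == NULL)
        return std::strerror(errno);
    std::fclose(file);
    return std::string();
}
```

Printing whyCannotOpen("test.xml") just before the LoadFile() call shows whether the file is missing or unreadable from the program’s working directory.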

  7.  TheDude says:

    Hey Man,

    Great Part1, Where on earth is Part2!!! 😛

    •  Jonny D says:

      Part 2 got caught up behind my projects…  Then I tried out TinyXML2.  So when Part 2 does come, it will be preceded by a Part 1 for TinyXML2.

  8.  Frank says:

    hi Jonny,

    Thanks for your tutorial. I’m a programmer newbie and unfortunately
    I don’t know, how to search a node within a XML file.
    I have got a XML file 4 Levels deep.
    Can you help me please?

    Thanks in advance
    Frank

    •  Jonny D says:

      Searching for a node depends on the structure of your XML file (e.g. a saved game format).

      If you know the entire structure, then it’s best to iterate through each node so you can grab data from them all.  You can also punch through the known structure as the tinyxml.h header describes:
      TiXmlElement* child = docHandle.FirstChild( "Document" ).FirstChild( "Element" ).FirstChild( "Child" ).ToElement();

      If you don’t know it all or you just want to find a node regardless of its context, then you will have to go with the recursive (or stack) search anyway with TiXmlNode::FirstChild() and TiXmlNode::NextSibling(), checking for the value (node name) you want.  Usually, context matters when you specify a format in XML.
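
The recursive search might look something like the sketch below.  It is written as a template so that it only assumes the three TiXmlNode methods it calls (Value(), FirstChild(), NextSibling()); findNode() is a hypothetical name, not part of TinyXML.

```cpp
#include <cstddef>
#include <string>

// Depth-first search for the first node named 'name', starting with 'node'
// itself.  Node is any type whose Value(), FirstChild() and NextSibling()
// are shaped like TiXmlNode's.  Returns NULL when nothing matches.
template<typename Node>
const Node* findNode(const Node* node, const std::string& name)
{
    if(node == NULL)
        return NULL;
    if(name == node->Value())
        return node;

    // Search each child's subtree before moving on to the next sibling.
    for(const Node* child = node->FirstChild(); child != NULL; child = child->NextSibling())
    {
        const Node* found = findNode(child, name);
        if(found != NULL)
            return found;
    }
    return NULL;
}
```

With TinyXML you would call it as findNode<TiXmlNode>(root, "Child") and then use ToElement() on the result.  Text nodes report their text through Value() as well, so check ToElement() if a text node could share the name you are looking for.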

  9.  Golnaz says:

    Hi

    I want to print a number that I save in an integer variable between double quote in tinyXML , How Can I do that?

    String ^orig1 = gcnew String(numericupdown1->Value);
    pin_ptr<const wchar_t> wch1 = PtrToStringChars(orig1);
    size_t origsize1 = wcslen(wch1) + 1;
    const size_t newsize1 = 100;
    size_t convertedChars1 = 0;
    char nstring1[newsize1];
    wcstombs_s(&convertedChars1, nstring1, origsize1, wch1, _TRUNCATE);

    msg = new TiXmlElement( "BoardSize_Width" );
    msg->LinkEndChild( new TiXmlText( nstring1 ));
    msgs->LinkEndChild( msg );

    here nstring1 is a char that show a pure number; I want to have this :

    “5”

    thank u so much!

    •  Jonny D says:

      It looks like you want to use a TiXmlAttribute for that.  If you have an element called “BoardSize” (maybe that’s your msgs variable), do this:

      EDIT: This ain’t right!  See new comment below…

      msg = new TiXmlAttribute("width", NULL);
      msg->SetIntValue(numericupdown1->Value);
      boardSizeElem->LinkEndChild(msg);
      

      btw is that C#?  Just curious: Are you using .NET or Unity3D?

  10.  Golnaz says:

    Hi

    this is in a web application c++ and I am using .Net;

    it does not work in my project . coz msg is an element not an attribute; and for this code I face with the error coming after that :
    TiXmlElement* msg;

    msg = new TiXmlElement( "BoardSize_Width" );

    TiXmlElement * msgs = new TiXmlElement( "Settings" );
    root->LinkEndChild( msgs );

    TiXmlAttribute* att = new TiXmlAttribute("width", NULL);
    att->SetIntValue(int::Parse(textBox1->Text));
    msg->LinkEndChild(att);
    msgs->LinkEndChild(msg);

    Error:
    error C2664: ‘TiXmlNode::LinkEndChild’ : cannot convert parameter 1 from ‘TiXmlAttribute *’ to ‘TiXmlNode *’

    Can you help me?

    •  Jonny D says:

      Hey!  I fixed the site so new comment notifications come through…  but this is awfully late.

      I’m not sure what I was doing up there in that last comment.  Look into the tinyxml header and use TiXmlElement::SetAttribute() and SetDoubleAttribute().  You can check the value later with QueryIntAttribute() and similar.
