Confessions of a .NET Developer!

Read Huge XML files faster

Most developers use the XmlDocument class to read XML files. LINQ to XML gave us the XDocument class, which is nicer to work with than XmlDocument, especially when inserting and deleting nodes. But both classes handle huge XML files poorly, for example files of 500 MB or 2-3 GB.

The problem is that both classes load the whole document into memory, and the in-memory tree is often larger than the file on disk. If you load a huge file this way, you may get a System.OutOfMemoryException.

The solution is to use the XmlTextReader and XmlTextWriter classes, which are stream-based rather than in-memory.
XmlTextReader reads the XML node by node instead of loading the whole document, and XmlTextWriter writes it out node by node in the same way. It's very similar to using StreamReader and StreamWriter!

Let's take an example of creating an XML file from scratch.

        ' Requires at the top of the file: Imports System.IO and Imports System.Xml
        Dim filePath As String = "C:\Users\Tarun\Desktop\Prac.xml"
        Dim fs As FileStream = New FileStream(filePath, FileMode.Create, FileAccess.Write)
        Dim xtw As XmlTextWriter = New XmlTextWriter(fs, System.Text.Encoding.UTF8)

        xtw.Formatting = Formatting.Indented  ' Used so that it can be easily read while opening in Notepad
        xtw.Indentation = 4

        xtw.WriteStartDocument(True)
        xtw.WriteStartElement("Products")
        xtw.WriteComment("This is a comment")

        xtw.WriteStartElement("Product")
        xtw.WriteAttributeString("ID", "1")
        xtw.WriteAttributeString("Name", "Sony")

        xtw.WriteStartElement("Price")
        xtw.WriteString("40.33$")
        xtw.WriteEndElement()  'Closes the Price element
        xtw.WriteEndElement()  'Closes the Product element

        xtw.WriteStartElement("Product")
        xtw.WriteAttributeString("ID", "2")
        xtw.WriteAttributeString("Name", "Dell")

        xtw.WriteStartElement("Price")
        xtw.WriteString("80$")
        xtw.WriteEndElement()  'Closes the Price element
        xtw.WriteEndElement()  'Closes the Product element

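        xtw.WriteEndElement()  'Closes the Products element
        xtw.WriteEndDocument()  'Closes any elements still open and ends the document
        xtw.Close()  'Flushes and closes the XmlTextWriter (and the underlying stream)
        fs.Close()  'Harmless if the stream is already closed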

The methods used are self-explanatory. But writing an XML file was not our purpose. Now let's read this XML.

        Using fs As FileStream = New FileStream(filePath, FileMode.Open, FileAccess.Read)

            Dim xtr As XmlTextReader = New XmlTextReader(fs)
            Dim writer As StringWriter = New StringWriter()
            'Read each and every node encountered
            While (xtr.Read())
                writer.Write("Type:")
                writer.Write(xtr.NodeType.ToString())

                If (xtr.Name <> "") Then
                    writer.Write("Name:")
                    writer.Write(xtr.Name)
                End If

                If (xtr.Value <> "") Then
                    writer.Write("Value:")
                    writer.Write(xtr.Value)
                End If

                If (xtr.AttributeCount > 0) Then
                    writer.Write("Attributes")
                    For i As Integer = 0 To xtr.AttributeCount - 1
                        writer.Write(" ")
                        writer.Write(xtr.GetAttribute(i))
                        writer.Write(" ")
                    Next
                End If

            End While

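            Dim s As String = writer.ToString()
            'Write the result to the txt file. The Using block flushes and closes it.
            'For a really huge XML file, write to the StreamWriter inside the loop
            'instead of collecting everything in the StringWriter first.
            Using strm As StreamWriter = New StreamWriter("C:\HereisTheResult.txt", False)
                strm.Write(s)
            End Using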

        End Using
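
Dumping every node is fine for a demo, but usually you want specific values. Here is a minimal sketch, assuming the Prac.xml structure written above (Product elements with ID and Name attributes and one Price child), that prints one line per product while still streaming. It uses XmlReader.Create, which Microsoft has recommended over constructing XmlTextReader directly since .NET 2.0. The module name StreamProducts is just for illustration, and the path is the same sample path used earlier.

```vbnet
Imports System
Imports System.Xml

Module StreamProducts
    Sub Main()
        ' Same sample path as above; change it to point at your own file.
        Dim filePath As String = "C:\Users\Tarun\Desktop\Prac.xml"

        ' XmlReader is forward-only, so only the current node is held in memory.
        Using reader As XmlReader = XmlReader.Create(filePath)
            While reader.Read()
                ' React only to the start tag of each Product element.
                If reader.NodeType = XmlNodeType.Element AndAlso reader.Name = "Product" Then
                    Dim id As String = reader.GetAttribute("ID")
                    Dim name As String = reader.GetAttribute("Name")
                    Dim price As String = ""

                    ' Move to the Price child (if any) and read its text.
                    If reader.ReadToDescendant("Price") Then
                        price = reader.ReadElementContentAsString()
                    End If

                    Console.WriteLine("{0} {1} {2}", id, name, price)
                End If
            End While
        End Using
    End Sub
End Module
```

One thing to watch: ReadElementContentAsString leaves the reader on the node after the closing Price tag, and the loop's next Read() moves past that node. That is fine for this layout, but if two elements you care about sit directly next to each other with no whitespace between them, the second one can get skipped.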

April 8, 2011 - Posted by | XML
