Basic XML

January 5, 2008

(B)What is XML?
XML (Extensible Markup Language) is all about describing data. Below is an XML document which
describes invoice data.
<?xml version="1.0" encoding="ISO-8859-1"?>
<invoice>
<productname>Shoes</productname>
<qty>12</qty>
<totalcost>100</totalcost>
<discount>10</discount>
</invoice>
XML tags are not predefined; you define them according to your needs. For instance, in the
invoice example above all tags are defined according to business needs. The XML document is
self-explanatory: anyone looking at the XML data can easily understand what it means.
(I)What is the version information in XML?
The “version” attribute in the XML declaration at the top of the document shows which version
of the XML specification the document follows.
(B)What is ROOT element in XML?
In the XML sample given previously, <invoice></invoice> is the root element. The root
element is the topmost element of an XML document; there is exactly one, and it encloses all
the other elements.
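(B)If XML does not have closing tag will it work?
No. Every element opened in XML must be closed, either with a matching closing tag or, for an
empty element, with the self-closing form (for example <discount/>). For instance, if I remove
the </discount> tag from the sample above, the document is no longer well formed and XML
parsers will reject it.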
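(B)Is XML case sensitive?
Yes, XML is case sensitive. Element and attribute names must match exactly, so <Invoice> and
<invoice> are two different elements.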
(B)What is the difference between XML and HTML?
XML describes data while HTML describes how the data should be displayed. So HTML
is about displaying information while XML is about describing information.
(B)Is XML meant to replace HTML?
No, they go together: one is for describing data while the other is for displaying it.
(A)Can you explain why your project needed XML?
Note: This is an interview question where the interviewer wants to know why you have
chosen XML.
Remember that XML was meant for exchanging data between two entities, since you can easily
define your own readable tags. For instance, you have two applications which want to exchange
information, but because they are built on completely different technologies it is difficult
to connect them directly. Say one application is written in Java and the other in .NET. Both
platforms can read and write XML, so one application emits an XML file which the other
application consumes and parses.
You can describe a scenario of two applications which work separately and explain how you
chose XML as the data transport medium.
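(B)What is DTD (Document Type definition)?
A DTD defines how your XML should be structured: which elements and attributes are allowed,
in what order, and which of them are mandatory. For instance, in the above XML we may want to
make it compulsory to provide “qty” and “totalcost”. You can define this rule in a DTD and
reference that DTD from within the XML document. Note that a DTD cannot restrict these two
elements to numeric values, because element content in a DTD is plain text; a data type rule
like that needs an XML Schema (XSD).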
(B)What is well formed XML?
If an XML document conforms to the XML syntax rules (every opened tag is closed, elements are
properly nested, there is a single root element, etc.) then it is well-formed XML.
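A quick way to see the well-formedness rules in action is to hand a document to a parser and watch whether it is rejected. The following is a minimal C# sketch, assuming the standard "XmlDocument" class; the "WellFormedCheck" class and "IsWellFormed" helper are names made up for this illustration.

```csharp
using System;
using System.Xml;

static class WellFormedCheck
{
    // Hypothetical helper: returns true when the text parses as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            var doc = new XmlDocument();
            doc.LoadXml(xml); // throws XmlException on any well-formedness error
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        string good = "<invoice><qty>12</qty><discount>10</discount></invoice>";
        string bad = "<invoice><qty>12</qty><discount>10</invoice>"; // </discount> is missing

        Console.WriteLine(IsWellFormed(good)); // True
        Console.WriteLine(IsWellFormed(bad));  // False
    }
}
```

Note that this only checks syntax. A document can pass this check and still break the business rules, which is where validation comes in.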
(B)What is a valid XML?
If an XML document is well formed and also conforms to the rules of its DTD (or schema), then
it is valid XML.
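To make the difference between well formed and valid concrete, here is a small C# sketch that validates the invoice against an inline DTD. It assumes the "XmlReader.Create" factory with "XmlReaderSettings" rather than the older "XmlValidatingReader" class; the DTD text and the "DtdValidation" class are invented for this example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class DtdValidation
{
    // Illustrative internal DTD: productname, qty and totalcost are mandatory
    // and must appear in this order; discount is optional.
    const string Dtd =
        "<!DOCTYPE invoice [" +
        "<!ELEMENT invoice (productname, qty, totalcost, discount?)>" +
        "<!ELEMENT productname (#PCDATA)>" +
        "<!ELEMENT qty (#PCDATA)>" +
        "<!ELEMENT totalcost (#PCDATA)>" +
        "<!ELEMENT discount (#PCDATA)>" +
        "]>";

    // Hypothetical helper: true when the invoice body satisfies the DTD above.
    public static bool IsValid(string body)
    {
        bool valid = true;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,  // DTDs are prohibited by default
            ValidationType = ValidationType.DTD
        };
        // Any validation problem flips the flag instead of throwing.
        settings.ValidationEventHandler += (sender, e) => valid = false;

        using (var reader = XmlReader.Create(new StringReader(Dtd + body), settings))
        {
            while (reader.Read()) { } // validation happens while reading
        }
        return valid;
    }

    static void Main()
    {
        string complete = "<invoice><productname>Shoes</productname><qty>12</qty><totalcost>100</totalcost></invoice>";
        string missingQty = "<invoice><productname>Shoes</productname><totalcost>100</totalcost></invoice>";

        Console.WriteLine(IsValid(complete));   // True
        Console.WriteLine(IsValid(missingQty)); // False: well formed, but not valid
    }
}
```

Both documents are well formed; only the first one is valid, because the second leaves out the mandatory "qty" element.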
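(B)What is CDATA section in XML?
Normally all text in an XML document is parsed, so characters such as “<” and “&” are treated
as markup. If you want the parser to treat a block of text as literal character data, put it
inside a CDATA section, which starts with <![CDATA[ and ends with ]]>.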
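As a rough illustration, the C# sketch below loads an element whose text contains "<" and "&" inside a CDATA section and reads it back unchanged. The "note" element and the "CDataDemo" class are assumptions made for this example only.

```csharp
using System;
using System.Xml;

static class CDataDemo
{
    // Hypothetical helper: returns the text content of the root element.
    public static string ReadRootText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    static void Main()
    {
        // Without CDATA the '<' and '&' characters below would be parse errors.
        string xml = "<note><![CDATA[if (qty < 10 && discount > 5) <apply/>]]></note>";

        Console.WriteLine(ReadRootText(xml));
        // prints: if (qty < 10 && discount > 5) <apply/>
    }
}
```

The parser hands the CDATA content to the application as plain text; "<apply/>" is never treated as an element.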
(B)What is CSS?
CSS (Cascading Style Sheets) can be used to format an XML document for display.
(B)What is XSL?
XSL (the eXtensible Stylesheet Language) is used to transform an XML document into some
other document. So it is a transformation document which can convert XML to another
format. For instance, you can apply XSL to XML and convert it to an HTML document or
to CSV files.
(B)What is element and attributes in XML?
In the below example invoice is the element and invnumber is the attribute. Attribute values
must always be enclosed in quotes.
<invoice invnumber="1002"></invoice>
(B)Which are the namespaces in .NET used for XML?
“System.Xml.dll” is the actual physical file which contains the XML implementation. Below are
the commonly used namespaces:
√ System.Xml
√ System.Xml.Schema
√ System.Xml.XPath
√ System.Xml.Xsl
(A)What are the standard ways of parsing XML document?
Twist: What is an XML parser?
An XML parser sits between the XML document and the application which wants to use the
XML document. The parser exposes a set of well-defined interfaces which the application can
use for adding, modifying and deleting the XML document contents. The interfaces a parser
exposes should be standard, or else different vendors would each create their own custom way
of interacting with XML documents.
There are two standard specifications which are very common and should be followed by
an XML parser:
DOM: Document Object Model.
DOM is the W3C-recommended way of treating XML documents. With DOM the entire XML document
is loaded into memory, which allows us to manipulate the structure and data of the
document.
SAX: Simple API for XML.
SAX is an event-driven way of processing XML documents. In DOM we load the whole
XML document into memory and then the application manipulates it. But
this is not always the best way to process large XML documents which have huge data
elements. For instance, if you only want one element from the whole XML document, or you
only want to check that the XML is well formed, loading the whole document into memory
is quite resource intensive. SAX parsers parse the XML document sequentially and
emit events such as the start and end of the document, the start and end of elements, text
content, etc. Applications which are interested in these events register implementations of
callback interfaces, and the SAX parser invokes those callbacks as it reads, so the
application acts only on the events it cares about.
Figure 13.1: DOM Parser loading XML document
Above is a pictorial representation of how a DOM parser works. The application queries the
DOM parser for the “quantity” field. The DOM parser loads the complete XML file into memory.
Figure 13.2: Returning the Quantity value back to application
The DOM parser then picks up the “quantity” tag from the in-memory XML file and
returns it to the application.
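The DOM flow described above can be sketched in C# with "XmlDocument", the DOM implementation in System.Xml. This is only an illustration: it uses the "qty" element from the earlier invoice sample in place of the "quantity" field in the figures, and the "DomDemo" class is a made-up name.

```csharp
using System;
using System.Xml;

static class DomDemo
{
    // Hypothetical helper: loads the whole document, then looks up one value.
    public static string GetQuantity(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // the complete tree is now held in memory

        XmlNode qty = doc.SelectSingleNode("/invoice/qty");
        return qty == null ? null : qty.InnerText;
    }

    static void Main()
    {
        string xml =
            "<invoice><productname>Shoes</productname><qty>12</qty>" +
            "<totalcost>100</totalcost><discount>10</discount></invoice>";

        Console.WriteLine(GetQuantity(xml)); // 12
    }
}
```

Even though only one value is returned, every element of the invoice was parsed and stored first. That is the cost the SAX approach tries to avoid.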
Figure 13.3: SAX parser in action
A SAX parser does not load the whole document into memory but takes an event-based approach.
While parsing the XML file the SAX parser emits events. For example, in the above figure it
has emitted the Invoice start tag event, the Amount tag event, the Quantity tag event and the
Invoice end tag event. But our application software is only interested in the quantity value.
So the application registers with the SAX parser for the quantity field and not for any
other field or element of the XML document. Depending on what the application is interested
in, only those events reach the application logic and the rest of the events are suppressed.
For instance, in the above figure only the quantity tag event is sent to the application
software.
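The .NET Framework class library does not ship a SAX parser, so the C# sketch below is not real SAX. It imitates the callback style on top of the forward-only "XmlReader": a small loop pushes start, text and end events to a handler, and the handler keeps only the quantity. The "IElementHandler" interface, "QuantityHandler" and "SaxStyleDemo" are hypothetical names, and "qty" stands in for the "Quantity" tag in the figure.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical callback interface, loosely modelled on a SAX content handler.
interface IElementHandler
{
    void StartElement(string name);
    void Characters(string text);
    void EndElement(string name);
}

// Application-side handler that only cares about the <qty> element.
class QuantityHandler : IElementHandler
{
    private bool insideQty;
    public string Quantity { get; private set; }

    public void StartElement(string name) { insideQty = (name == "qty"); }
    public void Characters(string text) { if (insideQty) Quantity = text; }
    public void EndElement(string name) { if (name == "qty") insideQty = false; }
}

static class SaxStyleDemo
{
    // Reads the XML sequentially and pushes events to the handler.
    // Nothing is kept in memory apart from the current node.
    public static void Parse(string xml, IElementHandler handler)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        handler.StartElement(reader.Name);
                        if (reader.IsEmptyElement) handler.EndElement(reader.Name);
                        break;
                    case XmlNodeType.Text:
                        handler.Characters(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        handler.EndElement(reader.Name);
                        break;
                }
            }
        }
    }

    static void Main()
    {
        var handler = new QuantityHandler();
        Parse("<invoice><totalcost>100</totalcost><qty>12</qty></invoice>", handler);
        Console.WriteLine(handler.Quantity); // 12
    }
}
```

The events for "invoice" and "totalcost" still occur, but the handler ignores them, which is the effect the figure describes.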
(A)In what scenarios will you use a DOM parser and SAX parser?
√ If you do not need all the data from the XML file then the SAX approach is preferable
to DOM, as DOM can be quite memory intensive. In short, if you need a
large portion of the XML document it is better to use DOM.
√ With a SAX parser you have to write more code than with DOM.
√ If you want to write the XML to a file, DOM is the efficient way to do it.
√ Sometimes you only need to validate the XML structure and do not want to retrieve
any data; in those instances SAX is the right approach.
(A) How was XML handled during COM times?
In the COM days it was done using MSXML 4.0. So older languages like VB6 and VC++ used
MSXML 4.0, which was shipped with SP1 (Service Pack 1).
Note: This book does not show any samples for MSXML 4.0. If you are interested, please
refer to MSDN and try to compile some sample programs.
(A)What is the main difference between MSXML and .NET Framework XML classes?
MSXML supports XMLDOM and SAX parsers, while the .NET Framework XML classes
support XML DOM and XML readers and writers.
MSXML supports asynchronous loading and validation while parsing. For instance, you
can send synchronous and asynchronous calls to a remote URL. The .NET Framework XML
classes have no direct support for synchronous and asynchronous calls, but the
same can be achieved by using the “System.Net” namespaces.
(B) What are the core functionalities in XML .NET framework? Can you explain in detail those functionalities?
The XML API for the .NET Framework comprises the following set of functionalities:
XML readers
With XML readers the client application gets a reference to an instance of a reader class.
The reader class allows you to move forward through the contents, from node to node or
element to element. You can compare it with the “SqlDataReader” object in ADO.NET,
which is forward-only. In short, an XML reader allows you to browse through the XML
document.
XML writers
Using XML writers you can store the XML contents on any storage medium. For
instance, you may want to store the whole in-memory XML in a physical file or on any other
medium.
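As an illustration of the writer side, the following C# sketch builds a small invoice with "XmlWriter" and writes it to a string. The element names are borrowed from the earlier samples and the "WriterDemo" class is hypothetical. To write to a physical file instead, a file path can be passed to "XmlWriter.Create" in place of the "StringWriter".

```csharp
using System;
using System.IO;
using System.Xml;

static class WriterDemo
{
    // Hypothetical helper: writes a small invoice and returns it as text.
    public static string BuildInvoice()
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var output = new StringWriter();

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("invoice");
            writer.WriteAttributeString("invnumber", "1002");
            writer.WriteElementString("productname", "Shoes");
            writer.WriteElementString("qty", "12");
            writer.WriteEndElement(); // closes <invoice>
        } // disposing the writer flushes everything to 'output'

        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BuildInvoice());
    }
}
```

The writer takes care of closing tags, quoting attribute values and escaping special characters, so the output is always well formed.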
XML document classes
The XML document classes provide an in-memory representation of the data in an XML DOM
structure as defined by W3C. They also support browsing and editing of the document. So
they give you a complete in-memory tree representation of your XML document.
(B)What is XSLT?
XSLT is a rule-based language used to transform XML documents into other file formats.
XSLT stylesheets are nothing but generic transformation rules which can be applied to
transform an XML document to HTML, CSV, rich text, etc.
Figure 13.4: XSLT Processor in Action
You can see in the above figure how the XSLT processor takes the XML file and applies
the XSLT transformation to produce a different document.
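A minimal C# sketch of that flow, assuming the "XslCompiledTransform" class from System.Xml.Xsl: the stylesheet below is invented for this example and turns the invoice into a single CSV line. The "XsltDemo" class and "ToCsv" helper are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class XsltDemo
{
    // Illustrative stylesheet: emits "productname,qty,totalcost" as plain text.
    const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/invoice'>" +
        "<xsl:value-of select='productname'/>,<xsl:value-of select='qty'/>,<xsl:value-of select='totalcost'/>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet to the given invoice XML.
    public static string ToCsv(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader); // compile the transformation rules
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        {
            xslt.Transform(input, new XsltArgumentList(), output);
        }
        return output.ToString();
    }

    static void Main()
    {
        string xml = "<invoice><productname>Shoes</productname><qty>12</qty><totalcost>100</totalcost></invoice>";
        Console.WriteLine(ToCsv(xml)); // Shoes,12,100
    }
}
```

Swapping in a different stylesheet would produce HTML or another format from the same XML input, without touching the C# code.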
(I)Define XPATH?
XPath is an XML query language for selecting specific parts of an XML document. Using XPath
you can address or filter elements and text in an XML document. For instance, a simple
XPath expression like “Invoice/Amount” means: find the “Amount” nodes which are children
of an “Invoice” node.
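To try such an expression from code, one option is the "SelectNodes" method of "XmlDocument", as in the C# sketch below. The sample document and the "XPathDemo" class are assumptions made for this illustration. Because the expression is evaluated from the document node, "Invoice" here has to be the root element.

```csharp
using System;
using System.Xml;

static class XPathDemo
{
    // Hypothetical helper: returns the text of every node matched by the expression.
    public static string[] SelectValues(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNodeList nodes = doc.SelectNodes(xpath);
        var values = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            values[i] = nodes[i].InnerText;
        }
        return values;
    }

    static void Main()
    {
        string xml = "<Invoice><Amount>100</Amount><Amount>250</Amount><Tax>8</Tax></Invoice>";

        foreach (string amount in SelectValues(xml, "Invoice/Amount"))
        {
            Console.WriteLine(amount); // 100, then 250
        }
    }
}
```

The "Tax" element is skipped because it does not match the path.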
(A)What is the concept of XPOINTER?
XPointer is used to locate data within an XML document. An XPointer can point to a
particular portion of an XML document, for instance
address.xml#xpointer(/descendant::streetnumber[@id=9])
So the above XPointer points to the streetnumber element whose id attribute is 9 in
“address.xml”.
(B)What is an XMLReader Class?
It is an abstract class in the System.Xml namespace. An XML reader works on a
read-only stream, browsing from one node to the next in a forward direction. It maintains
only a pointer to the current node and has no knowledge of the previous or next node. You
cannot modify the XML document; you can only move forward.
(B)What is XMLTextReader?
The “XmlTextReader” class provides fast access to streams of XML data in a
forward-only and read-only manner. It also checks that the XML is well formed. But
“XmlTextReader” does not validate against a schema or DTD; for that you will need the
“XmlValidatingReader” class.
An instance of “XmlTextReader” can be created in a number of ways. For example, if you
want to load a file from disk you can use the snippet below.
XmlTextReader reader = new XmlTextReader(fileName);
To loop through all the nodes you need to call the “Read()” method of the “XmlTextReader”
object. “Read()” returns “true” if the next node was read successfully and “false” when
there are no more nodes in the XML document.
//Open the stream
XmlTextReader reader = new XmlTextReader(file);
while (reader.Read())
{
// your logic goes here
string pdata = reader.Value;
}
// Close the stream
reader.Close();
To read the content of the node on which the reader is currently positioned, use the “Value”
property. As shown in the above code, “pdata” gets the value from the XML using
“reader.Value”.
(I)How do we access attributes using “XmlReader”?
The snippet below shows how to access attributes. To check whether the current node has
any attributes, use the “HasAttributes” property, and use the “MoveToNextAttribute” method
to move forward through the attributes. When you are done, call “MoveToElement()” to move
the reader back to the element that owns the attributes.
if (reader.HasAttributes)
{
while(reader.MoveToNextAttribute())
{
// your logic goes here
string pdata = reader.Value;
}
}
reader.MoveToElement();
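For a version that can be run end to end, the C# sketch below wraps the same attribute loop in a small program. It assumes the "XmlReader.Create" factory and an invoice element with two attributes; the "AttributeDemo" class, the "ReadAttributes" helper and the "currency" attribute are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class AttributeDemo
{
    // Hypothetical helper: collects the attributes of the named element.
    public static Dictionary<string, string> ReadAttributes(string xml, string elementName)
    {
        var result = new Dictionary<string, string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                bool isTarget = reader.NodeType == XmlNodeType.Element
                                && reader.Name == elementName;
                if (isTarget && reader.HasAttributes)
                {
                    while (reader.MoveToNextAttribute())
                    {
                        result[reader.Name] = reader.Value; // attribute name and value
                    }
                    reader.MoveToElement(); // back to the element that owns the attributes
                }
            }
        }
        return result;
    }

    static void Main()
    {
        string xml = "<invoice invnumber=\"1002\" currency=\"USD\"><qty>12</qty></invoice>";

        foreach (var pair in ReadAttributes(xml, "invoice"))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```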
(I) Explain simple Walk through of XmlReader ?
In this section we will do a simple walkthrough of how to use the “XmlReader” class.
The sample is available in both languages (C# and VB.NET), in the
“WindowsApplicationXMLVBNET” and “WindowsApplicationCSharp” folders. The task
is to load the “TestingXML.XML” file and display its data in a message box. You can find
the “TestingXML.XML” file in the “BIN” directory of both folders. Below is the
“TestingXML.XML” file and its content.
Figure 13.5: Testing.XML Data
Both projects have a command button “CmdLoadXML” which contains the logic to load the
XML file and display the data in a message box. I have pasted only the “CmdLoadXML”
command button logic for simplicity. The basic steps are:
√ Declare the “XmlTextReader” object and give it the XML filename to load the
XML data.
√ Read from the “XmlTextReader” object while it has data and concatenate the data in a
temporary string.
√ Finally, display the result in a message box.
Figure 13.6: VB.NET code for XMLReader
Same holds true for C# code as shown below.
Figure 13.7: C# code for XMLReader
Figure 13.8: Data Display for “TestingXML.XML”
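The three walkthrough steps can be sketched in C# roughly as follows. This is not the code from the sample projects: the content written to “TestingXML.XML” is made up, the file is created in the temp folder so the example is self-contained, only text nodes are concatenated, and the result goes to the console rather than a message box. “WalkthroughDemo” and “LoadXmlText” are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class WalkthroughDemo
{
    // Hypothetical helper: reads the file node by node and concatenates the text values.
    public static string LoadXmlText(string fileName)
    {
        var buffer = new StringBuilder();

        // Step 1: declare the reader and give it the XML file name.
        XmlTextReader reader = new XmlTextReader(fileName);
        try
        {
            // Step 2: read while there is data and collect it in a temporary string.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                {
                    buffer.AppendLine(reader.Value);
                }
            }
        }
        finally
        {
            reader.Close(); // always release the file
        }
        return buffer.ToString();
    }

    static void Main()
    {
        // Assumed sample content; the real TestingXML.XML may look different.
        string path = Path.Combine(Path.GetTempPath(), "TestingXML.XML");
        File.WriteAllText(path, "<invoice><productname>Shoes</productname><qty>12</qty></invoice>");

        // Step 3: display the result (a Windows Forms app would call MessageBox.Show here).
        Console.WriteLine(LoadXmlText(path));
    }
}
```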
(A) What does XmlValidatingReader class do?
The XmlTextReader class does not validate the contents of an XML source against a schema.
The correctness of an XML document can be measured by two things: is the document well
formed, and is it valid? Well-formed means that the overall syntax is correct. Validation
goes deeper: it checks whether the XML document is correct with respect to the defined schema.
So XmlTextReader only checks that the syntax is correct and does not do validation.
That is where the XmlValidatingReader class comes into the picture. This comes at a
price: because XmlValidatingReader has to check the document against DTDs and schemas, it is
slower than XmlTextReader.
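As a sketch of schema validation in code: "XmlValidatingReader" has been marked obsolete since .NET Framework 2.0, so the C# example below uses the replacement approach, "XmlReader.Create" with "XmlReaderSettings". The XSD text, the "SchemaValidation" class and the "IsValid" helper are invented for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class SchemaValidation
{
    // Illustrative schema: qty must be an integer and totalcost a decimal.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='invoice'>" +
        "<xs:complexType><xs:sequence>" +
        "<xs:element name='productname' type='xs:string'/>" +
        "<xs:element name='qty' type='xs:integer'/>" +
        "<xs:element name='totalcost' type='xs:decimal'/>" +
        "</xs:sequence></xs:complexType>" +
        "</xs:element>" +
        "</xs:schema>";

    // Hypothetical helper: true when the document satisfies the schema above.
    public static bool IsValid(string xml)
    {
        bool valid = true;
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        // Validation errors are reported through this event instead of exceptions.
        settings.ValidationEventHandler += (sender, e) => valid = false;

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // the document is validated as it is read
        }
        return valid;
    }

    static void Main()
    {
        string good = "<invoice><productname>Shoes</productname><qty>12</qty><totalcost>100</totalcost></invoice>";
        string bad = "<invoice><productname>Shoes</productname><qty>twelve</qty><totalcost>100</totalcost></invoice>";

        Console.WriteLine(IsValid(good)); // True
        Console.WriteLine(IsValid(bad));  // False: qty is not numeric
    }
}
```

Both documents are well formed, so a plain "XmlTextReader" would accept either one. Only the validating reader notices that "twelve" is not an integer.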