XML Encoding

By Priya Pedamkar


Definition of XML Encoding

XML encoding is the process of converting Unicode characters into a binary (byte) representation. When an XML processor reads a document, it decodes the bytes according to the declared encoding, which is specified through the ‘encoding’ attribute of the XML declaration. Encoding matters because the correct encoding must be declared when XML documents are transferred between platforms. According to the XML 1.0 specification, every XML processor must support both UTF-8 and UTF-16. The XML parser decodes the document and translates it into Unicode internally.

Syntax of XML Encoding

Unicode is a universal character set that covers most of the world's languages, and it specifies the methods used to encode those characters. The encoding is declared in the first line of the XML document, in the XML declaration. The general syntax is given below:

<?xml version="1.0" encoding="encoding-name"?>

UTF-8 Syntax

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
The declaration itself consists purely of ASCII characters.

UTF-16 Syntax

If a document contains Unicode code units of the form (0XX…), it is treated as UTF-16, which uses 16-bit code units.

<?xml version = "1.0" encoding = "UTF-16" standalone = "no" ?>

Encoding names are not case-sensitive, as they follow the ISO and IANA standards.
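
A quick way to see this in practice is to parse the same document with the encoding name written in different cases. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name parse_with_declared_encoding and the sample element are illustrative assumptions, not part of any XML standard.

```python
import xml.etree.ElementTree as ET


def parse_with_declared_encoding(encoding_name: str) -> str:
    """Parse a tiny Latin-1 document whose declaration uses the given name."""
    declaration = f'<?xml version="1.0" encoding="{encoding_name}"?>'
    # 0xF3 is the single ISO-8859-1 byte for the character "ó".
    document = declaration.encode("ascii") + b"<Name>M\xf3pez</Name>"
    return ET.fromstring(document).text


# The same bytes decode to the same text regardless of the case of the name.
for name in ("ISO-8859-1", "iso-8859-1", "Iso-8859-1"):
    print(name, "->", parse_with_declared_encoding(name))
```

Each spelling should yield the same decoded text, since the parser matches the name without regard to case.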

For the Western European character set (Latin-1), which includes non-English characters, the declaration is as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>

XML also recognizes other encodings such as US-ASCII, ISO-8859-1 to ISO-8859-10, and the Windows code pages. Examples of XML declarations with valid encoding names are given below:

<?xml version='1.0' encoding='US-ASCII' standalone='yes'?>
<?xml version='1.0' encoding='ISO-10646-UCS-2'?>
<?xml version='1.0' encoding='ISO-8859-1'?>
<?xml version='1.0' encoding='Shift_JIS'?>

By default, when no encoding is specified in the header of the XML file and no byte order mark indicates otherwise, the XML parser assumes UTF-8.
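
The default can be observed with a short sketch in Python, using the standard xml.etree.ElementTree module. The sample document is an assumption made for illustration; it has no XML declaration and contains UTF-8 bytes.

```python
import xml.etree.ElementTree as ET

# "ó" encoded as UTF-8 is the two bytes C3 B3; there is no XML declaration.
document_without_declaration = b"<Name>M\xc3\xb3pez</Name>"

# With nothing declared, the parser decodes the bytes as UTF-8.
root = ET.fromstring(document_without_declaration)
decoded_name = root.text
print(decoded_name)
```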

How does Encoding Work in XML?

To avoid errors when working with XML, either specify the encoding explicitly or save the XML file as Unicode. A different character encoding must be declared when the document uses characters from languages that fall outside the scope of the standard encoding. When a document is delivered over a network protocol such as HTTP, the XML processor may ignore the encoding attribute in the XML declaration, because HTTP has its own headers for specifying the encoding. The encoding given in the XML declaration can therefore be overridden by the HTTP headers during data transfer. In either case, the stated encoding must match the actual encoding of the bytes, or the parser reports an error. Where it is available, the function XMLGetEncoding() returns the encoding declared in an XML document.

Format: XMLGetEncoding(generation, I/O entry)

  • generation is the task generation, 0 for the current task, 1 for the parent, and so on.
  • I/O entry is the number of the input/output file that contains the XML document.
  • It returns a text value, which is the value of the “encoding” attribute of the XML document.
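
The idea of reading the declared encoding, and letting an HTTP header take precedence over it, can be sketched in Python. This is not the XMLGetEncoding() function described above; the helper names declared_encoding, http_charset, and effective_encoding are hypothetical, and the regular expression only handles documents whose declaration is stored in an ASCII-compatible encoding.

```python
import re
from email.message import Message
from typing import Optional

# Matches the encoding value in an XML declaration at the start of the document.
_DECLARATION = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(document: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _DECLARATION.match(document)
    return match.group(1).decode("ascii") if match else None


def http_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()  # lower-cased, or None if missing


def effective_encoding(document: bytes, content_type: Optional[str] = None) -> str:
    """Prefer the HTTP charset, then the declaration, then the UTF-8 default."""
    if content_type:
        charset = http_charset(content_type)
        if charset:
            return charset
    return declared_encoding(document) or "UTF-8"
```

For example, effective_encoding(doc, "application/xml; charset=UTF-8") returns the HTTP charset even when the declaration inside doc names ISO-8859-1.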

Types of Encoding in XML with Example

The two main types of encoding used in XML are:

1. UTF-8

Certain detection rules apply to specific document types. One such rule covers XML and DTD files: if no character encoding is specified, UTF-8 is used. UTF-8 is also commonly used with Java, SQL, and XQuery because of its compact format. UTF-8 is a variable-length encoding: each character is encoded as one to four bytes, which carry 7, 11, 16, or 21 significant bits respectively. The byte order mark (BOM) for UTF-8 is EF BB BF. For languages such as Chinese, UTF-16 is often the better choice, because UTF-8 produces larger files for those scripts; UTF-8 is therefore not a universal solution.
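
The byte counts and the byte order mark can be checked with a short Python sketch. The sample characters and the variable names are chosen here for illustration only.

```python
import codecs

# One sample character for each UTF-8 length, from one to four bytes.
samples = ["A", "ó", "中", "😀"]
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(utf8_lengths)

# The UTF-8 byte order mark is the three bytes EF BB BF.
utf8_bom = codecs.BOM_UTF8
print(utf8_bom.hex(" "))

# Chinese text takes three bytes per character in UTF-8 but two in UTF-16.
chinese = "中文编码"
utf8_size = len(chinese.encode("utf-8"))
utf16_size = len(chinese.encode("utf-16-le"))  # "-le" variant writes no BOM
print(utf8_size, utf16_size)
```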

Example

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<?xml-stylesheet href="clock.css" type="text/css"?>
<Clocks timezone="GMT">
<timehour>11</timehour>
<timeminute>50</timeminute>
<timesecond>40</timesecond>
<timemeridian>p.m.</timemeridian>
</Clocks>

Output:

XML Encoding-1.1

2. UTF-16

UTF-16 takes two bytes for most characters, which can make files smaller for some scripts, but it is incompatible with ASCII. It is not a fixed-width encoding: a character uses either 2 or 4 bytes. UTF-16 is further divided into LE and BE variants (little-endian and big-endian), and the byte order is indicated by a byte order mark. It can cause problems in older programming languages such as C, whose string handling treats a zero byte as the end of a string. The significant bits are 16 for a two-byte character and 20 for a four-byte character. A parser reads a document as UTF-16 only when that encoding is indicated, by the declaration or by the byte order mark. For national data items (COBOL) parsed in XML documents, UTF-16 is the suggested encoding. UTF-16 is used mostly in Java and Windows.
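
These properties can be illustrated with a brief Python sketch. The sample characters and variable names are assumptions made for the example.

```python
import codecs

# The same character in the two byte orders.
little_endian = "A".encode("utf-16-le")  # low byte first
big_endian = "A".encode("utf-16-be")     # high byte first
print(little_endian, big_endian)

# The byte order mark tells a reader which order was used.
bom_le = codecs.BOM_UTF16_LE  # FF FE
bom_be = codecs.BOM_UTF16_BE  # FE FF

# Most characters take 2 bytes; characters outside the basic plane take 4.
two_byte_length = len("ó".encode("utf-16-le"))
four_byte_length = len("😀".encode("utf-16-le"))
print(two_byte_length, four_byte_length)

# ASCII text contains zero bytes in UTF-16, which C-style strings treat as terminators.
contains_zero_byte = b"\x00" in little_endian
```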

Example

<?xml version="1.0" encoding="UTF-16"?>
<college>
<Professor>
<fullname>Evangeline MAC</fullname>
<Dept>Science-1</Dept>
</Professor>
<!--
<Professor>
<fullname>Antony Jay</fullname>
<Dept>Mathematics</Dept>
</Professor>
-->
</college>

When the file is read, its bytes are decoded as UTF-16. Note that the file itself must be saved in UTF-16; changing only the text of the declaration is not enough.
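
The need for the saved bytes to agree with the declaration can be sketched in Python with the standard xml.etree.ElementTree module. The document below is a shortened version of the example above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<?xml version="1.0" encoding="UTF-16"?>'
    "<college><Professor><fullname>Evangeline MAC</fullname></Professor></college>"
)

# Python's "utf-16" codec writes a byte order mark followed by the text,
# so these bytes agree with the declaration.
saved_as_utf16 = xml_text.encode("utf-16")
root = ET.fromstring(saved_as_utf16)
full_name = root.find("Professor/fullname").text
print(full_name)

# The same text saved as UTF-8 still declares UTF-16, so the parser
# is expected to reject it.
saved_as_utf8 = xml_text.encode("utf-8")
try:
    ET.fromstring(saved_as_utf8)
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)
```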

Output:

XML Encoding-1.2

Let’s take another example

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<Name>Mópezr Pchödinger</Name>

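With this declaration, the special international characters are displayed correctly only if the file is actually saved as ISO-8859-1; otherwise they are changed to unexpected symbols.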
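
One way such symbols can arise is when the bytes on disk do not match the declared encoding. The Python sketch below reproduces the effect with plain string encoding and decoding; it assumes the file was saved as UTF-8 but read as ISO-8859-1, which may or may not be what happened in the screenshot.

```python
original = "Mópezr Pchödinger"

# Saved as UTF-8, each accented letter becomes two bytes.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns each byte into its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# Saved and read as ISO-8859-1, the text survives unchanged.
round_trip = original.encode("iso-8859-1").decode("iso-8859-1")
print(round_trip)
```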

Output:

Output-1.3

Now let’s see an example with ASCII encoding. Here is the code:

<?xml version="1.0" encoding="ASCII" standalone="yes"?>
<Name>Mópezr Pchödinger</Name>

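ASCII cannot represent the “ó” or “ö” symbols. In UTF-8, the first symbol, “ó”, is encoded as the two bytes C3 B3, and the second, “ö”, as C3 B6, so a parser that honors the ASCII declaration is expected to reject these bytes. ASCII is a subset of UTF-8: every ASCII character has the same single-byte encoding in UTF-8.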
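
The byte values involved can be checked directly in Python. The sketch below uses only built-in string methods; the variable names are illustrative, and the final step shows numeric character references as one possible way to keep such characters in an ASCII-encoded document.

```python
name = "Mópezr Pchödinger"

# UTF-8 byte sequences for the two accented letters.
o_acute_utf8 = "ó".encode("utf-8")
o_umlaut_utf8 = "ö".encode("utf-8")
print(o_acute_utf8.hex(" "), o_umlaut_utf8.hex(" "))

# ASCII has no code for either letter, so encoding fails.
try:
    name.encode("ascii")
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False

# Pure ASCII text produces identical bytes in ASCII and UTF-8.
same_bytes = "Clocks".encode("ascii") == "Clocks".encode("utf-8")

# Numeric character references keep the document within ASCII.
escaped = name.encode("ascii", errors="xmlcharrefreplace")
print(escaped)
```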

Output:

Output-1.4

Here is an example of setting the encoding of an XML document with C#, using UTF-16.

using System;
using System.IO;
using System.Xml;

public class Program {
    public static void Main() {
        // Load a small XML document from a string.
        XmlDocument d = new XmlDocument();
        string xmlSt = "<tv><tvname>Samsung</tvname></tv>";
        d.Load(new StringReader(xmlSt));

        // Create the XML declaration and set its encoding and standalone values.
        XmlDeclaration dec = d.CreateXmlDeclaration("1.0", null, null);
        dec.Encoding = "UTF-16";
        dec.Standalone = "yes";

        // Insert the declaration before the root element and print the result.
        XmlElement root = d.DocumentElement;
        d.InsertBefore(dec, root);
        Console.WriteLine(d.OuterXml);
    }
}

Output:

Output-1.5

Conclusion

That concludes this overview of encoding. We have gone through Unicode and the encodings used in XML, as well as setting the XML encoding through C#. No single character set fits every need, so character encoding schemes are required for XML and for other programming languages. As a general recommendation, use UTF-8 wherever possible, since it avoids the need for encoding conversions.
