
XML Encoding

Article by Priya Pedamkar

Definition of XML Encoding

XML encoding is the process of converting Unicode characters into a binary (byte) representation. When an XML processor reads a document, it decodes the bytes according to the declared encoding, which is specified through the ‘encoding’ attribute of the XML declaration. Encoding matters because the user needs to declare the correct encoding when transferring XML documents between platforms. Under the XML 1.0 specification, every processor must support both UTF-8 and UTF-16. The XML parser decodes the document and translates it into standard Unicode internally.
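As a minimal sketch of that decoding step, the snippet below hands a parser the raw bytes of a small ISO-8859-1 document and gets ordinary Unicode text back. Python and its standard `xml.etree.ElementTree` module are choices made for this illustration only; the element name and sample text are borrowed from the examples later in this article.

```python
import xml.etree.ElementTree as ET

# Raw bytes as they would sit on disk: "ó" is the single byte F3 in ISO-8859-1.
raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><Name>M\xf3pezr</Name>'

# The parser reads the declared encoding and decodes the bytes to Unicode text.
root = ET.fromstring(raw)
decoded_name = root.text

print(decoded_name)
```

The program never decodes the bytes itself; the declaration gives the parser enough information to do it.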

 

 

Syntax of XML Encoding

Unicode is a universal character set that covers most of the world's languages, and it specifies the encoding methods used to store those characters. The encoding is declared in the XML declaration on the first line of the XML document. The general syntax is given below:

<?xml version="1.0" encoding="encoding-name"?>

UTF-8 Syntax

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
-          The declaration itself consists only of ASCII characters.

UTF-16 Syntax

UTF-16 stores text in 16-bit code units. If a document includes Unicode values like (0XX…), they are treated as UTF-16 code units of 16 bits each.

<?xml version = "1.0" encoding = "UTF-16" standalone = "no" ?>

Encoding names are not case-sensitive, because they follow ISO and IANA conventions.
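A quick way to see this is to parse the same document with the encoding name written in different cases. The sketch below assumes Python's standard `xml.etree.ElementTree` parser; `parse_with_label` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

def parse_with_label(label: str) -> str:
    """Parse a small UTF-8 document whose declaration uses the given encoding name."""
    doc = f'<?xml version="1.0" encoding="{label}"?><Name>Mópezr</Name>'
    return ET.fromstring(doc.encode("utf-8")).text

# The same encoding name in three different spellings.
results = [parse_with_label(label) for label in ("UTF-8", "utf-8", "Utf-8")]
print(results)
```

All three spellings name the same encoding, so each parse yields the same text.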

For the Western European character set (Latin-1), which includes non-English characters, the declaration is as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>

XML processors may also recognize other encodings such as US-ASCII, ISO-8859-1 through ISO-8859-10, and the Windows code pages. Some XML declarations with valid encoding names are given below:

<?xml version='1.0' encoding='US-ASCII' standalone='yes'?>
<?xml version='1.0' encoding='ISO-10646-UCS-2'?>
<?xml version='1.0' encoding='ISO-8859-1'?>
<?xml version='1.0' encoding='Shift_JIS'?>

By default, when no encoding is specified in the header of the XML file and there is no byte order mark, the XML parser assumes UTF-8.
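The default can be checked with a short experiment. This is a sketch using Python's standard `xml.etree.ElementTree` parser, which is an assumption of the example rather than something the article prescribes. The same undeclared document is saved twice, once as UTF-8 and once as ISO-8859-1.

```python
import xml.etree.ElementTree as ET

text = "<Name>Mópezr Pchödinger</Name>"  # no XML declaration at all

# Saved as UTF-8: the parser's default assumption matches the bytes.
name_from_utf8 = ET.fromstring(text.encode("utf-8")).text

# Saved as ISO-8859-1 without declaring it: the bytes are not valid UTF-8,
# so the parser rejects the document.
try:
    ET.fromstring(text.encode("iso-8859-1"))
    latin1_rejected = False
except ET.ParseError:
    latin1_rejected = True

print(name_from_utf8, latin1_rejected)
```

This is why a file saved in anything other than UTF-8 or UTF-16 needs an explicit encoding declaration.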

How does Encoding Work in XML?

To avoid errors when working with XML, either declare the encoding explicitly or save the XML file as Unicode. Other character encodings are available for text in languages that fall outside the scope of a standard encoding. In some cases the XML processor ignores the encoding attribute in the XML declaration: when the document is transferred over a protocol such as HTTP, which has its own headers for the character encoding, the encoding given by the protocol overrides the one in the declaration. In every case the stated encoding must match the encoding the bytes were actually written in, or the parser reports an error. The function XMLGetEncoding() returns the encoding declared in an XML document.

Format: XMLGetEncoding(generation, I/O entry)

  • generation is the task generation: 0 for the current task, 1 for the parent, and so on.
  • I/O entry is the number of the input/output file that holds the XML document.
  • The function returns a text value, which is the value of the “encoding” attribute of the XML document.
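The two ideas in this section, reading the declared encoding and letting an external source override it, can be sketched in Python. Everything here is an illustration: `declared_encoding` is a hypothetical helper, not the XMLGetEncoding() function described above, and the `encoding` argument of `xml.etree.ElementTree.XMLParser` stands in for a charset taken from an HTTP header.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding name inside an XML declaration. This simple pattern
# only works on ASCII-compatible bytes (so not on UTF-16 documents).
DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = DECL_ENCODING.match(data)
    return match.group(1).decode("ascii") if match else default

# A document that claims UTF-8 but whose bytes are really ISO-8859-1.
raw = b'<?xml version="1.0" encoding="UTF-8"?><Name>M\xf3pezr</Name>'
label = declared_encoding(raw)

# An encoding supplied from outside the document overrides the declaration.
parser = ET.XMLParser(encoding="iso-8859-1")
name = ET.fromstring(raw, parser=parser).text

print(label, name)
```

Without the override, the parser would trust the declaration and reject the byte F3 as invalid UTF-8.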

Types of Encoding in XML with Example

The two encodings that every XML processor must support are:

1. UTF-8

Specific document types have their own detection rules. For XML and DTD files, the rule is that UTF-8 is used when no character encoding is specified. UTF-8 is also widely used with Java, SQL, and XQuery because it is compact for mostly ASCII text. UTF-8 is a variable-length encoding: each character is encoded as one to four bytes, carrying 7, 11, 16, or 21 significant bits respectively. The byte order mark (BOM) for UTF-8 is EF BB BF. UTF-8 is not a universal solution, though: for text in scripts such as Chinese it produces larger files, so UTF-16 can be the better choice there.
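The byte counts above are easy to verify. The following sketch uses Python's built-in string encoding; the sample characters and the Chinese phrase are arbitrary picks for illustration.

```python
import codecs

# One sample character for each UTF-8 length class.
samples = ["A", "é", "中", "😀"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]  # bytes per character

# The UTF-8 byte order mark, as bytes.
utf8_bom = codecs.BOM_UTF8

# Chinese text takes three bytes per character in UTF-8 but two in UTF-16.
chinese = "中文编码"
size_utf8 = len(chinese.encode("utf-8"))
size_utf16 = len(chinese.encode("utf-16-le"))  # "-le" variant adds no BOM

print(utf8_lengths, utf8_bom.hex(" "), size_utf8, size_utf16)
```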

Example

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<?xml-stylesheet href="clock.css" type="text/css"?>
<Clocks timezone="GMT">
<timehour>11</timehour>
<timeminute>50</timeminute>
<timesecond>40</timesecond>
<timemeridian>p.m.</timemeridian>
</Clocks>

Output:

XML Encoding-1.1

2. UTF-16

UTF-16 uses two bytes for most characters, so it can be smaller than UTF-8 for some scripts, but it is incompatible with ASCII. It is not a fixed-width encoding: a character takes either 2 or 4 bytes. UTF-16 comes in two byte orders, LE and BE (little-endian and big-endian), and the byte order is indicated by a byte order mark. It causes some problems in older programming languages such as C, because UTF-16 text contains zero bytes that C-style string handling treats as terminators. The significant bits are 16 for a two-byte character and 20 for a four-byte (surrogate pair) character. Although every XML processor must support UTF-16, a parser reads a document as UTF-16 only when the bytes are actually stored that way. For national data items (COBOL) parsed from XML documents, UTF-16 is the suggested choice. UTF-16 is used mostly in Java and Windows.
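These properties can be observed directly on encoded bytes. The sketch below relies only on Python's built-in codecs; the sample characters are arbitrary.

```python
import codecs

# The same character in the two byte orders.
little = "A".encode("utf-16-le")  # low byte first
big = "A".encode("utf-16-be")     # high byte first

# Two bytes for a common character, four for one outside the 16-bit range.
two_byte_len = len("中".encode("utf-16-le"))
four_byte_len = len("😀".encode("utf-16-le"))

# The plain "utf-16" codec prepends a byte order mark in the machine's byte order.
with_bom = "A".encode("utf-16")
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# ASCII characters carry a zero byte, which C-style strings read as "end of text".
has_zero_byte = 0 in little

print(little, big, two_byte_len, four_byte_len, starts_with_bom, has_zero_byte)
```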

Example

<?xml version="1.0" encoding="UTF-16"?>
<college>
<Professor>
<fullname>Evangeline MAC</fullname>
<Dept>Science-1</Dept>
</Professor>
<!--
<Professor>
<fullname>Antony Jay</fullname>
<Dept>Mathematics</Dept>
</Professor>
-->
</college>

When the file is read, its bytes are decoded as UTF-16. Note that the file itself must be saved as UTF-16 in the text editor; declaring UTF-16 in the first line is not enough.
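The need to save the file in the declared encoding can be demonstrated with a cut-down version of the document above. This is a sketch that assumes Python's standard `xml.etree.ElementTree` parser; the in-memory byte strings stand in for files saved from a text editor.

```python
import xml.etree.ElementTree as ET

doc = ('<?xml version="1.0" encoding="UTF-16"?>'
       '<college><Dept>Science-1</Dept></college>')

# Saved as UTF-16 (with a byte order mark): declaration and bytes agree.
saved_as_utf16 = doc.encode("utf-16")
dept = ET.fromstring(saved_as_utf16).find("Dept").text

# Declared as UTF-16 but saved as UTF-8: the parser detects the mismatch.
try:
    ET.fromstring(doc.encode("utf-8"))
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

print(dept, mismatch_rejected)
```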

Output:

XML Encoding-1.2

Let’s take another example

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<Name>Mópezr Pchödinger</Name>

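With ISO-8859-1 declared, each accented character is read as a single byte (“ó” is F3 and “ö” is F6). If the file was saved in a different encoding, those international characters are displayed as the wrong symbols.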

Output:

Output-1.3

Now let’s see the next example, which declares ASCII encoding. Here is the code:

<?xml version="1.0" encoding="ASCII" standalone="yes"?>
<Name>Mópezr Pchödinger</Name>

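ASCII cannot represent “ó” or “ö”. If the file is saved as UTF-8, the first symbol “ó” is stored as the two bytes C3 B3 and the second symbol “ö” as C3 B6, and neither sequence is valid ASCII. ASCII itself is a subset of UTF-8: every ASCII character has the same single-byte encoding in both.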

Output:

Output-1.4
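The byte values behind the last two examples can be checked with Python's built-in codecs. This is a small illustrative sketch, independent of any XML parser.

```python
# The two accented characters from the <Name> example.
o_acute, o_umlaut = "ó", "ö"

# ISO-8859-1 stores each of them in one byte.
latin1_bytes = (o_acute.encode("iso-8859-1"), o_umlaut.encode("iso-8859-1"))

# UTF-8 needs two bytes for each.
utf8_bytes = (o_acute.encode("utf-8"), o_umlaut.encode("utf-8"))

# ASCII has no code for either character.
try:
    "Mópezr Pchödinger".encode("ascii")
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False

# Plain ASCII text is byte-for-byte identical in UTF-8.
ascii_matches_utf8 = "Name".encode("ascii") == "Name".encode("utf-8")

print(latin1_bytes, utf8_bytes, ascii_can_encode, ascii_matches_utf8)
```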

Here is an example of setting the encoding of an XML document with C#. It sets the declared encoding to UTF-16.

using System;
using System.IO;
using System.Xml;

public class main {
    public static void Main() {
        // Load a small document from a string.
        XmlDocument d = new XmlDocument();
        string xmlSt = "<tv><tvname>Samsung</tvname></tv>";
        d.Load(new StringReader(xmlSt));

        // Create the XML declaration, then set its encoding and standalone values.
        XmlDeclaration dec;
        dec = d.CreateXmlDeclaration("1.0", null, null);
        dec.Encoding = "UTF-16";
        dec.Standalone = "yes";

        // Insert the declaration before the root element and print the document.
        XmlElement root = d.DocumentElement;
        d.InsertBefore(dec, root);
        Console.WriteLine(d.OuterXml);
    }
}

Output:

Output-1.5
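For comparison, a rough Python counterpart is sketched below. It is not a translation of the C# program: it uses the standard `xml.etree.ElementTree` module, which writes the declaration itself when asked to serialize to UTF-16, and it produces real UTF-16 bytes rather than a string printed to the console.

```python
import xml.etree.ElementTree as ET

# Build the same small document as the C# example.
root = ET.fromstring("<tv><tvname>Samsung</tvname></tv>")

# Serialize to UTF-16 bytes. For encodings other than UTF-8 and US-ASCII,
# ElementTree adds the XML declaration automatically.
data = ET.tostring(root, encoding="utf-16")

# Decode only to inspect the result as text.
as_text = data.decode("utf-16")

# The bytes can be parsed back because declaration and actual encoding agree.
tvname = ET.fromstring(data).find("tvname").text

print(as_text)
```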

Conclusion

That covers XML encoding. We have gone through Unicode, the encodings used in XML, and setting the XML encoding declaration through C#. Because no single legacy character set covers every language, character encoding schemes are needed in XML and in other programming languages. The best general advice is to use UTF-8 everywhere, since it then needs no encoding conversions.
