Updated March 31, 2023
Introduction to UTF-8 in HTML
UTF-8 is the default character encoding for HTML5 and lets a browser display the text of an HTML page correctly. Web developers are encouraged to use it because it can represent every Unicode character and symbol, stores ASCII characters in a single byte, and works in all modern browsers. UTF-8 stands for Unicode Transformation Format, 8-bit; it is a method of converting characters into machine-readable bytes. In HTML, the charset attribute declares the character encoding of the document.
Syntax of UTF-8 in HTML
The UTF-8 character encoding is declared in the <meta> tag as follows:
<meta charset="UTF-8">
Here the <meta> element carries data about the HTML document; this data is machine-readable and is not displayed on the page. Other <meta> elements can specify keywords, the last-modified date, and so on. The charset attribute on this tag tells the web browser which encoding to use when it reads the page.
Encoding is how characters, which Unicode identifies by number, are converted to the binary data that a machine understands. In UTF-8, each character is stored as one or more bytes.
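The "one or more bytes" rule can be checked with a small sketch. It assumes an environment that provides the standard TextEncoder API (modern browsers and Node.js); the helper name utf8ByteLength and the sample characters are chosen here for illustration only.

```javascript
// TextEncoder always produces UTF-8 bytes.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes UTF-8 needs for a piece of text.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

// Sample characters written as escapes so the file encoding does not matter:
// Latin letter A, e with acute accent, a Chinese character, and an emoji.
const samples = ["A", "\u00E9", "\u4F60", "\u{1F600}"];

for (const ch of samples) {
  console.log(ch, utf8ByteLength(ch), "byte(s)");
}
```

The ASCII letter takes a single byte, while the other characters take two, three, and four bytes respectively.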
How does UTF-8 Work in HTML?
- ASCII was long the most popular character encoding, but as the internet grew globally, an encoding that supports only the basic Latin alphabet was no longer sufficient, so the industry moved to Unicode. Unicode assigns a unique value, called a code point, to every character and emoji, and UTF-8 is an encoding for Unicode. It removes the space limit of ASCII, is the dominant encoding on the web, and is recommended by the W3C; it has also been recommended for all e-mail messages. A browser checks whether the page explicitly declares UTF-8 with a meta tag at the beginning of the document. UTF-8 encodes each code point in one to four bytes, that is, in 8, 16, 24, or 32 bits. It is treated as a global standard because it is understood by the widest range of applications, and it is used both to store text and to transfer data. Most websites use UTF-8, and the standard covers the characters, symbols, and punctuation of writing systems all over the world.
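- In UTF-8, byte values 0-127 are plain ASCII. Values 128-191 are continuation bytes. Values 192-223 act like a shift key that starts a two-byte sequence, values 224-239 act like a double shift that starts a three-byte sequence, and values 240-247 start a four-byte sequence. This is why UTF-8 is called a variable-width, multi-byte encoding.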
- Unicode assigns a unique code to every character of every human language. A character set is the group of all available characters collected into a specific set. The lang attribute declares the language of the content; it does not change the character set. Unicode text is translated into binary and back again. Declaring UTF-8 prevents unexpected results when forms are submitted. UTF-8 is also worth considering when web pages take up an inordinate amount of space. When UTF-8 text is stored in binary form in SQL, a CHAR column becomes BINARY and a VARCHAR column becomes VARBINARY.
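The variable-width scheme can be sketched by encoding a single code point by hand. This is a minimal illustration under stated assumptions, not a replacement for the built-in encoder; the function name encodeCodePoint is hypothetical.

```javascript
// Hypothetical helper: encode one Unicode code point into UTF-8 bytes.
function encodeCodePoint(cp) {
  // Reject values outside Unicode and the surrogate range.
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("not a Unicode scalar value: " + cp);
  }
  if (cp < 0x80) {
    // One byte: 0xxxxxxx (identical to ASCII).
    return [cp];
  }
  if (cp < 0x800) {
    // Two bytes: 110xxxxx 10xxxxxx (lead byte 192-223).
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp < 0x10000) {
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx (lead byte 224-239).
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (lead byte 240-247).
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// U+4F60 is the Chinese character used in Example #1.
console.log(encodeCodePoint(0x4f60).map((b) => b.toString(16)));
```

In practice the built-in TextEncoder does this work; the hand-written version only makes the lead-byte and continuation-byte layout visible.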
As an example, let’s take the text Hi, EDUCBA!
Its UTF-8 encoding is given below:
01001000 01101001 00101100 00100000 01000101 01000100 01010101 01000011 01000010 01000001 00100001
Each group of eight bits is one byte, the machine-readable binary form of one character, including the space after the comma.
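The same conversion can be reproduced in a few lines. This sketch assumes the standard TextEncoder API is available; the function name toUtf8Binary is made up for the example.

```javascript
// Hypothetical helper: show the UTF-8 bytes of a string as 8-bit binary groups.
function toUtf8Binary(text) {
  const bytes = new TextEncoder().encode(text);
  // Convert each byte to base 2 and pad it to a full eight digits.
  return Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join(" ");
}

console.log(toUtf8Binary("Hi, EDUCBA!"));
```

Every character in this text is ASCII, so each one becomes exactly one byte, and the space after the comma gets its own byte as well.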
Key Importance of Using UTF-8
- It is deliberately backward compatible with the ASCII encoding standard.
- As the preferred HTML encoding, it stores ASCII-range text in one byte per character and supports many languages.
- It benefits SEO. If a page mixes two encoding standards, the text can be decoded incorrectly, which hurts SEO. Declaring the character encoding correctly therefore supports SEO efforts.
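Both the ASCII compatibility and the decoding issue can be seen in a short sketch. It assumes the standard TextDecoder API; the byte values and the helper name isValidUtf8 are illustrative only.

```javascript
const utf8 = new TextDecoder("utf-8");

// ASCII bytes mean the same thing in UTF-8.
const asciiBytes = new Uint8Array([0x53, 0x45, 0x4f]); // "SEO" in ASCII
const asciiText = utf8.decode(asciiBytes);

// 0xE9 is the letter e with an acute accent in ISO-8859-1,
// but on its own it is not a valid UTF-8 sequence.
const latin1Bytes = new Uint8Array([0xe9]);
const mismatched = utf8.decode(latin1Bytes); // becomes U+FFFD, the replacement character

// Hypothetical helper: a strict decoder throws instead of silently replacing.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(asciiText, mismatched === "\uFFFD", isValidUtf8(latin1Bytes));
```

Text saved in one encoding and read as another ends up with replacement characters like this, which is the kind of garbled content a single, correctly declared encoding avoids.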
Next, we shall see why Unicode support matters when the content includes foreign languages.
Examples of UTF-8 in HTML
Given below are the examples of UTF-8 in HTML:
Example #1
Simple example with the paragraph content.
Code:
new.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Page Title</title>
<style>
body {
background-color: red;
text-align: center;
color: yellow;
font-family: Arial, Helvetica, sans-serif;
}
</style>
</head>
<body>
<h1>!مرحبا بالعالم</h1>
<h2>你叫什么名字?</h2>
<p>This is Chinese Language.</p>
<p>This is the code demonstrating encoding Process</p>
</body>
</html>
Explanation:
- When the above HTML is opened in a modern browser, the Arabic heading, the Chinese heading, and the English paragraphs are all displayed correctly. This is because the page declares UTF-8, so the browser decodes the text as Unicode.
Example #2
Using buttons and input fields in a form.
Code:
lang.html
<!DOCTYPE HTML >
<html>
<head>
<title>HTML sample -buttons</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form action="addressing" method="post">
<fieldset>
<legend>Selection list</legend>
Checkbox: <input type="checkbox" name="King" value="one"><br>
RadioButton1: <input type="radio" name="Queen" value="two"><br>
RadioButton2: <input type="radio" name="Queen" value="three" checked="checked"><br>
</fieldset>
<fieldset>
<legend>Give Input</legend>
Login Id: <input type="text" name="Login name"><br>
Password: <input type="password" name="Strong Password"><br>
</fieldset>
<fieldset>
<legend>Designation</legend>
<p><input type="checkbox" name=" Software Engineer"> Software Engineer</p>
<p><input type="checkbox" name="Data Analyst"> Data Analyst</p>
<p><input type="checkbox" name="Web Developer"> Web Developer</p>
<p><input type="checkbox" name=" Senior Analyst"> Senior Analyst</p>
</fieldset>
<p><input type="submit" value="press"> <input type="reset"></p>
</form>
</body>
</html>
Explanation:
- This page declares UTF-8 through the older http-equiv form of the meta tag, which has the same effect as <meta charset="UTF-8">. The form itself contains only English labels, but because the page is UTF-8, text typed into the fields in any language is submitted as UTF-8.
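As a rough sketch of what a UTF-8 form submission carries, the standard URLSearchParams API serializes values with the same application/x-www-form-urlencoded rules that a form uses by default. The field name comes from the form above; the typed value is made up for illustration.

```javascript
// Build a form body the way a default form submission would encode it.
const fields = new URLSearchParams();

// Hypothetical input: the user types two Chinese characters (U+4F60 U+597D).
fields.append("Login name", "\u4F60\u597D");

// Each character is sent as its three UTF-8 bytes, percent-encoded.
const body = fields.toString();
console.log(body); // Login+name=%E4%BD%A0%E5%A5%BD
```

The server receives the percent-encoded UTF-8 bytes and has to decode them as UTF-8 to recover the original text.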
Example #3
Code using foreign-language content.
Code:
mett.html
<!DOCTYPE html>
<html>
<head>
<title>
HTML UTF-8 Charset
</title>
<meta charset="UTF-8">
<meta name="keywords" content="Meta Tags, Metadata">
</head>
<body style="text-align:left">
<H1>Hi Instructor!</H1>
<h2>
This is my formal e-mail for the joining.
</h2>
<h3>Hola, me llamo Juan </h3>
<b>Mucho gusto </b>
</body>
</html>
Explanation:
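- The above code adds Spanish text to check that it displays correctly in the web browser. These particular phrases use only unaccented letters; the UTF-8 declaration matters once the text includes characters such as ñ or á.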
Example #4
Using JavaScript.
Code:
name.html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>UTF-8 Charset</title>
<style>
span {
color: blue;
}
span.name {
color: red;
font-weight: bolder;
}
</style>
<script src="https://code.jquery.com/jquery-3.5.0.js"></script>
</head>
<body>
<div>
<span>Thomas,</span>
<span>John Betson,</span>
<span>Valli Tromson</span>
</div>
<div>
<span>आभरणा,</span>
<span>आचुथान,</span>
<span>अभिनंध</span>
</div>
<script>
$( "div span:first-child" )
.css( "text-decoration", "Underline" )
.hover(function() {
$( this ).addClass( "name" );
});
</script>
</body>
</html>
Explanation:
- The script selects the first <span> in each <div>, underlines it, and adds the name class when the pointer hovers over it. Before that, the <meta> tag declares the encoding. The second <div> holds <span> elements with names written in Devanagari script, which ASCII cannot represent, so the page declares UTF-8 to support them.
Conclusion
That covers UTF-8 encoding in HTML. We have briefly gone through Unicode and its encoding in HTML, along with examples in HTML and JavaScript. A character set alone does not say how text is stored, which is why character encoding schemes are needed in HTML and in other programming languages. It is best to use UTF-8 everywhere, so that no encoding conversions are needed.