Updated March 14, 2023
Introduction to Avro schema
Avro is a schema-based serialization system used for exchanging data. It takes schemas as input and, although other schema formats exist, follows its own standard for defining them. A schema describes the type of the file, the namespace in which the record lives, the name of the record, and the fields in the record with their data types. With the help of these schemas, serialized values can be stored in a compact binary format that takes less space, because the values are stored without any per-value metadata.
What is Avro Schema?
Avro relies on schemas: whenever Avro data is read, the schema that was used to write it is available. This allows each datum to be written with no per-value overhead, which keeps the serialized data compact. When Avro data is stored in a file, its schema is stored along with it, so that any program can process the file later.
Avro takes schemas as input and uses them to arrange values in a binary format that carries no per-value metadata. It follows its own standard for describing schemas. A record schema contains the type (record), the namespace, the record name, and the fields in the record.
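To make "binary format with no per-value metadata" concrete, here is a minimal Java sketch that hand-encodes a record with a single string field. It follows the binary encoding rules of the Avro specification: a string is written as its UTF-8 byte length (a zig-zag, variable-length long) followed by the UTF-8 bytes, and a record is simply its field values in schema order. The class and method names are illustrative assumptions and are not part of the Avro library, whose own encoders would normally do this work.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of Avro's binary encoding; not the Avro library API.
class AvroStringSketch {

    // Writes a long as a zig-zag, variable-length integer (7 payload bits per byte).
    static void writeLong(long value, ByteArrayOutputStream out) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // high bit set: more bytes follow
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    // Writes a string as its UTF-8 byte length followed by the UTF-8 bytes.
    static void writeString(String value, ByteArrayOutputStream out) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
    }

    // Encodes a record whose schema has exactly one string field.
    // No field name or type tag is written: the schema supplies both.
    static byte[] encodeSingleStringRecord(String fieldValue) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(fieldValue, out);
        return out.toByteArray();
    }

    // Formats bytes as space-separated hex pairs for display.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Length 2 is zig-zag encoded as 04, then 'H' = 48 and 'i' = 69.
        System.out.println(toHex(encodeSingleStringRecord("Hi"))); // 04 48 69
    }
}
```

The three output bytes contain only the value. A reader needs the schema to know that they represent a record with one string field.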
How to write Avro schema?
Let us see a simple example with a single string field. Suppose we want to send a message like this,
{
"Myschema": "Hi"
}
For this message, we can write the following Avro schema,
{
"type": "record",
"namespace": "company",
"name": "Avrotype",
"fields": [
{
"name": "Myschema",
"type": "string"
}
]
}
We can use the class generated from this schema in our producer application to build a message and send it,
Avrotype a = new Avrotype();
a.setMyschema("Hi");
As shown above, we do not need to write the Avrotype class ourselves, because Avro generates it for us. Once the schema has been described, Avro's code generation (for example, through avro-tools or the Avro Maven plugin) turns it into a Java class with a 'set' method for every field.
Creating Avro schemas:
An Avro schema is written in JSON, a lightweight, text-based data exchange format that is readable by humans. A schema can be expressed in three ways: as a JSON string that names a type, as a JSON object that defines a type, or as a JSON array that represents a union of types.
Let us see an example that outlines an Avro schema. The JSON record below defines the schema,
{
"type": "record",
"namespace": "College",
"name": "Student",
"fields": [
{"name": "Section", "type": "string"},
{"name": "Subject", "type": "string"}
]
}
This JSON record defines a schema for the value part of a key-value pair that holds student information. Its attributes are as follows.
- Type: At the top level, this attribute gives the type of the document, which is record here. A record usually has multiple fields, and inside a field the type attribute gives the data type of that field.
- Namespace: This gives the namespace in which the object lives. Together with the name, it forms the full name of the schema.
- Name: At the top level, this attribute is the schema name, which together with the namespace uniquely identifies the schema. Inside a field, it is the name of that field.
- Fields: This attribute holds the actual definition of the schema. It lists which fields the value contains and the data type of each field, for example an integer or a string.
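As a small illustration of how namespace and name work together, the sketch below builds the full name of a schema following the naming rules in the Avro specification: the namespace and name are joined with a dot, and a name that already contains a dot is treated as a full name. The helper class is hypothetical and is not part of the Avro library.

```java
// Illustrative helper, not part of the Avro library.
class SchemaNameSketch {

    // Combines namespace and name into a full name, e.g. "College.Student".
    static String fullName(String namespace, String name) {
        if (name.contains(".")) {
            return name; // a dotted name is already a full name
        }
        if (namespace == null || namespace.isEmpty()) {
            return name; // no namespace: the name stands alone
        }
        return namespace + "." + name;
    }

    public static void main(String[] args) {
        System.out.println(fullName("College", "Student")); // College.Student
        System.out.println(fullName(null, "Student"));      // Student
    }
}
```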
Types of Avro schema
Avro has primitive data types and complex data types.
The primitive data types,
- null: A type that has no value.
- boolean: A binary value, either true or false.
- int: A 32-bit signed integer.
- long: A 64-bit signed integer.
- float: A single-precision (32-bit) IEEE 754 floating-point number.
- double: A double-precision (64-bit) IEEE 754 floating-point number.
- bytes: A sequence of 8-bit unsigned bytes. It accepts any byte sequence and is treated as raw binary data, not as an Avro array.
- string: A sequence of Unicode characters. It accepts any valid Unicode text and, like bytes, is a single value, not an Avro array.
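The sketch below shows, under the binary encoding rules of the Avro specification, why small int and long values take little space and how float and double are laid out. An int or long is zig-zag mapped and written with a variable number of bytes, while a float or double is always written as 4 or 8 bytes in little-endian order. The class and method names are assumptions made for this illustration and do not come from the Avro library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hand-rolled illustration of how Avro lays out numeric primitives.
class AvroPrimitiveSketch {

    // Zig-zag maps signed values to unsigned ones so small magnitudes stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Number of bytes the variable-length encoding of an int or long takes.
    static int encodedSize(long n) {
        long z = zigZag(n);
        int size = 1;
        while ((z & ~0x7FL) != 0) { // more than 7 significant bits remain
            z >>>= 7;
            size++;
        }
        return size;
    }

    // A float is always 4 bytes, little-endian.
    static byte[] encodeFloat(float f) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putFloat(f).array();
    }

    // A double is always 8 bytes, little-endian.
    static byte[] encodeDouble(double d) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(d).array();
    }

    public static void main(String[] args) {
        System.out.println(zigZag(-1));                  // 1
        System.out.println(encodedSize(1));              // 1
        System.out.println(encodedSize(64));             // 2
        System.out.println(encodedSize(Long.MAX_VALUE)); // 10
        System.out.println(encodeFloat(1.0f).length);    // 4
        System.out.println(encodeDouble(1.0).length);    // 8
    }
}
```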
In addition to the primitive data types, Avro supports six complex data types: records, enums, arrays, maps, unions, and fixed.
- Records: A record is a collection of named fields. It has attributes such as name, namespace, type, and fields.
- Enums: An enum is a list of named symbols. It has attributes such as name, namespace, and symbols.
- Arrays: An array schema has a single attribute, items, which defines the type of the items in the array.
- Maps: A map arranges data as key-value pairs. The keys are always strings, and the values attribute defines the type of the values.
- Unions: A union is used when a field can have more than one data type. It is written as a JSON array, for example ["null", "string"].
- Fixed: This type defines a fixed-size field for storing binary data. It has attributes such as name and size.
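Enums and unions both rely on position. In Avro's binary encoding, an enum value is written as the zero-based position of its symbol in the symbols list, and a union value is preceded by the zero-based position of the branch that was used. The following Java sketch only computes those positions. It handles just the null, string, and long branches, and all names in it are hypothetical and do not come from the Avro library.

```java
import java.util.List;

// Illustrative only: computes the positions Avro would record for enums and unions.
class AvroUnionSketch {

    // Zero-based position of a symbol in an enum's symbols list.
    static int enumIndex(List<String> symbols, String symbol) {
        int index = symbols.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown symbol: " + symbol);
        }
        return index;
    }

    // Zero-based position of the union branch that matches a Java value.
    // Only the null, string, and long branches are handled in this sketch.
    static int unionBranch(List<String> branches, Object value) {
        String wanted;
        if (value == null) {
            wanted = "null";
        } else if (value instanceof CharSequence) {
            wanted = "string";
        } else if (value instanceof Long) {
            wanted = "long";
        } else {
            throw new IllegalArgumentException("Unsupported value type: " + value.getClass());
        }
        int index = branches.indexOf(wanted);
        if (index < 0) {
            throw new IllegalArgumentException("Union has no branch for type: " + wanted);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> optionalString = List.of("null", "string");
        System.out.println(unionBranch(optionalString, null)); // 0
        System.out.println(unionBranch(optionalString, "Hi")); // 1
        System.out.println(enumIndex(List.of("A", "B", "C"), "C")); // 2
    }
}
```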
Conclusion
In this article, we have seen that Avro uses schemas to describe serialized data, and that a schema defines which fields are allowed in a value and the data type of each field. We have also discussed how to write and create schemas and the types available in Avro.