External data representation and marshaling

Shalika Prasad
5 min read · Jun 20, 2020

As a developer, you write applications in a programming language, so your data lives in language-level data structures. At the TCP/UDP level, however, data is communicated as messages or as a stream of bytes. That means your data structures must be converted to a sequence of bytes before transmission and rebuilt on arrival. The problem is that different machines represent data differently: the byte order of integers, the format of floating-point numbers, and the character codes may all differ. There are two solutions: either the two machines agree on the format to use (for example, the sender transmits in its own format and indicates which one it used), or both convert to and from an intermediate external standard.
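To make the byte-order problem concrete, here is a minimal Java sketch that encodes the same integer in both orderings. The class name ByteOrderDemo and its helper methods are made up for this illustration; only the java.nio calls are standard.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Encode an int into 4 bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Render bytes as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int year = 1995;
        byte[] big = encode(year, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(year, ByteOrder.LITTLE_ENDIAN);
        System.out.println("big-endian:    " + hex(big));
        System.out.println("little-endian: " + hex(little));

        // A receiver that assumes the wrong ordering reads a different number.
        int misread = ByteBuffer.wrap(little).order(ByteOrder.BIG_ENDIAN).getInt();
        System.out.println("little-endian bytes read as big-endian: " + misread);
    }
}
```

The same four bytes give a different number when read in the wrong order, which is why the two sides need an agreement about the format.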

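This is where external data representation comes in: an agreed standard for representing data structures and primitive values. Marshaling is the process of assembling a collection of data items into a form suitable for transmission in a message; unmarshaling is the reverse, rebuilding the data items on arrival. Examples are CORBA’s Common Data Representation (CDR), Java’s object serialization, and XML (Extensible Markup Language).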

1. CORBA CDR

CDR can represent all of the data types that can be used as arguments and return values in remote invocations in CORBA. It defines a representation for both big-endian and little-endian orderings.

  • 15 primitive types, including short, long, unsigned short, unsigned long, float, double, char, boolean, octet, and any
  • Constructed types: sequence, string, array, struct, enum and union

However, CDR does not marshal objects. Of the approaches discussed here, only Java serialization handles objects and trees of objects.

Let us see how data is laid out when it is converted to a stream of bytes. In the example below, the message is divided into 4-byte units.

Example:

{'Prasad', 1995}

The figure shows the sequence of bytes with four bytes in each row. The string is represented by its length followed by the characters in the string, padded so that the next value starts on a 4-byte boundary. Here we assume that each character occupies just one byte, and the unsigned long occupies four bytes. Any data structure built from the primitive and constructed types can be represented this way, without pointers.
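The layout described above can be reproduced with a short Java sketch. It follows the simplified layout in this article (4-byte length, one byte per character, zero padding, 4-byte unsigned long) and assumes big-endian ordering; the real CDR specification has further details, such as string terminators, that are ignored here. The class name CdrLayoutSketch is made up for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class CdrLayoutSketch {
    // Simplified layout from the article, not the full CDR specification:
    // 4-byte string length, the characters, zero padding up to a multiple
    // of 4 bytes, then the year as a 4-byte value.
    static byte[] marshal(String name, int year) {
        byte[] chars = name.getBytes(StandardCharsets.US_ASCII);
        int padded = (chars.length + 3) / 4 * 4;
        ByteBuffer buf = ByteBuffer.allocate(4 + padded + 4).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(chars.length);
        buf.put(chars);
        buf.position(4 + padded); // skip the padding bytes, which are already zero
        buf.putInt(year);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] message = marshal("Prasad", 1995);
        // Print the message four bytes per row, like the figure.
        for (int i = 0; i < message.length; i += 4) {
            System.out.printf("bytes %2d-%-2d: %02X %02X %02X %02X%n",
                    i, i + 3, message[i], message[i + 1], message[i + 2], message[i + 3]);
        }
    }
}
```

For {'Prasad', 1995} this gives four rows: the length 6, the characters "Pras", the characters "ad" plus two padding bytes, and the value 1995.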

Note that:

  • The message carries no type information: the sender and the recipient are assumed to have common knowledge of the order and types of the data items.
  • CORBA IDL provides a notation for describing the types of the arguments and results of RMI methods. For example:

struct User {
    string name;
    unsigned long year;
};

From the definitions of the types of their parameters and results, the CORBA interface compiler generates appropriate marshaling and unmarshalling operations for the arguments and results of remote methods.
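To give a feel for what such generated operations do, here is a hand-written Java sketch of a marshal and unmarshal pair for the User struct. It is not the output of any real CORBA interface compiler; the class UserMarshaller and its methods are hypothetical, and the layout is the simplified one used in this article. Note that the unmarshal method works only because it knows the order and types of the fields in advance.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class UserMarshaller {
    // Java counterpart of the IDL struct. The unsigned long is held in a Java long.
    static final class User {
        final String name;
        final long year;
        User(String name, long year) { this.name = name; this.year = year; }
    }

    // Round a length up to the next multiple of 4.
    static int pad4(int n) { return (n + 3) / 4 * 4; }

    // Write the fields in the agreed order: name first, then year.
    static ByteBuffer marshal(User u) {
        byte[] chars = u.name.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(4 + pad4(chars.length) + 4);
        buf.putInt(chars.length).put(chars);
        buf.position(4 + pad4(chars.length)); // leave the zero padding in place
        buf.putInt((int) u.year);
        buf.flip(); // make the buffer ready for reading from the start
        return buf;
    }

    // Read the fields back in the same order. No type tags are in the message.
    static User unmarshal(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] chars = new byte[length];
        buf.get(chars);
        buf.position(4 + pad4(length)); // skip the padding
        long year = Integer.toUnsignedLong(buf.getInt());
        return new User(new String(chars, StandardCharsets.US_ASCII), year);
    }

    public static void main(String[] args) {
        User copy = unmarshal(marshal(new User("Prasad", 1995)));
        System.out.println(copy.name + ", " + copy.year);
    }
}
```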

2. Java object serialization

In Java, you can pass both objects and primitive data values as arguments and results of method invocations. The User struct above can be written as the Java class below.

(The Serializable interface is provided in the java.io package.)

import java.io.Serializable;

public class User implements Serializable {
    private String name;
    private int year;

    public User(String uname, int uyear) {
        name = uname;
        year = uyear;
    }
}

The class above implements the Serializable interface, yet it contains no methods for doing so. That is because Serializable is a marker interface with no methods: implementing it simply allows instances of the class to be serialized. What is serialization in Java? It is the flattening of an object, or a connected set of objects, into a serial form that is suitable for storing on disk or transmitting in a message. Deserialization restores the state of the object or set of objects from the serialized form.

However, the process that deserializes is assumed to have no prior knowledge of the types of the objects in the serialized form. Therefore, some information about the class of each object is included in the serialized form. This information enables the recipient to load the appropriate class when the object is deserialized.

When an object is serialized, all the objects that it references are serialized together with it, to ensure that when the object is reconstructed, all of its references can be fulfilled at the destination. References are serialized as handles: a handle is a reference to an object within the serialized form. This recursive procedure continues until the class information and the types and names of the instance variables of all of the necessary classes have been written out. Each class is also given a handle, so no class is written to the stream more than once.
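The effect of handles can be observed with a small Java sketch: if the same object is written twice to one ObjectOutputStream, the second write refers back to the first, and both reads return the same object. The class HandleDemo and its nested User class are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class HandleDemo {
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int year;
        User(String name, int year) { this.name = name; this.year = year; }
    }

    // Write the same object twice to one stream, then read both entries back.
    static Object[] writeTwiceAndRead(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
                out.writeObject(obj); // already known to the stream: written as a handle
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new Object[] { in.readObject(), in.readObject() };
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] copies = writeTwiceAndRead(new User("Prasad", 1995));
        // Both references point to one reconstructed object.
        System.out.println("same object after deserialization: " + (copies[0] == copies[1]));
    }
}
```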

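The contents of instance variables that are primitive types (such as integers, chars, booleans, bytes, and longs) are written in a portable binary format using methods of the ObjectOutputStream class. Strings and characters are written by its writeUTF method using UTF-8, in which ASCII characters take one byte each and other Unicode characters are represented by multiple bytes.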
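The byte counts can be checked with a short Java sketch. For simplicity it calls writeUTF on a DataOutputStream rather than an ObjectOutputStream, so that no stream header gets in the way; this substitution and the class name WriteUtfDemo are assumptions of the example. In Java, writeUTF writes a 2-byte length followed by the characters in a slightly modified form of UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfDemo {
    // Return the bytes that writeUTF produces for the given string.
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s); // 2-byte length, then the encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Six ASCII characters: one byte each, plus the 2-byte length.
        System.out.println("\"Prasad\" takes " + encode("Prasad").length + " bytes");
        // One non-ASCII character (e with acute accent): two bytes, plus the 2-byte length.
        System.out.println("\"\\u00e9\" takes " + encode("\u00e9").length + " bytes");
    }
}
```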

User user = new User("Prasad", 1995);

In the serialized form of this object, the first instance variable written (1995) is an integer that has a fixed length, and the second (the name) is a string that is preceded by its length.

Serialization: create an instance of the class ObjectOutputStream and invoke its writeObject method, passing the User object as its argument.

Deserialization: open an ObjectInputStream on the stream of data and use its readObject method to reconstruct the original object.
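The two steps can be put together in a minimal round-trip sketch. It serializes to an in-memory byte array instead of a network connection or a file, and the class name SerializationRoundTrip, the getters, and the helper methods are made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRoundTrip {
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final int year;
        User(String uname, int uyear) { name = uname; year = uyear; }
        String getName() { return name; }
        int getYear() { return year; }
    }

    // Serialization: flatten the object into a byte array.
    static byte[] serialize(User user) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(user);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialization: rebuild the object from the byte array.
    static User deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (User) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        User copy = deserialize(serialize(new User("Prasad", 1995)));
        System.out.println(copy.getName() + ", " + copy.getYear());
    }
}
```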

3. Extensible Markup Language (XML)

XML is a markup language defined by the W3C (World Wide Web Consortium). It was developed in the late 1990s, and XML 1.0 became a W3C Recommendation in 1998. It uses identifying tags similar to those of HTML. In general, the term markup language refers to a textual encoding that represents both a text and details of its structure or its appearance.

  • SGML (Standard Generalized Markup Language): both XML and HTML were derived from it.
  • HTML: It was designed for defining the appearance of web pages.
  • XML: It was designed for writing structured documents for the Web.

More about XML:

  • XML data items are tagged with ‘markup’ strings. The tags are used to describe the logical structure of the data and to associate attribute-value pairs with logical structures. In XML, the tags relate to the structure of the text that they enclose.
  • XML is used to communicate with web services and to define the interfaces and other properties of web services.
  • It is also used in archiving and retrieval systems, and an XML archive can be read on any computer.
  • XML is extensible: users can define their own tags, whereas HTML uses only a fixed set of tags. However, if an XML document is to be used by more than one application, the names of the tags must be agreed between them.
  • Multiple applications can use the same XML document for different purposes. The tags enable each application to select just the parts of the document it needs to process, and it is not affected by information added for other applications.
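As a small illustration of tags and attribute-value pairs, the following Java sketch parses an XML version of the User data with the JDK's built-in DOM parser and picks out only the parts it needs. The tag names user, name, and year, the id attribute, and the class XmlUserDemo are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XmlUserDemo {
    // Hypothetical XML form of the User data: tags give the structure,
    // and id="1" is an attribute-value pair attached to the user element.
    static final String XML =
            "<user id=\"1\"><name>Prasad</name><year>1995</year></user>";

    // Parse the text and return the root element of the document.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Return the text enclosed by the first element with the given tag.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element user = parseRoot(XML);
        System.out.println("id   = " + user.getAttribute("id"));
        System.out.println("name = " + text(user, "name"));
        System.out.println("year = " + Integer.parseInt(text(user, "year")));
    }
}
```

An application that only cares about the name can read that one element and ignore the rest of the document.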

XML code is easy to read: because it is textual, it can be read by humans as well as by software. This also helps when something goes wrong, because the data can be inspected directly. Another advantage is that the use of text makes XML independent of any particular platform.

The price of being human-readable is that a textual representation needs more space, and longer processing and transmission times, than a binary representation.
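The space cost can be estimated with a rough Java sketch that encodes the same User data both ways. The XML tag names are invented for the example, and the binary form is simply writeUTF followed by writeInt on a DataOutputStream, which is an assumption of this sketch rather than any particular standard.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SizeComparison {
    // Size in bytes of a hypothetical XML encoding of the data.
    static int xmlSize(String name, int year) {
        String xml = "<user><name>" + name + "</name><year>" + year + "</year></user>";
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size in bytes of a simple binary encoding: length-prefixed string plus a 4-byte int.
    static int binarySize(String name, int year) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeInt(year);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("XML:    " + xmlSize("Prasad", 1995) + " bytes");
        System.out.println("binary: " + binarySize("Prasad", 1995) + " bytes");
    }
}
```

For this small record the XML text comes to 49 bytes against 12 bytes for the binary form, mostly because every tag name is spelled out twice.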
