Avatar

dogada's blog

Related topics

About

cliWs is the social RSS-reader for people living on the web. Interesting news from different channels are grouped by topics. Each topic has its own rating of news formed only by readers of this topic. Read more.

Full personalization — сreate your own newspaper and read only topics interesting to you personally. Be yourself and keep in touch with people who share your interests.

RJSON: compress JSON to JSON

b@dogada Jun 2, 2012 programming javascript программирование

RJSON converts any JSON data collection into more compact recursive form. Compressed data is still JSON and can be parsed with JSON.parse. RJSON can compress not only homogeneous collections, but also any data sets with free structure.

RJSON is single-pass stream compressor, it extracts data schemes from document, assign each schema unique number and use this number instead of repeating same property names again and again.

Bellow you can see same document in both formats.

JSON:

{
"id": 7,
"tags": ["programming", "javascript"],
"users": [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    {"first": "Peter", "last": "Griffin"}
],
"books": [
    {"title": "JavaScript", "author": "Flanagan", "year": 2006},
    {"title": "Cascading Style Sheets", "author": "Meyer", "year": 2004}
]
}

RJSON:

{
"id": 7,
"tags": ["programming", "javascript"],
"users": [
    {"first": "Homer", "last": "Simpson"},
    [2, "Hank", "Hill", "Peter", "Griffin"]
],
"books": [
    {"title": "JavaScript", "author": "Flanagan", "year": 2006},
    [3, "Cascading Style Sheets", "Meyer", 2004]
]
}

When RJSON founds new object schema (unique combination of names of properties) it outputs the object as is and assign it a new index starting from 1 (0 is reserved for numeric arrays). Next objects with same schema are encoded as arrays (in beginning — schema's index, then — values of properties). Several consecutive objects with same schema are merged into same array, so shema index is stored only once for them. Schemes itself aren't stored in the document, unpacker will index new schemes in exactly same way like packer does and we will have implicit in-memory index identical for both packing and unpacking.

You can download source code from https://github.com/dogada/RJSON or try RJSON demo where you can convert any JSON data into RJSON-format, decode result and ensure that it matches original JSON data.

RJSON allows to:

  • reduce JSON data size and network traffic when gzip isn't available. For example, in-browser 3D-modeling tools like Mydeco 3D-planner may process and send to server megabytes of JSON-data;
  • analyze large collections of JSON-data without unpacking of whole dataset. RJSON-data is still JSON-data, so it can be traversed and analyzed after parsing and fully unpacked only if a document meets some conditions.

The above JSON vs RJSON example is based on the data structure from the JSON DB: a compressed JSON format. JSONDB concept is implemented in JSONH - JSON Homogeneous Collections Compressor. RJSON provides similar level of data compression like JSONH does, but RJSON isn't limited to homogeneous collections only.

Recently I also found CJSON, it uses similar approach like RJSON does, but uses explicit schemes and wraps compressed objects into other objects with empty keys. RJSON instead tries to preserve original structure of the document, uses implicit index of data schemes, encode compressed objects into arrays and merge homogeneous sequences of objects into single array.

In my opinion RJSON combines best features from JSONH and CJSON and introduces beautiful implicit index of data schemes.

The idea of this algorithm come to my mind near five or six years ago when I worked on J2ME XHTML-rendering engine for mobile phone browsers where each extra Kb matters. When I realized that XML is like violence I started to like JSON and adapted original idea to JSON format.

The code is available under Simplified BSD License. Fell free to compress the world.

Discussion on Hacker News


Comments (4)

  • Fumpful: AFAIK, the order of object fields is not necessarily preserved by JSON reader implementations. I don't think, that this will work reliably with implementations, which discard the ordering of keys after reading (not unusual with hash table based representations, e.g., org.json Java). 1
  • estebank: Have you compared the results of RJSON against GZipped JSON? 1
  • dogada @Fumpful: Fumpful, thanks for your review. The issue with ordering of keys was also noticed on Hacker News and I already fixed it: https://github.com/dogada/RJSON/commit/a27c8927cd0c2d7d151e22bc28c56ba061dbbae9
  • dogada @estebank: RJSON and Gzipped JSON have different area of application. I didn't want to replace Gzip, RJSON is more working proof of concept ATM. It demonstrates how it's possible to decrease JSON entropy with implicit schema index.
    I think about improving RJSON with better compression of nested schemes that in theory should decrease structural entropy to 0. Then I will compare Gzip vs RJSON. Taking into the account the fact that Gzip works with dictionaries and RJSON works with data schemes, they should help each other and compression ratio of RJSON+Gzip should be better on 20-30% for typical JSON documents with repeated patterns.

Comments are closed.