Understanding Serialization and Deserialization: Security Implications and Best Practices

Serialization and deserialization are fundamental concepts in computer science, crucial for data storage, transmission, and application functionality. This post explores what they are, how they work across different programming languages, and, most importantly, the security vulnerabilities they can introduce and how to mitigate them.

What is Serialization?

Think of serialization as a way to package complex data structures (like objects in object-oriented programming) into a format that can be stored in a file, sent across a network, or saved in a database. It’s like taking a piece of furniture apart and packing it flat for transport. The object’s state is converted into a stream of bytes or a human-readable string (like JSON). This “packed” format can then be “unpacked” later to reconstruct the original object.

A simple analogy is taking notes. You gather various pieces of information, then write them down sequentially in a notebook. Serialization does something similar for software objects.

What is Deserialization?

Deserialization is the reverse process. It’s taking that packed data (the byte stream or string) and rebuilding the original object from it. Continuing the furniture analogy, it’s like taking the flat-packed furniture and reassembling it into its original form. The serialized data is read, and the object is recreated in memory, ready to be used by the application.

Think of it as arriving at school and unpacking your bag. You take out each item (books, pens, lunch) and prepare them for use. Deserialization does the same, converting the stored data back into a usable object.
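As a minimal sketch of the round trip described above, the following Python snippet uses the standard json module. The note dictionary is a made-up stand-in for an application object, not something from a real codebase.

```python
import json

# A plain data structure standing in for an application object (hypothetical).
note = {"title": "Shopping List", "items": ["Milk", "Eggs", "Bread"]}

# Serialization: in-memory structure -> text that can be stored or sent.
serialized = json.dumps(note)

# Deserialization: text -> a new, equal structure in memory.
restored = json.loads(serialized)

print(serialized)
print(restored == note)
```

The restored value is equal to the original but is a separate object in memory, which is exactly what "reconstructing" means here.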

Serialization Formats Across Languages

Different programming languages handle serialization in their own ways. Here are some common examples:

PHP

PHP uses the serialize() function to convert objects and data structures into a serialized string. The unserialize() function performs the reverse operation.

Example:

<?php
class Note {
    public $title;
    public $content;

    public function __construct($title, $content) {
        $this->title = $title;
        $this->content = $content;
    }
}

$myNote = new Note("Shopping List", "Milk, Eggs, Bread");
$serializedNote = serialize($myNote); // Serialize the object

// To deserialize:
$reconstructedNote = unserialize($serializedNote);
echo $reconstructedNote->title; // Outputs: Shopping List
?>

The serialized output for the example looks like this: O:4:"Note":2:{s:5:"title";s:13:"Shopping List";s:7:"content";s:17:"Milk, Eggs, Bread";}. Let’s break this down:

O:4:"Note":2:: Indicates an object (O) of class “Note” (with a name length of 4) and 2 properties.
s:5:"title";s:13:"Shopping List";: The “title” property (string of length 5) has a value of “Shopping List” (string of length 13).
s:7:"content";s:17:"Milk, Eggs, Bread";: The “content” property (string of length 7) has a value of “Milk, Eggs, Bread” (string of length 17).
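To make the length prefixes concrete, here is a small Python sketch that rebuilds the same string by hand. The helper names php_string and php_object are invented for illustration, and the sketch covers only objects with public string properties, which is a small subset of what PHP's serialize() handles.

```python
def php_string(value: str) -> str:
    """Encode a string the way PHP does: s:<byte length>:"<value>";"""
    return f's:{len(value.encode("utf-8"))}:"{value}";'


def php_object(class_name: str, properties: dict) -> str:
    """Encode an object with public string properties only (illustrative subset)."""
    body = "".join(php_string(k) + php_string(v) for k, v in properties.items())
    name_length = len(class_name.encode("utf-8"))
    return f'O:{name_length}:"{class_name}":{len(properties)}:{{{body}}}'


serialized = php_object(
    "Note", {"title": "Shopping List", "content": "Milk, Eggs, Bread"}
)
print(serialized)
```

Every length is a byte count of the value that follows it. That is why editing a value in a serialized string by hand also requires updating its length prefix.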

Important PHP Methods:

PHP provides special methods that can be defined within classes to customize the serialization process:

__sleep(): Called before serialization. It can be used to clean up resources and should return an array of property names to be serialized.
__wakeup(): Called after deserialization. It can be used to re-establish connections or perform other initialization tasks.
__serialize() (PHP 7.4+): Allows custom serialization logic by returning an associative array representing the object’s state.
__unserialize() (PHP 7.4+): Allows custom deserialization logic, restoring the object from the provided associative array.

Python

Python uses the pickle module for serialization and deserialization.

import pickle
import base64

class Notes:
    def __init__(self):
        self.notes_list = []

    def add_note(self, content):
        self.notes_list.append(content)

# Serialization
my_notes = Notes()
my_notes.add_note("Remember the milk!")
pickled_notes = pickle.dumps(my_notes)  # Serialize to bytes with pickle.dumps
encoded_notes = base64.b64encode(pickled_notes).decode('utf-8')  # Encode as text for transport (this is not encryption)

# Deserialization (only safe here because we created the data ourselves)
decoded_notes = base64.b64decode(encoded_notes)
unpickled_notes = pickle.loads(decoded_notes)  # Deserialize with pickle.loads
print(unpickled_notes.notes_list[0])  # Output: Remember the milk!

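Pickling: Converts the object into a binary format (a byte string).
Base64 encoding: Converts those bytes into an ASCII string that can travel through text-only channels such as cookies or JSON fields. Base64 is an encoding, not encryption, so anyone can decode it.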

.NET and Ruby

.NET Serialization: .NET has historically used BinaryFormatter for binary serialization, but this is now strongly discouraged due to security vulnerabilities. Modern .NET applications should use safer alternatives like System.Text.Json (for JSON serialization) or System.Xml.Serialization (for XML). The trend is towards these standardized, more secure data formats.
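Ruby Serialization: Ruby uses the Marshal module for binary object serialization. For human-readable data, YAML (.yml files) is often preferred, though YAML loaders can also construct arbitrary objects unless a safe-loading mode is used.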

Identifying Serialization in Applications

How can you tell if an application is using serialization?

With Source Code Access

If you have access to the application’s source code, it’s relatively straightforward:

Look for function calls like serialize(), unserialize(), pickle.loads(), pickle.dumps(), Marshal.dump(), Marshal.load(), or references to serialization libraries like System.Text.Json.
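A quick way to run that search is a small script. The sketch below is an assumption about how one might do it, not a replacement for a real static analyzer: the pattern covers only the call names listed above, and the function name find_serialization_calls is made up for this example.

```python
import re

# Call names taken from the list above; extend the pattern for your own stack.
SERIALIZATION_CALL = re.compile(
    r"\b(unserialize|serialize|pickle\.loads|pickle\.dumps|Marshal\.load|Marshal\.dump)\s*\("
)


def find_serialization_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) pairs for each matching call in the source text."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in SERIALIZATION_CALL.finditer(line):
            hits.append((line_no, match.group(1)))
    return hits


sample = '''<?php
$prefs = unserialize($_COOKIE["prefs"]);
echo serialize($prefs);
data = pickle.loads(blob)
'''

for line_no, name in find_serialization_calls(sample):
    print(line_no, name)
```

Calls that receive request data (cookies, POST bodies, headers) deserve the closest look, since that is where untrusted input reaches the deserializer.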

Without Source Code Access

Without source code, you need to be more observant:

Error Messages: Look for error messages that mention terms like “unserialize”, “deserialization error”, or “object reconstruction”. These can indicate that serialization is happening and that your input is affecting it.
Application Behavior: Experiment with inputs. Modify cookies, form data (especially POST requests), or URL parameters. If seemingly small changes cause unexpected behavior or errors, it could be a sign that serialized data is being manipulated.
Cookies: Cookies often store serialized data. Look for:
- Base64 Encoded Values: Applications written in PHP, .NET, and other languages often store serialized data as base64-encoded strings within cookies. Try decoding these values (using a tool like CyberChef or a simple base64 decoder). If you see structured data resembling serialized objects, that’s a strong indicator.
- ASP.NET View State (__VIEWSTATE): In older ASP.NET applications, the __VIEWSTATE field (often found in hidden form fields) is a base64-encoded blob that can contain serialized data.
File Extensions: Some applications might use specific file extensions that suggest serialized data, although this is less reliable than the other methods.
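The cookie check can be partly automated. The sketch below decodes a value and looks for a few well-known markers: the PHP O:/a: prefixes, the Java object stream header bytes AC ED 00 05, and the opcode that starts pickle protocol 2 and later. The function name guess_format is hypothetical, and the result is a heuristic guess, not proof.

```python
import base64
import binascii
import pickle
import re

# PHP's serialize() output for objects and arrays starts with O:<n>:" or a:<n>:{
PHP_PREFIX = re.compile(rb'^(?:O:\d+:"|a:\d+:\{)')
JAVA_MAGIC = b"\xac\xed\x00\x05"  # Java object stream magic number and version


def guess_format(cookie_value: str) -> str:
    """Best-effort guess at the serialization format behind a cookie value."""
    try:
        raw = base64.b64decode(cookie_value, validate=True)
    except (binascii.Error, ValueError):
        raw = cookie_value.encode("utf-8")  # not valid base64, inspect as-is
    if raw.startswith(JAVA_MAGIC):
        return "java"
    if PHP_PREFIX.match(raw):
        return "php"
    # Pickle protocol 2+ begins with the PROTO opcode (0x80) and ends with STOP (".")
    if raw[:1] == b"\x80" and raw.endswith(b"."):
        return "pickle"
    return "unknown"


php_cookie = base64.b64encode(b'O:4:"User":1:{s:5:"admin";b:0;}').decode("ascii")
pickle_cookie = base64.b64encode(pickle.dumps(["note"])).decode("ascii")
print(guess_format(php_cookie))
print(guess_format(pickle_cookie))
print(guess_format("session-1234"))
```

A result of "unknown" only means none of these markers matched. Compressed, encrypted, or signed blobs will not be recognized by a check this simple.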

Security Vulnerabilities: Exploiting Deserialization

Deserialization vulnerabilities are a serious threat. They arise when an application deserializes untrusted data without proper validation. This can allow attackers to inject malicious code or manipulate the application’s state.

Exploitation 1: Property Modification

One common attack involves modifying the values of properties within a serialized object. For example, consider a PHP application that stores user session data in a serialized cookie:

O:4:"User":2:{s:8:"username";s:5:"guest";s:5:"admin";b:0;}

This represents a User object with two properties: username (a string) and admin (a boolean). The admin property is currently false (represented by b:0). An attacker could decode the cookie, change b:0 to b:1 (representing true), re-encode it if the application stores it base64-encoded, and send it back to the server. If the application blindly deserializes this modified cookie, the attacker might gain administrative privileges.
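The tampering step is only a few lines of code, which is part of why this attack is so common. The sketch below assumes the cookie is plain base64 over the serialized string with no signature. The function name flip_admin_flag is invented for the example.

```python
import base64


def flip_admin_flag(cookie_value: str) -> str:
    """Decode a base64 cookie, switch the admin boolean to true, and re-encode it."""
    raw = base64.b64decode(cookie_value)
    tampered = raw.replace(b's:5:"admin";b:0;', b's:5:"admin";b:1;')
    return base64.b64encode(tampered).decode("ascii")


# The cookie as the server would have issued it (assumed unsigned).
original = base64.b64encode(
    b'O:4:"User":2:{s:8:"username";s:5:"guest";s:5:"admin";b:0;}'
).decode("ascii")

forged = flip_admin_flag(original)
print(base64.b64decode(forged).decode("ascii"))
```

Because a boolean has no length prefix, no other part of the string needs to change. Changing a string value such as the username would also require updating its s:<length> prefix.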

Exploitation 2: Object Injection

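Object injection is a more sophisticated attack. It relies on the presence of “magic methods” (like PHP’s __wakeup() or __destruct()) within classes defined in the application. PHP calls these methods automatically at certain points in an object’s lifecycle: __wakeup() runs during unserialize(), and __destruct() runs when the object is destroyed, including an object that was created by unserialize().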

If an attacker can control the data being deserialized, they can craft a malicious serialized object that, when unserialized, triggers these magic methods to execute arbitrary code.

Example (Conceptual PHP):

Imagine a class with a __destruct() method that executes a system command:

<?php
class VulnerableClass {
    public $command;

    public function __destruct() {
        if (isset($this->command)) {
            system($this->command); // Extremely dangerous!
        }
    }
}
?>

An attacker could create a serialized instance of VulnerableClass with the command property set to a malicious command (e.g., rm -rf / – DO NOT ACTUALLY DO THIS!). If the application deserializes this crafted object, the __destruct() method will execute the attacker’s command when the object is destroyed.
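The same class of problem exists in Python's pickle module, covered earlier in this post. The sketch below is a harmless demonstration: the Payload class stands in for attacker-crafted bytes and calls os.getcwd, where a real attack would name something like os.system. The RestrictedUnpickler follows the "restricting globals" approach from the Python documentation. Both class names and the restricted_loads helper are made up for this example.

```python
import io
import os
import pickle


class Payload:
    """Stand-in for attacker-crafted data: __reduce__ names a callable to run on load."""

    def __reduce__(self):
        # Harmless here (os.getcwd); an attacker would pick a dangerous callable.
        return (os.getcwd, ())


malicious_bytes = pickle.dumps(Payload())

# Plain pickle.loads calls whatever function the byte stream names.
result = pickle.loads(malicious_bytes)
print(result == os.getcwd())


class RestrictedUnpickler(pickle.Unpickler):
    """Refuse every global lookup, so only built-in containers and scalars can load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Deserialize with the restricted unpickler instead of pickle.loads."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


try:
    restricted_loads(malicious_bytes)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print(blocked)
```

Note that pickle.loads returned the result of the function call, not a Payload object. The receiving side never needed the Payload class at all. The restricted variant narrows the attack surface, but the Python documentation still advises against unpickling untrusted data.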

Automation Tools

Several tools can help automate the process of finding and exploiting deserialization vulnerabilities:

PHPGGC (PHP Gadget Chain Generator)

PHPGGC is a tool specifically designed for PHP. It helps generate “gadget chains” – sequences of object instantiations and method calls that can be chained together to achieve a specific goal, such as remote code execution.

Download: PHPGGC is typically available on GitHub.
Listing Gadget Chains: Use php phpggc -l to list available gadget chains for various PHP frameworks and libraries.
Generating Payloads: Use commands like php phpggc <Framework>/<GadgetChain> <function> "<command>" to generate a serialized payload. For example, php phpggc Laravel/RCE3 system "whoami" would generate a payload to execute the whoami command using a Laravel gadget chain.

Ysoserial (Java)

Ysoserial is a tool for generating payloads that exploit insecure deserialization in Java applications.

Download: Ysoserial is available on GitHub.
Generating Payloads: Use a command like java -jar ysoserial.jar [payload type] '[command to execute]'. For example, java -jar ysoserial.jar CommonsCollections1 'calc.exe' would generate a payload using the CommonsCollections1 gadget chain to execute calc.exe (on Windows).

Mitigation Measures: Protecting Your Applications

Deserialization vulnerabilities are preventable. Here’s how to protect your applications, from both a penetration tester’s and a secure coder’s perspective:

Red Teamer / Penetration Tester Perspective

Codebase Analysis: Thoroughly review the application’s codebase to identify all points where deserialization occurs.
Vulnerability Identification: Use static analysis tools (code scanners) to automatically detect potential insecure deserialization vulnerabilities.
Fuzzing and Dynamic Analysis: Use fuzzing techniques to send malformed or unexpected serialized data to the application and observe its behavior. This can reveal vulnerabilities that static analysis might miss.
Error Handling Assessment: Carefully examine error messages and stack traces. Ensure they don’t leak sensitive information about the application’s internal workings or the serialization process.
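For the fuzzing step, one simple starting point is to corrupt the declared lengths in a PHP-serialized payload and watch how the application responds. The sketch below only generates the variants; sending them is left to whatever HTTP tooling you already use. The function name length_mutations and the seed value are hypothetical.

```python
import re


def length_mutations(payload: str) -> list[str]:
    """Return variants of a PHP-serialized string, each with one declared length off by one."""
    variants = []
    # Match the number that follows an s:, O:, or a: type marker.
    for match in re.finditer(r"(?<=[sOa]:)\d+", payload):
        for delta in (-1, 1):
            new_value = int(match.group()) + delta
            if new_value < 0:
                continue  # negative lengths are skipped in this sketch
            variants.append(
                payload[: match.start()] + str(new_value) + payload[match.end():]
            )
    return variants


seed = 'O:4:"Note":1:{s:5:"title";s:2:"hi";}'
for variant in length_mutations(seed):
    print(variant)
```

A detailed parser error, a stack trace, or a different response time for some variants suggests the value reaches a deserializer. Those responses feed directly into the error handling assessment above.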

Secure Coder Perspective

Avoid Insecure Serialization Formats: Avoid deserializing data from untrusted sources with native object formats such as Java’s ObjectInputStream or PHP’s unserialize(). If you cannot avoid unserialize(), its allowed_classes option (PHP 7.0+) restricts which classes may be instantiated. Favor safer, data-only formats like JSON or XML.
Input Validation and Sanitization: Always validate and sanitize data before deserializing it, even if you’re using a safer format. This means:
- Type Checking: Ensure that the data conforms to the expected data types (e.g., strings are strings, numbers are numbers).
- Length Restrictions: Limit the size of the data being deserialized.
- Whitelist Allowed Values: If possible, define a whitelist of allowed values and reject anything that doesn’t match.
- Schema Validation: For JSON or XML, use schema validation to enforce the expected structure and data types.
Avoid eval() and exec(): Never use functions like eval() or exec() (or their equivalents in other languages) with data derived from deserialized input. This is a recipe for remote code execution.
Principle of Least Privilege: Ensure that the code performing deserialization runs with the minimum necessary privileges. Don’t run it as an administrator or root user.
Defense in Depth: Implement multiple layers of security. Even if one layer fails, others should prevent exploitation.
Follow Security Best Practices: Adhere to secure coding guidelines for your chosen language and framework. Organizations like OWASP (Open Web Application Security Project) provide excellent resources.
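Don’t Deserialize Untrusted Input: The most reliable defense is to never pass user-controlled data to a native deserializer. If data must round-trip through the client, use a data-only format like JSON and verify its integrity before parsing it.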
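Several of the points above can be combined in a few lines. The sketch below stores session data as JSON, signs it with an HMAC so tampering is detected, and type-checks the fields after parsing. The key, the field names, and the function names dump_session and load_session are all assumptions made for illustration. A real application would load the key from secure configuration.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from secure config


def dump_session(session: dict) -> str:
    """Serialize to JSON and prefix an HMAC-SHA256 tag over the JSON text."""
    body = json.dumps(session, separators=(",", ":"), sort_keys=True)
    tag = hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{tag}.{body}"


def load_session(token: str) -> dict:
    """Verify the tag first, then parse and validate the expected fields."""
    tag, separator, body = token.partition(".")
    expected = hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()
    if not separator or not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("signature mismatch")
    data = json.loads(body)
    if not isinstance(data, dict) or set(data) != {"username", "admin"}:
        raise ValueError("unexpected fields")
    if not isinstance(data["username"], str) or len(data["username"]) > 64:
        raise ValueError("invalid username")
    if not isinstance(data["admin"], bool):
        raise ValueError("invalid admin flag")
    return data


token = dump_session({"username": "guest", "admin": False})
print(load_session(token))
```

With this design, flipping the admin flag in the cookie changes the body without changing the tag, so load_session rejects it before any parsing happens. Keeping authorization data on the server side is safer still.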

Innovative Software Technology: Securing Your Data Serialization

At Innovative Software Technology, we understand the critical importance of secure serialization and deserialization practices. We can help your organization protect against these vulnerabilities through a comprehensive approach:

Secure Code Reviews: Our developers conduct thorough code reviews focused on identifying and remediating insecure deserialization, combining manual analysis with automated tools for comprehensive coverage.
Penetration Testing: Our penetration testers simulate real-world attacks to find and exploit deserialization vulnerabilities in your applications, so you can fix weaknesses before malicious actors discover them.
Security Consulting: We provide guidance on implementing secure serialization strategies, including choosing appropriate data formats, implementing robust input validation, and following secure coding best practices.
Secure Development Training: We offer training programs that teach your development team secure coding practices, including how to avoid and mitigate deserialization vulnerabilities, so they can build secure applications from the ground up.

By partnering with Innovative Software Technology, you can protect your applications against the risks of insecure deserialization, safeguarding your data and maintaining the trust of your users. A site that stays free of compromise is also less likely to be flagged by search engines, which helps protect your search ranking.