TikTok for Developers
PETAce: Using Applied Cryptography to Enhance Privacy
by Yongchuan Niu, Research Engineer and Donghang Lu, Research Scientist at TikTok Privacy Innovation

We recently shared the exciting work of the TikTok Privacy Innovation Lab in our post regarding PrivacyGo. We continue our journey by introducing PETAce: a comprehensive framework for enhancing privacy with applied cryptography.

Background

In the era of digitalization and big data, there is a growing demand for privacy across diverse businesses. Various applications benefit from data collaboration on a large scale, while individuals are increasingly motivated to gain control of their personal data. Simultaneously, privacy regulations such as the General Data Protection Regulation (GDPR) in the European Union, the California Consumer Privacy Act (CCPA) in the United States, the General Personal Data Protection Law (LGPD) in Brazil, and Singapore's amendment to the Personal Data Protection Act (PDPA) reflect a global embrace of robust privacy protection.

Privacy-enhancing technologies (PETs) have attracted considerable attention from both the research community and industry in response to the growing demand for privacy protection and data collaboration. PETs encompass a broad range of technologies utilized in collecting, storing, transferring, and analyzing data with the primary goal of safeguarding privacy. These techniques aim to provide innovative solutions that allow data utilization while preserving the users' privacy.

Introduction

What is PETAce?

PETAce (Privacy-Enhancing Technologies via Applied Cryptography Engineering) is a privacy-enhancing protocol framework based on state-of-the-art research results. It provides data processing methods such as secret sharing [1], homomorphic encryption [2][3][4], and oblivious transfer [5][6], and can perform collaborative computation and analysis of two-party data while preserving data privacy. PETAce has the following characteristics:

  • PETAce provides various fundamental cryptographic protocols, giving users the flexibility to build their own PET applications.
  • The core building blocks are implemented in C++ for best performance. PETAce also provides user-friendly Python interfaces similar to the well-known Python library NumPy.
  • A variety of data analysis functions are available to fulfill common statistical and computational needs (e.g., maximum, minimum, percentiles, correlation, group-by, and sigmoid). These are crucial building blocks for popular applications such as SQL queries and ML training/inference.
  • PETAce provides a complete solution for secure two-party computation, enabling efficient development of end-to-end privacy-preserving applications. This guarantees compliance with privacy regulations and ensures the privacy of user data.
  • PETAce builds on state-of-the-art technologies and can process data at the 10-million scale. Specific protocols scale even further (e.g., billion-size PSI).

Why use PETAce?

PETAce offers not just high security but also a user-friendly interface. Here are four primary reasons why people should consider using PETAce:

  1. Fast prototyping and testing of PET applications.
  2. Developing privacy-preserving computation with the familiar Python API.
  3. Ensuring the privacy of user data throughout the computation process.
  4. Building privacy-preserving machine learning (PPML) applications and business intelligence (BI) analysis efficiently.

How does PETAce work?

The core functionalities of PETAce are implemented in C++. However, to make PET programs easier to develop, PETAce also offers user-friendly Python APIs. Below is an example illustrating how to construct your applications.

Consider a scenario where we have a linear regression model and a sample input, and the goal of data analysis is to obtain the model's evaluation result given the input. The underlying computation is an inner product between two vectors. More formally, a linear model can be represented by a vector M, and the sample input by a vector X. The inference result is the inner product of M and X. The plaintext logic can therefore be written with NumPy as follows:

import numpy as np

def numpy_linear_regression(model, sample):
    return np.sum(model * sample)

Now let's shift to a distributed setting with two parties: the Server and the Client. The Server holds a linear regression model, while the Client holds a private sample input. They want to complete the same task while keeping their own inputs private. In this context, you can use SecureNumpy as outlined below to build your protocol. First, supply the parameters that PETAce needs to be instantiated (e.g., IP address, port number, party ID). Then wrap each input in snp.Private(), which is the "private version" of a NumPy array. Note that each snp.Private() also has an "owner", and only the owner has access to the plaintext data. PETAce supports the inner product computation as an elementwise multiplication followed by a sum.

import petace.duet.securenumpy as snp
from petace.duet.pyduet import NetParams, DuetVM, NetScheme

def securenumpy_linear_regression(model, sample):
    return snp.sum(model * sample)

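# Init network
# (set the connection parameters, e.g., IP address and port number, here)
net_params = NetParams()

# Init mpc engine
# `party` is this process's party ID and must be defined beforehand
duet = DuetVM(net_params, NetScheme.SOCKET, party)

# Set data
# The second argument of snp.Private() is the owner's party ID:
# party 0 supplies the plaintext `sample_0`, party 1 supplies the plaintext `model_1`
sample = snp.Private(sample_0, 0, duet)
model = snp.Private(model_1, 1, duet)

# Computation
ret = securenumpy_linear_regression(model, sample)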
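
For intuition about what a secure inner product can involve under the hood, the sketch below simulates both parties in a single process using additive secret sharing and Beaver multiplication triples, a standard MPC technique. This is an assumption made for illustration and not a description of PETAce's actual protocol: a real deployment would generate triples with a cryptographic subprotocol instead of a trusted dealer, exchange messages over the network, and use a fixed-point encoding for real-valued data. All names in the block are hypothetical.

```python
import secrets
from typing import List, Tuple

MOD = 2**64  # illustrative ring size, not taken from PETAce
Shares = Tuple[int, int]  # (share held by party 0, share held by party 1)


def share(value: int) -> Shares:
    """Split a value into two additive shares modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (value - r) % MOD


def reveal(s: Shares) -> int:
    """Recombine two additive shares."""
    return (s[0] + s[1]) % MOD


def beaver_triple() -> Tuple[Shares, Shares, Shares]:
    """Trusted-dealer stand-in: shares of random a, b and of c = a * b."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def secure_mul(x: Shares, y: Shares) -> Shares:
    """Multiply two shared values with one Beaver triple."""
    a, b, c = beaver_triple()
    # The parties open d = x - a and e = y - b. Both are masked by
    # uniformly random values, so opening them leaks nothing about x or y.
    d = reveal(((x[0] - a[0]) % MOD, (x[1] - a[1]) % MOD))
    e = reveal(((y[0] - b[0]) % MOD, (y[1] - b[1]) % MOD))
    # x * y = c + d*b + e*a + d*e; only party 0 adds the public d*e term.
    z0 = (c[0] + d * b[0] + e * a[0] + d * e) % MOD
    z1 = (c[1] + d * b[1] + e * a[1]) % MOD
    return z0, z1


def secure_inner_product(model: List[int], sample: List[int]) -> int:
    """Elementwise secure multiplication followed by a local sum of shares."""
    acc: Shares = (0, 0)
    for m, x in zip(model, sample):
        z = secure_mul(share(m), share(x))
        acc = ((acc[0] + z[0]) % MOD, (acc[1] + z[1]) % MOD)
    return reveal(acc)
```

The structure mirrors the SecureNumpy example: an elementwise multiplication on private values followed by a sum, with only the final result revealed.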

Key Technologies

Architecture

The core components of PETAce include secure multi-party computation (MPC), private set intersection (PSI), oblivious transfer (OT), homomorphic encryption (HE), and a (Python) virtual machine. They work together to provide efficient and user-friendly privacy computing capabilities.

  • PETAce-Solo implements primitive hashing, encryption, and randomness generation algorithms performed by a single party.
    • Hash functions: SHA-256, SHA3-256, and BLAKE2b
    • Pseudo-random number generators based on SHAKE_128, BLAKE2Xb, and AES_ECB_CTR
    • Prime field elliptic curve group arithmetic, including hash-to-curve
    • Hash tables: Cuckoo hashing and simple hashing
    • Partially homomorphic encryption: the Paillier cryptosystem
  • PETAce-Verse includes frequently used cryptographic subprotocols such as oblivious transfer and oblivious shuffling.
    • Naor-Pinkas oblivious transfer
    • IKNP 1-out-of-2 oblivious transfer extension
    • KKRT 1-out-of-n oblivious transfer extension
  • PETAce-Duet abstracts general-purpose two-party secure computing operator protocols.
    • Protocols from ABY
    • Secure comparison protocols from Cheetah
    • The secure random shuffling protocol from Secret-Shared Shuffle
    • Protocols that convert arithmetic shares to and from ciphertexts of the Paillier cryptosystem
  • PETAce-SetOps collects several protocols that perform private set operations.
    • An ECDH-PSI protocol based on Elliptic-Curve Diffie-Hellman
    • The KKRT-PSI protocol based on Oblivious Pseudorandom Functions (OPRF)
    • A private join and compute protocol based on Circuit-PSI
  • PETAce-Network provides a preliminary interface of network communication.
    • Network abstract interface
    • Socket network implementation
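
To illustrate what "partially homomorphic" means for the Paillier cryptosystem listed under PETAce-Solo, here is a minimal textbook sketch with toy-sized primes. It is not the PETAce-Solo API, the function names are hypothetical, and the parameters are far too small to be secure; it only shows that multiplying ciphertexts adds the underlying plaintexts.

```python
import math
import secrets
from typing import Tuple


def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    """Return (public modulus n, private key (lam, mu)) for primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, requires Python 3.8+
    return n, (lam, mu)


def encrypt(n: int, message: int) -> int:
    """Encrypt 0 <= message < n using the generator g = n + 1."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m = 1 + m * n (mod n^2), which avoids one exponentiation.
    return (1 + message * n) * pow(r, n, n_sq) % n_sq


def decrypt(n: int, private_key: Tuple[int, int], ciphertext: int) -> int:
    """Recover the plaintext with L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % (n * n)


def mul_plain(n: int, ciphertext: int, k: int) -> int:
    """Multiply the encrypted value by a public constant k."""
    return pow(ciphertext, k, n * n)


# Toy primes for demonstration only; real keys use primes of 1024+ bits.
n, private_key = keygen(10007, 10009)
encrypted_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
```

Decrypting `encrypted_sum` with `private_key` yields the sum of the two plaintexts, even though neither was decrypted individually.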

Key Designs

Virtual Machine

PETAce provides a compiler, an intermediate representation (IR), and a virtual machine to support a NumPy-like interface, which reduces the cost of migrating user code.

  • The IR defines a set of MPC bytecodes that describe MPC operations. PETAce has three types of data: private, public, and share. A typical IR instruction looks like {add, am, am, am}, which means that adding two arithmetic share matrices yields an arithmetic share matrix.
  • The interpreter converts Python code and SQL code into bytecode for the execution of the virtual machine.
  • The virtual machine executes the corresponding operations based on the bytecode it receives.
  • New compilers can also be developed based on PETAce's IR to support other languages, such as Go and Java.
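
As a rough mental model of the bytecode and virtual machine described above, the following toy sketch dispatches an instruction such as {add, am, am, am} to a handler. The class, the register model, and the handler table are hypothetical and are not PETAce's implementation; the only detail taken from the text is that "am" denotes an arithmetic share matrix.

```python
from typing import Callable, Dict, List, Tuple

MOD = 2**64  # illustrative ring size for arithmetic shares
Matrix = List[List[int]]
Bytecode = Tuple[str, str, str, str]  # (operation, lhs type, rhs type, result type)


def add_am_am(lhs: Matrix, rhs: Matrix) -> Matrix:
    """{add, am, am, am}: each party adds its own shares locally."""
    return [[(a + b) % MOD for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(lhs, rhs)]


# Hypothetical dispatch table keyed by the full bytecode tuple.
HANDLERS: Dict[Bytecode, Callable[[Matrix, Matrix], Matrix]] = {
    ("add", "am", "am", "am"): add_am_am,
}


class ToyVM:
    """One party's virtual machine, holding that party's shares in registers."""

    def __init__(self) -> None:
        self.registers: Dict[str, Matrix] = {}

    def execute(self, bytecode: Bytecode, lhs: str, rhs: str, out: str) -> None:
        handler = HANDLERS[bytecode]
        self.registers[out] = handler(self.registers[lhs], self.registers[rhs])


def split(matrix: Matrix, mask: Matrix) -> Tuple[Matrix, Matrix]:
    """Split a matrix into two additive shares using a given mask."""
    other = [[(v - m) % MOD for v, m in zip(rv, rm)]
             for rv, rm in zip(matrix, mask)]
    return mask, other
```

Each party would run its own `ToyVM` on its own shares; because addition of arithmetic shares is local, executing the same bytecode on both sides yields shares of the sum without any communication.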

ABH Conversion

PETAce efficiently combines secure computation schemes based on Arithmetic sharing, Boolean sharing, and Homomorphic encryption, providing best-practice solutions for secure two-party computation.

  • Arithmetic sharing supports standard operations such as addition, subtraction, multiplication, and division.
  • Boolean sharing supports standard bitwise operations such as AND, XOR, and OR.
  • Homomorphic encryption can perform various classes of computations on encrypted data.
  • The conversion between Arithmetic sharing and Boolean sharing can be leveraged to perform efficient comparisons.
  • The conversion between Arithmetic sharing and Homomorphic encryption enables efficient implementations of specific operations, such as oblivious shuffling and matrix-vector multiplication.
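
The difference between the two sharing types, and why converting between them helps with comparisons, can be sketched as follows. This is a single-process illustration with hypothetical function names, not PETAce code. In particular, `less_than` opens the difference only to keep the example short; a real protocol would convert the arithmetic shares to Boolean shares and extract the sign bit without revealing the difference.

```python
import secrets
from typing import Tuple

BITS = 64
MOD = 1 << BITS
Shares = Tuple[int, int]  # (share held by party 0, share held by party 1)


def arithmetic_share(value: int) -> Shares:
    """Additive sharing: the shares sum to the value modulo 2^64."""
    r = secrets.randbelow(MOD)
    return r, (value - r) % MOD


def arithmetic_open(s: Shares) -> int:
    return (s[0] + s[1]) % MOD


def boolean_share(value: int) -> Shares:
    """XOR sharing: the shares XOR to the value, bit by bit."""
    r = secrets.randbelow(MOD)
    return r, value ^ r


def boolean_open(s: Shares) -> int:
    return s[0] ^ s[1]


def add_arithmetic(x: Shares, y: Shares) -> Shares:
    """Addition is local on arithmetic shares."""
    return (x[0] + y[0]) % MOD, (x[1] + y[1]) % MOD


def sub_arithmetic(x: Shares, y: Shares) -> Shares:
    """Subtraction is local on arithmetic shares."""
    return (x[0] - y[0]) % MOD, (x[1] - y[1]) % MOD


def xor_boolean(x: Shares, y: Shares) -> Shares:
    """XOR is local on Boolean shares."""
    return x[0] ^ y[0], x[1] ^ y[1]


def less_than(x: Shares, y: Shares) -> int:
    """Return 1 if x < y for values below 2^63, else 0.

    The sign bit of (x - y) in two's complement encodes the comparison.
    Reading one bit is natural on Boolean shares, which is why an
    arithmetic-to-Boolean conversion makes comparisons efficient.
    """
    difference = arithmetic_open(sub_arithmetic(x, y))  # illustration only
    return difference >> (BITS - 1)
```

Arithmetic shares make addition and subtraction free, while Boolean shares expose individual bits, so a framework that converts between them can pick the cheaper representation for each step.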

SQL Operators

PETAce optimizes specific SQL statements to provide efficient SQL query operators.

  • Pre-encode the group-by operation and use multiplication to achieve complex grouping operations, avoiding costly comparison and sorting operations.
  • Perform preprocessing for the permutation function, which can improve the performance of complex sorting operations.
  • Use filter vectors and multiplications to improve the performance of the where operations.
  • Provide different types of PSI, such as circuit-PSI, KKRT-PSI, and ECDH-PSI, to ensure the efficiency and security of join operations.
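
The idea of replacing comparisons with multiplications in group-by and where operations can be shown on plaintext data. The sketch below assumes that "pre-encoding" means the data owner turns its group keys into one-hot indicator vectors before the protocol starts; that reading, and all function names, are assumptions for illustration. In a secure setting the multiplications would run on secret shares instead of plaintext values.

```python
from typing import List, Sequence


def one_hot_encode(keys: Sequence[str], groups: Sequence[str]) -> List[List[int]]:
    """Pre-encode each row's group key as a 0/1 indicator vector."""
    return [[1 if key == group else 0 for group in groups] for key in keys]


def group_by_sum(indicators: List[List[int]], values: Sequence[int]) -> List[int]:
    """SUM(value) GROUP BY key, using only multiplications and additions."""
    group_count = len(indicators[0])
    return [sum(row[g] * value for row, value in zip(indicators, values))
            for g in range(group_count)]


def where_sum(filter_vector: Sequence[int], values: Sequence[int]) -> int:
    """SUM(value) WHERE condition, with the condition given as a 0/1 vector."""
    return sum(flag * value for flag, value in zip(filter_vector, values))


keys = ["a", "b", "a", "c"]
values = [10, 20, 30, 40]
indicators = one_hot_encode(keys, ["a", "b", "c"])
totals = group_by_sum(indicators, values)   # per-group sums for a, b, c
filtered = where_sum([1, 0, 1, 0], values)  # keeps rows 0 and 2
```

Because the indicator and filter vectors contain only zeros and ones, a row contributes to a result exactly when its flag is one, so no data-dependent branching or sorting is needed.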

Long-term Vision and Conclusion

We live in a highly digitalized world where privacy concerns are emerging from hundreds if not thousands of new Internet applications created every day. The pace of privacy-enhancing technology development should catch up with the progress of new applications and also the increasing pressure of privacy protection demands from regulatory bodies.

The PETAce framework aims to be a one-stop solution for fast-prototyping PET innovations, for both academic researchers and industry developers. We will keep track of state-of-the-art cryptographic solutions and integrate them into PETAce in due time. PETAce will also publish and incorporate technical advancements made by the TikTok Privacy Innovation Lab.

We welcome everyone from academia and industry to contribute to PETAce. We want to create a collection of easy-to-use tools for designing, prototyping, testing, and deploying PETs for the community. Together, we can further push the boundaries of PETs and build a more privacy-aware and secure digital world.


Get involved with the project. Follow us on GitHub.

https://github.com/tiktok-privacy-innovation/PETAce

https://github.com/tiktok-privacy-innovation/PETAce-Solo

https://github.com/tiktok-privacy-innovation/PETAce-Verse

https://github.com/tiktok-privacy-innovation/PETAce-Duet

https://github.com/tiktok-privacy-innovation/PETAce-SetOps

https://github.com/tiktok-privacy-innovation/PETAce-Network

References

[1] An Introduction to Secret-Sharing-Based Secure Multiparty Computation

[2] Somewhat Practical Fully Homomorphic Encryption

[3] Homomorphic Encryption for Arithmetic of Approximate Numbers

[4] Public-Key Cryptosystems Based on Composite Degree Residuosity Classes

[5] Ferret: Fast Extension for coRRElated oT with small communication

[6] Extending Oblivious Transfers Efficiently

[7] ABY - A Framework for Efficient Mixed-Protocol Secure Two-Party Computation

[8] Cheetah: Lean and Fast Secure Two-Party Deep Neural Network Inference

[9] Efficient Batched Oblivious PRF with Applications to Private Set Intersection

[10] Efficient Circuit-based PSI with Linear Communication

[11] Circuit-PSI with Linear Complexity via Relaxed Batch OPPRF

