Last year, we announced PETAce, one of the initiatives undertaken by Privacy Innovation at TikTok to research innovative ways of safeguarding the privacy and security of user data. PETAce is a comprehensive framework for enhancing privacy with applied cryptography.
Within the PETAce ecosystem, we now introduce SecureNumpy, a data analysis module designed to bridge the gap between NumPy, a fundamental package for scientific computing with Python, and the privacy-preserving paradigms of secure Multi-Party Computation (MPC).
PETAce Ecosystem
Privacy-Enhancing Technologies via Applied Cryptography Engineering (PETAce) is a framework for privacy-preserving computing. It provides a strong privacy guarantee by allowing data to be analyzed and computed on without revealing any sensitive information during processing. It consists of the following parts:
- The user interface layer provides users with high-level programming interfaces for collaborative data analysis, joint SQL queries, and privacy-preserving machine learning.
- The virtual machine is responsible for parsing high-level language into MPC operators, and for performing automatic optimization and scheduling.
- The protocol layer includes secure multi-party computation protocols, such as general two-party secure computation protocols, private set intersection, private information retrieval, and more.
- The primitive layer consists of standard cryptographic algorithms and protocols, differential privacy mechanisms, abstract network interfaces, and more.
Introduction
The use of large-scale data and machine learning has significantly transformed the way the industry processes data and creates knowledge. However, this technological advancement also brings significant privacy threats, especially when dealing with confidential information. Traditional data analysis tools, such as NumPy, mainly function on plaintext data, making such data more vulnerable to potential misuse or unauthorized access.
Secure Multi-Party Computation (MPC) offers a solution to this problem by allowing computations to be performed on encrypted data, thus maintaining data privacy throughout the analysis process. Despite its privacy protections, MPC is often perceived as having complex interfaces and steep learning curves, limiting its adoption among a wider audience.
SecureNumpy is a library designed to bridge the gap between NumPy, a fundamental package for scientific computing with Python, and the privacy-preserving paradigms of MPC. With its intuitive interface, SecureNumpy provides a seamless experience that allows users to perform NumPy's powerful array manipulations and mathematical functions on encrypted data without compromising their privacy. SecureNumpy aims to address privacy and security concerns while offering the following:
- Make MPC easier to use: By providing an interface similar to NumPy, SecureNumpy significantly lowers the barrier to entry for MPC, empowering practitioners who do not have deep cryptographic expertise.
- Facilitate secure data analysis: SecureNumpy facilitates collaboration between organizations by enabling the extraction of valuable insights from shared datasets while ensuring the strict confidentiality of each participant's individual contributions.
- Enhance performance and efficiency: SecureNumpy strives to optimize the underlying MPC operations and make secure computation practical for real-world applications.
- Encourage open research and collaboration: As an open-source library, SecureNumpy is expected to foster a collaborative environment that combines cryptography and data science. Interdisciplinary collaboration will produce innovative solutions that prioritize user privacy.
In summary, SecureNumpy merges the simplicity and power of NumPy with the security guarantees of MPC, creating an effective and secure tool for collaborative data analysis.
Design Principles
In the development of SecureNumpy, we adopted five main design principles:
- Security and reliability: SecureNumpy maintains strict cryptographic standards and leverages state-of-the-art techniques for data confidentiality, correctness of calculations, and resistance to attacks. MPC ensures the confidentiality and integrity of the data in the semi-honest setting, and the data is secret-shared among participants to prevent single-party access.
- Simplicity and ease of use: SecureNumpy hides the complexity of MPC behind a straightforward interface reminiscent of NumPy. The goal is to enable users to write code naturally and learn quickly. Users can write code as if they were using NumPy, without understanding the underlying cryptographic operations. Interface familiarity significantly reduces the learning curve, allowing new users to become proficient in a short time.
- Flexibility: SecureNumpy employs a user-friendly interface design that mirrors the NumPy suite, allowing users to engage with it without learning unfamiliar APIs. More importantly, this approach also empowers users to develop new functions that do not yet exist in the library, thus further enhancing its functionality. For example, even though SecureNumpy does not currently offer a built-in function to compute the logarithm of a ciphertext, users can apply the Taylor expansion method to construct a log function based on the existing multiplication function. This design approach not only promotes SecureNumpy's extensibility, but also encourages users to customize and extend it to meet their specific needs, catering to a wide range of complex computing requirements. Through modularity and plug-in mechanisms, users can easily integrate custom functions, making SecureNumpy more flexible and powerful in multi-party computing environments.
- Efficiency: SecureNumpy, as a core component of PETAce, provides users with an easy-to-use API. The underlying core protocols integrate recent research results and are implemented in C++ for performance.
- Openness and transparency: SecureNumpy prioritizes openness and transparency by embracing the open-source philosophy, offering users access to its source code and detailed development documentation, and facilitating community review and contributions.
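The Taylor-expansion idea mentioned under flexibility can be sketched in plain NumPy. This is a minimal illustration, not SecureNumpy code: the function name `taylor_log` and the number of terms are assumptions, and the series used, ln(1 + u) = u - u^2/2 + u^3/3 - ..., only converges for inputs between 0 and 2. The point is that the loop touches the input only through addition and multiplication, with public constants as coefficients.

```python
import numpy as np


def taylor_log(x, n_terms=30):
    """Approximate ln(x) for 0 < x < 2 using only additions and multiplications.

    Uses ln(1 + u) = u - u^2/2 + u^3/3 - ... with u = x - 1.
    The coefficients (+/- 1/k) are public constants.
    """
    u = x - 1.0
    result = np.zeros_like(u)
    power = np.ones_like(u)  # running power u^k
    for k in range(1, n_terms + 1):
        power = power * u
        sign = 1.0 if k % 2 == 1 else -1.0
        result = result + (sign / k) * power
    return result


x = np.array([0.6, 1.0, 1.4])
approx = taylor_log(x)
```

With a NumPy-like secure array type, the same loop would presumably rely on the overloaded `+` and `*` operators; inputs outside the convergence range would first need to be rescaled, which is not shown here.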
Key Designs of SecureNumpy
The N-dimensional array (SecureArray)
`SecureArray` is the foundational object provided by SecureNumpy, designed specifically to enable MPC while maintaining a user-friendly interface similar to that of NumPy arrays. SecureNumpy aims to bring the power and simplicity of NumPy's array operations to the domain of secure computations, allowing users to perform complex data manipulations and analyses without compromising privacy or security.
A `SecureArray` is a fixed-size multidimensional container of items of the same type and size, distributed between two parties. Like NumPy, `SecureArray` provides facilities such as `shape` and `dtype`, as well as explicit indexing capabilities. `SecureArray` also supports operator overloading, enabling basic operations like addition, subtraction, multiplication, division, and comparison, with either a `SecureArray` or a `numpy.ndarray`.
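To make "distributed between two parties" concrete, here is a minimal sketch of two-party additive secret sharing in plain NumPy. It is an illustration under stated assumptions, not PETAce's actual protocol: the fixed-point scale `SCALE`, the ring of 64-bit integers, and the helper names `encode`, `decode`, `share`, and `reconstruct` are all chosen for this example.

```python
import numpy as np

SCALE = 2 ** 16  # fixed-point scale (assumed for illustration)
U64_MAX = np.iinfo(np.uint64).max


def encode(values):
    """Map float64 values to fixed-point integers in the ring Z_2^64."""
    return np.round(values * SCALE).astype(np.int64).astype(np.uint64)


def decode(ring_values):
    """Map ring elements back to float64 (two's-complement interpretation)."""
    return ring_values.astype(np.int64).astype(np.float64) / SCALE


def share(values, rng):
    """Split values into two additive shares; each share alone looks random."""
    secret = encode(values)
    share0 = rng.integers(0, U64_MAX, size=secret.shape,
                          dtype=np.uint64, endpoint=True)
    share1 = secret - share0  # uint64 array arithmetic wraps modulo 2^64
    return share0, share1


def reconstruct(share0, share1):
    """Recombine both shares into the original values."""
    return decode(share0 + share1)


rng = np.random.default_rng(0)
a = np.array([[1.5, -2.25], [3.0, 0.0]])
b = np.array([[0.5, 4.0], [-1.0, 2.5]])
a0, a1 = share(a, rng)
b0, b1 = share(b, rng)

# Addition can be done locally: each party adds the shares it holds.
sum_plain = reconstruct(a0 + b0, a1 + b1)
```

Each party would hold only one of the two shares, so neither learns the values on its own. Addition works share by share, whereas multiplication and comparison need interactive protocols, which this sketch does not cover.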
Key Features
- Operator Overloading: `SecureArray` supports operator overloading, which allows users to perform arithmetic and logical operations using standard Python operators. This means you can use operators like `+` for addition, `-` for subtraction, `*` for multiplication, `/` for division, and more, directly on `SecureArray` instances.
  - Arithmetic Operations: You can perform secure addition, subtraction, multiplication, and division between `SecureArray` instances or between a `SecureArray` and a `numpy.ndarray`.
  - Comparison Operations: You can also perform secure comparison operations such as `==`, `!=`, `<`, `<=`, `>`, and `>=`, enabling secure conditional logic without revealing the underlying data.
  - Element-wise Operations: All these operations are applied element-wise, similar to how they work in NumPy, ensuring a consistent and intuitive experience.
- Attributes: Array attributes reflect information intrinsic to the array itself. They can be accessed, and occasionally modified, without creating a new array.
  - Shape: The `shape` attribute allows users to query the shape of the `SecureArray`. This is particularly useful for understanding the structure of the data and for performing operations that require specific shapes.
  - Number of Dimensions (ndim): The `ndim` attribute provides the number of dimensions (axes) of the `SecureArray`.
  - Size: The `size` attribute returns the total number of elements in the `SecureArray`.
  - Data Type (dtype): The `dtype` attribute specifies the data type of the elements stored in the `SecureArray`, such as `np.float64` and `np.bool_`.
- Methods: `SecureArray` provides some useful methods for operating on an array. All the methods return an array result. Common methods are as follows.
  - Shape manipulation: `SecureArray` supports common shape manipulation methods such as `reshape` and `transpose`. The `reshape` method changes the shape of the `SecureArray` and returns a new array. The `transpose` method returns a new array with axes transposed. These operations are essential for preparing data for algorithms that require inputs of a certain shape or for mathematical computations that require rearranging dimensions.
  - Array conversion: In contrast to NumPy, SecureNumpy is a two-party scientific computing library. Each party holds a share of the data rather than the original data. At some point, you may want to perform an operation on this share, such as saving it to a file. Therefore, we implemented the `to_share` method, which transforms the local share into a `numpy.ndarray` for subsequent user operations. We also provide the `fromshare` function for restoring the shares of both parties to a `SecureArray`.
  - Reveal: After a series of ciphertext computations, the `reveal_to` method allows you to reveal the result to one of the parties. It restores the `SecureArray` to a `numpy.ndarray` and transmits it to one of the parties, while the other party receives a null value.
- Indexing: `SecureArray` can be indexed using the standard Python `arr[obj]` syntax. Basic indexing is available, which includes single element indexing and slicing.
  - Single element indexing: A single index can be used to access individual elements of a `SecureArray`. Indexing a 2-D array with a single index returns a 1-D array. Negative indices are also supported. If `x` is a 2-D array, you can use `x[0]`, `x[2]`, `x[0, 2]`, and `x[0][2]` to get different objects.
  - Slicing: A slice `obj` is constructed as `start:stop:step`. `SecureArray` supports slice indexing, but `step` must equal 1. If `x` is a 2-D array, you can use `x[:2]`, `x[2:]`, and `x[0:, 2:]` to get different objects.
Routines
One of the main reasons for the widespread use of NumPy is its extensive library of functions. These functions cover a wide range of capabilities, from basic numerical arithmetic to advanced matrix operations. Whether you are performing simple element-wise operations or complex linear algebra computations, NumPy's optimized C and Fortran code ensures that these operations are performed with maximum efficiency.
For example, NumPy supports a variety of mathematical operations including trigonometric, statistical, and algebraic functions. It also provides powerful capabilities for handling arrays. These features make it an indispensable tool for data analysis, scientific computing, and machine learning.
To boost usability, SecureNumpy also offers a series of practical routines, including `reshape`, `stack`, `sum`, `argmax`, and more. These routines, grouped by functionality, offer an experience similar to using NumPy, making it easy for users to transition between the two libraries. SecureNumpy ensures that the computations are performed securely and efficiently, adhering to best practices in secure computing.
The following are the main modules and supported functions in SecureNumpy:
| Module | Description |
| --- | --- |
| Array creation | Methods to create arrays |
| Array manipulation | Change array shape, transpose an array, and join arrays |
| Mathematical functions | Some arithmetic operations; exponents and logarithms will be provided in the future |
| Linear algebra | Some matrix and vector product functions |
| Sorting, searching and counting | Some sort and search functions |
| Statistics | Statistical functions to calculate order statistics, averages, and variances |
Usage methodology
Setting up the virtual machine
SecureNumpy is built on PETAce Duet. To use SecureNumpy, the first step is to initialize a `Duet` virtual machine (VM). Each party needs to specify its own party ID, which plays a significant role in establishing and securing network connections. Once this is done, the two parties can communicate using the specified IP address and port.
Since this is a two-party computing library, you must run the following program on two distinct machines or in two distinct processes, passing your party ID (0 or 1) as a command-line argument.
import sys

from petace.network import NetParams, NetScheme, NetFactory
from petace.duet import VM

# The party ID (0 or 1) arrives as a string; convert it to an int so that
# the comparison below works as intended.
party_id = int(sys.argv[1])
host = "127.0.0.1"
port0 = 8090
port1 = 8091

# Each party listens on its own port and connects to the other party's port
net_params = NetParams()
if party_id == 0:
    net_params.remote_addr = host
    net_params.remote_port = port1
    net_params.local_port = port0
else:
    net_params.remote_addr = host
    net_params.remote_port = port0
    net_params.local_port = port1

# init net and mpc engine
net = NetFactory.get_instance().build(NetScheme.SOCKET, net_params)
vm = VM(net, party_id)
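The command-line handling and port assignment in the setup script can be isolated into two small helpers. This is a sketch using only the standard library; the function names `parse_party_id` and `port_plan` are hypothetical and not part of PETAce. Validating the argument up front avoids starting a party with an ID other than 0 or 1.

```python
import argparse


def parse_party_id(argv):
    """Parse and validate the party ID (0 or 1) from command-line arguments."""
    parser = argparse.ArgumentParser(description="Two-party setup helper")
    parser.add_argument("party_id", type=int, choices=[0, 1])
    return parser.parse_args(argv).party_id


def port_plan(party_id, port0=8090, port1=8091):
    """Return (local_port, remote_port); each party listens on its own port."""
    return (port0, port1) if party_id == 0 else (port1, port0)


party_id = parse_party_id(["0"])  # in a script: parse_party_id(sys.argv[1:])
local_port, remote_port = port_plan(party_id)
```

In a real run, each party would start the same script in its own process or terminal with a different ID, so that the two port plans mirror each other.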
Create SecureArray and conduct basic operations
In a two-party computing scenario, one party provides the original plaintext data to secret-share, and the other only needs to pass `None`. The `array` function shares plaintext securely and converts it into a `SecureArray` object. To indicate where the plaintext data originated, the user must supply a party ID for each call.
Once data is converted into a `SecureArray`, it can be treated like a local `numpy.ndarray`. We've overloaded all basic operations, enabling users to freely perform mathematical operations on plaintext and ciphertext. The module also provides common attributes for debugging. These attributes only expose metadata, such as shape and number of dimensions, and do not reveal the original data.
Once all the MPC calculations are complete, the final result can be restored to a party using the `reveal_to` method. In the example code, the final result is revealed to party 0, which receives the plaintext result, while party 1 receives `None`.
In the example code, the original data is visible to anyone reading the code. In practice, the original data should not be created in code but imported by the user. For example, you can load the data using `plain_data0 = np.load("/path/data.npy")`, which only reveals that you provided some data, not the data itself. The data in this example is hard-coded solely to make the example easier to understand.
Currently, our module only supports data sources that are `numpy.ndarray` objects with a data type of `float64` or `bool` and at most two dimensions. However, we plan to add support for more data types in the future.
import petace.securenumpy as snp
import numpy as np

snp.set_vm(vm)

# Each party supplies only its own plaintext; the other input is None
if party_id == 0:
    plain_data0 = np.array([1., 2., 3.])
    plain_data1 = None
else:
    plain_data0 = None
    plain_data1 = np.array([4., 5., 6.])

# transform a numpy.ndarray to a SecureArray
cipher_data0 = snp.array(plain_data0, 0)
cipher_data1 = snp.array(plain_data1, 1)

# some basic operations
res1 = cipher_data0 + cipher_data1
print(res1.shape)  # (3,)
print(res1.dtype)  # np.float64
res2 = cipher_data0 * 2
res3 = cipher_data0 / cipher_data1[0]
res4 = cipher_data0 > cipher_data1
print(res4.dtype)  # np.bool_

# reveal data
res_plain = res1.reveal_to(0)
Advanced features
SecureNumpy has a comprehensive library of functions designed to enhance the usability of the application. These functions mirror the native functions in the NumPy package, allowing users to seamlessly incorporate them into their workflow. The following example code demonstrates how to use some of the basic functions provided by SecureNumpy, such as creating special arrays, modifying array dimensions, summing array elements, and identifying maximum values.
# create data
data0 = snp.arange(20).reshape((4, 5))
data1 = snp.ones((4, 4))
# manipulation
data = snp.concatenate([data0, data1], axis=1)
print(data.shape)  # (4, 9)
data_reshape = snp.reshape(data, (6, 6))
# some math function and statistic
res_sum = snp.sum(data)
res_max = snp.max(data, axis=0)
res_argmax = snp.argmax(data, axis=1)
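Because these routines mirror NumPy, a plaintext run of the same calls is a convenient way to check the shape bookkeeping before moving to ciphertext. The sketch below uses plain NumPy only; it assumes, as the text states, that the SecureNumpy functions follow the same shape rules, and it makes no claim about numerical precision under MPC.

```python
import numpy as np

# create data
data0 = np.arange(20).reshape((4, 5))
data1 = np.ones((4, 4))

# manipulation: joining along axis=1 requires matching row counts (4 and 4)
data = np.concatenate([data0, data1], axis=1)  # shape (4, 9)
data_reshape = np.reshape(data, (6, 6))        # 4 * 9 == 6 * 6 == 36 elements

# math and statistics
res_sum = np.sum(data)               # scalar: sum over all 36 elements
res_max = np.max(data, axis=0)       # one maximum per column, shape (9,)
res_argmax = np.argmax(data, axis=1) # index of the maximum per row, shape (4,)
```

The reshape only succeeds because the element count is preserved, which is worth verifying in plaintext since shape errors are harder to debug once data is secret-shared.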
SecureNumpy's extensive library enables users to build their own functions, even without a thorough knowledge of cryptography. As shown below, users can implement ReLU-style activation functions by taking advantage of the functions we offer. This is a major driving force behind SecureNumpy's development. We strongly believe that all users, regardless of their cryptography skills, can be part of SecureNumpy and contribute effectively to its development.
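def ReLU(x):
    # Element-wise max(0, x), written as a select between x and zeros
    return snp.where(x > 0, x, snp.zeros(x.shape))

def PReLU(x, a):
    # Negative inputs are scaled by a; non-negative inputs pass through
    return snp.where(x < 0, a * x, x)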
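A plaintext NumPy version of the two activations is a quick way to confirm the expected outputs. In this sketch ReLU is expressed as an element-wise select, which is one possible formulation chosen for illustration and not necessarily the form the library recommends; the lowercase names `relu` and `prelu` and the sample inputs are made up.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x), written as a select between x and zeros."""
    return np.where(x > 0, x, np.zeros(x.shape))


def prelu(x, a):
    """Scale negative inputs by a; pass non-negative inputs through."""
    return np.where(x < 0, a * x, x)


x = np.array([-2.0, -0.5, 0.0, 1.5])
relu_out = relu(x)
prelu_out = prelu(x, 0.1)
```

Both functions evaluate every element the same way regardless of its sign, which is the property that makes the select-based form suitable for data that must stay hidden.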
Use cases
To clarify how SecureNumpy applies to various scenarios, here are a few hypothetical examples.
Scenario 1: Multi-party data analysis
Imagine two affiliates: Company A, which specializes in selling high-end luxury goods, and Company B, which specializes in private banking. Each affiliate maintains a unique database, with Company A holding data on its customers' purchasing habits and financial profiles, while Company B holds data on its customers' banking history and other financial data. In an effort to more precisely identify and tailor wealth management solutions for high-net-worth customers, Company B wishes to integrate these two datasets for a more comprehensive understanding of its clients' financial situations.
Company A has the customers' luxury purchase history in the data table `data0`, which includes two columns: purchase frequency and purchase amount. Company B has the customers' financial asset data `data1`, which includes two columns: account balance and wealth management account amount. High-net-worth customers (HNWC) are defined as follows: purchase frequency greater than 2, purchase amount greater than 5000, and bank account balance greater than 5,000,000. This logic can be expressed as below:
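# Secure comparison of each column against its threshold, combined with &
cond = (data0[:, 0] > 2) & (data0[:, 1] > 5000) & (data1[:, 0] > 5000000)
# Convert the boolean mask to 0/1 floats so it can be used in arithmetic
cond_float = snp.where(cond, snp.ones(cond.shape), snp.zeros(cond.shape))
hnwc_number = snp.sum(cond_float)
# Total wealth management amount held by high-net-worth customers
hnwc_cost = snp.sum(cond_float * data1[:, 1])
average_cost = hnwc_cost / hnwc_number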
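The same logic can be traced in plaintext NumPy on a few invented rows to see what each intermediate value means. All numbers below are hypothetical and exist only to exercise the three thresholds; in the real setting neither company would see the other's columns.

```python
import numpy as np

# Hypothetical data, one row per customer (values invented for illustration)
data0 = np.array([[5, 12000.0],    # columns: purchase frequency, purchase amount
                  [1, 800.0],
                  [3, 6000.0],
                  [4, 9000.0]])
data1 = np.array([[8e6, 2e6],      # columns: account balance, wealth mgmt amount
                  [9e6, 5e5],
                  [1e6, 1e5],
                  [6e6, 1e6]])

cond = (data0[:, 0] > 2) & (data0[:, 1] > 5000) & (data1[:, 0] > 5000000)
cond_float = np.where(cond, np.ones(cond.shape), np.zeros(cond.shape))

hnwc_number = np.sum(cond_float)                 # how many customers qualify
hnwc_cost = np.sum(cond_float * data1[:, 1])     # their total wealth mgmt amount
average_cost = hnwc_cost / hnwc_number
```

Multiplying by the 0/1 mask zeroes out customers who do not qualify, so the sum never needs to branch on secret data. Note that the final division is undefined when no customer qualifies, a case an application would need to handle separately.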
Through multi-party data analysis, Company B gains insights into the investment preferences of high-net-worth customers, enabling the creation of customized marketing strategies specifically designed to cater to their needs and preferences. This process not only improves customer satisfaction but also strengthens their brand loyalty. Additionally, SecureNumpy ensures the privacy and security of data during the analysis process, providing enterprises with a reliable solution that allows for both data sharing and joint analysis.
Scenario 2: Privacy-preserving machine learning
Privacy-preserving machine learning is a complex field that requires a comprehensive understanding and careful consideration of multiple aspects.
- Multi-party computing protocols: You need to understand and implement complex multi-party computing protocols that ensure that participants do not expose their data during the computation process.
- Design of basic operators: You need to design and implement basic operators based on multi-party computing protocols, such as matrix multiplication, addition, and more. These operators need to perform efficient calculations while preserving data privacy.
- Data encryption and decryption: To ensure the security of data during transmission and computing, it is necessary to encrypt and decrypt the data.
- Performance optimization: Multi-party computing typically increases computing and communication overhead, so performance optimization is required to ensure computational efficiency.
Using SecureNumpy makes it easy to implement privacy-preserving machine learning. It hides the complexity of multi-party computing (MPC), so that even developers who are new to MPC technology can get started. Here are the main advantages of SecureNumpy:
- Simplified interface: SecureNumpy provides an interface similar to NumPy, allowing developers to work in a familiar environment. There is no need to learn complex multi-party computing protocols.
- Built-in privacy protection: SecureNumpy implements multi-party computation protocols and basic operators in its lower layers, so that data is always encrypted during computing. Developers do not need to manage the underlying encryption and decryption operations.
- Efficient computing: The optimized multi-party computing algorithms keep computation efficient and reduce the performance overhead found in traditional multi-party computing.
- Easy to integrate: Existing NumPy-based code can be converted to a privacy-preserving version with only a few modifications. SecureNumpy is designed to make this conversion process very simple.
- Rich function library: SecureNumpy mirrors the rich function library of NumPy and is optimized for multi-party computation, providing powerful numerical computation and matrix operation functions.
For example, consider that you already have a linear regression model based on a NumPy implementation, and now you want to implement the same model in a multi-party computing environment to protect the data privacy of all parties. With SecureNumpy, you can achieve this with only a few modifications to the original code.
import numpy as np


class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.theta = None

    def fit(self, X, y):
        # Add bias term (intercept) to X
        X_b = np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)
        # Number of training samples and features
        m, n = X_b.shape
        # Initialize weights (theta) to zeros
        self.theta = np.zeros(n)
        # Gradient Descent
        for _ in range(self.n_iterations):
            # Gradient of the mean squared error: (1/m) * X^T (X theta - y)
            gradients = (1 / m) * np.dot(X_b.T, np.dot(X_b, self.theta) - y)
            # Step against the gradient
            self.theta -= self.learning_rate * gradients

    def predict(self, X):
        # Add bias term (intercept) to X
        X_b = np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)
        return np.dot(X_b, self.theta)

    def get_params(self):
        return self.theta
As the code above illustrates, the MPC version of linear regression can be implemented by replacing the first line of code with `import petace.securenumpy as np`. This ensures a privacy-preserving linear regression process.
Example code can be found in the PETAce GitHub repository.
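Before switching the import, it can help to confirm in plaintext that the gradient-descent update behaves as expected. The sketch below is a standalone NumPy function (the name `fit_linear`, the synthetic data, and the hyperparameters are assumptions made for this example) that applies the update theta <- theta - learning_rate * (1/m) * X^T (X theta - y) and recovers known coefficients from noiseless data.

```python
import numpy as np


def fit_linear(X, y, learning_rate=0.1, n_iterations=2000):
    """Fit y ~ X_b @ theta by batch gradient descent on the mean squared error."""
    # Add bias term (intercept) to X
    X_b = np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)
    m, n = X_b.shape
    theta = np.zeros(n)
    for _ in range(n_iterations):
        gradients = (1 / m) * np.dot(X_b.T, np.dot(X_b, theta) - y)
        theta = theta - learning_rate * gradients
    return theta


# Synthetic, noiseless data with known coefficients (intercept 3, slopes 2 and -1)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

theta = fit_linear(X, y)
```

The loop uses only concatenation, matrix products, subtraction, and scalar multiplication, which is why the text can claim that a NumPy-compatible secure library needs few code changes. Under MPC each iteration costs communication, so the iteration count would likely need tuning.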
Limitations
Although SecureNumpy is powerful, it still has certain limitations.
Unable to implement `if`
Using a ciphertext bool in an `if` statement results in a runtime error. Without this check, the `if` statement would treat the ciphertext bool as a non-empty object and always evaluate it to True, regardless of its actual value. Encrypted data is highly sensitive, and if the `if` statement were allowed to operate on ciphertext, malicious actors could exploit the short-circuit effect to gain unauthorized access to information. For instance, the plaintext value of a binary variable could easily be deduced through a simple `if` statement.
if cond:
    res = a
else:
    res = b
While branching directly on a ciphertext condition is not feasible, we offer the `where` function to assist users in selecting branches. The above code can be rewritten as `res = snp.where(cond, a, b)`, which achieves the intended result and offers additional security.
For more complex conditions, corresponding conversion strategies can be applied. For example, the following code snippet can be translated into the equivalent `res = snp.where(cond1, a, snp.where(cond2, b, c))`.
if cond1:
    res = a
elif cond2:
    res = b
else:
    res = c
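One way to see why a select can replace a branch is to write it as arithmetic: with the condition encoded as 0 or 1, `cond * a + (1 - cond) * b` picks `a` or `b` without ever testing the condition. The plaintext sketch below illustrates that idea; the helper name `select` is hypothetical, and whether SecureNumpy's `where` is implemented this way internally is an assumption.

```python
import numpy as np


def select(cond, a, b):
    """Branch-free choice: cond * a + (1 - cond) * b, with cond as 0/1 floats."""
    c = cond.astype(np.float64)
    return c * a + (1.0 - c) * b


cond1 = np.array([True, False, False])
cond2 = np.array([True, True, False])
a = np.array([1.0, 1.0, 1.0])
b = np.array([2.0, 2.0, 2.0])
c = np.array([3.0, 3.0, 3.0])

# if / elif / else expressed as a nested select
nested = select(cond1, a, select(cond2, b, c))
reference = np.where(cond1, a, np.where(cond2, b, c))
```

Both branches are always computed, so the amount of work does not depend on the secret condition. That is the price of hiding which branch was taken.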
Restricted indexing capabilities
We do not currently support ciphertext indexing. Instead, we support plaintext indexing with comprehensive basic indexing capabilities that include integer indexing and slice indexing (which must be contiguous). For example, the slice range specified by `start:end:step` must adhere to a step size of 1.
Furthermore, we do not currently support advanced indexing mechanisms such as integer array indexes or boolean indexes. Here's an example of our indexing capabilities:
# Good cases
arr[0]
arr[0][2]
arr[0, 2]
arr[:2]
arr[:1]
arr[:3, :4]
# Bad cases
arr[[0, 1, 3]]
arr[1:10:2]
arr[np.array([1, 2, 3])]
arr[np.array([True, False, True])]
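The rules above can be captured in a small checker that classifies an index object before it is used. This is a hypothetical helper written for illustration (`is_supported_index` is not a SecureNumpy function); it encodes only what the text states: integers and step-1 slices are accepted, while lists, arrays, and strided slices are not.

```python
import numpy as np


def is_supported_index(index):
    """Return True for integer and step-1 slice indexing (hypothetical check)."""
    items = index if isinstance(index, tuple) else (index,)
    for item in items:
        if isinstance(item, (bool, np.bool_)):
            return False  # booleans are ints in Python, but not plain indices
        if isinstance(item, (int, np.integer)):
            continue
        if isinstance(item, slice) and item.step in (None, 1):
            continue
        return False  # lists, arrays, strided slices, ...
    return True


# np.s_ builds the same index objects that arr[...] would receive
good = [0, (0, 2), np.s_[:2], np.s_[:3, :4]]
bad = [[0, 1, 3], np.s_[1:10:2], np.array([1, 2, 3]),
       np.array([True, False, True])]
```

A check like this could run on plaintext index objects before any secure operation starts, so unsupported patterns fail early with a clear message.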
Dimensional limitation
Currently, SecureNumpy supports arrays of at most two dimensions due to practicality and efficiency considerations. Ciphertext scalars are treated simply as zero-dimensional arrays.
Type restriction
The `array` function is mainly used for secret-sharing plaintext data and converting it into a `SecureArray`. The input must meet the following requirements: first, it must be a `numpy.ndarray`; second, the data must be of type `float64` or `bool`.
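A plaintext pre-check can catch inputs that violate these restrictions before they reach the secure layer. The helper below is a hypothetical sketch (`check_plaintext_input` is not part of SecureNumpy) that enforces the documented type requirement together with the two-dimension limit described earlier.

```python
import numpy as np

ALLOWED_DTYPES = (np.dtype(np.float64), np.dtype(np.bool_))


def check_plaintext_input(data):
    """Validate data against the documented input restrictions."""
    if not isinstance(data, np.ndarray):
        raise TypeError("input must be a numpy.ndarray")
    if data.dtype not in ALLOWED_DTYPES:
        raise TypeError(f"unsupported dtype {data.dtype}; use float64 or bool")
    if data.ndim > 2:
        raise ValueError(f"at most 2 dimensions are supported, got {data.ndim}")
    return data


# Integer data can be converted explicitly before sharing
counts = np.array([1, 2, 3])
checked = check_plaintext_input(counts.astype(np.float64))
```

Converting with `astype(np.float64)` is ordinary NumPy; whether very large integers survive the conversion exactly is a separate precision question that this check does not address.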
No cross-type operations
NumPy supports cross-type operations, so boolean arrays can be combined directly with floating-point arrays. In that case, boolean arrays are treated as 0/1 arrays. SecureNumpy does not yet support operations between boolean and floating-point arrays. However, we are committed to addressing this issue in a future release.
If you need this functionality, we recommend manually converting the boolean array to a floating-point array before performing subsequent calculations.
arr_float = snp.where(arr_bool, snp.ones(arr_bool.shape), snp.zeros(arr_bool.shape))
Roadmap
We outline four important future directions for development.
- Capabilities expansion: SecureNumpy will continue to extend its library functionality to cater to broader scientific computing demands. Forthcoming features include the following:
  - Adding new data types: Beyond float and bool, an int type and conversion interfaces will be incorporated.
  - Augmenting the function library: The existing 30+ functions are being expanded for enhanced utility.
- Enhanced computational precision: While prioritizing data privacy, SecureNumpy strives to boost computational precision, particularly for high-precision tasks. Key enhancements include the following:
  - Numeric stability: Algorithmic optimization to reduce rounding errors and loss of precision.
  - High-precision data types support: Support for high-precision floating-point and fixed-point numbers, catering to users' needs in high-precision scenarios.
  - Rigorous verification and testing: Comprehensive testing to ensure that new features and improvements introduce no additional errors or instabilities.
- Optimized computation performance: To meet the demands of large-scale data analysis and machine learning, SecureNumpy will continually optimize computation performance. Measures include the following:
  - Parallel computing support: Harnessing multithreading and multiprocessing technologies for increased efficiency and reduced processing time.
  - Memory optimization: Streamlining memory management to reduce memory usage and improve the handling of large datasets.
  - Algorithm refinement: Improving current algorithms with more efficient data structures and computations.
- User-friendly documentation and tutorials: Open-source projects should provide intuitive documentation and tutorials to facilitate user adoption, understanding, and effective utilization. We plan to deliver the following:
  - Thorough documentation: Comprehensive API documentation and usage guides, from basic operations to advanced applications, to help users learn SecureNumpy quickly.
  - Diverse sample code: Sample code covering various practical application scenarios, to help users understand and apply SecureNumpy functionality.
In summary, SecureNumpy's future roadmap revolves around enhancing capabilities, precision, performance, and documentation. By continuously expanding functionality, boosting precision, and refining performance, we aspire to create a powerful and efficient privacy-preserving computing tool for our users. Whether you are a data scientist, machine learning engineer, or an industry user requiring multi-party computation, SecureNumpy aims to be a reliable and effective solution.