Getting Started
This guide shows you how to access and use the Virtual Compute Environment (VCE), a secure space that allows you to query and analyze public data.
How does the Virtual Compute Environment work?
The VCE allows you to access and analyze TikTok's public data in two stages. These stages are meant to protect user privacy and help organize your data analysis.
- Test Stage: Query the data using TikTok's query software development kit (SDK). The VCE will return random sample data based on your query, limited to 5,000 records per day.
- Execution Stage: Submit a script to execute against all public data. TikTok provides a powerful search capability that allows data to be paginated in increments of up to 100,000 records. TikTok will review the results file to make sure the output is aggregated.
Note: TikTok reviews the results only to ensure that no information identifying individuals is extracted from the platform. All aggregated results will be sent as a download link to the approved primary researcher's email address.
View your client registration
Once your application is approved, a research client will be generated for your project. You can view your approved research projects on your Research projects page. Select a project from the list to view the research client details.
The provided Client key and Client secret are required to access the VCE. The client key and secret are hidden by default but can be displayed by clicking the Display button (eye icon).
Note: The client secret is a credential used to authenticate your connection to TikTok's Research Tools. Do not share this with anyone!
Log in to the Virtual Compute Environment
First, go to the Virtual Compute Environment login page.
Then, sign in using your Client key as the Username and your Client secret as the Password.
Use the Virtual Compute Environment
Test Stage: Query TikTok's public data
In the Test Stage, you submit a query to access random sample public data about videos, comments, and users. You can retrieve up to 5,000 records per day.
Note: If you want to analyze all public user data, you must submit a script to the VCE, as explained later in this guide.
Install query SDK
After logging into the VCE, click the New Launcher [+] button. Then open a new notebook, choosing the Python 3 (ipykernel) option.
Copy and paste the following code into a notebook cell, then run the cell to install the query SDK from TikTok.
!pip install \
-U --index-url https://us-west2-python.pkg.dev/research-platform-prod/jupyterlab-extensions-prod/simple/ \
pyrqs
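To confirm that the install succeeded before writing a query, you can check whether the package is importable from the notebook kernel. The sketch below uses only the Python standard library. The helper name is_installed is hypothetical and is not part of the SDK.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found by the current kernel."""
    return importlib.util.find_spec(package_name) is not None


# "pyrqs" is the package name used in the pip command in this guide.
if is_installed("pyrqs"):
    print("pyrqs is available")
else:
    print("pyrqs was not found; try re-running the install cell")
```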
Set query parameters
You must structure your query according to the following guidelines. Use the query structure example code below as a framework for formatting your query.
Data category
Indicate what category of data you want to query. The available data categories are described in the respective reference pages.
Condition groups
Create your query condition_groups using the listed field names, operations, and boolean operators.
Field names
The following are the field_name values:
keyword
create_time
display_name
region_code
id
video_description
hashtag_name
music_id
like_count
comment_count
share_count
view_count
effect_ids
hashtag_names
playlist_id
voice_to_text
duration_type
video_length
Operations
The following are the operation values:
- IN: Tests if an expression matches any value in a list of values
- EQ: Tests if an expression matches the specified value
- GT: Tests if an expression is strictly greater than the specified value
- GTE: Tests if an expression is greater than or equal to the specified value
- LT: Tests if an expression is strictly less than the specified value
- LTE: Tests if an expression is less than or equal to the specified value
- LIKE: Available for video_description, returns the rows if it contains a specified value
- CONTAINS: Available for effect_ids and hashtag_names, returns the rows if they contain the specified effect_ids or hashtag_names
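The field restrictions on LIKE and CONTAINS can be checked locally before a query is sent. This is a minimal sketch under stated assumptions: the helper check_condition is hypothetical, and the dictionary keys (field, operator, field_values) follow the query structure example in this guide rather than a documented schema. Operation names are uppercased before comparison because the example writes them in lowercase; whether the service accepts both cases is not stated here.

```python
# Operations listed in this guide.
OPERATIONS = {"IN", "EQ", "GT", "GTE", "LT", "LTE", "LIKE", "CONTAINS"}

# Operations that this guide restricts to specific fields.
RESTRICTED = {
    "LIKE": {"video_description"},
    "CONTAINS": {"effect_ids", "hashtag_names"},
}


def check_condition(condition: dict) -> list:
    """Return a list of problems found in one condition (empty if none)."""
    problems = []
    operation = condition.get("operator", "").upper()
    field = condition.get("field", "")
    if operation not in OPERATIONS:
        problems.append(f"unknown operation: {operation!r}")
    allowed_fields = RESTRICTED.get(operation)
    if allowed_fields is not None and field not in allowed_fields:
        problems.append(f"{operation} is not available for {field!r}")
    if not condition.get("field_values"):
        problems.append("field_values is empty")
    return problems


ok = {"field": "video_description", "operator": "like", "field_values": ["cats"]}
bad = {"field": "like_count", "operator": "contains", "field_values": ["10"]}
print(check_condition(ok))
print(check_condition(bad))
```

A check like this only catches mistakes that the guide itself describes; the query service remains the authority on what is accepted.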
Boolean operators
Conditions are grouped by the following boolean operators:
- AND: Displays a record if all the conditions separated by AND are TRUE
- OR: Displays a record if any of the conditions separated by OR is TRUE
- NOT: Displays a record if all the conditions separated by NOT are FALSE
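To make the three operators concrete, the sketch below combines per-condition results in plain Python, following the descriptions above. It is a local illustration only and does not call the query service; the function name combine is hypothetical.

```python
def combine(operator: str, results: list) -> bool:
    """Combine per-condition results as the operator descriptions read."""
    operator = operator.upper()
    if operator == "AND":
        return all(results)      # every condition is TRUE
    if operator == "OR":
        return any(results)      # at least one condition is TRUE
    if operator == "NOT":
        return not any(results)  # every condition is FALSE
    raise ValueError(f"unknown boolean operator: {operator}")


# Hypothetical outcome of two conditions evaluated against one record.
results = [True, False]
for op in ("AND", "OR", "NOT"):
    print(op, combine(op, results))
```

With one condition true and one false, only OR would display the record.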
Fields, limit, and client
Specify the fields to be returned in the query results, and a limit indicating the maximum number of records to return. Create a client, such as RQSClient, to interact with the query service.
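The fields value in the example in this guide is a single comma-separated string, so it can be convenient to build it from a list and to sanity-check the limit first. The helpers build_fields and check_limit below are hypothetical. Using the Test Stage daily cap of 5,000 records as the upper bound for a single query is an assumption, not a documented per-query limit.

```python
TEST_STAGE_DAILY_CAP = 5000  # daily record cap stated for the Test Stage


def build_fields(names: list) -> str:
    """Join field names into one comma-separated string."""
    cleaned = [name.strip() for name in names if name.strip()]
    if not cleaned:
        raise ValueError("at least one field is required")
    return ",".join(cleaned)


def check_limit(limit: int) -> int:
    """Reject limits that are non-positive or above the assumed cap."""
    if not 1 <= limit <= TEST_STAGE_DAILY_CAP:
        raise ValueError(f"limit must be between 1 and {TEST_STAGE_DAILY_CAP}")
    return limit


fields = build_fields(["display_name", "video_description", "create_time", "id"])
limit = check_limit(10)
print(fields, limit)
```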
Query structure example
Below is a complete sample command that can be executed in the VCE. This example defines a data variable and prints the data received by the query to display it on the VCE.
Example code
from pyrqs import rqs

# Data category to query
category = 'video'

# One condition group: videos with at least 10 likes
condition_groups = [
    {
        "operator": "and",
        "conditions": [
            {
                "field": "like_count",
                "operator": "gte",
                "field_values": ["10"]
            }
        ]
    }
]

# Fields to return, and the maximum number of records
fields = 'display_name,video_description,create_time,id'
limit = 10

# Create the client and run the query
client = rqs.RQSClient()
data = client.query(
    category=category, condition_groups=condition_groups, fields=fields, limit=limit)
print(data)
The data returned by this sample code should be displayed in the notebook output on the VCE.
Execution Stage: Submit script to analyze data
After you have queried the data, you can run a script to analyze TikTok's public data.
To submit a script to the VCE, do the following:
- Click the shield icon on the right sidebar.
- Select your script file in the right sidebar, then click the upload button.
- When prompted to submit a job to Data Clean Room, click the OK button.
- Once submitted, your script will run in a trusted execution environment to analyze and prepare the results file.
- TikTok will review the results to verify that results are aggregated.
- After the results are verified, TikTok will send an email to the primary researcher to download the approved results file.
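Because the review checks that results are aggregated, an analysis script should produce summaries rather than individual records. The sketch below illustrates the idea with made-up records. How a submitted script actually reads data inside the VCE is not covered in this guide, so the records list and the aggregate_by_region helper are purely hypothetical.

```python
from collections import defaultdict


def aggregate_by_region(records: list) -> dict:
    """Summarize records per region_code: video count and mean like_count."""
    totals = defaultdict(lambda: {"videos": 0, "likes": 0})
    for record in records:
        bucket = totals[record["region_code"]]
        bucket["videos"] += 1
        bucket["likes"] += record["like_count"]
    return {
        region: {"videos": t["videos"], "mean_likes": t["likes"] / t["videos"]}
        for region, t in totals.items()
    }


# Made-up records for illustration only; not real TikTok data.
records = [
    {"region_code": "US", "like_count": 10},
    {"region_code": "US", "like_count": 30},
    {"region_code": "CA", "like_count": 5},
]
summary = aggregate_by_region(records)
print(summary)
```

The output holds only per-region counts and averages, with no video IDs or user names, which is the kind of result the review step is described as looking for.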