This section outlines the entire data generation process, covering the data sources, the data processing workflows and annotation methods, and explaining the label categories included in the dataset and how they are defined.
Data collection methods
Figure 1 illustrates the three data sources of BINS used in this study and their corresponding preprocessing workflows. The data sources include:

The three data sources together with their preprocessing and annotation workflows. All three data sources undergo corresponding annotation processing, with an annotation example provided in the lower-right corner.
Data source 1
As shown in Fig. 1, the first source of data comes from real-world operational data collected during intent-based network operations, provided by Sichuan Telecom, Sichuan, China. This data was monitored and recorded within the intent-based network and covers network demands and performance metrics across various business scenarios, totaling 10,001 records. The data metrics include service name, service type, start and end times, and network performance requirements. Based on these metrics, the data can be transformed into corresponding expressions of network intent. For instance, a record with a service name “web browsing”, service type “network service”, start time “20201230”, end time “20201231”, and a performance requirement of “latency ≤200 ms” is translated into a user intent expression such as: “Ensure that the latency for web browsing is ≤200 ms from 20201230 to 20201231” or “Maintain web browsing latency consistently ≤200 ms”. Through such transformations, irrelevant data was filtered out, yielding over 14,000 intent expressions from real business scenarios as part of the BINS dataset.
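To make this transformation concrete, the sketch below shows how one monitored record could be mapped to intent expressions with simple templates. It is illustrative only: the field names, templates, and helper function are assumptions, not the exact conversion tooling used for BINS.

```python
# Illustrative sketch (not the authors' exact pipeline): converting a raw
# operational record into natural-language intent expressions via templates.
# Field names and templates are assumptions for illustration only.

def record_to_intents(record: dict) -> list[str]:
    """Turn one monitored record into one or more intent expressions."""
    service = record["service_name"]           # e.g. "web browsing"
    metric, bound = record["performance"]      # e.g. ("latency", "200 ms")
    start, end = record["start_time"], record["end_time"]
    return [
        f"Ensure that the {metric} for {service} is ≤{bound} from {start} to {end}",
        f"Maintain {service} {metric} consistently ≤{bound}",
    ]

example = {
    "service_name": "web browsing",
    "service_type": "network service",
    "start_time": "20201230",
    "end_time": "20201231",
    "performance": ("latency", "200 ms"),
}
print(record_to_intents(example))
```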
Data source 2
As illustrated in Fig. 1, some data was manually generated by network engineers based on their practical work experience. These intent data samples were derived from common scenarios in network configuration, troubleshooting, and business requests. Since these network engineers possess specialized knowledge, the intents they created are more detailed than those expressed by end users, often including more specific network parameters, such as bandwidth and latency, which are particularly beneficial for building intent-based network systems.
Volunteers were invited to simulate user network requests, generating user intent data through interaction with a voice assistant system. This process was primarily facilitated through a voice interface, where volunteers engaged with pre-defined task scenarios, such as network configuration modifications, business requests, and performance optimization, using voice inputs. The user interface of the voice assistant was developed using HTML, CSS, and JavaScript, and incorporates the Google Cloud Speech-to-Text speech recognition API22. This API is known for its high accuracy and real-time processing, converting spoken input into text rapidly and accurately, which ensured the smoothness and accuracy of the data collection process. Figure 2 provides an example of this process.

An example of the intent collection interface, where a simulated user interacts with the voice assistant system in a predefined task scenario, generating realistic business intent expression data through voice input.
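For readers who wish to reproduce a similar collection setup, the following is a minimal server-side sketch using the official google-cloud-speech Python client to transcribe a recorded utterance. It assumes audio captured by the front end is forwarded to a Python backend; the original system used an HTML/CSS/JavaScript interface, so this sketch only approximates the transcription step.

```python
# Minimal transcription sketch using the Google Cloud Speech-to-Text Python
# client (google-cloud-speech). Assumes valid credentials and that the front
# end forwards a 16 kHz mono LINEAR16 recording; not the authors' exact setup.
from google.cloud import speech

def transcribe(audio_path: str) -> str:
    client = speech.SpeechClient()
    with open(audio_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    # Concatenate the top alternative of each recognized segment.
    return " ".join(r.alternatives[0].transcript for r in response.results)

print(transcribe("intent_request.wav"))  # hypothetical recording file
```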
On the other hand, different intent expression paradigms can significantly affect the accuracy of intent recognition systems and the subsequent generation of intent strategies. Formalizing the grammar of intent expressions is therefore crucial23. Additionally, some simulated users have limited network expertise, which could result in ineffective intent expressions. To more accurately simulate end-user network requests, this paper refers to the intent classification provided in IETF RFC 931624 and introduces a reference model for intent expressions; simulated users constrained by limited network knowledge can use this template as a reference, as shown in Fig. 3. The model standardizes intent expressions and is composed of the following entity label sets:
Intent user (customer, or network/service operator). The “user” entity is the declarer of the intent and can include technical users, such as network experts, network administrators, and service providers, or non-technical end users, such as teachers and students in a school or administrative departments in an enterprise.
Network action. This entity represents the manner in which the required service or the network should achieve a certain state, such as providing a service, limiting bandwidth, or self-healing.
Intent target. This entity collects the user’s needs, services, or objectives that need to be fulfilled, such as web browsing, online education, or video conferencing.
Target device or function. This entity denotes the network function (e.g., data communication, resource sharing) or physical device (e.g., routers, switches) used to deploy the intent. This is an optional entity.
Network performance. This entity collects the desired network performance metrics specified by the user, such as bandwidth, latency limitations, jitter requirements, etc. This is also an optional entity.
Lifecycle (persistent, transient). This entity refers to the duration of the intent, which is an optional entity and can range from minutes to hours or longer.

The reference model provided for standardizing intent expression allows participants to select and combine relevant entity labels based on their needs, forming key entity groups to accurately express business intent.
It is noted that the standardized intent expression process illustrated in Fig. 3 serves as a reference guide only and does not require strict adherence in practical applications. Participants may flexibly adjust the sequence of entity label usage and selectively employ partial or complete entity label sets according to specific application scenarios and practical requirements, thereby achieving diversified and effective intent expression. For instance, in a campus network scenario, a network administrator may express the following intent: “For students, providing web browsing services to achieve resource sharing, requires network bandwidth exceeding 100Mbps with the service being available for a minimum of 6 hours.” This can also be simplified to: “For students, providing web browsing services to achieve resource sharing.” The corresponding entity labels are: [user]: “student”, [network action]: “provide service”, [intent target]: “web browsing”, [target function]: “resource sharing”, [network performance]: “bandwidth exceeding 100Mbps”, [lifecycle]: “6 hours”. In this case, [target function], [network performance], and [lifecycle] are optional entities.
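As an illustration, the campus-network intent above can be represented as a structured record following the reference model in Fig. 3. The field names below are assumptions chosen for readability, not the dataset's exact schema.

```python
# Illustrative only: the campus-network intent from the text, expressed as a
# structured record following the reference model in Fig. 3. Field names are
# assumptions, not the dataset's exact schema; optional entities may be omitted.
campus_intent = {
    "text": (
        "For students, providing web browsing services to achieve resource "
        "sharing, requires network bandwidth exceeding 100Mbps with the "
        "service being available for a minimum of 6 hours."
    ),
    "entities": {
        "user": "student",
        "network_action": "provide service",
        "intent_target": "web browsing",
        "target_function": "resource sharing",                  # optional
        "network_performance": "bandwidth exceeding 100Mbps",   # optional
        "lifecycle": "6 hours",                                  # optional
    },
}
```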
Data source 3
As shown in Fig. 1, network intents were parsed by analyzing academic papers, industrial standards, and websites related to operators, educational institutions, and enterprises. Telecom operator and technical support websites typically contain a wealth of information, including network issues raised by researchers, intent examples, and corresponding solutions. These sources encompass diverse types of intents and data. Data was collected through methods such as keyword search, topic search, and the snowball method. Noise was then removed and irrelevant information was filtered manually, while PySpellChecker25 was used for spell checking to ensure high data quality. Finally, through a combination of automated tools and manual review, this data was transformed into valid intent expression data. For example, in a journal26 case study on IBN, the authors presented the following intent: “Transfer a common-level video service from user A in Beijing to user B in Nanjing.” After being formatted, this intent was passed to the intelligent policy mapping module, which parsed the intent and broke it down into a specific service function chain (SFC), such as network address translation (NAT) and firewall functions, and then constructed the corresponding SFC request. In this case, user A and user B can be represented by their respective IP addresses. Hence, the intent could be translated into: “Transfer a common-level video service from user 220.15.2.10 in Beijing to user 210.59.4.15 in Nanjing” (with assumed IP addresses). Through this process of data collection and transformation, a total of 274 intent entries were gathered from relevant documents and websites. After effective conversion, 257 verified intent data entries were obtained.
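The placeholder substitution in this example can be sketched as follows; the mapping and helper function are illustrative assumptions rather than the actual transformation tooling, and the IP addresses are the assumed values from the example above.

```python
import re

# Illustrative sketch (not the authors' tooling): replacing abstract user
# placeholders in a parsed intent with concrete addresses so the expression
# becomes directly actionable.
placeholder_map = {
    "user A": "user 220.15.2.10",
    "user B": "user 210.59.4.15",
}

def concretize(intent: str, mapping: dict[str, str]) -> str:
    """Substitute each placeholder with its concrete value."""
    for placeholder, value in mapping.items():
        intent = re.sub(re.escape(placeholder), value, intent)
    return intent

raw = "Transfer a common-level video service from user A in Beijing to user B in Nanjing."
print(concretize(raw, placeholder_map))
```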
Data processing and annotation
To ensure data accuracy and reliability, targeted preprocessing measures were implemented based on different data sources. For the real-world data provided by Sichuan Telecom, consistency and data completeness were manually verified, irrelevant network metrics were removed, and the data was transformed into structured intent expressions. For the data constructed by network engineers, manual reviews were conducted to ensure data relevance and reliability, guaranteeing that the constructed data accurately reflected actual network operational demands and intents. Data generated by simulated users was transcribed into text via the automated voice assistant system, after which it underwent redundancy and noise checks, as well as language normalization. Due to potential inaccuracies or irrelevant content in the speech recognition process, careful manual filtering of valid intent data was required. For network-parsed data, the processing was more complex, involving manual filtering, redundancy removal, noise reduction, and extraction of content related to network intents, which were then converted into structured intent descriptions to ensure data relevance and usability.
During network data processing, raw data collected from multiple sources could be affected by various distortions, including semantic redundancy, non-ASCII encoded characters, spelling errors, and unrelated expressions. These distortions are categorized as follows:
Semantic repetition. Sentences that convey the same meaning but differ in linguistic expression. For instance, “Ensure that there is no buffering while playing high-definition video” and “Ensure stable network performance during HD video streaming without any delays” express the same core intent: guaranteeing a stable, buffer-free network connection during HD video playback.
Non-ASCII encoding. Sentences containing special characters, such as Greek letters, mathematical symbols, or consecutive punctuation marks.
Spelling errors. To ensure linguistic accuracy, an automated Python-based spell-checking tool, PySpellChecker25, was used to detect and correct spelling errors within the data; a minimal usage sketch is provided after this list.
Unrelated expressions. These are statements that contain additional information unrelated to the core intent. For example, in the sentence, “The company has recently expanded its workforce, so we need to ensure that video conferencing for all employees has no network latency,” the unrelated portion, “The company has recently expanded its workforce,” should be removed, leaving the key intent as “Ensure no network latency for video conferencing.”
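Two of the automated cleaning steps above, non-ASCII detection and spell correction with PySpellChecker, can be sketched as follows. This is not the exact pipeline used; semantic repetition and unrelated expressions were handled manually.

```python
import string
from spellchecker import SpellChecker  # PySpellChecker package

# Minimal sketch of two cleaning steps: flagging non-ASCII sentences and
# correcting misspelled words. Not the authors' exact pipeline.
spell = SpellChecker()

def is_ascii_clean(sentence: str) -> bool:
    """Flag sentences containing non-ASCII characters (e.g. Greek letters)."""
    return sentence.isascii()

def correct_spelling(sentence: str) -> str:
    """Replace words PySpellChecker marks as unknown with its best correction."""
    fixed = []
    for token in sentence.split():
        word = token.strip(string.punctuation)
        if word and word.lower() in spell.unknown([word.lower()]):
            correction = spell.correction(word.lower())
            if correction:
                token = token.replace(word, correction)
        fixed.append(token)
    return " ".join(fixed)

print(is_ascii_clean("Keep latency ≤ 200 ms"))   # False: contains "≤"
print(correct_spelling("Ensure no network latancy for video conferencing"))
```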
All preprocessed data underwent entity annotation, relationship annotation, and slice type annotation to make it suitable for natural language processing tasks, such as Named Entity Recognition (NER)27 and relation extraction. Entity labels identify key elements in the text, such as business types, network performance, or functionalities; relationship labels describe the interconnections between business entities, such as “provides” or “ensures”; slice type labels specify the network slice type corresponding to the requirements expressed in the intent, based on domain-specific needs. These slice types include eMBB, URLLC, and mMTC, which are used to differentiate various network service types. To further enhance the utility of the data, the BIO (Begin, Inside, Outside)28 annotation scheme was applied to the tokenized data. The BIO scheme is widely used in sequence labeling tasks, such as NER, tokenization, and syntactic analysis; it assigns a label to each token to indicate its entity role, with “B” marking the beginning of an entity, “I” marking tokens inside an entity, and “O” marking non-entity tokens. The annotated structured data was stored in JSON format, ensuring consistency and usability, and providing a high-quality data foundation for the subsequent training and deployment of intent recognition models.
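For illustration, one annotated intent might be stored as shown below. The BIO tag names, relation triple, and JSON fields are assumptions that follow the label categories described above; they may differ from the dataset's exact schema.

```python
import json

# Illustrative example of BIO tagging and JSON storage for one annotated
# intent. Tag names, the relation triple, and field names are assumptions
# for illustration, not the dataset's exact schema.
tokens = ["Ensure", "no", "network", "latency", "for", "video", "conferencing"]
bio_tags = ["O", "O", "B-performance", "I-performance", "O", "B-target", "I-target"]

annotated = {
    "text": " ".join(tokens),
    "tokens": tokens,
    "bio_tags": bio_tags,
    "relations": [
        {"head": "video conferencing", "relation": "ensures", "tail": "network latency"}
    ],
    "slice_type": "eMBB",  # one of eMBB, URLLC, mMTC
}

print(json.dumps(annotated, indent=2))
```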