Probabilistic QoS-aware Placement of VNF chains at the Edge

Deploying IoT-enabled Virtual Network Function (VNF) chains to Cloud-Edge infrastructures requires determining a placement for each VNF that satisfies all deployment requirements, as well as a software-defined routing of traffic flows between consecutive functions that meets all communication requirements. In this article, we present a declarative solution, EdgeUsher, to the problem of how to best place VNF chains on Cloud-Edge infrastructures. EdgeUsher can determine all eligible placements for a set of VNF chains on a Cloud-Edge infrastructure so as to satisfy all of their hardware, IoT, security, bandwidth, and latency requirements. It exploits probability distributions to model the dynamic variations in the available Cloud-Edge infrastructure, and to assess the output eligible placements against those variations.


Introduction
New Edge computing (Abbas et al. 2018) infrastructures aim at supporting Internet of Things (IoT) applications, especially when applications must meet stringent Quality of Service (QoS) requirements (e.g., latency, bandwidth, and security) or handle large amounts of data. To achieve this goal, such new distributed infrastructures rely on computing capabilities which are closer to the edge of the Internet and to where data are produced and consumed (e.g., personal devices, access points, smart network gateways, base stations, switches, and micro-datacenters). The fruitful interplay between Cloud and Edge resources aims at realizing a Cloud-IoT continuum (Puliafito et al. 2019), also commonly known as Fog computing (Yousefpour et al. 2019).
Meanwhile, the ongoing evolution of network technologies, namely software-defined networking (SDN) (Nguyen et al. 2017) and network function virtualization (NFV) (Mijumbi et al. 2015), is targeting a more effective usage of network resources for containing deployment and operational costs while coping with dynamic traffic demands, including requirements for delivering customized IoT-enabled services (Baktir et al. 2017).
considering static infrastructure conditions. As for security aspects, to the best of our knowledge, only a few works (Fischer et al. 2017; Dwiardhika and Tachibana 2019; Shameli Sendi et al. 2018) consider them when deciding on VNF placement and SDN-enabled routing in Cloud-Edge scenarios (Farris et al. 2019).
In this article, we present a simple, yet general, probabilistic declarative methodology and a (heuristic) backtracking strategy to model and solve the VNF placement problem in dynamic Cloud-Edge computing scenarios, while considering hardware, IoT, security, bandwidth, and end-to-end latency requirements of the VNF chain to be deployed. The methodology has been implemented in the probabilistic logic programming language ProbLog (De Raedt and Kimmig 2015) and open-sourced as the prototype EdgeUsher; it also allows easily specifying and considering placement constraints (i.e., affinity and anti-affinity among functions). The main novel contribution of this work, which exploits a probabilistic declarative description of the VNF chain placement problem, is that EdgeUsher permits determining VNF chain placements that are likely to ensure high QoS guarantees, security, and service reliability over dynamic Cloud-Edge infrastructures, as will be needed to achieve the ultimate NFV and SDN vision (Laghrissi and Taleb 2018).
Since it follows a declarative implementation, EdgeUsher is more concise and easier to understand, modify, and maintain than procedural solutions, and it shows a high level of flexibility, being extensible and capable of accommodating the possibly evolving needs of Cloud-IoT scenarios. Besides, EdgeUsher is intrinsically explainable as it derives proofs for input user queries by relying on state-of-the-art resolution engines, and it can be easily extended to justify why a certain management decision was taken at run time in the spirit of explainable AI.
The rest of this article is organized as follows. After formulating the VNF chain placement problem and highlighting the main features of the proposed solution (Section 2), the Prolog implementation of EdgeUsher is incrementally described by means of a series of examples (Section 3). Then, the probabilistic heuristic version of EdgeUsher, which exploits ProbLog capabilities, is described (Section 4) and shown at work over a lifelike motivating example (Section 5). A discussion of related work (Section 6) and some lines for future work (Section 7) conclude the article.

VNF chain placement: problem statement
As mentioned above, the joint adoption of NFV and SDN technologies is considered by many as a promising approach to support next-gen IoT applications in Edge and Fog environments (Massonet et al. 2017; Farris et al. 2019). Indeed, those networking technologies are expected to enable the flexible matching of diverse IoT traffic requirements, ranging from low latency and deployment cost minimization (Wang et al. 2018; Leivadeas et al. 2019) to security mechanisms coping with threats in dynamic distributed environments (Farris et al. 2019; Puliafito et al. 2019).
In this context, our work aims at contributing to solving the problem of placing VNF chains into VNF- and SDN-enabled Cloud-Edge infrastructures. Such a problem can be generally stated as follows:

S. Forti et al.

Let C be a VNF chain with a set of deployment requirements R_D on its composing VNFs and a set of communication requirements R_C to suitably support communication flows between VNFs, and let I be a distributed Cloud

This work tackles the described problem following a probabilistic declarative methodology, based on the logic programming paradigm. As we will show throughout this article, the proposed EdgeUsher methodology and prototype take as input:
- a description of one (or more) VNF chain(s) along with its (their) hardware, IoT, and security requirements (i.e., R_D), and minimum bandwidth, maximum end-to-end latency, and security requirements between consecutive VNFs in the specified chain(s) (i.e., R_C), and
- a probabilistic description of the corresponding hardware, IoT, security, bandwidth, and latency capabilities offered by the available Cloud-Edge infrastructure (i.e., I).
Based on those, EdgeUsher outputs a ranking of all eligible solutions for the input instance of the VNF chain placement problem. Solutions include the mapping of each chain VNF to its deployment node and the routing of traffic flows between those deployment nodes. The ranking of the eligible solutions considers how likely a certain placement is to satisfy all chain requirements as the infrastructure state (probabilistically) varies.

EdgeUsher methodology
In this section, we incrementally describe our methodology to solve the VNF chain placement problem, by following the declarative implementation of our Prolog prototype, EdgeUsher. Working increments of the prototype are assessed against small running examples, excerpted from a lifelike motivating example on a university video surveillance distributed application, which we fully describe and use to assess EdgeUsher in Section 5. Such examples refer to a video surveillance application (consisting of various service functions) to be deployed on a campus network connecting various buildings and, more specifically, computing nodes with different hardware, IoT, and security capabilities.

Hardware requirements
In the first place, we consider a single VNF to be deployed to a single infrastructure node by matching only its hardware requirements. In this scenario, a VNF can be simply declared as in:

    service(FId, HWReqs).

where FId is an identifier of the considered function and HWReqs is an integer representing the amount of a generic hardware unit needed by the function to run properly. Symmetrically, an infrastructure node to match a VNF with can be easily declared as in:

    node(NodeId, HWCaps).

where NodeId is an identifier of the considered node and HWCaps is an integer representing the availability of hardware units at that node. In these very simple settings, the matching of a VNF with a node that can support its hardware requirements can be simply achieved as in:

Example. Consider a service function feature_extr that analyses video frames streamed through it, extracts features for further analyses, and requires 5 hardware units to run properly. It can be declared as:

    service(feature_extr, 5).

Now, consider an infrastructure made of the following two nodes (named after the building they are installed at):

assuming that IoT devices can be uniquely identified by a symbolic name, we can now declare a VNF as in:

    service(FId, HWReqs, TReqs).

where TReqs is a list [IoTId1, ..., IoTIdk] of all the identifiers of the IoT devices (sensors and actuators) that need to be reachable by the deployment node of the function identified by FId.
Analogously, we extend the representation of an infrastructure node as in:

    node(NodeId, HWCaps, TCaps).

where TCaps is the list of the identifiers of the IoT devices (sensors and actuators) that the node reaches directly. The servicePlacement/2 predicate defined before can be simply extended with a check on the IoT requirements as in:

Example. Consider a service function cctv_driver that requires one hardware unit to run properly and to directly reach a CCTV system identified by video1. It can be declared as:

infrastructure nodes can feature different security capabilities, expressed in terms of a common vocabulary of Edge computing security capabilities, as per the taxonomy of Figure 1.
Such a taxonomy can be used to express security policies for a given VNF as either a list or an AND/OR composition of security properties, over a common dictionary.

Example.
As an example, a security policy for a function collecting sensitive data at the edge of the Internet could be expressed as the list:

Building on top of this, we can now extend the representation of a VNF by including its security requirements as in:

where SecCaps is the list of security capabilities featured by the node, expressed in terms of a common dictionary.
Finally, the servicePlacement/2 predicate defined before can be simply extended with a check on the security policies as in:

Example. Consider a service function cctv_driver that requires one hardware unit to run properly, a direct connection to a CCTV system identified by video1, and the presence of either anti-tampering capabilities or an access control mechanism at the deployment node, or both. It can be declared as:

Now, consider an infrastructure made of the following two nodes that reach the same CCTV system (video1) but feature different security capabilities:

Querying the predicate servicePlacement(cctv_driver, N) will result in the possibility of deploying cctv_driver to the parkingServices node only, as parkingServices2 features neither anti-tampering nor access control capabilities, despite satisfying all VNF requirements on hardware resources and IoT.
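The three checks described so far (hardware, IoT, and security policy) can be illustrated with a minimal Python sketch. It is not the authors' Prolog code: the function names `sec_ok` and `service_placement`, the tuple encoding of AND/OR policies, the capability names, and the node capacities are all assumptions made for illustration.

```python
# Hypothetical sketch of a single-VNF placement check (not the EdgeUsher listing).

def sec_ok(policy, caps):
    """Evaluate a security policy against a set of node capabilities.

    A policy is a capability name (str), a list of names (all required),
    or a nested tuple ("and", p1, p2) / ("or", p1, p2).
    """
    if isinstance(policy, str):
        return policy in caps
    if isinstance(policy, list):
        return all(sec_ok(p, caps) for p in policy)
    op, left, right = policy
    if op == "and":
        return sec_ok(left, caps) and sec_ok(right, caps)
    if op == "or":
        return sec_ok(left, caps) or sec_ok(right, caps)
    raise ValueError(f"unknown operator: {op}")


def service_placement(service, nodes):
    """Return the identifiers of all nodes that can host the service."""
    hw_req, iot_req, sec_req = service
    return [
        node_id
        for node_id, (hw_caps, iot_caps, sec_caps) in nodes.items()
        if hw_req <= hw_caps                      # hardware check
        and set(iot_req) <= set(iot_caps)         # IoT reachability check
        and sec_ok(sec_req, set(sec_caps))        # security policy check
    ]


# cctv_driver: 1 hardware unit, needs video1, anti-tampering OR access control.
cctv_driver = (1, ["video1"], ("or", "anti_tampering", "access_control"))

# Node capabilities are assumed values, chosen only to mirror the example.
nodes = {
    "parkingServices": (2, ["video1"], ["anti_tampering", "encrypted_storage"]),
    "parkingServices2": (2, ["video1"], ["encrypted_storage"]),
}

eligible = service_placement(cctv_driver, nodes)
print(eligible)
```

Under these assumed capabilities, only parkingServices passes all three checks, which matches the outcome described in the example.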

Matching a VNF chain to infrastructure nodes
At this stage, it is possible to easily specify chains of virtual network service functions and their hardware, IoT, network QoS, and security requirements. Indeed, a VNF chain can be declared as:

where ChainID uniquely identifies the chain and ServiceFunctionIDs lists the identifiers of all VNFs composing the chain. In these settings, it is then possible to extend and exploit the servicePlacement predicate so as to determine the placement of an entire chain, by placing one by one all services that compose it. This new behavior is achieved by the placement/2 predicate that follows:

The new servicePlacement/2 predicate inputs the list of Services in the chain and returns an eligible Placement of them to the available infrastructure. In doing so, it not only checks that hardware (line 9), IoT (line 10), and security requirements (line 11) of each VNF are satisfied, but also checks that the cumulative hardware requirements of VNFs mapped to the same infrastructure node N do not exceed the overall capacity of the node. To this end, the Prolog program relies on an extended version of the hwReqsOK predicate (line 12, lines 18-22) to update the accumulator list AllocatedHW, added as a third parameter in the servicePlacement/3 predicate to keep track of the hardware resources allocated by the placement being built.

Example.
As an example, consider a simple chain made of three VNFs that streams video footage from a CCTV system toward a feature extraction service function capable of identifying events of interest, such as unauthorized vehicle access or fire, and sending them to a lightweight analytics function for more accurate pattern recognition. Such a chain can be declared as in:

Querying the predicate placement(cctv_driver, P) will output the following placements:

where cctv_driver is always placed on the parkingServices node so as to reach the required CCTV system, while feature_extr and lw_analytics can be placed either both on the firePolice node (which satisfies their cumulative hardware requirements of eight units), or on the lifeSciences and firePolice nodes, respectively.
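The cumulative hardware check can be sketched in Python as a backtracking search that threads an allocation accumulator through the recursion. This is only an illustration of the idea, not the EdgeUsher program: `place_chain` is a hypothetical name, the node capacities are assumed, and IoT and security checks are deliberately left out, so the result differs from the placements listed in the example. The 3 units for lw_analytics follow from the stated cumulative requirement of eight units.

```python
# Hypothetical sketch: chain placement with cumulative hardware accounting.

def place_chain(services, nodes, allocated=None):
    """Yield every mapping service -> node that respects node capacities.

    services:  list of (service_id, hw_req)
    nodes:     dict node_id -> hardware capacity
    allocated: dict node_id -> units already taken by the partial placement
    """
    allocated = allocated or {}
    if not services:
        yield {}
        return
    (sid, hw_req), rest = services[0], services[1:]
    for node_id, hw_caps in nodes.items():
        used = allocated.get(node_id, 0)
        if used + hw_req <= hw_caps:  # cumulative check on this node
            new_alloc = {**allocated, node_id: used + hw_req}
            for tail in place_chain(rest, nodes, new_alloc):
                yield {sid: node_id, **tail}


# feature_extr needs 5 units (from the text); lw_analytics needs 3 (derived).
chain = [("feature_extr", 5), ("lw_analytics", 3)]
# Capacities are assumptions made for this sketch.
nodes = {"firePolice": 8, "lifeSciences": 5}

placements = list(place_chain(chain, nodes))
for p in placements:
    print(p)
```

With these capacities, both functions fit together on firePolice but not on lifeSciences, since 5 + 3 exceeds its 5 units.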

Routing traffic flows
As the last step to complete the EdgeUsher prototype, we consider QoS requirements related to bandwidth allocation and end-to-end latency along a VNF chain. To this end, we extend the representation of a service with the information on its processing time as in:

    service(FId, TProc, HWReqs, IoTReqs, SecReqs).

where TProc is the average time (expressed in milliseconds) elapsed between the instant an input is received by function FId and the instant the corresponding output is ready to be transmitted to the next function in the chain.
We also extend the representation of a chain so as to include the possibility of specifying bandwidth requirements as directed traffic flows between pairs of functions, as in:

Then, we permit specifying constraints on maximum tolerated latency for (directed) service paths crossing the functions F1 → F2 → ... → FN as:

where LatReq is the end-to-end latency (in ms) not to be exceeded, summing up network and function processing delays.
Finally, a (point-to-point or end-to-end) link connecting NodeA to NodeB which is available in the considered infrastructure can be declared as:

where Latency is the latency experienced over the link (in ms) and Bandwidth is the transmission capacity it offers (in Mbps).
The program above first checks bandwidth requirements (line 10) and, afterward, latency requirements (lines 11-12). First, a routing satisfying bandwidth constraints is determined by the predicate flowPlacement (line 10), which holds if:
- the services S1 and S2 between which a flow is established have been placed onto the same node N (lines 15-17), or
- the services S1 and S2 between which the flow is established have been placed onto different nodes N1 and N2 and there exists a path between those nodes that supports the bandwidth requirement of the flow (lines 18-22).
The path(N1, N2, Radius, [], Path, 0, NewLat) predicate determines an acyclic Path of length at most Radius (i.e., maximum hop number) between N1 and N2, which features latency NewLat (line 20). A path is either a direct infrastructure link between N1 and N2 (lines 24-25), or a route of links that connects them (lines 26-29). It is worth noting that even when setting Radius to a low value K (e.g., 2 or 3), the found routing can spread a chain of length L over a path of length K × L, thus extending the chain's potential reach. Naturally, it is possible to relax the constraint on the Radius, at the price of longer execution times, by setting Radius to values larger than the default one (viz. 2) at line 20. After a path is found, update checks whether the bandwidth requirements of the considered flow can be supported by such a path (lines 31-34). Similarly to hardware allocation, a list of elements of the form (N1, N2, Bf) is maintained to keep track of the bandwidth Bf allocated on each link along a certain path and to check that multiple flows mapped onto the same link do not exceed its capacity. In particular, updateOne scans the list of links along a path and checks such requirements by accumulating the bandwidth consumed by all flows routed onto the same link (lines 36-42).
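The two mechanisms described above, a hop-bounded acyclic path search and a per-link bandwidth accumulator, can be sketched in Python as follows. This is an illustrative sketch, not the authors' predicates: `paths` and `allocate` are hypothetical names, and the three-link topology is assumed. The 15 ms and 70 Mbps figures reuse the wireless link profile given later in the article, and the 15 Mbps flow is the one from the running example.

```python
# Hypothetical sketch: bounded path search plus bandwidth accounting.

def paths(links, src, dst, radius, visited=()):
    """Yield (path, latency) for acyclic paths src -> dst of at most `radius` hops.

    links: dict (node_a, node_b) -> (latency_ms, bandwidth_mbps), directed.
    """
    if radius == 0:
        return
    for (a, b), (lat, _bw) in links.items():
        if a != src or b == src or b in visited:
            continue  # not an outgoing link, or it would create a cycle
        if b == dst:
            yield [(a, b)], lat
        else:
            for tail, tail_lat in paths(links, b, dst, radius - 1, visited + (src,)):
                yield [(a, b)] + tail, lat + tail_lat


def allocate(links, path, bw_req, allocated):
    """Return the updated allocation if each link on the path can carry
    bw_req more Mbps, or None if some link would exceed its capacity."""
    updated = dict(allocated)
    for link in path:
        used = updated.get(link, 0)
        if used + bw_req > links[link][1]:
            return None
        updated[link] = used + bw_req
    return updated


# Assumed topology: 15 ms latency and 70 Mbps per link.
links = {
    ("parkingServices", "westEntry"): (15, 70),
    ("westEntry", "firePolice"): (15, 70),
    ("parkingServices", "lifeSciences"): (15, 70),
}

found = list(paths(links, "parkingServices", "firePolice", radius=2))
route, latency = found[0]
alloc = allocate(links, route, 15, {})     # the 15 Mbps flow of the example
too_much = allocate(links, route, 60, alloc)  # 15 + 60 exceeds 70 Mbps
print(route, latency, alloc, too_much)
```

Lowering `radius` to 1 yields no path in this topology, which illustrates how the hop bound trades completeness for search time.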
Finally, latencyOK holds if the chain latency, which is computed by summing the processing times of the traversed functions with the latency of the chosen path (lines 48-55), is less than or equal to the one required by the specified maxLatency requirement.
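A minimal Python sketch of this latency check is given below. The names `chain_latency` and `latency_ok`, the processing times, and the route latencies are assumptions for illustration; only the 50 ms bound comes from the running example.

```python
# Hypothetical sketch of the end-to-end latency check.

def chain_latency(service_path, t_proc, placement, route_latency):
    """Sum function processing times and the latency of the routes that
    connect consecutive functions placed on different nodes."""
    total = sum(t_proc[s] for s in service_path)
    for s1, s2 in zip(service_path, service_path[1:]):
        if placement[s1] != placement[s2]:  # co-located functions add no network delay
            total += route_latency[(s1, s2)]
    return total


def latency_ok(service_path, t_proc, placement, route_latency, max_latency):
    return chain_latency(service_path, t_proc, placement, route_latency) <= max_latency


service_path = ["cctv_driver", "feature_extr", "lw_analytics"]
t_proc = {"cctv_driver": 2, "feature_extr": 5, "lw_analytics": 10}  # assumed, in ms

# Placement A: the last two functions share the firePolice node.
placement_a = {"cctv_driver": "parkingServices",
               "feature_extr": "firePolice",
               "lw_analytics": "firePolice"}
routes_a = {("cctv_driver", "feature_extr"): 30}  # two assumed 15 ms hops

# Placement B: feature_extr on lifeSciences, with assumed route latencies.
placement_b = {"cctv_driver": "parkingServices",
               "feature_extr": "lifeSciences",
               "lw_analytics": "firePolice"}
routes_b = {("cctv_driver", "feature_extr"): 15,
            ("feature_extr", "lw_analytics"): 45}

ok_a = latency_ok(service_path, t_proc, placement_a, routes_a, 50)
ok_b = latency_ok(service_path, t_proc, placement_b, routes_b, 50)
print(ok_a, ok_b)
```

With these assumed figures, placement A totals 47 ms and passes, while placement B totals 77 ms and is rejected.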

Example.
As an example, consider the chain of the previous example, extended with the following requirements on traffic flows and end-to-end latency:

Querying the predicate placement(cctv_driver, P, R) will output the following placement and associated routing directives:

where cctv_driver is placed on the parkingServices node and feature_extr and lw_analytics can only be placed on the firePolice node. Indeed, the previously determined placement of feature_extr to the lifeSciences node is not eligible anymore, as the path that connects the lifeSciences node to the firePolice node cannot support the end-to-end latency of 50 ms. It is worth noting that the traffic flow of 15 Mbps between cctv_driver and feature_extr follows a path passing through westEntry, which connects parkingServices to firePolice. On the other hand, as feature_extr and lw_analytics are mapped onto the same node, no routing is output for the 8 Mbps traffic flow between them.

(Anti-)affinity constraints and partial solutions
It is worth noting that EdgeUsher allows users to easily specify placement constraints in the form of affinity or anti-affinity requirements among functions. Affinity consists in placing two or more functions on the same physical node, thus reducing latency and communication costs between VNFs, while anti-affinity prevents two or more VNFs from sharing the same resources (Oechsner and Ripke 2015). The possibility to add affinity and anti-affinity constraints is useful since it allows specifying deployment location requirements needed for performance, economic, resilience, legislative, and privacy reasons (Bouten et al. 2016).
In the case of affinity constraints, the user can force the mapping of two (or more) functions to the same node, as for instance in the query:

stating that F2 and F3 must be mapped onto the same node N2. Analogously, anti-affinity constraints can be specified by queries of the form:

imposing that F2 and F3 must be mapped onto two different nodes N2 and N3.
Finally, users can specify partial deployments and/or routes and use EdgeUsher to complete them. This is useful to quickly determine on-demand re-configurations of part of a chain in case it is affected by infrastructure failures or malfunctioning (e.g., the crash of a node hosting a service function) without recomputing (and possibly migrating) the whole chain.
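In EdgeUsher these constraints are expressed directly in the query through shared or distinct variables. A rough Python analogue, given here only as a sketch, is a filter over candidate placements; the function name `satisfies` and the candidate placements are hypothetical.

```python
# Hypothetical sketch: affinity, anti-affinity, and partially fixed placements
# expressed as a filter over candidate placements.

def satisfies(placement, affinity=(), anti_affinity=(), fixed=None):
    """Check a placement (dict service -> node) against the constraints.

    affinity:      pairs of services that must share a node
    anti_affinity: pairs of services that must use different nodes
    fixed:         dict service -> node for an already decided partial deployment
    """
    fixed = fixed or {}
    same = all(placement[a] == placement[b] for a, b in affinity)
    apart = all(placement[a] != placement[b] for a, b in anti_affinity)
    pinned = all(placement[s] == n for s, n in fixed.items())
    return same and apart and pinned


candidates = [
    {"cctv_driver": "parkingServices", "feature_extr": "firePolice", "lw_analytics": "firePolice"},
    {"cctv_driver": "parkingServices", "feature_extr": "lifeSciences", "lw_analytics": "firePolice"},
]

pair = ("feature_extr", "lw_analytics")
together = [p for p in candidates if satisfies(p, affinity=[pair])]
apart = [p for p in candidates if satisfies(p, anti_affinity=[pair])]
completed = [p for p in candidates if satisfies(p, fixed={"feature_extr": "lifeSciences"})]
print(len(together), len(apart), len(completed))
```

A declarative engine prunes during the search rather than filtering afterwards, so this sketch shows the semantics of the constraints, not their cost.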

Complexity analysis
EdgeUsher relies on the backtracking mechanism of logic programming to determine the eligible VNF placement(s) and traffic routing(s) for a VNF chain to be deployed to an Edge infrastructure.
The worst-case time complexity of servicePlacement is clearly $O(n^{s})$, where n is the number of nodes in the available infrastructure and s is the number of services in the input service chain. The worst-case time complexity of flowPlacement is $O(b^{\mathit{Radius}})$, where b is the average out-degree of nodes in the infrastructure, and in any case it is bounded by $O(n^{\mathit{Radius}})$ (when dealing with fully connected network topologies). These bounds hold both when we aim at determining a single eligible placement or routing (and only one exists, and it is the last one found via backtracking), and when we aim at determining all eligible placements or routings (which naturally requires fully exploring the placement and routing search spaces, independently of the number of existing eligible solutions). Now, the worst case for determining an eligible placement and routing happens when a single eligible solution exists and it corresponds to combining the last found placement with the last found routing. In such a case, the combination of servicePlacement with flowPlacement incurs a time complexity of $O(n^{s} \times n^{\mathit{Radius}}) = O(n^{s+\mathit{Radius}})$. As per the above considerations, the worst-case time complexity of the approach described until now is exponential, and so is the exhaustive exploration of the search space.
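To give a feel for these bounds, the following Python sketch counts the candidate service-to-node mappings by brute force and combines them with the routing bound. The sizes are assumptions for illustration (ten nodes, a three-function chain, and the default Radius of 2), and `count_candidates` is a hypothetical helper.

```python
from itertools import product

# Hypothetical sketch: size of the worst-case search space.

def count_candidates(n_nodes, n_services):
    """Count all service-to-node mappings by explicit enumeration."""
    return sum(1 for _ in product(range(n_nodes), repeat=n_services))


n, s, radius = 10, 3, 2            # assumed instance sizes
placements = count_candidates(n, s)  # equals n ** s
routing_bound = n ** radius          # bound for a fully connected topology
combined = placements * routing_bound
print(placements, routing_bound, combined)
```

Even this small instance yields 1,000 candidate placements and a combined bound of 100,000 placement and routing combinations.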

Probabilistic modeling
In this section, we first recap the probabilistic logic programming language ProbLog (Section 4.1). Then, we illustrate how the probabilistic capabilities of ProbLog permit enhancing the Prolog EdgeUsher prototype so as to consider dynamic infrastructure conditions when solving the VNF embedding problem (Section 4.2). Besides, we will show how ProbLog meta-reasoning capabilities can be exploited to reduce the exponential time complexity discussed before (Section 4.3).

Background: the ProbLog language
Probabilistic logic programming extends logic programming by enabling the representation of uncertain information. More specifically, logic programming allows representing relations among entities, while probability theory can model uncertainty over attributes and relations (Riguzzi 2018). To implement both the model and the matching strategy, we used the ProbLog language (Kimmig et al. 2011; De Raedt and Kimmig 2015), a probabilistic extension of Prolog.
Prolog programs are finite sets of rules of the form:

    a :- b1, ..., bn.

ProbLog programs are logic programs in which some of the facts are annotated with probabilities. A ProbLog fact, such as

    p::a.

states that a holds with probability p. Non-annotated facts are assumed to always hold with probability 1.
ProbLog also allows using semicolons to express OR conditions in rules. For instance,

    a :- b1; ...; bn.

states that a holds when at least one of b1, ..., bn holds. Annotated disjunctions of the form

    p1::a1; ...; pK::aK.

state that at most one of the facts a1, ..., aK holds with the associated probability. If $T = \sum_{i=1}^{K} p_i < 1$, ProbLog assumes the presence of an implicit null choice which states with probability $1 - T$ that none of the K options holds.
Each ProbLog program defines a probability distribution over logic programs where a fact p::a. is considered true with probability p and false with probability 1−p. The ProbLog engine (De Raedt and Kimmig 2015) determines the success probability of a query q as the probability that q has a proof, given the distribution over logic programs.
Intuitively, a ProbLog program leverages input probability distributions to analyze all possible Prolog programs (i.e., worlds) that could be generated according to them. Assuming that $\Omega(q)$ is the set of possible worlds W that entail a valid proof for a certain query q (i.e., $\Omega(q) = \{W \mid W \models q\}$), the ProbLog engine computes the probability p(q) that q holds as:

$$p(q) = \sum_{W \in \Omega(q)} \prod_{f \in W} p(f)$$

where f are facts within a possible world and p(f) is the probability they are labeled with.
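The possible-worlds reading of a query probability can be sketched in Python by brute-force enumeration. This is not how the ProbLog engine works internally (it relies on knowledge compilation); `query_probability` is a hypothetical helper restricted to independent probabilistic facts, where a fact left out of a world contributes its complementary probability 1 − p. The two 0.98 link facts are assumed values.

```python
from itertools import product

# Hypothetical sketch: success probability of a query by enumerating worlds.

def query_probability(facts, query):
    """facts: dict fact_name -> probability (independent facts).
    query: function taking the set of true facts and returning a bool."""
    names = list(facts)
    total = 0.0
    for choices in product([True, False], repeat=len(names)):
        world = {n for n, chosen in zip(names, choices) if chosen}
        weight = 1.0
        for n, chosen in zip(names, choices):
            weight *= facts[n] if chosen else 1.0 - facts[n]
        if query(world):  # the world entails the query
            total += weight
    return total


# Two assumed links, each available with probability 0.98.
facts = {"link_a": 0.98, "link_b": 0.98}

p_both = query_probability(facts, lambda w: {"link_a", "link_b"} <= w)
p_either = query_probability(facts, lambda w: bool(w))
print(round(p_both, 4), round(p_either, 4))
```

The enumeration visits 2^N worlds for N facts, which is exactly the blow-up that motivates the heuristics discussed later.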

Probabilistic EdgeUsher
The usage of ProbLog permits naturally specifying probabilistic profiles of both nodes and links by exploiting annotated disjunctions. Such language constructs permit capturing the intrinsic dynamicity and uncertainty of Edge infrastructures by relying on probability distributions based on historically monitored data.
An infrastructure node can then be declared as in:

where $\sum_{i=0}^{k} P_i \le 1$ and Pi is the probability that a particular node configuration (with respect to available hardware, IoT devices, and security capabilities) occurs.
Analogously, links between nodes can be specified as in:

where $\sum_{i=0}^{k} P_i \le 1$ and Pi is the probability that a particular link QoS configuration (with respect to end-to-end latency and available bandwidth) occurs.
It is worth noting that three different types of facts (and any mixture of them) can constitute the input infrastructure description in the ProbLog version of EdgeUsher. Indeed, a user can exploit (a) non-probabilistic facts when using real-time monitoring or averaged monitoring data (i.e., avoiding annotating facts), as shown in Section 3, (b) single-probability facts that are just annotated with an indication of their reliability (e.g., 0.99::node(cloudX, 30, [], [firewall]).), or (c) fully probabilistic facts annotated with complete probability distributions describing the infrastructure dynamics based on aggregate historical monitoring data.
Naturally, running EdgeUsher in ProbLog with probabilistic input enhances output precision by ranking eligible VNF placements and traffic routings according to how well they are expected to satisfy the chain requirements as the infrastructure state (probabilistically) varies.
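The effect of an annotated node profile on the ranking can be sketched in Python as follows. The helper `prob_enough_hw` and both profiles are assumptions for illustration: each profile lists mutually exclusive configurations whose probabilities sum to at most 1, in the spirit of an annotated disjunction.

```python
# Hypothetical sketch: probability that a node profile offers enough hardware.

def prob_enough_hw(profile, hw_req):
    """profile: list of (probability, free_hw_units), mutually exclusive.
    Returns the probability that at least hw_req units are free."""
    if sum(p for p, _ in profile) > 1.0 + 1e-9:
        raise ValueError("probabilities of a profile must sum to at most 1")
    return sum(p for p, hw in profile if hw >= hw_req)


# Assumed profiles (probability, free hardware units).
profiles = {
    "firePolice": [(1.0, 16)],
    "lifeSciences": [(0.2, 8), (0.8, 4)],
}

hw_req = 8  # cumulative requirement of feature_extr and lw_analytics
ranking = sorted(
    ((prob_enough_hw(profile, hw_req), node) for node, profile in profiles.items()),
    reverse=True,
)
print(ranking)
```

Candidates are ranked by the probability that the hosting node actually has the required units free, so a node that is only occasionally free ends up lower in the ranking instead of being discarded.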

Example.
As an example, consider the chain of the previous example, to be matched to the following probabilistic infrastructure description:

where nodes feature different hardware capabilities according to a probability distribution and links are assumed to be wireless links with an associated reliability of 98%. Querying the predicate placement(cctv_driver, P, R) will output the following placements, the associated routing directives, and a probability value ranking them as per how well they can satisfy all chain requirements:

where cctv_driver is placed on the parkingServices node and feature_extr and lw_analytics can both be placed either on the firePolice node or on the lifeSciences node, which now has a 20% probability of featuring enough resources to host them. The best choice for the service chain deployer is still represented by the first output VNF chain placement and routing. It is worth noting that additional outputs with lower probability values could be kept as backup deployments, when their associated probabilities exceed a desired threshold.

Complexity analysis and heuristics
As per the considerations we have made in Section 3.5, the algorithmic time complexity of the approach described until now is exponential, that is, $O(n^{s+\mathit{Radius}})$. Besides, the probabilistic reasoning on disjoint clauses requires running EdgeUsher over an exponential number of possible worlds with a worst-case time complexity of $O(k^{n+m})$, where k is the average number of disjoint clauses per each of the n+m facts denoting the n infrastructure nodes and the m infrastructure links, respectively. For instance, in the case k = 2, the overall complexity increases to $O(2^{n+m} \times n^{s+\mathit{Radius}})$. Naturally, such time complexity becomes unbearable for very large infrastructures and for long service chains. Hence, we extended the prototype with a heuristic based on the probabilistic modeling we gave for the infrastructure capabilities.
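As a rough worked instance of this bound, assume n = 10 nodes, a hypothetical m = 12 links, a chain of s = 3 services, Radius = 2, and k = 2 alternatives per fact. These sizes are illustrative assumptions, not measurements.

```latex
\[
2^{n+m} \times n^{s+\mathit{Radius}}
  = 2^{22} \times 10^{5}
  = 4\,194\,304 \times 100\,000
  \approx 4.2 \times 10^{11}
\]
```

Even at this modest size, the number of world and solution combinations is in the hundreds of billions, which is why pruning is needed.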
The heuristic version of EdgeUsher allows users to specify two threshold values that are used to prune the search space whenever the probability of satisfying the chain hardware or bandwidth requirements, respectively, falls below them. Such pruning is implemented via the ProbLog subquerying system, which is used to evaluate the probabilities of the servicePlacement (i.e., PHw) and the path (i.e., PQoS) goals during the search for eligible placements, and to check them against the user-specified thresholds (i.e., THw and TQoS). As soon as a candidate solution being built by the ProbLog engine is associated with a probability lower than the user-set thresholds (and, thus, will not suitably satisfy the VNF chain requirements), the heuristic version of EdgeUsher stops searching along the corresponding path, hence reducing search times.
In particular, the illustrated meta-reasoning behavior is implemented by simple extensions of the placement and flowPlacement predicates, as in:

The new placement predicate exploits the ProbLog built-in subquery/2 to check whether historical variations in the behavior of deployment nodes might lead to insufficient (hardware, IoT, and security) resource availability for a certain placement of the services composing the VNF chain (lines 3-4). Similarly, the heuristic version of flowPlacement checks whether the historical variations in the behavior of communication links might intolerably affect the routing of traffic flows along a certain path (lines 20-21).
Example. Running the heuristic prototype over the previous example and requiring that both deployment and communication requirements of output placement are met in 90% of the cases, that is, setting THw = TQoS = 0.9, we only obtain as a result the first output placement which has an associated probability of 0.9604.
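The threshold-based pruning can be sketched in Python as below. This is a simplified illustration, not the ProbLog subquery mechanism: it assumes that each service-to-node assignment has an independent, precomputed success probability, and the names `pruned_placements` and `route_probability` as well as all probability values are hypothetical.

```python
# Hypothetical sketch: pruning partial solutions that fall below a threshold.

def pruned_placements(services, node_probs, t_hw, partial=None, prob=1.0):
    """Yield (placement, probability) pairs whose probability stays >= t_hw.

    node_probs: dict (service, node) -> assumed probability that the node
                satisfies the service requirements (treated as independent).
    """
    partial = partial or {}
    if prob < t_hw:
        return  # prune: this branch can no longer reach the threshold
    if len(partial) == len(services):
        yield dict(partial), prob
        return
    sid = services[len(partial)]
    for (service, node), p in node_probs.items():
        if service == sid:
            yield from pruned_placements(
                services, node_probs, t_hw, {**partial, sid: node}, prob * p
            )


def route_probability(link_probs):
    """Probability that all (independent) links of a route are available."""
    result = 1.0
    for p in link_probs:
        result *= p
    return result


services = ["feature_extr", "lw_analytics"]
node_probs = {
    ("feature_extr", "firePolice"): 1.0,
    ("feature_extr", "lifeSciences"): 0.2,
    ("lw_analytics", "firePolice"): 1.0,
}

strict = list(pruned_placements(services, node_probs, t_hw=0.9))
relaxed = list(pruned_placements(services, node_probs, t_hw=0.1))
p_route = route_probability([0.98, 0.98])  # a two-hop route over 98%-reliable links
print(strict, len(relaxed), round(p_route, 4), p_route >= 0.9)
```

Under these assumptions a two-hop route over 98%-reliable links has probability 0.98 × 0.98 = 0.9604, which is one way to arrive at the value reported in the example. Because probabilities only decrease as a partial solution is extended, cutting a branch as soon as it drops below the threshold never discards an acceptable solution.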

EdgeUsher at work
In this section, we illustrate a lifelike motivating example (Section 5.1), and we discuss the EdgeUsher prototype performance and the effectiveness of the proposed heuristics over such motivating example (Section 5.2).

Motivating example
Hereinafter, we describe a lifelike example to better introduce the VNF placement problem and to highlight some of the related challenges. The example extends those that have been used throughout the paper to illustrate the proposed approach. We consider a portion of the topology of the Edge computing infrastructure deployed at UC Davis, inspired by Ning et al. (2019), and sketched in Figure 2. Such infrastructure is a wireless-optical broadband access network (WOBAN) and consists of ten heterogeneously capable edge nodes.
We assume that available edge nodes feature either 2, 4, 8, or 16 hardware units and that they are subject to workload variations as per the distributions reported in Figure 3. For instance, nodes with two hardware units are totally free in 20% of the cases, while they only have one free hardware unit in the remaining 80%. We also assume that different node types feature different security capabilities as reported in Figure 3, expressed in terms of a common vocabulary of Edge computing security capabilities, as per the taxonomy of Figure 1. Last, but not least, the nodes featuring 16 GB of memory (viz., the Fire & Police and the Student Centre devices) connect the Edge network to a Cloud data center through the same Internet Service Provider (ISP) node (not shown in the figure).
Analogously, we assume that network links have the bandwidth and latency profiles listed in Figure 4. For instance, on-campus wireless connections may be unavailable in 2% of the cases, and feature 70 Mbps bandwidth and 15 ms latency in the remaining 98%.
We suppose that a new smart CCTV system has been installed at the Transportation & Parking Services building, and that it continuously captures video footage and streams it to a CCTV System Driver deployed to the edge node which is in physical proximity.
A VNF chain (Figure 5) must be deployed to support the video surveillance IoT system with a running application. The chain application, when suitably deployed, permits detecting events of interest (e.g., unauthorized access, fire, anomalous behavior) by analyzing video streams and by promptly notifying an alarm system installed at the Fire & Police station on campus. Such a VNF chain includes
• a Feature Extraction service function that applies image processing techniques to isolate potentially interesting video portions, and
• a Lightweight Analytics service function that further processes such video portions by performing object recognition, by detecting anomalies or potentially dangerous situations, and by sending appropriate notifications to the Alarm Driver deployed at the Police Station.

To work as expected, the end-to-end latency from the CCTV System Driver to the Alarm Driver must not exceed 150 ms, as shown in Figure 5. Additionally, for each link between two VNFs, a minimum bandwidth requirement is specified, as shown in the figure. The traffic originated by the CCTV system is also collected by a WAN Optimizer service function that improves video data delivery efficiency (e.g., compression) and forwards video data to a Storage service. Complex video analytics are then performed with more relaxed latency constraints by a Video Analytics service function which updates, when needed, the model used by the system to recognize potentially dangerous events.
Fig. 6. Example of VNF requirements and processing times.
Figure 6 lists the requirements for the deployment of each VNF in terms of hardware units, connection to IoT devices (sensors or actuators), and security policies, along with the expected processing time of each chain function. As a further (soft) requirement, VNF chain deployers at UC Davis would prefer the Video Analytics and Storage service functions be placed on the same node (affinity) to reduce communication costs. Overall, both the available Cloud-Edge infrastructure and the VNF chain to be deployed on campus can be naturally declared in the input format exploited by EdgeUsher. In fact, deploying the described chain to the infrastructure available at UC Davis implies solving the VNF placement problem, that is, deciding how to map a VNF graph on top of an infrastructure substrate made of heterogeneous Edge and Cloud nodes and communication links, so that hardware, IoT, and end-to-end network QoS requirements are all satisfied.
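To make the placement problem more concrete, the sketch below checks one candidate mapping of the latency-critical part of the chain against a hardware constraint and the 150 ms end-to-end bound. It is a simplified illustration, not the EdgeUsher implementation: all hardware figures and processing times are hypothetical placeholders (the actual values are those of Figures 3 and 6), the node names are invented, and we assume that processing times count toward the latency budget.

```python
def hardware_ok(placement, hw_required, hw_free):
    """True if no node is asked for more hardware units than it has free."""
    used = {}
    for function, node in placement.items():
        used[node] = used.get(node, 0) + hw_required[function]
    return all(units <= hw_free[node] for node, units in used.items())


def end_to_end_latency_ms(functions, routes, link_latency_ms, processing_ms):
    """Processing times of all functions plus latencies of all routed links."""
    total = sum(processing_ms[f] for f in functions)
    for src, dst in zip(functions, functions[1:]):
        # An empty route means both functions share the same node.
        total += sum(link_latency_ms[link] for link in routes[(src, dst)])
    return total


# Hypothetical example data (not taken from the paper's figures).
CHAIN = ["cctv_driver", "feature_extr", "lw_analytics", "alarm_driver"]
PLACEMENT = {
    "cctv_driver": "parking",
    "feature_extr": "parking",
    "lw_analytics": "student_center",
    "alarm_driver": "fire_police",
}
HW_REQUIRED = {"cctv_driver": 1, "feature_extr": 3, "lw_analytics": 5, "alarm_driver": 1}
HW_FREE = {"parking": 4, "student_center": 8, "fire_police": 16}
PROCESSING_MS = {"cctv_driver": 2, "feature_extr": 5, "lw_analytics": 10, "alarm_driver": 2}
LINK_LATENCY_MS = {("parking", "student_center"): 15, ("student_center", "fire_police"): 15}
ROUTES = {
    ("cctv_driver", "feature_extr"): [],
    ("feature_extr", "lw_analytics"): [("parking", "student_center")],
    ("lw_analytics", "alarm_driver"): [("student_center", "fire_police")],
}

latency = end_to_end_latency_ms(CHAIN, ROUTES, LINK_LATENCY_MS, PROCESSING_MS)
eligible = hardware_ok(PLACEMENT, HW_REQUIRED, HW_FREE) and latency <= 150
```

IoT reachability, security policies, and bandwidth would be checked analogously; they are omitted here to keep the sketch short.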
Furthermore, the infrastructure is a dynamic environment and we assume it is subject to node workload variations and changing network conditions as per the probability distributions (possibly obtained from historical monitoring data (Forti et al. 2019)) we described in this section. Such changes can indeed affect deployment performance and turn momentarily optimal solutions into bad or unfeasible ones, potentially leading to unsatisfactory application QoS, and application downtime or unavailability. As we will show in the next section, the EdgeUsher methodology permits determining VNF placements (i.e., function mappings and flow routes) and evaluating their performance against probabilistic infrastructure variations in this scenario. In the next section, after discussing solutions for this first VNF placement, we will illustrate how the deployers at UC Davis can exploit our methodology to determine a further placement for the dashed part of the chain in Figure 5, handling a video stream for the second CCTV system deployed at the Mann Lab, and joining the first chain at the WAN Optimizer service function.

Motivating example: experiments
We started by looking for eligible placements of the VNF chain supporting the CCTV system installed at the Parking Services building. For the purpose of the experiments, we first ran the non-heuristic EdgeUsher over three different inputs (of the types (a)-(c) illustrated in Section 4.2) for the infrastructure description: (a) a non-probabilistic description that only considers the most probable values of each probability distribution without indicating the associated probabilities (i.e., for both node hardware and link QoS profiles), (b) a single-probability description of the infrastructure that accounts only for the highest probability value of each distribution (i.e., one probability value per node or link), and (c) a complete fully-probabilistic description of the infrastructure that includes all probability distributions available for nodes and links. Figure 7 shows an example of the same link fact in the non-probabilistic, single-probability, and probabilistic infrastructure descriptions. Figure 8 shows the obtained results in terms of number of generated eligible placements and computation time needed to obtain those. Figure 9 shows one of the best placements obtained, which features a 98% probability of complying with all hardware, IoT, security, bandwidth, and end-to-end latency requirements of the input chain. The experiments were run on a commodity laptop provided with an Intel Core i5-6200U CPU (2.30GHz) and 8 GB of RAM, running Ubuntu 18.04.2 LTS, ProbLog 2.1.0.36, and Python 3.6. The three input files are available at: https://github.com/di-unipi-socc/EdgeUsher/tree/master/infra. Timings were obtained by averaging results over 15 executions for each case.
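The relation among the three kinds of description can be sketched in Python as follows, starting from a full distribution given as (probability, value) pairs. This is only an illustration of the reduction described above: the function names and the data layout are assumptions, and the actual inputs are ProbLog facts such as those of Figure 7.

```python
def most_probable(dist):
    """Return the (probability, value) pair with the highest probability."""
    return max(dist, key=lambda pair: pair[0])


def non_probabilistic(dist):
    """(a) Keep only the most probable value, dropping its probability."""
    _, value = most_probable(dist)
    return value


def single_probability(dist):
    """(b) Keep the most probable value together with its probability."""
    return most_probable(dist)


def fully_probabilistic(dist):
    """(c) Keep the whole distribution unchanged."""
    return list(dist)


# On-campus wireless link profile quoted in the motivating example:
# 70 Mbps / 15 ms in 98% of the cases, unavailable (None) otherwise.
WIRELESS = [(0.98, {"bw_mbps": 70, "lat_ms": 15}), (0.02, None)]

desc_a = non_probabilistic(WIRELESS)
desc_b = single_probability(WIRELESS)
desc_c = fully_probabilistic(WIRELESS)
```

Descriptions (a) and (b) discard the less likely cases (here, link unavailability), which is consistent with their lower evaluation cost but also means that the corresponding results do not account for those cases.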
It is worth noting that the prototype runs fairly fast on the non-probabilistic and on the single-probability infrastructure, which do not suffer from the additional combinatorial complexity that ProbLog incurs when evaluating (the probability distributions expressed as) annotated disjunctions in the fully-probabilistic infrastructure description. We will use the results obtained by the non-heuristic prototype over the fully-probabilistic description of the Edge infrastructure at UC Davis as a baseline to evaluate the performance of the heuristic prototype.
Thus, we ran the heuristic EdgeUsher over the fully-probabilistic description of the Edge infrastructure at UC Davis. For the sake of simplicity, we set both thresholds (i.e., the one on node requirements T_HW and the one on network QoS T_QoS) to a value T that was varied during the experiments in the range [0.1, 0.8] with a step of 0.1. Figure 10 shows the obtained results in terms of number of generated eligible placements and execution times needed to obtain those.
The results show that the employed heuristic considerably reduces the search space and, thus, the execution time needed to determine eligible VNF placements and routing. Particularly, the fully-probabilistic description of the UC Davis infrastructure can be handled with a speed-up between 15× and 214× with respect to the exhaustive prototype, while still returning a subset of the optimal results. Besides, with thresholds set to 0.8, EdgeUsher determines six eligible placements for the VNF chain supporting the CCTV system. Four of such placement solutions have a 96% probability of meeting all set requirements, while the remaining two have a 98% probability. All output solutions fall within the best solutions also generated by the non-heuristic prototype, when it is run over the complete infrastructure description.
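The effect of the thresholds on the search space can be illustrated with the following Python sketch, which discards candidate nodes whose probability of offering the required hardware falls below T_HW. It is a hedged approximation of the idea rather than the heuristic as implemented in EdgeUsher: the comparison (greater than or equal to the threshold), the function names, and the four-unit node distribution are assumptions; only the two-unit node distribution comes from the motivating example.

```python
def prob_at_least(dist, required_units):
    """Probability that at least `required_units` hardware units are free."""
    return sum(p for p, free in dist if free >= required_units)


def prune_nodes(nodes, required_units, t_hw):
    """Keep only the nodes that meet the requirement with probability >= t_hw."""
    return [
        name
        for name, dist in nodes.items()
        if prob_at_least(dist, required_units) >= t_hw
    ]


NODES = {
    "node_2hw": [(0.2, 2), (0.8, 1)],  # distribution quoted in the text
    "node_4hw": [(0.5, 4), (0.5, 2)],  # hypothetical distribution
}

# Threshold values used in the experiments: 0.1, 0.2, ..., 0.8.
THRESHOLDS = [round(0.1 * k, 1) for k in range(1, 9)]

# Candidate nodes for a function requiring 2 hardware units, per threshold.
survivors = {t: prune_nodes(NODES, 2, t) for t in THRESHOLDS}
```

With a low threshold both nodes remain candidates, whereas with T = 0.8 the two-unit node is pruned immediately, so fewer partial placements need to be explored.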
We then included an affinity constraint between the Storage and the Video Analytics service functions, and we ran the prototype again over the fully-probabilistic input with T = 0.8. As a result, execution time halved with respect to the execution without such constraint, reaching around 24 s. The placement of Figure 9 is still output as one of the best possible when forcing the affinity constraint. In this case, both the Storage and the Video Analytics are deployed to the available Cloud node. The highlighted links, along with their labels, show the routing path associated with the placement and the bandwidth to be allocated to traffic flows mapped on each infrastructure link. Such information could actually be used to instruct the network (e.g., via SDN controllers) so as to allocate a suitable amount of bandwidth to each flow.
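As a rough illustration of how the per-link labels can be derived from a routing, the sketch below sums the bandwidth of all flows mapped onto each infrastructure link. The flow names, node names, and bandwidth figures are hypothetical and do not reproduce the values of Figure 9; the function is an assumption made for illustration, not part of EdgeUsher.

```python
from collections import defaultdict


def bandwidth_per_link(routes, flow_bw_mbps):
    """Sum, for every infrastructure link, the bandwidth of the flows using it."""
    allocation = defaultdict(float)
    for flow, path in routes.items():
        for link in path:
            allocation[link] += flow_bw_mbps[flow]
    return dict(allocation)


# Hypothetical routing of two flows leaving the CCTV System Driver.
ROUTES = {
    ("cctv_driver", "feature_extr"): [("parking", "student_center")],
    ("cctv_driver", "wan_optimizer"): [
        ("parking", "student_center"),
        ("student_center", "cloud"),
    ],
}
FLOW_BW_MBPS = {
    ("cctv_driver", "feature_extr"): 8.0,
    ("cctv_driver", "wan_optimizer"): 15.0,
}

allocation = bandwidth_per_link(ROUTES, FLOW_BW_MBPS)
```

The resulting dictionary maps each link to the total bandwidth to reserve on it, which is the kind of information an SDN controller would need.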
Afterward, assuming the first chain was deployed as in Figure 9, EdgeUsher was exploited to check whether it was possible to extend the deployment by placing the dashed part of the chain for a second CCTV system installed at the Mann Lab. By querying the heuristic prototype again, seven new possible VNF placements were obtained in around 22 s. All output solutions featured a 96% probability of meeting all chain constraints. Four of such solutions placed services as sketched in Figure 11 (a), and the remaining three as in Figure 11 (b); other routings, which are not shown, were possible. It is worth noting that the deployers might consider using one of the output solutions and keeping some of the others as possible backups to guarantee chain functioning in case of device failures or overloading, or in case of network congestion.

Related work
SDN and NFV technologies are gaining increasing interest for their potential benefits in hybrid Cloud-Edge environments (Mouradian et al. 2018) and in the IoT (Morabito and Beijar 2017). Indeed, the concept of Service Function Chaining (i.e., the ordered interconnection of service functions implemented as VNFs) is expected to enable the offer of added-value services, like virtual reality or tactile Internet applications, over next-generation telecommunication networks (Cziva et al. 2018) and in multi-access edge computing (MEC) scenarios (Taleb et al. 2017; ETSI 2019). Hereinafter, we discuss the main related work in the area of SDN and NFV technologies applied in the IoT.
An SDN and NFV architecture for IoT network and application management is proposed in Ojo et al. (2016). Morabito and Beijar (2017) propose an architecture and a prototype implementation of an NFV/SDN framework enabling automated and dynamic network service chaining across Edge (i.e., IoT gateway) and Cloud (i.e., central data center) domains. SDN and NFV are jointly used in Rametta and Schembra (2017) to assure service continuity of a video monitoring application deployed over a flying ad hoc network built on a fleet of drones over rural areas. Drones are used as points of presence that can host virtual network or application functions.
The problem of placing VNFs on a physical substrate for realizing service chains in a hybrid Cloud-Edge infrastructure to support IoT applications has only recently emerged. Previous work has focused on network service placement in VNF infrastructure, considering intra- and/or inter-DC networks (Pham et al. 2018; Luizelli et al. 2017). A survey on resource allocation strategies for the network services deployment in VNF infrastructures has been provided by Gil Herrera and Botero (2016).
Although converged approaches are emerging for managing NFV, Edge, and Fog computing services (van Lingen et al. 2017), traditional VNF placement approaches do not tackle challenges brought by Fog and Edge computing for IoT applications. These challenges include heterogeneity of computing nodes, dynamic changes of network and node conditions that may turn optimal or quasi-optimal solutions into unfeasible ones, and security threats, just to mention the main ones. Recent work in the area of application placement in the Fog has partially begun to tackle these aspects, but open research problems still exist, such as placement approaches accounting for security aspects and dynamic infrastructure variations, as discussed in the review by Brogi et al. (2020).
Only a few works have addressed the problem of placing VNFs in a hybrid environment made of Edge and Cloud computing nodes. Leivadeas et al. (2019) model the problem of Service Function Chain (SFC) placement in a hybrid MEC and Cloud environment considering location requirements posed by VNFs and targeting minimization of deployment costs and delays. They propose a mixed integer programming formulation of the problem and a suboptimal approach based on the tabu search meta-heuristic.
SFC placement in IoT scenarios, which demand low-latency response, high-throughput processing, and cost-effective resource usage, is tackled in Wang et al. (2018). The work proposes a linear programming model and an approximation optimization algorithm to achieve deadline and packet rate guarantees while avoiding resource idleness. However, SFC orchestration is done within the Cloud domain and the availability of computing resources at the Edge is not considered. Yala et al. (2018) propose a VNF placement algorithm that optimizes access latency and service availability in a mixed Edge and Cloud environment for ultra-reliable low-latency communications (uRLLC) services. The multi-objective optimization problem is solved by exploiting a genetic algorithm metaheuristic whose achieved performance is compared against an exact algorithm implemented in CPLEX. Although a network service is defined as a set of VNFs, chaining constraints are not considered. Mouradian et al. (2018) tackle application component placement in NFV-based hybrid Cloud-Edge systems and propose an Integer Linear Programming (ILP) formulation that represents applications as non-deterministic VNF forwarding graphs. Graphs can be built using sequence, parallel, selection, and loop substructures, and probabilities are used to model selection and loop iterations. Although all the above-mentioned works (Leivadeas et al. 2019; Wang et al. 2018; Yala et al. 2018; Mouradian et al. 2018) consider latency requirements (either as minimization objective or as constraint), none of them accounts for dynamic variations of network status, which instead can influence the extent to which QoS requirements are satisfied in the long run. Nor are security aspects taken into account.
Typically, VNF placement approaches that consider dynamic network conditions either recompute placement (Cziva et al. 2018), enforce scaling and/or migration actions (Eramo et al. 2017; Jia et al. 2018), or try to find a solution that is robust against network status variations (Cheng et al. 2018). Cziva et al. (2018) formulate the problem of Edge VNF placement as an ILP to derive latency-optimal deployments of VNFs. They also define a dynamic scheduler that recomputes placement to account for latency variations on links. This scheduling problem, which consists in selecting the time for placement recalculation so that unnecessary VNF migrations are prevented and latency violations are bounded, is solved using optimal stopping theory. While Cziva et al. (2018) deal with placement of single VNFs and model infrastructure dynamicity only in terms of network latency variation, EdgeUsher handles chains of VNFs and accounts for probability distributions of latency and available bandwidth of links as well as of node resource capacity. In Cheng et al. (2018), network dynamics are taken into account to find temporally robust placement solutions. The SFC placement is formulated as a stochastic resource allocation problem that exploits both currently observed network information and future variation. However, the work does not tackle latency-aware placement and the network model does not represent variations of either network latency or node resources, as our work does. Zhu and Huang (2018) formulate a stochastic programming problem that minimizes the placement cost and aims at achieving high-availability application deployments. The problem formulation thus accounts for probabilities of Virtual Machine (VM), host, and link failures but does not consider latency constraints.
As analyzed in Farris et al. (2019), IoT environments introduce challenging security threats, ranging from attacks on IoT devices and attacks in IoT-oriented clouds and networks to threats in the application layer, such as vulnerabilities in software, data leakage, and phishing. Risks exist in executing VNFs over third-party infrastructures, and security and trust criteria have to inform placement decisions (Farris et al. 2019). Indeed, the need to consider security issues in virtual network embedding (VNE) and VNF placement problems is gaining increasing interest. A classification of security requirements into node, link, and topological requirements to be considered in VNE problems is provided in Fischer et al. (2017). In Dwiardhika and Tachibana (2019), the problem of VNE is formulated so as to account for standard protection provided by substrate nodes and links (quantitatively referred to as security level). If the level of security is lower than the security demand, the VNE algorithm tries to place security VNFs (e.g., firewall, deep packet inspection, and intrusion detection) to improve the offered security level. Optimal placement of security SFCs is tackled in Shameli Sendi et al. (2018), where the placement problem is formulated including deployment constraints derived from network security patterns. Figure 12 provides a comparative overview of the discussed related work and highlights how, to the best of our knowledge, this is the first work aiming at addressing VNF chain placement in a hybrid Cloud-Edge network with latency constraints while accounting for network status variations and security requirements. In addition, while most related works rely on linear programming formulations, we adopt a probabilistic declarative approach. Indeed, declarative approaches have been successfully applied to modeling and reasoning on problems related to distributed systems other than VNF embedding, as for instance in Lopes et al. (2010) and Ma et al. (2013).
Last but not least, our prototype is released as open-source software and the experiment data are also made publicly available.

Conclusions and future work
In this article, we have proposed a logic programming approach for solving the problem of placing VNF chains onto Cloud-Edge infrastructures. To achieve this objective, we followed three main steps:
1. we gave a concise logic programming formulation (hence, a declarative solution) of the considered VNF chain placement problem,
2. we extended it with a suitable probabilistic representation of variations in the available infrastructure capabilities to assess the quality of eligible solutions against those, and
3. we devised a heuristic to ensure scalability of our prototype and suitably high quality of output solutions, by immediately pruning out low-quality solutions based on user-specified thresholds for hardware and QoS requirements.
The obtained prototype, EdgeUsher, returns the eligible deployments of VNF chains to a hybrid Cloud-Edge infrastructure that guarantee the fulfillment of a set of placement requirements, namely hardware, IoT reachability, bandwidth, latency, and security policies.
Thanks to the declarative approach, additional constraints, such as affinity, anti-affinity, or placement into a specific node, can be easily expressed. Moreover, the EdgeUsher implementation has been fully provided in this paper (around 70 source lines of code) together with some examples of usage. Declarative programming also makes EdgeUsher more flexible and extensible than procedural solutions, which makes it better suited to accommodate the ever-changing needs of Cloud-Edge scenarios. By leveraging ProbLog, EdgeUsher permits specifying and solving the VNF chain placement problem while considering infrastructure variations, by assuming that probability distributions describing the behavior of computing nodes and infrastructure links are available. Since probabilistic logic programming is a natural extension of plain logic programming, it was straightforward to also consider dynamic infrastructure conditions by suitably extending (the input of) the Prolog version of EdgeUsher. Such an extension to account for dynamic settings would have required significantly more effort if implemented by means of other paradigms. It is worth noting that EdgeUsher could also be used to quickly evaluate VNF placements computed by alternative placement algorithms with respect to the infrastructure variability, by calculating their probability of satisfying hardware and QoS requirements. In this work, we discussed and showcased the use of our prototype over a lifelike reference scenario, which we also used to assess and epitomize the performance of EdgeUsher.
Naturally, since the considered problem is NP-hard, the worst-case time complexity of our approach is exponential in the size of the input infrastructure. For larger scenarios, we thus envision a hierarchical architecture of clusters of edge nodes (partitioned, for instance, according to administration, application, or geographical criteria) where orchestration features are run by one head node and connected to a few Cloud nodes. We intend to elaborate further on this vision by running EdgeUsher over a domain made of a few clusters of edge nodes and their associated Cloud nodes. However, the potential advantage of the probabilistic approach lies in the provisioning of solutions that are resilient to infrastructure variations over time. Our effort goes in the direction of determining placements that are more likely to ensure high QoS guarantees, security, and service reliability against dynamic infrastructure conditions, thus allowing the cost of reasoning to be amortized over an increased chain lifetime.
As for future work, we plan to comparatively evaluate our approach with state-of-the-art solutions that react to infrastructure changes by computing and executing costly VNF migrations or scaling actions (e.g., Eramo et al. 2017; Jia et al. 2018), using simulation as well as testbed environments. The setup of a small-scale testbed is under way in our campus network, and it will be used to perform tests using probability distributions derived from real monitoring data (Forti et al. 2021).
EdgeUsher also allows specifying security requirements in terms of logical expressions over security properties. It is worth noting that some security properties can be provided exclusively as hardware capabilities (e.g., anti-tampering), while others could also be implemented as software and deployed as VNFs (e.g., firewall). This option opens up the possibility of adaptively inserting required security VNFs when needed, which we also plan to investigate in the near future. Besides, we envision enhancing EdgeUsher with continuous reasoning capabilities to further tame exp-time complexity at run time, automatic techniques to perform parameter tuning of the heuristic thresholds (e.g., via machine learning), and a user-friendly GUI to ease user interactions with the prototype.