|Theme||Need / Pain Point||Polestar category||Suggested By||Comments|
|Customer OPEX integration cost (standard industry frameworks & APIs)|
ONUG voiced that integrating and maintaining multiple providers' "sets of APIs" can easily add one or even two orders of magnitude to the cost. This is in fact why ONUG exists: as a step toward a "reasonable" solution, ONUG runs a "network telemetry" API project to produce a standard, INDUSTRY-LEVEL network telemetry API framework that makes the solution cost-effective for them as a customer base.
i.e. - the customer has moved to software and now carries an integration cost; could NFV please consider, in the "industry solution", that there is also a per-provider, industry-level cost. For the INDUSTRY to provide a reasonable solution there should be some common solution.
|Model Driven||ONUG via Michael Bugenhagen|
|Provider Touchless customer portal framework|
The key to lowering cost and meeting customer expectations is a solid NFV customer portal schema that is "provider-touchless"... this means all design should be centered around the portal: the orchestrator reacts to, and orchestrates, portal objects (resources and services), while transparently showing the resource/service state, faults, PM, ... (this is the primary goal)
Industry need - Match Customer Experience with "provider touch less service portals" (cloud portal alignment)
Provide the modular objects to enable a provider customizable, yet in-provider align-able "Cloud Portal Framework."
Align at "object levels" (Model driven)
Large Enterprise customers such as ONUG are already asking for these "standard industry objects" between providers so they can display them in their portals.
|Templates, and model driven constructs (TOSCA, yang)||Michael Bugenhagen|
The portal is the critical element, yet we tend to focus on orchestration... if we orchestrate without a portal we have missed the key customer demand.
If we fail to have "model-driven" portal objects used by orchestrators, we have also failed to create a standard way to share resource views between providers and customers.
|Resilience at Scale|
Platform resilience and performance for clusters up to 1000 nodes.
There are two different interpretations of "nodes" here: compute nodes (e.g. servers/CPUs) vs. locations (NFVI-PoPs). bryan: this was mainly meant to refer to distinct sites, i.e. a complete control plane (cloud/SDN VIMs) and virtual infra (VI) deployment (servers, etc.) that provides service at a particular location. "Nodes" refers primarily to the number of servers, distributed across VIM and VI.
Platform resilience has to do with the ability of the OPNFV Reference Platform to maintain normal operations as resources become available or unavailable (e.g. due to failures). This also includes the case of expanding or shrinking the aggregate footprint of the NFVI upon which the OPNFV Reference Platform is executing.
Small NFVI nodes may be homogeneous but larger NFVI deployments are likely to have variations in the deployed infrastructure e.g. compute nodes with different vintage server blades, BIOS versions etc.
bryan: Performance focuses include the control plane (e.g. orchestration, monitoring, closed-loop control) and VI (e.g. data plane), as the number of nodes scales up.
The performance of OPNFV Reference Platform operations may vary for operations that impact a large number of nodes. For example, the time to deploy a VNF may be longer for VNFs whose VNFCs are deployed on different nodes.
The capacity of the NFVI to support VNFs also changes as the resources available change.
We also need to support the platform at small scale - Enterprise NFV is considered a significant application by analysts. The platform in this case is typically a much smaller device, e.g. a pizza box.
|NFV Reference Platform - TBD - Daisy may be addressing partial||Bryan||Steven expanded discussion of some of the terms|
|Security and Policy|
Integrate platform services with enterprise security systems e.g. for RBAC, lock out, password aging, encryption, vulnerability scanning, policy management. Includes some items in scope for the PCI DSS feature of Keystone in Newton.
Roots of trust and legal requirements for critical infrastructure.
A recommendation for Linux hardening that doesn't impact OpenStack functionality (best practices) would be useful.
-NFVI networking security mechanism
As an OPNFV Platform Provider or Network Service Operator, I need a security mechanism for NFVI networking that applies a unified security policy, as a baseline, to any traffic on the whole of our managed NFVI. When we use DPDK or SR-IOV to improve VNF networking performance, the security policy (e.g. a Neutron security group) is bypassed and ignored. We need a solution that does not depend on host-OS Linux kernel functions such as iptables. The OVS-based firewall driver released in OpenStack Mitaka is a candidate to solve this problem on the DPDK side, but other data-acceleration methods such as SR-IOV should also be covered.
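The SR-IOV gap described above can at least be audited. The sketch below (the port-record shape is invented for illustration, loosely modeled on Neutron's `binding:vnic_type` and `security_groups` port attributes) flags ports that carry security groups the accelerated data path will silently ignore:

```python
# Hypothetical audit sketch: flag Neutron-style port records whose traffic
# bypasses security-group enforcement (e.g. SR-IOV "direct" ports).
BYPASSING_VNIC_TYPES = {"direct", "direct-physical"}

def unenforced_ports(ports):
    """Return IDs of ports that carry security groups the data path ignores."""
    flagged = []
    for port in ports:
        vnic_type = port.get("binding:vnic_type", "normal")
        if vnic_type in BYPASSING_VNIC_TYPES and port.get("security_groups"):
            flagged.append(port["id"])
    return flagged

ports = [
    {"id": "p1", "binding:vnic_type": "normal", "security_groups": ["sg1"]},
    {"id": "p2", "binding:vnic_type": "direct", "security_groups": ["sg1"]},
    {"id": "p3", "binding:vnic_type": "direct", "security_groups": []},
]
print(unenforced_ports(ports))  # -> ['p2']
```

Such an audit does not fix the bypass, but it makes the unenforced baseline policy visible to the operator.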
NFV Reference Platform
Copper - Policy platform
Models - Policy model
Moon - Security framework - analytic tools
TBD - security holes of platform & infrastructure
Need Data Plane WG
|note that the security project in OPNFV produced the first openSCAP reporting in Colorado (http://artifacts.opnfv.org/logs/functest/lf-pod1/colorado/2016-09-01_07-47-15/colorado/security_scan/184.108.40.206_2016-09-01_07-41-22/report.hmtl). This is a first step; further hardening shall be possible.|
|VNF Onboarding and Lifecycle Management|
Drive convergence across End Users for a comprehensive, end-to-end (developer-to-operations) process of service/VNF development, packaging, onboarding, service creation, cataloging, deployment, and lifecycle management. Support this through tooling (e.g. SDKs) and open source for key components of the end-to-end process.
From the high level messaging page: As a VNF Provider, I use the OPNFV platform to validate interoperability with multiple orchestration scenarios so that ingestion and administration problems in my customer's operational deployments are minimized.
This could be refined as: As a VNF Provider, I use the OPNFV platform to validate VNF Packaging so that OPNFV platform returns no errors when onboarding a VNF under all of the OPNFV platform configuration options. This may require a number of Reference VNFs to validate all of the VNFC deployment constraints that could be expressed in the VNF Package.
Automation & Testing
Steve added epic format statements on ingestion
Created a daughter page to expand the VNF onboarding topic.
|CI/CD & Day 2 Operations||Support migration to a true DevOps-based production deployment, with infra lifecycle management features such as in-place upgrades, interoperability between platform/component versions, etc. Includes various prerequisites and enablers such as API micro-versioning, containerization of OpenStack services, and control-plane maintenance mode.|
|Networking for NFV||Enhance network capabilities and performance to meet NFV demands. Includes such capabilities as driving Gluon (common port abstraction enabling flexible networking / SDNC provider integration), L3VPN-based service chaining, dataplane acceleration (e.g. DPDK, SRIOV, FD.io/VPP), Smart NICs (e.g. P4), ...||NFV Reference Platform||Bryan|
Support multisite deployments ranging from simple site pairs, to massively distributed and hierarchical deployments. Key aspects include identity/auth federation, quota management, catalog distribution, global/local resource orchestration, ...
|NFV Reference Platform||Bryan|
|Platform Operational Insights|
Across platform components, provide consistent approaches to deep observability into the platform (control plane, physical/virtual infra) and workload analytics. This includes fast site deployments - both the incorporation of new white-box NFVI (storage, servers, switches, networking) within an NFVI-PoP and the integration of new sites within a multisite deployment.
Manage complex validation challenges driven by frequent changes of a huge number of interdependent platform and application components.
Steve added fast deployment
Herbert added "Manage complex validation challenges ..." (could also be a separate topic)
Evolve toward platform infra and workload support based upon a containerized, microservices oriented model. This will be necessary to overcome the performance/cost limitations of VM-based infra and workloads, as well as achieve true agile lifecycle management.
Cloud Native Architecture applies independently to VNFs and the OPNFV Reference platform. A distinguishing feature of Cloud native applications is that they are designed for operation in a cloud infrastructure.
|NFV Reference Platform||Bryan|
|Cloud-Native Architecture / Application CI/CD|
Support migration to true DevOps-based management of applications, with automated testing and staging, in order to unleash the full speed of agile software delivery.
DevOps / Continuous Integration / Continuous Delivery processes will apply independently to the VNFs and the OPNFV Reference Platform. A distinguishing feature of DevOps/CI/CD is the faster, asynchronous cycling of versions, so that delivered functionality and performance improve in frequent small increments.
The OPNFV Reference Platform needs to have automated procedures for the onboarding of VNFs which are available on a CI/CD basis.
The deployed VNFs need to remain in normal operation through OPNFV Reference Platform CI/CD upgrades. [see also API backwards compatibility below]
|Automation?||Herbert||Could be separate topic or combined with "Cloud-Native Architecture"|
Build a catalog of production-quality Reference VNFs for common network functions, to be used in OPNFV testing and otherwise.
As an OPNFV Platform Provider, I need a set of Reference VNFs to validate that VNFs (including VNFCs) are correctly deployed by the OPNFV Platform such that the deployed configuration complies with the VNFD information in the VNF Package. (e.g., VNFC deployments respect affinity/ anti-affinity constraints, VNFC deployments respect the portability constraints, Internal connectivity between VNFC deployments established.) Separate Reference VNFs may be required for each deployment constraint in the VNF Package.
As an OPNFV Platform Provider or Network Service Operator, I need a set of Reference VNFs to benchmark performance of a specific OPNFV Platform deployment on an NFVI Node using a defined reference load, so that infrastructure performance benchmarks from different deployments are comparable. Separate Reference VNFs may be required for different representative workloads (e.g., compute intensive, memory intensive, networking traffic intensive, intensive VNFC instance deployment/deletion operations, etc.)
As an OPNFV Platform Provider or Network Service Operator, I need a set of Reference VNFs to benchmark performance of a specific OPNFV Platform deployment on an NFVI Node using a defined reference load, so that OPNFV Platform performance benchmarks from different deployments are comparable. Separate Reference VNFs may be required for different representative workloads (e.g., intensive VNFC instance deployment/deletion operations, intra-VNF networking changes, VNFC migration operations, etc.)
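The deployment-constraint validation in the first story could start from a small check like the following sketch (all names are hypothetical; the placement map and constraint groups stand in for what a Reference VNF's VNFD would declare):

```python
# Illustrative sketch (not an OPNFV API): check a deployed VNFC-to-host map
# against affinity / anti-affinity groups declared in a VNFD-like structure.
def check_placement(placement, affinity, anti_affinity):
    """placement: {vnfc: host}; affinity/anti_affinity: lists of VNFC groups.

    Returns a list of (rule_kind, group) tuples for every violated rule.
    """
    violations = []
    for group in affinity:
        # An affinity group must land on exactly one host.
        if len({placement[v] for v in group}) > 1:
            violations.append(("affinity", tuple(group)))
    for group in anti_affinity:
        # An anti-affinity group must use a distinct host per VNFC.
        hosts = [placement[v] for v in group]
        if len(hosts) != len(set(hosts)):
            violations.append(("anti-affinity", tuple(group)))
    return violations

placement = {"lb": "host1", "app1": "host1", "app2": "host1"}
print(check_placement(placement,
                      affinity=[["lb", "app1"]],
                      anti_affinity=[["app1", "app2"]]))
# -> [('anti-affinity', ('app1', 'app2'))]
```

A Reference VNF for this purpose would pair such a checker with a VNFD exercising each constraint type in turn.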
|Automation / Testing||Bryan|
Steve added epic format statements for the cases of Ingestion validation, infrastructure benchmarking, and VNF Management operations.
We will likely need a separate page for discussion around the scope of potential reference VNFs. I suspect we would need to include some of the following:
Openstack WG working page on the subject: https://etherpad.openstack.org/p/generic-nfv-platform
|API Backwards Compatibility|
All OPNFV Platform APIs must be versioned. (micro-versioning is desirable). An API version must not be removed without a deprecation cycle (2 releases). New incompatible APIs may only be introduced with a new version. The old (deprecated) API must be supported for the duration of the deprecation cycle.
As a Network Service Provider I need 2 release cycles to work with a VNF provider to ensure upgraded OPNFV Platform APIs are supported, so that a smooth migration of existing services can be planned.
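The deprecation and micro-versioning rules above can be sketched as follows (release numbers and version tuples are illustrative, not an actual OPNFV API):

```python
# Sketch of the two rules above: a deprecated API version must survive a
# 2-release cycle, and micro-versioned clients/servers negotiate the highest
# version both sides support. All values are invented for illustration.
DEPRECATION_CYCLE = 2  # releases a deprecated API version must remain supported

def may_remove(deprecated_in, current_release):
    """An API version deprecated in release N may be removed in N + 2."""
    return current_release - deprecated_in >= DEPRECATION_CYCLE

def negotiate(client_min, client_max, server_min, server_max):
    """Micro-versioning: pick the highest version both sides support, else None."""
    best = min(client_max, server_max)
    return best if best >= max(client_min, server_min) else None

print(may_remove(deprecated_in=3, current_release=4))   # -> False (only 1 cycle)
print(negotiate((2, 1), (2, 40), (2, 10), (2, 35)))     # -> (2, 35)
```

The negotiation half mirrors the approach OpenStack services such as Nova take with API microversions, which is one reason micro-versioning is called out as desirable.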
|NFV Reference Platform||Steve|
|Real-time performance monitoring and fast fault detection|
The amount of alarm and statistical data collected from the NFVI to detect failures will grow with the NFVI resources we must monitor (CPU, memory, disk, ... collected from the OS and hypervisor).
However, such data collection becomes coarse-grained (e.g., 5-minute intervals) to prevent overload on the monitoring system. Eventually, this causes delays in detecting failures as well as in healing them.
Therefore, a real-time monitoring and fast fault detection mechanism is required in the NFV platform.
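One hedged sketch of such a mechanism is adaptive polling: keep the coarse default interval to bound monitoring load, but tighten it for resources that look anomalous (the thresholds and intervals below are invented for illustration):

```python
# Sketch: tighten the collection interval for resources showing anomalies so
# fault detection stays fast without polling everything at high frequency.
BASE_INTERVAL_S = 300   # coarse default interval (5 minutes)
FAST_INTERVAL_S = 5     # fine-grained polling for suspect resources

def next_interval(metric, threshold, anomalous_streak):
    """Return the next polling interval for a resource.

    metric: latest reading (e.g. CPU utilization 0..1)
    threshold: value above which the resource is considered suspect
    anomalous_streak: consecutive anomalous readings observed so far
    """
    if metric > threshold or anomalous_streak > 0:
        return FAST_INTERVAL_S
    return BASE_INTERVAL_S

print(next_interval(metric=0.95, threshold=0.9, anomalous_streak=0))  # -> 5
print(next_interval(metric=0.20, threshold=0.9, anomalous_streak=0))  # -> 300
```

This trades a small amount of extra load on suspect resources for much faster detection than a fixed 5-minute sweep.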
|NFV Reference Platform||Masanori||bryan: metrics collection for fault/performance purposes is enabled by VES/Barometer.|
|VM placement optimization|
Linked to multi sites and/or cloud native application.
There are already some solutions, but few clear guidelines/rules.
Placement within one VIM: Some solutions exist to take into account affinity/anti-affinity rules, EPA requirements.
Multi-site deployment: complicates things quite a bit. Operators have to consider which sites the VNF/VMs need to be deployed in, which depends on (1) geography of data centers, (2) network topologies within and between data centers, (3) latency requirements from access networks and other resources, (4) capacities needed/allocated at each site, (5) vendor placement rules if any, (6) redundancy requirements (e.g., N:1), and (7) recovery procedures in case of failure.
Instead of a laborious human planning/design task, this should be automated as much as possible. One possibility is to create operator deployment "policies" or rules as input to the orchestration system, since these considerations stem from the operator's deployment environment rather than from the VNF/service itself. But we are open to any solution.
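A minimal sketch of such an operator placement policy, assuming invented site attributes and policy fields, might filter and rank candidate sites like this:

```python
# Hedged sketch of an operator placement policy: select sites for a VNF's
# replicas against latency and capacity rules. Field names are invented.
def eligible_sites(sites, policy):
    """Return names of the sites chosen for the requested replica count."""
    ok = [s for s in sites
          if s["latency_ms"] <= policy["max_latency_ms"]
          and s["free_vcpus"] >= policy["vcpus_needed"]]
    # Prefer lower access latency, then more capacity headroom.
    ok.sort(key=lambda s: (s["latency_ms"], -s["free_vcpus"]))
    return [s["name"] for s in ok[:policy["replicas"]]]

sites = [
    {"name": "edge-a", "latency_ms": 5, "free_vcpus": 16},
    {"name": "edge-b", "latency_ms": 9, "free_vcpus": 64},
    {"name": "core-1", "latency_ms": 25, "free_vcpus": 512},
]
policy = {"max_latency_ms": 10, "vcpus_needed": 8, "replicas": 2}
print(eligible_sites(sites, policy))  # -> ['edge-a', 'edge-b']
```

A real policy engine would add the remaining considerations (topology, vendor rules, N:1 redundancy, recovery), but the shape - operator rules as declarative input to the orchestrator - is the point.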
|NFV Reference platform||Morgan, Serge||bryan: metrics collection for realtime inventory/placement purposes is enabled by VES/Barometer. Projects such as Valet (proposed by AT&T for OpenStack) are specifically addressing advanced workload placement based upon realtime inventory and SLAs.|
|Metrics for billing/charging||Are the current metrics in OpenStack and/or the controllers enough to build the charging chain for the VNF (bandwidth used by a VM/Tenant/Stack)||NFV Reference platform||Morgan||bryan: metrics collection is enabled by VES/Barometer|
|Energy||Tools are missing to get a clear view of power consumption. Power consumption is a driver of NFV/virtualization, yet today there is no native feedback on such consumption, and no connection with VM placement||NFV Reference platform||Morgan|
As far as I know there is a project in ODL, and there used to be a project in OpenStack some time ago.
bryan: should be enabled by VES/Barometer
tooling also missing to have a reliable and fast feedback in case of trouble
|NFV reference platform||Morgan|
There are projects in OPNFV (pinpoint, bottlenecks) dealing with such issues
bryan: enabled by VES/Barometer, at least for data collection.
|Interoperability||interoperability assessment with orchestration platforms (linked to VNF onboarding)||NFV reference platform||Diego||bryan: should be covered by VNF Onboarding above.|
|Security aspects||especially those related to roots of trust and legal requirements for critical infrastructures||NFV reference platform||Diego||bryan: should be covered by Security and Policy above.|
|Documentation / technical architecture||Due to the extreme modularity of the systems, defining a clear list of the flows that need filtering between all components would be helpful||NFV reference platform||Morgan||In OPNFV, Pharos defines some rules, but due to the diversity of installers/scenarios, it should also be up to the scenario owner to provide a flow matrix associated with the integration scenario|
Linked to troubleshooting/SLA: today there are lots of modules/components and logs are everywhere... you do not really care when everything is fine.
But when there is trouble, it is sometimes a nightmare to find the right log file in the right place. Aggregating the logs and converging on a common format would save time.
|NFV reference platform||Morgan|
The task seems far from easy, as any Telco Cloud solution based on open-source components is the result of an integration in which logs are managed differently. It could however be a good opportunity for a new project dealing with integration, such as OPNFV.
bryan: Syslog archiving (as one key log example) is in scope for VES.
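Format convergence could look like the following sketch: parse heterogeneous component log lines into one record shape so they can be aggregated and searched in one place (both input formats below are invented examples, not actual OpenStack log formats):

```python
# Sketch of log-format convergence: normalize differently formatted component
# log lines into one record shape for aggregation. Formats are invented.
import re

PATTERNS = [
    # e.g. "2016-09-01 07:47:15 ERROR nova.compute: no valid host"
    re.compile(r"(?P<ts>\S+ \S+) (?P<level>\w+) (?P<src>[\w.]+): (?P<msg>.*)"),
    # e.g. "<ts=2016-09-01T07:47:15> [agent] WARN link flap"
    re.compile(r"<ts=(?P<ts>\S+)> \[(?P<src>\w+)\] (?P<level>\w+) (?P<msg>.*)"),
]

def normalize(line):
    """Return a common {ts, level, src, msg} record for any known format."""
    for pat in PATTERNS:
        m = pat.match(line)
        if m:
            return m.groupdict()
    # Unknown format: keep the raw line so nothing is lost.
    return {"ts": None, "level": "UNKNOWN", "src": None, "msg": line}

rec = normalize("2016-09-01 07:47:15 ERROR nova.compute: no valid host")
print(rec["level"], rec["src"])  # -> ERROR nova.compute
```

Once every component's lines map into one record shape, a single store (e.g. a syslog archive, as bryan notes for VES) can serve troubleshooting across the whole stack.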
|Backup&Restore||Ability to restore customer virtual machines if the stack is deleted and recreated (not sure if this is red hat specific).||NFV reference platform||Morgan||There are several technical options but no native/integrated/reference ones|
|Real HA||Despite HA modes for single components, practice has shown that manual operations may be needed to restore the platform. Better end-to-end HA is needed, and if not all subsystems are HA, it shall be clear what is HA and what is not||NFV reference platform||Morgan||In OPNFV there is an HA project addressing these issues. It has already created test cases (executed in Yardstick). However, recent discussions also show that "HA" was probably misleading in the scenario naming, as most of the controllers are not HA today|
|Service Function Plane||Ability to compose network services with flexible combinations of VNF and PNF embodied service functions participating in dynamic service function chains that are underlay network agnostic (with service functions being unaware of network topology), with service function data plane interfaces being standardized and independent of the underlay network, enabling service functions to be placed anywhere within an administrative domain on any platform-supported underlay network.||NFV reference platform||Meador||bryan: likely aligned with cloud-native goals of abstraction from the infra env. 1st-gen VNF solutions are inherently complex re compatibility, due to their own size and complexity/dependencies.|
|Certification Program(s)||Some VIM suppliers (OpenStack distros) have initiated programs to "certify" VNFs from different suppliers as successfully onboarded on their VIM. Such certification programs should be led by OPNFV so that they are structured and verified by a vendor-agnostic entity. We might expand the scope to other interfaces (VNFM/VIM, VNFM/VNF-O, etc.)||NFV reference platform||Samer||bryan: OPNFV should promote its role in VNF portability, but diversity in all aspects (even certification) is a market enabler, as long as it does not exacerbate fragmentation and become a market inhibitor.|
|Versioning + skip release(s) when upgrading a (live) system|
Linked to CI/CD + API backwards compatibility: an upgrade from version A to version B must not require manual operations, especially for critical components. Moreover, it shall be possible to skip intermediate versions and upgrade only to stable versions, not necessarily from version N to N+1.
Upgrading the OpenStack used in a commercial deployment requires a huge amount of integration effort and testing, despite the progress in automation of deployment and testing. Given this, and the fact that OpenStack releases on a 6-month cadence, I, as an end user, may not want to update my commercial deployment at the same cadence, but may want to choose a yearly or even bi-yearly upgrade of my systems. For example, I want to be able to "jump" directly from the D-release to the G-release, skipping the E- and F-releases. Ideally, such system upgrades shall be done "live", i.e. without the need to shut down everything running on the older release.
Note: we may also want to consider the option of live/rolling upgrades for OPNFV.
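One way to make skip-release jumps explicit is a declared support matrix; the sketch below (release names and the matrix contents are hypothetical) simply checks whether a direct live jump has been validated:

```python
# Illustrative sketch: a support matrix declaring which live upgrade jumps
# have been validated, including release-skipping jumps. Contents invented.
SUPPORTED_JUMPS = {("D", "E"), ("D", "G"), ("E", "F"), ("F", "G")}

def can_upgrade_directly(src, dst):
    """True if a live upgrade from src to dst is validated, even when it
    skips intermediate releases."""
    return (src, dst) in SUPPORTED_JUMPS

print(can_upgrade_directly("D", "G"))  # -> True  (skips E and F)
print(can_upgrade_directly("E", "G"))  # -> False (must step through F)
```

Whatever the mechanism, the key point is that skip-release paths must be explicitly tested and published, not assumed to compose from N-to-N+1 upgrades.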
|NFV Reference platform||Ashiq, Gerald, Morgan||bryan: suggest feedback from the Escalator project.|
|portability||The platform and VNFs are reported as having difficulties in deployment due to minor changes in NFVI versions, e.g. BIOS upgrades, changes in the brand of device deployed, etc. We need processes to support in-service deployments across new infrastructure instances as deployments scale. This exacerbates the 1000-node scaling issue above, as deployments at 1000 NFVI-PoPs are unlikely to be at the same hardware version. We need to support a hardware lifecycle for refresh of new compute, storage, and networking whiteboxes.||NFVI Platform + automation||Steve||bryan: reported issues should be brought up specifically so that we can address them specifically. An enabler (part) of the solution is likely a deep HW environment inspection/inventory capability and compatibility verification pre-install.|
|‘Data Plane’ VNF decoupling from physical high-throughput NICs, and usage of virtual NICs with the ‘same’ high speed/throughput||VNF requirements||Andrea & Cecelia||(12/7 email to list)|