This page can be used to share end-user pain points with various components/aspects of NFV/SDN platforms. This can help us coalesce initiatives, messaging around them, and strategy aspects in support of them.
Links to Epics created for the Euphrates release (created as subpages of this page):
Subpage draft Epics / User Stories:
- Infrastructure and the OPNFV Reference Platform
- Invocation / Deletion of Services on OPNFV Reference Platform
- Upgrade and Management of the OPNFV Reference Platform
- VNF Onboarding and Deployment
- Customer Interfaces
- Resilience at Scale
- Security and Policy
- VNF Lifecycle
- Automatic Integration and Testing
|Theme||Need / Pain Point||Category||Suggested By||Comments|
|Customer OPEX integration cost (Standard industry frameworks & APIs)||See Customer Interfaces|
|Provider Touchless customer portal framework|
|Resilience at Scale|
|Security and Policy|
|VNF Onboarding and Lifecycle Management||See VNF Lifecycle|
|CI/CD & Day 2 Operations||Support migration to a true DevOps-based production deployment, with infra lifecycle management features such as in-place upgrades, interoperability between platform/component versions, etc. Includes various prerequisites and enablers such as API micro-versioning, containerization of OpenStack services, and a control-plane maintenance mode.|
|Networking for NFV||Enhance network capabilities and performance to meet NFV demands. Includes such capabilities as driving Gluon (a common port abstraction enabling flexible networking / SDNC provider integration), L3VPN-based service chaining, dataplane acceleration (e.g. DPDK, SR-IOV, FD.io/VPP), Smart NICs (e.g. P4), ...||NFV Reference Platform||Bryan|
|Multisite Deployments||Support multisite deployments ranging from simple site pairs to massively distributed and hierarchical deployments. Key aspects include identity/auth federation, quota management, catalog distribution, global/local resource orchestration, ...||NFV Reference Platform||Bryan|
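As a hedged illustration of the quota-management aspect of multisite deployments, the sketch below aggregates per-site quota usage into a global view. The `SiteQuota` shape and field names are hypothetical; real figures would come from each site's identity/compute APIs.

```python
from dataclasses import dataclass

@dataclass
class SiteQuota:
    """Per-site quota snapshot (hypothetical shape; real data would
    come from each site's identity/compute APIs)."""
    site: str
    limit: int      # e.g. max vCPUs allotted to a tenant at this site
    in_use: int

def global_headroom(quotas):
    """Aggregate per-site usage into a global view so a multisite
    orchestrator can decide where a tenant can still deploy."""
    total_limit = sum(q.limit for q in quotas)
    total_used = sum(q.in_use for q in quotas)
    per_site = {q.site: q.limit - q.in_use for q in quotas}
    return {"limit": total_limit, "in_use": total_used,
            "free_by_site": per_site}

usage = global_headroom([
    SiteQuota("site-a", limit=100, in_use=80),
    SiteQuota("site-b", limit=50, in_use=10),
])
```

A hierarchical deployment would apply the same aggregation recursively, rolling regional views up into a global one.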
"As multi-cloud is one of the major technical trends for the next generation of cloud infrastructure, the Moon project has started working on a multi-POD security solution. The main idea is to install different modules on different PODs, which will collaborate together.
The deployment of such a solution includes:
However, connecting geographically distributed PODs currently seems very difficult, even impossible. We hope that OPNFV can start working on that, taking Moon as a scenario to set up such a multi-POD deployment."
Ruan He, Moon PTL
|Platform Operational Insights|
Across platform components, provide consistent approaches to deep observability into the platform (control plane, physical/virtual infra) and workload analytics. This includes fast site deployments: both the incorporation of new white-box NFVI (storage, servers, switches, networking) within an NFVI-PoP and the integration of new sites within a multisite deployment.
Manage complex validation challenges driven by frequent changes of a huge number of interdependent platform and application components.
Steve added fast deployment.
Herbert added "Manage complex validation challenges ..." (could also be a separate topic).
Evolve toward platform infra and workload support based upon a containerized, microservices-oriented model. This will be necessary to overcome the performance/cost limitations of VM-based infra and workloads, as well as to achieve truly agile lifecycle management.
Cloud-native architecture applies independently to VNFs and the OPNFV Reference Platform. A distinguishing feature of cloud-native applications is that they are designed for operation in a cloud infrastructure.
|NFV Reference Platform||Bryan|
|Cloud-Native Architecture / Application CI/CD|
Support migration to true DevOps-based management of applications, with automated testing and staging, in order to unleash the full speed of agile software delivery.
DevOps / Continuous Integration / Continuous Delivery processes will apply independently to the VNFs and the OPNFV Reference Platform. A distinguishing feature of DevOps/CI/CD is the faster and asynchronous cycling of versions, so that delivered functionality and performance improve in frequent small increments.
The OPNFV Reference Platform needs automated procedures for the onboarding of VNFs which are delivered on a CI/CD basis.
The deployed VNFs need to remain in normal operation through OPNFV Reference Platform CI/CD upgrades. [see also API backwards compatibility below]
|Automation?||Herbert||Could be separate topic or combined with "Cloud-Native Architecture"|
Build a catalog of production-quality Reference VNFs for common network functions, to be used in OPNFV testing and otherwise.
As an OPNFV Platform Provider, I need a set of Reference VNFs to validate that VNFs (including VNFCs) are correctly deployed by the OPNFV Platform, such that the deployed configuration complies with the VNFD information in the VNF Package (e.g., VNFC deployments respect affinity/anti-affinity constraints, VNFC deployments respect the portability constraints, internal connectivity between VNFC deployments is established). Separate Reference VNFs may be required for each deployment constraint in the VNF Package.
As an OPNFV Platform Provider or Network Service Operator, I need a set of Reference VNFs to benchmark performance of a specific OPNFV Platform deployment on an NFVI Node using a defined reference load, so that infrastructure performance benchmarks from different deployments are comparable. Separate Reference VNFs may be required for different representative workloads (e.g., compute intensive, memory intensive, networking traffic intensive, intensive VNFC instance deployment/deletion operations, etc.).
As an OPNFV Platform Provider or Network Service Operator, I need a set of Reference VNFs to benchmark performance of a specific OPNFV Platform deployment on an NFVI Node using a defined reference load, so that OPNFV Platform performance benchmarks from different deployments are comparable. Separate Reference VNFs may be required for different representative workloads (e.g., intensive VNFC instance deployment/deletion operations, intra-VNF networking changes, VNFC migration operations, etc.).
|Automation / Testing||Bryan|
Steve added epic format statements for the cases of Ingestion validation, infrastructure benchmarking, and VNF Management operations.
We will likely need a separate page for discussion around the scope of potential reference VNFs. I suspect we would need to include some of the following:
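The affinity/anti-affinity validation described in the first user story could be exercised with a minimal OpenStack Heat template along the lines of the sketch below, which a Reference VNF package might carry as part of its descriptor. The image and flavor names are placeholders; the validation step would then check that the two VNFC instances landed on different hypervisors.

```yaml
heat_template_version: 2016-04-08
description: >
  Sketch of an anti-affinity constraint a Reference VNF could carry,
  so that platform placement behaviour can be validated.
resources:
  vnfc_group:
    type: OS::Nova::ServerGroup
    properties:
      policies: [anti-affinity]   # VNFCs must land on different hosts
  vnfc_1:
    type: OS::Nova::Server
    properties:
      image: ref-vnfc-image       # placeholder image name
      flavor: m1.small            # placeholder flavor
      scheduler_hints: {group: {get_resource: vnfc_group}}
  vnfc_2:
    type: OS::Nova::Server
    properties:
      image: ref-vnfc-image
      flavor: m1.small
      scheduler_hints: {group: {get_resource: vnfc_group}}
```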
|API Backwards Compatibility|
All OPNFV Platform APIs must be versioned (micro-versioning is desirable). An API version must not be removed without a deprecation cycle (2 releases). New incompatible APIs may only be introduced with a new version. The old (deprecated) API must be supported for the duration of the deprecation cycle.
As a Network Service Provider, I need 2 release cycles to work with a VNF provider to ensure upgraded OPNFV Platform APIs are supported, so that a smooth migration of existing services can be planned.
|NFV Reference Platform||Steve|
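A minimal sketch of the micro-versioning idea, in the spirit of OpenStack-style API microversions: the client names the version it was written against, and the server either honours it or fails cleanly rather than silently changing behaviour. The version strings below are illustrative.

```python
def parse_version(v):
    """Parse a 'major.minor' microversion string into a comparable tuple."""
    major, minor = v.split(".")
    return int(major), int(minor)

def negotiate(client_requested, server_min, server_max):
    """Pick the microversion to use. If the client asks for something
    outside the server's supported window, the request fails cleanly,
    which is what makes deprecation cycles plannable."""
    req, lo, hi = map(parse_version,
                      (client_requested, server_min, server_max))
    if lo <= req <= hi:
        return client_requested
    raise ValueError(
        f"version {client_requested} not in supported range "
        f"{server_min}..{server_max}")
```

During a 2-release deprecation cycle, `server_min` stays put while new behaviour ships behind higher microversions, so older VNF integrations keep working unchanged.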
Real-time performance monitoring and fast fault detection
The amount of alarms and statistical data collected from the NFVI to detect failures will increase as the NFVI resources we must monitor (CPU, memory, disk, ... collected from the OS and hypervisor) grow.
However, such data collection becomes coarse-grained (e.g., 5-minute intervals) to prevent overload on the monitoring system. Eventually, this causes delays in detecting failures as well as in healing them.
Therefore, the NFV platform requires real-time monitoring and a fast fault-detection mechanism.
|NFV Reference Platform||Masanori||bryan: metrics collection for fault/performance purposes is enabled by VES/Barometer.|
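One hedged way to reconcile fine-grained detection with collector load is adaptive polling: keep the coarse interval for healthy resources and shorten it only where readings approach a warning level. The thresholds and intervals below are illustrative, not taken from any existing OPNFV tool.

```python
def next_interval(current_value, warn_threshold, base=300, fast=10):
    """Return the next collection interval in seconds: stay coarse
    (base, e.g. 5 minutes) while a metric looks healthy, and drop to
    a fine-grained interval (fast) once it approaches a warning
    level, so failures are detected quickly without polling every
    resource at high frequency all the time."""
    return fast if current_value >= warn_threshold else base

# e.g. CPU utilisation in percent, warning at 85%
slow = next_interval(40, 85)   # healthy: 5-minute polling
quick = next_interval(90, 85)  # anomalous: 10-second polling
```

Real deployments would add hysteresis so the interval does not flap around the threshold.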
|VM placement optimization|
Linked to multisite deployments and/or cloud-native applications.
There are already some solutions, but not a lot of clear guidelines/rules.
Placement within one VIM: some solutions exist to take into account affinity/anti-affinity rules and EPA requirements.
Multi-site deployment: complicates things quite a bit. Operators have to consider which sites the VNF/VMs need to be deployed in, which depends on (1) geography of data centers, (2) network topologies within and between data centers, (3) latency requirements from access networks and other resources, (4) capacities needed/allocated at each site, (5) vendor placement rules, if any, (6) redundancy requirements (e.g., N:1), and (7) recovery procedures in case of failure.
Instead of a laborious human planning/design task, this needs to be automated as much as possible. One possibility is to create operator deployment "policy" or rules to be input to the orchestration system, as these considerations stem from the operator deployment environment rather than from the VNF/service itself. But open to any solutions.
|NFV Reference platform||Morgan, Serge||bryan: metrics collection for realtime inventory/placement purposes is enabled by VES/Barometer. Projects such as Valet (proposed by AT&T for OpenStack) are specifically addressing advanced workload placement based upon realtime inventory and SLAs.|
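To make the "policy as orchestrator input" idea concrete, the sketch below filters candidate sites against a toy operator policy covering two of the considerations listed above (capacity and latency). The site records and field names are illustrative; a real policy would also encode topology, vendor rules, and redundancy.

```python
def feasible_sites(sites, demand):
    """Filter candidate sites against a simple operator placement
    policy: enough free capacity and low enough latency. A real
    orchestrator would rank the survivors (e.g. by cost or
    redundancy) rather than just listing them."""
    return [
        s["name"] for s in sites
        if s["free_vcpus"] >= demand["vcpus"]
        and s["latency_ms"] <= demand["max_latency_ms"]
    ]

# Illustrative inventory: one edge site, one core data center.
sites = [
    {"name": "edge-1", "free_vcpus": 16, "latency_ms": 5},
    {"name": "core-1", "free_vcpus": 512, "latency_ms": 40},
]
picked = feasible_sites(sites, {"vcpus": 8, "max_latency_ms": 10})
```

Keeping the policy as data (the `demand` dict) rather than code is what lets operators adjust deployment rules without touching the orchestrator itself.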
|Metrics for billing/charging||Are the current metrics in OpenStack and/or the controllers enough to build the charging chain for the VNF (bandwidth used by a VM/Tenant/Stack)?||NFV Reference platform||Morgan||bryan: metrics collection is enabled by VES/Barometer|
|Energy||Tooling is missing to get a clear view of power consumption. Power consumption is a driver of NFV/virtualization, yet today there is no native feedback on such consumption, and no connection with VM placement.||NFV Reference platform||Morgan|
As far as I know there is a project in ODL, and there used to be a project in OpenStack some time ago.
bryan: should be enabled by VES/Barometer
Tooling is also missing for reliable and fast feedback in case of trouble.
|NFV reference platform||Morgan|
There are projects in OPNFV (Pinpoint, Bottlenecks) dealing with such issues.
bryan: enabled by VES/Barometer, at least for data collection.
|Interoperability||interoperability assessment with orchestration platforms (linked to VNF onboarding)||NFV reference platform||Diego||bryan: should be covered by VNF Onboarding above.|
|Security aspects||especially those related to roots of trust and legal requirements for critical infrastructures||NFV reference platform||Diego||bryan: should be covered by Security and Policy above.|
|Documentation / technical architecture||Due to the extreme modularity of the systems, defining a clear list of the flows needed for filtering between all components would be helpful.||NFV reference platform||Morgan||In OPNFV, Pharos defines some rules, but due to the diversity of installers/scenarios, it should also be up to the scenario owner to provide a flow matrix associated with the integration scenario|
Linked to troubleshooting/SLA: today there are lots of modules/components and logs are everywhere... you do not really care when everything is fine.
But when there is trouble, it is sometimes a nightmare to find the right log file in the right place. Aggregating the logs, with some format convergence, would save time.
|NFV reference platform||Morgan|
The task does not seem easy, as any Telco Cloud solution based on open-source components is the result of an integration where logs are managed differently. It could, however, be a good opportunity for a project dealing with integration, such as OPNFV.
bryan: Syslog archiving (as one key log example) is in scope for VES.
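The "format convergence" point can be sketched as a small normalizer that maps heterogeneous log lines onto one common record. The two patterns below (an OpenStack-service-style line and a syslog-style line) are illustrative; a real aggregator would carry one pattern per integrated component.

```python
import re

# Two illustrative upstream formats; real deployments have many more.
PATTERNS = [
    # OpenStack-service style: "2017-06-01 12:00:00.123 4242 ERROR nova.compute msg"
    re.compile(r"(?P<ts>\S+ \S+) \d+ (?P<level>\w+) (?P<src>\S+) (?P<msg>.*)"),
    # syslog style: "Jun  1 12:00:00 host daemon: msg"
    re.compile(r"(?P<ts>\w+ +\d+ \S+) \S+ (?P<src>[\w.-]+): (?P<msg>.*)"),
]

def normalize(line):
    """Map a raw log line onto one common record (timestamp, level,
    source, message) so operators can search a single aggregated
    stream instead of hunting through per-component files."""
    for pat in PATTERNS:
        m = pat.match(line)
        if m:
            record = m.groupdict()
            record.setdefault("level", "INFO")  # syslog lines carry no level
            return record
    # Unparseable lines are kept, not dropped, for troubleshooting.
    return {"ts": None, "level": "UNKNOWN", "src": None, "msg": line}

rec = normalize("2017-06-01 12:00:00.123 4242 ERROR nova.compute boom")
```

The records could then feed any aggregation backend; the value is in agreeing on the common shape, not in the parser itself.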
|Backup & Restore||Ability to restore customer virtual machines if the stack is deleted and recreated (not sure if this is Red Hat specific).||NFV reference platform||Morgan||There are several technical options but no native/integrated/reference one|
|Real HA||Despite HA modes for single components, practice has shown that manual operations may be needed to restore the platform. Better end-to-end HA is needed, and if not all of the subsystems are HA, it shall be clear what is HA and what is not.||NFV reference platform||Morgan||In OPNFV there is an HA project addressing these issues. It has already created test cases (executed in Yardstick). However, recent discussions also showed that "HA" was probably misleading in the scenario naming, as most of the controllers are not HA today|
|Service Function Plane||Ability to compose network services with flexible combinations of VNF- and PNF-embodied service functions participating in dynamic service function chains that are underlay-network agnostic (with service functions being unaware of network topology), with service function data plane interfaces being standardized and independent of the underlay network, enabling service functions to be placed anywhere within an administrative domain on any platform-supported underlay network.||NFV reference platform||Meador||bryan: likely aligned with cloud-native goals of abstraction from the infra environment. 1st-gen VNF solutions are inherently complex regarding compatibility, due to their own size and complexity/dependencies.|
|Certification Program(s)||Some VIM suppliers (OpenStack distros) have initiated programs to "certify" VNFs from different suppliers as successfully onboarded on their VIM. Such certification programs should be led by OPNFV, so as to have them structured and verified by a vendor-agnostic entity. We might expand the scope to other interfaces (VNFM/VIM, VNFM/VNF-O, etc.).||NFV reference platform||Samer||bryan: OPNFV should promote its role in VNF portability, but diversity in all aspects (even certification) is a market enabler, as long as it does not exacerbate fragmentation and become a market inhibitor.|
Versioning + Skip release(s) when upgrading a (live) system
Linked to CI/CD and API backwards compatibility: upgrading from version A to version B must not require manual operations, especially for critical components. Moreover, it shall be possible to skip intermediate versions and upgrade only to stable versions, not necessarily from version N to N+1.
Upgrading the OpenStack used in a commercial deployment requires a huge amount of integration effort and testing, despite the progress in automation of deployment and testing. Given this, and the fact that OpenStack releases on a 6-month cadence, I, as an end user, may not want to update my commercial deployment at the same cadence, but may want to choose a yearly or even bi-yearly upgrade of my systems. For example, I want to be able to directly "jump" from the D-release to the G-release, skipping the E- and F-releases. Ideally, such system upgrades shall be done "live", i.e. without the need to shut down everything running on the older release.
Note: we may also want to consider the option of live/rolling upgrades for OPNFV.
NFV Reference platform
Ashiq, Gerald, Morgan
bryan: suggest feedback from Escalator project.
Boston Forum session: https://etherpad.openstack.org/p/BOS-forum-skip-level-upgrading
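The skip-level policy above can be sketched as a path planner: given how many releases a direct jump is certified across, collapse intermediate steps wherever possible. The release names and the "3 releases" skip window are illustrative assumptions, not an existing tool's behaviour.

```python
SUPPORTED_SKIP = 3  # assumption: direct upgrades certified across
                    # at most 3 releases (e.g. D -> G in one jump)

RELEASES = ["D", "E", "F", "G", "H"]  # illustrative release train

def upgrade_path(current, target):
    """Return the list of upgrade steps from current to target,
    collapsing intermediate releases whenever a direct skip-level
    jump is certified. A sketch of the policy discussed above."""
    i, j = RELEASES.index(current), RELEASES.index(target)
    if j <= i:
        raise ValueError("target must be a newer release than current")
    steps = []
    while i < j:
        nxt = min(i + SUPPORTED_SKIP, j)  # jump as far as certified
        steps.append(RELEASES[nxt])
        i = nxt
    return steps
```

With a skip window of 3, a D-to-G upgrade is a single step, while D-to-H needs one intermediate stop; widening the certified window shortens every path.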
|Portability||The platform and VNFs are reported as having difficulties in deployment due to minor changes in NFVI versions, e.g. BIOS upgrades, changes in the brand of device deployed, etc. We need processes to support in-service deployments across new infrastructure instances as deployments scale. This exacerbates the 1000-node scaling issue above, as the deployments at 1000 NFVI-PoPs are unlikely to be at the same hardware version. We need to support a hardware lifecycle for refresh of new compute, storage, and networking whiteboxes.||NFVI Platform + automation||Steve||bryan: reported issues should be raised specifically so that we can address them specifically. An enabler (part) of the solution is likely a deep HW environment inspection/inventory capability and compatibility verification pre-install.|
|‘Data Plane’ VNF Decoupling||Decouple ‘data plane’ VNFs from physical high-throughput NICs and use virtual NICs with the ‘same’ high speed/throughput.||VNF requirements||Andrea & Cecelia||(12/7 email to list)|