Being an early adopter of high Ethernet rates (e.g., 800G) and next-generation coherent optic technologies (800ZR/ZR+/LR) comes with unique challenges, and it takes innovative tools and techniques to implement these new specifications effectively. In this blog post, we explore insights shared in the September 2024 webinar moderated by David Rodgers, Business Development Manager at EXFO, featuring experts from EXFO and Lumentum, who provide an industry overview of these emerging technologies and of strategies for overcoming the associated challenges.

Drawing on first-hand experience with early implementations of 800G interfaces, the webinar highlights signal breakout configurations such as 2x400GE, 8x100GE, and 4x200GE, as well as the latest 800G Ethernet work from the IEEE. The evolution toward 1.6T, the new 800ZR specification, and its compatibility with various form factors are also key points.
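
For readers who prefer to see the breakout options as data, here is a minimal Python sketch of how an eight-lane, 100G-per-lane 800G port divides into the configurations mentioned above. The mode names and lane groupings are illustrative assumptions, not output from any vendor's configuration tool.

```python
# Illustrative sketch: how an 8-lane, 100G-per-lane 800G port can be broken out.
# Mode names and lane groupings are assumptions based on common module layouts,
# not an official configuration API.

BREAKOUT_MODES = {
    "1x800GE": (1, 800),   # single 800G Ethernet channel
    "2x400GE": (2, 400),   # two 400G Ethernet channels
    "4x200GE": (4, 200),   # four 200G Ethernet channels
    "8x100GE": (8, 100),   # eight 100G Ethernet channels
}

TOTAL_LANES = 8            # electrical lanes on the host interface
LANE_RATE_G = 100          # 100G per electrical lane for this generation

for mode, (channels, rate) in BREAKOUT_MODES.items():
    lanes_per_channel = TOTAL_LANES // channels
    # Every mode fills the same 800G of aggregate capacity.
    assert channels * rate == TOTAL_LANES * LANE_RATE_G
    print(f"{mode}: {channels} channel(s) x {rate}G, {lanes_per_channel} lane(s) per channel")
```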

The 2023 TeleGeography network report showed a notable increase in global internet capacity: bandwidth grew by 28% in 2022 and reached 997 Tbps in 2023, reflecting a 29% compound annual growth rate (CAGR) over four years. The growth is driven by rising demand for bandwidth from content providers, users, and telecom companies.
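
To see what that growth rate implies, a quick back-of-the-envelope projection using only the figures quoted above (997 Tbps and a 29% CAGR) looks like this:

```python
# Back-of-the-envelope projection using the figures quoted above:
# ~997 Tbps of global internet capacity and a ~29% compound annual growth rate.

capacity_tbps = 997      # reported capacity (Tbps)
cagr = 0.29              # compound annual growth rate

for year in range(1, 6):
    capacity_tbps *= 1 + cagr
    print(f"Year +{year}: ~{capacity_tbps:,.0f} Tbps")

# At 29% CAGR, capacity roughly doubles in under three years (1.29**3 ≈ 2.15),
# which is one reason interface speed cycles keep compressing.
```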

Despite short-term uncertainties, long-term trends suggest that the future of network infrastructure will see an increasing integration of 800G units and coherent pluggable optics. Network design and upgrades will prioritize performance, cost-effectiveness, and scalability.

Implementing coherent pluggable optics requires effort. CMIS has established a clear interface for system software to manage modules, and ongoing efforts aim to eliminate the need for dedicated transponders by creating a direct connection between system software and transceivers. While discussions and research have advanced, widespread implementation and standardization have not yet arrived, although there have been notable developments in the OpenXR forum. Multiple groups, including the OIF (which maintains CMIS), the Ethernet Alliance, and the IEEE, are collaborating to tackle these challenges. Each group defines a solution from its own perspective, aiming to converge on a mutually agreeable approach, which is something worth keeping an eye on.
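
To make the idea of a direct connection between system software and transceivers more concrete, here is a minimal host-side sketch in Python. It is illustrative only: the read_byte helper is hypothetical, and while the offsets follow the published CMIS lower-memory layout (identifier at byte 0, module temperature at bytes 14-15), you should verify them against the CMIS revision your modules actually report.

```python
# Illustrative sketch of host-side CMIS module monitoring.
# `read_byte(addr)` is a hypothetical helper standing in for whatever I2C/MDIO
# access your platform exposes.

def read_byte(addr: int) -> int:
    """Placeholder for a platform-specific register read."""
    raise NotImplementedError("wire this up to your module access path")

IDENTIFIERS = {0x18: "QSFP-DD", 0x19: "OSFP"}  # SFF-8024 identifier codes

def module_summary() -> dict:
    ident = read_byte(0)                        # byte 0: module identifier
    cmis_rev = read_byte(1)                     # byte 1: CMIS revision (major/minor nibbles)
    raw_temp = (read_byte(14) << 8) | read_byte(15)
    if raw_temp >= 0x8000:                      # temperature is a signed 16-bit value
        raw_temp -= 0x10000
    return {
        "form_factor": IDENTIFIERS.get(ident, f"unknown (0x{ident:02x})"),
        "cmis_revision": f"{cmis_rev >> 4}.{cmis_rev & 0x0F}",
        "temperature_c": raw_temp / 256.0,      # 1/256 °C per LSB
    }
```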

Accelerating technology cycles

Speed cycles are accelerating, with upgrades occurring every two to three years instead of every four to five years. The question remains whether this trend will persist or if technological complexities will necessitate extending cycles back to three to five years or longer, depending on the specific use case.

Anticipated acceleration in AI and ML

With the focus on AI, LLMs, and ML, even faster acceleration is expected. These cycles have shortened over the past seven or eight years, so shorter application cycles and timeframes are anticipated. Machine learning and AI workloads mostly run in data centres, and their impact on long-haul traffic will accelerate this cycle further.

Challenges beyond data centres

One of the emerging challenges outside of data centres is the availability of fibre.

Balancing speed and practicality

Faster speeds and efficient network infrastructures are driven by AI, ML, and other data-intensive applications. However, this acceleration poses challenges such as standardisation issues and fibre availability. The industry must balance the pursuit of speed with practical implementation and infrastructure considerations.

Coherent pluggable optics play a key role, offering cost-effectiveness and network simplification. The shift towards integrated solutions like coherent pluggable optics will continue, but timelines and implementations vary by use case and organisation.

Expanding capacity and exploring new avenues

The industry is discussing how to expand long-haul and metro capacity. Technologies like 800G or 1.6T don't significantly enhance fibre capacity on their own; fibre capacity itself is the limiting factor. New avenues like L-band expansion and leveraging the extended C and L bands are being explored to augment capacity, and these solutions are being implemented quickly.
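
A rough worked example helps show why spectrum, rather than per-channel rate, is the lever for fibre capacity. All figures below are illustrative assumptions (approximate band widths and channel spacings), not numbers from the webinar:

```python
# Rough illustration: per-channel rate alone does not grow fibre capacity,
# but adding spectrum (extended C, L-band) does. Figures are assumptions.

C_BAND_GHZ = 4800          # conventional C-band, ~4.8 THz of usable spectrum
L_BAND_GHZ = 4800          # adding the L-band roughly doubles usable spectrum

def fibre_capacity_tbps(spectrum_ghz, channel_ghz, rate_gbps):
    channels = spectrum_ghz // channel_ghz
    return channels * rate_gbps / 1000

# Same spectral efficiency, different channel plans:
print(fibre_capacity_tbps(C_BAND_GHZ, 75, 400))    # 400G in 75 GHz  -> ~25.6 Tbps
print(fibre_capacity_tbps(C_BAND_GHZ, 150, 800))   # 800G in 150 GHz -> ~25.6 Tbps

# Capacity only moves when the spectrum does:
print(fibre_capacity_tbps(C_BAND_GHZ + L_BAND_GHZ, 150, 800))  # C+L -> ~51 Tbps
```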

800G Adoption Plans

When people across the industry are asked about deploying 800G units in the next year, most are unsure about their plans, which shows how complex network enhancement and project requirements have become. Even test and measurement experts face challenges despite their experience.

The role of pluggable coherent optics in 800G networks

800G coherent solutions aren't suitable for all interconnect scenarios, particularly where scalability is a concern. What is the exact role of coherent technology in these networks? Are 800G solutions replacing transponders, or are they building new networks? The answer isn't straightforward.

Transceivers and transponders will coexist. Both 800G transceivers and 1.6T transponders are available today, and they offer different levels of performance. If you need a 1.6T solution, there are recently announced transponder options. However, 800G pluggable transceivers reduce the number of network equipment layers.

Direct router interfaces and cost efficiency are important

Pluggable optics that connect directly to router interfaces eliminate an entire layer of equipment in many cases. While that equipment remains crucial in certain applications, pluggable solutions offer significant cost savings. Pluggable coherent optics provide substantial reductions in deployment costs, considering both capital and operational expenditures. Generally, where end-users can integrate them, pluggables are the preferred solution for coherent applications.

The landscape is becoming increasingly complex. Interconnect and interoperability—how products from different companies work together—are critical challenges. We actively participate in global interoperability events to address these issues.

The application space for 800G is expanding, particularly in connecting compute racks for artificial intelligence (AI) to ensure consistent high-speed channels between CPU cores. However, interoperability raises questions about lower speed interconnects.

Current lower speed interconnects, such as SAS 3 at 12 Gbit/s for rotating disks and PCIe for solid-state drives (SSDs), have not yet caught up to 800G. However, we anticipate that they will with M.7 and other advancements.
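
To put those interconnects side by side, here is a small comparison using approximate, generation-dependent figures. The storage-side numbers are rough assumptions; the point is the order-of-magnitude gap to an 800G port:

```python
# Approximate per-link rates for the interconnects mentioned above. The
# storage-side figures are rough, generation-dependent assumptions.

links_gbps = {
    "SAS-3 lane (rotating disk)": 12,
    "PCIe Gen4 x4 NVMe SSD (raw)": 16 * 4,   # 16 GT/s per lane, 4 lanes
    "PCIe Gen5 x4 NVMe SSD (raw)": 32 * 4,   # 32 GT/s per lane, 4 lanes
    "800G Ethernet port": 800,
}

for name, rate in links_gbps.items():
    note = "" if rate >= 800 else f"  (~{800 / rate:.0f}x below an 800G port)"
    print(f"{name:30s} ~{rate:4d} Gbit/s{note}")
```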

With tiered deployments—800G for long-haul and rack connections—what other applications do you foresee for 800G in the near future? Where will it be in three to five years? Will it be communicating with storage arrays at line rate? These are questions worth considering.

Moving off the board into the rack is crucial

Let’s start with the basics. The connections between the GPU and SSD, and between the CPU and SSD, are typically wide but relatively slow, and they are usually electrical. Coherent pluggable optics have a big impact on network infrastructure: they’re changing how we design and implement networks, and they offer new ways to make networks more efficient and faster.

Coherent pluggable optics are evolving quickly and have many applications in different network scenarios. They present both challenges and opportunities. As we move forward, these technologies will be crucial for shaping the next generation of networks. Coherent pluggable optics are also interacting with emerging technologies like artificial intelligence and machine learning. As these technologies become more demanding, the need for faster and more efficient network connections becomes even more important. This relationship will lead to further advancements in optical networking.

Density and power consumption are also important. Optical connections haven’t reached the density of electrical connectors, but active optical cables are getting closer. The main challenge, however, is power consumption. Combining many slower lanes into a single high-speed lane uses a lot of power, but it solves a different problem.

Consistency in data rates is also important

As we move from front-end to ML/AI deployments and finally to the back-end, we see a consistent data rate pattern. The goal is to maximise capacity within a single connection. Because power constraints limit ML/AI cluster sizes, training workloads are being moved to campus environments and possibly even to data centre interconnection (DCI) environments, and this shift will drive data rate consistency across these regions. We expect a faster transition from 400G to 800G, followed by 1.6T and 3.2T. However, the progression slows beyond DCI or regional networks, because increasing data rates in a DWDM environment doesn’t improve efficiency per fibre. This matters because DWDM applications are usually not the norm in data centres and campuses; outside these environments, DWDM is necessary because of fibre constraints. The farther the distance, the slower the transition of data rates becomes.

Coherent pluggable optics are increasingly important in network transitions, offering new ways to improve efficiency and speed. Once a technology gains traction, it becomes easier to progress in these directions.

800ZR is an example. Its rapid implementation was partly due to our experience with 400G technology. Network engineers need to understand these trends. The evolving landscape of optical networking, especially with the advent of coherent pluggable optics, presents both challenges and opportunities. As computational demands increase, driven by technologies like AI and ML, the need for faster and more efficient network connections becomes more pressing.

In conclusion, while the path ahead is complex, the potential benefits of these advancements are undeniable. Network engineers must adapt and continuously learn to fully utilise emerging optical networking technologies.

Transitioning from 400G to 800G

Initially, 400ZR generated significant industry interest. Engineers faced challenges in testing and implementing it. Collaborative efforts, including attending events, provided a comprehensive understanding of measurements and procedures, facilitating a smoother transition from 400G to 800G. Knowledge gained from 400G simplifies the implementation and deployment of 800ZR. Now that 800ZR is commercially available, adapting it for other protocols is easier.

Current Temperatures and Power Consumption

Current 800G transceivers consume about 32 watts and operate between 69 and 72 degrees Celsius, with some exceeding 72 degrees Celsius. This is the power consumption of just one transceiver. Imagine a switch with 32 of these transceivers or an NVIDIA accelerated computing system. Apply an engineering mindset and calculate the implications. Cooling solutions are crucial.
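
Taking that suggestion literally, the arithmetic for a fully populated switch looks like this. The 32-port faceplate is an illustrative density, not a specific product:

```python
# The "engineering mindset" arithmetic suggested above. The 32-port switch is
# an illustrative faceplate density, not a specific product.

watts_per_module = 32      # per 800G coherent transceiver, as quoted above
ports_per_switch = 32

optics_power_w = watts_per_module * ports_per_switch
print(f"Optics alone: {optics_power_w} W per switch (~{optics_power_w / 1000:.1f} kW)")
# ~1 kW before the switch ASIC, fans, and host electronics are counted, and
# every watt drawn is a watt of heat the cooling system has to remove.
```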

Future Data Rates and Heat Generation

We are discussing 800G, offering 100G per lane. Future iterations may offer 200G and 400G per lane, generating substantially more heat.
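
In concrete terms, assuming the usual eight-lane electrical interface on the module:

```python
# Lane arithmetic for current and likely future module generations, assuming
# the usual eight-lane electrical interface.

lanes = 8
for lane_rate_g in (100, 200, 400):
    aggregate_t = lanes * lane_rate_g / 1000
    print(f"{lanes} x {lane_rate_g}G lanes -> {aggregate_t:.1f}T aggregate")
# 8 x 100G -> 0.8T (today's 800G), 8 x 200G -> 1.6T, 8 x 400G -> 3.2T
```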

Cooling Challenges in Coherent Pluggable Optics

As we progress towards coherent pluggable optics, cooling challenges become more important. Network engineers must address heat dissipation issues with higher data rates and more powerful components. Liquid cooling in data centres could revolutionise network infrastructure design and maintenance.

Modern, efficient liquid cooling solutions address these challenges

As server capacities expand, consuming more power and generating more heat, data centre professionals and HPC thermal engineers face pressure to enhance efficiency and reduce costs. Liquid-cooled server racks with quick disconnect fittings (QDs) are an effective solution. QDs facilitate efficient liquid cooling, addressing cooling technology gaps for high-power density applications. While liquid helium immersion cooling, like Cray’s technology, is unlikely to be revisited, dielectric fluid cooling, which has been around for at least 25 years, has gained prominence recently. You may have seen transparent, chest freezer-like containers filled with equipment, prompting curiosity about their functionality and deployment. Understanding the temperature and power consumption involved is crucial.

Heat management in data transmission and computational power

As data transmission speeds and computational power increase, heat management becomes crucial. The optical networking landscape is evolving rapidly, so staying informed is vital for managing future networks. Are our switches at risk of overheating?

Enhanced Thermal Cooling Capabilities

Testing shows that OSFP has enhanced thermal cooling capabilities compared to QSFP-DD.

QSFP-DD Preference and Power Consumption Considerations

QSFP-DD is widely preferred, but power consumption is critical in regional and long-haul scenarios. OSFP is a viable solution for power management. Router manufacturers support QSFP-DD at 400G, but the market is fragmented.

RHS Modules and Potential Form Factor Changes

RHS (riding heat sink) modules are emerging, but there’s a concern about yet another form factor change. Alternatively, will OSFP suffice for 1.6T and 3.2T applications?

Power Issues to Consider

Power consumption involves local heat dissipation and ZR power consumption. Heat sinks are used to address local heat dissipation, even in OSFP. As technology advances, ZR power consumption is expected to decrease to approximately 26-27 watts. Both form factors remain viable at 800G.

Impact of 800G and 1.6T

What are the technological implications of scaling up to 800G and 1.6T? We need to consider FEC, cooling requirements, and new interoperability standards. How can we ensure a successful implementation of 800G, and what will its impact on network scalability be?

800G: The Beginning of the AI Era

When considering the impact of 800G on today’s networks, we realised it will mark the beginning of the AI era. 800G will be synonymous with AI and its evolution. Unlike previous data rates, 800G will be remembered as the rate at which AI took off and the one that enabled its realisation. 800G will play a pivotal role in the industry. The inflection point for AI lies in the interconnectivity enabled by 800G.

Coherent pluggable optics are a pivotal milestone for 800G, paving the way for the next generation of networking technologies. As work on coherent pluggable optics advances, these developments become increasingly pertinent. The potential shift towards widespread adoption of novel cooling technologies within data centres holds the promise of revolutionising network infrastructure design and maintenance. As data transmission speeds and computational power continue to climb, managing heat becomes just as important as managing data flow.