Tag Archives: FCoE

Comparing different technologies

OK, let me make one thing clear, and most of you know this already: I'm not a fan of FCoE and I have written numerous articles around this topic. I work for a vendor in an engineering support role, focusing especially on storage connectivity. Basically it means I need to know this stuff. When I see people struggling with storage from a connectivity perspective, and it comes to the point where I either meet these people in person or have them on the phone, I'm pretty often absolutely flabbergasted at the lack of fairly basic knowledge of Fibre-Channel. The basics of checking port and link errors, the generic concept of flow control and operational management are very often at a pretty low level. Don't get me wrong, I'm not accusing anyone of being stupid, but it turns out that education and self-study w.r.t. storage is not seen as a critical item on the day-to-day IT department shopping list, and the focus leans more towards networking, operating systems and applications.

If, however, your sole job is storage and you are tasked to provide a study comparing two different sorts of technology that both touch the same subject, and you manage to screw it up the way the Evaluator Group did last week, I seriously doubt the credibility of such an organization.


ipSpace.net: FCoE between data centers? Forget it!

Cisco guru and long-time networking expert Ivan Pepelnjak hits the nail on the head with the post below. It is one more addition to my series on why you should stay away from FCoE. FCoE is only good for ONE thing: to stay confined in the space of the Cisco UCS platform, where Cisco seems to obtain some benefits.

Today I got a question about the options for long-distance replication between enterprise arrays over an FCoE link. If there is ONE thing I would credit the HDS arrays with, it is that they are virtually bulletproof with regard to data replication, and 6x6x6 replication scenarios across 3 data centres in cascaded and multi-target topologies are not uncommon. (Yes, read that again and absorb the massive scalability of such environments.)

If, however, you then start to cripple such environments with a Greek trolley from 1000 BC (i.e. FCoE) for getting traffic back and forth, you're very much out of luck.

Read the below from Ivan and you’ll see why.

ipSpace.net: FCoE between data centers? Forget it!: Was anyone trying to sell you the “wonderful” idea of running FCoE between Data Centers instead of FC-over-DWDM or FCIP? Sounds great … un…

He also refers to a Cisco whitepaper (a must-read if you REALLY need to deploy FCoE in your company) which outlines the technical restrictions from a protocol architecture point of view.

The most important parts are that the limitation is there and Cisco has no plans to solve it. Remember, though I referred to Cisco in this article, all the other vendors like Brocade and Juniper have the same problem. It's an Ethernet PFC restriction inherent to the PAUSE methodology.
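To get a feel for why the PFC/PAUSE mechanism breaks down over distance, here is a rough back-of-the-envelope sketch of my own (not taken from Ivan's post or the Cisco paper): a PAUSE frame has to travel all the way back to the sender before the sender stops transmitting, so the receiver needs enough buffer to absorb everything that is already in flight on the wire.

```python
# Rough estimate of the buffering a lossless (PFC/PAUSE) 10GE link needs
# when stretched over distance. Assumptions: light travels through fibre
# at roughly 5 microseconds per kilometre, and the receiver must absorb
# at least a full round trip worth of data after it sends the PAUSE frame.

LINE_RATE_BPS = 10e9          # 10 Gbit/s Ethernet
FIBRE_DELAY_US_PER_KM = 5.0   # ~5 us/km one way

def in_flight_bytes(distance_km: float) -> float:
    """Bytes still arriving after a PAUSE is sent (one round trip)."""
    round_trip_s = 2 * distance_km * FIBRE_DELAY_US_PER_KM * 1e-6
    return LINE_RATE_BPS / 8 * round_trip_s

for km in (0.1, 1, 10, 50, 100):
    kib = in_flight_bytes(km) / 1024
    print(f"{km:>6} km -> needs at least {kib:9.1f} KiB of headroom per priority")
```

At 100 metres that is about 1.2 KiB, at 100 km it is well over a megabyte per lossless priority, which is far beyond what typical FCoE switch port buffers are dimensioned for, whereas native FC over DWDM simply scales the number of buffer-to-buffer credits for the distance.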

So when taking all this into consideration you have a couple of options.

  • Accept that business continuity will be next to nothing unless the hurricane strikes with a 50-metre diameter. (Small chance. :-))
  • Use Fibre-Channel with a dark-fibre or DWDM/CWDM infrastructure
  • Use FCIP to overcome distances over 100 km

So far for another rant in the FCoE saga, which is stacking up one argument after another for why NOT to adopt it, and which is getting more and more support across the industry from very respected and experienced networking people.
Cheers,
Erwin

Surprise: Cisco is back on the FC market

The Cisco MDS 9506, 9509 and 9513 series director-class FC switches have been around for the long haul. They were brought to market back in the early 2000s (2003 to be exact) and have run a separate code base from the rest of the Cisco switching family in the form of the Catalyst Ethernet switches. The storage and Ethernet sides have always been separate streams, and for good reason. When Cisco opened up the can of worms with FCoE a few years back, they needed a platform which could serve both the Ethernet and fibre-channel side.

Thus the Nexus was born, and a new generation of switches that could do it all. All???? …. no, not really. For storage support you still needed native FC connectivity hooked up to the Nexus, which could then switch it via FCoE to either CNAs or other, remote, FC switches and targets.

It seems the bet on FCoE didn't go that well, as the uptake of the entire FCoE protocol and convergence bandwagon has not taken off. (I wonder who predicted that :-)) This left Cisco in a very nasty situation in the storage market, since the MDS 9500 platforms were a bit at their end from an architectural standpoint. The supervisors, even updated to revision 2a, plus the cross-bars were only able to handle up to 8G FC speeds, and the huge limitation of massive over-subscription on the high port-count modules (marketing called them “host-optimised ports”) did not stack up against the requirements that came into play when the huge server virtualisation train took off with lightning speed. Their biggest competitor Brocade was already shipping 16G switches and directors like hot cakes, and Cisco did not really have an answer. The NX5000 was limited regarding port count and scalability, whilst the NX7000 cost an arm and a leg, so only for companies with a fair chunk of money, who needed to refresh both their storage switching and their network switching gear at the same time and had enough confidence in the converged FCoE stack, was this platform a viable choice. Obviously I don't have the exact number of NX7Ks sold, but my take is that the ones that are sold are primarily used in high-density server farms like the UCS (great platform), with separate storage networks hanging off of them via the MDS storage switches, and maybe even those are connected directly to the server farms.

So it seems Cisco sat a bit in limbo in the datacentre storage space. It's a fairly profitable market, so throwing in the towel would have been a bitter pill to swallow. Cisco, being the company with one of the biggest treasure chests in Silicon Valley, therefore rolled up their sleeves and cranked out the new box announced a few weeks ago: the all-new MDS 9710, based on new ASIC technology which now also delivers 16G FC speeds.

I must say that after going through the data sheets I'm not overly impressed. Yes, it's a great box with all the features you need (with a couple of exceptions, see below), but it seems it was rushed to market from an engineering perspective to catch up with what was already available from Brocade. It looks like one or more execs switched into sub-state panic mode and ordered the immediate development of the MDS95xx successor.

Let's have a short look.
I'll compare the MDS 9710 and the Brocade DCX 8510-8, since these are the most aligned from a business, market and technical perspective.

Both support all the usual FC protocol specs, so no differentiation there. This includes FC-SP for security and FC-SB for Ficon (support for Ficon was not yet available at launch, but I think that's just due to qualification; it should be there somewhat later in the year). What does strike me is that no FCoE support is available at launch of the product either. Then again, they could save that money and spend it more wisely.

There is not that much difference from a performance perspective. According to the spec sheets the MDS provides around 8.2 Tbps of front-end FC bandwidth, as does the DCX. Given the architectural difference between the MDS and the DCX you cannot really compare the overall switching capabilities, since the MDS switches frames on the cross-bar (or fabric card, as they seem to call it now) and the DCX at the ASIC level. From a performance standpoint you might gain a benefit by doing it the Brocade way if you can keep all traffic local to the ASIC, however this also imposes limitations w.r.t. functionality. An example is link aggregation, i.e. port-channel according to Cisco and trunking according to Brocade. (Do not confuse the Cisco “trunking” with the Brocade one.) Brocade requires all members of a trunk to sit in the same port-group on a single ASIC, whereas with Cisco's method you don't really care: the members can be all over the chassis, so in case an entire slot fails the port-channel itself still keeps going. It seems Cisco learned its lesson from the over-subscription ratios that hamper the high port-count modules on the 95xx series. The overall back-end switching capability of the 9710 seems more than sufficient to cater for a triple-jump generation of FC, and it seems likely the MDS could cater for 40G Ethernet in the not-so-distant future without blinking an eye. The 100G Ethernet implementation will take a while, so I think this 97xx generation will see out the current and next generation of 32G FC and 40G Ethernet. Since I don't have insight into roadmaps I'll direct you to the representatives of Cisco.
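To illustrate the difference in failure behaviour, here is a simplified model of my own (not vendor code, and the port layout is invented): if all members of a logical link must share one ASIC or port-group, losing that line card takes the whole logical link down, whereas members spread across slots survive a slot failure with reduced bandwidth.

```python
# Simplified model of a logical ISL built from member ports, comparing a
# locality-bound trunk (all members on one slot/ASIC) with a chassis-wide
# port-channel (members spread across slots). Purely illustrative.

def surviving_bandwidth(members, failed_slot, gbps_per_member=16):
    """Bandwidth left on the logical link after an entire slot fails."""
    alive = [m for m in members if m[0] != failed_slot]
    return len(alive) * gbps_per_member

# (slot, port) tuples for an 8-member logical link
locality_bound_trunk = [(1, p) for p in range(8)]       # all on slot 1
spread_port_channel  = [(s, 0) for s in range(1, 9)]    # one member per slot

print("trunk after slot 1 failure  :",
      surviving_bandwidth(locality_bound_trunk, failed_slot=1), "Gbit/s")
print("channel after slot 1 failure:",
      surviving_bandwidth(spread_port_channel, failed_slot=1), "Gbit/s")
```

The trunk drops to 0 Gbit/s while the spread port-channel keeps 112 of its 128 Gbit/s, which is exactly the resiliency argument in favour of the chassis-wide approach.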

There is another thing that is bothering me and that is power consumption. The MDS (fully populated) draws a whopping 4615 watts of juice. Comparing that to the DCX 8510-8, which needs less than half that amount, it looks like somebody requested the ability to fry eggs on the box.

As for the software side, Cisco bumped the major NX-OS level to 6.2 (I don't know what happened to 6.0 and 6.1, so don't ask). Two of the fabulous features that I like in NX-OS are VOQ and the built-in FC analysis with the fcanalyser tool. VOQ (Virtual Output Queuing) provides a separate “ingress” buffer per destination, so that when the destination of a frame runs out of credits this port does not cause back-pressure on traffic headed for other destinations, which would cause all sorts of nastiness.
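To show what that buys you, here is a toy model of my own (not how the MDS actually implements it): with a single shared ingress FIFO, one credit-starved destination stalls frames headed elsewhere; with one virtual queue per destination, only traffic to the slow destination waits.

```python
from collections import deque

# Toy model: frames arrive at one ingress port destined for ports "A" and "B".
# Destination "A" has no buffer credits (it is slow); "B" has plenty.
frames = ["A", "B", "A", "B", "B"]

# Case 1: single shared FIFO. The frame for "A" at the head blocks everything
# queued behind it (head-of-line blocking).
credits = {"A": 0, "B": 3}
fifo = deque(frames)
sent_fifo = []
while fifo and credits[fifo[0]] > 0:
    dst = fifo.popleft()
    credits[dst] -= 1
    sent_fifo.append(dst)

# Case 2: virtual output queues, one per destination, drained independently.
credits = {"A": 0, "B": 3}
voq = {"A": deque(), "B": deque()}
for dst in frames:
    voq[dst].append(dst)
sent_voq = [q.popleft() for dst, q in voq.items()
            for _ in range(min(len(q), credits[dst]))]

print("shared FIFO delivered:", sent_fifo)   # [] -> everything is stuck
print("VOQ delivered        :", sent_voq)    # ['B', 'B', 'B'] keeps flowing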

From a troubleshooting perspective the fcanalyser tool is invaluable, and just for that feature alone I would buy the MDS. It allows you to capture FC frames and store them in PCAP format for analysis with a 3rd-party tool (like Wireshark), or output them as raw text so you can do a screen capture in a terminal program. The latter requires you to know exactly how to interpret an FC frame, which can be a bit daunting for many people.
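If you do end up staring at raw frame bytes, the 24-byte FC frame header is at least very regular. A minimal sketch of how you could pull the basic fields out of a captured frame with plain Python (field layout as per FC-FS; the sample bytes are made up for illustration):

```python
import struct

def parse_fc_header(frame: bytes) -> dict:
    """Decode the first 24 bytes (six words) of a Fibre Channel frame header."""
    if len(frame) < 24:
        raise ValueError("need at least the 24-byte FC header")
    w = struct.unpack(">6I", frame[:24])        # six big-endian 32-bit words
    return {
        "r_ctl":   w[0] >> 24,                  # routing control
        "d_id":    w[0] & 0x00FFFFFF,           # destination FC address
        "cs_ctl":  w[1] >> 24,
        "s_id":    w[1] & 0x00FFFFFF,           # source FC address
        "type":    w[2] >> 24,                  # 0x08 = SCSI-FCP
        "f_ctl":   w[2] & 0x00FFFFFF,
        "seq_id":  w[3] >> 24,
        "seq_cnt": w[3] & 0x0000FFFF,
        "ox_id":   w[4] >> 16,                  # originator exchange id
        "rx_id":   w[4] & 0x0000FFFF,
        "param":   w[5],
    }

# Made-up example frame header: an FCP frame from 0x010203 to 0x0A0B0C
sample = bytes.fromhex(
    "060a0b0c 00010203 08290000 00000001 001280e4 00000000".replace(" ", ""))
hdr = parse_fc_header(sample)
print(f"S_ID {hdr['s_id']:06X} -> D_ID {hdr['d_id']:06X}, "
      f"TYPE {hdr['type']:02X}, OX_ID {hdr['ox_id']:04X}")
```

Once you can read S_ID/D_ID, TYPE and the exchange IDs, following a SCSI exchange through a raw capture becomes a lot less daunting.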

As I mentioned in the beginning, I'm not quite sure yet what to think of the new MDS, as the feature set seems fairly limited w.r.t. port count and innovation in NX-OS. It looks like they wanted to catch up to Brocade, in which they very much succeeded, but they have come very late to market with the 16G FC adoption. I do think this box has been developed with some serious future-proofing built in, so new modules and features can be adopted easily in a non-disruptive way, and the transition to 32G FC is likely to be much quicker than the jump from 8G to 16G was.

I also would advise Cisco to have a serious look into their hardware design regarding power consumption. It is over the top to require that many watts to keep this number of FC ports going whilst your biggest competitor can do it with less than half.

To conclude I think Brocade has a serious competitor again but Cisco would really need to crank up the options with regards to port and feature modules as well as Ficon support.

Cheers,
Erwin

Brocade and the "Woodstock Generation"

I encourage change, especially if there are technological advances that improve functionality, reliability, availability and serviceability (or RAS in short). What I hate is when these changes increase risk and require a fair chunk of organisational change, and even more so when they amplify each other (like FCoE).

What I hate even more, though, is for marketing people to take ownership of an industry standard and try to re-brand it for their own purposes, and this is why I'm absolutely flabbergasted about Brocade's press release in which they announce a “new” generation of the fibre-channel protocol.

When the 60s arrived, a whole new concept of how people viewed the world emerged and the way of living changed. Freedom became a concept by which people lived, and they moved away from the way their ancestors had lived in previous decades. They shed the straitjacket of the early 20th century and changed the world in many parts of life. This didn't mean they discredited their parents or grandparents or even the generations before, neither did they perceive themselves to be better or different in the sense that they wanted to “re-brand” themselves. They didn't need differentiation by branding; they did it by their way of living.

Fibre-Channel has always been an open standard developed by many companies in the storage industry, and I'll be the last to deny that Brocade has contributed (and still does) a fair chunk of effort to the protocol. This doesn't mean that they “own” the protocol, nor does it give them the right to do with it as they please. It's like claiming ownership of print when you've written a book.

So let's go back to their press release, in which they announced the following:

Brocade Advances Fibre Channel Innovation With New Fabric Vision Technology



Also Expands Gen 5 Fibre Channel Portfolio With Industry’s Highest-Density Fixed Port SAN Switch and Enhanced Management Software; Announces Plans for Gen 6 Fibre Channel Technology Supporting the Next-Generation Standard

And a little further on:

Brocade Fabric Vision technology includes:

  • Policy-based tools that simplify fabric-wide threshold configuration and monitoring
  • Management tools that allow administrators to identify, monitor and analyze specific application data flows, without the need for expensive, disruptive, third-party tools
  • Monitoring tools that detect network congestion and latency in the fabric, providing visualization of bottlenecks and identifying exactly which devices and hosts are impacted
  • Customizable health and performance dashboard views, providing all critical information in one screen
  • Cable and optic diagnostic features that simplify the deployment and support of large fabrics

When I read the first line I thought that a massive stack of Brocade engineers had come out of a dungeon and completely overhauled the FC protocol stack, but I couldn't have been further from the truth. It was not the engineering people who came out of the dungeon but rather the marketing people, and they came up with very confusing terms like Gen 5 and Gen 6 which only represent differences in speed technology. Where the entire industry is used to 1G, 2G, 4G, 8G and now 16G plus 32G, Brocade starts to re-brand these as Gen X. So in a couple of years' time you'll need a cross-reference spreadsheet to map the speeds to the Gen X marketing terms.
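For what it's worth, here is that cross-reference as I read it from the press release today (my own table, and no doubt it will grow over time):

```python
# Marketing label vs. the plain speed name the rest of the industry uses.
# "Gen 5" and "Gen 6" are the terms from Brocade's press release.
GEN_TO_SPEED = {
    "Gen 5": "16G FC (shipping)",
    "Gen 6": "32G FC (announced, next-generation standard)",
}

for gen, speed in GEN_TO_SPEED.items():
    print(f"{gen:>6} -> {speed}")
```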

As for the content of the “Fabric Vision technology”, I must say that the majority of the components have already been available for quite a while. The only thing that is actually new is the ability to map policies onto fabrics via BNA, to be able to check quickly for different kinds of issues and settings. That is indeed a very welcome feature, but I wouldn't call it rocket science: you take a set of configuration options, you distribute them across one or more switches in a fabric, and you report whether they comply with these policies. The second bullet point seems to be targeted at Virtual Instruments. From what I know of VI, Brocade is still a long way from being able to displace their technology. Utilities like frame monitoring do not compete with FC-analyser kinds of equipment.
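In essence that policy piece boils down to something like the following sketch of the concept (my own illustration, not Brocade code; the switch names, counters and thresholds are invented): push one set of thresholds to every switch in the fabric and report which ones are out of line.

```python
# Conceptual sketch of fabric-wide policy checking: one set of thresholds,
# applied to counters gathered from every switch, with a compliance report.

POLICY = {              # threshold per counter, per polling interval
    "crc_errors": 5,
    "link_resets": 2,
    "c3_discards": 10,
}

fabric_counters = {     # hypothetical counters collected from the fabric
    "dcx_core_1": {"crc_errors": 0,  "link_resets": 0, "c3_discards": 3},
    "dcx_core_2": {"crc_errors": 12, "link_resets": 1, "c3_discards": 0},
    "edge_sw_17": {"crc_errors": 1,  "link_resets": 4, "c3_discards": 40},
}

def compliance_report(policy, counters):
    """Return {switch: [violated counters]} for everything above threshold."""
    report = {}
    for switch, values in counters.items():
        violations = [name for name, limit in policy.items()
                      if values.get(name, 0) > limit]
        if violations:
            report[switch] = violations
    return report

for switch, broken in compliance_report(POLICY, fabric_counters).items():
    print(f"{switch}: exceeds threshold on {', '.join(broken)}")
```

Useful, certainly, but as I said: distributing settings and comparing counters against limits is housekeeping, not rocket science.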

The software side of BNA and VI seems to be getting closer, which I do welcome. BNA is getting better and better with each release, and closing the gap to tools which provide a different view of problems will certainly be useful. It also removes the need to poll fabric entities for the same info.

Bullet points 3 and 5 piggyback on features already available in FOS, which can now be captured in a dashboard kind of way via BNA. Bottleneck monitoring commands, in addition to credit-recovery features plus D-port diagnostics to check on the health of links, were already available. The BNA dashboard also provides a nifty snapshot of the most utilised ports, port errors, CPU and memory status of switches, etc. This, in theory, should make my life easier, but we'll wait and see.

Don't get me wrong, I do like Brocade. I've worked with their technologies since 1997, so I think I have a fairly good view of what they provide. They provide around 50% of my bread and butter these days (not sure if that's a good thing to say though if you check out my job title :-)), but I cannot give them credit for announcing a speed bump as a new generation of Fibre-Channel, nor can I think of any reason for defining a “Vision” around features, functions and options that have been lacking for a while in their product set. I do applaud them for having fixed that, though.

From my personal point of view, what would be compulsory is to overhaul the FOS CLI. You can argue for taking a consistent approach using the “Cisco”-like tree structure which Brocade also uses in NOS on their VDX switches, but that doesn't work for me either; moving up and down menu trees to configure individual parts of the configuration is cumbersome. What needs to be done is to painstakingly hold on to consistent command naming and command parameters. This, in addition to the nightmare of inconsistent log output across different FOS versions, is one of the worst parts of FOS and it should be fixed ASAP.

Cheers,
Erwin

Why convergence still doesn’t work and how you put your business at risk

I browsed through some of the great Tech Field Day videos and came across the discussion “What is an Ethernet Fabric?”, which covered Brocade's version of a flat layer 2 Ethernet network based on their proprietary “ether-fabric protocol”. At a certain point the discussion led to the usual “Storage vs. Network” debate, and it still seems there is a lot of mistrust between the two camps. (As rightfully they should. :-))

For the video of the “EtherFabric” discussion you can have a look >>here<<


Convergence between storage and networking has been wishful thinking ever since parallel SCSI entered its 3rd phase, where the command set was separated from the physical infrastructure and became serialised over a “network” protocol called Fibre-Channel.

The biggest problem is not the technical side of the convergence. Numerous options have already been provided which allow multiple protocols to be transmitted via other protocols. The SCSI protocol can be transmitted via Fibre-Channel, TCP/IP and iSCSI, and even the less advanced ATA protocol can be transferred directly over Ethernet.

One thing that is always forgotten is the intention for which these different networks were created. Ethernet was developed somewhere in the 70s by Robert Metcalfe at Xerox (yes, the same company that also invented the GUI as we know it today) to let two computers “talk” to each other and exchange information. Along that path the DARPA-developed TCP/IP protocol was bolted on top of it to provide some reliability, and a broader spectrum of services including routing etc. was made possible. Still, the intention has always been to have two computer systems exchange information over a serialised signal.

The storage side of the story is that it has always been developed to talk to peripheral devices, and these days the dominant two protocols are SCSI and Ficon (SBCCS over Fibre-Channel). So let's take SCSI. The acronym alone already tells you its intent: Small Computer Systems Interface. It was designed for a parallel bus, 8 bits wide, had a 6-metre distance limitation and could shove data back and forth at 5 MB/s. By the nature of the interfaces it was a half-duplex protocol, and thus a fair chunk of time was spent on arbitration, select, attention and other phases. At some point in time (parallel) SCSI just ran into a brick wall w.r.t. speed, flexibility, performance, distance etc. So the industry came up with the idea to serialise the dataflow of SCSI. In order to do this, all protocol standards had to be unlinked from the physical requirements SCSI had always had. This was achieved with SCSI-3. In itself it was nothing new, however as of that moment it was possible to bolt SCSI onto a serialised protocol. The only protocols available at that time were Ethernet, Token Ring, FDDI and some other niche ones. These were all considered inferior and not fit for the purpose of transporting a channel protocol like SCSI. A reliable, high-speed interface was needed, and as such Fibre-Channel was born. Some folks at IBM were working on this new serial transport protocol which had all the characteristics anyone would want in a datacentre: high speed (1 Gbit/s; remember, Ethernet at that time was stuck at 10 Mb/s and Token Ring at 16 Mb/s), both optical and copper interfaces, long distance, reliability (i.e. no frame drop) and great flexibility towards other protocols. This meant that Fibre-Channel was able to carry other protocols, both channel and network, including IP, HIPPI, IPI, SCSI, ATM etc. The FC-4 layer was made in such a flexible way that almost any other protocol could easily be mapped onto it and get the same functionality and characteristics that made FC the rock-solid solution for storage.

So instead of using FC for IP transportation in the datacentre some very influential vendors went the other way around and started to bolt FC on top of Ethernet which resulted in the FCoE standard. So we now have a 3 decade old protocol (SCSI) bolted on top of a 2 decade old protocol (FC) bolted on top of a 4 decade old protocol (Ethernet).

This all increases the complexity of datacentre design, operations, and troubleshooting time in case something goes wrong. You can argue that costs will be reduced because you only need single CNAs, switch ports etc. instead of a combination of HBAs and NICs, but think about what happens when you lose that single link. It means you lose both storage and network at the same time. It also means that manageability is reduced to zero and you will need to be physically behind the system in order to resuscitate it. (Don't start about a separate management interface and network, because that will totally negate the argument of any financial saving.)

From a topology perspective and the famous “Visio” drawings the design might seem simpler, however when you start drawing the logical connections, in addition to the configurable steps that are possible with a converged platform, you will notice that there is a significant increase in connectivity.

I'm a support engineer with one of the major storage vendors, and I see on a day-to-day basis the enormous amount of information that comes out of a Fibre-Channel fabric, whether it's related to configuration errors, design issues causing congestion and over-subscription, bugs, network errors on FCIP links or problems with the physical infrastructure. Look at this in a vertical way, where applications, operating systems, volume managers, file systems, drivers etc., all the way down to the individual array spindle, can influence the behaviour of an entire storage network, and you'll see why you do not want to duplicate that by introducing Ethernet networks in the same path as the storage traffic.

I'm also extremely surprised that during the RFE/RFP phase for a new converged infrastructure almost no emphasis is placed on troubleshooting capabilities and knowledge. Companies hardly ask themselves whether they have enough expertise to manage and troubleshoot this kind of infrastructure. Storage networks have been around for over 15 years now and I still get a huge number of questions which touch on the most basic knowledge of these networks. Some call themselves SAN engineers, however they've dealt with this kind of equipment for less than 6 months and the only thing that is “engineered” is the day-to-day operation of provisioning LUNs and zones. As soon as a zone commit doesn't work, for whatever reason, many of them are absolutely clueless and immediate support cases are opened. Now extrapolate this to include Ethernet networks and converged infrastructures, with numerous teams who manage their piece of the pie in a different manner, and you will, sooner or later, come to the conclusion that convergence might seem great on paper, however an enormous amount of effort goes into a multitude of things spanning many technologies, groups, operational procedures and others I haven't even touched on. (Security is one of them. Who determines which security policies will be applied to what part of the infrastructure? How will this work on shared and converged networks?)

Does this mean I'm against convergence? No, I think it's the way to go, as was virtualisation of storage and OSes. The problem is that convergence is still in its infancy, and many companies, who often have a CAPEX-driven purchase policy, are blind to the operational issues and risks. Many things need to be fleshed out before this becomes really “production ready” and before the employees who keep your business data on a knife's edge are knowledgeable and confident that they master this to the full extent.

My advice for now:

1. Keep networks and storage isolated. This spreads risk, isolates problems and improves recoverability in case of disasters.
2. Familiarise yourself with these new technologies. Obtain knowledge through training and provide your employees with a lab where they can do stuff. Books and webinars have never been a good replacement for one-on-one instructor led training.
3. Grow towards an organisational model where operations are standardised and each team follows the same principles.
4. Do NOT expect your suppliers to adopt or know these operational procedures. The vendors have thousands of customers, and a hospital requires far different methods of operation than an oil company. You are responsible for your infrastructure and nobody else. The support organisation of your supplier deals with technical problems and they cannot fix your work methods.
5. Keep in touch with where the market is going. What looks to become mainstream might be obsolete next week. Don’t put your eggs in one basket.


Once more, I’m geek enough to adopt new technologies but some should be avoided. FCoE is one of them at this stage.


Hope this helps a bit in making your decisions.

Comments are welcome.

Regards,
Erwin van Londen

Brocade just got Bigger and Better

A couple of months ago Brocade invited me to come to San Jose for their “next-gen/new/great/what-have-ya” big box. Unfortunately I had something else on the agenda (yes, my family still comes first) and I had to decline. (They didn't want to shift the launch of the product because I couldn't make it. Duhhhh.)

So what is new? Well, it's not really a surprise that at some point in time they had to come out with a director-class piece of iron to extend the VDX portfolio towards the high-end systems. I'm not going to bore you with feeds and speeds and other spec-sheet material, since you can download that from their website yourself.

What is interesting is that the VDX 8770 looks, smells and feels like a Fibre-Channel DCX 8510-8 box. I still can't prove it physically, but it seems that many restrictions on the L2 side bear a fair chunk of resemblance to the fibre-channel specs. As I mentioned in one of my previous posts, flat fabrics are not new to Brocade; they have been building them since the beginning of time in the Fibre-Channel era, so they do have quite some experience with scalable flat networks and distribution models. One of the biggest benefits is that you can have multiple distributed locations and provide the same distributed network without having to worry about broadcast domains, Ethernet segments, spanning-tree configurations and other nasty legacy Ethernet problems.

Now I'm not tempted to go deep into the Ethernet and IP side of the fence. People like Ivan Pepelnjak and Greg Ferro are far better at this. (Check here and here.)

With the launch of the VDX 8770, Brocade once again proves that when they set themselves to get stuff done they really come out with a bang. The specifications far outreach any competing product currently available in the market. Again they run at the bleeding edge of what the standards bodies like IEEE, IETF and INCITS have published lately, not to mention that Brocade has contributed in that space, which makes them frontrunners once again.

So what are the drawbacks? As with all new products you can expect some issues. If I recall correctly, some high-end car manufacturer had to recall an entire model worldwide to have something fixed in the brake system, so it's not new or isolated to the IT space. Also, with the introduction of the VDX a fair chunk of new functionality has gone into the software. It's funny to see that something we've taken for granted in the FC space, like layer 1 trunking, is new in the networking space.

Nevertheless NOS 3.0 is likely to see some updates and patch releases in the near future. Although I don't deny that some significant QA has gone into this release, it's a fact that new equipment with new ASICs and new functionality always brings some sort of headaches with it.

Interoperability is certified with the MLX series as well as the majority of the somewhat newer Fibre-Channel kit. Still, bear in mind the required code levels, since this is always a problem on support calls. 🙂

I can't wait to get my hands on one of these systems and am eager to find out more. If I do, I'll let you know and do some more write-ups here.

Till next time.

Cheers,
Erwin

DISCLAIMER : Brocade had no influence in my view depicted above.

Brocade vs Cisco. The dance around DataCentre networking

When looking at the network market there is one clear leader and that is Cisco. Their products are ubiquitous from home computing to the enterprise. Of course there are others like Juniper, Nortel and Ericsson, but these companies only scratch the surface of what Cisco can provide. They rely on very specific differentiators and, given the fact they are still around, do a pretty good job at it.

A few years ago there was another network provider called Foundry, and they had some really impressive products; that's mainly why these are only found in the core of data centres which push a tremendous amount of data. The likes of ISPs or Internet Exchanges are a good fit. It is for this reason that Brocade acquired Foundry in July 2008. A second reason was that Cisco had entered the storage market with the MDS platform, which left Brocade with no counterweight in the networking space to provide customers with an alternative.

When you look at the storage market it is the other way around. Brocade has been in the Fibre Channel space since day one. They led the way with their 1600 switches and have outperformed and out-smarted every other FC equipment provider on the planet. Many companies that have been in the FC space have either gone broke or have been swallowed by others. Names like Gadzoox, McData, CNT, Creekpath and Inrange have all vanished, and their technologies either no longer exist or have been absorbed into products of the vendors who acquired them.

With two distinctly different technologies (networking and storage), both Cisco and Brocade have attained a huge market share in their respective specialities. Since storage and networking are two very different beasts this has served many companies very well, and no collision between the two technologies happened. (That is, until FCoE came around; you can read my other blog posts for my opinion on FCoE.)

Since Cisco, being bold, brave and sitting on a huge pile of cash, decided to also enter the storage market, Brocade felt its market share declining. It had to do something, and thus Foundry was on the target list.

After the acquisition Brocade embarked on a path to get the product lines aligned to each other, and they succeeded with their own proprietary technology called VCS (I suggest you search for this on the web, many articles have been written). Basically, what they've done with VCS is create an underlying technology which allows a flat layer 2 Ethernet network to operate on a flat fabric-based one, something they have had experience with since the beginning of time (storage networking, that is, for them).

Cisco wanted something different and came up with the technology that enabled the merger: FCoE. Cisco uses this extensively across their product set and it is the primary internal communications protocol in their UCS platform. Although I don't have any indicators yet, it might well be that, because FCoE will be ubiquitous in all of Cisco's products, the MDS platform gets abolished pretty soon from a sales perspective and the Nexus platforms will provide the overall merged storage and networking solution for Cisco data centre products, which in the end makes good sense.

So what is my view on the Brocade vs. Cisco discussion? Well, basically, I do like them both. As they have different viewpoints on storage and networking there is not really a good vs. bad. I see Brocade as the cowboy company providing bleeding-edge, up-to-the-latest-standards technologies like Ethernet fabrics and 16G fibre channel, whereas Cisco is a bit more conservative, which improves stability and maturity. What the pros and cons for customers are I cannot determine, since the requirements are mostly different.

From a support perspective on the technology side I think Cisco has a slight edge over Brocade, since many of the hardware and software problems have been resolved over a longer period of time and, by nature, Brocade providing bleeding-edge technology with a “first-to-market” strategy may sometimes run into a bumpy ride. That being said, since Cisco is a very structured company they sometimes lack a bit of flexibility, and Brocade has an edge on that point.

If you ask me directly which vendor to choose when deciding on a product set or vendor for a new data centre, I have no preference. From a technology standpoint I would still separate fibre-channel from Ethernet and wait until both FCoE and Ethernet fabrics have matured and are well past their “hype cycle”. We're talking data centres here and it is your data, not Cisco's and not Brocade's. Both FC and Ethernet are very mature and have a very long track record of operations, flexibility and stability. The excellent knowledge that is available on each of these specific technologies gives me more peace of mind than the prospect of having to deal with problems bringing the entire data centre to a standstill.

Erwin

Why not FCoE?

You may have read my previous articles on FCoE as well as some comments I've posted on Brocade's and Cisco's blog sites. It won't surprise you that I'm no fan of FCoE; not because of the technology itself, but because of the enormous complexity and organisational overhead involved.

So let's take a step back and try to figure out why this has become so much of a buzz in the storage and networking world.

First, let's make it clear that FCoE is driven by the networking folks, most notably Cisco. The reason for this is that Cisco has around 90% market share on the data centre networking side but only around 10 to 15% of the storage side. (I don't have the actual numbers at hand but I'm sure it's not far off.) Brocade, with their FC offerings, have that part (storage) pretty well covered. Cisco hasn't been able to eat more out of that pie for quite some time, so they had to come up with something else, and so FCoE was born. This allowed Cisco to slowly but steadily get a foot in the storage door by offering a so-called “new” way of doing business in the data centre and convince customers to go “converged”.

I have already explained that there is no or negligible benefit from an infrastructure and power/cooling perspective, so cost-effectiveness from a CAPEX perspective is nil and maybe even negative. I have also shown that the organisational overhaul that has to be accomplished is tremendous. Remember, you're trying to glue two different technologies together by adding a new one. The June 2009 FC-BB-5 document (where FCoE is described) is around 1.9 MB and 180 pages, give or take a few. FC-BB-6 is 208 pages and 2.4 MB thick. How does this decrease complexity?
Another part that you have to look at is backward compatibility. The Fibre Channel standard went up to 16 Gb/s a while ago and most vendors have already released products for it. The FC standard does specify backward compatibility down to 2 Gb/s. So I'm perfectly safe when linking up a 16G SFP with an 8 Gb/s or 4 Gb/s SFP, and the speed will be negotiated to the highest possible. This means I don't have to throw away older, not yet depreciated, equipment. How does Ethernet play in this game? Well, it doesn't: 10G Ethernet is incompatible with 1G, so they don't marry up. You have to forklift your equipment out of the data center and get new gear from top to bottom. How's that for investment protection? The network providers will tell you this migration process comes naturally with equipment refresh, but how do you explain that having to refresh one or two director-class switches that your other equipment can't connect to is a natural process? It means you have to buy additional gear that bridges between the old and the new, resulting in you paying even more. This is probably what is meant by “naturally”: “Naturally you have to pay more.”
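The difference is easy to picture as a simple negotiation over the set of speeds each side supports (a simplified model of my own; real auto-negotiation involves more than set intersection, and the speed sets below are illustrative):

```python
# Simplified model of link speed negotiation: both ends advertise the speeds
# they support and the link comes up at the highest speed in common, or not
# at all.

def negotiate(a_speeds, b_speeds):
    """Return the highest common speed in Gb/s, or None if there is none."""
    common = set(a_speeds) & set(b_speeds)
    return max(common) if common else None

fc_16g_sfp = {4, 8, 16}   # 16G FC optics negotiate down a couple of generations
fc_8g_sfp  = {2, 4, 8}
eth_10g    = {10}         # 10GE (SFP+) does not fall back to 1GE
eth_1g     = {1}

print("16G FC <-> 8G FC :", negotiate(fc_16g_sfp, fc_8g_sfp), "Gb/s")
print("10GE   <-> 1GE   :", negotiate(eth_10g, eth_1g))
```

The FC pair comes up at 8 Gb/s; the Ethernet pair simply does not come up at all, which is the forklift problem in a nutshell.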

So it's pretty obvious that Cisco needs to pursue this path if it is ever to get more traction in the data center storage networking club. They've also proven this with UCS, which looks like it is falling off the cliff as well if you believe the publications in the blogosphere. Brocade is not pushing FCoE at all. The only reason they are in the FCoE game is to be risk-averse: if for some reason FCoE does take off, they can say they have products to support it. Brocade has no intention of giving up an 80 to 85% market share in fibre channel just to risk handing it over to the other side, being Cisco networking. Brocade's strategy is somewhat different from Cisco's. Both companies have outlined their ideas and plans on numerous occasions, so I'll leave that for you to read on their websites.

“What about the other vendors?” you'll say. Well, that's pretty simple. The array vendors couldn't care less. For them it's just another transport mechanism, like FC and iSCSI, and there is no gain nor loss whether FCoE makes it or not. They won't tell you this to your face, of course. The other connectivity vendors like Emulex and QLogic have to be on the train with Cisco as well as Brocade, however their main revenue comes from the server vendors who build products with Emulex or QLogic chips in them. If the server vendors demand an FCoE chip, either party builds one and is happy to sell it to any server vendor. For connectivity vendors like these it's just another revenue stream they link into, and they cannot afford to sit outside a certain technology if the competition is picking it up. Given the fact that there is some significant R&D required w.r.t. chip development, these vendors also have to market their kit to get some ROI. These are normal market dynamics.

“So what alternative do you have for a converged network?” was a question someone asked me a while ago. My response was: “Do you have a Fibre Channel infrastructure? If so, then you already have a converged network.” Fibre Channel was designed from the ground up to transparently move data back and forth irrespective of the upper-layer protocol used, including TCP/IP. Unfortunately SCSI has become the most common, but there is absolutely no reason why you couldn't add a networking driver and the IP protocol stack as well. I've done this many times and never had any trouble with it.

The question is now: “Who do you believe?” and “How much risk am I willing to take to adopt FCoE?”. I'm not on the sales side of the fence, nor am I in marketing. I work in a support role and have many of you on the phone when something goes wrong. My background is not in the academic world; I worked my way up and have been in many roles where I've seen technology evolve, and I know how to spot bad ones. FCoE is one of them.

Comments are welcome.

Regards,
Erwin

Will FCoE bring you more headaches?

Yes it will!!.

Bit of a blunt statement but here’s why.

When you look at the presentations all the connectivity vendors (Brocade, Cisco, Emulex etc.) will give you, they pitch FCoE as the best thing since sliced bread. Reduction in costs, cooling, cabling and complexity will solve all of today's problems! But is this really true?

Let's start with costs. Are the cost savings really as big as they promise? These days a server's 1G Ethernet port sits on the motherboard and is more or less a freebie. The expectation is that the additional cost of 10GE will be added to a server's cost of goods, but as usual this will decline over time. Most servers come with multiple of these ports. On average a CNA is 2 times more expensive than 2 GE ports + 2 HBAs, so that's not a reason to jump to FCoE. Each vendor has different price lists, so that's something you need to figure out yourself. The CAPEX is the easy part.

An FCoE capable switch (CEE or FCF) is significantly more expensive than an Ethernet switch + a FC switch. Be aware that these are data center switches and the current port count on an FCoE switch is not sufficient to deploy large scale infrastructures.

Then there is the so-called power and cooling benefit. (?!?!?) I searched my butt off to find the power requirements of HBAs and CNAs, but no vendor is publishing these. I can't imagine an FC HBA chip eats more than 5 watts, however a CNA will probably use more given that it runs at a higher clock speed, and for redundancy reasons you need two of them anyway. So in general I think these will equate to the same power requirements, or an Ethernet+HBA combination is even more efficient than CNAs. Now let's compare a Brocade 5000 (32-port FC switch) with a Brocade 8000 FCoE switch from a BTU and power rating perspective. I used their own specs according to their data sheets, so if I made a mistake don't blame me.

A Brocade 5000 uses a maximum of 56 watts and has a BTU rating of 239 at 80% efficiency. An 8000 FCoE switch uses 206 watts when idle and 306 watts when in use; the heat dissipation is 1044.11 BTU per hour. I struggled to find any benefit here. Now you can say that you also need an Ethernet switch, but even if that has the same ratings as a 5000 switch you still save a hell of a lot of power and cooling requirements with separate switches. I haven't checked the Cisco, Emulex and QLogic equipment but I assume I'm not far off on those either.
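The BTU figures follow directly from the wattage (1 watt is roughly 3.412 BTU/hr), so a quick sanity check on the numbers above looks like this; note that the Ethernet switch wattage is my assumption for the sake of argument, the rest are the data-sheet values just quoted:

```python
# Sanity check on the power/cooling comparison: convert watts to BTU/hr and
# compare a separate FC switch + Ethernet switch pair against one FCoE switch.

WATT_TO_BTU_HR = 3.412

brocade_5000_w = 56     # 32-port FC switch, maximum draw (data sheet)
brocade_8000_w = 306    # FCoE switch, in use (data sheet)
assumed_eth_w  = 56     # assumption: an Ethernet switch comparable to the 5000

separate  = brocade_5000_w + assumed_eth_w
converged = brocade_8000_w

print(f"FC + Ethernet: {separate:4d} W = {separate * WATT_TO_BTU_HR:7.1f} BTU/hr")
print(f"FCoE switch  : {converged:4d} W = {converged * WATT_TO_BTU_HR:7.1f} BTU/hr")
print(f"306 W x 3.412 = {brocade_8000_w * WATT_TO_BTU_HR:.1f} BTU/hr "
      "(matches the ~1044 BTU/hr figure above)")
```

Even with two separate boxes you end up at roughly a third of the power and heat of the single converged switch, which is the point I struggled to find a benefit in.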

Now, hang on, all vendors say there is a “huge benefit” in FCoE-based infrastructures. Yes, there is: you can reduce your cabling plant, but even there there is a snag. You need very high quality cables, so an OM1 or OM2 cabling plant will not do. As a minimum you need OM3, but OM4 is preferred. Do you have this already? If so, good, you need less cabling; if not, buy a completely new plant.

Then there is complexity, also an FCoE sales pitch: “Everything is much easier and simpler to configure if you go with FCoE.” Is it??? Where is the reduction in complexity when the only benefit is that you can get rid of cabling? Once a cabling plant is in place you only need to administer the changes, and there is some extremely good and free software to do that. So even if you consider this a huge benefit, what do you get in return? A famous Dutch football player once said “Elk voordeel heb z'n nadeel” (that's Dutch with an Amsterdam dialect spelling :-)), which more or less means that every benefit has its disadvantage, i.e. there is a snag with each benefit.

The snag here is that you get all the nice features like CEE, DCBX, LLDP, ETS, PFC, FIP, FPMA and a lot more new terminology introduced into your storage and network environment. (Say what???) This more or less means that each of these abbreviations needs to be learned by your storage administrators as well as your network administrators, which means additional training requirements (and associated costs). This is not a replacement for your current training and knowledge; it comes on top of that.
Also, these settings are not a one-time setup which can be configured centrally on a switch; they need to be configured and managed per interface.

In my previous article I also mentioned the complete organizational overhaul you need to do between the storage and networking department. From a technology standpoint these two “cultures” have a different mindset. Storage people need to know exactly what is going to hit their arrays from an applications perspective as well as operating systems, firmware, drivers etc. Network people don’t care. They have a horizontal view and they transport IP packets from A to B irrespective of the content of that packet. If the pipe from A to B is not big enough they create a bigger pipe and there we go. In the storage world it doesn’t work like this as described before.

Then there is the support side of the fence. Let's assume you've adopted FCoE in your environment. Do you have everything in place to solve a problem when it occurs? (Mind the term “when”, not “if”.) Do you know exactly what it takes to troubleshoot a problem? Do you know how to collect logs the correct way? Have you ever seen a Fibre Channel trace captured by an analyzer? If so, were you able to make sense of it, actually pinpoint an issue if there is one and, more importantly, work out how to solve it? Did you ever look at fabric/switch/port statistics on a switch to verify if something is wrong? For SNIA I wrote a tutorial (over here) in which I describe the overall issues support organisations face when a customer calls in for support, and also what to do about it. The thing is that network and storage environments are very complex. By combining them and adding all the 3- and 4-letter acronyms mentioned above, the complexity will increase 5-fold if not more. It therefore takes much, much longer to be able to pinpoint an issue and advise on how to solve it.

I work in one of those support centres of a particular vendor and I see FC problems every day. Very often they are due to administrator errors, but far more often they are caused by a problem with software or hardware. These can be very obvious, like a cable problem, but in most cases the issue is not so clear and it takes a lot of skill, knowledge, technical information AND TIME to be able to sort this out. By adding complexity it just takes more time to collect and analyse the information and advise on resolution paths. I'm not saying it becomes undo-able, but it just takes more time. Are you prepared, and are you willing, to give your vendor this time to sort out issues?

Now, you probably think I must hold a major grudge against FCoE. On the contrary; I think FCoE is a great technology, but it's been created for technology's sake and not to help you as a customer and administrator to really solve a problem. The entire storage industry is stacking protocols upon protocols to circumvent a very hard issue they screwed up a long time ago. (Huhhhh, why's that?)

Be reminded that today's storage infrastructure is still running on a three-decade-old protocol called SCSI (or SBCCS for z/OS, which is even older). Nothing wrong with that, but it implies that the shortcomings of this protocol need to be circumvented. SCSI originally ran on a parallel bus which was 8 bits wide and hit performance limitations pretty quickly. So they created “wide SCSI”, which ran on a 16-bit wide bus. With increasing clock frequencies they pumped up the speed, however the problem of distance limitations became more pressing and so Fibre-Channel was invented. By disassociating the SCSI command set from the physical layer, the T10 committee came up with SCSI-3, which allowed the SCSI protocol to be transported over a serialised interface like FC, with a multitude of benefits like speed, distance and connectivity. The same thing happened with Escon in the mainframe world. Both the Escon command set (SBCCS, now known as Ficon) as well as SCSI (on FC known as FCP) are now able to run on the FC-4 layer. Since Ethernet back then was extremely lossy, it was not an option for a strictly lossless channel protocol with low latency requirements. Now that they have fixed up Ethernet a bit to allow for lossless transport over a relatively fast interface, they map the entire stack into a mini-jumbo frame: the FCP SCSI command and data sit in an FC-encapsulated frame, which in turn sits in an Ethernet frame. (I still can't find the reduction in complexity; if you can, please let me know.)
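To picture the stacking, here is a rough byte count of the layers wrapped around a single SCSI command when it travels over FCoE. The sizes are approximate and from memory, ignoring optional headers and padding details; the point is the layering, not a byte-exact wire format reference.

```python
# Approximate per-frame overhead of the SCSI-in-FC-in-Ethernet stacking.
# Sizes are rounded/typical, purely to illustrate the layering.

layers = [
    ("Ethernet header + VLAN tag",      18),
    ("FCoE encapsulation header",       14),
    ("FC frame header",                 24),
    ("FCP/SCSI payload (max)",        2112),
    ("FC CRC",                           4),
    ("FCoE EOF/padding",                 4),
    ("Ethernet FCS",                     4),
]

total = sum(size for _, size in layers)
for name, size in layers:
    print(f"{name:<30} {size:5d} bytes")
print(f"{'total on the wire':<30} {total:5d} bytes  "
      "(hence the need for 'baby jumbo' Ethernet frames)")
```

Four wrappers around one SCSI payload, each with its own error handling, addressing and flow-control rules: that is the complexity I keep pointing at.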

What should have been done, instead of introducing a fixer-upper like FCoE, is for the industry to come up with an entirely new concept of managing, transporting and storing data. This should have been created based on today's requirements, which include security (like authentication and authorization), retention, (de-)duplication, removal of awareness of locality etc. Your data should reside in a container which is a unique entity at all levels, from application to storage and every mechanism in between. This container should be treated as per the policy requirements encapsulated in that container, and those policies are based on the content residing in it. This then allows a multitude of properties to be applied to this container as described above and allows for far more effective transport.

Now this may sound like trying to boil the ocean, but try to think 10 years ahead. What will be beyond FCoE? Are we creating FCoEoXYZ? Five years ago I wrote a little piece called “The Future of Storage” which more or less introduced this concept. Since then nothing has happened in the industry to really solve the data growth issue. Instead the industry is stacking patch upon patch to circumvent current limitations (if any) or trying to generate a new revenue stream with something like the introduction of FCoE.

Again, I don't hold anything against FCoE from a technology perspective, and I respect and admire Silvano Gai and the others at T11 for what they've accomplished in little over three years, but I think it's a major step in the wrong direction. It had the wrong starting point and it tries to answer a question nobody asked.

For all the above reasons I still do not advise adopting FCoE, and I urge you to push your vendors and their engineering teams to come up with something that will really help you run your business, instead of patching up “issues” you might not even have.

Constructive comments are welcome.

Kind regards,
Erwin van Londen