Getting to Know the Solaris™ iSCSI Stack

Ryan Matteson

iSCSI is rapidly gaining traction in the storage networking world. There are several reasons for this. First, the iSCSI protocol was developed in an open forum, which helps to ensure that iSCSI solutions from different vendors will interoperate with each other. Second, end nodes in iSCSI networks operate on block storage, which allows iSCSI storage to be managed by existing storage tools (e.g., storage monitoring tools, volume managers, file systems, etc.). Third, customers can reduce infrastructure costs by deploying iSCSI, since iSCSI can work over Ethernet networks, which are typically less expensive than their Fibre Channel counterparts. And finally, since iSCSI uses the TCP/IP protocol, customers can use existing management frameworks to monitor and manage their storage infrastructure.

Based on this rapid growth, several of the large storage vendors have extended their storage offerings to include iSCSI support. One vendor, Sun Microsystems, has enhanced its Solaris operating system to support iSCSI. Solaris 10 currently ships with an iSCSI software initiator, and recent builds of Nevada (Nevada is the development version of Solaris that will eventually become Solaris 11) contain an iSCSI target implementation.

In this article, I'll provide an introduction to iSCSI and describe how to set up a Solaris 10 host to act as an iSCSI initiator (the endpoint responsible for initiating iSCSI requests -- i.e., the "client"), and a Nevada host to act as an iSCSI target (the endpoint responsible for receiving and processing iSCSI requests from one or more initiators -- i.e., the "server"). I'll also discuss some of the new iSCSI functionality that is available in recent builds of Nevada, to give storage administrators an idea of what is coming in future Solaris 10 updates.

iSCSI Naming

iSCSI uses unique names to identify each target or initiator in a network element (a system capable of acting as an initiator or target). iSCSI names are unique to each node, and come in two formats: enterprise unique identifiers (EUI) and iSCSI qualified names (IQN). EUI addresses consist of 16 hexadecimal digits and are prefixed by the string "eui." Here is an example of an iSCSI name in EUI format:

eui.02004567A425678D

IQN-formatted addresses are prefixed with the string "iqn." and contain a date string, the domain of a naming authority, and a unique string to identify the node. Here is an example of an iSCSI name in IQN format:

iqn.1986-03.com.sun:01:0003ba0e0795.4455571f

The Solaris initiator can access targets that use EUI names, but defaults to assigning IQN names to each initiator and target (IQN names are assigned automatically when the iSCSI initiator and target are initialized). Since Solaris uses IQN names by default, the rest of this article will use this naming method. If you're interested in learning more about iSCSI naming conventions, please see the references for additional details.
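To make the two formats concrete, here is a minimal sketch of a shell function that classifies a name by its prefix and overall shape. The function name iscsi_name_type is hypothetical (it is not part of Solaris), and the patterns are a loose check for illustration rather than a full validator for the naming rules in the iSCSI RFC:

```shell
#!/bin/sh
# Loosely classify an iSCSI name as "eui", "iqn", or "unknown".
# iscsi_name_type is a hypothetical helper, not a Solaris utility.
iscsi_name_type() {
    if printf '%s\n' "$1" | grep -Eq '^eui\.[0-9A-Fa-f]{16}$'; then
        # "eui." followed by exactly 16 hexadecimal digits
        echo eui
    elif printf '%s\n' "$1" | grep -Eq '^iqn\.[0-9]{4}-[0-9]{2}\.[^:]+(:.+)?$'; then
        # "iqn.", a yyyy-mm date, a naming authority, and an optional
        # colon-separated unique string
        echo iqn
    else
        echo unknown
    fi
}

# The two example names used in this article
iscsi_name_type eui.02004567A425678D
iscsi_name_type iqn.1986-03.com.sun:01:0003ba0e0795.4455571f
```

A check like this can be handy in provisioning scripts to catch a mistyped name before it is handed to the iSCSI utilities.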

iSCSI Connections

The initiator and target in iSCSI contain one or more "portals." Each portal contains an IP address and port number, which initiators and targets use to determine the set of interfaces on which they can initiate or accept iSCSI connections. All connections between an iSCSI initiator portal and target portal are associated with a specific "session." iSCSI uses sessions to link logical connections and to ensure the ordered delivery of commands between initiators and targets. An initiator can create one or more sessions to a target, and each session can have one or more TCP connections associated with it. If a session contains more than one TCP connection, the session is referred to as a multiple-connection session, or MC/S for short.
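Since a portal is simply an IP address paired with a port number, it can be picked apart with ordinary shell parameter expansion. The sketch below assumes IPv4 portals written as ADDRESS[:PORT] and falls back to 3260, the well-known iSCSI port, when no port is given. The helper name portal_parts is hypothetical:

```shell
#!/bin/sh
# Split an IPv4 portal of the form ADDRESS[:PORT] into "ADDRESS PORT".
# portal_parts is a hypothetical helper; 3260 is the well-known iSCSI port.
portal_parts() {
    case $1 in
        *:*) echo "${1%%:*} ${1##*:}" ;;   # explicit port given
        *)   echo "$1 3260" ;;             # no port: assume the default
    esac
}

portal_parts 192.168.1.13:3260
portal_parts 192.168.1.13
```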

Solaris iSCSI Software Initiator Configuration

Now that I've introduced the important iSCSI concepts, let's move right into configuring a Solaris 10 host to act as an initiator. The Solaris 10 initiator is controlled by the Service Management Facility (SMF) iscsi_initiator service. This service is disabled by default and must be enabled before configuring the initiator. To enable the iSCSI initiator service, the svcadm utility can be run with the "enable" command and the service name:

$ svcadm enable iscsi_initiator

After the iscsi_initiator service is enabled, the iscsiadm utility can be used to manage the configuration of the initiator. The iscsiadm utility takes a command as its first argument, a subcommand to indicate what the command applies to as the second argument, and allows one or more options to be passed to the subcommand. The list of available commands can be viewed by running iscsiadm without any arguments:

$ iscsiadm
Usage:  iscsiadm -?,-V,--help
Usage:  iscsiadm add [-?] <OBJECT> [-?] [<OPERAND>]
Usage:  iscsiadm list [-?] <OBJECT> [-?] [<OPERAND>]
Usage:  iscsiadm modify [-?] <OBJECT> [-?] [<OPERAND>]
Usage:  iscsiadm remove [-?] <OBJECT> [-?] [<OPERAND>]

For more information, please see iscsiadm(1M).

One of the most useful commands is list, which can be used to list the configured discovery methods, as well as the configuration of an initiator or target. To use the list command to view the qualified name of an initiator, iscsiadm can be run with the list command and initiator-node subcommand:

$ iscsiadm list initiator-node
Initiator node name: iqn.1986-03.com.sun:01:0003ba0e0795.4455571f
Initiator node alias: -
        Login Parameters (Default/Configured):
                Header Digest: NONE/-
                Data Digest: NONE/-
        Authentication Type: NONE
                RADIUS Server: NONE
                RADIUS access: unknown
        Configured Sessions: 1

The list initiator-node output displays the initiator's IQN, the parameters to use during session establishment, and the number of sessions that will be used between the initiator and the target. I will show how this information is used a bit later in the article.

In order for the Solaris initiator to use a target, it must be configured with a discovery method -- a way of identifying targets on the network. Solaris supports three discovery methods: static discovery, SendTargets, and iSNS.

With static discovery, the initiator is manually configured with a list of targets, and the portals through which the targets are presented. Static discovery can be enabled by running iscsiadm with the modify command, the discovery subcommand, the --static option, and the keyword enable:

$ iscsiadm modify discovery --static enable

Once static discovery is enabled, iscsiadm can be used to add targets to the host. To add a target with the IQN iqn.1986-03.com.sun:02:21947caf-20ca-c035-c95c-dbb96a87cf89.tigger that is presented through the network portal 192.168.1.13:3260, iscsiadm can be run with the add command, the static-config subcommand, the IQN of the target to add, and the IP address and optional port number of the portal that is presenting the target:

$ iscsiadm add static-config \
  iqn.1986-03.com.sun:02:21947caf-20ca-c035-c95c-dbb96a87cf89.tigger,192.168.1.13:3260

The second discovery method Solaris supports is SendTargets. SendTargets allows one or more network portals to be configured on an initiator, and the initiator will query these portals during a discovery session to locate targets that have been presented to it. To enable the SendTargets discovery method, iscsiadm can be run with the modify command, the discovery subcommand, the --sendtargets option, and the keyword enable:

$ iscsiadm modify discovery --sendtargets enable

After SendTargets discovery is enabled, iscsiadm can be run with the add command, the discovery-address subcommand, and the IP address and optional port number of each portal to query:

$ iscsiadm add discovery-address 192.168.1.13:3260

The final discovery method supported by Solaris is iSNS. iSNS allows the initiator to be configured with the IP address and port of an iSNS server. During the discovery phase, the iSCSI initiator will query the configured iSNS server for the list of portals and targets that have been allocated to the initiator.

To enable iSNS discovery, iscsiadm can be run with the modify command, the discovery subcommand, the --isns option, and the keyword enable:

$ iscsiadm modify discovery --isns enable

After iSNS discovery is enabled, iscsiadm can be run with the add command, the isns-server subcommand, and the IP address and optional port of an iSNS server to use:

$ iscsiadm add isns-server 192.168.1.13:3205

Once the initiator is configured with a valid discovery method, the initiator should see one or more targets (assuming targets have been made available to the initiator) when the iscsiadm utility is run with the list command, the target subcommand, and optionally the -v (verbose) flag and the -S flag, which adds the LUNs behind each target to the output:

$ iscsiadm list target -vS
Target: iqn.1986-03.com.sun:02:21947caf-20ca-c035-c95c  \
        -dbb96a87cf89.tigger
        Alias: tigger
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
            CID: 0
              IP address (Local): 192.168.1.3:32772
              IP address (Peer): 192.168.1.13:3260
              Discovery Method: SendTargets
              Login Parameters (Negotiated):
                    Data Sequence In Order: yes
                    Data PDU In Order: yes
                    Default Time To Retain: 20
                    Default Time To Wait: 2
                    Error Recovery Level: 0
                    First Burst Length: 65536
                    Immediate Data: yes
                    Initial Ready To Transfer (R2T): yes
                    Max Burst Length: 262144
                    Max Outstanding R2T: 1
                    Max Receive Data Segment Length: 8192
                    Max Connections: 1
                    Header Digest: NONE
                    Data Digest: NONE

        LUN: 1
            Vendor:  SUN
            Product: SOLARIS
            OS Device Name: /dev/rdsk/c1t010000CBC18475E900002A00457C908Dd0s2
        LUN: 0
            Vendor:  SUN
            Product: SOLARIS
            OS Device Name: /dev/rdsk/c1t010000CBC18475E900002A00457C908Ad0s2

The list output shows that two LUNs, LUN 0 and LUN 1, are presented through the target iqn.1986-03.com.sun:02:21947caf-20ca-c035-c95c-dbb96a87cf89.tigger. We can also see the list of parameters that were negotiated between the initiator and target, and the session identifier (ISID) that is associated with the session. Each iSCSI device can be managed identically to local disk devices. The format utility can be used to identify and partition iSCSI devices, newfs or mkfs can be used to create a file system on a partition, and mount can be used to mount the file system for general purpose use. For additional information on using iscsiadm, please see the iscsiadm(1M) manual page.
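The steps in the previous paragraph can be sketched as a small script. This is an illustration under stated assumptions rather than a tested procedure: the helper names run and prepare_iscsi_fs, the mount point /mnt/iscsi0, and the choice of slice s2 are all hypothetical, and the script defaults to a dry run that only prints each command. The interactive format step is left out, since it cannot be scripted this simply:

```shell
#!/bin/sh
# Dry-run sketch: put a UFS file system on an iSCSI LUN and mount it.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# prepare_iscsi_fs DISK SLICE MOUNTPOINT (hypothetical helper)
prepare_iscsi_fs() {
    disk=$1
    slice=$2
    mnt=$3
    run devfsadm -i iscsi                         # create the iSCSI device nodes
    run newfs "/dev/rdsk/${disk}${slice}"         # newfs works on the raw device
    run mkdir -p "$mnt"
    run mount "/dev/dsk/${disk}${slice}" "$mnt"   # mount uses the block device
}

# LUN 0 from the listing above; the slice and mount point are assumptions.
prepare_iscsi_fs c1t010000CBC18475E900002A00457C908Ad0 s2 /mnt/iscsi0
```

Setting DRY_RUN=0 runs the commands for real, so review the printed commands first. newfs will still ask for confirmation before it creates the file system.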

Solaris iSCSI Target Configuration

In recent releases of Nevada, an iSCSI target implementation was integrated. The iSCSI target is managed by the service management facility and, like the iSCSI initiator, is not enabled by default. To enable the iSCSI target, the svcadm utility can be run with the enable option and the target's SMF service name:

$ svcadm enable iscsitgt

The iSCSI target is configured with the iscsitadm utility, which uses an expression syntax similar to iscsiadm. Commands are used to indicate the action to perform, subcommands control what the action is applied to, and one or more options can be passed to the subcommand. To view the list of commands, iscsitadm can be run without any options:

$ iscsitadm
Usage:  iscsitadm -?,-V,--help
Usage:  iscsitadm create [-?] <OBJECT> [-?] [<OPERAND>]
Usage:  iscsitadm list [-?] <OBJECT> [-?] [<OPERAND>]
Usage:  iscsitadm modify [-?] <OBJECT> [-?] [<OPERAND>]
Usage:  iscsitadm delete [-?] <OBJECT> [-?] [<OPERAND>]
Usage:  iscsitadm show [-?] <OBJECT> [-?] [<OPERAND>]

For more information, please see iscsitadm(1M).

To begin using the iSCSI target, a base directory needs to be created. This directory is used to persistently store the target and initiator configuration that is added through the iscsitadm utility. To create the base directory, iscsitadm can be run with the modify command, the admin subcommand, the -d option, and the directory to store the configuration:

$ iscsitadm modify admin -d /etc/iscsi

Each target will present one or more block devices to initiators, which will require the system acting as the target to have one or more free block devices available, or enough free space to store one or more files that act as the backing store. If block devices are used, they can take three forms:

1. A single device, which is made available to the iSCSI target through an entry in /dev/dsk/ (e.g., /dev/dsk/c2t0d0)

2. A Solaris Volume Manager metadevice, which is made available to the iSCSI target through an entry in /dev/md/dsk/ (e.g., /dev/md/dsk/d100)

3. A pseudo-volume in a ZFS pool, which is made available to the iSCSI target through an entry in /dev/zvol/dsk/<dataset name>/ (e.g., /dev/zvol/dsk/stripedpool/iscsivol000)

ZFS provides end-to-end data protection, data compression, and the ability to automatically share out ZFS volumes as iSCSI targets, so I will use ZFS block devices (zvols for short) in my examples. To create a ZFS volume for use with iSCSI, a ZFS pool will need to be identified to back the volume. If a ZFS pool is not available, one can be created by first choosing a RAID protection level (ZFS supports RAID0, RAID1, RAIDZ, and RAIDZ2) and then invoking the zpool utility with the "create" option, the name of the pool to create, and the devices to add to the pool:

$ zpool create stripedpool c0d1 c1d0 c1d1

After the pool is created, the zfs utility can be used to create ZFS volumes. To create two zvols each 1GB in size, the zfs utility can be run with the create subcommand, the -V option to indicate that a volume should be created, the size of the volume, and the location in the pool to store the volume:

$ zfs create -V 1g stripedpool/iscsivol000

$ zfs create -V 1g stripedpool/iscsivol001

I could have also included the -s option to create a sparse volume. Sparse volumes will not be allocated any storage up front but will grow to the specified size as data blocks in the volume are written. This allows storage to be oversubscribed, or in storage jargon, the storage can be "thinly provisioned."
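As a rough illustration of oversubscription, the sketch below adds up the sizes promised to a set of sparse volumes and reports them as a percentage of the pool's capacity. The helper name provisioned_pct, the volume name iscsivol002, and the sizes used are hypothetical:

```shell
#!/bin/sh
# A sparse volume is created by adding -s to the earlier command, e.g.:
#   zfs create -s -V 1g stripedpool/iscsivol002    (hypothetical volume name)

# provisioned_pct POOL_GB VOL_GB [VOL_GB ...]
# Print the total volume size as a percentage of pool capacity.
# Values above 100 mean the pool is oversubscribed.
provisioned_pct() {
    pool=$1
    shift
    total=0
    for vol in "$@"; do
        total=$((total + vol))
    done
    echo $((total * 100 / pool))
}

# Four 4 GB sparse volumes on a 10 GB pool: 16 GB promised, prints 160
provisioned_pct 10 4 4 4 4
```

An oversubscribed pool works only as long as the data actually written stays below the pool's real capacity, so pool usage needs to be monitored.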

Once the volumes are created, they need to be exported to an initiator. This can be done with the ZFS shareiscsi property, or through the iscsitadm utility. To use iscsitadm, the command can be run with the create command, the target subcommand, the block device to use, and a name to assign to the target (if the target specified already exists, the device is presented as a LUN behind that target). The following example creates two targets and associates the zvols we created above with the new targets:

$ iscsitadm create target -b \
  /dev/zvol/dsk/stripedpool/iscsivol000 tigger-tgt0

$ iscsitadm create target -b \
  /dev/zvol/dsk/stripedpool/iscsivol001 tigger-tgt1
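For the other route mentioned above, the ZFS shareiscsi property, a minimal sketch follows. It assumes a Nevada build whose ZFS supports that property, and the helper name shareiscsi_cmds is hypothetical. The function only prints the commands so that they can be reviewed before being run:

```shell
#!/bin/sh
# Print one "zfs set shareiscsi=on" command per volume in a pool.
# shareiscsi_cmds POOL VOLUME [VOLUME ...] (hypothetical helper)
shareiscsi_cmds() {
    pool=$1
    shift
    for vol in "$@"; do
        echo "zfs set shareiscsi=on $pool/$vol"
    done
}

# The two zvols created earlier in this article
shareiscsi_cmds stripedpool iscsivol000 iscsivol001
```

Piping the output to sh would apply the property. The remaining examples in this article use the iscsitadm route.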

After the targets are created, the list command and target subcommand can be used to display the targets and their properties:

$ iscsitadm list target -v
Target: tigger
    iSCSI Name: iqn.1986-03.com.sun:02:21947caf-20ca-c035 \
                -c95c-dbb96a87cf89.tigger
    Connections: 0
    ACL list:
    TPGT list:
    LUN information:
      LUN: 0
          GUID: 0    
          VID: SUN
          PID: SOLARIS
          Type: disk
          Size: 1.0G
          Backing store: /dev/zvol/dsk/stripedpool/iscsivol000
          Status: online
      LUN: 1
          GUID: 0
          VID: SUN
          PID: SOLARIS
          Type: disk
          Size: 1.0G
          Backing store: /dev/zvol/dsk/stripedpool/iscsivol001
          Status: online

To ensure that storage resources are accessed by authorized initiators, an ACL can be created on the target to limit which initiator IQNs can access the target, and the CHAP protocol can be configured to authenticate the initiator and target.

To simplify the management of ACLs, each IQN can be assigned an alias. This allows a descriptive name to be assigned to each IQN, which makes ACLs easier to interpret and manage. To assign the alias "tigger" to the IQN "iqn.1986-03.com.sun:01:0003ba0e0795.4455571f", the iscsitadm utility can be run with the create command, the initiator subcommand, the -n option followed by the IQN, and the alias to associate with that IQN:

$ iscsitadm create initiator -n \
  iqn.1986-03.com.sun:01:0003ba0e0795.4455571f tigger

After the alias is created, it can be added to a target's ACL list by running iscsitadm with the modify command, the target subcommand, the -l option, the alias to add, and the name of the target to modify:

$ iscsitadm modify target -l tigger tigger

To display the list of ACLs assigned to a target, iscsitadm can be run with the list command and target subcommand:

$ iscsitadm list target -v | egrep '(Target|ACL|Init)'
Target: tigger
    ACL list:
            Initiator: tigger

Once the targets and ACLs are set up, an initiator can be configured to use the targets and LUNs that have been allocated to it. The section on configuring the Solaris initiator describes how to set up a Solaris 10 initiator to use the LUNs I presented previously.

iSCSI Performance Considerations

The Solaris iSCSI stack is built to perform and can easily saturate multiple Gigabit Ethernet links if configured correctly. When deploying high-performance iSCSI solutions, several items should be considered before choosing a network and storage architecture:

Network Infrastructure Considerations

  • Jumbo frames -- Using Ethernet jumbo frames can improve throughput between initiators and targets and reduce CPU utilization, because fewer Ethernet frames need to be transmitted.
  • Link aggregations -- Link aggregation technologies such as 802.3ad and Cisco's EtherChannel can be used to aggregate multiple physical interfaces into one or more logical interfaces. This can often improve performance, because multiple links can be used to send and receive data.
  • Gigabit Ethernet -- Gigabit Ethernet or comparable high-speed network interconnect technologies should be used to improve throughput.
  • Dedicated storage networks -- Isolating storage traffic on its own network can improve performance and security and will lessen the potential issues that come with using jumbo frames on public networks.
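To put a rough number on the jumbo frame benefit, the sketch below estimates how many Ethernet frames are needed to carry a given payload at two MTU sizes. It assumes 20-byte IPv4 and 20-byte TCP headers with no options and ignores iSCSI PDU headers, so real frame counts will be somewhat higher. The helper name frames_needed is hypothetical:

```shell
#!/bin/sh
# frames_needed PAYLOAD_BYTES MTU (hypothetical helper)
# Estimate the number of frames needed to carry a payload, assuming
# 40 bytes of IPv4 + TCP headers per frame and no iSCSI overhead.
frames_needed() {
    bytes=$1
    mtu=$2
    per_frame=$((mtu - 40))
    # Integer ceiling division
    echo $(( (bytes + per_frame - 1) / per_frame ))
}

frames_needed 1048576 1500   # 1 MB with standard frames: prints 719
frames_needed 1048576 9000   # 1 MB with jumbo frames: prints 118
```

Roughly six times fewer frames means fewer interrupts and less per-packet processing on both the initiator and the target.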

Hardware Considerations

  • iSCSI TCP/IP offload engines (TOEs) and iSCSI HBAs -- TOEs and iSCSI HBAs allow iSCSI, TCP, IP, and physical and data link processing to be offloaded to hardware specifically designed for this purpose.
  • Enterprise-class Ethernet adaptors -- Enterprise-grade adaptors typically have larger TX and RX ring buffers, and support hardware checksumming, segmentation offload, hardware packet classification, scatter gather, jumbo frames, advanced interrupt processing (e.g., MSI and MSI-X interrupts), and the latest high-performance bus technologies (e.g., PCI-X and PCI Express).

Operating System Considerations

  • Ensure that TCP/IP send and receive buffers are tuned for the workload, and the sliding window algorithms have been adjusted to optimize data throughput. The ttcp, netperf, and iperf utilities can assist with this, and links to each tool are provided in the reference section.
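A common starting point for sizing TCP buffers is the bandwidth-delay product: the amount of data that must be in flight to keep a link busy. The sketch below computes it from a link speed in Mbit/s and a round-trip time in microseconds. The helper name bdp_bytes is hypothetical and the figures are illustrative. Measure the real round-trip time between your initiator and target before tuning anything:

```shell
#!/bin/sh
# bdp_bytes LINK_MBIT RTT_MICROSECONDS (hypothetical helper)
# Bandwidth-delay product in bytes:
#   (mbit * 10^6 bits/s) * (rtt_us * 10^-6 s) / 8 bits per byte
bdp_bytes() {
    mbit=$1
    rtt_us=$2
    echo $(( mbit * rtt_us / 8 ))
}

bdp_bytes 1000 1000   # Gigabit link, 1 ms RTT: prints 125000
bdp_bytes 1000 200    # Gigabit link, 0.2 ms RTT: prints 25000

# On Solaris 10, the current TCP buffer high-water marks can be read with:
#   ndd -get /dev/tcp tcp_xmit_hiwat
#   ndd -get /dev/tcp tcp_recv_hiwat
```

If the buffers are smaller than the bandwidth-delay product, a single connection cannot fill the link, which is one reason the throughput tools mentioned above are worth running before and after any change.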

For latency-sensitive and throughput-intensive workloads, the list of considerations above may not be enough to get your applications to perform adequately. Understanding the I/O patterns of your applications should be the first step taken in planning iSCSI storage infrastructure, and in some cases alternative storage interconnects (e.g., Fibre Channel) may be required. For additional information on performance and determining application I/O patterns, please see the references.

High Availability

As iSCSI solutions penetrate further into the enterprise, highly available iSCSI deployments will become a necessity. Highly available iSCSI solutions can be deployed through the use of IP multipathing and link aggregation software (e.g., Solaris IPMP, 802.3ad link aggregation, etc.), storage multipathing software (e.g., Solaris Traffic Manager), as well as through the use of multiple sessions and multiple connections per session. The Sun blueprint "Using iSCSI multipathing in the Solaris 10 operating system" describes these topics in detail, and a reference to the blueprint is provided in the references section.

Troubleshooting

When issues arise on iSCSI networks, it is important to have tools available to quickly troubleshoot and isolate the source of the problem. Currently the best non-commercial tool for debugging iSCSI problems is the open source protocol analysis tool Wireshark (formerly called Ethereal). Wireshark contains protocol dissectors for the iSCSI protocol, which can be useful for debugging network and performance problems. For debugging server-side issues on Solaris hosts, DTrace and truss can be valuable for locating contention points and the source of that contention. Another great outlet for debugging problems is the OpenSolaris storage list. The individuals who wrote the iSCSI software are members of this list and are quick to answer questions pertaining to the Solaris iSCSI stack.

Future iSCSI Work

The OpenSolaris storage community has been extremely busy over the past year, and the Solaris kernel developers are working to integrate more iSCSI functionality into Nevada, and eventually a Solaris update. The following features are being worked on to enhance debugging, simplify administration, and to increase availability:

  • A DTrace provider for the iSCSI target
  • A standalone iSNS server
  • iSNS support in the iSCSI target
  • Sun Cluster support for the iSCSI target
  • Multiple connections per session support for the iSCSI target

For detailed information on these projects and more, please check out the documentation in the OpenSolaris iSCSI, iSNS, storage, and ZFS communities:

iSCSI target community -- http://opensolaris.org/os/project/iscsitgt/

iSNS community -- http://opensolaris.org/os/project/isns/

Storage community -- http://opensolaris.org/os/community/storage/

ZFS community -- http://opensolaris.org/os/community/zfs/

Conclusion

In this article, I discussed iSCSI and showed how to use the Solaris initiator and target. Four areas that I covered in less detail are iSCSI security, scalability, high availability, and performance tuning. These topics have received a fair amount of coverage in various storage communities, and the references section contains pointers to presentations and links on these topics. The initiator examples were tested on a Solaris 10 host running the 11/06 release, and the target examples were tested on a host running build 53 of Nevada.

References

Netperf -- http://www.netperf.org/netperf/NetperfPage.html

Observing I/O behavior with the DTraceToolkit -- http://www.samag.com/documents/s=9915/sam0512a/0512a.htm

iPerf network bandwidth tester -- http://dast.nlanr.net/Projects/Iperf/#whatis

iSCSI multipathing -- http://www.sun.com/blueprints/1205/819-3730.pdf

iSCSI RFC -- http://www.faqs.org/rfcs/rfc3720.html

iSCSI Security -- http://www.blackhat.com/presentations/bh-usa-05/bh-us-05-Dwivedi-update.pdf

ttcp -- http://www.netcordia.com/tools/tools-ttcp.shtml

SCSI protocol specifications -- http://www.t10.org/drafts.htm

Wireshark (Ethereal) -- http://www.wireshark.org

Acknowledgements

Ryan thanks Adam Leventhal for taking the time to review this article. He also thanks the Solaris kernel developers and the OpenSolaris storage community for their contributions to the Solaris storage stack.

Ryan Matteson works as a systems engineer, and specializes in Web technologies, SANs, and the OpenBSD, Linux, and Solaris operating systems. When Ryan isn't busy working, he enjoys playing guitar and maintaining his blog at prefetch.net. Questions and comments about this article can be addressed to matty91@gmail.com.