Monitor MD3600 array in SCOM 2012

September 30, 2015, 11:35 pm

I am trying to configure an MD3600 storage array to be monitored in SCOM 2012 R2. I imported MD Storage Array Management Pack Suite Version 6.0 and read the admin guide for it. The MD3600 appears in SCOM now, but its status shows as not monitored. I tried to configure a Run As account with a community string as described here http://www.dell.com/support/manuals/us/en/19/Topic/dell-msamp-v6.0-for-mscom/MDStorageArraysv6.0_IG_Pubs-v1/en-us/GUID-80AA059C-F066-4464-AEF1-FD76402AA4BE

but I am not sure whether I also need to configure this community string somewhere on the MD3600 itself. I also added this Run As account to the SCOM profile - SNMP monitoring Account

↧

Installing Windows 7 64Bit on new SSD

October 2, 2015, 1:44 am

Hey Guys,

I'm getting a headache with this. The Windows installation doesn't find the installed SSD. I created a new VD in the PERC H310 menu and everything looks fine, but Windows setup doesn't see the device, not even when I choose the driver folder from the CD. I then downloaded some drivers from the Dell homepage and pointed Windows setup at the USB device, but nothing works: Windows doesn't "see" the SSD.

I changed from ATA to AHCI, tested, went back to ATA, tested again ... How can I solve this problem?

↧

SANHQ 3.1 reporting 100% usage on NAS volumes

October 7, 2015, 8:04 am

Since upgrading to 3.1, SANHQ has been alerting for Critical In-use space on 2 volumes. There are 6 NAS volumes on that group and there's enough space for the NAS. I believe the data distribution on those volumes is done automatically by the controller, so this distribution would be considered normal under these circumstances. Should I be worried that SANHQ is alerting me on those 2 volumes? Is there something I can do to improve data distribution across NAS volumes? Lastly, if this is considered normal, is there a setting in SANHQ to prevent this 'false positive' alert?

↧

PowerVault 114X LTO4 drive - frequent problem inserting tape

November 4, 2015, 11:02 am

Our customer is having an intermittent problem with their PowerVault 114X LTO4 drive. It's a standalone drive, not a changer. The unit just turned 3 years old last month, and has been out of warranty for about a month now.

The customer is experiencing an intermittent problem when trying to insert a new tape. He's able to eject the prior night's backup tape fine, but when trying to install the next tape, it does not always pull in the next tape. The way it's being described, it's like it's hitting a hard stop on something. When this happens, he's having to power cycle the tape unit, at which point it will accept the tape.

The backup jobs themselves have been running with no problems, and the tapes will eject with no problems.

Since it's out of warranty, an out-of-warranty support call with a Dell tech is $600, and that only includes being able to talk with a tech to try to narrow down the actual problem, not fix the problem if it's a physical problem with the drive.

At this point, I'm just looking to see if anyone else has ideas on what may be causing this.

↧

Reconstruction after Syswipe...

November 9, 2015, 11:04 pm

hello,

Is there any possibility to restore or reconstruct array configuration after using syswipe on MD3000i?

thanks I.

↧

md3000i Degraded Physical Disk Channel

November 19, 2015, 10:29 am

Hi,

I have an md3000i with attached md1000. We had a HDD fail which the hot spare took over with no down time or issues.

I have now replaced the failed drive and it was put back into the array. Once that completed, I started seeing Degraded Physical Disk Channel on channel 0,1 (I'm assuming that's the LUN number, since LUNs 0 and 1 are part of the RAID group with the failed disk).

Also it seems to have created a Virtual Disk Not On Preferred Path issue. This md3000i has dual controllers and the ESX hosts are all multipathed but not sure why all of this is coming around after replacing a failed disk.

Thanks,
Josh.

↧

md3000i + md1000 + 15x450GB 15kSAS = Poor Performance

December 16, 2015, 1:27 pm

Morning,

We recently acquired a used md3000i with an added md1000 shelf. The 3000i has 15x 450GB 15k SAS Dell drives (in one RAID 6 LUN). The 1000 has 15x 1TB 7200rpm drives (in 2 RAID 6 LUNs).

I have this hooked up to two clusters of ESXi machines using multipathing and iSCSI on a dedicated iSCSI switch. This switch also has our original SAN which is similar but more of a mishmash of drives.

The two clusters are 4xr710 and 4xPE2950.

Performance to the original SAN is ok. It has too many VMs on it and I need to upgrade the firmware. So we got the 2nd SAN so I could update the first SAN without powering down 60 VMs.

I updated the unit to the latest firmware:

Current configuration
Firmware version: 07.35.39.64
NVSRAM version: N1532-735890-005
EMW version: 03.35.G6.50
AMW version: 03.35.G6.50

I successfully copied data to the unit and it was performing great until about 2 weeks into production, when everything ground to a halt. VMware storage migration wouldn't even migrate the machines out because it was timing out. I had to SSH into each host and manually move the machines from the new SAN to the old SAN. This process took a really long time; I guessed it was transferring at about 5 MB/sec. The average latency on LU0 was over 90 seconds (90,000 ms), which of course would cause vMotion to time out.

Now I don't know what kind of tests I can perform on this, that's what I'm looking for. Trying to see if it's a controller issue or if it's a drive issue, back plane...I don't know but I would like to find out.

Thanks!
Josh!

↧

Power Supply / Cooling Fan issue

December 16, 2015, 10:44 pm

Hello there,

I have a problem with our PowerEdge MD36000f: the left fan (Slot 1) has a problem in the power supply. I called service support this morning and the quality of the connection wasn't good at all. They phoned me back on my mobile number and asked me to wait. The call cut off and they did not call me again. We have a 4-hour support warranty and have not gotten the benefit of it so far.

Is there any other way to sort this out, please? My location is in Saudi Arabia and the service tag number is <ADMIN NOTE: Service tag removed per privacy policy>

Kindest,

↧

Dell CX4-120 Fault SPS and Failed/Removed status

January 4, 2016, 6:18 am

Hello,

I have a Dell CX4-120 reporting some faults.

1. Disk array Enclosure Bus 0 Enclosure 1 is faulted

2. Disk Processor Enclosure SPE is faulted

3. Standby Power Supply SPE SPS B is faulted (I have checked that the cabling is all OK and carried out a power down and back up)

4. Disk 10/12/7 bus 0 enclosure 1 is faulted.

It is showing 4 disks as removed. The disks are connected and have an amber light. I removed and reseated each disk in turn and still got the same error. I then did the same but moved each disk to another known working slot, to check whether the issue follows the disk or the backplane. The issue follows the disks.

(I left each disk disconnected, then reconnected it and waited 1 minute at a time.)

I have a spare disk, so I swapped it into one of the slots reported as "removed", and the new disk is showing online.

So I suspect that all the disks reporting "removed" have failed.

I am awaiting more replacement disks, but with 4 disks reported at fault in a RAID 6, I suspect I will have lost the RAID data?

The setup is

enclosure 1 15 x 1TB SATA 7.5 Disks

enclosure 0 15 x 146GB 15k Disks with the first 4 with the RAID software on them

EMC PN for the disks that have failed are 118032589 REV:A04 Seagate

Can I use other disks? If so, which?

Or does it seem that the enclosure is at fault and the disks are ok and a new enclosure is required?

NaviSphere 6

Any help would be appreciated

Thanks

↧

LUNs unknown group won't come online / rebuild Dell CX4-120

January 7, 2016, 11:58 am

Hello,

I have a Dell EMC CX4-120

2 enclosures

RAID 6

I replaced a failed disk and it shows rebuild 0%

I have found that if I put the failed disks back in, they show as removed but rebuild 100%

I can replace all the disks except disk 10 and get the disks showing enabled and rebuilt, but as soon as I connect disk 10 the rebuild changes to 0% and transitioning, and doesn't change

And I am unable to bring any of the LUNs online

If I have the replacement disks in, there are no errors, just unknown LUNs which cannot be brought online

If I have the old disk 10 in, it reports a failed disk and an enclosure fault

The disks are EMC disks of the same size as before, 1TB

The LUNs are set to auto-assign ownership by SP, but I also tried manual

For the rebuild priority I tried ASAP and High, but it doesn't rebuild with the new disks in, not even 1% in two days

I have powered down all hosts, SPs and enclosures and powered them back on in order, with a good amount of time in between, to no avail

Any ideas would be greatly appreciated

Thanks Danny

↧

Dell EMC CX4-120 reset disk command?

January 8, 2016, 5:56 am

Hello,

I have disks reporting as removed. I need the command or process to clear the removed flag.

I am trying to restore the LUNs and disk array.

Thanks Danny

↧

Placing MD3220 controller on-line

January 8, 2016, 12:44 pm

A SAS controller was replaced on an MD3220. The controller now shows as being in service mode and the recovery guru walks through the steps to bring it on-line, the main step being go to Advanced->Recovery->Place Controller Module ... within Storage Manager

This menu item (with a number of others in the Recovery menu) is grayed out.

How can I get around this issue to bring the controller back on-line?

Thanks

Raja

↧

Dell PowerVault 124T

January 11, 2016, 5:20 pm

Hi, I am new to the Dell world and I have a concern regarding the PowerVault 124T tape loader. My robotic library is a DELL PV-124T and my tape drive is an IBM Ultrium-HH4. I have a backup solution called Backup Exec 15 installed on my Dell PowerEdge 320. Every time I run a backup, the state always turns offline and the error message is "Robotic Library destination element full error". Do I have a problem with the robotic library, the tape drive, or my Backup Exec 15 backup solution?

I already tested using ITDT Graphical Edition, and this software detects the IBM Ultrium HH4 tape drive.

↧

MD3820i MPIO

January 12, 2016, 12:14 pm

I am running an MD3820i (storage) with 2 controllers. I have 4 nodes (Server 2012 R2) configured and connected to the storage; however, it seems that only one path is being used.

I read in a post that this is because only one controller is the preferred path for the LUNs and the other is for failover. If that is true, how can I get both controllers to be active so the load can be shared?

Other suggestions as to why only one path is being used are also welcome.

↧

MD1000 slows way down on large file transfers

January 20, 2016, 5:38 am

We have two MD1000 SAS storage arrays hooked up to a Dell R710 server. During a large file transfer (40+ GB) to one of the MD1000 arrays, the transfer slows to a crawl. It looks as if it slows down at the same spot every time we attempt to transfer a file. The other MD1000 hooked up to the server has no problem with file transfers. This only happens when we copy to the MD1000; copying from the MD1000 works as it should. Smaller file transfers copy fine. The Dell Server Manager shows no errors with any of the hardware. The PERC card batteries do not show as low or bad. Any suggestions appreciated!

↧

Need Win 7 Driver for PERC 9 h330 Mini

January 28, 2016, 3:52 pm

I need to install Win 7 64 on a Precision Rack 7910 workstation. The installation disk for SP1 doesn't have a driver for the RAID controller. The download from the 7910 support page is an exe file that isn't recognized and won't let me extract the driver to copy it. I haven't been able to find one that I can put on a thumb drive and that the Windows installation routine will recognize as a valid driver.

I've tried searching the Dell site and Googling for it, also tried to find it on another 7910 running Win 10.

Can you help me to find it or suggest a work-around?

↧

Degraded Hard drive

February 4, 2016, 4:52 am

Hi,

We have a PowerEdge R910 that is configured with 4 drives in RAID 5, 2 drives in RAID 1, and another 2 drives also in RAID 1.

Now when I check the management server I can see that one of the RAID 1 arrays is showing Degraded and that one of its disks has a red X, so we have to replace this drive.

If I understand it correctly, we must replace the drive with exactly the same drive.

Question:

The current drive has the order number <ADMIN NOTE: Order number removed per privacy policy>

We cannot find any drive with the same order number, so I searched the Dell website and found the same drive, but it has the order number <ADMIN NOTE: Order number removed per privacy policy>

Can we use this drive to replace the defective one? Also, I think the only thing we must do is remove the defective drive and replace it with the new one, without shutting down the server. Is this also correct?

Defective drive:

Physical Disks


ID	0:0:6
Status	Critical
Name	Physical Disk 0:0:6
State	Failed
Bus Protocol	SAS
Media	HDD
Revision	DE09
T10 PI Capable	No
Certified	Yes
Capacity	837.75GB
Used RAID Disk Space	837.75GB
Available RAID Disk Space	0.00GB
Hot Spare	No
Vendor ID	DELL(tm)
Product ID	AL13SEB900
Serial No.	54U0A0RHFRD2
Part Number	PH0RC34W7557145V8F73A03
Negotiated Speed	6.00 Gbps
Capable Speed	6.00 Gbps
Sector Size	512B
Manufacture Day	07
Manufacture Week	22
Manufacture Year	2014
SAS Address	500003958840DE4A

↧

H310 Mini MessageID: STOR0210, Message: SCSI sense data (Sense key: B Sense code: 4B Sense qualifier: 3) - problem moves around

February 5, 2016, 11:53 am

R720xd with PERC H310 Mini. 18 drives.
16 of the drives are not in a RAID.
2 drives are RAID 1 (they are the last two drives on the bus, bays 16 and 17)

O/S is Red Hat 6.7

OMSA has started populating the messages file with these

Server_Administrator: 4530 2095 - Storage Service Severity: Informational, Category: Storage, MessageID: STOR0210, Message: SCSI sense data (Sense key: B Sense code: 4B Sense qualifier: 3) received from Physical Disk 0:1:12 on Controller 0 at Connector 0.

Which disk reports the error changes.
On a totally idle system I wrote a loop to exercise the drives.
Against each drive on the system it executes:

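# Collect every sd* device or partition that lvmdiskscan reports
drives=$( lvmdiskscan | grep sd | awk '{printf "%s ",$1}' )

for dx in $drives
do
        # Mark the log so OMSA events can be matched to the device under load
        echo "Exercising $dx" | tee -a /var/log/messages
        # Log the lsscsi line for the parent disk (first 8 characters, e.g. /dev/sda)
        lsscsi 2>&1 | grep "$( echo "$dx" | cut -b1-8 )" | tee -a /var/log/messages
        # Read 1,000,000 blocks of 1024 bytes (about 1 GB) and discard them
        dd if="$dx" of=/dev/null bs=1024 count=1000000
        echo
done
echo Exercising done | tee -a /var/log/messages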

Synopsis: It reads from the disks and logs which disk it's working on to /var/log/messages. In parallel, the OMSA software is logging any storage events to /var/log/messages.

Output looks like this:

Exercising /dev/sda1
[0:0:0:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sda
Feb 5 13:11:13 Server_Administrator: 4530 2095 - Storage Service Severity: Informational, Category: Storage, MessageID: STOR0210, Message: SCSI sense data (Sense key: B Sense code: 4B Sense qualifier: 3) received from Physical Disk 0:1:0 on Controller 0 at Connector 0.
Exercising /dev/sdq1
[0:2:0:0]    disk    DELL     PERC H310        2.12 /dev/sdq
Exercising /dev/sdq2
[0:2:0:0]    disk    DELL     PERC H310        2.12 /dev/sdq
Exercising /dev/sdb1
[0:0:1:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sdb
Exercising /dev/sdc1
[0:0:2:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sdc
Exercising /dev/sdd1
[0:0:3:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sdd
Exercising /dev/sde1
[0:0:4:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sde
Exercising /dev/sdf1
[0:0:5:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sdf
Exercising /dev/sdg1
[0:0:6:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sdg
Exercising /dev/sdh1
[0:0:7:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sdh
Feb 5 13:12:17 Server_Administrator: 4530 2095 - Storage Service Severity: Informational, Category: Storage, MessageID: STOR0210, Message: SCSI sense data (Sense key: B Sense code: 4B Sense qualifier: 3) received from Physical Disk 0:1:7 on Controller 0 at Connector 0.
Feb 5 13:12:21 Server_Administrator: 4530 2095 - Storage Service Severity: Informational, Category: Storage, MessageID: STOR0210, Message: SCSI sense data (Sense key: B Sense code: 4B Sense qualifier: 3) received from Physical Disk 0:1:7 on Controller 0 at Connector 0.
Exercising /dev/sdi1
[0:0:8:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sdi
Feb 5 13:12:31 Server_Administrator: 4530 2095 - Storage Service Severity: Informational, Category: Storage, MessageID: STOR0210, Message: SCSI sense data (Sense key: B Sense code: 4B Sense qualifier: 3) received from Physical Disk 0:1:8 on Controller 0 at Connector 0.
Exercising /dev/sdj1
[0:0:9:0]    disk    SEAGATE ST91000640SS     AS0B /dev/sdj
Exercising /dev/sdk1
[0:0:10:0]   disk    SEAGATE ST91000640SS     AS0B /dev/sdk
Exercising /dev/sdl1
[0:0:11:0]   disk    SEAGATE ST91000640SS     AS0B /dev/sdl
Exercising /dev/sdm1
[0:0:12:0]   disk    SEAGATE ST91000640SS     AS0B /dev/sdm
Exercising /dev/sdn1
[0:0:13:0]   disk    SEAGATE ST91000640SS     AS0B /dev/sdn
Exercising /dev/sdo1
[0:0:14:0]   disk    SEAGATE ST91000640SS     AS0B /dev/sdo
Exercising /dev/sdp1
[0:0:15:0]   disk    SEAGATE ST91000640SS     AS0B /dev/sdp

In this particular run the drives in bays 0, 7 and 8 complained. On other runs it may be other drives. The number of drives that flag the issue may also vary.
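To see how the events spread across disks over a longer period, the log can be tallied per physical disk. The following is only a rough sketch: the function name `count_stor0210` is made up for illustration, and it assumes every event line has the same "received from Physical Disk X:Y:Z on Controller" wording as the messages quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OMSA): count STOR0210 events per physical disk.
# Assumes the message wording shown in this post; the log path is an argument,
# with /var/log/messages as the default.
count_stor0210() {
    logfile="${1:-/var/log/messages}"
    grep 'MessageID: STOR0210' "$logfile" \
        | sed -n 's/.*received from \(Physical Disk [0-9:]*\) on Controller.*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Running `count_stor0210 /var/log/messages` prints one line per disk, with the event count first and the busiest disk at the top. If the counts turn out to be spread fairly evenly across all bays, that would point away from individual drives and toward something they share.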

I've updated all the firmware available via the lifecycle controller by pointing it at ftp.dell.com
All the drives are dell certified.
OMSA is at 8.2
Tried downgrading to OMSA 7.4 - but still received same messages.

Sometimes the sense qualifier value is 2; other times it is 3.
In the month of January over 17,000 messages were generated to /var/log/messages.
Mixed in with these are also 83 messages in the format:

Storage Service Controller event log: Unexpected sense: Encl PD 20 Path 5e4ae0208df0bb00, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H310 Mini)

The 5/24/00 messages do not occur often and are deemed (per googling) to be informational in nature and not of real concern. I'm more concerned about the messages shown above. I have reports that when the end users are allowed to use the box, it lags behind all other boxes doing the same work. The other boxes have identical configs and do not generate these messages.

Any thoughts/advice would be much appreciated.

↧

Dell SC 2020 - Failed to re-balance ports

February 15, 2016, 8:01 am

Hi ALL

I have some problem

but

I can not find the cause of the problem!!

zoning set

↧

NX4 Error

February 16, 2016, 1:31 am

Hi All,

Following power problems, our servers no longer have access to most of the NX4 storage.

Checking via SSH (on the Control Station) by running the command nas_checkup, I get these results:

[root@control-station bin]# nas_checkup
Check Version: 5.6.45.5
Check Command: /nas/bin/nas_checkup
Check Log    : /nas/log/checkup-run.160216-093544.log

-------------------------------------Checks-------------------------------------
Control Station: Checking if NBS clients are started....................... Fail
Control Station: Checking if NBS configuration exists...................... Pass
Control Station: Checking if NBS devices are accessible.................... Fail
Control Station: Checking if NBS service is started........................ Fail
Control Station: Checking if NAS partitions are mounted.................... Pass
Data Movers    : Checking status........................................... Pass
--------------------------------------------------------------------------------

One or more error-level checks have failed. Follow the instructions
below to correct the problem and try again.

-------------------------------------Errors-------------------------------------
Control Station: Check if NBS clients are started
Symptom: NBS clients (nd-clnt 5 6) are not started
Action : Contact EMC Customer Service and refer to EMC Knowledgebase
         emc146016. Include this log with your support request.

Control Station: Check if NBS devices are accessible
Symptom: Failed NBS (nd-clnt processes) devices access check

         NOTE: Several checks depend on NBS device access to run. These checks
               were not run.
Action :
         1. This may occur if NBS is not configured correctly or if the NBS
            service is not started. Look in the "Checks" section to see if the
            following checks passed:

            * Control Station: Check if NBS configuration exists
            * Control Station: Check if NBS service is started
            * Control Station: Check if NBS clients are started

         If either of those checks did not pass, follow the instructions for
         that check to correct the problem, then rerun "nas_checkup" to verify
         that the NBS devices can now be accessed.
         2. This may also occur if Data Movers are powered down or pulled out.
            If you are on the primary Control Station, look in the "Checks"
            section to see if the following check passed:

            * Data Movers: Check status

         If this check failed, follow its instructions to correct the
         problem, then rerun "nas_checkup" to verify that the NBS devices can
         now be accessed.
         3. If the problem persists, escalate this issue through your support
            organization. Provide this output and any errors or output that
            occurred running the commands in this procedure in the escalation.

Control Station: Check if NBS service is started
Symptom: NBS (nd-clnt processes) service is not (or not fully) started
Action : Use the command "/sbin/service nbs start" to restart the NBS
         service or reboot the Control Station.
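After trying the suggested "/sbin/service nbs start" and rerunning nas_checkup, it helps to compare only the failed checks between runs. This is a small sketch for doing that: the function name `failed_checks` is hypothetical, and it assumes the output was saved to a text file whose result lines use the dotted "... Fail" layout shown in the Checks section above.

```shell
#!/bin/sh
# Hypothetical helper: list the checks marked "Fail" in saved nas_checkup output.
# Assumes result lines end with a run of dots followed by " Fail", as shown above.
failed_checks() {
    # Keep only the failed result lines, then strip the dot leaders and status
    grep -E '\.{3,} Fail$' "$1" | sed -E 's/\.{3,} Fail$//'
}
```

For example, saving the output with `nas_checkup > checkup.txt` and running `failed_checks checkup.txt` before and after restarting the service shows whether the three NBS checks are still failing.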

Please can you help us?

I would like to talk to customer support, even as a paid service.

Best Regards

Renato

↧