This is an account of the events related to the problematic WFII EUBLADE
commanding on the 213 SMS.
Proposal 5569, an internal flat calibration proposal, was scheduled on
the 213 SMS. To implement this proposal, new commanding was written that would
precondition the shutter blades before the internal exposure to allow the
different reflection patterns of the two shutter blades to be calibrated. A
pmdb change request was needed to populate the QESIPARM 'blade', with the value
A or B, depending on the shutter blade selected. The problems with the
commanding of these exposures were twofold. One problem was inherent in the
commanding itself, and the other was related to the improper implementation of
the pmdb change request.
The instruction EUBLADE was in violation of the OLD item 2.4.6.13 WFPC-II Move
Shutter/Fail-Safe Command transmitting, which basically says that you can't
command the shutter while other WFII scheduled events (i.e. prepare/readout,
infrequent, expose) are taking place, at the risk of science data loss (an
understatement). The instruction was written so that the shutter prepositioning
could take place during the previous exposure's readout. It turns out that if
the previous exposure is done as a parallel, the extra minute allocated for the
readout (a total of 2 minutes) negates the error condition, since the readout
activity has finished before the shutters are commanded to move.
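The rule above amounts to an interval-overlap check, which can be sketched as
follows. This is only an illustration, not the actual scheduling software: the
Activity class, the function names, the 60 s readout duration, and the 30 s
placement of the shutter move are all assumptions chosen to reproduce the
parallel and non-parallel cases described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Activity:
    name: str
    start_s: float      # seconds from an arbitrary origin
    duration_s: float

    @property
    def end_s(self) -> float:
        return self.start_s + self.duration_s


def overlaps(a: Activity, b: Activity) -> bool:
    """True if the half-open intervals [start, end) of a and b intersect."""
    return a.start_s < b.end_s and b.start_s < a.end_s


def shutter_conflicts(shutter: Activity, scheduled: List[Activity]) -> List[str]:
    """Names of scheduled activities that the shutter command would overlap."""
    return [act.name for act in scheduled if overlaps(shutter, act)]


# Illustrative numbers only: the readout is taken to last 60 s, and the
# shutter move is placed 30 s before the end of the readout allocation.
READOUT = Activity("readout", start_s=0, duration_s=60)


def shutter_move(allocation_s: float) -> Activity:
    return Activity("EUBLADE shutter move", start_s=allocation_s - 30, duration_s=1)


non_parallel = shutter_conflicts(shutter_move(60), [READOUT])   # 1 minute allocated
parallel = shutter_conflicts(shutter_move(120), [READOUT])      # 2 minutes allocated
```

With these placeholder numbers the non-parallel case reports a conflict with
the readout and the parallel case reports none, which is the behavior the
paragraph above describes.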
When the shutter commanding executed, the firmware code's logic caused the
subsequent readout to be delayed by one major frame, or 1 minute. This is a
remnant from WFPC-1. In fact, the code keeps track of what activity type is
being interrupted, and sets a flag that delays the next activity of the same
type by one minute. So when the WFII started the readout after the shutter
motion, filter prepping, and exposing, the readout was delayed 1 minute into an
activity which was supposed to take 1:02. Thus, after approximately 2 seconds
of data transmission, a cease command from the NSSC-1 caused the SDF to stop
transferring any more data.
               MF                  MF                  MF
               |                   |                   |
tape activity  |====================|
readout                            |-------------------|
                                   ^^
                                   ||
                                   |cease command
                                   |
                                   2 seconds overlap
This caused the first status buffer message to be sent at 215:04:28. It was a
CU/SDF error message, code 13(decimal). The result was that the SDF disabled
the WFII SD interface. Since the WFII SD interface was disabled, subsequent
WFII exposures did not get status buffer messages posted after the first one
for the rest of the exposures in that obset, nor did any more WFII science data
come across the SDF.
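The delay mechanism and the resulting 2 seconds of data can be worked through
with a toy model. This is a sketch under stated assumptions, not the flight
firmware: the class and function names are made up for illustration, and the
model assumes the flag is cleared once it has been applied to the next
activity of the same type. The 60 s major frame and the 1:02 window are taken
from the account above.

```python
MAJOR_FRAME_S = 60    # one major frame, per the account above
TAPE_WINDOW_S = 62    # activity window allotted for the readout (1:02)


class DelayFlags:
    """Toy model of the 'delay next similar activity' behavior."""

    def __init__(self):
        self._pending = set()   # activity types whose next start is delayed

    def shutter_move_during(self, activity_type):
        # A shutter move that interrupts an activity flags that activity type.
        self._pending.add(activity_type)

    def start_delay_s(self, activity_type):
        # The next activity of a flagged type starts one major frame late.
        # Assumption: the flag is consumed when it is applied.
        if activity_type in self._pending:
            self._pending.discard(activity_type)
            return MAJOR_FRAME_S
        return 0


def window_remaining_s(delay_s, window_s=TAPE_WINDOW_S):
    """Seconds of the tape window left once the delayed readout starts."""
    return max(0, window_s - delay_s)


flags = DelayFlags()
flags.shutter_move_during("readout")      # EUBLADE executes during a readout
delay = flags.start_delay_s("readout")    # the next readout is delayed
data_s = window_remaining_s(delay)        # time left before the cease command
```

Here delay comes out as 60 and data_s as 2, matching the approximately 2
seconds of data transmission seen before the cease command.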
Around 11:23, the WFII, which had been commanded down to Standby, was commanded
back up to Ready. The upward transition brought the Science Data Interface up.
There was an OKSEND command at the tail end of the tape commanding that was a
fossil from an old HRS test, and this caused a line of latched data in the WFII
to come across. This hosed the SDF, and caused another CU/SDF error message,
code 22.
These status buffer messages were followed by six errors, message # 31,
parameter 30, which are "CU/SDF failed to respond to an 'NSSC-1 has SD to
output' request", for the exposures taken after the SDF had been hosed.
A first attempt to recover the WFII/SDF interface was made in the window
215:14:00-14:30. This didn't solve the problem because the SDF was hung
already. The actions were as follows:
S,UCLRFLGS(90)
S,SDISIIF(5,1)
A second, different attempt was made during the window 215:17:50-18:03. The
actions were as follows:
S,MAPCTRL(6,2)
S,RESETSDF
S,RSENCODE(1)
S,PNENCODE(1)
S,MSDFMTLD
S,SDFINPUT(1)
S,UCLRFLGS(90)
S,SDSIIF(5,1)
S,MAPCTRL(6,6)
S,MAPCTRL(6,3)
This procedure corrected the problems on the spacecraft. The erroneous
instruction (non-parallel) was scheduled to execute again on day 217, so a
'clean-up' procedure was prepared for the first available window after the WFII
SD interface would hose up. The window was 217:11:05-11:25. The following
actions were taken and the interface was successfully recovered:
S,UCLRFLGS(90)
S,SDISIIF(5,1)
Glenn Schneider and Mike Hinds looked at various data to verify that the
readout was indeed delayed by 1 minute. From the .pkt, we looked at the spacing
between packets, and also verified that only approximately 2 seconds of real
data came across for the first delayed readout. Also the telemetry reporting
(the echo back of the UEXPORT1 and URDUTPT2 commanding in the MF after the
commands were actually executed) was inspected, and the 1 minute delay was
present where we expected it to be.
The other problem with this proposal was a procedural error in SPSS. The change
request was not processed correctly. The EUBLADE QESIPARM population was
requested, but some other entities on the database were deleted by mistake when
this request was implemented. This caused the exposures to take on the default
value of type EXTERNAL when they were supposed to be internal. This propagated
through to commanding by having the shutter AP take control and command the
shutters open for the exposure, when they should have been internals using the
lamps with no shutter motion. But the TDF was down for the first exposures, so
that when the shutters were opened by the AP, they were immediately closed
again. So the data should have been similar to darks for these, had they been
read out properly.
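The way a deleted database entry can turn into a silent default can be
illustrated with a small lookup. This is a hypothetical sketch, not the SPSS
or pmdb code: the dictionary layout, the exposure identifier, and the function
name are invented for illustration. Only the EXTERNAL default and the intended
internal type come from the account above.

```python
DEFAULT_EXPOSURE_TYPE = "EXTERNAL"   # default named in the account above


def exposure_type(db, exposure_id):
    """Lenient lookup: a missing record silently becomes the default."""
    return db.get(exposure_id, {}).get("type", DEFAULT_EXPOSURE_TYPE)


# Hypothetical record for one exposure of the proposal.
db = {"exp-01": {"type": "INTERNAL"}}
before = exposure_type(db, "exp-01")

# The record is removed by mistake while another change is being applied.
del db["exp-01"]
after = exposure_type(db, "exp-01")
```

Before the deletion the lookup returns INTERNAL; afterwards it returns
EXTERNAL with no error raised, so nothing downstream signals that the entry is
gone.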
Additional information:
It was determined by looking at the .pkt data that the intra-line R/O in
progress was not affected by the blade motion. This was not predicted by
looking at the firmware code. A 110 ms delay during shutter motion was
expected.
In the WF/PC Flight Microcomputer Controller's Firmware Design Document, the
flow diagram for the Move Shutter state processing subroutine is incomplete
and must be updated. It makes no mention of the "delay next similar activity"
flags being set.
There was a procedural problem at MOSES in that a Stat Buff error which was
not on the benign list went unreported.
There was a procedural error in commanding in that a first-time execution of
special commanding on an SMS went unreviewed.
OPR #27159 for EUBLADE was opened to bullet-proof it against the same type of
failure occurring again.