2008-07-29

Time Machine for Windows

I love my Mac, and Time Machine is the best new feature in Leopard (at least of the ones I use).
I also looked for a similar solution for Windows (e.g. for my parents' or my sister's laptop).
My search is over: I found rsyncbackup.vbs from heise.de.
I know, it's not a solution, just a script. But I'm sure I can adapt it a little so it becomes usable for my family (e.g. run it from a startup script, check whether the target disk is available, etc.).
We will see...

2008-07-25

A DBA's little helper

You might know this situation: your users call and complain that the DB is hanging. While you try to connect (already planning what to check first: sessions, waits, locks, ...), you never get a prompt. There would be a lot of things to check, analyze and then fix. But in fact the users just want their connections back; they don't care about the currently open sessions and simply ask you to restart the instance.
Even a restart is not that simple in this situation (have I mentioned the missing prompt in SQL*Plus?). So the next step is to try a plain kill at OS level, which in the end does nothing, as the process seems to be waiting for something in the OS. A much more entertaining way is kill -9. I hoped that if I killed a vital background process, the others would do the cleanup and close everything as well as they can (similar to a shutdown abort). But even that doesn't happen: the others just hang around doing nothing. In the end I have to kill all processes with -9.
There is just one more thing to do: startup. But unfortunately, at login one thing is missing: the message 'Connected to an idle instance.', and a 'startup' just hangs.
Why? Because no one did the cleanup! Remember the kill -9 of all processes, including the last background process? That way I did not give any process the chance to release the semaphores and the shared memory segments. They are just left hanging around. Even though no one uses them, sqlplus believes there is a running instance because they exist. So it attaches to them and sends messages, but no one ever listens. This makes this sqlplus session hang as well.
How can I do the cleanup manually?
The old way, back in the days when version numbers did not contain letters like i or g, was to look up the resources of all other instances with oradebug ipc, list everything with ipcs, and then remove the remaining orphaned segments and semaphores with ipcrm. If anything went wrong, another DB crashed in a very beautiful and interesting way.
The new way, introduced in this millennium, is sysresv, which you can find in $ORACLE_HOME/bin. It shows the semaphores and shared memory segments belonging to your current OS user, $ORACLE_HOME and $ORACLE_SID (or to several SIDs). It can also release the resources for a given SID. This makes life much easier and safer in situations like the one above.
I don't know why Oracle doesn't mention this binary in its documentation.
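A minimal sketch of the cleanup, assuming ORACLE_HOME is set, you run it as the Oracle software owner, and the instance is really dead. Only the sysresv options -l (apply to the given SIDs) and -i (prompt before removing) are taken from sysresv's own usage text; verify them on your release. The wrapper release_ipc and its check for surviving ora_*_<SID> processes are hypothetical additions, not part of sysresv.

```shell
#!/usr/bin/env bash
# Sketch: clean up the semaphores and shared memory segments of a dead
# instance with sysresv. release_ipc is a hypothetical wrapper; only the
# sysresv options -l (SIDs to act on) and -i (ask before removing) are
# taken from sysresv's usage text.

release_ipc() {
  local sid=${1:-$ORACLE_SID}
  local sysresv="$ORACLE_HOME/bin/sysresv"

  [ -n "$sid" ] || { echo "no SID given and ORACLE_SID not set" >&2; return 1; }
  [ -x "$sysresv" ] || { echo "sysresv not found in $ORACLE_HOME/bin" >&2; return 1; }

  # safety net: do nothing while any ora_*_<SID> background process is alive
  if ps -ef | grep -v grep | grep -q "ora_[a-z0-9]*_${sid}\$"; then
    echo "instance $sid still has background processes, not touching IPC" >&2
    return 2
  fi

  # 1. show the semaphores and shared memory segments of this SID
  "$sysresv" -l "$sid"
  # 2. remove them; -i asks for confirmation before removing
  "$sysresv" -i -l "$sid"
}

# usage, with the environment of the dead instance:
#   release_ipc "$ORACLE_SID"
```

The process check is deliberately conservative: removing the IPC resources of a living instance is exactly the kind of accident the old ipcrm way was famous for.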

2008-07-18

Oracle Performance books

I got 2 new books:
Cost-Based Oracle Fundamentals by Jonathan Lewis (ISBN:1-59059-636-6)
and
Troubleshooting Oracle Performance by Christian Antognini (ISBN:1-59059-917-9)
Both are worth reading.
I will try to read and understand both of them.
Maybe I will reproduce some of their tests and try some of my own, and maybe, just maybe, I will post the results here ;-)

2008-07-17

Interpreting CDP packets (the ugly way)

Today I had to interpret CDP packets on our network.
I'm no shell coder, but sometimes I get curious, and in such situations a script like this comes out.
In this particular case its output just shows the switch name, switch port, VLAN and duplex setting, but anyone who wants to can do a lot more (and better).
The script itself:
#!/usr/bin/bash
# Capture one CDP frame on interface $1 with snoop (Solaris) and print
# switch name, switch port, native VLAN and duplex setting.

[ -n "$1" ] || { echo "usage: $0 <interface>" >&2; exit 1; }

snoopfile="/tmp/snoopy$$.bin"
snoopline="/tmp/snoopy$$.line"

# CDP frames go to the multicast MAC 01:00:0c:cc:cc:cc
snoop -d "$1" -c 1 -vv -o "$snoopfile" 'dst 01:00:0c:cc:cc:cc and length > 50'

# hex dump from offset 26 (first TLV after the CDP header), keep only the
# hex columns and join them into one upper-case string
# (tr arguments are quoted so the shell cannot expand them as globs)
snoop -i "$snoopfile" -x 26 | nawk -F: '{ print $2 }' | \
  cut -b1-41 | sed -e 's/ //g' | nawk 'BEGIN { ORS="" } { print $1 }' | \
  tr '[a-z]' '[A-Z]' > "$snoopline"

# print a hex string as ASCII text
hex2text() {
  local hex=$1
  while [ -n "$hex" ]; do
    printf '%b' "\\x${hex:0:2}"
    hex=${hex:2}
  done
  echo
}

instr=$(cat "$snoopline")
# walk the TLVs: 2 bytes type, 2 bytes length (header included), value
while [ ${#instr} -ge 8 ]
do
  typ=${instr:0:4}
  lhex=${instr:4:4}
  length=$(echo "ibase=16; $lhex*2" | bc)     # TLV length in hex digits
  [ "$length" -ge 8 ] || break                # malformed TLV: avoid an endless loop
  texthex=${instr:8:length-8}
  case $typ in
    0001) printf 'Switchname: '; hex2text "$texthex" ;;
    0003) printf 'Switchport: '; hex2text "$texthex" ;;
    000A) echo "VLAN: 0x$texthex $(echo "ibase=16; $texthex" | bc)" ;;
    000B) echo "Duplex: $texthex" ;;          # 01 = full, 00 = half
  esac
  instr=${instr:length}
done
rm -f "$snoopfile" "$snoopline"
Sorry about the line breaks; you may have to reformat it a little (cut, paste & think).
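The script starts the hex dump at offset 26 because that is where the first TLV begins: 14 bytes of 802.3 header, 3 bytes LLC, 5 bytes SNAP and 4 bytes of CDP header (version, TTL, checksum). To try the TLV walk without a switch or snoop, here is a sketch that decodes a hex string passed as an argument. decode_cdp and hex2text are hypothetical helper names, the sample string is made up, and bash arithmetic (16#...) replaces the calls to bc.

```shell
#!/usr/bin/env bash
# Decode a string of CDP TLVs given as upper-case hex (the format the
# snoop | nawk | tr pipeline produces) and print the fields the script knows.
# Hypothetical helpers for offline testing; no capture involved.

hex2text() {                      # "535731" -> "SW1"
  local hex=$1
  while [ -n "$hex" ]; do
    printf '%b' "\\x${hex:0:2}"
    hex=${hex:2}
  done
}

decode_cdp() {
  local instr=$1 typ len val
  while [ ${#instr} -ge 8 ]; do
    typ=${instr:0:4}
    len=$(( 16#${instr:4:4} ))    # TLV length in bytes, 4-byte header included
    [ "$len" -ge 4 ] || break     # malformed TLV: stop instead of looping forever
    val=${instr:8:2*len-8}        # value part, in hex digits
    case $typ in
      0001) echo "Switchname: $(hex2text "$val")" ;;
      0003) echo "Switchport: $(hex2text "$val")" ;;
      000A) echo "VLAN: 0x$val $(( 16#$val ))" ;;
      000B) echo "Duplex: $val" ;;
    esac
    instr=${instr:2*len}          # jump to the next TLV
  done
}

# made-up sample: Device ID "SW1", Port ID "Gi0/1", native VLAN 100, duplex 01
decode_cdp "00010007535731000300094769302F31000A00060064000B000501"
```

The sample prints the switch name SW1, port Gi0/1, VLAN 0x0064 (100) and duplex 01.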

OCM with or without proxy settings

I tried to install and configure OCM (Oracle Configuration Manager) version 10.3.

Installation was quite easy on our development system.
But when I tried to create a response file (using emocmrsp) for our test and production systems, I noticed that I cannot add proxy settings to it.

I should mention that this development server can connect directly to *.oracle.com.

After some searching I opened SR 6969566.993 and got this answer:
Starting with OCM 10.3 setupCCR , configCCR and emocmrsp commands does not include a parameter to specify a proxy server and port. However, If the systems these commands run on does not have direct Internet access then you will be automatically prompted to enter the proxy information. So in case you run emocmrsp on a system that does not have direct Internet access then you will be prompted to enter the proxy information and that will be recorded in the response file also. If now there is no Internet access then you can enter the word NONE when prompted for a proxy and that will install it in disconnected mode.

This is a bit annoying: I will not run a configuration tool on a production environment, but I cannot create a working response file on my development node.

I created an enhancement request, BUG:7258715 "enable emocmrsp to add proxy-settings to response-file in any case".

2008-07-15

Shadow process leak on ASM instance when disk space is exhausted (ORA-20)

Once again one of 'my' bugs:
Facts:
  • 10.2.0.3
  • HP-UX
  • log_archive_dest in ASM
  • ASM disk group full
Symptoms:
  • ORA-20
  • archive logs don't get archived, even though the whole backup subsystem works properly.
Explanation:
If the DG is full, the database connects to ASM again for every retry of the archiver to create an archived log. Even after the attempt to create the archived log fails, the process created in ASM for it is not closed.
This eats up all 'processes' in ASM, so the DB process that tries to back up (and delete) the archived logs cannot connect to ASM either.
Even if you free up space in the ASM disk group in the meantime, the problem persists.

Workaround:
Find the OS process of the archiver (ps -ef | grep arc | grep $ORACLE_SID) and kill it. This also releases all its children (which hold processes in ASM) and solves the problem.
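The grep pipeline above also matches anything else containing "arc" and other SIDs that share a prefix (PROD and PROD2). A slightly stricter sketch, assuming the usual ora_arcN_<SID> naming of the archiver background processes; arch_pids is a hypothetical helper that reads ps -ef output from stdin so it can be checked offline.

```shell
#!/usr/bin/env bash
# Print the PIDs of the archiver processes (ora_arc0_<SID>, ora_arc1_<SID>, ...)
# of one instance. Reads "ps -ef" output on stdin so it can be tested offline.
# arch_pids is a hypothetical helper name.

arch_pids() {
  [ -n "$1" ] || { echo "usage: arch_pids SID" >&2; return 1; }
  # $2 is the PID, $NF the command name; the name must match exactly,
  # so other SIDs such as PROD2 are not picked up for PROD
  awk -v sid="$1" '$NF ~ ("^ora_arc[0-9a-z]+_" sid "$") { print $2 }'
}

# usage on the server (look at the list before killing anything):
#   ps -ef | arch_pids "$ORACLE_SID"
#   kill $(ps -ef | arch_pids "$ORACLE_SID")
```

Using $2 and $NF rather than fixed columns keeps it working when ps prints a start date with a space in it (e.g. "Jul 29").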

To reproduce:
  1. fill up your disk group
  2. force the archiver to respawn a lot of processes (alter system archive log current;), or wait at least the REOPEN interval of your log_archive_dest_x multiplied by the PROCESSES value of your ASM instance.
  3. check that there are no more ASM processes available (e.g. select name, free_mb from v$asm_diskgroup; returns "no rows selected" even though disk groups exist and are in use!)
  4. free some space in the disk group
  5. check again whether ASM processes are available
  6. kill the archiver
  7. check again
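For the repeated checks in steps 3, 5 and 7, a sketch that counts the disk groups the database instance can see, assuming ORACLE_HOME/ORACLE_SID point to the database (not the ASM instance) and OS authentication as SYSDBA works. asm_visible is a hypothetical helper; it uses count(*) instead of the name/free_mb query so the result is a single number.

```shell
#!/usr/bin/env bash
# Sketch: how many ASM disk groups does the database instance see?
# 0 means the DB currently cannot get a process in ASM (step 3).
# asm_visible is a hypothetical helper name.

asm_visible() {
  local n
  n=$(sqlplus -s "/ as sysdba" <<'EOF'
set heading off feedback off pagesize 0
select count(*) from v$asm_diskgroup;
exit
EOF
)
  n=${n//[[:space:]]/}            # strip blanks and newlines around the number
  case $n in
    ''|*[!0-9]*) echo "unexpected sqlplus output: $n" >&2; return 2 ;;
  esac
  echo "disk groups visible: $n"
  [ "$n" -gt 0 ]                  # status 1 while ASM has no free processes left
}

# usage: asm_visible || echo "ASM not reachable from this instance"
```

The exit status makes it easy to loop on, e.g. while waiting for the archiver kill in step 6 to take effect.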

The good news after all: there is no corruption, just an ugly hang situation.