Skip to content
November 12, 2011 / Alex Nedoboi

Tiny Tips: SSD and Linux. Enable TRIM and check if it works.

What is TRIM?

On hard disks, with spinning magnetic platters, you don’t care if a particular sector is being used by the operating system or not. If you want to use it again, just ask the controller to write into it, and the controller will just overwrite whatever was in that sector.

With SSD, if the sector already contains some information and you want to write into it, the old data needs to be deleted (zeroed out) first, and then the new data written. Zeroing out the old data takes considerable time, so it would make sense if we could tell the controller that the sector is no longer being used, allowing the controller to zero out the sector data in the background, and then when we need to use the sector again, it will be ready to be written into immediately, saving us a lot of time.

This is what the TRIM command does, it tells the controller that the sector is no longer used and can be zeroed out.

For the TRIM command to work, we need ext4 and the kernel to be 2.6.33 or higher. In this example we use Fedora 15.

Install Fedora 15 x86_64

If you don’t have it installed yet, or want to make sure that it’s installed correctly for the TRIM command to work, here is a very brief guide.

Installing Fedora is very much a matter of clicking the Next button.

The only important step where you would not click the Next button straight away, is the storage step.

Select the Create Custom Layout option. The reason we are not going with the default here is because we can’t use LVM with SSD.

Create two standard partitions, one swap and one ext4 (mounted as /). Don’t use ext2 or ext3 as they don’t support SSD either, particularly the TRIM command.

Enable TRIM

In default Fedora installation TRIM is not enabled by default. So let’s enable it.

As root, edit the /etc/fstab file and add the “discard” option to the root line.

UUID=... / ext4 discard 1 1

Restard the machine. Check if / has been mounted with “discard”.

[root@f15] mount
...
/dev/sda1 on / type ext4 (discard,rw)
...

Looks good. It is now time to test the TRIM functionality.

Check if TRIM works

We are going to create a file, check the sector data, then delete the file, and see if the sector data has been zeroed out.

We will be using the hdparm tool, so the following steps need to be done as root.

First, create a file.

[root@f15] for i in {10001..10200}; do echo "$i test line" >> testfile.txt; done

This will create a file with 200 lines, about 4k in size. Now make sure it is flushed out to the disk.

[root@f15] sync

Now let’s find out where the file is physically stored on the disk.

[root@f15] hdparm --fibmap testfile.txt

filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
byte_offset  begin_LBA    end_LBA    sectors
          0   14927624   14927631          8

You can see the file starts at sector 14927624 and spans 8 sectors. Let’s check what’s in there.

[root@f15] hdparm --read-sector 14927624 /dev/sda

3031 3030 2031 6574 7473 6c20 6e69 0a65
3031 3030 2032 6574 7473 6c20 6e69 0a65
3031 3030 2033 6574 7473 6c20 6e69 0a65
...
3031 3330 2032 6574 7473 6c20 6e69 0a65

That’s our sector. “3031 3030 2031 6574 7473 6c20 6e69 0a65″ is a hex representation of “10000 test line”.

Now let’s delete the file. Make sure the change is flushed to the disk.

[root@f15] rm testfile.txt
[root@f15] sync

We are now expecting the sector to contain nothing but zeroes.

[root@f15] hdparm --read-sector 14927624 /dev/sda

0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
...
0000 0000 0000 0000 0000 0000 0000 0000

Perfect, the TRIM command works in Fedora 15.

Recent releases of most popular distributions (Ubuntu, Mint, Sabayon, etc) should also be fine.

September 8, 2011 / Alex Nedoboi

Install Oracle Express XE 11gR2 on Oracle Linux 6.1 x64. (No bitmap index conversion on the fly!)

Some time ago, if you said you had installed Linux and, heavens forbid, Oracle on Linux, you’d be considered an utter geek.

Fast forward 15 year. These days, when people (who have experience with other databases and operating systems) ask me about Oracle/Linux, I feel somewhat embarrassed.

Let me illustrate what I mean.

Install Oracle Linux 6

After downloading it from edelivery.oracle.com/linux (version 6.1 x86_64 in this example), burn the .iso (or just use the .iso itself in case of virtual machine), and power on the box.

Wait a minute while it boot ups. Skip the image check. Once you see the Next button in the bottom right corner, click it.

Click Next again.

Click Next again.

Click Next again.

Click Next again.

Click Yes (to discard the data on the disk).

Name your machine. For example, o61.localdomain, then configure the network if you want to. Using DHCP might seem trendy, but if you prefer knowing the IP, set it here.

Click Next. (Select your location if you want to, or pretend you’re in New York.)

Click Next.

Click Next. (Partition the disk(s) if you want to. You can keep the default layout with LVM, or if you see no point, just two partitions should do – root/ext4 and swap.)

Click Next again.

Click Next again. (No need to tick “Database Server”, the Basic one will do just fine.)

Click Next, this is the last time. Watch Linux install or go put a kettle on.

Done. Yep, done, just 15 or so minutes.

So, the strategy. If you see Next, and are happy with the default setting, click it.

Install Oracle XE 11g R2

At this point you might think – installing Linux was way too easy, you can’t beat that.

Well, let’s see if installing Oracle is even easier.

Download the .zip from www.oracle.com, unzip it, and install it -

[root@o61 Disk1] rpm -ivh oracle-xe-11.2.0-1.0.x86_64.rpm

Then configure it -

[root@o61 Disk1] /etc/init.d/oracle-xe configure

Wow, that was tricky, clicking Enter five times (except the password step).

All done. In less than half hour.

Time to play

Allow connections from outside the machine, as SYSTEM run -

SQL> exec dbms_xdb.setListenerLocalAccess(false)

Also, make sure the firewall (iptables) allows connections on 1521 and 8080, or disable iptables altogether.

Let’s check the management page, go to http://server_address:8080/apex (log in as SYSTEM and password from the “oracle-xe configure” step).

Looks nice.

xe-1

Click the Storage tab. You can see SYSAUX/UNDO/TEMP are (kind of) unlimited, and USERS is limited to 11G (suppose Oracle intended the pun).

xe-2

What optimizer parameters do we get in 11gR2 XE?

xe-3

No bitmap indexes on the fly

Let’s have some fun.

SQL> create table people (id number(10), first_name varchar2(100), last_name varchar2(100));

Table created.

SQL> insert into people 
  2  select level, chr(ascii('A') + mod(level*11,13)), chr(ascii('A') + mod(level*19,23))
  3  from dual
  4  connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create index i_first on people(first_name);

Index created.

SQL> create index i_last on people(last_name);

Index created.

And then -

SQL> set autotrace on

SQL> set linesize 120

SQL> select count(1) from people where first_name = 'J' and last_name = 'S';

  COUNT(1)
----------
        33

Execution Plan
----------------------------------------------------------------------------------------
| Id  | Operation           | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                  |     1 |     4 |      4(25) | 00:00:01 |
|   1 |  SORT AGGREGATE     |                  |     1 |     4 |            |          |
|*  2 |   VIEW              | index$_join$_001 |    33 |   132 |      4(25) | 00:00:01 |
|*  3 |    HASH JOIN        |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN| I_LAST           |    33 |   132 |      1 (0) | 00:00:01 |
|*  5 |     INDEX RANGE SCAN| I_FIRST          |    33 |   132 |      2 (0) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("LAST_NAME"='S' AND "FIRST_NAME"='J')
   3 - access(ROWID=ROWID)
   4 - access("LAST_NAME"='S')
   5 - access("FIRST_NAME"='J')

Statistics
----------------------------------------------------------
  0  recursive calls
  0  db block gets
 21  consistent gets
  0  physical reads
  0  redo size
526  bytes sent via SQL*Net to client
519  bytes received via SQL*Net from client
  2  SQL*Net roundtrips to/from client
  0  sorts (memory)
  0  sorts (disk)
  1  rows processed

SQL>

What do we see here, on Express Edition?

Hash join (range scan, range scan), and 21 consistent gets.

What would we see on Enterprise Edition?

------------------------------------------------------
| Id  | Operation                         | Name     |
------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |
|   1 |  SORT AGGREGATE                   |          |
|   2 |   BITMAP CONVERSION COUNT         |          |
|   3 |    BITMAP AND                     |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS |          |
|*  5 |      INDEX RANGE SCAN             | I_LAST   |
|   6 |     BITMAP CONVERSION FROM ROWIDS |          |
|*  7 |      INDEX RANGE SCAN             | I_FIRST  |
------------------------------------------------------

Bitmap “and” (bitmap conversion(range scan), bitmap conversion(range scan)). And 5 gets.

Yes, this bit of functionality didn’t make it into XE.

But when you have free Oracle 11gR2 on free Linux, all installed and running in about 30 minutes, can you complain?

September 5, 2011 / Alex Nedoboi

Tiny tips: Use Gateways to connect Oracle database to Microsoft SQL Server

When you have two Oracle databases and want to exchange data between them, it is easily done via database links.

So for your local database you would go select * from local_table and for the remote one select * from remote_table@link_to_remote_db

But what if the remote database isn’t Oracle, what if it’s (for example) Microsoft SQL Server?

Well, almost no difference. You can create a database link to MSSQL just as easily.

For this, we will use Oracle Gateways – a piece of software that translates Oracle calls to MSSQL and vice versa.

Install Gateways

First we need to install it of course. You can download Oracle Gateways from Oracle website. As usual, unzip it and execute runInstaller.sh

The installation process is mostly a matter of clicking the Next button. At the last step you will be asked the details of your MSSQL Server.

In this scenario, our MSSQL Server is at machine called vb-m8r2, instance name INSTNAME, and database name DBNAME. Port 1433, default.

For testing purposes, I created a table called PEOPLE (id, first_name, last_name), and inserted two lines into it.

MSSQL 1

After Gateways installation is complete, check the config file – <gateways_home>/dg4msql/admin/initdg4msql.ora

It should have the line HS_FDS_CONNECT_INFO=vb-m8r2:1433//DBNAME, to avoid confusion, change it to …=vb-m8r2/INSTNAME/DBNAME

Make sure you remember the underlined bit – dg4msql – this is the SID we will be using for the listener.

Configure the gateway

Assuming your lister is called LISTENER, add the following entry to listener.ora

SID_LIST_LISTENER =
  (
    SID_LIST =
      (SID_DESC =
        (SID_NAME = dg4msql)
        (ORACLE_HOME = <gateways_home>)
        (ENVS = LD_LIBRARY_PATH=<gateways_home>/dg4msql/driver/lib:<gateways_home>/lib)
        (PROGRAM = <gateways_home>/bin/dg4msql)
      )
  )

Restart the listener.

If you specified the paths incorrectly, you will get an error – TNS-01201: Listener cannot find executable.

Otherwise, you should see the following lines -

Services Summary...
Service "dg4msql" has 1 instance(s).
  Instance "dg4msql", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Congratulations – you have configured the gateway.

Configure the database

First, add the following to tnsnames.ora

dg4msql =
(
  DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    (CONNECT_DATA = (SID = dg4msql))
    (HS = OK)
)

Try to tnsping it.

Second, create the database link.

create public database link micro 
   connect to "ms_username" 
   identified by "ms_password" 
   using 'dg4msql'
;

Test it

Eventually, time for testing.

SQL> select * from people@micro;

id first_name last_name
-- ---------- ---------
 1 John       Smith
 2 Jean       Cook

The other way around -

SQL> insert into people@micro values (3,'A','B');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from people@micro;

id first_name last_name
-- ---------- ---------
 1 John       Smith
 2 Jean       Cook
 3 A          B

Now check on the MSSQL side -

MSSQL 2

Troubleshooting

If instead of this you’re getting an error, something like -

ORA-28545: error diagnosed by Net8 when connecting to an agent
Unable to retrieve text of NETWORK/NCR message 65535
ORA-02063: preceding 2 lines from MICRO

It means you’ve misconfigured something, make sure there are no typos in initdg4msql.ora, listener.ora, tnsnames.ora.

Then make sure you can telnet from your Oracle box to vb-m8r2 on 1433, the listener starts, and you can tsnping dg4msql.

May 29, 2011 / Alex Nedoboi

1Z1-058 (1Z0-058 beta) 11g R2 RAC Expert. Answers.

Last week’s article on 1Z0-058 questions was quite popular, more popular than I expected.

So if you enjoyed the questions, now it’s time to get some answers!

(1)
Kevin is a database administrator at Fictional Real Estate Co. All he hears everyday is location, location, location. He is asked by one of the Sales team members about Oracle Real Application Clusters. What should he reply to his colleague’s “what is it all about”?

If a painter can paint a wall in an hour, two painters are likely to finish the job in 30 minutes, and it would probably take three painters about 20 minutes. So, in theory, the performance does increase. However, clusters are not about performance, the performance of every individual node doesn’t change just because it’s part of a cluster. As a matter of fact, performance can decrease. Imagine that the painters in the above example have to share their tools. They will have to wait for each other, and even incur some overheads handing the tools over.

Same with RAC, if there is a sequence or an index block contention, the whole cluster is no better (and sometimes even worse) than a single node. That’s why cluster-specific performance tuning is very important in a RAC environment.

Redundancy. Every single part in RAC should not be a single point of failure. The disks are mirrored, the storage and the machines have two (or more) interfaces between them, virtual IPs, SCANs, bonded network interfaces, etc.

Some time ago I witnessed a rather spectacular failure of an Oracle cluster. Due to some environmental factors (no, not global warming), one of the disks failed. Not a big deal, it was mirrored. The controller started to rebuild the failed disk, and also failed under the load (don’t forget the environmental factor). Not a big deal either, there were two controllers. Unfortunately, the failed controller managed to ship corrupted data to one of the nodes, so the node decided to shut down. Again, not a big deal, there was another node.

After all those events, the cluster still was fully functional, with just one disk, one controller and one node left. All thanks to redundancy.

But what if the entire cluster fails? Read on!

Availability. Real Application Clusters are one of the many High Availability (HA) solutions offered by Oracle. Together with replication (whether it’s Data Guard, or Streams, or Golden Gate) they form Maximum Availability Architecture (MAA). This means that even if the entire cluster fails, there is another cluster, with an exact copy of the data, replicated by Data Guard or a similar solution.

Scalability is another big thing about clusters. Let’s go back to our painters. If we want to paint more walls, we can just use more painters. Provided they don’t share tools, the job will be done faster. With just one painter (single instance database) all you could do is train the guy to work faster (upgrade server hardware) but there’s always a limit to that (not to mention that just marginally faster hardware can cost double).

Clusters waste no resources. All those redundant parts are being constantly used, they don’t wait for the other part to fail. If you have three painters, all of them will be doing the job, not sitting idle waiting for their fellow painters to get a cold.

By the way, the painters can share the paint, which in the RAC world is represented by shared data storage.

(2)
Jenson is a database developer at Fictional Cluster Masters Co. He wants to pull a practical joke on the company’s DBA, so the DBA would think someone is performing a DDoS attack on the cluster. What should Jenson do in his application?

As mentioned above, eliminating all possible contention points is a must in RAC environments. A shared sequence with no cache is a massive contention point. If you wonder how I came up with this question, well, I didn’t. It is based on my experience, as I saw a RAC brought down to its knees by hundreds of sessions queuing up for a sequence.

(3)
May is a junior DBA at Fictional Calendars Co. She wants to start some RAC instances and stop some other RAC instances. What tools should she use from the list below?

Some people think that you can’t use the old friend of yours, sqlplus, to start up or shutdown instances. Not true. The only difference is that you can only start up or shut down the instance you’re connected to, not the whole cluster. For the cluster wide operations, use srvctl. No cluster-specific parameters (mode=rac, -rac, foo=bar) needed.

Nothing wrong with graphical tools either, as long as you know how to do things when graphical tools are not available.

(4)
April is a junior DBA at Fictional Calendars Co. She asks the company’s senior DBA August what the number of UNDO and TEMP tablespaces and REDO log groups should be in a RAC environment. If August knows what he’s talking about, what will he answer?

There is no difference for TEMP in a RAC environment. You can have just one, or more than one for different users if you wish.

UNDO, on the other hand, is not shared, so every instance has its own UNDO.

REDO is also owned by every individual instance and it’s recommended that you have at least two groups per instance.

(5)
Mark is a database administrator at Fictional Storage Specialists Co. He has built a 10g R2 RAC on Linux 4 before, using raw devices for voting disk and cluster repository storage, and ASM for data files. He now wants to build an 11g R2 RAC on Linux 5. What are his options when it comes to storage?

In 10g you would need to set up raw devices or use OCFS (Oracle Clustered File System). In Linux 5 raw devices were phased out (though could still be used), but with 11g R1 you could use block devices to store voting disk and OCR, so no need for raw.

Until 11g R2 RAC, you couldn’t use ASM for voting disk or OCR file because of the catch 22. The clusterware needed voting/OCR to start up, but ASM only started after the clusterware.

In 11g R2 RAC however, the ASM bit and the clusterware are now parts of a bigger beast, called Grid Infrastructure, all-in-one so to speak. You can now place everything on ASM, just tick “Use ASM” during installation.

Alternatively, you can place everything on NFS, as long as all the nodes have access to the share and the mount is the same.

(6)
Matthew is a system administrator at Fictional Storage Specialists Co. He is going to install Linux OS as part of task to build an 11g R2 RAC. What packages and/or libraries he may need, depending on the storage options?

To use ASM, you would need three RPMs installed before starting GI installer. They are oracleasmlib, oracleasm-support, oracleasm. Some of them can be found on the installation media if you use Oracle Linux, and some are online. If you use RHEL or CentOS or any other non-Oracle flavour, just download all three for the respective kernel.

NFS doesn’t need any specific packages, just mount the shares on all nodes before GI installation. Make sure that the paths are the same on every node, and that the operating system user and group IDs for the “oracle” user and the “oinstall” groups are the same on all machines and storage devices.

Optionally you can use Oracle Direct NFS, this is recommended for performance reasons. The library is called libodm11.so and it’s only a matter of creating a symbolic link and restarting the instance. The performance is gained by bypassing the operating system cache which releases the memory and speeds up the I/O.

(7)
Dorothy has more than 20 year experience administering single instance Oracle databases. She has developed an extensive library of shell scripts over all those years. She now wants to modify the scripts to work in an Oracle RAC environment. What views/columns would she use to determine instance IDs?

This one is trivial. The V$ views are still there, but they only reflect the instance you’re connected to. For cluster wide data use GV$, for example GV$SGA_RESIZE_OPS, and the instance ID will be in the inst_id column.

By the way, the GV$ views are also available in single instance databases. So Dorothy wouldn’t need two sets of scripts, she should be able to use the modified versions on both single instance and RAC databases.

(8)
Tom is a network administrator and Dick is a system administrator at Fictional Links And Nets Co. They are working with the company’s DBA Harry building an 11g R2 RAC environment. They are reasonably confused about the number of network interfaces the DBA needs for the cluster, and ask for an explanation. What should Harry tell the pair of them?

If one of the painters needs a brush that is being used by another painter, they would exchange the brush directly, hand to hand, without the need of putting the brush back into the toolbox and picking it up from the toolbox. This is called toolbox cache fusion. If an instance needs some data block that is used by another instance, it will be shipped via the private interconnect.

(9)
Lily and Daisy are junior DBAs at Fictional Bad Blocks Co. They have heard that their senior DBA Ash had performed an RMAN recovery in a 11g R2 RAC environment. They are very keen to learn if there are any differences to single instance recovery. If Ash knows what he’s talking about, what will he tell them?

Using RMAN isn’t much different in a RAC environment. Recovery catalogs are optional. No need to copy the backups/logs to the local disks.

Only the node that you’re running RMAN on needs the access, the other node(s) can be down for all you know.

(10)
John is a database administrator at Fictional Just On Time Co. He needs to schedule various jobs against the company’s 11g R2 RAC database, and some of those jobs must run on the first node only to meet a legacy application’s requirements. What are the options?

This one is even more trivial. Perhaps it doesn’t need any explanation at all.

The reason for having this question was to actually show that in a RAC environment there is no guarantee which node will run the job.

If you want to make sure your job runs on a particular node, set its affinity, or connect specifically to that node. Just note that if the node is down, the job will not run.

(11)
Mario is a database administrator at Fictional Dash Nine Co. He needs to terminate a user’s session and issues an ALTER SYSTEM KILL SESSION ’123,456′ command, but receives an error that the session doesn’t exist. He realizes that he is connected to the second node and the session is running on the third node of the cluster. What is the easiest way of achieving the desired outcome?

Hey, session, where a-a-a-are you-u-u-u? I am at three-e-e-e-e.

Connecting to the third node would also work. If you can entertain yourself by jumping from node to node, why not. All you need is create another entry in your tnsnames.ora, adding the following line

...
(INSTANCE_NAME = foobar3)
...

right after

...
(SERVICE_NAME = foobar.domain.wherever)
...

(12)
Summer is a database administrator at Fictional Patience Co. The company has several 11g R2 RAC databases, some are one-node, some are two-node, some three-node, and some four-node. Summer is going to review cluster-specific waiting events for each database. Which of the following statements are true?

When painter X needs a brush, he says – hey, give me that brush. That’s the first step. Painter Y then gives him the brush. The interaction was two-step, wasn’t it. In Oracle RAC world, it is called two-way brush block request.

Now painter Z asks painter Y. Painter Y doesn’t have the brush any more, but tells painter X to give the brush to Z. And that was three way.

Depending on how often every painter uses the brush, they would carry the Brush Master title. If someone needs the brush, they would ask the Brush Master, and if the Brush Master doesn’t have the brush at the time of the request, he would ask to hand it over to whoever requested it on their behalf.

Painters? Really?

Let me know if you think you’ve had enough of those painters. Tell me, we all are IT professionals, “cache fusion” sounds so much better than “toolbox with paintbrushes”. Or not?

Please note that all these questions are in no way an indication of the actual exam questions. Names and company names are fictional. This line is italic. Some things may or may not be obvious.

May 23, 2011 / Alex Nedoboi

1Z1-058 (1Z0-058 beta) 11g R2 RAC Administration Expert

If you get emails from Oracle University, you might have received a notification that 1Z1-058 (1Z0-058 beta) “Oracle 11g R2 RAC Administration” exam is now available. It earns you an 11g R2 RAC Oracle Certified Expert (OCE), but you need to be an 11g DBA Oracle Certified Professional (OCP) first to get that.

If you looked at my Bio page, you probably saw all the certifications I hold. While being certified is nice, I must admit that I enjoy the process more than the end result. The best thing about Oracle exams is that you -

a. get to know what Oracle expects of you to know to be considered an expert in that particular area, and
b. get to learn things when you study for the exam. Yes, I believe that even if you have a solid experience, there is always something to learn.

I started using Oracle RAC from version 10g R2 (around 2006 IIRC) and have never looked back. However, most of the fellow DBAs I know didn’t use Oracle clusters, one reason or another.

A couple of years ago we (the company I’m consulting) started our move to 11g, and a few months ago the move was completed and we are now on 11g R2 RAC.

11g R2 RAC is brilliant!

And as for my fellow DBAs, I am now seeing a significant change in their attitudes towards Oracle Real Application Clusters. Some of them are starting to realize all the benefits of RAC, and at the same time 11g R2 (“R2” being key here) is a substantial move forward, making it very easy to switch to RAC and focus on what’s really important.

Therefore, I believe the new 1Z0-058 exam should be very popular when more and more people and companies switch from single instances to RAC environments.

The exam consists of two parts, similar to the 1Z0-048 exam on 10g R2 RAC, namely the clusterware administration part, and RAC database administration.

One of the main differences to the previous version is that the clusterware part is now Grid Infrastructure (GI), and since GI now includes ASM, the new exam pays significantly more attention to ASM. Ironically, an 11g R2 RAC can be built on NFS (very easy and cost efficient) without any need for ASM. Of course ASM should be used in production environment as it provides countless benefits (technically speaking they can be counted) over NFS.

Anyway, let’s get to the point. If you are interested in switching from single instance to RAC, or if you have already been working with Oracle RAC for some time but do share my beliefs that exams are a great reference for studying and filling the gaps in your knowledge (however small those gaps can be), below are ten questions that should get you going.

Questions

(1)
Kevin is a database administrator at Fictional Real Estate Co. All he hears everyday is location, location, location. He is asked by one of the Sales team members about Oracle Real Application Clusters. What should he reply to his colleague’s “what is it all about”?

A. Performance, performance, performance.
B. Availability, availability, availability.
C. Redundancy, redundancy, redundancy.
D. Scalability, scalability, scalability.
E. Efficient utilization of all available resources.

(2)
Jenson is a database developer at Fictional Cluster Masters Co. He wants to pull a practical joke on the company’s DBA, so the DBA would think someone is performing a DDoS attack on the cluster. What should Jenson do in his application?

A. Only use physical IP addresses to connect to the database.
B. Only use virtual IP addresses, but not SCAN, to connect to the database.
C. Use as many sequences as possible (ideally a dedicated sequence for every ID field), with large caches.
D. Use as few sequences as possible (ideally just one sequence shared by all ID fields), with no cache.

(3)
May is a junior DBA at Fictional Calendars Co. She wants to start some RAC instances and stop some other RAC instances. What tools should she use from the list below?

A. sqlplus
B. sqlplus –mode=rac
C. srvctl
D. srvctl -rac
E. Grid Control (if available) or Enterprise Manager
F. Grid Control (if available) but not Enterprise Manager

(4)
April is a junior DBA at Fictional Calendars Co. She asks the company’s senior DBA August what the number of UNDO and TEMP tablespaces and REDO log groups should be in a RAC environment. If August knows what he’s talking about, what will he answer?

A. 1 UNDO per instance, 1 TEMP per instance, 1 REDO per instance
B. 1 UNDO per cluster, 1 TEMP per cluster, 1 REDO per cluster
C. 2 or more UNDO per instance, 2 or more TEMP per cluster, 2 or more REDO per instance
D. 1 or more UNDO per instance, 1 or more TEMP per instance, 1 or more REDO per cluster
E. 1 UNDO per instance, 1 or more TEMP per cluster, 2 or more REDO per instance
F. 2 or more UNDO per cluster, 2 or more TEMP per cluster, 1 or more REDO per instance

(5)
Mark is a database administrator at Fictional Storage Specialists Co. He has built a 10g R2 RAC on Linux 4 before, using raw devices for voting disk and cluster repository storage, and ASM for data files. He now wants to build an 11g R2 RAC on Linux 5. What are his options when it comes to storage?

A. Raw devices for voting/OCR, ASM for data
B. Block devices for voting/OCR, NFS for data
C. OCFS2 for voting/OCR, NFS for data
D. NFS for voting/OCR, ASM for data
E. ASM for everything
F. NFS for everything

(6)
Matthew is a system administrator at Fictional Storage Specialists Co. He is going to install Linux OS as part of task to build an 11g R2 RAC. What packages and/or libraries he may need, depending on the storage options?

A. oracleasmlib, oracleasm-support, oracleasm for ASM
B. oraclenfslib, oraclenfs-support, oraclenfs for NFS
C. libasm11, libaio, sysstat for ASM
D. libnfs11, libaio, sysstat for NFS
E. libodm11 for ASM (optional)
F. libodm11 for NFS (optional)

(7)
Dorothy has more than 20 year experience administering single instance Oracle databases. She has developed an extensive library of shell scripts over all those years. She now wants to modify the scripts to work in an Oracle RAC environment. What views/columns would she use to determine instance IDs?

A. instance_id in V$ views
B. instance_id in V_$ views
C. inst_id in V$ views
D. instance_id in GV$ views
E. inst_id in V_$ views
F. inst_id in GV$ views

(8)
Tom is a network administrator and Dick is a system administrator at Fictional Links And Nets Co. They are working with the company’s DBA Harry building an 11g R2 RAC environment. They are reasonably confused about the number of network interfaces the DBA needs for the cluster, and ask for an explanation. What should Harry tell the pair of them?

A. Public interfaces are for client connections, and private interfaces are for administrator connections.
B. Public interfaces are for physical IPs, and private interfaces are for virtual IPs and SCAN.
C. Public interfaces are for client connections, and private interfaces are for cache fusion.
D. Public interfaces are for cluster-wide load balanced requests, and private interfaces are for instance-specific requests.

(9)
Lily and Daisy are junior DBAs at Fictional Bad Blocks Co. They have heard that their senior DBA Ash had performed an RMAN recovery in a 11g R2 RAC environment. They are very keen to learn if there are any differences to single instance recovery. If Ash knows what he’s talking about, what will he tell them?

A. Recovery catalogs must be used in RAC environments.
B. Recovery catalogs can but don’t have to be used in RAC environments.
C. Only the node that performs the recovery must have access to all necessary backups and archive logs.
D. All nodes must have access to all necessary backups and archive logs.
E. All the necessary backups and archive logs must be copied to the local disk(s) of the node that performs the recovery.
F. All the necessary backups and archive logs may reside either on the local disk(s) of the node that performs the recovery or in the Fast Recovery Area or in an NFS directory, as long as the node has access.

(10)
John is a database administrator at Fictional Just On Time Co. He needs to schedule various jobs against the company’s 11g R2 RAC database, and some of those jobs must run on the first node only to meet a legacy application’s requirements. What are the options?

A. Set job instance affinity with DBMS_SCHEDULER.
B. Set job node affinity with DBMS_JOB.
C. Use operating system’s commands to run legacy jobs on the first node only, connecting as “/ as sysdba”.
D. Use operating system’s commands to run legacy jobs on any node, connecting to the instance running on the first node.

Actually, let’s make it a dozen

(11)
Mario is a database administrator at Fictional Dash Nine Co. He needs to terminate a user’s session and issues an ALTER SYSTEM KILL SESSION ’123,456′ command, but receives an error that the session doesn’t exist. He realizes that he is connected to the second node and the session is running on the third node of the cluster. What is the easiest way of achieving the desired outcome?

A. Connect to the third node and re-issue the command.
B. Issue ALTER SYSTEM KILL SESSION ’123,456′ AUTO;
C. Stay on the second node and modify the parameters to ’123,456,3′
D. Stay on the second node and modify the parameters to ’123,456,@3′
E. Stay on the second node and modify the parameters to ’123,456,#3′

(12)
Summer is a database administrator at Fictional Patience Co. The company has several 11g R2 RAC databases, some are one-node, some are two-node, some three-node, and some four-node. Summer is going to review cluster-specific waiting events for each database. Which of the following statements are true?

A. One-way block request waits normally occur on one-node clusters.
B. One-way block request waits normally occur on one-or-more-node clusters.
C. Two-way block request waits normally occur on two-node clusters.
D. Two-way block request waits normally occur on two-or-more-node clusters.
E. Three-way block request waits normally occur on three-node clusters.
F. Three-way block request waits normally occur on three-or-more-node clusters.
G. Four-way block request waits normally occur on four-node clusters.
H. Four-way block request waits normally occur on four-or-more-node clusters.

Please note that all these questions are in no way an indication of the actual exam questions. Names and company names are fictional. This line is italic. Some things may or may not be obvious.

February 19, 2011 / Alex Nedoboi

Tiny tips: Mount linux distribution ISO and create local repository for yum

So you have this server that was built a few years ago, and you need to install a couple of rpms for whatever reason. Unfortunately the distro disk has been used as a coaster and cannot fulfill its original role. And then you think – why didn’t I create an ISO image on the local drive.

If you did, however, create an ISO image, it’s now easy to mount it and even create a local repository, so you could yum the packaged instead or rpm-ing them.

First, mount the image, straight forward -

# mkdir -p /media/distro_iso
# mount -o loop /path/to/distro.iso /media/distro_iso

Then create the repo -

# mkdir /tmp/local_repo
# cd /media/distro_iso
# createrepo -o /tmp/local_repo .

This can take some time –

729/3187 - Server/gcc-4.1.2-48.el5.x86_64.rpm
...
735/3187 - Server/gcc-objc-4.1.2-48.el5.x86_64.rpm
...
737/3187 - Server/gcc44-c++-4.4.0-6.el5.x86_64.rpm
...

When it’s finished, register the repo -

## /etc/yum.repos.d/distro_iso.repo

[MyLocalRepo]
name=summer
baseurl=file:///tmp/local_repo
enabled=1

Check it –

# yum repolist

Loaded plugins: security
MyLocalRepo                      |  951 B     00:00
MyLocalRepo/primary              |  817 kB    00:00
MyLocalRepo                               3187/3187
repo id               repo name      status
MyLocalRepo           summer         enabled: 3,187
repolist: 3,187

Looks good, let’s try installing something -

# yum install mc

Loaded plugins: security
Setting up Install Process
Package 1:mc-4.6.1a-35.el5.x86_64 already installed and latest version
Nothing to do

Happy yumming.

February 12, 2011 / Alex Nedoboi

Building a database engine of our own. Part One. Connectivity.

Once upon a time (long time ago) I was consulting a small start-up. The nature of their business needed an SQL-like database with some specific features.

The choice was between -

1. using an RDBMS for simple data storage and then developing those specific features on top of it, or

2. building an application that implements the business logic, and then just incorporating those simple RDBMS features into the application.

Guess what I chose.

Features aside (they were very specific and far from the RDBMS world), the SQL-like functionality would be limited to trivial select’s, frequent insert’s, not too frequent update’s, and very rare delete’s.

All joins would be inner, some basic grouping by, no need for transactions as such.

Now I think you understand why I chose the second path.

Because it would be immense fun!

Really?

Yes.

Next 10-15 post in this blog are going to be about developing a database engine of our own.

The engine that would implement only the core RDBMS functionality, as described above.

Part 1 – the interface

Some databases offer command line interfaces. Some are mostly GUI oriented. Since this has nothing to do with RDBMS functionality, we will limit our DB engine to telnet-like communication.

We’re going to type in commands, and the engine would be printing its responses and result sets on the screen.

Ah, forgot to mention. We are going to develop our database engine in Java. Java, as a programming language, is very good, descriptive, and efficient for educational purposes.

As a first step, let’s put together the connectivity part.

## anSQLdb.java 

import java.net.*;
import java.io.*;
import java.util.*;

public class anSQLdb
{
   static int port = 55555;

   public static void main (String [] args)
   {
      ServerSocket ss = null;

      try
      {
         ss = new ServerSocket(port);
      }
      catch (IOException e)
      {
         System.err.println("BAD-0001 // Could not start : " + e);
         System.exit(1);
      }

      System.out.println("GOOD-0001 // anSQLdb Server started on port " + port + ". Telnet 'help' for help.\n\n");

      while (1<2)
      {
         Socket uc = null; /* user connection */

         try
         {
            uc = ss.accept(); /* user connection to server socket */

            BufferedReader req = new BufferedReader(new InputStreamReader(uc.getInputStream())); /* request */
            OutputStream os = new BufferedOutputStream(uc.getOutputStream());
            PrintStream res = new PrintStream(os); /* response (or, result set) */

            String st = req.readLine(); /* only read the first line */

            if (st == null) continue;

            System.out.println("Request: " + st);

            if (st.equals("help"))
            {
               res.print("Try this:\n\r\n\r select * from all_tables\n\r desc t1\n\r");
            }

            if (st.equals("select * from all_tables")) /* else-if's or readability? */
            {
               res.print("TABLE_NAME\r\n==========\n\rt1\n\rt2\r\nt3\n\r");
            }

            if (st.equals("desc t1"))
            {
               res.print("c1\tint\n\rc2\tvarchar\n\rc3\tdate\n\r");
            }

            res.flush();

            try
            {
               res.close();
               os.close();
               req.close();
            }
            catch (Exception e)
            {
               System.err.println("BAD-003C // Cannot close : " + e);
            }
         }
         catch (Exception e)
         {
            System.err.println("BAD-9999 // Lazy catch : " + e);
         }
      }
   }
}

Now compile and run it –

$ javac anSQLdb.java
$ java anSQLdb

GOOD-0001 // anSQLdb Server started on port 55555. Telnet 'help' for help.

In another window, let’s connect to it and run some commands.

$ telnet dev2 55555

help
Try this:

 select * from all_tables
 desc t1

$ telnet dev2 55555

select * from all_tables
TABLE_NAME
==========
t1
t2
t3

$ telnet dev2 55555

desc t1
c1      int
c2      varchar
c3      date

$

In the first window you should see something like this -

GOOD-0001 // anSQLdb Server started on port 55555. Telnet 'help' for help.

Request: help
Request: select * from all_tables
Request: desc t1

Wow. Our engine already has (data dictionary | information schema) functionality. Well, almost. We shall get there shortly.

February 6, 2011 / Alex Nedoboi

Interactive SVG example on iPad (iPhone) via Oracle PL/SQL Server Pages

A couple of weeks ago we talked about Flash on Android via Oracle APEX. Today let’s check the competitor’s offer.

Unfortunately Apple products don’t support Flash. The good news however is that they do support SVG, as do most web browsers (IE7/8 is pluginable and IE9 is going to support it properly).

I first came across SVG in 2003 at some conference in Melbourne, it was presented by a geekish W3C guy. Don’t remember much detail now except that I absolutely loved the concept. It was so simple and at the same time so powerful, I couldn’t see any reason not to use it whenever there’s a need of constructing anything resembling a graph.

SVG was great, and it hasn’t changed much since then which only tells you that it was great from the beginning.

HTML5 will even support direct embedding of SVG (though I don’t really see the point).

Motivational Speech

One might say – well, there are plenty of packages/libraries/kits that can generate graphs, why would you want to mess with this SVG stuff yourself.

The explanation is simple. The simplest way of doing the job is the best way to do it. Simple.

Or, in other words, less code = less bugs (lots of people attribute those words to Tom Kyte, though I must say, Kernighan and Ritchie were big fans of this paradigm a couple of decades earlier).

Mocking up

So let’s start.

First we would need an HTML container page and an SVG dummy.

## html_wrapper.psp
<html>
<head>
<title>Follow the transparent rabbit</title>
</head>
<body style="background-color:black;">
Here it is
<br>
<embed src="svg_wrappee" width="900" height="500" type="image/svg+xml" />
<br>
</body>
</html>
## svg_wrappee.psp
<%!

   cw simple_integer := 900; /* canvas width and height */
   ch simple_integer := 500;

   gw simple_integer := 750; /* graph width and height */
   gh simple_integer := 400;

   mx simple_integer := 80; /* axis margins */
   my simple_integer := 50;

   xs simple_integer := 10; /* number of bands */
   ys simple_integer := 4;

   max_qty simple_integer := 80000; /* events */

   j simple_integer := 0;

%><%
   ## SVG mime
   owa_util.mime_header('image/svg+xml', false);
   owa_util.http_header_close;

%><?xml version="1.0" standalone="no" ?>
<!DOCTYPE svg PUBLIC 
   "-//W3C//DTD SVG 1.1//EN" 
   "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg
   width="<%= cw %>"
   height="<%= ch %>"
   version="1.1"
   xmlns="http://www.w3.org/2000/svg"
   style="background-color:white;">

<style>
.axis { stroke:black; stroke-width:2; }
.bands { stroke:silver; stroke-width:0.5; }
.labels { font-family:Arial; font-size:11; }
</style>

## draw x and y axis
<path d="M<%= mx*0.8 %>,<%= ch-my %> l<%= gw + mx*0.2 %>,0" class="axis" />
<path d="M<%= mx %>,<%= ch-my*0.8 %> l0,<%= -ch+my %>" class="axis" />

<%
## draw bands and print labels
   for i in 1..ys loop
      j := (ch-my) - (gh/ys)*i;
%>
<path d="M<%= mx %>,<%= j %> l<%= gw %>,0" class="bands" />
<text 
   x="<%= mx*0.2 %>" 
   y="<%= j %>" 
   class="labels"><%= to_char( (max_qty/ys)*i, '999,999,990' ) %></text>
<%
   end loop;
%>

</svg>

Compile the two files with loadpsp, and on your iPod/iPhone open Safari and navigate to your Oracle HTTP Server.

As you can see, we have an empty canvas with a couple of lines (the <path> tag) and labels (the <text> tag).

Most of the code above is just one big header, the real job done by the highlighted tags, four lines.

Data Source

It’s always hard to make something up. Well, maybe not that hard, but it’s much better to find something practical, something that could be used in the day to day life, not just as an example.

For the purpose of this article, we will graph up one of the most important Oracle database events – class slave wait. So important, it was at some point included in the AWR and statspack reporting.

The query is trivial, ignoring under 64ms waits –

select inst_id, wait_time_milli, wait_count
from gv$event_histogram t
where t.wait_time_milli between power(2,6) and power(2,15)
order by 1,2
;

Let’s just create a [global (if you wish)] temporary table, so the data is consistent every time we query it (which is not the case with v$/gv$ views).

create table temp_slave_wait
as
select inst_id as node, wait_time_milli as ms, wait_count as qty
...

So we have three series of data, one for each node -

SQL> select * from temp_slave_wait order by 1,2;

      NODE         MS        QTY
---------- ---------- ----------
         1         64      76577
         1        128      84508
         1        256       4256
         1        512      23850
         1       1024      23524
         1       2048      10003
         1       4096      12772
         1       8192      23876
         1      16384      15152
         1      32768      18185
         2         64      71716
         2        128      65321
         2        256      13780
         2        512      17504
         2       1024      12680
         2       2048      15129
         2       4096      18406
         2       8192       9109
         2      16384       8582
         2      32768      18323
         3         64     107862
         3        128      93494
         3        256      23420
         3        512      28968
         3       1024      10991
         3       2048      15183
         3       4096      22315
         3       8192      14199
         3      16384      25702
         3      32768      31565

30 rows selected.

Dots

The pace the screen resolution has been increasing at over the past few years is quite fast. Retina’s (fourth generation) resolution is 960 by 640, or something around those numbers.

So if we want the dots to be visible, we should make them circles or rectangles. Rounded rectangle shape would, to my personal taste, suit iPhone/iPad the most, as they are of the same shape themselves.

To draw a rectangle in SVG, you would use the <rect> tag.

Add the following line to the <style> section -

.dots { fill-opacity:0.2; stroke-width:1.2; }

Then add the following loop just before the closing </svg> tag -


   for n in (select distinct node from temp_slave_wait order by 1) 
   loop
      clr := (case n.node when 1 then 'magenta' 
                          when 2 then 'lime' 
                          when 3 then 'aqua' 
                          else 'black' end);
      k := 1;  /* or use rownum if you consider yourself a purist */ 

      for q in (select qty from temp_slave_wait where node = n.node order by ms) 
      loop
%>
<rect
   x="<%= mx + (gw/xs)*(k-1/2) - 4 %>" /* position */
   y="<%= (ch-my) - gh*(q.qty/max_qty) - 4 %>"
   rx="3" /* rounding */
   ry="3"
   width="8"
   height="8"
   style="fill:<%= clr %>; stroke:<%= clr %>;"
   class="dots"
   />
<%
         k := k + 1;
      end loop;
   end loop;

Click refresh in Safari. What do you see.

Zoom in a bit.

Connect them

Connecting the dots is just as simple.

One more style -

.lines { opacity:0.9; }

Slightly change the “for q” loop -

for q in
(
   select qty,
   (
      select qty from temp_slave_wait t_in
      where t_in.node = t_out.node
      and t_in.ms = t_out.ms / 2
   ) as prev_qty
   from temp_slave_wait t_out
   where t_out.node = n.node order by ms
)
loop

Then add the following tag inside the loop -

if q.prev_qty is not null then

%>
<line
   x1="<%= mx + (gw/xs)*(k-3/2) %>"
   y1="<%= (ch-my) - gh*(q.prev_qty/max_qty) %>"
   x2="<%= mx + (gw/xs)*(k-1/2) %>"
   y2="<%= (ch-my) - gh*(q.qty/max_qty) %>"
   stroke="<%= clr %>"
   stroke-width="1"
   class="lines"
/>
<%

end if;

Beautifil?

Zoom in -

It is.

Legend

As you can see, there’s a lot of white space in the top right corner of the graph. Of course there is no guarantee this will always be the case, but just for the sake of this example, let’s be opportunistic and take advantage of it.

Let’s add a legend to our graph (and use the white space for it).

First, add a couple of temp variables to the declaration section –

tx simple_integer := 0;
ty simple_integer := 0;

Then, just after we assign the colour to “clr”, add the following code –

tx := cw * 2/3;
ty := ch/10 + (ch/20) * n.node;

%>
<rect
   x="<%= tx - 4 %>"
   y="<%= ty - 4 %>"
   rx="3"
   ry="3"
   width="8"
   height="8"
   style="fill:<%= clr %>; stroke:<%= clr %>;"
   class="dots"
/>
<line
   x1="<%= tx - 10 %>"
   y1="<%= ty %>"
   x2="<%= tx + 10 %>"
   y2="<%= ty %>"
   stroke="<%= clr %>"
   stroke-width="1"
   class="lines"
/>
<text x="<%= tx + 20 %>" y="<%= ty + 5 %>" class="labels">node <%= n.node %></text>
<%

The white space has now been put to use.

Zooming in.

Now make it interactive

Since iPad/iPhones are tappable, and are being tapped all the time, let’s see if our graph can benefit from tapping.

We will use the on-mouse-over and on-mouse-out events for this, even thought there is no mouse on iPads/iPhones. Tapping something will invoke on-mouse-over, and tapping somewhere else will invoke on-mouse-out.

To do a quick test, add the following attributes to the text tag (the one that prints “node x”) -

<text
   onmouseover="evt.target.setAttribute('font-weight','bold');"
   onmouseout="evt.target.setAttribute('font-weight','normal');"
   x="<%= tx + 20 %>" 
   y="<%= ty + 5 %>" 
   class="labels">node <%= n.node %></text>

Now tap “node x”, and tap somewhere else, you can see it become bold and then change back to normal.

But that’s not all. Let’s also make the corresponding part of the graph (the lines that connect node x’s dots) “bolder”.

The way we are going to do this, is not the neatest. However, the reason we are doing it this way, is to illustrate that using javascript in SVG is as easy as using it in HTML.

First, add the following attribite to the line tag (the one that connects the dot, not in the legend) -

...
id="line_<%= n.node %>_<%= k %>"
...

Then add the following section, right after the opening <svg> tag -

<script type="text/javascript">
<![CDATA[

function restroke(node, sw)
{
   var e;

   <% for k in 2..xs loop %>

   e = document.getElementById('line_' + node + '_<%= k %>');
   e.setAttribute('stroke-width', sw);

   <% end loop; %>
}

// ]]>
</script>

And call the function from the on-mouse-* events -

...
onmouseover=" ... restroke(<%= n.node %>,5);"
onmouseout=" ... restroke(<%= n.node %>,1)"
...

Now let’s see what happens when we tap the “node 1″ text in the legend.

The text itself becomes bold (as we’ve seen before), and the line becomes wider.

Tap “node 2″ or “node 3″ and see the lines switch.

What now

The example we created can be easily extended, you could create a procedure out of it, so one would pass a reference to the data, some parameters, and the procedure would draw up an SVG graph.

Another point, maybe not so important, is the size of the generated SVG. Since you’re looking at it on your iPad (iPhone) via 3G connection, data traffic plays a big part. Sometimes it’s slow, sometimes it’s expensive.

The entire SVG file we created in this example is only about 10K, which would load to your device instantly and cost you next to nothing.

Conclusion

As you can see, putting an SVG graph together is very simple. It is very similar to HTML, and if you know HTML well, you’ll catch up with SVG tags quite quickly. As was pointed out at the beginning of this article – in HTML5, SVG will be an integral part of it.

January 19, 2011 / Alex Nedoboi

To Index Scan or Not To Index Scan

Quite a few times this week I heard people mention the magic threshold of 5%

If the statement queries less than 5% of table rows, they say, then the optimizer will use the index, and if the statement queries over 5% of the rows, the optimizer will prefer full table scan.

This can’t be further from the truth.

Example One. 1%

SQL> create table t1 (id number(10), x number(2));

SQL> insert into t1 
     select level, mod(level,100) from dual connect by level <= 100000;

SQL> create index t1_i1 on t1 (x);

SQL> select avg(id) from t1 where x=0;

So we are selecting just 1% of the rows, surely the index will be used, range scan, don’t you think?

SQL> select avg(id) from t1 where x=0;

   AVG(ID)
----------
     50050

Elapsed: 00:00:00.04

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |    69   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 |  8000 |    69   (2)| 00:00:01 |
---------------------------------------------------------------------------

No, it is not. Full Table Access.

Example Two. 20%

SQL> create table t2 (st varchar2(1000), x number(2));

SQL> insert into t2 
     select 'a', 20 from dual connect by level <= 20000;

SQL> insert into t2 
     select lpad('a',1000,'a'), 80 from dual connect by level <= 80000;

SQL> create index t2_i1 on t2(x);

SQL> select max(st) from t2 where x=20;

So you reckon full table scan as we are selecting 20,000 out of 100,000 (which is 20%) ?

SQL> select max(st) from t2 where x=20;

MAX(ST)
-------
a

Elapsed: 00:00:00.09

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |   804 |  2333   (1)| 00:00:28 |
|   1 |  SORT AGGREGATE              |       |     1 |   804 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T2    | 20000 |    15M|  2333   (1)| 00:00:28 |
|*  3 |    INDEX RANGE SCAN          | T2_I1 | 20000 |       |    40   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Oops. Index Range Scan.

Please Explain

I was going to, but it’s quite obvious, isn’t it.

Here’s a couple of challenges for you.

Try to make the optimizer do a FTS while selecting just 0.1% of the rows. Do not cheat, make sure the index is there and the stats are current.

Also try to make the optimizer use an index while selecting 50% of the rows. Again, don’t cheat, make sure table access is needed.

January 15, 2011 / Alex Nedoboi

Java vs C vs (native) PL/SQL

When you need to do some serious computing in your database application, what do you do?

Do you write it in Java, or C, or in plain PL/SQL?

Have you tried native PL/SQL compilation? Have you tried simple_integer? Different levels of optimization?

Was it worth it?

Quite a few questions to start with, don’t you think? (Yes, another question.)

Maybe we can answer some of them. (I know, this also sounded like a question.)

A Very Simple Function

What can be simpler than summing up two variables (besides from doing nothing), so let’s do it then.

Summing them up just once would take fraction of a millisecond, so let’s do it one billion times.

Start with PL/SQL -

create or replace function f_plsql
return number
is
   x number := 0;
begin
   for i in 1..1000000000
   loop
      x := x + i;
   end loop;
   return x;
end;
SQL> select f_plsql from dual;

   F_PLSQL
----------
5.0000E+17

Elapsed: 00:02:48.71

That took some time. Almost three minutes.

Try in Java now.

class tryjava
{
   public static int f()
   {
      int x = 0;
      int i;

      for (i = 1; i <= 1000000000; i++) x += i;

      return x;
   }
}
[oracle@dev2 blog]$ loadjava -u hr/hr -v -resolve tryjava.java
arguments: '-u' 'hr/***' '-v' '-resolve' 'tryjava.java'
creating : source tryjava
loading  : source tryjava
created  : CREATE$JAVA$LOB$TABLE
resolving: source tryjava
Classes Loaded: 0
Resources Loaded: 0
Sources Loaded: 1
Published Interfaces: 0
Classes generated: 0
Classes skipped: 0
Synonyms Created: 0
Errors: 0

SQL> create or replace function f_java 
     return number 
     as language java 
     name 'tryjava.f() return integer';
     /

Function created.

SQL> select f_java from dual;

    F_JAVA
----------
-243309312

Elapsed: 00:00:02.35

SQL> select f_java from dual;

    F_JAVA
----------
-243309312

Elapsed: 00:00:02.35

SQL> select f_java from dual;

    F_JAVA
----------
-243309312

Elapsed: 00:00:02.39

Yes, you’re reading this correctly, just over two seconds, compared to almost three minutes in PL/SQL.

Why is that? And why the result is different?

The reason is that in PL/SQL we used “number”, and in Java we used “int”. Number is a wrapped PL/SQL type, it can be null, it can’t overflow, so every time we do something to our “number x”, PL/SQL does all those checks.

We don’t really care about nulls or overflows in this simple exercise, so to be fair to PL/SQL let’s change the type of the variables.

create or replace function f_plsql
return simple_integer
is
   x simple_integer := 0;
   i simple_integer := 1;
begin
   while i <= 1000000000
   loop
      x := x + i;
      i := i + 1;
   end loop;
   return x;
end;
/
SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:33.09

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:33.06

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:32.78

The result is now the same, and the performance is much better now, about five times faster, but still very slow compared to Java.

This is because the PL/SQL function is still being interpreted as opposed to natively compiled.

Now let’s see what C can tell us.

int c()
{
   int x = 0;
   int i;

   for (i = 1; i <= 1000000000; i++) x += i;

   return x;
}
[oracle@dev2 blog]$ gcc -c c.c
[oracle@dev2 blog]$ ld -shared -o c.so c.o

SQL> create or replace library c as '?/bin/c.so';
  2  /

Library created.

SQL> create or replace function f_c return simple_integer 
  2  as external library c name "c" language c;
  3  /

Function created.

SQL> select * from user_libraries where library_name = 'C';

LIBRARY_NAME FILE_SPEC           DYNAMIC STATUS
------------ ------------------- ------- ------
C            ?/bin/c.so          Y       VALID

The function is now able to be called from PL/SQL as if it were a PL/SQL function. So, let’s C.

SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:02.38

SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:02.41

SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:02.42

Two and a half seconds, same as Java.

Both C and Java outperform PL/SQL big time, interpreted code just doesn’t stand any chance against natively compliled code. The only way of getting back in the game is for PL/SQL to get natively compiled, and we can do that.

SQL> alter function f_plsql compile plsql_code_type=native;

Function altered.

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:02.34

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:02.32

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:02.31

Brilliant! Compiled PL/SQL now has the same performance as C and Java. It is even marginally faster, but that’s probably because Oracle doesn’t need to do the extra round trip it does for C, no need to talk to the listener, read/call the “.so” library, etc.

A Slightly Less Simple Function

Can we now challenge our three guys a bit more? Let’s make a simple change.

create or replace function f_plsql return simple_integer
is

   x simple_integer := 0;
   i simple_integer := 1;

   function calc(x simple_integer, y simple_integer) return simple_integer
   is
      z simple_integer := 0;
   begin
      z := x+y;
      return z;		
   end;

begin
   while i <= 1000000000
   loop
      x := calc(x,i);
      i := i + 1;
   end loop;
   return x;
end;
/
SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:01:38.19

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:01:39.21

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:01:38.79

Wow, that is slow. That is very slow.

Fortunately, we have another bit of new functionality available for us in Oracle 11g. It is called optimization level. When you set it to “3″, Oracle tries to inline small functions, so it wouldn’t spend much time on calling them.

Lets’s see if it helps.

SQL> alter function f_plsql compile plsql_code_type=native plsql_optimize_level=3;

Function altered.

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:04.41

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:04.36

SQL> select f_plsql from dual;

   F_PLSQL
----------
-243309312

Elapsed: 00:00:04.38

It did help indeed. Though the modified code takes about twice as long.

Java’s turn.

class tryjava
{
   static int calc(int x, int y)
   {
      int z = x+y;
      return z;
   }

   public static int f()
   {
      int x = 0;
      int i;

      for (i = 1; i <= 1000000000; i++) x = calc(x,i);

      return x;
   }
}

Let Virtual Machine warm up if needed, and see its take on the change.

SQL> select f_java from dual;

    F_JAVA
----------
-243309312

Elapsed: 00:00:02.30

SQL> select f_java from dual;

    F_JAVA
----------
-243309312

Elapsed: 00:00:02.26

SQL> select f_java from dual;

    F_JAVA
----------
-243309312

Elapsed: 00:00:02.28

It looks like I need to use the word “brilliant” once again. This time, Java has taken over.

Java does not take any extra time, winning hands down over PL/SQL at this point. Moreover, we didn’t give Java any compilation hints or parameters. It seems to do the optimization regardless. Impressive.

What about C?

int c()
{
   int calc(int x, int y)
   {
      int z = x+y;
      return z;
   }

   int x = 0;
   int i;

   for (i = 1; i <= 1000000000; i++) x = calc(x,i);

   return x;
}
SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:06.34

SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:06.36

SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:06.30

C now takes significantly longer, even longer than PL/SQL.

C? C?

Since this article is full of fairness – remember how we switched to simple_integer? – let’s keep it that way.

As we used optimization settings for PL/SQL, shouldn’t we look into optimization settings for C as well.

Just like PL/SQL, there are three levels of optimization the gcc compiler can provide us, the third level is the highest, same as in PL/SQL.

[oracle@dev2 blog]$ gcc -c c.c -O3
[oracle@dev2 blog]$ ld -shared -o c.so c.o

SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:00.67

SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:00.68

SQL> select f_c from dual;

       F_C
----------
-243309312

Elapsed: 00:00:00.67

Suppose I spoke too soon when I said “brilliant” after seeing PL/SQL native compilation results. Then I spoke too soon again, calling Java compiler “brilliant”.

Hopefully if I say it yet again, it’s not going to be as embarrasing. It is four times faster than Java, how can it not be brilliant?

Let’s look a bit deeper.

$ gcc c.c -O3
$ objdump -d a.out > a.out.disassembled

Get ready for vibrant colours.

Now.

00000000004004a0 :
  4004a0:    8d 04 3e             lea    (%rsi,%rdi,1),%eax
  4004a3:    c3                   retq
  4004a4:    66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
  4004aa:    66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)

00000000004004b0 :
  4004b0:    be 01 00 00 00       mov    $0x1,%esi           ## x = 1
  4004b5:    b8 01 00 00 00       mov    $0x1,%eax           ## i = 1
  4004ba:    66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
  4004c0:    01 c6                add    %eax,%esi           ## x = calc(x,i) !!! 
  4004c2:    83 c0 01             add    $0x1,%eax           ## i++ 
  4004c5:    3d 01 ca 9a 3b       cmp    $0x3b9aca01,%eax    ## i <= 1000000000 
  4004ca:    75 f4                jne    4004c0 
  4004cc:    bf d8 05 40 00       mov    $0x4005d8,%edi
  4004d1:    31 c0                xor    %eax,%eax
  4004d3:    e9 c0 fe ff ff       jmpq   400398 

See, it inlined the calc function. The function itself is still there, but “main” doesn’t call it. All variables use CPU registers, no costly memory access.

The gcc compiler even padded the jump point with a six-byte “nopw” to be at a multiple of 0×10 (4004c0, main+0×10).

Suppose it doesn’t get any better than this (please let me know if it does).

So? (no more questions, promise…)

Of course, our simple test was too simple to be used as a basis for making big decisions, but at least it gave us some idea.

While using C or Java might give you some (or, heaps of) performance, using PL/SQL can pay off if your function makes a call or two to the database, not to mention the convenience of using PL/SQL in the database environment.

Follow

Get every new post delivered to your Inbox.