Wednesday, November 6, 2013

Oracle Coherence Memory Footprint Tests

In a production environment, we have configured 3 Coherence nodes with a 1G heap each. We need to cache about 200K to 300K user account records, with additional room to grow. Naturally, we want to find out whether we have enough nodes and enough heap to hold that many records. I ran some tests and came up with some interesting numbers.

I wrote an application that populates the cache with a given number of records. I first tested a single Coherence server node with different heap sizes. Then I tested multiple Coherence nodes, each with a 512M heap. The results are listed below (a single account record is about 2K when serialized):
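
The per-record figure can be checked by serializing one record with default Java serialization and counting the bytes. The sketch below is not the actual test application: AccountRecord, its fields and the 2000-byte filler are hypothetical stand-ins for the real record type.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializedSizeDemo {

    // Hypothetical stand-in for the real account record type.
    static class AccountRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String accountId;
        final String ownerName;
        final byte[] payload; // filler so the record is roughly 2K

        AccountRecord(String accountId, String ownerName, int payloadBytes) {
            this.accountId = accountId;
            this.ownerName = ownerName;
            this.payload = new byte[payloadBytes];
        }
    }

    // Returns the number of bytes produced by default Java serialization.
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        AccountRecord record = new AccountRecord("100008", "Test User", 2000);
        System.out.println("Serialized size in bytes: " + serializedSize(record));
    }
}
```

Multiplying the printed size by the expected record count gives a first estimate of the raw data volume, before any cache overhead.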

Single node:

Records  Serialized size in bytes  Size   Estimate in bytes   Server     Client
         (default Java format)     (M)    (2,236 x records)   heap size  heap size
0        4                         0
1        2,236                     0.2
25K      55,824,754                55M    55,900,000          512M       64M
50K      111,649,640               112M   111,800,000         512M       64M
100K     223,297,020               223M   223,600,000         512M       64M
200K     446,593,392               447M   447,200,000         1G         1G
300K     669,886,958               670M   670,800,000         1G         1G
500K     1,116,488,087             1.1G   1,118,000,000       2G         1G

As the table shows, a 512M heap handled up to 100K records, a 1G heap handled 200K and 300K records, and a 2G heap handled 500K records.
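
The estimate column in the single-node table is simply the size of one serialized record (2,236 bytes) multiplied by the record count. A minimal sketch of that arithmetic, with class and method names chosen only for illustration:

```java
public class FootprintEstimate {

    // Size of one serialized record in bytes, taken from the 1-record row of the table.
    static final long BYTES_PER_RECORD = 2_236L;

    // Linear estimate of the serialized size for a given number of records.
    static long estimatedBytes(long recordCount) {
        return recordCount * BYTES_PER_RECORD;
    }

    public static void main(String[] args) {
        long[] counts = {25_000, 50_000, 100_000, 200_000, 300_000, 500_000};
        for (long count : counts) {
            long bytes = estimatedBytes(count);
            System.out.printf("%,d records -> %,d bytes (about %d MB)%n",
                    count, bytes, Math.round(bytes / 1_000_000.0));
        }
    }
}
```

The estimates land within about 0.2% of the measured sizes, so the serialized size grows essentially linearly with the record count.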

Multiple nodes (512M heap per node):

Records  Serialized size in bytes  Size    Nodes  Server     Combined server  Single-node  Client     Result
         (default Java format)     (M/G)          heap size  heap size        heap size    heap size
200K     446,593,392               447M    2      512M       1G               1G           1G         out of memory
200K     446,593,392               447M    3      512M       1.5G             1G           1G         OK
300K     669,886,958               670M    3      512M       1.5G             1G           1G         out of memory
300K     669,886,958               670M    4      512M       2G               1G           1G         OK
500K     1,116,488,087             1.1G    4      512M       2G               2G           1G         out of memory
500K     1,116,488,087             1.1G    5      512M       2.5G             2G           1G         out of memory
500K     1,116,488,087             1.1G    6      512M       3G               2G           1G         out of memory
500K     1,116,488,087             1.1G    7      512M       3.5G             2G           1G         OK

(Node count = combined server heap / 512M per node. The single-node heap size is the one used for the same record count in the single-node test.)

In the second test group, each node has a 512M heap. As the table above shows, I had to add more nodes as the number of records grew. It also shows that each node carries its own memory overhead: the combined heap across nodes needs to be bigger than the heap that was sufficient in the single-node test.
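
To put a number on that overhead, the sketch below divides the heap that worked by the serialized data size, for both test groups. It assumes the heap sizes are binary units (1G = 1024^3 bytes, as the JVM interprets -Xmx1g); the class and method names are illustrative only.

```java
public class HeapOverheadRatio {

    static final long GIB = 1024L * 1024L * 1024L;

    // How many bytes of heap were needed per byte of serialized data.
    static double heapToDataRatio(long heapBytes, long serializedBytes) {
        return (double) heapBytes / serializedBytes;
    }

    public static void main(String[] args) {
        long[] serialized   = {446_593_392L, 669_886_958L, 1_116_488_087L}; // 200K, 300K, 500K records
        long[] singleHeap   = {GIB, GIB, 2 * GIB};                          // single-node heap that worked
        long[] combinedHeap = {3 * GIB / 2, 2 * GIB, 7 * GIB / 2};          // smallest combined heap that worked

        for (int i = 0; i < serialized.length; i++) {
            System.out.printf("%,d bytes: single node x%.2f, multiple nodes x%.2f%n",
                    serialized[i],
                    heapToDataRatio(singleHeap[i], serialized[i]),
                    heapToDataRatio(combinedHeap[i], serialized[i]));
        }
    }
}
```

With these figures a single node needed roughly 1.6 to 2.4 times the serialized size, while the multi-node setup needed roughly 3.2 to 3.6 times. One possible contributor, assuming the distributed scheme keeps its default of one backup copy per entry, is that every record is stored twice across the cluster; the tests here did not measure that directly.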

Here is the server start script (Windows batch):
setlocal
set COHERENCE_HOME=D:\coherence\weblogic
REM Server JVM and classpath: Coherence jars plus the jar with the cached record type
set COH_OPTS=-server -cp %COHERENCE_HOME%\lib\coherence.jar;%COHERENCE_HOME%\lib\coherence-web-spi.war;d:\coherence\myRecordType.jar;

REM Remote JMX management; this node stores cache data
set COH_OPTS=%COH_OPTS% -Dtangosol.coherence.management.remote=true
set COH_OPTS=%COH_OPTS% -Dtangosol.coherence.distributed.localstorage=true
set COH_OPTS=%COH_OPTS% -Dtangosol.coherence.session.localstorage=true
REM Cache configuration, well-known address and local address
set COH_OPTS=%COH_OPTS% -Dtangosol.coherence.cacheconfig=d:\coherence\cache-config.xml
set COH_OPTS=%COH_OPTS% -Dtangosol.coherence.wka=devmachine
set COH_OPTS=%COH_OPTS% -Dtangosol.coherence.localhost=devmachine

REM Fixed heap size; change -Xms/-Xmx for each test run
java %COH_OPTS% -Xms2024m -Xmx2024m com.tangosol.net.DefaultCacheServer
endlocal

Snippet of the tangosol file for node 1:
<member-identity>
  <cluster-name system-property="tangosol.coherence.cluster">Grid/1_0/DEV</cluster-name>
</member-identity>
<unicast-listener>
  <well-known-addresses>
    <socket-address id="1">
      <address system-property="tangosol.coherence.wka">devmachine</address>
      <port system-property="tangosol.coherence.wka.port">7201</port>
    </socket-address>
  </well-known-addresses>
  <address system-property="tangosol.coherence.localhost">devmachine</address>
  <port system-property="tangosol.coherence.localport">7201</port>
</unicast-listener>

Snippet of the <cache-config> file (it is the same for each node):
   <caching-scheme-mapping>
      <cache-mapping>
         <cache-name>TestCacheSize</cache-name>
         <scheme-name>distributed</scheme-name>
      </cache-mapping>
   </caching-scheme-mapping>

Here is the snippet of the tangosol file for the 2nd node:

<member-identity>
  <cluster-name system-property="tangosol.coherence.cluster">Grid/1_0/DEV</cluster-name>
</member-identity>
<unicast-listener>
  <well-known-addresses>
    <socket-address id="1">
      <address system-property="tangosol.coherence.wka1">devmachine</address>
      <port system-property="tangosol.coherence.wka1.port">7201</port>
    </socket-address>
  </well-known-addresses>
  <address system-property="tangosol.coherence.localhost">devmachine</address>
  <port system-property="tangosol.coherence.localport">7203</port>
</unicast-listener>

Please note:
The cluster name is the same on every node.
The WKA (well known address) host and port are the same on every node (only the system-property names differ: wka, wka1, wka2, wka3 etc).
Each node runs on a different localport: 7201, 7203, 7204 etc.

My understanding is that the "well known address" acts like a "ring leader": a new node talks to it first when joining the cluster.

Finally, customize query.cmd under the "coherence/weblogic/bin" directory: add your sample "myRecordType.jar" to the classpath and adjust the memory size as needed.

The CohQL command I used to dump the objects:

backup cache "TestCacheSize" to "client-test/dump-100k.txt";

You can use a query like:
select * from "TestCacheSize" where key()='100008';
to check an individual record.