.cpm files grow uncontrollably during millions of adds and deletes #6264

Closed
1 of 11 tasks
zerovian opened this issue Jun 6, 2016 · 5 comments

zerovian commented Jun 6, 2016

Expected behavior and actual behavior

I have a database in which we create roughly a million records and then delete the old ones after a few hours to a few days. After a couple of days, the .pcl file size stabilizes, since we end up creating and deleting the same number of records and all the records are roughly the same size. The .cpm file, however, continues to grow.

Can this be fixed or worked around?

I've read the wiki page about plocal not reusing record pointer space, which recommends an export/import.

However, a dump and load is NOT an option for recovering this lost space, as our customers expect to stay running for more than a few days at a time. Months would be more in line with their expectations. Downtime for a dump and load is bad. Because of the lost space, the database starts at around 1.2 GB, reaches 3.5 GB after about 3 days, and continues to grow until it eats the entire file system.

This was previously happening with the .irs files for an index and we managed to come up with a data structure that eliminated the use of the index.

Some quick (real) sample stats from a test we ran.

The table below shows on-disk file sizes, in kilobytes, at three points in time.

| File name | 2nd June, 11:24 AM | 3rd June, 11:55 AM | 6th June 2016, 11:02 AM |
|---|---|---|---|
| indexstat.cpm | 339841 | 841921 | 1539393 |
| indexstat.pcl | 199553 | 200001 | 200001 |
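
To put the numbers above in perspective, here is a small sketch that turns the three indexstat.cpm samples into growth rates. It assumes all three timestamps fall in 2016 (only the last one states the year), and the `growth_rates` helper is a name made up for this illustration.

```python
from datetime import datetime

# (sample time, indexstat.cpm size in KB) taken from the reported figures.
# Assumption: all three samples are from 2016.
samples = [
    (datetime(2016, 6, 2, 11, 24), 339841),
    (datetime(2016, 6, 3, 11, 55), 841921),
    (datetime(2016, 6, 6, 11, 2), 1539393),
]


def growth_rates(points):
    """Return the growth in KB per hour between consecutive samples."""
    rates = []
    for (t0, size0), (t1, size1) in zip(points, points[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        rates.append((size1 - size0) / hours)
    return rates


for rate in growth_rates(samples):
    print(f"{rate:.0f} KB/hour")
```

Under that assumption the .cpm file grew by roughly 20,000 KB per hour over the first interval and roughly 9,800 KB per hour over the second, while the .pcl file stayed essentially flat.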

Steps to reproduce the problem

Create and delete a few million records continuously for a couple of days. Space is lost as the .cpm file continues to grow.

Important Questions

Running Mode

  • Embedded, using PLOCAL access mode
  • Embedded, using MEMORY access mode
  • Remote

Misc

  • I have a distributed setup with multiple servers. How many?
  • I'm using the Enterprise Edition

OrientDB Version

  • v2.0.x - Please specify last number:
  • [X] v2.1.17 - Please specify last number:
  • v2.2.x - Please specify last number:

Operating System

  • [X] Linux
  • MacOSX
  • [X] Windows
  • [X] Other Unix
  • Other, name?

Java Version

  • 6
  • [X] 7
  • 8
@andrii0lomakin
Member

@zerovian This is a known disadvantage of the current cluster implementation. I suggest that you either truncate the cluster instead of deleting records, or partition records between clusters and remove whole clusters.
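
One way to read the partitioning suggestion is as a fixed ring of time-bucketed clusters: new records go to the cluster for the current time bucket, and the oldest cluster is truncated just before it is reused. The sketch below only computes cluster names and builds `TRUNCATE CLUSTER` statement strings. The bucket width, ring size, `indexstat_p` prefix, and function names are all assumptions made for illustration, and whether truncation returns disk space on a given OrientDB version should be checked before relying on it.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(hours=6)   # assumed width of one partition
RING_SIZE = 8                 # assumed number of clusters in the ring
PREFIX = "indexstat_p"        # hypothetical cluster name prefix
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_index(ts):
    """Number of whole buckets between the epoch and ts."""
    return (ts - EPOCH) // BUCKET


def cluster_for(ts):
    """Name of the cluster that records created at ts are written to."""
    return f"{PREFIX}{bucket_index(ts) % RING_SIZE}"


def rotation_statements(previous, now):
    """Statements to run before writing at `now`, given the last write time.

    Each bucket entered since `previous` reuses a cluster that still holds
    data from RING_SIZE buckets ago, so that cluster is truncated first.
    """
    last = bucket_index(now)
    first = max(bucket_index(previous) + 1, last - RING_SIZE + 1)
    return [
        f"TRUNCATE CLUSTER {PREFIX}{i % RING_SIZE}"
        for i in range(first, last + 1)
    ]
```

With these assumed values, records live for between 7 and 8 buckets (42 to 48 hours) before their cluster is truncated, so expiry is coarse-grained rather than per record. The clusters themselves would have to be created and attached to the class up front, which this sketch does not cover.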

@Eric24

Eric24 commented Jun 9, 2016

Question: So this isn't really limited to "millions of records", right? In other words, the .cpm file will grow forever as any records are added and deleted over time (until a truncate)? If so, that's a fairly gross maintenance requirement. Are any improvements to this "known disadvantage" on the road map?

@andrii0lomakin
Member

Hi,
We will probably add a fix for this issue in version 3.0 with the new cluster
implementation, but I cannot promise that right now.

Best regards,
Andrey Lomakin, R&D lead.
OrientDB Ltd

twitter: @Andrey_Lomakin
linkedin: https://ua.linkedin.com/in/andreylomakin
blogger: http://andreylomakin.blogspot.com/

@smolinari
Contributor

Does the syncing done while adding a server node to a cluster remove, by chance, the unused disk space caused by deletions? I realize that isn't a possible solution for everyone, but those using multiple nodes (an ODB cluster) might find it useful to reclaim disk space through a node rotation, should such file growth be a problem. This kind of node rotation to compact the files is possible when working with MongoDB, for example.

Scott

@andrii0lomakin
Member

@zerovian @smolinari if you wish to reclaim the space left unused in .cpm files, please vote for the OEP orientechnologies/orientdb-labs#7. One of its goals is: "Reuse space which is used to store rid of delete record."
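
To illustrate why reusing that space matters, here is a toy model of a position map with one fixed-size slot per record position. It is not OrientDB's actual on-disk format; the `PositionMap` class and `simulate` function are invented for this sketch. It compares a map that never reuses the slots of deleted records with one that keeps a free list.

```python
from collections import deque


class PositionMap:
    """Toy position map: slot number -> payload offset (None if deleted)."""

    def __init__(self, reuse_slots):
        self.slots = []
        self.free = []              # slots released by deletes
        self.reuse_slots = reuse_slots

    def add(self, offset):
        if self.reuse_slots and self.free:
            slot = self.free.pop()  # recycle a deleted slot
            self.slots[slot] = offset
        else:
            slot = len(self.slots)  # always extend the map
            self.slots.append(offset)
        return slot

    def delete(self, slot):
        self.slots[slot] = None
        self.free.append(slot)


def simulate(reuse_slots, live=1000, churn=10000):
    """Keep `live` records alive while replacing the oldest `churn` times."""
    pm = PositionMap(reuse_slots)
    window = deque(pm.add(i) for i in range(live))
    for i in range(churn):
        pm.delete(window.popleft())
        window.append(pm.add(live + i))
    return len(pm.slots)


print(simulate(reuse_slots=False))  # 11000: one slot per record ever created
print(simulate(reuse_slots=True))   # 1000: bounded by the live record count
```

In this model the map without reuse grows with the total number of records ever created, even though the live count stays constant. Reusing a slot also means a record id can later point at a different record, which a real design would have to account for.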
