Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insert error: PRIMARY KEY must be unique #1

Open
ech-o-o opened this issue Mar 13, 2024 · 4 comments
Open

Insert error: PRIMARY KEY must be unique #1

ech-o-o opened this issue Mar 13, 2024 · 4 comments

Comments

@ech-o-o
Copy link

ech-o-o commented Mar 13, 2024

I want to run your code for learning. I notice your code is coupled with mobibench. The only change I made is to change "extern int num_threads;" into "int num_threads=1;" because I am running with only one thread.
The test.c I run does the following things:

  1. Open a new database.
  2. CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY not null, name TEXT not null, postcode INTEGER not null, salary INTEGER not null;
  3. Insert pre-generated data into the table. The data follows the following form: %10d, %5s, %5s, %d when generating. The data has less byte width but more fields than your mobibench. The data is generated by faker library in python by:
from faker import Faker
faker = Faker()
trace_num = 20000
#  The directory of trace file
trace_file = "./trace/" + str(trace_num) + "-test.trace"
with open(trace_file, "w+") as f:
    ids = set()
    while len(ids) < trace_num :
        id = faker.pyint(0, 10**10)
        name = faker.pystr(8, 8)
        postcode = faker.postcode()
        salary = faker.pyint(0, 10000)
        if id not in ids:
            ids.add(id)
            content = "%10d, %5s, %5s, %d\n"%(id ,name ,postcode, salary)
            f.write( content )

And it may output an error after about 30k insertion: Insert error: PRIMARY KEY must be unique.
When I do the testing if the data we insert can be read, the key which output error is not able to be read. BTW, if I close the database and reopen it, it may warn database is corrupyted.

Running your mobibench, I find after a specific volumn, more volumn inserted, higher throughput.

All the testing is under the environment:
Linux 6.5.0-15-generic
Ubuntu 22.04.1
gcc 9.5.0

Would you please take a look on it? If there exist a bug, would you please fix it? THX!

@lamduy-nguyen
Copy link
Member

Our prototype doesn't support restarting the database.
Also, you must include the source file directly into the test script, similar to Mobibench.
Could you check your script with original sqlite first?

@ech-o-o
Copy link
Author

ech-o-o commented Mar 14, 2024

I think I have correctly included the file because if I add a printf in your diclCommitDataView function, it will output a message. Original SQLite does not occur such an error. Basicly I have tested the performance of delete mode, which is different from your OFF. But changing the journal mode from delete to OFF does not solve the problem...
I put the test script and your source file(.c and .h) in the same library. And then running the Makefile. The Makefile is modified from your mobibench from mobibench github.
The Makefile:

EXENAME = test


SRCS = test.c \
	  ./sqlite3.c

MODE = release

INSTALL = install

prefix = /usr/local
bindir = $(prefix)/bin

CFLAGS = -lpthread -ldl -luring

#CFLAGS += -DDEBUG_SCRIPT

#for sqltie3
CFLAGS += -DNDEBUG=1 \
     -DHAVE_USLEEP=1 \
     -DSQLITE_HAVE_ISNAN \
     -DSQLITE_THREADSAFE=2 \
     -DSQLITE_TEMP_STORE=3 \
     -DSQLITE_POWERSAFE_OVERWRITE=1 \
     -DSQLITE_DEFAULT_FILE_FORMAT=4 \
     -DSQLITE_DEFAULT_AUTOVACUUM=1 \
     -DSQLITE_ENABLE_MEMORY_MANAGEMENT=1 \
     -DSQLITE_ENABLE_FTS3 \
     -DSQLITE_ENABLE_FTS3_BACKWARDS \
     -DSQLITE_ENABLE_FTS4 \
     -DSQLITE_OMIT_BUILTIN_TEST \
     -DSQLITE_OMIT_COMPILEOPTION_DIAGS \
     -DSQLITE_OMIT_LOAD_EXTENSION \
     -DSQLITE_DEFAULT_FILE_PERMISSIONS=0666 \
     -DUSE_PREAD64 \
     --static
#-DSQLITE_DEFAULT_WAL_AUTOCHECKPOINT=1000    
all :
    ifeq ($(MODE), release)
	#$(CROSS)gcc -o $(EXENAME) $(SRCS) $(CFLAGS)
		$(CROSS)gcc -o $(EXENAME) $(SRCS) $(CFLAGS)
    else
		$(CROSS)gcc -g -o $(EXENAME) $(SRCS) $(CFLAGS)
    endif

clean :
	@rm -rf mobibench

install : $(EXENAME)
	@$(INSTALL) -m 755 -d $(DESTDIR)$(bindir)
	@$(INSTALL) $(EXENAME) $(DESTDIR)$(bindir)

The test script:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>
#include <string.h> 

#include <sys/mman.h>
#include <errno.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <dirent.h>
/* for sqlite3 */
#include "sqlite3.h"

char* error_msg;
#define exec_sql(db, sql, cb)	sqlite3_exec(db, sql, cb, NULL, &error_msg);

int callback(void *para, int nCount, char** pValue, char** pName)
{
    char** s = (char**) para;
    if(atoi(pValue[0])  != atoi(s[0]))
    {
        printf("different id! \n");
        return 1;
    }   
    else if(strcmp(strtok(pValue[1], " ,\n"), strtok(s[1], " ,\n")) != 0)
    {
        printf("different name! \n");
        return 1;
    }
    else if(atoi(pValue[2])  != atoi(s[2]))
    {
        printf("different postcode! \n");
        return 1;
    }
    else if(atoi(pValue[3])  != atoi(s[3]))
    {
        printf("different salary! \n");
        return 1;
    }
    else
    {
        // printf("They are same!\n");
    }
    return SQLITE_OK;
}

int main(int argc, char*argv[])
{
    int insert_num = 1000;
    if(argc == 2)
    {
        insert_num = atoi(argv[1]);
    }
// An directory for database file
    if(access("./data/", F_OK) == -1)
    {
        system("mkdir ./data/");
    }
    else{
        system("rm -rf ./data/");
        system("mkdir ./data/");
    }
    sqlite3 *db;
    // The directory of database. 
    char *filename = "./data/test.db";
    int rc = sqlite3_open(filename, &db);
    if(rc != SQLITE_OK)
        printf("error: %s \n", error_msg);   
    char sql[1024];
    char* INFO[4];
    INFO[0] = "id";
    INFO[1] = "name";
    INFO[2] = "postcode";
    INFO[3] = "salary";
    rc = exec_sql(db, "PRAGMA page_size = 4096;", NULL);
    if(rc != SQLITE_OK) printf("ERROR ON SETING PRAGMA page size\n");
    // Setting the journal mode.
    exec_sql(db, "PRAGMA journal_mode=OFF;", NULL);
    if(rc != SQLITE_OK) printf("ERROR ON SETING PRAGMA journal mode\n");
    // exec_sql(db, "PRAGMA cache_size = 10;", NULL);
    // if(rc != SQLITE_OK) printf("ERROR ON SETING PRAGMA cache size\n");

    // Create table.
	int	length = sprintf(sql, "BEGIN;");    
    length += sprintf(sql+length, " CREATE TABLE IF NOT EXISTS tblMyList%d (%s INTEGER PRIMARY KEY not null, %s TEXT not null, %s INTEGER not null,%s INTEGER not null);", 0, INFO[0], INFO[1], INFO[2], INFO[3]);
    length += sprintf(sql+length, "COMMIT;");
    rc = exec_sql(db, sql, NULL);
    if(rc != SQLITE_OK)
        printf("error: %s \n", error_msg);   
    
    FILE* fp = NULL;
    char input[100];

    // "input" stores the directory of trace. 
    sprintf(input,"./trace/%d-test.trace", insert_num );
    fp = fopen(input, "r");
    char buf[100];
    if(fp == NULL)
    {
        printf("Open Trace file fails\n");
        exit(0);
    }
    // A "For Loop" to insert data. 
    for(int i = 0; i < insert_num; ++i)
    {
        fgets(buf, 100, fp);
        char* infos[4];
        infos[0] =  strtok(buf, " ,\n");
        for(int i = 1; i < 4; ++i)
        {
            infos[i] = strtok(NULL, " ,\n");
        }
        int	length = sprintf(sql, "BEGIN;");
        length += sprintf(sql+length, "INSERT INTO tblMyList%d (%s, %s, %s, %s) VALUES(%s, '%s', %s, %s);", 0, INFO[0], INFO[1], INFO[2], INFO[3], infos[0], infos[1], infos[2], infos[3]);
        length += sprintf(sql+length, " COMMIT;");
        rc = exec_sql(db, sql, NULL);
        if(rc != SQLITE_OK)
        {
            printf("%d Insert error: %s \n", i, error_msg);
            exec_sql(db, "COMMIT;", NULL);
        }
        // else
        // {
        //     printf("%d successfully insert \n", i);
        // }
    }
    // Restart from the beginning to read data

    rewind(fp);
    // rc = sqlite3_close(db);
    // if(rc != SQLITE_OK)
    //     printf("error: %s \n", error_msg);   
    // rc = sqlite3_open(filename, &db);
    // if(rc != SQLITE_OK)
    //     printf("error: %s \n", error_msg);   

    // A "For Loop" to read inserted data. 
    for(int i = 0; i < insert_num; ++i)
    {
        fgets(buf, 100, fp);
        char* infos[4];
        infos[0] =  strtok(buf, ",");
        for(int i = 1; i < 4; ++i)
        {
            infos[i] = strtok(NULL, ",");
        }
        sprintf(sql, "SELECT * FROM tblMyList%d WHERE id == %s;", 0, infos[0]);
        rc = exec_sql(db, sql, NULL);
        int rc = sqlite3_exec(db, sql, callback, infos, &error_msg);
        if(rc != SQLITE_OK)
            printf("compare %d error: %s \n", i, error_msg);
    }

    rc = sqlite3_close(db);

    return 0;
}

Hope these additional information may help. By the way, thanks for your reply! I will try to modify the benchmark to disallow restart the database. Your quick reply really saves me a lot of time.

@lamduy-nguyen
Copy link
Member

My apologies as I can't help you any further.
It has been a long time since I last worked on SQLite source code.

I suggested you to try to debug the issue.
You can find some hints here:
https://www.sqlite.org/debugging.html
You can also find some in the customized Mobibench in this repository.

@ech-o-o
Copy link
Author

ech-o-o commented Mar 14, 2024

Thanks for your hints. I will have a try first, though I am a bit pessimistic for that. Hope the bug is not caused by platform changing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants