Create a cronjob with a random schedule to update Solr indexes #3637

Open · wants to merge 2 commits into base: 3.1.x
8 changes: 8 additions & 0 deletions docker/Dockerfile
@@ -27,6 +27,8 @@ RUN apt-get update && \
apt-get install -y --no-install-recommends $BUILD_PACKAGES $RUN_PACKAGES && \
fc-cache -fv && \
pip3 install --no-cache-dir --upgrade pip setuptools && \
apt-get install -y --no-install-recommends cron && \
rm -f /etc/nginx/conf.d/* && \
pip install --no-cache-dir -r /var/interlegis/sapl/requirements/dev-requirements.txt --upgrade setuptools && \
SUDO_FORCE_REMOVE=yes apt-get purge -y --auto-remove $BUILD_PACKAGES && \
@@ -71,3 +73,9 @@ EXPOSE 80/tcp 443/tcp
VOLUME ["/var/interlegis/sapl/data", "/var/interlegis/sapl/media"]

CMD ["/var/interlegis/sapl/start.sh"]

COPY cronjob /etc/cron.d/rebuild_solr_index
RUN chmod 0644 /etc/cron.d/rebuild_solr_index && \
    crontab /etc/cron.d/rebuild_solr_index && \
    touch /var/log/cron.log
# cron is started from start.sh; a second CMD here would silently override
# the "/var/interlegis/sapl/start.sh" CMD above, since only the last CMD in
# a Dockerfile takes effect
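The `cronjob` file copied to `/etc/cron.d/` is part of the PR but not shown in this diff. As a rough illustration only (the values and helper below are hypothetical, not the PR's actual file), entries in `/etc/cron.d` use the standard crontab fields plus a user field, which could be assembled like this:

```python
def cron_entry(minute, hour, user, command):
    # /etc/cron.d format: minute hour day-of-month month day-of-week user command
    return f"{minute} {hour} * * * {user} {command}"

# Hypothetical example entry: run the delta-import every day at 02:30 as root
entry = cron_entry(30, 2, "root",
                   "python3 solr_cli.py -u $SOLR_URL -c $SOLR_COLLECTION --update_index")
print(entry)
```

The shell variables in the command string are left unexpanded here; cron itself runs the line through `/bin/sh`.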
44 changes: 43 additions & 1 deletion docker/solr_cli.py
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import argparse
import datetime
import logging
import re
import secrets
@@ -111,6 +112,8 @@ class SolrClient:
DELETE_COLLECTION = "{}/solr/admin/collections?action=DELETE&name={}&wt=json"
DELETE_DATA = "{}/solr/{}/update?commitWithin=1000&overwrite=true&wt=json"
QUERY_DATA = "{}/solr/{}/select?q=*:*"
REBUILD_INDEX = "{}/solr/{}/dataimport?command=full-import&wt=json"
UPDATE_INDEX = "{}/solr/{}/dataimport?command=delta-import&wt=json"

CONFIGSET_NAME = "sapl_configset"

@@ -243,6 +246,35 @@ def delete_index_data(self, collection_name):
num_docs = self.get_num_docs(collection_name)
print("Num docs: %s" % num_docs)

def update_index_last_day(self, collection_name):
    # With "import datetime" at the top of the file, the class and the
    # timedelta must be qualified as datetime.datetime / datetime.timedelta
    date = (datetime.datetime.now() - datetime.timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%SZ')
    now = datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ')

    req_url = self.UPDATE_INDEX.format(self.url, collection_name)
    res = requests.post(req_url,
                        data='<update><query>*:[%s TO %s]</query></update>' % (date, now),
                        headers={'Content-Type': 'application/xml'})
    if not res.ok:
        print("Error updating index for collection '%s'" % collection_name)
        print("Code {}: {}".format(res.status_code, res.text))
    else:
        print("Collection '%s' data updated successfully!" % collection_name)

    num_docs = self.get_num_docs(collection_name)
    print("Num docs: %s" % num_docs)

def rebuild_index(self, collection_name):
    req_url = self.REBUILD_INDEX.format(self.url, collection_name)
    res = requests.post(req_url)
    if not res.ok:
        print("Error rebuilding index for collection '%s'" % collection_name)
        print("Code {}: {}".format(res.status_code, res.text))
    else:
        print("Collection '%s' index rebuilt successfully!" % collection_name)

    num_docs = self.get_num_docs(collection_name)
    print("Num docs: %s" % num_docs)


def setup_embedded_zk(solr_url):
match = re.match(URL_PATTERN, solr_url)
@@ -277,9 +309,10 @@ def setup_embedded_zk(solr_url):
help='Replication factor (default=1)', default=1)
parser.add_argument('-ms', type=int, dest='max_shards_per_node', nargs='?',
help='Max shards per node (default=1)', default=1)

parser.add_argument("--embedded_zk", default=False, action="store_true",
help="Embedded ZooKeeper")
parser.add_argument("--rebuild_index", default=False, action="store_true",
                    help="Rebuild the collection index (full-import)")
parser.add_argument("--update_index", default=False, action="store_true",
                    help="Update the collection index (delta-import)")

try:
args = parser.parse_args()
@@ -315,3 +348,12 @@ def setup_embedded_zk(solr_url):
if num_docs == 0:
print("Performing a full reindex of '%s' collection..." % collection)
p = subprocess.call(["python3", "manage.py", "rebuild_index", "--noinput"])

if args.rebuild_index:
print("Rebuilding index of '%s' collection..." % collection)
client.rebuild_index(collection)

if args.update_index:
print("Updating index of '%s' collection..." % collection)
client.update_index_last_day(collection)
Contributor comment:
The calls have to be made the same way it is being done on line 350. ;)
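The delta-import flow added to `solr_cli.py` above boils down to building a request URL from the `UPDATE_INDEX` template and a one-day timestamp window in Solr's date format. A minimal standalone sketch (the host and collection name are illustrative, not from the PR):

```python
import datetime

# Same URL template as SolrClient.UPDATE_INDEX in the diff above
UPDATE_INDEX = "{}/solr/{}/dataimport?command=delta-import&wt=json"

def last_day_window(now=None):
    # Returns (start, end) timestamps covering the last 24 hours,
    # formatted the way update_index_last_day() formats them
    now = now or datetime.datetime.now()
    start = now - datetime.timedelta(days=1)
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    return start.strftime(fmt), now.strftime(fmt)

url = UPDATE_INDEX.format("http://localhost:8983", "sapl")
start, end = last_day_window(datetime.datetime(2024, 1, 2, 3, 4, 5))
print(url)
print(start, "->", end)
```

A fixed `now` is passed above only to make the window deterministic; the production code uses the current time.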


15 changes: 15 additions & 0 deletions docker/start.sh
@@ -85,6 +85,21 @@ if [ "${USE_SOLR-False}" == "True" ] || [ "${USE_SOLR-False}" == "true" ]; then
fi

python3 solr_cli.py -u $SOLR_URL -c $SOLR_COLLECTION -s $NUM_SHARDS -rf $RF -ms $MAX_SHARDS_PER_NODE $ZK_EMBEDDED &

RANDOM_MINUTE_MIN=0
RANDOM_MINUTE_MAX=59
RANDOM_HOUR_MIN=0
RANDOM_HOUR_MAX=3

# Generate a random minute and hour within the inclusive intervals
# (the minute maximum must be 59; 60 is not a valid cron minute)
RANDOM_MINUTE=$((RANDOM % ($RANDOM_MINUTE_MAX-$RANDOM_MINUTE_MIN+1) + $RANDOM_MINUTE_MIN))
RANDOM_HOUR=$((RANDOM % ($RANDOM_HOUR_MAX-$RANDOM_HOUR_MIN+1) + $RANDOM_HOUR_MIN))
Contributor comment on lines +88 to +96:
Here I was thinking of the following scheme: have a base hour at night and vary the hour and minute from which the index update will run every day. This can be solved with the bash script below:

BASE_HOUR=20

random_time() {
  RANDOM_HOUR=$((RANDOM % 4))
  RANDOM_MIN=$(printf "%02d" $((RANDOM % 60)))
  RANDOM_TIME=$((BASE_HOUR + RANDOM_HOUR))":"$RANDOM_MIN
  echo $RANDOM_TIME
}

# generate a sequence of 10 random times for testing
for i in {1..10}; do
  echo `random_time`
done

In this case, starting from 20:00 it will generate a random time for the cronjob to run. Two functions could be created: one to return the hour and one to return the minute.
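The reviewer's base-hour scheme can be mirrored in Python to check the range it produces — times uniformly distributed between 20:00 and 23:59 (the function name here is illustrative, matching the bash sketch):

```python
import random

BASE_HOUR = 20  # nightly base hour, as in the reviewer's bash sketch

def random_time(rng=random):
    # RANDOM % 4 offsets the hour by 0..3; minutes are 00..59, zero-padded
    hour = BASE_HOUR + rng.randrange(4)
    minute = rng.randrange(60)
    return f"{hour:02d}:{minute:02d}"

samples = [random_time() for _ in range(10)]
print(samples)
```

Every generated value falls in [20:00, 23:59], so the job always runs at night but at a different moment each time the container starts.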


# Add the cronjob (the argparse flag is --update_index, and /etc/cron.d
# entries require a user field; /etc/cron.daily only runs executable scripts)
echo "$RANDOM_MINUTE $RANDOM_HOUR * * * root python3 solr_cli.py -u $SOLR_URL -c $SOLR_COLLECTION --update_index" > /etc/cron.d/rebuild_index_job

# Start the cron daemon (the Debian package installs it as "cron"; "crond" is
# the BusyBox/Alpine name)
cron
else
echo "Solr is offline, not possible to connect."
fi
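The inclusive-range draw used in start.sh, `$((RANDOM % (MAX-MIN+1) + MIN))`, can be sketched in Python to verify its bounds (function and parameter names are illustrative):

```python
import random

def random_schedule(minute_min=0, minute_max=59, hour_min=0, hour_max=3):
    # Mirrors $((RANDOM % (MAX-MIN+1) + MIN)): a uniform draw over the
    # inclusive range [MIN, MAX]. Note minute_max must be 59; a maximum
    # of 60 would occasionally produce an invalid cron minute.
    minute = random.randint(minute_min, minute_max)
    hour = random.randint(hour_min, hour_max)
    return minute, hour

minute, hour = random_schedule()
print(f"cron schedule: {minute} {hour} * * *")
```

Spreading the update between 00:00 and 03:59 per container avoids all instances hitting Solr at the same moment.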