Apache Airflow scheduler does not trigger DAG at schedule time























When I schedule DAGs to run at a specific time every day, the DAG execution does not take place at all.
However, when I restart the Airflow webserver and scheduler, the DAGs execute once at the scheduled time for that particular day and do not execute from the next day onwards.
I am using Airflow v1.7.1.3 with Python 2.7.6.
Here is the DAG code:



from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

import time
n = time.strftime("%Y,%m,%d")
v = datetime.strptime(n, "%Y,%m,%d")
default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'start_date': v,
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=10),
}

dag = DAG('dag_user_answer_attempts', default_args=default_args, schedule_interval='03 02 * * *')

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
    task_id='user_answer_attempts',
    bash_command='python /home/ubuntu/bigcrons/appengine-flask-skeleton-master/useranswerattemptsgen.py',
    dag=dag)


Am I doing something wrong?










Tags: python, apache, cron, directed-acyclic-graphs, airflow
















asked Nov 21 '16 at 6:32 by Prabhjot
























2 Answers






































Your issue is that start_date is set to the current date. Airflow runs a job at the end of its schedule interval, not the beginning, so the first run of your job happens only after the first interval has ended. Because your start_date is recomputed every time the DAG file is parsed, that first interval keeps moving forward.

Example:

You make a DAG and put it live in Airflow at midnight. Today (20XX-01-01 00:00:00) is also the start_date, but it is hard-coded ("start_date": datetime(20XX, 1, 1)). The schedule is daily, like yours (3 2 * * *).

The first time this DAG will be queued for execution is 20XX-01-02 02:03:00, because that is when the first interval ends. If you look at that DAG run, its actual start time should be roughly one day after its execution_date.
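The timing in the example can be checked with a small standard-library sketch. It is a simplified model of a daily '03 02 * * *' schedule, not Airflow's own scheduler code; the helper name first_run and the year 2016 are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def first_run(start_date, hour=2, minute=3):
    """Simplified model of a daily cron schedule ('03 02 * * *').

    Returns (execution_date, trigger_time): the first schedule point at or
    after start_date, and the end of that one-day interval, which is when
    the run would actually be queued.
    """
    execution_date = start_date.replace(hour=hour, minute=minute,
                                        second=0, microsecond=0)
    if execution_date < start_date:
        # Today's 02:03 has already passed, so use tomorrow's schedule point.
        execution_date += timedelta(days=1)
    return execution_date, execution_date + timedelta(days=1)


execution_date, trigger_time = first_run(datetime(2016, 1, 1))
print(execution_date)  # 2016-01-01 02:03:00
print(trigger_time)    # 2016-01-02 02:03:00
```

Under this model, a start_date that always equals "today at midnight" pushes the trigger time one day further out every time it is recomputed.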



You can solve this by hard-coding start_date to a fixed date, or by making sure that the dynamic date is further in the past than the interval between executions (in your case, 2 days would be plenty). Airflow recommends static start_dates in case you need to re-run jobs, backfill, or end a DAG.
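Both options might look roughly like this in the question's default_args. This is a sketch only: the date 2016-11-01 is an arbitrary example, and the names STATIC_START and dynamic_start are made up here. The DAG(...) call is shown as a comment so the snippet runs without Airflow installed.

```python
from datetime import datetime, timedelta

# Option 1: a fixed date in the past (the date itself is an arbitrary example).
STATIC_START = datetime(2016, 11, 1)


# Option 2: still dynamic, but two days back, so that at least one full
# daily interval has already ended whenever the DAG file is parsed.
def dynamic_start(now=None):
    now = now or datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=2)


default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'start_date': STATIC_START,  # or dynamic_start()
    'retries': 1,
    'retry_delay': timedelta(minutes=10),
}

# The DAG(...) call from the question stays unchanged:
# dag = DAG('dag_user_answer_attempts', default_args=default_args,
#           schedule_interval='03 02 * * *')
```

Note that a start_date in the past can lead the scheduler to create runs for intervals that have already elapsed, so picking a date close to the deployment date keeps that backlog small.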



For more information on backfilling (the opposite side of this common Stack Overflow question), check the docs or this question: "Airflow not scheduling Correctly Python".















answered May 3 '17 at 22:29 by apathyman (edited May 23 '17 at 12:34 by Community)
























According to the schedule, your DAG should run every day at 02:03 AM. My suspicion is that the start_date is affecting it. Can you hard-code it to something like 'start_date': datetime(2016, 11, 1) and try again?

















answered Nov 23 '16 at 20:00 by kvb






























                               
