Data manipulation based on trends value

Given a dataset with a Date column and a Value column, I need to find the best way to segment the data by date based on trends in the Value column. My output should be a CSV file with the columns StartDate, EndDate, StartValue, EndValue. StartDate and EndDate define the bounds of each segment.
A short example follows. Input data:

 **Date**        **Value**
  01/01/2014        10
  01/02/2014        5
  01/03/2014        5
  01/04/2014        0

Output:

 **StartDate**   **EndDate**   **StartValue**   **EndValue**
   01/01/2014      01/15/2014        10              5
   01/16/2014      02/03/2014         5              5
   02/04/2014      03/10/2014         5              4

asked Nov 13 '18 at 23:15

123josh123

275

python-3.x data-mining data-science data-manipulation

1 Answer

An approach using pandas.DataFrame.shift.

First, I'll create a dataframe with some random data:

    import numpy as np
    import pandas as pd

    # 100 consecutive days with random integer values from 1 to 4
    datelist = pd.date_range('1/1/2019', periods=100).tolist()
    values = np.random.randint(1, 5, 100)

    df = pd.DataFrame({'Date': datelist, 'Value': values})
    df = df.set_index('Date')
    df.head(10)



    Date        Value
    2019-01-01  1
    2019-01-02  4
    2019-01-03  2
    2019-01-04  2
    2019-01-05  2
    2019-01-06  3
    2019-01-07  2
    2019-01-08  2
    2019-01-09  3
    2019-01-10  2
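Because the values are random, every run produces different numbers. If repeatable output is wanted, a seeded generator is one option. This is only a sketch: the seed value 0 and the variable name `rng` are arbitrary choices, not part of the original setup.

```python
import numpy as np
import pandas as pd

# Seed 0 is an arbitrary choice for illustration; any fixed seed works.
rng = np.random.default_rng(0)

dates = pd.date_range('2019-01-01', periods=100)
values = rng.integers(1, 5, size=100)  # upper bound is exclusive: values 1..4

df = pd.DataFrame({'Value': values}, index=dates)
df.index.name = 'Date'
print(df.head(10))
```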

Drop rows whose value repeats the previous row (contiguous duplicates), keeping the first row of each run:

    df = df.loc[df.Value.shift() != df.Value]

    Date        Value
    2019-01-01  1
    2019-01-02  4
    2019-01-03  2
    2019-01-06  3
    2019-01-07  2
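To see what the comparison does, here is a small sketch on fixed values (the ten rows printed by `df.head(10)` above), so the result does not depend on the random draw. The names `is_new_run` and `deduped` are only for illustration.

```python
import pandas as pd

# Fixed values taken from the head(10) output above.
dates = pd.date_range('2019-01-01', periods=10)
df = pd.DataFrame({'Value': [1, 4, 2, 2, 2, 3, 2, 2, 3, 2]}, index=dates)
df.index.name = 'Date'

# True where a row differs from the row before it. The first row is always
# True because shift() puts NaN there, and NaN never compares equal.
is_new_run = df['Value'].shift() != df['Value']

# Keep only the first row of each run of equal consecutive values.
deduped = df.loc[is_new_run]
print(deduped)
```

Rows for 2019-01-04, 2019-01-05 and 2019-01-08 are dropped because each repeats the value of the row before it.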

Reset the index (if the Date column is the index in the original data):

    df = df.reset_index()

Rename the existing columns to be the start columns:

    df.columns = ['Start_Date', 'Start_Value']

Create end columns by shifting the start columns back one row:

    df['End_Date'] = df.Start_Date.shift(-1)
    df['End_Value'] = df.Start_Value.shift(-1)
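A quick illustration of what `shift(-1)` does to an integer column, using made-up values. The last row has no successor, so it becomes NaN, and the column is upcast to float as a side effect.

```python
import pandas as pd

# Made-up start values, for illustration only.
starts = pd.Series([1, 4, 2, 3], name='Start_Value')

# shift(-1) moves every value up one row; the last row gets NaN,
# which forces the integer column to become float64.
ends = starts.shift(-1)

print(ends.tolist())  # [4.0, 2.0, 3.0, nan]
print(ends.dtype)     # float64
```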

Drop NaNs (the final row of the dataframe, produced by the shift(-1)):

    df = df.dropna()

Set the End_Value type to int (if preferred):

    df['End_Value'] = df['End_Value'].astype(int)
    df.head(10)



        Start_Date  Start_Value End_Date    End_Value
    0   2019-01-01  1           2019-01-02  4
    1   2019-01-02  4           2019-01-03  2
    2   2019-01-03  2           2019-01-06  3
    3   2019-01-06  3           2019-01-07  2
    4   2019-01-07  2           2019-01-09  3
    5   2019-01-09  3           2019-01-10  2
    6   2019-01-10  2           2019-01-11  1
    7   2019-01-11  1           2019-01-12  2
    8   2019-01-12  2           2019-01-15  1
    9   2019-01-15  1           2019-01-16  4
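Note that with these steps each End_Date equals the next row's Start_Date, so neighbouring segments share a boundary date. The example output in the question instead ends each segment the day before the next one starts. If that is what's wanted, and assuming the data is daily, one possible variation is to subtract a day from the shifted date. The small `segments` frame below is made up for illustration.

```python
import pandas as pd

# Made-up segment starts, for illustration only.
segments = pd.DataFrame({
    'Start_Date': pd.to_datetime(['2019-01-01', '2019-01-02',
                                  '2019-01-03', '2019-01-06']),
    'Start_Value': [1, 4, 2, 3],
})

# End each segment one day before the next segment starts
# (assumes daily data; adjust the offset for other frequencies).
segments['End_Date'] = segments['Start_Date'].shift(-1) - pd.Timedelta(days=1)
segments['End_Value'] = segments['Start_Value'].shift(-1)

# The last row has no successor (NaT / NaN), so drop it and restore ints.
segments = segments.dropna().astype({'End_Value': int})
print(segments)
```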

Create a CSV file from the dataframe:

    df.to_csv('trends.csv')
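Putting the steps together on the sample input from the question might look like the sketch below. The function name `segment_by_value_change` is hypothetical, the column names follow the header requested in the question, and `index=False` is an addition here so that the row numbers are not written as an extra column. It writes to an in-memory buffer for demonstration; pass a path such as 'trends.csv' to write a file instead.

```python
import io
import pandas as pd

def segment_by_value_change(df):
    """Hypothetical helper: one output row per change in consecutive values."""
    df = df.sort_values('Date')
    # Keep only the first row of each run of equal consecutive values.
    mask = df['Value'].shift() != df['Value']
    starts = df.loc[mask, ['Date', 'Value']].reset_index(drop=True)
    out = pd.DataFrame({
        'StartDate': starts['Date'],
        'EndDate': starts['Date'].shift(-1),
        'StartValue': starts['Value'],
        'EndValue': starts['Value'].shift(-1),
    })
    # The last run has no successor, so its end columns are missing.
    return out.dropna().astype({'EndValue': int})

# Sample input from the question.
raw = pd.DataFrame({
    'Date': pd.to_datetime(
        ['01/01/2014', '01/02/2014', '01/03/2014', '01/04/2014'],
        format='%m/%d/%Y'),
    'Value': [10, 5, 5, 0],
})

segments = segment_by_value_change(raw)

# index=False keeps the row numbers out of the CSV.
buffer = io.StringIO()
segments.to_csv(buffer, index=False, date_format='%m/%d/%Y')
print(buffer.getvalue())
```

On this input the sketch yields two segments: 01/01/2014 to 01/02/2014 (10 to 5) and 01/02/2014 to 01/04/2014 (5 to 0). Like the steps above, it treats every change in value as a segment boundary.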

edited Jan 5 at 10:05

answered Jan 4 at 14:56

Chris

536213


